This function splits continuous values, e.g. (risk) scores or (bio)markers, into two or more categories according to one or more specified cutoff values.

Usage

categorize(
  values,
  cutoffs = rep(0, ncol(values)),
  map = 1:ncol(values),
  labels = NULL
)

Arguments

values

(matrix)
numeric matrix of continuous values to be categorized. An (n x r) matrix is assumed, with n observations (subjects) and r continuous variables.

cutoffs

(numeric | matrix)
numeric matrix of dimension m x k. Each row of cutoffs defines a split into k+1 distinct categories and must contain distinct values. In the simplest case (k=1), cutoffs is a single-column matrix in which each row defines a binary split (<=t vs. >t). In this case, cutoffs can also be given as a numeric vector.

map

(numeric)
integer vector of length m (one entry per row of cutoffs) with values in 1:r, where r = ncol(values). The entry map_l indicates which column of values should be categorized by ...

labels

(character)
character vector of length m (= number of prediction r)

Value

(matrix)
numeric (n x m) matrix of categorical outcomes, with one column per row of cutoffs.
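
The cutoff rule described under Arguments can be illustrated for a single vector in base R. This is a minimal sketch, not the package's implementation: count_exceeded is a hypothetical helper, and it assumes that a value's category is the number of cutoffs it strictly exceeds, which agrees with the binary split (<=t vs. >t) described for cutoffs.

```r
# Hypothetical helper, not part of the package: assigns each value the number
# of cutoffs it strictly exceeds, so k cutoffs yield categories 0, ..., k.
count_exceeded <- function(x, cutoffs) {
  stopifnot(is.numeric(x), is.numeric(cutoffs), !anyDuplicated(cutoffs))
  # left.open = TRUE uses intervals (t_i, t_{i+1}], so a value equal to a
  # cutoff t stays in the lower category (<= t).
  findInterval(x, sort(cutoffs), left.open = TRUE)
}

x <- c(-1.5, 0, 0.5, 2)
binary    <- count_exceeded(x, 0)          # binary split at t = 0
three_way <- count_exceeded(x, c(-1, 1))   # two cutoffs, three categories
```

Under this assumption, a value of exactly 0 falls into category 0 for the binary split at t = 0, since the lower category is closed on the right.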

Examples

set.seed(123)
M <- as.data.frame(mvtnorm::rmvnorm(20, mean = rep(0, 3), sigma = 2 * diag(3)))
M
#>             V1          V2         V3
#> 1  -0.79263226 -0.32552013  2.2043464
#> 2   0.09971392  0.18284047  2.4254682
#> 3   0.65183395 -1.78906676 -0.9713566
#> 4  -0.63026120  1.73111308  0.5088536
#> 5   0.56677642  0.15652900 -0.7860781
#> 6   2.52707679  0.70406690 -2.7812167
#> 7   0.99186703 -0.66862802 -1.5101308
#> 8  -0.30826308 -1.45098941 -1.0308079
#> 9  -0.88393901 -2.38534456  1.1848098
#> 10  0.21690234 -1.60956869  1.7731621
#> 11  0.60311149 -0.41729409  1.2658988
#> 12  1.24186829  1.16189111  0.9738844
#> 13  0.78335786 -0.08755638 -0.4326965
#> 14 -0.53806725 -0.98246403 -0.2940394
#> 15 -1.78954068  3.06736694  1.7083162
#> 16 -1.58831539 -0.56976520 -0.6599503
#> 17  1.10303725 -0.11790166  0.3582465
#> 18 -0.04037121 -0.06062798  1.9354959
#> 19 -0.31928839  2.14461330 -2.1902672
#> 20  0.82676869  0.17515635  0.3053875
categorize(M)
#>    V1_0 V2_0 V3_0
#> 1     0    0    1
#> 2     1    1    1
#> 3     1    0    0
#> 4     0    1    1
#> 5     1    1    0
#> 6     1    1    0
#> 7     1    0    0
#> 8     0    0    0
#> 9     0    0    1
#> 10    1    0    1
#> 11    1    0    1
#> 12    1    1    1
#> 13    1    0    0
#> 14    0    0    0
#> 15    0    1    1
#> 16    0    0    0
#> 17    1    0    1
#> 18    0    0    1
#> 19    0    1    0
#> 20    1    1    1
C <- matrix(rep(c(-1, 0, 1, -2, 0, 2), 3), ncol = 3, byrow = TRUE)
C
#>      [,1] [,2] [,3]
#> [1,]   -1    0    1
#> [2,]   -2    0    2
#> [3,]   -1    0    1
#> [4,]   -2    0    2
#> [5,]   -1    0    1
#> [6,]   -2    0    2
w <- c(1, 1, 2, 2, 3, 3)
categorize(M, C, w)
#>    V1_a V1_b V2_a V2_b V3_a V3_b
#> 1     1    1    1    1    3    3
#> 2     2    2    2    2    3    3
#> 3     2    2    0    1    1    1
#> 4     1    1    3    2    2    2
#> 5     2    2    2    2    1    1
#> 6     3    3    2    2    0    0
#> 7     2    2    1    1    0    1
#> 8     1    1    0    1    0    1
#> 9     1    1    0    0    3    2
#> 10    2    2    0    1    3    2
#> 11    2    2    1    1    3    2
#> 12    3    2    3    2    2    2
#> 13    2    2    1    1    1    1
#> 14    1    1    1    1    1    1
#> 15    0    1    3    3    3    2
#> 16    0    1    1    1    1    1
#> 17    3    2    1    1    2    2
#> 18    1    1    1    1    3    2
#> 19    1    1    3    3    0    0
#> 20    2    2    2    2    2    2
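
To make the role of the third argument in the last call more concrete, the following base R sketch pairs each row of a cutoff matrix with one column of values. The name apply_cutoff_rows is hypothetical, and the counting rule (category = number of cutoffs strictly exceeded) is an assumption inferred from the output shown above, not taken from the package source. The three input rows are rows 3, 9 and 15 of M, rounded to four decimals.

```r
# Hypothetical sketch, not the package implementation.
apply_cutoff_rows <- function(values, cutoffs, map) {
  values <- as.matrix(values)
  stopifnot(nrow(cutoffs) == length(map),
            all(map %in% seq_len(ncol(values))))
  # Row l of cutoffs is applied to column map[l] of values.
  out <- vapply(
    seq_len(nrow(cutoffs)),
    function(l) findInterval(values[, map[l]], sort(cutoffs[l, ]),
                             left.open = TRUE),
    integer(nrow(values))
  )
  matrix(out, nrow = nrow(values))  # keep matrix shape even for one row
}

# Rows 3, 9 and 15 of M from the example above (rounded).
vals <- matrix(c( 0.6518, -1.7891, -0.9714,
                 -0.8839, -2.3853,  1.1848,
                 -1.7895,  3.0674,  1.7083), ncol = 3, byrow = TRUE)
C <- matrix(rep(c(-1, 0, 1, -2, 0, 2), 3), ncol = 3, byrow = TRUE)
w <- c(1, 1, 2, 2, 3, 3)
res <- apply_cutoff_rows(vals, C, w)
```

With these inputs, res has one column per row of C, and its three rows agree with rows 3, 9 and 15 of the categorize(M, C, w) output above.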