Categorize continuous values — categorize • cases

This function splits continuous values, e.g. (risk) scores or (bio)markers, into two or more categories by specifying one or more cutoff values.

Usage

categorize(
  values,
  cutoffs = rep(0, ncol(values)),
  map = 1:ncol(values),
  labels = NULL
)

Arguments

values: (matrix)
numeric (n x r) matrix of continuous values to be categorized, with n observations (subjects) of r continuous variables.
cutoffs: (numeric | matrix)
numeric matrix of dimension m x k. Each row of cutoffs defines a split into k+1 distinct categories. Each row must contain distinct values. In the simplest case (k=1), cutoffs is a single-column matrix in which each row defines a binary split (<=t vs. >t). In this case (k=1), cutoffs can also be a numeric vector.
map: (numeric)
integer vector of length m (one entry per row of cutoffs) with values in 1:r, whereby r = ncol(values). map_l gives the column of values that should be categorized by ...
labels: (character)
character vector of length m (= number of rows of cutoffs)
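The argument shapes can be illustrated with a minimal base R sketch. `categorize_sketch` is a hypothetical helper, not part of the cases package, and it assumes that a category code is the number of cutoffs strictly below the value, which is one reading of the "<=t vs. >t" rule above. It ignores `labels` and does not reproduce the package's column names.

```r
# Hypothetical helper, NOT the cases implementation: mimics the documented
# shapes (values: n x r, cutoffs: m x k, map: length m, result: n x m).
categorize_sketch <- function(values, cutoffs, map = seq_len(ncol(values))) {
  values <- as.matrix(values)
  # A plain numeric vector is treated as a single-column matrix (k = 1)
  if (!is.matrix(cutoffs)) cutoffs <- matrix(cutoffs, ncol = 1)
  stopifnot(nrow(cutoffs) == length(map),
            all(map %in% seq_len(ncol(values))))
  out <- vapply(seq_along(map), function(l) {
    x <- values[, map[l]]
    # Assumed rule: code = number of cutoffs in row l strictly below the
    # value, so a value equal to a cutoff stays in the lower category
    rowSums(outer(x, cutoffs[l, ], ">"))
  }, numeric(nrow(values)))
  # Keep an n x m matrix even when n = 1
  matrix(out, nrow = nrow(values))
}

x <- cbind(a = c(-1.5, 0, 0.5, 2), b = c(0.2, -3, 1, 1))
categorize_sketch(x, c(0, 0))            # binary split of each column at 0
C2 <- rbind(c(-1, 0, 1), c(-2, 0, 2))
categorize_sketch(x, C2, map = c(1, 1))  # two different splits of column "a"
```

In the last call both rows of `C2` are mapped to the first column, so the result has two columns that categorize the same variable with different cutoffs.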

Value

(matrix)
numeric (n x m) matrix of category codes after categorizing, with one column per row of cutoffs. As in the examples, the k+1 categories are coded 0, ..., k.

Examples

library(cases)

# Simulate n = 20 observations of r = 3 continuous values
set.seed(123)
M <- as.data.frame(mvtnorm::rmvnorm(20, mean = rep(0, 3), sigma = 2 * diag(3)))
M
#>             V1          V2         V3
#> 1  -0.79263226 -0.32552013  2.2043464
#> 2   0.09971392  0.18284047  2.4254682
#> 3   0.65183395 -1.78906676 -0.9713566
#> 4  -0.63026120  1.73111308  0.5088536
#> 5   0.56677642  0.15652900 -0.7860781
#> 6   2.52707679  0.70406690 -2.7812167
#> 7   0.99186703 -0.66862802 -1.5101308
#> 8  -0.30826308 -1.45098941 -1.0308079
#> 9  -0.88393901 -2.38534456  1.1848098
#> 10  0.21690234 -1.60956869  1.7731621
#> 11  0.60311149 -0.41729409  1.2658988
#> 12  1.24186829  1.16189111  0.9738844
#> 13  0.78335786 -0.08755638 -0.4326965
#> 14 -0.53806725 -0.98246403 -0.2940394
#> 15 -1.78954068  3.06736694  1.7083162
#> 16 -1.58831539 -0.56976520 -0.6599503
#> 17  1.10303725 -0.11790166  0.3582465
#> 18 -0.04037121 -0.06062798  1.9354959
#> 19 -0.31928839  2.14461330 -2.1902672
#> 20  0.82676869  0.17515635  0.3053875
# Default arguments: binary split of each column at cutoff 0
categorize(M)
#>    V1_0 V2_0 V3_0
#> 1     0    0    1
#> 2     1    1    1
#> 3     1    0    0
#> 4     0    1    1
#> 5     1    1    0
#> 6     1    1    0
#> 7     1    0    0
#> 8     0    0    0
#> 9     0    0    1
#> 10    1    0    1
#> 11    1    0    1
#> 12    1    1    1
#> 13    1    0    0
#> 14    0    0    0
#> 15    0    1    1
#> 16    0    0    0
#> 17    1    0    1
#> 18    0    0    1
#> 19    0    1    0
#> 20    1    1    1
# m = 6 rows of k = 3 cutoffs each, i.e. six splits into 4 categories
C <- matrix(rep(c(-1, 0, 1, -2, 0, 2), 3), ncol = 3, byrow = TRUE)
C
#>      [,1] [,2] [,3]
#> [1,]   -1    0    1
#> [2,]   -2    0    2
#> [3,]   -1    0    1
#> [4,]   -2    0    2
#> [5,]   -1    0    1
#> [6,]   -2    0    2
# map: rows 1-2 of C apply to column 1 of M, rows 3-4 to column 2, rows 5-6 to column 3
w <- c(1, 1, 2, 2, 3, 3)
categorize(M, C, w)
#>    V1_a V1_b V2_a V2_b V3_a V3_b
#> 1     1    1    1    1    3    3
#> 2     2    2    2    2    3    3
#> 3     2    2    0    1    1    1
#> 4     1    1    3    2    2    2
#> 5     2    2    2    2    1    1
#> 6     3    3    2    2    0    0
#> 7     2    2    1    1    0    1
#> 8     1    1    0    1    0    1
#> 9     1    1    0    0    3    2
#> 10    2    2    0    1    3    2
#> 11    2    2    1    1    3    2
#> 12    3    2    3    2    2    2
#> 13    2    2    1    1    1    1
#> 14    1    1    1    1    1    1
#> 15    0    1    3    3    3    2
#> 16    0    1    1    1    1    1
#> 17    3    2    1    1    2    2
#> 18    1    1    1    1    3    2
#> 19    1    1    3    3    0    0
#> 20    2    2    2    2    2    2
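As a rough cross-check of the output above, a few printed values can be categorized by hand with base R's `findInterval()`. This is only a sketch: it assumes the code is the number of cutoffs strictly below the value, and it uses the first four V2 values as printed (rounded) rather than the original M.

```r
# First four V2 values, copied from the printed M above
v2 <- c(-0.32552013, 0.18284047, -1.78906676, 1.73111308)

# left.open = TRUE gives intervals of the form (t_i, t_{i+1}],
# so the result counts the cutoffs strictly below each value
a <- findInterval(v2, c(-1, 0, 1), left.open = TRUE)
b <- findInterval(v2, c(-2, 0, 2), left.open = TRUE)
cbind(V2_a = a, V2_b = b)
```

Under this assumption the two columns agree with rows 1 to 4 of V2_a and V2_b in the output above.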