R/categories_to_binary.R
categories_to_binary.Rd
Get binary representation of categorical variables
categories_to_binary(categories, use_combinations = FALSE)
A matrix representing the categorical variables in binary form ("dummy coding").
The conversion of categorical variables to binary variables is done via
model.matrix(). This function can be used to include categorical variables
as part of the optimization criterion in k-means / k-plus anticlustering,
rather than including them as hard constraints, as is done via the
categories argument of anticlustering(). This can be useful when there are
several categorical variables or when the group sizes are unequal (or both).
See examples.
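The dummy coding step can be sketched with base R alone. The following is a minimal illustration, not the package's actual implementation: the toy factor `room` and the toy matrix `features` are hypothetical, and dropping the intercept column is an assumption made here so that only indicator columns remain.

```r
# Hypothetical toy data (not taken from the package)
room <- factor(c("bathroom", "kitchen", "kitchen", "bathroom"))
features <- cbind(x1 = c(1.2, 3.4, 2.2, 0.7), x2 = c(10, 12, 9, 11))

# model.matrix() expands the factor into indicator columns. With the
# default treatment contrasts, the first level ("bathroom") is the
# reference level and is represented by the intercept column.
dummy <- model.matrix(~ room)
colnames(dummy)

# Assumption for this sketch: drop the intercept so that only the
# 0/1 indicator column(s) remain.
binary <- dummy[, -1, drop = FALSE]

# The indicator columns are numeric, so they can be bound to other
# numeric features and passed on as ordinary input data.
input_data <- cbind(features, binary)
```

With treatment contrasts and an intercept, a factor with k levels yields k - 1 indicator columns, because the reference level is implied when all indicators are 0.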
Papenberg, M. (2024). K-plus Anticlustering: An Improved k-means Criterion for Maximizing Between-Group Similarity. British Journal of Mathematical and Statistical Psychology, 77(1), 80–102. https://doi.org/10.1111/bmsp.12315
# Use Schaper data set for example
data(schaper2019)
features <- schaper2019[, 3:6]
K <- 3
N <- nrow(features)
# - Generate data input for k-means anticlustering -
# We conduct k-plus anticlustering by first generating k-plus variables,
# and also include the categorical variable as "numeric" input for the
# k-means optimization (rather than as input for the argument `categories`)
input_data <- cbind(
  kplus_moment_variables(features, T = 2),
  categories_to_binary(schaper2019$room)
)
kplus_groups <- anticlustering(
  input_data,
  K = K,
  objective = "variance",
  method = "local-maximum",
  repetitions = 10
)
mean_sd_tab(features, kplus_groups)
#>   rating_consistent rating_inconsistent syllables     frequency
#> 1 "4.49 (0.25)"     "1.10 (0.07)"       "3.41 (0.95)" "18.31 (2.40)"
#> 2 "4.49 (0.25)"     "1.10 (0.07)"       "3.44 (0.91)" "18.31 (2.42)"
#> 3 "4.49 (0.25)"     "1.10 (0.07)"       "3.41 (0.95)" "18.31 (2.42)"
table(kplus_groups, schaper2019$room) # argument categories was not used!
#>
#> kplus_groups bathroom kitchen
#>            1       16      16
#>            2       16      16
#>            3       16      16