Compute k-plus variables
kplus_moment_variables(x, T, standardize = TRUE)
x: A vector, matrix or data.frame of data points. Rows correspond to elements and columns correspond to features. A vector represents a single feature.
T: The number of distribution moments for which variables are generated.
standardize: Logical; should all columns of the output be standardized? Defaults to TRUE.
A matrix containing all columns of x and all additional columns of k-plus variables. If x has M columns, the output matrix has M * T columns.
The k-plus criterion is an extension of the k-means criterion (i.e., the "variance", see variance_objective).
In kplus_anticlustering, equalizing means and variances simultaneously (and possibly additional distribution moments) is accomplished by internally appending new variables to the data input x. When the variance is the only additional criterion, the new variables represent the squared difference of each data point from the mean of the respective column. All columns, the new ones together with the original data, are then included in standard k-means anticlustering. The logic extends readily to higher-order moments; see Papenberg (2024). This function lets users generate the k-plus variables themselves, which offers some additional flexibility when conducting k-plus anticlustering.
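The construction described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name manual_kplus_variables and its argument n_moments (standing in for T) are hypothetical, the toy values are made up, and the sketch assumes that the variables for moment m are the column-centered values raised to the power m. The actual function may differ in details such as how standardization is applied.

```r
# Hypothetical helper, not part of anticlust: builds k-plus style
# variables by hand using base R only.
# Assumption: the variables for moment m are (x - column mean)^m.
manual_kplus_variables <- function(x, n_moments = 2, standardize = TRUE) {
  x <- as.matrix(x)
  # Center each column on its own mean
  centered <- sweep(x, 2, colMeans(x))
  out <- x
  # Append one block of M columns per additional moment (2, 3, ...)
  for (m in seq_len(n_moments)[-1]) {
    out <- cbind(out, centered^m)
  }
  # Optionally standardize all columns, original and appended
  if (standardize) {
    out <- scale(out)
  }
  out
}

# Toy data: 6 elements, 2 features (made-up values for illustration)
toy <- cbind(a = c(1, 2, 3, 4, 5, 6), b = c(2, 4, 4, 6, 8, 12))
kplus_toy <- manual_kplus_variables(toy, n_moments = 2, standardize = FALSE)

# M = 2 features and 2 moments give M * 2 = 4 columns
dim(kplus_toy)
#> [1] 6 4
```

With standardize = FALSE, the third column holds the squared deviations of the first feature from its mean, which makes the link between the appended variables and the variance explicit.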
Papenberg, M. (2024). K-plus Anticlustering: An Improved k-means Criterion for Maximizing Between-Group Similarity. British Journal of Mathematical and Statistical Psychology, 77(1), 80–102. https://doi.org/10.1111/bmsp.12315
# Use the schaper2019 data set as an example
data(schaper2019)
features <- schaper2019[, 3:6]
K <- 3
N <- nrow(features)
# Some equivalent ways of doing k-plus anticlustering:
init_groups <- sample(rep_len(1:K, N))
table(init_groups)
#> init_groups
#> 1 2 3
#> 32 32 32
kplus_groups1 <- anticlustering(
  features,
  K = init_groups,
  objective = "kplus",
  standardize = TRUE,
  method = "local-maximum"
)
kplus_groups2 <- anticlustering(
  kplus_moment_variables(features, T = 2), # standardization included by default
  K = init_groups,
  objective = "variance", # (!)
  method = "local-maximum"
)
# kplus_anticlustering() standardizes by default, unlike anticlustering():
kplus_groups3 <- kplus_anticlustering(
  features,
  K = init_groups,
  method = "local-maximum"
)
all(kplus_groups1 == kplus_groups2)
#> [1] TRUE
all(kplus_groups1 == kplus_groups3)
#> [1] TRUE
all(kplus_groups2 == kplus_groups3)
#> [1] TRUE