Perform anticlustering using the k-plus objective to maximize between-group similarity. This function implements the k-plus anticlustering method described in Papenberg (2024; <doi:10.1111/bmsp.12315>).
kplus_anticlustering(
x,
K,
variance = TRUE,
skew = FALSE,
kurtosis = FALSE,
covariances = FALSE,
T = NULL,
standardize = TRUE,
...
)

x
A feature matrix where rows correspond to elements and columns correspond to variables (a single numeric variable can be passed as a vector).

K
How many anticlusters should be created. Alternatively:
(a) a vector describing the size of each group, or (b) a vector
of length nrow(x) describing how elements are assigned
to anticlusters before the optimization starts.

variance
Boolean: Should the k-plus objective include a term to maximize between-group similarity with regard to the variance? (Default = TRUE)

skew
Boolean: Should the k-plus objective include a term to maximize between-group similarity with regard to skewness? (Default = FALSE)

kurtosis
Boolean: Should the k-plus objective include a term to maximize between-group similarity with regard to kurtosis? (Default = FALSE)

covariances
Boolean: Should the k-plus objective include a term to maximize between-group similarity with regard to covariance structure? (Default = FALSE)

T
Optional argument: An integer specifying how many distribution moments should be equalized between groups.

standardize
Boolean. If TRUE, the data is standardized through
a call to scale before the optimization starts.
Defaults to TRUE. See details.

...
Arguments passed down to anticlustering. All of the
arguments are supported except for objective.
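For example, the different ways of specifying K (a small sketch with
made-up data; it assumes the anticlust package is loaded):

dat <- matrix(rnorm(120 * 2), ncol = 2)
k1 <- kplus_anticlustering(dat, K = 3)              # three equal-sized groups
k2 <- kplus_anticlustering(dat, K = c(60, 40, 20))  # explicit group sizes
init <- sample(rep_len(1:3, nrow(dat)))
k3 <- kplus_anticlustering(dat, K = init)           # start from this assignment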
This function implements the unweighted sum approach for k-plus anticlustering. Details are given in Papenberg (2024).
The optional argument T denotes the number of distribution
moments that are considered in the anticlustering process. For example,
T = 4 will lead to similar means, variances, skew and kurtosis.
For the first four moments, it is also possible to use the boolean
convenience arguments variance, skew and kurtosis; the
mean (the first moment) is always included and cannot be "turned off".
If the argument T is used, it overrides the arguments
variance, skew and kurtosis (corresponding to
the second, third and fourth moment), ignoring their values.
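For illustration, the following two calls specify the same objective (a
minimal sketch with made-up data; as in the examples below, a fixed
initial assignment is used so that both calls start from the same
partition):

dat <- matrix(rnorm(120 * 2), ncol = 2)
init <- sample(rep_len(1:3, nrow(dat)))
a <- kplus_anticlustering(dat, K = init, T = 4)
b <- kplus_anticlustering(dat, K = init, variance = TRUE, skew = TRUE, kurtosis = TRUE)
all(a == b)  # should be TRUE: same objective, same starting partition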
The standardization is applied to all original features and the
additional k-plus features that are appended to the data set in order
to optimize the k-plus criterion. When using standardization,
all criteria such as means, variances and skewness receive a comparable
weight during the optimization. It is usually recommended not
to change the default setting standardize = TRUE.
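To see what this means, the following conceptual sketch mimics the
augmentation for T = 2 (it illustrates the idea and is not the
package's internal code): squared deviations from the column means are
appended as additional variables, and scale is applied to the combined
matrix so that no single criterion dominates the optimization.

dat <- matrix(rnorm(120 * 2), ncol = 2)
centered <- sweep(dat, 2, colMeans(dat))  # deviations from the column means
kplus_dat <- cbind(dat, centered^2)       # append the "variance" variables
kplus_dat <- scale(kplus_dat)             # standardize = TRUE
apply(kplus_dat, 2, sd)                   # all columns now have unit standard deviation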
This function can use any arguments that are also possible in
anticlustering (except for objective, because the objective
optimized here is the k-plus objective; to use a different objective,
call anticlustering directly). Any arguments that are
not explicitly set here receive the default given in
anticlustering (e.g., method = "exchange"); the only exception
is standardize, which defaults to TRUE in kplus_anticlustering.
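For example (a small sketch with made-up data), additional restarts of
the search can be requested through the repetitions argument of
anticlustering, which is simply passed along via ...:

dat <- matrix(rnorm(90 * 3), ncol = 3)
cl <- kplus_anticlustering(
  dat,
  K = 3,
  method = "local-maximum",  # passed on to anticlustering
  repetitions = 10           # run 10 restarts, keep the best solution
)
table(cl)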
Papenberg, M. (2024). K-plus Anticlustering: An Improved k-means Criterion for Maximizing Between-Group Similarity. British Journal of Mathematical and Statistical Psychology, 77(1), 80–102. https://doi.org/10.1111/bmsp.12315
# Generate some data
N <- 180
M <- 4
features <- matrix(rnorm(N * M), ncol = M)
# standard k-plus anticlustering: optimize similarity with regard to mean and variance:
cl <- kplus_anticlustering(features, K = 3, method = "local-maximum")
mean_sd_tab(features, cl)
#> [,1] [,2] [,3] [,4]
#> 1 "-0.07 (1.06)" "0.02 (0.92)" "-0.12 (1.00)" "-0.02 (1.13)"
#> 2 "-0.07 (1.06)" "0.02 (0.92)" "-0.11 (1.00)" "-0.02 (1.13)"
#> 3 "-0.07 (1.06)" "0.02 (0.92)" "-0.12 (1.00)" "-0.01 (1.13)"
# Visualize an anticlustering solution:
plot(features, col = palette()[2:4][cl], pch = c(16:18)[cl])
# Also optimize with regard to skewness and kurtosis
cl2 <- kplus_anticlustering(
features,
K = 3,
method = "local-maximum",
skew = TRUE,
kurtosis = TRUE
)
# The following two calls are equivalent:
init_clusters <- sample(rep_len(1:3, nrow(features)))
# 1.
x1 <- kplus_anticlustering(
features,
K = init_clusters,
variance = TRUE,
skew = TRUE
)
# 2.
x2 <- kplus_anticlustering(
features,
K = init_clusters,
T = 3
)
# Verify:
all(x1 == x2)
#> [1] TRUE