Compute k-plus variables

kplus_moment_variables(x, T, standardize = TRUE)

Arguments

x

A vector, matrix or data.frame of data points. Rows correspond to elements and columns correspond to features. A vector represents a single feature.

T

The number of distribution moments for which variables are generated.

standardize

Logical. Should all columns of the output be standardized? Defaults to TRUE.

Value

A matrix containing all columns of x and all additional columns of k-plus variables. If x has M columns, the output matrix has M * T columns.

Details

The k-plus criterion is an extension of the k-means criterion (i.e., the "variance", see variance_objective). In kplus_anticlustering, equalizing means and variances simultaneously (and possibly additional distribution moments) is accomplished by internally appending new variables to the data input x. When the variance is the only additional criterion, the new variables represent the squared difference between each data point and the mean of the respective column. All columns, the new ones together with the original data, are then included in standard k-means anticlustering. The logic extends readily to higher-order moments, see Papenberg (2024). This function lets users generate the k-plus variables themselves, which offers some additional flexibility when conducting k-plus anticlustering.
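To make the variance case concrete, the following base R sketch builds the squared-deviation columns by hand for a small made-up matrix. The helper name squared_deviations and the toy data are assumptions introduced for illustration only; they are not part of the package, and the sketch skips the standardization that kplus_moment_variables applies by default, so its values are not expected to match the function's output.

```r
# Hypothetical helper (not part of the package): squared deviation of each
# data point from the mean of its column, i.e. the "variance" variables.
squared_deviations <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x)) # subtract each column's mean
  centered^2
}

# Toy data, invented for this illustration: 4 elements, M = 2 features
x <- cbind(a = c(1, 2, 3, 6), b = c(10, 20, 30, 40))

sq <- squared_deviations(x)
colnames(sq) <- paste0(colnames(x), "_sq")

# Original columns plus the new variables: M * T = 2 * 2 columns
augmented <- cbind(x, sq)
ncol(augmented)
#> [1] 4
```

Passing a matrix built this way to k-means anticlustering means that groups are balanced on the means of the squared deviations as well, which is what makes the variances similar across groups.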

References

Papenberg, M. (2024). K-plus Anticlustering: An Improved k-means Criterion for Maximizing Between-Group Similarity. British Journal of Mathematical and Statistical Psychology, 77(1), 80-102. https://doi.org/10.1111/bmsp.12315

Author

Martin Papenberg <martin.papenberg@hhu.de>

Examples


# Use Schaper data set for example
data(schaper2019)
features <- schaper2019[, 3:6]
K <- 3
N <- nrow(features)

# Some equivalent ways of doing k-plus anticlustering:

init_groups <- sample(rep_len(1:K, N))
table(init_groups)
#> init_groups
#>  1  2  3 
#> 32 32 32 

kplus_groups1 <- anticlustering(
  features,
  K = init_groups,
  objective = "kplus",
  standardize = TRUE,
  method = "local-maximum"
)

kplus_groups2 <- anticlustering(
  kplus_moment_variables(features, T = 2), # standardization included by default
  K = init_groups,
  objective = "variance", # (!)
  method = "local-maximum"
)

# kplus_anticlustering() standardizes by default, unlike anticlustering():
kplus_groups3 <- kplus_anticlustering(
  features, 
  K = init_groups,
  method = "local-maximum"
)

all(kplus_groups1 == kplus_groups2)
#> [1] TRUE
all(kplus_groups1 == kplus_groups3)
#> [1] TRUE
all(kplus_groups2 == kplus_groups3)
#> [1] TRUE