Compute the dispersion objective for a given clustering (i.e., the minimum distance between two elements within the same cluster).
dispersion_objective(x, clusters)
x: The data input. Can be one of two structures: (1) A
feature matrix where rows correspond to elements and columns
correspond to variables (a single numeric variable can be
passed as a vector). (2) An N x N dissimilarity matrix;
can be an object of class dist (e.g., returned by dist or
as.dist) or a matrix where the entries of the upper and lower
triangular matrix represent pairwise dissimilarities.
clusters: A vector representing (anti)clusters (e.g.,
returned by anticlustering).
The dispersion is the minimum distance between two elements within
the same cluster. When the input x is a feature matrix, the
Euclidean distance is used as the distance measure. Maximizing the
dispersion maximizes the minimum heterogeneity within clusters and
is an anticlustering task.
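As a minimal sketch of the definition above, the dispersion can be computed by hand in base R. The helper manual_dispersion is a hypothetical name used only for illustration; it is not the package's implementation and assumes that x is a feature matrix or numeric vector and that at least one cluster contains two or more elements.

```r
# Hypothetical helper (not part of the package): dispersion computed by hand.
manual_dispersion <- function(x, clusters) {
  # Pairwise Euclidean distances between all elements (rows) of x
  d <- as.matrix(dist(x))
  # TRUE where two elements are assigned to the same cluster
  same_cluster <- outer(clusters, clusters, "==")
  # Exclude each element's zero distance to itself
  diag(same_cluster) <- FALSE
  # The dispersion is the smallest within-cluster distance
  min(d[same_cluster])
}

# Four elements measured on a single variable, assigned to two clusters
x <- c(0, 1, 5, 7)
clusters <- c(1, 2, 1, 2)
manual_dispersion(x, clusters)
#> [1] 5
```

Here cluster 1 holds the values 0 and 5 (distance 5) and cluster 2 holds 1 and 7 (distance 6), so the smallest within-cluster distance is 5.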
Brusco, M. J., Cradit, J. D., & Steinley, D. (2020). Combining diversity and dispersion criteria for anticlustering: A bicriterion approach. British Journal of Mathematical and Statistical Psychology, 73, 375-396. https://doi.org/10.1111/bmsp.12186
N <- 50 # number of elements
M <- 2 # number of variables per element
K <- 5 # number of clusters
random_data <- matrix(rnorm(N * M), ncol = M)
random_clusters <- sample(rep_len(1:K, N))
dispersion_objective(random_data, random_clusters)
#> [1] 0.1107932
# Maximize the dispersion
optimized_clusters <- anticlustering(
  random_data,
  K = random_clusters,
  objective = dispersion_objective
)
dispersion_objective(random_data, optimized_clusters)
#> [1] 0.405511