Cluster dispersion — dispersion_objective • anticlust

Compute the dispersion objective for a given clustering (i.e., the minimum distance between two elements within the same cluster).

dispersion_objective(x, clusters)

Arguments

x: The data input. Can be one of two structures: (1) A feature matrix where rows correspond to elements and columns correspond to variables (a single numeric variable can be passed as a vector). (2) An N x N matrix dissimilarity matrix; can be an object of class dist (e.g., returned by dist or as.dist) or a matrix where the entries of the upper and lower triangular matrix represent pairwise dissimilarities.
clusters: A vector representing (anti)clusters (e.g., returned by anticlustering).

Details

The dispersion is the minimum distance between two elements within the same cluster. When the input x is a feature matrix, the Euclidean distance is used as the distance unit. Maximizing the dispersion maximizes the minimum heterogeneity within clusters and is an anticlustering task.

References

Brusco, M. J., Cradit, J. D., & Steinley, D. (2020). Combining diversity and dispersion criteria for anticlustering: A bicriterion approach. British Journal of Mathematical and Statistical Psychology, 73, 275-396. https://doi.org/10.1111/bmsp.12186

Examples


N <- 50 # number of elements
M <- 2  # number of variables per element
K <- 5  # number of clusters
random_data <- matrix(rnorm(N * M), ncol = M)
random_clusters <- sample(rep_len(1:K, N))
dispersion_objective(random_data, random_clusters)
#> [1] 0.1107932

# Maximize the dispersion 
optimized_clusters <- anticlustering(
  random_data,
  K = random_clusters, 
  objective = dispersion_objective
)
dispersion_objective(random_data, optimized_clusters)
#> [1] 0.405511