This post shows how to visualize a cluster analysis in base R so that the clusters are easy to tell apart. I use the classic iris data set.

# Load data
data(iris)
cluster_data <- iris[, 1:4]

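# Do k-means clustering, store the clustering vector.
# kmeans() uses random starting centers, so fix the seed to make the
# result reproducible (the seed value itself is arbitrary).
set.seed(1)
clusters <- kmeans(cluster_data, 3)$cluster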

# Define 3 colors, 3 cex values, and 3 pch values that differ between
# clusters. For more clusters, define more values.
colors <- c("#a9a9a9", "#df536b", "#61d04f")
cex <- c(0.7, 1.2, 1.5)
pch <- c(19, 15, 17)

# Plot the data while visualizing the different clusters
plot(
  cluster_data,
  col = colors[clusters],
  cex = cex[clusters],
  pch = pch[clusters]
)

This solution exploits R's vector indexing: colors[clusters] returns one color per observation, selected by that observation's cluster number, and the same holds for the cex and pch arguments in the call to plot(). The solution requires that the clustering vector is an integer vector with values \(1, \ldots, K\), where \(K\) is the number of clusters. If a clustering vector x has any other form (e.g., is of type character), you can convert it to such an integer vector by calling as.numeric(factor(x)).
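To illustrate the conversion, here is a minimal sketch. The character vector x and its labels are made up for this example and are not part of the iris analysis.

```r
# Hypothetical clustering vector with character labels (illustration only)
x <- c("low", "high", "mid", "high", "low")

# factor() sorts the unique labels ("high", "low", "mid");
# as.numeric() then returns each element's position among those levels
cluster_int <- as.numeric(factor(x))
cluster_int
# 2 1 3 1 2

# The result can be used for indexing, just like the k-means vector
colors <- c("#a9a9a9", "#df536b", "#61d04f")
point_colors <- colors[cluster_int]
```

Note that the integer codes follow the sorted order of the labels, not the order in which they first appear in x.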

Anticlustering

As an add-on, I also visualize the results of an anticlustering analysis. Instead of maximizing homogeneity within clusters, anticlustering maximizes heterogeneity within clusters (and, equivalently, maximizes similarity between clusters, i.e., makes the different clusters as similar to each other as possible). To this end, I use the package anticlust.

library(anticlust)

# Generate some random data, N = 120
data <- data.frame(
  x = rnorm(120),
  y = rnorm(120)
)

# Generate 6 anticlusters that are as similar as possible
groups <- anticlustering(
  data, 
  K = 6
)

# Define 6 colors, 6 cex values, and 6 pch values:
colors <- c("#a9a9a9", "#df536b", "#61d04f", "#2297e6", "#28e2e5", "#eec12f")
cex <- c(0.7, 0.9, 1.2, 1.5, 1.7, 2)
pch <- 15:20

# Plot the data while visualizing the different anticlusters
plot(
  data,
  col = colors[groups],
  cex = cex[groups],
  pch = pch[groups]
)

As we can see, the anticlustering plot looks much messier than the clustering plot: all 6 groups overlap to a large degree. This is by design. Anticlustering seeks groups that are similar to each other, whereas cluster analysis seeks groups that are dissimilar from each other.
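The similarity of the groups can also be checked numerically. The following sketch recreates the data and compares group sizes, means, and standard deviations using base R only. The fixed seed is an assumption added here for reproducibility, so the numbers will not match the plot above exactly.

```r
library(anticlust)

# Assumption: fixed seed (arbitrary value) so the sketch is reproducible
set.seed(123)
data <- data.frame(
  x = rnorm(120),
  y = rnorm(120)
)
groups <- anticlustering(data, K = 6)

# Group sizes: 120 observations split into 6 groups
group_sizes <- table(groups)
group_sizes

# Per-group means and standard deviations of both variables;
# similar groups should show similar values in each column
aggregate(data, by = list(group = groups), FUN = mean)
aggregate(data, by = list(group = groups), FUN = sd)
```

If the anticlustering worked as intended, the rows of both tables should differ only slightly from each other.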


Last updated: 2020-10-22
