This function implements the three phase search algorithm TPSPD for anticlustering by Yang et al. (2022; <doi.org/10.1016/j.ejor.2022.02.003>). The description of their algorithm is given in Section 2 of their paper (in particular, see the Pseudocode in Algorithm 1).

three_phase_search_anticlustering(
  x,
  K,
  N,
  objective = "diversity",
  number_iterations = 50,
  clusters = NULL,
  upper_bound = NULL,
  lower_bound = NULL,
  beta_max = 15,
  theta_max = NULL,
  theta_min = NULL,
  beta_min = NULL,
  eta_max = 3,
  alpha = 0.05
)

Arguments

x

The data input, as in anticlustering.

K

Number of anticlusters to be formed.

N

Number of elements.

objective

The anticlustering objective, can be "diversity" or "dispersion".

number_iterations

A number that defines how many times the steps in the search algorithm are repeated.

clusters

A vector of length K that specifies the number of elements each cluster can contain. If this vector is not NULL, the lower and upper bounds will be disregarded.

upper_bound

Maximum number of elements in each anticluster. By default, anticlusters are of equal size, calculated as the total number of items divided by the number of clusters.

lower_bound

Minimum number of elements in each anticluster. By default, anticlusters are of equal size, calculated as the total number of items divided by the number of clusters.

beta_max

The algorithm begins with a pool of random initial solutions of size beta_max. Over time, the size of the solution pool decreases linearly until it reaches beta_min.

theta_max

Parameter for the strength of undirected perturbation, which decreases linearly over time from theta_max to theta_min.

theta_min

Parameter for the strength of undirected perturbation, which decreases linearly over time from theta_max to theta_min.

beta_min

The minimum solution pool size the algorithm should reach before making a determination.

eta_max

Parameter that specifies how many times the steps in the direct perturbation are executed.

alpha

Parameter for weighing the discrimination of a slightly worse local optimal child solution.

Value

A vector of length N that assigns a group (i.e, a number between 1 and K) to each input element

Details

Details of the implementation of the algorithm can be found in the pseudocode of the paper Yang et al. (2022). However, we performed one change as compared to the original description of the algorithm: Instead of setting a time limit, we define the number of iterations the algorithm performs before terminating (via argument number_iterations).

References

Xiao Yang et al. “A three-phase search approach with dynamic population size for solving the maximally diverse grouping problem”. In: European Journal of Operational Research 302.3 (2022) <doi:10.1016/j.ejor.2022.02.003>

Author

Hannah Hengelbrock Hannah.Hengelbrock@hhu.de, Martin Papenberg martin.papenberg@hhu.de

Examples


# Generate some random data
N <- 120
M <- 5
K <- 4
dat <- matrix(rnorm(N * M), ncol = M)
distances <- dist(dat)

# Perform three hase serach algorithm
result1 <- three_phase_search_anticlustering(dat, K, N)

# Compute objectives function
diversity_objective(distances, result1)
#> [1] 5357.225

# Standard algorithm:
result2 <- anticlustering(distances, K=K, method="local-maximum", repetitions = 10)
diversity_objective(distances, result2)
#> [1] 5356.094