Evaluates a range of candidate numbers of clusters to help select the optimal one. The function runs the clustering once for each value in clusters_range. The optional plot shows the minimal (weighted) average distance for each number of clusters. Detailed fuzzyclara clustering results can be returned with return_results = TRUE. For clara clustering, the same samples are used for all numbers of clusters.

Usage

evaluate_cluster_numbers(
  data,
  clusters_range = 2:5,
  metric = "euclidean",
  algorithm = "clara",
  samples = 10,
  sample_size = NULL,
  num_local = 5,
  max_neighbors = 100,
  type = "hard",
  cores = 1,
  seed = 1234,
  m = 1.5,
  scale = TRUE,
  build = FALSE,
  verbose = 1,
  plot = TRUE,
  return_results = FALSE,
  ...
)

Arguments

data

data.frame to be clustered

clusters_range

Evaluated range for the number of clusters. Defaults to 2:5.

metric

A character string specifying a predefined dissimilarity metric (like "euclidean" or "manhattan") or a self-defined dissimilarity function. Defaults to "euclidean". It is passed as the method argument to dist, so check ?proxy::dist for full details.

algorithm

One of c("clara", "clarans"), specifying the clustering algorithm. Defaults to "clara".

samples

Number of subsamples. Defaults to 10.

sample_size

Number of observations belonging to a sample. If NULL (default), the minimum of nrow(data) and 40 + clusters * 2 is used as sample size.

num_local

Number of clustering iterations (only if algorithm = "clarans").

max_neighbors

Maximum number of randomized medoid searches for each cluster (only if algorithm = "clarans").

type

One of c("hard", "fuzzy"), specifying the type of clustering to be performed. Defaults to "hard".

cores

Number of cores used for computation. cores > 1 implies a parallel call. Defaults to 1.

seed

Random number seed. Defaults to 1234.

m

Fuzziness exponent (only for type = "fuzzy"), which has to be a numeric value of at least 1. Defaults to 1.5.

scale

Scale numeric variables before distance matrix calculation? Defaults to TRUE.

build

Additional build algorithm to choose initial medoids (only relevant for type = "fuzzy"). Defaults to FALSE.

verbose

Can be set to integers between 0 and 2 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1.

plot

Should a plot with minimum distances be returned? Defaults to TRUE.

return_results

Whether the clustering results ("fuzzyclara" objects) should be returned as a list. Defaults to FALSE.

...

Additional arguments passed to the main clustering algorithm call with fuzzyclara.
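
The default rule for sample_size described above can be illustrated in base R. This is only a sketch of the stated formula, min(nrow(data), 40 + clusters * 2), and not the package's internal code; the helper default_sample_size is a hypothetical name introduced for illustration.

```r
# Sketch of the documented default: min(nrow(data), 40 + clusters * 2),
# evaluated for each candidate number of clusters.
data <- USArrests              # built-in data set with 50 observations
clusters_range <- 2:6

# Hypothetical helper, not part of fuzzyclara
default_sample_size <- function(n_obs, clusters) {
  pmin(n_obs, 40 + clusters * 2)
}

sizes <- default_sample_size(nrow(data), clusters_range)
names(sizes) <- clusters_range
sizes
#  2  3  4  5  6
# 44 46 48 50 50
```

For small data sets such as USArrests, the sample therefore covers the full data once the number of clusters reaches 5.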

Value

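Depending on plot and return_results: a plot of the minimal (weighted) average distance for each number of clusters and, if return_results = TRUE, a list of clustering results (objects of class "fuzzyclara").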

Examples

# Select optimal number of clusters
cc_number <- evaluate_cluster_numbers(data            = USArrests,
                                      clusters_range  = 2:6,
                                      metric          = "euclidean",
                                      samples         = 1,
                                      sample_size     = NULL,
                                      type            = "hard",
                                      seed            = 3526,
                                      verbose         = 0)
cc_number
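
A fuzzy variant of the call above, sketched with only the arguments documented on this page. It assumes the fuzzyclara package is installed; the exact structure of the returned object is not spelled out here, so the sketch inspects it with str() instead of accessing specific elements.

```r
library(fuzzyclara)

# Evaluate 2 to 4 clusters with fuzzy clustering and keep the fitted results
cc_fuzzy <- evaluate_cluster_numbers(data           = USArrests,
                                     clusters_range = 2:4,
                                     metric         = "euclidean",
                                     samples        = 1,
                                     type           = "fuzzy",
                                     m              = 1.5,
                                     seed           = 3526,
                                     verbose        = 0,
                                     return_results = TRUE)

# Inspect the top-level structure of what was returned
str(cc_fuzzy, max.level = 1)
```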