Perform CLARA clustering algorithm — clustering

Performs CLARA clustering with either hard or fuzzy cluster assignments. The dissimilarity can be given as a predefined metric or as a self-defined distance function.

Usage

clustering_clara(
  data,
  clusters = 5,
  metric = "euclidean",
  samples = 10,
  sample_size = NULL,
  type = "hard",
  cores = 1,
  seed = 1234,
  m = 1.5,
  verbose = 1,
  build = FALSE,
  ...
)

Arguments

data: data.frame to be clustered
clusters: Number of clusters. Defaults to 5.
metric: A character string specifying a predefined dissimilarity metric (like "euclidean" or "manhattan") or a self-defined dissimilarity function. Defaults to "euclidean". It is passed as the method argument to proxy::dist, so see ?proxy::dist for full details.
samples: Number of subsamples. Defaults to 10.
sample_size: Number of observations belonging to a sample. If NULL (default), the minimum of nrow(data) and 40 + clusters * 2 is used as sample size.
type: One of c("hard", "fuzzy"), specifying the type of clustering to be performed. Defaults to "hard".
cores: Number of cores used for computation. cores > 1 implies a parallel call. Defaults to 1.
seed: Random number seed. Defaults to 1234.
m: Fuzziness exponent (only for type = "fuzzy"), which has to be a numeric value of at least 1. Defaults to 1.5.
verbose: Can be set to integers between 0 and 2 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1.
build: Whether to run an additional build algorithm to choose the initial medoids (only relevant for type = "fuzzy"). Defaults to FALSE.
...: Additional arguments passed to the main clustering algorithm (pam or vegclust) and to proxy::dist for the calculation of the distance matrix.
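
The sample_size default and the self-defined metric option can be illustrated with a minimal base R sketch. The helper names default_sample_size and manhattan_custom are hypothetical and not part of the package. The sketch also assumes that a self-defined dissimilarity function takes two numeric vectors and returns a single non-negative number, which is the form ?proxy::dist describes for a function passed as method.

```r
# Hypothetical helper mirroring the documented default for sample_size:
# the minimum of nrow(data) and 40 + clusters * 2.
default_sample_size <- function(data, clusters = 5) {
  min(nrow(data), 40 + clusters * 2)
}

# Hypothetical self-defined dissimilarity: Manhattan distance between two
# numeric vectors. A function of this shape could be supplied as `metric`
# (assumption: it is called once per pair of observations).
manhattan_custom <- function(x, y) {
  sum(abs(x - y))
}

small <- data.frame(a = 1:30, b = 30:1)
large <- data.frame(a = 1:1000, b = 1000:1)

default_sample_size(small, clusters = 5)  # 30: fewer rows than 40 + 5 * 2
default_sample_size(large, clusters = 5)  # 50: 40 + 5 * 2
manhattan_custom(c(1, 2), c(4, 6))        # 7: |1 - 4| + |2 - 6|
```

With the default clusters = 5, each subsample therefore holds at most 50 observations, so larger data sets typically call for setting sample_size explicitly.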

Value

Object of class fuzzyclara

Details

If the clustering is run on multiple cores (cores > 1) and verbose > 0, the diagnostic messages are written to the file clustering_progress.log instead of being printed.

References

Kaufman, L., and Rousseeuw, P. J. (1986). Clustering large data sets. Pattern Recognition in Practice, 425–437.