Performs CLARA clustering in either a hard or a fuzzy way. The function can be called with a common dissimilarity metric or with a self-defined distance function.

Usage

clustering_clara(
  data,
  clusters = 5,
  metric = "euclidean",
  samples = 10,
  sample_size = NULL,
  type = "hard",
  cores = 1,
  seed = 1234,
  m = 1.5,
  verbose = 1,
  build = FALSE,
  ...
)

Arguments

data

data.frame to be clustered

clusters

Number of clusters. Defaults to 5.

metric

A character string specifying a predefined dissimilarity metric (such as "euclidean" or "manhattan") or a self-defined dissimilarity function. Defaults to "euclidean". The value is passed as the method argument to dist, so check ?proxy::dist for full details.
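As a minimal sketch of what a self-defined dissimilarity function might look like, the base R snippet below defines a two-argument function and checks it against the built-in Manhattan distance. The name dist_manhattan and the vectors a and b are hypothetical and chosen only for illustration; the assumption here is that the function receives two numeric vectors and returns a single number.

```r
# Hypothetical user-defined dissimilarity: Manhattan distance between two
# numeric vectors x and y of equal length.
dist_manhattan <- function(x, y) {
  sum(abs(x - y))
}

# Two illustrative observations
a <- c(1, 2, 3)
b <- c(4, 0, 3)

# Value from the custom function
custom_value <- dist_manhattan(a, b)

# Same quantity from base R, for comparison
builtin_value <- as.numeric(stats::dist(rbind(a, b), method = "manhattan"))
```

A function of this shape could then be supplied as metric = dist_manhattan instead of a character string.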

samples

Number of subsamples. Defaults to 10.

sample_size

Number of observations belonging to a sample. If NULL (default), the minimum of nrow(data) and 40 + clusters * 2 is used as sample size.
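The default rule stated above can be reproduced in base R. The helper name default_sample_size is hypothetical and only mirrors the documented formula; it is not part of the package.

```r
# Sample size used when sample_size = NULL, following the rule stated above:
# the minimum of the number of rows and 40 + clusters * 2.
default_sample_size <- function(n_obs, clusters) {
  min(n_obs, 40 + clusters * 2)
}

# Large data set: the formula 40 + 5 * 2 = 50 applies
size_large <- default_sample_size(n_obs = 1000, clusters = 5)

# Small data set: capped at the number of rows
size_small <- default_sample_size(n_obs = 30, clusters = 5)
```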

type

One of c("hard","fuzzy"), specifying the type of clustering to be performed. Defaults to "hard".

cores

Number of cores used for computation. cores > 1 implies a parallel call. Defaults to 1.

seed

Random number seed. Defaults to 1234.

m

Fuzziness exponent (only for type = "fuzzy"), which has to be a numeric value of at least 1. Defaults to 1.5.
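To give a feel for how a fuzziness exponent behaves, the sketch below uses the textbook fuzzy c-means membership formula, in which the membership of an observation in cluster k is proportional to d_k^(-2 / (m - 1)). This is an assumption made for illustration: the page does not state which membership formula the underlying algorithm uses, and fuzzy_memberships is a hypothetical helper, not a package function. The formula is undefined at m = 1, so the helper requires m > 1.

```r
# Textbook fuzzy c-means memberships for one observation, given its
# distances d to each cluster medoid and a fuzziness exponent m > 1.
# (Illustrative assumption; not necessarily the formula used internally.)
fuzzy_memberships <- function(d, m) {
  stopifnot(m > 1, all(d > 0))
  w <- d^(-2 / (m - 1))  # closer medoids receive larger weights
  w / sum(w)             # normalise so the memberships sum to 1
}

# Distances of one observation to three medoids
d <- c(1, 2, 4)

# Small exponent: memberships concentrate on the closest medoid
u_crisp <- fuzzy_memberships(d, m = 1.5)

# Larger exponent: memberships are spread more evenly
u_soft <- fuzzy_memberships(d, m = 3)
```

Under this formula, values of m close to 1 approach a hard assignment, while larger values produce softer memberships.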

verbose

Can be set to integers between 0 and 2 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1.

build

Whether to run an additional build algorithm to choose the initial medoids (only relevant for type = "fuzzy"). Defaults to FALSE.

...

Additional arguments passed to the main clustering algorithm (pam or vegclust) and to proxy::dist for the calculation of the distance matrix.

Value

Object of class fuzzyclara

Details

If the clustering is run on multiple cores, the verbose messages are written to a file named clustering_progress.log (if verbose > 0).

References

Kaufman, L., and Rousseeuw, P. J. (1986). Clustering large data sets. Pattern Recognition in Practice, 425–437.