
Performs a hard or fuzzy CLARANS clustering. The function can be called either with a common dissimilarity metric or with a self-defined distance function.
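To make the search concrete, here is a minimal base-R sketch of hard CLARANS as described by Ng and Han (2002). Each of num_local local searches starts from random medoids and repeatedly tries a random neighbor (one medoid swapped for one non-medoid). It moves whenever the cost drops and stops after max_neighbors consecutive failed tries. The function name clarans_sketch is hypothetical, and treating num_local and max_neighbors as the paper's numlocal and maxneighbor is an assumption, not a description of fuzzyclara's internals.

```r
# Minimal sketch of hard CLARANS on a precomputed dissimilarity matrix `d`.
# Hypothetical helper: argument names mirror clustering_clarans(), but this
# is NOT the fuzzyclara implementation.
clarans_sketch <- function(d, clusters, num_local = 5, max_neighbors = 100,
                           seed = 1234) {
  set.seed(seed)
  n <- nrow(d)
  # Cost of a medoid set: sum of each object's dissimilarity to its nearest medoid
  cost <- function(medoids) sum(apply(d[, medoids, drop = FALSE], 1, min))
  best <- NULL
  best_cost <- Inf
  for (local in seq_len(num_local)) {
    current <- sample.int(n, clusters)          # random starting medoids
    current_cost <- cost(current)
    j <- 1
    while (j <= max_neighbors) {
      # Random neighbor: replace one random medoid with one random non-medoid
      candidates <- setdiff(seq_len(n), current)
      neighbor <- current
      neighbor[sample.int(clusters, 1)] <- candidates[sample.int(length(candidates), 1)]
      neighbor_cost <- cost(neighbor)
      if (neighbor_cost < current_cost) {
        current <- neighbor                     # move, then restart the count
        current_cost <- neighbor_cost
        j <- 1
      } else {
        j <- j + 1
      }
    }
    if (current_cost < best_cost) {             # keep the best local optimum
      best <- current
      best_cost <- current_cost
    }
  }
  best <- sort(best)
  list(medoids = best, cost = best_cost,
       cluster = apply(d[, best, drop = FALSE], 1, which.min))
}

# Two well-separated groups of three points each
x <- cbind(c(0, 0.1, 0.2, 5, 5.1, 5.2), c(0, 0.1, 0, 5, 5, 5.1))
res <- clarans_sketch(as.matrix(dist(x)), clusters = 2)
res$medoids
res$cluster
```

Restarting the counter after every improvement is what distinguishes CLARANS from a fixed number of random swaps: a search only ends once max_neighbors consecutive random neighbors have failed to lower the cost.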

Usage

clustering_clarans(
  data,
  clusters = 5,
  metric = "euclidean",
  type = "hard",
  num_local = 5,
  max_neighbors = 100,
  cores = 1,
  seed = 1234,
  m = 1.5,
  verbose = 1,
  ...
)
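For type = "fuzzy", the exponent m in the signature above governs how soft the cluster memberships are. The sketch below applies the standard fuzzy c-means style membership update to each object's dissimilarities to the medoids. Both this formula and the helper name fuzzy_memberships are assumptions for illustration, not taken from fuzzyclara.

```r
# Membership update as used in fuzzy c-means / fuzzy c-medoids (assumed here
# for illustration; fuzzyclara's exact formula may differ):
#   u[i, k] = 1 / sum_j (d[i, k] / d[i, j])^(2 / (m - 1))
# `dist_to_medoids` is an n x k matrix of dissimilarities to the k medoids.
fuzzy_memberships <- function(dist_to_medoids, m = 1.5) {
  stopifnot(m > 1)                          # the exponent 2 / (m - 1) needs m > 1
  d <- as.matrix(dist_to_medoids)
  u <- matrix(0, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) {
    at_medoid <- d[i, ] == 0
    if (any(at_medoid)) {
      # Object coincides with a medoid: share full membership among those medoids
      u[i, at_medoid] <- 1 / sum(at_medoid)
    } else {
      ratios <- outer(d[i, ], d[i, ], "/")  # ratios[k, j] = d[i, k] / d[i, j]
      u[i, ] <- 1 / rowSums(ratios^(2 / (m - 1)))
    }
  }
  u
}

dm <- rbind(c(1, 1), c(1, 3), c(0, 2))
u <- fuzzy_memberships(dm, m = 2)
round(u, 3)
```

Each row of u sums to 1. Larger values of m push the memberships of ambiguous objects toward equal shares, while values of m close to 1 approach a hard assignment.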

Arguments

data

data.frame to be clustered

clusters

Number of clusters. Defaults to 5.

metric

Either a character string naming a predefined dissimilarity metric (such as "euclidean" or "manhattan") or a self-defined dissimilarity function. Defaults to "euclidean". It is passed as the method argument to proxy::dist, so see ?proxy::dist for full details.

type

One of c("hard", "fuzzy"), specifying the type of clustering to be performed. Defaults to "hard".

num_local

Number of clustering iterations, i.e. local searches, each started from a new random set of medoids. Defaults to 5.

max_neighbors

Maximum number of randomized medoid searches for each cluster. Defaults to 100.

cores

Number of cores used for computation. A value greater than 1 triggers a parallel call. Defaults to 1.

seed

Random number seed. Defaults to 1234.

m

Fuzziness exponent (only used for type = "fuzzy"), which must be a numeric value of at least 1. Defaults to 1.5.

verbose

Can be set to integers between 0 and 2 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1.

...

Additional arguments passed to the main clustering algorithm (pam or vegclust) and to proxy::dist for the calculation of the distance matrix.
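A self-defined dissimilarity function is simply a rule applied to every pair of rows of data. The base-R sketch below builds the resulting dissimilarity matrix by hand to show what such a function has to provide. The names weighted_manhattan and pairwise_dissimilarity are hypothetical, and the weights are arbitrary illustrative values for two-column data.

```r
# Hypothetical user-defined dissimilarity: a weighted Manhattan distance.
# The weights are an illustrative assumption and assume two columns.
weighted_manhattan <- function(x, y, w = c(1, 2)) {
  sum(w * abs(x - y))
}

# Build the full n x n dissimilarity matrix by applying `fun` to every pair of
# rows (a hand-rolled stand-in for what a distance routine computes).
pairwise_dissimilarity <- function(data, fun) {
  x <- as.matrix(data)
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- fun(x[i, ], x[j, ])
    }
  }
  d
}

df <- data.frame(a = c(0, 1, 3), b = c(0, 2, 2))
d <- pairwise_dissimilarity(df, weighted_manhattan)
print(d)
```

With the proxy package installed, the same function should be usable directly as proxy::dist(df, method = weighted_manhattan), which is the kind of call the metric argument is forwarded to.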

Value

Object of class fuzzyclara

Details

If the clustering is run on multiple cores and verbose > 0, the diagnostic messages are written to the file clustering_progress.log.

References

Ng, R. T., and Han, J. (2002). CLARANS: A method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 14(5), 1003–1016. doi:10.1109/tkde.2002.1033770