Perform clustering algorithm on a data sample — clustering

Performs the CLARA clustering algorithm on a data sample, in either a hard or a fuzzy way. The clustering can use either a common dissimilarity metric or a self-defined distance function.

Usage

clustering_sample(
  data,
  sample_ids,
  dist,
  clusters = 5,
  metric = "euclidean",
  sample_size = NULL,
  type = "hard",
  seed = 1234,
  m = 1.5,
  verbose = 1,
  verbose_toLogFile = FALSE,
  build = FALSE,
  ...
)

Arguments

data: data.frame to be clustered
sample_ids: IDs of the observations that belong to the sample
dist: Dissimilarity matrix for subsample
clusters: Number of clusters. Defaults to 5.
metric: A character string naming a predefined dissimilarity metric (such as "euclidean" or "manhattan") or a self-defined dissimilarity function. Defaults to "euclidean". The value is passed as the method argument to proxy::dist; see ?proxy::dist for full details.
sample_size: Number of observations belonging to a sample
type: One of c("hard", "fuzzy"), specifying the type of clustering to be performed. Defaults to "hard".
seed: Random number seed. Defaults to 1234.
m: Fuzziness exponent (only for type = "fuzzy"), which has to be a numeric value of at least 1. Defaults to 1.5.
verbose: Can be set to integers between 0 and 2 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1.
verbose_toLogFile: If TRUE, the diagnostic messages are printed to a log file clustering_progress.log. Defaults to FALSE.
build: Whether to run an additional build algorithm to choose the initial medoids (only relevant for type = "fuzzy"). Defaults to FALSE.
...: Additional arguments passed to the main clustering algorithm (pam or vegclust)
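
The relationship between sample_ids, metric and dist can be illustrated with a small base R sketch. It is not the package's own code: the toy data, the helper pairwise_dist and the function manhattan_fun are hypothetical names introduced here, and the helper only stands in for what proxy::dist does when it is given a function as its method argument.

```r
# Toy data and a random subsample (both hypothetical, for illustration only)
set.seed(1234)
data <- data.frame(x = rnorm(20), y = rnorm(20))
sample_ids <- sample(seq_len(nrow(data)), size = 8)
sub <- as.matrix(data[sample_ids, ])

# A self-defined dissimilarity: takes two observations, returns one number
manhattan_fun <- function(a, b) sum(abs(a - b))

# Base R stand-in for a pairwise dissimilarity computation with a custom function
pairwise_dist <- function(x, f) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- f(x[i, ], x[j, ])
    }
  }
  stats::as.dist(d)  # keep only the lower triangle, as a "dist" object
}

d_custom  <- pairwise_dist(sub, manhattan_fun)        # self-defined function
d_builtin <- stats::dist(sub, method = "manhattan")   # predefined metric
```

Both objects hold the same 28 pairwise dissimilarities for the 8 sampled observations, which shows that a predefined metric name and an equivalent self-defined function describe the same matrix.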

Value

Clustering solution for the data sample.
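
As a rough idea of what a hard clustering solution for a sample looks like, the following sketch runs cluster::pam on the dissimilarity matrix of a subsample. It assumes the cluster package is installed (it ships with standard R distributions), uses hypothetical toy data, and is not the package's internal implementation: pam is named above as one of the main clustering algorithms, but whether clustering_sample calls it exactly like this is an assumption.

```r
# Toy data with two well-separated groups (hypothetical, for illustration only)
set.seed(1234)
data <- data.frame(
  x = c(rnorm(15, mean = 0), rnorm(15, mean = 5)),
  y = c(rnorm(15, mean = 0), rnorm(15, mean = 5))
)

# Draw a subsample and compute its dissimilarity matrix
sample_ids <- sample(seq_len(nrow(data)), size = 12)
d <- stats::dist(data[sample_ids, ], method = "euclidean")

# Hard clustering of the subsample with partitioning around medoids
fit <- cluster::pam(d, k = 2, diss = TRUE)

# Map the medoid positions within the sample back to row ids of the full data
medoid_ids <- sample_ids[fit$id.med]

# Cluster assignment of each sampled observation
assignment <- data.frame(id = sample_ids, cluster = fit$clustering)
```

Here medoid_ids refers to rows of the full data set, so the medoids found on the sample could afterwards be used to assign the remaining observations, which is the general idea behind CLARA.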