Perform a clustering algorithm on a data sample
Function to perform the clara clustering algorithm on a data sample, in either a hard or a fuzzy way. The clustering can be based either on a common dissimilarity metric or on a self-defined distance function.
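The sample step described above can be sketched in a few lines of R. This is a minimal sketch, not the package's implementation. The helper cluster_one_sample() is hypothetical and shows only the hard (PAM) variant. Named metrics go through stats::dist, and a self-defined function is evaluated pairwise in a plain loop, whereas the package itself passes metric on to proxy::dist.

```r
library(cluster)  # provides pam()

# Hypothetical helper (not part of the package): draw one random sample,
# build its dissimilarity matrix and run PAM (hard clustering) on it.
cluster_one_sample <- function(data, sample_size, clusters = 5,
                               metric = "euclidean", seed = 1234) {
  set.seed(seed)
  sample_ids <- sample(seq_len(nrow(data)), size = sample_size)
  x <- as.matrix(data[sample_ids, , drop = FALSE])

  if (is.function(metric)) {
    # Self-defined metric: evaluate it on every pair of sampled rows
    n <- nrow(x)
    d <- matrix(0, n, n)
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        d[i, j] <- d[j, i] <- metric(x[i, ], x[j, ])
      }
    }
    d <- as.dist(d)
  } else {
    # Named metric such as "euclidean" or "manhattan"
    d <- dist(x, method = metric)
  }

  fit <- pam(d, k = clusters, diss = TRUE)
  list(sample_ids = sample_ids,
       medoids    = sample_ids[fit$id.med],  # medoids as row ids of data
       clustering = fit$clustering)          # cluster of each sampled row
}

# Example: three clusters on a 50-row sample, built-in and custom metric
res <- cluster_one_sample(iris[, 1:4], sample_size = 50, clusters = 3)
res_cheb <- cluster_one_sample(iris[, 1:4], sample_size = 50, clusters = 3,
                               metric = function(a, b) max(abs(a - b)))
```

Only the sampled observations are clustered here. In a full clara run, the remaining observations would afterwards be assigned to their nearest medoid.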
Usage
clustering_sample(
data,
sample_ids,
dist,
clusters = 5,
metric = "euclidean",
sample_size = NULL,
type = "hard",
seed = 1234,
m = 1.5,
verbose = 1,
verbose_toLogFile = FALSE,
build = FALSE,
...
)
Arguments
- data
data.frame to be clustered
- sample_ids
IDs of the observations in the sample
- dist
Dissimilarity matrix for the subsample
- clusters
Number of clusters. Defaults to 5.
- metric
A character string specifying a predefined dissimilarity metric (like "euclidean" or "manhattan"), or a self-defined dissimilarity function. Defaults to "euclidean". It is passed as the method argument to proxy::dist; see ?proxy::dist for full details.
- sample_size
Number of observations belonging to a sample
- type
One of c("hard", "fuzzy"), specifying the type of clustering to be performed.
- seed
Random number seed. Defaults to 1234.
- m
Fuzziness exponent (only relevant for type = "fuzzy"); must be a numeric value of at least 1. Defaults to 1.5.
- verbose
Can be set to integers between 0 and 2 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1.
- verbose_toLogFile
If TRUE, the diagnostic messages are written to the log file clustering_progress.log. Defaults to FALSE.
- build
If TRUE, an additional build algorithm is used to choose the initial medoids (only relevant for type = "fuzzy"). Defaults to FALSE.
- ...
Additional arguments passed to the main clustering algorithm (pam or vegclust).
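To illustrate what the fuzziness exponent m controls, the sketch below computes the standard fuzzy c-means membership degrees from a matrix of distances between observations and k medoids, u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)). The helper fuzzy_memberships() is hypothetical and is not claimed to be how vegclust computes memberships internally. Note that the formula itself requires m strictly greater than 1.

```r
# Hypothetical helper: standard fuzzy c-means membership update.
# d: n x k matrix of distances from each observation to each medoid.
fuzzy_memberships <- function(d, m = 1.5) {
  stopifnot(m > 1)
  u <- matrix(0, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) {
    zero <- d[i, ] == 0
    if (any(zero)) {
      # Observation coincides with a medoid: full membership there
      u[i, zero] <- 1 / sum(zero)
    } else {
      ratios <- outer(d[i, ], d[i, ], "/")  # ratios[k, j] = d_ik / d_ij
      u[i, ] <- 1 / rowSums(ratios^(2 / (m - 1)))
    }
  }
  u
}

d <- matrix(c(1, 3,
              2, 2), nrow = 2, byrow = TRUE)
u2  <- fuzzy_memberships(d, m = 2)    # row 1: 0.9 / 0.1, row 2: 0.5 / 0.5
u15 <- fuzzy_memberships(d, m = 1.5)  # row 1 moves closer to a hard split
```

Lowering m toward 1 pushes the memberships toward 0 and 1 (a hard assignment), while larger values of m spread them more evenly across the clusters.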
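The build argument is described only as an additional algorithm for choosing the initial medoids. As an illustration of what such a step typically looks like, here is a greedy initialization in the spirit of the BUILD phase of PAM: start from the most central observation, then repeatedly add the candidate that most reduces the total dissimilarity of all observations to their nearest medoid. The helper build_medoids() is hypothetical, and whether build = TRUE does exactly this is an assumption.

```r
# Hypothetical helper: greedy BUILD-style choice of k initial medoids
# from a dissimilarity object or full n x n dissimilarity matrix D.
build_medoids <- function(D, k) {
  D <- as.matrix(D)
  # First medoid: observation with the smallest total dissimilarity
  medoids <- unname(which.min(rowSums(D)))
  while (length(medoids) < k) {
    # Distance of every observation to its nearest chosen medoid
    nearest <- apply(D[, medoids, drop = FALSE], 1, min)
    candidates <- setdiff(seq_len(nrow(D)), medoids)
    # Total cost if each candidate were added as a further medoid
    cost <- sapply(candidates, function(cand) sum(pmin(nearest, D[, cand])))
    medoids <- c(medoids, candidates[which.min(cost)])
  }
  medoids
}

# Two well-separated groups on a line
init <- build_medoids(dist(c(0, 1, 2, 10, 11, 12)), k = 2)  # 3 and 5
```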