Recent tarball: modecluster.tgz. It can also be obtained from the mirror.
The mode clustering algorithm first selects a density threshold, then identifies ``dense points'', which are the points that exceed that threshold: a candidate point is dense if more than K points (the threshold) are located within radius R of it.
The algorithm works by successively adding the points that exceed the density threshold to the clustering solution. Each such point either initiates a new cluster or joins the nearest cluster, provided that cluster is within distance R of the point. After all dense points have been added, a merge is performed according to a chosen merge rule.
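The two steps above (finding dense points, then growing clusters from them) can be sketched in Python. This is only an illustration, not the code in the tarball: the function names dense_points and grow_clusters are hypothetical, and I assume that a point does not count itself as a neighbor and that "within radius R" includes the boundary. The merge step is left out.

```python
import math

def dense_points(points, radius, threshold):
    """Return indices of points with more than `threshold` other points within `radius`."""
    dense = []
    for i, p in enumerate(points):
        # Brute-force neighbor count (quadratic in the number of points).
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbors > threshold:
            dense.append(i)
    return dense

def grow_clusters(points, dense, radius):
    """Add dense points one at a time: join the nearest cluster within `radius`, else start a new one."""
    labels = {}      # point index -> cluster id
    next_id = 0
    for i in dense:
        best_id, best_d = None, radius
        for j, cid in labels.items():
            d = math.dist(points[i], points[j])
            if d <= best_d:
                best_id, best_d = cid, d
        if best_id is None:
            # No already-clustered point is within `radius`: initiate a new cluster.
            best_id = next_id
            next_id += 1
        labels[i] = best_id
    return labels

# Two tight groups and one isolated point (made-up data).
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
dense = dense_points(pts, radius=0.5, threshold=1)
labels = grow_clusters(pts, dense, radius=0.5)
```

Here the isolated point at (10, 10) is never added, because it has no neighbors within the radius.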
Optionally, this algorithm can be iterated by gradually increasing R and hence increasing the list of dense points.
In Wishart's original version, R is increased in such a way as to ensure that only one point is added at each introduction cycle. That is, points are sorted by the distance to their Kth nearest neighbor, and those distances, in ascending order, are used as the radii. This is impractical for very large numbers of points, so instead we prefer to simply increase R in steps.
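The difference between the two radius schedules can be sketched as follows. Both function names are hypothetical and the code is only an illustration of the idea, not the program's implementation. kth_neighbor_radii assumes K is smaller than the number of points.

```python
import math

def kth_neighbor_radii(points, k):
    """Wishart-style schedule: each point's distance to its k-th nearest neighbor, ascending."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return sorted(radii)

def stepped_radii(start, stop, step):
    """Simpler schedule: increase the radius from `start` to `stop` in fixed steps."""
    n = int(math.floor((stop - start) / step + 1e-9)) + 1
    return [start + i * step for i in range(n)]

pts = [(0, 0), (1, 0), (3, 0), (6, 0)]
wishart_schedule = kth_neighbor_radii(pts, k=2)   # one radius per point
step_schedule = stepped_radii(1.0, 2.0, 0.5)      # length independent of the point count
```

The first schedule needs every point's sorted neighbor distances, which is where the cost comes from on large data sets. The second has as many entries as there are steps, regardless of how many points there are.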
--radius|-r : search radius (R)
--threshold|-t : density threshold (K)
--bucket
carve space up into buckets. This is useful if points are spread out over a limited area (like a brain volume), because it avoids quadratic-time searches when testing density.
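A minimal sketch of the bucketing idea, assuming a uniform grid whose cell size equals the search radius (the actual bucket layout used by the program may differ, and the function names here are hypothetical). With that cell size, every point within the radius of a candidate lies in the candidate's own cell or in an adjacent one, so only those cells need to be searched.

```python
import math
from collections import defaultdict
from itertools import product

def build_buckets(points, radius):
    """Hash each point index into a grid cell of side `radius`."""
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(c / radius) for c in p)
        buckets[key].append(i)
    return buckets

def count_neighbors(points, buckets, i, radius):
    """Count points within `radius` of point i, looking only at adjacent cells."""
    p = points[i]
    key = tuple(math.floor(c / radius) for c in p)
    count = 0
    for offset in product((-1, 0, 1), repeat=len(key)):
        cell = tuple(k + o for k, o in zip(key, offset))
        for j in buckets.get(cell, ()):
            if j != i and math.dist(p, points[j]) <= radius:
                count += 1
    return count

pts = [(0, 0, 0), (0.3, 0, 0), (0.7, 0, 0), (3, 3, 3)]
grid = build_buckets(pts, radius=0.5)
counts = [count_neighbors(pts, grid, i, radius=0.5) for i in range(len(pts))]
```

The saving comes from each density test touching only the points in nearby cells rather than the whole data set. It helps most when the radius is small relative to the extent of the data.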
--densepoints
only show points that exceed the density threshold. This is useful as a spatial filtering step prior to applying some other clustering algorithm, such as single linkage.
--variance
show mean squared distance between clusters divided by mean squared distance between centroids. This is a goodness-of-fit measure that indicates how well separated the clusters are.
--nnbetween
use nearest-neighbor distance, instead of centroid distance, to calculate the sum squared error between clusters. A motivating example: suppose there is one large cluster that occupies most of the convex hull of the set being clustered, along with several smaller clusters. This solution clearly does not exhibit good between-cluster separation, yet a centroid-based between-cluster sum of squares will indicate that it does. Using nearest-neighbor distance corrects this problem, since the big cluster is close to all of the other clusters.
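The motivating example can be made concrete with a small sketch. The data and function names below are made up for illustration. One large cluster fills a 10 by 10 square and a small cluster sits right next to its edge.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a cluster's points."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))

def centroid_distance(a, b):
    """Distance between the centroids of clusters a and b."""
    return math.dist(centroid(a), centroid(b))

def nearest_neighbor_distance(a, b):
    """Smallest distance between any point of a and any point of b."""
    return min(math.dist(p, q) for p in a for q in b)

big = [(0, 0), (10, 0), (0, 10), (10, 10), (10, 5), (0, 5)]
small = [(11, 5), (12, 5)]

print(centroid_distance(big, small))          # 6.5
print(nearest_neighbor_distance(big, small))  # 1.0
```

The centroid distance suggests the two clusters are far apart, while the nearest-neighbor distance shows that they nearly touch.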
NN merge. Use the nearest-neighbor distance between two clusters to determine whether a merge takes place. Merge if this distance is less than the radius R.
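A rough sketch of this nearest-neighbor merge rule follows. The function name nn_merge is hypothetical, and I assume here that merges are applied transitively (if A merges with B and B with C, all three end up together), which the actual program may or may not do.

```python
import math

def nn_merge(clusters, radius):
    """Merge clusters whose nearest-neighbor distance is less than `radius`."""
    parent = list(range(len(clusters)))   # union-find forest over cluster indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            gap = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
            if gap < radius:
                parent[find(b)] = find(a)

    merged = {}
    for i, cluster in enumerate(clusters):
        merged.setdefault(find(i), []).extend(cluster)
    return list(merged.values())

clusters = [[(0, 0), (1, 0)], [(1.5, 0), (2.5, 0)], [(10, 0)]]
result = nn_merge(clusters, radius=1.0)
```

In this example the first two clusters are 0.5 apart at their closest points, so they are merged. The third is left alone.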
Romeo-Juliet (RJ) rule. Merge if the nearest-neighbor between-cluster distance is less than the average within-cluster distance.
At present, this rule is unsupported, but a discussion is included for completeness. It is described in ``Cluster Analysis'' by Brian Everitt, in his discussion of Wishart's method. Since it is not included in Wishart's own discussion, for now I am attributing it to Everitt.
This rule is the same as RJ merge, except that, given a threshold K, only the smallest K elements in the upper triangle of the within-cluster distance matrix are averaged to compute the within-cluster distance for each cluster.
This will produce smaller within-cluster distances, hence this rule is less likely to facilitate merges than the RJ rule.
By default, NN merge is used. RJ merge may be used by passing the argument --rjmerge.
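To illustrate why averaging only the K smallest within-cluster distances gives a smaller value than the plain RJ average, here is a short sketch. The function name within_distance is hypothetical, and since the K-smallest variant is unsupported in the program, this shows only the arithmetic.

```python
import math
from itertools import combinations

def within_distance(cluster, k=None):
    """Average pairwise distance within a cluster.

    With k=None, average over the whole upper triangle of the distance matrix.
    With k given, average only the k smallest of those distances.
    """
    pair_dists = sorted(math.dist(p, q) for p, q in combinations(cluster, 2))
    if not pair_dists:
        return 0.0   # a single-point cluster has no pairwise distances
    if k is not None:
        pair_dists = pair_dists[:k]
    return sum(pair_dists) / len(pair_dists)

cluster = [(0, 0), (1, 0), (3, 0)]
print(within_distance(cluster))        # 2.0  (average of 1, 2, 3)
print(within_distance(cluster, k=2))   # 1.5  (average of 1, 2)
```

Because the K smallest entries can never average higher than the full set, the restricted value is at most the plain average.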