Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths

Murtagh, Fionn; and Contreras, Pedro. 2016. Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths. In: Adalbert F.X. Wilhelm and Hans A. Kestler, eds. Analysis of Large and Complex Data. Springer, pp. 43-52. ISBN-10: 3319252240; ISBN-13: 978-3319252247. [Book Section]

We study how random projections can be used with large data sets (i) to cluster the data using a fast binning approach, characterized as directly inducing a hierarchy through the Baire metric; and (ii) to select, based on the clusters found, subsets of the original data for further analysis. We focus on random projection as used for processing high-dimensional data. A random projection, which outputs a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering. We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.
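The two ideas in the abstract, a spanning path obtained from a random projection and a hierarchy induced by binning, can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes that the projection is onto a single Gaussian random direction, that the permutation is the sort order of the projected values, that values are rescaled to [0, 1], and that the Baire distance is taken in base 10 as 10^(-k) for k shared leading digits, truncated at a fixed depth. The function names `random_spanning_path`, `digit_codes`, `baire_levels`, and `baire_distance` are hypothetical.

```python
import numpy as np


def random_spanning_path(X, rng):
    """Project rows of X onto one random direction (assumed Gaussian).

    Returns the sort order of the projected values, read here as a
    spanning path through the observations, and the values rescaled
    to [0, 1].
    """
    direction = rng.standard_normal(X.shape[1])
    p = X @ direction
    span = p.max() - p.min()
    p = (p - p.min()) / span if span > 0 else np.zeros_like(p)
    order = np.argsort(p, kind="stable")
    return order, p


def digit_codes(p, depth):
    """Integer made of the first `depth` decimal digits of each value."""
    codes = np.floor(p * 10**depth).astype(np.int64)
    # A value of exactly 1.0 is placed in the last bin.
    return np.minimum(codes, 10**depth - 1)


def baire_levels(codes, depth):
    """Bin labels for levels 1..depth; level k keeps the first k digits.

    Deriving every level from the same integer code guarantees that the
    bins are nested, so the levels form a hierarchy.
    """
    return [codes // 10 ** (depth - k) for k in range(1, depth + 1)]


def baire_distance(code_a, code_b, depth):
    """Base-10 Baire distance truncated at `depth` digits (assumption)."""
    for k in range(1, depth + 1):
        div = 10 ** (depth - k)
        if code_a // div != code_b // div:
            return 10.0 ** -(k - 1)  # k - 1 leading digits are shared
    return 10.0 ** -depth  # identical up to the truncation depth


# Usage on synthetic data (not data from the chapter).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
order, p = random_spanning_path(X, rng)
codes = digit_codes(p, depth=3)
levels = baire_levels(codes, depth=3)
```

Each level is computed in a single pass over the projected values, and because the bin label is monotone in the projected value, every bin occupies a contiguous stretch of the spanning path. That is one way to read the link to contiguity-constrained clustering mentioned in the abstract.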

MurtaghContreras_v5.pdf
Published Version
Restricted to Administrator Access Only