Peter (dot) Langfelder (at) gmail (dot) com, SHorvath (at) mednet (dot) ucla (dot) edu

Many high-throughput biological data analyses require the calculation of large correlation
matrices and/or clustering of a large number of objects. The standard R function for
calculating Pearson correlation handles data without missing values efficiently, but is
inefficient when applied to data sets with a relatively small number of missing values. We
present an implementation of Pearson correlation calculation that can lead to substantial
speedup on data with a relatively small number of missing entries. Further, we parallelize
all calculations and thus achieve further speedup on systems where parallel processing is
available. A robust correlation measure, the biweight midcorrelation, is implemented in a
similar manner and provides comparable speed. The functions `cor` and `bicor` for fast
Pearson and biweight midcorrelation, respectively, are part of the updated, freely
available R package WGCNA.
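
As a rough illustration of the use case described above, the following sketch computes a pairwise-complete Pearson correlation with the standard R function and, if the WGCNA package happens to be installed, with the `cor` and `bicor` functions from WGCNA. The simulated data, the matrix dimensions, and the number of missing entries are assumptions chosen only for illustration; they are not taken from the article.

```r
set.seed(1)

# Simulated data (an assumption for illustration): 200 samples in rows,
# 50 variables in columns.
x <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

# Introduce a small number of missing entries (10 out of 10,000 values).
x[sample(length(x), 10)] <- NA

# Standard R: Pearson correlation using pairwise-complete observations.
c_std <- stats::cor(x, use = "pairwise.complete.obs")

if (requireNamespace("WGCNA", quietly = TRUE)) {
  # Fast Pearson correlation from WGCNA, called with the same 'use' argument.
  c_fast <- WGCNA::cor(x, use = "pairwise.complete.obs")

  # Compare the two results; they are expected to agree up to numerical error.
  print(all.equal(c_std, c_fast, check.attributes = FALSE))

  # Robust alternative: biweight midcorrelation.
  b <- WGCNA::bicor(x, use = "pairwise.complete.obs")
  print(dim(b))
}
```

Calling the functions with an explicit package prefix avoids any ambiguity about which `cor` is used when WGCNA is attached alongside the standard stats package.
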
The hierarchical clustering algorithm implemented in the R function `hclust` is an order
n^{3} (n is the number of clustered objects) version of a publicly available clustering
algorithm by Fionn Murtagh (http://www.classification-society.org/csna/mda-sw/). We present
the package flashClust that implements the original algorithm, which in practice achieves
order approximately n^{2}, leading to substantial time savings when clustering large data
sets.

**Update (October 2014):** The R core team recently modified the code of the standard
function `hclust` implemented in the package stats. The new "standard" `hclust` is now as
fast as or faster than the `flashClust` presented here.
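
The sketch below shows how the two clustering functions can be called on the same distance object. It assumes simulated data and average linkage, both chosen only for illustration, and it runs the `flashClust` comparison only if that package is installed.

```r
set.seed(1)

# Simulated data (an assumption for illustration): 300 objects, 5 features.
x <- matrix(rnorm(300 * 5), nrow = 300, ncol = 5)

# Euclidean distances between all pairs of objects.
d <- dist(x)

# Standard hierarchical clustering from the stats package.
h_std <- stats::hclust(d, method = "average")

if (requireNamespace("flashClust", quietly = TRUE)) {
  # The same clustering computed with flashClust.
  h_fast <- flashClust::flashClust(d, method = "average")

  # Compare merge heights; the two implementations are expected to agree
  # up to numerical error.
  print(all.equal(sort(h_std$height), sort(h_fast$height)))
}

# Cut the standard tree into 4 clusters (the number 4 is arbitrary here).
clusters <- cutree(h_std, k = 4)
print(table(clusters))
```

Because both functions take a distance object and a `method` argument, switching between them in existing code usually requires changing only the function name.
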

Peter Langfelder and Steve Horvath, *Fast R Functions for Robust Correlations and Hierarchical
Clustering.* Journal of Statistical Software **46** (11) 1--17 (2012).
http://www.jstatsoft.org/v46/i11

Functions described here are part of two R packages:

- Functions implementing fast correlation calculations in R are part of the updated WGCNA package.
- The fast hierarchical clustering developed by Fionn Murtagh has been packaged in the R package flashClust, also available from CRAN.
  **Update (October 2014):** The standard R function `hclust` is now as fast as or faster than the `flashClust` implemented in the package flashClust, so there is no reason to use `flashClust` over `hclust` from the package stats (but there is also no reason not to).

On a separate page we provide the R code that we used to measure the performance of the functions presented here compared to the standard R functions.