Cluster and propensity based approximation of a network


John Ranola, Peter Langfelder, Kenneth Lange, Steve Horvath


Human Genetics and Biostatistics, University of California, Los Angeles

SHorvath (at) mednet (dot) ucla (dot) edu
Peter (dot) Langfelder (at) gmail (dot) com

BMC Systems Biology


Abstract

The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network generalizes not only correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bipartite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms.
The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust.
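To make the parametrization concrete, here is a minimal R sketch. It assumes the approximation takes the product form a_ij ≈ p_i p_j r[c_i, c_j], where p_i is the propensity of node i, c_i its cluster label, and r a symmetric matrix of between-cluster similarities. This form, the variable names, and the toy values are illustrative assumptions; none of them is output or API of PropClust.

```r
# Illustrative CPBA-style approximation of an adjacency matrix (toy values).
# Assumed form: a_ij ~ p_i * p_j * r[c_i, c_j] for i != j.
cluster    <- c(1, 1, 1, 2, 2)              # cluster label c_i of each node
propensity <- c(0.9, 0.8, 0.7, 0.9, 0.6)    # propensity p_i of each node
r <- matrix(c(1.0, 0.2,
              0.2, 1.0), nrow = 2, byrow = TRUE)  # between-cluster similarity

# outer() gives the matrix of products p_i * p_j; indexing r by the label
# vector expands it to an n x n matrix with entries r[c_i, c_j].
approx_adj <- outer(propensity, propensity) * r[cluster, cluster]
diag(approx_adj) <- 1   # self-similarity set to 1 by convention in this sketch

print(round(approx_adj, 3))
```

Nodes 1 and 2 share a cluster, so their approximate adjacency is 0.9 × 0.8 = 0.72. Nodes 1 and 4 lie in different clusters, so theirs is damped by the between-cluster similarity 0.2 to 0.9 × 0.9 × 0.2 = 0.162.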

R Talk


R Tutorials

A set of tutorials illustrating various aspects of PropClust is available.

Click here to access the tutorial page.

Automatic installation from CRAN

The PropClust package is available from the Comprehensive R Archive Network (CRAN), the standard repository for R add-on packages. To install PropClust and the packages it requires, type the following at the R prompt:


install.packages("PropClust")


This installs the PropClust package and all necessary dependencies. Note that CRAN provides the newest version of PropClust only for the current (minor) version of R. We recommend updating to the latest version of R; users who must stay on an older version of R will need to follow the manual download and installation instructions below.
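To see which version of R you are running before installing, a short sketch follows. The helper `install_if_missing` is hypothetical (it is not part of PropClust or base R); it simply skips the installation when the package is already present.

```r
# Hypothetical helper (not part of PropClust): install a package from CRAN
# only if it cannot already be loaded from the current library paths.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Return (invisibly) whether the package can be loaded now.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Report the running R version so it can be compared with the current release.
cat("Running", R.version.string, "\n")

# Uncomment to install (needs network access and a configured CRAN mirror):
# install_if_missing("PropClust")
```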

Note for Mac users: CRAN may occasionally fail to compile the PropClust package for Mac OS X. This leads to the error message "Package PropClust is not available..." when calling install.packages(). If this occurs, please download the binary version from here and follow the installation instructions (or, if you are able to compile packages locally, download the source and install that).

Note of caution: The newest version of PropClust is available from CRAN only for the current R version. Please update R to the newest version or use the manual download below.

Problems installing or using the package? Please see our list of frequently asked questions. Your problem and the solution may already be posted there.

Manual download and installation

Please follow these steps only if the automatic package installation above does not work.

Installation instructions: Short installation instructions, including other required and recommended packages, are available here. Should you discover bugs (of which there are most likely plenty), please report them to Peter Langfelder (peter.langfelder at gmail.com) and Steve Horvath.
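For the local-file route, a minimal sketch in R follows. The helper name `install_local_file` and the file name in the example call are hypothetical; the linked installation instructions remain the authoritative reference.

```r
# Hypothetical helper: install a package from an archive already saved to disk.
install_local_file <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # repos = NULL tells install.packages() to treat `path` as a local file
  # instead of a package name to look up in a repository.
  if (grepl("\\.tar\\.gz$", path)) {
    # A .tar.gz archive is a source package and must be compiled locally.
    install.packages(path, repos = NULL, type = "source")
  } else {
    # Otherwise let R decide how to handle the file (e.g. a binary archive).
    install.packages(path, repos = NULL)
  }
}

# Example call (the file name is a placeholder, not a real release):
# install_local_file("PropClust_x.y-z.tar.gz")
```

Installing from source requires a working compiler toolchain, including a Fortran compiler, since the package contains Fortran 95 code.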

Problems installing or using the package

Please see our list of Frequently Asked Questions (and frequently given answers); the solution to your problem may already be posted there. In particular, you can find answers about spurious Mac errors, compatibility problems when upgrading PropClust, and others.

If you find a bug in the newest version on CRAN, please see whether this web site has posted a newer version where the bug may be fixed. If you still cannot solve the problem, email Peter Langfelder and Steve Horvath.

Getting started with R and the PropClust package

The package described here is an add-on for the statistical language and environment R (free software). Our tutorials, described above, contain step-by-step instructions.
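Once the package is installed, a first session might start as in the sketch below. It uses only base R functions to load the package and list what it exports, and makes no assumption about the names of PropClust's own functions; the variable `propclust_available` is introduced here for illustration.

```r
# Check whether PropClust can be loaded before attaching it.
propclust_available <- requireNamespace("PropClust", quietly = TRUE)

if (propclust_available) {
  library(PropClust)
  # Names of the functions and data sets the package exports.
  print(ls("package:PropClust"))
  # To browse the package's help index, run: help(package = "PropClust")
} else {
  message("PropClust is not installed; see the installation instructions above.")
}
```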

Old versions of R package PropClust

Older versions of the package presented on this page are available here.

Citing the PropClust package

If you use PropClust in published work, please cite it as follows:

The method, software and evaluations are described in

Acknowledgments

The original code was written by John M Ranola, Peter Langfelder, Kenneth Lange, and Steve Horvath. Peter Langfelder is mainly in charge of maintaining and improving the package.



