When is hub gene selection better than standard meta-analysis?

Peter Langfelder1, Paul S. Mischel2 and Steve Horvath1,3

PLoS ONE 8(4): e61505. doi:10.1371/journal.pone.0061505

1Department of Human Genetics
2Department of Pathology and Laboratory Medicine
3Department of Biostatistics, University of California, Los Angeles

Peter (dot) Langfelder (at) gmail (dot) com, SHorvath (at) mednet (dot) ucla (dot) edu


Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g. gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications.

We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2). The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network-based screening, and meta-analysis. All data and analysis R code can be found on this web site.
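
To make the idea of ranking genes by a meta-analysis statistic concrete, the following minimal sketch combines per-data-set Z statistics with Stouffer's equal-weight method and ranks genes by the combined value. It is an illustration only and is not the R code distributed with the article: the choice of Stouffer's method, the function names `stouffer_z` and `two_sided_p`, and the gene names and Z values are all assumptions made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_z(z_scores):
    """Combine Z statistics from several data sets with equal weights."""
    return sum(z_scores) / sqrt(len(z_scores))


def two_sided_p(z):
    """Two-sided p-value of a standard normal Z statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical Z statistics for three genes in three independent data sets.
z_by_gene = {
    "geneA": [2.0, 2.0, 2.0],   # moderate but consistent association
    "geneB": [3.0, 0.0, 0.0],   # strong in one data set only
    "geneC": [-1.0, 1.0, 0.0],  # inconsistent direction
}

# Combined (meta-analysis) Z statistic per gene.
meta_z = {gene: stouffer_z(z) for gene, z in z_by_gene.items()}

# Rank genes from strongest to weakest combined evidence.
ranked = sorted(meta_z, key=lambda gene: abs(meta_z[gene]), reverse=True)

for gene in ranked:
    print(gene, round(meta_z[gene], 3), two_sided_p(meta_z[gene]))
```

In this toy input the consistently associated gene ranks above the gene that is strong in only one data set, which is the behavior one would typically want from a marginal meta-analysis. A hub-based screen would instead rank genes by their connectivity within a consensus module, which this sketch does not attempt to reproduce.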

Data and R code/tutorials

We provide the data and code necessary to reproduce our analysis. The code is presented in annotated PDF documents that contain code together with explanations and notes. The code documents also serve as tutorials on the use of consensus module methods, marginal meta-analysis, and meta-analysis of module membership. The code and data can be downloaded together as a single zip bundle. Please save the zip bundle on your hard drive and unpack it. The unpacked files should be stored in a folder (directory) named Project-MetaAnalysis. This folder contains the following main sub-folders:

The zip bundle contains additional data files and directories that are not listed above; please do not remove the additional files (in particular, the data files) since they may be needed for the analysis.
