Random generalized linear model: a highly accurate and interpretable ensemble predictor


Lin Song, Peter Langfelder, Steve Horvath


Human Genetics and Biostatistics, University of California, Los Angeles

SHorvath (at) mednet (dot) ucla (dot) edu
Peter (dot) Langfelder (at) gmail (dot) com

BMC Bioinformatics 14:5 (2013). DOI: 10.1186/1471-2105-14-5


Abstract

The random generalized linear model (RGLM) is a state-of-the-art predictor that shares the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward selected generalized linear model (interpretability). The RGLM is a bootstrap aggregated (bagged) GLM predictor that incorporates several elements of randomness and instability (random subspace method, optional interaction terms, forward variable selection). It often outperforms alternative prediction methods, as shown in hundreds of genomic data sets, the UCI machine learning benchmark data, and simulations. The RGLM predictor provides variable importance measures that can be used to define a thinned ensemble predictor (involving few features) that retains excellent predictive accuracy.
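To make the ingredients named in the abstract concrete, the following is a minimal base-R sketch of a bagged, forward selected GLM with a random subspace step, out-of-bag predictions, and a selection-count importance measure. It is not the randomGLM implementation: the function baggedForwardGLM, its arguments, the simulated data, and the use of the AIC-based step() function as the forward selection rule are all assumptions made for illustration, and interaction terms are left out.

```r
# Illustrative sketch only; this is NOT the randomGLM implementation.
set.seed(1)

# Simulated data (hypothetical): 200 observations, 10 features,
# binary outcome driven by x1 and x2 only.
n <- 200
p <- 10
x <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
names(x) <- paste0("x", seq_len(p))
y <- rbinom(n, size = 1, prob = plogis(1.5 * x$x1 - 1.5 * x$x2))

# Hypothetical helper: bagging + random subspace + forward selection.
baggedForwardGLM <- function(x, y, nBags = 50, nFeaturesInBag = 5, maxSteps = 3) {
  n <- nrow(x)
  oobSum <- numeric(n)      # running sum of out-of-bag predicted probabilities
  oobCount <- integer(n)    # number of bags in which each observation was out of bag
  timesSelected <- setNames(integer(ncol(x)), names(x))

  for (b in seq_len(nBags)) {
    inBag <- sample(n, n, replace = TRUE)       # bootstrap sample of observations
    feats <- sample(names(x), nFeaturesInBag)   # random subspace of features
    bagData <- data.frame(y = y[inBag], x[inBag, feats, drop = FALSE])

    # Forward selection from the intercept-only model (AIC-based stand-in).
    nullFit <- glm(y ~ 1, family = binomial, data = bagData)
    upper <- reformulate(feats, response = "y")
    fit <- step(nullFit, scope = list(lower = ~ 1, upper = upper),
                direction = "forward", steps = maxSteps, trace = 0)

    # Importance measure: how often each feature enters a bag's model.
    chosen <- attr(terms(fit), "term.labels")
    timesSelected[chosen] <- timesSelected[chosen] + 1L

    # Out-of-bag prediction for observations not drawn into this bag.
    oob <- setdiff(seq_len(n), inBag)
    oobSum[oob] <- oobSum[oob] +
      predict(fit, newdata = x[oob, , drop = FALSE], type = "response")
    oobCount[oob] <- oobCount[oob] + 1L
  }

  list(predictedOOB = oobSum / oobCount, timesSelected = timesSelected)
}

ens <- baggedForwardGLM(x, y)
print(sort(ens$timesSelected, decreasing = TRUE))
cat("Out-of-bag accuracy:", mean((ens$predictedOOB > 0.5) == y), "\n")
```

In this toy setting, a thinned predictor in the spirit of the abstract could be approximated by rerunning the helper on only the most frequently selected features.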

R Talk


R Tutorials

A set of tutorials that illustrate various aspects of randomGLM is available.

Click here to access the tutorial page.

Automatic installation from CRAN

The randomGLM package is available from the Comprehensive R Archive Network (CRAN), the standard repository for R add-on packages. To install the required packages and randomGLM, simply type


install.packages("randomGLM")


This will install the randomGLM package and all necessary dependencies. The catch is that CRAN provides the newest version of randomGLM only for the newest (minor) version of R (currently R 2.15.x). Users of older versions of R will need to follow the manual download and installation instructions below, although we recommend using the latest version of R.

The version posted here may be newer than that posted on CRAN: CRAN rules prohibit us from making frequent (weekly) updates of the package posted to CRAN. Therefore, the package posted here may occasionally be newer and may contain a bugfix that has not yet made it to CRAN.

Note for Mac users: CRAN may occasionally fail to compile the randomGLM package for Mac OS X. This leads to the error message "Package randomGLM is not available..." when calling install.packages(). If this occurs, please download the binary version from here and follow the installation instructions (or, if you are able to compile packages locally, download the source and install that).

Note of caution: The newest version of randomGLM is available from CRAN only for the current R version. Please update your R to the newest version or use the manual download below.

Problems installing or using the package? Please see our list of frequently asked questions. Your problem and the solution may already be posted there.

Manual download and installation

Please follow these steps only if the automatic package installation above does not work.

Prerequisites:

The current version of the randomGLM package requires R version 2.14 or higher. If you have an older version of R, please upgrade your R.
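As a quick check of this prerequisite, the running R version can be compared against the stated minimum. The helper name meetsMinimumR below is hypothetical; getRversion() is part of base R.

```r
# Minimum R version stated above for the current randomGLM release.
minimumR <- "2.14.0"

# Hypothetical helper: TRUE if 'current' is at least 'minimum'.
# 'current' is expected to be a version object such as getRversion().
meetsMinimumR <- function(minimum = minimumR, current = getRversion()) {
  current >= minimum
}

if (meetsMinimumR()) {
  message("R ", as.character(getRversion()), " meets the minimum requirement.")
} else {
  message("R ", as.character(getRversion()), " is too old; please upgrade R.")
}
```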

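The randomGLM package requires the following packages to be installed: MASS, gtools, foreach, doParallel. MASS is normally included with standard R distributions as a recommended package, so it usually does not need to be installed separately. If your system does not have the remaining packages installed, the easiest way to install them is to issue the following command at the R prompt: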


install.packages(c("gtools","foreach","doParallel"))
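If some of the prerequisites may already be present, one option is to install only those that are missing. The sketch below assumes nothing beyond base R; the helper name missingPackages is hypothetical, and the install step is only attempted in an interactive session so that the snippet is safe to source from a script.

```r
# Prerequisites listed above.
required <- c("MASS", "gtools", "foreach", "doParallel")

# Hypothetical helper: names of packages from 'pkgs' that are not installed.
missingPackages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  unname(pkgs[!installed])
}

toInstall <- missingPackages(required)
if (length(toInstall) == 0) {
  message("All prerequisites are already installed.")
} else if (interactive()) {
  install.packages(toInstall)   # download and install only what is missing
} else {
  message("Missing packages: ", paste(toInstall, collapse = ", "))
}
```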


R package download and installation: Package randomGLM (last updated 2013/05/09) is available here as source code and pre-compiled versions for Windows and Mac OS X. In general it is preferable to download the source and compile the package locally; however, if this is not practical, please select an appropriate compiled version.

The package version numbers follow the format packageName_major.minor-revision. Minor versions typically add or change some functionality; revisions typically contain bugfixes or minor enhancements.
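The naming scheme can be illustrated by splitting a file name into its parts and comparing two versions. The helper parsePackageFile and the file names used here are hypothetical examples, not actual release names.

```r
# Hypothetical helper: split a name of the form packageName_major.minor-revision
# (optionally followed by a file extension) into its components.
parsePackageFile <- function(fileName) {
  pattern <- "^([A-Za-z][A-Za-z0-9.]*)_([0-9]+)\\.([0-9]+)-([0-9]+)"
  m <- regmatches(fileName, regexec(pattern, fileName))[[1]]
  if (length(m) == 0) {
    stop("Name does not match packageName_major.minor-revision: ", fileName)
  }
  list(
    package  = m[2],
    major    = as.integer(m[3]),
    minor    = as.integer(m[4]),
    revision = as.integer(m[5]),
    version  = package_version(paste0(m[3], ".", m[4], "-", m[5]))
  )
}

# Hypothetical file names, used only to show the comparison.
older <- parsePackageFile("randomGLM_1.00-2.tar.gz")
newer <- parsePackageFile("randomGLM_1.02-1.tar.gz")
print(newer$version > older$version)
```

Because the version component is a package_version object, a higher minor version compares as newer regardless of the revision number.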

Installation instructions: Short installation instructions, including other required and recommended packages, are available here. Should you discover bugs (of which there are most likely plenty), please report them to Peter Langfelder (peter.langfelder at gmail.com) and Steve Horvath.
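For the source download described above, a local installation can be sketched with install.packages() by pointing it at the downloaded file and setting repos = NULL. The wrapper installFromSource and the file name are hypothetical; substitute the path of the file you actually downloaded. The linked installation instructions remain the authoritative reference.

```r
# Hypothetical wrapper: install a downloaded source archive (.tar.gz).
installFromSource <- function(path) {
  if (!file.exists(path)) {
    stop("Package file not found: ", path)
  }
  if (!grepl("\\.tar\\.gz$", path)) {
    stop("Expected a source archive ending in .tar.gz: ", path)
  }
  # repos = NULL tells R to install from the local file instead of a repository.
  install.packages(path, repos = NULL, type = "source")
  invisible(path)
}

# Hypothetical placeholder; replace with the path to your downloaded file.
sourceFile <- "randomGLM_source.tar.gz"
if (file.exists(sourceFile)) {
  installFromSource(sourceFile)
} else {
  message("Download the source package first, then set 'sourceFile' to its path.")
}
```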

Problems installing or using the package

Please see our list of Frequently Asked Questions (and frequently given answers); the solution to your problem may already be posted there. In particular, you can find answers about spurious Mac errors, compatibility problems when upgrading randomGLM, and others.

If you find a bug in the newest version on CRAN, please see whether this web site has posted a newer version where the bug may be fixed. If you still cannot solve the problem, email Peter Langfelder and Steve Horvath.

Getting started with R and the randomGLM package

The package described here is an add-on for the statistical language and environment R (free software). Our tutorials, listed in the R Tutorials section above, contain step-by-step instructions.

Old versions of R package randomGLM

Older versions of the package presented on this page are available here.

Citing the randomGLM package

If you use randomGLM in published work, please cite it as follows:

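The method, software and evaluations are described in Song L, Langfelder P, Horvath S (2013) Random generalized linear model: a highly accurate and interpretable ensemble predictor. BMC Bioinformatics 14:5. DOI: 10.1186/1471-2105-14-5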

Acknowledgments

The original code was written by Lin Song and Steve Horvath. Peter Langfelder is mainly in charge of maintaining and improving the package. The package also builds on functions adapted/adopted from external packages, e.g. the glm function from the stats package and other functions from the MASS package.



