Welcome to the NEO software page

Correspondence: Steve Horvath, Jason E. Aten

 

Reference

Aten JE, Fuller TF, Lusis AJ, Horvath S (2008) Using genetic markers to orient the edges in quantitative trait networks: the NEO software. BMC Systems Biology 2008, 2:34. April 15.

The Network Edge Orienting (NEO) method and software address the challenge of inferring unconfounded and directed gene networks by integrating microarray-derived gene expression data with genetic marker data and Structural Equation Model (SEM) comparison. The NEO software implements several manual and automatic methods for building multi-marker QTL to create directed networks. Networks are oriented by considering each edge separately, thus reducing error propagation. To summarize the genetic evidence in favor of a given edge orientation, we propose several edge orienting scores: the Local SEM-based Edge Orienting (LEO) score compares the fit of several competing causal graphs; the correlation-based edge orienting scores are fast approximations to the LEO scores. SEM fitting indices allow the user to assess local and overall model fit. The NEO software allows the user to carry out a robustness analysis with regard to genetic marker selection. We demonstrate NEO both in simulation and in application to the relationship between a disease gene (Cidec, also known as Fsp27) and a weight-related gene co-expression module in liver.

The NEO software can be used to orient the edges of gene co-expression networks or quantitative trait networks if the edges can be anchored by significant QTL. R software tutorials, data, and supplementary material can be downloaded below.
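
As a rough illustration of why a genetic marker can orient an edge, the following Python sketch (not part of NEO, which is written in R) simulates a marker M that acts on trait A, which in turn acts on trait B. Under the causal chain M -> A -> B, M and B are independent given A, so the partial correlation of M and B given A should be near zero, while the partial correlation of M and A given B should not. The effect sizes, the sample size, and the helper name partial_corr are assumptions made for this example. NEO's LEO scores are based on SEM fit comparison and are not computed here.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after adjusting both for z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Simulated data (hypothetical effect sizes): marker -> A -> B
rng = np.random.default_rng(0)
n = 5000
m = rng.integers(0, 3, size=n).astype(float)  # marker genotype coded 0/1/2
a = 0.8 * m + rng.normal(size=n)              # trait A is driven by the marker
b = 0.8 * a + rng.normal(size=n)              # trait B is driven by trait A

# Consistent with M -> A -> B: the marker carries no information
# about B once A is known.
pc_mb_given_a = partial_corr(m, b, a)

# Inconsistent with M -> B -> A: the marker is still associated
# with A after adjusting for B.
pc_ma_given_b = partial_corr(m, a, b)

print(f"pcor(M, B | A) = {pc_mb_given_a:.3f}")
print(f"pcor(M, A | B) = {pc_ma_given_b:.3f}")
```

In this simulation the first value should be close to zero and the second clearly non-zero, which favors the orientation A -> B. Real data call for proper model comparison and a significant QTL anchor, as described above.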

Talk (ppt slides) and (pdf version)

R Software, Tutorials and Data

  1. Network Edge Orienting (NEO) Software, originally written by Jason E. Aten. NEO is written in the R language for statistical computing. The most recent version, updated by Scott Richie and Peter Langfelder, is in the following file: neoDecember2015.txt

A less recent version, which also works with recent versions of R, can be found here (updatedneo.txt). The original version of NEO was written for R-2.5.1 and produces warnings in recent R versions. The original NEO codebase can be downloaded as (neo.txt.zip), (neo.txt.gz), or uncompressed (neo.txt).

  1. Written documents - please cite these if you use NEO in your work.
    1. Aten JE, Fuller TF, Lusis AJ, Horvath S (2008) Using genetic markers to orient the edges in quantitative trait networks: the NEO software. BMC Systems Biology 2008, 2:34. April 15.
    2. Jason's Ph.D. Dissertation, which contains all technical details on NEO. Aten, Jason Erik. Causal not Confounded Gene Networks: Inferring Acyclic and Non-acyclic Gene Bayesian Networks in mRNA Expression Studies using Recursive V-Structures, Genetic Variation, and Orthogonal Causal Anchor Structural Equation Models. 2008. Ph.D. dissertation, UCLA Department of Biomathematics.
  2. Tutorials
    1. MiniTutorial.on.NEO2.doc
      • Tutorial illustrating simple NEO use, without full analysis on actual data.
      • Depends on: CausalityFunctions.txt.
    2. GeneScreeningMiniTutorial2.doc
      • A tutorial on finding reactive genes. From a lecture by Tova Fuller.
    3. NEOTutorialMultiEdgeSimulation.doc
      • The multi-edge simulation and NEO graph recovery tutorial. This shows how to reproduce Figure 7 from the article.
      • Depends on: CausalityFunctions.txt, the function file used by this tutorial (you will need both).
    4. Insig1Tutorial.GeneIdentification.doc
      • A simplified tutorial, trimmed down from the complete file Tutorial_for_Statclub_Insig1_11October2007_slim.doc.
      • dat.bxh.rdat: data file used in the tutorial.
    5. Tutorial_for_Statclub_Insig1_11October2007_slim.doc
      • The most complete analysis tutorial.
      • Also contains the BxH-ApoE-null data set that the tutorial uses.
  3. Methods to Reproduce Results in the paper, and additional studies not in the paper. (These are for archival/results reproduction rather than instructional purposes.)
    1. Methods.NEO.Simulate.Robustness.doc
      • NEO software and LEO scoring: simulation of multiple SNP phenotypes and robustness of detection. Here we provide R code that illustrates the NEO software's performance on simulated data for the robustness analysis.
    2. Methods.NEO.Null.Model.simulations.doc
      • Simulation models studying the dependence of the LEO.NB scores on the number of SNPs and SNP selection method. Here we provide R code that shows how we carried out simulations to evaluate the LEO.NB scores.
    3. Methods.Robustness.of.Insig1.Fdft1.Dhcr7.male.female.doc
      • Robustness of automated SNP selection for the Insig1 and Fdft1/Dhcr7 positive controls in cholesterol biosynthesis.
    4. Methods.Compare.Power.Single.vs.Orthomarker.doc
      • Study of Single Anchor vs. Orthogonal Causal Anchor LEO scoring: statistical power and false positive rate graphs.
      • Methods..Insig1.Dhcr7.Fdft1.with.automated.snp.selection.doc: an application of NEO software and LEO scoring to Insig1 and downstream genes. These genes appear to be causally reactive to Insig1.
    5. Methods.Tutorial.confirm.Insig1.in.males.doc
      • Checks whether any of the Insig1->gene relationships observed in female BxH-ApoE mice are also confirmed in males.
    6. Methods.Insig1.doc
      • Find markers that are consistent with Insig1 being upstream of Fdft1 and Dhcr7. Start with Female BxH-ApoE-null data.
      • Build a minimal model for the genetic control of Insig1.
      • Screen for novel genes downstream of Insig1 by repeating the Single Marker Analysis this time holding the markers fixed.
      • Check for confirmation in Males.
    7. Methods.Insig1.Supplement.doc
      • Application of NEO software and LEO scoring to Insig1 and downstream genes: a long analysis log leading to the table of fourteen positive control genes for Insig1->gene in the female BxH-ApoE-null data.
      • Warning: This is a very long (~300 pages) analysis log.
  4. Data sets. These are used in the Methods documents above.
    1. liver.1146snps.23388mrna.21clinical.bxh.apoe.null.rdat.zip
      • BxH-ApoE-null data -- imputed data with no missing (NA) values and no random-location SNPs.
      • This is the main data set and the one you would want to use to verify results.
      • Annotations for gene chromosome location and position were added after the tutorials were written.
    2. liver.snps.23388genes.clinical.bxh.male.and.female.rdat.zip
      • BxH-ApoE-null data -- Same as above but not all SNPs imputed, and includes random SNPs.
    3. bxd.fsp27.blue.df.imp.rdat.zip
      • BxD F2 cross used to check the relationship between Fsp27 and the PCBlue eigengene. Missing data imputed.
    4. bxd.fsp27.blue.df.rdat.zip
      • BxD F2 cross used to check the relationship between Fsp27 and the PCBlue eigengene. Missing data NOT imputed.
    5. insig1_genes_and_blue_module_30oct2007.zip
      • Data set used to evaluate genes that appear downstream of Insig1 and the list of Blue Module eigengenes.
    6. insig1.complete.validation.genes.bxd.set.imputed.rdat.zip
      • Data set from BxD F2 mice that was used to check the genes reactive to Insig1.
    7. insig1.bxd.downstream.robust.results.rdat.zip
      • Data set for the robustness analysis of the "downstream of Insig1 genes" in the BxD.

The following references have used NEO.

Presson AP, Sobel EM, Papp JC, Suarez CJ, Whistler T, Rajeevan MS, Vernon SD, Horvath S (2008) Integrated weighted gene co-expression network analysis with an application to chronic fatigue syndrome. BMC Systems Biology 2008, 2:95

Maclennan NK, Dong J, Aten JE, Horvath S, Rahib L, Ornelas L, Dipple KM, McCabe ER (2009) Weighted gene co-expression network analysis identifies biomarkers in glycerol kinase deficient mice. Mol Genet Metab. 2009 May 27

Farber CR, Aten JE, Farber EA, de Vera V, Gularte R, Islas-Trejo A, Wen P, Horvath S, Lucero M, Lusis AJ, Medrano JF (2009) Genetic dissection of a major mouse obesity QTL (Carfhg2): integration of gene expression and causality modeling. Physiol Genomics. 2009 May 13;37(3):294-302.

Plaisier CL, Horvath S, Huertas-Vazquez A, Cruz-Bautista I, Herrera MF, Tusie-Luna T, Aguilar-Salinas C, Pajukanta P (2009) A systems genetics approach implicates USF1, FADS3 and other causal candidate genes for familial combined hyperlipidemia. PLoS Genetics 5(9):e1000642