Correlation and Variation-Based Method for Identifying Reference Genes from Large Datasets

Authors

  • Maurice Han Tong Ling, School of Chemical and Biomedical Engineering, Nanyang Technological University, Republic of Singapore

Keywords:

reference standards, computing methodologies

Abstract

Introduction: Reference genes are assumed to be stably expressed under most circumstances. Previous studies have shown that common algorithms for identifying potential reference genes, such as NormFinder, geNorm, and BestKeeper, are not suitable for microarray-sized datasets. The aim of this study was to evaluate existing methods and develop new methods for identifying reference genes from microarray datasets.

Methods: We evaluated the correlation between outputs from seven published methods for identifying reference genes, including NormFinder, geNorm, and BestKeeper, using subsets of published microarray data. Based on these results, we evaluated seven novel combinations of published methods for identifying reference genes.

Results: Our results showed that NormFinder’s and geNorm’s indices were highly correlated (R² = 0.987, P < 0.0001), which is consistent with the findings of previous studies. However, the correlations were lower between NormFinder’s and BestKeeper’s indices (R² = 0.489, 0.01 < P < 0.05) and between NormFinder’s index and the coefficient of variation (CV) (R² = 0.483, 0.01 < P < 0.05). We developed two novel methods that correlated highly with NormFinder (R² = 0.796 for both methods, P < 0.0001). In addition, the computational time required by the two novel methods scaled linearly with the size of the dataset.
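
The two kinds of quantities reported above, a per-gene CV (standard deviation divided by the mean) and the squared correlation between two stability indices, can be illustrated with a minimal sketch. This is not the OLIVER implementation, and it does not reproduce the two novel methods, which are not specified in this abstract. The gene names, the expression values, and the function names `coefficient_of_variation` and `r_squared` are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m


def r_squared(xs, ys):
    """Return the squared Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: expression values of three genes across four samples.
expression = {
    "geneA": [10.0, 10.0, 10.0, 10.0],
    "geneB": [8.0, 12.0, 8.0, 12.0],
    "geneC": [5.0, 15.0, 5.0, 15.0],
}

# A lower CV means less variation relative to the mean, so the gene with
# the lowest CV is ranked first as the most stable candidate.
cv_by_gene = {gene: coefficient_of_variation(v) for gene, v in expression.items()}
ranked = sorted(cv_by_gene, key=cv_by_gene.get)
```

In this sketch each gene's CV is computed from that gene's values alone, so the total work grows in proportion to the number of genes. Calling `r_squared` on two lists of per-gene scores (for example, CV values and another method's stability index for the same genes) gives the kind of R² comparison reported in the results.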

Conclusion: Our findings suggest that both of our novel methods can be used as alternatives to NormFinder, geNorm, and BestKeeper for identifying reference genes from large datasets. These methods were implemented as a tool, OLIgonucleotide Variable Expression Ranker (OLIVER), which can be downloaded from http://sourceforge.net/projects/bactome/files/OLIVER/OLIVER_1.zip

Published

2021-12-24