Microarray data mining using landmark gene-guided clustering
  • Authors: Pankaj Chopra (1)
    Jaewoo Kang (2) (3)
    Jiong Yang (4)
    HyungJun Cho (3) (5)
    Heenam Stanley Kim (6)
    Min-Goo Lee (7)
  • Journal: BMC Bioinformatics
  • Publication year: 2008
  • Publication date: December 2008
  • Volume: 9
  • Issue: 1
  • Full-text size: 892KB
  • Author affiliations:
    1. Dept. of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
    2. Dept. of Computer Science and Engineering, Korea University, Seoul, Korea
    3. Dept. of Biostatistics, College of Medicine, Korea University, Seoul, Korea
    4. Case Western Reserve University, Cleveland, OH 44106, USA
    5. Dept. of Statistics, Korea University, Seoul, Korea
    6. Bioinformatics and Functional Genomics Laboratory, Graduate School of Medicine, Korea University, Seoul, Korea
    7. Department of Physiology, College of Medicine, Korea University, Seoul, Korea
  • ISSN: 1471-2105
Abstract
Background: Clustering is a popular data exploration technique widely used in microarray data analysis. Most conventional clustering algorithms, however, generate only a single set of clusters, regardless of the biological context of the analysis. A single set of clusters is often inadequate for exploring the data from different biological perspectives and gaining new insights. We propose a new clustering model that can generate multiple sets of clusters from a single dataset, each of which highlights a different aspect of the dataset.
Results: By applying our SigCalc algorithm to three yeast (Saccharomyces cerevisiae) datasets, we show two results. First, different sets of clusters can be generated from the same dataset by using different sets of landmark genes. Each set of clusters groups genes differently and reveals new biological associations between genes that were not apparent from clustering the original microarray expression data. Second, many of these newly found biological associations are common across datasets. These results also provide strong evidence of a link between the choice of landmark genes and the new biological associations found in the gene clusters.
Conclusion: We have used the SigCalc algorithm to project the microarray data onto a completely new subspace whose coordinates are genes (called landmark genes) known to belong to a Biological Process. The projected space is not a true vector space in mathematical terms; we use the term subspace to refer to one of the virtually unlimited number of projected spaces that our proposed method can produce. By changing the biological process, and thus the landmark genes, we can change this subspace. We have shown how clustering on this subspace reveals new, biologically meaningful clusters that were not evident in the clusters generated by conventional methods. The R scripts (source code) are freely available under the GPL.
The source code is available [see Additional File 1] as additional material, and the latest version can be obtained at http://www4.ncsu.edu/~pchopra/landmarks.html. The code is under active development to incorporate new clustering methods and analysis.
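The idea of re-expressing each gene in coordinates defined by landmark genes can be illustrated with a small sketch. The abstract does not say how SigCalc computes the projection, so the code below is not the authors' algorithm: it assumes, purely for illustration, that each coordinate is the Pearson correlation between a gene's expression profile and one landmark gene's profile. The gene names, expression values, and the helper names `pearson` and `project_onto_landmarks` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # flat profile: treated as uncorrelated (an assumption)
    return cov / (sd_x * sd_y)


def project_onto_landmarks(expression, landmarks):
    """Map every gene to a vector with one coordinate per landmark gene.

    Assumption: the coordinate is the correlation with that landmark;
    the source does not specify the actual projection used by SigCalc.
    """
    return {
        gene: [pearson(profile, expression[lm]) for lm in landmarks]
        for gene, profile in expression.items()
    }


# Toy expression matrix: gene -> values across four conditions (made up).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

# Two different landmark sets give two different projected spaces.
signatures_1 = project_onto_landmarks(expression, ["geneA", "geneC"])
signatures_2 = project_onto_landmarks(expression, ["geneD"])
```

Any standard clustering method could then be run on `signatures_1` or `signatures_2` instead of on the raw expression values; swapping the landmark set changes the space in which genes are compared, which is the effect the abstract describes.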
