New scoring schema for finding motifs in DNA Sequences
  • Authors: Fatemeh Zare-Mirakabad (1)
    Hayedeh Ahrabian (2)
    Mehdei Sadeghi (3) (4)
    Abbas Nowzari-Dalini (2)
    Bahram Goliaei (1)
  • Journal: BMC Bioinformatics
  • Publication year: 2009
  • Publication date: December 2009
  • Volume: 10
  • Issue: 1
  • Full-text size: 1262 KB
  • References: 1. Zhou Q, Liu J: Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics 2004, 20:909–916.
    2. Hertzberg L, Zuk O, Getz G, Domany E: Finding Motifs in Promoter Regions. J Comput Biol 2005, 12:314–330.
    3. Sandelin A, Wasserman W, Lenhard B: ConSite: web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res 2004, 32:W249-W252.
    4. Kel A, Gößling E, Reuter I, Cheremushkin E, Kel-Margoulis O, Wingender E: MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res 2003, 31:3576–3579.
    5. Marinescu V, Kohane I, Riva A: MAPPER: A search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics 2005, 6:79.
    6. Hertz G, Hartzell G, Stormo G: Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput Appl Biosci 1990, 6(2):81–92.
    7. Loots G, Ovcharenko I: rVISTA 2.0: Evolutionary analysis of transcription factor binding sites. Nucleic Acids Res 2004, 32:W217-W221.
    8. Lawrence C, Altschul S, Boguski M, Liu J, Neuwald A, Wootton J: Detecting subtle sequence signals: Gibbs sampling strategy for multiple alignment. Science 1993, 262:208–214.
    9. Hughes J, Estep P, Tavazoie S, Church G: Computational identification of cis-regulatory elements associated with functionally coherent groups of genes in Saccharomyces cerevisiae. J Mol Biol 2000, 296:1205–1214.
    10. Bailey T, Elkan C: The value of prior knowledge in discovering motifs with MEME. Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology AAAI Press, Menlo Park, CA 1995, 21–29.
    11. Sinha S, Tompa M: YMF: A program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res 2003, 31:3586–3588.
    12. Day W, McMorris F: Critical comparison of consensus methods for molecular sequences. Nucleic Acids Res 1992, 20:1093–1099.
    13. Stormo G, Schneider T, Gold L: Characterization of translational initiation sites in E. coli. Nucleic Acids Res 1982, 10:2971–2996.
    14. Schneider T, Stephens R: Sequence logos: A new way to display consensus sequences. Nucleic Acids Res 1990, 18:6097–6100.
    15. Blanchette M, Tompa M: Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 2002, 12:739–748.
    16. Marsan L, Sagot M: Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J Comput Biol 2000, 7:345–360.
    17. Bortoluzzi S, Coppe A, Bisognin A, Pizzi C, Danieli G: A multistep bioinformatic approach detects putative regulatory elements in gene promoters. BMC Bioinformatics 2005, 6:121–136.
    18. Benos P, Bulyk M, Stormo G: Additivity in Protein-DNA interactions: how good an approximation is it? Nucleic Acids Res 2002, 30:4442–4451.
    19. Bulyk M, Johnson P, Church G: Nucleotides of transcription factor binding site exert independent effects on the binding affinities of transcription factors. Nucleic Acids Res 2002, 30:1255–1261.
    20. Barash Y, Elidan G, Friedman N, Kaplan T: Modeling dependencies in protein-DNA binding sites. Proceedings of the seventh annual international conference on Research in computational molecular biology Berlin, Germany: ACM, New York, NY 2003, 28–37.
    21. Zhao X, Huang H, Speed T: Finding short DNA motifs using permuted Markov models. J Comput Biol 2005, 12:894–906.
    22. Ellrott K, Yang C, Sladek F, Jiang T: Identifying transcription factor binding sites through Markov chain optimization. Bioinformatics 2002, 18 Suppl 2:S100-S109.
    23. King O, Roth F: A non-parametric model for transcription factor binding sites. Nucleic Acids Res 2003, 31:e116.
    24. Tomovic A, Oakeley E: Position dependencies in transcription factor binding sites. Bioinformatics 2007, 23:933–941.
    25. Pevzner P, Sze S: Combinatorial approaches to finding subtle signals in DNA sequences. Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology AAAI Press, Menlo Park, CA 2000, 269–278.
    26. Stormo G: Information content and free energy in DNA-Protein interaction. J Theor Biol 1998, 195:135–137.
    27. Benos P, Lapedes A, Stormo G: Probabilistic code for DNA recognition by proteins of EGR family. J Mol Biol 2002, 323:701–727.
    28. Lenhard B, Wasserman W: TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics 2002, 18:1135–1136.
    29. Wingender E, Dietze P, Karas H, Knuppel R: TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res 1996, 24:238–241.
    30. Sandve G, Abul O, Walseng V, Drabløs F: Improved benchmarks for computational motif discovery. BMC Bioinformatics 2007, 8:193.
    31. Tompa M, Li N, Bailey T, Church G, De Moor B, Eskin E, Favorov A, Frith M, Fu Y, Kent W, Makeev V, Mironov A, Noble W, Pavesi G, Pesole G, Regnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z: Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 2005, 23:137–144.
    32. Burset M, Guigo R: Evaluation of gene structure prediction programs. Genomics 1996, 34:353–367.
    33. Zhu J, Zhang M: SCPD: A promoter database of yeast Saccharomyces cerevisiae. Bioinformatics 1999, 15:563–577.
  • Author affiliations: Fatemeh Zare-Mirakabad (1)
    Hayedeh Ahrabian (2)
    Mehdei Sadeghi (3) (4)
    Abbas Nowzari-Dalini (2)
    Bahram Goliaei (1)

    1. Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
    2. Center of Excellence in Biomathematics, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran
    3. National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
    4. School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran
  • ISSN:1471-2105
Abstract
Background: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. To do so, all subsequences of the given DNA sequence are scored with a scoring function, and the prediction is made by selecting the best-scoring subsequence. Most of the available tools for known binding site prediction are designed on the assumption that binding site base positions are independent of one another. Recently, Tomovic and Oakeley investigated the statistical basis for a claim of either dependence or independence, to determine whether such a claim is generally true, and presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions that can be used in known binding site prediction under the assumption of dependency or independency between binding site base positions.

Results: We propose a new scoring function based on the dependency between all positions in the binding site. This scoring function uses joint information content and mutual information as measures of dependency between positions in a transcription factor binding site (TFBS). Our method for modeling dependencies is simply an extension of position-independency methods. We evaluate the new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the results with those of two other well-known scoring functions.

Conclusion: The results demonstrate that the new approach improves known binding site discovery, and show that joint information content and mutual information provide a better and more general criterion for investigating the relationships between positions in the TFBS. Our scoring function is formulated using simple mathematical calculations. Applying our method to several biological data sets indicates that it performs better than methods that do not consider dependencies.
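
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' scoring function: it shows only a standard position-independent log-odds score applied to every subsequence of a new sequence, and the empirical mutual information between two positions of a set of aligned sites. All function names, the pseudocount, the uniform background of 0.25, and the toy sites are assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def column_probs(sites, pseudocount=1.0):
    """Per-position base probabilities from aligned sites (pseudocount is an assumption)."""
    total = len(sites) + 4 * pseudocount
    probs = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        probs.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return probs

def log_odds_score(window, probs, background=0.25):
    """Position-independent score: sum of per-position log2 odds against a uniform background."""
    return sum(math.log2(probs[i][b] / background) for i, b in enumerate(window))

def best_window(sequence, probs):
    """Score every subsequence of motif length; return (score, start, window) of the best one."""
    k = len(probs)
    candidates = [
        (log_odds_score(sequence[s:s + k], probs), s, sequence[s:s + k])
        for s in range(len(sequence) - k + 1)
    ]
    return max(candidates)

def mutual_information(sites, i, j):
    """Empirical mutual information (in bits) between columns i and j of the aligned sites."""
    n = len(sites)
    col_i = Counter(s[i] for s in sites)
    col_j = Counter(s[j] for s in sites)
    pairs = Counter((s[i], s[j]) for s in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), c in pairs.items()
    )

# Toy aligned sites, invented for this example only.
toy_sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
toy_probs = column_probs(toy_sites)
score, start, window = best_window("GGACGTGG", toy_probs)
```

A mutual information near zero between two columns is consistent with the independence assumption behind the log-odds score, while a large value suggests that a dependency-aware score, such as the one the abstract proposes, may be more appropriate.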
