用户名: 密码: 验证码:
Towards Order-Preserving SubMatrix Search and Indexing
详细信息    查看全文
  • 作者:Tao Jiang (17)
    Zhanhuai Li (17)
    Qun Chen (17)
    Kaiwen Li (17)
    Zhong Wang (17)
    Wei Pan (17)

    17. School of Computer Science and Technology
    ; Northwestern Polytechnical University ; 710072 ; Xi鈥檃n ; China
  • 关键词:OPSM ; Gene expression data ; pIndex ; FIT ; Queries
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2015
  • 出版时间:2015
  • 年:2015
  • 卷:9050
  • 期:1
  • 页码:309-326
  • 全文大小:761 KB
  • 参考文献:1. Barkow, S, Bleuler, S, Preli膰, A, Zimmermann, P, Zitzler, E (2006) Bicat: a biclustering analysis toolbox. Bioinformatics 22: pp. 1282-1283 CrossRef
    2. Ben-Dor, A., Chor, B., et al.: Discovering local structure in gene expression data: the order-preserving submatrix problem. In: RECOMB, pp. 49鈥?7 (2002)
    3. Boyer, RS, Moore, JS (1977) A fast string searching algorithm. Communications of the ACM 20: pp. 762-772 CrossRef
    4. BroadInstitute: Datasets.rar and 5q_gct_file.gct. http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
    5. Cheng, Y., Church, G.M.: Biclustering of expression data. In: Bourne, P.E., Gribskov, M., et al. (eds.) ISMB, pp. 93鈥?03. AAAI (2000)
    6. Chui, C.K., Kao, B., et al.: Mining order-preserving submatrices from data with repeated measurements. In: ICDM, pp. 133鈥?42. IEEE Computer Society (2008)
    7. Fang, Q., Ng, W., Feng, J.: Discovering significant relaxed order-preserving submatrices. In: KDD, pp. 433鈥?42. ACM (2010)
    8. Fang, Q, Ng, W, Feng, J, Li, Y (2012) Mining bucket order-preserving submatrices in gene expression data. IEEE Trans. Knowl. Data Eng. 24: pp. 2218-2231 CrossRef
    9. Fang, Q, Ng, W, Feng, J, Li, Y (2014) Mining order-preserving submatrices from probabilistic matrices. ACM Transactions on Database Systems (TODS) 39: pp. 6 CrossRef
    10. Gao, B.J., Griffith, O.L., Ester, M., Jones, S.J.M.: Discovering significant opsm subspace clusters in massive gene expression data. In: Eliassi-Rad, T., Ungar, L.H., Craven, M., Gunopulos, D. (eds.) KDD, pp. 922鈥?28. ACM (2006)
    11. Jiang, D., Pei, J., Zhang, A.: Gpx: interactive mining of gene expression data. In: Nascimento, M.A., 脰zsu, M.T., Kossmann, D., Miller, R.J., Blakeley, J.A., Schiefer, K.B. (eds.) VLDB, pp. 1249鈥?252. Morgan Kaufmann (2004)
    12. Jiang, T, Li, Z, Chen, Q, Wang, Z, Pan, W, Wang, Z Parallel partitioning and mining gene expression data with butterfly network. In: Decker, H, Lhotsk谩, L, Link, S, Basl, J, Tjoa, AM eds. (2013) Database and Expert Systems Applications. Springer, Heidelberg, pp. 129-144 CrossRef
    13. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323鈥?50 (1977)
    14. KNUTTn, D.: The art of computer programming, vol 3: Sorting and searching (1973)
    15. Liu, J., Wang, W.: Op-cluster: Clustering by tendency in high dimensional space. In: ICDM. pp. 187鈥?94. IEEE Computer Society (2003)
    16. McCreight, EM (1976) A space-economical suffix tree construction algorithm. Journal of the ACM (JACM) 23: pp. 262-272 CrossRef
    17. Trapp, AC, Prokopyev, OA (2010) Solving the order-preserving submatrix problem via integer programming. INFORMS Journal on Computing 22: pp. 387-400 CrossRef
    18. Ukkonen, E (1995) On-line construction of suffix trees. Algorithmica 14: pp. 249-260 CrossRef
    19. Weiner, P.: Linear pattern matching algorithms. In: IEEE Conference Record of 14th Annual Symposium on Switching and Automata Theory, 1973. SWAT2008, pp. 1鈥?1. IEEE (1973)
    20. Yang, J., Wang, W., Wang, H., Yu, P.S.: \(\delta \) -clusters: capturing subspace correlation in a large data set. In: Agrawal, R., Dittrich, K.R. (eds.) ICDE, pp. 517鈥?28. IEEE Computer Society (2002)
    21. Zhang, M., Wang, W., Liu, J.: Mining approximate order preserving clusters in the presence of noise. In: Alonso, G., Blakeley, J.A., Chen, A.L.P. (eds.) ICDE, pp. 160鈥?68. IEEE (2008)
  • 作者单位:Database Systems for Advanced Applications
  • 丛书名:978-3-319-18122-6
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
文摘
Order-Preserving SubMatrix (OPSM) has been proved to be important in modelling biologically meaningful subspace cluster, capturing the general tendency of gene expressions across a subset of conditions. Given an OPSM query based on row or column keywords, it is desirable to retrieve OPSMs quickly from a large gene expression dataset or OPSM data via indices. However, the time of OPSM mining from gene expression dataset is long and the volume of OPSM data is huge. In this paper, we investigate the issues of indexing two datasets above and first present a naive solution pfTree by applying prefix-Tree. Due to it is not efficient to search the tree, we give an optimization indexing method pIndex. Different from pfTree, pIndex employs row and column header tables to traverse related branches in a bottom-up manner. Further, two pruning rules based on number and order of keywords are introduced. To reduce the number of column keyword candidates on fuzzy queries, we introduce a First Item of keywords roTation method FIT, which reduces it from \(n!\) to \(n\) . We conduct extensive experiments with real datasets on a single machine, Hadoop and Hama, and the experimental results show the efficiency and scalability of the proposed techniques.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700