用户名: 密码: 验证码:
I/O efficient: computing SCCs in massive graphs
详细信息    查看全文
  • 作者:Zhiwei Zhang (1)
    Jeffrey Xu Yu (1)
    Lu Qin (2)
    Lijun Chang (3)
    Xuemin Lin (3)

    1. The Chinese University of Hong Kong
    ; Shatin ; N.T. ; Hong Kong ; China
    2. Centre for QCIS
    ; FEIT ; University of Technology Sydney ; Sydney ; Australia
    3. The University of New South Wales
    ; Sydney ; Australia
  • 关键词:Graph processing ; I/O efficiency ; Directed acyclic graphs ; Strongly connected components
  • 刊名:The VLDB Journal
  • 出版年:2015
  • 出版时间:April 2015
  • 年:2015
  • 卷:24
  • 期:2
  • 页码:245-270
  • 全文大小:1,474 KB
  • 参考文献:1. Abello, J., Buchsbaum, A.L., Westbrook, J.: A functional approach to external graph algorithms. Algorithmica 32(3), 437鈥?58 (2002)
    2. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116鈥?127 (1988)
    3. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: Data Structures and Algorithms. Addison-Wesley, Reading (1983)
    4. Ajwani, D., Dementiev, R., Meyer, U.: A computational study of external-memory bfs algorithms. In: Proceedings of SODA鈥?6 (2006)
    5. Ajwani, D., Meyer, U.: Algorithmics of Large and Complex Networks, Chapter 1: Design and Engineering of External Memory Traversal Algorithms for General Graphs. Springer, Berlin (2009)
    6. Ajwani, D., Meyer, U., Osipov, V.: Improved external memory bfs implementation. In: Proceedings of ALENEX鈥?7 (2007)
    7. Buchsbaum, A.L., Goldwasser, M.H., Venkatasubramanian, S., Westbrook, J.: On external memory graph traversal. In: Proceedings of SODA鈥?0 (2000)
    8. Chiang, Y.-J., Goodrich, M.T., Grove, E.F., Tamassia, R., Vengroff, D.E., Vitter, J.S.: External-memory graph algorithms. In: Proceedings of SODA鈥?5 (1995)
    9. Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill, NY (2001)
    10. Cosgaya-Lozano, A., Zeh, N.: A heuristic strong connectivity algorithm for large graphs. In: Proceedings of SEA鈥?9 (2009)
    11. Dementiev, R., Sanders, P., Schultes, D., Sibeyn, J.F.: Engineering an external memory minimum spanning tree algorithm. In: IFIP TCS (2004)
    12. Fan, W., Li, J., Ma, S., Wang, H., Wu, Y.: Graph homomorphism revisited for graph matching. PVLDB 3(1), 1161鈥?172 (2010)
    13. Hellings, J., Fletcher, G.H., Haverkort, H.: Efficient external-memory bisimulation on dags. In: Proceedings of SIGMOD鈥?2 (2012)
    14. Kumar, V., Schwabe, E.J.: Improved algorithms and data structures for solving graph problems in external memory. In: Proceedings of SPDP鈥?6 (1996)
    15. Kyrola, A., Blelloch, G., Guestrin, C.: Graphchi: large-scale graph computation on just a pc. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI鈥?2, pp. 31鈥?6, Berkeley, CA, USA. USENIX Association (2012)
    16. Mehlhorn, K., Meyer, U.: External-memory breadth-first search with sublinear i/o. In: Proceedings of ESA鈥?2 (2002)
    17. Meyer, U., Osipov, V.: Design and implementation of a practical i/o-efficient shortest paths algorithm. In: Proceedings of ALENEX鈥?9 (2009)
    18. Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths. In: Proceedings of ESA鈥?3 (2003)
    19. Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths with unbounded edge lengths. In: Proceedings of ESA鈥?6 (2006)
    20. Sibeyn, J.F.: External connected components. In: Proceedings of SWAT鈥?4 (2004)
    21. Sibeyn, J.F., Abello, J., Meyer, U.: Heuristics for semi-external depth first search on directed graphs. In: Proceedings of SPAA鈥?2 (2002)
    22. Tarjan, RE (1972) Depth-first search and linear graph algorithms. SIAM J. Comput. 1: pp. 146-160 new window">CrossRef
    23. Vitter, J.S.: External memory algorithms and data structures. ACM Comput. Surv. 33(2), 209鈥?71 (2001)
    24. Yildirim, H., Chaoji, V., Zaki, M.J.: Grail: scalable reachability index for large graphs. PVLDB, 3(1), 276鈥?84 (2010)
    25. Zhang, Z., Yu, J.X., Qin, L., Chang, L., Lin, X.: I/o efficient: computing sccs in massive graphs. In: Proceedings of SIGMOD鈥?3 (2013)
  • 刊物类别:Computer Science
  • 刊物主题:Database Management
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:0949-877X
文摘
A strongly connected component ( \(\mathsf {SCC}\) ) is a maximal subgraph of a directed graph \(G\) in which every pair of nodes is reachable from each other in the \(\mathsf {SCC}\) . With such a property, a general directed graph can be represented by a directed acyclic graph (DAG) by contracting every \(\mathsf {SCC}\) of \(G\) to a node in DAG. In many real applications that need graph pattern matching, topological sorting, or reachability query processing, the best way to deal with a general directed graph is to deal with its DAG representation. Therefore, finding all \(\mathsf {SCC}\) s in a directed graph \(G\) is a critical operation. The existing in-memory algorithms based on depth first search (DFS) can find all \(\mathsf {SCC}\) s in linear time with respect to the size of a graph. However, when a graph cannot reside entirely in the main memory, the existing external or semi-external algorithms to find all \(\mathsf {SCC}\) s have limitation to achieve high I/O efficiency. In this paper, we study new I/O-efficient semi-external algorithms to find all \(\mathsf {SCC}\) s for a massive directed graph \(G\) that cannot reside in main memory entirely. To overcome the deficiency of the existing DFS-based semi-external algorithm that heavily relies on a total order, we explore a weak order based on which we investigate new algorithms. We propose a new two-phase algorithm, namely, tree construction and tree search. In the tree construction phase, a spanning tree of \(G\) can be constructed in bounded number of sequential scans of \(G\) . In the tree search phase, it needs to sequentially scan the graph once to find all \(\mathsf {SCC}\) s. In addition, we propose a new single-phase algorithm, which combines the tree construction and tree search phases into a single phase, with three new optimization techniques. They are early acceptance, early rejection, and batch processing. By the single-phase algorithm with the new optimization techniques, we can significantly reduce the number of I/Os and the CPU cost. We prove the correctness of the algorithms. We conduct extensive experimental studies using 4 real datasets including a massive real dataset and several synthetic datasets to confirm the I/O efficiency of our approaches.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700