I/O efficient: computing SCCs in massive graphs

设为首页

收藏本站

网站地图 | English | 公务邮箱

远程访问

NSTL服务站

I/O efficient: computing SCCs in massive graphs

详细信息查看全文

作者：Zhiwei Zhang (1)
Jeffrey Xu Yu (1)
Lu Qin (2)
Lijun Chang (3)
Xuemin Lin (3)

1. The Chinese University of Hong Kong ; Shatin ; N.T. ; Hong Kong ; China
2. Centre for QCIS ; FEIT ; University of Technology Sydney ; Sydney ; Australia
3. The University of New South Wales ; Sydney ; Australia
关键词：Graph processing ; I/O efficiency ; Directed acyclic graphs ; Strongly connected components
刊名：The VLDB Journal
出版年：2015
出版时间：April 2015
年：2015
卷：24
期：2
页码：245-270
全文大小：1,474 KB
参考文献：1. Abello, J., Buchsbaum, A.L., Westbrook, J.: A functional approach to external graph algorithms. Algorithmica 32(3), 437鈥?58 (2002)
2. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116鈥?127 (1988)
3. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: Data Structures and Algorithms. Addison-Wesley, Reading (1983)
4. Ajwani, D., Dementiev, R., Meyer, U.: A computational study of external-memory bfs algorithms. In: Proceedings of SODA鈥?6 (2006)
5. Ajwani, D., Meyer, U.: Algorithmics of Large and Complex Networks, Chapter 1: Design and Engineering of External Memory Traversal Algorithms for General Graphs. Springer, Berlin (2009)
6. Ajwani, D., Meyer, U., Osipov, V.: Improved external memory bfs implementation. In: Proceedings of ALENEX鈥?7 (2007)
7. Buchsbaum, A.L., Goldwasser, M.H., Venkatasubramanian, S., Westbrook, J.: On external memory graph traversal. In: Proceedings of SODA鈥?0 (2000)
8. Chiang, Y.-J., Goodrich, M.T., Grove, E.F., Tamassia, R., Vengroff, D.E., Vitter, J.S.: External-memory graph algorithms. In: Proceedings of SODA鈥?5 (1995)
9. Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill, NY (2001)
10. Cosgaya-Lozano, A., Zeh, N.: A heuristic strong connectivity algorithm for large graphs. In: Proceedings of SEA鈥?9 (2009)
11. Dementiev, R., Sanders, P., Schultes, D., Sibeyn, J.F.: Engineering an external memory minimum spanning tree algorithm. In: IFIP TCS (2004)
12. Fan, W., Li, J., Ma, S., Wang, H., Wu, Y.: Graph homomorphism revisited for graph matching. PVLDB 3(1), 1161鈥?172 (2010)
13. Hellings, J., Fletcher, G.H., Haverkort, H.: Efficient external-memory bisimulation on dags. In: Proceedings of SIGMOD鈥?2 (2012)
14. Kumar, V., Schwabe, E.J.: Improved algorithms and data structures for solving graph problems in external memory. In: Proceedings of SPDP鈥?6 (1996)
15. Kyrola, A., Blelloch, G., Guestrin, C.: Graphchi: large-scale graph computation on just a pc. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI鈥?2, pp. 31鈥?6, Berkeley, CA, USA. USENIX Association (2012)
16. Mehlhorn, K., Meyer, U.: External-memory breadth-first search with sublinear i/o. In: Proceedings of ESA鈥?2 (2002)
17. Meyer, U., Osipov, V.: Design and implementation of a practical i/o-efficient shortest paths algorithm. In: Proceedings of ALENEX鈥?9 (2009)
18. Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths. In: Proceedings of ESA鈥?3 (2003)
19. Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths with unbounded edge lengths. In: Proceedings of ESA鈥?6 (2006)
20. Sibeyn, J.F.: External connected components. In: Proceedings of SWAT鈥?4 (2004)
21. Sibeyn, J.F., Abello, J., Meyer, U.: Heuristics for semi-external depth first search on directed graphs. In: Proceedings of SPAA鈥?2 (2002)
22. Tarjan, RE (1972) Depth-first search and linear graph algorithms. SIAM J. Comput. 1: pp. 146-160 new window">CrossRef
23. Vitter, J.S.: External memory algorithms and data structures. ACM Comput. Surv. 33(2), 209鈥?71 (2001)
24. Yildirim, H., Chaoji, V., Zaki, M.J.: Grail: scalable reachability index for large graphs. PVLDB, 3(1), 276鈥?84 (2010)
25. Zhang, Z., Yu, J.X., Qin, L., Chang, L., Lin, X.: I/o efficient: computing sccs in massive graphs. In: Proceedings of SIGMOD鈥?3 (2013)
刊物类别：Computer Science
刊物主题：Database Management
出版者：Springer Berlin / Heidelberg
ISSN：0949-877X

文摘

A strongly connected component ( \(\mathsf {SCC}\) ) is a maximal subgraph of a directed graph \(G\) in which every pair of nodes is reachable from each other in the \(\mathsf {SCC}\) . With such a property, a general directed graph can be represented by a directed acyclic graph (DAG) by contracting every \(\mathsf {SCC}\) of \(G\) to a node in DAG. In many real applications that need graph pattern matching, topological sorting, or reachability query processing, the best way to deal with a general directed graph is to deal with its DAG representation. Therefore, finding all \(\mathsf {SCC}\) s in a directed graph \(G\) is a critical operation. The existing in-memory algorithms based on depth first search (DFS) can find all \(\mathsf {SCC}\) s in linear time with respect to the size of a graph. However, when a graph cannot reside entirely in the main memory, the existing external or semi-external algorithms to find all \(\mathsf {SCC}\) s have limitation to achieve high I/O efficiency. In this paper, we study new I/O-efficient semi-external algorithms to find all \(\mathsf {SCC}\) s for a massive directed graph \(G\) that cannot reside in main memory entirely. To overcome the deficiency of the existing DFS-based semi-external algorithm that heavily relies on a total order, we explore a weak order based on which we investigate new algorithms. We propose a new two-phase algorithm, namely, tree construction and tree search. In the tree construction phase, a spanning tree of \(G\) can be constructed in bounded number of sequential scans of \(G\) . In the tree search phase, it needs to sequentially scan the graph once to find all \(\mathsf {SCC}\) s. In addition, we propose a new single-phase algorithm, which combines the tree construction and tree search phases into a single phase, with three new optimization techniques. They are early acceptance, early rejection, and batch processing. By the single-phase algorithm with the new optimization techniques, we can significantly reduce the number of I/Os and the CPU cost. We prove the correctness of the algorithms. We conduct extensive experimental studies using 4 real datasets including a massive real dataset and several synthetic datasets to confirm the I/O efficiency of our approaches.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700