Web graph modeling and its applications.
Details
  • Author: Chen, Zhiming
  • Degree: Ph.D.
  • Year: 2013
  • Advisor and committee: Farago, Andras (advisor); Cobb, Jorge Arturo (committee member); Wu, Weili (committee member); Zhang, Kang (committee member)
  • Institution: The University of Texas
  • Department: Computer Science
  • ISBN: 9781303136726
  • CBH: 3564613
  • Country: USA
  • Language: English
  • FileSize: 3604313
  • Pages: 126
Abstract
The graph that describes the World Wide Web has been the focus of much recent attention, and quite a few stochastic models have been proposed to account for its various properties. We experiment with the emergence of cliques in Web graph samples and capture a previously ignored feature of the Web graph: clique sizes in the Web graph follow a power-law distribution. Based on this property, we present a new model of the Web graph, called the clique rank model, that faithfully represents the power-law clique distribution while preserving other properties, such as the power-law degree distribution, the small-world effect, and the existence of many bipartite cliques. Since understanding the structure of the Web graph is crucial to modeling the Web effectively, we also study graph visualization that supports users' intuition in observing the Web structure. Many structural properties of a graph can be revealed by visualization, so that direct comparisons between different Web graph models and real Web graph samples become possible. We develop a convenient visualization tool that helps users explore Web graphs at microscopic and macroscopic levels in a three-dimensional space. We also use a structural mining method to identify isolated cliques and other meaningful structures in Web graphs. This tool provides an innovative perspective on the visualization of large graphs. We further analyze the relation between the link structure and the textual content of Web pages by showing that content similarity is proportional to hyperlink distance if other variables are unknown or random. We propose an efficient topic detection approach in which a pre-processing stage that analyzes the link structure of Web pages is added to existing Topic Detection and Tracking methods. To eliminate unimportant Web pages, we present a heuristic algorithm that finds highly connected Web pages, which are then processed further to identify hot topics.
Experimental results on a popular discussion forum show that our approach improves the speed and quality of topic detection. Furthermore, we compare two topic detection approaches and find that adding our pre-processing stage is more useful when applied to topic modeling methods than to clustering methods.
