Towards Microblog Data Analysis and Management Based on Graph Model (基于图模型的微博数据分析与管理)


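English title: Towards Microblog Data Analysis and Management Based on Graph Model
Author: Zhao Bin (赵斌)
Degree level: Doctoral
Discipline: Computer Application Technology
Chinese keywords: 微博 ; 随机游走 ; 垃圾用户 ; 重用检测 ; 二分图 ; 图聚类
English keywords: Microblog ; Random Walk ; Spammer ; Reuse Detection ; Bipartite Graph ; Graph Clustering
Degree year: 2012
Advisor: Zhou Aoying (周傲英)
Discipline code: 081203
Degree-granting institution: East China Normal University (华东师范大学)
Thesis submission date: 2012-11-01
Defense committee chair: Li Zhanhuai (李战怀)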

Abstract

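With the rapid growth of microblogging, more and more Internet users use microblogs to record their daily lives, share their interests, and post opinions and comments. Compared with data from traditional online media, microblog data has several distinctive characteristics: short length, massive scale, low quality, real-time propagation, and social networking. These characteristics pose several challenges for microblog data mining: (1) Because microblog messages are short, traditional mining algorithms designed for long text cannot be applied to them directly, so short-text mining algorithms suited to microblogs are needed. (2) Microblog messages are a form of "user-generated content", and their text contains new Internet words, misspellings, abbreviations, and the like. The text quality of microblog messages is therefore limited; existing natural language processing techniques alone are not sufficient, and new text processing techniques must continue to be developed to address these problems. (3) Microblog data is huge in scale, which requires data mining algorithms to be efficient and scalable. (4) Besides a large amount of text, microblogs also contain a large amount of unstructured data, such as social network relationships. Designing sound storage strategies and index structures is crucial for maintaining microblog data and improving algorithm performance.
     As a social networking platform for sharing information, microblogs draw large numbers of Internet users into discussion whenever a hot event occurs; users post comments and opinions and voice their own concerns. A large number of individual viewpoints converge and merge on the microblog platform into collective viewpoints, which become an important component of public opinion. Collective viewpoint mining has therefore become an important technique for analyzing hot events, gaining insight into public sentiment, and understanding public opinion. At present, however, microblogs contain many spammers and their messages, which directly degrades the performance of collective viewpoint mining algorithms. Spammers and their messages should therefore be filtered out as far as possible in the preprocessing stage. In addition, microblog data is not a fixed dataset: as new user comments keep arriving, maintaining and updating the results of collective viewpoint mining becomes a problem that must be faced. Data management techniques can help improve the execution efficiency and performance of mining algorithms.
     This thesis studies three basic problems in microblog data mining: anti-spam processing, collective viewpoint mining, and bipartite graph data management. The main contributions are as follows: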
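     1. For anti-spam processing in microblogs, a spammer detection algorithm based on a reuse detection model is proposed. The method jointly considers the textual correlation and the temporal correlation within a message sequence and effectively models the intensity of spammer behavior. Depending on the detection strategy, the reuse-detection-based algorithms are divided into sentence-level detection (SRD) and term-level detection (TRD). SRD focuses on user behavior patterns, whereas TRD focuses on the topic information of spam messages. Experiments on real datasets show that SRD outperforms TRD overall, but TRD can discover potential spammers that SRD misses. Finally, the reuse detection algorithm is applied to users' retweeting behavior, and groups of spammers are discovered based on retweet relationships.
     2. To study collective viewpoint mining in microblogs, a "term-tweet-user" (TWU) graph model is proposed. The graph model combines three key features (text content, temporal factors, and the social network) to model microblog user behavior effectively. Unlike earlier graph models that incorporate time, the TWU model treats timestamps as edge attributes rather than as separate timestamp nodes, which avoids the computational bottleneck that arises when timestamp nodes become high-degree nodes. Accordingly, a time-sensitive random walk algorithm, TSRW, is proposed on top of the TWU model; it measures the relevance between terms and thereby mines collective viewpoints. Experiments show that TSRW clearly outperforms the baseline algorithms, and the mining results are presented using visualization techniques. In addition, a preliminary study is made of incremental computation in graph data mining, because re-mining collective viewpoints from scratch on a continually evolving graph dataset is impractical. An incremental random walk algorithm is therefore proposed that can update and maintain the collective viewpoint mining results in a timely manner.
     3. For bipartite graph data management, the basic atomic operations on bipartite graphs are summarized, and an algebraic form for these atomic operations is defined. An implementation of the atomic operations based on maximal star graphs is proposed, and its feasibility is proved theoretically. To support query and analysis tasks on bipartite graphs, a star-graph-based data storage strategy and index structure are proposed.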
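     The first contribution scores a user by how strongly messages are reused within the user's message sequence, taking both text and time into account. The abstract does not give the scoring function, so the following is only a minimal sketch under stated assumptions: whitespace tokenization (Chinese text would need a word segmenter), Jaccard similarity as the text measure, and exponential decay over the time gap. The names `jaccard` and `reuse_intensity` and the parameters `sim_threshold` and `tau` are hypothetical and are not taken from the thesis.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two messages."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def reuse_intensity(messages, sim_threshold=0.8, tau=3600.0):
    """Score how intensely one user repeats earlier messages.

    messages: list of (timestamp_in_seconds, text), sorted by timestamp.
    Each message that near-duplicates an earlier one contributes
    exp(-gap / tau), so quick repeats count more than slow ones.
    The scoring rule is an illustrative assumption, not the thesis's model.
    """
    score = 0.0
    for i in range(1, len(messages)):
        t_i, text_i = messages[i]
        # Search backwards for the most recent message that this one reuses.
        for j in range(i - 1, -1, -1):
            t_j, text_j = messages[j]
            if jaccard(text_i, text_j) >= sim_threshold:
                score += math.exp(-(t_i - t_j) / tau)
                break
    # Normalize by the number of messages that could have been repeats.
    return score / max(len(messages) - 1, 1)


spam_user = [(0, "buy cheap watches now"),
             (60, "buy cheap watches now"),
             (120, "buy cheap watches now")]
normal_user = [(0, "good morning"),
               (5000, "lunch was great today"),
               (9000, "watching the game")]
print(reuse_intensity(spam_user), reuse_intensity(normal_user))
```

     A term-level variant would compare shared terms rather than whole messages, which is closer in spirit to the TRD strategy described above.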
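     In summary, this thesis studies three basic problems: anti-spam processing, collective viewpoint mining, and bipartite graph data management. The algorithms are tested on real microblog datasets, and the experimental results verify that the proposed algorithms are effective and feasible.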
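     The idea of storing timestamps as edge attributes and running a time-sensitive random walk can be illustrated with a small sketch. This is not the TSRW algorithm itself, whose details the abstract does not give. It is a plain random walk with restart in which each edge weight decays with the age of its timestamp. The function name `time_weighted_rwr`, the half-life decay, and the `term:`/`tweet:` node labels are assumptions made for illustration.

```python
def time_weighted_rwr(edges, source, now, half_life=86400.0,
                      restart=0.15, iters=50):
    """Random walk with restart on an undirected graph with timestamped edges.

    edges: iterable of (u, v, timestamp). The timestamp is an edge attribute,
    so no separate timestamp nodes are created.
    source must appear in at least one edge.
    Returns a dict mapping each node to its relevance with respect to source.
    """
    # Build the adjacency map; an edge loses half its weight every half_life.
    adj = {}
    for u, v, ts in edges:
        w = 0.5 ** ((now - ts) / half_life)
        adj.setdefault(u, {})
        adj.setdefault(v, {})
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    scores = {n: 0.0 for n in adj}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            if total == 0.0:
                continue
            # Move along each edge with probability proportional to its weight.
            for v, w in nbrs.items():
                nxt[v] += (1.0 - restart) * scores[u] * w / total
        # With probability `restart` the walker jumps back to the source.
        nxt[source] += restart
        scores = nxt
    return scores


# Two terms co-occur with "term:a": one in a recent tweet, one in an old tweet.
edges = [("term:a", "tweet:1", 100), ("term:b", "tweet:1", 100),
         ("term:a", "tweet:2", 0), ("term:c", "tweet:2", 0)]
scores = time_weighted_rwr(edges, "term:a", now=100, half_life=10)
print(scores["term:b"] > scores["term:c"])
```

     In this toy graph the term that shares a recent tweet with the source ends up more relevant than the term that shares only an old tweet, which is the kind of temporal effect the model aims to capture. User nodes and follower edges would be added to the same structure in a full term-tweet-user graph.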
A microblog is a popular Web 2.0 system, such as Twitter and Sina Weibo. It allows users to post short messages, also known as tweets, of up to 140 characters. Tweets nevertheless cover a wide variety of content, ranging from breaking news and discussion to personal life, activities, and interests. Microblogs have become a broadcast medium for expressing public opinion. For hot events, microblogs usually collect diverse and abundant thoughts, comments, and opinions from various individual viewpoints in a short period. In the end, such individual viewpoints converge into several collective ones.
     In this thesis, we study microblog mining techniques based on graph models. Compared with traditional media, microblogs have several distinguishing characteristics, such as short length, massive size, low quality, real-time nature, and social networking. These characteristics pose several challenges. First, tweets are deficient in statistical and linguistic features because of their short length, so existing methods for mining long-text corpora are not suitable for microblogs. Second, microblog messages contain all kinds of noisy data, such as typos, ad hoc abbreviations, and phonetic substitutions, which adversely affect NLP (Natural Language Processing) tools. Third, owing to the massive size of microblogs, the proposed approaches need to guarantee both scalability and efficiency. Finally, microblogs contain not only massive numbers of text messages but also large amounts of unstructured data, such as social network relationships modeled as graphs. The key challenge is the efficiency of the mining algorithms: without appropriate disk block design and indexing structures, microblog mining algorithms will not be efficient.
     To summarize, our main contributions are as follows:
     ·The tremendous increase in spam has become a serious problem. We aim to detect spammer communities by means of retweeting relationships. First, we define a new function for rating the intensity of spammer behaviours. We then propose two spam detection algorithms based on a reuse detection model: a sentence-level detection algorithm and a term-level one. The sentence-level algorithm focuses on the behaviour pattern of spammers and ignores the topic of spam messages. The term-level algorithm focuses on the topic of spam messages and compensates for the shortcomings of the sentence-level one.
     ·To identify collective viewpoints, we propose a Term-Tweet-User graph, which simultaneously incorporates text content, temporal information, and community structure, to model postings over time. Based on this model, we propose Time-Sensitive Random Walk to measure the relevance between pairs of terms effectively by taking temporal aspects into account, and then group terms into collective viewpoints. Additionally, we propose Incremental Random Walk to recompute the relevance between nodes incrementally and efficiently.
     ·Bipartite graph data management (BGDM) is an important issue. First, we present the common atomic operators in BGDM, which can be implemented using max-stars. We then discuss in detail a bipartite graph block structure and the relevant query algorithms, which use a Bloom filter to avoid loading the whole block for star vertex queries.
     Finally, extensive experiments on real data collected from microblogs demonstrate that our proposals outperform the state-of-the-art approaches.
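     The use of a Bloom filter to avoid loading whole blocks, mentioned in the third contribution, can be sketched as follows. The thesis's actual block layout is not described in the abstract. Here `StarBlock` is a hypothetical in-memory stand-in for a disk block holding one star (a center vertex and its neighbours), and `blocks_to_load` is a hypothetical helper. Only the Bloom filter mechanism itself is standard.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


class StarBlock:
    """Hypothetical block: one center vertex plus its neighbour vertices."""

    def __init__(self, center, leaves):
        self.center = center
        self.leaves = set(leaves)      # stands in for the on-disk payload
        self.summary = BloomFilter()   # small, assumed to stay in memory
        for leaf in self.leaves:
            self.summary.add(leaf)


def blocks_to_load(blocks, vertex):
    """Return only the blocks whose summary says the vertex may be present."""
    return [b for b in blocks if b.summary.might_contain(vertex)]


blocks = [StarBlock("user:1", ["term:a", "term:b"]),
          StarBlock("user:2", ["term:c"])]
hits = blocks_to_load(blocks, "term:c")
print([b.center for b in hits])
```

     Because a Bloom filter never reports a false negative, a block that the filter rejects can be skipped safely. A block that passes may still be a false positive and has to be checked after loading.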

