Research on the Acquirement of Enterprise Competitive Intelligence in the Web (面向Web的企业竞争情报获取研究)

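English title: Research on the Acquirement of Enterprise Competitive Intelligence in the Web
Author: Zhao Jie (赵洁)
Degree level: Doctoral
Discipline: Management Science and Engineering
Keywords (Chinese): 竞争情报 ; Web ; 获取 ; 关系抽取
Keywords (English): Competitive intelligence ; Web ; Acquirement ; Relation extraction
Degree year: 2013
Advisor: Liang Liang (梁樑)
Discipline code: 1201
Degree-granting institution: University of Science and Technology of China
Thesis submission date: 2013-05-01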

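Abstract

In the era of the knowledge economy and informatization, competitive intelligence has become the "fourth core competitiveness" of enterprises. With the rapid development of Web technologies, how to acquire enterprise competitive intelligence from the Web in a timely and effective way has become a frontier problem in competitive intelligence research. Existing methods, however, are confined to "page-driven" Web page collection and text search; that is, they rely mainly on search engines or text mining tools to collect and analyze competitive intelligence. This approach lacks deep extraction and understanding of Web information, so the extracted results are out of step with users' actual needs, which has hindered the further development of the theory and application of enterprise competitive intelligence in the Web environment.
     This dissertation addresses the need to collect enterprise competitive intelligence in the Web environment and targets the key problems in current Web-oriented acquisition of such intelligence. It focuses on a representation model for enterprise competitive intelligence in the Web environment and on acquisition methods that approach the Web from different perspectives. The Web can be viewed as an information resource platform made up of Web pages, Web sites, and Web users. On this basis, we study acquisition methods based on Web pages, on Web sites, and on Web logs, with the aim of building a systematic framework for Web-oriented acquisition of enterprise competitive intelligence and laying a foundation for research on, and practical application of, enterprise competitive intelligence in the Web environment.
     Overall, the main work and contributions of this dissertation can be summarized as follows:
     (1) We study the semantics of enterprise competitive intelligence in the Web environment and propose an entity-based representation model for it. On this basis, we propose an acquisition framework for enterprise competitive intelligence in the Web environment that is built on entity and relation extraction.
     (2) We study the acquisition of enterprise competitive intelligence from Web pages, propose a classification framework for business relations between enterprises, and propose a method for extracting company acquisition relations based on sentence tense labeling, which achieves good extraction results.
     (3) We study the acquisition of enterprise competitive intelligence from Web sites, give a framework for analyzing an enterprise's competitors using Web site information, and carry out an empirical study, offering a new line of thought for acquiring and analyzing competitive intelligence on the Web.
     (4) We study the acquisition of enterprise competitive intelligence from Web logs, give a framework for competitor analysis using Web user behavior logs, and analyze the competitive relations among e-commerce enterprises on the basis of real Internet user behavior logs, providing a new reference for acquiring and analyzing competitive intelligence on the Web.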
Competitive intelligence has become the "fourth core competitiveness" of enterprises in the age of the knowledge economy and informatization. With the rapid development of Web technologies, how to acquire enterprise competitive intelligence from the Web effectively and efficiently has become a frontier topic in competitive intelligence research. However, previous approaches are restricted to Web-page-driven methods that focus on collecting Web pages and performing textual Web search. In those approaches, competitive intelligence is mainly collected and analyzed by means of Web search engines or text mining tools. Because they do not perform deep extraction and understanding of Web information, there is a large gap between the extracted results and users' needs. This has hindered the further development of the theory and applications of enterprise competitive intelligence in the Web environment.
     In this dissertation, we aim to satisfy the requirements of collecting enterprise competitive intelligence in the Web environment and to solve the critical issues in this process. In particular, we concentrate on the representation model of enterprise competitive intelligence in the Web environment, as well as on methods for acquiring enterprise competitive intelligence from the Web from different viewpoints. As the Web can be regarded as an information resource platform that involves Web pages, Web sites, and Web users, we conduct our research from three viewpoints: the Web-page-based viewpoint, the Web-site-based viewpoint, and the Web-log-based viewpoint. The aim is to build a systematic framework for acquiring enterprise competitive intelligence from the Web, and thus to lay a foundation for future research on Web-oriented enterprise competitive intelligence and its applications.
     In general, the contributions of the dissertation can be summarized as follows:
     (1) We study the semantics of enterprise competitive intelligence in the Web environment and propose an entity-based representation model for Web-based enterprise competitive intelligence. Based on this model, we develop a framework for extracting competitive intelligence from the Web. The framework is founded on entity recognition and relation extraction, and provides a fundamental solution to the extraction of Web-based competitive intelligence.
     (2) We study the acquisition of enterprise competitive intelligence from Web pages and present a classification framework for the business relations between enterprises. We then propose a new algorithm for extracting company acquisition relations from Web pages, which is based on tense labeling of the sentences in the pages. The experimental results demonstrate its effectiveness.
     (3) We study the acquisition of enterprise competitive intelligence from the Web-site viewpoint, and propose a framework, together with an empirical study, for analyzing competitors by using Web site information. This study offers new insights into the acquisition and analysis of Web-based enterprise competitive intelligence.
     (4) We explore the extraction of competitor intelligence from the logs of Web users, and present a framework for performing competitor analysis by using the behavior logs of Web users. Taking e-commerce as an example, we analyze the competitive relations among typical e-commerce companies on the basis of real behavior logs from Internet users. This research provides a new way to analyze competitors and is a useful reference for the acquisition and analysis of Web-based enterprise competitive intelligence.
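
Contribution (1) describes an entity-based representation in which competitive intelligence is stored as entities and the relations extracted between them, and contribution (2) attaches a tense to sentences that state an acquisition. This record does not give the schema, so the following is only a minimal Python sketch under assumed names: `Entity`, `Relation`, `IntelligenceStore`, the `tense` field, the relation labels, and the sample companies and URLs are all hypothetical illustrations, not the author's model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    """A named thing mentioned on the Web, e.g. a company or a product."""
    name: str
    kind: str  # hypothetical type label such as "company"


@dataclass(frozen=True)
class Relation:
    """A typed, directed link between two entities, with its evidence."""
    subject: Entity
    predicate: str          # hypothetical relation type, e.g. "acquires"
    obj: Entity
    source_url: str         # page the relation was extracted from
    tense: str = "unknown"  # assumed labels: "past", "present", "future"


@dataclass
class IntelligenceStore:
    """Holds extracted relations and answers simple lookups over them."""
    relations: List[Relation] = field(default_factory=list)

    def add(self, relation: Relation) -> None:
        self.relations.append(relation)

    def related(self, subject_name: str, predicate: str,
                tense: Optional[str] = None) -> List[str]:
        """Sorted names of entities linked from `subject_name` by `predicate`.

        If `tense` is given, only relations carrying that tense label count.
        """
        return sorted(
            r.obj.name
            for r in self.relations
            if r.subject.name == subject_name
            and r.predicate == predicate
            and (tense is None or r.tense == tense)
        )


# Toy data: company names and URLs are invented for illustration only.
acme = Entity("Acme", "company")
bolt = Entity("Bolt", "company")
cog = Entity("Cog", "company")
dyna = Entity("Dyna", "company")

store = IntelligenceStore()
store.add(Relation(acme, "acquires", bolt, "http://example.com/news/1", "past"))
store.add(Relation(acme, "acquires", cog, "http://example.com/news/2", "future"))
store.add(Relation(acme, "competes_with", dyna, "http://example.com/news/3", "present"))

print(store.related("Acme", "acquires"))                # ['Bolt', 'Cog']
print(store.related("Acme", "acquires", tense="past"))  # ['Bolt']
```

Keeping the tense on each relation lets a completed acquisition be told apart from an announced or rumored one at query time, which is the kind of distinction a tense-labeling step would feed into.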
