Recommendation Approaches Based on Web Usage Mining (基于Web访问信息挖掘的推荐方法研究)

English title: Recommendation Approaches Based on Web Usage Mining
Author: Wang Shi (王实)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: Web Usage Mining ; Recommendation ; Adaptive Web ; Personalization ; Clustering ; Classification ; Markov Model ; Hidden Markov Model ; Collaborative Filtering ; Association Rule Discovery ; Navigation Pattern Discovery ; Mutual Information ; OLAP ; Business Intelligence
Degree year: 2001
Advisor: Gao Wen (高文)
Discipline code: 081203
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology)
Submission date: 2001-08-01

Abstract

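With the rapid development of the Internet and the WWW, user access information has become vast and pervasive, recording the details of user access along the user, time, space, and access-object dimensions. Effective data mining of this information yields knowledge about user access behavior, which can serve both the providers and the visitors of a web site.
     Service providers need good automated design-assistance tools that use knowledge of users' access interests to dynamically adjust page topology, improve existing information services, and conduct targeted e-commerce that better meets visitors' needs. Visitors want personalized pages, services better suited to their individual needs, and useful hints drawn from the access behavior of other users with similar interests.
     How to extract this knowledge automatically and efficiently from large volumes of user access information, that is, web usage knowledge discovery, is therefore of great practical importance and has become an emerging research field that attracts wide international attention.
     This dissertation studies two areas of web usage mining, group adaptation and personalization. Its main results are:
     1. Group adaptation of web sites:
     1) Clustering for web broadcast: To address how to organize a web broadcast set so that web content can be broadcast over a broadband broadcast network, this dissertation proposes a new clustering method, WebClustering. The method produces a valuable set of web pages for broadcast and builds layered index pages that help users access that set.
     2) Discovery of large item sequences and mutual information rules:
     To mine the sequential characteristics of user access, this dissertation proposes a new method for discovering large item sequences. The method defines a new user access transaction grammar for mining those sequential characteristics. To discover related topic areas in user access, the dissertation proposes a new method based on mutual information rule discovery, together with a clustering algorithm that groups the discovered rules into related topic areas. The discovered large item sequences and related topic areas help web site designers understand user access behavior, and can be used to adjust the structure of a site or to redistribute knowledge within it.
     3) Discovery of group users' access interests and interest-driven navigation patterns:
     To mine users' access interests, this dissertation proposes a new method for measuring how the interest of a group of users is distributed over a page. The method exploits the purposefulness of user access, that is, a user's interest in some concept, to obtain the access interest of each individual user, and then superimposes the individual interests to obtain the group's interest distribution over the page. Because this interest distribution can, together with …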

With the rapid growth of the Internet and the WWW, users' browsing information is becoming enormous and pervasive. It records the details of user access along the user, time, space, and access-object dimensions. By mining this user access information, we can obtain knowledge about user access behavior that benefits both service providers and users.
     Service providers require good automatic design-assistance tools that can dynamically adjust web topology, improve services, and carry out personalized electronic commerce according to knowledge of users' access interests. Users want personalized web pages and tailored services, and want valuable recommendations derived from other users with similar interests. How to obtain this knowledge automatically and effectively from the vast body of user access information, i.e. web usage mining, has therefore become a new and important research field worldwide.
     This dissertation addresses two areas of web usage mining: the adaptive web and personalization. Its contributions are as follows:
     1. The adaptive web:
     1) Clustering for web broadcast: To address the problem of broadcasting information over a broadband network, this dissertation presents a new web mining approach, WebClustering, which produces a new and valuable web broadcast set and generates layered index pages to help users access that set.
     2) Discovering large sequences and mutual information rules:
     To mine the large sequences in user access sequences, this dissertation presents an approach for discovering large sequences. It defines a new user access transaction grammar that derives sequential access transactions from user access transactions. To mine related topic areas, a mutual information rule discovery approach is used and a new clustering approach is given. The resulting knowledge helps web site designers understand user access in depth so that they can adjust the structure of the site.
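
The abstract does not give the dissertation's definition of a mutual information rule, so the following is only a minimal sketch of the general idea, assuming pointwise mutual information (PMI) over page co-occurrence within sessions as a stand-in measure. The function name `pmi_rules`, the `min_pmi` threshold, and the sample sessions are all hypothetical and are not taken from the dissertation.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pmi_rules(sessions, min_pmi=0.0):
    """Score page pairs by pointwise mutual information (PMI).

    Assumption: PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), where the
    probabilities are session-level relative frequencies. This is a
    generic measure, not the dissertation's own definition.
    """
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = set(session)  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(combinations(sorted(pages), 2))

    rules = {}
    for (a, b), n_ab in pair_counts.items():
        pmi = log2(n_ab * n / (page_counts[a] * page_counts[b]))
        if pmi > min_pmi:  # keep only pairs seen together more than chance
            rules[(a, b)] = pmi
    return rules


# Hypothetical sessions: each is the list of pages one visitor requested.
sessions = [
    ["/news", "/sports"],
    ["/news", "/sports"],
    ["/docs", "/api"],
    ["/docs", "/api"],
    ["/news", "/docs"],
]
rules = pmi_rules(sessions)
for pair, score in sorted(rules.items()):
    print(pair, round(score, 3))
```

Pairs that survive the threshold could then be fed to a clustering step to form groups of related pages, which is the role the abstract assigns to its clustering approach. The pair `/docs` and `/news` co-occurs less often than chance would predict in this sample, so it is filtered out.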
