Research on Collaborative Filtering Algorithm Based on Model Users


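English title: Research on Collaborative Filtering Algorithm Based on Model Users
Author: Peng Jin (彭晋)
Degree level: Master's
Discipline: Computer Software and Theory
Chinese keywords: 电子商务; 协同过滤; 模范用户; 聚类有效性验证指标
English keywords: e-commerce; collaborative filtering; model users; cluster validity indices
Degree year: 2010
Advisor: Fu Hegang (傅鹤岗)
Discipline code: 081202
Degree-granting institution: Chongqing University
Thesis submission date: 2010-04-01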

Abstract

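In an era when e-commerce is thriving, people no longer need the mere provision of information but targeted recommendations. Among the many personalized recommendation techniques, collaborative filtering stands out and has shaped the recommender systems of today's major e-commerce platforms. However, as the e-commerce industry keeps growing, the numbers of both users and products are increasing exponentially, and users expect more and more of recommendation services. Facing these challenges, collaborative filtering reveals a number of bottlenecks that remain to be resolved, and research institutions and scholars in China and abroad continue to explore improvements. This thesis analyzes and compares in depth the collaborative filtering algorithm and its main current improvements, and proposes a collaborative filtering algorithm based on model users.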
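     The concept of a model user resembles that of a model worker or pacesetter in real life: someone who plays an exemplary, leading role in a field or industry and serves as an example for others to follow and learn from. Introducing this concept into collaborative filtering aims to build a reasonably stable model-user model whose users reflect the interests of the users in one or more domains, so that items recommended in collaboration with model users should be accurate and trustworthy. Such a model helps considerably to alleviate the sparsity problem and the real-time recommendation problem of collaborative filtering. A stable model-user model can also cope with the rapid growth in users and products on e-commerce platforms.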
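     This thesis clusters the users in the user-item rating matrix and generates one model-user rating vector in each cluster. A model user is not the cluster center but a virtual user generated according to certain generation rules. This set of model users increases the rating density within each cluster and reflects the overall rating trend of the users in that cluster.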
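The abstract does not state the generation rule for a model user, so the following is only a minimal sketch under an assumed rule: the virtual user's rating for an item is the mean of the ratings that cluster members actually gave it. The names `model_user_vector` and `density` are hypothetical and exist only for illustration.

```python
from typing import List, Optional

Ratings = List[Optional[float]]  # one user's row; None marks an unrated item


def model_user_vector(cluster_rows: List[Ratings]) -> Ratings:
    """Build one virtual user for a cluster.

    Assumed rule (not taken from the thesis): per-item mean of the
    ratings that cluster members actually gave.
    """
    n_items = len(cluster_rows[0])
    vector: Ratings = []
    for j in range(n_items):
        observed = [row[j] for row in cluster_rows if row[j] is not None]
        vector.append(sum(observed) / len(observed) if observed else None)
    return vector


def density(rows: List[Ratings]) -> float:
    """Fraction of cells that hold a rating."""
    cells = [r for row in rows for r in row]
    return sum(r is not None for r in cells) / len(cells)


# Three users of one cluster, four items, sparse ratings.
cluster = [
    [5.0, None, 3.0, None],
    [4.0, 2.0, None, None],
    [None, 4.0, None, None],
]
model_user = model_user_vector(cluster)
```

Under this assumed rule the virtual user has a rating for every item that at least one member rated, so its row is denser than the rows it summarizes, which matches the density claim in the paragraph above.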
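     Clustering techniques usually require the number of clusters to be specified in advance, so cluster validity must be verified to judge whether the resulting partition is reasonable and truly reflects the grouping of the user population. This thesis validates the ordinary (hard) C-means clustering algorithm with the Davies-Bouldin (DB) index: the clustering iteration stops when the DB index reaches its minimum, which gives the optimal clustering granularity. Fuzzy C-means clustering is validated with the partition coefficient (PC): the optimal granularity is obtained when the adaptive clustering function reaches its maximum. For both algorithms, the clustering granularity is thus determined adaptively.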
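As a rough illustration of the two validity indices named above, the sketch below computes the standard Davies-Bouldin index for a hard partition and the standard partition coefficient for a fuzzy membership matrix, then picks the cluster count with the smallest DB value. The candidate labelings are hand-made for the example; the thesis's own clustering loop and its "adaptive clustering function" are not described in the abstract and are not reproduced here.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a group of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    """Davies-Bouldin index of a hard partition; lower is better.

    Needs at least two clusters with distinct centroids.
    """
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centers = {lab: centroid(g) for lab, g in groups.items()}
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {
        lab: sum(math.dist(p, centers[lab]) for p in g) / len(g)
        for lab, g in groups.items()
    }
    # For each cluster, the worst (largest) similarity to any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in groups if j != i
        )
        for i in groups
    ]
    return sum(worst) / len(worst)


def partition_coefficient(memberships: List[List[float]]) -> float:
    """PC of a fuzzy partition: mean over points of the squared memberships."""
    return sum(u * u for row in memberships for u in row) / len(memberships)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Hypothetical candidate partitions, keyed by number of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}
best_k = min(candidates, key=lambda k: davies_bouldin(points, candidates[k]))
```

On this toy data the two-cluster labeling has the lower DB value, so `best_k` is 2. For PC, a fully crisp membership matrix scores 1 and a uniform one scores 1/c, which is why a larger PC indicates a clearer fuzzy partition.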
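     Experiments show that the adaptive cluster-number algorithm attains a local optimum of the validity index, that is, the best clustering result. When collaborative filtering is applied to the model-user model generated on this basis, the efficiency of online recommendation for the target user improves considerably, the model-user model remains relatively stable, and recommendation accuracy also improves.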
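One plausible reading of why the online step gets cheaper is that a target user is compared with a handful of model users instead of with every user. The sketch below assumes exactly that: it picks the most similar model user by cosine similarity over co-rated items and ranks the target's unrated items by that model user's ratings. The similarity measure, the function names, and the sample vectors are all assumptions made for illustration, not the thesis's stated method.

```python
import math
from typing import List, Optional

Ratings = List[Optional[float]]  # None marks an unrated item


def cosine_on_common(a: Ratings, b: Ratings) -> float:
    """Cosine similarity restricted to items that both vectors rate."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target: Ratings, model_users: List[Ratings], top_n: int = 2) -> List[int]:
    """Indices of items the target has not rated, ranked by the rating
    of the most similar model user (highest first)."""
    best = max(model_users, key=lambda m: cosine_on_common(target, m))
    scored = [
        (best[j], j)
        for j, r in enumerate(target)
        if r is None and best[j] is not None
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:top_n]]


# Hypothetical virtual users, one per cluster, over five items.
model_users = [
    [4.5, 3.0, 3.0, None, 1.0],
    [1.0, 5.0, None, 4.0, 4.5],
]
target = [1.0, 5.0, None, None, None]
recommended = recommend(target, model_users)
```

Here the target matches the second model user, so `recommended` is `[4, 3]`. The online cost grows with the number of clusters rather than the number of users, which is consistent with the efficiency gain reported above.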
In an era in which e-commerce is becoming more and more popular, what we need is no longer the simple provision of information but targeted information recommendation. Collaborative filtering stands out among the many personalized recommendation technologies and leads the trend in the recommender systems of the major e-commerce platforms. But with the growth of the e-commerce industry, the numbers of both users and goods have increased exponentially, and users' demands on e-commerce recommendation services are increasingly high. In the face of these challenges, collaborative filtering technology reveals a number of bottlenecks to be addressed. To address these problems, domestic and foreign research institutions and scholars continue to explore improvement schemes. This paper analyzes and compares in depth the collaborative filtering algorithm and its main current improved algorithms, and proposes a collaborative filtering algorithm based on model users.
     The concept of the model user is similar to that of a model worker or pacesetter in real life. Model users play an exemplary role in a particular field or industry, and others follow and learn from them. The main purpose of introducing this concept into the collaborative filtering algorithm is to build a model-user pattern with good stability, in which the model users reflect the interests of users in one or more fields. The products recommended in collaboration with model users should be accurate and reliable. This pattern is of great help in mitigating the sparsity problem and the real-time recommendation problem of collaborative filtering technology. At the same time, a stable model-user pattern can also respond to the challenge of rapidly growing numbers of users and commodities.
     This paper clusters the users in the user-item rating matrix and generates a model-user score vector in each class. Model users are not the cluster centers but virtual users generated according to certain rules. This group of model users increases the rating density of the users in each class and reflects the overall rating trend of the users in that class.
     Clustering techniques usually require the number of clusters to be specified, but whether the result really reflects the classification of users needs to be verified by cluster validity. This paper introduces two cluster validity indices: the DB index and the partition coefficient PC. Validating the HCM and FCM clustering strategies with these two indices yields the optimal cluster granularity when the indices reach their extreme values. Both HCM and FCM thus achieve adaptive determination of the cluster granularity.
     Experimental results show that the adaptive clustering algorithm can obtain the locally optimal validity index value, that is, the best clustering effect. The collaborative filtering algorithm based on model users greatly improves the efficiency of online recommendation, keeps the model users relatively stable, and also improves the accuracy of recommendation.

