Research on an Ontology-Based Semantic Search Model
Abstract
Conventional information retrieval systems suffer from low retrieval effectiveness and return results that do not answer the user's question. The main cause is that these systems cannot understand the user's semantics, so they struggle to meet the user's information needs. The concept of semantic information search was proposed to address this problem, and it has recently become an active research topic in information retrieval. Current semantic search research, however, faces two major obstacles. First, no effective means has yet been found for parsing user semantics and locating user needs. Second, no mature semantic search model has been established. These two obstacles seriously hinder further progress in semantic information search and are the key problems the field must solve.
     This thesis takes situation theory as its theoretical foundation and ontology as its methodological foundation. It introduces context variables into the information retrieval process to narrow the openness of user semantics and to understand user needs accurately. It uses ontologies to express abstract contextual factors as concrete variables that an information retrieval system can read and use. On this basis it develops an ontology-based semantic information search model. To test the model's feasibility, an experimental semantic search system for the fruit-tree (pomology) domain was built. Experiments showed that this system performs considerably better in retrieval effectiveness and in user satisfaction with the search results.
     The thesis makes three original contributions:
     First, it proposes a context-oriented semantic information search framework. From the many contextual factors that affect the information retrieval process, the framework selects three classes of context variables to identify user information needs precisely:
     - the user's knowledge structure;
     - the work task the user is performing;
     - the information environment.
     It also proposes methods that use ontologies to model the user's knowledge structure and to express the user's work task.
     Second, building on this framework, it proposes a context-oriented, ontology-based semantic information search model. The discussion centers on the model's three core algorithms, namely task awareness, task expression, and result ranking. It also covers the ontology knowledge-base structure that supports these algorithms.
     Third, using a pomology domain ontology, it develops an experimental semantic search system for fruit trees. In a comparative experiment against the Baidu search engine, the experimental system achieved considerable improvements in the relevance of search results, in retrieval effectiveness, and in degree of intelligence. This result supports the feasibility of the semantic search model.
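     The abstract names the task-awareness and task-expression algorithms but does not describe how they work. The sketch below is a rough illustration only and is not the thesis's method. It assumes a toy pomology ontology stored as a Python dictionary and a hypothetical mapping from work tasks to ontology concepts. A task is "perceived" by overlapping the query keywords (and their ontology neighbours) with each task's concepts. The query is then "expressed" by walking the ontology breadth-first and keeping only the concepts tied to that task. All names (`ONTOLOGY`, `TASK_CONCEPTS`, `detect_task`, `expand_query`) are hypothetical. The user's knowledge structure and information environment are not modeled here.

```python
from collections import deque

# Hypothetical toy pomology ontology (concept -> directly related concepts).
# The concepts and links are illustrative assumptions, not the thesis's ontology.
ONTOLOGY = {
    "apple": ["apple scab", "codling moth", "pruning"],
    "apple scab": ["fungicide"],
    "codling moth": ["pheromone trap"],
    "pruning": ["dormant season"],
}

# Hypothetical work tasks, each described by the ontology concepts it involves.
TASK_CONCEPTS = {
    "pest control": {"codling moth", "pheromone trap"},
    "disease control": {"apple scab", "fungicide"},
}


def detect_task(keywords):
    """Task awareness (sketch): pick the task sharing the most concepts with the
    query keywords and their direct ontology neighbours; None if nothing matches."""
    context = set(keywords)
    for kw in keywords:
        context.update(ONTOLOGY.get(kw, []))
    best_task, best_overlap = None, 0
    for task, concepts in TASK_CONCEPTS.items():
        overlap = len(concepts & context)
        if overlap > best_overlap:  # strict '>' keeps the first task on ties
            best_task, best_overlap = task, overlap
    return best_task


def expand_query(keywords, task, max_depth=2):
    """Task expression (sketch): breadth-first walk of the ontology from the
    keywords, keeping the keywords plus concepts tied to the task.
    Returns a mapping concept -> hop distance from the nearest keyword."""
    distances = {}
    queue = deque((kw, 0) for kw in keywords)
    while queue:
        concept, depth = queue.popleft()
        # BFS pops concepts in nondecreasing depth, so the first visit is the shortest.
        if concept in distances or depth > max_depth:
            continue
        distances[concept] = depth
        for neighbour in ONTOLOGY.get(concept, []):
            queue.append((neighbour, depth + 1))
    allowed = set(keywords) | TASK_CONCEPTS.get(task, set())
    return {c: d for c, d in distances.items() if c in allowed}


if __name__ == "__main__":
    print(detect_task(["codling moth"]))             # pest control
    print(expand_query(["apple"], "pest control"))   # {'apple': 0, 'codling moth': 1, 'pheromone trap': 2}
```

     Filtering by task is what narrows the query's semantic openness in this sketch. The same keyword "apple" expands toward pest-related concepts or disease-related concepts depending on the detected task.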
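     The result-ranking algorithm is likewise only named in the abstract. One plausible way to let the ontology influence ranking is to weight each matched concept by its distance from the user's original keywords. This is an assumption made for illustration, not the thesis's formula. The hypothetical `rank_documents` function below takes an expanded query as a mapping from concept to hop distance. It scores each document by summing 1 / (1 + d) over the concepts the document mentions, so nearer concepts count more.

```python
def rank_documents(documents, concept_distances, min_score=0.0):
    """Result ranking (sketch): score each document by summing 1 / (1 + d) for
    every query concept it mentions, where d is the concept's ontology distance
    from the user's keywords. Returns document ids with score > min_score, best first."""
    scored = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        score = sum(
            1.0 / (1 + distance)
            for concept, distance in concept_distances.items()
            if concept.lower() in lowered  # naive substring match, for brevity
        )
        if score > min_score:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    # Hypothetical expanded query and document snippets.
    concepts = {"apple": 0, "codling moth": 1, "pheromone trap": 2}
    docs = {
        "d1": "Pheromone traps for monitoring codling moth in apple orchards",
        "d2": "Apple varieties for northern China",
        "d3": "Pruning peach trees in the dormant season",
    }
    print(rank_documents(docs, concepts))  # ['d1', 'd2']
```

     Plain substring matching keeps the sketch short. A working system would also need proper tokenization (including Chinese word segmentation) and an inverted index, which this sketch omits.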