Downloading data from textual deep Web using clustering.
Details
  • Author: Yuan, Xiaolei
  • Degree: Master
  • Year: 2007
  • Institution: University of Windsor
  • Major: Computer Science
  • ISBN: 9780494349854
  • CBH: MR34985
  • Country: Canada
  • Language: English
  • File size: 3470666
  • Pages: 78
Abstract
The deep web consists of web pages that are dynamically generated from data sources such as databases or file systems. Crawling the deep web is the process of collecting this hidden data by issuing appropriate queries so as to download most of it. The main challenge is to select queries that retrieve most of the data in a data source. A naive solution, which selects the queries that return the most results, is problematic because (1) the results may not cover the data source and, more importantly, (2) the results overlap heavily, so that after a certain number of queries it becomes almost impossible to acquire new data items. The thesis experiments with four different algorithms for selecting queries that minimize the overlap rate, including (1) a greedy algorithm based on set packing and (2) a cluster-based algorithm that removes queries returning similar results.

Keywords: deep web, hidden web data discovery, data mining, clustering, information retrieval.
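The abstract contrasts the naive strategy (pick the queries that return the most results) with selecting queries that minimize overlap. The Python sketch that follows illustrates that contrast on a toy example. It is not the thesis's algorithm: the query names, document IDs, the overlap-rate definition (total documents returned divided by distinct documents returned) and the greedy rule (prefer the query whose results are mostly new, in the spirit of a set-packing heuristic) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each query maps to the IDs of the documents it returns.
toy_results: Dict[str, Set[int]] = {
    "data": {1, 2, 3, 4, 5, 6},
    "database": {1, 2, 3, 4, 5},
    "query": {4, 5, 6, 7},
    "mining": {7, 8, 9},
    "graph": {10},
}


def overlap_rate(selected: List[str], results: Dict[str, Set[int]]) -> float:
    """Assumed definition: total documents returned / distinct documents returned."""
    total = sum(len(results[q]) for q in selected)
    distinct: Set[int] = set()
    for q in selected:
        distinct |= results[q]
    return total / len(distinct) if distinct else 0.0


def naive_select(results: Dict[str, Set[int]], k: int) -> List[str]:
    """Naive baseline: the k queries that return the most results."""
    return sorted(results, key=lambda q: len(results[q]), reverse=True)[:k]


def greedy_select(results: Dict[str, Set[int]], universe_size: int,
                  target: float = 1.0) -> List[str]:
    """Greedy sketch (assumed rule): repeatedly pick the query whose results
    overlap least with the documents already downloaded, until `target`
    coverage of `universe_size` documents is reached."""
    covered: Set[int] = set()
    selected: List[str] = []
    # Drop queries that return nothing (they would divide by zero below).
    remaining = {q: docs for q, docs in results.items() if docs}

    def score(q: str) -> Tuple[float, int]:
        new = len(remaining[q] - covered)
        # Share of new documents first, then absolute number of new documents.
        return (new / len(remaining[q]), new)

    while remaining and len(covered) < target * universe_size:
        best = max(remaining, key=score)
        if not remaining[best] - covered:
            break  # every remaining query returns only documents already seen
        covered |= remaining.pop(best)
        selected.append(best)
    return selected


naive = naive_select(toy_results, 3)
greedy = greedy_select(toy_results, universe_size=10)
print(naive, round(overlap_rate(naive, toy_results), 2))    # ['data', 'database', 'query'] 2.14
print(greedy, round(overlap_rate(greedy, toy_results), 2))  # ['data', 'mining', 'graph'] 1.0
```

On this toy data the naive choice downloads 15 results but reaches only 7 of the 10 documents, while the greedy choice reaches all 10 with no duplicates.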