Abstract
The deep web consists of pages that are dynamically generated from data sources such as databases or file systems. Crawling the deep web is the process of collecting this hidden data by issuing appropriate queries so as to download as much of the data as possible. The main challenge is to select the queries that retrieve most of the data from a data source. A naive solution, which selects the queries that return the most results, is problematic because (1) the results may not cover the data source and, more importantly, (2) the results suffer from high overlap, which makes acquiring new data items almost impossible after a certain number of steps. The thesis experiments with four different algorithms for selecting queries that minimize the overlap rate, including (1) a greedy algorithm based on set packing and (2) a cluster-based algorithm that removes queries with similar result sets.

Keywords: deep web, hidden web data discovery, data mining, clustering, information retrieval.
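The greedy, overlap-minimizing selection described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the result set of each candidate query is already known (a real crawler would obtain these by issuing the queries against the source), and at each step it picks the query contributing the most documents not yet retrieved.

```python
# Hedged sketch of greedy query selection for deep-web crawling: at every
# step, choose the candidate query whose result set adds the most documents
# not yet covered, i.e. the one with the least overlap with what we have.

def greedy_select(query_results, budget):
    """Pick up to `budget` queries, each time maximizing new coverage.

    query_results: dict mapping a query string to the set of document
                   identifiers it returns (assumed known for this sketch).
    Returns the chosen queries in order, plus the covered document set.
    """
    covered = set()
    chosen = []
    remaining = dict(query_results)  # queries not yet selected
    for _ in range(budget):
        # Score every unused query by how many NEW items it would return.
        best = max(remaining, key=lambda q: len(remaining[q] - covered),
                   default=None)
        if best is None or not (remaining[best] - covered):
            break  # no remaining query adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

# Toy data: q2 and q4 overlap heavily with q1, so the greedy loop skips
# or defers them in favor of queries that add new documents.
results = {
    "q1": {1, 2, 3, 4},
    "q2": {3, 4, 5},
    "q3": {6, 7},
    "q4": {1, 2},
}
chosen, covered = greedy_select(results, budget=3)
print(chosen, sorted(covered))  # → ['q1', 'q3', 'q2'] [1, 2, 3, 4, 5, 6, 7]
```

Note the design trade-off the abstract points at: selecting by raw result-set size would pick q1 then q2, wasting a step on overlap, whereas selecting by new coverage reaches all seven documents in three queries.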