Top-k vectorial aggregation queries in a distributed environment

Authors: Guy Sagy; Izchak Sharfman; Daniel Keren; Assaf Schuster
Keywords: Top-k
Journal: Journal of Parallel and Distributed Computing
Publication year: 2011
Publication date: February 2011
Volume: 71
Issue: 2
Pages: 302-315
Full-text size: 1123 K

Abstract

Given a large set of objects in a distributed database, the goal of a top-k query is to determine the top-k scoring objects and return them to the user. Efficient top-k ranking over distributed databases has been the focus of recent research, with most current algorithms operating on the assumption that each node holds a single attribute, or a small subset, of each object’s numerical attributes. However, in many important setups each node might instead hold a full d-dimensional vector of numerical attributes for each object. Examples include website activity in distributed servers, sales statistics for a retail chain, or share price information in different stock markets. For these setups, we define a novel ranking problem, top-k vectorial aggregation queries, where each object’s score is determined by first aggregating the attribute vectors held for it and then applying the scoring function over the aggregated vector.
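To make the query definition concrete, the following is a minimal sketch of a naive centralized baseline, not the communication-efficient algorithm the abstract describes. It assumes that aggregation means component-wise summation and uses a squared norm as the scoring function; both choices, along with the function names and the sample data, are hypothetical illustrations that do not come from the paper.

```python
import heapq


def top_k_vectorial(nodes, score, k):
    """Naive baseline: collect every local vector centrally, then rank.

    nodes: one dict per node, mapping object id -> d-dimensional vector.
    score: function applied to the aggregated vector of each object.
    """
    totals = {}
    for local in nodes:
        for obj, vec in local.items():
            if obj in totals:
                # Assumed aggregation: component-wise sum across nodes.
                totals[obj] = [a + b for a, b in zip(totals[obj], vec)]
            else:
                totals[obj] = list(vec)
    # Score the aggregated vector, not the individual local vectors.
    return heapq.nlargest(k, totals, key=lambda obj: score(totals[obj]))


def squared_norm(v):
    # Example non-linear scoring function (an assumption for illustration).
    return sum(x * x for x in v)


# Hypothetical data: two nodes, each holding a 2-dimensional vector per object.
nodes = [
    {"a": (1.0, 0.0), "b": (2.0, 2.0), "c": (0.0, 1.0)},
    {"a": (0.0, 3.0), "b": (-2.0, -1.0), "c": (1.0, 1.0)},
]

result = top_k_vectorial(nodes, squared_norm, k=2)
print(result)  # ['a', 'c']
```

In this sample, object "b" has a high local score on both nodes (8 and 5), yet its aggregated vector (0, 1) scores only 1, so it ranks last. With a non-linear scoring function, local scores alone do not determine the global ranking, which is why the baseline has to ship every vector to one place.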

Our communication-efficient algorithm uses a blend of geometric and skyline-related machinery, some of which is newly developed, as well as an algorithmic framework for defining generic local constraints. Previous algorithms have reduced data sharing by defining local thresholds for each attribute, but such tailored solutions might perform poorly. Experimental results on real-world data demonstrate that our algorithm maintains low latency, with a communication cost up to four orders of magnitude lower than that of existing solutions.
