Toward webscale, rule-based inference on the Semantic Web via data parallelism
Details
  • Author: Weaver, Jesse
  • Degree: Ph.D.
  • Year: 2013
  • Advisor: Hendler, James A.
  • Committee members: Hendler, James A.; Carothers, Christopher; Fox, Peter; Mizell, David
  • Institution: Rensselaer Polytechnic Institute
  • Department: Computer Science
  • ISBN: 9781303225932
  • CBH: 3568303
  • Country: USA
  • Language: English
  • FileSize: 746692
  • Pages: 202
Abstract
This thesis considers the problem of scaling rule-based inference to large quantities of RDF data found on the Semantic Web. The general approach is one of data parallelism, that is, dividing data among processors such that the collective results of each processor's individual inference are the same as though inference were performed sequentially. In this way, theoretically speaking, more processors can be added to accommodate more data. The problem is first considered from the perspective of the operational semantics of inference with production rules. The question is asked: under what conditions is embarrassingly parallel inference guaranteed to be correct? Sufficient conditions are determined and proven at both a fine-grained level close to the basic operational semantics and a more coarse-grained level that applies directly to rules. The conditions are placed on the relationship between rules and distribution schemes, that is, the way in which data is assigned to processors. Then, a special class of distribution schemes is considered, called replication schemes. (Replication schemes require that individual data either be replicated to all processors or placed arbitrarily on some processors.) The aforementioned conditions are then reformulated to consider replication schemes, which reveals that testing the conditions for replication schemes is reducible to satisfiability (SAT), and not only SAT but 2SAT. An augmented version of this reduction, which is a reduction to 3SAT, also accounts for the possibility of eliminating some rules in order to improve parallelization. These reductions, along with a proposed methodology for restricting rules, are used to derive restricted versions of the RDFS and OWL2RL rules that are amenable to parallel inference. Finally, an evaluation is performed that tests these theoretical findings for restricted versions of RDFS and OWL2RL inference on two large, well-known datasets exceeding a billion triples: LUBM10K and BTC2012.
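The idea of a replication scheme can be illustrated with a minimal Python sketch. It is not taken from the thesis: the single rule used (an RDFS-style subclass rule), the round-robin placement, the helper names `infer` and `distribute`, and the `ex:` sample triples are all assumptions chosen for illustration. Triples with the predicate `rdfs:subClassOf` are replicated to every partition, all other triples are placed arbitrarily, and the union of the per-partition results is compared with sequential inference.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def infer(triples):
    """Apply one rule to a fixpoint: (x type C), (C subClassOf D) => (x type D)."""
    closed = set(triples)
    while True:
        supers = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new = {(x, TYPE, d)
               for x, p, c in closed if p == TYPE
               for c2, d in supers if c2 == c}
        if new <= closed:          # nothing new was derived, so stop
            return closed
        closed |= new

def distribute(triples, n):
    """Hypothetical replication scheme: copy subClassOf triples to every
    partition and spread the remaining triples round-robin."""
    replicated = [t for t in triples if t[1] == SUBCLASS]
    others = [t for t in triples if t[1] != SUBCLASS]
    return [set(replicated) | set(others[i::n]) for i in range(n)]

# Made-up sample data for illustration only.
data = [
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
    ("ex:bob", TYPE, "ex:Person"),
    ("ex:carol", TYPE, "ex:Student"),
]

sequential = infer(data)
# Each partition is processed independently (no communication between them).
parallel = set().union(*(infer(part) for part in distribute(data, 3)))
assert parallel == sequential
```

The equality holds here because every rule application joins exactly one arbitrarily placed triple with triples that are present on all partitions. A rule that joins two arbitrarily placed triples would not, in general, be safe under this placement.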
The LUBM10K dataset represents an optimistic case, meaning that if performance is poor with LUBM10K, then it will likely be poor on many datasets. On the other hand, the BTC2012 dataset represents a pessimistic case, meaning that if performance is good with BTC2012, then it is likely that performance will be good with other datasets. While the usual scalability metrics are used (speedup, efficiency, etc.), the Karp-Flatt metric reveals that inference is almost entirely parallel for LUBM10K data, demonstrating the practical feasibility of the theoretical findings. However, for BTC2012, it must be ensured that there is sufficient memory and load-balancing to achieve this high level of scalability on distributed memory architectures. Regardless, for feasible cases, very low times are achieved for LUBM10K (seconds) and BTC2012 (minutes).
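For readers unfamiliar with the metrics named above, the following short Python sketch shows the standard definitions of speedup, efficiency, and the Karp-Flatt metric (the experimentally determined serial fraction). The function names and the timings in the usage lines are hypothetical and are not results from the thesis.

```python
def speedup(t1, tp):
    """Speedup: sequential time divided by time on p processors."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Efficiency: speedup divided by the number of processors."""
    return speedup(t1, tp) / p

def karp_flatt(t1, tp, p):
    """Karp-Flatt metric e = (1/psi - 1/p) / (1 - 1/p), where psi is the
    speedup. Values near 0 suggest the computation is almost entirely parallel."""
    if p < 2:
        raise ValueError("the Karp-Flatt metric needs at least 2 processors")
    psi = speedup(t1, tp)
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p)

# Hypothetical timings (seconds), for illustration only.
t1, tp, p = 100.0, 13.0, 8
print(speedup(t1, tp), efficiency(t1, tp, p), karp_flatt(t1, tp, p))
```

A metric that stays small and roughly constant as processors are added indicates little serial work or parallel overhead, whereas a metric that grows with the processor count points to overheads such as load imbalance.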
