AdBench: A Complete Benchmark for Modern Data Pipelines
  • Journal: Lecture Notes in Computer Science
  • Year: 2017
  • Volume: 10080
  • Issue: 1
  • Pages: 107-120
  • Series title: Performance Evaluation and Benchmarking. Traditional - Big Data - Internet of Things
  • ISBN: 978-3-319-54334-5
Abstract
Since the introduction of Apache YARN, which modularly separated resource management and scheduling from the distributed programming frameworks, a multitude of YARN-native computation frameworks have been developed. These frameworks specialize in specific analytics variants. In addition to traditional batch-oriented computations (e.g., MapReduce, Apache Hive [14], and Apache Pig [18]), the Apache Hadoop ecosystem now contains streaming analytics frameworks (e.g., Apache Apex [8]), MPP SQL engines (e.g., Apache Trafodion [20], Apache Impala [15], and Apache HAWQ [12]), OLAP cubing frameworks (e.g., Apache Kylin [17]), frameworks suitable for iterative machine learning (e.g., Apache Spark [19] and Apache Flink [10]), and graph processing frameworks (e.g., GraphX). With the emergence of the Hadoop Distributed File System and its various implementations as the preferred method of constructing a data lake, end-to-end data pipelines are increasingly being built on the Hadoop-based data lake platform.
