AdBench: A Complete Benchmark for Modern Data Pipelines

Journal title: Lecture Notes in Computer Science
Publication year: 2017
Volume: 10080
Issue: 1
Pages: 107-120
Series title: Performance Evaluation and Benchmarking. Traditional - Big Data - Internet of Things
ISBN: 978-3-319-54334-5

Abstract

Since the introduction of Apache YARN, which modularly separated resource management and scheduling from the distributed programming frameworks, a multitude of YARN-native computation frameworks have been developed. These frameworks specialize in specific analytics variants. In addition to traditional batch-oriented computations (e.g., MapReduce, Apache Hive [14], and Apache Pig [18]), the Apache Hadoop ecosystem now contains streaming analytics frameworks (e.g., Apache Apex [8]), MPP SQL engines (e.g., Apache Trafodion [20], Apache Impala [15], and Apache HAWQ [12]), OLAP cubing frameworks (e.g., Apache Kylin [17]), frameworks suitable for iterative machine learning (e.g., Apache Spark [19] and Apache Flink [10]), and graph processing frameworks (e.g., GraphX). With the emergence of the Hadoop Distributed File System and its various implementations as the preferred method of constructing a data lake, end-to-end data pipelines are increasingly being built on Hadoop-based data lake platforms.
