Tag: Amazon AWS

Amazon EMR introduces EMR runtime for Apache Spark

Amazon EMR is happy to announce Amazon EMR runtime for Apache Spark, a performance-optimized runtime environment for Apache Spark that is active by default on Amazon EMR clusters. EMR runtime for Spark is up to 32 times faster than EMR 5.16, with 100% API compatibility with open-source Spark. This means that your workloads run faster, …

Introducing S3Guard: S3 Consistency for Apache Hadoop

Synopsis: This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges of running Hadoop on Amazon’s Simple Storage Service (S3): eventual consistency. We outline the problem of S3’s eventual consistency, describe how it affects Hadoop workloads, and explain how S3Guard works. Problem: Although Apache Hadoop has support for using …