Category: Apache Spark

Apache Spark vs Hadoop

Spark and Hadoop are both frameworks that provide essential tools for performing big data tasks. Of late, Spark has become the preferred framework; however, if you are at a crossroads deciding which of the two to choose, it is essential …

Hadoop MapReduce vs. Apache Spark

The term big data has already created a lot of hype in the business world. Hadoop and Spark are both big data frameworks; they provide some of the most popular tools used to carry out common big data tasks. In this article, we will cover the differences between Spark and Hadoop MapReduce.

Introduction

Spark: It …

Comprehensive Introduction to Apache Spark, RDDs & Dataframes (using PySpark)

Introduction

Industry estimates suggest that we are creating more than 2.5 quintillion bytes of data every year. Think about it for a moment: 1 quintillion = 10^18, a billion billion! Can you imagine how many drives, CDs, or Blu-ray discs would be required to store it all? It is difficult to imagine this scale of data …
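To make the question concrete, here is the arithmetic, assuming single-layer Blu-ray discs of 25 GB each (the disc capacity is our assumption, not a figure from the article) and decimal units:

$$
1\ \text{quintillion} = 10^{18} = 10^{9} \times 10^{9}, \qquad
\frac{2.5 \times 10^{18}\ \text{bytes}}{25 \times 10^{9}\ \text{bytes/disc}} = \frac{2.5 \times 10^{18}}{2.5 \times 10^{10}} = 10^{8}\ \text{discs}
$$

That is roughly 100 million discs for a single year's worth of data at the quoted rate.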

The Hadoop Module & High-level Architecture

The Apache Hadoop modules:

Hadoop Common: the common utilities that support the other Hadoop modules.
HDFS: the Hadoop Distributed File System, which provides high-throughput access to application data.
Hadoop YARN: a framework for job scheduling and cluster resource management.
MapReduce: a highly efficient methodology for parallel processing of huge …
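To illustrate the MapReduce model named in the last module, here is a minimal single-process sketch in plain Python of the classic word-count job. It is not the Hadoop Java API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and only mirror the map, shuffle, and reduce stages conceptually.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle/sort: group all emitted values by key, which the framework
    # does between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Run the three stages in sequence over the whole input.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = ["Spark and Hadoop", "Hadoop MapReduce and HDFS"]
    print(word_count(documents))
```

In a real Hadoop cluster, many mapper and reducer tasks run in parallel on different nodes (scheduled by YARN, reading input from HDFS), and the shuffle moves intermediate pairs across the network; this sketch runs everything sequentially in one process and only shows the data flow.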

Apache Hive vs Spark SQL: Feature-wise comparison

1. Objective

While Apache Hive and Spark SQL perform the same action, retrieving data, each does the task in a different way. Hive is designed as an interface, or convenience, for querying data stored in HDFS, whereas MySQL is designed for online operations requiring many reads and writes. So we will discuss Apache Hive …