Category: Spark

Spark SQL Features

1. Objective Spark SQL offers many features, such as Unified Data Access and High Compatibility, and we will cover each in detail. Before looking at the features, we will start with a brief introduction to Spark SQL. 2. Introduction to Spark SQL In Apache Spark, Spark SQL is a module for working with … Continue reading Spark SQL Features

Spark Streaming

1. Objective Through this Spark Streaming blog, you will learn the basics of Apache Spark Streaming: why streaming is needed in Apache Spark, the Spark Streaming architecture, and how streaming works in Spark. You will also learn about Spark Streaming sources, the various streaming operations in Spark, and the advantages of Apache Spark Streaming over … Continue reading Spark Streaming

Apache Hive vs Spark SQL: Feature wise comparison

1. Objective While Apache Hive and Spark SQL perform the same action, retrieving data, each does the task in a different way. Hive is designed as an interface or convenience for querying data stored in HDFS, whereas MySQL is designed for online operations requiring many reads and writes. So we will discuss Apache Hive … Continue reading Apache Hive vs Spark SQL: Feature wise comparison

Comparing Hadoop, MapReduce, Spark, Flink, and Storm

Companies that need to work with large data sets have a range of open-source big data frameworks and solutions to choose from. Each solution has its own set of advantages, disadvantages, and ideal applications. If you're new to big data, you may have heard some of these terms. Below we provide a brief … Continue reading Comparing Hadoop, MapReduce, Spark, Flink, and Storm

Real-time Big Data Pipeline with Hadoop, Spark & Kafka

Defined by the 3Vs of velocity, volume, and variety, big data stands apart from regular data. Though big data has been a buzzword in data analysis for the last few years, the new focus in big data analytics is building real-time big data pipelines. In a single sentence, … Continue reading Real-time Big Data Pipeline with Hadoop, Spark & Kafka