Category: Apache Spark

Spark: Programming with RDDs

An RDD (Resilient Distributed Dataset) in Spark is simply an immutable, distributed collection of objects. Each RDD is split into multiple partitions (smaller units), which may be computed on different nodes of the cluster. RDDs can contain objects of any type, including Python, Java, or Scala objects, … Continue reading Spark: Programming with RDDs
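
To make these ideas concrete, here is a minimal Scala sketch (the app name, object name, and local[*] master are assumptions for illustration) that builds an RDD with four partitions from a local collection and applies an immutable transformation:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RDDBasics {
  def main(args: Array[String]): Unit = {
    // Run locally with one worker thread per core (an assumption for this sketch).
    val conf = new SparkConf().setAppName("rdd-basics").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // parallelize() turns a local collection into an RDD, splitting it
    // here into 4 partitions that could be processed on different nodes.
    val numbers = sc.parallelize(1 to 100, numSlices = 4)

    // RDDs are immutable: map() does not modify `numbers`; it returns
    // a new RDD describing the transformation.
    val squares = numbers.map(n => n * n)

    println(s"partitions: ${numbers.getNumPartitions}") // -> 4
    println(s"sum of squares: ${squares.sum()}")

    sc.stop()
  }
}
```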

Apache Spark Architecture

In order to understand how Spark runs, it is important to know the architecture of Spark. The following diagram and discussion will give you a clearer view of it. There are three ways Apache Spark can run: Standalone – the Hadoop cluster can be equipped with all the resources statically, and Spark can … Continue reading Apache Spark Architecture
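
As a rough sketch of how the run mode is chosen in code (the master-host:7077 URL is a hypothetical standalone master; the other master strings are standard Spark values), an application selects its cluster manager through the master URL:

```scala
import org.apache.spark.sql.SparkSession

object ClusterModes {
  def main(args: Array[String]): Unit = {
    // The master URL decides where Spark runs:
    //   local[*]                 -> single JVM, one thread per core
    //   spark://master-host:7077 -> Spark's built-in standalone cluster manager
    //   yarn                     -> Hadoop YARN allocates the resources
    //     (yarn requires Hadoop configuration on the classpath)
    val spark = SparkSession.builder()
      .appName("cluster-modes")
      .master("local[*]") // swap for "spark://master-host:7077" or "yarn"
      .getOrCreate()

    println(s"running with master: ${spark.sparkContext.master}")
    spark.stop()
  }
}
```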