Category: Spark

Spark: Programming with RDDs

An RDD (Resilient Distributed Dataset) in Spark is simply an immutable, distributed collection of objects. Each RDD is split into multiple partitions (smaller units), which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, … Continue reading Spark: Programming with RDDs

Hadoop vs. Spark: The New Age of Big Data

As data science has matured over the past few years, the need for a different approach to data and its “bigness” has grown with it. There are business applications where Hadoop outperforms the newcomer Spark, but Spark has its place in the big data space because of its speed and its ease of use. This analysis examines … Continue reading Hadoop vs. Spark: The New Age of Big Data