Category: Hadoop

Comparing Hadoop, MapReduce, Spark, Flink, and Storm

Companies that need to work with large data sets have a range of open-source big data frameworks and solutions to choose from. Each solution has its own advantages, disadvantages, and ideal applications. If you're new to big data, you may have heard some of these terms. Below we provide a brief …

Real-time Big Data Pipeline with Hadoop, Spark & Kafka

Big data is defined by the three Vs of velocity, volume, and variety, which set it apart from regular data. Big data has been a buzzword in data analysis for the last few years, and the newer focus in big data analytics is building real-time big data pipelines. In a single sentence, …

How to decide between RDBMS and HADOOP?

What are the differences between traditional relational (RDBMS) and Hadoop database systems? Both offer similar functionality for collecting, storing, processing, recovering, extracting, and manipulating data. However, they take radically different approaches to data processing and to the problems they are trying to solve. RDBMS systems …

How to build with IBM and MongoDB Enterprise Document Store

Why did we start on this path? It all starts with our customers’ hybrid data management strategy: the need to embrace the proliferation of data that is creating new opportunities for businesses to better understand their customers, their industry, and their own operations. What do I mean by “proliferation”? Well, recent studies have suggested that …

Hadoop – HDFS Overview

The Hadoop File System (HDFS) was developed using a distributed file system design. It runs on commodity hardware. Unlike other distributed systems, HDFS is highly fault-tolerant and designed to use low-cost hardware. HDFS holds very large amounts of data and provides easier access to it. To store such huge volumes of data, the files are stored across multiple machines. These files are …