Top 10 Tips for Scaling Hadoop

Data locality means keeping a large data set stored close to the compute that analyzes it. For Hadoop, that means managing the DataNodes that provide storage so that MapReduce performs adequately. This works effectively, but it leads to a separate operational issue: islands of big data storage. Here are some tips on how … Continue reading Top 10 Tips for Scaling Hadoop

HDFS: Read & Write Commands using Java API

Hadoop comes with a distributed file system called HDFS (Hadoop Distributed File System), and Hadoop-based applications make use of it. HDFS is designed for storing very large data files and runs on clusters of commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. Note: When data exceeds the capacity of storage on a single … Continue reading HDFS: Read & Write Commands using Java API
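As a minimal sketch of reading and writing a file through the HDFS Java API: the `FileSystem`, `Path`, and stream classes below are the real Hadoop API, but the NameNode address (`hdfs://localhost:9000`) and the file path are assumptions, and running this requires the Hadoop client libraries on the classpath and a reachable cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; adjust fs.defaultFS to match your cluster.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/example.txt"); // hypothetical path

        // Write: create() returns an output stream backed by HDFS blocks.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read: open() returns a seekable input stream over the same blocks.
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
        fs.close();
    }
}
```

The same `FileSystem` handle also exposes `delete`, `exists`, and `listStatus` for basic file management.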

Hadoop Multi Node Clusters

Installing Java: check the installed version with the command $ java -version. The following output is presented:

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

Creating a user account: a system user account should be created on both the master and slave systems to use the Hadoop installation: # useradd hadoop # … Continue reading Hadoop Multi Node Clusters

Hadoop: Features, Components, Cluster & Topology

Apache Hadoop is a framework used to develop data processing applications that are executed in a distributed computing environment. The post covers the components of Hadoop, the features of Hadoop, and network topology in Hadoop. Similar to data residing in the local file system of a personal computer, in Hadoop data resides in a distributed file system, which is called … Continue reading Hadoop: Features, Components, Cluster & Topology

What is MapReduce? How it Works

MapReduce is a programming model suitable for processing huge volumes of data. Hadoop is capable of running MapReduce programs written in various languages: Java, Ruby, Python, and C++. MapReduce programs are parallel in nature, and are thus very useful for performing large-scale data analysis across multiple machines in a cluster. MapReduce programs work in two phases: 1. … Continue reading What is MapReduce? How it Works
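The two phases can be illustrated with a word count, the canonical MapReduce example. This is a plain-Java sketch of the model only (the class and method names are my own); a real Hadoop job would implement the framework's Mapper and Reducer classes instead.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountSketch {
    // Map phase: turn each input line into (word, 1) pairs.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Stream.of(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> new SimpleEntry<>(w, 1));
    }

    // Reduce phase: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big cluster", "data data");
        Map<String, Integer> counts =
                reduce(input.stream().flatMap(WordCountSketch::map));
        System.out.println(counts.get("data")); // prints 3
        System.out.println(counts.get("big"));  // prints 2
    }
}
```

In a real cluster, the framework shuffles the (word, 1) pairs between the map and reduce phases so that all pairs for the same word reach the same reducer.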