Hadoop divides a job into tasks. There are two types of tasks: Map tasks (splits and mapping) and Reduce tasks (shuffling and reducing), as mentioned above. The complete execution process (of both Map and Reduce tasks) is controlled by two types of entities: a JobTracker, which acts like a master (responsible for complete execution of submitted … Continue reading How MapReduce Organizes Work?
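The map/shuffle/reduce split described above can be sketched in plain Python. This is a minimal illustration of the data flow only, not Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample input splits are invented for the example.

```python
from collections import defaultdict

def map_phase(split):
    """Map task: emit (word, 1) pairs for each word in one input split."""
    for line in split:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(mapped_pairs):
    """Shuffle: group every emitted value by its key before reducing."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce task: combine the grouped values (here, sum the counts)."""
    return {word: sum(counts) for word, counts in groups.items()}

# Two hypothetical input splits, as Hadoop would hand them to two map tasks.
splits = [["the cat sat"], ["the dog sat"]]
mapped = [pair for split in splits for pair in map_phase(split)]
result = reduce_phase(shuffle(mapped))
# result == {"the": 2, "cat": 1, "sat": 2, "dog": 1}
```

In real Hadoop, each phase runs on different nodes and the shuffle moves data across the network; here all three run in one process purely to show the shape of the computation.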
Month: February 2019
Apache Pig is a platform for managing large sets of data that provides a high-level programming language for analyzing them. Pig also includes the infrastructure to evaluate these programs. The advantage of Pig programming is that it can easily handle parallel processing of very large amounts of data. The programming … Continue reading Introduction to Pig, Sqoop, and Hive
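To make the "high-level" claim concrete, here is a hedged sketch of the kind of group-and-count that Pig expresses in one or two Pig Latin statements, written in plain Python for illustration. The sample records and field names are invented for the example.

```python
from collections import Counter

# Hypothetical log records; in Pig this would come from a LOAD statement
# over a large dataset on HDFS.
records = [("alice", "login"), ("bob", "login"), ("alice", "logout")]

# Rough equivalent of the Pig Latin:
#   grouped = GROUP records BY user;
#   counts  = FOREACH grouped GENERATE group, COUNT(records);
counts = Counter(user for user, _ in records)
# counts == {"alice": 2, "bob": 1}
```

The point of Pig is that the two-line script above would run the same grouping as parallel MapReduce jobs over terabytes of data, without the programmer writing any map or reduce code.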
Apache Hadoop YARN stands for Yet Another Resource Negotiator. It is a very efficient technology for managing a Hadoop cluster. YARN is part of Hadoop version 2, under the aegis of the Apache Software Foundation. YARN is a completely new way of processing data and is now rightly at the centre of the … Continue reading What is Hadoop Yarn?