Tag: HDFS

An introduction to the Hadoop Distributed File System

Introduction: HDFS is an Apache Software Foundation project and a subproject of the Apache Hadoop project. Hadoop is well suited to storing large amounts of data, on the order of terabytes and petabytes, and uses HDFS as its storage system. HDFS lets you connect nodes (commodity personal computers) contained within clusters over which data files are distributed. You can …

HDFS Operations

Starting HDFS: First, format the configured HDFS file system: open the namenode (HDFS server) and execute the following command.

$ hadoop namenode -format

Then start the distributed file system. The following command starts the namenode as well as the data nodes in the cluster.

$ start-dfs.sh

Listing Files in HDFS: Finding the …
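The start-up sequence from the HDFS Operations excerpt can be sketched as a small shell script. This is only an illustration, not part of the original post: the `run` and `start_hdfs` helpers and the `DRY_RUN` and `FORMAT_NAMENODE` variables are hypothetical names introduced here, and the script assumes `hadoop` and `start-dfs.sh` are on the `PATH` when it is run for real. By default it only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: format the namenode (optionally), then start the HDFS daemons.
# DRY_RUN and FORMAT_NAMENODE are hypothetical switches for this example.
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

start_hdfs() {
    # Formatting erases existing namenode metadata, so it is opt-in here
    # and is normally done only once, for a new file system.
    if [ "${FORMAT_NAMENODE:-0}" = "1" ]; then
        run hadoop namenode -format
    fi
    # Starts the namenode as well as the data nodes in the cluster.
    run start-dfs.sh
}

start_hdfs
```

Setting `FORMAT_NAMENODE=1` adds the format step before the start step, and setting `DRY_RUN=0` executes the commands rather than printing them. Keeping the format step opt-in is a design choice of this sketch, made so that rerunning the script does not wipe an existing file system.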

10 expert tips to boost agility with Hadoop as a service

Recently, a group of Apache Hadoop and Apache Spark subject matter experts from IBM Analytics hosted a public CrowdChat discussion about using cloud-based Hadoop and Spark services as a lever for business agility. Here is a top-ten list of hot topics and themes that emerged from that discussion. Despite years of effort centralizing information in …

Hadoop Delegation Tokens Explained

Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, because this area is poorly documented, problems are hard to understand or debug when they arise. Delegation tokens were designed as an authentication method and are widely used in the Hadoop ecosystem. This blog post introduces the …