Tag: apache hadoop

Apache Hadoop and its Components

Apache Hadoop is an open-source software framework written in Java. It is primarily used for storing and processing large data sets, better known as big data. It comprises several components that allow the storage and processing of large data volumes in a clustered environment. However, the two main components are Hadoop Distributed …

Introduction to Hadoop

Apache Hadoop was created to make big data easier to use and to solve its major problems. Web media was generating loads of information on a daily basis, and it was becoming very difficult to manage the data of around one billion pages of content. In a revolutionary move, Google invented a new methodology of processing …

Taking the hard work out of Apache Hadoop

Why did IBM decide to create its own Hadoop and Spark distribution, and why does it need a reference architecture? The ability to collect, manage and analyze big data is one of the key tenets of the IBM cognitive business strategy, as well as being central to the Internet of Things. We see a lot …

10 expert tips to boost agility with Hadoop as a service

Recently, a group of Apache Hadoop and Apache Spark subject matter experts from IBM Analytics hosted a public CrowdChat discussion about using cloud-based Hadoop and Spark services as a lever for business agility. Here is a top-ten list of hot topics and themes that emerged from that discussion. Despite years of effort centralizing information in …

Hadoop Delegation Tokens Explained

Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, because this area is poorly documented, problems are hard to understand or debug when they arise. Delegation tokens were designed as an authentication method and are widely used in the Hadoop ecosystem. This blog post introduces the …

Why big data?

Everywhere you go today, people are looking down at a mobile device. They are online, browsing, collaborating, shopping for goods and services and transacting business. And consumers are not the only ones using them. Mobile devices are also being used extensively in business-to-business transactions. Customers and prospects have become empowered in the online world. Social networks and …

The 3 Cs of big data

We’re all familiar with big data’s varying number of Vs: volume, variety, velocity and veracity. However, considering the purpose for which insight is derived from big data is highly important and likely more useful for engineering information systems. This purpose is often characterized by using data to inform enhanced decision making, and …