Tag: apache hadoop

Real-time Big Data Pipeline with Hadoop, Spark & Kafka

Defined by the 3Vs of velocity, volume, and variety, big data sits in a separate category from regular data. Though big data has been a buzzword in data analysis for the last few years, the new focus in big data analytics is building real-time big data pipelines. In a single sentence, …

Apache Hadoop and its Components

Apache Hadoop is an open-source software framework written in Java. It is primarily used for storing and processing large data sets, better known as big data. It comprises several components that allow the storage and processing of large data volumes in a clustered environment. However, the two main components are Hadoop Distributed …

Taking the hard work out of Apache Hadoop

Why did IBM decide to create its own Hadoop and Spark distribution, and why does it need a reference architecture? The ability to collect, manage and analyze big data is one of the key tenets of the IBM cognitive business strategy, as well as being central to the Internet of Things. We see a lot …

10 expert tips to boost agility with Hadoop as a service

Recently, a group of Apache Hadoop and Apache Spark subject matter experts from IBM Analytics hosted a public CrowdChat discussion about using cloud-based Hadoop and Spark services as a lever for business agility. Here is a top-ten list of hot topics and themes that emerged from that discussion. Despite years of effort centralizing information in …

Hadoop Delegation Tokens Explained

Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, due to a lack of documentation around this area, it’s hard to understand or debug when problems arise. Delegation tokens were designed and are widely used in the Hadoop ecosystem as an authentication method. This blog post introduces the …

Why big data?

Everywhere you go today, people are looking down at a mobile device. They are online, browsing, collaborating, shopping for goods and services and transacting business. And it is not just consumers who use them. Mobile devices are also being used extensively in business-to-business transactions. Customers and prospects have become empowered in the online world. Social networks and …