Defined by the 3Vs of velocity, volume, and variety, big data sits in a separate category from regular data. Though big data has been a buzzword in data analysis for the last few years, the newer focus in big data analytics is on building real-time big data pipelines. In a single sentence, … Continue reading Real-time Big Data Pipeline with Hadoop, Spark & Kafka
Tag: apache hadoop
Apache Hadoop is an open-source software framework written in Java. It is primarily used for storing and processing large sets of data, better known as big data. It comprises several components that allow the storage and processing of large data volumes in a clustered environment. However, the two main components are Hadoop Distributed … Continue reading Apache Hadoop and its Components
Apache Hadoop was created to make big data easier to use and to solve its major problems. The web was generating loads of information on a daily basis, and it was becoming very difficult to manage the data from around one billion pages of content. In a revolutionary step, Google invented a new methodology of processing … Continue reading Introduction to Hadoop
Why did IBM decide to create its own Hadoop and Spark distribution, and why does it need a reference architecture? The ability to collect, manage and analyze big data is one of the key tenets of the IBM cognitive business strategy, as well as being central to the Internet of Things. We see a lot … Continue reading Taking the hard work out of Apache Hadoop
Recently, a group of Apache Hadoop and Apache Spark subject matter experts from IBM Analytics hosted a public CrowdChat discussion about using cloud-based Hadoop and Spark services as a lever for business agility. Here is a top-ten list of hot topics and themes that emerged from that discussion. Despite years of effort centralizing information in … Continue reading 10 expert tips to boost agility with Hadoop as a service
Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, due to a lack of documentation around this area, it’s hard to understand or debug when problems arise. Delegation tokens were designed and are widely used in the Hadoop ecosystem as an authentication method. This blog post introduces the … Continue reading Hadoop Delegation Tokens Explained
Everywhere you go today, people are looking down at a mobile device. They are online, browsing, collaborating, shopping for goods and services and transacting business. And it is not just consumers who use them. Mobile devices are also being used extensively in business-to-business transactions. Customers and prospects have become empowered in the online world. Social networks and … Continue reading Why big data?