Tag: Apache HBase

Taking the hard work out of Apache Hadoop

Why did IBM decide to create its own Hadoop and Spark distribution, and why does it need a reference architecture? The ability to collect, manage and analyze big data is one of the key tenets of the IBM cognitive business strategy, as well as being central to the Internet of Things. We see a lot …

Performance comparison of different file formats and storage engines in the Apache Hadoop ecosystem

TOPIC This post presents a performance comparison of a few popular data formats and storage engines available in the Apache Hadoop ecosystem: Apache Avro, Apache Parquet, Apache HBase and Apache Kudu, in terms of space efficiency, ingestion performance, analytic scans and random data lookup. This should help in understanding how (and when) each of them …