A leading Indian payment solutions provider faced challenges extracting data from proprietary mainframe (z/OS) and Oracle applications into a Cloudera data lake. Using IBM CDC, data was replicated in real time and fed through Kafka into Cloudera, where IBM BigIntegrate processed it for business intelligence and predictive analytics. Governance Catalog enabled end-to-end data lineage. The solution delivered real-time visibility, automated data processing, streamlined reporting, and faster analytical insights.
One of the leading payment solutions providers in India
The client had undertaken an initiative to implement a Cloudera-based data lake and was facing significant challenges acquiring data from its mainframe applications running on z/OS as well as from dozens of satellite applications built on Oracle. Because z/OS is proprietary, its data was not easily accessible. The client needed a sophisticated solution to extract data in real time, not only from the mainframes but also from the various satellite applications, and store it in Cloudera. The client also wanted to leverage native Hadoop capabilities for business intelligence reporting as well as predictive analytics.
IBM CDC was used to extract changed data from the mainframes as well as from the Oracle RDBMS; the change records were fed to a Kafka cluster and then posted to Cloudera. IBM BigIntegrate jobs processed the data on Cloudera to generate the output required for business intelligence and analytics. As part of the engagement, Governance Catalog was also configured to track technical assets and provide end-to-end lineage.
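For illustration only, the sketch below shows one way change records arriving on a Kafka topic could be landed into an HDFS staging directory on the Cloudera cluster for downstream processing. The broker list, topic name, WebHDFS endpoint, staging path, and batch size are assumptions made for the example; the actual engagement used IBM tooling for this step, and the details of that integration are not reproduced here.

```python
# Minimal sketch, assuming CDC change records arrive as JSON on a Kafka topic
# and are staged as newline-delimited JSON files in HDFS for later processing.
# All names and endpoints below are illustrative assumptions, not the client's
# actual configuration.
import json
from datetime import datetime, timezone

from kafka import KafkaConsumer      # kafka-python
from hdfs import InsecureClient      # WebHDFS client for the Cloudera cluster

BROKERS = ["kafka01:9092", "kafka02:9092"]   # assumed Kafka brokers
TOPIC = "cdc.payments.changes"               # assumed CDC change topic
HDFS_URL = "http://cloudera-nn:9870"         # assumed NameNode WebHDFS endpoint
STAGING_DIR = "/data/landing/cdc"            # assumed staging path for downstream jobs
BATCH_SIZE = 1000                            # assumed micro-batch size

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="cdc-landing",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
hdfs = InsecureClient(HDFS_URL, user="etl")

batch = []
for message in consumer:
    batch.append(json.dumps(message.value))
    # Flush small micro-batches so downstream jobs see fresh files regularly.
    if len(batch) >= BATCH_SIZE:
        ts = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
        path = f"{STAGING_DIR}/changes_{ts}.jsonl"
        hdfs.write(path, data="\n".join(batch) + "\n", encoding="utf-8")
        batch.clear()
```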
IBM CDC (z/OS- and Oracle-based applications), IBM BigIntegrate for Hadoop (DataStage, QualityStage, Governance Catalog), Cloudera, Kafka, Oracle