Challenge:
A banking customer was looking for a solution to handle the large data volumes and the data processing issues it faced on its on-premises infrastructure.
Solution:
We implemented a Snowflake cloud-based architecture. The key components of our solution included:
Data Sources:
Compressed files, CSV files, AWS S3 data sources, and local files.
Log Collection:
Amazon CloudWatch was used to collect the logs.
Data Ingestion:
Data in S3 is accessed by the AWS Glue workflow and Amazon EMR. An AWS Glue crawler populates the AWS Glue Data Catalog with databases and tables.
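As an illustration of the crawler step, the sketch below shows one way a crawler over an S3 prefix could be created and started with boto3. It is a minimal sketch, not the configuration used in this engagement: the crawler name, IAM role ARN, database name, and S3 path are all hypothetical placeholders.

```python
"""Sketch: populate the AWS Glue Data Catalog from S3 with a crawler."""


def build_crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for the Glue CreateCrawler call."""
    return {
        "Name": name,
        "Role": role_arn,           # IAM role the crawler assumes
        "DatabaseName": database,   # Data Catalog database to populate
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue, request):
    """Create the crawler, then start it. `glue` is a boto3 Glue client."""
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Usage (requires boto3 and AWS credentials; all names are hypothetical):
#   import boto3
#   request = build_crawler_request(
#       name="raw-files-crawler",
#       role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
#       database="raw_db",
#       s3_path="s3://example-bucket/raw/",
#   )
#   create_and_start_crawler(boto3.client("glue"), request)
```

Keeping the request builder separate from the API calls makes the configuration easy to review and check without touching an AWS account.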
Data Processing:
AWS Glue workflows were used to create and visualize ETL activities involving multiple crawlers and jobs, with triggers to manage the execution and monitoring of all components.
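To make the workflow-and-trigger idea concrete, here is a minimal sketch of a two-step workflow: an on-demand trigger starts a crawler, and a conditional trigger runs an ETL job once the crawl succeeds. The workflow, crawler, and job names are hypothetical, and the actual workflows in this solution may have been structured differently.

```python
"""Sketch: a Glue workflow that runs a crawler, then an ETL job."""


def build_workflow_triggers(workflow, crawler, job):
    """Return the CreateTrigger arguments for a crawler -> job chain."""
    start = {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "ON_DEMAND",
        "Actions": [{"CrawlerName": crawler}],
    }
    after_crawl = {
        "Name": f"{workflow}-after-crawl",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        # Fire only when the crawler has finished successfully.
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": job}],
    }
    return [start, after_crawl]


def create_workflow(glue, workflow, crawler, job):
    """Create the workflow and its triggers. `glue` is a boto3 Glue client."""
    glue.create_workflow(Name=workflow)
    for trigger in build_workflow_triggers(workflow, crawler, job):
        glue.create_trigger(**trigger)


# Usage (requires boto3 and AWS credentials; all names are hypothetical):
#   import boto3
#   glue = boto3.client("glue")
#   create_workflow(glue, "daily-etl", "raw-files-crawler", "transform-job")
#   glue.start_workflow_run(Name="daily-etl")
```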
Amazon EMR is a cloud big data platform for processing large datasets using popular distributed frameworks such as Apache Spark and Hadoop. It processes and analyzes vast amounts of data quickly and cost-effectively.
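One common way to run Spark processing on EMR is to submit a step to a running cluster. The sketch below assumes that approach; the cluster ID, step name, and script location are hypothetical, and the source does not state how jobs were actually submitted in this solution.

```python
"""Sketch: submit a Spark job as a step on an existing EMR cluster."""


def build_spark_step(name, script_s3_uri, script_args=()):
    """Build an EMR step that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster up if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode",
                "cluster",
                script_s3_uri,
                *script_args,
            ],
        },
    }


def submit_spark_step(emr, cluster_id, step):
    """Add the step to the cluster and return its step ID.

    `emr` is a boto3 EMR client.
    """
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


# Usage (requires boto3 and AWS credentials; all values are hypothetical):
#   import boto3
#   step = build_spark_step(
#       "transform-transactions",
#       "s3://example-bucket/jobs/transform.py",
#       ["--input", "s3://example-bucket/raw/"],
#   )
#   step_id = submit_spark_step(boto3.client("emr"), "j-XXXXXXXXXXXXX", step)
```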
Visualization and Reporting:
BI tools such as Tableau and Power BI were used to create dashboards and reports, leveraging AWS's powerful querying capabilities for both real-time and historical data.
Outcome:
By leveraging a hybrid architecture, we ensured that real-time data was stored and processed in the cloud without the challenges the customer had faced on premises.
Benefits:
- Real-time monitoring and incident response with on-premises storage.
- Efficient long-term storage and querying of historical data.
- Scalable solution capable of handling large volumes of security events.