As a Data Engineer, I specialize in designing and optimizing scalable ETL/ELT pipelines with Azure Data Factory, AWS Glue, and PySpark. I build cloud-based data warehouses and lakes on Snowflake, BigQuery, and Redshift to enable high-performance analytics, create real-time streaming solutions with Kafka, Spark Streaming, and Flink for low-latency processing, and streamline data operations with Elasticsearch, OpenSearch, and Apache Airflow.
I specialize in designing and optimizing scalable ETL/ELT pipelines that streamline data processing and transformation. Using tools such as Apache Spark (PySpark), Azure Data Factory, and AWS Glue, I ensure efficient data movement, integration, and processing across diverse platforms.
Leveraging Azure, AWS, and GCP, I build secure, high-performance cloud-based data solutions. From setting up data lakes and warehouses (Snowflake, BigQuery, Redshift) to deploying serverless data processing with Databricks, I help businesses scale their data infrastructure.
I implement real-time data pipelines using Kafka, Spark Streaming, and Flink to enable low-latency analytics and event-driven architectures. Whether it’s log processing, fraud detection, or real-time dashboards, I ensure businesses get instant insights from their data.
I orchestrate data workflows with Apache Airflow and tune indexing and query performance with Elasticsearch and OpenSearch. With a focus on automation, CI/CD, and monitoring, I enhance data reliability, reduce costs, and improve processing speeds.
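As one example of the indexing tuning mentioned above, an Elasticsearch or OpenSearch index can be created with an explicit mapping instead of relying on dynamic defaults. This fragment is illustrative (the index name and fields are hypothetical): `keyword` fields support exact-match filters and aggregations, a longer `refresh_interval` trades search freshness for indexing throughput, and disabling indexing on fields that are never queried saves space.

```json
{
  "settings": {
    "number_of_shards": 3,
    "refresh_interval": "30s"
  },
  "mappings": {
    "properties": {
      "customer_id": { "type": "keyword" },
      "order_ts":    { "type": "date" },
      "amount":      { "type": "scaled_float", "scaling_factor": 100 },
      "notes":       { "type": "text", "index": false }
    }
  }
}
```

A body like this would be sent with `PUT /orders` when creating the index; choices such as shard count and refresh interval are exactly the knobs that drive the latency reductions described in the project summaries below.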
Built scalable ETL workflows at Cognizant Technology Solutions using Azure Data Factory, PySpark, and Databricks, processing 1B+ records daily. Automated data movement from Azure Data Lake to SQL Database and Amazon Redshift, enhancing processing efficiency by 40% and reducing manual interventions.
Developed a Hadoop-based data lake for LTIMindtree, leveraging AWS Athena for federated querying and seamless data integration. Optimized large-scale storage and retrieval across AWS S3 and GCP Cloud Storage, reducing data processing time and operational costs.
Implemented a real-time streaming pipeline at Sigma Data Solutions using Kafka, Flink, and Spark Streaming. Integrated it with AWS S3, GCP BigQuery, and Azure Synapse Analytics, enabling low-latency event processing and improving processing speed by 60% for real-time analytics.
Designed a scalable cloud-based data warehouse using Hive, Spark, and Snowflake for LTIMindtree. The system processed 2B+ records daily, optimized Hadoop- and Google Cloud Storage-based batch processing, and cut query times by 50%, significantly improving data retrieval speed and accuracy.
At Cognizant Technology Solutions, optimized Azure Cognitive Search and OpenSearch indexing across 100+ indices, reducing query latency by 30% and enabling faster customer reporting, data retrieval, and overall data accessibility.
Results-driven Data Engineer with 3+ years of experience designing and optimizing large-scale data pipelines, ETL/ELT processes, Big Data platforms, and data warehousing solutions. Proficient in Azure, AWS, GCP, PySpark, SQL, Databricks, and the ELK Stack, with experience processing 50+ TB of data daily. Adept at real-time data integration and streaming analytics, with a proven ability to optimize cloud-based solutions, reduce query latency, automate workflows, and apply AI-driven analytics to drive efficiency and cost-effectiveness.
Download Resume