I am a Principal Data Engineer with more than nine years of experience designing, building, and optimizing data platforms, data warehouses, and ETL pipelines. My expertise spans technologies such as Apache Kafka, Apache NiFi, Spark, Airflow, the ELK Stack, Docker, Kubernetes, and cloud platforms including AWS, GCP, and Azure. I specialize in leading large-scale data integration and warehousing projects for global organizations, focusing on creating scalable and reliable data solutions.
Throughout my career, I have developed Python-based automation frameworks and real-time streaming pipelines that have significantly improved data processing efficiency and supported informed decision-making. I have a proven track record of leading and mentoring cross-functional teams to deliver actionable insights and drive data-driven strategies.
In my current role as Principal Data Engineer at MicroDev Solutions, I lead a team of five data engineers in designing and maintaining enterprise-scale data pipelines and infrastructure across AWS and Azure. I have engineered containerized, Docker- and Kubernetes-based data pipelines using Kafka and Python microservices, reducing processing time by 40%. I also designed scalable data architectures supporting real-time analytics across multiple business units.
Previously, as a Senior Data Engineer at TechnoGenics SMC, I designed and optimized large-scale ETL/ELT pipelines using Spark and Airflow, migrated legacy systems to Snowflake and AWS S3, and created Power BI dashboards to provide actionable business insights. I have experience integrating processed datasets into Amazon Redshift and implementing data validation and governance frameworks.
Earlier in my career, I worked as a Software Engineer at Ebryx Pvt. Ltd., where I modernized legacy ETL pipelines and data warehouse workflows, achieving a 30% improvement in processing speed. I implemented solutions using Python, Kafka, Elasticsearch, FluentD, and GCP, supporting large-scale daily data ingestion. I also contributed to migrating on-prem Oracle systems to AWS Redshift, improving scalability and reporting performance.
I hold an MS in Data Science for Business from the University of Stirling and a BS in Computer Science from FAST, National University. I was awarded a Gold Medal for exceptional academic performance. I am passionate about building scalable data architectures, automating data workflows, and enabling data-driven decision-making to empower organizations.
- Lead a high-performing team of 5 data engineers in designing and maintaining enterprise-scale data pipelines and infrastructure across AWS and Azure.
- Engineered containerized, Docker- and Kubernetes-based data pipelines using Kafka, Python microservices, and REST APIs, reducing processing time by 40%.
- Designed scalable, high-performance data architecture supporting real-time analytics across 10+ business units.
- Built multi-source streaming platforms with Apache NiFi and Apache Airflow to support data lakes and warehouses.
- Optimized SQL and Bash scripting for automated data ingestion, transformation, and workflow orchestration.
- Managed Git-based CI/CD pipelines for version control and automated deployment of data pipelines.
- Standardized logging and audit frameworks for ETL processes, enabling faster troubleshooting and root-cause analysis.
- Collaborated with architecture leadership to define governance, lineage, and access frameworks using Unity Catalog and schema registry.
- Designed and optimized large-scale ETL/ELT pipelines using Spark and Airflow to process billions of retail and consumer data records.
- Migrated legacy Teradata and Oracle systems to Snowflake and AWS S3, cutting costs by 40% and improving performance.
- Integrated processed datasets into Amazon Redshift to support real-time business intelligence and reporting needs.
- Created Power BI dashboards to track key KPIs and provide actionable insights for business stakeholders.
- Implemented data validation, monitoring, and lineage tracking to ensure data consistency and governance compliance.
- Used Git for version control and deployed pipelines via automated CI/CD workflows to ensure reliability and maintainability.
- Collaborated with cross-functional teams to improve data accessibility, reliability, and overall data engineering best practices.
- Modernized legacy ETL pipelines and data warehouse workflows, achieving a 30% improvement in processing speed.
- Implemented solutions using Python, Kafka, Elasticsearch, FluentD, and GCP (GKE), supporting 5+ TB of daily data ingestion.
- Assisted in the migration from on-prem Oracle systems to AWS Redshift, improving scalability and reporting performance.
- Designed and implemented Star Schema data models for internal reporting, improving query performance and simplifying business analytics.
- Created ELK-based logging pipelines to capture operational metrics and generate audit-ready reports for internal teams.
- Created lightweight data lineage maps to help stakeholders trace data flow from source to report.