// Senior Data Engineer & AI Engineer · 5 YOE

Onkar Antad

Building large-scale data platforms on GCP & Azure Databricks. Architecting event-driven pipelines, Lakehouse systems, and GenAI / Agentic AI applications using LangChain & LangGraph.

5 Years Experience
5 Cloud Certifications
1M+ Records / Day
99.9% Pipeline Uptime

System Designs

GCP Data Platform Architecture
Event-driven ETL/ELT pipeline on Google Cloud Platform with Cloud Run, Eventarc, Workflows, and BigQuery.
Diagram components: Source APIs (REST / GraphQL); GCS Bucket (raw files); Pub/Sub (event stream); Eventarc (triggers); Cloud Workflows (orchestration); Cloud Build (CI/CD + Terraform); Cloud Logging (monitoring / alerts); Cloud Run (ETL / ELT jobs); Dataflow (streaming ETL); Dataproc (Spark batch jobs); BigQuery (partitioned + clustered, 25% cost reduction); Looker / BI (dashboards); SonarQube (code quality); Python ETL (reusable framework); Terraform IaC (env setup: 2-3d → 3h).
99.9% uptime · 1M+ records/day

Lakehouse Architecture on Azure Databricks
Bronze-Silver-Gold Medallion architecture on Azure Databricks with PySpark, ADF, and Delta Lake.
Diagram components: Azure Data Factory (ADF) and the following layers:
  • BRONZE (raw ingestion): raw files (ADLS), schema validation, Delta Lake write, Auto Loader (CDC)
  • SILVER (cleaned & conformed): PySpark transforms, SCD Type 2 logic, data quality rules
  • GOLD (business-ready): star schema models, Fact + Dim tables, aggregations / KPIs, BigQuery export
  • Analytics: BI / reporting, data science
50% less manual intervention · Spark perf optimized

Kafka CDC Pipeline Architecture
Real-time CDC pipeline using Kafka, reducing data latency from 4–6 hours to under 15 minutes.
Diagram components: SQL Server (source DB); Debezium (CDC connector); Kafka Topics (INSERT / UPDATE / DELETE); Spark Structured Streaming; Delta Lake (ACID transactions); BigQuery (analytics layer).
Before: 4–6 hours latency (batch processing) · After: <15 min latency (real-time CDC pipeline)
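
To illustrate the INSERT / UPDATE / DELETE handling in the CDC pipeline above, here is a minimal Python sketch that applies Debezium-style change events to an in-memory table. It is an assumption-laden illustration, not the production job: the real pipeline runs on Spark Structured Streaming and Delta Lake, and the `apply_change` helper, the `id` key and the sample `status` field are all hypothetical. Only the `op`, `before` and `after` fields follow the Debezium change-event envelope.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_change(table: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to a table keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: upsert the 'after' row image
        after = event["after"]
        table[after[key]] = after
    elif op == "d":
        # delete: the key comes from the 'before' row image
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical sample events, in the order they would arrive on one partition.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

target: Dict[Any, Row] = {}
for change in events:
    apply_change(target, change)

print(target)
```

The upsert-or-delete decision per key is the same one a Delta Lake merge would make for each micro-batch; ordering per key matters, which is why events for one key are assumed to arrive in order.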

GenAI and Agentic AI Stack
LangChain and LangGraph agentic AI architecture with vector databases and LLM integration.
Diagram components: User Query; LangGraph (agent orchestrator); LangChain (BigQuery Tool, GCS Loader Tool, Python Executor); Qdrant (vector DB, RAG); LLM (GPT-4 / Claude, prompt engineering); Memory Store (conversation context); Response.
Agentic AI · RAG Pipelines · Multi-tool Orchestration · LangGraph State Machines · Prompt Engineering

Where I've Built

OCT 2024 – PRESENT · PUNE, INDIA
Accenture
Senior Data Engineer · Fortune 500 Automotive Client
  • Architecting GCP data platforms covering pipeline design, BigQuery modelling, CI/CD and monitoring for multiple business teams.
  • Built event-driven ETL/ELT pipelines using Cloud Run, Workflows, Eventarc and BigQuery, processing over 1M records daily at 99.9% uptime.
  • Tuned BigQuery tables with partitioning and clustering, reducing compute costs by ~25% and shortening query times.
  • Created a reusable Python ETL framework and GitHub standards, cutting new pipeline build time by nearly half.
  • Automated environment setup using Terraform IaC, cutting setup time from 2–3 days to under 3 hours.
  • Introduced SonarQube, FOSSA and Pytest into the dev workflow for early issue detection and code quality improvement.
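
As a rough illustration of the partitioning and clustering work listed above, the sketch below builds a BigQuery `CREATE TABLE` statement with a date partition and clustering columns. It is a hedged example rather than the client's code: the helper name `build_partitioned_table_ddl`, the table `analytics.events` and all column names are hypothetical, and the partition column is assumed to be a `TIMESTAMP`.

```python
from typing import Sequence, Tuple

MAX_CLUSTER_COLUMNS = 4  # BigQuery allows at most four clustering columns


def build_partitioned_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    partition_col: str,
    cluster_cols: Sequence[str] = (),
) -> str:
    """Return DDL for a table partitioned by day and optionally clustered."""
    if len(cluster_cols) > MAX_CLUSTER_COLUMNS:
        raise ValueError(f"at most {MAX_CLUSTER_COLUMNS} clustering columns allowed")
    known = {name for name, _ in columns}
    unknown = [c for c in (partition_col, *cluster_cols) if c not in known]
    if unknown:
        raise ValueError(f"columns not in schema: {unknown}")

    column_sql = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    # Assumes partition_col is a TIMESTAMP, hence the DATE() wrapper.
    ddl = f"CREATE TABLE `{table}` (\n{column_sql}\n)\nPARTITION BY DATE({partition_col})"
    if cluster_cols:
        ddl += "\nCLUSTER BY " + ", ".join(cluster_cols)
    return ddl


# Hypothetical schema, for illustration only.
ddl = build_partitioned_table_ddl(
    table="analytics.events",
    columns=[("event_ts", "TIMESTAMP"), ("region", "STRING"), ("event_type", "STRING")],
    partition_col="event_ts",
    cluster_cols=["region", "event_type"],
)
print(ddl)
```

Queries that filter on the partition column scan only the matching partitions, and clustering orders data within each partition by the listed columns, which is where the compute savings typically come from.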
MAY 2021 – OCT 2024 · BANGALORE, INDIA
Sapiens
Data Engineer · Insurance Technology Platform
  • Built the full Bronze-Silver-Gold Lakehouse on Azure Databricks using PySpark and ADF, reducing manual pipeline intervention by 50%.
  • Engineered Kafka-based CDC pipelines, cutting data latency from 4–6 hours to under 15 minutes.
  • Tuned Spark performance through partition optimisation, caching and join strategies, improving job performance while reducing cluster costs.
  • Designed SCD Type 2 data models with Star Schema (Fact and Dimension tables) for historical tracking and clean analytics reporting.
  • Migrated SQL Server logic to BigQuery as part of a cloud modernisation effort.
  • Built GraphQL and REST API microservices on Spring Boot with Redis caching, improving API response times by ~40%.
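
The SCD Type 2 modelling mentioned above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the production PySpark / Delta Lake logic: the `apply_scd2` helper, the `valid_from` / `valid_to` / `is_current` column names and the sample customer data are all hypothetical.

```python
from datetime import date
from typing import Any, Dict, List


def apply_scd2(
    dim: List[Dict[str, Any]],
    incoming: Dict[str, Any],
    key: str,
    tracked: List[str],
    as_of: date,
) -> None:
    """Apply one incoming record to a dimension table using SCD Type 2 rules."""
    current = next(
        (row for row in dim if row[key] == incoming[key] and row["is_current"]),
        None,
    )
    if current is not None:
        if all(current[col] == incoming[col] for col in tracked):
            return  # nothing changed: keep the current version as-is
        # Close the old version instead of overwriting it, preserving history.
        current["valid_to"] = as_of
        current["is_current"] = False
    # Open a new current version for new keys and for changed records.
    dim.append(
        {
            key: incoming[key],
            **{col: incoming[col] for col in tracked},
            "valid_from": as_of,
            "valid_to": None,
            "is_current": True,
        }
    )


# Hypothetical customer dimension tracking changes to 'city'.
dim: List[Dict[str, Any]] = []
apply_scd2(dim, {"customer_id": 7, "city": "Pune"}, "customer_id", ["city"], date(2023, 1, 1))
apply_scd2(dim, {"customer_id": 7, "city": "Pune"}, "customer_id", ["city"], date(2023, 2, 1))
apply_scd2(dim, {"customer_id": 7, "city": "Bangalore"}, "customer_id", ["city"], date(2023, 6, 1))

for row in dim:
    print(row)
```

Each change closes the previous row and opens a new one, so fact tables can join to the version of the dimension that was valid at the time of the event.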

Technical Stack

Cloud & Data Platforms
GCP · BigQuery · Dataproc · Dataflow · Cloud Run · Eventarc · GCS · Azure Databricks · ADF
Data Engineering
Apache Spark · PySpark · Delta Lake · Apache Iceberg · Kafka CDC · Apache Airflow · DBT · Medallion Arch · SCD Type 2
AI & GenAI
LangChain · LangGraph · Agentic AI · LLMs · Qdrant · Vector DBs · Prompt Engineering · RAG Pipelines
Programming & DevOps
Python · SQL · Java · Terraform · CI/CD · GitHub Actions · Docker · SonarQube · Pytest
Optimization & Architecture
BQ Partitioning · BQ Clustering · Spark Tuning · Lakehouse · Dimensional Modeling · IaC
APIs & Integration
REST API · GraphQL · Spring Boot · Redis · Trino · Talend · SQL Server

5 Cloud Certifications

Professional Cloud Architect
Google Cloud · Feb 2026 – Feb 2028
Professional Data Engineer
Google Cloud · Feb 2026 – Feb 2028
Associate Cloud Engineer
Google Cloud · Apr 2025 – Apr 2028
Data Engineer Professional
Databricks · Feb 2026 – Feb 2028
Data Engineer Associate
Databricks · Nov 2025 – Nov 2027

Academic Background

SEP 2020 – APR 2021
PG-DAC in Advanced Computing
CDAC ACTS, Pune
82%
MAR 2016 – MAR 2020
B.E. Mechanical Engineering
Walchand Institute of Technology, Solapur
74%

Let's Connect