Siddharth Pandit · Data/ML Engineer

About

I’m a data/ML engineer focused on reliable ETL, scalable analytics platforms, and explainable ML systems. I enjoy turning messy, multi‑source data into decision‑ready products — from streaming ingestion and orchestration to model deployment and BI.

Open to roles across Data Engineering, Analytics Engineering, and ML Engineering.

Experience

Backend Developer Intern (Full‑time) — Inficloud · Virginia, USA (Jan 2024 – Jan 2025)

Built & maintained AWS‑based ETL pipelines processing 500GB+ monthly with robust data validation.
Supported Tableau/Power BI dashboards for faster trend identification and accurate reporting.
Built RESTful API integrations for live streams; documented the endpoints and validated their outputs.
Developed automations and production‑grade Python scripts, saving 200+ engineer hours annually.

Programmer — Aryagami Cloud Services · Hyderabad, India (Jul 2021 – Jul 2022)

Co‑designed and operated the backend for a SaaS analytics platform; optimized MySQL for reliability and speed.
Containerized services with Docker and simplified CI/CD.
Improved API latency by ~25% (400ms → 300ms) and added structured logging/monitoring.

Backend Developer Intern — Aryagami Cloud Services · Hyderabad, India (Jun 2020 – Jul 2021)

Supported RESTful APIs, contributing to 30% faster data retrieval for analytics dashboards.
Automated ETL transformations; reduced data prep time by 20%.
Deployed backend services on AWS/Docker; improved scalability and uptime.

Projects

Credit Card Fraud Detection

End‑to‑end fraud system with PySpark + AWS Glue; 92% recall, 90% F1. SHAP explainability, Airflow ETL, Power BI KPIs.

PySpark · Databricks · AWS Glue · Airflow · Power BI

Collision Analysis — Manhattan

Analyzed 20M+ NYC collisions; peak risk Thu–Fri, 14:00–16:00. Tableau dashboards, predictive hotspots.

Python · Pandas · Tableau · Geo

Data Platform Starter

Cookie‑cutter stack for ingestion → warehousing → BI: Airflow and dbt on Postgres, with Metabase for dashboards. IaC scaffolding included.

Airflow · dbt · Postgres · Metabase · Docker

Air Quality Dashboard

Kafka → Spark Structured Streaming → S3 + Athena; CDC and schema evolution handled.

Kafka · Spark · S3 · Athena

Skills

Data & ML

Python, PySpark, Pandas, scikit‑learn, SHAP, XGBoost, SQL, dbt

Platforms & Infra

AWS (Glue, Athena, S3, Lambda, ECS), Databricks, Docker, Airflow

Analytics

Power BI, Tableau, Metabase; metrics design, KPI dashboards

Ops

CI/CD, testing, observability, documentation

Resume

Prefer a quick download? Open the PDF.

Contact

Want to discuss a role or project? Email is best. Phone intentionally omitted.

Email: siddharthpandit@zohomail.in
GitHub: siddharthp1997
