Bharath Surampudi

Data Engineer building and operating high-throughput, regulated data platforms on AWS (streaming, lakehouse, governance, observability).
Sydney, Australia · Permanent Resident
bharathsurampudi@gmail.com | 0410 638 861 | View on LinkedIn | View on GitHub

Summary

Software Development Engineer II (Data) at Mastercard, focused on dependable data movement and governed analytics in regulated payment environments.

Systems I Have Built & Operated

Real-Time Fraud Detection Platform · View on GitHub

Kinesis · Lambda (Python) · DynamoDB · S3 · Terraform

  • Event-driven streaming pipeline for transaction scoring with latency and reliability constraints.
  • Built for at-least-once delivery with application-layer idempotency and safe retries (sketched below).
  • Cold-path to S3 for auditability and offline analysis.
Streaming · Idempotency · NoSQL Modeling
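
A minimal sketch of that idempotency pattern, assuming a DynamoDB table named processed_events keyed on event_id and a hypothetical score_transaction step (names are illustrative, not the production schema):

  import base64
  import json

  import boto3
  from botocore.exceptions import ClientError

  table = boto3.resource("dynamodb").Table("processed_events")  # illustrative table name

  def score_transaction(payload: dict) -> None:
      """Placeholder for the actual fraud-scoring logic."""

  def handler(event, context):
      for record in event["Records"]:
          payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
          event_id = payload["transaction_id"]  # assumed unique per transaction
          try:
              # Conditional put succeeds only the first time this event is seen,
              # so Kinesis redeliveries become harmless no-ops. In production the
              # scoring result would be stored alongside this marker so a crash
              # after the write can be recovered by replay.
              table.put_item(
                  Item={"event_id": event_id, "status": "PROCESSED"},
                  ConditionExpression="attribute_not_exists(event_id)",
              )
          except ClientError as err:
              if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                  continue  # duplicate delivery; already processed
              raise  # genuine failure: let the batch retry
          score_transaction(payload)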

Medallion Lakehouse with dbt + Redshift · View on GitHub

S3 · Glue (PySpark) · Redshift · dbt · Airflow · Terraform

  • Medallion-style ingestion and transformation layers: raw → curated → analytics-ready (sketched below).
  • dbt tests and documentation to enforce contracts and reduce downstream defects.
  • Orchestrated and deployed with CI/CD practices appropriate for data workloads.
ELT · Governance · Data Quality
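
A minimal PySpark sketch of the raw → curated step, assuming raw JSON order events in S3 and a partitioned Parquet curated layer (bucket, paths and columns are illustrative):

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

  # Bronze: raw JSON events exactly as landed, no transformations applied.
  bronze = spark.read.json("s3://example-lake/bronze/orders/")  # illustrative path

  # Silver: typed, de-duplicated, standardised records ready for modelling.
  silver = (
      bronze
      .withColumn("order_ts", F.to_timestamp("order_ts"))
      .withColumn("order_date", F.to_date("order_ts"))
      .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
      .dropDuplicates(["order_id"])
      .filter(F.col("order_id").isNotNull())
  )

  (silver.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-lake/silver/orders/"))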

Enterprise Payment Data Ingestion & Routing

Apache NiFi · Java · Splunk · Regulated Data Flows

  • Secure ingestion and routing across global regions with operational rigor.
  • Provenance- and log-based observability instrumentation to reduce MTTR.
  • Schema enforcement to protect downstream reporting and regulatory accuracy (illustrated below).
Reliability · Observability · Contracts
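
The production flows here are NiFi and Java; as a language-neutral illustration of the record-level schema check behind the last bullet, a small Python sketch (field names and sinks are assumed):

  REQUIRED_FIELDS = {"transaction_id": str, "amount": float, "currency": str}  # assumed schema

  def validate(record: dict) -> list[str]:
      """Return schema violations for one record; an empty list means valid."""
      errors = []
      for field, expected in REQUIRED_FIELDS.items():
          if field not in record:
              errors.append(f"missing field: {field}")
          elif not isinstance(record[field], expected):
              errors.append(f"{field} is {type(record[field]).__name__}, expected {expected.__name__}")
      return errors

  def quarantine(record: dict, errors: list[str]) -> None:
      print("quarantined:", errors)   # stand-in for a real dead-letter / quarantine sink

  def forward(record: dict) -> None:
      print("forwarded:", record)     # stand-in for the real downstream hop

  def route(record: dict) -> None:
      """Send valid records downstream and quarantine the rest, NiFi-style."""
      errors = validate(record)
      if errors:
          quarantine(record, errors)
      else:
          forward(record)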

Architecture Case Studies

Case Study 1 — Event-Driven Fraud Platform (Streaming + Feature Store) · View on GitHub
Stack: AWS Kinesis, Lambda (Python), DynamoDB, S3, Terraform
Design Focus: delivery semantics, idempotency, hot partitions, auditability
  • Problem: Score high-velocity transactions with reliability constraints under at-least-once delivery.
  • Key decision: Achieve effectively exactly-once processing at the application layer using idempotency keys and safe retries.
  • Data modeling: DynamoDB patterns designed to avoid hot partitions; atomic counters for velocity rules; history tracking for location signals (sketched below).
  • Operational posture: Structured logging/metrics and failure modes designed for fast diagnosis and replay.
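
A sketch of the velocity-counter pattern from the data-modelling bullet, assuming a composite key that mixes the card id with a minute bucket so one busy card does not pin a single hot partition (table and attribute names are illustrative):

  from datetime import datetime, timezone

  import boto3

  table = boto3.resource("dynamodb").Table("card_velocity")  # illustrative table

  def bump_velocity(card_id: str, now: datetime | None = None) -> int:
      """Atomically increment the per-minute transaction count for a card.

      The partition key mixes the card id with a minute bucket, so a single
      busy card's writes spread across items rather than one hot partition.
      """
      now = now or datetime.now(timezone.utc)
      bucket = now.strftime("%Y-%m-%dT%H:%M")
      response = table.update_item(
          Key={"pk": f"CARD#{card_id}#{bucket}", "sk": "VELOCITY"},
          UpdateExpression="ADD txn_count :one",   # atomic counter, no read-modify-write race
          ExpressionAttributeValues={":one": 1},
          ReturnValues="UPDATED_NEW",
      )
      return int(response["Attributes"]["txn_count"])
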
Case Study 2 — Medallion Lakehouse with dbt + Redshift (Governed Analytics) · View on GitHub
Stack: S3, Glue (PySpark), Redshift, dbt, Airflow, Terraform
Design Focus: data contracts, testability, reproducibility, CI/CD for data
  • Problem: Convert raw JSON events into analytics-ready dimensions/facts with controlled quality.
  • Key decision: Separate ingestion (Bronze), standardization (Silver), and business modeling (Gold) for clarity and governance.
  • Quality gates: dbt tests (schema, not-null, accepted values) and documentation as first-class artifacts.
  • Infra discipline: Terraform for reproducible environments; orchestration to support backfills safely (sketched below).
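
A minimal Airflow sketch of that orchestration, assuming a recent Airflow and the Amazon provider, with catchup enabled so historical backfills re-run each logical date through the same Glue-then-dbt chain (DAG id, job name and commands are illustrative):

  from datetime import datetime

  from airflow import DAG
  from airflow.operators.bash import BashOperator
  from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

  with DAG(
      dag_id="lakehouse_daily",          # illustrative DAG id
      start_date=datetime(2024, 1, 1),
      schedule="@daily",
      catchup=True,                      # allow historical backfills
      max_active_runs=1,
  ) as dag:
      bronze_to_silver = GlueJobOperator(
          task_id="bronze_to_silver",
          job_name="bronze_to_silver_job",             # illustrative Glue job
          script_args={"--process_date": "{{ ds }}"},  # partition for this logical date
      )

      dbt_build = BashOperator(
          task_id="dbt_build",
          bash_command="dbt build --vars 'process_date: {{ ds }}'",
      )

      bronze_to_silver >> dbt_build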

Professional Experience

Software Development Engineer II (Data Engineering & Payments)
Mastercard, Sydney · Nov 2021 – Present
  • High-volume ingestion & routing: Built and operated Apache NiFi + Java data pipelines moving sensitive payment data across regions under strict SLAs.
  • Observability & incident reduction: Instrumented provenance-driven monitoring in Splunk to detect silent failures and reduce MTTR for throughput incidents.
  • Schema enforcement: Implemented consumer-driven contract testing (Spring Cloud Contract) to prevent breaking schema changes across producer/consumer services.
  • Legacy modernization: Partnered with architects to decouple monolithic ETL into modular, cloud-aligned patterns improving maintainability and resilience.
Software Engineer
Neau Collective, Sydney · Mar 2021 – Nov 2021
  • Automated ingestion: Python automation to extract and consolidate Shopify and marketing data, cutting manual reporting effort and improving freshness (sketched below).
  • Analytics-ready datasets: Unified Sales/Marketing/Accounting data into consistent datasets for BI and reporting.
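
A minimal sketch of that kind of extraction, assuming the Shopify Admin REST API with an illustrative store name, API version and access-token environment variable:

  import csv
  import os

  import requests

  SHOP = "example-store"                      # illustrative store name
  TOKEN = os.environ["SHOPIFY_ACCESS_TOKEN"]  # assumed to be set in the environment
  URL = f"https://{SHOP}.myshopify.com/admin/api/2024-01/orders.json"

  def fetch_orders(limit: int = 250) -> list[dict]:
      """Pull one page of orders from the Shopify Admin REST API."""
      response = requests.get(
          URL,
          headers={"X-Shopify-Access-Token": TOKEN},
          params={"limit": limit, "status": "any"},
          timeout=30,
      )
      response.raise_for_status()
      return response.json()["orders"]

  def write_report(orders: list[dict], path: str = "orders.csv") -> None:
      """Flatten the fields the reporting layer needs into a CSV."""
      with open(path, "w", newline="") as f:
          writer = csv.writer(f)
          writer.writerow(["id", "created_at", "total_price", "currency"])
          for order in orders:
              writer.writerow([order["id"], order["created_at"],
                               order["total_price"], order["currency"]])

  if __name__ == "__main__":
      write_report(fetch_orders())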

Capabilities

Streaming & Event Processing: Kafka, Kinesis, delivery semantics, idempotency patterns, retries, replay strategies.
Batch & Analytical Processing: Spark / PySpark, Glue, SQL (window functions, optimization), Airflow orchestration.
Data Modeling & Governance: Dimensional modeling, schema evolution, data contracts, validation, dbt tests & docs.
Infrastructure & Reliability: Terraform, CI/CD (GitHub Actions), Docker, observability in Splunk, production incident response.
AWS Platform: S3, Glue, Redshift, Lambda, DynamoDB, IAM fundamentals for governed data systems.
Programming: Python (boto3, data pipelines), Java (Spring Boot), strong engineering hygiene and testing.

Education

Master of Information Technology
University of New South Wales (UNSW), Sydney · Feb 2019 – May 2021
Dual Specialisation: Artificial Intelligence & Database Systems
Bachelor of Technology in Computer Science
Vellore Institute of Technology, India · Jun 2014 – May 2018