Bharath Surampudi

Data Engineer building and operating high-throughput, regulated data platforms on AWS (streaming, lakehouse, governance, observability).
Sydney, Australia · Permanent Resident
bharathsurampudi@gmail.com | 0410 638 861 | View on LinkedIn | View on GitHub

Summary

Software Development Engineer II (Data) at Mastercard, focused on dependable data movement and governed analytics in regulated payment environments.

Systems I Have Built & Operated

Real-Time Fraud Detection Platform · View on GitHub

Kinesis · Lambda (Python) · DynamoDB · S3 · Terraform

  • Event-driven streaming pipeline for transaction scoring with latency and reliability constraints.
  • Built for at-least-once delivery with application-layer idempotency and safe retries (sketched below).
  • Cold-path to S3 for auditability and offline analysis.
Streaming · Idempotency · NoSQL Modeling
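
A minimal sketch of that idempotency pattern, assuming a DynamoDB table named processed_events keyed on event_id and a hypothetical score_transaction step (names are illustrative, not the production schema):

  import base64
  import json

  import boto3
  from botocore.exceptions import ClientError

  table = boto3.resource("dynamodb").Table("processed_events")  # illustrative table name

  def score_transaction(payload: dict) -> None:
      """Placeholder for the actual fraud-scoring logic."""

  def handler(event, context):
      for record in event["Records"]:
          payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
          event_id = payload["transaction_id"]  # assumed unique per transaction
          try:
              # Conditional put succeeds only the first time this event is seen,
              # so Kinesis redeliveries become harmless no-ops. In production the
              # scoring result would be stored alongside this marker so a crash
              # after the write can be recovered by replay.
              table.put_item(
                  Item={"event_id": event_id, "status": "PROCESSED"},
                  ConditionExpression="attribute_not_exists(event_id)",
              )
          except ClientError as err:
              if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                  continue  # duplicate delivery; already processed
              raise  # genuine failure: let the batch retry
          score_transaction(payload)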

Medallion Lakehouse with dbt + Redshift · View on GitHub

S3 · Glue (PySpark) · Redshift · dbt · Airflow · Terraform

  • Medallion-style ingestion and transformation layers: raw → curated → analytics-ready (sketched below).
  • dbt tests and documentation to enforce contracts and reduce downstream defects.
  • Orchestrated and deployed with CI/CD practices appropriate for data workloads.
ELT · Governance · Data Quality
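
A minimal PySpark sketch of the raw → curated step, assuming raw JSON order events in S3 and a partitioned Parquet curated layer (bucket, paths and columns are illustrative):

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

  # Bronze: raw JSON events exactly as landed, no transformations applied.
  bronze = spark.read.json("s3://example-lake/bronze/orders/")  # illustrative path

  # Silver: typed, de-duplicated, standardised records ready for modelling.
  silver = (
      bronze
      .withColumn("order_ts", F.to_timestamp("order_ts"))
      .withColumn("order_date", F.to_date("order_ts"))
      .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
      .dropDuplicates(["order_id"])
      .filter(F.col("order_id").isNotNull())
  )

  (silver.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-lake/silver/orders/"))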

Enterprise Payment Data Ingestion & Routing

Apache NiFi · Java · Splunk · Regulated Data Flows

  • Secure ingestion and routing across global regions with operational rigor.
  • Provenance- and log-based observability instrumentation to reduce MTTR.
  • Schema enforcement to protect downstream reporting and regulatory accuracy (illustrated below).
Reliability · Observability · Contracts
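
The production flows here are NiFi and Java; as a language-neutral illustration of the record-level schema check behind the last bullet, a small Python sketch (field names and sinks are assumed):

  REQUIRED_FIELDS = {"transaction_id": str, "amount": float, "currency": str}  # assumed schema

  def validate(record: dict) -> list[str]:
      """Return schema violations for one record; an empty list means valid."""
      errors = []
      for field, expected in REQUIRED_FIELDS.items():
          if field not in record:
              errors.append(f"missing field: {field}")
          elif not isinstance(record[field], expected):
              errors.append(f"{field} is {type(record[field]).__name__}, expected {expected.__name__}")
      return errors

  def quarantine(record: dict, errors: list[str]) -> None:
      print("quarantined:", errors)   # stand-in for a real dead-letter / quarantine sink

  def forward(record: dict) -> None:
      print("forwarded:", record)     # stand-in for the real downstream hop

  def route(record: dict) -> None:
      """Send valid records downstream and quarantine the rest, NiFi-style."""
      errors = validate(record)
      if errors:
          quarantine(record, errors)
      else:
          forward(record)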

Architecture Case Studies

Case Study 1 — Event-Driven Fraud Platform (Streaming + Feature Store) · View on GitHub
Stack: AWS Kinesis, Lambda (Python), DynamoDB, S3, Terraform
Design Focus: delivery semantics, idempotency, hot partitions, auditability
  • Problem: Score high-velocity transactions with reliability constraints under at-least-once delivery.
  • Key decision: Achieve effectively exactly-once processing at the application layer using idempotency keys and safe retries.
  • Data modeling: DynamoDB patterns designed to avoid hot partitions; atomic counters for velocity rules; history tracking for location signals (sketched below).
  • Operational posture: Structured logging/metrics and failure modes designed for fast diagnosis and replay.
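
A sketch of the velocity-counter pattern from the data-modelling bullet, assuming a composite key that mixes the card id with a minute bucket so one busy card does not pin a single hot partition (table and attribute names are illustrative):

  from datetime import datetime, timezone

  import boto3

  table = boto3.resource("dynamodb").Table("card_velocity")  # illustrative table

  def bump_velocity(card_id: str, now: datetime | None = None) -> int:
      """Atomically increment the per-minute transaction count for a card.

      The partition key mixes the card id with a minute bucket, so a single
      busy card's writes spread across items rather than one hot partition.
      """
      now = now or datetime.now(timezone.utc)
      bucket = now.strftime("%Y-%m-%dT%H:%M")
      response = table.update_item(
          Key={"pk": f"CARD#{card_id}#{bucket}", "sk": "VELOCITY"},
          UpdateExpression="ADD txn_count :one",   # atomic counter, no read-modify-write race
          ExpressionAttributeValues={":one": 1},
          ReturnValues="UPDATED_NEW",
      )
      return int(response["Attributes"]["txn_count"])
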
Case Study 2 — Medallion Lakehouse with dbt + Redshift (Governed Analytics) · View on GitHub
Stack: S3, Glue (PySpark), Redshift, dbt, Airflow, Terraform
Design Focus: data contracts, testability, reproducibility, CI/CD for data
  • Problem: Convert raw JSON events into analytics-ready dimensions/facts with controlled quality.
  • Key decision: Separate ingestion (Bronze), standardization (Silver), and business modeling (Gold) for clarity and governance.
  • Quality gates: dbt tests (schema, not-null, accepted values) and documentation as first-class artifacts.
  • Infra discipline: Terraform for reproducible environments; orchestration to support backfills safely (sketched below).
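
A minimal Airflow sketch of that orchestration, assuming a recent Airflow and the Amazon provider, with catchup enabled so historical backfills re-run each logical date through the same Glue-then-dbt chain (DAG id, job name and commands are illustrative):

  from datetime import datetime

  from airflow import DAG
  from airflow.operators.bash import BashOperator
  from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

  with DAG(
      dag_id="lakehouse_daily",          # illustrative DAG id
      start_date=datetime(2024, 1, 1),
      schedule="@daily",
      catchup=True,                      # allow historical backfills
      max_active_runs=1,
  ) as dag:
      bronze_to_silver = GlueJobOperator(
          task_id="bronze_to_silver",
          job_name="bronze_to_silver_job",             # illustrative Glue job
          script_args={"--process_date": "{{ ds }}"},  # partition for this logical date
      )

      dbt_build = BashOperator(
          task_id="dbt_build",
          bash_command="dbt build --vars 'process_date: {{ ds }}'",
      )

      bronze_to_silver >> dbt_build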

Professional Experience

Software Development Engineer II (Data Engineering & Payments)
Mastercard, Sydney · Nov 2021 – Present
  • High-volume ingestion & routing: Built and operated Apache NiFi + Java data pipelines moving sensitive payment data across regions under strict SLAs.
  • Observability & incident reduction: Instrumented provenance-driven monitoring in Splunk to detect silent failures and reduce MTTR for throughput incidents.
  • Schema enforcement: Implemented consumer-driven contract testing (Spring Cloud Contract) to prevent breaking schema changes across producer/consumer services.
  • Legacy modernization: Partnered with architects to decouple monolithic ETL into modular, cloud-aligned patterns improving maintainability and resilience.
Software Engineer
Neau Collective, Sydney · Mar 2021 – Nov 2021
  • Automated ingestion: Python automation to extract and consolidate Shopify and marketing data, cutting manual reporting effort and improving freshness (sketched below).
  • Analytics-ready datasets: Unified Sales/Marketing/Accounting data into consistent datasets for BI and reporting.
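
A minimal sketch of that kind of extraction, assuming the Shopify Admin REST API with an illustrative store name, API version and access-token environment variable:

  import csv
  import os

  import requests

  SHOP = "example-store"                      # illustrative store name
  TOKEN = os.environ["SHOPIFY_ACCESS_TOKEN"]  # assumed to be set in the environment
  URL = f"https://{SHOP}.myshopify.com/admin/api/2024-01/orders.json"

  def fetch_orders(limit: int = 250) -> list[dict]:
      """Pull one page of orders from the Shopify Admin REST API."""
      response = requests.get(
          URL,
          headers={"X-Shopify-Access-Token": TOKEN},
          params={"limit": limit, "status": "any"},
          timeout=30,
      )
      response.raise_for_status()
      return response.json()["orders"]

  def write_report(orders: list[dict], path: str = "orders.csv") -> None:
      """Flatten the fields the reporting layer needs into a CSV."""
      with open(path, "w", newline="") as f:
          writer = csv.writer(f)
          writer.writerow(["id", "created_at", "total_price", "currency"])
          for order in orders:
              writer.writerow([order["id"], order["created_at"],
                               order["total_price"], order["currency"]])

  if __name__ == "__main__":
      write_report(fetch_orders())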

Capabilities

Streaming & Event Processing: Kafka, Kinesis, delivery semantics, idempotency patterns, retries, replay strategies.
Batch & Analytical Processing: Spark / PySpark, Glue, SQL (window functions, optimization), Airflow orchestration.
Data Modeling & Governance: Dimensional modeling, schema evolution, data contracts, validation, dbt tests & docs.
Infrastructure & Reliability: Terraform, CI/CD (GitHub Actions), Docker, observability in Splunk, production incident response.
AWS Platform: S3, Glue, Redshift, Lambda, DynamoDB, IAM fundamentals for governed data systems.
Programming: Python (boto3, data pipelines), Java (Spring Boot), strong engineering hygiene and testing.

Education

Master of Information Technology
University of New South Wales (UNSW), Sydney · Feb 2019 – May 2021
Dual Specialisation: Artificial Intelligence & Database Systems
Bachelor of Technology in Computer Science
Vellore Institute of Technology, India · Jun 2014 – May 2018