Available for opportunities · Hyderabad, India

Praveen Bhaskar

Site Reliability Engineer · AWS EKS · Kubernetes · GitOps · Observability

I build production systems that don't break at 3 AM. I have 4+ years of experience engineering reliability for multi-region AWS EKS platforms serving millions of users: eliminating OOMKilled crashes, cutting MTTR by 60%, and sustaining 99.9%+ uptime across 150+ Helm deployments.

99.9% Uptime SLO
60% MTTR Reduction
150+ Helm Deployments
20+ Microservices Owned
What I work with

Technical Arsenal

☁️
Cloud & Infrastructure
AWS EKS · EC2 / VPC · IAM / RBAC · S3 / DynamoDB · ALB / NLB · CloudWatch · RDS · Amazon SQS · Multi-Region DR
⚙️
DevOps & Automation
Kubernetes · Helm · Docker · Argo CD · Harness CI/CD · GitHub Actions · GitLab CI · Terraform · Ansible · Jenkins · Python / Bash
📊
SRE & Observability
Datadog APM · Prometheus · Grafana · ELK / OpenSearch · SLI / SLO Design · Error Budgets · Alert Rationalization · JVM Tuning · Incident Response · Postmortems
🔧
Microservices & Runtimes
Spring Boot 3.x · Java 17+ · Apache Tomcat · Kafka · Maven · G1GC Tuning · Thread Dump Analysis
🛡️
Networking & Security
VPC Design · DNS · Service Mesh · Load Balancing · AWS Secrets Manager · External Secrets Operator · RBAC
🗄️
Data & Messaging
DynamoDB · MySQL · MongoDB · Kafka · Amazon SQS · S3 · Event-Driven Architecture
Career

Work Experience

Feb 2022 to Present
Tata Consultancy Services
SITE RELIABILITY ENGINEER · HYDERABAD, INDIA
  • Eliminated 100% of OOMKilled production crashes across 20+ Spring Boot services by analyzing JVM heap dumps, recalibrating container memory limits, tuning G1GC parameters, and profiling thread contention, turning recurring outage cycles into sustained, crash-free uptime.
  • Owned end-to-end on-call incident response for multi-region AWS EKS platforms (us-east-1 and us-east-2), covering triage, root cause analysis, and post-incident remediation for customer-facing outages. Reduced unplanned downtime by 30%; the postmortem documentation was adopted team-wide.
  • Defined SLIs, SLOs, and error budget policies for 20+ microservices covering p99 latency, availability, and error rate. These policies directly influenced deployment cadence, and systematic rationalization of noisy monitors reduced alert fatigue by 50%.
  • Architected an end-to-end observability stack using Datadog APM, Prometheus metrics, and structured log analysis across 8 deployment environments, reducing MTTR by 60% and enabling detection of degradations before they reached customers.
  • Engineered a reusable CI/CD automation framework on Harness pipelines with RBAC enforcement and performance and reliability gates, blocking faulty releases and cutting deployment failures by 50% across 20+ microservices.
  • Managed 150+ Helm deployments across multi-region EKS using rolling, blue-green, and canary strategies, enabling zero-downtime daily production releases that support millions of client operations with full disaster recovery capability.
Key Work

Notable Projects

01
JVM Production Stability Initiative
Diagnosed and permanently resolved OOMKilled restart loops across 20+ Spring Boot services by profiling heap dumps, right-sizing container memory limits, and tuning G1GC flags. The RCA documentation was adopted as an org-wide knowledge base.
100% OOMKilled restarts eliminated
20+ Spring Boot services stabilized
0 memory-driven outages (previously recurring)
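To illustrate the kind of right-sizing arithmetic this project involved, here is a minimal sketch. It assumes the heap is capped as a percentage of the container memory limit so that non-heap memory (metaspace, thread stacks, direct buffers) has headroom before the kernel OOM killer steps in. The 75% share, the 200 ms pause target, and both function names are illustrative assumptions, not the settings used in production.

```python
from typing import List, Tuple


def heap_and_headroom_mib(container_limit_mib: int,
                          heap_percent: float = 75.0) -> Tuple[int, int]:
    """Split a container memory limit into JVM heap and non-heap headroom (MiB).

    heap_percent is a hypothetical default; the right value depends on the
    service's metaspace, thread count, and off-heap usage.
    """
    if container_limit_mib <= 0:
        raise ValueError("container_limit_mib must be positive")
    if not 0.0 < heap_percent < 100.0:
        raise ValueError("heap_percent must be between 0 and 100")
    heap = int(container_limit_mib * heap_percent / 100)
    return heap, container_limit_mib - heap


def jvm_flags(heap_percent: float = 75.0, pause_ms: int = 200) -> List[str]:
    """Build JVM options that size the heap relative to the container limit."""
    return [
        "-XX:+UseG1GC",                            # select the G1 collector
        f"-XX:MaxRAMPercentage={heap_percent:.1f}",  # heap cap as % of container RAM
        f"-XX:MaxGCPauseMillis={pause_ms}",        # G1 pause-time goal (a target, not a guarantee)
    ]


if __name__ == "__main__":
    heap, headroom = heap_and_headroom_mib(2048)
    print(f"heap={heap} MiB, headroom={headroom} MiB")
    print(" ".join(jvm_flags()))
```

Sizing the heap as a percentage rather than a fixed `-Xmx` keeps the ratio intact when the container limit changes in the Helm values.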
02
SLO-Driven Observability Framework
Defined SLIs and SLOs for 20+ microservices, built Datadog APM dashboards with structured alert policies, and automated Harness pipeline reliability gates. Sustained 99.9%+ uptime across all production environments.
60% MTTR reduction
50% alert noise eliminated
99.9%+ uptime maintained
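For context on what a 99.9% availability target allows, here is a minimal error budget sketch. The 30-day window and the function names are assumptions made for illustration; the actual SLO windows and burn policies are not stated here.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # 99.9% over 30 days leaves roughly 43.2 minutes of allowed downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After 21.6 minutes of downtime, about half the budget remains.
    print(round(budget_remaining(0.999, 30, 21.6), 2))
```

A remaining-budget fraction like this is one common input to a deployment gate: releases proceed while budget remains and slow down once it is nearly spent.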
03
Multi-Region GitOps at Scale
Owned Helm chart lifecycle for 150+ deployments across us-east-1 and us-east-2 with full DR capability. Implemented rolling, blue-green, and canary strategies enabling zero-downtime daily releases supporting millions of client operations.
150+ Helm deployments managed
2 AWS regions, full DR
0 downtime on daily releases
Credentials

Certifications

☁️
AWS Solutions Architect Associate
Amazon Web Services
✓ Certified
🚀
AWS DevOps Engineer Professional
Amazon Web Services
⟳ In Progress
🏗️
HashiCorp Terraform Associate
HashiCorp
⟳ In Progress
Let's connect

Open to Opportunities

Targeting DevOps / SRE roles at product companies & startups in Hyderabad. Available for interviews immediately.