Platform Engineer · SRE · DevOps

Joel Kambella Mutua

8+ years architecting resilient infrastructure and distributed systems across AWS, Kubernetes and hybrid cloud environments.

Building scalable cloud platforms and integrating AI-driven automation, from agent-based troubleshooting to MLOps workflows.

☁️ Platform Engineering ⎈ Kubestronaut 🤖 AI/MLOps 📍 Nairobi, Kenya

About Me

I am a Cloud and Platform Engineer with 8+ years of experience architecting, deploying and operating resilient infrastructure and distributed systems across public cloud and hybrid environments.

Deep expertise in cloud architecture (AWS), container orchestration (Kubernetes/EKS/ECS/Fargate), DevOps automation, infrastructure-as-code, CI/CD pipelines, observability and production reliability.

Proven ability to deliver scalable, secure and cost-efficient platform solutions for mission-critical systems, including finance and large-scale services. Notable contributor to M-PESA, one of Africa's largest fintech platforms.

Currently expanding into AI/MLOps, building and deploying GenAI and agentic AI applications within Kubernetes-based cloud environments.

8+ Years Experience
200+ Services Managed
5 K8s Certifications
4th Kubestronaut in Kenya

Skills & Technologies

☁️

Cloud Platforms

AWS EKS · ECS · Fargate · ECR · Batch · Azure

Containers & Orchestration

Kubernetes · Docker · OpenShift · Helm · Service Mesh
🔧

IaC & DevOps / GitOps

Terraform · Ansible · ArgoCD · Jenkins · GitLab · Git
📊

Observability

Prometheus · Grafana · OpenTelemetry · ELK · Dynatrace · Splunk
💻

Languages & Scripting

Python · Bash · Java · Shell · Automation
🤖

AI & MLOps

LangChain · Ollama · LLM APIs · Agentic AI · Streamlit · GenAI
🖥️

Systems & Servers

Linux · NGINX · Apache · Tomcat · HAProxy
🔒

Security & Governance

IAM · Network Policies · Compliance · CKS · Security Audits

Professional Experience

Cloud Support Engineer II — Containers

Amazon Web Services

Supported 600+ AWS customers across compute, networking and Kubernetes (EKS), resolving complex production issues in distributed cloud environments. Specialized in diagnosing system failures, optimizing cloud-native workloads and improving reliability for containerized applications at scale. Provided deep technical troubleshooting and architectural guidance across core AWS services, helping customers design resilient and scalable systems.

  • Resolved critical production incidents across Kubernetes and distributed systems
  • Optimized workload performance through architecture and configuration improvements
  • Advised on best practices for EKS, networking and high-availability design
  • Contributed to high-impact engagements, including migration of a banking platform to AWS EKS and review of EKS clusters supporting national health services
Impact: Improved system reliability, reduced downtime and enabled scalable cloud adoption for a diverse customer base.

DevOps Lead / SRE / Solutions Engineer (M-PESA)

Safaricom PLC

Designed, implemented and supported large-scale telecommunications and infrastructure systems, ensuring high availability and reliability across critical services. Worked across both cloud and on-premise environments to deploy and configure systems, automate workflows and improve operational efficiency using DevOps practices. Contributed to building and maintaining scalable infrastructure by combining system engineering with automation and continuous delivery approaches.

  • Implemented and configured infrastructure systems to support production workloads
  • Applied DevOps practices to automate deployments and improve system consistency
  • Diagnosed and resolved system and network-related issues in high-availability environments
  • Enhanced monitoring and operational visibility across critical services
Impact: Improved system reliability and deployment efficiency while contributing to scalable, production-ready infrastructure in a high-demand environment.

Support Engineer

Cellulant Kenya

Supported infrastructure systems powering payment and transaction platforms, ensuring system stability and performance in production environments. Focused on monitoring, troubleshooting and maintaining services critical to transaction processing and system availability. Built foundational expertise in system operations, incident management and infrastructure support within a high-throughput environment.

  • Monitored and maintained system performance for transaction-based platforms
  • Resolved infrastructure and application-level issues impacting availability
  • Supported operational improvements and system reliability initiatives
Impact: Contributed to stable and reliable payment systems while developing strong production support and troubleshooting skills.

Key Projects

API · Security

STK Push API

Enabled secure, PIN-based integrations with third-party apps, including Google Store, powering M-PESA's developer ecosystem.

API · Security · Integration

Cloud · Migration

Cloud Migration

Led the successful migration of legacy financial systems to Kubernetes and Docker, improving scalability and reducing operational overhead.

Kubernetes · Docker · Migration

AI · Agentic

K8s AI Health Agent

Built an autonomous AI agent using LangChain and Ollama that assesses Kubernetes cluster health via MCP tools and runs entirely locally.

LangChain · Ollama · MCP · K8s

Automation · Observability

System Automation

Developed automation scripts for proactive monitoring and alerting, significantly reducing MTTR across M-PESA infrastructure.

Python · Prometheus · Grafana

Payments · Innovation

M-PESA 1Tap

Delivered card-based mobile payments integration, enabling seamless payment experiences across the M-PESA ecosystem.

Payments · Integration · Fintech

Certifications

Solutions Architect Professional

SAP-C02

Solutions Architect Associate

SAA-C03

Kubestronaut

4th in Kenya · CKAD · CKA · CKS · KCNA · KCSA

Azure Fundamentals

AZ-900

Platform Engineering

CNPA · In Progress

Get in Touch

I'm always open to discussing cloud engineering, platform engineering and AI-driven infrastructure opportunities.

Feel free to reach out for collaboration, consulting or full-time roles.