
TQPro Kubernetes Containerization - Executive Summary

Date: 2024-11-23
Status: Analysis Complete - Ready for Implementation


Quick Reference

| Document | Purpose | Audience |
|---|---|---|
| KUBERNETES_DEPLOYMENT_PLAN.md | Complete deployment strategy and architecture | DevOps, Tech Leads, Architects |
| HAZELCAST_KUBERNETES_MIGRATION.md | Step-by-step Hazelcast migration guide | Developers, DevOps |
| REQUIREMENTS_SPECIFICATION.md (tqamds) | Functional requirements for Amadeus module | Product, QA, Business |

Executive Decision Summary

✅ Recommendation: Proceed with Kubernetes Deployment

Feasibility: GOOD - 4/5 stars
Effort: 6-8 weeks
Risk: Medium (manageable)
Cost: $400-900/month (cloud provider dependent)


Key Findings

Application Strengths ✅

  1. Stateless Design - No server-side sessions, perfect for K8s
  2. Embedded Server - Jetty 12, no external app server needed
  3. RESTful API - Standard JAX-RS, easy to containerize
  4. OAuth2 Auth - Identity in HTTP headers, pod-independent

Critical Requirements ⚠️

  1. Hazelcast Must Be Fixed - Hardcoded IP incompatible with K8s
  2. User Sessions Require Distributed Cache - No database fallback
  3. Health Checks Must Be Added - Required for K8s probes
  4. Secrets Must Be Externalized - 20+ hardcoded credentials
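
The health-check requirement maps onto container probes roughly as follows. This is a minimal sketch, assuming the API container listens on port 8080 and serves the /health/live and /health/ready endpoints named under Critical Path Items. The port, image tag, and all timing values are illustrative assumptions, not tuned settings.

```yaml
# Fragment of a Deployment pod template (spec.template.spec).
# Port 8080, the image tag, and the timings are assumptions for illustration.
containers:
  - name: api
    image: registry/tqpro/api:latest   # hypothetical tag
    ports:
      - containerPort: 8080
    livenessProbe:                     # restart the container if this fails
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:                    # remove the pod from Service endpoints if this fails
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
```

Keeping the liveness check free of DB and cache dependencies, and putting those checks only in the readiness endpoint, avoids restart loops when a downstream dependency is briefly unavailable.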

Hazelcast Decision: Keep and Fix

Question: Should we replace Hazelcast with Caffeine or Redis?

Answer: NO - Keep Hazelcast and configure for Kubernetes

Why Hazelcast is Required

Deep code analysis revealed:

  • User sessions are stored in the cache ONLY (no DB fallback)
  • Anonymous shopping carts are memory-only
  • Multi-pod deployment requires a distributed cache

Impact if Not Fixed:

  • ❌ Users are logged out when a request hits a different pod
  • ❌ Shopping carts are lost
  • ❌ The application cannot scale horizontally

Alternatives Considered

| Solution | Multi-Pod | Effort | Cost | Verdict |
|---|---|---|---|---|
| Caffeine | ❌ NO | 2 days | $0 | ❌ Breaks clustering |
| Hazelcast (fixed) | ✅ YES | 3-5 days | $0 | RECOMMENDED |
| Redis | ✅ YES | 2 weeks | $50-100/mo | ✅ Viable alternative |

Decision: Fix Hazelcast (lowest effort, no cost, backward compatible)
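
To illustrate what "fixing" Hazelcast could look like, the sketch below replaces a hardcoded member IP with Kubernetes API discovery in a hazelcast.yaml file and adds TTL/eviction settings for a session map. The cluster name, namespace, service name, map name, and every numeric value are assumptions chosen for illustration; the actual names and limits should come from the migration guide.

```yaml
# hazelcast.yaml - a sketch, not the project's actual configuration.
hazelcast:
  cluster-name: tqpro                  # hypothetical cluster name
  network:
    join:
      multicast:
        enabled: false                 # multicast is usually unavailable in K8s networks
      kubernetes:
        enabled: true
        namespace: tqpro               # hypothetical namespace
        service-name: tqpro-hazelcast  # hypothetical headless Service name
  map:
    user-sessions:                     # hypothetical map name
      backup-count: 1                  # keep one synchronous copy on another member
      time-to-live-seconds: 3600
      max-idle-seconds: 1800
      eviction:
        eviction-policy: LRU
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 25
```

With service-name set, members are looked up through the Kubernetes API, which is why a ServiceAccount with read access is listed as a prerequisite.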


Implementation Plan Overview

Phase 1: Code Changes (Week 1-2)

  • ✅ Add health check endpoints
  • ✅ Fix Hazelcast Kubernetes discovery
  • ✅ Externalize database configuration
  • ✅ Add graceful shutdown
  • ✅ Configure TTL/eviction policies
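
Graceful shutdown has a pod-level side in addition to the code change. A minimal sketch, assuming the container image ships a shell, the JVM picks up JAVA_TOOL_OPTIONS, and Hazelcast's own shutdown hook is in use (if the application manages the Hazelcast lifecycle itself, the system property may be unnecessary). The sleep duration and grace period are illustrative assumptions.

```yaml
# Deployment fragment; all values are illustrative assumptions.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30   # upper bound before the kubelet sends SIGKILL
      containers:
        - name: api
          env:
            - name: JAVA_TOOL_OPTIONS
              # Ask Hazelcast to migrate partitions before the member leaves.
              value: "-Dhazelcast.shutdownhook.policy=GRACEFUL"
          lifecycle:
            preStop:
              exec:
                # Short pause so endpoint removal propagates before SIGTERM arrives.
                command: ["sh", "-c", "sleep 5"]
```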

Phase 2: Configuration (Week 3)

  • ✅ Create ConfigMaps
  • ✅ Migrate secrets to K8s Secrets
  • ✅ Update environment variable injection
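
The three configuration steps fit together roughly as shown here. This is a sketch only: the object names, the variable names (DB_HOST, DB_PASSWORD, and so on), and the placeholder values are assumptions, since the real keys depend on how the application reads its configuration.

```yaml
# Non-sensitive settings.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tqpro-api-config        # hypothetical name
data:
  DB_HOST: postgres.example.internal
  DB_PORT: "5432"
  DB_NAME: tqpro
---
# Sensitive settings; in practice these would be created out of band, not committed.
apiVersion: v1
kind: Secret
metadata:
  name: tqpro-api-secrets       # hypothetical name
type: Opaque
stringData:
  DB_USER: change-me
  DB_PASSWORD: change-me
---
# Deployment fragment: inject both objects as environment variables.
spec:
  template:
    spec:
      containers:
        - name: api
          envFrom:
            - configMapRef:
                name: tqpro-api-config
            - secretRef:
                name: tqpro-api-secrets
```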

Phase 3: K8s Deployment (Week 4)

  • ✅ Deploy to dev cluster
  • ✅ Test multi-pod clustering
  • ✅ Validate cache consistency

Phase 4: Observability (Week 5)

  • ✅ Setup logging (EFK/Loki)
  • ✅ Setup monitoring (Prometheus/Grafana)
  • ✅ Configure alerts
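
As one example of an alert, the Prometheus rule file below fires when fewer than three API replicas are available. It assumes kube-state-metrics is installed (it provides the kube_deployment_status_replicas_available metric) and that the Deployment is named tqpro-api; the alert name, threshold duration, and severity label are illustrative.

```yaml
# Prometheus alerting rule file - a sketch under the assumptions stated above.
groups:
  - name: tqpro-api
    rules:
      - alert: TqproApiReplicasUnavailable   # hypothetical alert name
        expr: kube_deployment_status_replicas_available{deployment="tqpro-api"} < 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "tqpro-api has fewer than 3 available replicas"
```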

Phase 5: Hardening (Week 6)

  • ✅ Security scanning
  • ✅ Performance tuning
  • ✅ Load testing
  • ✅ DR planning

Phase 6: Production (Week 7-8)

  • ✅ Staging deployment
  • ✅ UAT
  • ✅ Production rollout (blue-green)
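
One common way to do a blue-green rollout with plain Kubernetes objects is to run two Deployments labelled by color and switch the Service selector between them. The sketch below assumes labels app and version and port 8080, none of which are specified in this summary.

```yaml
# Service fronting the API; the selector decides which color receives traffic.
apiVersion: v1
kind: Service
metadata:
  name: tqpro-api
spec:
  selector:
    app: tqpro-api
    version: blue        # change to "green" to cut traffic over; change back to roll back
  ports:
    - port: 80
      targetPort: 8080   # assumed container port
```

Because both colors would join the same Hazelcast cluster if they share discovery settings, the cutover plan should state whether sessions are expected to survive the switch.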

Resource Requirements

Compute (3-pod API cluster plus 2 web pods)

  • API Pods: 3 × (2 CPU, 4GB RAM) = 6 CPU, 12GB RAM
  • Web Pods: 2 × (0.5 CPU, 512MB) = 1 CPU, 1GB RAM
  • Total: ~7-8 CPU cores, ~13-15GB RAM
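
Expressed as container resource settings, the sizing above might look like this. The summary does not say whether the figures are requests or limits, so the sketch assumes both are set to the same value; the web container name is also an assumption.

```yaml
# API pod container (3 replicas): 2 CPU, 4 GB each.
containers:
  - name: api
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
---
# Web pod container (2 replicas): 0.5 CPU, 512 MB each.
containers:
  - name: web            # hypothetical container name
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 512Mi
```

The JVM heap for the API container should be sized below the 4Gi limit so that Hazelcast data and off-heap overhead fit inside it.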

Storage

  • Documents PVC: 50GB (ReadWriteMany - EFS/Azure Files)
  • Database: External PostgreSQL (managed service)
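
A claim for the documents volume could be declared as below. The claim name and storage class are assumptions; the class must be backed by a provisioner that supports ReadWriteMany, such as EFS or Azure Files.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tqpro-documents        # hypothetical name
spec:
  accessModes:
    - ReadWriteMany            # all API pods mount the same volume
  storageClassName: efs-sc     # assumption; depends on the cluster's provisioner
  resources:
    requests:
      storage: 50Gi
```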

Monthly Cost Estimate

  • AWS EKS: ~$606/month
  • Azure AKS: ~$982/month
  • GCP GKE: ~$821/month
  • Optimized: $400-500/month

Critical Path Items

Must Complete Before Multi-Pod

  1. Hazelcast Kubernetes Discovery (3-5 days)
     • Upgrade Hazelcast to 5.3.6
     • Add K8s plugin
     • Configure service discovery
     • Add TTL/eviction policies

  2. Health Check Endpoints (4 hours)
     • /health/live - liveness probe
     • /health/ready - readiness probe (DB + cache)
     • /health/cache - Hazelcast cluster health

  3. RBAC for Hazelcast (30 minutes)
     • ServiceAccount
     • Role (endpoints, pods, services read)
     • RoleBinding

  4. Headless Service (15 minutes)
     • clusterIP: None
     • Port: 5701
     • publishNotReadyAddresses: true
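
The RBAC and headless Service items can be sketched as the manifests below. The object names, the tqpro namespace, and the app: tqpro-api pod label are assumptions; the resources, port, and Service fields follow the list above.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tqpro-hazelcast              # hypothetical name
  namespace: tqpro
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tqpro-hazelcast-discovery
  namespace: tqpro
rules:
  - apiGroups: [""]
    resources: ["endpoints", "pods", "services"]
    verbs: ["get", "list"]           # read-only access for member discovery
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tqpro-hazelcast-discovery
  namespace: tqpro
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: tqpro-hazelcast-discovery
subjects:
  - kind: ServiceAccount
    name: tqpro-hazelcast
    namespace: tqpro
---
apiVersion: v1
kind: Service
metadata:
  name: tqpro-hazelcast
  namespace: tqpro
spec:
  clusterIP: None                    # headless: resolves to pod IPs directly
  publishNotReadyAddresses: true     # let members find each other before readiness passes
  selector:
    app: tqpro-api                   # assumed pod label
  ports:
    - name: hazelcast
      port: 5701
      targetPort: 5701
```

The API Deployment would also need serviceAccountName set to the ServiceAccount above for the Role to take effect.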

Testing Strategy

Unit Tests

  • Health API endpoints
  • Database env var configuration
  • Hazelcast cluster formation

Integration Tests

  • Container builds successfully
  • Multi-pod Hazelcast clustering
  • Cache consistency across pods

Load Tests

  • 100 concurrent users
  • Cache hit/miss ratios
  • Response times < 500ms (p95)

Chaos Tests

  • Pod kills (random)
  • Node drains
  • Network partitions
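
Node-drain tests are easier to interpret when a PodDisruptionBudget defines how many API pods may be evicted at once. The sketch below assumes the pods carry the label app: tqpro-api and that two of the three pods must stay available; neither value comes from this summary.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: tqpro-api
spec:
  minAvailable: 2          # assumption: tolerate one voluntary eviction at a time
  selector:
    matchLabels:
      app: tqpro-api       # assumed pod label
```

Note that a budget only governs voluntary disruptions such as drains; random pod kills bypass it.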

Success Criteria

Functional

  • ✅ 3-pod cluster forms Hazelcast cluster
  • ✅ User sessions shared across pods
  • ✅ Shopping carts accessible from any pod
  • ✅ Health checks respond correctly
  • ✅ Bare-metal deployment still works

Performance

  • ✅ Response time p95 < 1 second
  • ✅ Cache operations < 100ms
  • ✅ Cluster formation < 30 seconds
  • ✅ Error rate < 0.1%

Operational

  • ✅ Graceful shutdown < 30 seconds
  • ✅ No cache-related errors under load
  • ✅ Monitoring dashboards functional
  • ✅ Alerts working

Risks & Mitigation

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Hazelcast clustering fails | Medium | High | Thorough testing; Redis fallback plan |
| User session loss | Medium | Critical | Add session persistence to DB (future) |
| Memory leaks | Low | High | TTL + eviction policies configured |
| Network latency | Low | Medium | Keep pods in same AZ |
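
The "keep pods in same AZ" mitigation can be expressed as a scheduling preference. A minimal sketch, assuming the API pods carry the label app: tqpro-api and that nodes expose the standard topology.kubernetes.io/zone label; it is written as a soft preference so pods still schedule if one zone is full.

```yaml
# Pod template fragment (spec.template.spec).
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: tqpro-api                       # assumed pod label
          topologyKey: topology.kubernetes.io/zone # co-locate within one zone
```

Co-locating all members in one zone lowers latency but also means a zone outage takes down the whole cache cluster, so this trades availability for speed.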

Rollback Plan

Immediate (< 5 minutes)

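# Option A: roll back to the previous Deployment revision
kubectl rollout undo deployment/tqpro-api
# Option B: fall back to a single pod, removing the dependency on cross-pod clustering
kubectl scale deployment tqpro-api --replicas=1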

Code Rollback

git revert <commit>
./gradlew clean build
docker build -t registry/tqpro/api:rollback .
# Push the image so the cluster can pull it
docker push registry/tqpro/api:rollback
kubectl set image deployment/tqpro-api api=registry/tqpro/api:rollback

Next Steps

Immediate Actions Required

  1. ☐ Review and approve deployment plan
  2. ☐ Provision development K8s cluster
  3. ☐ Assign development team (2-3 engineers)
  4. ☐ Allocate budget ($400-900/month cloud costs)

Week 1 Deliverables

  1. ☐ Hazelcast code changes completed
  2. ☐ Health check endpoints added
  3. ☐ Local testing passed
  4. ☐ Code review completed

Decision Point

  • Approve - Begin Phase 1 immediately
  • Defer - Revisit in 6 months
  • Pilot - Dev/staging only

Contact & Support

Technical Lead: [Name]
DevOps Lead: [Name]
Project Manager: [Name]

Documentation:

  • Full deployment plan: KUBERNETES_DEPLOYMENT_PLAN.md
  • Hazelcast migration: HAZELCAST_KUBERNETES_MIGRATION.md
  • tqamds requirements: tqamds/REQUIREMENTS_SPECIFICATION.md


Prepared By: DevOps & Platform Team
Date: 2024-11-23
Version: 1.0