Crossplane Operations & Day-2 Management Training
Master Crossplane production operations in 2 days. Hands-on training covering observability, provider management, Operations type, disaster recovery, and scaling to thousands of resources.
Take your Crossplane deployment from working to production-grade with this 2-day advanced operations training. Learn to monitor and troubleshoot at scale using Prometheus metrics and the beta trace CLI, manage provider families efficiently, implement the Operations type for declarative day-2 workflows, and tune performance for environments with thousands of managed resources.
Training Details
| Detail | Value |
|---|---|
| Duration | 2 days (16 hours) |
| Level | Advanced |
| Delivery | In-person, Live online, Hybrid |
| Certification | N/A |
Who Is This For?
- SREs and platform operators running Crossplane in production
- DevOps engineers managing infrastructure at scale with Crossplane
- Teams scaling Crossplane beyond proof-of-concept to enterprise workloads
- Operations engineers responsible for Crossplane reliability and uptime
Learning Outcomes
After completing this training, participants will be able to:
- Design production architecture patterns for Crossplane at scale
- Monitor Crossplane with Prometheus metrics and custom dashboards
- Troubleshoot resource issues using the crossplane beta trace command, including trace --watch
- Manage provider families and optimize resource consumption
- Implement the Operations type for upgrades, backups, and scheduled maintenance
- Tune provider concurrency, rate limiting, and function caching for performance
Detailed Agenda
Day 1: Production Observability & Provider Management
Module 1: Production Architecture Patterns
- High-availability Crossplane deployments
- Control plane sizing — CPU, memory, and etcd considerations
- Separation of concerns — dedicated vs shared control planes
- Multi-cluster Crossplane topologies and their trade-offs
- Hands-on: Deploy a production-grade Crossplane control plane with HA configuration
Module 2: Provider Families and Resource Optimization
- Provider families — monolithic vs family providers
- Choosing the right provider granularity for your use case
- Provider resource consumption — memory profiling and right-sizing
- Managing multiple provider versions across environments
- ProviderConfig best practices for credential rotation
- Hands-on: Migrate from monolithic providers to family providers and measure resource savings
Module 3: Monitoring with Prometheus Metrics
- Crossplane-native Prometheus metrics — reconcile duration, queue depth, errors
- Provider-specific metrics and custom metric endpoints
- Building Grafana dashboards for resource provisioning visibility
- Alert rules for failed reconciliations, stale resources, and provider errors
- SLO definition for infrastructure provisioning latency and success rate
- Hands-on: Deploy Prometheus and Grafana, build operational dashboards, and configure alerting rules
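To make the SLO bullet above concrete, here is a minimal Python sketch of checking a provisioning SLO against a batch of attempts. The `ProvisionEvent` record, the 99% success target, and the 300-second p95 target are illustrative assumptions, not values taken from Crossplane or from the course labs; in practice the inputs would come from your metrics backend.

```python
import math
from dataclasses import dataclass


@dataclass
class ProvisionEvent:
    """One provisioning attempt (hypothetical record shape)."""
    resource: str
    seconds_to_ready: float
    succeeded: bool


def success_rate(events):
    """Fraction of attempts that reached a ready state."""
    if not events:
        raise ValueError("no events to evaluate")
    return sum(e.succeeded for e in events) / len(events)


def latency_percentile(events, pct):
    """Nearest-rank percentile of time-to-ready over successful attempts."""
    latencies = sorted(e.seconds_to_ready for e in events if e.succeeded)
    if not latencies:
        raise ValueError("no successful events to evaluate")
    rank = math.ceil(pct / 100 * len(latencies))
    return latencies[max(rank, 1) - 1]


def meets_slo(events, min_success=0.99, max_p95_seconds=300.0):
    """True when both the success-rate and p95 latency targets hold (assumed targets)."""
    return (
        success_rate(events) >= min_success
        and latency_percentile(events, 95) <= max_p95_seconds
    )
```

With three successful attempts taking 10, 20, and 30 seconds plus one failed attempt, `success_rate` returns 0.75 and the p95 time-to-ready is 30 seconds, so the default targets are not met.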
Module 4: Tracing and Debugging at Scale
- Beta trace CLI — tracing resource dependency chains
- trace --watch for real-time resource state monitoring
- Debugging stuck resources — finalizers, conditions, and events
- Drift detection and remediation strategies
- Identifying root causes in complex composition hierarchies
- Hands-on: Use beta trace to debug a multi-layer composition failure and resolve drift on managed resources
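Drift detection, listed above, comes down to comparing the fields you declared with what the external system reports. The Python sketch below is a generic illustration of that comparison, not how any Crossplane provider implements it; the function name `find_drift` and the field names in the usage note are hypothetical.

```python
def find_drift(desired, observed, path=""):
    """Return {dotted.path: (desired_value, observed_value)} for each
    declared field whose observed value differs.

    Fields that exist only in `observed` are ignored, because external
    systems usually report more than was declared.
    """
    drift = {}
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        have = observed.get(key)
        if isinstance(want, dict):
            # Recurse into nested blocks; treat a missing block as empty.
            nested = have if isinstance(have, dict) else {}
            drift.update(find_drift(want, nested, here))
        elif want != have:
            drift[here] = (want, have)
    return drift
```

For example, a desired state of `{"region": "eu-west-1", "tags": {"team": "platform"}}` compared against an observed state whose `tags.team` is `"data"` yields `{"tags.team": ("platform", "data")}`.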
Day 2: Operations Type & Performance at Scale
Module 5: The Operations Type
- Operations resource — declarative day-2 task execution
- One-off operations — database migrations, certificate rotations, cache flushes
- Scheduled operations — automated backups, compliance scans, cleanup jobs
- Event-driven operations — triggered by resource state changes or webhooks
- Operation status tracking and failure handling
- Hands-on: Create Operations for automated database backup, provider upgrade, and scheduled resource cleanup
Module 6: Upgrade Strategies and Disaster Recovery
- Crossplane core upgrade paths — minor and major versions
- Provider upgrade strategies — canary, blue-green, rolling
- Composition revision management during upgrades
- Backup strategies for Crossplane state — etcd snapshots, Velero integration
- Disaster recovery runbooks — control plane restoration and resource re-adoption
- Hands-on: Perform a controlled Crossplane upgrade, simulate a control plane failure, and execute disaster recovery
Module 7: Performance Tuning
- Provider concurrency configuration — parallel reconciliation tuning
- API rate limiting — cloud provider quota management and backoff strategies
- Function response caching — TTL optimization for frequently composed resources
- etcd performance — compaction, defragmentation, and size management
- Resource watch optimization — label selectors and informer tuning
- Hands-on: Tune provider concurrency and rate limits, configure function caching, and benchmark reconciliation throughput
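The backoff strategies mentioned above typically follow a capped exponential schedule, optionally with random jitter so that many retries do not hit the cloud API at the same moment. The sketch below is a generic Python illustration, not the rate limiter that Crossplane providers ship; the 1-second base and 60-second cap are assumed values chosen for readability.

```python
import random
from typing import Optional


def backoff_delay(attempt, base=1.0, cap=60.0, rng: Optional[random.Random] = None):
    """Seconds to wait before retry number `attempt` (0-based).

    Without `rng` the delay is the capped exponential ceiling itself.
    With `rng` the delay is drawn uniformly from [0, ceiling] ("full jitter").
    """
    ceiling = min(cap, base * (2 ** attempt))
    if rng is None:
        return ceiling
    return rng.uniform(0.0, ceiling)


def schedule(retries, **kwargs):
    """Delays for the first `retries` attempts, in order."""
    return [backoff_delay(n, **kwargs) for n in range(retries)]
```

Calling `schedule(5)` returns `[1.0, 2.0, 4.0, 8.0, 16.0]`, and any attempt from the sixth retry onward is held at the 60-second cap.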
Module 8: Scaling to Thousands of Resources
- Scaling patterns — horizontal vs vertical control plane scaling
- Sharding strategies for large resource inventories
- Managing resource explosion in deeply nested Compositions
- Garbage collection and orphan resource cleanup
- Capacity planning — forecasting control plane growth
- Production readiness checklist and operational runbook creation
- Hands-on: Load test a Crossplane control plane with hundreds of resources, identify bottlenecks, and apply scaling optimizations
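One simple way to approach the sharding bullet above is to assign each resource to a control plane by hashing a stable key. The Python sketch below shows plain modulo hashing under assumed conventions: `shard_for`, `shard_inventory`, and the "namespace/name" key format are hypothetical, and nothing here is a Crossplane feature.

```python
import hashlib


def shard_for(resource_key, shard_count):
    """Map a stable key (for example "namespace/name") to a shard index."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(resource_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_inventory(resource_keys, shard_count):
    """Group keys by shard index; every shard appears, even if empty."""
    shards = {index: [] for index in range(shard_count)}
    for key in resource_keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards
```

The assignment is deterministic, so the same key always lands on the same shard for a given shard count. Changing the shard count reassigns most keys under this scheme, which is the main trade-off to weigh against approaches such as consistent hashing.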
Prerequisites
- Crossplane experience in production environments
- Kubernetes administration skills (CKA-level recommended)
- Familiarity with Prometheus, Grafana, and Kubernetes monitoring
- Understanding of cloud provider APIs and rate limiting
Delivery Formats
| Format | Description |
|---|---|
| In-Person | On-site at your company's location, hands-on with direct interaction |
| Live Online | Interactive virtual sessions with screen sharing and real-time labs |
| Hybrid | Combination of on-site and remote sessions, flexible scheduling |
All formats include hands-on labs, course materials, and post-training support.
Ready to get started?
Request a training quote for your team — in-person, live-online, or hybrid.