Bazel Remote Execution and Caching Training
Master Bazel remote caching, Remote Build Execution (RBE), performance profiling, and scaling strategies in 3 days.
Scale Bazel builds to thousands of developers and millions of targets with remote caching and Remote Build Execution. This 3-day training covers cache architecture, RBE infrastructure with BuildFarm and BuildBarn, persistent workers, Build Event Protocol monitoring, performance profiling, and the query/cquery/aquery trifecta for deep build analysis.
Training Details
| Item | Details |
|---|---|
| Duration | 3 days (24 hours) |
| Level | Advanced |
| Delivery | In-person, Live online, Hybrid |
| Certification | N/A |
Who Is This For?
- Build infrastructure engineers operating Bazel at scale
- Platform engineers deploying remote execution clusters
- DevOps engineers optimizing CI/CD build performance
- SREs managing build system reliability and capacity
- Engineering managers evaluating Bazel infrastructure investment
Learning Outcomes
After completing this training, participants will be able to:
- Configure remote caching with disk, HTTP, gRPC, and cloud storage backends
- Deploy and operate Remote Build Execution infrastructure with BuildFarm or BuildBarn
- Configure persistent workers and multiplex workers for JVM and other runtimes
- Monitor builds using Build Event Protocol and Build Event Service
- Profile and optimize build performance using Bazel's built-in profiling tools
- Use bazel query, cquery, and aquery for deep dependency and action analysis
Detailed Agenda
Day 1: Remote Caching Architecture
Module 1: Remote Caching Fundamentals
- Cache key computation: action digests from inputs, command, and environment
- Hermeticity requirements: why non-hermetic actions poison the cache
- Action cache (AC) vs. content-addressable storage (CAS): action results keyed by action digest, blobs keyed by content digest
- Cache hit workflow: action lookup, CAS download, output tree reconstruction
- --remote_upload_local_results and --remote_accept_cached flags
- Hands-on: Enable local disk caching with --disk_cache, observe cache hit rates across incremental builds
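The cache key idea in this module can be sketched in a few lines. This is a deliberately simplified model, not Bazel's actual implementation: Bazel hashes serialized REAPI messages and a Merkle tree of the input root, whereas the hypothetical `action_digest` helper below hashes a JSON payload. It shows why an environment leak or an input edit produces a different key.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def action_digest(command: list[str], env: dict[str, str], inputs: dict[str, bytes]) -> str:
    """Toy cache key: hash of the command line, environment, and input digests."""
    payload = {
        "command": command,
        "env": sorted(env.items()),
        # Inputs contribute by path and content digest, never by timestamp.
        "inputs": sorted((path, sha256_hex(blob)) for path, blob in inputs.items()),
    }
    return sha256_hex(json.dumps(payload).encode("utf-8"))


cmd = ["gcc", "-c", "a.c"]
base = action_digest(cmd, {"PATH": "/usr/bin"}, {"a.c": b"int x;"})
same = action_digest(cmd, {"PATH": "/usr/bin"}, {"a.c": b"int x;"})
# A leaked USER variable changes the key, so every developer misses the cache.
leaky = action_digest(cmd, {"PATH": "/usr/bin", "USER": "alice"}, {"a.c": b"int x;"})
# Editing the source changes the input digest and therefore the key.
edited = action_digest(cmd, {"PATH": "/usr/bin"}, {"a.c": b"int y;"})
```

Identical inputs, command, and environment always give the same key, which is what makes a cache hit safe to reuse.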
Module 2: HTTP and gRPC Cache Backends
- HTTP cache protocol: PUT/GET on action cache and CAS endpoints
- Nginx, Bazel Remote Cache (buchgr/bazel-remote), and Apache Traffic Server as HTTP caches
- gRPC remote cache: Remote Execution API (REAPI) cache-only mode
- Authentication: TLS, mTLS, --remote_header for bearer tokens
- Cache eviction strategies: LRU, TTL, size-based limits
- Hands-on: Deploy bazel-remote as an HTTP cache server, configure Bazel to use it, measure cache hit rates
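To make the HTTP cache layout and LRU eviction concrete, here is a minimal in-memory sketch. It assumes the `/ac/<hash>` and `/cas/<hash>` path convention used by Bazel's HTTP cache protocol; the `LruBlobStore` class and `parse_cache_path` helper are hypothetical names for illustration and do not reflect how bazel-remote or any other server is implemented.

```python
from collections import OrderedDict
from typing import Optional, Tuple


def parse_cache_path(path: str) -> Tuple[str, str]:
    """Split '/ac/<hash>' or '/cas/<hash>' into (store, key)."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] not in ("ac", "cas"):
        raise ValueError(f"unsupported cache path: {path}")
    return parts[0], parts[1]


class LruBlobStore:
    """Size-bounded blob store that evicts the least recently used entry first."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._blobs = OrderedDict()

    def put(self, path: str, blob: bytes) -> None:
        key = parse_cache_path(path)
        if key in self._blobs:
            # Overwrite: release the old size before storing the new blob.
            self.used_bytes -= len(self._blobs.pop(key))
        self._blobs[key] = blob
        self.used_bytes += len(blob)
        while self.used_bytes > self.max_bytes and self._blobs:
            _, evicted = self._blobs.popitem(last=False)  # oldest entry
            self.used_bytes -= len(evicted)

    def get(self, path: str) -> Optional[bytes]:
        key = parse_cache_path(path)
        blob = self._blobs.get(key)
        if blob is not None:
            self._blobs.move_to_end(key)  # mark as recently used
        return blob
```

A real server would also validate CAS uploads against their digest and persist blobs to disk; both are omitted here to keep the eviction logic visible.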
Module 3: Cloud Storage Cache Backends
- Google Cloud Storage (GCS): --remote_cache=https://storage.googleapis.com/<bucket-name> with --google_default_credentials
- Amazon S3: S3-compatible backends with HTTP cache proxies
- Azure Blob Storage: configuration via HTTP cache middleware
- Cache partitioning: per-project, per-branch, per-platform namespaces
- Cache warming strategies: CI populates cache, developers consume
- Hands-on: Configure remote caching with a GCS or S3-compatible backend, set up cache partitioning by branch
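One way to express per-project, per-branch, per-platform partitioning is to derive a namespace and emit it as `.bazelrc` lines. The sketch below is an assumption-laden illustration: `cache_instance_name` and `bazelrc_lines` are hypothetical helpers, and whether `--remote_instance_name` actually isolates entries depends on the cache backend in use.

```python
import re


def cache_instance_name(project: str, branch: str, platform: str) -> str:
    """Build a namespace such as 'payments/feature-login/linux-x86_64'."""

    def clean(part: str) -> str:
        # Replace anything outside a conservative character set with '-'.
        return re.sub(r"[^A-Za-z0-9._-]+", "-", part).strip("-").lower()

    return "/".join(clean(p) for p in (project, branch, platform))


def bazelrc_lines(cache_url: str, project: str, branch: str,
                  platform: str, read_only: bool) -> list[str]:
    """Generate .bazelrc lines for one cache partition."""
    instance = cache_instance_name(project, branch, platform)
    lines = [
        f"build --remote_cache={cache_url}",
        f"build --remote_instance_name={instance}",
    ]
    if read_only:
        # Cache warming pattern: CI uploads, developers only download.
        lines.append("build --noremote_upload_local_results")
    return lines
```

In this pattern a CI job would call `bazelrc_lines(..., read_only=False)` and developer machines would use `read_only=True`.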
Module 4: Cache Debugging and Correctness
- --execution_log_binary_file and --execution_log_json_file for action comparison
- Identifying non-hermetic actions: environment leaks, absolute paths, timestamps
- --experimental_remote_cache_eviction_retries for retrying a build when remote blobs are evicted mid-build
- Cache poisoning detection and recovery: --remote_cache_header for invalidation
- Monitoring cache performance: hit rate, download/upload latency
- Hands-on: Diagnose cache misses using execution logs, identify a non-hermetic action, fix it for cache correctness
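The execution-log comparison can be sketched as a diff over per-action records from two builds. The record shape used here (`output`, `command`, `env`, `inputs`) is a simplified assumption for illustration, not the exact schema of Bazel's execution log; a real script would first convert the log entries into this form.

```python
def diff_actions(run_a: list[dict], run_b: list[dict]) -> dict[str, list[str]]:
    """Return, per action output, the reasons its cache key differs between runs."""
    by_output_b = {rec["output"]: rec for rec in run_b}
    report = {}
    for rec_a in run_a:
        rec_b = by_output_b.get(rec_a["output"])
        if rec_b is None:
            continue  # the action ran in only one build; nothing to compare
        reasons = []
        for field in ("env", "inputs"):
            a, b = rec_a[field], rec_b[field]
            for key in sorted(set(a) | set(b)):
                if a.get(key) != b.get(key):
                    reasons.append(f"{field}[{key}]: {a.get(key)!r} != {b.get(key)!r}")
        if rec_a["command"] != rec_b["command"]:
            reasons.append("command line differs")
        if reasons:
            report[rec_a["output"]] = reasons
    return report


ci_run = [
    {"output": "bin/a.o", "command": ["gcc", "-c", "a.c"],
     "env": {"PATH": "/usr/bin"}, "inputs": {"a.c": "d1"}},
    {"output": "bin/b.o", "command": ["gcc", "-c", "b.c"],
     "env": {"PATH": "/usr/bin"}, "inputs": {"b.c": "d2"}},
]
dev_run = [
    # The developer's PATH leaked into the action environment.
    {"output": "bin/a.o", "command": ["gcc", "-c", "a.c"],
     "env": {"PATH": "/home/alice/bin:/usr/bin"}, "inputs": {"a.c": "d1"}},
    {"output": "bin/b.o", "command": ["gcc", "-c", "b.c"],
     "env": {"PATH": "/usr/bin"}, "inputs": {"b.c": "d2"}},
]
report = diff_actions(ci_run, dev_run)
```

Only `bin/a.o` appears in the report, pointing at the `PATH` difference as the likely cause of the miss.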
Day 2: Remote Build Execution
Module 5: RBE Architecture and REAPI
- Remote Execution API (REAPI): Execute, ActionCache, ContentAddressableStorage, Capabilities services
- Architecture: client (Bazel), scheduler, workers, CAS storage
- Execution flow: upload inputs to CAS, submit action, poll result, download outputs
- Platform properties: OSFamily, container-image, Pool, resource requirements
- --remote_executor, --remote_instance_name, --remote_default_exec_properties
- Hands-on: Explore REAPI with grpcurl, inspect action cache entries and CAS blobs
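The execution flow listed above (upload missing inputs, submit the action, fetch outputs) can be modelled with a toy in-memory executor. Everything here is a hypothetical stand-in: `ToyRemoteExecutor` collapses the scheduler, worker, CAS, and action cache into one object, and only the `find_missing_blobs` name mirrors a real REAPI call.

```python
import hashlib


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


class ToyRemoteExecutor:
    """In-memory stand-in for the CAS, action cache, and a single worker."""

    def __init__(self) -> None:
        self.cas = {}           # content digest -> blob
        self.action_cache = {}  # action key -> output digest
        self.executions = 0

    def find_missing_blobs(self, digests):
        return [d for d in digests if d not in self.cas]

    def upload(self, blob: bytes) -> str:
        d = digest(blob)
        self.cas[d] = blob
        return d

    def execute(self, action_key, input_digests, run):
        cached = self.action_cache.get(action_key)
        if cached is not None:
            return cached, True  # cache hit: no worker involved
        inputs = [self.cas[d] for d in input_digests]
        output_digest = self.upload(run(inputs))
        self.executions += 1
        self.action_cache[action_key] = output_digest
        return output_digest, False


def remote_build(executor, command: str, inputs: list[bytes], run):
    """Client side: upload what is missing, execute, download the output."""
    input_digests = [digest(b) for b in inputs]
    missing = set(executor.find_missing_blobs(input_digests))
    for blob in inputs:
        if digest(blob) in missing:
            executor.upload(blob)
    action_key = digest((command + "".join(input_digests)).encode())
    output_digest, cache_hit = executor.execute(action_key, input_digests, run)
    return executor.cas[output_digest], cache_hit
```

Running the same action twice executes it once; the second call is served from the action cache.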
Module 6: BuildFarm and BuildBarn Deployment
- BuildFarm: Java-based REAPI server, architecture (server, worker, memory/shard instance)
- BuildBarn: Go-based REAPI server, bb-storage, bb-scheduler, bb-worker, bb-browser
- EngFlow and BuildBuddy: comparison of managed RBE services
- Container-based workers: Docker images with toolchains pre-installed
- Capacity planning: worker pools, queue depth, CAS storage sizing
- Hands-on: Deploy BuildFarm or BuildBarn locally with Docker Compose, connect Bazel and run remote builds
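For capacity planning, a first-order estimate can come from Little's law: the average number of busy workers equals the action arrival rate times the mean action duration. The sketch below is a rough back-of-the-envelope model under that assumption; the function names, the utilization target, and the retention-based CAS estimate are illustrative, not vendor guidance.

```python
import math


def workers_needed(actions_per_second: float, mean_action_seconds: float,
                   target_utilization: float = 0.7) -> int:
    """Little's law: busy workers = arrival rate x mean service time."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy = actions_per_second * mean_action_seconds
    # Small epsilon guards against float noise pushing ceil() up by one.
    return math.ceil(busy / target_utilization - 1e-9)


def cas_gib_needed(daily_new_gib: int, retention_days: int) -> int:
    """Storage needed to retain every newly uploaded blob for the retention window."""
    return daily_new_gib * retention_days
```

For example, 50 actions per second at 4 seconds each keeps 200 workers busy on average, so a pool sized for 50% utilization would need 400 workers.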
Module 7: Worker Strategies and Configuration
- Execution strategies: remote, local, sandboxed, worker, dynamic
- Dynamic execution: --internal_spawn_scheduler with --dynamic_local_strategy and --dynamic_remote_strategy, racing local vs remote
- --jobs flag: parallelism for remote execution (higher than local CPU count)
- --remote_download_outputs: toplevel, minimal, all for output management
- Build without the Bytes (BwtB): --remote_download_minimal for build farms
- Hands-on: Configure dynamic execution to race local and remote strategies, tune --jobs for optimal throughput
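The racing idea behind dynamic execution can be illustrated with two concurrent tasks where the first result wins. This is only an analogy in plain Python threads, not how Bazel schedules spawns: `fast_local` and `slow_remote` are hypothetical stand-ins, and unlike Bazel this sketch cannot interrupt the losing branch once it has started.

```python
import concurrent.futures
import time


def race(local_fn, remote_fn):
    """Run both strategies concurrently; return (winner_name, result)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(local_fn): "local", pool.submit(remote_fn): "remote"}
        done, pending = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED
        )
        for future in pending:
            future.cancel()  # best effort; a running thread keeps going
        winner = next(iter(done))
        # Leaving the 'with' block still waits for the loser to finish.
        return futures[winner], winner.result()


def fast_local():
    time.sleep(0.01)  # simulated quick incremental compile
    return "local-output"


def slow_remote():
    time.sleep(0.3)  # simulated network round trip plus queueing
    return "remote-output"
```

Calling `race(fast_local, slow_remote)` returns the local result here; with a cold local machine and a warm remote cache the outcome would typically flip.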
Module 8: Persistent Workers and Multiplex Workers
- Persistent worker protocol: stdin/stdout JSON or protobuf communication
- Worker benefits: amortize JVM startup, maintain warm caches (javac, kotlinc, scalac)
- --worker_max_instances, --worker_sandboxing, --worker_multiplex (formerly --experimental_worker_multiplex)
- Multiplex workers: multiple requests on a single worker process
- Writing custom persistent workers: WorkRequest/WorkResponse protocol
- Hands-on: Enable persistent workers for Java compilation, measure build time improvement, inspect worker processes
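A custom persistent worker using the JSON flavor of the protocol is essentially a request loop. The sketch below assumes one JSON `WorkRequest` per input line and the camelCase field names `arguments`, `requestId`, `exitCode`, and `output`; the `handle` function is a hypothetical placeholder for real tool logic, and cancellation, sandboxing, and multiplexing are left out.

```python
import json


def handle(arguments: list[str]) -> tuple[int, str]:
    """Hypothetical tool logic: 'compile' by echoing the arguments."""
    if not arguments:
        return 1, "no arguments given"
    return 0, "compiled " + " ".join(arguments)


def serve(requests, responses) -> None:
    """Answer one WorkRequest per input line until the input stream closes."""
    for line in requests:
        if not line.strip():
            continue
        request = json.loads(line)
        exit_code, output = handle(request.get("arguments", []))
        response = {
            "exitCode": exit_code,
            "output": output,
            # Echo the request id so the client can match responses to requests.
            "requestId": request.get("requestId", 0),
        }
        responses.write(json.dumps(response) + "\n")
        responses.flush()  # the client blocks until it sees the response
```

In a real worker process the loop would be started as `serve(sys.stdin, sys.stdout)`, and nothing else may be written to stdout because it carries the protocol.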
Day 3: Monitoring, Profiling, and Scaling
Module 9: Build Event Protocol (BEP)
- BEP: structured stream of build lifecycle events
- --build_event_binary_file and --build_event_json_file for local capture
- Build Event Service (BES): --bes_backend for streaming to a remote service
- BEP events: BuildStarted, ActionExecuted, TestResult, BuildFinished, TargetComplete
- BES servers: BuildBuddy, EngFlow, custom implementations
- Hands-on: Capture BEP output, parse events to extract test results and action timing data
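Extracting test results from a `--build_event_json_file` capture can be sketched as a line-by-line JSON scan. The field names used here (`id.testResult.label` and `testResult.status`) are an assumption based on the JSON rendering of the BEP messages and should be checked against a real capture; `summarize_tests` is a hypothetical helper.

```python
import json


def summarize_tests(bep_lines) -> dict[str, str]:
    """Map each test label to its reported status, keeping the last attempt seen."""
    statuses = {}
    for line in bep_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        test_id = event.get("id", {}).get("testResult")
        payload = event.get("testResult")
        if test_id is None or payload is None:
            continue  # not a TestResult event
        statuses[test_id["label"]] = payload.get("status", "NO_STATUS")
    return statuses


sample_events = [
    '{"id": {"started": {}}, "started": {"command": "test"}}',
    '{"id": {"testResult": {"label": "//app:unit_test"}}, "testResult": {"status": "PASSED"}}',
    '{"id": {"testResult": {"label": "//lib:io_test"}}, "testResult": {"status": "FAILED"}}',
]
summary = summarize_tests(sample_events)
```

With a real file, the same function can be fed an open file handle, for example `summarize_tests(open("bep.json"))`.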
Module 10: Performance Profiling and Optimization
- --profile flag: writes a JSON trace in the trace event format, viewable in Chrome via chrome://tracing
- Critical path analysis: identifying the longest sequential chain
- Action graph analysis: --execution_log_binary_file for action-level timing
- Memory profiling: --heap_dump_on_oom, --host_jvm_args for Bazel JVM tuning
- Skyframe profiling: understanding Bazel's internal evaluation framework
- --experimental_profile_include_target_label for per-target timing
- Hands-on: Profile a large build, identify the critical path, optimize by restructuring targets to increase parallelism
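Critical path analysis amounts to finding the longest duration-weighted chain in the action dependency graph. The sketch below shows that computation on a toy graph, assuming action durations and dependencies have already been extracted from a profile; `critical_path` is an illustrative helper, not a Bazel API.

```python
from graphlib import TopologicalSorter


def critical_path(durations: dict[str, float],
                  deps: dict[str, list[str]]) -> tuple[float, list[str]]:
    """Return (total_seconds, actions) for the longest dependency chain."""
    finish = {}     # earliest finish time of each action
    best_pred = {}  # predecessor on the longest chain into each action
    graph = {node: deps.get(node, []) for node in durations}
    # static_order() yields each action after all of its dependencies.
    for node in TopologicalSorter(graph).static_order():
        pred = max(deps.get(node, []), key=lambda p: finish[p], default=None)
        best_pred[node] = pred
        finish[node] = durations[node] + (finish[pred] if pred is not None else 0.0)
    end = max(finish, key=finish.get)
    total = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = best_pred[end]
    return total, path[::-1]


durations = {"gen": 2, "compile_a": 5, "compile_b": 1, "link": 3}
deps = {"compile_a": ["gen"], "compile_b": ["gen"], "link": ["compile_a", "compile_b"]}
total, path = critical_path(durations, deps)
```

Here the chain `gen`, `compile_a`, `link` takes 10 seconds, so no amount of extra parallelism helps until `compile_a` is split or sped up.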
Module 11: Query, Cquery, and Aquery Mastery
- bazel query: loading-phase graph analysis, deps(), rdeps(), allpaths(), somepath()
- bazel cquery: post-configuration graph, resolving select() and transitions
- bazel aquery: action graph, inputs/outputs/command lines for each action
- Output formats: label, build, graph, proto, jsonproto, streamed_proto
- Advanced query patterns: finding unused deps, detecting circular dependencies, change impact analysis
- Hands-on: Use query to find all transitive deps, cquery to resolve platform-specific select(), aquery to inspect exact compiler commands
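The semantics of `deps()` and `rdeps()` used for change impact analysis can be illustrated on a small hand-written graph. This is a toy model with hypothetical labels; it mimics the set semantics of the query functions but does not invoke Bazel.

```python
from collections import deque


def transitive_deps(graph: dict[str, list[str]], target: str) -> set[str]:
    """Toy analogue of deps(target): the target plus everything it depends on."""
    seen = {target}
    queue = deque([target])
    while queue:
        for dep in graph.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def reverse_deps(graph: dict[str, list[str]], universe: set[str], changed: str) -> set[str]:
    """Toy analogue of rdeps(universe, changed): targets in universe that reach changed."""
    return {t for t in universe if changed in transitive_deps(graph, t)}


graph = {
    "//app:bin": ["//lib:core", "//lib:ui"],
    "//lib:ui": ["//lib:core"],
    "//lib:core": [],
    "//tools:fmt": [],
}
affected = reverse_deps(graph, set(graph), "//lib:core")
```

A change to `//lib:core` affects `//app:bin` and `//lib:ui` but not `//tools:fmt`, which is the set a CI system would need to rebuild and retest.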
Module 12: Scaling Patterns and Capstone
- Scaling to 10K+ targets: package granularity, visibility restrictions, build graph structure
- Repository-level caching strategies: shared cache across CI and developers
- Bazel flags for large repos: --repository_cache, --experimental_guard_against_concurrent_changes
- Monitoring at scale: SLOs for build time, cache hit rate, test pass rate
- Organizational patterns: Build team, golden .bazelrc, developer experience
- Hands-on (capstone project): Set up a complete remote build infrastructure with an HTTP cache, remote execution via BuildFarm, and BEP monitoring; then profile a 500+ target build and identify and fix the top three performance bottlenecks
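Build SLOs such as p95 build time and cache hit rate can be checked with a short script over per-build records. The record fields (`seconds`, `actions`, `cache_hits`) and the thresholds below are assumptions chosen for illustration; real values would come from BEP data or a metrics store.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_slos(builds: list[dict], max_p95_seconds: float,
               min_hit_rate: float) -> dict[str, bool]:
    """Return which SLOs are met across the given builds."""
    hits = sum(b["cache_hits"] for b in builds)
    actions = sum(b["actions"] for b in builds)
    hit_rate = hits / actions if actions else 0.0
    p95 = percentile([b["seconds"] for b in builds], 95)
    return {
        "build_time_p95": p95 <= max_p95_seconds,
        "cache_hit_rate": hit_rate >= min_hit_rate,
    }


builds = [
    {"seconds": 100, "actions": 100, "cache_hits": 90},
    {"seconds": 200, "actions": 100, "cache_hits": 80},
    {"seconds": 300, "actions": 100, "cache_hits": 95},
    {"seconds": 900, "actions": 100, "cache_hits": 50},  # cold-cache outlier
]
slo_status = check_slos(builds, max_p95_seconds=600, min_hit_rate=0.75)
```

In this sample the hit-rate SLO holds overall, while the single cold-cache build is enough to break the p95 build-time SLO.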
Prerequisites
- Bazel Fundamentals and Toolchains/Platforms training or equivalent production experience
- Linux systems administration (Docker, networking, storage)
- Understanding of distributed systems concepts (caching, RPC, content-addressable storage)
- Familiarity with build performance analysis and profiling
Delivery Formats
| Format | Description |
|---|---|
| In-Person | On-site at your company's location, hands-on with direct interaction |
| Live Online | Interactive virtual sessions with screen sharing and real-time labs |
| Hybrid | Combination of on-site and remote sessions, flexible scheduling |
All formats include hands-on labs, course materials, and post-training support.
Ready to get started?
Request a training quote for your team — in-person, live-online, or hybrid.