Common errors and troubleshooting
Comprehensive error diagnosis and resolution guide for self-hosted runners
This guide provides systematic approaches to diagnosing and resolving common issues across all supported runner platforms. Use this reference for rapid troubleshooting and escalation procedures.
Troubleshooting overview#
Systematic debugging approach#
When encountering runner issues, follow this structured approach:
- Identify the scope: Single runner, multiple runners, or platform-wide issue?
- Check timing: When did the issue start? Was there a recent change?
- Gather logs: Collect relevant logs from runner, platform, and DevOps Hub
- Reproduce the issue: Can you consistently trigger the problem?
- Apply targeted fixes: Start with the most likely causes first
- Document resolution: Record the solution for future reference
Common error patterns#
Most runner issues fall into these categories:
- Connection failures: Network, DNS, or service unavailability
- Authentication errors: Expired tokens, invalid credentials, or permission issues
- Resource exhaustion: CPU, memory, disk space, or network bandwidth limitations
- Configuration problems: Incorrect settings, missing dependencies, or environment issues
- Platform-specific bugs: Known issues with specific runner versions or platforms
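As a rough first pass, a log line can be sorted into one of these categories by keyword. The sketch below is illustrative only: `classify_error` is a hypothetical helper and the keyword lists are assumptions, not an exhaustive catalogue. Platform-specific bugs cannot be detected by keyword, so they fall through to `unknown`.

```shell
# Map a log line to one of the error categories listed above.
# The keyword lists are illustrative assumptions; extend them for your logs.
classify_error() {
  local line
  # Lowercase the input so matching is case-insensitive.
  line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$line" in
    *"connection refused"*|*"timed out"*|*"could not resolve"*|*unreachable*)
      echo "connection" ;;
    *unauthorized*|*forbidden*|*"permission denied"*|*"token expired"*|*"invalid credentials"*)
      echo "authentication" ;;
    *"out of memory"*|*"no space left"*|*"quota exceeded"*)
      echo "resource" ;;
    *"not found"*|*"missing dependency"*|*"invalid configuration"*)
      echo "configuration" ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage:
#   classify_error "HTTP 403 Forbidden"
```

The first matching branch wins, so ambiguous lines such as "permission denied" on a file are reported as authentication errors and may need manual review.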
Platform-specific errors#
GitHub Actions errors#
Runner offline/connection issues#
Symptoms:
- Runner shows as "Offline" in GitHub UI
- Workflows queue indefinitely without starting
- Connection timeout errors in runner logs
Diagnostic commands:
```shell
# Check runner service status (quote the glob so systemd expands it, not the shell)
systemctl status 'actions.runner.*'

# Verify network connectivity
curl -I https://api.github.com
telnet api.github.com 443
```
Solutions:
```shell
# Restart runner service
sudo ./svc.sh stop
sudo ./svc.sh start

# Re-register runner if token expired (uninstall the service before removing the runner)
sudo ./svc.sh stop
sudo ./svc.sh uninstall
./config.sh remove --token <removal-token>
./config.sh --url https://github.com/owner/repo --token <new-token>
sudo ./svc.sh install
sudo ./svc.sh start
```
Workflow permission denied errors#
Error patterns:
```text
Error: Resource not accessible by integration
Error: Permission denied to github-actions[bot]
```
Solutions:
- Check repository Settings → Actions → General → Workflow permissions
- Verify token has required scopes for API operations
- Ensure GITHUB_TOKEN has sufficient permissions in workflow
```yaml
permissions:
  contents: read
  pull-requests: write
  checks: write
```
Self-hosted runner registration failures#
Error patterns:
```text
A runner exists with the same name
Could not connect to the server
Invalid runner token
```
Resolution steps:
```shell
# Remove existing runner
./config.sh remove --token <removal-token>

# Clean runner directory
rm -rf _diag/ _work/

# Re-register with new token
./config.sh --url https://github.com/owner/repo --token <new-registration-token>
```
Cache access and artifact upload problems#
Symptoms:
- Artifact upload failures
- Cache restore/save errors
- Permission denied on cache operations
Debugging:
```shell
# Check available disk space
df -h
du -sh ./_work/

# Verify cache directory permissions
ls -la ~/.cache/
```
Solutions:
```shell
# Clear cache directory
rm -rf ~/.cache/actions-runner/

# Fix permissions
sudo chown -R runneruser:runneruser ~/.cache/
```
Azure DevOps errors#
Agent pool connection failures#
Symptoms:
- Agent shows as "Offline" in Azure DevOps
- Connection refused errors
- Authentication timeouts
Diagnostic commands:
```shell
# Check agent service (quote the glob so systemd expands it, not the shell)
systemctl status 'vsts.agent.*'

# Test connectivity
curl -v https://dev.azure.com/organization

# Verify agent configuration
cat .agent
```
Solutions:
```shell
# Reconfigure agent
./config.sh remove
./config.sh --url https://dev.azure.com/organization --auth pat --token <new-pat>

# Restart service
sudo systemctl restart 'vsts.agent.*'
```
Service connection authentication errors#
Error patterns:
```text
TF400813: The user 'Build\...' is not authorized to access this resource
VS403403: The current user does not have permission to perform this action
```
Resolution:
- Verify service principal has required Azure permissions
- Check service connection configuration in project settings
- Ensure agent pool has permission to access service connections
Pipeline variable access denied#
Symptoms:
- Variables undefined in pipeline runs
- Secret variables not being decrypted
- Environment-specific variable access issues
Solutions:
```yaml
# Explicitly define variable groups
variables:
  - group: 'Production Variables'
  - name: 'Environment'
    value: 'production'
```
Task execution timeout issues#
Common timeouts:
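- Default job timeout: 60 minutes (override with `timeoutInMinutes` on the job)
- Task timeout: no limit by default (`timeoutInMinutes: 0` on the task)
- Maximum job duration on Microsoft-hosted agents: 360 minutes (6 hours), depending on plan; on self-hosted agents, `timeoutInMinutes: 0` removes the job limit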
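Before changing pipeline settings, it can help to check locally whether a step fits a given time budget. The sketch below wraps a command in GNU coreutils `timeout`, which exits with status 124 when the limit is reached. `run_with_timeout` is a hypothetical helper name, not part of any agent tooling.

```shell
# Run a command under a time budget (in seconds) and report whether it
# finished or was cut off. Relies on GNU coreutils `timeout`, which exits
# with status 124 when the limit is hit.
run_with_timeout() {
  local limit="$1"; shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "TIMEOUT after ${limit}s: $*" >&2
  fi
  return "$status"
}

# Usage:
#   run_with_timeout 1800 ./long-running-step.sh
```

The function returns the wrapped command's own exit status otherwise, so a failing step is still distinguishable from a step that ran out of time.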
Configuration:
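```yaml
jobs:
- job: LongRunningJob
  timeoutInMinutes: 120 # 2 hours
  steps:
  - task: PowerShell@2
    timeoutInMinutes: 30
    inputs:
      # A script file is referenced through filePath; `script` is for inline code
      targetType: 'filePath'
      filePath: 'long-running-script.ps1'
```
GitLab Runner errors#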
Runner registration token expired#
Error patterns:
```text
ERROR: Verifying runner... failed
ERROR: Failed to register runner
HTTP 403 Forbidden
```
Resolution:
```shell
# Unregister old runner
gitlab-runner unregister --name <runner-name>

# Register with new token
gitlab-runner register \
  --url https://gitlab.com/ \
  --registration-token <new-token> \
  --name <runner-name>
```
Docker executor permission denied#
Symptoms:
- Permission denied errors in Docker containers
- Volume mount failures
- Docker socket access issues
Solutions:
```shell
# Add gitlab-runner to docker group
sudo usermod -aG docker gitlab-runner

# Fix Docker socket permissions
# (last resort: a world-writable socket gives every local user root-equivalent access)
sudo chmod 666 /var/run/docker.sock

# Restart GitLab Runner
sudo gitlab-runner restart
```
Configuration fix:
```toml
[[runners]]
  [runners.docker]
    privileged = true
    volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
```
Kubernetes executor pod failures#
Common issues:
- ImagePullBackOff errors
- Resource quota exceeded
- Node selector constraints
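A quick way to spot pods affected by any of these issues is to filter `kubectl get pods` output for anything that is neither running nor completed. The sketch below assumes the default column layout (NAME, READY, STATUS, ...); `failing_pods` is a hypothetical helper name.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print
# "<pod-name> <status>" for every pod that is not Running or Completed.
failing_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (assumes kubectl access and the gitlab-runner namespace used in this guide):
#   kubectl get pods -n gitlab-runner --no-headers | failing_pods
```

Because the function only reads stdin, it can also be run against saved output from an earlier incident.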
Debugging:
```shell
# Check pod status
kubectl get pods -n gitlab-runner

# Describe failed pods
kubectl describe pod <pod-name> -n gitlab-runner

# Check resource quotas
kubectl describe quota -n gitlab-runner
```
Configuration solutions:
```toml
[[runners]]
  [runners.kubernetes]
    image = "ubuntu:20.04"
    privileged = true
    # Resource limits for the build container
    cpu_limit = "1000m"
    memory_limit = "2Gi"
    [runners.kubernetes.node_selector]
      "node-type" = "runner-node"
```
Concurrent job limit exceeded#
Error message:
```text
ERROR: Job failed (system failure): concurrent job limit exceeded
```
Solutions:
- Increase concurrent job limit in runner configuration
- Deploy additional runners
- Optimize job duration to increase throughput
```toml
concurrent = 10 # Increase from default of 1

[[runners]]
  limit = 5 # Jobs per runner
```
Jenkins errors#
Agent SSH connection refused#
Symptoms:
- "Connection refused" errors in Jenkins logs
- Agent appears offline
- SSH handshake failures
Debugging:
```shell
# Test SSH connection manually
ssh -v jenkins@<agent-ip>

# Check SSH service
systemctl status ssh

# Generate a dedicated key pair if the agent has none
# (then add the public key to the agent's authorized_keys)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/jenkins_key
```
Solutions:
- Ensure SSH service is running on agent
- Verify Jenkins master can reach agent IP/port
- Check SSH key permissions and authentication
```shell
# Fix SSH key permissions
chmod 600 ~/.ssh/jenkins_key
chmod 700 ~/.ssh/

# Test SSH connection with key
ssh -i ~/.ssh/jenkins_key jenkins@<agent-ip>
```
JNLP agent connection failures#
Error patterns:
```text
java.nio.channels.ClosedChannelException
Failed to connect to http://jenkins:8080/
```
Resolution steps:
- Verify Jenkins URL is accessible from agent
- Check JNLP port configuration (default 50000)
- Ensure firewall allows connection
```shell
# Test JNLP port connectivity
telnet <jenkins-master> 50000

# Download and run JNLP agent
wget http://jenkins:8080/jnlpJars/agent.jar
java -jar agent.jar -jnlpUrl http://jenkins:8080/computer/<node-name>/jenkins-agent.jnlp -secret <secret>
```
Workspace permission issues#
Symptoms:
- Permission denied errors during builds
- Unable to create/delete files in workspace
- Git checkout failures
Solutions:
```shell
# Fix workspace permissions
sudo chown -R jenkins:jenkins /var/lib/jenkins/workspace/
sudo chmod -R 755 /var/lib/jenkins/workspace/

# Clean workspace
rm -rf /var/lib/jenkins/workspace/<job-name>/*
```
Plugin compatibility problems#
Common issues:
- Plugin version conflicts
- Incompatible Jenkins core version
- Missing plugin dependencies
Resolution:
- Check plugin compatibility matrix
- Update plugins in correct dependency order
- Roll back problematic plugins
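Updating plugins in dependency order is a topological sort. As a minimal sketch, the POSIX `tsort` utility can derive an order from a list of dependency pairs. `plugin_install_order` is a hypothetical wrapper, and the plugin names in the usage comment are placeholders rather than real dependency data.

```shell
# Print an installation order from "<dependency> <dependent>" pairs on stdin,
# one pair per line. The first name in each pair must be installed first.
# `tsort` performs the topological sort and reports an error on cycles.
plugin_install_order() {
  tsort
}

# Usage (placeholder plugin names, for illustration only):
#   printf '%s\n' 'plugin-a plugin-b' 'plugin-b plugin-c' | plugin_install_order
```

The pairs themselves have to come from each plugin's declared dependencies, for example as shown on the plugin's page in the update center.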
```shell
# Check plugin status via CLI
java -jar jenkins-cli.jar -s http://jenkins:8080/ list-plugins

# Install specific plugin version
java -jar jenkins-cli.jar -s http://jenkins:8080/ install-plugin <plugin-name>@<version>
```
Bazel Remote Build Execution errors#
gRPC connection failures#
Error patterns:
```text
UNAVAILABLE: io exception
DEADLINE_EXCEEDED: context deadline exceeded
Failed to connect to remote execution service
```
Debugging:
```shell
# Test gRPC connectivity
grpc_cli call <remote-executor-address> \
  google.devtools.remoteworkers.v1test2.Bots/CreateBotSession

# Check TLS certificate
openssl s_client -connect <remote-executor>:443 -servername <hostname>
```
Solutions:
- Verify remote executor endpoint is reachable
- Check authentication credentials
- Ensure TLS certificates are valid
```shell
# Bazel configuration with authentication
bazel build --remote_executor=grpcs://executor.example.com:443 \
  --google_credentials=/path/to/credentials.json \
  --remote_timeout=60
```
Remote cache authentication errors#
Error patterns:
```text
PERMISSION_DENIED: Request not authenticated
UNAUTHENTICATED: Invalid credentials
```
Solutions:
```shell
# Authenticate with service account
gcloud auth activate-service-account --key-file=credentials.json

# Configure Bazel with authentication
bazel build --remote_cache=grpcs://cache.example.com:443 \
  --google_credentials=/path/to/credentials.json
```
Worker registration timeouts#
Symptoms:
- Workers fail to register with scheduler
- Build requests timeout waiting for workers
- Intermittent worker availability
Configuration fixes:
```text
# worker.conf
worker_properties = {
  "pool": "default",
  "os": "linux",
  "arch": "x86_64"
}

# Increase timeouts
registration_timeout = "30s"
keepalive_timeout = "10s"
```
Build execution environment issues#
Common problems:
- Missing build tools in worker environment
- Incorrect PATH configuration
- Tool version mismatches
Solutions:
- Verify worker environment matches local build requirements
- Use container-based workers for consistency
- Pin tool versions in BUILD files
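To check the first point on a worker, installed tool versions can be compared against the versions a build expects. The sketch below relies on `sort -V` (GNU coreutils version sort); `version_ge` and `require_version` are hypothetical helper names, and the versions in the usage comment are examples only.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Uses `sort -V` (GNU coreutils version sort) so that 1.10 sorts after 1.9.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool's actual version satisfies the required minimum.
require_version() {
  local tool="$1" required="$2" actual="$3"
  if version_ge "$actual" "$required"; then
    echo "OK: $tool $actual (>= $required)"
  else
    echo "MISMATCH: $tool $actual (< $required)"
    return 1
  fi
}

# Usage (example versions):
#   require_version gcc 11.0.0 "$(gcc -dumpfullversion)"
```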
```python
# BUILD file with explicit tool versions
cc_binary(
    name = "my_binary",
    srcs = ["main.cc"],
    toolchains = ["@bazel_tools//tools/cpp:toolchain"],
)
```
Network connectivity issues#
Connection failures#
Diagnostic commands:
```shell
# Test basic connectivity
ping -c 4 console.assistance.bg
traceroute console.assistance.bg

# Check DNS resolution
nslookup console.assistance.bg
dig +trace assistance.bg

# Test HTTPS connectivity
curl -v -I https://console.assistance.bg
openssl s_client -connect console.assistance.bg:443
```
Common solutions:
- Check firewall rules and security groups
- Verify DNS configuration
- Test with different networks (mobile hotspot)
- Check proxy configuration
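For the proxy check, a frequent source of confusion is whether a given host bypasses the proxy through `NO_PROXY`. The sketch below tests a hostname against a comma-separated list. It uses simplified matching rules (exact match, or suffix match for entries starting with a dot), which is an assumption: individual tools differ in how they interpret `NO_PROXY`. `host_bypasses_proxy` is a hypothetical helper name.

```shell
# Return success if HOST would bypass the proxy under a NO_PROXY-style list.
# Simplified matching (an assumption; tools differ in the details):
#   - exact match, or
#   - suffix match for entries that start with a dot (".company.com").
host_bypasses_proxy() {
  local host="$1" list="${2:-${NO_PROXY:-}}" entry
  local IFS=','
  for entry in $list; do
    entry="${entry// /}"            # drop stray spaces
    [ -z "$entry" ] && continue
    if [ "$host" = "$entry" ]; then
      return 0
    fi
    case "$entry" in
      .*) case "$host" in *"$entry") return 0 ;; esac ;;
    esac
  done
  return 1
}

# Usage:
#   host_bypasses_proxy git.company.com "localhost,127.0.0.1,.company.com" && echo "direct"
```

When the second argument is omitted, the function falls back to the `NO_PROXY` environment variable of the current shell.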
Timeouts#
Connection timeout patterns:
```shell
# Increase connection timeouts
curl --connect-timeout 30 --max-time 300 https://api.assistance.bg

# Test with increased verbosity
curl -v --trace-time https://api.assistance.bg/health
```
HTTP client configuration:
```yaml
# GitHub Actions with timeout
- name: API call with timeout
  run: |
    curl --connect-timeout 10 --max-time 30 \
      --retry 3 --retry-delay 5 \
      https://api.assistance.bg/projects
```
Proxy problems#
Corporate proxy configuration:
```shell
# Set proxy environment variables
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=http://proxy.company.com:8080
export NO_PROXY=localhost,127.0.0.1,.company.com

# Test proxy connectivity
curl --proxy http://proxy.company.com:8080 https://assistance.bg
```
Runner-specific proxy config:
```shell
# GitHub Actions runner with proxy
echo "HTTP_PROXY=http://proxy.company.com:8080" >> .env
echo "HTTPS_PROXY=http://proxy.company.com:8080" >> .env
sudo ./svc.sh stop && sudo ./svc.sh start
```
Authentication problems#
API integration failures#
Token validation:
```shell
# Test API token
curl -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  https://console.assistance.bg/api/v2/projects

# Check token expiration (requires the jwt-cli tool, whose binary is named jwt)
jwt decode "${TOKEN}"
```
Common token issues:
- Token expired or revoked
- Insufficient token permissions
- Incorrect token format
- Token not properly encoded
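The first, third, and fourth issues can be checked offline when the token is a JWT, which is an assumption here. The sketch below validates the three-segment format, decodes the base64url payload, and compares the `exp` claim with the current time. `jwt_payload` and `jwt_is_expired` are hypothetical helper names; the signature is not verified.

```shell
# Print the decoded payload (second segment) of a JWT.
jwt_payload() {
  local token="$1" payload
  # A JWT has exactly three dot-separated segments.
  case "$token" in
    *.*.*.*) echo "not a JWT: too many segments" >&2; return 1 ;;
    *.*.*)   ;;
    *)       echo "not a JWT: expected three segments" >&2; return 1 ;;
  esac
  payload="${token#*.}"      # strip header
  payload="${payload%%.*}"   # strip signature
  # Convert base64url to standard base64 and restore padding.
  payload=$(printf '%s' "$payload" | tr '_-' '/+')
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d
}

# Return 0 if the token's exp claim is in the past, 1 if it is still valid,
# and 2 if no numeric exp claim could be read.
jwt_is_expired() {
  local exp
  exp=$(jwt_payload "$1" 2>/dev/null \
    | grep -o '"exp"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$') || return 2
  [ -n "$exp" ] || return 2
  [ "$exp" -le "$(date +%s)" ]
}

# Usage:
#   jwt_is_expired "$TOKEN" && echo "token expired, refresh it"
```

A revoked token or one with insufficient permissions still decodes cleanly, so those cases need a live API call to detect.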
Token refresh procedures#
GitHub token refresh:
```shell
# Create new personal access token
# Navigate to GitHub → Settings → Developer settings → Personal access tokens

# Update runner configuration
./config.sh remove --token <old-token>
./config.sh --url https://github.com/owner/repo --token <new-token>
```
DevOps Hub API key rotation:
```shell
# Generate new API key via console
curl -X POST https://console.assistance.bg/api/v2/auth/api-keys \
  -H "Authorization: Bearer ${CURRENT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"name": "Runner API Key", "permissions": ["projects:read", "environments:write"]}'

# Update runner environment
# (this overwrites .env; edit the file by hand if it holds other settings such as proxy variables)
echo "DEVOPSHUB_API_KEY=<new-key>" > .env
```
Service account issues#
Permission verification:
```shell
# Azure service principal permissions
az ad sp show --id <service-principal-id>
az role assignment list --assignee <service-principal-id>

# Google Cloud service account
gcloud projects get-iam-policy <project-id> \
  --flatten="bindings[].members" \
  --filter="bindings.members:<service-account-email>"
```
Service account key rotation:
```shell
# Google Cloud
gcloud iam service-accounts keys create new-key.json \
  --iam-account=<service-account-email>

# Delete old key
gcloud iam service-accounts keys delete <old-key-id> \
  --iam-account=<service-account-email>
```
Resource exhaustion#
CPU and memory monitoring#
Real-time monitoring:
```shell
# Monitor runner processes (top -p expects a comma-separated PID list)
top -p "$(pgrep -d, -f 'runner|agent|gitlab-runner')"

# Memory usage breakdown
ps aux --sort=-%mem | head -20
free -h

# CPU usage analysis
iostat -c 1 5
mpstat 1 5
```
Resource alerts:
```shell
# Set up memory monitoring
sudo tee /usr/local/bin/memory-check.sh > /dev/null <<'EOF'
#!/bin/bash
MEMORY_USAGE=$(free | awk '/^Mem:/ {print $3/$2 * 100.0}')
if (( $(echo "$MEMORY_USAGE > 90" | bc -l) )); then
  echo "High memory usage: $MEMORY_USAGE%"
  # Send alert or restart runner
fi
EOF
sudo chmod +x /usr/local/bin/memory-check.sh

# Add to crontab without replacing existing entries
(crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/memory-check.sh") | crontab -
```
Storage issues#
Disk space monitoring:
```shell
# Check available space
df -h
df -i # Check inode usage

# Find large directories
du -sh /var/lib/docker /tmp /home/runner
find /var/log -size +100M -type f

# Clean up runner workspaces
find /home/runner/_work -name "*.log" -mtime +7 -delete
docker system prune -f
```
Automated cleanup:
```shell
#!/bin/bash
# cleanup-runner.sh
set -e

# Clean Docker
docker system prune -af --filter "until=24h"

# Clean runner workspaces (-mindepth 1 keeps the _work directory itself)
find /home/runner/_work -mindepth 1 -maxdepth 2 -type d -mtime +1 -exec rm -rf {} +

# Clean logs
find /var/log -name "*.log" -mtime +7 -delete
journalctl --vacuum-time=7d

# Clean package cache
apt-get clean
```
Performance optimization#
Runner tuning:
```shell
# Increase file descriptor limits
# (tee runs under sudo; a plain ">>" redirect would be performed by the unprivileged shell)
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf

# Optimize kernel parameters
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
echo "fs.file-max=2097152" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```
Concurrent job optimization:
```toml
# GitLab Runner optimization
concurrent = 4 # Based on CPU cores

[[runners]]
  limit = 2 # Jobs per runner instance
  [runners.docker]
    cpus = "1.5"
    memory = "2g"
    shm_size = 268435456 # 256MB
```
DevOps Hub integration issues#
API errors#
Rate limiting detection:
```shell
# Monitor rate limits
curl -I https://console.assistance.bg/api/v2/projects \
  -H "Authorization: Bearer ${TOKEN}" | grep -i rate

# Expected headers:
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 50
# X-RateLimit-Reset: 1640995200
```
Rate limiting solutions:
```shell
# Implement exponential backoff
retry_api_call() {
  local max_attempts=5
  local delay=1

  for i in $(seq 1 $max_attempts); do
    if curl -f "https://console.assistance.bg/api/v2/projects" \
      -H "Authorization: Bearer ${TOKEN}"; then
      return 0
    fi

    if [ $i -lt $max_attempts ]; then
      echo "Attempt $i failed, retrying in ${delay}s..."
      sleep $delay
      delay=$((delay * 2))
    fi
  done

  return 1
}
```
Service connectivity#
Health check endpoints:
```shell
# DevOps Hub service status
curl https://console.assistance.bg/api/health
curl https://console.assistance.bg/api/v2/status
```
Expected response:
```json
{
  "status": "healthy",
  "version": "2.1.0",
  "timestamp": "2026-01-17T10:00:00Z"
}
```
Connection pooling optimization:
```shell
# Configure HTTP client settings
export HTTP_KEEP_ALIVE=true
export HTTP_MAX_CONNECTIONS_PER_HOST=10
export HTTP_CONNECTION_TIMEOUT=30
```
API versioning issues#
Version compatibility check:
```shell
# Check API version
curl https://console.assistance.bg/api/version

# Use specific API version
curl https://console.assistance.bg/api/v2/projects \
  -H "Accept: application/vnd.devopshub.v2+json"
```
Cross-platform problems#
Linux-specific issues#
Permission problems:
```shell
# Check runner user permissions
id runneruser
groups runneruser

# Fix common permission issues
sudo usermod -aG docker runneruser
sudo chown -R runneruser:runneruser /home/runneruser

# SELinux issues (RHEL/CentOS)
sestatus
sudo setsebool -P httpd_can_network_connect 1
```
SystemD service management:
```shell
# Check service status (quote the glob so systemd expands it, not the shell)
systemctl status 'actions.runner.*'
journalctl -u 'actions.runner.*' -f

# Service configuration
# (enable needs the full unit name shown by the status command; globs only match loaded units)
sudo systemctl enable <runner-unit-name>.service
sudo systemctl restart 'actions.runner.*'
```
macOS-specific issues#
Xcode command line tools:
```shell
# Install/update command line tools
xcode-select --install
sudo xcodebuild -license accept

# Check installed tools
xcode-select -p
xcrun --show-sdk-path
```
macOS security restrictions:
```shell
# Check Gatekeeper status
spctl --status

# Allow unsigned binaries (if needed; this disables Gatekeeper system-wide)
sudo spctl --master-disable

# Keychain access for certificates
security list-keychains
security unlock-keychain ~/Library/Keychains/login.keychain
```
LaunchDaemon configuration:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- com.github.actions.runner.plist -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.github.actions.runner</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/runner/actions-runner/run.sh</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>UserName</key>
  <string>runner</string>
  <key>WorkingDirectory</key>
  <string>/Users/runner/actions-runner</string>
</dict>
</plist>
```
Windows-specific issues#
PowerShell execution policy:
```powershell
# Check current policy
Get-ExecutionPolicy

# Set policy for runner
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser

# Bypass for specific script
powershell.exe -ExecutionPolicy Bypass -File script.ps1
```
Windows service configuration:
```powershell
# Configure the runner and install it as a Windows service
# (svc.sh exists only on Linux and macOS)
.\config.cmd --url https://github.com/owner/repo --token <token> --runasservice

# Service management (the service name starts with "actions.runner.")
Get-Service "actions.runner.*"
Get-Service "actions.runner.*" | Start-Service
Get-Service "actions.runner.*" | Stop-Service
```
Windows firewall:
```powershell
# Check firewall rules
Get-NetFirewallRule | Where-Object {$_.DisplayName -match "runner"}

# Add firewall rule (outbound HTTPS, so the port is the remote port)
New-NetFirewallRule -DisplayName "GitHub Runner" -Direction Outbound -RemotePort 443 -Protocol TCP -Action Allow
```
Container and Docker issues#
Docker daemon problems#
Service management:
```shell
# Check Docker daemon status
systemctl status docker
docker info

# Restart Docker daemon
sudo systemctl restart docker

# Check Docker logs
journalctl -u docker -f
```
Docker configuration:
```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "registry-mirrors": ["https://mirror.gcr.io"],
  "insecure-registries": ["localhost:5000"]
}
```
Image pull failures#
Authentication issues:
```shell
# Login to registry (interactive)
docker login registry.example.com

# Login to registry (non-interactive, for scripts)
echo "$REGISTRY_PASSWORD" | docker login registry.example.com --username "$REGISTRY_USER" --password-stdin

# Test image pull
docker pull ubuntu:20.04
```
Network issues:
```shell
# Check DNS resolution from container
docker run --rm alpine nslookup registry.example.com

# Test with different DNS
docker run --rm --dns 8.8.8.8 alpine nslookup registry.example.com
```
Container resource limits#
Memory constraints:
```shell
# Check memory limits
docker stats --no-stream
docker inspect <container-id> | jq '.[0].HostConfig.Memory'

# Set memory limits
docker run -m 2g ubuntu:20.04
```
OOM killer issues:
```shell
# Check OOM killer logs
dmesg | grep "Killed process"
journalctl -k | grep "Memory cgroup out of memory"

# Monitor memory usage
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
```
Docker volume issues#
Permission problems:
```shell
# Check volume mounts
docker inspect <container> | jq '.[0].Mounts'

# Fix volume permissions
sudo chown -R 1000:1000 /host/volume/path
```
Disk space issues:
```shell
# Clean up Docker
docker system df
docker system prune -af

# Remove unused volumes
docker volume prune -f

# Clean up build cache
docker builder prune -af
```
Log analysis#
Structured log investigation#
Log location by platform:
GitHub Actions:
```shell
# Runner logs
tail -f /home/runner/_diag/Runner_*.log
tail -f /home/runner/_diag/Worker_*.log

# System logs
journalctl -u 'actions.runner.*' -f
```
GitLab Runner:
```shell
# Runner logs
tail -f /var/log/gitlab-runner/gitlab-runner.log

# Service logs
journalctl -u gitlab-runner -f

# Job logs
docker logs $(docker ps --filter label=com.gitlab.gitlab-runner.job.id --format "{{.ID}}")
```
Jenkins:
```shell
# Master logs
tail -f $JENKINS_HOME/logs/jenkins.log

# Agent logs
tail -f /var/log/jenkins/jenkins.log
cat /var/lib/jenkins/logs/slaves/<node-name>/slave.log
```
Azure DevOps:
```shell
# Agent logs (Linux)
tail -f /home/vsts/agents/_diag/Agent_*.log
```
```powershell
# Windows logs
Get-Content "C:\agents\_diag\Agent_*.log" -Tail 50 -Wait
```
Log filtering patterns#
Common error patterns:
```shell
# Connection errors
grep -E "(connection|timeout|refused|failed)" runner.log

# Authentication errors
grep -E "(auth|unauthorized|forbidden|401|403)" runner.log

# Resource errors
grep -E "(memory|disk|space|quota|limit)" runner.log

# API errors
grep -E "(api|http|500|502|503|504)" runner.log
```
Structured log parsing:
```shell
# Extract JSON logs (one JSON object per line): error entries that mention "connection"
jq 'select(.level == "error" and (.message | contains("connection")))' runner.log

# Time-based filtering
journalctl --since "2026-01-17 09:00:00" --until "2026-01-17 10:00:00" -u 'actions.runner.*'

# Pattern-based extraction
awk '/ERROR/ {print $1, $2, $NF}' runner.log | sort | uniq -c
```
Log analysis automation#
Log monitoring script:
```shell
#!/bin/bash
# monitor-runner-logs.sh

LOG_FILE="/var/log/runner/runner.log"
ERROR_PATTERNS=(
  "connection refused"
  "authentication failed"
  "out of memory"
  "disk space"
  "timeout"
)

# IFS= and -r keep leading whitespace and backslashes in log lines intact
tail -F "$LOG_FILE" | while IFS= read -r line; do
  for pattern in "${ERROR_PATTERNS[@]}"; do
    if echo "$line" | grep -qi "$pattern"; then
      echo "$(date): ERROR DETECTED - $pattern"
      echo "$line"
      # Send alert
      curl -X POST https://hooks.slack.com/webhook \
        -H "Content-Type: application/json" \
        -d "{\"text\": \"Runner Error: $pattern\"}"
    fi
  done
done
```
Log aggregation:
```shell
# Collect all relevant logs
collect_logs() {
  local output_dir="/tmp/runner-logs-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$output_dir"

  # System logs
  journalctl --no-pager > "$output_dir/system.log"

  # Runner logs
  cp -r /home/runner/_diag/ "$output_dir/runner-diag/"

  # Docker logs (docker logs accepts one container at a time)
  for id in $(docker ps -aq); do
    docker logs "$id" > "$output_dir/docker-$id.log" 2>&1
  done

  # System information
  cat /proc/meminfo > "$output_dir/meminfo"
  df -h > "$output_dir/disk-usage"
  ps aux > "$output_dir/processes"

  echo "Logs collected in: $output_dir"
}
```
Escalation procedures#
Information gathering checklist#
Before contacting support, gather this information:
System information:
```shell
# Operating system details
uname -a
cat /etc/os-release
lscpu
free -h
df -h

# Network configuration
ip addr show
route -n
cat /etc/resolv.conf

# Runner configuration
./config.sh --version # Shows runner version
cat .runner # Configuration details (sanitize tokens)
```
Error reproduction:
- Exact error message or log entries
- Steps to reproduce the issue
- Timeline when issue started
- Recent changes to configuration
- Frequency of occurrence (always/intermittent)
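To put a number on frequency, error lines can be counted per hour. The sketch below assumes each log line starts with an ISO-like timestamp such as `2026-01-17 09:15:02` or `2026-01-17T09:15:02Z` and that failures contain the word `ERROR`; both are assumptions about the log format, and `errors_per_hour` is a hypothetical helper name.

```shell
# Count lines containing "ERROR" per hour.
# Reads the given files, or stdin when no file is passed.
# Output format: "<date> <hour>:00 <count>"
errors_per_hour() {
  awk '/ERROR/ { print substr($0, 1, 13) }' "$@" \
    | tr 'T' ' ' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $3 ":00", $1 }'
}

# Usage:
#   errors_per_hour /home/runner/_diag/Runner_*.log
```

A steady count every hour points to a persistent fault, while isolated spikes suggest an intermittent one tied to specific jobs or times of day.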
Log collection:
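```shell
# Create support bundle
# (.env can hold API keys and proxy credentials; review and sanitize the archive before sharing)
tar -czf runner-support-$(date +%Y%m%d).tar.gz \
  /var/log/runner/ \
  /home/runner/_diag/ \
  ~/.runner \
  ~/.env \
  /etc/systemd/system/actions.runner.*
```
Log sanitization for security#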
Remove sensitive data:
```shell
# Sanitize logs before sharing
sanitize_logs() {
  local log_file="$1"
  local output_file="${log_file}.sanitized"

  # Remove tokens and secrets
  sed -E 's/token=[A-Za-z0-9_-]+/token=***REDACTED***/g' "$log_file" | \
    sed -E 's/"password"\s*:\s*"[^"]*"/"password": "***REDACTED***"/g' | \
    sed -E 's/Authorization:\s*Bearer\s+[A-Za-z0-9_.-]+/Authorization: Bearer ***REDACTED***/g' \
    > "$output_file"

  echo "Sanitized log: $output_file"
}
```
Support channels and SLA expectations#
DevOps Hub Support:
- Community: GitHub Discussions (Best effort)
- Pro Plan: Email support (24-48 hours)
- Enterprise: Dedicated support (4-8 hours)
Platform Support:
- GitHub: GitHub Support (varies by plan)
- GitLab: GitLab Support (varies by subscription)
- Azure DevOps: Microsoft Support
- Jenkins: Community forums and commercial support
Emergency escalation:
For production-critical issues:
- Use highest priority support channel
- Include "PRODUCTION OUTAGE" in subject
- Provide complete information from checklist
- Include business impact assessment
Information to include:
- Organization/project ID
- Runner platform and version
- Error messages and logs (sanitized)
- Steps already attempted
- Business impact and urgency
Prevention and monitoring#
Proactive monitoring setup#
Health check script:
```shell
#!/bin/bash
# runner-health-check.sh

check_runner_health() {
  local errors=0

  # Check service status
  if ! systemctl is-active --quiet 'actions.runner.*'; then
    echo "ERROR: Runner service not active"
    ((errors++))
  fi

  # Check disk space
  local disk_usage=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')
  if [ "$disk_usage" -gt 90 ]; then
    echo "ERROR: Disk usage high: ${disk_usage}%"
    ((errors++))
  fi

  # Check memory usage
  local mem_usage=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')
  if [ "$mem_usage" -gt 90 ]; then
    echo "ERROR: Memory usage high: ${mem_usage}%"
    ((errors++))
  fi

  # Check network connectivity
  if ! curl -s --max-time 10 https://api.github.com > /dev/null; then
    echo "ERROR: Cannot reach GitHub API"
    ((errors++))
  fi

  if [ "$errors" -eq 0 ]; then
    echo "OK: Runner health check passed"
  fi

  return $errors
}

# Run health check
check_runner_health
```
Automated maintenance:
```shell
#!/bin/bash
# runner-maintenance.sh

# Weekly maintenance tasks
maintenance_tasks() {
  echo "Starting runner maintenance..."

  # Runner software: the GitHub Actions runner updates itself automatically
  # unless it was configured with --disableupdate, so no manual step runs here

  # Clean up old logs
  find /home/runner/_diag -name "*.log" -mtime +7 -delete

  # Update system packages
  apt-get update && apt-get upgrade -y

  # Clean Docker
  docker system prune -af --filter "until=72h"

  # Restart runner service
  systemctl restart 'actions.runner.*'

  echo "Maintenance completed"
}

# Run the tasks when the script is executed
maintenance_tasks

# Schedule with cron
# 0 2 * * 0 /usr/local/bin/runner-maintenance.sh
```
This guide covers systematic approaches to diagnosing and resolving the most common self-hosted runner issues across all supported platforms. Use the escalation procedures when additional support is needed.