Common errors and troubleshooting

Comprehensive error diagnosis and resolution guide for self-hosted runners

This guide provides systematic approaches to diagnosing and resolving common issues across all supported runner platforms. Use this reference for rapid troubleshooting and escalation procedures.

Troubleshooting overview#

Systematic debugging approach#

When encountering runner issues, follow this structured approach:

Identify the scope: Single runner, multiple runners, or platform-wide issue?
Check timing: When did the issue start? Was there a recent change?
Gather logs: Collect relevant logs from runner, platform, and DevOps Hub
Reproduce the issue: Can you consistently trigger the problem?
Apply targeted fixes: Start with the most likely causes first
Document resolution: Record the solution for future reference

Common error patterns#

Most runner issues fall into these categories:

Connection failures: Network, DNS, or service unavailability
Authentication errors: Expired tokens, invalid credentials, or permission issues
Resource exhaustion: CPU, memory, disk space, or network bandwidth limitations
Configuration problems: Incorrect settings, missing dependencies, or environment issues
Platform-specific bugs: Known issues with specific runner versions or platforms
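
As a rough illustration of these categories, the sketch below maps a single log line to one of them. The function name `classify_error` and the keyword lists are assumptions made for this example, not part of any runner; platform-specific bugs cannot be detected by keywords and fall through to `unknown`.

```shell
#!/usr/bin/env bash
# Map one log line to an error category.
# The keyword lists are illustrative assumptions; tune them to your own logs.
classify_error() {
    local line
    # Lowercase the input so matching is case-insensitive.
    line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$line" in
        *"connection refused"*|*"timed out"*|*"could not resolve"*) echo "connection" ;;
        *unauthorized*|*forbidden*|*"token expired"*|*"permission denied"*) echo "authentication" ;;
        *"out of memory"*|*"no space left"*|*"quota exceeded"*) echo "resource" ;;
        *"not found in path"*|*"missing dependency"*|*"invalid configuration"*) echo "configuration" ;;
        *) echo "unknown" ;;
    esac
}
```

The first matching branch wins, so an ambiguous line such as a filesystem "permission denied" lands in the authentication bucket and may need a manual second look.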

Platform-specific errors#

GitHub Actions errors#

Runner offline/connection issues#

Symptoms:

Runner shows as "Offline" in GitHub UI
Workflows queue indefinitely without starting
Connection timeout errors in runner logs

Diagnostic commands:

# Check runner service status
systemctl status 'actions.runner.*'

# Verify network connectivity
curl -I https://api.github.com
telnet api.github.com 443

Solutions:

# Restart runner service
sudo ./svc.sh stop
sudo ./svc.sh start

# Re-register runner if token expired
# (stop and uninstall the service first so the removal is not blocked)
sudo ./svc.sh stop
sudo ./svc.sh uninstall
./config.sh remove --token <removal-token>
./config.sh --url https://github.com/owner/repo --token <new-token>
sudo ./svc.sh install
sudo ./svc.sh start

Workflow permission denied errors#

Error patterns:

Error: Resource not accessible by integration
Error: Permission denied to github-actions[bot]

Solutions:

Check repository Settings → Actions → General → Workflow permissions
Verify token has required scopes for API operations
Ensure GITHUB_TOKEN has sufficient permissions in workflow

permissions:
  contents: read
  pull-requests: write
  checks: write

Self-hosted runner registration failures#

Error patterns:

A runner exists with the same name
Could not connect to the server
Invalid runner token

Resolution steps:

# Remove existing runner
./config.sh remove --token <removal-token>

# Clean runner directory
rm -rf _diag/ _work/

# Re-register with new token
./config.sh --url https://github.com/owner/repo --token <new-registration-token>

Cache access and artifact upload problems#

Symptoms:

Artifact upload failures
Cache restore/save errors
Permission denied on cache operations

Debugging:

# Check available disk space
df -h
du -sh ./_work/

# Verify cache directory permissions
ls -la ~/.cache/

Solutions:

# Clear cache directory
rm -rf ~/.cache/actions-runner/

# Fix permissions
sudo chown -R runneruser:runneruser ~/.cache/

Azure DevOps errors#

Agent pool connection failures#

Symptoms:

Agent shows as "Offline" in Azure DevOps
Connection refused errors
Authentication timeouts

Diagnostic commands:

# Check agent service
systemctl status 'vsts.agent.*'

# Test connectivity
curl -v https://dev.azure.com/organization

# Verify agent configuration
cat .agent

Solutions:

# Reconfigure agent
./config.sh remove
./config.sh --url https://dev.azure.com/organization --auth pat --token <new-pat>

# Restart service
sudo systemctl restart 'vsts.agent.*'

Service connection authentication errors#

Error patterns:

TF400813: The user 'Build\...' is not authorized to access this resource
VS403403: The current user does not have permission to perform this action

Resolution:

Verify service principal has required Azure permissions
Check service connection configuration in project settings
Ensure agent pool has permission to access service connections

Pipeline variable access denied#

Symptoms:

Variables undefined in pipeline runs
Secret variables not being decrypted
Environment-specific variable access issues

Solutions:

# Explicitly define variable groups
variables:
  - group: 'Production Variables'
  - name: 'Environment'
    value: 'production'

Task execution timeout issues#

Common timeouts:

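Job timeout (timeoutInMinutes on a job): 60 minutes by default; on self-hosted agents, 0 removes the limit
Task timeout (timeoutInMinutes on a step): 0 by default, so a task is bounded only by its job
Microsoft-hosted agents: jobs are capped at 360 minutes (6 hours) at most, depending on plan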

Configuration:

jobs:
- job: LongRunningJob
  timeoutInMinutes: 120 # 2 hours
  steps:
  - task: PowerShell@2
    timeoutInMinutes: 30
    inputs:
      targetType: 'filePath'
      filePath: 'long-running-script.ps1'

GitLab Runner errors#

Runner registration token expired#

Error patterns:

ERROR: Verifying runner... failed
ERROR: Failed to register runner
HTTP 403 Forbidden

Resolution:

# Unregister old runner
gitlab-runner unregister --name <runner-name>

# Register with new token
gitlab-runner register \
  --url https://gitlab.com/ \
  --registration-token <new-token> \
  --name <runner-name>

Docker executor permission denied#

Symptoms:

Permission denied errors in Docker containers
Volume mount failures
Docker socket access issues

Solutions:

# Add gitlab-runner to docker group
sudo usermod -aG docker gitlab-runner

# Fix Docker socket permissions
# (last resort: a world-writable socket gives every local user root-equivalent access)
sudo chmod 666 /var/run/docker.sock

# Restart GitLab Runner
sudo gitlab-runner restart

Configuration fix:

[[runners]]
  [runners.docker]
    privileged = true
    volumes = ["/var/run/docker.sock:/var/run/docker.sock"]

Kubernetes executor pod failures#

Common issues:

ImagePullBackOff errors
Resource quota exceeded
Node selector constraints

Debugging:

# Check pod status
kubectl get pods -n gitlab-runner

# Describe failed pods
kubectl describe pod <pod-name> -n gitlab-runner

# Check resource quotas
kubectl describe quota -n gitlab-runner

Configuration solutions:

[[runners]]
  [runners.kubernetes]
    image = "ubuntu:20.04"
    privileged = true
    cpu_limit = "1000m"
    memory_limit = "2Gi"
    [runners.kubernetes.node_selector]
      "node-type" = "runner-node"

Concurrent job limit exceeded#

Error message:

ERROR: Job failed (system failure): concurrent job limit exceeded

Solutions:

Increase concurrent job limit in runner configuration
Deploy additional runners
Optimize job duration to increase throughput

concurrent = 10  # Increase from default of 1

[[runners]]
  limit = 5  # Jobs per runner

Jenkins errors#

Agent SSH connection refused#

Symptoms:

"Connection refused" errors in Jenkins logs
Agent appears offline
SSH handshake failures

Debugging:

# Test SSH connection manually
ssh -v jenkins@<agent-ip>

# Check SSH service (the unit is named sshd on RHEL-based systems)
systemctl status ssh

# Generate a dedicated key pair for the agent if none exists
ssh-keygen -t rsa -b 4096 -f ~/.ssh/jenkins_key

Solutions:

Ensure SSH service is running on agent
Verify Jenkins master can reach agent IP/port
Check SSH key permissions and authentication

# Fix SSH key permissions
chmod 600 ~/.ssh/jenkins_key
chmod 700 ~/.ssh/

# Test SSH connection with key
ssh -i ~/.ssh/jenkins_key jenkins@<agent-ip>

JNLP agent connection failures#

Error patterns:

java.nio.channels.ClosedChannelException
Failed to connect to http://jenkins:8080/

Resolution steps:

Verify Jenkins URL is accessible from agent
Check JNLP port configuration (default 50000)
Ensure firewall allows connection

# Test JNLP port connectivity
telnet <jenkins-master> 50000

# Download and run JNLP agent
wget http://jenkins:8080/jnlpJars/agent.jar
java -jar agent.jar -jnlpUrl http://jenkins:8080/computer/<node-name>/jenkins-agent.jnlp -secret <secret>

Workspace permission issues#

Symptoms:

Permission denied errors during builds
Unable to create/delete files in workspace
Git checkout failures

Solutions:

# Fix workspace permissions
sudo chown -R jenkins:jenkins /var/lib/jenkins/workspace/
sudo chmod -R 755 /var/lib/jenkins/workspace/

# Clean workspace
rm -rf /var/lib/jenkins/workspace/<job-name>/*

Plugin compatibility problems#

Common issues:

Plugin version conflicts
Incompatible Jenkins core version
Missing plugin dependencies

Resolution:

Check plugin compatibility matrix
Update plugins in correct dependency order
Roll back problematic plugins

# Check plugin status via CLI
java -jar jenkins-cli.jar -s http://jenkins:8080/ list-plugins

# Install specific plugin version
java -jar jenkins-cli.jar -s http://jenkins:8080/ install-plugin <plugin-name>@<version>

Bazel Remote Build Execution errors#

gRPC connection failures#

Error patterns:

UNAVAILABLE: io exception
DEADLINE_EXCEEDED: context deadline exceeded
Failed to connect to remote execution service

Debugging:

# Test gRPC connectivity (lists exposed services; requires server reflection on the endpoint)
grpc_cli ls <remote-executor-address>

# Check TLS certificate
openssl s_client -connect <remote-executor>:443 -servername <hostname>

Solutions:

Verify remote executor endpoint is reachable
Check authentication credentials
Ensure TLS certificates are valid

# Bazel configuration with authentication
bazel build //... --remote_executor=grpcs://executor.example.com:443 \
  --google_credentials=/path/to/credentials.json \
  --remote_timeout=60

Remote cache authentication errors#

Error patterns:

PERMISSION_DENIED: Request not authenticated
UNAUTHENTICATED: Invalid credentials

Solutions:

# Authenticate with service account
gcloud auth activate-service-account --key-file=credentials.json

# Configure Bazel with authentication
bazel build //... --remote_cache=grpcs://cache.example.com:443 \
  --google_credentials=/path/to/credentials.json

Worker registration timeouts#

Symptoms:

Workers fail to register with scheduler
Build requests timeout waiting for workers
Intermittent worker availability

Configuration fixes:

# worker.conf
worker_properties = {
    "pool": "default",
    "os": "linux",
    "arch": "x86_64"
}

# Increase timeouts
registration_timeout = "30s"
keepalive_timeout = "10s"

Build execution environment issues#

Common problems:

Missing build tools in worker environment
Incorrect PATH configuration
Tool version mismatches

Solutions:

Verify worker environment matches local build requirements
Use container-based workers for consistency
Pin tool versions in BUILD files

# BUILD file with explicit tool versions
cc_binary(
    name = "my_binary",
    srcs = ["main.cc"],
    toolchains = ["@bazel_tools//tools/cpp:toolchain"],
)

Network connectivity issues#

Connection failures#

Diagnostic commands:

# Test basic connectivity
ping -c 4 console.assistance.bg
traceroute console.assistance.bg

# Check DNS resolution
nslookup console.assistance.bg
dig +trace assistance.bg

# Test HTTPS connectivity
curl -v -I https://console.assistance.bg
openssl s_client -connect console.assistance.bg:443

Common solutions:

Check firewall rules and security groups
Verify DNS configuration
Test with different networks (mobile hotspot)
Check proxy configuration
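
Where `telnet` or `nc` is not installed, a TCP reachability check can be sketched with bash's built-in `/dev/tcp` redirection. The helper name `port_open` is an assumption for this example, and the approach relies on a bash build with network redirections enabled plus the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# Uses bash's /dev/tcp pseudo-device, so no telnet or nc is required.
port_open() {
    local host="$1" port="$2" timeout_s="${3:-5}"
    # $0 and $1 inside the child shell are the host and port passed below.
    timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (requires network access):
# port_open console.assistance.bg 443 && echo "reachable" || echo "blocked"
```

A successful result only shows that the TCP handshake completed; TLS and HTTP problems still need the `openssl` and `curl` checks.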

Timeouts#

Connection timeout patterns:

# Increase connection timeouts
curl --connect-timeout 30 --max-time 300 https://api.assistance.bg

# Test with increased verbosity
curl -v --trace-time https://api.assistance.bg/health

HTTP client configuration:

# GitHub Actions with timeout
- name: API call with timeout
  run: |
    curl --connect-timeout 10 --max-time 30 \
         --retry 3 --retry-delay 5 \
         https://api.assistance.bg/projects

Proxy problems#

Corporate proxy configuration:

# Set proxy environment variables
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=http://proxy.company.com:8080
export NO_PROXY=localhost,127.0.0.1,.company.com

# Test proxy connectivity
curl --proxy http://proxy.company.com:8080 https://assistance.bg
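
When a request unexpectedly goes through (or around) the proxy, it can help to check how a host compares against `NO_PROXY`. The sketch below uses simplified matching rules that are an assumption of this example: exact match, or suffix match for entries that start with a dot. Real HTTP clients differ in the details, and `bypasses_proxy` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return 0 if the host matches an entry in a comma-separated NO_PROXY list.
# Simplified rules (an assumption): exact match, or suffix match for
# entries that begin with a dot.
bypasses_proxy() {
    local host="$1" rest="${2:-${NO_PROXY:-}}," entry
    while [ -n "$rest" ]; do
        entry="${rest%%,*}"   # first entry in the list
        rest="${rest#*,}"     # remainder after the first comma
        case "$entry" in
            "") ;;
            .*) case "$host" in *"$entry") return 0 ;; esac ;;
            *)  [ "$host" = "$entry" ] && return 0 ;;
        esac
    done
    return 1
}

# Example: bypasses_proxy git.company.com && echo "direct connection"
```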

Runner-specific proxy config:

# GitHub Actions runner with proxy
echo "HTTP_PROXY=http://proxy.company.com:8080" >> .env
echo "HTTPS_PROXY=http://proxy.company.com:8080" >> .env
sudo ./svc.sh stop && sudo ./svc.sh start

Authentication problems#

API integration failures#

Token validation:

# Test API token
curl -H "Authorization: Bearer ${TOKEN}" \
     -H "Content-Type: application/json" \
     https://console.assistance.bg/api/v2/projects

# Check token expiration (jwt-cli installs the `jwt` binary; only meaningful if the token is a JWT)
jwt decode "${TOKEN}"

Common token issues:

Token expired or revoked
Insufficient token permissions
Incorrect token format
Token not properly encoded
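
If the token happens to be a JWT, its expiry can be inspected without extra tooling. The following is a minimal sketch under that assumption; `jwt_payload` and `jwt_exp` are hypothetical helper names, the signature is not verified, and opaque API keys will simply fail to decode.

```shell
#!/usr/bin/env bash
# Print the decoded payload (second segment) of a JWT.
# Assumption: the token is a three-part JWT; the signature is NOT verified.
jwt_payload() {
    local payload
    # Convert base64url characters back to standard base64.
    payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # Restore the padding that base64url encoding strips.
    while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
    printf '%s' "$payload" | base64 -d
}

# Print the "exp" claim (seconds since the epoch), if present.
jwt_exp() {
    jwt_payload "$1" | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p'
}

# Example: [ "$(jwt_exp "$TOKEN")" -lt "$(date +%s)" ] && echo "token expired"
```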

Token refresh procedures#

GitHub token refresh:

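# Runner registration and removal tokens are short-lived and separate from personal access tokens.
# Generate them under repository Settings → Actions → Runners,
# or via the REST API using a personal access token
# (GitHub → Settings → Developer settings → Personal access tokens)

# Update runner configuration
./config.sh remove --token <removal-token>
./config.sh --url https://github.com/owner/repo --token <new-registration-token>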

DevOps Hub API key rotation:

# Generate new API key via console
curl -X POST https://console.assistance.bg/api/v2/auth/api-keys \
     -H "Authorization: Bearer ${CURRENT_TOKEN}" \
     -H "Content-Type: application/json" \
     -d '{"name": "Runner API Key", "permissions": ["projects:read", "environments:write"]}'

# Update runner environment (replace the old key, keep other entries such as proxy settings)
grep -v '^DEVOPSHUB_API_KEY=' .env > .env.new
echo "DEVOPSHUB_API_KEY=<new-key>" >> .env.new
mv .env.new .env

Service account issues#

Permission verification:

# Azure service principal permissions
az ad sp show --id <service-principal-id>
az role assignment list --assignee <service-principal-id>

# Google Cloud service account
gcloud projects get-iam-policy <project-id> \
  --flatten="bindings[].members" \
  --filter="bindings.members:<service-account-email>"

Service account key rotation:

# Google Cloud
gcloud iam service-accounts keys create new-key.json \
  --iam-account=<service-account-email>

# Delete old key
gcloud iam service-accounts keys delete <old-key-id> \
  --iam-account=<service-account-email>

Resource exhaustion#

CPU and memory monitoring#

Real-time monitoring:

# Monitor runner processes (top -p expects a comma-separated PID list)
top -p "$(pgrep -d, -f 'runner|agent|gitlab-runner')"

# Memory usage breakdown
ps aux --sort=-%mem | head -20
free -h

# CPU usage analysis
iostat -c 1 5
mpstat 1 5

Resource alerts:

# Set up memory monitoring (run as root)
cat > /usr/local/bin/memory-check.sh <<'EOF'
#!/bin/bash
MEMORY_USAGE=$(free | awk '/^Mem:/ {printf "%.0f", $3/$2 * 100}')
if [ "$MEMORY_USAGE" -gt 90 ]; then
  echo "High memory usage: ${MEMORY_USAGE}%"
  # Send alert or restart runner
fi
EOF
chmod +x /usr/local/bin/memory-check.sh

# Add to crontab without overwriting existing entries
(crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/memory-check.sh") | crontab -
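
One alternative for the usage calculation is to read `/proc/meminfo` directly and base the figure on `MemAvailable`, which accounts for reclaimable cache. The sketch below assumes a Linux kernel that exposes `MemAvailable`; the function name `mem_used_percent` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the percentage of memory in use, computed from /proc/meminfo-style
# input on stdin: (MemTotal - MemAvailable) / MemTotal, truncated to an integer.
mem_used_percent() {
    awk '/^MemTotal:/     {total = $2}
         /^MemAvailable:/ {avail = $2}
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

# Usage on a live system:
# mem_used_percent < /proc/meminfo
```

Reading from stdin keeps the function easy to check against a saved copy of `/proc/meminfo` from a failing host.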

Storage issues#

Disk space monitoring:

# Check available space
df -h
df -i  # Check inode usage

# Find large directories
du -sh /var/lib/docker /tmp /home/runner
find /var/log -size +100M -type f

# Clean up runner workspaces
find /home/runner/_work -name "*.log" -mtime +7 -delete
docker system prune -f

Automated cleanup:

#!/bin/bash
# cleanup-runner.sh
set -e

# Clean Docker
docker system prune -af --filter "until=24h"

# Clean runner workspaces
# (-mindepth 2 keeps _work itself and its top-level directories; -prune stops find descending into removed trees)
find /home/runner/_work -mindepth 2 -maxdepth 2 -type d -mtime +1 -prune -exec rm -rf {} +

# Clean logs
find /var/log -name "*.log" -mtime +7 -delete
journalctl --vacuum-time=7d

# Clean package cache
apt-get clean

Performance optimization#

Runner tuning:

# Increase file descriptor limits
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf

# Optimize kernel parameters
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
echo "fs.file-max=2097152" >> /etc/sysctl.conf
sysctl -p

Concurrent job optimization:

# GitLab Runner optimization
concurrent = 4  # Based on CPU cores

[[runners]]
  limit = 2  # Jobs per runner instance
  [runners.docker]
    cpus = "1.5"
    memory = "2g"
    shm_size = 268435456  # 256MB
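
As a rough way to derive `concurrent` from the host's core count, the sketch below divides the available cores by the per-job `cpus` value and rounds down. This is a sizing heuristic assumed for illustration, not a GitLab rule; `suggest_concurrent` is a hypothetical helper and expects a positive per-job value.

```shell
#!/usr/bin/env bash
# Suggest a value for `concurrent`: how many jobs fit on the host if each
# job may use cpus_per_job cores. Always suggests at least one job.
suggest_concurrent() {
    local cores="$1" cpus_per_job="$2"
    awk -v c="$cores" -v j="$cpus_per_job" 'BEGIN {
        n = int(c / j)      # round down to whole jobs
        if (n < 1) n = 1
        print n
    }'
}

# Usage on a live system:
# suggest_concurrent "$(nproc)" 1.5
```

Memory limits deserve the same arithmetic: with `memory = "2g"` per job, the host needs roughly that amount per concurrent job in addition to what the runner itself uses.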

DevOps Hub integration issues#

API errors#

Rate limiting detection:

# Monitor rate limits
curl -I https://console.assistance.bg/api/v2/projects \
     -H "Authorization: Bearer ${TOKEN}" | grep -i rate

# Expected headers:
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 50
# X-RateLimit-Reset: 1640995200
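
These headers can drive a wait decision directly. The sketch below assumes that `X-RateLimit-Reset` is an epoch timestamp in seconds, as the sample value suggests; `rate_limit_wait` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read HTTP response headers on stdin and print how many seconds to wait
# before the next request: 0 while quota remains, otherwise the time until
# X-RateLimit-Reset. The optional first argument is "now" in epoch seconds.
rate_limit_wait() {
    local now="${1:-$(date +%s)}"
    # Strip carriage returns, then match header names case-insensitively.
    tr -d '\r' | awk -F': *' -v now="$now" '
        tolower($1) == "x-ratelimit-remaining" {remaining = $2}
        tolower($1) == "x-ratelimit-reset"     {reset = $2}
        END {
            secs = (remaining + 0 > 0 || reset == "") ? 0 : reset - now
            print (secs > 0 ? secs : 0)
        }'
}

# Example (requires network access):
# sleep "$(curl -sI -H "Authorization: Bearer ${TOKEN}" \
#     https://console.assistance.bg/api/v2/projects | rate_limit_wait)"
```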

Rate limiting solutions:

# Implement exponential backoff
retry_api_call() {
    local max_attempts=5
    local delay=1

    for i in $(seq 1 "$max_attempts"); do
        if curl -f "https://console.assistance.bg/api/v2/projects" \
           -H "Authorization: Bearer ${TOKEN}"; then
            return 0
        fi

        if [ "$i" -lt "$max_attempts" ]; then
            echo "Attempt $i failed, retrying in ${delay}s..."
            sleep "$delay"
            delay=$((delay * 2))
        fi
    done

    return 1
}
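
The retry loop above doubles the delay on every attempt without an upper bound. If the attempt count is raised, capping the delay keeps waits reasonable. A minimal sketch, with `backoff_delay` as a hypothetical helper and a 60-second default cap chosen arbitrarily:

```shell
#!/usr/bin/env bash
# Print the delay in seconds before retry number `attempt` (1-based):
# base * 2^(attempt-1), capped at `cap`.
backoff_delay() {
    local attempt="$1" base="${2:-1}" cap="${3:-60}" delay
    # Beyond 30 doublings the result would exceed any sensible cap
    # (and eventually overflow shell arithmetic), so return the cap.
    if [ "$attempt" -gt 30 ]; then echo "$cap"; return 0; fi
    delay=$(( base * (1 << (attempt - 1)) ))
    if [ "$delay" -gt "$cap" ]; then delay="$cap"; fi
    echo "$delay"
}

# Example inside a retry loop: sleep "$(backoff_delay "$i")"
```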

Service connectivity#

Health check endpoints:

# DevOps Hub service status
curl https://console.assistance.bg/api/health
curl https://console.assistance.bg/api/v2/status

# Expected response:
{
  "status": "healthy",
  "version": "2.1.0",
  "timestamp": "2026-01-17T10:00:00Z"
}

Connection pooling optimization:

# Configure HTTP client settings
export HTTP_KEEP_ALIVE=true
export HTTP_MAX_CONNECTIONS_PER_HOST=10
export HTTP_CONNECTION_TIMEOUT=30

API versioning issues#

Version compatibility check:

# Check API version
curl https://console.assistance.bg/api/version

# Use specific API version
curl https://console.assistance.bg/api/v2/projects \
     -H "Accept: application/vnd.devopshub.v2+json"

Cross-platform problems#

Linux-specific issues#

Permission problems:

# Check runner user permissions
id runneruser
groups runneruser

# Fix common permission issues
sudo usermod -aG docker runneruser
sudo chown -R runneruser:runneruser /home/runneruser

# SELinux issues (RHEL/CentOS)
sestatus
setsebool -P httpd_can_network_connect 1

SystemD service management:

# Check service status
systemctl status 'actions.runner.*'
journalctl -u 'actions.runner.*' -f

# Service configuration
sudo systemctl enable 'actions.runner.*'
sudo systemctl restart 'actions.runner.*'

macOS-specific issues#

Xcode command line tools:

# Install/update command line tools
xcode-select --install
sudo xcodebuild -license accept

# Check installed tools
xcode-select -p
xcrun --show-sdk-path

macOS security restrictions:

# Check Gatekeeper status
spctl --status

# Allow unsigned binaries (if needed)
# This disables Gatekeeper system-wide; re-enable it with --master-enable when done
sudo spctl --master-disable

# Keychain access for certificates
security list-keychains
security unlock-keychain ~/Library/Keychains/login.keychain

LaunchDaemon configuration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<!-- com.github.actions.runner.plist -->
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.github.actions.runner</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/runner/actions-runner/run.sh</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>UserName</key>
    <string>runner</string>
    <key>WorkingDirectory</key>
    <string>/Users/runner/actions-runner</string>
</dict>
</plist>

Windows-specific issues#

PowerShell execution policy:

# Check current policy
Get-ExecutionPolicy

# Set policy for runner
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser

# Bypass for specific script
powershell.exe -ExecutionPolicy Bypass -File script.ps1

Windows service configuration:

# Install as Windows service (run in an elevated PowerShell session)
.\config.cmd --url https://github.com/owner/repo --token <token> --runasservice

# Service management (the service is named actions.runner.<owner>-<repo>.<runner-name>)
Get-Service "actions.runner.*"
Start-Service "actions.runner.*"
Stop-Service "actions.runner.*"

Windows firewall:

# Check firewall rules
Get-NetFirewallRule | Where-Object {$_.DisplayName -match "runner"}

# Add firewall rule (outbound HTTPS, so the port is the remote port)
New-NetFirewallRule -DisplayName "GitHub Runner" -Direction Outbound -RemotePort 443 -Protocol TCP -Action Allow

Container and Docker issues#

Docker daemon problems#

Service management:

# Check Docker daemon status
systemctl status docker
docker info

# Restart Docker daemon
sudo systemctl restart docker

# Check Docker logs
journalctl -u docker -f

Docker configuration (/etc/docker/daemon.json):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "registry-mirrors": ["https://mirror.gcr.io"],
  "insecure-registries": ["localhost:5000"]
}

Image pull failures#

Authentication issues:

# Login to registry (interactive)
docker login registry.example.com

# Login to registry (non-interactive, for scripts)
echo "$REGISTRY_PASSWORD" | docker login registry.example.com --username "$REGISTRY_USER" --password-stdin

# Test image pull
docker pull ubuntu:20.04

Network issues:

# Check DNS resolution from container
docker run --rm alpine nslookup registry.example.com

# Test with different DNS
docker run --rm --dns 8.8.8.8 alpine nslookup registry.example.com

Container resource limits#

Memory constraints:

# Check memory limits
docker stats --no-stream
docker inspect <container-id> | jq '.[0].HostConfig.Memory'

# Set memory limits
docker run -m 2g ubuntu:20.04

OOM killer issues:

# Check OOM killer logs
dmesg | grep "Killed process"
journalctl -k | grep "Memory cgroup out of memory"

# Monitor memory usage
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

Docker volume issues#

Permission problems:

# Check volume mounts
docker inspect <container> | jq '.[0].Mounts'

# Fix volume permissions
sudo chown -R 1000:1000 /host/volume/path

Disk space issues:

# Clean up Docker
docker system df
docker system prune -af

# Remove unused volumes
docker volume prune -f

# Clean up build cache
docker builder prune -af

Log analysis#

Structured log investigation#

Log location by platform:

GitHub Actions:

# Runner logs
tail -f /home/runner/_diag/Runner_*.log
tail -f /home/runner/_diag/Worker_*.log

# System logs
journalctl -u 'actions.runner.*' -f

GitLab Runner:

# Runner logs
tail -f /var/log/gitlab-runner/gitlab-runner.log

# Service logs
journalctl -u gitlab-runner -f

# Job logs (docker logs accepts one container at a time)
docker ps --filter label=com.gitlab.gitlab-runner.job.id --format "{{.ID}}" | xargs -r -n1 docker logs

Jenkins:

# Master logs
tail -f $JENKINS_HOME/logs/jenkins.log

# Agent logs
tail -f /var/log/jenkins/jenkins.log
cat /var/lib/jenkins/logs/slaves/<node-name>/slave.log

Azure DevOps:

# Agent logs (Linux)
tail -f /home/vsts/agents/_diag/Agent_*.log

# Windows logs
Get-Content "C:\agents\_diag\Agent_*.log" -Tail 50 -Wait

Log filtering patterns#

Common error patterns:

# Connection errors
grep -E "(connection|timeout|refused|failed)" runner.log

# Authentication errors
grep -E "(auth|unauthorized|forbidden|401|403)" runner.log

# Resource errors
grep -E "(memory|disk|space|quota|limit)" runner.log

# API errors
grep -E "(api|http|500|502|503|504)" runner.log

Structured log parsing:

# Extract JSON logs (error-level entries that mention "connection")
jq 'select(.level == "error" and (.message | contains("connection")))' runner.log

# Time-based filtering
journalctl --since "2026-01-17 09:00:00" --until "2026-01-17 10:00:00" -u 'actions.runner.*'

# Pattern-based extraction
awk '/ERROR/ {print $1, $2, $NF}' runner.log | sort | uniq -c

Log analysis automation#

Log monitoring script:

#!/bin/bash
# monitor-runner-logs.sh

LOG_FILE="/var/log/runner/runner.log"
ERROR_PATTERNS=(
    "connection refused"
    "authentication failed"
    "out of memory"
    "disk space"
    "timeout"
)

# IFS= and -r keep leading whitespace and backslashes in each log line intact
tail -F "$LOG_FILE" | while IFS= read -r line; do
    for pattern in "${ERROR_PATTERNS[@]}"; do
        if echo "$line" | grep -qi "$pattern"; then
            echo "$(date): ERROR DETECTED - $pattern"
            echo "$line"
            # Send alert
            curl -X POST https://hooks.slack.com/webhook \
                -H "Content-Type: application/json" \
                -d "{\"text\": \"Runner Error: $pattern\"}"
        fi
    done
done

Log aggregation:

# Collect all relevant logs
collect_logs() {
    local output_dir="/tmp/runner-logs-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$output_dir"

    # System logs
    journalctl --no-pager > "$output_dir/system.log"

    # Runner logs
    cp -r /home/runner/_diag/ "$output_dir/runner-diag/"

    # Docker logs (docker logs accepts one container at a time)
    for id in $(docker ps -aq); do
        docker logs "$id" >> "$output_dir/docker.log" 2>&1
    done

    # System information
    cat /proc/meminfo > "$output_dir/meminfo"
    df -h > "$output_dir/disk-usage"
    ps aux > "$output_dir/processes"

    echo "Logs collected in: $output_dir"
}

Escalation procedures#

Information gathering checklist#

Before contacting support, gather this information:

System information:

# Operating system details
uname -a
cat /etc/os-release
lscpu
free -h
df -h

# Network configuration
ip addr show
ip route
cat /etc/resolv.conf

# Runner configuration
./config.sh --version  # Runner version
cat .runner  # Configuration details (sanitize tokens)

Error reproduction:

Exact error message or log entries
Steps to reproduce the issue
Timeline when issue started
Recent changes to configuration
Frequency of occurrence (always/intermittent)

Log collection:

# Create support bundle
# Note: .runner and .env can contain credentials; redact them before sharing the bundle
tar -czf runner-support-$(date +%Y%m%d).tar.gz \
    /var/log/runner/ \
    /home/runner/_diag/ \
    ~/.runner \
    ~/.env \
    /etc/systemd/system/actions.runner.*

Log sanitization for security#

Remove sensitive data:

# Sanitize logs before sharing
sanitize_logs() {
    local log_file="$1"
    local output_file="${log_file}.sanitized"

    # Remove tokens and secrets
    sed -E 's/token=[A-Za-z0-9_-]+/token=***REDACTED***/g' "$log_file" | \
    sed -E 's/"password"\s*:\s*"[^"]*"/"password": "***REDACTED***"/g' | \
    sed -E 's/Authorization:\s*Bearer\s+[A-Za-z0-9_.-]+/Authorization: Bearer ***REDACTED***/g' \
    > "$output_file"

    echo "Sanitized log: $output_file"
}
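
Before sharing a sanitized file, a final scan for anything that still looks like a credential is a cheap safety net. The sketch below is illustrative only: `looks_sanitized` is a hypothetical helper, and the patterns (GitHub token prefixes such as `ghp_`, bearer tokens, `token=` parameters) are assumptions that will not catch every secret format.

```shell
#!/usr/bin/env bash
# Return 0 if the file contains no strings that look like credentials,
# 1 if a suspicious string is found, 2 if the file cannot be read.
# The patterns are illustrative and not exhaustive.
looks_sanitized() {
    local file="$1"
    [ -r "$file" ] || return 2
    ! grep -Eq \
        -e 'gh[pousr]_[A-Za-z0-9]{20,}' \
        -e 'Bearer +[A-Za-z0-9_.-]{16,}' \
        -e 'token=[A-Za-z0-9_-]{16,}' \
        "$file"
}

# Example: looks_sanitized runner.log.sanitized || echo "review before sharing"
```

A clean result means only that none of these patterns matched, so a manual skim of the file is still worthwhile.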

Support channels and SLA expectations#

DevOps Hub Support:

Community: GitHub Discussions (Best effort)
Pro Plan: Email support (24-48 hours)
Enterprise: Dedicated support (4-8 hours)

Platform Support:

GitHub: GitHub Support (varies by plan)
GitLab: GitLab Support (varies by subscription)
Azure DevOps: Microsoft Support
Jenkins: Community forums and commercial support

Emergency escalation: For production-critical issues:

Use highest priority support channel
Include "PRODUCTION OUTAGE" in subject
Provide complete information from checklist
Include business impact assessment

Information to include:

Organization/project ID
Runner platform and version
Error messages and logs (sanitized)
Steps already attempted
Business impact and urgency

Prevention and monitoring#

Proactive monitoring setup#

Health check script:

#!/bin/bash
# runner-health-check.sh

check_runner_health() {
    local errors=0

    # Check service status
    if ! systemctl is-active --quiet 'actions.runner.*'; then
        echo "ERROR: Runner service not active"
        ((errors++))
    fi

    # Check disk space
    local disk_usage=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')
    if [ "$disk_usage" -gt 90 ]; then
        echo "ERROR: Disk usage high: ${disk_usage}%"
        ((errors++))
    fi

    # Check memory usage
    local mem_usage=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')
    if [ "$mem_usage" -gt 90 ]; then
        echo "ERROR: Memory usage high: ${mem_usage}%"
        ((errors++))
    fi

    # Check network connectivity
    if ! curl -s --max-time 10 https://api.github.com > /dev/null; then
        echo "ERROR: Cannot reach GitHub API"
        ((errors++))
    fi

    if [ "$errors" -eq 0 ]; then
        echo "OK: Runner health check passed"
    fi

    return $errors
}

# Run health check
check_runner_health

Automated maintenance:

#!/bin/bash
# runner-maintenance.sh

# Weekly maintenance tasks
maintenance_tasks() {
    echo "Starting runner maintenance..."

    # Runner software: the runner updates itself automatically while the
    # service is running (unless it was configured with --disableupdate)

    # Clean up old logs
    find /home/runner/_diag -name "*.log" -mtime +7 -delete

    # Update system packages
    apt-get update && apt-get upgrade -y

    # Clean Docker
    docker system prune -af --filter "until=72h"

    # Restart runner service
    systemctl restart 'actions.runner.*'

    echo "Maintenance completed"
}

maintenance_tasks

# Schedule with cron
# 0 2 * * 0 /usr/local/bin/runner-maintenance.sh

This comprehensive troubleshooting guide provides systematic approaches to diagnosing and resolving the most common self-hosted runner issues across all supported platforms. Keep this reference handy for rapid issue resolution and use the escalation procedures when additional support is needed.
