Common errors and troubleshooting

Comprehensive error diagnosis and resolution guide for self-hosted runners


This guide provides systematic approaches to diagnosing and resolving common issues across all supported runner platforms. Use this reference for rapid troubleshooting and escalation procedures.

Troubleshooting overview#

Systematic debugging approach#

When encountering runner issues, follow this structured approach:

  1. Identify the scope: Single runner, multiple runners, or platform-wide issue?
  2. Check timing: When did the issue start? Was there a recent change?
  3. Gather logs: Collect relevant logs from runner, platform, and DevOps Hub
  4. Reproduce the issue: Can you consistently trigger the problem?
  5. Apply targeted fixes: Start with the most likely causes first
  6. Document resolution: Record the solution for future reference
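
Step 1 can be made mechanical once you know how many runners are affected. The helper below is a minimal sketch: `classify_scope` and its labels are hypothetical names chosen for illustration and are not part of any runner tooling.

```bash
# classify_scope AFFECTED TOTAL
# Prints a rough scope label from the number of affected runners.
# Hypothetical helper for illustration only.
classify_scope() {
    local affected=$1 total=$2
    if [ "$affected" -le 0 ]; then
        echo "none"
    elif [ "$affected" -eq 1 ]; then
        echo "single-runner"
    elif [ "$affected" -lt "$total" ]; then
        echo "multiple-runners"
    else
        echo "platform-wide"
    fi
}
```

For example, `classify_scope 3 10` reports several but not all runners affected, which usually points at shared configuration or network paths rather than one host.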

Common error patterns#

Most runner issues fall into these categories:

  • Connection failures: Network, DNS, or service unavailability
  • Authentication errors: Expired tokens, invalid credentials, or permission issues
  • Resource exhaustion: CPU, memory, disk space, or network bandwidth limitations
  • Configuration problems: Incorrect settings, missing dependencies, or environment issues
  • Platform-specific bugs: Known issues with specific runner versions or platforms
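
As a rough illustration of how these categories map onto log text, the sketch below sorts a single log line into one of them. The function name `classify_error` and the keyword lists are assumptions made for this example; real runner logs vary by platform and version, so treat the patterns as a starting point.

```bash
# classify_error "LOG LINE"
# Prints a coarse category for one log line (hypothetical keyword lists).
classify_error() {
    local line
    # Lowercase the line so matching is case-insensitive
    line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$line" in
        *"connection refused"*|*timeout*|*"timed out"*|*"could not resolve"*)
            echo "connection" ;;
        *unauthorized*|*forbidden*|*"permission denied"*|*"invalid token"*|*expired*)
            echo "authentication" ;;
        *"no space left"*|*"out of memory"*|*"quota exceeded"*)
            echo "resource" ;;
        *"not found"*|*missing*|*"invalid configuration"*)
            echo "configuration" ;;
        *)
            echo "unknown" ;;
    esac
}
```

The first matching branch wins, so a line that mentions both a timeout and a missing file is reported as a connection problem.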

Platform-specific errors#

GitHub Actions errors#

Runner offline/connection issues#

Symptoms:

  • Runner shows as "Offline" in GitHub UI
  • Workflows queue indefinitely without starting
  • Connection timeout errors in runner logs

Diagnostic commands:

# Check runner service status (quote the glob so the shell does not expand it)
systemctl status 'actions.runner.*'

# Verify network connectivity
curl -I https://api.github.com
telnet api.github.com 443

Solutions:

# Restart runner service
sudo ./svc.sh stop
sudo ./svc.sh start

# Re-register runner if token expired
sudo ./svc.sh stop
sudo ./svc.sh uninstall   # uninstall the service before removing the runner
./config.sh remove --token <removal-token>
./config.sh --url https://github.com/owner/repo --token <new-token>
sudo ./svc.sh install
sudo ./svc.sh start

Workflow permission denied errors#

Error patterns:

Error: Resource not accessible by integration
Error: Permission denied to github-actions[bot]

Solutions:

  1. Check repository Settings → Actions → General → Workflow permissions
  2. Verify token has required scopes for API operations
  3. Ensure GITHUB_TOKEN has sufficient permissions in workflow

permissions:
  contents: read
  pull-requests: write
  checks: write

Self-hosted runner registration failures#

Error patterns:

A runner exists with the same name
Could not connect to the server
Invalid runner token

Resolution steps:

# Remove existing runner
./config.sh remove --token <removal-token>

# Clean runner directory
rm -rf _diag/ _work/

# Re-register with new token
./config.sh --url https://github.com/owner/repo --token <new-registration-token>

Cache access and artifact upload problems#

Symptoms:

  • Artifact upload failures
  • Cache restore/save errors
  • Permission denied on cache operations

Debugging:

# Check available disk space
df -h
du -sh ./_work/

# Verify cache directory permissions
ls -la ~/.cache/

Solutions:

# Clear cache directory
rm -rf ~/.cache/actions-runner/

# Fix permissions
sudo chown -R runneruser:runneruser ~/.cache/

Azure DevOps errors#

Agent pool connection failures#

Symptoms:

  • Agent shows as "Offline" in Azure DevOps
  • Connection refused errors
  • Authentication timeouts

Diagnostic commands:

# Check agent service (quote the glob so the shell does not expand it)
systemctl status 'vsts.agent.*'

# Test connectivity
curl -v https://dev.azure.com/organization

# Verify agent configuration
cat .agent

Solutions:

# Reconfigure agent
./config.sh remove
./config.sh --url https://dev.azure.com/organization --auth pat --token <new-pat>

# Restart service
sudo systemctl restart 'vsts.agent.*'

Service connection authentication errors#

Error patterns:

TF400813: The user 'Build\...' is not authorized to access this resource
VS403403: The current user does not have permission to perform this action

Resolution:

  1. Verify service principal has required Azure permissions
  2. Check service connection configuration in project settings
  3. Ensure agent pool has permission to access service connections

Pipeline variable access denied#

Symptoms:

  • Variables undefined in pipeline runs
  • Secret variables not being decrypted
  • Environment-specific variable access issues

Solutions:

# Explicitly define variable groups
variables:
  - group: 'Production Variables'
  - name: 'Environment'
    value: 'production'

Task execution timeout issues#

Common timeouts:

  • Default task timeout: 60 minutes
  • Job timeout: 360 minutes (6 hours)
  • Pipeline timeout: 60 minutes

Configuration:

jobs:
  - job: LongRunningJob
    timeoutInMinutes: 120 # 2 hours
    steps:
      - task: PowerShell@2
        timeoutInMinutes: 30
        inputs:
          # Run a script file; the "script" input is only used for inline scripts
          targetType: 'filePath'
          filePath: 'long-running-script.ps1'

GitLab Runner errors#

Runner registration token expired#

Error patterns:

ERROR: Verifying runner... failed
ERROR: Failed to register runner
HTTP 403 Forbidden

Resolution:

# Unregister old runner
gitlab-runner unregister --name <runner-name>

# Register with new token
gitlab-runner register \
  --url https://gitlab.com/ \
  --registration-token <new-token> \
  --name <runner-name>

Docker executor permission denied#

Symptoms:

  • Permission denied errors in Docker containers
  • Volume mount failures
  • Docker socket access issues

Solutions:

# Add gitlab-runner to docker group
sudo usermod -aG docker gitlab-runner

# Fix Docker socket permissions
# Last resort only: a world-writable socket gives every local user root-equivalent access
sudo chmod 666 /var/run/docker.sock

# Restart GitLab Runner
sudo gitlab-runner restart

Configuration fix:

[[runners]]
  [runners.docker]
    privileged = true
    volumes = ["/var/run/docker.sock:/var/run/docker.sock"]

Kubernetes executor pod failures#

Common issues:

  • ImagePullBackOff errors
  • Resource quota exceeded
  • Node selector constraints

Debugging:

# Check pod status
kubectl get pods -n gitlab-runner

# Describe failed pods
kubectl describe pod <pod-name> -n gitlab-runner

# Check resource quotas
kubectl describe quota -n gitlab-runner

Configuration solutions:

[[runners]]
  [runners.kubernetes]
    image = "ubuntu:20.04"
    privileged = true
    # Resource limits are plain keys on the executor table
    cpu_limit = "1000m"
    memory_limit = "2Gi"
    [runners.kubernetes.node_selector]
      "node-type" = "runner-node"

Concurrent job limit exceeded#

Error message:

ERROR: Job failed (system failure): concurrent job limit exceeded

Solutions:

  1. Increase concurrent job limit in runner configuration
  2. Deploy additional runners
  3. Optimize job duration to increase throughput

concurrent = 10 # Increase from default of 1

[[runners]]
  limit = 5 # Jobs per runner
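
The global `concurrent` value and the per-runner `limit` values interact: the global value caps the total, and each `limit` caps one `[[runners]]` entry, with `limit = 0` meaning no per-runner cap. The sketch below estimates the resulting ceiling. `effective_jobs` is a hypothetical helper written for this guide, and it ignores other constraints such as `request_concurrency` or available hardware.

```bash
# effective_jobs CONCURRENT LIMIT [LIMIT...]
# Estimates how many jobs can run at once on one host:
# min(concurrent, sum of per-runner limits), where limit=0 means "no cap".
effective_jobs() {
    local concurrent=$1 sum=0 limit
    shift
    for limit in "$@"; do
        if [ "$limit" -eq 0 ]; then
            # An uncapped runner entry leaves only the global value in force
            echo "$concurrent"
            return 0
        fi
        sum=$(( sum + limit ))
    done
    if [ "$sum" -lt "$concurrent" ]; then
        echo "$sum"
    else
        echo "$concurrent"
    fi
}
```

With `concurrent = 10` and two runner entries limited to 5 and 2, `effective_jobs 10 5 2` shows that raising `concurrent` alone will not help, because the per-runner limits are the tighter bound.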

Jenkins errors#

Agent SSH connection refused#

Symptoms:

  • "Connection refused" errors in Jenkins logs
  • Agent appears offline
  • SSH handshake failures

Debugging:

# Test SSH connection manually
ssh -v jenkins@<agent-ip>

# Check SSH service (the unit is named sshd on RHEL-based systems)
systemctl status ssh

# Generate a dedicated key pair if the agent does not have one yet
ssh-keygen -t rsa -b 4096 -f ~/.ssh/jenkins_key

Solutions:

  1. Ensure SSH service is running on agent
  2. Verify Jenkins master can reach agent IP/port
  3. Check SSH key permissions and authentication

# Fix SSH key permissions
chmod 600 ~/.ssh/jenkins_key
chmod 700 ~/.ssh/

# Test SSH connection with key
ssh -i ~/.ssh/jenkins_key jenkins@<agent-ip>

JNLP agent connection failures#

Error patterns:

java.nio.channels.ClosedChannelException
Failed to connect to http://jenkins:8080/

Resolution steps:

  1. Verify Jenkins URL is accessible from agent
  2. Check JNLP port configuration (default 50000)
  3. Ensure firewall allows connection

# Test JNLP port connectivity
telnet <jenkins-master> 50000

# Download and run JNLP agent
wget http://jenkins:8080/jnlpJars/agent.jar
java -jar agent.jar -jnlpUrl http://jenkins:8080/computer/<node-name>/jenkins-agent.jnlp -secret <secret>

Workspace permission issues#

Symptoms:

  • Permission denied errors during builds
  • Unable to create/delete files in workspace
  • Git checkout failures

Solutions:

# Fix workspace permissions
sudo chown -R jenkins:jenkins /var/lib/jenkins/workspace/
sudo chmod -R 755 /var/lib/jenkins/workspace/

# Clean workspace
rm -rf /var/lib/jenkins/workspace/<job-name>/*

Plugin compatibility problems#

Common issues:

  • Plugin version conflicts
  • Incompatible Jenkins core version
  • Missing plugin dependencies

Resolution:

  1. Check plugin compatibility matrix
  2. Update plugins in correct dependency order
  3. Roll back problematic plugins

# Check plugin status via CLI
java -jar jenkins-cli.jar -s http://jenkins:8080/ list-plugins

# Install specific plugin version
java -jar jenkins-cli.jar -s http://jenkins:8080/ install-plugin <plugin-name>@<version>

Bazel Remote Build Execution errors#

gRPC connection failures#

Error patterns:

UNAVAILABLE: io exception
DEADLINE_EXCEEDED: context deadline exceeded
Failed to connect to remote execution service

Debugging:

# Test gRPC connectivity (lists services; requires server reflection on the endpoint)
grpc_cli ls <remote-executor-address>

# Check TLS certificate
openssl s_client -connect <remote-executor>:443 -servername <hostname>

Solutions:

  1. Verify remote executor endpoint is reachable
  2. Check authentication credentials
  3. Ensure TLS certificates are valid

# Bazel configuration with authentication (//... builds every target in the workspace)
bazel build //... \
  --remote_executor=grpcs://executor.example.com:443 \
  --google_credentials=/path/to/credentials.json \
  --remote_timeout=60

Remote cache authentication errors#

Error patterns:

PERMISSION_DENIED: Request not authenticated
UNAUTHENTICATED: Invalid credentials

Solutions:

# Authenticate with service account
gcloud auth activate-service-account --key-file=credentials.json

# Configure Bazel with authentication (//... builds every target in the workspace)
bazel build //... \
  --remote_cache=grpcs://cache.example.com:443 \
  --google_credentials=/path/to/credentials.json

Worker registration timeouts#

Symptoms:

  • Workers fail to register with scheduler
  • Build requests timeout waiting for workers
  • Intermittent worker availability

Configuration fixes:

# worker.conf
worker_properties = {
  "pool": "default",
  "os": "linux",
  "arch": "x86_64"
}

# Increase timeouts
registration_timeout = "30s"
keepalive_timeout = "10s"

Build execution environment issues#

Common problems:

  • Missing build tools in worker environment
  • Incorrect PATH configuration
  • Tool version mismatches

Solutions:

  1. Verify worker environment matches local build requirements
  2. Use container-based workers for consistency
  3. Pin tool versions in BUILD files

# BUILD file with explicit tool versions
cc_binary(
    name = "my_binary",
    srcs = ["main.cc"],
    toolchains = ["@bazel_tools//tools/cpp:toolchain"],
)

Network connectivity issues#

Connection failures#

Diagnostic commands:

# Test basic connectivity
ping -c 4 console.assistance.bg
traceroute console.assistance.bg

# Check DNS resolution
nslookup console.assistance.bg
dig +trace assistance.bg

# Test HTTPS connectivity
curl -v -I https://console.assistance.bg
openssl s_client -connect console.assistance.bg:443

Common solutions:

  1. Check firewall rules and security groups
  2. Verify DNS configuration
  3. Test with different networks (mobile hotspot)
  4. Check proxy configuration

Timeouts#

Connection timeout patterns:

# Increase connection timeouts
curl --connect-timeout 30 --max-time 300 https://api.assistance.bg

# Test with increased verbosity
curl -v --trace-time https://api.assistance.bg/health

HTTP client configuration:

# GitHub Actions with timeout
- name: API call with timeout
  run: |
    curl --connect-timeout 10 --max-time 30 \
      --retry 3 --retry-delay 5 \
      https://api.assistance.bg/projects

Proxy problems#

Corporate proxy configuration:

# Set proxy environment variables
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=http://proxy.company.com:8080
export NO_PROXY=localhost,127.0.0.1,.company.com

# Test proxy connectivity
curl --proxy http://proxy.company.com:8080 https://assistance.bg

Runner-specific proxy config:

# GitHub Actions runner with proxy
echo "HTTP_PROXY=http://proxy.company.com:8080" >> .env
echo "HTTPS_PROXY=http://proxy.company.com:8080" >> .env
sudo ./svc.sh stop && sudo ./svc.sh start
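
A frequent proxy mistake is a `NO_PROXY` list that does not cover the host you expect. The sketch below checks a hostname against a comma-separated list using two simplified rules: an exact match, or a suffix match for entries that start with a dot. `bypasses_proxy` is a hypothetical helper, and the rules are an assumption for illustration; curl, Docker, and the various runners each interpret `NO_PROXY` slightly differently.

```bash
# bypasses_proxy HOST NO_PROXY_LIST
# Exit status 0 if HOST matches an entry, 1 otherwise.
# Runs in a subshell so the IFS and globbing changes stay local.
bypasses_proxy() (
    host=$1
    set -f      # do not expand wildcard characters while splitting the list
    IFS=,
    for entry in $2; do
        case "$entry" in
            "") ;;                                            # skip empty entries
            .*) case "$host" in *"$entry") exit 0 ;; esac ;;  # suffix match
            *)  [ "$host" = "$entry" ] && exit 0 ;;           # exact match
        esac
    done
    exit 1
)
```

A usage example: `bypasses_proxy build.company.com "$NO_PROXY" && echo "direct connection"`.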

Authentication problems#

API integration failures#

Token validation:

# Test API token
curl -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  https://console.assistance.bg/api/v2/projects

# Check token expiration (the jwt-cli package installs its binary as "jwt")
jwt decode "${TOKEN}"

Common token issues:

  1. Token expired or revoked
  2. Insufficient token permissions
  3. Incorrect token format
  4. Token not properly encoded
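
If no JWT tool is installed, the payload of a JWT-formatted token can still be inspected with standard utilities. This is a minimal sketch that assumes the token is a three-part JWT and that `base64 -d` is available (GNU coreutils and recent macOS). `jwt_payload` is a hypothetical helper, and it does not verify the signature.

```bash
# jwt_payload TOKEN
# Prints the decoded JSON payload (second segment) of a JWT.
jwt_payload() {
    local payload
    # Take the second dot-separated segment and map base64url to base64
    payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # Restore the padding that base64url encoding strips
    case $(( ${#payload} % 4 )) in
        2) payload="${payload}==" ;;
        3) payload="${payload}=" ;;
    esac
    printf '%s' "$payload" | base64 -d
}
```

Compare the `exp` claim in the output with `date +%s` to see whether the token has expired, for example with `jwt_payload "$TOKEN" | jq '.exp'`.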

Token refresh procedures#

GitHub token refresh:

# Request a fresh registration token
# (config.sh expects a short-lived registration token, not a personal access token)
# Navigate to the repository → Settings → Actions → Runners → New self-hosted runner

# Update runner configuration
./config.sh remove --token <removal-token>
./config.sh --url https://github.com/owner/repo --token <new-registration-token>

DevOps Hub API key rotation:

# Generate new API key via console
curl -X POST https://console.assistance.bg/api/v2/auth/api-keys \
  -H "Authorization: Bearer ${CURRENT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"name": "Runner API Key", "permissions": ["projects:read", "environments:write"]}'

# Update runner environment
# Replace only the old key so other settings in .env (such as proxy variables) survive
sed -i '/^DEVOPSHUB_API_KEY=/d' .env
echo "DEVOPSHUB_API_KEY=<new-key>" >> .env

Service account issues#

Permission verification:

# Azure service principal permissions
az ad sp show --id <service-principal-id>
az role assignment list --assignee <service-principal-id>

# Google Cloud service account
gcloud projects get-iam-policy <project-id> \
  --flatten="bindings[].members" \
  --filter="bindings.members:<service-account-email>"

Service account key rotation:

# Google Cloud
gcloud iam service-accounts keys create new-key.json \
  --iam-account=<service-account-email>

# Delete old key
gcloud iam service-accounts keys delete <old-key-id> \
  --iam-account=<service-account-email>

Resource exhaustion#

CPU and memory monitoring#

Real-time monitoring:

# Monitor runner processes (top -p needs a comma-separated PID list)
top -p "$(pgrep -d, -f 'runner|agent|gitlab-runner')"

# Memory usage breakdown
ps aux --sort=-%mem | head -20
free -h

# CPU usage analysis
iostat -c 1 5
mpstat 1 5

Resource alerts:

# Set up memory monitoring (run as root)
cat > /usr/local/bin/memory-check.sh <<'EOF'
#!/bin/bash
MEMORY_USAGE=$(free | awk '/^Mem:/ {printf "%.0f", $3/$2 * 100}')
if [ "$MEMORY_USAGE" -gt 90 ]; then
  echo "High memory usage: ${MEMORY_USAGE}%"
  # Send alert or restart runner
fi
EOF
chmod +x /usr/local/bin/memory-check.sh

# Add to crontab without replacing existing entries
(crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/memory-check.sh") | crontab -
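
On Linux, the `MemAvailable` field in `/proc/meminfo` is often a better signal than the "used" column of `free`, because it estimates how much memory can be reclaimed from caches. The sketch below computes a usage percentage from it. `mem_used_pct` is a hypothetical helper, and the optional file argument exists only so the function can be tried against a sample file.

```bash
# mem_used_pct [MEMINFO_FILE]
# Prints the percentage of memory that is not available, rounded to an integer.
mem_used_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.0f\n", (total - avail) / total * 100 }' \
        "${1:-/proc/meminfo}"
}
```

A check such as `[ "$(mem_used_pct)" -gt 90 ]` can then replace the `free`-based calculation in a monitoring script.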

Storage issues#

Disk space monitoring:

# Check available space
df -h
df -i # Check inode usage

# Find large directories
du -sh /var/lib/docker /tmp /home/runner
find /var/log -size +100M -type f

# Clean up runner workspaces
find /home/runner/_work -name "*.log" -mtime +7 -delete
docker system prune -f

Automated cleanup:

#!/bin/bash
# cleanup-runner.sh
set -e

# Clean Docker
docker system prune -af --filter "until=24h"

# Clean runner workspaces
# -mindepth 1 protects _work itself; -prune stops find descending into removed directories
find /home/runner/_work -mindepth 1 -maxdepth 2 -type d -mtime +1 -prune -exec rm -rf {} +

# Clean logs
find /var/log -name "*.log" -mtime +7 -delete
journalctl --vacuum-time=7d

# Clean package cache
apt-get clean

Performance optimization#

Runner tuning:

# Increase file descriptor limits
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf

# Optimize kernel parameters
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
echo "fs.file-max=2097152" >> /etc/sysctl.conf
sysctl -p

Concurrent job optimization:

# GitLab Runner optimization
concurrent = 4 # Based on CPU cores

[[runners]]
  limit = 2 # Jobs per runner instance
  [runners.docker]
    cpus = "1.5"
    memory = "2g"
    shm_size = 268435456 # 256MB

DevOps Hub integration issues#

API errors#

Rate limiting detection:

# Monitor rate limits
curl -sI https://console.assistance.bg/api/v2/projects \
  -H "Authorization: Bearer ${TOKEN}" | grep -i rate

# Expected headers:
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 50
# X-RateLimit-Reset: 1640995200
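
When `X-RateLimit-Remaining` reaches zero, waiting until the reset time is usually cheaper than retrying blindly. The sketch below assumes that `X-RateLimit-Reset` carries a Unix epoch timestamp in seconds, as in the sample header above. `seconds_until_reset` is a hypothetical helper, and its second argument exists so the calculation can be checked with a fixed clock.

```bash
# seconds_until_reset RESET_EPOCH [NOW_EPOCH]
# Prints how long to wait before the rate limit window resets (never negative).
seconds_until_reset() {
    local reset=$1 now=${2:-$(date +%s)} delay
    delay=$(( reset - now ))
    if [ "$delay" -lt 0 ]; then
        delay=0
    fi
    echo "$delay"
}
```

In a script this would typically be used as `sleep "$(seconds_until_reset "$reset")"` after extracting the header value.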

Rate limiting solutions:

# Implement exponential backoff
retry_api_call() {
  local max_attempts=5
  local delay=1

  for i in $(seq 1 $max_attempts); do
    if curl -f "https://console.assistance.bg/api/v2/projects" \
      -H "Authorization: Bearer ${TOKEN}"; then
      return 0
    fi

    if [ $i -lt $max_attempts ]; then
      echo "Attempt $i failed, retrying in ${delay}s..."
      sleep $delay
      delay=$((delay * 2))
    fi
  done

  return 1
}

Service connectivity#

Health check endpoints:

# DevOps Hub service status
curl https://console.assistance.bg/api/health
curl https://console.assistance.bg/api/v2/status

# Expected response:
{
  "status": "healthy",
  "version": "2.1.0",
  "timestamp": "2026-01-17T10:00:00Z"
}

Connection pooling optimization:

# Configure HTTP client settings
export HTTP_KEEP_ALIVE=true
export HTTP_MAX_CONNECTIONS_PER_HOST=10
export HTTP_CONNECTION_TIMEOUT=30

API versioning issues#

Version compatibility check:

# Check API version
curl https://console.assistance.bg/api/version

# Use specific API version
curl https://console.assistance.bg/api/v2/projects \
  -H "Accept: application/vnd.devopshub.v2+json"

Cross-platform problems#

Linux-specific issues#

Permission problems:

# Check runner user permissions
id runneruser
groups runneruser

# Fix common permission issues
sudo usermod -aG docker runneruser
sudo chown -R runneruser:runneruser /home/runneruser

# SELinux issues (RHEL/CentOS)
sestatus
sudo setsebool -P httpd_can_network_connect 1

SystemD service management:

# Check service status (quote the glob so the shell does not expand it)
systemctl status 'actions.runner.*'
journalctl -u 'actions.runner.*' -f

# Service configuration
sudo systemctl enable 'actions.runner.*'
sudo systemctl restart 'actions.runner.*'

macOS-specific issues#

Xcode command line tools:

# Install/update command line tools
xcode-select --install
sudo xcodebuild -license accept

# Check installed tools
xcode-select -p
xcrun --show-sdk-path

macOS security restrictions:

# Check Gatekeeper status
spctl --status

# Allow unsigned binaries (if needed)
sudo spctl --master-disable

# Keychain access for certificates
security list-keychains
security unlock-keychain ~/Library/Keychains/login.keychain

LaunchDaemon configuration:

<?xml version="1.0" encoding="UTF-8"?>
<!-- com.github.actions.runner.plist -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.github.actions.runner</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/runner/actions-runner/run.sh</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>UserName</key>
  <string>runner</string>
  <key>WorkingDirectory</key>
  <string>/Users/runner/actions-runner</string>
</dict>
</plist>

Windows-specific issues#

PowerShell execution policy:

# Check current policy
Get-ExecutionPolicy

# Set policy for runner
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser

# Bypass for specific script
powershell.exe -ExecutionPolicy Bypass -File script.ps1

Windows service configuration:

# Install as Windows service (run from an elevated PowerShell prompt)
.\config.cmd --url https://github.com/owner/repo --token <token> --runasservice

# Service management (the service name starts with "actions.runner.")
Get-Service "actions.runner.*"
Start-Service "actions.runner.*"
Stop-Service "actions.runner.*"

Windows firewall:

# Check firewall rules
Get-NetFirewallRule | Where-Object {$_.DisplayName -match "runner"}

# Add firewall rule (outbound HTTPS to the remote port)
New-NetFirewallRule -DisplayName "GitHub Runner" -Direction Outbound -RemotePort 443 -Protocol TCP -Action Allow

Container and Docker issues#

Docker daemon problems#

Service management:

# Check Docker daemon status
systemctl status docker
docker info

# Restart Docker daemon
sudo systemctl restart docker

# Check Docker logs
journalctl -u docker -f

Docker configuration:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "registry-mirrors": ["https://mirror.gcr.io"],
  "insecure-registries": ["localhost:5000"]
}

Image pull failures#

Authentication issues:

# Login to registry (interactive)
docker login registry.example.com

# Login to registry (non-interactive, for scripts)
echo "$REGISTRY_PASSWORD" | docker login registry.example.com --username "$REGISTRY_USER" --password-stdin

# Test image pull
docker pull ubuntu:20.04

Network issues:

# Check DNS resolution from container
docker run --rm alpine nslookup registry.example.com

# Test with different DNS
docker run --rm --dns 8.8.8.8 alpine nslookup registry.example.com

Container resource limits#

Memory constraints:

# Check memory limits
docker stats --no-stream
docker inspect <container-id> | jq '.[0].HostConfig.Memory'

# Set memory limits
docker run -m 2g ubuntu:20.04

OOM killer issues:

# Check OOM killer logs
dmesg | grep "Killed process"
journalctl -k | grep "Memory cgroup out of memory"

# Monitor memory usage
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

Docker volume issues#

Permission problems:

# Check volume mounts
docker inspect <container> | jq '.[0].Mounts'

# Fix volume permissions
sudo chown -R 1000:1000 /host/volume/path

Disk space issues:

# Clean up Docker
docker system df
docker system prune -af

# Remove unused volumes
docker volume prune -f

# Clean up build cache
docker builder prune -af

Log analysis#

Structured log investigation#

Log location by platform:

GitHub Actions:

# Runner logs
tail -f /home/runner/_diag/Runner_*.log
tail -f /home/runner/_diag/Worker_*.log

# System logs
journalctl -u 'actions.runner.*' -f

GitLab Runner:

# Runner logs
tail -f /var/log/gitlab-runner/gitlab-runner.log

# Service logs
journalctl -u gitlab-runner -f

# Job logs (docker logs accepts one container at a time)
docker ps -q --filter label=com.gitlab.gitlab-runner.job.id | xargs -r -n 1 docker logs

Jenkins:

# Master logs
tail -f $JENKINS_HOME/logs/jenkins.log

# Agent logs
tail -f /var/log/jenkins/jenkins.log
cat /var/lib/jenkins/logs/slaves/<node-name>/slave.log

Azure DevOps:

# Agent logs (Linux)
tail -f /home/vsts/agents/_diag/Agent_*.log

# Windows logs
Get-Content "C:\agents\_diag\Agent_*.log" -Tail 50 -Wait

Log filtering patterns#

Common error patterns:

# Connection errors (-i matches regardless of case)
grep -iE "(connection|timeout|refused|failed)" runner.log

# Authentication errors
grep -iE "(auth|unauthorized|forbidden|401|403)" runner.log

# Resource errors
grep -iE "(memory|disk|space|quota|limit)" runner.log

# API errors
grep -iE "(api|http|500|502|503|504)" runner.log
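
To see at a glance which kind of error dominates a log, the same patterns can be counted instead of printed. This is a minimal sketch: `summarize_log` is a hypothetical helper, the keyword lists are shortened versions of the patterns above, and a line that matches several categories is counted once in each.

```bash
# summarize_log FILE
# Prints the number of matching lines per error category.
summarize_log() {
    local file=$1
    printf 'connection=%s\n' "$(grep -ciE 'connection|timeout|refused' "$file")"
    printf 'auth=%s\n'       "$(grep -ciE 'unauthorized|forbidden|401|403' "$file")"
    printf 'resource=%s\n'   "$(grep -ciE 'memory|disk|quota' "$file")"
}
```

Running it before and after a change gives a quick signal of whether the fix reduced the error rate.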

Structured log parsing:

# Extract JSON logs (select keeps whole entries that match both conditions)
jq 'select(.level == "error" and (.message | contains("connection")))' runner.log

# Time-based filtering
journalctl --since "2026-01-17 09:00:00" --until "2026-01-17 10:00:00" -u 'actions.runner.*'

# Pattern-based extraction
awk '/ERROR/ {print $1, $2, $NF}' runner.log | sort | uniq -c

Log analysis automation#

Log monitoring script:

#!/bin/bash
# monitor-runner-logs.sh

LOG_FILE="/var/log/runner/runner.log"
ERROR_PATTERNS=(
  "connection refused"
  "authentication failed"
  "out of memory"
  "disk space"
  "timeout"
)

# IFS= and -r keep leading whitespace and backslashes in each log line intact
tail -F "$LOG_FILE" | while IFS= read -r line; do
  for pattern in "${ERROR_PATTERNS[@]}"; do
    if echo "$line" | grep -qi "$pattern"; then
      echo "$(date): ERROR DETECTED - $pattern"
      echo "$line"
      # Send alert
      curl -X POST https://hooks.slack.com/webhook \
        -H "Content-Type: application/json" \
        -d "{\"text\": \"Runner Error: $pattern\"}"
    fi
  done
done

Log aggregation:

# Collect all relevant logs
collect_logs() {
  local output_dir="/tmp/runner-logs-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$output_dir"

  # System logs
  journalctl --no-pager > "$output_dir/system.log"

  # Runner logs
  cp -r /home/runner/_diag/ "$output_dir/runner-diag/"

  # Docker logs (docker logs accepts one container at a time)
  for id in $(docker ps -aq); do
    echo "=== $id ==="
    docker logs "$id"
  done > "$output_dir/docker.log" 2>&1

  # System information
  cat /proc/meminfo > "$output_dir/meminfo"
  df -h > "$output_dir/disk-usage"
  ps aux > "$output_dir/processes"

  echo "Logs collected in: $output_dir"
}

Escalation procedures#

Information gathering checklist#

Before contacting support, gather this information:

System information:

# Operating system details
uname -a
cat /etc/os-release
lscpu
free -h
df -h

# Network configuration
ip addr show
ip route show
cat /etc/resolv.conf

# Runner configuration
./config.sh --version # Shows runner version
cat .runner # Configuration details (sanitize tokens)

Error reproduction:

  1. Exact error message or log entries
  2. Steps to reproduce the issue
  3. Timeline when issue started
  4. Recent changes to configuration
  5. Frequency of occurrence (always/intermittent)

Log collection:

# Create support bundle
# Note: .env and .runner can hold API keys and proxy credentials; redact them before sharing
tar -czf runner-support-$(date +%Y%m%d).tar.gz \
  /var/log/runner/ \
  /home/runner/_diag/ \
  ~/.runner \
  ~/.env \
  /etc/systemd/system/actions.runner.*

Log sanitization for security#

Remove sensitive data:

# Sanitize logs before sharing
sanitize_logs() {
  local log_file="$1"
  local output_file="${log_file}.sanitized"

  # Remove tokens and secrets ([[:space:]] is portable; \s works only in GNU sed)
  sed -E 's/token=[A-Za-z0-9_-]+/token=***REDACTED***/g' "$log_file" | \
    sed -E 's/"password"[[:space:]]*:[[:space:]]*"[^"]*"/"password": "***REDACTED***"/g' | \
    sed -E 's/Authorization:[[:space:]]*Bearer[[:space:]]+[A-Za-z0-9_.-]+/Authorization: Bearer ***REDACTED***/g' \
    > "$output_file"

  echo "Sanitized log: $output_file"
}
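
Before attaching a sanitized file to a ticket, it is worth checking that nothing slipped through. The sketch below looks for the same three shapes the sanitizer targets. `contains_secrets` is a hypothetical helper; it only recognizes these patterns and will not catch secrets in other formats, so a manual review is still advisable.

```bash
# contains_secrets FILE
# Exit status 0 if FILE still holds an unredacted token, bearer value,
# or password field; 1 if none of the known patterns match.
contains_secrets() {
    grep -qE 'token=[A-Za-z0-9_-]+|Bearer[[:space:]]+[A-Za-z0-9_.-]+|"password"[[:space:]]*:[[:space:]]*"[^"*][^"]*"' "$1"
}
```

A usage example: `contains_secrets runner.log.sanitized && echo "Do not share this file yet"`.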

Support channels and SLA expectations#

DevOps Hub Support:

  • Community: GitHub Discussions (Best effort)
  • Pro Plan: Email support (24-48 hours)
  • Enterprise: Dedicated support (4-8 hours)

Platform Support:

  • GitHub: GitHub Support (varies by plan)
  • GitLab: GitLab Support (varies by subscription)
  • Azure DevOps: Microsoft Support
  • Jenkins: Community forums and commercial support

Emergency escalation: For production-critical issues:

  1. Use highest priority support channel
  2. Include "PRODUCTION OUTAGE" in subject
  3. Provide complete information from checklist
  4. Include business impact assessment

Information to include:

  • Organization/project ID
  • Runner platform and version
  • Error messages and logs (sanitized)
  • Steps already attempted
  • Business impact and urgency

Prevention and monitoring#

Proactive monitoring setup#

Health check script:

#!/bin/bash
# runner-health-check.sh

check_runner_health() {
  local errors=0

  # Check service status
  if ! systemctl is-active --quiet 'actions.runner.*'; then
    echo "ERROR: Runner service not active"
    errors=$((errors + 1))
  fi

  # Check disk space
  local disk_usage=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')
  if [ "$disk_usage" -gt 90 ]; then
    echo "ERROR: Disk usage high: ${disk_usage}%"
    errors=$((errors + 1))
  fi

  # Check memory usage
  local mem_usage=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')
  if [ "$mem_usage" -gt 90 ]; then
    echo "ERROR: Memory usage high: ${mem_usage}%"
    errors=$((errors + 1))
  fi

  # Check network connectivity
  if ! curl -s --max-time 10 https://api.github.com > /dev/null; then
    echo "ERROR: Cannot reach GitHub API"
    errors=$((errors + 1))
  fi

  if [ "$errors" -eq 0 ]; then
    echo "OK: Runner health check passed"
  fi

  # The exit status is the number of failed checks
  return $errors
}

# Run health check
check_runner_health
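
The disk check in the script parses the second line of `df` output, which can break when a long device name makes `df` wrap a record onto two lines. A minimal sketch of a more robust parser follows. `disk_usage_pct` is a hypothetical helper that expects POSIX-format output (`df -P`) on standard input, which keeps each filesystem on one line.

```bash
# disk_usage_pct
# Reads `df -P` output on stdin and prints the use percentage
# of the first filesystem listed, without the % sign.
disk_usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

In the health check this would be used as `disk_usage=$(df -P / | disk_usage_pct)`.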

Automated maintenance:

#!/bin/bash
# runner-maintenance.sh

# Weekly maintenance tasks
maintenance_tasks() {
  echo "Starting runner maintenance..."

  # Runner software: the GitHub Actions runner normally updates itself
  # when it picks up a job, so no manual update step is run here

  # Clean up old logs
  find /home/runner/_diag -name "*.log" -mtime +7 -delete

  # Update system packages
  apt-get update && apt-get upgrade -y

  # Clean Docker
  docker system prune -af --filter "until=72h"

  # Restart runner service
  systemctl restart 'actions.runner.*'

  echo "Maintenance completed"
}

# Run the tasks when the script is executed
maintenance_tasks

# Schedule with cron
# 0 2 * * 0 /usr/local/bin/runner-maintenance.sh

This comprehensive troubleshooting guide provides systematic approaches to diagnosing and resolving the most common self-hosted runner issues across all supported platforms. Keep this reference handy for rapid issue resolution and use the escalation procedures when additional support is needed.