
Self-hosted Runners

Install and operate self-hosted runner pools for private-network execution.

Use self-hosted runners when test execution must stay inside your network while Certyn remains the control plane.

Architecture

  • The Certyn API queues lease offers and commands (stop_session, open_stream) for runners.
  • The runner daemon (Certyn.Runner) polls the API over outbound HTTPS.
  • The runner starts and stops local Docker job containers through /var/run/docker.sock.
  • The live browser stream uses a reverse relay: the runner opens an outbound WebSocket connection to Certyn.

No inbound firewall rules are required, because every connection to Certyn is opened outbound by the runner.

Prerequisites

  • Linux host with Docker Engine.
  • Outbound HTTPS access from the runner host to the Certyn API.
  • Persistent host path for runner state (/var/lib/certyn-runner recommended).
  • Tenant admin access in Certyn to create pools and registration tokens.
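Before starting the runner, the host-side prerequisites can be checked with a short script. The sketch below is not a Certyn tool: the function names are hypothetical, and it assumes `docker` and `curl` are installed on the host. Any HTTP status from the control-plane URL counts as reachable, because only the outbound network path is being tested.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for the runner host (not shipped with Certyn).

# Succeeds when the Docker Engine answers through the local socket.
check_docker() {
  docker info >/dev/null 2>&1
}

# Succeeds when the persistent state directory exists and is writable.
check_state_dir() {
  local dir="${1:-/var/lib/certyn-runner}"
  [ -d "$dir" ] && [ -w "$dir" ]
}

# Succeeds when an HTTPS request to the control plane gets any HTTP response.
# curl exits non-zero (and reports 000) when the connection or TLS handshake fails.
check_outbound() {
  local code
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$1") || return 1
  [ "$code" != "000" ]
}
```

For example, `check_docker && check_state_dir && check_outbound https://your-certyn-api.example.com && echo "preflight ok"` runs all three checks and stops at the first failure.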

Control-plane flags

Ensure the API configuration enables self-hosted execution:

  • Features:SelfHostedRunnersEnabled=true
  • Features:SelfHostedRelayEnabled=true (for live view relay)
  • Optional tenant rollout lists:
    • SelfHosted:EnabledTenants=tenant-a,tenant-b
    • SelfHosted:RelayEnabledTenants=tenant-a,tenant-b

If an allowlist is empty, every tenant is allowed as long as the corresponding global feature flag is enabled.
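The rollout rule can be read as a small decision function. This is a hypothetical shell sketch of the documented behavior, not the API's implementation; it assumes tenant IDs are matched exactly (case-sensitive) against a comma-separated list without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the rollout rule:
#   global flag not "true" -> denied
#   empty allowlist        -> every tenant allowed
#   otherwise              -> tenant must appear in the list
tenant_allowed() {
  local global_flag="$1" allowlist="$2" tenant="$3"
  if [ "$global_flag" != "true" ]; then
    return 1
  fi
  if [ -z "$allowlist" ]; then
    return 0
  fi
  # Wrap both sides in commas so tenant-a does not match tenant-ab.
  case ",$allowlist," in
    *",$tenant,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `tenant_allowed true "tenant-a,tenant-b" tenant-b` succeeds and `tenant_allowed true "tenant-a,tenant-b" tenant-c` fails. The same reading applies to live view, using Features:SelfHostedRelayEnabled with SelfHosted:RelayEnabledTenants.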

1. Create Pool and Token

In Environments > Self-hosted runners:

  1. Create a runner pool.
  2. Click Create token for that pool.
  3. Copy the token and store it securely; it is shown only once.
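One way to keep the registration token out of shell history and world-readable files is to write the runner settings to an owner-only env file and pass it to `docker run --env-file` in place of the individual `-e` flags for those settings. The helper name and file path are assumptions for illustration. The values remain visible to anyone who can run `docker inspect` on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write runner settings to an env file readable only by its owner.
# Usage: write_runner_env FILE CONTROL_PLANE_URL RUNNER_POOL_ID REGISTRATION_TOKEN
write_runner_env() {
  local file="$1" url="$2" pool="$3" token="$4"
  # Tighten an existing file before the token is written into it.
  if [ -e "$file" ]; then
    chmod 600 "$file" || return 1
  fi
  (
    umask 077   # a newly created file gets mode 600
    printf 'CONTROL_PLANE_URL=%s\nRUNNER_POOL_ID=%s\nREGISTRATION_TOKEN=%s\n' \
      "$url" "$pool" "$token" > "$file"
  )
}
```

For example, after `write_runner_env /etc/certyn-runner.env https://your-certyn-api.example.com <POOL_ID> <REGISTRATION_TOKEN>`, replace the three matching `-e` flags with `--env-file /etc/certyn-runner.env`. Docker reads env-file values literally, so do not wrap them in quotes.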

2. Start Runner Container

Run the runner image with the Docker socket and the persistent state directory mounted:

docker run -d \
  --name certyn-runner \
  --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/certyn-runner:/var/lib/certyn-runner \
  -e CONTROL_PLANE_URL=https://your-certyn-api.example.com \
  -e RUNNER_POOL_ID=<POOL_ID> \
  -e REGISTRATION_TOKEN=<REGISTRATION_TOKEN> \
  -e RUNNER_NAME=runner-east-1 \
  -e MAX_CONCURRENCY=2 \
  docker.io/certyn/runner:<tag>

Production recommendation:

  • Use a pinned release tag (vX.Y.Z) in production.
  • Avoid latest for production runners to keep rollouts deterministic.
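A deployment script can refuse to start a production runner unless its tag is a pinned release. This hypothetical guard accepts only tags shaped like vX.Y.Z and rejects latest, sha-<shortsha>, and anything else; v1.4.2 below is an example value, not a known release.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only for pinned release tags such as v1.4.2.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Print the full runner image reference, or fail for a non-release tag.
runner_image() {
  if ! is_release_tag "$1"; then
    echo "refusing non-release tag: $1" >&2
    return 1
  fi
  printf 'docker.io/certyn/runner:%s\n' "$1"
}
```

For example, `image=$(runner_image v1.4.2) || exit 1` yields a reference that can be passed as the last argument of the docker run command.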

3. Configure Environment Routing

In Environments:

  1. Set Execution target to Self-hosted runner.
  2. Select the target Runner pool.
  3. Save.

New runs for that environment are scheduled on the selected pool.

Agent image for self-hosted runs

The self-hosted runner pulls the agent image specified in the API's lease offer. By default, the API uses the image set in Docker:DefaultImage, so publish that image and configure the setting as well:

  • Docker:DefaultImage=docker.io/certyn/agent:vX.Y.Z

Recommendation:

  • Keep runner and agent on the same release line (vX.Y.Z) to reduce compatibility risk.
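To catch drift between the two images, a deployment check can compare the tag of the runner image with the tag in Docker:DefaultImage. This sketch takes the strict reading of "same release line" (identical vX.Y.Z tags); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: succeed only when both image references
# carry the same pinned vX.Y.Z tag.
same_release() {
  local runner_tag="${1##*:}" agent_tag="${2##*:}"   # text after the last ':'
  printf '%s\n' "$runner_tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$' &&
    [ "$runner_tag" = "$agent_tag" ]
}
```

For example, `same_release docker.io/certyn/runner:v1.4.2 docker.io/certyn/agent:v1.4.2` succeeds, while a pair tagged latest fails because latest is not a pinned release.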

GPU and WebGL limitation

Self-hosted runners are supported, but heavy WebGL scenarios currently are not.

This includes workloads such as:

  • graphics-heavy 3D product surfaces
  • demanding canvas or map scenes
  • cases that require true GPU-backed browser rendering

What to assume today:

  • self-hosted execution works for standard browser automation workloads
  • heavy WebGL should be treated as unsupported
  • GPU-backed self-hosted support is planned, but not available yet

Do not position current self-hosted runners as a supported solution for GPU-sensitive browser testing.

4. Operate Runners

From Environments > Self-hosted runners:

  • Use Drain before maintenance to stop new lease assignment.
  • Use Resume after maintenance.
  • Watch heartbeats and slot availability to track capacity.

Observability

Expose and chart these OpenTelemetry metrics:

  • Control plane meter: certyn.self_hosted.control_plane
    • self_hosted.runner.sync.requests
    • self_hosted.runner.sync.failures
    • self_hosted.runner.sync.duration
    • self_hosted.relay.connections.timeouts
    • self_hosted.cleanup.leases.expired
  • Runner daemon meter: certyn.self_hosted.runner_daemon
    • self_hosted.runner.registration.failures
    • self_hosted.runner.sync.failures
    • self_hosted.runner.sync.duration
    • self_hosted.runner.relay.failures

Suggested alerts:

  • Sync failure ratio > 5% for 10 minutes.
  • Relay timeout rate > 2% for 10 minutes.
  • Registration failures > 0 for newly created pools.
  • Expired lease count increasing continuously for 15 minutes.
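The ratio alerts compare failures with requests over the alert window. As a worked example, 60 failed syncs out of 1,000 requests in 10 minutes is a 6% failure ratio and should fire; 50 out of 1,000 is exactly 5% and should not. The hypothetical helper below takes counts already accumulated over the window, for example the increase of self_hosted.runner.sync.failures and self_hosted.runner.sync.requests, assuming both are counters.

```shell
#!/usr/bin/env bash
# Hypothetical alert check: succeed (fire) when failures/requests exceeds the threshold.
# Arguments are counts over the alert window; the threshold defaults to 0.05 (5%).
ratio_alert() {
  local requests="$1" failures="$2" threshold="${3:-0.05}"
  # With no requests in the window, the ratio is undefined and no alert fires.
  awk -v r="$requests" -v f="$failures" -v t="$threshold" \
    'BEGIN { exit !(r > 0 && f / r > t) }'
}
```

For the relay alert, pass 0.02 as the third argument together with a relay connection count for the same window; this page lists only the timeout counter, so that denominator is an assumption.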

Maintainer release pipeline

Publish actions:

  • Runner image: .github/actions/publish-runner-image
  • Agent image: .github/actions/publish-agent-image

Required GitHub repository secrets:

  • DOCKERHUB_USERNAME
  • DOCKERHUB_TOKEN

Optional GitHub repository variables:

  • RUNNER_IMAGE_REPO (default: certyn/runner)
  • AGENT_IMAGE_REPO (default: certyn/agent)

Tag behavior:

  • Always publishes sha-<shortsha>.
  • Publishes latest only when the caller enables it.
  • Manual dispatch supports an explicit image_tag and optional tag reuse via source_tag.
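The tag rules can be summarized as a function that lists the tags a single publish produces. This is a hypothetical sketch, not the action's code: it assumes an explicit image_tag is published in addition to the sha tag, and it does not model source_tag reuse, whose details are not described here.

```shell
#!/usr/bin/env bash
# Hypothetical summary of the publish tag rules.
# Usage: publish_tags SHORT_SHA [IMAGE_TAG] [PUBLISH_LATEST]
publish_tags() {
  local short_sha="$1" image_tag="$2" publish_latest="${3:-false}"
  echo "sha-$short_sha"          # always published
  if [ -n "$image_tag" ]; then
    echo "$image_tag"            # explicit tag from manual dispatch
  fi
  if [ "$publish_latest" = "true" ]; then
    echo "latest"                # only when the caller enables it
  fi
}
```

For example, `publish_tags "$(git rev-parse --short HEAD)" v1.4.2` prints the sha tag followed by v1.4.2, and no latest tag.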

Troubleshooting

Runner does not appear online

  • Check container logs for registration failures.
  • Verify CONTROL_PLANE_URL, RUNNER_POOL_ID, and REGISTRATION_TOKEN.
  • Ensure outbound HTTPS to the Certyn API is allowed.
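These checks can be scripted on the runner host. The sketch assumes the container name certyn-runner from the docker run example; the pattern used to spot registration messages in the logs is a guess, since the log format is not documented here. Neither function prints REGISTRATION_TOKEN.

```shell
#!/usr/bin/env bash
# Hypothetical validation of the three settings a runner needs to register.
# Usage: check_runner_env CONTROL_PLANE_URL RUNNER_POOL_ID REGISTRATION_TOKEN
check_runner_env() {
  local url="$1" pool="$2" token="$3"
  case "$url" in
    https://*) ;;
    *) echo "CONTROL_PLANE_URL must start with https://" >&2; return 1 ;;
  esac
  if [ -z "$pool" ] || [ -z "$token" ]; then
    echo "RUNNER_POOL_ID and REGISTRATION_TOKEN must be set" >&2
    return 1
  fi
}

# Show non-secret settings and recent registration-related log lines.
# The 'regist' pattern is an assumption about the log wording.
show_runner_state() {
  local name="${1:-certyn-runner}"
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$name" |
    grep -E '^(CONTROL_PLANE_URL|RUNNER_POOL_ID|RUNNER_NAME)='
  docker logs --tail 200 "$name" 2>&1 | grep -i 'regist'
}
```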

Runs stay queued

  • Confirm the environment's Execution target is Self-hosted runner (self_hosted) and a Runner pool is selected.
  • Verify at least one runner in that pool is Online.
  • Increase MAX_CONCURRENCY or add more runners to the same pool.
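To size a pool, assume each run occupies one slot, so a pool offers (number of runners × MAX_CONCURRENCY) slots. For example, 5 concurrent runs with MAX_CONCURRENCY=2 need 5 / 2 rounded up, which is 3 runners. The hypothetical helper below performs the same ceiling division.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helper: runners needed so that
# runners * MAX_CONCURRENCY >= peak concurrent runs (one slot per run assumed).
runners_needed() {
  local peak_runs="$1" max_concurrency="$2"
  if [ "$max_concurrency" -le 0 ]; then
    echo "MAX_CONCURRENCY must be positive" >&2
    return 1
  fi
  echo $(( (peak_runs + max_concurrency - 1) / max_concurrency ))   # ceiling division
}
```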

Live view cannot connect

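  • Confirm the stream transport for self-hosted sessions is relay. This requires Features:SelfHostedRelayEnabled=true and, if SelfHosted:RelayEnabledTenants is set, your tenant in that list.
  • Verify the runner host can reach the Certyn API WebSocket endpoint.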
  • Check both API and runner logs for open_stream command and relay connection events.
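On the runner side, the relevant log lines can be pulled with docker logs. The filter below looks for the open_stream command name and the word relay; the exact log wording is an assumption, and the container name certyn-runner comes from the docker run example.

```shell
#!/usr/bin/env bash
# Keep only lines that mention the open_stream command or relay activity.
relay_lines() {
  grep -Ei 'open_stream|relay'
}

# Recent relay-related events from the runner container (default: last 30 minutes).
relay_events() {
  local name="${1:-certyn-runner}" since="${2:-30m}"
  docker logs --since "$since" "$name" 2>&1 | relay_lines
}
```

For example, `relay_events certyn-runner 10m` narrows the output to the last ten minutes, which helps line up runner events with the matching entries in the API logs.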