herd daemon · v0.1

The Reverse Proxy
for Stateful Workloads.

Your frontend stays stateless. Your heavy workers don't.

herd sits in front of your OS processes (browsers, LLMs, AI agents). It spawns workers on demand, routes every request to the worker holding the right memory state via the X-Session-ID header, and kills the process the instant the client disconnects.

View on GitHub Read the Docs →
$ go install github.com/herd-core/herd/cmd/herd@latest
$ brew install herd-core/tap/herd
$ docker run --rm -v /var/run/herd:/run ghcr.io/herd-core/herd:latest
$ curl -sL https://herdcore.io/install.sh | bash

Pre-compiled binaries on GitHub Releases. Verify checksums before production use.

The gap no one talks about

The layer that was always missing.

Every standard infrastructure tool is blind to half the picture.
Nginx knows the request. systemd knows the PID. Nothing knows both.
herd does.

Network-to-Process Affinity

A reverse proxy knows your HTTP session but not its OS process. A supervisor knows the PID but not its client. herd binds both: this PID exists only for this network stream. When the client drops, the compute is reclaimed—instantly, atomically, with no polling loop.
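The binding can be pictured as a single table keyed by session. A minimal sketch of the idea (names and shapes here are illustrative, not herd's actual internals):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Binding:
    session_id: str   # from the X-Session-ID header
    pid: int          # the worker process bound to this stream
    port: int         # where the proxy forwards this session's traffic

table: dict[str, Binding] = {}

def on_connect(session_id: str, spawn: Callable[[], tuple[int, int]]) -> Binding:
    # First request for a session spawns a dedicated worker;
    # every later request routes to the same PID.
    if session_id not in table:
        pid, port = spawn()
        table[session_id] = Binding(session_id, pid, port)
    return table[session_id]

def on_disconnect(session_id: str, kill: Callable[[int], None]) -> None:
    # The stream closed: reclaim the worker immediately.
    binding = table.pop(session_id, None)
    if binding is not None:
        kill(binding.pid)
```

One session, one PID, one lifetime: the disconnect handler is the whole cleanup story.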

Kernel-enforced, not polled

Application-level heartbeat loops drift, starve under load, and introduce race conditions. herd sets pdeathsig on every managed process. When the session is invalidated, the kernel delivers SIGKILL to the entire process group—no timer, no poll, no grace-period bug to exploit.
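The mechanism is ordinary Linux API surface, not magic. A minimal sketch of arming pdeathsig via prctl(2) from Python (Linux-only, and a simplification of whatever herd does internally):

```python
import ctypes
import signal
import subprocess

libc = ctypes.CDLL("libc.so.6", use_errno=True)
PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>

def arm_pdeathsig() -> None:
    # Runs in the child between fork and exec: ask the kernel to
    # deliver SIGKILL the moment the parent dies. No heartbeat,
    # no polling loop -- the kernel is the watchdog.
    if libc.prctl(PR_SET_PDEATHSIG, signal.SIGKILL) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")

worker = subprocess.Popen(
    ["sleep", "3600"],
    preexec_fn=arm_pdeathsig,   # if this process dies, so does sleep
    start_new_session=True,     # own process group, so a group kill is clean
)
```

If the Python process above crashes, is SIGKILLed, or exits for any reason, the kernel reaps the worker without a single line of cleanup code running.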

Built for heavy stateful workloads

Playwright browser sessions, Ollama inference contexts, sandboxed code runners—workloads you can't just restart on every request. herd gives each one a session-scoped dead-man's switch: the process lives exactly as long as the client needs it, and not a millisecond longer.

The problem

You've been here before.

You built an AI agent using Playwright and a local LLM. It works perfectly on the first run.

On the third run, your laptop fans are screaming, RAM is maxed at 24 GB, and you have 15 detached orphan processes running in the background because you hit Ctrl+C too fast.

You kill them one by one. You run it again. Same thing.

What you tried instead

os.Exec timeouts

Drift under load. A 5-second timeout becomes a 60-second timeout when the system is saturated. The process outlives your deadline.
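And even when the timeout does fire, Popen.kill() signals only the direct child, never its descendants. A minimal repro (assumes a POSIX shell is available; bash dies on schedule, but the child it backgrounded survives):

```python
import os
import subprocess

# The shell backgrounds a long-lived child, reports its PID, then waits.
p = subprocess.Popen(
    ["bash", "-c", "sleep 60 & echo $!; wait"],
    stdout=subprocess.PIPE, text=True,
)
grandchild = int(p.stdout.readline())

try:
    p.wait(timeout=0.5)   # our "deadline"
except subprocess.TimeoutExpired:
    p.kill()              # SIGKILL to bash -- and only bash
    p.wait()

# bash is dead, but the sleep it spawned was reparented and lives on.
try:
    os.kill(grandchild, 0)  # signal 0: existence check, sends nothing
    orphan_alive = True
except ProcessLookupError:
    orphan_alive = False

if orphan_alive:
    os.kill(grandchild, 9)  # the manual cleanup herd makes unnecessary
```

Multiply the orphan by one Chromium tree per session and you get the process table from the demo below.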

Custom heartbeat pings

Race conditions. The ping thread and the crash handler both try to clean up. One wins. The other corrupts state or panics.

Reaching for Docker

Docker packages the app — it doesn't route traffic to hot memory. Mapping 50 concurrent WebSockets to 50 isolated Chromium instances means custom port-proxying, external TTL logic, and cleanup scripts Docker will never write for you.

Visual proof

bash — 120×32

✗ without herd — 23:47:09

  PID USER      %CPU %MEM COMMAND
 4821 deploy   98.7  12.3 chromium [orphan]
 4822 deploy   97.1  11.8 chromium [orphan]
 4823 deploy   95.4  11.2 chromium [orphan]
 4824 deploy   94.9  10.9 chromium [orphan]
 4825 deploy   93.2  10.6 chromium [orphan]

 [python3.11 crawler.py exited with SIGSEGV]
 [parent PID 4820 is gone — children abandoned]

 Mem used: 14.2 GB / 16 GB
 OOM killer invoked at 23:47:31
 System unresponsive.

✓ with herd — 23:47:09

herd[data-plane]  stream breach detected
                   session: sess_7f3a2b
                   parent:  PID 4820 (SIGSEGV)
                   action:  reaping orphan group

herd[reaper]  SIGKILL → PID 4821 ✓
herd[reaper]  SIGKILL → PID 4822 ✓
herd[reaper]  SIGKILL → PID 4823 ✓
herd[reaper]  SIGKILL → PID 4824 ✓
herd[reaper]  SIGKILL → PID 4825 ✓

Mem freed: 14.1 GB in 3ms
System nominal. Next session ready.

The herd way

A config file and three lines of client code.

Define your workers in herd.yaml. Connect from any language using a session header. Everything else is handled.

The Old Way — manage_workers.py
# Warning: this is what you write without herd
import subprocess, signal, atexit, threading, time

procs = {}
lock = threading.Lock()

def cleanup():
    with lock:
        for sid, p in list(procs.items()):
            try:
                p.kill()
                del procs[sid]
            except Exception: pass

atexit.register(cleanup)
signal.signal(signal.SIGTERM, lambda s,f: cleanup())
signal.signal(signal.SIGINT,  lambda s,f: cleanup())

def launch_worker(session_id):
    port = find_free_port()  # your problem
    p = subprocess.Popen(
        ["npx","playwright","run-server",
         "--port", str(port)],
    )
    wait_for_health(port)     # your problem
    with lock:
        procs[session_id] = p
    return port

def poll_health():            # your problem
    while True:
        time.sleep(5)
        for sid, p in list(procs.items()):
            if p.poll() is not None:
                cleanup_session(sid)  # also your problem

threading.Thread(target=poll_health, daemon=True).start()
# ...40 more lines of port proxying, reconnect logic...
The Herd Way — herd.yaml
workers:
  browser:
    cmd: ["npx", "playwright", "run-server",
          "--port", "{{.Port}}"]
    min: 1
    max: 5
    ttl: "15m"
    reuse: false
    health_path: "/"
client.py — 3 lines
# herd intercepts this, spawns a dedicated worker,
# routes the socket, and locks the PID to this session.
browser = await p.chromium.connect(
    "ws://localhost:8080/",
    headers={"X-Session-ID": "user-42"}
)

Dead-man's switch

When the WebSocket closes — intentional or not — the kernel delivers SIGKILL to the entire Playwright process group via pdeathsig. No polling. No timer. No orphan.

Runs locally today

Runs on your machine today.
Scales to your Kubernetes cluster tomorrow.

herd's data plane is a standard HTTP/TCP reverse proxy. The herd.yaml you run locally is the same contract your workloads run under in production, routed across a distributed mesh without changing a line of application code. No lock-in. No rewrite.