Self-Hosted Dev Sandboxes: Build Preview Environments with Docker and Go
"The sandboxed README explains the Go control plane, Docker, Traefik, SQLite, preview URLs, idle stopping, and production hardening boundaries."
"Docker's resource constraints documentation explains that containers have no CPU or memory limits by default and need explicit constraints."
"Docker Sandboxes documentation describes microVMs, isolated Docker daemons, network proxying, and credential isolation as a stronger security model."
"The Traefik Docker provider can discover routing configuration from Docker labels and route container services through Host rules."
Preview URLs look like a tiny feature: run port 3000 inside a container, give the user a link, done. In practice, a self-hosted Dev Sandbox quickly pulls in port collisions, domain routing, container recycling, file persistence, and API-driven orchestration. AI coding products hit this especially fast: after the agent writes code, the user does not want logs. They want to click the result.
One docker run is not enough here. A steadier design separates five pieces: one Linux host, Docker for containers, a Go control plane for lifecycle, Traefik for preview domains, and SQLite for state. That setup fits trusted teams and early product validation. If you plan to run arbitrary code from unknown users, containers are only the first layer. You should be thinking about microVMs, dedicated hosts, or Kubernetes.
Use it for many short-lived environments, not two personal containers
For a personal project, two containers, two ports, and docker compose up -d are usually enough. A Dev Sandbox starts to earn its place when the number of environments grows, the lifecycle becomes shorter, and an external system needs to create each environment programmatically.
| Scenario | Is a self-hosted Dev Sandbox a good fit? | Why |
|---|---|---|
| One long-running personal demo | Not really | A shell script or Compose is simpler |
| One preview per team branch | Yes | You need creation, routing, recycling, and state tracking |
| An AI app builder that generates small apps for users | Very good fit | The agent needs to write code in an isolated directory and expose a preview URL immediately |
| Running arbitrary code from unknown public users | Prototype only | Docker containers are not a strong isolation boundary; add VMs or microVMs for production |
| Multi-node scheduling, elastic scaling, complex network policy | Single-host is not enough | Kubernetes or a managed platform is more stable |
Many developers will first ask why the agent cannot just write a shell script. Fair question. A script can solve “start a container.” It does not solve “keep 50 environments alive, stop idle ones, wake them on the next request, keep files, give every environment a stable URL, and let a SaaS backend call everything through an API.” Once those requirements pile up, the script starts becoming a control plane.
The smallest architecture: Go control plane, Docker, Traefik, and SQLite
The tastyeffectco/sandboxes design is deliberately small: a Go program called sandboxd sends Docker commands, Traefik routes dynamically through container labels, SQLite stores state, and workspaces live on disk. No Kubernetes, no separate database server, no message queue.
browser
   |
   v
Traefik ----> sandbox container ----> dev server :3000
   ^                  ^
   |                  |
sandboxd -------------+
   |
   +-- SQLite: sandbox state, ports, tasks
   +-- workspaces/: one persistent directory per sandbox
   +-- reaper: idle stop / memory pressure stop
Three distinctions are worth keeping clear.
The control plane is not the application container
The Go control plane should only handle lifecycle: create a sandbox, stop it, destroy it, execute a command, submit an agent task, and read or write files. Keep it thin. Do not bury all build logic inside it. More complex behavior can live in the sandbox base image, a task queue, or the application layer above it.
A preview URL is not a random port
Each sandbox can expose an address such as s-<id>-3000.preview.localhost. Traefik reads Docker labels, discovers the target container and port, then forwards requests through a Host rule. Users see a stable link instead of “your port is 30017; try not to collide with someone else.”
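As a rough illustration of how the control plane might turn a sandbox ID and port into routing configuration, here is a minimal Go sketch. The label keys follow Traefik's Docker provider conventions; the function names and the choice to reuse `s-<id>-<port>` as the router and service name are assumptions made for this example, not something taken from the sandboxes code.

```go
package main

import (
	"fmt"
	"sort"
)

// previewHost builds the hostname for one sandbox port,
// for example s-abc123-3000.preview.localhost.
func previewHost(id string, port int, baseDomain string) string {
	return fmt.Sprintf("s-%s-%d.%s", id, port, baseDomain)
}

// traefikLabels returns Docker labels that Traefik's Docker provider can read
// to route a Host rule to a single container port.
// Reusing "s-<id>-<port>" as the router/service name is an assumed convention.
func traefikLabels(id string, port int, baseDomain string) map[string]string {
	name := fmt.Sprintf("s-%s-%d", id, port)
	return map[string]string{
		"traefik.enable":                            "true",
		"traefik.http.routers." + name + ".rule":    "Host(`" + previewHost(id, port, baseDomain) + "`)",
		"traefik.http.routers." + name + ".service": name,
		"traefik.http.services." + name + ".loadbalancer.server.port": fmt.Sprint(port),
	}
}

func main() {
	labels := traefikLabels("abc123", 3000, "preview.localhost")

	// Sort keys so the output is stable.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, labels[k])
	}
}
```

The control plane would attach these labels when it creates the container, so the link stays the same no matter which internal port mapping Docker ends up using.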
SQLite is the state anchor for a small system
Containers restart. Hosts restart. Traefik can reload. SQLite records sandbox IDs, ports, status, tasks, and workspace locations. When the control plane starts, it can reconcile Docker’s actual state against that database. SQLite is fine for an early single-host product, as long as you accept the boundary and back it up.
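The startup reconciliation could look roughly like the following Go sketch. It only computes a plan from two in-memory maps; the status strings, the action names, and the policy of merely reporting containers that have no database row are all assumptions for illustration. A real control plane would fill the maps from SQLite and from the Docker API.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is what the control plane should do for one sandbox after comparing states.
type Action string

const (
	ActionNone         Action = "none"
	ActionMarkStopped  Action = "mark-stopped"  // DB says running, Docker disagrees
	ActionMarkRunning  Action = "mark-running"  // DB says stopped, container is up
	ActionReportOrphan Action = "report-orphan" // container exists without a DB row
)

// reconcile compares recorded status (sandbox ID -> "running" or "stopped")
// with what Docker reports (sandbox ID -> container is currently running).
func reconcile(dbStatus map[string]string, dockerRunning map[string]bool) map[string]Action {
	plan := make(map[string]Action)
	for id, status := range dbStatus {
		running := dockerRunning[id] // false when the container is missing
		switch {
		case status == "running" && !running:
			plan[id] = ActionMarkStopped
		case status == "stopped" && running:
			plan[id] = ActionMarkRunning
		default:
			plan[id] = ActionNone
		}
	}
	for id := range dockerRunning {
		if _, known := dbStatus[id]; !known {
			plan[id] = ActionReportOrphan
		}
	}
	return plan
}

func main() {
	db := map[string]string{"a1": "running", "b2": "stopped", "c3": "running"}
	docker := map[string]bool{"a1": true, "b2": true, "d4": true}

	plan := reconcile(db, docker)
	ids := make([]string, 0, len(plan))
	for id := range plan {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Println(id, plan[id])
	}
}
```

Keeping the comparison as a pure function makes the recovery policy easy to test without a Docker daemon.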
Run it locally: validate the API, ports, and domain resolution first
Do not rush into connecting an agent. First confirm that the control plane can start containers, Traefik can forward requests, and the preview URL opens. The quick start from the sandboxes README is direct:
git clone https://github.com/tastyeffectco/sandboxes.git
cd sandboxes
./install.sh
API=http://127.0.0.1:9090
curl "$API/healthz"
After the health check passes, create a sandbox that serves on port 3000:
ID=$(curl -s -XPOST "$API/sandbox" \
  -H 'content-type: application/json' \
  -d '{"ports":[3000]}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
curl -s -XPOST "$API/sandbox/$ID/exec" \
  -H 'content-type: application/json' \
  -d '{"cmd":["bash","-lc","cd ~/workspace && python3 -m http.server 3000"]}'
Then open:
http://s-<id>-3000.preview.localhost
*.localhost resolves to your local machine in modern browsers, which is useful for zero-DNS local testing. For a real domain, point *.preview.yourdomain.com at the host and let Traefik handle TLS. Do not expose a local API such as 127.0.0.1:9090 directly to the public internet. At minimum, turn on token authentication in production and keep the control-plane port behind a firewall or private gateway.
Preview URLs are hard because of routing, wakeups, and persistence
Basic port forwarding only answers “how do I access the container while it is running?” A Dev Sandbox also has to answer “what happens when the container is asleep?”, “where do files live?”, and “can the same user come back later?” Those three questions decide whether the system can move from demo to beta.
Routing: use the domain as the environment identity
A port is the machine’s perspective. A domain is the product’s perspective. s-<id>-3000.preview.example.com contains the sandbox ID and target port, so the application can show the link directly to the user. Traefik’s Docker provider reads container labels and forwards the Host rule to the right container.
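Because the hostname carries the identity, the control plane can recover the sandbox ID and port from an incoming request's Host header. The following Go sketch assumes the `s-<id>-<port>.<base domain>` pattern described above; the function name and the validation rules are illustrative choices.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// parsePreviewHost extracts the sandbox ID and target port from a hostname
// such as "s-abc123-3000.preview.example.com". A trailing ":port" from the
// Host header is ignored. Hostnames are case-insensitive, so input is lowercased.
func parsePreviewHost(host, baseDomain string) (string, int, error) {
	h := strings.ToLower(host)
	if hostOnly, _, err := net.SplitHostPort(h); err == nil {
		h = hostOnly
	}
	suffix := "." + strings.ToLower(baseDomain)
	if !strings.HasSuffix(h, suffix) {
		return "", 0, fmt.Errorf("host %q is not under %q", host, baseDomain)
	}
	label := strings.TrimSuffix(h, suffix)
	if !strings.HasPrefix(label, "s-") {
		return "", 0, fmt.Errorf("host %q has no sandbox prefix", host)
	}
	rest := label[len("s-"):]
	// The port is whatever follows the last hyphen, so IDs may contain hyphens.
	i := strings.LastIndex(rest, "-")
	if i <= 0 || i == len(rest)-1 {
		return "", 0, fmt.Errorf("host %q has no id or port", host)
	}
	port, err := strconv.Atoi(rest[i+1:])
	if err != nil || port < 1 || port > 65535 {
		return "", 0, fmt.Errorf("host %q has an invalid port", host)
	}
	return rest[:i], port, nil
}

func main() {
	id, port, err := parsePreviewHost("s-abc123-3000.preview.example.com", "preview.example.com")
	fmt.Println(id, port, err)

	_, _, err = parsePreviewHost("example.com", "preview.example.com")
	fmt.Println(err)
}
```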
Troubleshoot in this order:
- DNS: does the wildcard domain point to the host, or are you using `*.localhost` locally?
- Traefik: does the container have the right labels, and is it on the same Docker network?
- Port: is the app actually listening on `0.0.0.0:3000`, not only `127.0.0.1`?
- Readiness: do you have a waiting page or retry strategy while the app starts?
- TLS: is the real domain using a wildcard certificate, and are HTTP and HTTPS entrypoints consistent?
Wakeup: idle stop is not environment deletion
An idle sandbox can be stopped with docker stop, freeing memory while keeping the workspace on disk. The next time a user opens the preview URL, a low-priority catch-all route can send the request to the control plane. The control plane starts the container, waits for the port to become ready, then lets the browser enter the real app.
That mechanism uses fewer resources than keeping everything on forever, and it feels more product-like than deleting the environment when it goes quiet. The tradeoff is cold start latency, so the page should show a warming state instead of leaving the user staring at a 502.
Persistence: bind mounts are useful, but know the boundary
Docker’s documentation treats bind mounts as common for sharing source code and build artifacts. A Dev Sandbox often does the same: one host directory per sandbox, mounted into the container’s workspace. The upside is that code survives container deletion. The downside is that host paths and permissions become part of the system design.
Before a beta, set at least three rules: keep workspace directories separate from control-plane configuration; make “delete the container” and “delete the workspace” two different operations; back up workspaces and SQLite, not just container layers.
Multi-tenant basics: resource limits, Docker socket, API auth, and image caching
Docker’s resource constraints documentation is blunt: containers have no resource constraints by default and can use CPU and memory as the host kernel scheduler allows. In a multi-tenant setup, that is a risk. One user’s npm install, build, or infinite loop can slow down the whole machine.
Start with this checklist:
- Set memory, CPU, and PID limits for every sandbox.
- Keep the control-plane API on localhost or a private network by default; require authentication for any public entrypoint.
- Treat preview links as shareable by default. Add forward-auth when they contain sensitive content.
- Preinstall common tools in the base image so every sandbox does not have to pull everything again.
- Docker Hub has official rate limits and fair-use rules. In production, sign in, prepare image caching, or use a private registry.
- Mount sandbox workspaces separately, and never give `/var/run/docker.sock` to user containers.
- Log creation, stopping, destruction, command execution, and agent tasks.
A Compose resource-limit example might look like this:
services:
  sandbox-app:
    image: your-sandbox-base:latest
    deploy:
      resources:
        limits:
          cpus: "1.00"
          memory: 1G
          pids: 256
The bigger safety issue is the Docker socket. If sandboxd mounts the host Docker socket, it effectively has root-level control of the host, because anything that can talk to the Docker daemon can start privileged containers and mount host paths. That can be acceptable when you maintain the control plane and users only enter the sandbox containers it creates. If users can affect the control-plane container or obtain the Docker socket, the risk jumps beyond ordinary container isolation.
When to move to microVMs, Kubernetes, or a managed platform
The upside of single-host Docker is cost, readability, and speed of change. The downside is just as clear: one host has limited capacity, the security boundary relies heavily on container isolation and host governance, and scheduling is weaker than a cluster.
| Trigger | Better direction |
|---|---|
| Running arbitrary code from unknown users | microVMs, dedicated VMs, gVisor, Kata, or Firecracker |
| Agents need full Docker capability without touching the host daemon | Docker Sandboxes-style microVM plus isolated daemon |
| Multi-host scheduling, elastic scaling, unified network policy | Kubernetes |
| The team does not want to maintain the low-level control plane | Managed preview-environment platform |
| You are still validating product demand | Single-host Docker plus a Go control plane |
Docker’s official Sandboxes security model is a useful reference point: it puts the AI agent inside a microVM, gives each sandbox its own Docker daemon, filesystem, and network, and keeps the host Docker daemon away from the sandbox. It costs more resources, but the isolation boundary is clearer.
So start with a single-host setup to learn the product loop: create an environment, let the agent write code, open the preview, recycle idle sandboxes, and keep files. Once real users arrive, upgrade the isolation layer according to actual risk. Do not drag the team into cluster maintenance before the scale problem exists.
A practical checklist from MVP to beta
You can move in this order without building everything at once:
- Choose a clean Linux host that runs only sandbox-related services. Do not co-locate databases, CI runners, or production applications.
- Configure a wildcard preview domain such as `*.preview.example.com`. Validate locally with `*.localhost` first.
- Validate the control-plane API: create, exec, stop, destroy, and healthz.
- Preinstall Node.js, Python, Git, common package managers, and the agent CLIs you support in the sandbox base image.
- Add resource limits, idle recycling, workspace persistence, and destruction policy for every sandbox.
- Enable API authentication, and add access control to preview links when the business needs it.
- Log audit events. Monitor host memory, disk, container count, cold-start time, and 502 rates.
- Practice backup and recovery for SQLite, workspace directories, `.env`, and base-image build scripts.
If your next step is connecting code previews to CI, read GitHub Actions Self-Hosted Runner: A Complete Private Environment Deployment Guide. If you want to host the generated Next.js app long term, Escape Vercel: The Complete Guide to Self-Hosting Next.js with Docker is closer to the second half of the journey. For public domains and origin protection, continue with Cloudflare origin IP allowlisting.
FAQ
How is a Dev Sandbox different from Docker Compose?
Compose is closer to “declare a group of services, then start them.” A Dev Sandbox is closer to “the product backend creates, stops, wakes, and destroys environments on demand, and every environment gets a URL.” If the number of environments is small and they live for a long time, Compose is enough. If environments are dynamic per user, branch, or agent task, you need a control plane.
Why not use Kubernetes?
If you already have a cluster, ingress, image registry, permissions, and monitoring, Kubernetes is a strong foundation for standardized environments. The problem is that many early teams only want to validate an AI app builder or internal preview environment, and maintaining the cluster can become heavier than the product itself. Single-host Docker does not replace Kubernetes; it keeps the first stage light.
Can container isolation run unknown users’ code?
I would not do that directly. Containers fit trusted teams, internal users, or low-risk demos. Arbitrary code from unknown users should run in stronger isolation, such as microVMs, dedicated VMs, gVisor, Kata, Firecracker, or at least hosts split by tenant.
Does every preview URL need HTTPS?
Local *.localhost previews can start with HTTP. For a real public domain, use HTTPS, especially when users enter tokens, forms, or business data. A wildcard certificate reduces the hassle of issuing a separate certificate for every sandbox.
Will files disappear after an idle stop?
As long as the workspace is a persistent directory, docker stop will not delete files. What needs care is the difference between destroy and purge: destroy removes only the container, while purge also removes the workspace. Make that distinction clear in the product UI and API names.
Can Docker Hub rate limits affect this system?
Yes. When many environments frequently start, build, and pull images, the public registry can become an unstable dependency. In production, sign in to Docker Hub, prepare a private registry or image cache, and bake common dependencies into the base image.
Conclusion
A self-hosted Dev Sandbox is worth building, but it is not just an advanced docker run. It is a small platform: the control plane owns lifecycle, the reverse proxy owns URLs, the state store owns recovery, and resource limits plus security policy keep one environment from dragging down the host.
The steady path is to build the product loop with single-host Docker first, then upgrade based on real risk. When users are few, code is trusted, and the team accepts a single-host boundary, Go + Docker + Traefik + SQLite is enough. Once unknown users, stronger isolation, multi-host scheduling, or compliance governance enter the picture, put microVMs, Kubernetes, or a managed platform on the table.
References
- tastyeffectco/sandboxes
- Docker Resource constraints
- Docker Engine security
- Docker Hub usage and limits
- Docker Sandboxes security model
- Traefik Docker provider
Build a Self-Hosted Dev Sandbox MVP
A practical path from single-host Docker validation to pre-beta safety checks.
⏱️ Estimated time: 4 hr
1. Prepare a clean host. Use a Linux host dedicated to sandbox services. Do not co-locate production databases, CI runners, or other high-value services on it.
2. Configure the preview domain. Validate locally with `*.localhost` first. For a real deployment, point `*.preview.example.com` to the host and configure TLS.
3. Validate the control-plane API. At minimum, test create, exec, stop, destroy, and healthz so your application backend can manage environments through the API.
4. Prepare the sandbox base image. Preinstall Node.js, Python, Git, common package managers, and the agent CLIs you plan to support. This reduces repeated setup after each sandbox starts.
5. Add resource limits and recycling. Set CPU, memory, and PID limits for every sandbox. Add idle stop, wake-on-request, and persistent workspace handling.
6. Lock down the control plane and previews. Enable API tokens, private networking, or firewall rules. For sensitive previews, add forward-auth or your product's login layer.
7. Add logs and monitoring. Log create, stop, destroy, command execution, and agent tasks. Monitor host memory, disk usage, container count, cold-start time, and 502 rates.
8. Practice backup and recovery. Back up SQLite, workspace directories, `.env`, and base-image build scripts. Confirm you can restore them on a new host.
11 min read · Published on: Jun 5, 2026 · Modified on: Jun 8, 2026