Auto-reproduce GitHub bug reports in a sandbox

Build me an agent workflow that auto-attempts to reproduce new GitHub bug reports in a Daytona sandbox before an engineer ever looks at them.

Trigger: GitHub webhook for newly created issues. Only process issues that carry a configurable label (default: bug). Skip the issue entirely if the body has neither a code block nor numbered steps, so vague reports don't waste sandbox time. Make both the label and the skip rule configurable.
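As a rough illustration of that filter, here is a minimal sketch. The function name, parameter names, and the two regexes are assumptions made for this example: they are one reasonable reading of "code block" and "numbered steps", not a fixed specification.

```python
import re

# Assumed heuristics: a fenced block opens with three or more backticks or tildes,
# and a numbered step is a line starting with "1." or "1)" followed by text.
FENCED_BLOCK = re.compile(r"^\s*(`{3,}|~{3,})", re.MULTILINE)
NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def should_attempt_repro(labels, body, watch_label="bug", require_actionable=True):
    """Return True when the issue carries the watched label and looks actionable."""
    # Label check is case-insensitive so "Bug" and "bug" are treated alike.
    if watch_label.lower() not in {name.lower() for name in labels}:
        return False
    # The skip rule is configurable; when disabled, the label alone is enough.
    if not require_actionable:
        return True
    body = body or ""
    has_code = bool(FENCED_BLOCK.search(body))
    has_steps = bool(NUMBERED_STEP.search(body))
    return has_code or has_steps
```

This sketch ignores indented (four-space) code blocks, so a stricter or looser rule may suit some repos better.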

When an issue passes the filter, the agent should:

1. Use GitHub's Get an Issue to pull the full issue body, then extract the affected repository (default to the repo the issue was opened in) and the reporter's "steps to reproduce" section. If the report mentions environment, OS, or version details, capture those too.

2. Use Daytona's Create Sandbox to spin up a fresh, short-lived sandbox with sensible auto-stop (e.g. 15 minutes idle) and auto-delete (e.g. a few hours) so it cleans itself up. Then call Git Clone Repository to clone the repo at the default branch into the sandbox.

3. Use Execute Command in Sandbox to run the project's setup (install dependencies, run any documented bootstrap script, populate a .env from a template if one exists). Detect the toolchain from manifest and lockfiles (package.json, pyproject.toml, go.mod, Gemfile, etc.) rather than hard-coding one stack.

4. Walk the reproduction steps one at a time, using Execute Command in Sandbox for shell steps, Run Code in Sandbox for inline scripts, or curl/CLI calls for HTTP repros. Capture stdout, stderr, the exit code, and a minimal stack trace (if one appears) for each step. Stop early as soon as the bug clearly reproduces or clearly does not.

5. If the bug is a server/HTTP one (the steps started a web process), use Get Sandbox Preview URL on the exposed port so a human can hit the running instance before the sandbox auto-stops.

6. Use Create an Issue Comment on the original GitHub issue summarizing: whether the bug reproduced (yes / no / inconclusive), the exact command sequence the agent ran, the relevant log excerpt (truncated, in a fenced code block), the preview URL if applicable, and a one-line guess at the likely subsystem (e.g. "auth middleware", "db migration", "frontend router") so a human can triage faster. Make it clear the comment is from an automated triage agent.

7. If reproduction succeeded, also use Slack Bot's Send a Message to post into a configurable #bug-triage channel with a link to the issue, the repo, the one-line subsystem guess, and a tag for the on-call user/group. Skip the Slack ping for non-reproducing or inconclusive runs (those just get the GitHub comment).

8. Use Stop Sandbox at the end of the run so cost stops accruing immediately; rely on Daytona's auto-delete to clean up the disk. If any step in the pipeline errored out (sandbox couldn't start, clone failed, setup failed), still post a GitHub comment explaining what blocked the repro attempt so the human triager isn't left wondering.
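The toolchain detection in step 3 could be sketched roughly as follows. The marker table, the function name, and the default install commands are illustrative assumptions; the list of root files is assumed to come from listing the cloned repo inside the sandbox.

```python
# Marker file -> (toolchain, default install command). Order matters: the first
# marker found for a toolchain wins, so a lockfile can take priority over a manifest.
# These defaults are illustrative only and will not fit every project.
TOOLCHAIN_MARKERS = [
    ("package-lock.json", "node", "npm ci"),
    ("package.json", "node", "npm install"),
    ("pyproject.toml", "python", "pip install -e ."),
    ("go.mod", "go", "go mod download"),
    ("Gemfile", "ruby", "bundle install"),
]


def detect_setup_commands(root_files, override=None):
    """Pick install commands from files at the repo root, unless overridden."""
    # User-supplied setup commands replace auto-detection entirely.
    if override:
        return list(override)
    present = set(root_files)
    seen_toolchains = set()
    commands = []
    for marker, toolchain, command in TOOLCHAIN_MARKERS:
        if marker in present and toolchain not in seen_toolchains:
            seen_toolchains.add(toolchain)
            commands.append(command)
    return commands
```

A repo with both a go.mod and a Gemfile yields two commands, one per detected toolchain, which matches the goal of not hard-coding a single stack.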

Keep the sandbox lifecycle tight: create, attempt repro, comment, stop. Never leave a sandbox running across runs. One issue = one sandbox.
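One way to picture that lifecycle is the sketch below. It deliberately takes the four actions as plain callables (all hypothetical names) instead of calling any real Daytona or GitHub client, so it only shows the control flow: comment even when something fails, and always stop a sandbox that was created.

```python
def run_triage(issue, create_sandbox, attempt_repro, post_comment, stop_sandbox):
    """Create, attempt repro, comment, stop. One issue, one sandbox."""
    sandbox = None
    try:
        try:
            sandbox = create_sandbox(issue)
            result = attempt_repro(sandbox, issue)
        except Exception as exc:
            # Sandbox start, clone, or setup failed: report what blocked the run.
            result = {
                "verdict": "inconclusive",
                "blocked_by": f"{type(exc).__name__}: {exc}",
            }
        # The comment is posted on success and on failure alike.
        post_comment(issue, result)
        return result
    finally:
        # Runs even if posting the comment raised; skipped only when
        # no sandbox was ever created.
        if sandbox is not None:
            stop_sandbox(sandbox)
```

The nested try keeps the order stated above (comment first, then stop) while still guaranteeing the stop call.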

Configurable inputs the user should be able to set when installing the workflow: the label to watch (default bug), whether to skip issues that have no code block or numbered steps (default on), the Slack channel for confirmed-bug alerts, the on-call user or user-group to tag, optional project setup commands to run after clone (overrides the auto-detected ones), and optional environment variables to inject into the sandbox.
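For illustration, these inputs could be modeled as a small settings object. Every field name here is hypothetical; only the defaults for the label and the channel come from the text above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TriageConfig:
    """Install-time inputs for the workflow (field names are assumptions)."""
    watch_label: str = "bug"              # label that triggers a repro attempt
    require_code_or_steps: bool = True    # skip rule for vague reports
    slack_channel: str = "#bug-triage"    # channel for confirmed-bug alerts
    on_call: str = ""                     # user or user-group to tag
    setup_commands: tuple = ()            # non-empty value overrides auto-detection
    sandbox_env: dict = field(default_factory=dict)  # env vars to inject


# Example: a team that files regressions and uses a custom bootstrap script.
config = TriageConfig(watch_label="regression", setup_commands=("make bootstrap",))
```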

Additional information

What does this prompt do?

Watches your GitHub repo for new issues with the bug label and reads the reporter's steps to reproduce.
Spins up a clean, isolated sandbox, pulls the repo, installs dependencies, and walks through the reported steps.
Posts a comment back on the issue saying whether the bug reproduced, the exact commands that did or did not trigger it, and a short log excerpt your team can use to triage.
Pings a Slack channel and tags the on-call when the bug reproduces, so a human jumps on confirmed issues first.

What do I need to use this?

A GitHub account with access to the repo where bug reports are filed.
A Daytona account for spinning up the throwaway sandbox where the reproduction runs.
A Slack workspace with a channel for confirmed bug alerts (optional, but recommended for on-call routing).

How can I customize it?

Change which label the agent reacts to (default is bug) if your team uses something like regression, p0, or needs-repro.
Pick the Slack channel that gets pinged for confirmed bugs and decide who gets tagged as on-call.
Tell the agent which setup commands your project needs (for example, install scripts or env files) so the sandbox is ready before it tries the steps.
Skip vague reports automatically by requiring a code block or numbered steps in the issue body, so the sandbox is only used on actionable tickets.

Frequently asked questions

What happens if the bug report doesn't have clear reproduction steps?

The agent skips it and does not spin up a sandbox. By default it requires a code block or a numbered list in the issue body, and you can change or turn off that rule, so vague reports stay in the human triage queue without burning sandbox time.

Will the sandbox stay running and cost me money?

No. The agent creates a short-lived sandbox, runs the reproduction, posts the result, and shuts it down. If the bug is a server or HTTP issue, it surfaces a preview link in the GitHub comment so an engineer can poke at the running instance before it auto-stops.

Does the agent try to fix the bug?

No. This workflow is intentionally limited to reproduction and first-pass triage. It tells you whether the bug reproduced, the exact command sequence the agent ran, and a one-line guess at the likely subsystem, so an engineer can jump straight to the fix.

Can I use this on a private repo?

Yes. When you connect GitHub, you grant access to whichever repos you want the agent to watch, including private ones. The clone happens inside the sandbox using the same access.

What if the agent gets the reproduction wrong?

The comment it posts always includes the exact commands it ran and the raw output, so a human reviewer can see what was tried and adjust the steps or the setup. Treat the comment as a first-pass signal, not a verdict.
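A minimal sketch of how such a comment body might be assembled is shown below. The function name, the wording of each line, and the 1500-character cap are assumptions for this example; keeping the tail of the log is a design choice based on errors usually appearing last.

```python
def build_comment(verdict, commands, log, subsystem, preview_url=None, max_log_chars=1500):
    """Assemble the issue comment text from one repro run."""
    if verdict not in {"yes", "no", "inconclusive"}:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    fence = "`" * 3  # built at runtime so this example nests cleanly in a fence
    if len(log) > max_log_chars:
        # Keep the end of the log, where the failure usually shows up.
        log = "[... truncated ...]\n" + log[-max_log_chars:]
    lines = [
        "Automated triage agent report (not written by a human).",
        "",
        f"Reproduced: {verdict}",
        f"Likely subsystem (guess): {subsystem}",
    ]
    if preview_url:
        lines.append(f"Preview URL: {preview_url}")
    lines += [
        "",
        "Commands run:",
        fence + "sh",
        *commands,
        fence,
        "",
        "Log excerpt:",
        fence,
        log,
        fence,
    ]
    return "\n".join(lines)
```

A log that itself contains a fence line would break the excerpt block, so a real implementation would likely need to escape or lengthen the fence.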