AI test runner for GitHub pull requests, powered by Daytona sandboxes

Build an agent workflow that acts as an intelligent CI assistant on every pull request. It should run scoped tests in a fresh Daytona sandbox and post a structured, plain-language failure explanation back to the PR.

Trigger: a GitHub webhook on pull_request events, specifically the opened and synchronize actions. Skip the run entirely if the PR is in draft state.
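
The trigger rule above can be sketched as a small filter, assuming the webhook handler has the event name (GitHub sends it in the X-GitHub-Event header) and the parsed JSON payload. The function and constant names are hypothetical; the action and pull_request.draft fields are part of GitHub's pull_request webhook payload.

```python
from typing import Any, Mapping

# Actions named in the trigger description above.
HANDLED_ACTIONS = {"opened", "synchronize"}


def should_run(event_name: str, payload: Mapping[str, Any]) -> bool:
    """Return True when a webhook delivery should start a test run."""
    if event_name != "pull_request":
        return False
    if payload.get("action") not in HANDLED_ACTIONS:
        return False
    pull_request = payload.get("pull_request") or {}
    # Draft PRs are skipped here, before any sandbox is created.
    return not pull_request.get("draft", False)
```

Running this check first keeps the "no sandbox created" rule for drafts easy to enforce, since nothing else in the workflow runs when it returns False.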

What the agent should do on each event:

1. Call GitHub's Get a Pull Request operation to read the PR metadata, the head SHA, the base branch, and the list of changed files with their diffs. Use this to decide which tests are likely affected. If the diff touches shared code such as package.json, lockfiles, build config, root-level configs, CI files, or files imported broadly, fall back to running the full test suite.

2. Call Daytona's Create Sandbox to provision a fresh sandbox with sensible default resources for the repo's language. Then call Git Clone Repository to clone the PR repo into the sandbox at the PR's head SHA so the run reflects exactly the code under review.

3. Use Execute Command in Sandbox to install dependencies with the right package manager for the repo (for example npm ci, pnpm install --frozen-lockfile, yarn install --frozen-lockfile, pip install -r requirements.txt, or go mod download). If install fails, capture stderr, skip the test phase, and jump straight to posting a clear install-failure comment on the PR.

4. Use Execute Command in Sandbox again to run only the test files most closely related to the changed files, using the project's standard test runner (jest, vitest, pytest, go test, etc.). If the agent fell back to a full-suite run in step 1, run the full suite instead. Capture stdout, stderr, exit code, and the failing assertion lines.

5. Call GitHub's Create an Issue Comment on the pull request with a structured comment that includes: a one-line pass/fail summary, the list of tests that were run and why (scoped vs full suite), and for each failure a section with the test name, file path, the failing assertion or error, and a short plain-language explanation of the likely cause based on the diff. Use a collapsible section for raw logs so the comment stays readable.

6. Always call Delete Sandbox at the end, even on errors or partial failures, so no sandbox is left running.
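
Step 3 leaves the choice of package manager to the agent. One possible way to make that choice deterministic is to look for marker files in the repository root, as in the sketch below. The rule table and the function name are illustrative assumptions; the commands themselves are the ones listed in step 3.

```python
from typing import Iterable, List, Optional, Tuple

# Checked in order; the first marker file found in the repo root wins.
# The marker-to-command pairs are an assumption for illustration.
INSTALL_RULES: List[Tuple[str, str]] = [
    ("pnpm-lock.yaml", "pnpm install --frozen-lockfile"),
    ("yarn.lock", "yarn install --frozen-lockfile"),
    ("package-lock.json", "npm ci"),
    ("requirements.txt", "pip install -r requirements.txt"),
    ("go.mod", "go mod download"),
]


def pick_install_command(root_files: Iterable[str]) -> Optional[str]:
    """Return the install command for the first matching marker, else None."""
    present = set(root_files)
    for marker, command in INSTALL_RULES:
        if marker in present:
            return command
    return None
```

A None result means no known marker was found, which the agent could treat as a setup problem and report in the PR comment rather than guessing a command.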

Hard rules: enforce a 10-minute total runtime cap across install plus test execution; if the cap is hit, stop the run, tear down the sandbox, and post a comment saying the run timed out and which step was running when it hit the cap. Skip draft PRs entirely (no sandbox created). On install failure, post a comment that clearly says "install failed" rather than mixing it with test results. On unexpected errors mid-run, still post a comment and still delete the sandbox.
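
The runtime cap and the always-delete rule can be sketched with a shared deadline and a finally block. This is a minimal sketch, not Daytona's API: run_with_cap, the Step type, and the delete_sandbox callable are hypothetical stand-ins for whatever the agent uses to execute commands and delete the sandbox. Each step is assumed to receive the seconds remaining, use that as its own command timeout, and raise TimeoutError if it runs over.

```python
import time
from typing import Callable, List, Optional, Tuple

RUNTIME_CAP_SECONDS = 10 * 60  # total budget for install plus tests

# A step receives the seconds remaining and returns True on success.
Step = Tuple[str, Callable[[float], bool]]


def run_with_cap(
    steps: List[Step],
    delete_sandbox: Callable[[], None],
    cap_seconds: float = RUNTIME_CAP_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, Optional[str]]:
    """Run steps in order and return (outcome, step_name).

    outcome is "ok", "failed", "timeout", or "error"; step_name is the
    step that was running (or about to start) when the run stopped.
    """
    deadline = clock() + cap_seconds
    current: Optional[str] = None
    try:
        for name, step in steps:
            current = name
            remaining = deadline - clock()
            if remaining <= 0:
                return ("timeout", name)
            if not step(remaining):
                return ("failed", name)
        return ("ok", None)
    except TimeoutError:
        return ("timeout", current)
    except Exception:
        return ("error", current)
    finally:
        # Runs on every path above, so no sandbox is left behind.
        delete_sandbox()
```

The caller can then post the PR comment based on the returned pair, for example "install failed" for ("failed", "install") or a timeout notice naming the step for ("timeout", "test").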

The PR comment is the main user-facing output, so optimize it for a human reviewer: lead with whether tests passed, group failures by file, keep the language plain, and never paste a wall of raw logs above the summary.
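
As an illustration of that layout, the sketch below builds a comment body with the summary first, failures grouped by file, and raw logs in a collapsed details element at the end. The Failure type, the render_comment name, and the exact wording of the summary line are assumptions; the details and summary tags are HTML that GitHub renders in comments.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Failure:
    test_name: str
    file_path: str
    error: str
    explanation: str


def render_comment(failures: List[Failure], scope_note: str, raw_log: str) -> str:
    """Build the PR comment body as GitHub-flavored Markdown."""
    if failures:
        summary = f"Tests failed: {len(failures)} failing"
    else:
        summary = "Tests passed"
    lines = [f"**{summary}**", "", scope_note, ""]

    # Group failures by file so a reviewer can scan one file at a time.
    by_file: Dict[str, List[Failure]] = defaultdict(list)
    for failure in failures:
        by_file[failure.file_path].append(failure)

    for file_path in sorted(by_file):
        lines.append(f"### `{file_path}`")
        for failure in by_file[file_path]:
            lines.append(f"- **{failure.test_name}**: `{failure.error}`")
            lines.append(f"  - Likely cause: {failure.explanation}")
        lines.append("")

    # Raw logs go last and collapsed, never above the summary.
    fence = "`" * 3
    lines += ["<details>", "<summary>Raw logs</summary>", "", fence, raw_log, fence, "", "</details>"]
    return "\n".join(lines)
```

The scope_note argument carries the "scoped vs full suite" explanation, so the reason for the chosen test set appears directly under the pass/fail line.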

Additional information

What does this prompt do?

Watches your repository for new and updated pull requests and skips drafts automatically
Spins up a clean, isolated Daytona sandbox for each PR so your laptop and main CI stay untouched
Runs only the test files that actually relate to the changed code, with a smart fallback to the full suite for cross-cutting changes
Posts a structured comment on the PR that lists every failing test, the error, and a plain-language guess at what in the diff likely broke it

What do I need to use this?

A GitHub account with permission to comment on pull requests in the repository you want to watch
A Daytona account so the agent can create and tear down sandboxes for each PR
A repository with a working test command, like npm test, pytest, or go test

How can I customize it?

Change the install and test commands to match your stack, for example pnpm install and pnpm vitest, or pip install -r requirements.txt and pytest
Adjust the rule for what counts as a cross-cutting change that should fall back to the full test suite
Tighten or relax the 10-minute total runtime cap, or change how the agent picks which tests to run for a given diff
Tweak the format of the PR comment, like grouping failures by file, adding a summary table, or tagging a reviewer when more than three tests fail

Frequently asked questions

Will this replace my existing CI like GitHub Actions or CircleCI?

No. Think of it as a second opinion that lives in the PR conversation. Your normal CI still runs. This agent adds a focused, AI-written explanation of failures so reviewers do not have to dig through raw logs.

How does it decide which tests to run?

It reads the list of files changed in the pull request and runs the tests most closely tied to those files. If the diff touches shared utilities, configs, or anything else broad, it falls back to the full test suite so nothing slips through.
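
One way to picture that decision is the sketch below. It is a naming-convention heuristic under stated assumptions, not the agent's actual logic: the pattern list, the function names, and the choice to fall back to the full suite when no related test is found are all illustrative.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List, Optional

# Illustrative patterns for "shared" files; adjust to the repository.
CROSS_CUTTING_PATTERNS = [
    "package.json", "package-lock.json", "pnpm-lock.yaml", "*.lock",
    ".github/workflows/*", "tsconfig*.json", "pyproject.toml", "go.mod", "go.sum",
]


def is_cross_cutting(path: str) -> bool:
    """True when a changed path matches any shared-file pattern."""
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in CROSS_CUTTING_PATTERNS)


def select_tests(changed: Iterable[str], known_tests: Iterable[str]) -> Optional[List[str]]:
    """Return the test files to run, or None to signal a full-suite run."""
    changed = list(changed)
    if any(is_cross_cutting(path) for path in changed):
        return None
    tests = set(known_tests)
    selected = set()
    for path in changed:
        if path in tests:
            # A changed test file is always run.
            selected.add(path)
            continue
        stem = PurePosixPath(path).stem
        candidates = {f"test_{stem}", f"{stem}_test", f"{stem}.test", f"{stem}.spec"}
        for test_path in tests:
            # Matches names such as test_foo.py, foo_test.go, foo.test.ts.
            if PurePosixPath(test_path).stem in candidates:
                selected.add(test_path)
    # Assumption: with no related test found, run everything rather than nothing.
    return sorted(selected) if selected else None
```

For example, a change to src/cart.ts would select src/cart.test.ts, while any diff that includes package.json returns None and triggers the full suite.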

What happens if installing dependencies or starting the sandbox fails?

The agent surfaces the failure clearly in the PR comment so you know the run was a setup problem and not a real test failure. Any sandbox that was created is still torn down at the end so nothing is left running.

Does this cost a lot to run?

Each sandbox is short-lived. The agent enforces a 10-minute total runtime cap by default and deletes the sandbox as soon as the run is done, so you only pay for the minutes you actually use.

Will it run on draft pull requests?

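No. Draft PRs are skipped on purpose so you can keep iterating without burning sandbox time. Because the trigger listens only for the opened and synchronize actions, the agent first runs on the next push after you mark the PR ready for review. If you want a run at the moment the PR leaves draft, add the ready_for_review action to the trigger.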