Run affected tests on every PR with an AI failure explainer
Whenever a pull request is opened or updated, an agent spins up a fresh sandbox, runs the tests touched by the diff, and posts a plain-English explanation of any failures.
Build an agent workflow that acts as an intelligent CI assistant on every pull request. It should run scoped tests in a fresh Daytona sandbox and post a structured, plain-language failure explanation back to the PR.
Trigger: a GitHub webhook on pull_request events, specifically the opened and synchronize actions. Skip the run entirely if the PR is in draft state.
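The trigger rule can be illustrated with a small filter. This is a minimal sketch, not part of the workflow itself: the function name `should_run` is hypothetical, and it assumes the caller passes the event name from the `X-GitHub-Event` header together with the parsed JSON payload, whose `action` and `pull_request.draft` fields GitHub sets on pull_request events.

```python
from typing import Any, Mapping

# Actions named in the trigger description above.
HANDLED_ACTIONS = {"opened", "synchronize"}


def should_run(event_name: str, payload: Mapping[str, Any]) -> bool:
    """Return True only for non-draft PRs that were opened or updated."""
    if event_name != "pull_request":
        return False
    if payload.get("action") not in HANDLED_ACTIONS:
        return False
    pull_request = payload.get("pull_request") or {}
    # `draft` is true on draft PRs; a missing flag is treated as not-draft.
    return not pull_request.get("draft", False)
```

Running this check before any other work means a draft PR never reaches the sandbox-creation step.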
What the agent should do on each event:
1. Call GitHub's Get a Pull Request operation to read the PR metadata, the head SHA, the base branch, and the list of changed files with their diffs. Use this to decide which tests are likely affected. If the diff touches shared code such as package.json, lockfiles, build config, root-level configs, CI files, or files imported broadly, fall back to running the full test suite.
2. Call Daytona's Create Sandbox to provision a fresh sandbox with sensible default resources for the repo's language. Then call Git Clone Repository to clone the PR repo into the sandbox at the PR's head SHA so the run reflects exactly the code under review.
3. Use Execute Command in Sandbox to install dependencies with the right package manager for the repo (for example npm ci, pnpm install --frozen-lockfile, yarn install --frozen-lockfile, pip install -r requirements.txt, or go mod download). If install fails, capture stderr, skip the test phase, and jump straight to posting a clear install-failure comment on the PR.
4. Use Execute Command in Sandbox again to run only the test files most closely related to the changed files, using the project's standard test runner (jest, vitest, pytest, go test, etc.). If the agent fell back to a full-suite run in step 1, run the full suite instead. Capture stdout, stderr, exit code, and the failing assertion lines.
5. Call GitHub's Create an Issue Comment on the pull request with a structured comment that includes: a one-line pass/fail summary, the list of tests that were run and why (scoped vs full suite), and for each failure a section with the test name, file path, the failing assertion or error, and a short plain-language explanation of the likely cause based on the diff. Use a collapsible section for raw logs so the comment stays readable.
6. Always call Delete Sandbox at the end, even on errors or partial failures, so no sandbox is left running.
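The scoping decision in step 1 could be sketched roughly as follows. Everything here is an assumption for illustration: the names `is_shared`, `subject_name`, and `select_tests` are hypothetical, the file lists are examples rather than a complete definition of "shared code", and tests are matched to changed files purely by naming convention. Detecting "files imported broadly" would need an import graph and is not covered.

```python
from pathlib import PurePosixPath

# Illustrative lists only; adjust them for the repository being watched.
SHARED_FILENAMES = {
    "package.json", "package-lock.json", "pnpm-lock.yaml", "yarn.lock",
    "requirements.txt", "go.mod", "go.sum",
}
SHARED_PREFIXES = (".github/", ".circleci/")
ROOT_CONFIG_SUFFIXES = {".json", ".toml", ".yaml", ".yml", ".ini", ".cfg"}


def is_shared(path: str) -> bool:
    """True if a change to this path should trigger the full suite."""
    p = PurePosixPath(path)
    if path.startswith(SHARED_PREFIXES) or p.name in SHARED_FILENAMES:
        return True
    # Root-level config files, for example pyproject.toml.
    return len(p.parts) == 1 and p.suffix in ROOT_CONFIG_SUFFIXES


def subject_name(path: str) -> str:
    """Strip common test markers so cart.test.ts and cart.ts both give 'cart'."""
    stem = PurePosixPath(path).stem
    for suffix in (".test", ".spec", "_test"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
    if stem.startswith("test_"):
        stem = stem[len("test_"):]
    return stem


def select_tests(changed_files: list[str], known_tests: list[str]) -> tuple[str, list[str]]:
    """Return ("full", all tests) or ("scoped", name-matched tests)."""
    if any(is_shared(f) for f in changed_files):
        return "full", sorted(known_tests)
    changed = {subject_name(f) for f in changed_files}
    # The scoped list may be empty; what to do then is left to the caller.
    return "scoped", sorted(t for t in known_tests if subject_name(t) in changed)
```

Returning the mode alongside the list lets the PR comment state whether the run was scoped or full suite, as step 5 asks.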
Hard rules: enforce a 10-minute total runtime cap across install and test execution combined. If the cap is hit, stop the run, tear down the sandbox, and post a comment saying the run timed out and which step was running when it hit the cap. Skip draft PRs entirely (no sandbox created). On install failure, post a comment that clearly says "install failed" rather than mixing it with test results. On unexpected errors mid-run, still post a comment and still delete the sandbox.
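One way to picture the runtime cap and the always-delete rule is a shared deadline wrapped in try/finally. This is a sketch under stated assumptions, not the Daytona API: `run_with_cap` is a hypothetical helper, each step is a caller-supplied function that receives the seconds remaining and is expected to use that value as the timeout for its own command (raising `TimeoutError` if it is exceeded), and `teardown` stands in for whatever deletes the sandbox.

```python
import time
from typing import Callable, Optional

CAP_SECONDS = 10 * 60  # total budget for install plus test execution


def run_with_cap(
    steps: list[tuple[str, Callable[[float], None]]],
    teardown: Callable[[], None],
    cap_seconds: float = CAP_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Run steps in order under one deadline.

    Returns None if every step finished in time, otherwise the name of
    the step that was running when the cap was hit. teardown always runs,
    including when a step raises an unexpected exception.
    """
    deadline = clock() + cap_seconds
    try:
        for name, step in steps:
            remaining = max(deadline - clock(), 0.0)
            try:
                step(remaining)
            except TimeoutError:
                return name
            if clock() >= deadline:
                return name
        return None
    finally:
        teardown()
```

The returned step name is what the timeout comment would report. Unexpected exceptions still propagate to the caller, which can post the error comment, while the `finally` clause guarantees the sandbox is removed first.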
The PR comment is the main user-facing output, so optimize it for a human reviewer: lead with whether tests passed, group failures by file, keep the language plain, and never paste a wall of raw logs above the summary.
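As an illustration of that layout, the comment body could be assembled along these lines. The `Failure` dataclass and `build_comment` function are hypothetical names, and the exact wording of the summary line is an assumption. The sketch relies on GitHub rendering `<details>` and `<summary>` tags in comments to collapse the raw logs.

```python
from collections import defaultdict
from dataclasses import dataclass

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


@dataclass
class Failure:
    test_name: str
    file_path: str
    error: str
    explanation: str


def build_comment(mode: str, reason: str, tests_run: list[str],
                  failures: list[Failure], raw_log: str) -> str:
    """Build the PR comment: summary first, failures by file, logs last."""
    if failures:
        summary = f"**Tests failed**: {len(failures)} failing test(s)"
    else:
        summary = "**Tests passed**"
    lines = [summary, "", f"Ran {len(tests_run)} test file(s) ({mode}): {reason}", ""]

    # Group failures by file so a reviewer can scan one file at a time.
    by_file = defaultdict(list)
    for failure in failures:
        by_file[failure.file_path].append(failure)
    for path in sorted(by_file):
        lines.append(f"### `{path}`")
        for failure in by_file[path]:
            lines.append(f"- **{failure.test_name}**: `{failure.error}`")
            lines.append(f"  Likely cause: {failure.explanation}")
        lines.append("")

    # Raw logs go in a collapsed section below everything else.
    lines += ["<details>", "<summary>Raw logs</summary>", "",
              FENCE, raw_log, FENCE, "", "</details>"]
    return "\n".join(lines)
```

The returned string would be passed as the body of the Create an Issue Comment call in step 5.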
Additional information
What does this prompt do?
- Watches your repository for new and updated pull requests and skips drafts automatically
- Spins up a clean, isolated Daytona sandbox for each PR so your laptop and main CI stay untouched
- Runs only the test files that actually relate to the changed code, with a smart fallback to the full suite for cross-cutting changes
- Posts a structured comment on the PR that lists every failing test, the error, and a plain-language guess at what in the diff likely broke it
What do I need to use this?
- A GitHub account with permission to comment on pull requests in the repository you want to watch
- A Daytona account so the agent can create and tear down sandboxes for each PR
- A repository with a working test command, like npm test, pytest, or go test
How can I customize it?
- Change the install and test commands to match your stack, for example pnpm install and pnpm vitest, or pip install -r requirements.txt and pytest
- Adjust the rule for what counts as a cross-cutting change that should fall back to the full test suite
- Tighten or relax the 10-minute total runtime cap, or change how the agent picks which tests to run for a given diff
- Tweak the format of the PR comment, like grouping failures by file, adding a summary table, or tagging a reviewer when more than three tests fail
Frequently asked questions
Will this replace my existing CI like GitHub Actions or CircleCI?
How does it decide which tests to run?
What happens if installing dependencies or starting the sandbox fails?
Does this cost a lot to run?
Will it run on draft pull requests?
Stop reading raw CI logs to figure out which line broke the build.
Connect GitHub and Daytona once, and Geni runs the right tests on every pull request and explains failures in plain English.