Understanding Run Results

How individual test outcomes roll up into run-level results, and what drives the overall pass/fail decision.

A run executes multiple tests. Each test produces its own outcome, and the run's overall result is computed from those outcomes using a fixed set of priority rules.

Test outcome vs execution status

These are two different things:

Concept	What it tracks	Example
Execution status	Whether the session finished	`completed`, `failed`, `aborted`
Test outcome	What the test found	`passed`, `failed`, `blocked`

An execution can be completed (the session ran to the end) while the test outcome is failed (the test found a defect). The test outcome is what matters for results.

Always look at the test outcome

A completed execution does not mean the test passed. It means the agent finished running. The effective result is always the test outcome, not the execution status.
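Here is a minimal Python sketch of the distinction. It assumes a hypothetical `ExecutionRecord` type that stores the two fields side by side; the field names are illustrative, not the product's data model. The point is that the result is read from `test_outcome`, never from `execution_status`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionRecord:
    """Hypothetical record of one test execution, holding both fields separately."""

    execution_status: str  # did the session finish? e.g. "completed", "failed", "aborted"
    test_outcome: str  # what the test found, e.g. "passed", "failed", "blocked"


def effective_result(record: ExecutionRecord) -> str:
    # A "completed" execution only means the agent finished running;
    # the effective result is always the test outcome.
    return record.test_outcome


record = ExecutionRecord(execution_status="completed", test_outcome="failed")
print(effective_result(record))  # failed
```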

How run results aggregate

Once all child tests complete, the run's overall status is determined by the first matching rule, checked in priority order:

Rule (evaluated in order)	Run status
Any test is `Verified Failed`	Verified Failed
Else any test is `Blocked`	Blocked
Else any test is `To Verify`	To Verify
Else all tests are `Verified Passed`	Verified Passed
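The rules above amount to a first-match scan over the child test statuses. The following Python sketch assumes a hypothetical `Status` enum and `aggregate` function. It treats an empty run as an error because this page does not say how one resolves:

```python
from enum import Enum


class Status(Enum):
    # Hypothetical enum mirroring the statuses in the aggregation table.
    VERIFIED_FAILED = "Verified Failed"
    BLOCKED = "Blocked"
    TO_VERIFY = "To Verify"
    VERIFIED_PASSED = "Verified Passed"


# Rules in evaluation order; the first status present in the run wins.
PRIORITY = (Status.VERIFIED_FAILED, Status.BLOCKED, Status.TO_VERIFY)


def aggregate(test_statuses: list[Status]) -> Status:
    """Compute a run's overall status from its child test statuses."""
    if not test_statuses:
        # Assumption: the behavior for a run with no tests is not documented.
        raise ValueError("run has no test statuses to aggregate")
    present = set(test_statuses)
    for status in PRIORITY:
        if status in present:
            return status
    # No higher-priority status matched, so every test is Verified Passed.
    return Status.VERIFIED_PASSED


# 49 passes and 1 failure: the run is still Verified Failed.
statuses = [Status.VERIFIED_PASSED] * 49 + [Status.VERIFIED_FAILED]
print(aggregate(statuses).value)  # Verified Failed
```

Because statuses are checked by priority rather than counted, the ratio of passes to failures never affects the outcome.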

One failure fails the run

A run with 49 passes and 1 failure is still Verified Failed. One failing test is enough to mark the entire run as failed.

Quarantined and skipped tests

Tests that are quarantined (flagged as flaky) or intentionally skipped are treated as neutral in aggregation. They are excluded from the result calculation, so they can neither fail nor pass the run.

A run is Verified Passed when all of its remaining tests (those neither quarantined nor skipped) pass, even if some quarantined tests were skipped.
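Excluding neutral tests amounts to a filter applied before the priority rules. In this Python sketch, the `ChildTest` type with `quarantined` and `skipped` flags is hypothetical, and the statuses are plain strings taken from the aggregation table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildTest:
    # Hypothetical shape: a status from the aggregation table plus two exclusion flags.
    status: str
    quarantined: bool = False
    skipped: bool = False


PRIORITY = ("Verified Failed", "Blocked", "To Verify")


def run_status(tests: list[ChildTest]) -> str:
    # Quarantined and skipped tests are neutral, so drop them before aggregating.
    counted = [t.status for t in tests if not (t.quarantined or t.skipped)]
    if not counted:
        # Assumption: the result when every test is excluded is not documented.
        raise ValueError("no tests left to aggregate after exclusions")
    for status in PRIORITY:
        if status in counted:
            return status
    if all(s == "Verified Passed" for s in counted):
        return "Verified Passed"
    raise ValueError(f"unrecognized status in {counted}")


tests = [
    ChildTest("Verified Passed"),
    ChildTest("Verified Passed"),
    ChildTest("Verified Failed", quarantined=True),  # flaky test; excluded even though it failed
]
print(run_status(tests))  # Verified Passed
```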

Mixed outcomes and Needs Review

If a run contains a mix of test results and non-test statuses (for example, `closed` or `cannot_reproduce` alongside `verified_passed`), the system flags it as Needs Review and sets the run status to To Verify. This prevents ambiguous outcomes from being treated as clean passes.
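Here is a sketch of the mixed-outcome check in Python. The status groupings are assumptions. This page names only `verified_passed` as a test result and `closed` and `cannot_reproduce` as non-test statuses. The other members of `TEST_RESULT_STATUSES` and the `ReviewCheck` type are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical groupings; only verified_passed, closed and cannot_reproduce
# are named on this page.
TEST_RESULT_STATUSES = {"verified_passed", "verified_failed", "blocked", "to_verify"}
NON_TEST_STATUSES = {"closed", "cannot_reproduce"}


@dataclass(frozen=True)
class ReviewCheck:
    needs_review: bool
    # None means no override: apply the normal priority rules instead.
    run_status_override: Optional[str]


def check_mixed_outcomes(statuses: list[str]) -> ReviewCheck:
    has_test_result = any(s in TEST_RESULT_STATUSES for s in statuses)
    has_non_test = any(s in NON_TEST_STATUSES for s in statuses)
    if has_test_result and has_non_test:
        # Ambiguous mix: flag it for a human and hold the run at To Verify.
        return ReviewCheck(needs_review=True, run_status_override="to_verify")
    return ReviewCheck(needs_review=False, run_status_override=None)


print(check_mixed_outcomes(["verified_passed", "cannot_reproduce"]))
# ReviewCheck(needs_review=True, run_status_override='to_verify')
```

Running a check like this before the priority rules is one way to keep a `closed` or `cannot_reproduce` item from letting an otherwise passing run count as a clean pass.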

What each outcome means for the run

Test outcome	Run impact	Action needed
Passed	Positive signal	None
Failed	Run fails	Investigate failure, triage bug
Blocked	Run blocked	Check environment, dependencies
Skipped (quarantined)	Neutral (ignored)	Fix flaky test separately
Inconclusive	Run needs review	Human decides: retry or investigate
