Step 4.5: AI Batch

Purpose

The AI Batch step pre-computes results for all AI-powered tests in a single batched API call. This step runs after Sandbox and before Grade, ensuring that AI results are available when the grading engine traverses the criteria tree — without requiring the GraderService to know anything about AI execution.

How It Works

1. Walk the criteria tree: The step traverses the full CriteriaTree (base, bonus, penalty) looking for TestNode instances whose test_function is an AiTestFunction.
2. Build prompts: For each AI test found, it resolves the relevant submission files and calls build_prompt(files, **params) to produce the evaluation prompt.
3. Batch request: All prompts are sent in a single call to AiExecutor.run(), which communicates with the OpenAI API and returns a Dict[str, TestResult] keyed by test name.
4. Store results: The results dict is stored in StepResult.data under StepName.AI_BATCH.

If no AiTestFunction instances are found in the criteria tree, the step exits immediately with an empty dict and incurs no API cost.
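
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Category` container, the shape of `CriteriaTree`, the `collect_ai_tests` and `run_ai_batch` helpers, and the `run_batch` callable (standing in for `AiExecutor.run()`) are all assumptions made so the example is self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class TestResult:
    """Hypothetical stand-in for the project's TestResult."""
    test_name: str
    score: float


class AiTestFunction:
    """Hypothetical stand-in for the AiTestFunction ABC."""

    def build_prompt(self, files: Dict[str, str], **params) -> str:
        rubric = params.get("rubric", "")
        return f"Files: {sorted(files)}. Rubric: {rubric}"


@dataclass
class TestNode:
    name: str
    test_function: object
    params: dict = field(default_factory=dict)


@dataclass
class Category:
    """Assumed container node; the real tree may nest differently."""
    children: List[Union["Category", TestNode]] = field(default_factory=list)


@dataclass
class CriteriaTree:
    base: Category = field(default_factory=Category)
    bonus: Category = field(default_factory=Category)
    penalty: Category = field(default_factory=Category)


def collect_ai_tests(tree: CriteriaTree) -> List[TestNode]:
    """Walk base, bonus, and penalty depth-first; keep AI-backed tests."""
    found, stack = [], [tree.base, tree.bonus, tree.penalty]
    while stack:
        node = stack.pop()
        if isinstance(node, TestNode):
            if isinstance(node.test_function, AiTestFunction):
                found.append(node)
        else:
            stack.extend(node.children)
    return found


def run_ai_batch(
    tree: CriteriaTree,
    files: Dict[str, str],
    run_batch: Callable[[Dict[str, str]], Dict[str, TestResult]],
) -> Dict[str, TestResult]:
    ai_tests = collect_ai_tests(tree)
    if not ai_tests:
        return {}  # no AI tests: skip the API entirely
    prompts = {
        node.name: node.test_function.build_prompt(files, **node.params)
        for node in ai_tests
    }
    return run_batch(prompts)  # one call covers every prompt
```

Passing the executor in as a callable keeps the sketch testable without network access; in the real step the call would go through `AiExecutor.run()`.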

Dependencies

Step	What It Needs
Build Tree	The `CriteriaTree` with embedded test functions

Input

Source	Data
Pipeline	`StepName.BUILD_TREE` → `CriteriaTree`
Pipeline	`pipeline_exec.submission` → submission files
Pipeline	`pipeline_exec.locale` → locale for AI prompt rendering

Output

Field	Type	Description
`data`	`Dict[str, TestResult]`	Pre-computed AI results keyed by test name, or `{}` if no AI tests exist
`status`	`StepStatus.SUCCESS`	Always succeeds (API failures result in an empty dict, not a step failure)

How It Integrates with Grade

The GradeStep reads the AI_BATCH step result from the pipeline and passes it as pre_computed_results to GraderService.grade_from_tree(). The grader threads this dict through the tree traversal into every process_test() call, which forwards it as a kwarg to test_function.execute().

AiTestFunction.execute() sees the pre_computed_results kwarg, looks up its own name in the dict, and, when an entry exists, returns the pre-computed TestResult directly, with no further API call and no mutation.

Regular (non-AI) test functions receive the same kwarg but ignore it, because it arrives as part of **kwargs and they never read it.
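
A minimal sketch of how the two kinds of test function might treat the kwarg is shown below. The class names `AiTest` and `RegularTest`, the `_call_api` helper, and the exact `execute()` signature are hypothetical; only the `pre_computed_results` kwarg and the lookup-by-name behaviour come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TestResult:
    """Hypothetical stand-in for the project's TestResult."""
    test_name: str
    score: float


class RegularTest:
    """Non-AI test: pre_computed_results lands in **kwargs and is never read."""
    name = "file_exists"

    def execute(self, files: Dict[str, str], **kwargs) -> TestResult:
        return TestResult(self.name, 100.0 if files else 0.0)


class AiTest:
    """AI test: prefers the batched result, falls back to a single call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.api_calls = 0  # counts fallback calls, for illustration only

    def _call_api(self, files: Dict[str, str]) -> TestResult:
        # Placeholder for the single-call fallback; no network access here.
        self.api_calls += 1
        return TestResult(self.name, 50.0)

    def execute(
        self,
        files: Dict[str, str],
        pre_computed_results: Optional[Dict[str, TestResult]] = None,
        **kwargs,
    ) -> TestResult:
        cached = (pre_computed_results or {}).get(self.name)
        if cached is not None:
            return cached  # returned as-is: no API call, no mutation
        return self._call_api(files)
```

Because both classes accept `**kwargs`, the grader can pass the same keyword arguments to every test function without checking its type.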

Key Design Decisions

Pre-grade, not post-grade: AI results are computed before GradeStep so that the grade step sees real scores from the start. There are no placeholder results and no deferred mutations.
Stateless executor: AiExecutor is instantiated per call, with no shared state and no module-level singleton.
Graceful degradation: If the AI API call fails, the step returns an empty dict. Each AI test then uses its fallback path (a single-call retry) when GraderService invokes its execute().
Zero cost when unused: If a grading configuration has no AI tests, the step detects this during the tree walk and exits with {} without calling the API.
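
The graceful-degradation rule could look something like the sketch below. The `StepStatus` and `StepResult` definitions are simplified stand-ins, and `execute_ai_batch` and `run_batch` are hypothetical names. Catching a broad `Exception` is a simplification for the example; a real step would presumably log the error and catch narrower exception types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class StepStatus(Enum):
    """Simplified stand-in; the real enum likely has more members."""
    SUCCESS = "success"


@dataclass
class StepResult:
    """Simplified stand-in for the pipeline's StepResult."""
    status: StepStatus
    data: dict


def execute_ai_batch(
    prompts: Dict[str, str],
    run_batch: Callable[[Dict[str, str]], dict],
) -> StepResult:
    try:
        results = run_batch(prompts)
    except Exception:
        # API failure degrades to "no pre-computed results" rather than
        # failing the step; AI tests fall back individually during Grade.
        results = {}
    return StepResult(StepStatus.SUCCESS, results)
```

The trade-off is that a failed batch is invisible in the step status, so any diagnosis has to come from logs or from the per-test fallback results.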

Source Files

File	Contents
`autograder/steps/ai_batch_step.py`	The `AiBatchStep` pipeline step
`autograder/models/abstract/ai_test_function.py`	`AiTestFunction` ABC for AI-powered test functions
`autograder/utils/executors/ai_executor.py`	Stateless `AiExecutor` OpenAI wrapper

Next Step

After AI results are pre-computed, the pipeline proceeds to Step 5: Grade to execute all tests and build the ResultTree.