Step 4.5: AI Batch
Purpose
The AI Batch step pre-computes results for all AI-powered tests in a single batched API call. This step runs after Sandbox and before Grade, ensuring that AI results are available when the grading engine traverses the criteria tree — without requiring the GraderService to know anything about AI execution.
How It Works
- Walk the criteria tree: The step traverses the full CriteriaTree (base, bonus, penalty) looking for TestNode instances whose test_function is an AiTestFunction.
- Build prompts: For each AI test found, it calls build_prompt(files, **params) to produce the evaluation prompt and resolves the relevant submission files.
- Batch request: All prompts are sent in a single call to AiExecutor.run(), which communicates with the OpenAI API and returns a Dict[str, TestResult] keyed by test name.
- Store results: The results dict is stored in StepResult.data under StepName.AI_BATCH.
If no AiTestFunction instances are found in the criteria tree, the step exits immediately with an empty dict and incurs no API cost.
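
The flow above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the classes are simplified stand-ins, each tree category is flattened to a plain list, and collect_ai_tests, run_ai_batch, and the run_batch callable (standing in for AiExecutor.run()) are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TestResult:
    # Stand-in for the project's TestResult; the real fields may differ.
    test_name: str
    score: float


class AiTestFunction:
    # Stand-in for the AiTestFunction ABC; only build_prompt is modeled.
    def build_prompt(self, files: Dict[str, str], **params) -> str:
        return f"Evaluate {sorted(files)} using {params}"


@dataclass
class TestNode:
    name: str
    test_function: object
    params: dict = field(default_factory=dict)


@dataclass
class CriteriaTree:
    # Assumption: each category is flattened to a list to keep the sketch short.
    base: List[TestNode] = field(default_factory=list)
    bonus: List[TestNode] = field(default_factory=list)
    penalty: List[TestNode] = field(default_factory=list)


def collect_ai_tests(tree: CriteriaTree) -> List[TestNode]:
    """Return every node, in any category, whose test function is AI-powered."""
    nodes = tree.base + tree.bonus + tree.penalty
    return [n for n in nodes if isinstance(n.test_function, AiTestFunction)]


def run_ai_batch(
    tree: CriteriaTree,
    files: Dict[str, str],
    run_batch: Callable[[Dict[str, str]], Dict[str, TestResult]],
) -> Dict[str, TestResult]:
    """Build one prompt per AI test and send them all in a single batch call."""
    ai_nodes = collect_ai_tests(tree)
    if not ai_nodes:
        return {}  # no AI tests: skip the API call entirely
    prompts = {n.name: n.test_function.build_prompt(files, **n.params) for n in ai_nodes}
    return run_batch(prompts)


# Usage with a fake executor that records how often it is called.
calls = []


def fake_batch(prompts: Dict[str, str]) -> Dict[str, TestResult]:
    calls.append(prompts)
    return {name: TestResult(name, 1.0) for name in prompts}


tree = CriteriaTree(
    base=[TestNode("has_readme", object()), TestNode("ai_style", AiTestFunction())],
    bonus=[TestNode("ai_docs", AiTestFunction(), {"strict": True})],
)
results = run_ai_batch(tree, {"main.py": "print('hi')"}, fake_batch)
empty = run_ai_batch(CriteriaTree(), {}, fake_batch)
```

In this sketch the fake executor is called once for the populated tree and not at all for the empty one, which mirrors the zero-cost early exit described above.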
Dependencies
| Step | What It Needs |
|---|---|
| Build Tree | The CriteriaTree with embedded test functions |
Input
| Source | Data |
|---|---|
| Pipeline | StepName.BUILD_TREE → CriteriaTree |
| Pipeline | pipeline_exec.submission → submission files |
| Pipeline | pipeline_exec.locale → locale for AI prompt rendering |
Output
| Field | Type | Description |
|---|---|---|
| data | Dict[str, TestResult] | Pre-computed AI results keyed by test name, or {} if no AI tests exist |
| status | StepStatus.SUCCESS | Always succeeds (API failures result in an empty dict, not a step failure) |
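
The "always succeeds" contract in the table can be illustrated with a small sketch. Everything here is an assumption made for the example: the enum values, the shape of StepResult beyond its data and status fields, and the ai_batch_result helper with its run_batch callable are hypothetical and may not match the real step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class StepName(Enum):
    AI_BATCH = "ai_batch"  # assumed value


class StepStatus(Enum):
    SUCCESS = "success"  # assumed value


@dataclass
class StepResult:
    # Stand-in: only the fields named in the table plus the step name.
    step: StepName
    status: StepStatus
    data: Dict[str, object]


def ai_batch_result(
    prompts: Dict[str, str],
    run_batch: Callable[[Dict[str, str]], Dict[str, object]],
) -> StepResult:
    """Run the batch; on any API failure fall back to an empty dict."""
    try:
        data = run_batch(prompts) if prompts else {}
    except Exception:
        # An API failure degrades to "no pre-computed results", not a failed step.
        data = {}
    return StepResult(StepName.AI_BATCH, StepStatus.SUCCESS, data)


def failing_batch(prompts: Dict[str, str]) -> Dict[str, object]:
    raise ConnectionError("API unreachable")


failed = ai_batch_result({"ai_style": "..."}, failing_batch)
ok = ai_batch_result({"ai_style": "..."}, lambda p: {name: 1.0 for name in p})
```

Both calls report StepStatus.SUCCESS; only the contents of data tell a consumer whether batched results are actually available.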
How It Integrates with Grade
The GradeStep reads the AI_BATCH step result from the pipeline and passes it as pre_computed_results to GraderService.grade_from_tree(). The grader threads this dict through the tree traversal into every process_test() call, which forwards it as a kwarg to test_function.execute().
AiTestFunction.execute() sees the pre_computed_results kwarg, looks up its own name in the dict, and returns the pre-computed TestResult directly — no further API call, no mutation.
Regular (non-AI) test functions receive the same kwarg but ignore it: it arrives through **kwargs, which they never read.
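
A rough sketch of this lookup, under stated assumptions: the classes below are stand-ins rather than the project's code, the name attribute and the _single_call fallback helper are hypothetical, and the real execute() signature may carry more parameters.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TestResult:
    # Stand-in for the project's TestResult; the real fields may differ.
    test_name: str
    score: float
    report: str = ""


class AiTestFunction:
    # Stand-in for the real ABC; name and _single_call are hypothetical.
    def __init__(self, name: str) -> None:
        self.name = name

    def execute(
        self,
        files: Dict[str, str],
        pre_computed_results: Optional[Dict[str, TestResult]] = None,
        **kwargs,
    ) -> TestResult:
        # Prefer the batched result: no further API call is made.
        if pre_computed_results and self.name in pre_computed_results:
            return pre_computed_results[self.name]
        # Fallback path when the batch produced nothing for this test.
        return self._single_call(files)

    def _single_call(self, files: Dict[str, str]) -> TestResult:
        # Placeholder for a one-off API request.
        return TestResult(self.name, 0.0, report="fallback")


class FileCountTest:
    # A regular test: pre_computed_results lands in **kwargs and is never read.
    name = "file_count"

    def execute(self, files: Dict[str, str], **kwargs) -> TestResult:
        return TestResult(self.name, float(len(files)))


batch = {"ai_style": TestResult("ai_style", 87.5, report="batched")}
files = {"main.py": "print('hi')"}

hit = AiTestFunction("ai_style").execute(files, pre_computed_results=batch)
miss = AiTestFunction("ai_docs").execute(files, pre_computed_results={})
plain = FileCountTest().execute(files, pre_computed_results=batch)
```

Here hit is the very object stored in the batch dict, miss comes from the fallback helper, and plain is computed without touching the dict at all.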
Key Design Decisions
- Pre-grade, not post-grade: AI results are computed before GradeStep so that the grade step sees real scores from the start. There are no placeholder results and no deferred mutations.
- Stateless executor: AiExecutor is instantiated per call with no shared state and no module-level singleton.
- Graceful degradation: If the AI API call fails, the step returns an empty dict. AI tests then use their fallback path (a single-call retry) when GraderService invokes their execute().
- Zero cost when unused: If a grading configuration has no AI tests, the step detects this during the tree walk and exits with {}.
Source Files
| File | Contents |
|---|---|
| autograder/steps/ai_batch_step.py | The AiBatchStep pipeline step |
| autograder/models/abstract/ai_test_function.py | AiTestFunction ABC for AI-powered test functions |
| autograder/utils/executors/ai_executor.py | Stateless AiExecutor OpenAI wrapper |
Next Step
After AI results are pre-computed, the pipeline proceeds to Step 5: Grade to execute all tests and build the ResultTree.