
Step 4.5: AI Batch

Purpose

The AI Batch step pre-computes results for all AI-powered tests in a single batched API call. This step runs after Sandbox and before Grade, ensuring that AI results are available when the grading engine traverses the criteria tree — without requiring the GraderService to know anything about AI execution.

How It Works

  1. Walk the criteria tree — The step traverses the full CriteriaTree (base, bonus, penalty) looking for TestNode instances whose test_function is an AiTestFunction.
  2. Build prompts — For each AI test found, it calls build_prompt(files, **params) to produce the evaluation prompt and resolves the relevant submission files.
  3. Batch request — All prompts are sent in a single call to AiExecutor.run(), which communicates with the OpenAI API and returns Dict[test_name, TestResult].
  4. Store results — The results dict is stored in StepResult.data under StepName.AI_BATCH.

If no AiTestFunction instances are found in the criteria tree, the step exits immediately with an empty dict and incurs no API cost.
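
The four steps above, plus the early exit, can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the stand-in classes, the `children` attribute, the `CategoryNode` container, the `collect_ai_tests` and `run_ai_batch` helpers, and the `batch_fn` callable that plays the role of AiExecutor.run() are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TestResult:  # stand-in for the real TestResult
    score: float
    report: str = ""


class AiTestFunction:  # stand-in for the real AiTestFunction ABC
    def __init__(self, name: str):
        self.name = name

    def build_prompt(self, files: Dict[str, str], **params) -> str:
        # A real implementation would render a full evaluation prompt.
        return f"[{self.name}] evaluate {sorted(files)} with {params}"


@dataclass
class TestNode:
    test_function: object
    params: dict = field(default_factory=dict)


@dataclass
class CategoryNode:  # hypothetical container node holding children
    children: list = field(default_factory=list)


def collect_ai_tests(node) -> List[TestNode]:
    """Step 1: depth-first walk that keeps only AI-backed test nodes."""
    if isinstance(node, TestNode):
        return [node] if isinstance(node.test_function, AiTestFunction) else []
    found: List[TestNode] = []
    for child in node.children:
        found.extend(collect_ai_tests(child))
    return found


def run_ai_batch(
    roots: list,
    files: Dict[str, str],
    batch_fn: Callable[[Dict[str, str]], Dict[str, TestResult]],
) -> Dict[str, TestResult]:
    # roots would be the base, bonus, and penalty subtrees.
    ai_nodes = [n for root in roots for n in collect_ai_tests(root)]
    if not ai_nodes:
        return {}  # nothing to evaluate: no API call is made
    # Step 2: one prompt per AI test, keyed by test name.
    prompts = {
        n.test_function.name: n.test_function.build_prompt(files, **n.params)
        for n in ai_nodes
    }
    # Step 3: a single batched call; step 4 would store the returned dict.
    return batch_fn(prompts)
```

Keying the prompts by test name keeps the lookup on the grading side trivial, but it assumes test names are unique within a criteria tree.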

Dependencies

Step | What It Needs
Build Tree | The CriteriaTree with embedded test functions

Input

Source | Data
Pipeline | StepName.BUILD_TREE → CriteriaTree
Pipeline | pipeline_exec.submission → submission files
Pipeline | pipeline_exec.locale → locale for AI prompt rendering

Output

Field | Type | Description
data | Dict[str, TestResult] | Pre-computed AI results keyed by test name, or {} if no AI tests exist or the API call fails
status | StepStatus.SUCCESS | Always succeeds (API failures result in an empty dict, not a step failure)

How It Integrates with Grade

The GradeStep reads the AI_BATCH step result from the pipeline and passes it as pre_computed_results to GraderService.grade_from_tree(). The grader threads this dict through the tree traversal into every process_test() call, which forwards it as a kwarg to test_function.execute().

AiTestFunction.execute() sees the pre_computed_results kwarg, looks up its own name in the dict, and returns the pre-computed TestResult directly — no further API call, no mutation.

Regular (non-AI) test functions receive the same kwarg but ignore it: it is absorbed by their **kwargs and never read.
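
A minimal sketch of this hand-off, assuming simplified signatures: the two test classes, the `_single_call` helper, and the exact shape of `process_test()` and `execute()` below are illustrative stand-ins rather than the real autograder API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TestResult:  # stand-in for the real TestResult
    test_name: str
    score: float


class WordCountTest:
    """Hypothetical non-AI test: pre_computed_results lands in **kwargs."""

    name = "word_count"

    def execute(self, files: Dict[str, str], **kwargs) -> TestResult:
        total = sum(len(text.split()) for text in files.values())
        return TestResult(self.name, float(total))


class ToneTest:
    """Hypothetical AI test: looks itself up in the pre-computed dict."""

    name = "tone"

    def execute(
        self,
        files: Dict[str, str],
        pre_computed_results: Optional[Dict[str, TestResult]] = None,
        **kwargs,
    ) -> TestResult:
        if pre_computed_results and self.name in pre_computed_results:
            # Return the batched result as-is: no API call, no mutation.
            return pre_computed_results[self.name]
        return self._single_call(files)

    def _single_call(self, files: Dict[str, str]) -> TestResult:
        # Placeholder for the fallback path; a real test would call the API.
        raise RuntimeError("no pre-computed result available")


def process_test(test_function, files, pre_computed_results):
    # The grader forwards the same kwarg to every test, AI or not.
    return test_function.execute(files, pre_computed_results=pre_computed_results)
```

Because every test function accepts the kwarg, the grader never has to branch on the test type.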

Key Design Decisions

  • Pre-grade, not post-grade — AI results are computed before GradeStep so that the grade step sees real scores from the start. There are no placeholder results, no deferred mutations.
  • Stateless executor — AiExecutor is instantiated per call with no shared state. No module-level singleton.
  • Graceful degradation — If the AI API call fails, the step returns an empty dict. AI tests will use their fallback path (a single-call retry) when GraderService invokes their execute().
  • Zero cost when unused — If a grading configuration has no AI tests, the step detects this instantly and exits with {}.
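
The graceful-degradation and zero-cost decisions can be illustrated with a small wrapper around the batched call. This is a sketch under assumptions: `safe_batch`, the `batch_fn` callable standing in for AiExecutor.run(), and the logger name are hypothetical, and the real step may catch a narrower set of exceptions.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("ai_batch_sketch")  # hypothetical logger name


def safe_batch(
    prompts: Dict[str, str],
    batch_fn: Callable[[Dict[str, str]], Dict[str, object]],
) -> Dict[str, object]:
    """Return batched AI results, or {} when there is nothing to do or the call fails."""
    if not prompts:
        # Zero cost when unused: skip the API entirely.
        return {}
    try:
        return batch_fn(prompts)
    except Exception:
        # Graceful degradation: the step still succeeds with an empty dict,
        # and each AI test falls back to its own single call during grading.
        logger.warning("AI batch call failed; returning no results", exc_info=True)
        return {}
```

Returning an empty dict on failure means the grading step needs no special error handling: a missing key is treated the same way whether the batch failed or was never run.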

Source Files

File | Contents
autograder/steps/ai_batch_step.py | The AiBatchStep pipeline step
autograder/models/abstract/ai_test_function.py | AiTestFunction ABC for AI-powered test functions
autograder/utils/executors/ai_executor.py | Stateless AiExecutor OpenAI wrapper

Next Step

After AI results are pre-computed, the pipeline proceeds to Step 5: Grade to execute all tests and build the ResultTree.