Step 4.8: Structural Analysis
Purpose
The Structural Analysis step parses student submission files into Abstract Syntax Trees (ASTs) using ast-grep. It runs after AI Batch (if applicable) and before Grade, so that later test functions can perform structural pattern matching on the source code without re-parsing each file.
How It Works
- Detect Language: The step identifies the submission language (Python, Java, Node.js, C++, or C).
- Heuristic File Filtering: It scans the submission files and identifies those likely to contain source code, skipping binary files, images, and non-code configurations (e.g., .png, .json, .yaml, .md).
- Parse ASTs: For each identified code file, it uses ast-grep-py to parse the content into an SgRoot object.
- Store Results: The resulting mapping of Dict[filename, SgRoot] is stored in StepResult.data under StepName.STRUCTURAL_ANALYSIS.
If ast-grep-py is not installed or the language is not supported, the step logs a warning and proceeds with an empty result set, allowing the pipeline to continue.
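The flow described above can be sketched roughly as follows. This is a minimal illustration, not the autograder's actual implementation: the helper names (looks_like_source, parse_submission), the extension sets, and the language keys are assumptions made for the example. Only the SgRoot(source, language) constructor comes from ast-grep-py.

```python
import logging
from pathlib import PurePosixPath
from typing import Dict, Optional

logger = logging.getLogger(__name__)

try:
    from ast_grep_py import SgRoot  # optional dependency
except ImportError:
    SgRoot = None

# Assumed mapping from submission language to source extensions (illustrative only).
SOURCE_EXTENSIONS = {
    "python": {".py"},
    "java": {".java"},
    "javascript": {".js", ".mjs", ".cjs"},
    "cpp": {".cpp", ".cc", ".hpp", ".h"},
    "c": {".c", ".h"},
}


def looks_like_source(filename: str, language: str) -> bool:
    """Heuristic filter: keep files whose extension matches the language."""
    suffix = PurePosixPath(filename).suffix.lower()
    return suffix in SOURCE_EXTENSIONS.get(language, set())


def parse_submission(files: Dict[str, str], language: str) -> Dict[str, Optional[object]]:
    """Parse each likely source file once; return filename -> SgRoot (or None)."""
    if SgRoot is None or language not in SOURCE_EXTENSIONS:
        # Missing library or unsupported language: warn and continue with no ASTs.
        logger.warning("Structural analysis skipped for language %r", language)
        return {}

    roots: Dict[str, Optional[object]] = {}
    for name, content in files.items():
        if not looks_like_source(name, language):
            continue  # images, configs, docs, and other non-code files
        try:
            roots[name] = SgRoot(content, language)
        except Exception:
            # One bad file must not abort the whole step.
            logger.warning("Could not parse %s; storing None", name)
            roots[name] = None
    return roots
```

Returning an empty mapping in the fallback branch mirrors the behaviour described above: downstream steps see no ASTs rather than a failed pipeline.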
Dependencies
| Step | What It Needs |
|---|---|
| Bootstrap | The raw Submission object containing files and language metadata |
Input
| Source | Data |
|---|---|
| Pipeline | pipeline_exec.submission → submission files and language |
Output
| Field | Type | Description |
|---|---|---|
| data | StructuralAnalysisResult | Contains a mapping of filenames to ast-grep SgRoot objects |
| status | StepStatus.SUCCESS | Usually succeeds even if parsing fails for some files (stores None for those files) |
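
To make the shape of this output concrete, here is a hypothetical container along the lines of the one described in the table. The class name, the roots field, and both helper methods are assumptions for illustration; the real StructuralAnalysisResult may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StructuralAnalysisResultSketch:
    """Hypothetical container: filename -> parsed root, or None if parsing failed."""

    roots: Dict[str, Optional[Any]] = field(default_factory=dict)

    def parsed_files(self) -> List[str]:
        """Names of files that produced a usable AST."""
        return sorted(name for name, root in self.roots.items() if root is not None)

    def failed_files(self) -> List[str]:
        """Names of files that were attempted but stored as None."""
        return sorted(name for name, root in self.roots.items() if root is None)


# A plain object() stands in for an SgRoot so the example runs without ast-grep-py.
example = StructuralAnalysisResultSketch(roots={"main.py": object(), "broken.py": None})
```

Keeping failed files in the mapping with a None value lets a consumer tell "not parsed" apart from "not a code file", which a missing key alone would not convey.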
How It Integrates with Grade
The GradeStep reads the STRUCTURAL_ANALYSIS step result from the pipeline and passes it as structural_analysis to GraderService.grade_from_tree(). The grader threads this object through the tree traversal into every process_test() call, which forwards it as a kwarg to test_function.execute().
Test functions like ForbiddenKeywordTest can then use this pre-computed AST to perform efficient and accurate queries using ast-grep's pattern matching syntax.
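As a rough sketch of the consuming side, a test function might query the pre-computed roots as shown below. ForbiddenPatternCheck and find_forbidden are hypothetical names, not the real ForbiddenKeywordTest, and the example assumes structural_analysis arrives as a plain filename-to-root mapping. The root(), find_all(pattern=...), and text() calls are the ast-grep-py node API.

```python
from typing import Any, Dict, List, Optional


def find_forbidden(roots: Dict[str, Optional[Any]], pattern: str) -> Dict[str, List[str]]:
    """Return filename -> source text of every node matching `pattern`."""
    hits: Dict[str, List[str]] = {}
    for name, sg_root in roots.items():
        if sg_root is None:  # parsing failed for this file; nothing to query
            continue
        matches = sg_root.root().find_all(pattern=pattern)
        if matches:
            hits[name] = [node.text() for node in matches]
    return hits


class ForbiddenPatternCheck:
    """Hypothetical test function that receives the ASTs as a keyword argument."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern

    def execute(self, *, structural_analysis: Optional[Dict[str, Any]] = None, **kwargs) -> bool:
        # Passes when no parsed file contains the forbidden pattern.
        return not find_forbidden(structural_analysis or {}, self.pattern)


if __name__ == "__main__":
    try:
        from ast_grep_py import SgRoot
    except ImportError:
        print("ast-grep-py is not installed; skipping demo")
    else:
        roots = {
            "main.py": SgRoot("result = eval(user_input)\n", "python"),
            "broken.py": None,
        }
        # $$$ARGS is ast-grep's multi-node metavariable: any argument list.
        print(find_forbidden(roots, "eval($$$ARGS)"))
```

Because the pattern is matched against syntax nodes rather than raw text, a call such as eval(...) can be distinguished from the same word appearing in a comment or string literal.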
Key Design Decisions
- Pre-computed ASTs: Parsing is done once at the pipeline level rather than inside individual tests to ensure efficiency when multiple structural tests are defined.
- Fail-safe: If parsing fails for a specific file (e.g., due to syntax errors), the step stores None for that file instead of failing the entire pipeline.
- Library Choice: ast-grep was chosen for its performance (Rust-based) and its ability to provide high-level, language-agnostic pattern matching queries.
Source Files
| File | Contents |
|---|---|
| autograder/steps/structural_analysis_step.py | The StructuralAnalysisStep pipeline step |
| autograder/models/dataclass/structural_analysis_result.py | Data container for parsed AST roots |
Next Step
After structural analysis is complete, the pipeline proceeds to Step 5: Grade to execute all tests.