Step 4.8: Structural Analysis

Purpose

The Structural Analysis step parses student submission files into Abstract Syntax Trees (ASTs) using ast-grep. It runs after AI Batch (if applicable) and before Grade, so that later test functions can match structural patterns against the source code without re-parsing each file.

How It Works

1. Detect Language: The step identifies the submission language (Python, Java, Node.js, C++, or C).
2. Heuristic File Filtering: It scans the submission files and keeps those likely to contain source code, skipping binaries, images, and non-code files such as configuration and documentation (e.g., .png, .json, .yaml, .md).
3. Parse ASTs: For each remaining code file, it uses ast-grep-py to parse the content into an SgRoot object.
4. Store Results: The resulting mapping from filename to SgRoot is stored in StepResult.data under StepName.STRUCTURAL_ANALYSIS.

If ast-grep-py is not installed or the language is not supported, the step logs a warning and proceeds with an empty result set, allowing the pipeline to continue.
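
The flow above can be sketched roughly as follows. This is a minimal illustration, not the actual `StructuralAnalysisStep` code: the names `NON_CODE_SUFFIXES`, `is_probably_code`, `load_parser`, and `parse_submission` are hypothetical, the skip list only contains the extensions mentioned above, and the submission is assumed to be a plain filename-to-source dictionary. The only ast-grep-py call relied on is the `SgRoot(source, language)` constructor.

```python
import logging
from pathlib import PurePosixPath
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical skip list; the real step's heuristics may be broader.
NON_CODE_SUFFIXES = {".png", ".json", ".yaml", ".md"}


def is_probably_code(filename: str) -> bool:
    """Heuristic filter: keep files whose extension is not on the skip list."""
    return PurePosixPath(filename).suffix.lower() not in NON_CODE_SUFFIXES


def load_parser() -> Optional[Callable[[str, str], Any]]:
    """Return the SgRoot constructor, or None if ast-grep-py is missing."""
    try:
        from ast_grep_py import SgRoot
    except ImportError:
        logger.warning("ast-grep-py is not installed; skipping structural analysis")
        return None
    return SgRoot  # called as SgRoot(source, language)


def parse_submission(
    files: Dict[str, str],
    language: str,
    parser: Optional[Callable[[str, str], Any]] = None,
) -> Dict[str, Optional[Any]]:
    """Parse every likely code file once and map filename -> root (or None)."""
    if parser is None:
        parser = load_parser()
    if parser is None:
        return {}  # empty result set, the pipeline continues

    roots: Dict[str, Optional[Any]] = {}
    for name, source in files.items():
        if not is_probably_code(name):
            continue
        try:
            roots[name] = parser(source, language)
        except Exception:
            # Fail-safe: record None for this file instead of aborting.
            logger.warning("Could not parse %s", name)
            roots[name] = None
    return roots
```

Passing the parser as an argument is only a convenience of this sketch: it keeps the filtering and fail-safe logic testable on machines where ast-grep-py is not installed.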

Dependencies

| Step | What It Needs |
| --- | --- |
| Bootstrap | The raw `Submission` object containing files and language metadata |

Input

| Source | Data |
| --- | --- |
| Pipeline | `pipeline_exec.submission` → submission files and language |

Output

| Field | Type | Description |
| --- | --- | --- |
| `data` | `StructuralAnalysisResult` | Contains a mapping of filenames to `ast-grep` `SgRoot` objects |
| `status` | `StepStatus` | Normally `StepStatus.SUCCESS`, even if parsing fails for some files (`None` is stored for those files) |
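
As an illustration of the shape of this output, the container could look something like the dataclass below. This is a guess at the structure, not the contents of `structural_analysis_result.py`: the class name `StructuralAnalysisResultSketch`, the `roots` field, and both helper methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StructuralAnalysisResultSketch:
    """Hypothetical stand-in for StructuralAnalysisResult."""

    # filename -> parsed SgRoot, or None when parsing failed for that file
    roots: Dict[str, Optional[Any]] = field(default_factory=dict)

    def parsed_files(self) -> List[str]:
        """Filenames that have a usable AST root."""
        return sorted(name for name, root in self.roots.items() if root is not None)

    def failed_files(self) -> List[str]:
        """Filenames for which None was stored."""
        return sorted(name for name, root in self.roots.items() if root is None)
```

Keeping failed files in the mapping with a `None` value, rather than dropping them, lets a downstream test tell "file was not code" apart from "file could not be parsed".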

How It Integrates with Grade

The GradeStep reads the STRUCTURAL_ANALYSIS step result from the pipeline and passes it as structural_analysis to GraderService.grade_from_tree(). The grader threads this object through the tree traversal into every process_test() call, which forwards it as a kwarg to test_function.execute().

Test functions like ForbiddenKeywordTest can then use this pre-computed AST to perform efficient and accurate queries using ast-grep's pattern matching syntax.
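
A test function might consume the pre-computed roots along these lines. The helper `find_forbidden_patterns` is hypothetical and is not the real `ForbiddenKeywordTest`; it assumes the filename-to-root mapping has already been extracted from the `structural_analysis` kwarg. The only ast-grep-py calls used are `SgRoot.root()` and `SgNode.find(pattern=...)`.

```python
from typing import Any, Dict, List, Optional


def find_forbidden_patterns(
    roots: Dict[str, Optional[Any]],
    patterns: List[str],
) -> Dict[str, List[str]]:
    """Return, per file, the forbidden ast-grep patterns that matched."""
    hits: Dict[str, List[str]] = {}
    for filename, sg_root in roots.items():
        if sg_root is None:
            continue  # parsing failed for this file; nothing to query
        node = sg_root.root()  # top-level SgNode of the parsed file
        matched = [p for p in patterns if node.find(pattern=p) is not None]
        if matched:
            hits[filename] = matched
    return hits
```

With Python submissions, a call such as `find_forbidden_patterns(roots, ["eval($X)", "exec($X)"])` would flag files that call `eval` or `exec` with a single argument, without reading or parsing any file again. Because the match is structural, the same text inside a comment or string literal is not reported.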

Key Design Decisions

- Pre-computed ASTs: Parsing is done once at the pipeline level rather than inside individual tests, so each file is parsed only once no matter how many structural tests are defined.
- Fail-safe: If parsing fails for a specific file (e.g., due to syntax errors), the step stores None for that file instead of failing the entire pipeline.
- Library Choice: ast-grep was chosen for its performance (Rust-based) and its ability to provide high-level, language-agnostic pattern matching queries.

Source Files

| File | Contents |
| --- | --- |
| `autograder/steps/structural_analysis_step.py` | The `StructuralAnalysisStep` pipeline step |
| `autograder/models/dataclass/structural_analysis_result.py` | Data container for parsed AST roots |

Next Step

After structural analysis is complete, the pipeline proceeds to Step 5: Grade to execute all tests.