Step 4.8: Structural Analysis

Purpose

The Structural Analysis step parses student submission files into Abstract Syntax Trees (ASTs) using ast-grep. It runs after AI Batch (if applicable) and before Grade, so that subsequent test functions can perform structural pattern matching on the source code without each test re-parsing the files.

How It Works

  1. Detect Language — The step identifies the submission language (Python, Java, Node.js, C++, or C).
  2. Heuristic File Filtering — It scans the submission files and identifies those likely to contain source code, skipping binary files, images, and non-code configurations (e.g., .png, .json, .yaml, .md).
  3. Parse ASTs — For each identified code file, it uses ast-grep-py to parse the content into an SgRoot object.
  4. Store Results — The resulting filename-to-SgRoot mapping (Dict[filename, SgRoot]) is stored in StepResult.data under StepName.STRUCTURAL_ANALYSIS.

If ast-grep-py is not installed or the language is not supported, the step logs a warning and proceeds with an empty result set, allowing the pipeline to continue.
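
The flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `NON_CODE_SUFFIXES`, `looks_like_code`, `load_parser`, and `parse_submission` are hypothetical names, the suffix list is only an example of the heuristic, and files are assumed to arrive as a plain filename-to-source dict. The only library call used is the `SgRoot(src, language)` constructor from ast-grep-py.

```python
import logging
from pathlib import PurePosixPath
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical skip list; the real step's heuristics may differ.
NON_CODE_SUFFIXES = {".png", ".jpg", ".gif", ".pdf", ".json", ".yaml", ".yml", ".md"}


def looks_like_code(filename: str) -> bool:
    """Heuristic filter: keep files whose suffix is not on the skip list."""
    return PurePosixPath(filename).suffix.lower() not in NON_CODE_SUFFIXES


def load_parser() -> Optional[Callable[[str, str], Any]]:
    """Return ast-grep-py's SgRoot constructor, or None if it is not installed."""
    try:
        from ast_grep_py import SgRoot
    except ImportError:
        logger.warning("ast-grep-py is not installed; skipping structural analysis")
        return None
    return SgRoot


def parse_submission(
    files: Dict[str, str],
    language: str,
    parser: Optional[Callable[[str, str], Any]] = None,
) -> Dict[str, Optional[Any]]:
    """Parse each code file once; store None for files that fail to parse."""
    parser = parser or load_parser()
    if parser is None:
        return {}  # empty result set, the pipeline continues
    roots: Dict[str, Optional[Any]] = {}
    for name, source in files.items():
        if not looks_like_code(name):
            continue
        try:
            roots[name] = parser(source, language)
        except Exception:
            logger.warning("could not parse %s; storing None", name)
            roots[name] = None
    return roots
```

Passing the parser in as an argument is a choice made for this sketch so that the fallback paths can be exercised without ast-grep-py installed.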

Dependencies

Step      | What It Needs
Bootstrap | The raw Submission object containing files and language metadata

Input

Source   | Data
Pipeline | pipeline_exec.submission → submission files and language

Output

Field  | Type                     | Description
data   | StructuralAnalysisResult | Contains a mapping of filenames to ast-grep SgRoot objects
status | StepStatus.SUCCESS       | Usually succeeds even if parsing fails for some files (stores None for those files)
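
To make the shape of the data field concrete, here is one possible layout for such a container. It is a sketch under assumptions: the class name `StructuralAnalysisResultSketch`, the field name `roots`, and both helper methods are invented for illustration and may not match the real dataclass in structural_analysis_result.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StructuralAnalysisResultSketch:
    """Hypothetical stand-in for the real result container."""

    # filename -> parsed SgRoot, or None when the file could not be parsed
    roots: Dict[str, Optional[Any]] = field(default_factory=dict)

    def parsed_files(self) -> List[str]:
        """Names of files that have a usable AST."""
        return sorted(name for name, root in self.roots.items() if root is not None)

    def failed_files(self) -> List[str]:
        """Names of files for which None was stored."""
        return sorted(name for name, root in self.roots.items() if root is None)
```

Keeping the None entries, rather than dropping failed files, lets a test tell apart "file was not analyzed" from "file could not be parsed".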

How It Integrates with Grade

The GradeStep reads the STRUCTURAL_ANALYSIS step result from the pipeline and passes it as structural_analysis to GraderService.grade_from_tree(). The grader threads this object through the tree traversal into every process_test() call, which forwards it as a kwarg to test_function.execute().
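
The threading described above might look something like the following. This is only a sketch: `RubricTest` and `RubricCategory` are hypothetical node types, and `grade_from_tree` and `process_test` are written here as free functions rather than as methods of GraderService. The point it illustrates is that one object is handed down unchanged and reaches every `execute()` call as the `structural_analysis` kwarg.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RubricTest:
    """Hypothetical leaf node wrapping a test function."""
    name: str
    test_function: Any


@dataclass
class RubricCategory:
    """Hypothetical inner node grouping tests or sub-categories."""
    name: str
    children: List[Any] = field(default_factory=list)


def process_test(node: RubricTest, structural_analysis: Any) -> Any:
    # Forward the pre-computed ASTs to the test function as a kwarg.
    return node.test_function.execute(structural_analysis=structural_analysis)


def grade_from_tree(node: Any, structural_analysis: Any = None) -> Dict[str, Any]:
    """Walk the tree depth-first, passing the same object to every test."""
    if isinstance(node, RubricTest):
        return {node.name: process_test(node, structural_analysis)}
    results: Dict[str, Any] = {}
    for child in node.children:
        results.update(grade_from_tree(child, structural_analysis))
    return results
```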

Test functions like ForbiddenKeywordTest can then use this pre-computed AST to perform efficient and accurate queries using ast-grep's pattern matching syntax.
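
As an illustration of such a query, the helper below scans pre-computed roots for a pattern and reports where it matched. It is not the actual ForbiddenKeywordTest: `find_forbidden_calls` is a hypothetical name, and the roots are assumed to be available as a plain filename-to-SgRoot dict. The ast-grep-py calls used are `SgRoot.root()`, `SgNode.find_all(pattern=...)`, `SgNode.range()`, and `SgNode.text()`.

```python
from typing import Any, Dict, List, Optional, Tuple


def find_forbidden_calls(
    roots: Dict[str, Optional[Any]], pattern: str
) -> List[Tuple[str, int, str]]:
    """Return (filename, 1-based line, matched text) for every pattern match."""
    hits: List[Tuple[str, int, str]] = []
    for filename, sg_root in roots.items():
        if sg_root is None:  # file failed to parse, nothing to query
            continue
        for node in sg_root.root().find_all(pattern=pattern):
            line = node.range().start.line + 1  # ast-grep positions are 0-based
            hits.append((filename, line, node.text()))
    return hits
```

With ast-grep-py installed, a call such as `find_forbidden_calls(roots, "eval($$$ARGS)")` would look for `eval` calls with any arguments in every parsed file, without reading or parsing the source again.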

Key Design Decisions

  • Pre-computed ASTs — Parsing is done once at the pipeline level rather than inside individual tests to ensure efficiency when multiple structural tests are defined.
  • Fail-safe — If parsing fails for a specific file (e.g., due to syntax errors), the step stores None for that file instead of failing the entire pipeline.
  • Library Choice — ast-grep was chosen for its performance (Rust-based) and its ability to provide high-level, language-agnostic pattern matching queries.

Source Files

File                                                      | Contents
autograder/steps/structural_analysis_step.py              | The StructuralAnalysisStep pipeline step
autograder/models/dataclass/structural_analysis_result.py | Data container for parsed AST roots

Next Step

After structural analysis is complete, the pipeline proceeds to Step 5: Grade to execute all tests.