This repository describes a multi-stage automated pipeline designed to:
- Discover deep-learning related APIs from major frameworks
- Filter testable APIs using documentation-driven rules
- Convert API docs into structured machine-readable specifications
- Initialize parameter templates and mutation rules
- Enable fully automated test case generation and execution (future stage)
All steps are framework-agnostic, doc-driven, and require no manual annotation.
The pipeline currently targets the following major libraries:
| Category | Libraries |
|---|---|
| Deep Learning Frameworks | TensorFlow (tf), PyTorch (torch), JAX (jax), MindSpore (mindspore), PaddlePaddle (paddle) |
Additional frameworks can be added as long as a Python import-level namespace is available.
| Stage (readme) | Name | Purpose | Output |
|---|---|---|---|
| 1 | API Auto-Discovery & Testability Filtering | Automatically locate and filter APIs using documentation rules | testable_apis.csv/.xlsx |
| 2 | LLM-Based Doc → JSON Conversion | Convert API docs into structured JSON specs | json_specs/*.json |
| 3 | Specification Parsing & Mutation Init | Build executable input templates & mutation constraints | runtime_api_objects/* |
| 4 | Test Case Generation & Execution | Execute API test suites (mutation or search-based) | coverage & anomaly results |
Follow the pipeline in order.
In Stage 1, the pipeline will automatically:
- import selected libraries
- recursively discover API symbols
- retrieve and analyze doc strings
- filter testable APIs using strict rules
- export results to CSV / Excel
Output File Example:
doc2info/testable_apis.csv
Columns:
| api_full_name | api_doc_text |
|---|---|
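
The discovery and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `discover_apis`, `is_testable`, and `export_csv` are hypothetical names, the depth limit is an assumption, and the rule inside `is_testable` is only a placeholder for the actual documentation-driven rules. The sketch also does no deduplication of APIs that are re-exported under several names.

```python
import csv
import importlib
import inspect
from types import ModuleType


def discover_apis(root_name, max_depth=2):
    """Recursively collect (full_name, docstring) pairs for public callables."""
    try:
        root = importlib.import_module(root_name)
    except ImportError:
        return []  # library not installed; nothing to discover
    found, seen = {}, set()

    def walk(module, prefix, depth):
        if id(module) in seen or depth > max_depth:
            return
        seen.add(id(module))
        # Copy the items first: lazily loaded modules may mutate their dict.
        for name, obj in list(vars(module).items()):
            if name.startswith("_"):
                continue
            full_name = f"{prefix}.{name}"
            if isinstance(obj, ModuleType):
                # Stay inside the library's own namespace.
                mod_name = obj.__name__
                if mod_name == root_name or mod_name.startswith(root_name + "."):
                    walk(obj, full_name, depth + 1)
            elif callable(obj):
                found[full_name] = inspect.getdoc(obj) or ""

    walk(root, root_name, 0)
    return sorted(found.items())


def is_testable(full_name, doc):
    """Placeholder rule (assumption): keep APIs whose docs describe parameters."""
    markers = ("Args:", "Arguments", "Parameters")
    return bool(doc) and any(marker in doc for marker in markers)


def export_csv(rows, path):
    """Write rows using the two columns listed above."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["api_full_name", "api_doc_text"])
        writer.writerows(rows)


# Example usage (uses the import name, e.g. "torch" or "tensorflow"):
# rows = [(n, d) for n, d in discover_apis("torch") if is_testable(n, d)]
# export_csv(rows, "doc2info/testable_apis.csv")
```

Bounding the recursion depth and tracking visited modules keeps the walk finite even when packages re-export each other.
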
In Stage 2, the pipeline will:
- Read testable_apis.csv/.xlsx
- Call LLM per API using the predefined prompt
- Produce structured JSON API specification files
Output location:
doc2info/json_specs/*.json
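
A minimal sketch of this conversion loop is shown below, assuming the CSV layout from Stage 1. The function name `convert_docs_to_json`, the `call_llm` callable, and the `prompt_template` placeholders `{name}` and `{doc}` are all hypothetical; the real predefined prompt and LLM client are not specified here, so they are passed in by the caller.

```python
import csv
import json
from pathlib import Path


def convert_docs_to_json(csv_path, out_dir, call_llm, prompt_template):
    """Write one JSON spec per API row and return the written paths.

    call_llm: hypothetical callable that takes a prompt string and returns
    the model's reply as a string (expected to be a JSON document).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            prompt = prompt_template.format(
                name=row["api_full_name"], doc=row["api_doc_text"]
            )
            reply = call_llm(prompt)
            try:
                spec = json.loads(reply)
            except json.JSONDecodeError:
                continue  # skip replies that are not valid JSON
            out_path = out_dir / f"{row['api_full_name']}.json"
            out_path.write_text(json.dumps(spec, indent=2), encoding="utf-8")
            written.append(out_path)
    return written
```

Validating each reply with `json.loads` before writing means that only well-formed specs reach the next stage; a real implementation would probably also retry or log the skipped APIs.
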
In Stage 3, the pipeline will:
- Read all .json specs
- Parse into executable internal structures
- Generate base input templates
- Configure mutation rules & constraints
Output location:
runtime_api_objects/*
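
As an illustration of what parsing a spec into a template plus mutation rules could look like, here is a small sketch. The JSON schema it assumes (an `api` string and a `params` list whose items have `name`, `type`, `default`, and `choices`) is hypothetical, as are the `ParamSpec` and `ApiObject` classes; the actual spec format produced in Stage 2 may differ.

```python
import json
import random
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ParamSpec:
    name: str
    type: str
    default: object = None
    choices: list = field(default_factory=list)  # allowed values (constraint)


@dataclass
class ApiObject:
    api: str
    params: list

    def base_template(self):
        """Base input: every parameter set to its documented default."""
        return {p.name: p.default for p in self.params}

    def mutate(self, inputs, rng):
        """Return a copy with one constrained parameter set to an allowed value."""
        candidates = [p for p in self.params if p.choices]
        mutated = dict(inputs)
        if candidates:
            target = rng.choice(candidates)
            mutated[target.name] = rng.choice(target.choices)
        return mutated


def load_api_object(path):
    """Parse one JSON spec file (hypothetical schema) into an ApiObject."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    params = [ParamSpec(**p) for p in raw.get("params", [])]
    return ApiObject(api=raw["api"], params=params)


# Example usage:
# obj = load_api_object("doc2info/json_specs/some_api.json")
# variant = obj.mutate(obj.base_template(), random.Random(0))
```

Passing the random generator in explicitly keeps mutations reproducible, which helps when an anomaly found later needs to be replayed.
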
readme_1 → readme_2 → readme_3 →
You must not skip readme_1 because:
- all downstream artifacts depend on the filtered set
- duplicate, dangerous, or non-testable APIs must not enter Stage 2
- The pipeline is documentation-driven, not execution-driven
- No manual API curation is required
- No external datasets, internet, or annotation is needed
- The system is expandable to any Python-based library with valid docs
- Confirm the set of initial libraries to analyze (default recommended: tf, torch, onnx, jax, cv2, numpy)
- Run Stage 1 first to generate testable_apis.csv
- Proceed with Stage 2 only after Stage 1 completes successfully
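
One way to enforce this ordering is a small guard that Stage 2 runs before doing any work. The sketch below is an assumption about how such a check might look: `stage1_completed` is a hypothetical helper, and treating "completed successfully" as "the CSV exists, has the expected columns, and contains at least one row" is a simplification.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = {"api_full_name", "api_doc_text"}


def stage1_completed(csv_path="doc2info/testable_apis.csv"):
    """Return True if the Stage 1 output exists and looks usable."""
    path = Path(csv_path)
    if not path.is_file():
        return False
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file.
        if not REQUIRED_COLUMNS.issubset(reader.fieldnames or []):
            return False
        # Require at least one data row.
        return next(reader, None) is not None


# Example usage:
# if not stage1_completed():
#     raise SystemExit("Run Stage 1 first to generate testable_apis.csv")
```
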