It’s been a few weeks since I shared that I was embarking on an adventure to build my own LLM-powered OAS analysis tool. And in those few weeks, I’ve learned a lot about the work that goes into building a CLI. Some days the code flowed effortlessly and some days I sat there staring at a blinking cursor in VS Code.
But this is still a fun project. I’m learning a lot about LLMs and thinking more intentionally about the developer experience with documenting APIs.
What I’ve been doing
I identified two areas for improvement: the CLI’s output and the ability to focus the analysis on specific parts of the OAS. Over the past few weeks, I’ve been working on addressing both in 0.4.0.
Changing the CLI output
Between 0.1.0 and 0.3.0, I focused on improving the CLI output. I noticed that the LLM’s response would sometimes vary in structure: certain sections were included and others weren’t, and some had different headings.
To fix the issue, I used OpenAI’s Structured Outputs feature, which lets you define a schema that the model must follow when generating its response. The response is parsed into a Pydantic model, and the CLI serializes that model into JSON.
I defined a model for the analysis and created a new schemas directory to hold the schema:
from pydantic import BaseModel

# Issue and Recommendation are Pydantic models defined elsewhere in the schemas package.

class FullAnalysisSchema(BaseModel):
    """Schema for the full analysis response from LLM."""

    issues: list[Issue]
    description_coverage: int
    description_clarity: int
    naming_consistency: int
    example_adequacy: int
    overall_quality: int
    recommendations: list[Recommendation]
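For reference, here’s a minimal sketch of how a schema like this can be handed to the OpenAI Python SDK’s structured-output parse helper. The prompt, the oas_text variable, and the client setup are placeholders I’m assuming for illustration, not SmartDoc’s actual code:

from openai import OpenAI

# Assumes FullAnalysisSchema is imported from the schemas package;
# the prompt and oas_text below are illustrative placeholders.
client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You review OpenAPI documents for documentation quality."},
        {"role": "user", "content": oas_text},
    ],
    response_format=FullAnalysisSchema,
)

analysis = completion.choices[0].message.parsed  # a FullAnalysisSchema instance

The parsed attribute hands back a populated FullAnalysisSchema, which is what the CLI works with from there.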
After that, when a user ran smartdoc check [FILE], this would be the response:
---------------------------
OAS Analysis Results (Full)
---------------------------
Overall Quality: Fair
Key Findings: # Shortened for brevity
- Operation description is empty for creating an order.: Add a concise human-friendly description summarizing what the endpoint does, the expected inputs and effects (e.g., that it creates an order and returns the created order with its ID). Replace the empty string with a meaningful sentence.
- Response description is empty for successful GET order response.: Provide a short description of the 200 response (e.g., 'Order status returned successfully') instead of the empty string so tools and readers clearly see the response intent.
-------------------------
Scores:
-------------------------
Descriptions : 60/100
Description Clarity : 65/100
Naming Consistency : 80/100
Example Adequacy : 70/100
Overall Score : 72/100
Recommendations:
- Populate empty description fields (operations, responses, parameter and property descriptions) with concise, user-focused text.
- Add explicit schemas for error responses and mark required fields on request and response object properties.
- Define allowed status values (enum) and add operationId for each operation to improve code generation and clarity.
The output here is human-readable, presenting an overall quality assessment, key findings, scores, and recommendations.
However, I realized that the output was verbose and difficult to scan: you have to wade through a lot of text to get to the key findings.
So I decided to change the CLI’s output from formatted prose to JSON, which is easier to scan and more useful for automation and CI workflows. This way, the tool aligns more closely with how developers expect CLI tools to behave.
I had the hardest time figuring out how to make this work. In my brain, I figured that I needed to send some kind of JSON schema to the LLM and say, “use this JSON schema in your response.” Either that, or I had to find a way to get the CLI to display the response as JSON.
The solution was simpler than I expected. Pydantic models have a method called model_dump_json() that generates a JSON representation of the model. So I used it to serialize the response (which was returned as FullAnalysisSchema) to JSON and print it.
from pydantic import BaseModel

def serialize_to_json(data: BaseModel) -> None:
    json_output = data.model_dump_json(indent=2)
    print(json_output)
So now, if a user runs smartdoc check [FILE], they’ll receive the output in JSON.
{
"metadata": {
"smartdoc_version": "0.4.0",
"openai_model": "gpt-5-mini",
"analysis_date": "2026-01-15T22:09:00.380702"
},
"issues": [
{...},
{...}
],
"description_coverage": 60,
# other metric scores omitted for brevity
"recommendations": [
{...},
{...}
]
}
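Because the output is plain JSON, it’s straightforward to wire into automation. As a purely hypothetical example (the file name and threshold are made up), a CI step could fail the build when the overall score drops too low:

import json
import subprocess
import sys

# Run SmartDoc, parse its JSON report, and gate the build on the overall score.
result = subprocess.run(
    ["smartdoc", "check", "openapi.yaml"],
    capture_output=True,
    text=True,
)
report = json.loads(result.stdout)

if report["overall_quality"] < 70:  # threshold chosen for illustration
    sys.exit(f"OAS quality score too low: {report['overall_quality']}")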
LLMs are non-deterministic by default, so anchoring the CLI around structured data rather than formatted prose keeps the shape of the output predictable and developer-friendly.
Narrowing focus scope
The last thing I worked on between 0.1.0 and 0.4.0 was adding the ability for users to specify which part of the OAS the LLM should focus on. If a user runs smartdoc check [FILE] --focus descriptions, the DescriptionAnalysisSchema is passed to the LLM, narrowing the scope of the analysis:
class DescriptionAnalysisSchema(BaseModel):
    """Schema for the description-focused analysis response from LLM."""

    metadata: MetadataSchema
    issues: list[Issue]
    recommendations: list[Recommendation]
    description_coverage: int
    description_clarity: int
    overall_quality: int
NamingAnalysisSchema and ExampleAnalysisSchema handle the other focus options: naming and examples.
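I haven’t shown the wiring here, but conceptually the CLI just maps the --focus value to a schema class. A small sketch of that idea (everything except the schema class names is an assumption):

# Map each --focus value to the schema the LLM should fill in.
# Assumes the schema classes are imported from the schemas package.
FOCUS_SCHEMAS = {
    "descriptions": DescriptionAnalysisSchema,
    "naming": NamingAnalysisSchema,
    "examples": ExampleAnalysisSchema,
}

def schema_for_focus(focus: str | None):
    """Return the schema for the given --focus value, falling back to the full analysis."""
    if focus is None:
        return FullAnalysisSchema
    return FOCUS_SCHEMAS[focus]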
What’s next
One area I’m still thinking through is how SmartDoc handles scores. I like the idea of the analysis including some kind of grade or score, and I do think there’s some value in including it.
I do want to be a little more intentional about what that score or grade means. Right now, the LLM assigns scores arbitrarily. I’d rather constrain score assignment.
For example, if 5/10 paths have descriptions, assign X score; if 2/10 paths have descriptions, assign Y score. Basically, ground the scores in deterministic rules rather than leaving them entirely to the model’s judgment.
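Concretely, a score like description coverage could be computed directly from the spec rather than asked of the model. Here’s a rough sketch of that idea, counting operation descriptions (this isn’t in SmartDoc yet):

def description_coverage_score(spec: dict) -> int:
    """Return the percentage of operations whose description is non-empty."""
    operations = []
    for path_item in spec.get("paths", {}).values():
        for method in ("get", "post", "put", "patch", "delete"):
            if method in path_item:
                operations.append(path_item[method])
    if not operations:
        return 0
    described = sum(1 for op in operations if op.get("description", "").strip())
    return round(100 * described / len(operations))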
For now, I’m leaving the scores, and I’ll think more on how to proceed. The tool itself is coming together!