BAML vs POML vs YAML vs JSON for LLM Prompts: Which Format Actually Wins

Updated June 22, 2026

This comparison is a little unusual. Instead of two tools head-to-head, we have four serialization and prompt-definition formats competing for the same job: telling an LLM what shape its output should take. JSON is the incumbent. YAML is the pragmatic alternative. BAML is the opinionated newcomer with its own toolchain. And POML is mostly theoretical, a concept more than a shipping product.

The real question is not "which format is best" in the abstract. It is: which format breaks least, wastes the fewest tokens, and gives you the tightest feedback loop when you are building structured-output pipelines at scale?

What each format actually does in a prompt

JSON is the default. When you ask an LLM to return structured data, you almost certainly start by pasting a JSON schema into the system prompt. It works because every mainstream LLM has seen enormous amounts of JSON in training data. The problem: JSON schemas are verbose. Curly braces, quoted keys, commas, colons, and nested brackets all consume tokens and give the model more surface area to produce syntax errors. A moderately complex schema (say, a nested object with enums and descriptions) can eat 200+ tokens before the model generates a single output character.

YAML strips away most of that syntactic overhead. No braces, no mandatory quoting, indentation-based nesting. The same schema in YAML typically uses roughly 30-60% fewer tokens than its JSON equivalent, depending on nesting depth. LLMs handle YAML well because it also appears frequently in training corpora (think Kubernetes manifests, CI configs, Ansible playbooks). The downside: indentation sensitivity. A single misaligned space in the model's output can break your parser, and debugging whitespace issues in streamed LLM responses is not anyone's idea of a good afternoon.
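To make the overhead difference concrete, here is a minimal sketch that renders one schema both ways and counts the structural characters in each. The schema, the `to_yaml_like` helper, and `structural_count` are all invented for illustration, and character counts are only a rough proxy for tokens; real savings depend on the tokenizer your model uses.

```python
import json

# Hypothetical schema, invented for this comparison.
SCHEMA = {
    "invoice": {
        "id": "string",
        "status": "draft | sent | paid",
        "customer": {"name": "string", "email": "string"},
        "line_items": [{"sku": "string", "quantity": "integer"}],
    }
}

# Characters that exist only to satisfy JSON syntax.
STRUCTURAL = set('{}[]",')


def to_yaml_like(value, indent=0):
    """Render nested dicts/lists as indentation-based lines (a small YAML subset).

    Handles only the non-empty dict/list shapes used in this example.
    """
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(child, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml_like(child, indent + 1))
            else:
                lines.append(f"{pad}{key}: {child}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)):
                rendered = to_yaml_like(item, indent + 1)
                # The first line of a nested item carries the "- " list marker.
                lines.append(f"{pad}- {rendered[0].lstrip()}")
                lines.extend(rendered[1:])
            else:
                lines.append(f"{pad}- {item}")
    return lines


def structural_count(text):
    """Count braces, brackets, double quotes, and commas in the text."""
    return sum(ch in STRUCTURAL for ch in text)


json_text = json.dumps(SCHEMA, indent=2)
yaml_text = "\n".join(to_yaml_like(SCHEMA))

print(f"JSON: {len(json_text)} chars, {structural_count(json_text)} structural")
print(f"YAML: {len(yaml_text)} chars, {structural_count(yaml_text)} structural")
```

To measure actual token counts, run both strings through the tokenizer for your target model rather than relying on character length.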

BAML (Basically, A Made-up Language) takes a different approach entirely. Rather than treating the prompt as a blob of text with a schema pasted in, BAML defines prompts as typed functions with explicit input and output contracts. You write a .baml file declaring input types, output types, and the prompt template. BAML's compiler then generates client code in Python, TypeScript, or Ruby. The key innovation: BAML uses its own "type-definition prompting" format that compresses schema descriptions further than JSON or YAML, and it ships a resilient parser that recovers from malformed LLM output rather than crashing on the first missing bracket.

POML (Prompt Markup Language) is the outlier. It appears in some discussions as a structured-prompt concept, but there is no widely adopted runtime, no mature toolchain, and no production user base to point to. In practice, "POML" today is closer to a thought experiment about what a purpose-built prompt markup could look like. We include it for completeness, but if you are shipping code this quarter, POML is not a real option.

| Feature | BAML | JSON |
| --- | --- | --- |
| Token efficiency | ~4x fewer tokens than JSON Schema for type defs | Most verbose; every key quoted, every brace counted |
| Parse resilience | Built-in parser recovers from malformed output | Strict; one missing comma = parse failure |
| Type safety | Compile-time types, generated client code | Runtime validation only (e.g. Pydantic, Zod) |
| IDE support | VSCode playground with live prompt preview | Standard JSON tooling, no prompt-specific features |
| Language support | Python, TypeScript, Ruby via codegen | Universal |
| Learning curve | New DSL to learn; non-trivial migration | Zero; everyone already knows JSON |
| Ecosystem lock-in | All prompts must live in .baml files | None; portable across any framework |

Token economics are not trivial

The BAML blog's benchmark claims type-definition prompting uses roughly 4x fewer tokens than the equivalent JSON Schema injected into a prompt. That number holds up for complex schemas with nested objects, enums, and field descriptions. For flat, simple schemas (three string fields, no nesting), the savings are smaller, closer to 1.5-2x.

Why does this matter? Because prompt tokens cost money and consume context window. If your agentic pipeline chains four or five structured-output calls per user request, and each one injects a 300-token JSON schema, you are burning 1,200-1,500 tokens on schema alone before the model reasons about anything. YAML cuts that roughly in half. BAML cuts it further.
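The arithmetic above is easy to reproduce. This sketch plugs in the figures from the paragraph (300-token schema, four or five calls) and treats "roughly in half" and "4x" as reduction factors; `schema_overhead` is a hypothetical helper, and the factors are the article's estimates, not measurements.

```python
def schema_overhead(calls_per_request, schema_tokens, reduction_factor=1.0):
    """Prompt tokens spent on schema injection for one user request."""
    return calls_per_request * schema_tokens / reduction_factor


for calls in (4, 5):
    json_cost = schema_overhead(calls, 300)
    yaml_cost = schema_overhead(calls, 300, reduction_factor=2.0)  # "roughly in half"
    baml_cost = schema_overhead(calls, 300, reduction_factor=4.0)  # ~4x claim, complex schemas
    print(f"{calls} calls: JSON {json_cost:.0f}, YAML ~{yaml_cost:.0f}, BAML ~{baml_cost:.0f}")
```

Swap in your own schema size and call count; for flat schemas, a reduction factor nearer 1.5-2 is the more realistic assumption for BAML.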

At GPT-4o-class pricing, the per-request savings are small. At scale (millions of calls per month) or on context-limited local models, the difference compounds. If you are running local inference through Ollama or llama.cpp, every token you save on schema injection is a token the model can spend on reasoning within a fixed context window.

Parse resilience separates BAML from the rest

The real pain with JSON-formatted LLM output is not writing the schema. It is handling the moment the model returns something almost-valid. A trailing comma. An unescaped quote inside a string value. A missing closing brace because the response was truncated by max_tokens. Standard JSON.parse() throws, your pipeline crashes, and you either retry (burning more tokens and latency) or return an error to the user.
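The three failure modes are easy to reproduce. The sketch below uses Python's `json.loads` (the counterpart of `JSON.parse()`) on hand-written samples; `parse_or_report` and the sample payloads are hypothetical, and the exact error wording varies by Python version, so none is shown.

```python
import json


def parse_or_report(raw):
    """Return (data, None) on success or (None, reason) on a parse failure."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, f"{exc.msg} (char {exc.pos})"


# Hand-written stand-ins for "almost-valid" model output.
samples = {
    "valid": '{"label": "spam", "score": 0.93}',
    "trailing comma": '{"label": "spam", "score": 0.93,}',
    "unescaped quote": '{"label": "say "hi"", "score": 0.93}',
    "truncated": '{"label": "spam", "score": 0.9',
}

for name, raw in samples.items():
    data, error = parse_or_report(raw)
    status = "ok" if error is None else f"failed: {error}"
    print(f"{name}: {status}")
```

Returning the failure reason instead of raising lets the caller decide between a retry, a repair pass, or surfacing the error, which is the decision the paragraph above describes.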

YAML has the same class of problems, plus whitespace sensitivity. A model that outputs a YAML block with inconsistent indentation (common when the model "thinks" in a different structure mid-generation) produces silently wrong parses or outright failures.

BAML's parser is purpose-built to handle LLM slop. According to BoundaryML's documentation, it can recover structured data from output that is not valid JSON, YAML, or even the model's own declared format. It looks at the declared schema and extracts matching fields from whatever the model produced. This is genuinely useful in production, where you cannot control model behavior at the token level (unless you are using constrained decoding, which has its own tradeoffs).
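To give a feel for the idea of schema-guided extraction, here is a deliberately naive sketch. It is not BAML's algorithm and does not use BAML's API; `FIELDS`, `extract_fields`, and the sample output are hypothetical, and the regex only copes with flat, scalar fields.

```python
import re

# Hypothetical declared output schema: field name -> Python type.
FIELDS = {"label": str, "score": float}


def extract_fields(raw, fields):
    """Pull declared fields out of loosely formatted model output."""
    found = {}
    for name, cast in fields.items():
        # Optionally quoted field name, then ':' or '=', then a value that
        # runs until a quote, comma, closing brace, or end of line.
        pattern = (
            r"[\"']?\b" + re.escape(name) + r"[\"']?\s*[:=]\s*[\"']?([^\"',}\n]+)"
        )
        match = re.search(pattern, raw)
        if match is None:
            continue  # field absent from the output
        try:
            found[name] = cast(match.group(1).strip())
        except ValueError:
            continue  # value present but not coercible to the declared type
    return found


# Chatty preamble, unquoted key, trailing comma, and no closing brace.
messy = 'Sure! Here is the result:\n{label: "spam", "score": 0.93,\n'
print(extract_fields(messy, FIELDS))
```

The point is the direction of the lookup: the code starts from the declared schema and searches the output for matching fields, instead of demanding that the output parse first. A production-grade parser has to handle nesting, lists, and ambiguous matches, which this toy ignores.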

YAML: the pragmatic middle ground

If BAML's toolchain feels like too much commitment (new DSL, codegen step, all prompts in .baml files), YAML is the low-friction alternative that still meaningfully improves on JSON. Swap your JSON schema for a YAML equivalent in the system prompt, ask the model to respond in YAML, and parse with a standard library. You get token savings with zero new dependencies.

The catch: you lose type safety and parse resilience. You are back to runtime validation (Pydantic, Zod, or manual checks), and a malformed YAML response still crashes your parser. For simple schemas and high-quality models (GPT-4o, Claude 3.5), this is often fine. For smaller or local models that produce messier output, the lack of a resilient parser hurts.
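For reference, the "manual checks" option can be as small as the sketch below. It stands in for what Pydantic or Zod would do on the dict your YAML or JSON library returns; `SCHEMA`, `validate`, and the sample payloads are hypothetical, and only flat schemas are covered.

```python
# Hypothetical expected shape: field name -> required Python type.
SCHEMA = {"label": str, "score": float}


def validate(data, schema):
    """Return a list of problems; an empty list means the payload matches."""
    problems = []
    for field, expected in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            actual = type(data[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# Dicts as a YAML or JSON parser would return them.
good = {"label": "spam", "score": 0.93}
bad = {"label": "spam", "score": "high"}

print(validate(good, SCHEMA))
print(validate(bad, SCHEMA))
```

Note that this only runs after parsing succeeds: it catches wrong or missing fields, not the malformed-output case described above.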

For developers already working with AI coding tools or building agentic workflows, the format choice often comes down to how much infrastructure you want to adopt.

When each format makes sense

Use JSON when your schema is simple, your model is reliable, and you do not want any new dependencies. It is the universal default, and for flat response shapes (a label, a score, a short explanation), the token overhead is negligible.

Use YAML when you want quick token savings without new tooling. Drop-in replacement for JSON in most prompt templates. Best for teams that already validate output with Pydantic or similar and just want to trim prompt tokens.

Use BAML when you are building production pipelines with complex, nested output schemas, especially if you chain multiple structured-output calls. The compile-time types, resilient parser, and VSCode playground justify the learning curve. The tradeoff is real lock-in: all your prompts live in .baml files, and migrating away means rewriting them.

Skip POML until it ships a real runtime. The concept is interesting, but there is nothing to install today.

BAML

Pros

  • Lowest token cost for schema injection
  • Resilient parser recovers from malformed LLM output
  • Compile-time type safety with codegen for Python, TS, Ruby
  • VSCode playground for prompt testing before calling the model

Cons

  • New DSL with non-trivial learning curve
  • All prompts must migrate to .baml files
  • Ecosystem lock-in; harder to swap frameworks later
  • Smaller community than JSON-based tooling (Instructor, Outlines)

JSON / YAML

Pros

  • Universal; zero new dependencies
  • Every developer already knows the syntax
  • Works with any LLM, any framework, any language
  • YAML variant saves 30-60% tokens over JSON for free

Cons

  • No built-in parse resilience; malformed output crashes the pipeline
  • Type safety is runtime-only (Pydantic, Zod)
  • JSON schemas are token-heavy for complex nested types
  • No prompt-specific tooling or preview
