NDJSON Streaming vs Batched JSON for FHIR Bulk Exports

The FHIR Bulk Data IG specifies NDJSON as the output format, but implementation patterns vary. Pure NDJSON streaming produces a newline-delimited file with one resource per line. Batched JSON wraps groups of resources in JSON arrays; it is not NDJSON, but it appears in some implementations as a deviation from the spec. Each approach has consequences for consumer-side performance, memory usage, and error handling. Within this site's broader Bulk Data engineering coverage, this article addresses the format-level decision.

What Each Format Actually Looks Like

NDJSON is a sequence of FHIR resources, one per line, each line being a complete JSON object representing one resource. There are no array brackets, no commas between resources, and no top-level object wrapping the file. A consumer reads one line at a time, parses it as JSON, and processes the resource.

Batched JSON wraps groups of resources in JSON arrays. A file might contain a top-level array of 1000 resources, then another array of 1000 resources, then another. Some implementations use one big array per file. Some use chunked arrays separated by newlines (which is closer to NDJSON but with each line being an array rather than a single resource).
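To make the difference concrete, the following minimal sketch serializes the same three resources in each shape described above. The resources are hypothetical stubs (real FHIR resources carry many more fields), and the helper names to_ndjson and to_batched_lines are illustrative, not taken from any particular implementation.

```python
import json

# Hypothetical minimal resources used only for illustration.
resources = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Patient", "id": "p2"},
    {"resourceType": "Patient", "id": "p3"},
]

def to_ndjson(items):
    """One compact JSON object per line: no brackets, no separating commas."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in items)

def to_batched_lines(items, batch_size):
    """Chunked variant: each line is a JSON array of up to batch_size resources."""
    lines = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        lines.append(json.dumps(batch, separators=(",", ":")))
    return "\n".join(lines) + "\n"

ndjson_text = to_ndjson(resources)                        # NDJSON
single_array_text = json.dumps(resources)                 # one big array per file
chunked_text = to_batched_lines(resources, batch_size=2)  # one array per line

print(ndjson_text)
print(single_array_text)
print(chunked_text)
```

Only the first output is NDJSON in the sense the IG uses: every line is a single resource object. The chunked variant looks similar at a glance, but each of its lines parses to a list.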

Why NDJSON Wins For Streaming Consumption

NDJSON allows true streaming consumption. The consumer can process resources one at a time, so peak memory is bounded by the largest single resource rather than by file size. A 10 GB NDJSON file processes in roughly the same memory footprint as a 1 MB NDJSON file because the consumer holds only one resource at a time.
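A minimal sketch of that consumption pattern, assuming the export is available as a text stream. The function names iter_ndjson and count_by_type are illustrative, and the in-memory sample stands in for a real export file.

```python
import io
import json

def iter_ndjson(stream):
    """Yield one parsed resource per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        yield json.loads(line)

def count_by_type(stream):
    """Tally resources by resourceType while holding one resource at a time."""
    counts = {}
    for resource in iter_ndjson(stream):
        rtype = resource.get("resourceType", "unknown")
        counts[rtype] = counts.get(rtype, 0) + 1
    return counts

# Hypothetical sample standing in for an export file.
sample = io.StringIO(
    '{"resourceType":"Patient","id":"p1"}\n'
    '{"resourceType":"Coverage","id":"c1"}\n'
    '{"resourceType":"Patient","id":"p2"}\n'
)
counts = count_by_type(sample)
print(counts)
```

For a real file, the same function can be called on a handle opened with open(path, encoding="utf-8"). Iterating over the handle reads one line at a time, so the file is never loaded whole.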

Batched JSON requires the consumer to either load entire batches into memory (which scales poorly for large files) or implement a custom streaming JSON parser (which is harder than line-based NDJSON parsing).
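To illustrate the extra work, here is a rough sketch of incrementally reading a single top-level array with Python's standard json.JSONDecoder.raw_decode. It assumes every array element is a JSON object (as in a batch of resources). The function name iter_array_items and the chunk size are illustrative, and the approach is not tuned for performance: it retries decoding whenever an element spans a chunk boundary.

```python
import io
import json

def iter_array_items(stream, chunk_size=65536):
    """Yield the objects of one top-level JSON array without loading it whole.

    Assumes each element is a JSON object; an incomplete object always fails
    to decode, which is what lets us detect "need more input".
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False   # True once the opening "[" has been consumed
    eof = False
    while True:
        buffer = buffer.lstrip()
        if not started and buffer.startswith("["):
            buffer, started = buffer[1:], True
            continue
        if started and buffer.startswith(","):
            buffer = buffer[1:]          # separator between elements
            continue
        if started and buffer.startswith("]"):
            return                       # end of the array
        if started and buffer:
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if eof:
                    raise                # malformed, and no more input to read
            else:
                yield item
                buffer = buffer[end:]
                continue
        if eof:
            raise ValueError("unexpected end of input inside array")
        chunk = stream.read(chunk_size)
        if chunk == "":
            eof = True
        buffer += chunk

# Tiny chunk size to force elements to span chunk boundaries.
items = list(iter_array_items(io.StringIO('[{"a":1},{"a":2}]'), chunk_size=4))
print(items)
```

Compared with a line-based reader, this needs buffer management, separator handling, and a policy for telling "incomplete" apart from "malformed", which is the added difficulty the paragraph above refers to.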

For CMS-0057-F workloads where Bulk Data files routinely run hundreds of MB to several GB, NDJSON's memory efficiency matters substantially.

Why Some Implementations Use Batched JSON Anyway

Batched JSON appears in implementations that were not built for streaming consumption from the start. Existing tooling that produces standard JSON output naturally produces arrays of resources rather than line-delimited individual resources.

Some implementations use batched JSON because the developers misread the spec or treated NDJSON as an aesthetic preference rather than a structural requirement. These implementations are technically non-conformant with the Bulk Data IG, even though consumers can sometimes handle the output if they implement custom parsing.

The Error-Handling Asymmetry

NDJSON has a useful error-handling property: a malformed line affects only that line. A consumer encountering a parsing error on one line can log the error and continue processing subsequent lines. The blast radius of a malformed resource is one resource.

Batched JSON has the opposite property. A standard JSON parser rejects the whole document if any part of it is malformed, so a single bad resource in the middle of an array fails parsing for the entire array. The blast radius of a malformed resource extends to all resources in the same batch.

For production reliability at scale, the NDJSON error-handling property matters. Real FHIR data sometimes contains edge cases that produce serialization errors; NDJSON limits the impact.
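The asymmetry can be shown with a small sketch. The truncated record below is a hypothetical stand-in for a serialization failure, and parse_ndjson_tolerant is an illustrative name, not a library function.

```python
import json

def parse_ndjson_tolerant(text):
    """Return (resources, errors); errors holds (line_number, message) pairs."""
    resources, errors = [], []
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            resources.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Record the failure and keep going with the next line.
            errors.append((line_number, exc.msg))
    return resources, errors

good = '{"resourceType":"Patient","id":"p1"}'
bad = '{"resourceType":"Patient","id":'          # truncated on purpose
also_good = '{"resourceType":"Patient","id":"p3"}'

# NDJSON: the bad line is isolated.
ndjson_text = "\n".join([good, bad, also_good]) + "\n"
resources, errors = parse_ndjson_tolerant(ndjson_text)
print(len(resources), "parsed;", len(errors), "failed")

# Batched JSON: the same bad record invalidates the whole array.
batched_text = "[" + ",".join([good, bad, also_good]) + "]"
try:
    json.loads(batched_text)
    batch_parsed = True
except json.JSONDecodeError:
    batch_parsed = False
print("batch parsed:", batch_parsed)
```

In the NDJSON case the two well-formed resources are still recovered and the failure is reported with its line number. In the batched case nothing is recovered without custom parsing.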

What the IG Actually Requires

The FHIR Bulk Data Access IG (STU 2.0.0 is the version current in 2026) specifies NDJSON as the output format. Implementations that produce batched JSON are non-conformant. Inferno tests for Bulk Data conformance check for NDJSON output specifically.

Payers selecting a Bulk Data implementation should verify NDJSON conformance during evaluation. Asking the vendor for a sample output file and checking that each line parses as an independent JSON object is the simplest test.
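That check is easy to script. The sketch below is one possible version, assuming the sample is a UTF-8 text file on disk. The function name check_ndjson_sample, the max_lines cap, and the resourceType check are illustrative choices, and passing this check is not a substitute for a full conformance test suite.

```python
import json

def check_ndjson_sample(path, max_lines=1000):
    """Return a list of (line_number, problem) for the first max_lines lines."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line_number > max_lines:
                break
            if not line.strip():
                continue
            try:
                value = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"not valid JSON: {exc.msg}"))
                continue
            if not isinstance(value, dict):
                # A batched line parses, but to a list rather than an object.
                kind = type(value).__name__
                problems.append((line_number, f"expected a JSON object, got {kind}"))
            elif "resourceType" not in value:
                problems.append((line_number, "object has no resourceType"))
    return problems
```

An empty result means every checked line parsed as an independent JSON object carrying a resourceType. A file holding one pretty-printed array fails on its first line, and a file of chunked arrays is flagged because each line parses to a list.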

How the Implementation Choice Affects Vendor Selection

Most production-grade Bulk Data implementations in 2026 produce conformant NDJSON. The vendors that ship batched JSON tend to be older implementations or implementations that started outside the FHIR ecosystem and were retrofitted for Bulk Data.

A useful evaluation step is to ask the vendor specifically for an NDJSON sample: not a Postman collection or a documentation example, but an actual file from a recent export. Vendors with strong implementations produce this readily; vendors with weaker implementations sometimes need additional time to produce a real sample, which is itself a warning sign.

For the broader Bulk Data IG conformance check that goes beyond the output format, the article "Best practices for FHIR Bulk Data IG conformance in 2026" covers the full picture. For export optimization techniques that work with NDJSON output, "Top 5 Bulk Data export optimizations for Provider Access at scale" covers the engineering side.

Sources

Bulk Data Export (NDJSON output specification)