JSON River – Parse JSON incrementally as it streams in

Original link: https://github.com/rictic/jsonriver

## jsonriver: Incremental JSON Parsing

jsonriver is a lightweight JavaScript library for parsing JSON as it streams in, which makes it well suited to network requests and language-model output. Unlike `JSON.parse`, it does not need the entire JSON string up front; instead it yields a sequence of increasingly complete values.

As data arrives, jsonriver produces progressively built-up objects, arrays, and strings. For example, a name field starts as `""`, then becomes `"A"`, `"Al"`, and finally `"Alex"`. Atomic values such as numbers, booleans, and null are only yielded once they are complete.

While `JSON.parse` is faster for complete strings, jsonriver has a smaller footprint, no dependencies, and compatibility with all JavaScript environments. It is designed to match the behavior of `JSON.parse` and is rigorously tested against JSONTestSuite. Compared to `stream-json`, it is simpler and faster, but less featureful.

## JSON River: Incremental JSON Parsing

rictic has released "JSON River", a JavaScript library for incrementally parsing streaming JSON, available on GitHub ([https://github.com/rictic/jsonriver](https://github.com/rictic/jsonriver)). It is particularly useful for handling streaming responses from LLMs, enabling faster UI updates and a better user experience.

The library is designed to integrate easily with UI frameworks such as React or Lit; for efficiency, it mutates existing values rather than creating new ones. It is rigorously tested, aims to match the behavior of `JSON.parse`, and adheres to documented invariants. A recent update (v1.0.1) doubled performance by minimizing allocations and tightly integrating the parser and tokenizer.

Discussion highlights include comparisons with existing streaming JSON parsers (such as `stream-json-js`), potential use cases for large JSON files, and the advantages of incremental parsing for LLM-generated content. Users are exploring its application in a range of scenarios, from UI rendering to data processing and even microcontrollers.

## Original README

Parse JSON incrementally as it streams in, e.g. from a network request or a language model. Gives you a sequence of increasingly complete values.

jsonriver is small, fast, has no dependencies, and uses only standard features of JavaScript so it works in any JS environment.

Usage:

```js
// Richer example at examples/fetch.js
import {parse} from 'jsonriver';

const response = await fetch(`https://jsonplaceholder.typicode.com/posts`);
const postsStream = parse(response.body.pipeThrough(new TextDecoderStream()));
for await (const posts of postsStream) {
  console.log(posts);
}
```

What does it mean that we give you a sequence of increasingly complete values? Consider this JSON:

```json
{"name": "Alex", "keys": [1, 20, 300]}
```

If you gave this to jsonriver one byte at a time it would yield this sequence of values:

```json
{}
{"name": ""}
{"name": "A"}
{"name": "Al"}
{"name": "Ale"}
{"name": "Alex"}
{"name": "Alex", "keys": []}
{"name": "Alex", "keys": [1]}
{"name": "Alex", "keys": [1, 20]}
{"name": "Alex", "keys": [1, 20, 300]}
```
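A chunked feed like the byte-at-a-time example above can be modeled as an async iterable of string chunks. A minimal sketch (the `chars` helper is hypothetical, not part of jsonriver):

```javascript
// Sketch: split a JSON string into one-character chunks as an async
// iterable, suitable for handing to jsonriver's parse().
async function* chars(text) {
  for (const c of text) yield c; // yield one character per chunk
}

// With jsonriver installed, the partial values could then be consumed as:
//   import {parse} from 'jsonriver';
//   for await (const v of parse(chars('{"name": "Alex"}'))) console.log(v);
```

In real use the chunks would come from a network or model stream rather than a pre-split string; the sequence of yielded values is the same either way.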

The final value yielded by parse will be the same as if you had called JSON.parse on the entire string. This is tested against the JSONTestSuite, matching JSON.parse's behavior on tests of correct, incorrect, and ambiguous cases. The parser also upholds the following invariants:

  1. Subsequent versions of a value will have the same type; i.e., we will never yield a value as a string and then later replace it with an array.
  2. true, false, null, and numbers are atomic; we don't yield them until we have the entire value.
  3. Strings may be replaced with a longer string, with more characters (in the JavaScript sense) appended.
  4. Arrays are only modified by either appending new elements, or replacing/mutating the element currently at the end.
  5. Objects are only modified by either adding new properties, or replacing/mutating the most recently added property.
  6. As a consequence of 1 and 5, we only add a property to an object once we have the entire key and enough of the value to know that value's type.
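Invariants 1 and 3 can be expressed as a small predicate over two successive snapshots of the same value (a hypothetical helper for illustration, not part of jsonriver's API):

```javascript
// Check that a later snapshot `next` is a valid successor of `prev`
// under invariant 1 (stable type) and invariant 3 (strings only grow
// by appending).
function validSuccessor(prev, next) {
  const typeOf = v =>
    v === null ? 'null' : Array.isArray(v) ? 'array' : typeof v;
  if (typeOf(prev) !== typeOf(next)) return false; // invariant 1
  if (typeof next === 'string') return next.startsWith(prev); // invariant 3
  return true;
}

// The "name" sequence from the example above satisfies the predicate:
console.log(validSuccessor('Al', 'Alex')); // true
console.log(validSuccessor('Alex', ['Alex'])); // false: type changed
```

A similar check for arrays and objects would compare all but the last element or most recently added property, per invariants 4 and 5.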

The built-in JSON.parse is faster (~5x in simple benchmarking) if you don't need streaming.

stream-json is larger, more complex, and slower (~10-20x slower in simple benchmarking), but it's much more featureful, and if you only need a subset of the data it can likely be much faster.

Install dependencies with:

Run the test suite with:

Run the linter with:

And auto-fix most lint issues with:
