展示 HN:Xmloxide – 一个用 Rust 编写的 libxml2 替代品。
Show HN: Xmloxide – an agent made rust replacement for libxml2

原始链接: https://github.com/jonwiggins/xmloxide

## xmloxide:Rust语言实现的现代、安全的XML/HTML解析器 xmloxide是广泛使用的libxml2的纯Rust语言重实现,旨在成为一个可以直接替换的方案,并提供改进的内存安全性和性能。libxml2已停止官方维护,并且存在已知的安全漏洞。 **主要特性:** * **内存安全:** 采用基于arena的树形结构,拥有完全安全的公共API,消除了`unsafe`代码。 * **高度符合标准:** 通过了100%的W3C XML一致性测试套件和libxml2的兼容性套件。 * **多种API:** 提供DOM、SAX2流式、XmlReader(拉模式)以及推/增量解析器。包含容错的HTML 4.01解析器。 * **高级功能:** 支持XPath 1.0、DTD/RelaxNG/XSD验证、规范XML、XInclude和XML目录。 * **性能:** 与libxml2竞争,在序列化和XPath评估方面通常更快,这得益于arena分配和零拷贝等优化技术。 * **C FFI:** 提供完整的C API,方便集成到现有的C/C++项目中。 * **线程安全:** 设计上没有全局状态,使得每个`Document`都是自包含且线程安全的。 * **全面的测试:** 包含广泛的单元测试、FFI测试和模糊测试,以确保其健壮性和安全性。 xmloxide是libxml2的一个强大且高性能的替代方案,提供了一个现代、安全且功能丰富的XML/HTML解析解决方案。

## Xmloxide:一个AI辅助的Rust对libxml2的替代方案 一个名为Xmloxide的新项目展示了AI编码代理解决未维护软件的潜力。开发者jawiggins使用Claude Code重构了libxml2,这是一个广泛使用但现已停止维护的C语言XML处理库,并将其用内存安全的Rust语言实现。 原始的libxml2存在已知的安全漏洞,且不再支持。Xmloxide成功通过了兼容性和W3C符合性测试套件,提供了相似或更优的性能,并包含一个C API以便于集成。 该项目突出了AI代理如何通过现有测试套件快速重现并可能*替代*遗留代码。Jawiggins认为这种方法可以为解决日益增长的维护老化软件系统和长期解决安全问题提供一种方案。该项目可在[crates.io](https://crates.io/crates/xmloxide)和[GitHub](https://github.com/jonwiggins/xmloxide)上找到。
相关文章

原文

CI crates.io docs.rs License: MIT MSRV

A pure Rust reimplementation of libxml2 — the de facto standard XML/HTML parsing library in the open-source world.

libxml2 became officially unmaintained in December 2025 with known security issues. xmloxide aims to be a memory-safe, high-performance replacement that passes the same conformance test suites.

  • Memory-safe — arena-based tree with zero unsafe in the public API
  • Conformant — 100% pass rate on the W3C XML Conformance Test Suite (1727/1727 applicable tests)
  • Error recovery — parse malformed XML and still produce a usable tree, just like libxml2
  • Multiple parsing APIs — DOM tree, SAX2 streaming, XmlReader pull, push/incremental
  • HTML parser — error-tolerant HTML 4.01 parsing with auto-closing and void elements
  • XPath 1.0 — full expression parser and evaluator with all core functions
  • Validation — DTD, RelaxNG, and XML Schema (XSD) validation
  • Canonical XML — C14N 1.0 and Exclusive C14N serialization
  • XInclude — document inclusion processing
  • XML Catalogs — OASIS XML Catalogs for URI resolution
  • xmllint CLI — command-line tool for parsing, validating, and querying XML
  • Zero-copy where possible — string interning for fast comparisons
  • No global state — each Document is self-contained and Send + Sync
  • C/C++ FFI — full C API with header file (include/xmloxide.h) for embedding in C/C++ projects
  • Minimal dependencies — only encoding_rs (library has zero other deps; clap is CLI-only)
use xmloxide::Document;

let doc = Document::parse_str("<root><child>Hello</child></root>").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("root"));
assert_eq!(doc.text_content(root), "Hello");
use xmloxide::Document;
use xmloxide::serial::serialize;

let doc = Document::parse_str("<root><child>Hello</child></root>").unwrap();
let xml = serialize(&doc);
assert_eq!(xml, "<root><child>Hello</child></root>");
use xmloxide::Document;
use xmloxide::xpath::{evaluate, XPathValue};

let doc = Document::parse_str("<library><book><title>Rust</title></book></library>").unwrap();
let root = doc.root_element().unwrap();
let result = evaluate(&doc, root, "count(book)").unwrap();
assert_eq!(result.to_number(), 1.0);
use xmloxide::sax::{parse_sax, SaxHandler, DefaultHandler};
use xmloxide::parser::ParseOptions;

struct MyHandler;
impl SaxHandler for MyHandler {
    fn start_element(&mut self, name: &str, _: Option<&str>, _: Option<&str>,
                     _: &[(String, String, Option<String>, Option<String>)]) {
        println!("Element: {name}");
    }
}

parse_sax("<root><child/></root>", &ParseOptions::default(), &mut MyHandler).unwrap();
use xmloxide::html::parse_html;

let doc = parse_html("<p>Hello <br> World").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("html"));
use xmloxide::parser::{parse_str_with_options, ParseOptions};

let opts = ParseOptions::default().recover(true);
let doc = parse_str_with_options("<root><unclosed>", &opts).unwrap();
for diag in &doc.diagnostics {
    eprintln!("{}", diag);
}
# Parse and pretty-print
xmllint --format document.xml

# Validate against a schema
xmllint --schema schema.xsd document.xml
xmllint --relaxng schema.rng document.xml
xmllint --dtdvalid schema.dtd document.xml

# XPath query
xmllint --xpath "//title" document.xml

# Canonical XML
xmllint --c14n document.xml

# Parse HTML
xmllint --html page.html
Module Description
tree Arena-based DOM tree (Document, NodeId, NodeKind)
parser XML 1.0 recursive descent parser with error recovery
parser::push Push/incremental parser for chunked input
html Error-tolerant HTML 4.01 parser
sax SAX2 streaming event-driven parser
reader XmlReader pull-based parsing API
serial XML serializer and Canonical XML (C14N)
xpath XPath 1.0 expression parser and evaluator
validation::dtd DTD parsing and validation
validation::relaxng RelaxNG schema validation
validation::xsd XML Schema (XSD) validation
xinclude XInclude 1.0 document inclusion
catalog OASIS XML Catalogs for URI resolution
encoding Character encoding detection and transcoding
ffi C/C++ FFI bindings (include/xmloxide.h)

Parsing throughput is competitive with libxml2 — within 3-4% on most documents, and 12% faster on SVG. Serialization is 1.5-2.4x faster thanks to the arena-based tree design. XPath is 1.1-2.7x faster across all benchmarks.

Parsing:

Document Size xmloxide libxml2 Result
Atom feed 4.9 KB 26.7 µs (176 MiB/s) 25.5 µs (184 MiB/s) ~4% slower
SVG drawing 6.3 KB 58.5 µs (103 MiB/s) 65.6 µs (92 MiB/s) 12% faster
Maven POM 11.5 KB 76.9 µs (142 MiB/s) 74.2 µs (148 MiB/s) ~4% slower
XHTML page 10.2 KB 69.5 µs (139 MiB/s) 61.5 µs (157 MiB/s) ~13% slower
Large (374 KB) 374 KB 2.15 ms (169 MiB/s) 2.08 ms (175 MiB/s) ~3% slower

Serialization:

Document Size xmloxide libxml2 Result
Atom feed 4.9 KB 11.3 µs 17.5 µs 1.5x faster
Maven POM 11.5 KB 20.1 µs 47.5 µs 2.4x faster
Large (374 KB) 374 KB 614 µs 1397 µs 2.3x faster

XPath:

Expression xmloxide libxml2 Result
Simple path (//entry/title) 1.51 µs 1.63 µs 8% faster
Attribute predicate (//book[@id]) 5.91 µs 15.99 µs 2.7x faster
count() function 1.09 µs 1.67 µs 1.5x faster
string() function 1.32 µs 1.77 µs 1.3x faster

Key optimizations: arena-based tree for fast serialization, byte-level pre-checks for character validation, bulk text scanning, ASCII fast paths for name parsing, zero-copy element name splitting, inline entity resolution, XPath // step fusion with fused axis expansion, inlined tree accessors, and name-test fast paths for child/descendant axes.

# Run benchmarks (requires libxml2 system library)
cargo bench --features bench-libxml2 --bench comparison_bench
  • 785 unit tests across all modules
  • 112 FFI tests covering the full C API surface (including SAX streaming)
  • libxml2 compatibility suite — 119/119 tests passing (100%) covering XML parsing, namespaces, error detection, and HTML parsing
  • W3C XML Conformance Test Suite — 1727/1727 applicable tests passing (100%)
  • Integration tests covering real-world XML documents, edge cases, and error recovery
cargo test --all-features

xmloxide provides a C-compatible API for embedding in C/C++ projects (like Chromium, game engines, or any codebase that currently uses libxml2).

# Build shared + static libraries (uses the included Makefile)
make

# Or build individually:
make shared   # .so / .dylib / .dll
make static   # .a / .lib

# Build and run the C example
make example
#include "xmloxide.h"

xmloxide_document *doc = xmloxide_parse_str("<root>Hello</root>");
uint32_t root = xmloxide_doc_root_element(doc);
char *name = xmloxide_node_name(doc, root);   // "root"
char *text = xmloxide_node_text_content(doc, root); // "Hello"

xmloxide_free_string(name);
xmloxide_free_string(text);
xmloxide_free_doc(doc);

The full API — including tree navigation and mutation, XPath evaluation, serialization (plain and pretty-printed), HTML parsing, DTD/RelaxNG/XSD validation, C14N, and XML Catalogs — is declared in include/xmloxide.h.

libxml2 xmloxide (Rust) xmloxide (C FFI)
xmlReadMemory Document::parse_str xmloxide_parse_str
xmlReadFile Document::parse_file xmloxide_parse_file
xmlParseDoc Document::parse_bytes xmloxide_parse_bytes
htmlReadMemory html::parse_html xmloxide_parse_html
xmlFreeDoc (drop Document) xmloxide_free_doc
xmlDocGetRootElement doc.root_element() xmloxide_doc_root_element
xmlNodeGetContent doc.text_content(id) xmloxide_node_text_content
xmlNodeSetContent doc.set_text_content(id, s) xmloxide_set_text_content
xmlGetProp doc.attribute(id, name) xmloxide_node_attribute
xmlSetProp doc.set_attribute(...) xmloxide_set_attribute
xmlNewNode doc.create_node(...) xmloxide_create_element
xmlNewText doc.create_node(Text{..}) xmloxide_create_text
xmlAddChild doc.append_child(p, c) xmloxide_append_child
xmlAddPrevSibling doc.insert_before(ref, c) xmloxide_insert_before
xmlUnlinkNode doc.remove_node(id) xmloxide_remove_node
xmlCopyNode doc.clone_node(id, deep) xmloxide_clone_node
xmlGetID doc.element_by_id(s) xmloxide_element_by_id
xmlDocDumpMemory serial::serialize(&doc) xmloxide_serialize
xmlDocDumpFormatMemory serial::serialize_with_options xmloxide_serialize_pretty
htmlDocDumpMemory serial::html::serialize_html xmloxide_serialize_html
xmlC14NDocDumpMemory serial::c14n::canonicalize xmloxide_canonicalize
xmlXPathEvalExpression xpath::evaluate xmloxide_xpath_eval
xmlValidateDtd validation::dtd::validate xmloxide_validate_dtd
xmlRelaxNGValidateDoc validation::relaxng::validate xmloxide_validate_relaxng
xmlSchemaValidateDoc validation::xsd::validate_xsd xmloxide_validate_xsd
xmlXIncludeProcess xinclude::process_xincludes xmloxide_process_xincludes
xmlLoadCatalog Catalog::parse xmloxide_parse_catalog
xmlSAX2... callbacks sax::SaxHandler trait xmloxide_sax_parse
xmlTextReaderRead reader::XmlReader xmloxide_reader_read
xmlCreatePushParserCtxt parser::PushParser xmloxide_push_parser_new
xmlParseChunk PushParser::push xmloxide_push_parser_push

Thread safety: Unlike libxml2, xmloxide has no global state. Each Document is self-contained and Send + Sync. The FFI layer uses thread-local storage for the last error message — each thread has its own error state. No initialization or cleanup functions are needed.

xmloxide includes fuzz targets for security testing:

# Install cargo-fuzz (requires nightly)
cargo install cargo-fuzz

# Run a fuzz target
cargo +nightly fuzz run fuzz_xml_parse
cargo +nightly fuzz run fuzz_html_parse
cargo +nightly fuzz run fuzz_xpath
cargo +nightly fuzz run fuzz_roundtrip
cargo build
cargo test
cargo clippy --all-targets --all-features -- -D warnings
cargo bench

Minimum supported Rust version: 1.81

  • No XML 1.1 — xmloxide implements XML 1.0 (Fifth Edition) only. XML 1.1 is rarely used and not planned.
  • No XSLT — XSLT is a separate specification (libxslt) and is out of scope.
  • No Schematron — Schematron validation is not implemented. DTD, RelaxNG, and XSD are supported.
  • HTML 4.01 only — the HTML parser targets HTML 4.01, not the HTML5 parsing algorithm.
  • Push parser buffers internally — the push/incremental parser API (PushParser) currently buffers all pushed data and performs the full parse on finish(), rather than truly streaming like libxml2's xmlParseChunk. SAX streaming (parse_sax) is available as an alternative for memory-constrained large-document processing.
  • XPath namespace:: axis — the namespace:: axis returns the element node when in-scope namespaces match (rather than materializing separate namespace nodes), following the same pattern as the attribute axis.

See CONTRIBUTING.md for development setup and guidelines.

See CHANGELOG.md for version history.

MIT

联系我们 contact @ memedata.com