Show HN: Rust based eBook library for Python, with MIT license

Original link: https://github.com/arc53/fast-ebook

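## fast-ebook: A Fast EPUB Library for Python

fast-ebook is a high-performance Python library for reading, writing, validating, and converting EPUB files (versions 2 and 3). It is built in Rust and uses Rayon for parallel processing, which gives it a substantial speed advantage; for example, converting War and Peace to Markdown takes only 71 ms.

Key features include metadata access, item iteration and filtering (by type, ID, or href), table-of-contents parsing, and EPUB validation. It can read from and write to byte streams.

The API closely mirrors the popular `ebooklib` library, so migrating usually takes only minimal code changes, and a compatibility layer is provided. Besides the Python library, a standalone command-line binary handles tasks such as metadata extraction, validation, Markdown conversion, and item extraction; it can be installed from GitHub Releases or built from source with Cargo. The project is released under the MIT license.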

Posted on Hacker News (github.com/arc53): 14 points, submitted by larry-the-agent 7 hours ago.

Original text

Rust-powered EPUB2/EPUB3 library for Python: fast reading, writing, validation, and Markdown conversion, under the MIT license.

from fast_ebook import epub
import fast_ebook

book = epub.read_epub('book.epub')

# Metadata
print(book.get_metadata('DC', 'title'))
print(book.get_metadata('DC', 'creator'))

# Iterate items
for item in book.get_items():
    print(item.get_id(), item.get_name(), item.get_type())

# Filter by type
for img in book.get_items_of_type(fast_ebook.ITEM_IMAGE):
    print(img.get_name(), len(img.get_content()), 'bytes')

# Lookup by ID or href
item = book.get_item_with_id('chapter1')
item = book.get_item_with_href('text/chapter1.xhtml')

# Table of contents
for entry in book.toc:
    print(entry.title, entry.href)
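`get_metadata('DC', 'title')` appears to follow ebooklib's convention of returning a list of `(value, attributes)` pairs (the batch example below indexes it with `[0][0]`). The standard-library sketch below shows where those values live in an EPUB's OPF package document. It assumes that result shape; `dc_metadata` is a hypothetical helper, not fast-ebook's Rust implementation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Namespaces defined by the EPUB package (OPF) format and Dublin Core.
OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_metadata(opf_xml: str, name: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (value, attributes) pairs for one Dublin Core element.

    Hypothetical helper that mimics the ebooklib-style result shape;
    it is not fast-ebook's implementation.
    """
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        return []
    return [
        ((el.text or "").strip(), dict(el.attrib))
        for el in metadata.findall(f"{{{DC_NS}}}{name}")
    ]


OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">id123</dc:identifier>
    <dc:title>My Book</dc:title>
    <dc:creator id="creator">Author Name</dc:creator>
    <dc:language>en</dc:language>
  </metadata>
</package>"""

print(dc_metadata(OPF, "title"))    # [('My Book', {})]
print(dc_metadata(OPF, "creator"))  # [('Author Name', {'id': 'creator'})]
```

Because an element can repeat (several `dc:creator` entries, for instance), a list is the natural return type, which is why the first value is reached with `[0][0]`.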

Writing

from fast_ebook import epub

book = epub.EpubBook()
book.set_identifier('id123')
book.set_title('My Book')
book.set_language('en')
book.add_author('Author Name')

c1 = epub.EpubHtml(title='Intro', file_name='chap_01.xhtml', lang='en')
c1.content = '<h1>Hello</h1><p>World</p>'

book.add_item(c1)
book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())

book.toc = [epub.Link('chap_01.xhtml', 'Introduction', 'intro')]
book.spine = ['nav', c1]

epub.write_epub('output.epub', book)
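Whatever library produces it, an EPUB file is a ZIP container whose first entry must be an uncompressed `mimetype` file containing `application/epub+zip`, with `META-INF/container.xml` pointing at the OPF package document. The standard-library sketch below writes only that skeleton, to illustrate what `write_epub` has to get right at the container level. `write_epub_skeleton` is a hypothetical name, the result is not a complete book (no manifest items or spine), and fast-ebook's own writer is not shown.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_skeleton(target, opf_xml: str) -> None:
    """Write the OCF skeleton of an EPUB to a path or binary file object.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry must come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # Everything else may be deflated.
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf_xml, compress_type=zipfile.ZIP_DEFLATED)


buf = io.BytesIO()
write_epub_skeleton(buf, "<package/>")
with zipfile.ZipFile(buf) as zf:
    print(zf.namelist()[0])  # mimetype
```

Keeping `mimetype` first and stored lets tools identify the file type by reading a fixed offset of the archive without decompressing anything.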

Reading from / writing to BytesIO

import io
from fast_ebook import epub

# Read from an open binary file object
with open('book.epub', 'rb') as f:
    book = epub.read_epub(f)

# Write to BytesIO
buf = io.BytesIO()
epub.write_epub(buf, book)
epub_bytes = buf.getvalue()

# Read from raw bytes
book = epub.read_epub(epub_bytes)
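`read_epub` accepts a path, an open binary file, or raw bytes. The sketch below shows one common way a Python-facing reader can normalize those three inputs before handing them to a ZIP parser. `open_epub_source` is a hypothetical helper built on the standard library's `zipfile`; fast-ebook may handle this differently in its Rust layer.

```python
import io
import os
import zipfile
from typing import BinaryIO, Union

Source = Union[str, os.PathLike, bytes, bytearray, BinaryIO]


def open_epub_source(source: Source) -> zipfile.ZipFile:
    """Open an EPUB archive from a path, raw bytes, or a binary file object.

    Hypothetical helper; it illustrates input normalization only.
    """
    if isinstance(source, (bytes, bytearray)):
        # Raw bytes: wrap them so zipfile can seek around the archive.
        return zipfile.ZipFile(io.BytesIO(bytes(source)))
    if isinstance(source, (str, os.PathLike)):
        return zipfile.ZipFile(os.fspath(source))
    # Anything else is treated as a seekable binary file object.
    return zipfile.ZipFile(source)


# Build a tiny archive in memory, then read it back from bytes and from a file object.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
data = buf.getvalue()

with open_epub_source(data) as zf:
    print(zf.namelist())  # ['mimetype']
with open_epub_source(io.BytesIO(data)) as zf:
    print(zf.namelist())  # ['mimetype']
```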

Context Manager

from fast_ebook import epub

with epub.open('book.epub') as book:
    print(book.get_metadata('DC', 'title'))
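`epub.open` lets a `with` block scope the book's lifetime. The README does not say what is released on exit, so the sketch below only illustrates the context-manager protocol such a helper relies on, using a hypothetical `EpubHandle` class that closes its underlying ZIP archive in `__exit__`, even when the block raises.

```python
import io
import zipfile


class EpubHandle:
    """Hypothetical wrapper that closes its ZIP archive when a `with` block ends."""

    def __init__(self, source):
        self._zip = zipfile.ZipFile(source)
        self.closed = False

    def names(self):
        return self._zip.namelist()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions alike.
        self._zip.close()
        self.closed = True
        return False  # do not swallow exceptions raised inside the block


# Build a tiny in-memory archive to open.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")

with EpubHandle(buf) as handle:
    print(handle.names())  # ['mimetype']
print(handle.closed)  # True
```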

Parallel Batch Processing

Rust + Rayon gives true parallel EPUB processing with the GIL released.

from pathlib import Path
from fast_ebook import epub

paths = list(Path('library/').glob('*.epub'))
books = epub.read_epubs(paths, workers=4)

for book in books:
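    # get_metadata returns a list; guard against books without a title
    titles = book.get_metadata('DC', 'title')
    title = titles[0][0] if titles else '(untitled)'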
    print(title)
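`read_epubs` does its parallel work inside Rust with the GIL released. For comparison, the closest pure-Python pattern is a thread pool whose `map` returns results in input order. It only helps when the per-file work releases the GIL itself (file I/O, or a native extension), because CPU-bound pure-Python parsing still runs one thread at a time. `read_many` and the stand-in reader are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def read_many(paths: Iterable[Path], reader: Callable[[Path], T], workers: int = 4) -> List[T]:
    """Apply `reader` to every path on a thread pool, keeping input order.

    Hypothetical helper; `reader` would be a function that loads one EPUB.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whichever worker finishes first.
        return list(pool.map(reader, paths))


def file_size(path: Path) -> int:
    # Stand-in reader for the demo; a real one would parse the EPUB.
    return path.stat().st_size


with tempfile.TemporaryDirectory() as d:
    paths = []
    for i, payload in enumerate([b"a", b"bbb", b"cc"]):
        p = Path(d) / f"book{i}.epub"
        p.write_bytes(payload)
        paths.append(p)
    sizes = read_many(paths, file_size, workers=2)

print(sizes)  # [1, 3, 2]
```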

Validation

from fast_ebook import epub

book = epub.read_epub('book.epub')
issues = book.validate()
if issues:
    for issue in issues:
        print(f"  - {issue}")
else:
    print("Valid EPUB")
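The README does not list which rules `validate()` checks. As an illustration of the container-level rules an EPUB validator typically applies, the sketch below tests a few OCF requirements from the EPUB specification (`mimetype` is the first entry, stored uncompressed, and contains exactly `application/epub+zip`; `META-INF/container.xml` exists) and returns a list of issue strings, mirroring the shape of the loop above. `check_container` and `build` are hypothetical helpers.

```python
import io
import zipfile
from typing import List, Sequence, Tuple


def check_container(data: bytes) -> List[str]:
    """Return OCF container issues as strings (empty list if none found).

    Hypothetical helper; covers only a few container-level rules.
    """
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a ZIP archive"]
    issues = []
    with zf:
        names = zf.namelist()
        if "mimetype" not in names:
            issues.append("missing 'mimetype' entry")
        else:
            if names[0] != "mimetype":
                issues.append("'mimetype' must be the first entry")
            if zf.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
                issues.append("'mimetype' must be stored uncompressed")
            if zf.read("mimetype") != b"application/epub+zip":
                issues.append("'mimetype' must contain 'application/epub+zip'")
        if "META-INF/container.xml" not in names:
            issues.append("missing 'META-INF/container.xml'")
    return issues


def build(entries: Sequence[Tuple[str, str, int]]) -> bytes:
    # Demo helper: write (name, text, compression) entries in the given order.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text, method in entries:
            zf.writestr(name, text, compress_type=method)
    return buf.getvalue()


good = build([
    ("mimetype", "application/epub+zip", zipfile.ZIP_STORED),
    ("META-INF/container.xml", "<container/>", zipfile.ZIP_DEFLATED),
])
print(check_container(good))          # []
print(check_container(b"not a zip"))  # ['not a ZIP archive']
```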

Markdown Conversion

from fast_ebook import epub

book = epub.read_epub('book.epub')
md = book.to_markdown()

# Write to file
with open('book.md', 'w', encoding='utf-8') as f:
    f.write(md)

Converts the entire book to Markdown in spine order, handling headings, bold, italic, links, lists, and HTML entities. The conversion runs in Rust and converts War and Peace (368 chapters) in 71 ms.
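To make the idea concrete, here is a deliberately small pure-Python converter built on the standard library's `html.parser`. It covers a subset of the constructs listed above (headings, bold, italic, links, list items, and HTML entities via `convert_charrefs`) for body fragments only. It is a hypothetical sketch, not a port of fast-ebook's Rust converter, and it skips images, tables, nesting edge cases, and numbered lists.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter covering a small subset of tags."""

    INLINE = {"b": "**", "strong": "**", "i": "*", "em": "*"}
    # Separator emitted before each block-level tag (ordered lists are
    # rendered as bullets to keep the sketch short).
    BLOCK_SEP = {"p": "\n\n", "ul": "\n\n", "ol": "\n\n", "li": "\n",
                 **{f"h{n}": "\n\n" for n in range(1, 7)}}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.href = ""

    def handle_starttag(self, tag, attrs):
        self.out.append(self.BLOCK_SEP.get(tag, ""))
        if tag in self.BLOCK_SEP and tag.startswith("h"):
            self.out.append("#" * int(tag[1]) + " ")
        elif tag == "li":
            self.out.append("- ")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = ""

    def handle_data(self, data):
        # Skip indentation/newline-only runs between block tags.
        if not data.strip() and "\n" in data:
            return
        self.out.append(data)

    def markdown(self):
        lines = [line.strip() for line in "".join(self.out).splitlines()]
        # Collapse runs of blank lines into a single paragraph break.
        return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return parser.markdown()


print(html_to_markdown("<h1>Hello</h1><p>Tom &amp; <b>Jerry</b></p>"))
```

A whole-book version would run this over each document in spine order and join the results with blank lines.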

Read Options

from fast_ebook import epub

# Skip NCX parsing (EPUB2 table of contents)
book = epub.read_epub('book.epub', options={'ignore_ncx': True})

# Skip Nav document parsing (EPUB3 table of contents)
book = epub.read_epub('book.epub', options={'ignore_nav': True})
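For context on what these options skip: EPUB2 books carry their table of contents in an NCX file (a `navMap` of nested `navPoint` elements), while EPUB3 books use an XHTML navigation document. The sketch below flattens an NCX `navMap` into `(title, href)` pairs with the standard library, roughly the kind of work `ignore_ncx` avoids. `ncx_entries` is a hypothetical helper, and nested entries are simply flattened depth-first.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}
NAV_POINT = "{http://www.daisy.org/z3986/2005/ncx/}navPoint"


def ncx_entries(ncx_xml: str) -> List[Tuple[str, str]]:
    """Flatten an NCX navMap into (title, href) pairs, depth-first."""
    root = ET.fromstring(ncx_xml)
    entries = []
    # iter() visits elements in document order, so a nested navPoint
    # follows its parent.
    for point in root.iter(NAV_POINT):
        label = point.find("ncx:navLabel/ncx:text", NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        title = (label.text or "").strip() if label is not None else ""
        href = content.get("src", "") if content is not None else ""
        entries.append((title, href))
    return entries


NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="intro" playOrder="1">
      <navLabel><text>Introduction</text></navLabel>
      <content src="chap_01.xhtml"/>
      <navPoint id="s1" playOrder="2">
        <navLabel><text>Section 1</text></navLabel>
        <content src="chap_01.xhtml#s1"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""

print(ncx_entries(NCX))
# [('Introduction', 'chap_01.xhtml'), ('Section 1', 'chap_01.xhtml#s1')]
```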

fast-ebook's API mirrors ebooklib's public interface. For most code, you only need to change the imports and the module that provides the item-type constants:

# Before (ebooklib)
from ebooklib import epub
import ebooklib
book = epub.read_epub('book.epub')
for img in book.get_items_of_type(ebooklib.ITEM_IMAGE):
    ...

# After (fast-ebook)
from fast_ebook import epub
import fast_ebook
book = epub.read_epub('book.epub')
for img in book.get_items_of_type(fast_ebook.ITEM_IMAGE):
    ...

Or use the compatibility layer, so that only the import lines change:

# Minimal change
import fast_ebook.compat as ebooklib
from fast_ebook.compat import epub
# ... rest of your code works unchanged
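During a gradual migration, a project may want to run on whichever library is installed. One way to do that is to try the compatibility-layer imports first and fall back to ebooklib; the sketch below reuses the import lines shown above and records the chosen backend in a hypothetical `BACKEND` variable, which calling code should check before using `epub`.

```python
try:
    # Prefer the Rust-backed compatibility layer.
    import fast_ebook.compat as ebooklib
    from fast_ebook.compat import epub
    BACKEND = "fast_ebook"
except ImportError:
    try:
        # Fall back to the original pure-Python library.
        import ebooklib
        from ebooklib import epub
        BACKEND = "ebooklib"
    except ImportError:
        # Neither library is installed; callers must handle this case.
        ebooklib = epub = None
        BACKEND = None

print("EPUB backend:", BACKEND)
```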

A standalone command-line binary (no Python required) is also available:

# Print metadata
fast-ebook info book.epub
fast-ebook info book.epub --format json

# Validate against EPUB spec
fast-ebook validate book.epub
fast-ebook validate *.epub --format json

# Convert to Markdown
fast-ebook convert book.epub -o book.md
fast-ebook convert book.epub > book.md

# Extract items
fast-ebook extract book.epub --output-dir ./out
fast-ebook extract book.epub --output-dir ./imgs --type images

# Batch scan (parallel)
fast-ebook scan library/ --workers 8
fast-ebook scan library/ --format csv > catalog.csv

Install via GitHub Releases or build from source: cargo build -p fast-ebook-cli --release
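The `--format json` output can be consumed from a script that shells out to the binary. The README does not document the JSON schema or the exit codes, so the hypothetical helper `run_json` below simply runs a command, decodes whatever JSON it prints, and returns the exit code alongside it instead of raising. The demo uses the current Python interpreter as a stand-in command so it runs without the binary installed.

```python
import json
import subprocess
import sys
from typing import Any, List, Tuple


def run_json(argv: List[str]) -> Tuple[int, Any]:
    """Run a command and decode its stdout as JSON.

    Returns (exit_code, parsed_json_or_None). The exit code is returned
    rather than raised because the CLI's exit-code conventions are not documented.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    payload = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, payload


# Real use, assuming the binary is on PATH:
#   code, info = run_json(["fast-ebook", "info", "book.epub", "--format", "json"])
# Self-contained demo with the current interpreter as a stand-in command:
code, payload = run_json([sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"])
print(code, payload)  # 0 {'ok': True}
```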

Item Type Constants

Constant          Value
ITEM_UNKNOWN      0
ITEM_IMAGE        1
ITEM_STYLE        2
ITEM_SCRIPT       3
ITEM_NAVIGATION   4
ITEM_VECTOR       5
ITEM_FONT         6
ITEM_VIDEO        7
ITEM_AUDIO        8
ITEM_DOCUMENT     9
ITEM_COVER        10
ITEM_SMIL         11
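Assuming `item.get_type()` returns these plain integers, log output is easier to read if the values are mapped back to names. The sketch below builds a lookup from the table above as a hypothetical local `ItemType` enum; fast-ebook itself exposes module-level constants such as `fast_ebook.ITEM_IMAGE`, not this class, so the enum must be kept in sync with the table by hand.

```python
from enum import IntEnum


class ItemType(IntEnum):
    """Hypothetical local mirror of the item-type constants listed above."""

    ITEM_UNKNOWN = 0
    ITEM_IMAGE = 1
    ITEM_STYLE = 2
    ITEM_SCRIPT = 3
    ITEM_NAVIGATION = 4
    ITEM_VECTOR = 5
    ITEM_FONT = 6
    ITEM_VIDEO = 7
    ITEM_AUDIO = 8
    ITEM_DOCUMENT = 9
    ITEM_COVER = 10
    ITEM_SMIL = 11


def type_name(value: int) -> str:
    """Map an integer item type to its constant name, tolerating unknown values."""
    try:
        return ItemType(value).name
    except ValueError:
        return f"UNRECOGNIZED({value})"


print(type_name(1))   # ITEM_IMAGE
print(type_name(42))  # UNRECOGNIZED(42)
```

Because `IntEnum` members compare equal to plain integers, `ItemType.ITEM_IMAGE == 1` holds, so the enum can be used directly in comparisons against returned values.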

License: MIT
