Show HN: Nano PDF – A CLI Tool to Edit PDFs with Gemini's Nano Banana

Original link: https://github.com/gavrielc/Nano-PDF

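## Nano PDF: An AI-Powered PDF Editing Tool

Nano PDF is a command-line tool that lets you edit PDF slides with natural-language prompts, using Google's Gemini 3 Pro Image model ("Nano Banana"). It supports fast, non-destructive editing, preserving the searchable text layer through OCR, and can both edit existing pages and add new AI-generated slides that match the style of your deck.

Key features include multi-page editing with parallel processing for speed; configurable resolution (4K/2K/1K) to balance quality and cost; and style references to keep the visuals consistent. You can control context usage with `--use-context` and disable Google Search with `--disable-google-search`.

**Requirements:** a paid Google Gemini API key (the free tier is not supported), Python 3.10+, Poppler, and Tesseract OCR.

**Examples:**

* `nano-pdf edit my_deck.pdf 2 "Change the title to 'Q3 Results'"`
* `nano-pdf add my_deck.pdf 0 "Title slide with 'Q3 2025 Review'"`

More details and installation instructions are available in the project's GitHub repository.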

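## Nano PDF: An AI-Based Command-Line PDF Editing Tool

Developer GavCo created "Nano PDF", a command-line tool that uses Google's Gemini 3 Pro Image model (Nano Banana) to edit PDFs through natural-language prompts. The tool converts PDF pages to images, has Gemini edit them according to the user's instructions, and then merges the modified images back into the PDF.

Users can edit existing content (for example, updating a chart) or add new slides, and the tool preserves the text layer so the document stays searchable. It can also use Google Search to give the model current data. Although it works on any PDF, it is particularly suited to quick edits of presentation decks.

Early feedback suggests the tool has potential for automated PDF annotation, and highlights the cost per edit ($0.15 per image with the Pro model) as an important consideration. The project is open source and available on GitHub: [https://github.com/gavrielc/Nano-PDF](https://github.com/gavrielc/Nano-PDF).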

Original text



A CLI tool to edit PDF slides using natural language prompts, powered by Google's Gemini 3 Pro Image ("Nano Banana") model.

  • Natural Language Editing: "Update the graph to include data from 2025", "Change the chart to a bar graph".
  • Add New Slides: Generate entirely new slides that match your deck's visual style.
  • Non-Destructive: Preserves the searchable text layer of your PDF using OCR re-hydration.
  • Multi-page & Parallel: Edit multiple pages in a single command with concurrent processing.

Nano PDF combines Gemini 3 Pro Image (aka Nano Banana) with PDF manipulation to make quick natural-language edits to PDFs:

  1. Page Rendering: Converts target PDF pages to images using Poppler
  2. Style References: Optionally includes style reference pages with the generation request so the model can pick up the visual style (fonts, colors, layout)
  3. AI Generation: Sends images + prompts to Gemini 3 Pro Image, which generates edited versions
  4. OCR Re-hydration: Uses Tesseract to restore searchable text layer to generated images
  5. PDF Stitching: Replaces original pages with AI-edited versions while preserving document structure
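
The rendering and OCR steps above map onto the Poppler and Tesseract command-line tools. The sketch below only builds the command lines, assuming `pdftoppm` and `tesseract` are invoked directly. The helper names, the 300 DPI value, and the file prefixes are illustrative assumptions, and Nano PDF itself may call these tools through Python wrappers instead.

```python
from pathlib import Path


def render_page_command(pdf: Path, page: int, out_prefix: Path, dpi: int = 300) -> list[str]:
    """Build a pdftoppm (Poppler) command that renders one page to a PNG."""
    return [
        "pdftoppm",
        "-f", str(page),   # first page to render (1-based)
        "-l", str(page),   # last page to render (same page, so only one is rendered)
        "-r", str(dpi),    # resolution in DPI (assumed value, not from the README)
        "-png",            # write PNG output
        "-singlefile",     # write exactly <out_prefix>.png with no page suffix
        str(pdf),
        str(out_prefix),
    ]


def ocr_page_command(image: Path, out_base: Path) -> list[str]:
    """Build a tesseract command that writes a searchable PDF to <out_base>.pdf."""
    return ["tesseract", str(image), str(out_base), "pdf"]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once both tools are installed.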

The tool processes multiple pages in parallel for speed, with configurable resolution (4K/2K/1K) to balance quality vs. cost.
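
As a rough illustration of the parallel processing described above, the sketch below fans page edits out to a thread pool. The README only says pages are processed concurrently, so the thread-pool choice, the function names, and the placeholder `edit_page` body are all assumptions rather than the tool's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def edit_page(page: int, prompt: str) -> str:
    """Hypothetical placeholder for the per-page render, generate, and OCR work."""
    return f"page {page}: {prompt}"


def edit_pages_in_parallel(edits: dict[int, str], max_workers: int = 4) -> dict[int, str]:
    """Run edit_page for every (page, prompt) pair concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            page: pool.submit(edit_page, page, prompt)
            for page, prompt in edits.items()
        }
        # result() blocks until that page is finished and re-raises any error.
        return {page: future.result() for page, future in futures.items()}
```

Threads are a reasonable fit here because each page spends most of its time waiting on a network call rather than on CPU work.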

You need a paid Google Gemini API key with billing enabled. Free tier keys do not support image generation.

  1. Get an API key from Google AI Studio
  2. Enable billing on your Google Cloud project
  3. Set it as an environment variable:
export GEMINI_API_KEY="your_api_key_here"

Note: This tool uses Gemini 3 Pro Image which requires a paid API tier. See pricing for details.
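
A script that wraps the tool can check for the key up front. This is a minimal sketch; the function name and error message are hypothetical, and the check only confirms the variable is set, not that the key belongs to a paid tier.

```python
import os


def get_gemini_api_key(env=None) -> str:
    """Return GEMINI_API_KEY from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY not found; export it before running nano-pdf")
    return key
```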

Edit a single page (e.g., Page 2):

nano-pdf edit my_deck.pdf 2 "Change the title to 'Q3 Results'"

Edit multiple pages in one go:

nano-pdf edit my_deck.pdf \
  1 "Update date to Oct 2025" \
  5 "Add company logo" \
  10 "Fix typo in footer"
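
The multi-page form alternates page numbers and prompts. The sketch below shows one way such arguments could be paired up; `parse_edit_pairs` is a hypothetical helper for illustration, not the tool's own parser.

```python
def parse_edit_pairs(args: list[str]) -> list[tuple[int, str]]:
    """Pair up alternating page numbers and prompts from a flat argument list."""
    if not args or len(args) % 2 != 0:
        raise ValueError("expected alternating page numbers and prompts")
    pairs = []
    for page_text, prompt in zip(args[0::2], args[1::2]):
        page = int(page_text)  # raises ValueError if the page is not a number
        if page < 1:
            raise ValueError(f"page numbers start at 1, got {page}")
        pairs.append((page, prompt))
    return pairs
```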

Insert a new AI-generated slide into your deck:

# Add a title slide at the beginning
nano-pdf add my_deck.pdf 0 "Title slide with 'Q3 2025 Review'"

# Add a slide after page 5
nano-pdf add my_deck.pdf 5 "Summary slide with key takeaways as bullet points"

The new slide automatically matches the visual style of your existing slides and uses document context by default for better relevance.
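
The position argument in the examples above reads as "insert after this page", with 0 meaning the start of the deck. The sketch below models that on a plain list; `insert_slide` is a hypothetical helper, and the range check is an assumption about how out-of-range positions might be handled.

```python
def insert_slide(pages: list[str], position: int, new_slide: str) -> list[str]:
    """Return a new page list with new_slide placed after page `position`.

    position 0 puts the slide first; position N puts it after page N (1-based).
    """
    if not 0 <= position <= len(pages):
        raise ValueError(f"position must be between 0 and {len(pages)}")
    return pages[:position] + [new_slide] + pages[position:]
```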

  • --use-context / --no-use-context: Include the full text of the PDF as context for the model. Disabled by default for edit, enabled by default for add. Use --no-use-context to disable.
  • --style-refs "1,5": Manually specify which pages to use as style references.
  • --output "new.pdf": Specify the output filename.
  • --resolution "4K": Image resolution - "4K" (default), "2K", or "1K". Higher quality = slower processing.
  • --disable-google-search: Prevents the model from using Google Search to find information before generating. Google Search is enabled by default.
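
To make the defaults in this option list concrete, here is a sketch of how they could be declared with Python's standard `argparse`. It is an assumption for illustration only: the real tool may use a different CLI library, the program name `nano-pdf-sketch` is made up, and the page and prompt positionals are left out for brevity.

```python
import argparse


def parse_style_refs(text: str) -> list[int]:
    """Turn a value such as "1,5" into [1, 5]."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nano-pdf-sketch")
    subcommands = parser.add_subparsers(dest="command", required=True)
    # Per the option list: context is off by default for edit, on for add.
    for name, context_default in (("edit", False), ("add", True)):
        cmd = subcommands.add_parser(name)
        cmd.add_argument("pdf")
        # BooleanOptionalAction creates both --use-context and --no-use-context.
        cmd.add_argument("--use-context", action=argparse.BooleanOptionalAction,
                         default=context_default)
        cmd.add_argument("--style-refs", type=parse_style_refs, default=None)
        cmd.add_argument("--output", default=None)
        cmd.add_argument("--resolution", choices=("4K", "2K", "1K"), default="4K")
        cmd.add_argument("--disable-google-search", action="store_true")
    return parser
```

With this parser, `build_parser().parse_args(["add", "deck.pdf", "--no-use-context"])` turns context off for an `add` call, mirroring the behavior described in the list.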

Fixing Presentation Errors

# Fix typos across multiple slides
nano-pdf edit pitch_deck.pdf \
  3 "Fix the typo 'recieve' to 'receive'" \
  7 "Change 'Q4 2024' to 'Q1 2025'"

# Update branding and colors
nano-pdf edit slides.pdf 1 "Make the header background blue and text white" \
  --style-refs "2,3" --output branded_slides.pdf

# Update financial data
nano-pdf edit report.pdf 12 "Update the revenue chart to show Q3 at $2.5M instead of $2.1M"

Batch Processing with Context

# Use full document context for consistency
nano-pdf edit presentation.pdf \
  5 "Update the chart colors to match the theme" \
  8 "Add the company logo in the bottom right" \
  --use-context

# Add a new agenda slide at the beginning
nano-pdf add quarterly_report.pdf 0 "Agenda slide with: Overview, Financial Results, Q4 Outlook"

# Google Search is enabled by default - the model can look up current information
nano-pdf edit deck.pdf 5 "Update the market share data to latest figures"

# Disable Google Search if you want the model to only use provided context
nano-pdf add deck.pdf 3 "Add a summary slide" --disable-google-search

Requirements

  • Python 3.10+
  • poppler (for PDF rendering)
  • tesseract (for OCR)

macOS (Homebrew):

brew install poppler tesseract

Windows (Chocolatey):

choco install poppler tesseract

Note: After installation, you may need to restart your terminal or add the installation directory to your PATH.

Linux (Debian/Ubuntu):

sudo apt-get install poppler-utils tesseract-ocr

"Missing system dependencies" error

Make sure you've installed poppler and tesseract for your platform. After installation, restart your terminal to refresh PATH. Run "which pdftotext" and "which tesseract" to verify that both are accessible.
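
The same PATH check can be done from Python with the standard library. This is a small sketch; `missing_tools` is a hypothetical helper name, and the default tool names are the two executables mentioned in the paragraph above.

```python
import shutil


def missing_tools(tools=("pdftotext", "tesseract")) -> list[str]:
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list means both executables were found; any names returned are the ones still missing from PATH.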

"GEMINI_API_KEY not found" error

Set your API key as an environment variable:

export GEMINI_API_KEY="your_key_here"

"Gemini API Error: PAID API key required" error

Gemini 3 Pro Image requires a paid API tier. Visit Google AI Studio to enable billing on your project.

Generated images don't match the style

Try using --style-refs to specify reference pages that have the desired visual style. The model will analyze these pages to better match fonts, colors, and layout.

Text layer is missing or incorrect after editing

The tool uses Tesseract OCR to restore searchable text. For best results, ensure your generated images are high resolution (--resolution "4K"). Note that OCR may not be perfect for stylized fonts or small text.

Pages are processing slowly

  • Use --resolution "2K" or --resolution "1K" for faster processing

If you want to run the latest development version:

# Clone the repository
git clone https://github.com/gavrielc/Nano-PDF.git
cd Nano-PDF

# Install dependencies
pip install -e .

# Run the tool
nano-pdf edit my_deck.pdf 2 "Your edit here"

License: MIT
