Stop sending your sensitive engineering data to the cloud.
This project provides a production-grade, 100% offline RAG (Retrieval-Augmented Generation) architecture. It allows you to chat with your proprietary documents (PDF, TXT, Markdown) using a local LLM, ensuring absolute data privacy.
This system is designed with a microservices architecture, fully containerized using Docker Compose for one-click deployment.
- LLM Inference: Ollama (Running Meta Llama 3 8B)
- Embeddings: `mxbai-embed-large` (State-of-the-art retrieval performance)
- Vector Database: ChromaDB (Persistent local storage)
- Backend/Frontend: Python + Streamlit (Optimized for RAG workflows)
- Deployment: Docker Compose (Isolated environment)
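For orientation, a minimal `docker-compose.yml` along these lines wires the three pieces together; the service names, images, ports, and volume paths here are illustrative assumptions, not the exact bundled configuration:

```yaml
# Illustrative sketch only -- images, ports, and volumes are assumptions.
services:
  ollama:
    image: ollama/ollama            # serves Llama 3 and the embedding model
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama   # persist downloaded model weights

  chromadb:
    image: chromadb/chroma          # persistent local vector store
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma

  app:
    build: .                        # Streamlit UI + Python RAG backend
    ports:
      - "8501:8501"
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - CHROMA_HOST=chromadb
    depends_on:
      - ollama
      - chromadb

volumes:
  ollama_data:
  chroma_data:
```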
- 🔒 100% Privacy: No data leaves your machine. No OpenAI API keys required. Zero monthly fees.
- 🚀 GPU Acceleration: Native support for NVIDIA GPUs (CUDA) for lightning-fast inference.
- 📂 Smart Ingestion: Automatically parses, chunks, and vectorizes PDF and text documents (see the ingestion sketch after this list).
- 💬 Context-Aware Chat: Remembers conversation history and retrieves relevant context from your knowledge base.
- 🐳 One-Click Setup: No "dependency hell". Just run `docker-compose up -d`.
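As a rough illustration of the ingestion step, the sketch below parses a PDF, chunks it, embeds each chunk through Ollama's `mxbai-embed-large` model, and stores the vectors in a persistent ChromaDB collection. The chunking parameters, file paths, and collection name are placeholder assumptions; the bundled code may differ.

```python
# Illustrative ingestion sketch -- paths, sizes, and names are assumptions.
import chromadb
import ollama
from pypdf import PdfReader

def ingest_pdf(path: str, collection_name: str = "documents") -> None:
    # 1. Parse: extract raw text from every page of the PDF.
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # 2. Chunk: naive fixed-size windows with overlap (real splitters are smarter).
    chunk_size, overlap = 1000, 200
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size - overlap)]

    # 3. Embed + store: one vector per chunk, persisted locally by ChromaDB.
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection(collection_name)
    for idx, chunk in enumerate(chunks):
        embedding = ollama.embeddings(model="mxbai-embed-large", prompt=chunk)["embedding"]
        collection.add(ids=[f"{path}-{idx}"], embeddings=[embedding], documents=[chunk])

if __name__ == "__main__":
    ingest_pdf("manual.pdf")
```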
[Demo video: ingestion process and the system in action]
```mermaid
graph TD
subgraph Docker_Container [🐳 Docker Containerized Environment]
style Docker_Container fill:#e1f5fe,stroke:#01579b,stroke-width:2px,rx:10,ry:10
UI["🖥️ Streamlit Web UI"]:::ui
Backend["⚙️ Python RAG Backend"]:::code
subgraph Local_AI [🧠 Local AI Engine]
style Local_AI fill:#fff3e0,stroke:#ff6f00,stroke-width:2px
Ollama["🦙 Ollama Service<br/>(Llama 3 Model)"]:::ai
Embed["✨ Embedding Model<br/>(mxbai-embed-large)"]:::ai
end
DB[("🗄️ ChromaDB<br/>Vector Store")]:::db
end
User([👤 User]) -->|Upload PDF/Ask Question| UI
UI <-->|API Request| Backend
Backend <-->|Store/Retrieve Vectors| DB
Backend <-->|Inference Request| Ollama
Backend -->|Generate Embeddings| Embed
classDef ui fill:#d1c4e9,stroke:#512da8,stroke-width:2px,color:black;
classDef code fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:black;
classDef db fill:#ffcdd2,stroke:#c62828,stroke-width:2px,color:black;
classDef ai fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:black;
```
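To make the request path in the diagram concrete, here is a rough sketch of the retrieval-and-generation step: embed the user's question, pull the closest chunks from ChromaDB, and hand the context plus the chat history to Llama 3 via Ollama. The prompt wording, model tag, and parameter values are assumptions for illustration only.

```python
# Illustrative query sketch -- prompt text and parameters are assumptions.
import chromadb
import ollama

def answer(question: str, history: list[dict], n_results: int = 4) -> str:
    # Embed the question with the same model used at ingestion time.
    query_vec = ollama.embeddings(model="mxbai-embed-large", prompt=question)["embedding"]

    # Retrieve the most similar chunks from the local vector store.
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection("documents")
    results = collection.query(query_embeddings=[query_vec], n_results=n_results)
    context = "\n\n".join(results["documents"][0])

    # Build the prompt: retrieved context + prior turns + the new question.
    messages = [{"role": "system", "content": f"Answer using only this context:\n{context}"}]
    messages += history
    messages.append({"role": "user", "content": question})

    response = ollama.chat(model="llama3", messages=messages)
    return response["message"]["content"]
```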
To run this system smoothly with Llama 3 (8B), the following hardware is recommended:
- OS: Windows 10/11 (WSL2) or Linux (Ubuntu)
- RAM: 16GB+ System Memory
- GPU: NVIDIA RTX 3060 (8GB VRAM) or higher recommended.
- Note: The system can also run in CPU-only mode, but inference will be noticeably slower.
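If an NVIDIA GPU is available, Docker Compose can pass it through to the Ollama container (the host needs the NVIDIA Container Toolkit installed). A typical reservation stanza looks like this; the service name is an assumption matching the sketch above:

```yaml
# GPU passthrough sketch for the Ollama service (assumes NVIDIA Container Toolkit on the host).
services:
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or an explicit number of GPUs
              capabilities: [gpu]
```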
Building a stable RAG system from scratch takes weeks of configuration (handling Python dependencies, Vector DB connections, and Docker networking).
I have packaged the Full Source Code, Docker Configuration, and Setup Guide into a ready-to-deploy bundle.
- ✅ Complete Source Code (Python)
- ✅ `docker-compose.yml` (Production ready)
- ✅ Embedding & Vectorization Logic
- ✅ UI/UX Implementation
- ✅ Premium Support Guide
👉 Download the System Here: Get it on Gumroad (Instant Access. One-time payment. Lifetime usage.)
Phil Yeh - Senior Automation & Systems Engineer. Specializing in Hardware-Software Integration, Industrial Automation, and Local AI Solutions.
Keywords: RAG, Llama 3, Ollama, Docker, Local AI, Private GPT, Knowledge Base, Python, Vector Database, ChromaDB, Source Code
