An intelligent, agentic system for automated architectural analysis and semantic code search.
This project transcends traditional "Chat with Code" paradigms by implementing an autonomous Agent that mimics the cognitive process of a Senior Tech Lead. Instead of statically indexing a repository, the system treats the Large Language Model (LLM) as the CPU and the Vector Store as a high-speed Context Cache. The agent dynamically traverses the repository structure, pre-fetching critical contexts into the "cache" (RAG) and performing Just-In-Time (JIT) reads when semantic gaps are detected.
In traditional code assistants, RAG (Retrieval-Augmented Generation) is often a static lookup table. In this architecture, we redefine RAG as a Dynamic L2 Cache for the LLM:
- Cold Start (Repo Map): The agent first parses the entire repository into Abstract Syntax Trees (ASTs) to build a lightweight symbol map (Classes/Functions), as sketched after this list. This map serves as the "index" to the file system.
- Prefetching (Analysis Phase): During the initial analysis, the agent autonomously selects the most critical 10-20 files based on architectural relevance, parses them, and "warms up" the vector store (the cache).
- Cache Miss Handling (ReAct Loop): During user Q&A, if the retrieval mechanism (BM25 + Vector) returns insufficient context, the agent triggers a Just-In-Time (JIT) file read: it calls the GitHub API as a tool to fetch the missing files, updates the cache in real time, and regenerates the answer.
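As a concrete illustration of the Cold Start step, the sketch below builds a repository symbol map with nothing but the standard `ast` module. The function name and data shape are illustrative assumptions, not the project's actual API:

```python
import ast

def build_repo_map(files: dict[str, str]) -> dict[str, list[str]]:
    """Cold start: parse each file into an AST and record the classes/functions it defines.

    `files` maps repository paths to source text; the result is the lightweight
    symbol index the agent browses before deciding which files to prefetch.
    """
    repo_map: dict[str, list[str]] = {}
    for path, source in files.items():
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip non-Python or unparsable files
        repo_map[path] = [
            node.name
            for node in ast.walk(tree)
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
        ]
    return repo_map
```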
Standard text chunking destroys code logic. We utilize Python's `ast` module to implement Structure-Aware Chunking.
- Logical Boundaries: Code is split by Class and Method definitions, ensuring that a function is never severed in the middle.
- Context Injection: Large classes are decomposed into methods, but the parent class's signature and docstrings are injected into every child chunk. This ensures the LLM understands the "why" (class purpose) even when looking at the "how" (method implementation).
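A minimal sketch of this chunking strategy using only the standard `ast` module; the repository's real implementation likely handles more node types and attaches richer metadata:

```python
import ast

def chunk_by_structure(source: str) -> list[str]:
    """Split Python source along class/method boundaries instead of fixed character sizes."""
    tree = ast.parse(source)
    chunks: list[str] = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # Parent context injected into every child chunk: class signature + docstring.
            doc = ast.get_docstring(node)
            context = f"class {node.name}:" + (f'\n    """{doc}"""' if doc else "")
            for child in node.body:
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    body = ast.get_source_segment(source, child) or ""
                    chunks.append(context + "\n\n" + body)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append(ast.get_source_segment(source, node) or "")
    return chunks
```

Because chunk boundaries come from AST nodes, no function is ever cut mid-body, and every method chunk still carries its parent class's signature and docstring.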
Built on top of `asyncio` and `httpx`, the system is designed for high-throughput I/O operations.
- Non-Blocking Ingestion: Repository parsing, AST extraction, and vector embedding occur concurrently.
- Worker Scalability: The application runs behind Gunicorn with Uvicorn workers, utilizing a stateless design pattern where the Vector Store Manager synchronizes context via persistent disk storage and shared ChromaDB instances. This allows multiple workers to serve requests without race conditions.
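To illustrate the Non-Blocking Ingestion point above, the sketch below fetches several files concurrently from the GitHub contents API with `httpx`; the function names and the minimal error handling are simplified assumptions rather than the actual service code:

```python
import asyncio
import httpx

async def fetch_file(client: httpx.AsyncClient, repo: str, path: str, token: str) -> str:
    """Fetch one file's raw content from the GitHub API without blocking the event loop."""
    resp = await client.get(
        f"https://api.github.com/repos/{repo}/contents/{path}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github.raw+json",
        },
    )
    resp.raise_for_status()
    return resp.text

async def ingest(repo: str, paths: list[str], token: str) -> list[str]:
    """Download many files concurrently; chunking and embedding can be overlapped the same way."""
    async with httpx.AsyncClient(timeout=30) as client:
        return await asyncio.gather(*(fetch_file(client, repo, p, token) for p in paths))
```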
The Chat Service implements a sophisticated Reasoning + Acting (ReAct) loop:
- Query Rewrite: User queries (often vague or in different languages) are first rewritten by an LLM into precise, English-language technical keywords for optimal BM25/Vector retrieval.
- Self-Correction: If the retrieved context is insufficient, the model does not hallucinate. Instead, it issues a `<tool_code>` command to fetch specific file paths from the repository. The system intercepts this command, pulls the fresh data, indexes it, and feeds it back to the model in a single inference cycle (see the sketch below).
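Putting query rewriting and self-correction together, one ReAct turn could look like the sketch below. The `<tool_code>` payload format, the `read_file(...)` call, and the `llm`/`retriever`/`fetch_and_index` interfaces are hypothetical placeholders, not the project's actual contracts:

```python
import re

# Hypothetical tool-call format the model is prompted to emit when it detects a context gap.
TOOL_RE = re.compile(r"<tool_code>\s*read_file\((.*?)\)\s*</tool_code>", re.S)

def react_answer(question: str, llm, retriever, fetch_and_index) -> str:
    """One ReAct turn: rewrite -> retrieve -> (optional JIT file read) -> answer."""
    keywords = llm.complete(f"Rewrite as precise English technical keywords: {question}")
    context = retriever.search(keywords)                 # hybrid BM25 + vector retrieval
    draft = llm.complete(question, context=context)
    match = TOOL_RE.search(draft)
    if match:                                            # the model asked for missing files
        for path in (p.strip(" '\"") for p in match.group(1).split(",")):
            fetch_and_index(path)                        # JIT read + cache update
        context = retriever.search(keywords)
        draft = llm.complete(question, context=context)  # regenerate with the fresh context
    return draft
```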
To balance semantic understanding with exact keyword matching, the retrieval engine employs a weighted hybrid approach:
- Dense Retrieval (Vector): Uses `BAAI/bge-m3` embeddings to find conceptually similar code (e.g., matching "authentication" to "login logic").
- Sparse Retrieval (BM25): Captures exact variable names, error codes, and specific function signatures that vector embeddings might miss.
- Reciprocal Rank Fusion (RRF): Results are fused and re-ranked to ensure the highest fidelity context is provided to the LLM.
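Reciprocal Rank Fusion itself is a small, well-known formula. The reference implementation below (with the conventional k = 60 constant) shows how the sparse and dense rankings can be merged; the project's exact weighting may differ:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. BM25 and vector results) into one ordering.

    Each document scores sum(1 / (k + rank)) over the lists it appears in, so
    items ranked highly by several retrievers float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a sparse (BM25) and a dense (vector) ranking.
fused = reciprocal_rank_fusion([
    ["auth.py#login", "models/user.py", "utils/jwt.py"],   # BM25 order
    ["utils/jwt.py", "auth.py#login", "middleware.py"],    # vector order
])
```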
The architecture is completely language-agnostic but optimized for dual-language environments (English/Chinese).
- Dynamic Prompt Engineering: The system detects the user's input language and hot-swaps the System Prompts to ensure the output format, tone, and technical terminology align with the user's locale.
- UI Integration: The frontend includes a dedicated language toggle that influences the entire generation pipeline, from the initial architectural report to the final Q&A.
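A minimal sketch of the Dynamic Prompt Engineering point above, using a naive CJK-character heuristic for language detection; the real pipeline may rely on the UI toggle or an LLM-based detector, and the prompt texts here are placeholders:

```python
# Placeholder prompts; the production prompts are far more detailed.
SYSTEM_PROMPTS = {
    "en": "You are a senior tech lead. Answer in English with precise terminology.",
    "zh": "你是一位资深技术负责人，请用中文回答并保持术语准确。",
}

def pick_system_prompt(user_input: str) -> str:
    """Hot-swap the system prompt based on whether the input contains CJK characters."""
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in user_input)
    return SYSTEM_PROMPTS["zh" if has_cjk else "en"]
```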
- Core: Python 3.10+, FastAPI, AsyncIO
- LLM Integration: OpenAI SDK (compatible with DeepSeek/SiliconFlow)
- Vector Database: ChromaDB (Persistent Storage)
- Search Algorithms: BM25Okapi (via the rank-bm25 package), Reciprocal Rank Fusion (RRF)
- Parsing: Python `ast` (Abstract Syntax Trees)
- Frontend: HTML5, Server-Sent Events (SSE) for real-time streaming, Mermaid.js for architecture diagrams.
- Deployment: Docker, Gunicorn, Uvicorn.
- Session Management: Uses browser `sessionStorage` coupled with server-side persistent contexts, allowing users to refresh pages without losing the "warm" cache state.
- Network Resilience: Implements robust error handling for GitHub API rate limits (403/429) and network timeouts during long-context generation.
- Memory Efficiency: The `VectorStoreManager` is designed to be stateless in memory but stateful on disk, preventing memory leaks in long-running container environments (see the sketch below).
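A minimal sketch of the stateless-in-memory, stateful-on-disk pattern with ChromaDB's persistent client; the storage path and collection naming scheme are assumptions for illustration:

```python
import chromadb

def open_collection(repo_id: str, path: str = "./chroma_data"):
    """Each Gunicorn/Uvicorn worker opens the same on-disk store instead of holding
    vectors in process memory, so workers stay stateless and can be restarted freely."""
    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name=f"repo_{repo_id}")
```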
- Prerequisites:
  - Python 3.9+
  - A valid GitHub token
  - LLM API keys (DeepSeek-V3 & SiliconFlow bge-m3 recommended)
- Clone the Repository

  ```bash
  git clone https://github.com/tzzp1224/RepoReaper.git
  cd RepoReaper
  ```

- Install Dependencies

  Using a virtual environment is recommended:

  ```bash
  # Create and activate venv
  python -m venv venv
  source venv/bin/activate   # Windows: venv\Scripts\activate

  # Install requirements
  pip install -r requirements.txt
  ```
- Configure Environment

  Create a `.env` file in the root directory:

  ```bash
  # GitHub Personal Access Token
  GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxx

  # LLM API Key (e.g., DeepSeek)
  DEEPSEEK_API_KEY=sk-xxxxxxxxxxxxxxx

  # Embedding API Key (SiliconFlow)
  SILICON_API_KEY=sk-xxxxxxxxxxxxxxx
  ```
- Start the Service

  Option A: Local Run (Universal). Compatible with Windows, macOS, and Linux; recommended for development. (Note: Linux users can still use `gunicorn -c gunicorn_conf.py app.main:app` for production deployment.)

  Option B: Docker Run 🐳. Run in an isolated container:

  ```bash
  # 1. Build Image
  docker build -t reporeaper .

  # 2. Run Container (loading env vars)
  docker run -d -p 8000:8000 --env-file .env --name reporeaper reporeaper
  ```
- Access Dashboard

  Navigate to `http://localhost:8000` and enter a GitHub repository URL to trigger the autonomous analysis agent.