RepoRover System Architecture
RepoRover is an AI-powered code analysis platform that ingests a GitHub repository into a structured knowledge base and answers natural language questions about it. The system combines a FastAPI backend, AI models for code understanding (Gemini, Groq), and a clean, responsive frontend.
Core Principles
- Modular Design: Components are loosely coupled and follow the single responsibility principle
- Extensible: Easy to add new analysis modules or integrate with different AI models
- Real-time Processing: Provides immediate feedback during repository analysis
- Scalable: Designed to handle repositories of various sizes efficiently
Core Components
1. Backend Services
- FastAPI Application: Handles HTTP requests and serves the frontend
- Background Task Queue: Manages long-running repository analysis tasks
- API Endpoints:
  - /ingest: Start repository ingestion
  - /ingest/status/{task_id}: Check ingestion status
  - /query: Submit questions about the repository
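The interplay between the background task queue and the status endpoint can be pictured as a small in-memory registry. The following is a minimal sketch using only the standard library, not RepoRover's actual implementation: `TaskRegistry`, `start_ingestion`, `get_status`, and the state names are hypothetical, and a plain thread stands in for whatever task runner the backend uses.

```python
import threading
import uuid
from typing import Callable, Dict, Optional


class TaskRegistry:
    """Hypothetical in-memory tracker for long-running ingestion tasks."""

    def __init__(self) -> None:
        self._status: Dict[str, Dict[str, str]] = {}
        self._threads: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def start_ingestion(self, repo_url: str, analyze: Callable[[str], str]) -> str:
        """Roughly what /ingest might do: register a task and run it in the background."""
        task_id = uuid.uuid4().hex
        self._set(task_id, "pending", repo_url)

        def run() -> None:
            self._set(task_id, "running", repo_url)
            try:
                self._set(task_id, "completed", analyze(repo_url))
            except Exception as exc:  # record the failure instead of losing it
                self._set(task_id, "failed", str(exc))

        thread = threading.Thread(target=run, daemon=True)
        with self._lock:
            self._threads[task_id] = thread
        thread.start()
        return task_id

    def get_status(self, task_id: str) -> Optional[Dict[str, str]]:
        """Roughly what /ingest/status/{task_id} might return (None if unknown)."""
        with self._lock:
            status = self._status.get(task_id)
            return dict(status) if status is not None else None

    def wait(self, task_id: str) -> None:
        """Block until the task finishes; useful in tests and scripts."""
        self._threads[task_id].join()

    def _set(self, task_id: str, state: str, detail: str) -> None:
        with self._lock:
            self._status[task_id] = {"state": state, "detail": detail}
```

A client would call `start_ingestion` once, keep the returned task ID, and poll `get_status` until the state is `completed` or `failed`.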
2. AI Components
- Dispatcher Agent: Orchestrates the analysis workflow
- Semantic Memory Manager: Handles storage and retrieval of code knowledge
- AI Model Integrations: Support for multiple AI providers (Gemini, Groq)
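Support for multiple providers usually comes down to a shared interface plus a lookup table. The sketch below illustrates that idea under stated assumptions: `ChatModel`, `EchoModel`, `PROVIDERS`, and `get_model` are hypothetical names, and the stand-in model only echoes its prompt so the example runs without network access or provider SDKs.

```python
from typing import Callable, Dict, Protocol


class ChatModel(Protocol):
    """Hypothetical interface that every provider adapter would satisfy."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in provider: echoes the prompt instead of calling a real API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Maps a provider name to a factory. Real adapters for Gemini or Groq would
# wrap their own clients behind the same complete() method.
PROVIDERS: Dict[str, Callable[[], ChatModel]] = {
    "gemini": lambda: EchoModel("gemini"),
    "groq": lambda: EchoModel("groq"),
}


def get_model(provider: str) -> ChatModel:
    """Look up a provider by name, failing loudly on unknown names."""
    try:
        return PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
```

With this shape, adding a provider means writing one adapter class and registering one factory; callers keep using `get_model(name).complete(prompt)`.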
3. Frontend
- Single Page Application: Built with vanilla JavaScript
- Responsive UI: Using Tailwind CSS for styling
- Real-time Updates: WebSocket-based updates for long-running tasks
4. Data Storage
- Semantic Memory: Stores processed code information
- Vector Database: For efficient similarity search of code patterns
- Task Status Tracking: In-memory storage for monitoring analysis progress
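Similarity search over a vector store ranks stored embeddings by how close they are to a query embedding. The following toy sketch shows that ranking with cosine similarity in pure Python; it is an illustration only, since a vector database performs this with indexes and real embeddings. The three-dimensional vectors and entity names are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy "embeddings"; in practice these come from an embedding model.
index = {
    "parse_config": [0.9, 0.1, 0.0],
    "load_settings": [0.8, 0.2, 0.1],
    "render_chart": [0.0, 0.1, 0.9],
}
results = top_k([1.0, 0.0, 0.0], index)
```

Here `results` lists `parse_config` first and `load_settings` second, because their vectors point in nearly the same direction as the query.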
Ingestion Workflow
The ingestion process transforms a GitHub repository into a structured knowledge base that can be queried naturally.
Trigger
- User submits a GitHub repository URL through the web interface
Process Flow
Repository Cloning
- Clones the target repository locally
- Scans the repository structure
- Identifies different file types and their relationships
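The cloning and scanning step can be sketched with the standard library. This is a minimal illustration, not the project's code: `clone_repository`, `scan_repository`, and the `SKIP_DIRS` ignore list are assumptions, and the shallow clone (`--depth 1`) is one reasonable choice when history is not needed.

```python
import subprocess
from collections import Counter
from pathlib import Path
from typing import Dict

SKIP_DIRS = {".git", "node_modules", "__pycache__"}  # assumed ignore list


def clone_repository(url: str, dest: str) -> None:
    """Shallow-clone the repository; requires git on the PATH."""
    subprocess.run(["git", "clone", "--depth", "1", url, dest], check=True)


def scan_repository(root: str) -> Dict[str, int]:
    """Count files per extension, skipping common non-source directories."""
    counts: Counter = Counter()
    root_path = Path(root)
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        # Ignore anything that lives under a skipped directory.
        if SKIP_DIRS.intersection(path.relative_to(root_path).parts):
            continue
        counts[path.suffix or "(no extension)"] += 1
    return dict(counts)
```

The per-extension counts give a cheap first picture of the repository, for example which languages dominate and where documentation lives, before any parsing starts.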
Code Analysis
- Parses source code files
- Extracts functions, classes, and their documentation
- Builds a semantic understanding of the codebase
- Identifies dependencies between components
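For Python sources, extracting functions, classes, and their documentation can be done with the standard `ast` module. The sketch below assumes Python input only; `extract_entities` and the dictionary layout are hypothetical, and other languages would need their own parsers.

```python
import ast
from typing import Dict, List


def extract_entities(source: str) -> List[Dict[str, object]]:
    """Return every function and class in the source with its docstring."""
    entities: List[Dict[str, object]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # not a definition we track
        entities.append(
            {
                "kind": kind,
                "name": node.name,
                "line": node.lineno,
                "doc": ast.get_docstring(node) or "",
            }
        )
    return entities


sample = '''
class Greeter:
    """Says hello."""

    def greet(self, name):
        """Return a greeting."""
        return f"Hello, {name}"
'''
entities = extract_entities(sample)
```

Each entry carries enough information (kind, name, line, docstring) to be stored as a code entity and later summarized or embedded.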
Knowledge Base Population
- Stores extracted information in the semantic memory
- Generates vector embeddings for semantic search
- Builds a knowledge graph of the codebase
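One way to picture the population step is an entity table plus a relationship table. The sketch below uses the standard `sqlite3` module with an in-memory database; the schema, table names, and helper functions are assumptions for illustration, and the relationship table is a simplified stand-in for a dedicated graph structure.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) entity and relationship tables."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS entities (
            id INTEGER PRIMARY KEY,
            kind TEXT NOT NULL,
            name TEXT NOT NULL,
            file TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS relationships (
            source_id INTEGER NOT NULL REFERENCES entities(id),
            target_id INTEGER NOT NULL REFERENCES entities(id),
            kind TEXT NOT NULL
        );
        """
    )
    return conn


def add_entity(conn: sqlite3.Connection, kind: str, name: str, file: str) -> int:
    cur = conn.execute(
        "INSERT INTO entities (kind, name, file) VALUES (?, ?, ?)", (kind, name, file)
    )
    return cur.lastrowid


def add_relationship(conn: sqlite3.Connection, source_id: int, target_id: int, kind: str) -> None:
    conn.execute(
        "INSERT INTO relationships (source_id, target_id, kind) VALUES (?, ?, ?)",
        (source_id, target_id, kind),
    )


def dependencies_of(conn: sqlite3.Connection, name: str) -> List[str]:
    """Names of all entities that the named entity points to."""
    rows = conn.execute(
        """
        SELECT target.name
        FROM relationships AS r
        JOIN entities AS source ON source.id = r.source_id
        JOIN entities AS target ON target.id = r.target_id
        WHERE source.name = ?
        ORDER BY target.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Storing relationships as rows keeps dependency questions ("what does this function call?") a single query, at the cost of the traversal algorithms a graph library would offer.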
graph TD
A[Start: GitHub URL] --> B(Dispatcher Agent);
B --> C{Clones Repo & Scans Files};
C --> D[Architect Agent];
D --> E[Librarian Agent];
E --> F[Annotator Agent];
subgraph Semantic Memory
G[Entity Store - SQLite];
H[Knowledge Graph - NetworkX];
I[Vector Store - ChromaDB];
end
D -- Creates Code Entities & Relationships --> H;
D -- Stores Code Details --> G;
E -- Creates Doc Chunks --> I;
E -- Stores Doc Details --> G;
F -- Generates Summaries --> G;
F -- Updates Embeddings --> I;
F --> J[End: Ingestion Complete];
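The flow in the graph above, where a dispatcher runs the Architect, Librarian, and Annotator agents in order against a shared semantic memory, can be sketched as a simple pipeline. Everything in this example is a simplified assumption: the agents are plain functions, the "analysis" is string matching, and plain dictionaries stand in for the entity store, knowledge graph, and vector store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class SemanticMemory:
    entities: Dict[str, Dict[str, str]] = field(default_factory=dict)  # entity store
    edges: List[Tuple[str, str, str]] = field(default_factory=list)    # knowledge graph
    chunks: Dict[str, str] = field(default_factory=dict)               # vector store stand-in


Agent = Callable[[Dict[str, str], SemanticMemory], None]


def architect(files: Dict[str, str], memory: SemanticMemory) -> None:
    """Record top-level Python functions and a 'contains' edge per function."""
    for path, text in files.items():
        if not path.endswith(".py"):
            continue
        for line in text.splitlines():
            if line.startswith("def "):
                name = line[4:].split("(")[0]
                memory.entities[name] = {"kind": "function", "file": path}
                memory.edges.append((path, "contains", name))


def librarian(files: Dict[str, str], memory: SemanticMemory) -> None:
    """Store documentation files as chunks and register them as entities."""
    for path, text in files.items():
        if path.endswith(".md"):
            memory.chunks[path] = text
            memory.entities[path] = {"kind": "doc", "file": path}


def annotator(files: Dict[str, str], memory: SemanticMemory) -> None:
    """Attach a placeholder summary; a real system would call an AI model here."""
    for entity in memory.entities.values():
        entity["summary"] = f"{entity['kind']} defined in {entity['file']}"


def dispatch(
    files: Dict[str, str], agents: Sequence[Agent] = (architect, librarian, annotator)
) -> SemanticMemory:
    """Run each agent in order against one shared memory object."""
    memory = SemanticMemory()
    for agent in agents:
        agent(files, memory)
    return memory
```

Because the annotator runs last, it sees every entity the earlier agents created, which mirrors the ordering in the graph.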
Query Processing Workflow
Trigger
- User submits a natural language question about the codebase
Process Flow
Query Understanding
- Analyzes the user's question
- Identifies key concepts and intents
- Determines relevant parts of the codebase to examine
Context Retrieval
- Searches the semantic memory for relevant code snippets
- Retrieves related documentation and examples
- Gathers contextual information about the code
Response Generation
- Formulates a comprehensive answer using AI
- Includes relevant code examples
- Provides additional context and suggestions
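Between context retrieval and response generation, the retrieved snippets have to be packed into a prompt of bounded size. The following sketch shows one plausible approach, assuming each snippet arrives with a relevance score; `build_prompt`, the character budget, and the prompt layout are hypothetical.

```python
from typing import List, Tuple


def build_prompt(question: str, snippets: List[Tuple[float, str]], max_chars: int = 2000) -> str:
    """Pack the highest-scoring snippets into the prompt until the budget is used."""
    context: List[str] = []
    used = 0
    for _score, text in sorted(snippets, key=lambda item: item[0], reverse=True):
        if used + len(text) > max_chars:
            continue  # skip snippets that would overflow the budget
        context.append(text)
        used += len(text)
    joined = "\n---\n".join(context) if context else "(no context found)"
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
```

A character budget is a crude proxy for a token limit; a production system would more likely count tokens with the model's own tokenizer.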
Deployment Architecture
┌─────────────────┐     ┌─────────────────────┐     ┌──────────────────┐
│                 │     │                     │     │                  │
│  User's Browser │◄───►│   FastAPI Backend   │◄───►│    AI Models     │
│                 │     │      (Python)       │     │  (Gemini, Groq)  │
└─────────────────┘     └──────────┬──────────┘     └──────────────────┘
                                   │
                                   ▼
                         ┌───────────────────┐
                         │                   │
                         │  Semantic Memory  │
                         │    (ChromaDB)     │
                         │                   │
                         └───────────────────┘
Data Flow
Ingestion Path
- GitHub Repo → FastAPI → Background Task → AI Processing → Semantic Memory
Query Path
- User Question → FastAPI → AI Model → Semantic Memory → Response Generation → User
graph TD
A[Start: User Question] --> B(Dispatcher Agent);
B -- Assembles Cognitive Context --> C[Query Planner Agent];
subgraph Cognitive Context
D[Episodic Memory - History];
E[Core Memory - Persona];
end
D --> B;
E --> B;
C -- Creates Plan --> F[Information Retriever Agent];
F -- Executes Plan --> G((Semantic Memory));
G -- Returns Data --> H[Synthesizer Agent];
H -- Generates Response --> I[End: Final Answer];
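The query graph above (plan, retrieve, synthesize, with history and persona supplied as context) can be sketched end to end in a few functions. This is a toy illustration under stated assumptions: all names are hypothetical, the planner is keyword extraction, retrieval is substring matching over a dictionary, and the synthesizer is a string template where a real system would call an AI model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CognitiveContext:
    history: List[str] = field(default_factory=list)    # episodic memory
    persona: str = "You are a helpful code assistant."  # core memory


STOP_WORDS = {"the", "a", "an", "is", "how", "does", "what", "where"}


def plan_query(question: str) -> List[str]:
    """Query planner: reduce the question to lowercase search terms."""
    words = [word.strip("?.,").lower() for word in question.split()]
    return [word for word in words if word and word not in STOP_WORDS]


def retrieve(plan: List[str], memory: Dict[str, str]) -> List[str]:
    """Information retriever: entries whose text mentions any planned term."""
    return [
        name for name, text in memory.items()
        if any(term in text.lower() for term in plan)
    ]


def synthesize(question: str, hits: List[str], context: CognitiveContext) -> str:
    """Synthesizer: a template here; an AI model call in a real system."""
    sources = ", ".join(hits) if hits else "nothing relevant"
    return f"{context.persona} Q: {question} Sources: {sources}"


def answer(question: str, memory: Dict[str, str], context: CognitiveContext) -> str:
    """Dispatcher: run the three stages, then record the question in history."""
    plan = plan_query(question)
    hits = retrieve(plan, memory)
    reply = synthesize(question, hits, context)
    context.history.append(question)
    return reply
```

Keeping each stage a separate function makes it straightforward to swap the keyword planner or the template synthesizer for model-backed versions without touching the rest of the flow.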