
🏗 System Architecture
Locentra OS is built as a modular LLM operating system, where inference, memory, feedback, and agents are decoupled but deeply integrated. Every subsystem can evolve independently—yet all contribute to a shared intelligence layer.
This isn’t just a chatbot backend. It’s an autonomous AI runtime.
🔧 Component Stack (Layered View)
```
+----------------------------+
|      Frontend (React)      |
|  • Vite + Tailwind UI      |
|  • Solana Wallet Auth      |
|  • Live Query Interface    |
+----------------------------+
              │
              ▼
+----------------------------+
|       FastAPI Backend      |
|  • REST API / Middleware   |
|  • Registry & Auth Layers  |
|  • Memory / Model Control  |
+----------------------------+
              │
              ▼
+----------------------------+
|    LLM Engine & Adapter    |
|  • HF Transformers Runtime |
|  • Inference + Fine-Tuning |
|  • Model Hot-Swapping      |
+----------------------------+
       │                 │
       ▼                 ▼
+-------------+  +----------------+
| Memory Core |  |  Agent Engine  |
| • Embeddings|  | • AutoTrainer  |
| • Vector DB |  | • FeedbackLoop |
| • Recall    |  | • Optimizer    |
+-------------+  +----------------+
       │                 │
       └────────┬────────┘
                ▼
    +-----------------------+
    |   Analytics & Logs    |
    | • Real-Time Feedback  |
    | • CLI + UI Visibility |
    +-----------------------+
```
🧠 Key System Components
1. Frontend (web/)
- Built with React, Vite, and Tailwind
- Connects directly to Solana wallets (Phantom, Backpack)
- Displays:
  - Model responses
  - Vector memory hits
  - Agent evaluations
  - Live training state
- The auth signature is passed to the backend for secure, token-gated access (see the verification sketch below)
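Since Solana wallets sign with ed25519, the backend can verify a wallet signature with standard tooling. A minimal sketch, assuming the request carries a base58 wallet address plus the signed message bytes; this is not Locentra's actual auth code:

```python
# Hedged sketch of server-side wallet-signature verification.
# Solana addresses are base58-encoded ed25519 public keys, so PyNaCl
# can check the signature. Field handling here is an assumption.
import base58
from nacl.exceptions import BadSignatureError
from nacl.signing import VerifyKey

def verify_wallet_signature(wallet_address: str, message: bytes,
                            signature: bytes) -> bool:
    """Return True if `signature` over `message` came from `wallet_address`."""
    try:
        VerifyKey(base58.b58decode(wallet_address)).verify(message, signature)
        return True
    except BadSignatureError:
        return False
```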
2. Backend (backend/)
- Built on FastAPI
- Exposes (see the request sketch below):
  - /api/llm/query
  - /api/llm/train
  - /api/system/logs
  - /api/user/create
- Manages:
  - Auth middleware
  - API keys & session scope
  - Core registry + configuration lifecycle
- Integrates:
  - CLI usage
  - Agent triggers
  - Semantic embedding + injection
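As a quick illustration, a call against the query route might look like this; the payload shape and auth header are assumptions, not the documented schema:

```python
# Illustrative client call to the query endpoint; request/response
# fields are assumed for the sketch, not taken from Locentra's schema.
import requests

resp = requests.post(
    "http://localhost:8000/api/llm/query",
    json={"prompt": "Summarize my last three queries"},
    headers={"Authorization": "Bearer <api-key>"},  # placeholder credential
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```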
3. Model Engine (models/)
- Powered by HuggingFace Transformers and SentenceTransformers
- Supports:
  - Falcon
  - Mistral
  - GPT-J
  - LLaMA
- Handles (see the loading sketch below):
  - On-device inference
  - Real-time fine-tuning
  - Adapter logic (e.g., adapter.py for LoRA/hybrid layers)
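A minimal sketch of what hot-swapping implies at the Transformers level, assuming models are addressed by their HuggingFace hub IDs; the function names are illustrative, not Locentra's API:

```python
# Hedged sketch: swap the active causal LM at runtime without a restart.
from transformers import AutoModelForCausalLM, AutoTokenizer

_active = {"name": None, "model": None, "tokenizer": None}

def load_model(name: str) -> None:
    """Load a new model only if it differs from the active one."""
    if _active["name"] == name:
        return
    _active["tokenizer"] = AutoTokenizer.from_pretrained(name)
    _active["model"] = AutoModelForCausalLM.from_pretrained(name)
    _active["name"] = name

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Run plain generation with the currently loaded model."""
    tok, model = _active["tokenizer"], _active["model"]
    inputs = tok(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(output[0], skip_special_tokens=True)
```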
🧬 Semantic Memory System
Found in: backend/data/, backend/db/
- Every prompt is embedded via a SentenceTransformer
- Stored as a vector in PostgreSQL alongside its metadata
- Top-K search via cosine similarity retrieves related history
- Matches are injected into the prompt context before generation
- Vectors can be queried, rewritten, scored, or tagged for training
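A minimal sketch of the recall path described above, using an in-memory array where Locentra persists vectors in PostgreSQL; the model name and helper names are illustrative:

```python
# Hedged sketch of embed → top-K cosine recall → context injection.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is illustrative

def top_k_matches(prompt: str, stored_vectors: np.ndarray,
                  stored_texts: list[str], k: int = 3) -> list[str]:
    """Return the k stored prompts most similar to the incoming one."""
    query = embedder.encode(prompt, normalize_embeddings=True)
    scores = stored_vectors @ query  # cosine similarity on normalized vectors
    best = np.argsort(scores)[::-1][:k]
    return [stored_texts[i] for i in best]

def build_context(prompt: str, matches: list[str]) -> str:
    """Inject matched history ahead of the user prompt."""
    history = "\n".join(f"- {m}" for m in matches)
    return f"Related history:\n{history}\n\nUser: {prompt}"
```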
🤖 Autonomous Agent System
Found in: backend/agents/
Locentra runs feedback-aware agents to self-correct and self-train:
- AutoTrainer: detects low-score prompts → triggers fine-tuning
- FeedbackLoop: logs user edits + re-queries → queues them for learning
- PromptOptimizer: rewrites confusing queries → enhances clarity

Agents operate asynchronously and are granted scoped access to:
- Vector memory
- Raw prompt history
- LLM inference pipeline
- Output evaluation metrics
- Registry and analytics
Custom agents can be added by subclassing BaseAgent (the import path below is illustrative):
```python
from backend.agents.base import BaseAgent  # illustrative import path

class MyAgent(BaseAgent):
    def run(self, prompt: str, response: str):
        # your logic here
        ...
```
⚙️ Infrastructure & Deployment
- Docker for stack orchestration
- NGINX for reverse proxy / TLS
- .env for runtime config (MODEL_NAME, TRAINING_EPOCHS, etc.); a minimal example follows this list
- Uvicorn + Gunicorn for production-grade ASGI execution
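A minimal .env sketch using the two variables named above; the values are examples, not defaults:

```
# Only these variable names appear in the docs; values are illustrative.
MODEL_NAME=mistralai/Mistral-7B-v0.1
TRAINING_EPOCHS=3
```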
Fully containerized:

```bash
docker-compose up --build
```
Locentra can run on bare metal, local dev, or Kubernetes-based infrastructure.
📂 Directory Highlights
| Directory | Contents |
| --- | --- |
| backend/api/ | FastAPI routes |
| backend/models/ | Model adapter, trainer, infer, loader |
| backend/agents/ | Agent logic and lifecycle |
| backend/data/ | Embedding + tokenizer logic |
| backend/services/ | User service, memory handler, analytics |
| backend/db/ | Schema, ORM, SQLAlchemy interface |
| web/ | Frontend React app |
| cli/ | Developer tools (training, querying) |
🔄 System Lifecycle
1. User sends a prompt via the frontend
2. Backend receives it → embeds it → checks memory
3. Semantically similar prompts are retrieved
4. Context is injected before the model call
5. Model generates a response
6. Response is logged, scored, and evaluated
7. An agent may retrain the model or rewrite the prompt
8. Vector memory is updated
9. Final result is returned to the user
This is LLM orchestration at runtime—autonomous, memory-aware, and open.
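The same flow as straight-line code; every callable below is illustrative shorthand for the subsystems described above, wired in as dependencies:

```python
# Hedged sketch of the runtime lifecycle; all callables are stand-ins.
from typing import Callable

def handle_prompt(
    prompt: str,
    embed: Callable[[str], list[float]],             # step 2
    recall: Callable[[list[float]], list[str]],      # step 3
    generate: Callable[[str], str],                  # step 5
    evaluate: Callable[[str, str], float],           # step 6
    dispatch: Callable[[str, str, float], None],     # step 7 (agents)
    store: Callable[[list[float], str, str], None],  # step 8
) -> str:
    vector = embed(prompt)
    matches = recall(vector)
    context = "\n".join(matches + [prompt])          # step 4: inject context
    response = generate(context)
    score = evaluate(prompt, response)
    dispatch(prompt, response, score)
    store(vector, prompt, response)
    return response                                  # step 9
```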