
🏗 System Architecture
Locentra OS is built as a modular LLM operating system, where inference, memory, feedback, and agents are decoupled but deeply integrated. Every subsystem can evolve independently—yet all contribute to a shared intelligence layer.
This isn’t just a chatbot backend. It’s an autonomous AI runtime.
🔧 Component Stack (Layered View)
```
+----------------------------+
|      Frontend (React)      |
|  • Vite + Tailwind UI      |
|  • Solana Wallet Auth      |
|  • Live Query Interface    |
+----------------------------+
              │
              ▼
+----------------------------+
|       FastAPI Backend      |
|  • REST API / Middleware   |
|  • Registry & Auth Layers  |
|  • Memory / Model Control  |
+----------------------------+
              │
              ▼
+----------------------------+
|    LLM Engine & Adapter    |
|  • HF Transformers Runtime |
|  • Inference + Fine-Tuning |
|  • Model Hot-Swapping      |
+----------------------------+
       │                 │
       ▼                 ▼
+-------------+  +----------------+
| Memory Core |  |  Agent Engine  |
| • Embeddings|  | • AutoTrainer  |
| • Vector DB |  | • FeedbackLoop |
| • Recall    |  | • Optimizer    |
+-------------+  +----------------+
       │                 │
       └────────┬────────┘
                ▼
    +-----------------------+
    |   Analytics & Logs    |
    | • Real-Time Feedback  |
    | • CLI + UI Visibility |
    +-----------------------+
```
🧠 Key System Components
1. Frontend (web/)
- Built with React, Vite, and Tailwind
- Connects directly to Solana wallets (Phantom, Backpack)
- Displays:
  - Model responses
  - Vector memory hits
  - Agent evaluations
  - Live training state
- The auth signature is passed to the backend for secure, token-gated access (see the verification sketch below)
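Since Solana wallets sign with ed25519, the backend can verify a wallet signature with standard tooling. A minimal sketch, assuming the request carries a base58 wallet address plus the signed message bytes; this is not Locentra's actual auth code:

```python
# Hedged sketch of server-side wallet-signature verification.
# Solana addresses are base58-encoded ed25519 public keys, so PyNaCl
# can check the signature. Field handling here is an assumption.
import base58
from nacl.exceptions import BadSignatureError
from nacl.signing import VerifyKey

def verify_wallet_signature(wallet_address: str, message: bytes,
                            signature: bytes) -> bool:
    """Return True if `signature` over `message` came from `wallet_address`."""
    try:
        VerifyKey(base58.b58decode(wallet_address)).verify(message, signature)
        return True
    except BadSignatureError:
        return False
```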
2. Backend (backend/)
- Built on FastAPI
- Exposes (see the request sketch below):
  - /api/llm/query
  - /api/llm/train
  - /api/system/logs
  - /api/user/create
- Manages:
  - Auth middleware
  - API keys & session scope
  - Core registry + configuration lifecycle
- Integrates:
  - CLI usage
  - Agent triggers
  - Semantic embedding + injection
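As a quick illustration, a call against the query route might look like this; the payload shape and auth header are assumptions, not the documented schema:

```python
# Illustrative client call to the query endpoint; request/response
# fields are assumed for the sketch, not taken from Locentra's schema.
import requests

resp = requests.post(
    "http://localhost:8000/api/llm/query",
    json={"prompt": "Summarize my last three queries"},
    headers={"Authorization": "Bearer <api-key>"},  # placeholder credential
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```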
3. Model Engine (models/)
- Powered by HuggingFace Transformers and SentenceTransformers
- Supports:
  - Falcon
  - Mistral
  - GPT-J
  - LLaMA
- Handles (see the loading sketch below):
  - On-device inference
  - Real-time fine-tuning
  - Adapter logic (e.g., adapter.py for LoRA/hybrid layers)
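A minimal sketch of what hot-swapping implies at the Transformers level, assuming models are addressed by their HuggingFace hub IDs; the function names are illustrative, not Locentra's API:

```python
# Hedged sketch: swap the active causal LM at runtime without a restart.
from transformers import AutoModelForCausalLM, AutoTokenizer

_active = {"name": None, "model": None, "tokenizer": None}

def load_model(name: str) -> None:
    """Load a new model only if it differs from the active one."""
    if _active["name"] == name:
        return
    _active["tokenizer"] = AutoTokenizer.from_pretrained(name)
    _active["model"] = AutoModelForCausalLM.from_pretrained(name)
    _active["name"] = name

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Run plain generation with the currently loaded model."""
    tok, model = _active["tokenizer"], _active["model"]
    inputs = tok(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(output[0], skip_special_tokens=True)
```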
🧬 Semantic Memory System
Found in: backend/data/, backend/db/
- Every prompt is embedded via a SentenceTransformer
- Stored as a vector in PostgreSQL alongside its metadata
- Top-K search via cosine similarity retrieves related history
- Matches are injected into the prompt context before generation
- Vectors can be queried, rewritten, scored, or tagged for training
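A minimal sketch of the recall path described above, using an in-memory array where Locentra persists vectors in PostgreSQL; the model name and helper names are illustrative:

```python
# Hedged sketch of embed → top-K cosine recall → context injection.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is illustrative

def top_k_matches(prompt: str, stored_vectors: np.ndarray,
                  stored_texts: list[str], k: int = 3) -> list[str]:
    """Return the k stored prompts most similar to the incoming one."""
    query = embedder.encode(prompt, normalize_embeddings=True)
    scores = stored_vectors @ query  # cosine similarity on normalized vectors
    best = np.argsort(scores)[::-1][:k]
    return [stored_texts[i] for i in best]

def build_context(prompt: str, matches: list[str]) -> str:
    """Inject matched history ahead of the user prompt."""
    history = "\n".join(f"- {m}" for m in matches)
    return f"Related history:\n{history}\n\nUser: {prompt}"
```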
🤖 Autonomous Agent System
Found in: backend/agents/
Locentra runs feedback-aware agents to self-correct and self-train:
- AutoTrainer: detects low-score prompts → triggers fine-tuning
- FeedbackLoop: logs user edits + re-queries → queues them for learning
- PromptOptimizer: rewrites confusing queries → enhances clarity

Agents operate asynchronously and are granted scoped access to:
- Vector memory
- Raw prompt history
- LLM inference pipeline
- Output evaluation metrics
- Registry and analytics
Custom agents can be added by subclassing BaseAgent (the import path below is illustrative):
```python
from backend.agents.base import BaseAgent  # illustrative import path

class MyAgent(BaseAgent):
    def run(self, prompt: str, response: str):
        # your logic here
        ...
```
⚙️ Infrastructure & Deployment
- Docker for stack orchestration
- NGINX for reverse proxy / TLS
- .env for runtime config (MODEL_NAME, TRAINING_EPOCHS, etc.); a minimal example follows this list
- Uvicorn + Gunicorn for production-grade ASGI execution
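A minimal .env sketch using the two variables named above; the values are examples, not defaults:

```
# Only these variable names appear in the docs; values are illustrative.
MODEL_NAME=mistralai/Mistral-7B-v0.1
TRAINING_EPOCHS=3
```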
Fully containerized:

```bash
docker-compose up --build
```
Locentra can run on bare metal, local dev, or Kubernetes-based infrastructure.
📂 Directory Highlights
| Directory | Contents |
| --- | --- |
| backend/api/ | FastAPI routes |
| backend/models/ | Model adapter, trainer, infer, loader |
| backend/agents/ | Agent logic and lifecycle |
| backend/data/ | Embedding + tokenizer logic |
| backend/services/ | User service, memory handler, analytics |
| backend/db/ | Schema, ORM, SQLAlchemy interface |
| web/ | Frontend React app |
| cli/ | Developer tools (training, querying) |
🔄 System Lifecycle
1. User sends a prompt via the frontend
2. Backend receives it → embeds it → checks memory
3. Semantically similar prompts are retrieved
4. Context is injected before the model call
5. Model generates a response
6. Response is logged, scored, and evaluated
7. An agent may retrain the model or rewrite the prompt
8. Vector memory is updated
9. Final result is returned to the user
This is LLM orchestration at runtime—autonomous, memory-aware, and open.
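The same flow as straight-line code; every callable below is illustrative shorthand for the subsystems described above, wired in as dependencies:

```python
# Hedged sketch of the runtime lifecycle; all callables are stand-ins.
from typing import Callable

def handle_prompt(
    prompt: str,
    embed: Callable[[str], list[float]],             # step 2
    recall: Callable[[list[float]], list[str]],      # step 3
    generate: Callable[[str], str],                  # step 5
    evaluate: Callable[[str, str], float],           # step 6
    dispatch: Callable[[str, str, float], None],     # step 7 (agents)
    store: Callable[[list[float], str, str], None],  # step 8
) -> str:
    vector = embed(prompt)
    matches = recall(vector)
    context = "\n".join(matches + [prompt])          # step 4: inject context
    response = generate(context)
    score = evaluate(prompt, response)
    dispatch(prompt, response, score)
    store(vector, prompt, response)
    return response                                  # step 9
```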