Atlas Metis: RAG Engine
Platform Status Brief
March 31, 2026
Confidential
01 — Overview
Platform Status

Atlas Metis is a production-deployed RAG-as-a-Service platform. Backend API and dashboard are live on Vercel with the first client tenant (SparkBot.App) operational. The full pipeline — document ingestion, hybrid search with Cohere reranking, GPT-4o-mini generation with source citations, and faithfulness scoring — is verified end-to-end. Client onboarding is invite-only with admin-controlled provisioning. All original P1 issues are resolved. The platform is ready for client acquisition.

62
API Routes
63
Automated Tests
1
Live Client Tenant
86%
Target Margin

System Status

02 — Operations
How a Client Gets Onboarded
Current onboarding time: < 5 minutes from tenant creation to first query.
03 — Current State
Operational Systems
Single File Ingestion
Upload PDF, DOCX, TXT, CSV, audio, images (with Gemini) — auto-process to searchable vectors
Hybrid Search
Semantic + keyword search combined via Reciprocal Rank Fusion
Cohere Reranking
Cross-encoder reranking boosts raw scores from ~0.016 to ~0.97
LLM Generation
GPT-4o-mini generates grounded answers with source citations
Self-RAG Validation
Validates chunk relevance before generating — catches hallucinations
SSE Streaming
Real-time streaming responses via Server-Sent Events
Multi-Tenant Isolation
Verified — Tenant A cannot see Tenant B’s data
Dual API Key Auth
Tenant keys (scoped) + Admin keys (full access) with bcrypt
Job Tracking
Every ingestion tracked with status, timing, and error capture
Health Diagnostics
7 automated checks per tenant with alert creation and resolution
Admin Dashboard
Fleet overview, alert center, metrics bar, diagnostics trigger
Usage Tracking
Queries, tokens, rerank units tracked per tenant per period
Cost Tracking
Real pricing from OpenAI ($0.13/1M tokens), Cohere, and Gemini ($0.15/1M tokens)
Rate Limiting
Sliding window rate limiter enforced per API key
File Size Limits
100MB upload cap prevents memory exhaustion on large files
Gemini Embeddings
Multimodal: images, video, audio natively embedded via Gemini Embeddings 2
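The Reciprocal Rank Fusion step behind Hybrid Search can be sketched in a few lines. This is an illustrative minimal version, not the platform's actual implementation; the function name and the k=60 default are assumptions (k=60 is the constant commonly used in the RRF literature):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank_of_d)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["a", "b", "c"]   # vector-search ranking
keyword  = ["b", "c", "d"]   # keyword-search ranking
fused = reciprocal_rank_fusion([semantic, keyword])
```

Documents ranked highly by both retrievers accumulate the largest fused scores, which is why RRF needs no score normalization between the semantic and keyword sides.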
04 — Issues
Issues by Priority
RESOLVED Completed This Sprint (March 29)
Feature  Resolution
Backend Deployment Deployed to Vercel Python (atlas-metis-api.vercel.app)
OAuth Token Refresh Auto-refresh on 401 for Google Drive + Dropbox, persists new tokens
Gemini E2E Test Full pipeline verified — ingest, search, rerank, generate (score 0.952)
HyDE Fallback Implemented in retrieval.py, per-tenant toggle via use_hyde config
Test Suite 63 automated tests — chunking, context, auth, cache, rate limiting
Client Onboarding Request Access flow + admin approve + auto-provisioning
API Key Security Masked in settings UI, proxy is server-side only
Middleware Hardening Public routes skip Supabase auth — prevents 504 on outage
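The refresh-on-401 behavior above follows a standard pattern; a minimal sketch, with `fetch`, `refresh_fn`, and `persist_fn` as hypothetical stand-ins for the real HTTP call, the provider's token-refresh exchange, and token storage:

```python
def with_token_refresh(fetch, access_token, refresh_fn, persist_fn):
    """Call fetch(token); on a 401, refresh once, persist the new token, retry."""
    status, body = fetch(access_token)
    if status == 401:
        access_token = refresh_fn()   # exchange refresh token for a new access token
        persist_fn(access_token)      # persist so future requests skip the retry
        status, body = fetch(access_token)
    return status, body
```

Refreshing at most once per call avoids retry loops when the refresh token itself has been revoked.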
P2 Remaining (Non-Blocking)
Feature  Status
Additional Connectors 8 of 11 types not yet implemented (SharePoint, S3, Notion, etc.)
Per-Tenant Portal Customization Branding, persona, suggested questions — planned
Stripe Billing Integration Plans defined, integration not yet built
P&L Admin Dashboard Revenue vs cost tracking per tenant — planned
All P0s and P1s resolved. Zero blocking issues remain. The platform is client-ready for managed onboarding. P2 items are feature enhancements for scale.
05 — Product Architecture
Three-Tier Dashboard Architecture

Atlas Metis serves three distinct user tiers, each with a purpose-built interface, authentication model, and API layer.

Access Levels

Layer  Access  Auth  URL Pattern
Master Admin Atlas Minds team Admin API key /admin
Org Admin Client admins Supabase Auth /dashboard
End User Anyone with link None (public) /portal/{org-slug}
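The access-level table maps URL prefixes to auth requirements; a minimal sketch of that routing logic (names and rules are illustrative, not the actual middleware):

```python
def resolve_tier(path, api_key=None, session=False):
    """Map a request to one of the three access tiers per the table above."""
    if path.startswith("/admin"):
        if api_key is None:
            raise PermissionError("admin API key required")
        return "master_admin"
    if path.startswith("/dashboard"):
        if not session:
            raise PermissionError("Supabase session required")
        return "org_admin"
    if path.startswith("/portal/"):
        return "end_user"   # public: anyone with the link, no auth
    raise LookupError(path)
```

Keeping the public portal routes outside both auth checks is what lets end users query without any login.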
Master Admin (Atlas Minds)
Fleet Health
Cross-Tenant Analytics
Revenue & Cost Tracking
Org Comparison
Diagnostics & Alerts
Org Admin (Client)
Document Management
Collection Management
Query Playground
Usage & Billing
Settings & API Keys
End User Portal
Chat Interface
Streaming Responses
Source Citations
Per-Org Branding
No Login Required

Data Flow by Tier

End User → Portal API (public, no auth required) → Backend → Supabase
Org Admin → Customer API (authenticated via Supabase Auth) → Backend → Supabase
Master Admin → Admin API (Admin key, full access) → Backend → Supabase
06 — Infrastructure
Technical Architecture
External: Client App
Gateway: FastAPI (55 routes • async)
Security: Auth Layer (dual key • bcrypt)
Ingestion Pipeline: Parse → Chunk → Embed → Store
Retrieval Pipeline: Search → Rerank → Validate → Generate
Diagnostics: 7 Health Checks → Alerts → Auto-Resolve
Database: Supabase (pgvector • RLS)
Embeddings + LLM: OpenAI + Gemini (multi-provider • 3072d • multimodal)
Reranking: Cohere (Rerank v3.5)
Task Queue: Redis / Celery (not running)
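The Retrieval Pipeline stages (Search → Rerank → Validate → Generate) compose as a simple function; a sketch with the four stages as injected callables (all names are stand-ins, not the platform's code):

```python
def answer_query(query, search, rerank, validate, generate, top_k=5):
    """Retrieval pipeline: Search -> Rerank -> Validate -> Generate."""
    candidates = search(query)                             # hybrid search (semantic + keyword)
    ranked = rerank(query, candidates)[:top_k]             # cross-encoder reranking
    relevant = [c for c in ranked if validate(query, c)]   # Self-RAG relevance check
    if not relevant:
        return "No supporting sources found."              # refuse rather than hallucinate
    return generate(query, relevant)                       # grounded answer with citations
```

Filtering on the validation step before generation is what lets the pipeline decline to answer instead of hallucinating when no chunk is actually relevant.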
07 — Roadmap
Path to Production
A
Fix P0s — Critical Dead Code
✓ Completed 2026-03-05
  • URL ingestion, batch upload, connector sync — all wired
  • Celery sync fallbacks added for all dependent endpoints
B
Fix P1s + Multi-Provider Embeddings
✓ Completed 2026-03-11
  • Cost tracking, collection counters, concept clustering, auth caching, query cache
  • Multi-provider embeddings (OpenAI + Gemini) with multimodal support
  • Rate limiting, file size limits, faithfulness checks, auto-slug
C
Three-Tier Dashboard + Analytics
✓ Completed 2026-03-11
  • End User Portal — public chat interface per organization with streaming and citations
  • Master Admin analytics — cross-tenant usage, cost trends, org comparison
  • Org Admin dashboard — document/collection management, query playground, billing
D
Production Deployment + Client Onboarding
✓ Completed 2026-03-29
  • Backend deployed to Vercel Python (atlas-metis-api.vercel.app)
  • OAuth token refresh, HyDE search, 63 automated tests, middleware hardening
  • Invite-only onboarding: Request Access → Admin Approve → auto-provision
  • First client tenant live (SparkBot.App) with verified E2E pipeline
E
Scale & Monetize
In Progress
  • Per-tenant portal customization (branding, persona, suggested questions)
  • Stripe billing integration with tiered plans
  • P&L admin dashboard (revenue vs cost per tenant)
  • Local GPU hosting option (hybrid cloud/local failover)
  • Additional connectors (SharePoint, S3, Notion)
Platform is production-ready and serving clients
08 — Pricing
Service Plans

Four tiers designed for coaching organizations, training platforms, and knowledge-intensive businesses. Pricing reflects the value of a 24/7 AI assistant trained on proprietary content — not just API pass-through costs.

Plan Price Users Queries/mo Documents Our Cost Margin
Starter $199/mo 1–25 750 100 $12.45 93.7%
Growth $499/mo 25–100 2,500 500 $42.25 91.5%
Pro $999/mo 100–500 7,500 1,500 $126.75 87.3%
Enterprise $2,499/mo 500+ 25,000 5,000 $422.50 83.1%

Cross-Subsidy Model

Each tier’s revenue covers the operating cost of the next tier up. This ensures profitability at every level regardless of client mix.

Starter → Growth
4.7x
$199 rev covers $42 cost
Growth → Pro
3.9x
$499 rev covers $127 cost
Pro → Enterprise
2.4x
$999 rev covers $423 cost
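The coverage multiples above follow directly from the pricing table; a quick check in Python:

```python
# (tier, monthly revenue, our monthly cost) from the pricing table above
tiers = [("Starter", 199, 12.45), ("Growth", 499, 42.25),
         ("Pro", 999, 126.75), ("Enterprise", 2499, 422.50)]

# Each tier's revenue divided by the NEXT tier's operating cost
coverage = {
    f"{lo} -> {hi}": round(rev_lo / cost_hi, 1)
    for (lo, rev_lo, _), (hi, _, cost_hi) in zip(tiers, tiers[1:])
}
```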

Overage

$0.05 per query beyond the plan limit. Clients are notified at 80% usage. No plan includes unlimited usage — every tier has defined caps to protect margins.
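A minimal sketch of the overage and notification rules as stated ($0.05 per extra query, alert at 80% of the cap); function names are illustrative:

```python
def monthly_bill(plan_price, included_queries, used_queries, overage_rate=0.05):
    """Plan price plus $0.05 per query beyond the included cap."""
    overage = max(0, used_queries - included_queries) * overage_rate
    return plan_price + overage

def should_notify(included_queries, used_queries, threshold=0.80):
    """Clients are notified once usage reaches 80% of the cap."""
    return used_queries >= threshold * included_queries
```

A Starter client at 900 of 750 included queries, for example, pays $199 + 150 × $0.05 = $206.50.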

Pricing validated through competitive market research (March 2026). These figures reflect real API costs from production usage data.
09 — Infrastructure
Cloud vs Self-Hosted

Two deployment models with automatic failover between them. Self-hosted eliminates roughly 80% of per-query cost by running embedding, reranking, and transcription on local GPU hardware.

Cloud Model (Current)

Component Service Monthly Cost
Backend API Vercel Fluid Compute $20
Database + Vectors Supabase Pro (pgvector) $25
Embeddings OpenAI text-embedding-3-large Variable
Reranking Cohere Rerank v3.5 Variable (49% of API cost)
LLM Generation GPT-4o-mini Variable
Per-Query Cost $0.016

Self-Hosted Model (Planned)

Hardware: Lenovo Legion T5 — Intel Ultra 7 265F, RTX 5060 Ti (16GB VRAM), 64GB DDR5, 2TB SSD

Component Local Alternative Monthly Cost
Embeddings Local model on RTX 5060 Ti $0
Reranking BGE-Reranker on GPU $0
LLM Generation GPT-4o-mini (cloud, quality) Variable
Transcription Faster-Whisper on GPU $0
Per-Query Cost (electricity only for local components) $0.003
Hybrid Failover

Local GPU handles embedding + reranking. If internet/power goes out, automatic failover to cloud APIs. Clients never experience an outage — queries just cost slightly more that day.
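The failover behavior described above reduces to a try-local-then-cloud wrapper; a minimal sketch with `local_embed` and `cloud_embed` as hypothetical callables:

```python
def embed_with_failover(texts, local_embed, cloud_embed):
    """Prefer the local GPU; on any local failure, fall back to cloud APIs."""
    try:
        return local_embed(texts), "local"
    except Exception:
        # Local box unreachable (power/internet outage) -> pay for cloud that day
        return cloud_embed(texts), "cloud"
```

From the client's perspective the result is identical either way; only the provider label (and the per-query cost) changes.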

Cost Reduction

Self-hosting reduces per-query cost from $0.016 to $0.003 — an 81% reduction. At 150K queries/month (50 orgs), this saves ~$1,950/month.
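The savings arithmetic checks out:

```python
cloud_per_query = 0.016    # current cloud per-query cost
local_per_query = 0.003    # self-hosted per-query cost (electricity only)
monthly_queries = 150_000  # ~50 orgs

monthly_savings = (cloud_per_query - local_per_query) * monthly_queries  # ~$1,950
cost_reduction = 1 - local_per_query / cloud_per_query                   # ~81%
```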

10 — Revenue
Revenue & Profit Projections

Cloud-Only Model

Per-query cost: $0.016 — Fixed infrastructure: $45/mo (Vercel Pro + Supabase Pro)

Scenario Revenue/mo Cost/mo Profit/mo Annual Profit Margin
10 clients (all Starter) $1,990 $170 $1,820 $21,840 91%
40 clients (10 per tier) $41,960 $5,765 $36,195 $434,340 86%
50 clients (realistic mix) $31,450 $3,785 $27,665 $331,980 88%

Hybrid Model (Self-Hosted GPU + Cloud Failover)

Per-query cost: $0.003 — Fixed: $45/mo infra + ~$50/mo electricity

Scenario Revenue/mo Cost/mo Profit/mo Annual Profit Margin
10 clients (all Starter) $1,990 $118 $1,872 $22,464 94%
40 clients (10 per tier) $41,960 $1,168 $40,792 $489,504 97%
50 clients (realistic mix) $31,450 $796 $30,654 $367,848 97%
Cloud @ 50 Clients
$332K
annual profit — 88% margin
Hybrid @ 50 Clients
$368K
annual profit — 97% margin
Hybrid @ 40 (Even Split)
$490K
annual profit — 97% margin
11 — Infrastructure
Local vs Cloud Inference

Atlas Metis supports hybrid inference — each tenant can independently select local (self-hosted) or cloud providers for embeddings, LLM generation, and reranking. Local models run on a dedicated GPU server (NVIDIA RTX 5060 Ti, 64GB RAM) with automatic cloud fallback.
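Per-tenant provider selection can be as simple as a config lookup per component; a sketch with hypothetical tenant names and config keys:

```python
# Hypothetical per-tenant provider config; keys and values are illustrative.
TENANT_PROVIDERS = {
    "sparkbot": {"embeddings": "local", "llm": "cloud", "rerank": "local"},
    "default":  {"embeddings": "cloud", "llm": "cloud", "rerank": "cloud"},
}

def provider_for(tenant, component):
    """Resolve local vs cloud for one component, falling back to defaults."""
    cfg = TENANT_PROVIDERS.get(tenant, TENANT_PROVIDERS["default"])
    return cfg.get(component, "cloud")
```

Defaulting unknown tenants and components to cloud keeps behavior safe when a new tenant is provisioned before its config exists.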

Metric  Cloud  Local (Self-Hosted)  Edge
Embeddings
Model OpenAI text-embedding-3-large Nomic Embed Text v1.5
Dimensions 3,072 768 Marginal
Speed 100–300ms 10–20ms Local
Cost $0.13 / 1M tokens $0.00 Local
LLM Generation
Model GPT-4o-mini Qwen 2.5 7B (Q4)
Quality (simple Q&A) Excellent Good Slight cloud
Quality (complex) Excellent Decent Cloud
Hallucination resistance Strong Moderate Cloud
Speed (first token) 500–1,500ms 50–100ms Local
Generation speed 30–50 tok/s 40–60 tok/s Tie
Cost $0.15–$0.60 / 1M tokens $0.00 Local
Reranking
Model Cohere rerank-v3.5 ms-marco-MiniLM-L-6-v2
Accuracy Best in class Very good Slight cloud
Speed 200–500ms 5–20ms Local
Cost $0.002 / document $0.00 Local
Overall
Per-query cost ~$0.041 $0.00 Local
Response latency 2–5 sec 1–4 sec Local
Quality ceiling Higher 85–90% of cloud Cloud
Uptime 99.9% Machine-dependent Cloud
Hardware
RTX 5060 Ti • 64GB DDR5
Intel Core Ultra 7 265F — 20 cores
Hybrid Advantage
Local default + cloud fallback
Per-tenant provider selection — zero marginal cost at scale
Built by Atlas Minds
atlas-minds.com
March 31, 2026