Atlas Metis: RAG Engine
Platform Status Brief
March 31, 2026
Confidential
01 — Overview
Platform Status

Atlas Metis is a production-deployed RAG-as-a-Service platform. Backend API and dashboard are live on Vercel with the first client tenant (SparkBot.App) operational. The full pipeline — document ingestion, hybrid search with Cohere reranking, GPT-4o-mini generation with source citations, and faithfulness scoring — is verified end-to-end. Client onboarding is invite-only with admin-controlled provisioning. All original P1 issues are resolved. The platform is ready for client acquisition.

62
API Routes
63
Automated Tests
1
Live Client Tenant
86%
Target Margin

System Status

02 — Operations
How a Client Gets Onboarded
Current onboarding time: < 5 minutes from tenant creation to first query.
03 — Current State
Operational Systems
Single File Ingestion
Upload PDF, DOCX, TXT, CSV, audio, images (with Gemini) — auto-process to searchable vectors
Hybrid Search
Semantic + keyword search combined via Reciprocal Rank Fusion
Cohere Reranking
Cross-encoder reranking boosts raw scores from ~0.016 to ~0.97
LLM Generation
GPT-4o-mini generates grounded answers with source citations
Self-RAG Validation
Validates chunk relevance before generating — catches hallucinations
SSE Streaming
Real-time streaming responses via Server-Sent Events
Multi-Tenant Isolation
Verified — Tenant A cannot see Tenant B’s data
Dual API Key Auth
Tenant keys (scoped) + Admin keys (full access) with bcrypt
Job Tracking
Every ingestion tracked with status, timing, and error capture
Health Diagnostics
7 automated checks per tenant with alert creation and resolution
Admin Dashboard
Fleet overview, alert center, metrics bar, diagnostics trigger
Usage Tracking
Queries, tokens, rerank units tracked per tenant per period
Cost Tracking
Real pricing from OpenAI ($0.13/1M tokens), Cohere, and Gemini ($0.15/1M tokens)
Rate Limiting
Sliding window rate limiter enforced per API key
File Size Limits
100MB upload cap prevents memory exhaustion on large files
Gemini Embeddings
Multimodal: images, video, audio natively embedded via Gemini Embeddings 2
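The Reciprocal Rank Fusion step behind Hybrid Search can be sketched in a few lines. This is an illustrative minimal version, not the platform's actual implementation; the function name and the k=60 default are assumptions (k=60 is the constant commonly used in the RRF literature):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank_of_d)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["a", "b", "c"]   # vector-search ranking
keyword  = ["b", "c", "d"]   # keyword-search ranking
fused = reciprocal_rank_fusion([semantic, keyword])
```

Documents ranked highly by both retrievers accumulate the largest fused scores, which is why RRF needs no score normalization between the semantic and keyword sides.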
04 — Issues
Issues by Priority
RESOLVED Completed This Sprint (March 29)
Feature  Resolution
Backend Deployment Deployed to Vercel Python (atlas-metis-api.vercel.app)
OAuth Token Refresh Auto-refresh on 401 for Google Drive + Dropbox, persists new tokens
Gemini E2E Test Full pipeline verified — ingest, search, rerank, generate (score 0.952)
HyDE Fallback Implemented in retrieval.py, per-tenant toggle via use_hyde config
Test Suite 63 automated tests — chunking, context, auth, cache, rate limiting
Client Onboarding Request Access flow + admin approve + auto-provisioning
API Key Security Masked in settings UI, proxy is server-side only
Middleware Hardening Public routes skip Supabase auth — prevents 504 on outage
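The refresh-on-401 behavior above follows a standard pattern; a minimal sketch, with `fetch`, `refresh_fn`, and `persist_fn` as hypothetical stand-ins for the real HTTP call, the provider's token-refresh exchange, and token storage:

```python
def with_token_refresh(fetch, access_token, refresh_fn, persist_fn):
    """Call fetch(token); on a 401, refresh once, persist the new token, retry."""
    status, body = fetch(access_token)
    if status == 401:
        access_token = refresh_fn()   # exchange refresh token for a new access token
        persist_fn(access_token)      # persist so future requests skip the retry
        status, body = fetch(access_token)
    return status, body
```

Refreshing at most once per call avoids retry loops when the refresh token itself has been revoked.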
P2 Remaining (Non-Blocking)
Feature  Status
Additional Connectors 8 of 11 types not yet implemented (SharePoint, S3, Notion, etc.)
Per-Tenant Portal Customization Branding, persona, suggested questions — planned
Stripe Billing Integration Plans defined, integration not yet built
P&L Admin Dashboard Revenue vs cost tracking per tenant — planned
All P0s and P1s resolved. Zero blocking issues remain. The platform is client-ready for managed onboarding. P2 items are feature enhancements for scale.
05 — Product Architecture
Three-Tier Dashboard Architecture

Atlas Metis serves three distinct user tiers, each with a purpose-built interface, authentication model, and API layer.

Access Levels

Layer  Access  Auth  URL Pattern
Master Admin Atlas Minds team Admin API key /admin
Org Admin Client admins Supabase Auth /dashboard
End User Anyone with link None (public) /portal/{org-slug}
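The access-level table maps URL prefixes to auth requirements; a minimal sketch of that routing logic (names and rules are illustrative, not the actual middleware):

```python
def resolve_tier(path, api_key=None, session=False):
    """Map a request to one of the three access tiers per the table above."""
    if path.startswith("/admin"):
        if api_key is None:
            raise PermissionError("admin API key required")
        return "master_admin"
    if path.startswith("/dashboard"):
        if not session:
            raise PermissionError("Supabase session required")
        return "org_admin"
    if path.startswith("/portal/"):
        return "end_user"   # public: anyone with the link, no auth
    raise LookupError(path)
```

Keeping the public portal routes outside both auth checks is what lets end users query without any login.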
Master Admin (Atlas Minds)
Fleet Health
Cross-Tenant Analytics
Revenue & Cost Tracking
Org Comparison
Diagnostics & Alerts
Org Admin (Client)
Document Management
Collection Management
Query Playground
Usage & Billing
Settings & API Keys
End User Portal
Chat Interface
Streaming Responses
Source Citations
Per-Org Branding
No Login Required

Data Flow by Tier

End User → Portal API (public, no auth required) → Backend → Supabase
Org Admin → Customer API (authenticated via Supabase Auth) → Backend → Supabase
Master Admin → Admin API (Admin key, full access) → Backend → Supabase
06 — Infrastructure
Technical Architecture
External: Client App
Gateway: FastAPI (55 routes • async)
Security: Auth Layer (dual key • bcrypt)
Ingestion Pipeline: Parse → Chunk → Embed → Store
Retrieval Pipeline: Search → Rerank → Validate → Generate
Diagnostics: 7 Health Checks → Alerts → Auto-Resolve
Database: Supabase (pgvector • RLS)
Embeddings + LLM: OpenAI + Gemini (multi-provider • 3072d • multimodal)
Reranking: Cohere (Rerank v3.5)
Task Queue: Redis / Celery (not running)
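The Retrieval Pipeline stages (Search → Rerank → Validate → Generate) compose as a simple function; a sketch with the four stages as injected callables (all names are stand-ins, not the platform's code):

```python
def answer_query(query, search, rerank, validate, generate, top_k=5):
    """Retrieval pipeline: Search -> Rerank -> Validate -> Generate."""
    candidates = search(query)                             # hybrid search (semantic + keyword)
    ranked = rerank(query, candidates)[:top_k]             # cross-encoder reranking
    relevant = [c for c in ranked if validate(query, c)]   # Self-RAG relevance check
    if not relevant:
        return "No supporting sources found."              # refuse rather than hallucinate
    return generate(query, relevant)                       # grounded answer with citations
```

Filtering on the validation step before generation is what lets the pipeline decline to answer instead of hallucinating when no chunk is actually relevant.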
07 — Roadmap
Path to Production
A
Fix P0s — Critical Dead Code
✓ Completed 2026-03-05
  • URL ingestion, batch upload, connector sync — all wired
  • Celery sync fallbacks added for all dependent endpoints
B
Fix P1s + Multi-Provider Embeddings
✓ Completed 2026-03-11
  • Cost tracking, collection counters, concept clustering, auth caching, query cache
  • Multi-provider embeddings (OpenAI + Gemini) with multimodal support
  • Rate limiting, file size limits, faithfulness checks, auto-slug
C
Three-Tier Dashboard + Analytics
✓ Completed 2026-03-11
  • End User Portal — public chat interface per organization with streaming and citations
  • Master Admin analytics — cross-tenant usage, cost trends, org comparison
  • Org Admin dashboard — document/collection management, query playground, billing
D
Production Deployment + Client Onboarding
✓ Completed 2026-03-29
  • Backend deployed to Vercel Python (atlas-metis-api.vercel.app)
  • OAuth token refresh, HyDE search, 63 automated tests, middleware hardening
  • Invite-only onboarding: Request Access → Admin Approve → auto-provision
  • First client tenant live (SparkBot.App) with verified E2E pipeline
E
Scale & Monetize
In Progress
  • Per-tenant portal customization (branding, persona, suggested questions)
  • Stripe billing integration with tiered plans
  • P&L admin dashboard (revenue vs cost per tenant)
  • Local GPU hosting option (hybrid cloud/local failover)
  • Additional connectors (SharePoint, S3, Notion)
Platform is production-ready and serving clients
08 — Pricing
Service Plans

Four tiers designed for coaching organizations, training platforms, and knowledge-intensive businesses. Pricing reflects the value of a 24/7 AI assistant trained on proprietary content — not just API pass-through costs.

Plan Price Users Queries/mo Documents Our Cost Margin
Starter $199/mo 1–25 750 100 $12.45 93.7%
Growth $499/mo 25–100 2,500 500 $42.25 91.5%
Pro $999/mo 100–500 7,500 1,500 $126.75 87.3%
Enterprise $2,499/mo 500+ 25,000 5,000 $422.50 83.1%

Cross-Subsidy Model

Each tier’s revenue covers the operating cost of the next tier up. This ensures profitability at every level regardless of client mix.

Starter → Growth
4.7x
$199 rev covers $42 cost
Growth → Pro
3.9x
$499 rev covers $127 cost
Pro → Enterprise
2.4x
$999 rev covers $423 cost
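The coverage multiples above follow directly from the pricing table; a quick check in Python:

```python
# (tier, monthly revenue, our monthly cost) from the pricing table above
tiers = [("Starter", 199, 12.45), ("Growth", 499, 42.25),
         ("Pro", 999, 126.75), ("Enterprise", 2499, 422.50)]

# Each tier's revenue divided by the NEXT tier's operating cost
coverage = {
    f"{lo} -> {hi}": round(rev_lo / cost_hi, 1)
    for (lo, rev_lo, _), (hi, _, cost_hi) in zip(tiers, tiers[1:])
}
```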

Overage

$0.05 per query beyond the plan limit. Clients are notified at 80% usage. No plan includes unlimited usage — every tier has defined caps to protect margins.
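A minimal sketch of the overage and notification rules as stated ($0.05 per extra query, alert at 80% of the cap); function names are illustrative:

```python
def monthly_bill(plan_price, included_queries, used_queries, overage_rate=0.05):
    """Plan price plus $0.05 per query beyond the included cap."""
    overage = max(0, used_queries - included_queries) * overage_rate
    return plan_price + overage

def should_notify(included_queries, used_queries, threshold=0.80):
    """Clients are notified once usage reaches 80% of the cap."""
    return used_queries >= threshold * included_queries
```

A Starter client at 900 of 750 included queries, for example, pays $199 + 150 × $0.05 = $206.50.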

Pricing validated through competitive market research (March 2026). These figures reflect real API costs from production usage data.
09 — Infrastructure
Cloud vs Self-Hosted

Two deployment models with automatic failover between them. Self-hosted eliminates roughly 80% of per-query cost by running embedding, reranking, and transcription on local GPU hardware.

Cloud Model (Current)

Component Service Monthly Cost
Backend API Vercel Fluid Compute $20
Database + Vectors Supabase Pro (pgvector) $25
Embeddings OpenAI text-embedding-3-large Variable
Reranking Cohere Rerank v3.5 Variable (49% of API cost)
LLM Generation GPT-4o-mini Variable
Per-Query Cost $0.016

Self-Hosted Model (Planned)

Hardware: Lenovo Legion T5 — Intel Ultra 7 265F, RTX 5060 Ti (16GB VRAM), 64GB DDR5, 2TB SSD

Component Local Alternative Monthly Cost
Embeddings Local model on RTX 5060 Ti $0
Reranking BGE-Reranker on GPU $0
LLM Generation GPT-4o-mini (cloud, quality) Variable
Transcription Faster-Whisper on GPU $0
Per-Query Cost (electricity only for local components) $0.003
Hybrid Failover

Local GPU handles embedding + reranking. If internet/power goes out, automatic failover to cloud APIs. Clients never experience an outage — queries just cost slightly more that day.
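The failover behavior described above reduces to a try-local-then-cloud wrapper; a minimal sketch with `local_embed` and `cloud_embed` as hypothetical callables:

```python
def embed_with_failover(texts, local_embed, cloud_embed):
    """Prefer the local GPU; on any local failure, fall back to cloud APIs."""
    try:
        return local_embed(texts), "local"
    except Exception:
        # Local box unreachable (power/internet outage) -> pay for cloud that day
        return cloud_embed(texts), "cloud"
```

From the client's perspective the result is identical either way; only the provider label (and the per-query cost) changes.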

Cost Reduction

Self-hosting reduces per-query cost from $0.016 to $0.003 — an 81% reduction. At 150K queries/month (50 orgs), this saves ~$1,950/month.
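The savings arithmetic checks out:

```python
cloud_per_query = 0.016    # current cloud per-query cost
local_per_query = 0.003    # self-hosted per-query cost (electricity only)
monthly_queries = 150_000  # ~50 orgs

monthly_savings = (cloud_per_query - local_per_query) * monthly_queries  # ~$1,950
cost_reduction = 1 - local_per_query / cloud_per_query                   # ~81%
```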

10 — Revenue
Revenue & Profit Projections

Cloud-Only Model

Per-query cost: $0.016 — Fixed infrastructure: $45/mo (Vercel Pro + Supabase Pro)

Scenario Revenue/mo Cost/mo Profit/mo Annual Profit Margin
10 clients (all Starter) $1,990 $170 $1,820 $21,840 91%
40 clients (10 per tier) $41,960 $5,765 $36,195 $434,340 86%
50 clients (realistic mix) $31,450 $3,785 $27,665 $331,980 88%

Hybrid Model (Self-Hosted GPU + Cloud Failover)

Per-query cost: $0.003 — Fixed: $45/mo infra + ~$50/mo electricity

Scenario Revenue/mo Cost/mo Profit/mo Annual Profit Margin
10 clients (all Starter) $1,990 $118 $1,872 $22,464 94%
40 clients (10 per tier) $41,960 $1,168 $40,792 $489,504 97%
50 clients (realistic mix) $31,450 $796 $30,654 $367,848 97%
Cloud @ 50 Clients
$332K
annual profit — 88% margin
Hybrid @ 50 Clients
$368K
annual profit — 97% margin
Hybrid @ 40 (Even Split)
$490K
annual profit — 97% margin
11 — Infrastructure
Local vs Cloud Inference

Atlas Metis supports hybrid inference — each tenant can independently select local (self-hosted) or cloud providers for embeddings, LLM generation, and reranking. Local models run on a dedicated GPU server (NVIDIA RTX 5060 Ti, 64GB RAM) with automatic cloud fallback.
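Per-tenant provider selection can be as simple as a config lookup per component; a sketch with hypothetical tenant names and config keys:

```python
# Hypothetical per-tenant provider config; keys and values are illustrative.
TENANT_PROVIDERS = {
    "sparkbot": {"embeddings": "local", "llm": "cloud", "rerank": "local"},
    "default":  {"embeddings": "cloud", "llm": "cloud", "rerank": "cloud"},
}

def provider_for(tenant, component):
    """Resolve local vs cloud for one component, falling back to defaults."""
    cfg = TENANT_PROVIDERS.get(tenant, TENANT_PROVIDERS["default"])
    return cfg.get(component, "cloud")
```

Defaulting unknown tenants and components to cloud keeps behavior safe when a new tenant is provisioned before its config exists.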

Metric  Cloud  Local (Self-Hosted)  Edge
Embeddings
Model OpenAI text-embedding-3-large Nomic Embed Text v1.5
Dimensions 3,072 768 Marginal
Speed 100–300ms 10–20ms Local
Cost $0.13 / 1M tokens $0.00 Local
LLM Generation
Model GPT-4o-mini Qwen 2.5 7B (Q4)
Quality (simple Q&A) Excellent Good Slight cloud
Quality (complex) Excellent Decent Cloud
Hallucination resistance Strong Moderate Cloud
Speed (first token) 500–1,500ms 50–100ms Local
Generation speed 30–50 tok/s 40–60 tok/s Tie
Cost $0.15–$0.60 / 1M tokens $0.00 Local
Reranking
Model Cohere rerank-v3.5 ms-marco-MiniLM-L-6-v2
Accuracy Best in class Very good Slight cloud
Speed 200–500ms 5–20ms Local
Cost $0.002 / document $0.00 Local
Overall
Per-query cost ~$0.041 $0.00 Local
Response latency 2–5 sec 1–4 sec Local
Quality ceiling Higher 85–90% of cloud Cloud
Uptime 99.9% Machine-dependent Cloud
Hardware
RTX 5060 Ti • 64GB DDR5
Intel Core Ultra 7 265F — 20 cores
Hybrid Advantage
Local default + cloud fallback
Per-tenant provider selection — zero marginal cost at scale
Built by Atlas Minds
atlas-minds.com
March 31, 2026