# DoCoreAI — Self-Tuning LLM Budgets, Privacy-First

> Stop AI budget overruns before they happen. DoCoreAI uses a self-tuning token budget that learns from real usage and adjusts every request — without ever reading a prompt. Lower AI costs. Zero compromise on privacy.

DoCoreAI is a privacy-first LLM cost observability and autonomous budget governance SDK for teams running AI in production. It monkey-patches LLM SDKs inside the same Python process as your application, so there is no proxy or gateway in the network path and prompts are never stored. It governs every LLM call autonomously using a self-improving LightGBM prediction model.

## The Problem It Solves

Enterprise AI teams face an impossible choice:

- **Log everything** → compliance team blocks the rollout (PII risk, data retention liability)
- **Log nothing** → flying blind on costs, no audit trail, no way to explain bill shocks to leadership

DoCoreAI eliminates both problems: full cost visibility with zero prompt storage, by architecture — not by policy.

## Key Features

- **Self-tuning budgets**: LightGBM prediction model learns real usage patterns and replaces wasteful default `max_tokens` ceilings with precise per-request estimates
- **Autonomous budget pacing**: Spreads daily budget across 24 hours intelligently; detects over-pace and throttles automatically — no human approval needed
- **Zero prompt storage**: Metadata-only telemetry by architecture; prompt and response content never leave your network
- **Zero code changes**: Monkey-patches all active LLM SDK calls at startup; your existing code is untouched
- **PII detection at edge**: Scans for names, emails, phone numbers, credit cards before any API call leaves your environment
- **Drift detection & auto-retrain**: Continuously monitors prediction accuracy; retrains the LightGBM model, runs an A/B test, and promotes the better model automatically
- **Soft limits**: Instead of hard-truncating responses, injects concise guidance into the system prompt when budget is tight — quality preserved
- **Real-time cost intelligence**: Per-request cost tracking by team, feature, and model; anomalies caught before they become bill shocks
- **Governance & audit trails**: All governance events logged locally; no sensitive content ever sent to cloud
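
To illustrate the edge PII scan listed above, here is a minimal sketch that runs locally and reports only category counts, never the matched text. The patterns, the `scan_for_pii` name, and the Luhn check are illustrative assumptions, not DoCoreAI's detector; names and phone numbers would need additional techniques.

```python
import re

# Illustrative patterns only (assumption); production PII detection needs
# far broader coverage than these two categories.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_for_pii(text: str) -> dict[str, int]:
    """Return match counts per category; matched content is never returned."""
    counts: dict[str, int] = {}
    emails = len(EMAIL_RE.findall(text))
    if emails:
        counts["email"] = emails
    cards = 0
    for candidate in CARD_RE.findall(text):
        digits = re.sub(r"\D", "", candidate)
        if luhn_valid(digits):
            cards += 1
    if cards:
        counts["credit_card"] = cards
    return counts


findings = scan_for_pii("Reach me at jane@example.com, card 4111 1111 1111 1111")
```

Returning counts rather than matches keeps the scan result itself safe to log as metadata.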

## Architecture

DoCoreAI operates as a three-step autonomous loop:

**Step 1 — Every LLM Call**
The DoCoreAI SDK monkey-patches the LLM SDK in the same Python process as your app. Before each call, it runs: governance check → PII scan → token prediction → budget check → pacing adjustment → soft limit injection. After each call, it compares the prediction with actual usage, logs telemetry, and updates the model.
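
The in-process interception described in Step 1 can be sketched as follows. This is a simplified illustration of the monkey-patching technique, not DoCoreAI's implementation: `FakeLLMClient`, `predict_tokens`, and the `telemetry` list are hypothetical stand-ins, and only the token prediction and telemetry steps are shown.

```python
import functools
import time


class FakeLLMClient:
    """Hypothetical stand-in for a provider SDK client (not a real library)."""

    def complete(self, prompt: str, max_tokens: int = 2000) -> dict:
        # Pretend the model needed 120 tokens, capped by max_tokens.
        return {"text": "ok", "completion_tokens": min(max_tokens, 120)}


telemetry: list[dict] = []  # metadata only; prompts are never appended


def predict_tokens(prompt: str) -> int:
    """Placeholder predictor (assumption); a trained model would go here."""
    return 300


def patch_complete(cls) -> None:
    """Replace cls.complete with a wrapper that adds pre- and post-call hooks."""
    original = cls.complete

    @functools.wraps(original)
    def wrapper(self, prompt, max_tokens=2000):
        predicted = predict_tokens(prompt)  # pre-call: token prediction
        start = time.perf_counter()
        result = original(self, prompt, max_tokens=min(max_tokens, predicted))
        telemetry.append({                  # post-call: prediction vs. actual
            "predicted_tokens": predicted,
            "actual_tokens": result["completion_tokens"],
            "latency_s": time.perf_counter() - start,
        })
        return result

    cls.complete = wrapper


patch_complete(FakeLLMClient)
# Application code is unchanged: it still calls the client directly.
response = FakeLLMClient().complete("Summarise this ticket")
```

Because the wrapper replaces the method on the class, every existing call site picks up the hooks without edits to application code.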

**Step 2 — Train & Predict**
Telemetry from every call feeds a LightGBM model trained on 30 days of usage patterns. The model predicts how many completion tokens each request needs and replaces the wasteful 2,000+ token default ceiling with a per-request estimate (typically 200–400 tokens). At 30,000 requests/month, prediction alone saves approximately $990/month.
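
The savings figure depends on inputs the text does not state, such as the per-token price. As a rough sketch of how an estimate of this kind could be computed, the function below assumes every request would otherwise be billed for the full default ceiling. The $0.02 per 1K output tokens price and the 300-token prediction (midpoint of the quoted range) are illustrative assumptions, not DoCoreAI figures.

```python
def estimated_monthly_savings(
    requests_per_month: int,
    default_ceiling: int,
    predicted_tokens: int,
    usd_per_1k_output_tokens: float,
) -> float:
    """Upper-bound estimate: tokens avoided per request times the token price."""
    tokens_avoided = requests_per_month * max(0, default_ceiling - predicted_tokens)
    return tokens_avoided / 1000 * usd_per_1k_output_tokens


# Assumed inputs: 30,000 requests, 2,000-token ceiling, 300-token prediction,
# and a hypothetical price of $0.02 per 1K output tokens.
savings = estimated_monthly_savings(30_000, 2_000, 300, 0.02)
print(f"${savings:,.2f} per month")
```

The real number depends on your model's pricing and on how close actual completions come to the ceiling.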

**Step 3 — Detect Drift & Auto-Retrain**
The drift detector monitors prediction accuracy continuously. When mean absolute error (MAE) rises by more than 20%, it triggers automatic retraining, runs a 24-hour A/B test with 10% challenger traffic, and auto-promotes the better model. No action required from your team.
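
A minimal sketch of the two mechanisms in Step 3, under stated assumptions: the 20% threshold is measured against a stored baseline MAE (the text does not say which baseline is used), and challenger traffic is selected by hashing a request ID. All function names here are hypothetical.

```python
import hashlib


def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute gap between predicted and actual completion tokens."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def drift_detected(baseline_mae: float, recent_mae: float,
                   threshold: float = 0.20) -> bool:
    """True when recent MAE exceeds the baseline by more than the threshold."""
    return recent_mae > baseline_mae * (1 + threshold)


def route_to_challenger(request_id: str, share: float = 0.10) -> bool:
    """Deterministically send roughly `share` of requests to the challenger."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return bucket < share


recent = mean_absolute_error([300, 250], [280, 290])
needs_retrain = drift_detected(baseline_mae=24.0, recent_mae=recent)
```

Hashing the request ID keeps the assignment stable across retries, so a given request always sees the same model during the test window.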

Your app calls the LLM provider directly, unchanged. Only cost and token metadata are sent to DoCoreAI Cloud. Prompts never leave your network. Ever.
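
To make "metadata only" concrete, here is a hedged sketch of what a per-call telemetry record and its local SQLite store could look like. The `CallTelemetry` fields and the `calls` table are assumptions for illustration; the actual DoCoreAI schema is not documented here. Note that the record has no field that could hold prompt or response text.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class CallTelemetry:
    """Hypothetical metadata-only record; no prompt or response fields exist."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    latency_ms: float


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local telemetry database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        "model TEXT, prompt_tokens INTEGER, completion_tokens INTEGER, "
        "cost_usd REAL, latency_ms REAL)"
    )
    return conn


def record(conn: sqlite3.Connection, row: CallTelemetry) -> None:
    """Insert one telemetry row using a parameterized statement."""
    conn.execute("INSERT INTO calls VALUES (?, ?, ?, ?, ?)", astuple(row))
    conn.commit()


conn = open_store()
record(conn, CallTelemetry("gpt-4o", 120, 310, 0.0049, 840.0))
total_calls = conn.execute("SELECT COUNT(*) FROM calls").fetchone()[0]
```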

## Supported LLM Providers

All providers are auto-detected and wrapped at startup — no provider-specific configuration required:

- OpenAI (GPT-4, GPT-4 Turbo, GPT-4o, GPT-3.5 Turbo and all variants)
- Anthropic (Claude 3 family, Claude 3.5, and newer models)
- Google Gemini (Gemini Pro, Gemini Ultra, Gemini Flash)
- Groq (Llama, Mixtral, and all Groq-hosted models)
- AWS Bedrock (all supported foundation models)
- Ollama (local model deployments)

## Budget Modes

Six enforcement modes — from intelligent optimization to hard block:

- `smart_reduce` (default): Intelligently reduces token limits as budget tightens; service never stops
- `reduce_tokens`: Linear token reduction based on remaining budget percentage
- `warn_and_allow`: Logs warnings but never blocks (recommended for week 1 learning phase)
- `block`: Hard stop when budget is exhausted (strict compliance use cases)
- `use_fallback`: Switches to cheaper model when budget is tight (multi-model setups)
- `ignore`: No enforcement (development only)
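
As a sketch of how some of these modes could translate into a per-request token limit: the mode names below come from the list above, but `effective_max_tokens`, `BudgetExhausted`, and the 64-token floor are hypothetical. `smart_reduce` and `use_fallback` are left unimplemented because their logic is not specified here.

```python
from enum import Enum


class BudgetMode(str, Enum):
    SMART_REDUCE = "smart_reduce"
    REDUCE_TOKENS = "reduce_tokens"
    WARN_AND_ALLOW = "warn_and_allow"
    BLOCK = "block"
    USE_FALLBACK = "use_fallback"
    IGNORE = "ignore"


class BudgetExhausted(RuntimeError):
    """Raised in block mode once no budget remains (hypothetical)."""


def effective_max_tokens(mode: BudgetMode, requested: int,
                         remaining_fraction: float, floor: int = 64) -> int:
    """Map a mode and the remaining budget share (0.0 to 1.0) to a token limit."""
    remaining = max(0.0, min(1.0, remaining_fraction))
    if mode in (BudgetMode.IGNORE, BudgetMode.WARN_AND_ALLOW):
        return requested  # never reduced; warn_and_allow would also log
    if mode is BudgetMode.BLOCK:
        if remaining == 0.0:
            raise BudgetExhausted("daily budget exhausted")
        return requested
    if mode is BudgetMode.REDUCE_TOKENS:
        # Linear reduction, with an assumed floor so responses stay usable.
        return max(floor, int(requested * remaining))
    raise NotImplementedError(f"{mode.value} is not specified in this sketch")


limit = effective_max_tokens(BudgetMode.REDUCE_TOKENS, 400, 0.5)
```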

## Pacing Strategies

- `even`: Divides budget equally across 24 hours (recommended for first 7 days)
- `adaptive`: Learns hourly usage patterns over 7 days, then paces to your real profile (recommended after learning)
- `peak_aware`: Adaptive baseline plus real-time spike detection; allows temporary over-pace during detected events
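
The `even` and `adaptive` strategies can be sketched as simple allowance calculations. The function names and the idea of weighting each hour by its learned share of traffic are assumptions for illustration; the actual pacing logic may differ.

```python
def even_hourly_allowance(daily_budget_usd: float) -> float:
    """`even`: split the daily budget equally across 24 hours."""
    return daily_budget_usd / 24


def adaptive_hourly_allowance(daily_budget_usd: float,
                              hourly_weights: list[float], hour: int) -> float:
    """`adaptive` (assumed form): weight each hour by its observed traffic share."""
    return daily_budget_usd * hourly_weights[hour] / sum(hourly_weights)


def is_over_pace(spent_so_far_usd: float, daily_budget_usd: float,
                 hours_elapsed: int) -> bool:
    """Compare actual spend with what even pacing would allow by this hour."""
    expected = even_hourly_allowance(daily_budget_usd) * hours_elapsed
    return spent_so_far_usd > expected


# A $48 daily budget allows $2 per hour under even pacing, so $15 spent
# after 6 hours is ahead of the $12 expected and would trigger throttling.
over = is_over_pace(spent_so_far_usd=15.0, daily_budget_usd=48.0, hours_elapsed=6)
```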

## Installation & Quick Start

**Requirements:** Python 3.12+ · Works on Windows, macOS, Linux

```bash
pip install docoreai
docoreai config    # one-time setup — generates free org token at docoreai.com
docoreai start     # automatically intercepts all LLM SDK calls
```

No changes to your application code required. Ever.

## Use Cases

- **SaaS Platforms**: Autonomous pacing prevents budget exhaustion during traffic spikes; service stays up 24 hours on the same daily budget
- **Regulated Industries (Healthcare, Finance)**: Full AI observability with zero prompt storage; HIPAA-friendly architecture; PII detection at edge; 7-year audit log retention
- **E-Commerce**: Seasonal peak-aware pacing absorbs 5× traffic spikes without blowing monthly budget
- **Enterprise & Government**: Multi-tenant governance, RBAC, SSO/SAML (roadmap Q3 2026)
- **Developer Tools**: Drop-in SDK integration; per-team cost attribution

## Architecture Coverage (Current & Roadmap)

- Single-shot LLM calls: ✓ Live (v2.1.0)
- Agentic tool-calling: September 2026
- Multi-turn conversations: TBD
- Multi-agent orchestration: February 2027
- RAG / large-context: July 2027

## Pricing

- **Free**: Forever free · up to 100K API requests/month · single LLM provider · basic cost & token dashboard · PII detection · budget tracking & alerts
- **Design Partner Program**: Full platform access at zero cost · white-glove founder support · direct roadmap input · 3–5 spots available for enterprise teams
- **Pro**: $99–$299/month · 1M requests/month · all providers · advanced dashboards · intelligent pacing · auto-retraining · A/B testing · coming soon via Paddle
- **Enterprise**: $999/month+ · unlimited requests · SSO/RBAC/SAML · multi-tenant governance · dedicated CSM · on-premise option · SOC2 in progress

## Traction

- v2.1.0 on PyPI (June 2026) · Python 3.12+ · closed-source from v2.0
- 30K+ total PyPI downloads
- 20+ enterprise AI engagements (same privacy vs. visibility gap found every time)
- 17 months of active research and build (January 2025 to present)
- In the running for Anthropic & AWS Agentic AI Accelerator, Bengaluru 2026
- SOC2 certification targeted Q3 2026

## Key Pages

- Homepage: https://docoreai.com/
- Documentation: https://docoreai.com/docs/
- Pricing: https://docoreai.com/pricing/
- Live Demo Dashboard: https://docoreai.com/demo-dashboard-llm-analytics/
- Features: https://docoreai.com/features/
- Security & Privacy: https://docoreai.com/security/
- PyPI: https://pypi.org/project/docoreai/

## Contact & Founder

- Founder & CEO: Saji John · Bengaluru, India
- LinkedIn (founder): https://www.linkedin.com/in/saji-john-docoreai/
- LinkedIn (company): https://www.linkedin.com/company/docoreai/
- Email: saji.john@docoreai.com · info@docoreai.com
- Book a call: https://calendly.com/docoreai
- Legal entity: MobiLights · Neeladri Road, Electronic City Phase 1, Bangalore 560100, India

## Privacy & Compliance

- Prompt content: never stored, never sent to cloud — by architecture
- Response content: never stored, never sent to cloud
- Telemetry collected: token counts, cost per request, latency, model name — all aggregated
- PII detection runs at the edge inside your environment before any data leaves your network
- Local telemetry stored in SQLite on your machine
- Audit trails retained for 7 years (HIPAA/SOX-compliant configuration)
- SOC2 certification in progress (targeted Q3 2026)
- GSTIN: 29AEVPJ5922P2ZG (India)