AI SaaS / Business Operations
Corzeni
A multi-tenant RAG SaaS for turning uploaded documents and Google Drive sources into workspace-scoped, cited internal knowledge answers.
Case Study Notes
Overview
Corzeni is a retrieval-first internal knowledge SaaS built to help small business operations teams, MSPs, and service teams turn scattered company documents into grounded, cited answers.
The product is designed around a practical workflow: a customer admin creates or receives a workspace, uploads internal documents or connects approved Google Drive sources, Corzeni processes that content into searchable chunks, and users ask plain-language questions inside that workspace. The intended answer experience returns responses grounded in the customer's own source material, with citations back to the original documents so teams can verify the information instead of treating the output like a generic chatbot.
Corzeni is being built as a hosted, multi-tenant SaaS. Its foundations include workspace isolation; Supabase-backed authentication, data, and storage; a pgvector-ready document chunk schema; OpenAI embeddings; Stripe billing foundations; customer-admin controls; guided onboarding; document ingestion; Google Drive source-connector foundations; and a local-to-UAT-to-production build model.
This is a focused Phase 1 product, not a vague AI platform. The core boundaries are document upload, source connection, background processing, retrieval-ready storage, grounded answers, citations, workspace separation, admin controls, billing state, and supportability.
Core Problem
Most growing teams do not have a true knowledge-access problem: the information usually exists somewhere. What they have is a knowledge-retrieval and trust problem.
Internal answers are often buried across PDFs, SOPs, shared drives, onboarding documents, policy files, templates, spreadsheet exports, and tribal knowledge. Employees lose time searching through folders, asking senior staff repeat questions, or guessing based on outdated information. New hires struggle to self-serve. Managers become bottlenecks. Teams lose confidence because even when an answer is found, it is not always clear where it came from or whether it is still trustworthy.
Corzeni addresses this by treating internal knowledge as a searchable, workspace-scoped retrieval system. The goal is not to replace human judgment with AI. The goal is to reduce operational drag by helping users ask normal business questions and receive answers grounded in their own approved content.
Tenant isolation is central to the design. Every document, connected source, chunk, query, citation, user, membership, billing state, and processing job is tied to a workspace boundary. Tenant safety is treated as a database and runtime concern, not as a front-end filter. The RLS policy layer enables row-level security across documents, document chunks, processing jobs, queries, citations, feedback, usage summaries, admin events, invites, checkout sessions, and billing webhook events.
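As a rough illustration of treating tenant safety as a runtime concern, the sketch below resolves a workspace boundary from server-side membership data and then rejects any row that falls outside it. The type and function names (Membership, WorkspaceBoundary, resolveBoundary, assertInWorkspace) are hypothetical and not taken from the Corzeni repo, and a check like this would complement the database RLS policies rather than replace them.

```typescript
// Hypothetical shapes; the real Corzeni tables and column names may differ.
type WorkspaceRole = "admin" | "member";

interface Membership {
  userId: string;
  workspaceId: string;
  role: WorkspaceRole;
}

interface WorkspaceBoundary {
  userId: string;
  workspaceId: string;
  role: WorkspaceRole;
}

// Resolve the boundary from server-side membership data, never from a
// client-supplied workspace ID alone. Returns null when the user has no
// membership in the requested workspace.
function resolveBoundary(
  memberships: Membership[],
  userId: string,
  requestedWorkspaceId: string,
): WorkspaceBoundary | null {
  const match = memberships.find(
    (m) => m.userId === userId && m.workspaceId === requestedWorkspaceId,
  );
  return match
    ? { userId, workspaceId: match.workspaceId, role: match.role }
    : null;
}

// Defense in depth: reject any row that does not carry the resolved workspace ID.
function assertInWorkspace<T extends { workspaceId: string }>(
  boundary: WorkspaceBoundary,
  row: T,
): T {
  if (row.workspaceId !== boundary.workspaceId) {
    throw new Error("Row is outside the resolved workspace boundary");
  }
  return row;
}
```

Under these assumptions, a route would call resolveBoundary once per request and pass the resulting boundary to every data access, so a foreign workspace ID fails before any query runs.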
Main Value
Corzeni turns static and scattered internal knowledge into a usable answer layer for daily operations.
Instead of forcing a user to know which folder, document, shared drive, or process owner contains the answer, Corzeni lets the user ask the question directly. The system is designed to retrieve the most relevant passages from that workspace's approved knowledge sources, generate an answer from the retrieved evidence, and show citations so the user can verify the source.
The value is strongest in:
- Speed: Teams can get operational answers faster without digging through folders or interrupting senior employees.
- Trust: Citations create a review surface between the AI-generated answer and the original source document.
- Onboarding: New employees can ramp faster because internal documentation becomes easier to query.
- Operational control: Customer admins can manage documents, connected sources, users, usage limits, and workspace access without routine founder intervention.
The product's first value moment is not upload alone. First value happens when a user receives a grounded answer with at least one visible citation inside the correct workspace.
User Experience
Corzeni's UX is built around getting a customer from signup to their first trusted answer as cleanly as possible.
The intended journey is:
- Sign in
- Choose package
- Create or confirm workspace
- Complete Stripe checkout
- Move through provisioning
- Enter guided onboarding
- Upload documents or connect Google Drive
- Process approved content
- Ask the first real question
- Review a cited answer
- Invite teammates
The current get-started implementation includes package selection, checkout error handling, workspace eligibility checks, recurring price configuration checks, and Stripe checkout entry points.
After provisioning, the admin enters onboarding. The onboarding screen explains what Corzeni does, what documents to upload first, supported file types, how answers and citations work, how storage and processing fit together, and why first value requires a cited answer rather than just a completed upload or connected source.
The onboarding UI includes a real document-upload section, upload status feedback, processing status display, support messaging, and future slots for processing confirmation, first question, citation review, teammate invites, Google Drive connection, and completion.
Tech Stack
- Frontend/App Framework: Next.js 16, React 19, TypeScript
- Hosting: Vercel
- Database: Supabase Postgres
- Auth: Supabase Auth
- Storage: Supabase Storage
- Vector Layer: pgvector-ready document_chunks schema with 1536-dimension embeddings
- AI Provider: OpenAI
- Embedding Model: text-embedding-3-small
- Billing: Stripe checkout and webhook foundations
- Monitoring: Sentry for Next.js
- Styling: Tailwind CSS
- Source Connectors: Google Drive connection foundation in progress
- Environments: Local, UAT, Production
- Branch Model: main, uat, feature/*, hotfix/*
The RAG ingestion path is designed as:
- Upload a file or ingest approved source content
- Validate the file type, or the CSV structure where applicable
- Store the file in Supabase Storage
- Write document metadata
- Queue a processing job
- A protected worker claims the parse job
- Parse the document
- Chunk the text
- Generate embeddings
- Insert document_chunks rows
- Mark the document ready
The ingestion worker imports parsing, chunking, embedding, and Supabase admin utilities. It claims queued parse jobs, downloads the stored file, parses it, chunks the text, embeds the chunks, writes rows into document_chunks, updates document status, and marks the processing job as succeeded or failed with safe operational error messages.
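The job lifecycle described above (claim a queued job, run the pipeline, then record success or a safe failure message) can be sketched as follows. This is a simplified, in-memory illustration and not the actual worker: the names ParseJob, claimNextJob, runJob, and ingestionError are hypothetical, the pipeline is kept synchronous for brevity, and a real worker would await storage, parsing, and embedding calls and claim jobs atomically in the database.

```typescript
// Hypothetical job shape; the real processing_jobs columns may differ.
type JobStatus = "queued" | "processing" | "succeeded" | "failed";

interface ParseJob {
  id: string;
  workspaceId: string;
  documentId: string;
  status: JobStatus;
  errorMessage?: string;
}

// Errors created here carry messages that are safe to show to admins.
function ingestionError(message: string): Error {
  const err = new Error(message);
  err.name = "IngestionError";
  return err;
}

// Anything that is not a known ingestion error collapses to a generic message,
// so stack traces, hostnames, or provider details never reach the job row.
function toSafeMessage(err: unknown): string {
  return err instanceof Error && err.name === "IngestionError"
    ? err.message
    : "Document processing failed. Please retry or contact support.";
}

// Claim the first queued job by moving it to "processing".
function claimNextJob(jobs: ParseJob[]): ParseJob | undefined {
  const job = jobs.find((j) => j.status === "queued");
  if (job) job.status = "processing";
  return job;
}

// Run the pipeline for one claimed job. The pipeline returns the number of
// chunks it wrote; zero chunks is treated as a known, reportable failure.
function runJob(job: ParseJob, pipeline: (job: ParseJob) => number): ParseJob {
  try {
    const chunkCount = pipeline(job);
    if (chunkCount === 0) throw ingestionError("No extractable text was found.");
    job.status = "succeeded";
    job.errorMessage = undefined;
  } catch (err) {
    job.status = "failed";
    job.errorMessage = toSafeMessage(err);
  }
  return job;
}
```

Keeping the error mapping in one function means a new failure type is generic by default and only becomes user-visible when it is deliberately created through ingestionError.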
Chunking is implemented with target and max chunk sizes, paragraph-aware splitting, sentence and word fallback for oversized units, and source offsets for later citation and source mapping. Embedding uses OpenAI text-embedding-3-small, fixed 1536-dimension vectors, batching, response-order normalization, and dimension validation before returning embedded chunks.
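A minimal sketch of the chunking approach described above: paragraph-aware splitting, sentence and then word fallback for oversized units, and source offsets on every chunk. Sizes here are measured in characters, and the names (Span, Chunk, unitsIn, shrink, chunkText) and regular expressions are illustrative assumptions, not the repo's actual implementation.

```typescript
// Hypothetical sketch; sizes are in characters and all names are illustrative.
interface Span { start: number; end: number }
interface Chunk extends Span { text: string }

const PARAGRAPH = /(?:[^\n]|\n(?!\s*\n))+/; // text not broken by a blank line
const SENTENCE = /[^.!?]*[.!?]+\s*|[^.!?]+/; // text up to and including end punctuation
const WORD = /\S+/;
const FALLBACKS = [SENTENCE, WORD];

// Find every match of `pattern` inside `span`, trimmed, as offsets into `source`.
function unitsIn(source: string, span: Span, pattern: RegExp): Span[] {
  const text = source.slice(span.start, span.end);
  const matcher = new RegExp(pattern.source, "g");
  const units: Span[] = [];
  let match: RegExpExecArray | null;
  while ((match = matcher.exec(text)) !== null) {
    const lead = match[0].search(/\S/);
    if (lead === -1) continue; // whitespace-only match
    const start = span.start + match.index + lead;
    units.push({ start, end: start + match[0].trim().length });
  }
  return units;
}

// Break an oversized span into pieces of at most maxSize characters:
// sentences first, then words, then hard cuts as a last resort.
function shrink(source: string, span: Span, maxSize: number, level = 0): Span[] {
  if (span.end - span.start <= maxSize) return [span];
  const pieces: Span[] = [];
  if (level >= FALLBACKS.length) {
    for (let s = span.start; s < span.end; s += maxSize) {
      pieces.push({ start: s, end: Math.min(s + maxSize, span.end) });
    }
    return pieces;
  }
  for (const unit of unitsIn(source, span, FALLBACKS[level])) {
    pieces.push(...shrink(source, unit, maxSize, level + 1));
  }
  return pieces;
}

// Pack units into chunks: keep growing a chunk while it is below targetSize
// and adding the next unit would not push it past maxSize.
function chunkText(source: string, targetSize: number, maxSize: number): Chunk[] {
  if (maxSize < 1 || targetSize > maxSize) throw new Error("Invalid chunk sizes");
  const units: Span[] = [];
  for (const paragraph of unitsIn(source, { start: 0, end: source.length }, PARAGRAPH)) {
    units.push(...shrink(source, paragraph, maxSize));
  }
  const spans: Span[] = [];
  for (const unit of units) {
    const current = spans[spans.length - 1];
    const canGrow = current !== undefined
      && current.end - current.start < targetSize
      && unit.end - current.start <= maxSize;
    if (canGrow) current.end = unit.end;
    else spans.push({ ...unit });
  }
  return spans.map((s) => ({ ...s, text: source.slice(s.start, s.end) }));
}
```

Because each chunk's text is sliced directly from the source using its start and end offsets, a later citation can point back to the exact character range in the original document.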
The database schema includes documents and document chunks with workspace IDs, document IDs, storage paths, status fields, raw text counts, chunk counts, lifecycle triggers, document-limit enforcement, and vector(1536) embeddings.
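Because document_chunks stores fixed vector(1536) embeddings, the batching, response-order normalization, and dimension validation described for the embedding step matter before any insert. The sketch below shows one way those three checks could look as pure functions. The network call itself is omitted; EmbeddingItem mirrors the index and embedding fields of an OpenAI embeddings response item, while toBatches and normalizeBatch are hypothetical names.

```typescript
// Matches the text-embedding-3-small output size named in the schema.
const EMBEDDING_DIMENSIONS = 1536;

// Mirrors the `index` and `embedding` fields of an embeddings response item.
interface EmbeddingItem {
  index: number;
  embedding: number[];
}

// Split inputs into request-sized batches, preserving order.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Reorder one batch's response by `index` and validate it before use, so that
// embedding N is guaranteed to belong to input N and to fit the vector column.
function normalizeBatch(
  items: EmbeddingItem[],
  expectedCount: number,
  dimensions: number = EMBEDDING_DIMENSIONS,
): number[][] {
  if (items.length !== expectedCount) {
    throw new Error(`Expected ${expectedCount} embeddings, received ${items.length}`);
  }
  const ordered = [...items].sort((a, b) => a.index - b.index);
  return ordered.map((item, position) => {
    if (item.index !== position) {
      throw new Error(`Missing or duplicate embedding for input ${position}`);
    }
    if (item.embedding.length !== dimensions) {
      throw new Error(
        `Embedding ${position} has ${item.embedding.length} dimensions, expected ${dimensions}`,
      );
    }
    return item.embedding;
  });
}
```

Failing fast here keeps a malformed or partial provider response from ever reaching the insert step, where a wrong-sized vector would be rejected by the column type anyway.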
Proof-of-Work Signal
Corzeni demonstrates product thinking and implementation discipline across the layers needed for a serious internal knowledge SaaS:
- RAG architecture understanding: The project shows document storage, source ingestion foundations, parsing, chunking, embedding, vector-ready storage, processing jobs, workspace metadata, and a planned retrieval/citation flow.
- Multi-tenant SaaS design: Corzeni resolves authenticated identity, profile sync, membership, workspace status, provisioning status, and runtime readiness before allowing workspace actions. Customer roles are separated into admin and member capabilities.
- Security and tenant-isolation awareness: The repo includes RLS completion migrations and tenant-isolation validation documentation. The validation plan checks that users cannot resolve foreign workspace boundaries and that retrieval-sensitive tables like document_chunks and query_citations remain invisible across tenants.
- Real product workflow: Corzeni includes authentication pages, get-started flow, checkout foundation, provisioning page, dashboard, onboarding page, document upload, upload feedback, processing status, support states, and source-connector foundations.
- Commercial and operational logic: The billing package module defines Starter, Growth, and Scale plans; monthly and annual intervals; founder-assisted and self-serve onboarding modes; onboarding fees; annual discounts; Stripe price environment variables; and package validation.
- Background processing discipline: The ingestion system is separated from the live upload path. Upload creates a document row and queued parse job, while a protected worker route processes jobs separately. Hosted scheduler documentation explains the protected cron route, required secrets, OpenAI/Supabase environment variables, UAT Vercel project, and production cadence requirements.
- Environment and release discipline: The repo uses main, uat, feature/*, and hotfix/* branches. Changes are promoted from feature branch to local to UAT to production, with production treated as a release target rather than a build surface.
- AI-assisted build workflow: The project demonstrates how Eric works with AI as a build partner. Notion serves as the source of truth, ChatGPT helps sequence and reason through the build, Eric acts as the human operator and decision-maker, and Codex/AI-assisted coding tools are used as the implementation layer inside a controlled branch and environment model.
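The protected worker route mentioned under background processing discipline could guard itself along the following lines. This is a sketch under assumptions: Vercel cron invocations can carry an Authorization: Bearer header populated from a CRON_SECRET environment variable, but whether Corzeni relies on that exact mechanism is not stated, and the names authorizeWorkerRequest and constantTimeEquals are hypothetical.

```typescript
type WorkerAuthResult =
  | { ok: true }
  | { ok: false; status: 401 | 500 };

// Compare two strings without returning early on the first differing character.
function constantTimeEquals(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Decide whether a worker request may proceed.
// `authorizationHeader` is the raw Authorization header value (or null);
// `configuredSecret` is the secret read from the environment by the caller.
function authorizeWorkerRequest(
  authorizationHeader: string | null,
  configuredSecret: string | undefined,
): WorkerAuthResult {
  // Fail closed: a missing secret is a deployment error, not an open door.
  if (!configuredSecret) return { ok: false, status: 500 };
  if (authorizationHeader === null) return { ok: false, status: 401 };
  const expected = `Bearer ${configuredSecret}`;
  return constantTimeEquals(authorizationHeader, expected)
    ? { ok: true }
    : { ok: false, status: 401 };
}
```

In a route handler, the caller would pass the incoming Authorization header and the environment secret, and return early with the given status whenever ok is false, before claiming any job.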
Current Status Details
Corzeni is currently in active Phase 1 implementation on the uat branch.
The GitHub repository reflects working implementation layers. The uat branch is 37 commits ahead of main and includes application routes, Supabase migrations, document storage, processing jobs, ingestion worker code, billing/provisioning modules, onboarding UI, RLS validation documentation, hosted ingestion scheduler, and source-connector foundations.
Implemented or substantially represented on uat:
- Next.js app foundation
- Supabase client/server/admin utilities
- Auth shell: login, signup, logout, reset password, callback flow
- Workspace identity and boundary resolution
- Customer admin/member capability model
- Self-serve get-started flow
- Stripe checkout foundation
- Billing package catalog
- Workspace provisioning and handoff logic
- Guided onboarding shell
- Document upload interface
- Document storage and metadata insertion
- Supported file validation for PDF, DOCX, TXT, and structured CSV
- CSV structure validator
- Processing jobs foundation
- Ingestion worker
- Document parsing, chunking, and embeddings
- document_chunks schema with vector embeddings
- Hosted ingestion scheduler via Vercel cron
- Google Drive/source-connector foundation in progress
- Supabase migrations for core schema, RLS, documents, chunks, processing jobs, queries, citations, feedback, usage summary, and admin events
- Validation docs for tenant isolation, document storage, processing jobs, provisioning, self-serve provisioning, and hosted ingestion scheduler
The ask/retrieval/answer/citation user experience is not complete yet. The uat branch has the ingestion and retrieval-ready storage foundation in place, along with in-progress Google Drive/source-connector foundations, but the retrieval/ask path itself is not yet built there. Future ask routes must resolve the workspace boundary before embedding search and must filter vector search by the resolved workspace ID.
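To make the workspace-filter requirement concrete, here is a small in-memory sketch of retrieval that filters by the resolved workspace ID before any similarity scoring. It is an illustration only, since the ask path is not built yet. A production version would presumably run the search in Postgres with pgvector, and the names StoredChunk, cosineSimilarity, and searchWorkspace are hypothetical.

```typescript
// Hypothetical in-memory stand-in for document_chunks rows.
interface StoredChunk {
  id: string;
  workspaceId: string;
  documentId: string;
  content: string;
  embedding: number[];
}

interface ScoredChunk {
  chunk: StoredChunk;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// The workspace filter runs first, so chunks from other tenants are never
// scored, ranked, or returned, no matter how similar they are to the query.
function searchWorkspace(
  chunks: StoredChunk[],
  resolvedWorkspaceId: string,
  queryEmbedding: number[],
  topK: number,
): ScoredChunk[] {
  if (!resolvedWorkspaceId) {
    throw new Error("A resolved workspace ID is required before searching");
  }
  return chunks
    .filter((chunk) => chunk.workspaceId === resolvedWorkspaceId)
    .map((chunk) => ({ chunk, score: cosineSimilarity(chunk.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The important design point is the order of operations: the workspace ID comes from the resolved boundary, not from the request body, and the filter is applied before ranking rather than to the ranked results.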
The next major milestone is completing the protected ask path: workspace-filtered vector retrieval, bounded context assembly, grounded answer generation, citation rendering, feedback logging, and first-value activation tracking.
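One way bounded context assembly and citation mapping could fit together is sketched below. It is speculative, since this milestone is not implemented yet: the budget is counted in characters as a simple stand-in for a token budget, and the names RetrievedPassage, Citation, and assembleContext are hypothetical.

```typescript
// Hypothetical shapes for the planned ask path.
interface RetrievedPassage {
  chunkId: string;
  documentTitle: string;
  content: string;
  score: number;
}

interface Citation {
  marker: number; // the [n] label used in the prompt and the rendered answer
  chunkId: string;
  documentTitle: string;
}

interface AssembledContext {
  prompt: string;
  citations: Citation[];
}

// Add passages in descending score order until the character budget is used.
// A passage that does not fit is skipped, so a smaller lower-ranked passage
// can still be included. Every included passage receives a numbered marker.
function assembleContext(passages: RetrievedPassage[], maxChars: number): AssembledContext {
  const ranked = [...passages].sort((a, b) => b.score - a.score);
  const blocks: string[] = [];
  const citations: Citation[] = [];
  let used = 0;
  for (const passage of ranked) {
    const marker = citations.length + 1;
    const block = `[${marker}] ${passage.documentTitle}\n${passage.content}`;
    const cost = block.length + (blocks.length > 0 ? 2 : 0); // 2 for the "\n\n" separator
    if (used + cost > maxChars) continue;
    blocks.push(block);
    citations.push({
      marker,
      chunkId: passage.chunkId,
      documentTitle: passage.documentTitle,
    });
    used += cost;
  }
  return { prompt: blocks.join("\n\n"), citations };
}
```

Since every marker in the prompt has a matching entry in citations, the answer renderer can link each [n] back to its chunk and source document, and an empty citations array signals that no grounded answer should be shown.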
