AI SaaS / Business Operations

Corzeni

A multi-tenant RAG SaaS for turning uploaded documents and Google Drive sources into workspace-scoped, cited internal knowledge answers.

Case Study Notes

Overview

Corzeni is a retrieval-first internal knowledge SaaS built to help small business operations teams, MSPs, and service teams turn scattered company documents into grounded, cited answers.

The product is designed around a practical workflow: a customer admin creates or receives a workspace, uploads internal documents or connects approved Google Drive sources, Corzeni processes that content into searchable chunks, and users ask plain-language questions inside that workspace. The intended answer experience returns responses grounded in the customer's own source material, with citations back to the original documents so teams can verify the information instead of treating the output like a generic chatbot.

Corzeni is being built as a hosted, multi-tenant SaaS with workspace isolation; Supabase-backed authentication, data, and storage; a pgvector-ready document chunk schema; OpenAI embeddings; Stripe billing foundations; customer-admin controls; guided onboarding; document ingestion; Google Drive source-connector foundations; and a build model that promotes work from local to UAT to production.

This is a focused Phase 1 product, not a vague AI platform. The core boundaries are document upload, source connection, background processing, retrieval-ready storage, grounded answers, citations, workspace separation, admin controls, billing state, and supportability.

Core Problem

Most growing teams do not have a true knowledge-access problem: the information usually exists somewhere. They have a knowledge-retrieval and trust problem.

Internal answers are often buried across PDFs, SOPs, shared drives, onboarding documents, policy files, templates, spreadsheet exports, and tribal knowledge. Employees lose time searching through folders, asking senior staff the same questions repeatedly, or guessing based on outdated information. New hires struggle to self-serve. Managers become bottlenecks. Teams lose confidence because even when an answer is found, it is not always clear where it came from or whether it is still current.

Corzeni addresses this by treating internal knowledge as a searchable, workspace-scoped retrieval system. The goal is not to replace human judgment with AI. The goal is to reduce operational drag by helping users ask normal business questions and receive answers grounded in their own approved content.

Tenant isolation is central to the design. Every document, connected source, chunk, query, citation, user, membership, billing state, and processing job is tied to a workspace boundary. Tenant safety is treated as a database and runtime concern, not as a front-end filter. The RLS policy layer enables row-level security across documents, document chunks, processing jobs, queries, citations, feedback, usage summaries, admin events, invites, checkout sessions, and billing webhook events.
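The runtime side of that boundary can be sketched as follows. This is a minimal illustration, not the repo's code: the Membership and WorkspaceBoundary shapes and both function names are hypothetical, and in the product the database-level guarantee comes from the RLS policies rather than from an in-memory filter like this one.

```typescript
// Hypothetical shapes for illustration; not the repo's actual types.
type Role = "admin" | "member";

interface Membership {
  userId: string;
  workspaceId: string;
  role: Role;
}

interface WorkspaceBoundary {
  userId: string;
  workspaceId: string;
  role: Role;
}

// Resolve the caller's boundary from server-side membership data.
// Returns null when the user has no membership in the requested workspace,
// so callers must handle the "no access" case explicitly.
function resolveWorkspaceBoundary(
  userId: string,
  requestedWorkspaceId: string,
  memberships: readonly Membership[],
): WorkspaceBoundary | null {
  const match = memberships.filter(
    (m) => m.userId === userId && m.workspaceId === requestedWorkspaceId,
  )[0];
  return match ? { userId, workspaceId: match.workspaceId, role: match.role } : null;
}

// Scope any row set to the resolved boundary before it leaves the server.
function scopeToWorkspace<T extends { workspaceId: string }>(
  rows: readonly T[],
  boundary: WorkspaceBoundary,
): T[] {
  return rows.filter((row) => row.workspaceId === boundary.workspaceId);
}
```

The point of the sketch is ordering: the boundary is resolved from trusted membership data first, and every later read is scoped to that resolved workspace ID rather than to an ID supplied by the client.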

Main Value

Corzeni turns static and scattered internal knowledge into a usable answer layer for daily operations.

Instead of forcing a user to know which folder, document, shared drive, or process owner contains the answer, Corzeni lets the user ask the question directly. The system is designed to retrieve the most relevant passages from that workspace's approved knowledge sources, generate an answer from the retrieved evidence, and show citations so the user can verify the source.

The value is strongest in:

Speed: Teams can get operational answers faster without digging through folders or interrupting senior employees.
Trust: Citations create a review surface between the AI-generated answer and the original source document.
Onboarding: New employees can ramp faster because internal documentation becomes easier to query.
Operational control: Customer admins can manage documents, connected sources, users, usage limits, and workspace access without routine founder intervention.

The product's first value moment is not upload alone. First value happens when a user receives a grounded answer with at least one visible citation inside the correct workspace.

User Experience

Corzeni's UX is built around getting a customer from signup to their first trusted answer as cleanly as possible.

The intended journey is:

Sign in
Choose package
Create or confirm workspace
Complete Stripe checkout
Move through provisioning
Enter guided onboarding
Upload documents or connect Google Drive
Process approved content
Ask the first real question
Review a cited answer
Invite teammates

The current get-started implementation includes package selection, checkout error handling, workspace eligibility checks, recurring price configuration checks, and Stripe checkout entry points.

After provisioning, the admin enters onboarding. The onboarding screen explains what Corzeni does, what documents to upload first, supported file types, how answers and citations work, how storage and processing fit together, and why first value requires a cited answer rather than just a completed upload or connected source.

The onboarding UI includes a real document-upload section, upload status feedback, processing status display, support messaging, and future slots for processing confirmation, first question, citation review, teammate invites, Google Drive connection, and completion.

Tech Stack

Frontend/App Framework: Next.js 16, React 19, TypeScript
Hosting: Vercel
Database: Supabase Postgres
Auth: Supabase Auth
Storage: Supabase Storage
Vector Layer: pgvector-ready document_chunks schema with 1536-dimension embeddings
AI Provider: OpenAI
Embedding Model: text-embedding-3-small
Billing: Stripe checkout and webhook foundations
Monitoring: Sentry for Next.js
Styling: Tailwind CSS
Source Connectors: Google Drive connection foundation in progress
Environments: Local, UAT, Production
Branch Model: main, uat, feature/*, hotfix/*

The RAG ingestion path is designed as:

Upload a file or ingest approved source content
Validate file type, or CSV structure where applicable
Store the file in Supabase Storage
Write document metadata
Queue a processing job
A protected worker claims the parse job
Parse the document
Chunk the text
Generate embeddings
Insert document_chunks rows
Mark the document ready

The ingestion worker imports parsing, chunking, embedding, and Supabase admin utilities. It claims queued parse jobs, downloads the stored file, parses it, chunks the text, embeds the chunks, writes rows into document_chunks, updates document status, and marks the processing job as succeeded or failed with safe operational error messages.
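The claim, run, and finish cycle described above can be sketched with an in-memory job list. Everything here is an assumption made for illustration: the ParseJob shape, the function names, and the generic failure message are hypothetical, and the real worker reads and writes Supabase rows instead of mutating an array.

```typescript
type JobStatus = "queued" | "processing" | "succeeded" | "failed";

// Hypothetical job shape; the real processing-jobs table may differ.
interface ParseJob {
  id: string;
  workspaceId: string;
  documentId: string;
  status: JobStatus;
  queuedAt: number; // epoch milliseconds
  errorMessage?: string;
}

const GENERIC_FAILURE = "Document processing failed. Please retry or contact support.";

// Only surface messages that were explicitly marked safe; anything else
// (stack traces, connection strings, provider errors) collapses to a generic message.
function toSafeErrorMessage(err: unknown): string {
  if (typeof err === "object" && err !== null && "safeMessage" in err) {
    const message = (err as { safeMessage: unknown }).safeMessage;
    if (typeof message === "string") return message;
  }
  return GENERIC_FAILURE;
}

// Claim the oldest queued job by flipping it to "processing".
function claimNextJob(jobs: ParseJob[]): ParseJob | undefined {
  const next = jobs
    .filter((job) => job.status === "queued")
    .sort((a, b) => a.queuedAt - b.queuedAt)[0];
  if (next) next.status = "processing";
  return next;
}

// Run the pipeline (download, parse, chunk, embed, insert) and record the outcome.
function runJob(job: ParseJob, pipeline: (job: ParseJob) => void): ParseJob {
  try {
    pipeline(job);
    job.status = "succeeded";
  } catch (err) {
    job.status = "failed";
    job.errorMessage = toSafeErrorMessage(err);
  }
  return job;
}
```

In a Postgres-backed worker the claim step would need to be atomic so that two workers cannot take the same job, for example with a conditional update or a row lock; the sketch skips that concern.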

Chunking is implemented with target and max chunk sizes, paragraph-aware splitting, sentence and word fallback for oversized units, and source offsets for later citation and source mapping. Embedding uses OpenAI text-embedding-3-small, fixed 1536-dimension vectors, batching, response-order normalization, and dimension validation before returning embedded chunks.
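A chunker with those properties might look like the sketch below. It is an assumption-laden illustration rather than the repo's implementation: the names, the separator patterns, and the greedy packing rule are all hypothetical. It splits on blank lines first, falls back to sentences and then words for oversized units, hard-cuts anything still too long, and keeps every chunk as an exact [start, end) slice of the source so citations can map back to the original text.

```typescript
interface Span { start: number; end: number } // [start, end) offsets into the source
interface Chunk extends Span { text: string }

// Split source[span] on `separator`, keeping `keep` characters of each separator
// match (for example sentence punctuation) attached to the preceding piece.
function splitSpan(source: string, span: Span, separator: RegExp, keep: number): Span[] {
  const text = source.slice(span.start, span.end);
  const re = new RegExp(separator.source, "g");
  const pieces: Span[] = [];
  const push = (from: number, to: number): void => {
    if (text.slice(from, to).trim().length > 0) {
      pieces.push({ start: span.start + from, end: span.start + to });
    }
  };
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    push(cursor, match.index + keep);
    cursor = match.index + match[0].length;
  }
  push(cursor, text.length);
  return pieces;
}

// Paragraphs first; sentence, then word, fallback for units larger than maxSize.
function toUnits(source: string, maxSize: number): Span[] {
  let units = splitSpan(source, { start: 0, end: source.length }, /\n\s*\n/, 0);
  const fallbacks: Array<[RegExp, number]> = [[/[.!?]\s+/, 1], [/\s+/, 0]];
  for (const [separator, keep] of fallbacks) {
    const next: Span[] = [];
    for (const unit of units) {
      if (unit.end - unit.start <= maxSize) next.push(unit);
      else next.push(...splitSpan(source, unit, separator, keep));
    }
    units = next;
  }
  // Last resort: hard-cut anything still oversized (for example a very long token).
  const bounded: Span[] = [];
  for (const unit of units) {
    for (let s = unit.start; s < unit.end; s += maxSize) {
      bounded.push({ start: s, end: Math.min(s + maxSize, unit.end) });
    }
  }
  return bounded;
}

// Greedily pack adjacent units while the combined span stays within targetSize.
function chunkText(source: string, targetSize: number, maxSize: number): Chunk[] {
  if (targetSize < 1 || maxSize < targetSize) throw new Error("Invalid chunk sizes.");
  const toChunk = (span: Span): Chunk => ({ ...span, text: source.slice(span.start, span.end) });
  const chunks: Chunk[] = [];
  let current: Span | null = null;
  for (const unit of toUnits(source, maxSize)) {
    if (current !== null && unit.end - current.start <= targetSize) {
      current = { start: current.start, end: unit.end };
    } else {
      if (current !== null) chunks.push(toChunk(current));
      current = unit;
    }
  }
  if (current !== null) chunks.push(toChunk(current));
  return chunks;
}
```

Sizes here are measured in characters for simplicity; a token-based budget would follow the same structure with a different length function.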

The database schema includes documents and document chunks with workspace IDs, document IDs, storage paths, status fields, raw text counts, chunk counts, lifecycle triggers, document-limit enforcement, and vector(1536) embeddings.
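Because the chunk column is a fixed vector(1536), the batching, order-normalization, and dimension checks mentioned earlier matter before any insert. The sketch below shows one way to express them as pure functions. The function names and the EmbeddingItem shape are hypothetical; the shape mirrors the index and embedding fields that the OpenAI embeddings response returns per input, and the network call itself is deliberately left out.

```typescript
const EMBEDDING_DIMENSIONS = 1536; // matches vector(1536) in the schema

// Per-input result: `index` is the input's position within the request batch.
interface EmbeddingItem {
  index: number;
  embedding: number[];
}

// Split inputs into fixed-size batches so each request stays bounded.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1.");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Re-order one batch response by `index` and validate it before use:
// one vector per input, no duplicates, and the expected dimension.
function normalizeBatch(inputs: readonly string[], items: readonly EmbeddingItem[]): number[][] {
  if (items.length !== inputs.length) {
    throw new Error(`Expected ${inputs.length} embeddings, received ${items.length}.`);
  }
  const ordered: Array<number[] | undefined> = inputs.map(() => undefined);
  for (const item of items) {
    if (!Number.isInteger(item.index) || item.index < 0 || item.index >= inputs.length) {
      throw new Error(`Embedding index ${item.index} is out of range.`);
    }
    if (ordered[item.index] !== undefined) {
      throw new Error(`Duplicate embedding for index ${item.index}.`);
    }
    if (item.embedding.length !== EMBEDDING_DIMENSIONS) {
      throw new Error(`Expected ${EMBEDDING_DIMENSIONS} dimensions, received ${item.embedding.length}.`);
    }
    ordered[item.index] = item.embedding;
  }
  return ordered.map((vector, i) => {
    if (vector === undefined) throw new Error(`Missing embedding for index ${i}.`);
    return vector;
  });
}
```

Validating before returning means a malformed provider response fails the job early instead of producing rows that the database would reject or, worse, rows attached to the wrong chunk.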

Proof-of-Work Signal

Corzeni demonstrates product thinking and implementation discipline across the layers needed for a serious internal knowledge SaaS:

RAG architecture understanding: The project shows document storage, source ingestion foundations, parsing, chunking, embedding, vector-ready storage, processing jobs, workspace metadata, and a future retrieval/citation flow.
Multi-tenant SaaS design: Corzeni resolves authenticated identity, profile sync, membership, workspace status, provisioning status, and runtime readiness before allowing workspace actions. Customer roles are separated into admin and member capabilities.
Security and tenant-isolation awareness: The repo includes RLS completion migrations and tenant-isolation validation documentation. The validation plan checks that users cannot resolve foreign workspace boundaries and that retrieval-sensitive tables like document_chunks and query_citations remain invisible across tenants.
Real product workflow: Corzeni includes authentication pages, get-started flow, checkout foundation, provisioning page, dashboard, onboarding page, document upload, upload feedback, processing status, support states, and source-connector foundations.
Commercial and operational logic: The billing package module defines Starter, Growth, and Scale plans; monthly and annual intervals; founder-assisted and self-serve onboarding modes; onboarding fees; annual discounts; Stripe price environment variables; and package validation.
Background processing discipline: The ingestion system is separated from the live upload path. Upload creates a document row and queued parse job, while a protected worker route processes jobs separately. Hosted scheduler documentation explains the protected cron route, required secrets, OpenAI/Supabase environment variables, UAT Vercel project, and production cadence requirements.
Environment and release discipline: The repo uses main, uat, and feature branches. The operating model requires work to move from a feature branch through local and UAT before reaching production, with production treated as a release target rather than a build surface.
AI-assisted build workflow: The project demonstrates how Eric works with AI as a build partner. Notion serves as the source of truth, ChatGPT helps sequence and reason through the build, Eric acts as the human operator and decision-maker, and Codex/AI-assisted coding tools are used as the implementation layer inside a controlled branch and environment model.
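As an illustration of the protected worker route mentioned under background processing, the check below gates a request on a shared secret. This is a sketch under assumptions: the source says the cron route is protected and requires secrets but does not describe the mechanism, so the bearer-token convention and both function names here are hypothetical.

```typescript
// Compare two strings without exiting on the first mismatching character.
// Note: the length check still reveals whether the lengths differ.
function constantTimeEquals(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Decide whether a worker request may proceed.
// `authorizationHeader` is the raw Authorization header value (or null if absent);
// `secret` is the configured shared secret (or undefined if not set).
function isAuthorizedWorkerRequest(
  authorizationHeader: string | null,
  secret: string | undefined,
): boolean {
  if (!secret) return false; // fail closed when the secret is not configured
  if (!authorizationHeader) return false;
  const prefix = "Bearer ";
  if (authorizationHeader.slice(0, prefix.length) !== prefix) return false;
  return constantTimeEquals(authorizationHeader.slice(prefix.length), secret);
}
```

A route handler would call this before claiming any job and return a 401 response when it is false, so an unauthenticated request never reaches the ingestion code.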

Current Status Details

Corzeni is currently in active Phase 1 implementation on the uat branch.

The GitHub repository reflects working implementation layers. The uat branch is 37 commits ahead of main and includes application routes, Supabase migrations, document storage, processing jobs, ingestion worker code, billing/provisioning modules, onboarding UI, RLS validation documentation, hosted ingestion scheduler, and source-connector foundations.

Implemented or substantially represented on uat:

Next.js app foundation
Supabase client/server/admin utilities
Auth shell: login, signup, logout, reset password, callback flow
Workspace identity and boundary resolution
Customer admin/member capability model
Self-serve get-started flow
Stripe checkout foundation
Billing package catalog
Workspace provisioning and handoff logic
Guided onboarding shell
Document upload interface
Document storage and metadata insertion
Supported file validation for PDF, DOCX, TXT, and structured CSV
CSV structure validator
Processing jobs foundation
Ingestion worker
Document parsing, chunking, and embeddings
document_chunks schema with vector embeddings
Hosted ingestion scheduler via Vercel cron
Google Drive/source-connector foundation in progress
Supabase migrations for core schema, RLS, documents, chunks, processing jobs, queries, citations, feedback, usage summary, and admin events
Validation docs for tenant isolation, document storage, processing jobs, provisioning, self-serve provisioning, and hosted ingestion scheduler
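The file-type and CSV checks in the list above could take a shape like the following. This is a hedged sketch, not the repo's validator: the function names and the specific CSV rules (non-empty unique header names, at least one data row, consistent column counts) are assumptions, and the naive comma split does not handle quoted fields that contain commas.

```typescript
const SUPPORTED_EXTENSIONS = ["pdf", "docx", "txt", "csv"] as const;
type SupportedExtension = (typeof SUPPORTED_EXTENSIONS)[number];

// Return the normalized extension if the file type is supported, else null.
function getSupportedExtension(fileName: string): SupportedExtension | null {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0 || dot === fileName.length - 1) return null;
  const extension = fileName.slice(dot + 1).toLowerCase();
  for (const supported of SUPPORTED_EXTENSIONS) {
    if (supported === extension) return supported;
  }
  return null;
}

interface CsvCheck {
  ok: boolean;
  problems: string[];
}

// Assumed structure rules; blank lines are ignored before checking.
function validateCsvStructure(content: string): CsvCheck {
  const lines = content.split(/\r?\n/).filter((line) => line.trim().length > 0);
  if (lines.length === 0) return { ok: false, problems: ["File is empty."] };

  const problems: string[] = [];
  const header = lines[0].split(",").map((cell) => cell.trim());
  if (header.some((name) => name.length === 0)) {
    problems.push("Header contains an empty column name.");
  }
  const seen = new Set<string>();
  for (const name of header) {
    const key = name.toLowerCase();
    if (key.length > 0 && seen.has(key)) problems.push(`Duplicate column name: ${name}`);
    seen.add(key);
  }
  if (lines.length < 2) problems.push("No data rows found.");

  // Row numbers count the header as row 1, with blank lines skipped.
  lines.slice(1).forEach((line, i) => {
    const width = line.split(",").length;
    if (width !== header.length) {
      problems.push(`Row ${i + 2} has ${width} columns; expected ${header.length}.`);
    }
  });
  return { ok: problems.length === 0, problems };
}
```

Extension checks alone are a weak signal, so a production validator would likely also inspect the MIME type or file signature; that part is outside this sketch.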

The full ask/retrieval/answer/citation user experience is not complete yet. The uat branch has the ingestion and retrieval-ready storage foundation in place, with Google Drive/source-connector foundations in progress, but the retrieval/ask path itself is not yet built there. Future ask routes must resolve the workspace boundary before running embedding search and must filter vector search by the resolved workspace ID.
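The workspace-filter rule in the paragraph above can be illustrated with an in-memory stand-in for vector search. This is a sketch of the constraint, not of the unbuilt ask path: the StoredChunk shape and function names are hypothetical, and the example uses tiny vectors instead of 1536-dimension embeddings. In Postgres with pgvector, the same idea would be a workspace ID filter combined with a distance-ordered query.

```typescript
// Hypothetical in-memory view of a document_chunks row.
interface StoredChunk {
  id: string;
  workspaceId: string;
  documentId: string;
  text: string;
  embedding: number[];
}

interface RetrievedChunk {
  chunk: StoredChunk;
  score: number; // cosine similarity, higher is closer
}

function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions do not match.");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// The tenant filter runs before any similarity scoring, so chunks from other
// workspaces are never candidates, regardless of how similar they are.
function searchWorkspaceChunks(
  chunks: readonly StoredChunk[],
  resolvedWorkspaceId: string,
  queryEmbedding: readonly number[],
  topK: number,
): RetrievedChunk[] {
  return chunks
    .filter((chunk) => chunk.workspaceId === resolvedWorkspaceId)
    .map((chunk) => ({ chunk, score: cosineSimilarity(chunk.embedding, queryEmbedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The resolvedWorkspaceId argument is meant to come from server-side boundary resolution, never from a value the client supplies directly.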

The next major milestone is completing the protected ask path: workspace-filtered vector retrieval, bounded context assembly, grounded answer generation, citation rendering, feedback logging, and first-value activation tracking.
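Bounded context assembly and citation mapping, two of the milestone items above, can be sketched as a single pure function. Since the ask path is not built yet, everything here is hypothetical: the shapes, the character-based budget, the numbered-marker format, and the choice to skip passages that do not fit rather than stop at the first one.

```typescript
// Hypothetical retrieval result handed to the answer step.
interface RankedPassage {
  chunkId: string;
  documentId: string;
  documentTitle: string;
  text: string;
  score: number;
}

interface Citation {
  marker: number; // the [n] label used inside the context text
  chunkId: string;
  documentId: string;
  documentTitle: string;
}

interface AssembledContext {
  contextText: string;
  citations: Citation[];
}

// Add passages in score order until the character budget is used up.
// Each included passage gets a numbered marker that maps back to its source.
function assembleContext(passages: readonly RankedPassage[], maxChars: number): AssembledContext {
  const separator = "\n\n";
  const sorted = passages.slice().sort((a, b) => b.score - a.score);
  const parts: string[] = [];
  const citations: Citation[] = [];
  let used = 0;
  for (const passage of sorted) {
    const marker = citations.length + 1;
    const block = `[${marker}] ${passage.documentTitle}\n${passage.text}`;
    const cost = block.length + (parts.length > 0 ? separator.length : 0);
    if (used + cost > maxChars) continue; // skip passages that would exceed the budget
    parts.push(block);
    citations.push({
      marker,
      chunkId: passage.chunkId,
      documentId: passage.documentId,
      documentTitle: passage.documentTitle,
    });
    used += cost;
  }
  return { contextText: parts.join(separator), citations };
}
```

Keeping the marker-to-source mapping alongside the context text is what lets the answer step render citations that point at real chunks instead of asking the model to invent references.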
