
Knowledge bank ingest.

Drop in docs, FAQs, policies. Connect Shopify, REST, SQL, CSV. Voyage embeddings, Qdrant retrieval per turn, PII redacted before the LLM ever sees it.

Two collections per tenant

Every tenant gets two Qdrant collections, hard-isolated:

company_kb — your own docs, FAQs, policies, scripts. Long-form. Voyage-3-large.
customer_data — per-customer state, orders, tickets, plan. Shorter. Voyage-3.

By default both are queried on every turn; a rule book can scope retrieval to just one of them.

Company KB — uploading docs

Dashboard → Knowledge bank → New artifact. Supported formats:

Markdown (.md, .markdown)
PDF (text-extracted; OCR pass-through for scanned PDFs)
DOCX
HTML / web pages (paste a URL and we crawl that single page; a bulk URL list also works)
Plain text (.txt)

Each artifact is semantically chunked (500–1000 tokens per chunk, split at sentence boundaries), embedded with Voyage-3-large, and stored with rich metadata (title, section, source URL, last updated).
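A minimal sketch of the size and sentence-boundary rules, assuming whitespace-separated words as a stand-in for model tokens and a simple punctuation-based sentence splitter. It packs whole sentences greedily up to `max_tokens`; it does not perform the semantic (topic-shift) detection the pipeline does, and `build_records` with its field names is hypothetical.

```python
import re
from typing import Dict, List

# Assumption: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def count_tokens(text: str) -> int:
    # Rough stand-in: a real pipeline would count with the embedding model's tokenizer.
    return len(text.split())


def chunk_text(text: str, max_tokens: int = 1000) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens becomes its own (oversized) chunk.
    """
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and size + n > max_tokens:
            chunks.append(" ".join(current))  # close the chunk at a sentence boundary
            current, size = [], 0
        current.append(sentence)
        size += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_records(text: str, title: str, section: str, source_url: str,
                  last_updated: str, max_tokens: int = 1000) -> List[Dict[str, object]]:
    """Attach per-chunk metadata (hypothetical field names) before embedding."""
    return [
        {"text": chunk, "chunk_index": i, "title": title, "section": section,
         "source_url": source_url, "last_updated": last_updated}
        for i, chunk in enumerate(chunk_text(text, max_tokens))
    ]
```

Greedy packing only enforces the upper bound; keeping chunks above the 500-token floor would need an extra merge or split rule not shown here.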

Bulk sync from Git or S3

If your docs live in a Git repo or an S3 bucket, point us at it. We poll on a cadence (default daily) and re-index on changes. Authentication via deploy key or S3 IAM role.
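One way to decide what to re-index on each poll, sketched under the assumption that a SHA-256 content hash per file is kept from the previous pass (a Git sync could instead diff commits, and S3 exposes ETags). `content_hash` and `diff_snapshots` are hypothetical helpers, not part of any SDK.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def diff_snapshots(previous: Dict[str, str],
                   current: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare {path: hash} snapshots from two polls.

    Returns (paths to (re)index, paths whose chunks should be deleted).
    """
    to_index = sorted(p for p, h in current.items() if previous.get(p) != h)
    to_delete = sorted(p for p in previous if p not in current)
    return to_index, to_delete
```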

Customer data — connectors

Shopify — OAuth install via the Shopify app. We sync customers, orders, returns, subscriptions on webhook events plus a nightly full pass.

Generic REST — Paste an endpoint and auth header; we poll on your schedule. Your endpoint returns rows in its own shape, and we map the fields to our Contact / Order / Ticket models.
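A sketch of the field-mapping step, assuming the mapping is a dictionary from our Contact fields to dotted paths in your JSON rows. The `Contact` shape, the `FIELD_MAP` paths and the `map_row` helper are illustrative, not the product's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Contact:
    # Illustrative subset of a Contact model.
    email: Optional[str] = None
    phone: Optional[str] = None
    name: Optional[str] = None
    external_ids: Dict[str, str] = field(default_factory=dict)


# Hypothetical mapping: our field -> dotted path into the source row.
FIELD_MAP = {"email": "contact.email", "phone": "contact.mobile", "name": "full_name"}


def get_path(row: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; None if any key is missing."""
    value: Any = row
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def map_row(row: Dict[str, Any], field_map: Dict[str, str],
            source: str, id_path: str) -> Contact:
    contact = Contact(**{ours: get_path(row, theirs) for ours, theirs in field_map.items()})
    ext_id = get_path(row, id_path)
    if ext_id is not None:
        contact.external_ids[source] = str(ext_id)  # keep the source's own ID
    return contact
```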

SQL (JDBC / SSH) — Provide a read-only DB user + connection string. SSH tunnel supported. Run a discovery query, map columns, schedule incremental pulls.
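Incremental pulls are commonly done with a high-water mark. The sketch below uses Python's built-in `sqlite3` as a stand-in for your database and assumes a hypothetical `orders` table with `id`, `email` and `updated_at` columns; it pages on the `(updated_at, id)` pair so rows sharing a timestamp are not skipped at a batch boundary.

```python
import sqlite3
from typing import List, Tuple

Watermark = Tuple[str, int]  # (updated_at, id) of the last row already pulled


def incremental_pull(conn: sqlite3.Connection, watermark: Watermark,
                     batch_size: int = 500) -> Tuple[List[tuple], Watermark]:
    """Fetch the next batch of rows changed after `watermark`.

    Keyset pagination on (updated_at, id); assumes ISO-8601 timestamps,
    which sort correctly as strings.
    """
    last_ts, last_id = watermark
    rows = conn.execute(
        "SELECT id, email, updated_at FROM orders "
        "WHERE updated_at > ? OR (updated_at = ? AND id > ?) "
        "ORDER BY updated_at, id LIMIT ?",
        (last_ts, last_ts, last_id, batch_size),
    ).fetchall()
    if rows:
        watermark = (rows[-1][2], rows[-1][0])  # advance to the last row seen
    return rows, watermark
```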

CSV upload — Drag and drop. A schema-discovery wizard infers column types and shows a merge preview before you commit. Best for one-off campaign uploads.
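A rough sketch of column-type inference for a schema-discovery step, using only the standard `csv` module. The type names and the rule of picking the narrowest type that parses every non-empty value are assumptions for illustration.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Narrowest of int, float, bool, str that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "str"

    def all_parse(fn) -> bool:
        try:
            for v in non_empty:
                fn(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    if all(v.lower() in {"true", "false", "yes", "no"} for v in non_empty):
        return "bool"
    return "str"


def discover_schema(csv_text: str) -> Dict[str, str]:
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Short rows yield None for missing cells; treat them as empty.
    return {col: infer_type([row[col] or "" for row in rows])
            for col in reader.fieldnames or []}
```

A real wizard would also need to keep identifier-like columns (phone numbers, postcodes with leading zeros) as strings, which this sketch would misread as integers.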

Identity resolution

One customer, three data sources, three different identifiers — the orchestrator merges them into one Contact:

Email + phone as primary keys (normalised: emails lowercased, phones converted to E.164)
External IDs preserved per source (so we can write back to Shopify by their customer_id while writing back to your CRM by yours)
Conflicts surfaced as merge candidates for you to confirm; we don't merge silently across high-confidence-but-not-certain matches
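A minimal sketch of exact-key merging after normalisation, assuming records arrive as dictionaries with hypothetical `source`, `id`, `email` and `phone` fields, and that bare national phone numbers belong to a default country code. It uses union-find to group records sharing a normalised email or phone. The phone normaliser is deliberately rough (a library such as `phonenumbers` handles real numbering plans), and the sketch does not produce the fuzzy merge candidates described above.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional


def normalize_email(email: Optional[str]) -> Optional[str]:
    return email.strip().lower() if email else None


def normalize_phone(phone: Optional[str], default_cc: str = "91") -> Optional[str]:
    """Rough E.164 normalisation; assumes bare national numbers use default_cc."""
    if not phone:
        return None
    digits = re.sub(r"\D", "", phone)
    if not digits:
        return None
    if phone.strip().startswith("+"):
        return "+" + digits
    digits = digits.lstrip("0")  # drop a trunk prefix such as India's leading 0
    return "+" + (default_cc + digits if len(digits) == 10 else digits)


def merge_records(records: List[Dict[str, Optional[str]]]) -> List[Dict[str, object]]:
    """Group records sharing a normalised email or phone (union-find)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: Dict[tuple, int] = {}
    for i, rec in enumerate(records):
        for key in (("email", normalize_email(rec.get("email"))),
                    ("phone", normalize_phone(rec.get("phone")))):
            if key[1] is None:
                continue
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups: Dict[int, List[Dict[str, Optional[str]]]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[find(i)].append(rec)

    return [
        {
            "emails": sorted({normalize_email(r.get("email")) for r in members} - {None}),
            "phones": sorted({normalize_phone(r.get("phone")) for r in members} - {None}),
            # One external ID per source, so write-backs use each system's own key.
            "external_ids": {r["source"]: r["id"] for r in members},
        }
        for members in groups.values()
    ]
```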

PII redaction (mandatory)

Before every LLM call, customer-data fields run through the redactor:

Phone (E.164 + Indian regional + US 10-digit) → {{phone:N}}
Email → {{email:N}}
PAN, Aadhaar (India) → {{pan:N}}, {{aadhaar:N}}
SSN (US) → {{ssn:N}}
IFSC, bank account → {{ifsc:N}}, {{bank:N}}
Money amount → {{money:N}}
Address (multi-line postal) → {{addr:N}}
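A minimal sketch of placeholder redaction covering only three of the categories above (email, PAN and `+`-prefixed E.164 phones), with simplified regular expressions. It assumes tokens are numbered from 1 per category and that a repeated value reuses its token; both are illustrative choices, not the redactor's documented behaviour.

```python
import re
from typing import Dict, Tuple

# Order matters: emails first so their digits are never read as phone numbers.
PATTERNS = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("pan", re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")),  # Indian PAN: AAAAA9999A
    ("phone", re.compile(r"\+[1-9]\d{7,14}\b")),        # E.164 with leading +
]


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace matches with {{kind:N}} tokens; return text and token->original map."""
    mapping: Dict[str, str] = {}           # token -> original value
    seen: Dict[Tuple[str, str], str] = {}  # (kind, value) -> token
    counters: Dict[str, int] = {}
    for kind, pattern in PATTERNS:
        def replace(match: re.Match, kind: str = kind) -> str:
            value = match.group(0)
            if (kind, value) not in seen:
                counters[kind] = counters.get(kind, 0) + 1
                token = "{{%s:%d}}" % (kind, counters[kind])
                seen[(kind, value)] = token
                mapping[token] = value
            return seen[(kind, value)]
        text = pattern.sub(replace, text)
    return text, mapping
```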

The model reasons over the placeholder tokens; we reinject the original values before the customer hears or reads the reply. Logs keep the redacted form by default; a full-fidelity audit trail is available to DPOs.
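The reinjection step can be sketched as a single regex substitution over the model's reply, assuming the redaction step produced a token-to-original mapping such as `{"{{email:1}}": "asha@example.com"}`. Unknown tokens are left untouched here; how the product handles them is not specified.

```python
import re
from typing import Dict

TOKEN = re.compile(r"\{\{[a-z]+:\d+\}\}")


def reinject(text: str, mapping: Dict[str, str]) -> str:
    """Swap each {{kind:N}} token back to its original value."""
    # A function replacement avoids backslash-escape processing of the values.
    return TOKEN.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```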

Retrieval per turn

Default: top-5 chunks from company_kb + top-3 chunks from customer_data scoped to the current contact. Tuneable per rule book.
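The per-turn shape (top-k from each collection, with the customer collection filtered to the current contact) can be sketched with in-memory lists standing in for Qdrant. The vectors, the `contact_id` payload key and the use of cosine similarity are assumptions; in a real deployment the contact filter would be a Qdrant payload filter rather than a Python predicate.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[Sequence[float], Dict[str, str]]  # (vector, payload)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(points: List[Point], query: Sequence[float], k: int,
          predicate: Callable[[Dict[str, str]], bool] = lambda p: True) -> List[Dict[str, str]]:
    scored = [(cosine(vec, query), payload) for vec, payload in points if predicate(payload)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [payload for _, payload in scored[:k]]


def retrieve(kb_points: List[Point], customer_points: List[Point],
             query: Sequence[float], contact_id: str,
             kb_k: int = 5, cust_k: int = 3) -> List[Dict[str, str]]:
    kb = top_k(kb_points, query, kb_k)
    # Only this contact's rows are eligible from customer_data.
    cust = top_k(customer_points, query, cust_k,
                 lambda p: p.get("contact_id") == contact_id)
    return kb + cust
```

Setting `kb_k` or `cust_k` to 0 is one way a rule book could scope retrieval to a single collection.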

A re-rank pass (Voyage rerank-3) runs over the initial recall set to surface the paragraphs that actually answer the query, not just the closest neighbours.
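The two-stage shape (cheap vector recall, then a query-aware rescoring of those candidates) can be sketched with a pluggable scorer. In the pipeline described here that scorer is Voyage rerank-3; the `overlap_score` function below is only a toy stand-in so the sketch runs without calling any external API.

```python
from typing import Callable, List


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float], top_n: int) -> List[str]:
    """Second stage: rescore recalled candidates and keep the best top_n."""
    # sorted() is stable, so ties keep their original recall order.
    return sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query terms found in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0
```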

Updating the KB

Re-uploading an artifact replaces its previous version: old chunks are deleted and new chunks are indexed. A version history is retained per artifact for rollback.
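A sketch of replace-with-history semantics, assuming the searchable index holds only the active version of each artifact while every upload is kept in a separate history. `KnowledgeBank` and its methods are hypothetical names, not the product's API.

```python
from typing import Dict, List


class KnowledgeBank:
    def __init__(self) -> None:
        self.index: Dict[str, List[str]] = {}            # what retrieval sees
        self.history: Dict[str, List[List[str]]] = {}    # every uploaded version

    def upload(self, artifact_id: str, chunks: List[str]) -> int:
        """Store a new version and make it live; returns its version number."""
        self.history.setdefault(artifact_id, []).append(list(chunks))
        self.index[artifact_id] = list(chunks)  # old chunks dropped, new ones live
        return len(self.history[artifact_id]) - 1

    def rollback(self, artifact_id: str, version: int) -> None:
        """Re-index an earlier version from history."""
        versions = self.history.get(artifact_id, [])
        if not 0 <= version < len(versions):
            raise ValueError(f"unknown version {version} for {artifact_id}")
        self.index[artifact_id] = list(versions[version])
```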

If you want to A/B test KB versions on different campaigns, scope the artifact to a workflow rather than a tenant.