LawPilot is an AI assistant for legal professionals — research, drafting, and document support powered by LLMs. Built to handle the parts of legal work that are tedious and time-consuming without requiring practitioners to expose sensitive client data to third-party services.
The Problem
Legal work involves a lot of repetitive document tasks: drafting standard clauses, researching case law, summarising lengthy contracts, and generating first-draft correspondence. Junior associates spend significant time on these tasks; partners spend significant money paying junior associates to do them.
Existing AI tools like ChatGPT are used informally by lawyers, but using them directly with client matters creates data privacy and confidentiality concerns. LawPilot is designed with those constraints at the centre — the architecture separates what goes to the LLM from what stays in your database.
Privacy-First Architecture
Client names, case references, and identifying details are stripped from document content before it is sent to OpenAI. A preprocessing step applies a named entity recognition pass using a lightweight local model (run in a Node.js child process via ONNX Runtime) to identify and replace PII with neutral placeholders:
// Before sending to OpenAI:
// "Pursuant to our agreement with John Smith (Case #2024-1234)..."
// becomes:
// "Pursuant to our agreement with [CLIENT_A] (Case #[CASE_REF_1])..."
// After receiving the response, placeholders are substituted back
// for display within the secure context.

The placeholder map is stored per session in the database, never sent to OpenAI, and used to restore the response for display. OpenAI receives only the anonymised text.
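A minimal sketch of that round trip (the function names, entity shape, and placeholder scheme here are illustrative, not LawPilot's actual code):

```typescript
// Entities as produced by the NER pass (shape assumed for illustration).
type Entity = { text: string; label: "PERSON" | "CASE_REF" };

// Replace each detected entity with a neutral placeholder and build the
// per-session map (placeholder -> original) that stays in the database.
function anonymise(text: string, entities: Entity[]) {
  const map = new Map<string, string>();
  const counters: Record<string, number> = {};
  let out = text;
  for (const e of entities) {
    const n = (counters[e.label] = (counters[e.label] ?? 0) + 1);
    const placeholder =
      e.label === "PERSON"
        ? `[CLIENT_${String.fromCharCode(64 + n)}]` // A, B, C, ...
        : `[CASE_REF_${n}]`;
    map.set(placeholder, e.text);
    out = out.split(e.text).join(placeholder);
  }
  return { anonymised: out, map };
}

// Substitute the originals back in for display in the secure context.
function restore(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [placeholder, original] of map) {
    out = out.split(placeholder).join(original);
  }
  return out;
}
```

Only `anonymised` leaves the backend; `map` never does.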
Document Workflows
LawPilot is organised around three core workflows:
- Draft — provide a document type, jurisdiction, and key terms; receive a full first draft that follows standard conventions for that document type. A clause library allows reusing and tweaking commonly used language.
- Review — upload a contract or document; receive a structured summary, a list of unusual or potentially problematic clauses highlighted with plain-English explanations, and suggested amendments.
- Research — describe a legal question; receive a structured response with relevant principles, common approaches, and suggested research directions. Explicitly not a substitute for proper legal research, but a starting point that saves hours.
Stack and Architecture
React + Vite (frontend)
└── Document editor (ProseMirror)
└── Chat interface for research workflow
└── Document library with version history
NestJS (backend)
└── OpenAI integration with streaming responses
└── PII preprocessing (ONNX NER model)
└── Document storage (PostgreSQL + S3-compatible)
└── Clause library CRUD
└── Session management

AI responses stream directly to the frontend via Server-Sent Events. The NestJS service pipes the OpenAI streaming response through the PII placeholder restoration step before forwarding to the client, so users see text appearing word by word with real client names already substituted back in.
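The shape of that pipeline, stripped of NestJS wiring (the generator name and SSE framing helper are assumptions, not the actual service code):

```typescript
// Pipe an upstream token stream through a per-chunk restore step and
// emit SSE-framed events. The restore function is injected so the
// framing logic stays independent of the placeholder handling.
async function* restoreStream(
  upstream: AsyncIterable<string>,
  restoreChunk: (chunk: string) => string,
): AsyncGenerator<string> {
  for await (const chunk of upstream) {
    // "data: ...\n\n" is the standard SSE event framing.
    yield `data: ${restoreChunk(chunk)}\n\n`;
  }
}
```

In NestJS this generator would back an `@Sse()` controller endpoint, with the OpenAI stream as `upstream`.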
The Document Editor
The draft workflow outputs into a ProseMirror editor rather than a read-only text area. This was a deliberate choice: lawyers need to edit AI-generated drafts, not just read them. ProseMirror provides a proper rich text foundation with track-changes support — edited clauses are marked so reviewers can distinguish AI-generated content from human edits.
Jurisdiction Awareness
Legal conventions differ significantly between jurisdictions. A contract clause that is standard in English law may be unenforceable or unusual in US law. The system prompt for each workflow includes the selected jurisdiction, and the document type configurations include jurisdiction-specific instructions that shape the model's output — different standard clauses, different formatting conventions, different legal terminology.
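As an illustration, assuming a simple per-jurisdiction config shape (the keys, names, and instruction text below are invented for the example, not LawPilot's actual data):

```typescript
type JurisdictionConfig = { name: string; instructions: string };

// Hypothetical entries; real configs would carry standard clauses,
// formatting conventions, and terminology per document type.
const JURISDICTIONS: Record<string, JurisdictionConfig> = {
  "england-wales": {
    name: "England and Wales",
    instructions: "Follow English law drafting conventions.",
  },
  "us-ny": {
    name: "New York, USA",
    instructions: "Follow US drafting conventions for New York law.",
  },
};

// The selected jurisdiction shapes the system prompt for each workflow.
function buildSystemPrompt(workflow: string, jurisdictionKey: string): string {
  const j = JURISDICTIONS[jurisdictionKey];
  return [
    `You are assisting with a legal ${workflow} task.`,
    `Jurisdiction: ${j.name}.`,
    j.instructions,
  ].join("\n");
}
```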
What I Learned
Streaming AI responses with a preprocessing step in the middle is trickier than it looks. The PII placeholder restoration needs to handle the case where a placeholder spans multiple stream chunks — you cannot restore [CLIENT_A] if the stream delivers [CLIENT in one chunk and _A] in the next. The solution is a small buffer that holds incomplete placeholder patterns and flushes them only when the closing bracket arrives.
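A sketch of that buffer, with assumed names (a real implementation would also flush any held-back text when the stream ends):

```typescript
// Returns a stateful per-chunk restorer. Text after an unmatched '['
// is held back until the closing ']' arrives in a later chunk, so a
// placeholder split across chunks is never emitted half-restored.
function makeRestorer(map: Map<string, string>) {
  let buffer = "";
  return function restoreChunk(chunk: string): string {
    buffer += chunk;
    const open = buffer.lastIndexOf("[");
    const close = buffer.lastIndexOf("]");
    let emit: string;
    if (open > close) {
      // A placeholder may still be in flight; keep it in the buffer.
      emit = buffer.slice(0, open);
      buffer = buffer.slice(open);
    } else {
      emit = buffer;
      buffer = "";
    }
    for (const [placeholder, original] of map) {
      emit = emit.split(placeholder).join(original);
    }
    return emit;
  };
}
```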
The NER model for PII detection runs locally in a Node.js child process. Spinning up the ONNX Runtime adds latency to the first request in a session. Keeping the worker process alive and warm between requests was necessary to make the preprocessing feel instant.
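The warm-worker idea can be sketched as a memoised factory (the helper, the worker file name, and the wiring are assumptions, not the actual service code):

```typescript
import { fork, type ChildProcess } from "node:child_process";

// Create an expensive resource once, on first use, then reuse it for
// every later call instead of paying the startup cost again.
function keepWarm<T extends object>(create: () => T): () => T {
  let instance: T | null = null;
  return () => (instance ??= create());
}

// Hypothetical wiring: the forked worker loads the ONNX NER model once
// at startup and stays resident, so later requests skip that latency.
const getNerWorker = keepWarm<ChildProcess>(() => fork("ner-worker.js"));
```

Requests would then message `getNerWorker()` over the child-process IPC channel rather than forking per request.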
Status
LawPilot is in active development. The draft and review workflows are functional. The research workflow, clause library, and ProseMirror integration are in progress. The PII preprocessing pipeline is complete and tested.