How I built an AI pipeline that reads Kfz claim documents, extracts every field, scores readiness across 4 dimensions, and gives brokers an action checklist — all in under 5 seconds.
THE CHALLENGE
German insurance brokers spend 15–20 minutes manually reviewing each Kfz Schadensmeldung (vehicle damage report). They read multi-page PDFs, cross-reference fields against checklists, assess completeness, and identify what's missing before forwarding to the insurer. A single missed field means a callback, a delayed payout, and an unhappy client.
The problem compounds: brokers handle 10–30 claims daily. That's 2.5–10 hours per day on document review alone — repetitive, error-prone, and impossible to scale.
VALIDATED PAIN POINTS
— Independent Insurance Broker, Munich
— Claims Team Lead, Regional Brokerage
— Senior Broker, Hamburg
SOLUTION
ClaimIQ is a full-stack AI pipeline purpose-built for the German Kfz insurance workflow. Brokers upload a Schadensmeldung PDF and within 5 seconds receive: structured field extraction, a 0–100 readiness score across 4 dimensions, automated fraud-signal detection, and an interactive action checklist.
| Dimension | What It Measures |
|---|---|
| Completeness | Are all required fields (Kennzeichen, VSN, Unfallhergang, etc.) present? |
| Consistency | Do dates, amounts, and descriptions align logically? |
| Fraud Signals | German-specific patterns: claim date = policy start, round amounts, late filing |
| Documentation | Are supporting documents (photos, police report, repair estimate) referenced? |
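The fraud-signal dimension above can be sketched as a few pure-Python heuristics. The field names, the 14-day filing window, and the round-amount rule are illustrative assumptions, not the actual ClaimIQ schema or thresholds:

```python
from datetime import date

def fraud_signals(claim_date: date, policy_start: date,
                  damage_amount: float, filed_date: date) -> list[str]:
    """Return German-Kfz-specific fraud flags for a single claim.

    Heuristics mirror the table above: claim date equal to policy start,
    suspiciously round amounts, and late filing. Thresholds are examples.
    """
    flags = []
    # A claim on the very day the policy started is a classic red flag.
    if claim_date == policy_start:
        flags.append("claim_date_equals_policy_start")
    # Suspiciously round damage amounts (e.g. exactly 5000 EUR).
    if damage_amount >= 1000 and damage_amount % 500 == 0:
        flags.append("round_amount")
    # Late filing: more than 14 days between damage and report.
    if (filed_date - claim_date).days > 14:
        flags.append("late_filing")
    return flags
```

Each flag feeds into the Fraud Signals dimension of the readiness score rather than blocking the claim outright.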
MARKET ANALYSIS
| Solution | Critical Gap |
|---|---|
| Manual Checklists | 15–20 min per claim; human error increases with volume |
| Generic OCR (ABBYY, Adobe) | Extracts text but doesn't understand Kfz field semantics |
| Enterprise IDP (Kofax, UiPath) | €50K+ annual; overkill for independent brokers |
| ChatGPT / Generic AI | No structured output; can't score readiness or detect fraud patterns |
PRIORITIZATION & SCOPE
| Feature | Rationale |
|---|---|
| Dual-layer OCR (Tesseract + Vision fallback) | German handwriting + fax-quality PDFs require robust extraction |
| Structured Kfz field extraction via Gemini | Core value proposition — semantic understanding, not just text |
| 4-dimension readiness scoring | Brokers need a decision, not raw data |
| Interactive action checklist | Bridges the gap between analysis and broker workflow |
| Branded PDF report export | Brokers send structured reports to clients and insurers |
| DE/EN bilingual interface | German-first, but many brokers work with international clients |
Deliberately deferred:
- Complex pipeline orchestration; the single-claim UX is being validated first
- Requires API partnerships with German insurance platforms; a post-PMF priority
- Requires persistent storage and user accounts; the MVP validates core extraction first
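The interactive action checklist prioritized above can be sketched as a pure function over the extraction result. The required-field set and German task labels here are illustrative examples, not the full legally mandated checklist:

```python
# Illustrative required fields mapped to broker-facing tasks (German labels).
REQUIRED_FIELDS = {
    "kennzeichen": "Kennzeichen nachreichen",
    "vsn": "Versicherungsscheinnummer nachreichen",
    "unfallhergang": "Unfallhergang beschreiben",
    "schadenfotos": "Schadenfotos anfordern",
}

def action_checklist(extracted: dict) -> list[str]:
    """One actionable task per missing or empty required field."""
    return [task for field, task in REQUIRED_FIELDS.items()
            if not extracted.get(field)]
```

Because the checklist is derived from the same rigid schema the extractor fills, it stays in sync with the Completeness dimension automatically.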
| Trade-off | Choice | Rationale |
|---|---|---|
| Single LLM call vs. two-stage pipeline | Two-stage: extraction → scoring | Separation improves accuracy; scoring prompt can reference extracted fields explicitly |
| Cloud OCR only vs. local-first with fallback | Local Tesseract + Cloud Vision fallback | Free-first approach; only invokes paid API when confidence < 80% |
| GPT-4 vs. Gemini 1.5 Flash | Gemini 1.5 Flash | Strong German language support; 10x lower cost; sufficient accuracy for structured extraction |
EXECUTION & ITERATION
| Phase | Duration | Deliverable |
|---|---|---|
| Phase 1: OCR + Extraction Core | Week 1 | Tesseract pipeline + Gemini extraction prompt + FastAPI endpoints |
| Phase 2: Scoring & Fraud Detection | Week 2 | 4D scoring model, German fraud heuristics, action checklist generation |
| Phase 3: Frontend & UX | Week 3 | Next.js 14 UI, animated score gauge, PDF export, DE/EN toggle |
| Phase 4: Production & Polish | Week 4 | PWA setup, demo mode, VPS deployment, monitoring |
German compound words and insurance-specific terminology (Versicherungsscheinnummer, Unfallhergang) caused high OCR error rates. Solved with a dual-layer approach: Tesseract with German language pack as the primary engine, falling back to Google Vision API when confidence drops below 80%. This reduced extraction errors by 60%.
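The fallback logic is simple to sketch. Here the two OCR engines are injected as callables so the routing is testable; the real implementations would wrap pytesseract and the Google Vision client, and the `(text, confidence)` signature is an assumption for illustration:

```python
from typing import Callable

# Each engine takes page bytes and returns (text, mean confidence in 0..1).
OcrEngine = Callable[[bytes], tuple[str, float]]

def extract_text(pdf_page: bytes,
                 tesseract: OcrEngine,
                 vision: OcrEngine,
                 threshold: float = 0.80) -> tuple[str, str]:
    """Run free local OCR first; pay for Cloud Vision only when needed."""
    text, confidence = tesseract(pdf_page)
    if confidence >= threshold:
        return text, "tesseract"
    # Low confidence is typical for faxed or handwritten Schadensmeldungen.
    text, _ = vision(pdf_page)
    return text, "vision"
```

Keeping the threshold a parameter made it easy to tune the 80% cutoff against a sample set of real claim PDFs.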
Schadensmeldungen vary wildly across insurers — different layouts, field orders, and terminology. A two-stage Gemini pipeline solved this: the first prompt extracts into a rigid schema regardless of source format; the second scores against business rules. Separating concerns improved both accuracy and debuggability.
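The two-stage shape looks roughly like this. The LLM is injected as a callable (in production it would wrap gemini-1.5-flash via the google-generativeai SDK); the prompts and the four-field schema are illustrative, not the production prompts:

```python
import json
from typing import Callable

EXTRACTION_PROMPT = (
    "Extract the following fields from this Kfz Schadensmeldung as JSON: "
    "kennzeichen, vsn, unfallhergang, schadenhoehe.\n\n{text}"
)
SCORING_PROMPT = (
    "Score these extracted fields for completeness (0-100) and list missing "
    "fields. Reply as JSON with keys 'score' and 'missing'.\n\n{fields}"
)

def analyze_claim(ocr_text: str, llm: Callable[[str], str]) -> dict:
    # Stage 1: normalize any insurer layout into one rigid schema.
    fields = json.loads(llm(EXTRACTION_PROMPT.format(text=ocr_text)))
    # Stage 2: score against business rules, referencing explicit fields.
    verdict = json.loads(llm(SCORING_PROMPT.format(fields=json.dumps(fields))))
    return {"fields": fields, **verdict}
```

Because stage 2 sees only the normalized JSON, the scoring prompt never has to cope with insurer-specific layouts.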
Sales conversations with brokers stall when they need to configure API keys before seeing value. Built a full mock pipeline with realistic sample data so the entire UI flow works without any API keys β brokers experience the product before committing.
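A minimal sketch of that demo-mode switch, assuming the key lives in a `GEMINI_API_KEY` environment variable; the canned result values are made up for illustration:

```python
import os

# Canned sample served when no API key is configured, so the full UI flow
# works out of the box. Values are illustrative.
SAMPLE_RESULT = {
    "fields": {"kennzeichen": "M-AB 1234", "vsn": "VS-998877"},
    "score": 82,
    "missing": ["polizeibericht"],
    "demo": True,
}

def analyze(pdf_bytes: bytes, real_pipeline=None) -> dict:
    """Fall back to the mock pipeline when GEMINI_API_KEY is not set."""
    if not os.environ.get("GEMINI_API_KEY"):
        return SAMPLE_RESULT
    return real_pipeline(pdf_bytes)
```

The same branch gates every endpoint, so a prospect clicking through the demo exercises the real frontend against deterministic data.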
TECH STACK
Backend:
| Layer | Tech | Why |
|---|---|---|
| API | FastAPI + uvicorn | Async, typed, auto-generated OpenAPI docs |
| OCR | Tesseract 5 (deu/eng) + Google Vision | Free-first with quality fallback |
| AI | Google Gemini 1.5 Flash | Cost-efficient, strong German NLU |
| PDF Export | ReportLab | Pure Python, no browser dependency |
| DB | SQLite (dev) / PostgreSQL (prod) | Zero-config dev, Neon free tier prod |
Frontend:
| Layer | Tech | Why |
|---|---|---|
| Framework | Next.js 14 (App Router) | RSC, PWA support, edge-ready |
| Styling | Tailwind CSS v3 JIT | Glassmorphism, custom animations |
| Language | TypeScript | End-to-end type safety with Pydantic schemas |
OUTCOMES & IMPACT
- Field extraction accuracy: dual-layer OCR reduced extraction errors by 60%
- End-to-end processing: under 5 seconds per claim
- Manual review eliminated: 15–20 minutes saved per claim
- Infrastructure cost: ~€5/month
| Expectation | Reality | Learning |
|---|---|---|
| Tesseract alone would handle all PDFs | Faxed/scanned forms had <70% accuracy | Cloud Vision fallback was essential; should have planned for it from day 1 |
| Brokers would want detailed field-by-field view | They only care about the score and what's missing | Action checklist became the most-used feature; detail is secondary |
| Single AI call would be fast enough | Combined extraction+scoring in one prompt was slow and inaccurate | Two focused calls are faster and more reliable than one overloaded call |
KEY TAKEAWAYS
A Gemini prompt that knows what a Versicherungsscheinnummer is and which fields are legally required outperforms any generic document extraction tool. Domain knowledge baked into prompts is the moat.
Brokers don't want extracted text — they want a decision-ready score and a task list. The readiness score and action checklist are what they pay for, not the OCR.
Every B2B tool should work without configuration for first-time users. Removing the setup barrier converted more prospects than any feature list ever could.
Running the entire stack for ~€5/month means I can test market fit without revenue pressure. Gemini Flash + local Tesseract + SQLite keeps the burn at near-zero.