Case Study · AI Automation — Custom Hybrid Workflow

Intelligent Document Processing Pipeline

A lending platform was spending 15 hours a day on manual document review. We built a six-stage pipeline that classifies, extracts, validates, and routes documents — reducing processing time by 95%.

Client: B2B Lending Platform
Industry: Fintech
Timeline: 5 weeks
Type: Custom Hybrid Automation

The Situation

Every loan application came with a stack of supporting documents — pay stubs, bank statements, tax returns, identity documents, proof of address — arriving in every format imaginable. PDFs, scanned images, photos taken on phones, and the occasional Word document.

The client's operations team of three opened each document individually, read through it, typed key data into the lending system, cross-referenced it against the application, and flagged anything that didn't match. At around 60 applications per day, this consumed roughly 15 hours of combined staff time daily.

Error rates were climbing. Processing times were stretching. And the team was burning out.

"We needed a system that could handle the messy reality of how documents actually arrive — not a solution that only works when everything is a perfectly formatted PDF."

What We Built

A six-stage pipeline that takes documents from arrival to validated, structured data — with humans only stepping in when the system isn't confident enough to proceed alone.

How It Works

1. Ingestion

Watchers on every inbound channel — email inbox, upload portal, shared Drive folder. Every document gets normalised: format conversion, image preprocessing (straighten, contrast correction, shadow removal), then queued for processing.

Custom image preprocessing
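To make the contrast-correction step concrete, here is a minimal sketch of one common technique, linear contrast stretching, applied to a grayscale image held as nested lists. It is an illustration under stated assumptions, not the pipeline's actual preprocessing code: the function name `stretch_contrast` and the list-of-lists image representation are hypothetical, and a production system would more likely use an imaging library.

```python
from typing import List

# Assumption: a grayscale image as rows of pixel values, 0 (black) to 255 (white).
Image = List[List[int]]


def stretch_contrast(image: Image) -> Image:
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    flat = [pixel for row in image for pixel in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A uniform image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in image]
    scale = 255 / (hi - lo)
    return [[round((pixel - lo) * scale) for pixel in row] for row in image]


# A washed-out phone photo typically occupies only a narrow band of grey values.
washed_out = [[100, 120], [140, 150]]
stretched = stretch_contrast(washed_out)
```

Stretching the value range before OCR or vision-model input tends to make faint text easier to separate from the background, which is why it sits ahead of classification in the flow described above.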

2. Classification

AI vision model identifies document type (pay stub, bank statement, tax return, ID, etc.), language, and whether it's a digital original, scan, or photograph. Classification determines which extraction template runs next.
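The hand-off from classification to extraction can be pictured as a simple lookup. The sketch below is illustrative only: the `Classification` fields mirror the three attributes named above, while the template registry, its field names, and `select_template` are hypothetical stand-ins for whatever the real templates contain.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Classification:
    doc_type: str   # e.g. "pay_stub", "bank_statement", "tax_return"
    language: str   # e.g. "en"
    source: str     # "digital", "scan", or "photo"


# Hypothetical registry: document type -> fields the extraction step should return.
EXTRACTION_TEMPLATES: Dict[str, List[str]] = {
    "pay_stub": ["employer", "employee_name", "pay_date", "gross_pay", "net_pay"],
    "bank_statement": ["account_holder", "period_start", "period_end", "closing_balance"],
    "tax_return": ["taxpayer_name", "tax_year", "total_income"],
}


def select_template(result: Classification) -> Optional[List[str]]:
    """Return the field list for a document type, or None if the type is unknown."""
    return EXTRACTION_TEMPLATES.get(result.doc_type)


template = select_template(Classification("pay_stub", "en", "photo"))
```

Returning `None` for an unrecognised type, rather than falling back to a generic template, leaves the decision to a human reviewer.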

3. Intelligent Extraction

Each document type has a custom extraction schema. The AI returns structured JSON with every field plus a confidence score (0 to 1). Post-processing validates outputs — real dates, parseable amounts, internal consistency checks.

Custom prompts, validation & confidence scoring
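As a rough sketch of what "real dates" and "parseable amounts" checks might look like, the example below validates a small extraction result. Everything here is an assumption made for illustration: the accepted date formats, the field-naming convention (`*_date`, `*_pay`), the `{"value", "confidence"}` shape of each field, and the function names.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional


def parse_date(raw: str) -> Optional[date]:
    """Return a date if raw is a real calendar date in an accepted format."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):  # assumed formats
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(raw: str) -> Optional[Decimal]:
    """Return a Decimal for strings such as '$1,234.50', else None."""
    cleaned = raw.strip().replace(",", "").lstrip("$£€")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def post_process(fields: Dict[str, dict]) -> List[str]:
    """Collect problems in an extraction result; an empty list means it is clean."""
    problems = []
    for name, field in fields.items():
        if not 0.0 <= field["confidence"] <= 1.0:
            problems.append(f"{name}: confidence out of range")
        if name.endswith("_date") and parse_date(field["value"]) is None:
            problems.append(f"{name}: not a real date")
        if name.endswith("_pay") and parse_amount(field["value"]) is None:
            problems.append(f"{name}: unparseable amount")
    return problems


issues = post_process({
    "pay_date": {"value": "2024-02-30", "confidence": 0.91},  # 30 February does not exist
    "net_pay": {"value": "1,200.00", "confidence": 0.95},
})
```

Using `Decimal` rather than `float` for money avoids rounding surprises when amounts are later compared or summed.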

4. Validation & Cross-Referencing

Extracted data checked against the loan application: name matches, income alignment, employer verification, address consistency, duplicate detection. Each check produces pass/warning/fail.

Custom validation layer
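The pass/warning/fail pattern can be sketched with two of the checks listed above. This is a hedged illustration, not the client's validation layer: the similarity cut-off of 0.85, the 5% and 15% income tolerances, and all function names are assumptions chosen for the example.

```python
from decimal import Decimal
from difflib import SequenceMatcher
from typing import Dict

SEVERITY = {"pass": 0, "warning": 1, "fail": 2}


def check_name(extracted: str, application: str) -> str:
    """Exact match passes; a near match (likely typo or OCR slip) warns."""
    a = " ".join(extracted.lower().split())
    b = " ".join(application.lower().split())
    if a == b:
        return "pass"
    similarity = SequenceMatcher(None, a, b).ratio()
    return "warning" if similarity >= 0.85 else "fail"  # assumed cut-off


def check_income(extracted: Decimal, declared: Decimal) -> str:
    """Compare documented income with the figure declared on the application."""
    if declared <= 0:
        return "fail"
    gap = abs(extracted - declared) / declared
    if gap <= Decimal("0.05"):   # assumed tolerance
        return "pass"
    if gap <= Decimal("0.15"):   # assumed tolerance
        return "warning"
    return "fail"


def overall(results: Dict[str, str]) -> str:
    """The worst individual result decides the overall status."""
    return max(results.values(), key=SEVERITY.__getitem__, default="pass")


results = {
    "name": check_name("Jon  Smith", "John Smith"),
    "income": check_income(Decimal("5100"), Decimal("5000")),
}
status = overall(results)
```

Keeping each check as a small function with the same three-valued result makes it straightforward to add further checks (address, employer, duplicates) without changing the aggregation.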

5. Smart Routing

High confidence + all validations passed → auto-approved, data populates the lending system.
Medium confidence → Slack review queue (avg 90 seconds per review).
Critical issues → escalated to senior reviewer.
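The three routes above reduce to a small decision function. The sketch below is one plausible reading, with several assumptions: the thresholds (0.90 and 0.60) are invented for illustration, the lowest field confidence is taken as the document's confidence, and a "critical issue" is interpreted as any failed validation check or very low confidence.

```python
from typing import Dict


def route(
    field_confidences: Dict[str, float],
    check_results: Dict[str, str],
    auto_threshold: float = 0.90,    # assumed value
    review_threshold: float = 0.60,  # assumed value
) -> str:
    """Return 'auto_approve', 'review_queue', or 'escalate' for one document."""
    # The weakest field decides: one doubtful number is enough to need a human.
    lowest = min(field_confidences.values(), default=0.0)
    checks = list(check_results.values())

    if "fail" in checks or lowest < review_threshold:
        return "escalate"
    if lowest >= auto_threshold and all(result == "pass" for result in checks):
        return "auto_approve"
    return "review_queue"


decision = route({"net_pay": 0.97, "pay_date": 0.75}, {"name": "pass"})
```

Exposing the thresholds as parameters is what allows them to be tuned over time without touching the routing logic itself.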

6. Learning Loop

Every document logged with processing time, confidence scores, and human corrections. Weekly reports surface patterns. Prompts and thresholds tuned continuously — auto-approval rate improved from 48% to 79% in three months.
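A weekly report of this kind can be derived from a simple processing log. The following sketch assumes a hypothetical log format (a list of dicts with `route` and `corrected_fields` keys) and sample entries invented for the example; it shows the two figures such a report might start with.

```python
from collections import Counter
from typing import Dict, List, Tuple


def auto_approval_rate(log: List[Dict]) -> float:
    """Share of documents that needed no human review."""
    if not log:
        return 0.0
    approved = sum(1 for entry in log if entry["route"] == "auto_approve")
    return approved / len(log)


def most_corrected_fields(log: List[Dict], top: int = 3) -> List[Tuple[str, int]]:
    """Fields that reviewers fixed most often: candidates for prompt tuning."""
    counts = Counter(
        field for entry in log for field in entry.get("corrected_fields", [])
    )
    return counts.most_common(top)


# Invented sample data for illustration only.
week_log = [
    {"route": "auto_approve", "corrected_fields": []},
    {"route": "review_queue", "corrected_fields": ["net_pay", "pay_date"]},
    {"route": "auto_approve", "corrected_fields": []},
    {"route": "review_queue", "corrected_fields": ["net_pay"]},
]
rate = auto_approval_rate(week_log)
hot_spots = most_corrected_fields(week_log)
```

A field that keeps appearing in the corrections list points to a specific extraction prompt or threshold worth revisiting.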

Why This Couldn't Be Drag-and-Drop

Standard workflow tools (n8n, Zapier, Make) handled about 30% of this pipeline — the triggers, routing, notifications, and scheduling. The remaining 70% required custom work:

Image preprocessing for cleaning up photos of crumpled documents taken under bad lighting.
Extraction prompt engineering iterated over weeks per document type.
Post-processing logic that catches when line items on a pay stub don't add up to the stated total.
Multi-page handling for bank statements where transaction tables span pages.
Confidence scoring tuned to the client's risk tolerance: for a lending platform, a wrong number isn't just an inconvenience, it's a compliance risk.
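The pay stub check mentioned above is a good example of logic that is simple to state but awkward to express in a drag-and-drop tool. A minimal sketch follows, assuming amounts have already been parsed into `Decimal` values and that deductions are stored as negatives; the function name and the one-cent tolerance are hypothetical.

```python
from decimal import Decimal
from typing import List


def line_items_match_total(
    line_items: List[Decimal],
    stated_total: Decimal,
    tolerance: Decimal = Decimal("0.01"),  # assumed allowance for rounding
) -> bool:
    """True when the line items add up to the stated total within tolerance."""
    computed = sum(line_items, Decimal("0"))
    return abs(computed - stated_total) <= tolerance


# Earnings are positive, deductions negative (an assumption of this sketch).
items = [Decimal("2000.00"), Decimal("350.50"), Decimal("-420.25")]
consistent = line_items_match_total(items, Decimal("1930.25"))
# A transposed digit in the extracted total is caught by the same check.
transposed = line_items_match_total(items, Decimal("1903.25"))
```

A mismatch does not say which number is wrong, only that the document should not be auto-approved as extracted.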

We used off-the-shelf tools where they made sense and wrote custom code where the problem demanded it. That's the difference between a demo and a production system.

Results

Before → After

Per document, end to end: 15 min → 47 sec
Data entry error rate: 4.2% → 0.6%
Application capacity: ~60/day → 150+/day
Staff time: 15 hrs/day (combined) → 2.5 hrs/day (one staff member, review only)

95% faster processing. From 15 minutes to 47 seconds per document, counting both auto-approved and human-reviewed documents.

79% auto-approval rate. Started at 48% in week one and reached 79% after three months of continuous tuning.

~$95K estimated annual saving in operational labour costs. Project investment recouped within 8 weeks of going live.

86% fewer data entry errors. Error rate dropped from 4.2% to 0.6%, which is critical for a lending platform's compliance requirements.
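For readers who want to check the headline percentages, the arithmetic is a plain relative reduction, (before - after) / before. The helper name below is hypothetical; the inputs are the before and after figures quoted above.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100


# Processing time per document, converted to seconds: 15 min vs 47 sec.
time_cut = percent_reduction(15 * 60, 47)

# Data entry error rate: 4.2% vs 0.6%.
error_cut = percent_reduction(4.2, 0.6)

print(f"Processing time reduced by {time_cut:.1f}%")
print(f"Data entry errors reduced by {error_cut:.1f}%")
```

Rounded to whole numbers, these give the 95% and 86% figures quoted in the results.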

The Tech Stack

n8n · Custom Python · AI Vision Model · Slack · Client Lending Platform API · PostgreSQL

No new platforms for the team to learn. The system plugs into what they already use.

What Made This Work

Starting with the messy reality. We didn't build for perfectly formatted PDFs and hope for the best. We started with the worst-case documents — blurry photos, handwritten notes, multi-page scans — and built the system to handle those first. Everything else became easy by comparison.

Confidence scoring as the safety net. The system never guesses and hopes. Every extracted field carries a confidence score, and routing thresholds are tuned to the client's risk tolerance. For a lending platform, a wrong number isn't an inconvenience — it's a compliance risk.

Hybrid approach. Off-the-shelf workflow tools where they made sense. Custom code where the problem demanded it. This kept the build lean while handling complexity that drag-and-drop tools can't touch.

The feedback loop. The system got meaningfully better every week because we built measurement in from day one. Without tracking what humans corrected, there's no way to improve prompts and thresholds over time.

Engagement Timeline

Week 1: Discovery
Weeks 2–4: Build
Week 5: Tuning & handover
Ongoing: Monthly optimisation

Drowning in manual processes?

Let's talk about what a custom automation pipeline could free up for your team.


This case study represents a composite engagement based on real automation work. Client details have been anonymised.