Underwriting CLI¶
Give Gen H's underwriters a conversational AI interface to all their systems. Instead of navigating AA, X-1, ComplyAdvantage, SIRA checks, Gmail, Slack, and HubSpot separately, the underwriter works through cases in Claude Code - pulling data, parsing documents, logging decisions, and actioning outcomes from one place. All existing deterministic code (affordability, eligibility, funder rules) stays exactly where it is. Claude is the interface, not the decision-maker.
This is a counter-thesis to building AI agents that automate underwriting from scratch. Rather than birthing an underwriter out of AI, strap a rocket onto the underwriters you already have.
Three Modes of AI in a Business¶
There are three ways to apply AI to improve how a business operates. Gen H has done all three. Understanding which mode to use where is the strategic question.
Mode 1: Manual with basic tooling¶
Build dashboards, spreadsheets, some code. Humans do the assembly, translation, and delivery. This works - it doesn't block the business - but execution leaves a lot to be desired. Quality drifts, human error creeps in, customisation is painful, and every special request means significant manual rework or quiet deprioritisation.
Funding reports before this week were Mode 1. LightDash dashboards filtered by funder, manually screenshotted and pasted into presentations. Reliable enough to function, but inflexible, error-prone, and producing output that looked less professional than Gen H deserves. When funders asked for custom metrics or definitions - which is their right, they all have different priorities - the response was either significant manual effort or a polite shrug.
Most of Gen H's operational tooling sits here today. It works. It's not good enough.
Mode 2: Full end-to-end AI automation¶
Capture all the nuance, rules, judgement, and evolving context of a role, and build a system that executes it autonomously. The system must be significantly better than human output - because the effort to build it is enormous, and "as good as before but automated" doesn't clear the bar.
This is what Nebula has been attempting for income verification. It's genuinely, extraordinarily hard. You need to deeply understand the concerns of every stakeholder, capture how their needs evolve over time, and produce a system that reliably delivers customised, high-quality, accurate output every single time. And it needs to be traceable and auditable, because after a couple of mistakes people say "the system doesn't work and we don't know why" - and then either you abandon it, or someone has to check every output with a fine-toothed comb, which defeats the purpose of automating in the first place.
Even if you succeed, the prize is smaller than it appears. You've nominally automated a role - but businesses care about risk, accountability, and trust. No CEO is going to say "we have no one in capital markets now" or "we have no underwriters now." The human stays regardless. So you've spent enormous effort building something that still requires human oversight, except now the humans are checking a system they don't fully understand instead of doing work they deeply understand.
Mode 2 is right for some things. But Gen H has been defaulting to it because it's the traditional way to think about automation. The menu has expanded.
Mode 3: AI-augmented expert tooling¶
Rather than automate the expert, massively lower the barrier between what they know and what they can produce. Build Claude Code skills, commands, guardrails, and context that let domain experts work at the speed of their ideas rather than the speed of their tools.
The expert still decides what to do. AI removes the mechanical pipeline between their judgement and the outcome. Expert has an idea, expert delivers the idea. The entire pipeline shrinks.
This isn't traditional automation. You're not asking the product team to understand an entire role and encode it into a system. You're building tooling and guardrails that empower experts - who already know more than the product team about their domain - to work faster, smarter, and more creatively. You're strapping a rocket to these functions rather than boiling them down into an automation.
The context engineering matters - the right CLAUDE.md, the right skills, the right guardrails - but it's dramatically easier and more iterable than Mode 2. And the tool gets better over time as experts use it, because their knowledge and insight gets baked in through natural interaction.
Funding Reports: Mode 3 Proven¶
This isn't theoretical. Funding reports are the proof point, already shipped.
In two days, Hal built a Claude Code skill wrapping a Python generation script and SQL queries. On the surface it looks like code automation (Mode 1). What it actually is: a context-engineered tool that understands the job to be done, with guardrails that prevent misuse or breaking edits, designed to be handed to the capital markets team so they own it.
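To illustrate the shape of that pattern, here is a sketch only - the file below is hypothetical, not the actual funding reports skill. A Claude Code skill is a SKILL.md that tells Claude when to use the bundled script and, crucially, where the guardrails sit:

```markdown
---
name: funder-report
description: Generate a funder report pack. Use when asked to produce,
  update, or customise a report for a specific funder.
---

# Generating funder reports

1. Run `python generate_report.py --funder <name>` to build the pack.
2. Funder-specific metric definitions live in `definitions/<funder>.yaml`.
   Edit those, never the SQL in `queries/` - that is the guardrail
   against breaking edits.
3. If a request doesn't fit the definitions schema, stop and ask the
   user rather than improvising a new metric.
```

The point of the structure: the expert prompts in natural language, and the skill channels that prompt down a path that can't corrupt the underlying queries.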
The result, already live:

- Every funder receives a materially higher quality report with more information
- Custom definitions where funders need them, standard definitions where they don't
- High quality additional slides added by prompting Claude rather than designing them manually
- Flexibility to respond to funder-specific requests without manual rework
The capital markets team hasn't been automated. Their expertise is absolutely necessary - they understand what each funder cares about, how to communicate complex information honestly but favourably, what nuance matters. All of that judgement still sits with them. They've been massively unlocked to actually deliver on it, rather than fighting the mechanical pipeline of LightDash screenshots and PowerPoint assembly.
The base case - the most likely outcome - is that funding reports are much better in three to six months, requiring almost no further effort from the product team. Two forces: models may improve (though it doesn't need them to), and the capital markets team will bake their evolving knowledge into the tool through natural use. Every funder conversation, every new priority, every bit of feedback - it all flows into the tool because the team is working through it, not around it.
Two days. One person. Already shipped.
Why Mode 3 for Underwriting¶
Underwriting is different from funding reports. Capital markets' expertise is editorial - what to show, how to frame it. Underwriting is investigative. The expert's value is in assembling a picture from scattered data and making a judgement call.
But the mindset is identical. Underwriters can already underwrite cases perfectly well - their judgement isn't the bottleneck. Their tooling limits them to roughly two and a half cases per day: they navigate six separate systems, manually joining information in their heads, context-switching between screens to build a complete picture. Once that picture is assembled, the next step is usually obvious: does this case make sense? Should these customers get a mortgage from us to buy this property?
The complexity isn't in contemplating each decision. It's in drawing a complete and coherent picture from information spread across six systems, multiple documents, and dozens of pages. If you could conceptually "SQL join" the data an underwriter needs, you'd dramatically speed up comprehension and therefore decision-making.
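To make the conceptual join concrete, here is a minimal Python sketch - every client and field name is hypothetical - of what one assembled case picture might look like:

```python
from dataclasses import dataclass
from typing import Any

# Stubs standing in for read-only clients to each system the
# underwriter currently checks by hand. All names are hypothetical;
# real versions would call the systems' APIs or databases.
def fetch_application(case_id: str) -> dict[str, Any]:
    return {}   # AA: income, employment, deposit declarations

def fetch_tasks(case_id: str) -> list[dict[str, Any]]:
    return []   # X-1: outstanding underwriting tasks

def fetch_compliance(case_id: str) -> dict[str, Any]:
    return {}   # ComplyAdvantage and SIRA results

def list_documents(case_id: str) -> list[str]:
    return []   # payslips, bank statements, valuation report

@dataclass
class CasePicture:
    """The conceptual 'SQL join': one coherent view of a case."""
    application: dict[str, Any]
    tasks: list[dict[str, Any]]
    compliance: dict[str, Any]
    documents: list[str]

def assemble_case(case_id: str) -> CasePicture:
    # One call gathers what an underwriter today assembles across
    # six screens - comprehension speeds up, so decisions do too.
    return CasePicture(
        application=fetch_application(case_id),
        tasks=fetch_tasks(case_id),
        compliance=fetch_compliance(case_id),
        documents=list_documents(case_id),
    )
```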
The LightDash analogy is instructive. A data analyst today can build dashboards with custom SQL and customised charts in the UI. It works, but it's slow, it's painful, and the barrier to entry is high because you need to understand both the UI and the data structure. Give that analyst the LightDash CLI and Claude Code, and they create dashboards in seconds - writing YAML, uploading, iterating. That changes not just speed but mindset. When the barrier drops, you promote ideas and creativity. You free up mental energy that was being spent on mechanical translation. This shift is real - Hal has felt it personally.
Apply the same thinking to underwriting: give the underwriter a conversational interface that pulls together what they need, and they focus on what they're actually good at - lending judgement.
How It Works¶
The underwriter opens Claude Code and works through their X-1 task list conversationally (a sketch of the supporting state-keeping follows this list):
- Check property and valuation: "Show me the valuation report for this case" - Claude pulls the document and summarises key fields
- Verify income: "What's the income breakdown for applicant one?" - Claude pulls from AA, retrieves supporting documents (payslips, accountant letters, bank statements), and presents them together
- Check SIRA and compliance: "Run the SIRA check" / "Show me the ComplyAdvantage results" - Claude pulls from external systems via API
- Assess deposit sources: "Where's the deposit coming from and does it match the bank statements?" - Claude cross-references declared sources against bank statement evidence
- Build follow-up list: As the underwriter investigates, they build a running list of things needed from the broker: "Add to broker follow-ups: need three months of business bank statements for applicant two." Claude maintains the list throughout the investigation and drafts the broker communication when ready
- Make and action decisions: "Income is fine, accountant letter supports it" or "Need to raise a funder exception on the LTV." Claude logs the decision, drafts the broker email or raises the exception with documented rationale, updates the case status
- Rerun checks: After each decision, Claude reruns the deterministic rules (affordability, funder eligibility, policy compliance) - all existing code, untouched
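A minimal sketch of the state-keeping behind that workflow (names hypothetical): Claude maintains structured, reviewable state - the follow-up list and the decision log - rather than leaving it buried in chat history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Investigation:
    """Running state for one case: broker follow-ups and an auditable
    decision log, maintained by Claude as the underwriter works."""
    case_id: str
    follow_ups: list[str] = field(default_factory=list)
    decisions: list[dict] = field(default_factory=list)

    def add_follow_up(self, item: str) -> None:
        # "Add to broker follow-ups: need three months of business
        # bank statements for applicant two."
        self.follow_ups.append(item)

    def log_decision(self, check: str, outcome: str, rationale: str) -> None:
        # Every decision is the human's; Claude only records it,
        # timestamped, so the audit trail is explicit.
        self.decisions.append({
            "check": check,          # e.g. "income verification"
            "outcome": outcome,      # e.g. "pass"
            "rationale": rationale,  # e.g. "accountant letter supports it"
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def draft_broker_email(self) -> str:
        # Claude would turn this into prose; the list is the source of truth.
        items = "\n".join(f"- {i}" for i in self.follow_ups)
        return f"Outstanding items for case {self.case_id}:\n{items}"
```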
Critical constraints: Claude interacts with all existing deterministic code and doesn't replace any of it. Lending rules, affordability calculations, and funder criteria stay where they are. And Claude parses documents directly - no need to extract document data into systems first. Feed bank statements, payslips, and accountant letters straight to Claude.
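To be concrete about "all existing code, untouched", here is a sketch (hypothetical names) of the thin wrapper Claude would call. It invokes the existing checks and reports their verdicts, nothing more:

```python
from typing import Callable

# The existing deterministic checks, passed in as plain callables.
# Claude calls this wrapper and reports the verdicts - it has no say
# in them, and the rules themselves are not modified.
Check = Callable[[str], bool]

def rerun_checks(case_id: str, checks: dict[str, Check]) -> dict[str, bool]:
    """Rerun every deterministic check after a decision is logged."""
    return {name: check(case_id) for name, check in checks.items()}

# Hypothetical wiring - these functions already exist in Gen H's codebase:
# rerun_checks(case_id, {
#     "affordability": affordability.check,
#     "funder_eligibility": eligibility.check,
#     "policy_compliance": policy.check,
# })
```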
What Nebula Has Proven (and What It Teaches Us)¶
Nebula has been working on automating income verification using Gemini agents for months. About 40% accuracy on suggesting follow-up questions. That's Mode 2 applied to one of the hardest slices of underwriting. Progress has been slow - not because the team isn't capable, but because Mode 2 is genuinely, extraordinarily hard.
The underwriting CLI doesn't need the AI to understand underwriting at all. It needs to fetch data, present it clearly, help maintain the investigation workflow, and record what the human decides. That's a fundamentally different (and much more tractable) problem.
This isn't an argument to stop Mode 2 work. There may be sub-tasks where full automation is the right answer. But Mode 3 should run in parallel - faster to build, lower risk, delivering value to underwriters immediately while Mode 2 continues on a longer timeline.
The framing for the organisation: tools like Claude Code have made Mode 3 possible in a way it wasn't before. The menu of options has expanded. Gen H should use all of them.
The Honest Challenges¶
1. Will underwriters adopt the interaction model?¶
The biggest risk. Underwriters are a different population from Hal. The counter-argument: the conversation is natural language, not technical commands. "Show me the applicant's credit commitments" is how you'd ask a colleague. And good context engineering (skills, commands, CLAUDE.md) should lower the barrier further. But there's a genuine learning curve, comparable to any new system introduction (X-1, AA). The first session needs to produce something the underwriter can see value in immediately.
2. Packaging is the real bottleneck (Andre's challenge)¶
Andre estimates this makes underwriting 40% better, not 10x - because the real constraint is broker packaging (incomplete or poorly packaged cases). Underwriting speed doesn't fix bad inputs. Counter-argument: faster underwriting means faster feedback. The case arrives, gets reviewed in minutes, and the broker hears back immediately about what's missing. It doesn't eliminate the wait for broker responses, but it eliminates the queue on Gen H's side. The truth is probably somewhere in between.
3. Robustness and guardrails¶
The risk: Claude pulls the wrong data, misinterprets a document, or actions something incorrectly. Mitigations: the underwriter reviews everything before action (this is Mode 3 by design - the human stays in the loop), deterministic systems act as guardrails, and a CLAUDE.md configuration defines rules and constraints.
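As an illustration of what that configuration might say (wording hypothetical):

```markdown
# Underwriting assistant - rules

- You are an interface, not a decision-maker. Never approve, decline,
  or recommend an outcome; present evidence and record what the
  underwriter decides.
- All data access is read-only except: the follow-up list, the decision
  log, and draft communications. Drafts are never sent without explicit
  confirmation.
- Never restate a figure from memory - always re-fetch it from the
  source system and cite which system it came from.
- Affordability, eligibility, and policy results come only from the
  existing deterministic checks. Never estimate or override them.
```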
Validation Plan¶
Before Hackathon: PII Approval and Prep¶
- ExCo paper on 17 Feb to approve starting supplier due diligence for Anthropic as a PII consumer
- Hal to build internal support while Andre is on holiday (Graham is already on board)
- Hal to attempt underwriting a completed case through Claude Code himself - test the interaction model by manually feeding Claude the case data and working through the X-1 task list
Hackathon: Build the First Integration¶
Hal and Andre build the dedicated API and connect Claude to at least one system (likely AA for case data). Test with a real case. Demonstrate to an underwriter.
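One plausible shape for that dedicated API - a sketch using FastAPI, with endpoint and store names assumed - is a read-only, case-scoped surface sitting between Claude and the source systems:

```python
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Underwriting case API (read-only sketch)")

# Placeholder store standing in for the AA integration built at the
# hackathon; the real version would query AA rather than a dict.
CASES: dict[str, dict] = {}

@app.get("/cases/{case_id}")
def get_case(case_id: str) -> dict:
    """Return one case's assembled data. Read-only by design:
    Claude looks things up; the underwriter decides."""
    case = CASES.get(case_id)
    if case is None:
        raise HTTPException(status_code=404, detail="case not found")
    return case
```

Keeping the surface read-only (outside the follow-up list and decision log) means the riskiest failure mode - Claude actioning something wrongly in a source system - is ruled out at the API layer, not just by prompting.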
Post-Hackathon: Pilot with Barrie¶
Put the tool in Barrie's hands on real cases (in parallel with current workflow, not replacing it). Measure time per case, track what works and what doesn't, iterate.
The Strategic Bet¶
If the underwriting pilot works, Gen H has the evidence base for a company-wide Mode 3 strategy.
The pitch to the organisation: look at your area. Find the place where your team spends time on mechanical translation between what they know and what they need to produce. That's where Mode 3 applies. The product team builds the tooling and guardrails. The domain experts bring the knowledge. Neither can do it alone.
Concrete candidates already identified:

- Sales: custom data-driven reports for broker meetings (Sales Intelligence CLI). Nearly identical to funding reports - bespoke output, data in, report out. Infrastructure already exists.
- Servicing: to be explored
- Completions: to be explored
- Corporate accounts: showing Gen H data to brokerages and networks in compelling, customised ways
This requires two things from the wider organisation: willingness to learn tools like Claude Code (there's a learning curve, but it's comparable to any new system), and willingness to contribute ideas about where Mode 3 applies. The product team shouldn't be the only ones having these ideas. Funding reports prove you can create serious value in a couple of days. Other people should be getting excited about where to apply this next.
External Evidence¶
- GitHub Copilot: developers complete tasks 55% faster with AI augmentation
- Harvey AI (legal): saves lawyers ~10 hours/week on document review - the closest analogy to what the underwriting CLI proposes, applied to a different regulated profession
- Ocrolus (mortgage-specific): American Federal Mortgage cut underwriting time per file by 29% using AI-augmented document processing
- Fannie Mae: 73% of lenders adopting AI for operational efficiency focus on augmenting existing resources rather than replacing them
- OpenClaw parallel (Peter Steinberger, referenced by Andre): "AI is a lever, not a substitute." Autonomous agent systems without human guidance become "slop generators." The value is human taste, judgement, and expertise - AI removes the mechanical barriers to applying them. Mode 3 as philosophy.
Connections¶
- Sales Intelligence CLI - Mode 3 applied to sales. Nearly identical pattern to funding reports (data in, bespoke report out). Easier, lower-risk pilot.
- Automated Report Pipelines - The funding reports pipeline is the first shipped Mode 3 tool at Gen H. Proves the pattern and demonstrates the handover challenge that applies here too.
- Voice-First Affordability Calculator - Mode 3 from a different angle. Voice calc removes form-field friction for brokers; underwriting CLI removes system-navigation friction for underwriters. Same thesis: the bottleneck is the interface, not the person.