AI Lead Qualification & Content Engine for Runa

Client:: Runa
Industry:: Finance | Lead Generation | B2B SaaS
Duration:: Ongoing
Team:: Senior Python + AI engineer

Runa is a B2B payments platform powering global payouts, rewards, and incentives. We built two production AI systems for their team: an automated lead-qualification engine that turns 30 to 60+ minutes of manual research into a scored verdict in 5 to 8 minutes, and a guardrailed AI content engine that drafts on-brand marketing copy straight into the team's Notion workspace. Both run on the same backbone (n8n orchestration, Notion as system-of-record, and GPT-4o plus Claude for reasoning), and both keep a human firmly in control at the decision point.

THE CHALLENGE

Two different parts of Runa's business were bottlenecked by the same problem: skilled people doing slow, repetitive judgement work that didn't scale.

Manual lead research

Every inbound lead had to be researched by hand before anyone could decide whether it was worth pursuing. For a single domain, a team member would read the company's website for product fit, dig through Terms & Conditions and privacy pages for compliance red flags, check LinkedIn for headcount and leadership, look up funding and traffic, and screen for disqualifiers like crypto exposure or sanctioned jurisdictions.

30–60+ minutes per lead, every time
Inconsistent between reviewers — no shared rubric
Obvious disqualifications cost the same effort as strong fits
Promising leads sat in a queue while research backed up

Brand drift in content

As Runa's product line grew, so did demand for written collateral — product playbooks, competitor battlecards, and customer-facing case studies across multiple products and dozens of commercial categories. Every writer had to internalise Runa's tone, positioning, and canonical product facts and re-apply them by hand, producing brand drift between authors and slow first-draft turnaround.

OUR APPROACH

We don't automate a process until it has earned it. For both systems we started from the real workflow, encoded the judgement a person was already making, and kept a human at the point of decision rather than the blank page. Our guiding principle: "AI that can't be accountable for outcomes is not a feature."

Shared backbone

n8n for orchestration — a thin main workflow coordinating many small, single-purpose sub-workflows
Notion as system-of-record and live prompt store
GPT-4o + Claude Sonnet 4.6 for reasoning, with schema-validated output
Apify for resilient live-web data acquisition
Humans review and curate — the AI removes the blank-page work, not the judgement

SOLUTION 1 — AI LEAD-QUALIFICATION ENGINE

Feed the engine a single input — a company domain, or an inbound email address — and it returns a complete, scored verdict in 5 to 8 minutes, delivered to Slack and logged to Notion. Under the hood an n8n pipeline fans out to eight specialist analyzers running in parallel, each gathering a different class of signal from the live web, then consolidates their findings into a 17-point scoring matrix across four categories.

The eight analyzers

Use-Case Fit & Industry Fit — AI reads site content to score alignment with Runa's ideal customer
T&C & Privacy Policy — extracts and scores compliance-relevant language
LinkedIn intelligence — headcount, hiring, leadership pedigree, geography, tenure
Website traffic — Similarweb volume and engagement, plus WHOIS domain age
Funding signals — round size and recency via search + AI extraction
Disqualification — screens for crypto, sanctioned regions, and prohibited industries
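The parallel fan-out can be illustrated in Python with asyncio. This is a minimal sketch, not the production code: in the real system n8n does the coordination. It assumes each analyzer is an async function that takes a domain and returns named scores; the stub analyzers `use_case_fit` and `disqualification` and their score names are hypothetical placeholders.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical analyzer signature: a domain in, named scores out.
Analyzer = Callable[[str], Awaitable[dict[str, float]]]


async def use_case_fit(domain: str) -> dict[str, float]:
    await asyncio.sleep(0)  # stand-in for fetching the site and calling a model
    return {"use_case_fit": 8.0, "industry_fit": 7.0}


async def disqualification(domain: str) -> dict[str, float]:
    await asyncio.sleep(0)  # stand-in for the crypto / sanctions / industry screen
    return {"prohibited_industry": 0.0}


async def run_analyzers(
    domain: str, analyzers: list[Analyzer]
) -> tuple[dict[str, float], list[str]]:
    """Run all analyzers concurrently and merge their scores.

    return_exceptions=True means one failing analyzer (for example, a blocked
    scrape) does not cancel the others; its name is returned so the report
    can show which signals are missing.
    """
    results = await asyncio.gather(
        *(analyzer(domain) for analyzer in analyzers), return_exceptions=True
    )
    scores: dict[str, float] = {}
    failed: list[str] = []
    for analyzer, result in zip(analyzers, results):
        if isinstance(result, BaseException):
            failed.append(analyzer.__name__)
        else:
            scores.update(result)
    return scores, failed
```

Usage: `asyncio.run(run_analyzers("example.com", [use_case_fit, disqualification]))`. Total wall-clock time then tracks the slowest analyzer rather than the sum of all eight, which is where much of the time saving in a fan-out design comes from.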

A verdict a salesperson can act on

Results merge into 17 individual scores; a scoring stage computes category averages, an overall fit classification, and a separate risk score, then applies compliance overrides:

High / Medium / Low Fit — product-market signals
Auto DQ — automatic disqualification for compliance or prohibited-industry hits
EDD Required — flagged for enhanced due diligence
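The scoring stage can be sketched as below. The category names, the subset of score names, the 0-10 scale, and the thresholds are all hypothetical (the source only states 17 scores, four categories, a separate risk score, and the three verdict types). The point the sketch makes is ordering: fit is classified from category averages first, and compliance overrides replace that label afterwards.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical category names and a subset of score names; the real matrix
# has 17 scores in four categories.
CATEGORIES: dict[str, list[str]] = {
    "product_fit": ["use_case_fit", "industry_fit"],
    "company_strength": ["headcount", "funding", "traffic"],
    "leadership": ["leadership_quality"],
    "compliance": ["terms_language", "privacy_language"],
}

HIGH_FIT, MEDIUM_FIT = 7.0, 5.0  # assumed thresholds on a 0-10 scale
EDD_RISK_THRESHOLD = 6.0         # assumed cut-off for enhanced due diligence


@dataclass
class Verdict:
    category_averages: dict[str, float]
    fit: str    # product-market classification before overrides
    risk: float
    label: str  # what the salesperson sees after compliance overrides


def classify_fit(overall: float) -> str:
    if overall >= HIGH_FIT:
        return "High Fit"
    if overall >= MEDIUM_FIT:
        return "Medium Fit"
    return "Low Fit"


def score_lead(scores: dict[str, float], risk: float, dq_hits: list[str]) -> Verdict:
    averages = {
        category: mean(scores[name] for name in names)
        for category, names in CATEGORIES.items()
    }
    fit = classify_fit(mean(averages.values()))
    # Compliance overrides always win over the fit classification.
    if dq_hits:
        label = "Auto DQ"
    elif risk >= EDD_RISK_THRESHOLD:
        label = "EDD Required"
    else:
        label = fit
    return Verdict(averages, fit, risk, label)
```

Keeping `fit` alongside the final `label` preserves the product-market signal for a disqualified lead, which is useful when a reviewer wants to see what the lead would have scored without the compliance hit.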

SOLUTION 2 — AI CONTENT ENGINE

A guardrailed engine that generates brand-aligned copy on demand and writes it directly into Notion — the team's existing workspace — for asynchronous human review. Its core idea is layered context stacking: before writing a word, it assembles a hierarchy of constraints, each able to refine or override the one above it.

Brand voice & guidelines — tone, vocabulary, positioning, canonical facts
Content-type context — rules per asset (case study, battlecard, playbook)
Product context — per-product messaging and target segments
Guardrail Q&A — unanswered guardrails block generation until resolved
Block-level instructions — guidance for the exact section being written
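One way to model the layer hierarchy is Python's `collections.ChainMap`, which resolves each key from the first mapping that contains it. This is a sketch under assumptions: each layer is represented as a flat dictionary of hypothetical keys, and the guardrail layer maps each question to its answer (or `None` if unanswered). The exception class `GuardrailBlocked` is a hypothetical name.

```python
from collections import ChainMap
from typing import Optional


class GuardrailBlocked(Exception):
    """Raised when a guardrail question is still unanswered."""


def build_context(
    brand: dict[str, str],
    content_type: dict[str, str],
    product: dict[str, str],
    guardrails: dict[str, Optional[str]],
    block: dict[str, str],
) -> dict[str, str]:
    """Stack the five layers so more specific layers override broader ones."""
    # Hard control: refuse to build a context (and so refuse to generate)
    # while any guardrail question lacks an answer.
    unanswered = [question for question, answer in guardrails.items() if not answer]
    if unanswered:
        raise GuardrailBlocked(f"unanswered guardrail questions: {unanswered}")
    answers = {f"guardrail: {q}": a for q, a in guardrails.items() if a}
    # ChainMap searches left to right, so the most specific layer goes first:
    # block > guardrail answers > product > content type > brand.
    return dict(ChainMap(block, answers, product, content_type, brand))
```

For example, if the brand layer sets `"tone"` and the product layer sets a more technical `"tone"`, the product value wins, while brand-only keys such as a standard call to action pass through unchanged.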

Template-driven, not hardcoded

The engine discovers what to write by reading the structure of a Notion page — headings, blocks, and question toggles — rather than relying on hardcoded templates. It generates N variations per block (five by default), written back as labelled options for a reviewer to pick and refine. Adding a new content type is a Notion edit, not a code change.
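Structure-driven discovery can be sketched as a single pass over a page's blocks. This example assumes a hypothetical, simplified block shape (`{"type": ..., "text": ...}`); real Notion block objects nest their text under a type-specific key and are fetched through the Notion API. The generator is injected as a plain function so the sketch does not depend on any model provider.

```python
from typing import Callable

# Hypothetical, simplified page blocks.
Block = dict[str, str]

HEADING_TYPES = {"heading_1", "heading_2", "heading_3"}


def discover_sections(blocks: list[Block]) -> list[dict]:
    """Group a flat block list into sections of heading, instructions, questions."""
    sections: list[dict] = []
    for block in blocks:
        kind, text = block["type"], block["text"]
        if kind in HEADING_TYPES:
            sections.append({"heading": text, "instructions": [], "questions": []})
        elif sections and kind == "toggle":
            sections[-1]["questions"].append(text)  # question toggles
        elif sections:
            sections[-1]["instructions"].append(text)
        # Blocks before the first heading are ignored in this sketch.
    return sections


def draft_variations(
    section: dict, generate: Callable[[str, int], str], n: int = 5
) -> list[str]:
    """Request n drafts for one section and label them for the reviewer."""
    prompt = section["heading"] + "\n" + "\n".join(section["instructions"])
    return [f"Option {i}: {generate(prompt, i)}" for i in range(1, n + 1)]
```

Because the section list comes from the page itself, a new heading in the Notion template becomes a new section to draft without any change to this code.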

RESULTS & SCALE

Lead research cut from 30–60+ minutes to 5–8 minutes per domain

17 scores across 4 categories, computed the same way every time

8 analyzers in parallel, ~16 AI evaluation calls per domain

Single-analyzer debug runs in 1–3 minutes for fast tuning

Lead-qualification engine

Standardised 17-point matrix replaces reviewer-by-reviewer inconsistency
Slack report + structured Notion log instead of ad-hoc notes
Quality-controlled: a batch harness compares AI verdicts to a human-scored baseline to catch regressions
Zero-deployment prompt tuning — prompts live in Notion, fetched at runtime
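A batch harness of the kind described can be reduced to two numbers: how often the AI's verdict label matches the human label, and how far individual scores drift from the human scores. The sketch below assumes a hypothetical record shape (`{"label": ..., "scores": {...}}` per domain) and a hypothetical 2-point tolerance for flagging a regression after a prompt change.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    agreement: float       # share of domains where AI and human labels match
    mismatches: list[str]  # domains to inspect by hand
    mean_abs_error: float  # average |AI score - human score|


def compare_to_baseline(ai: dict[str, dict], human: dict[str, dict]) -> EvalReport:
    """Both maps: domain -> {"label": str, "scores": {name: float}}.

    The human set defines the test cases; every domain in it is assumed to
    have an AI result as well.
    """
    domains = sorted(human)
    matches = {d for d in domains if ai[d]["label"] == human[d]["label"]}
    errors = [
        abs(ai[d]["scores"][name] - value)
        for d in domains
        for name, value in human[d]["scores"].items()
    ]
    return EvalReport(
        agreement=len(matches) / len(domains),
        mismatches=[d for d in domains if d not in matches],
        mean_abs_error=sum(errors) / len(errors),
    )


def is_regression(new: EvalReport, previous: EvalReport, tolerance: float = 0.02) -> bool:
    """Flag a change that lowers label agreement by more than the tolerance."""
    return new.agreement < previous.agreement - tolerance
```

Running this after every prompt edit turns "the new prompt feels better" into a measurable comparison against the previous run.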

Content engine

4 content types modelled (playbook, case study, battlecard, commercial categories)
5 products with dedicated messaging context
26 commercial categories and 25+ case-study templates generated
5 draft variations per block by default, with guardrail enforcement as a hard control

TECHNICAL DETAILS

Orchestration

A modular sub-workflow architecture: a thin main workflow coordinates ~21 reusable sub-workflows, organised by dependency depth so fetchers, analyzers, and reporters compose cleanly. Production runs on n8n Cloud; a self-hosted Docker stack (n8n plus a worker in Redis queue mode, PostgreSQL with pgvector, Traefik for TLS, Gotenberg for document conversion) is kept available as a portable, disaster-recovery environment.
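"Organised by dependency depth" can be made concrete: a sub-workflow with no dependencies sits at depth 0, and every other sub-workflow sits one level above its deepest dependency. This Python sketch assumes a hypothetical dependency map and sub-workflow names; it also rejects cycles, which would make a depth undefined.

```python
def dependency_depths(deps: dict[str, list[str]]) -> dict[str, int]:
    """Depth 0 = no dependencies; otherwise 1 + the deepest dependency."""
    depths: dict[str, int] = {}
    visiting: set[str] = set()

    def depth(name: str) -> int:
        if name in depths:
            return depths[name]
        if name in visiting:
            raise ValueError(f"dependency cycle at {name!r}")
        visiting.add(name)
        children = deps.get(name, [])
        d = 0 if not children else 1 + max(depth(c) for c in children)
        visiting.remove(name)
        depths[name] = d
        return d

    for name in deps:
        depth(name)
    return depths


def group_by_depth(depths: dict[str, int]) -> dict[int, list[str]]:
    """Group names into layers, sorted for stable output."""
    layers: dict[int, list[str]] = {}
    for name, d in sorted(depths.items(), key=lambda item: (item[1], item[0])):
        layers.setdefault(d, []).append(name)
    return layers
```

With fetchers at depth 0, analyzers at depth 1, and reporters at depth 2, a change to a fetcher can only affect the layers above it, which keeps the blast radius of an edit predictable.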

AI & data

GPT-4o for text-evaluation analyzers; Claude Sonnet 4.6 for leadership-quality assessment
Structured-output parsers — every model response is schema-validated JSON
Extractor → Evaluator pattern keeps evidence-gathering and scoring independently tunable and reduces hallucinated scores
Apify actors for Similarweb, WHOIS, LinkedIn, and Google Search, with a browser fallback for Cloudflare-protected sites
Notion as live prompt store and structured log; Slack for report delivery
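The Extractor → Evaluator split with schema validation can be sketched in plain Python. Here `json` and `isinstance` checks stand in for the structured-output parsers named above, and the model is injected as a hypothetical "prompt in, raw text out" function so the sketch stays provider-agnostic. The prompts, the schema fields, and the `evidence_ids` convention are assumptions for illustration.

```python
import json
from typing import Callable

# Hypothetical model interface: prompt in, raw text out.
ModelCall = Callable[[str], str]

EVALUATION_SCHEMA = {"score": (int, float), "reason": str, "evidence_ids": list}


def validate(raw: str, schema: dict) -> dict:
    """Parse model output and enforce required keys and types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in schema.items():
        if key not in data:
            raise ValueError(f"missing key {key!r}")
        if not isinstance(data[key], expected):
            raise ValueError(f"wrong type for {key!r}")
    return data


def extract(page_text: str, model: ModelCall) -> list[str]:
    """Extractor: gather verbatim evidence, no judgement."""
    raw = model(f"Quote compliance-relevant sentences verbatim:\n{page_text}")
    data = validate(raw, {"quotes": list})
    # Drop any "quote" that does not actually appear in the source text.
    return [q for q in data["quotes"] if isinstance(q, str) and q in page_text]


def evaluate(evidence: list[str], model: ModelCall) -> dict:
    """Evaluator: score only the extracted evidence and cite it by index."""
    numbered = "\n".join(f"[{i}] {q}" for i, q in enumerate(evidence))
    data = validate(model(f"Score 0-10 using only this evidence:\n{numbered}"), EVALUATION_SCHEMA)
    if not 0 <= data["score"] <= 10:
        raise ValueError("score out of range")
    if any(i not in range(len(evidence)) for i in data["evidence_ids"]):
        raise ValueError("score cites evidence that was never extracted")
    return data
```

Because the evaluator can only cite indices of quotes that were verified against the page, a score that leans on invented evidence fails validation instead of reaching the report, and each stage's prompt can be tuned without touching the other.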

Technologies used

Orchestration: n8n (queue mode) | Docker | Redis | Traefik | Gotenberg
AI: GPT-4o | Claude Sonnet 4.6 | structured output parsers
Data: PostgreSQL + pgvector | Apify | Notion | Slack
Tooling: Python | Claude Code + MCP servers

TESTIMONIAL

Andrii was professional, efficient, and an excellent communicator from the start. He helped me think through scenarios I hadn't considered, and his deep developer expertise pushed the final product well beyond what I expected.

— Verified Upwork review, AI screening automation

THE OUTCOME

Runa replaced two slow, inconsistent, manual processes with repeatable engines. Leads are qualified in minutes with the evidence behind every score, disqualifications are caught automatically, and on-brand draft copy lands in the team's workspace ready for review — while the scoring logic and brand guardrails stay tunable without touching code.

For Runa: faster sales cycles, consistent compliance screening, and content output that scales without scaling headcount. For AnvilEight: another production AI system doing real, accountable work every day.

FREQUENTLY ASKED QUESTIONS

How fast is the automated lead-qualification engine?

It returns a complete, evidence-backed scored verdict for a company domain in 5 to 8 minutes, versus the 30 to 60+ minutes a person used to spend on the same lead.

Which AI models and tools power it?

An n8n orchestration layer fans out to eight specialist analyzers. GPT-4o handles the text-evaluation analyzers and Claude Sonnet 4.6 scores leadership quality; Apify gathers live web data (Similarweb, WHOIS, LinkedIn, Google Search) and Notion stores both the live prompts and the structured result log, with the final report posted to Slack.

How is the AI kept accurate and on-brand?

Every analyzer splits extraction from judgement, returns schema-validated JSON, and is benchmarked against a human-scored test set. The content engine enforces brand guardrails as a hard control — generation can't proceed past unanswered brand or product questions — and keeps a human reviewer at the decision point.

Can the scoring logic change without a code release?

Yes. All AI prompts live in Notion and are fetched live at runtime, so analysts tune scoring logic by editing a Notion page — no deployment required.
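A runtime prompt store can be sketched as a small class around an injected fetch function. The Notion read itself is not shown: `fetch` is assumed to return a prompt's text by name. The short cache and the fall-back to the last good copy are hypothetical additions for resilience; setting `ttl_seconds=0` makes every call fetch live.

```python
import time
from typing import Callable


class PromptStore:
    """Fetch prompts at runtime, cache them briefly, and fall back to the
    last good copy if the source is unreachable."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}  # name -> (text, fetched_at)

    def get(self, name: str) -> str:
        cached = self._cache.get(name)
        now = self._clock()
        if cached and now - cached[1] < self._ttl:
            return cached[0]
        try:
            text = self._fetch(name)
        except Exception:
            if cached:
                return cached[0]  # stale but known-good
            raise
        self._cache[name] = (text, now)
        return text
```

An analyst's edit in Notion is then picked up within one TTL window, with no deployment involved.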

Contact

Have a workflow worth automating?

We build accountable AI automation on senior Python engineering — and we'll tell you when not to. See our AI automation services, or get in touch.