The Agent Factory lets you build your own AI agents using the fund's two meta-agents — a
builder (build.agent) that drafts a complete, runnable spec, and an independent
grader (agent.evaluate) that scores it 0–100 against the same rubric
the fund holds its own 60+ agents to. Build → grade → refine, until it ships. BYOK: the finished
spec runs on your AI — the fund never runs it or stores it.
Nothing below is staged for show. This is the literal loop: the builder returns a draft it explicitly labels unscored; the grader pre-commits its scoring bands, reads the whole spec, and returns a number with the exact edits that raise it. A draft is not "done" until the grader returns SHIP (≥90).
One line in. The builder triages it: the job (flag renewal/exit risk in a contract), the inputs (a contract the user pastes — untrusted text), the actions (read & report only), and the stakes (advice that could mislabel a clause as a breach). It states its assumptions and proceeds — it doesn't stall.
--- name: vendor-contract-reviewer description: Use ON REQUEST to review a SaaS vendor contract for renewal traps and data-exit risk. Fires on "review this contract / MSA / DPA", "auto-renewal", "termination for convenience", "data return / deletion". Reads the document; returns a risk register. Does not sign or negotiate. tools: Read, Grep # read-only disallowedTools: Write, Edit, NotebookEdit # reviewer = write-locked model: claude-sonnet-4-6 # T2: structured extraction vs a # known checklist, not open reasoning effort: high --- You are the Vendor Contract Reviewer — you read a SaaS agreement and surface the clauses that quietly cost money or trap data on the way out. ## Your Role Flag renewal traps and data-exit risk in the contract you are given. You advise; you never assert a legal conclusion as settled fact. ## How You Work (ground before reasoning) 1. TRIAGE — identify the agreement type (MSA / order form / DPA) and term. 2. RETRIEVE — locate the governing clauses: auto-renewal & notice window, price-uplift caps, termination rights, data return/deletion, SLA credits. 3. REASON — score each against the checklist; quote the clause text. ## Methodology (cited counterpart) Applies a structured commercial-contract review checklist + a risk register (a standard procurement/legal-ops discipline). Walks the renewal mechanics and the data-exit path clause by clause. ## Output Contract A risk register: one row per issue — clause · what it says · risk · severity. Lead with the three highest-severity findings. ## Success Criteria (self-eval) - Every flagged issue quotes the clause it came from. PASS/FAIL - Renewal window & data-exit path are both addressed. PASS/FAIL - No issue asserted without the contract text behind it. PASS/FAIL
Real shape, real discipline: a trigger-style description, a minimal read-only toolset, a justified model tier, a grounding step, a named method, an output contract, and its own pass/fail self-eval. The builder hands it straight to the grader and says, in effect: score this before anyone calls it finished.
| What the grader checked | Score | Evidence-anchored note |
|---|---|---|
| Grounding before reasoning | 10 | Explicit triage → retrieve governing clauses → reason; quotes the source text. |
| Tool & model fit | 10 | Minimal read-only tools; reviewer write-locked; tier justified in one line. |
| Cited real-world method | 7 flag | Method is named but shallow — no specific framework, and the risk register has no defined columns. |
| Clean handoff to whoever's next | 7 flag | Output is clean, but there is no 3-element handoff contract for who consumes the register next. |
| Safety: autonomy tier & human gate | 3 cap | No explicit autonomy tier and no human-review gate before a clause is labelled a breach. A load-bearing safety gap caps the verdict. |
| Self-eval & learning loop | 7 | Has pass/fail Success Criteria, but no golden set and no failure-classification mechanism yet. |
| …and the rest of the institutional bar — right-altitude prompt, evidence discipline, answer-first output — all scored, each anchored to quoted text or a named gap. | ||
The builder re-enters at the fix list — it applies each edit, keeps the validated decisions (tier, method) verbatim, and re-issues. It does not restart from zero.
In re-evaluation mode the grader doesn't restart — it checklists each prior fix DONE / PARTIAL / NOT-DONE, then re-scores and shows the movement. All three flagged criteria closed:
agent.evaluate again, and watch the score move. The loop is yours to keep using.