SeenByHuman is built first for AI agents (Cursor, Claude Code, Lovable, Bolt). After your agent ships an app, it calls one tool: a real human, native to the target market, runs the app and returns a structured verdict your agent can act on. Humans can use the catalog too, but the agent path is the headline.
Automated checks tell you the button fires. They can't tell you that the German paywall copy reads like a scam, that the onboarding feels unfinished, or that the flow loses a real person. That human judgment, especially across languages and markets, is exactly what an agent can't supply for itself. So the agent outsources it, cheaply, in one call.
The agent knows the cost before ordering. No quotes, no negotiation.
Each report contains a verdict, issues grouped by severity, and repro steps, all anchored to recording timestamps (anti-hallucination).
Money sits in escrow and is released only when a real report, with a recording, is delivered.
You don't pick people. You set a budget, a task type and requirements; the network allocates the right number of humans for it.
| Budget | Example allocation |
|---|---|
| $100 | 20 testers × $5 · or 10 × $10 · or 4 verified × $25 |
| $100 | 100 people × $1 — one opinion / survey answer each |
| $5 | 1 quick run (5–15 min) or a handful of micro-tasks |
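The example allocations in the table are plain division of the budget by a per-person price. A minimal sketch of that arithmetic follows; the function name `allocate` is hypothetical, and the real network may also weigh task type and requirements when it picks a headcount.

```python
def allocate(budget_usd: float, price_per_person_usd: float) -> int:
    """Return how many people a budget covers at a fixed per-person price.

    Hypothetical helper for illustration; not part of any SeenByHuman API.
    """
    if price_per_person_usd <= 0:
        raise ValueError("price must be positive")
    # Work in whole cents so that prices like $0.50 divide exactly.
    budget_cents = round(budget_usd * 100)
    price_cents = round(price_per_person_usd * 100)
    return budget_cents // price_cents


# Reproduce the $100 rows of the table above.
for price in (5, 10, 25, 1):
    print(f"$100 at ${price} each -> {allocate(100, price)} people")
```

Integer cents are used so that sub-dollar prices do not pick up floating-point rounding errors.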
Matched humans get an instant push, accept, and begin within minutes, not days.
Your agent rates each report's quality (completeness, accuracy, evidence). That score drives the tester ranking, with no human in the loop.
If the app won't launch, the tester sends the crash screen and the job auto-closes: the tester gets a small partial fee and the rest of your budget is returned.
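The escrow and crash rules described above can be read as a three-way split of the escrowed budget. The sketch below is one way to model it under stated assumptions: the names `Settlement` and `settle` are hypothetical, and the default partial fee of 100 cents is a placeholder, since the actual fee is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settlement:
    tester_payout_cents: int  # released to the tester
    refund_cents: int         # returned to the buyer
    held_cents: int           # still locked in escrow


def settle(escrow_cents: int, report_delivered: bool, app_crashed: bool,
           partial_fee_cents: int = 100) -> Settlement:
    """Split an escrowed budget according to the job outcome.

    partial_fee_cents is an assumed placeholder, not a published price.
    """
    if escrow_cents < 0 or partial_fee_cents < 0:
        raise ValueError("amounts must be non-negative")
    if app_crashed:
        # Crash path: small partial fee to the tester, remainder refunded.
        fee = min(partial_fee_cents, escrow_cents)
        return Settlement(fee, escrow_cents - fee, 0)
    if report_delivered:
        # Normal path: a delivered report with a recording releases the escrow.
        return Settlement(escrow_cents, 0, 0)
    # No report yet: nothing is released.
    return Settlement(0, 0, escrow_cents)


print(settle(1200, report_delivered=False, app_crashed=True))
```

What happens to funds when no report ever arrives is not described above, so the sketch simply leaves them held.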
It's not only app testing. It covers any task that needs a human eye: UX A/B (which screen/onboarding/icon wins), opinions & surveys (name, color, "would you pay?"), content checks (does this read naturally / not like AI), local checks (does it make sense in PL/DE/IT), accessibility, game feel.
Conceptual MCP tool shape (contract preview; see links below):
```
# one step inside a vibe-coding session
tool seenbyhuman.order_test(
    url="https://my-app.lovable.app",
    check="German paywall: does the copy sound trustworthy?",
    market="DE",                   # native human from that market
    budget=12,                     # USD, fixed
    deliverable="recording+json"
)

# → structured verdict comes back
{
  "verdict": 7,
  "issues": [
    {
      "severity": "critical",
      "what": "paywall copy reads as scam (DE)",
      "evidence": "recording@0:41"
    }
  ],
  "recording_url": "https://…",
  "tester": { "market": "DE", "rating": 4.8 }
}
```
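To show how an agent might act on such a verdict, here is a minimal sketch that parses the sample response and pulls out the critical issues with their recording offsets. It assumes the evidence anchor always has the form `recording@M:SS`, which is inferred from the single sample value; the helper names `evidence_seconds` and `critical_issues` are hypothetical.

```python
import json

# Sample response copied from the conceptual contract above.
SAMPLE_RESPONSE = """
{
  "verdict": 7,
  "issues": [
    {"severity": "critical",
     "what": "paywall copy reads as scam (DE)",
     "evidence": "recording@0:41"}
  ],
  "recording_url": "https://…",
  "tester": {"market": "DE", "rating": 4.8}
}
"""


def evidence_seconds(evidence: str) -> int:
    """Convert an anchor like 'recording@0:41' to an offset in seconds.

    Assumes the 'recording@M:SS' format seen in the sample.
    """
    _, _, stamp = evidence.partition("@")
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def critical_issues(report: dict) -> list[tuple[int, str]]:
    """Return (offset_seconds, description) for each critical issue."""
    return [
        (evidence_seconds(issue["evidence"]), issue["what"])
        for issue in report.get("issues", [])
        if issue.get("severity") == "critical"
    ]


report = json.loads(SAMPLE_RESPONSE)
for offset, what in critical_issues(report):
    print(f"fix first: {what} (see recording at {offset}s)")
```

Because every issue carries a timestamp, the agent can jump to the exact moment in the recording instead of trusting the text alone.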
| Type | What it gets | From |
|---|---|---|
| Sanity check | One critical path, recording + 1–10 verdict | $8 |
| Full run | Whole app, edge cases, UX opinion | $18 |
| Localization sweep | N native humans, one per market, checking copy & trust | $39 (2 markets) / $99 (5 markets) |
| Update loop | 4×45 min sessions, a human re-checks after each change | $20 |
| Micro-task | Rate screenshots / 60-sec first impression / survey | $0.50–$1 |
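Because starting prices are fixed, an agent can check what a budget covers before it orders anything. The sketch below hard-codes the "From" column of the table; the dictionary keys and the function `affordable` are hypothetical names, and the figures are starting prices, so a real order may cost more.

```python
# Starting prices in USD, copied from the "From" column above.
# Keys are illustrative identifiers, not official task-type names.
STARTING_PRICE_USD = {
    "micro_task": 0.50,
    "sanity_check": 8.00,
    "full_run": 18.00,
    "update_loop": 20.00,
    "localization_sweep_2": 39.00,  # assumed to mean two markets
    "localization_sweep_5": 99.00,  # assumed to mean five markets
}


def affordable(budget_usd: float) -> list[str]:
    """List task types whose starting price fits the budget, cheapest first."""
    fits = [(price, name) for name, price in STARTING_PRICE_USD.items()
            if price <= budget_usd]
    return [name for price, name in sorted(fits)]


print(affordable(12))   # a $12 budget, as in the order_test example
print(affordable(40))
```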
Machine-readable definitions you can point an agent or client at:
→ openapi.json (REST shape) · mcp_manifest.json (MCP tool definitions)