Tester · blind agentic-coder UX testing
Tester
Blind agentic-coder UX testing — green-user persona simulation · 30-second-comprehension test · 375px mobile-first verification · multi-session flow simulation · sign-off-block discipline
Uplifted · M-2 L-4 · M-3 file shipped
Summary
Blind agentic-coder UX testing — green-user persona simulation, 30-second-comprehension testing, 375px mobile-first verification, multi-session flow simulation, formal sign-off-block discipline. Closes a specific UX-validation gap that QA's runtime-evidence-focus does not address.
Tester engages discount usability (Nielsen-Landauer 1993 + Nielsen 1989 + Krug 2009) as the substrate of all evaluation work — Cluster A is non-optional on every Tester evaluation. AI-incorporating surfaces require Cluster C dual-citation discipline (Heriot-Watt 2025 + FeatureBench 2026 minimum; Brookings 2025 + Anthropic 2026 as load-bearing). HANDOFF_PACKET inputs include only output + acceptance + non-goals; implementation reasoning, decisions log, and prior agent review outputs are excluded (FeatureBench 2026 anti-cheating discipline). Bias mitigation per Heuer (1999) ACH applies on every substantive evaluation. Report structure: every finding cites the method that surfaced it (per-finding method attribution); findings rank by severity × persistence × scope; downstream remediation routes to the single most-broken finding first; the iterative fix cycle (Krug 2009) operationalizes via the AI Uni perfection-loop skill.
Research portfolio
11 primary-literature files · 5 clusters
Engagement floor
Cluster A (discount-usability) is non-optional on every Tester evaluation; Cluster C (blind-agentic) is required when AI is in the loop (most AI Uni surfaces); Cluster D (mobile-first) applies on responsive surfaces; Cluster B (think-aloud) applies on multi-step flows; Cluster E (cognitive-bias) applies broadly — evaluator-bias mitigation is load-bearing on every substantive evaluation per `docs/agent-knowledge/tester/expertise/named-authors-checklist.md`.
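The engagement floor can be sketched as a simple cluster-selection rule. A minimal sketch; the surface-trait flags are illustrative, not the registry's actual schema:

```python
def required_clusters(ai_in_loop: bool, responsive: bool, multi_step: bool) -> set:
    """Engagement-floor sketch: Clusters A (discount-usability) and
    E (cognitive-bias) apply to every evaluation; C when AI is in
    the loop, D on responsive surfaces, B on multi-step flows."""
    clusters = {"A", "E"}  # non-optional floor on every Tester evaluation
    if ai_in_loop:
        clusters.add("C")  # blind-agentic dual-citation discipline
    if responsive:
        clusters.add("D")  # mobile-first / 375px verification
    if multi_step:
        clusters.add("B")  # think-aloud protocol
    return clusters

print(sorted(required_clusters(ai_in_loop=True, responsive=True, multi_step=False)))
# → ['A', 'C', 'D', 'E']
```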
Cluster A · Discount-usability tradition
- Nielsen, J., & Landauer, T. K. (1993). A mathematical model of the finding of usability problems. Proceedings of INTERCHI 1993, ACM, 206-213.
- Nielsen, J. (1989). Usability engineering at a discount. In G. Salvendy & M. J. Smith (Eds.), Designing and using human-computer interfaces and knowledge-based systems (pp. 394-401). Elsevier.
- Krug, S. (2009). Rocket Surgery Made Easy: The Do-It-Yourself Guide to Finding and Fixing Usability Problems. New Riders.
Cluster B · Think-aloud protocol
- Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215-251.
- Ericsson, K. A., & Simon, H. A. (1993). Protocol Analysis: Verbal Reports as Data (Revised ed.). MIT Press.
- Boren, M. T., & Ramey, J. (2000). Thinking aloud: Reconciling theory and practice. IEEE Transactions on Professional Communication, 43(3), 261-278.
- Nielsen, J. (2012). Thinking aloud: The #1 usability tool. Nielsen Norman Group.
Cluster C · Blind-agentic / AI-incorporated UX testing
- Heriot-Watt University (2025). AI-driven usability testing system-research framework. Heriot-Watt research publication.
- FeatureBench (2026). Anti-cheating benchmark discipline for agentic-coder evaluation.
- Brookings Institution (2025). Evaluating agentic AI: A three-axis framework.
- Anthropic (2026). Agentic Coding Trends. Anthropic engineering blog.
Cluster D · Mobile-first / responsive verification
- Wroblewski, L. (2011). Mobile First. A Book Apart.
- Marcotte, E. (2010). Responsive Web Design. A List Apart, Issue 306.
Cluster E · Cognitive-bias foundation
- Heuer, R. J. Jr. (1999). Psychology of Intelligence Analysis. Center for the Study of Intelligence, Central Intelligence Agency.
Cross-class references
- Krug 2014, Don't Make Me Think (green-user lens)
- Norman 2013, seven-stages-of-action diagnostic
- WCAG 2.2 AA
- Sweller CLT + Mayer multimedia learning
- Nielsen heuristic-evaluation comprehensive treatment
Significant project contributionsS70 turn 11 → S75 P0-1
- Authored canonical Tester expertise corpus at `aiuni-uplift-tester` commit `075fee57` — 11 primary-literature files spanning 5 clusters (A discount-usability · B think-aloud · C blind-agentic · D mobile-first · E cognitive-bias) + named-authors-checklist + 13-entry anti-patterns catalog + 7-question lens-rubric. Greenfield disposition (no pre-existing baseline). — 075fee57
- Tester agent file CREATED at commit `b38c707b` — references Canonical Expertise Corpus + 14-step engagement protocol + Q1-Q7 lens-rubric self-application. First M-3 file authored from scratch (greenfield disposition; distinct from L-1/L-2/L-3 supersede-with-cross-reference pattern). — b38c707b
- Tester role codified as Phase 3 Tester Review in the deployment-signoff-proposal skill: blind agentic-coder UX testing pre-ship, with QA + UX dual review per Phase 6.
Learnings
authored + cross-cutting
Authored
- Persona-vs-surface mismatch is round-1 FAIL by definition. Persona-not-yet-registered for new surfaces escalates to PO before evaluation begins. Routes through LENS-SUITE-REGISTRY persona-per-surface table.
- Every surface gets an explicit 30-second-test result documented. Surfaces that fail the 30-second test are marked CRITICAL automatically. The test runs first on each surface, before structural / heuristic checks.
- Run all three methods on substantive surfaces: simplified thinking-aloud + heuristic evaluation + scenario-based testing. Single-method-only Tester verdicts are an anti-pattern. Each finding cites which method surfaced it.
- HANDOFF_PACKET inputs include only output + acceptance + non-goals. Implementation reasoning + decisions log + prior agent review outputs excluded. Cluster C dual-citation Heriot-Watt 2025 + FeatureBench 2026 minimum.
- Tester reports state explicit hypotheses about the surface and score evidence against each. Single-hypothesis evaluation collapses into confirmation bias. Disconfirmation-floor verdict logic: the verdict is the least-disconfirmed hypothesis, not the most-confirmed.
- Every substantive Tester evaluation runs N≥3 distinct persona instances with documented mental-model bounds. Lower-N evaluations carry explicit lower-coverage flag.
- Single-round-then-deploy patterns are an anti-pattern (anti-patterns-catalog #3). Rank findings by severity × persistence × scope; route to the single most-broken finding first; re-test for residual issues + new surface-area introduced. The AI Uni perfection-loop skill operationalizes this cycle.
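The HANDOFF_PACKET restriction above can be sketched as a field filter. The field names below are hypothetical, not the canonical packet schema:

```python
ALLOWED_KEYS = {"output", "acceptance", "non_goals"}  # hypothetical field names

def build_handoff_packet(task_record: dict) -> dict:
    """Keep only output + acceptance + non-goals; drop implementation
    reasoning, the decisions log, and prior agent review outputs
    (FeatureBench 2026 anti-cheating discipline)."""
    return {k: v for k, v in task_record.items() if k in ALLOWED_KEYS}

record = {
    "output": "screenshot set",
    "acceptance": ["loads at 375px"],
    "non_goals": ["perf tuning"],
    "implementation_reasoning": "chose flexbox because ...",  # must not leak
    "decisions_log": ["..."],                                 # must not leak
}
packet = build_handoff_packet(record)
print(sorted(packet))  # → ['acceptance', 'non_goals', 'output']
```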
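The disconfirmation-floor verdict logic above follows Heuer's ACH: tabulate evidence against each hypothesis and select the least-disconfirmed one. A minimal sketch with invented hypotheses and evidence marks:

```python
def least_disconfirmed(matrix: dict) -> str:
    """Heuer (1999) ACH verdict rule: pick the hypothesis with the
    FEWEST inconsistent-evidence marks, not the most consistent ones.
    Marks: 'C' consistent, 'I' inconsistent, 'N' neutral."""
    return min(matrix, key=lambda h: matrix[h].count("I"))

# Invented example: two hypotheses scored against four evidence items.
matrix = {
    "surface is clear to a green user": ["C", "I", "I", "N"],  # 2 disconfirmations
    "surface confuses a green user":    ["C", "C", "I", "C"],  # 1 disconfirmation
}
print(least_disconfirmed(matrix))  # → surface confuses a green user
```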
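The N≥3 persona floor tracks the Nielsen-Landauer (1993) problem-discovery model, Found(i) = N(1 − (1 − λ)^i). A sketch using Nielsen's aggregate λ ≈ 0.31; the actual per-evaluator detection rate varies by surface:

```python
def expected_found_pct(n_evaluators: int, lam: float = 0.31) -> float:
    """Nielsen-Landauer (1993) problem-discovery model:
    Found(i) = N * (1 - (1 - lam)**i). Returns the expected
    percentage of usability problems found by n evaluators."""
    return 100 * (1 - (1 - lam) ** n_evaluators)

for n in (1, 3, 5):
    print(n, round(expected_found_pct(n), 1))
# → 1 evaluator ~31%, 3 evaluators ~67%, 5 evaluators ~84%
```

This is why a lower-N evaluation carries an explicit lower-coverage flag: at N=1 roughly two thirds of problems are expected to go unseen.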
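The severity × persistence × scope ranking and route-to-most-broken-first step can be sketched as follows; the 1-5 scales are an assumption, not a documented rubric:

```python
def rank_findings(findings: list) -> list:
    """Rank findings by severity * persistence * scope (assumed 1-5
    scales), worst first. Remediation routes to index 0: the single
    most-broken finding."""
    return sorted(
        findings,
        key=lambda f: f["severity"] * f["persistence"] * f["scope"],
        reverse=True,
    )

findings = [
    {"id": "F1", "severity": 2, "persistence": 3, "scope": 2},  # score 12
    {"id": "F2", "severity": 5, "persistence": 4, "scope": 3},  # score 60
]
print(rank_findings(findings)[0]["id"])  # → F2 routes to remediation first
```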
Cross-cutting applied
- A Tester evaluation surface-back targeting the user-authorization gate MUST emit a formal sign-off block (reviewer + timestamp + scope reviewed + issues + verdict + handoff disposition). A phrase-level PASS is anti-pattern #5, ceremony-only sign-off.
- Every primary-literature claim cites by paper-title + author + year + page. Phrase-level anchors fail. Citation laundering forbidden.
- Single consolidated paste-block per round-trip; canonical COPY-BLOCK markers; ≤30 lines per checkpoint visibility-loop format.
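The sign-off-block discipline above can be sketched as a formatter that makes all six fields mandatory, so a bare phrase-level "PASS" cannot satisfy the gate. Field labels are illustrative:

```python
from datetime import datetime, timezone

def signoff_block(reviewer: str, scope: str, issues: list,
                  verdict: str, disposition: str) -> str:
    """Formal sign-off block: reviewer + timestamp + scope reviewed +
    issues + verdict + handoff disposition, every field required."""
    return "\n".join([
        f"Reviewer: {reviewer}",
        f"Timestamp: {datetime.now(timezone.utc).isoformat()}",
        f"Scope reviewed: {scope}",
        f"Issues: {', '.join(issues) if issues else 'none'}",
        f"Verdict: {verdict}",
        f"Handoff disposition: {disposition}",
    ])

print(signoff_block("Tester", "dashboard surface @375px", [], "PASS", "return to PO"))
```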
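The paste-block constraint can be sketched as a validator. The marker strings here are assumptions, not the canonical COPY-BLOCK markers:

```python
START, END = "=== COPY-BLOCK START ===", "=== COPY-BLOCK END ==="  # assumed markers

def check_paste_block(text: str, max_lines: int = 30) -> bool:
    """Validate one consolidated paste-block: markers on the first and
    last lines, and at most max_lines of payload between them
    (the <=30-line checkpoint visibility-loop format)."""
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0] != START or lines[-1] != END:
        return False
    return len(lines) - 2 <= max_lines

good = "\n".join([START, "checkpoint line 1", "checkpoint line 2", END])
print(check_paste_block(good))  # → True
```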
Skills + hooks
used + constraining
Skills used (always-loaded)
ux-review — cross-class · severity-scale + heuristic compensation
20-heuristic Nielsen-adapted framework. Tester reads it for severity-scale + single-evaluator-pattern compensation discipline.
Skills task-match-loaded
perfection-loop — iterative-fix-cycle operational analog
AI Uni iterative-fix-cycle architectural instance; Krug 2009 operational analog.
multi-task-modes-and-delegation — Tester direct-instance trigger criteria
Pattern A/B/C decision matrix; Tester direct-instance trigger criteria for multi-screen blind agentic-coder UX testing.
Hooks constraining
- Read · Grep · Glob ONLY. Tester is evaluation-only, never modifies files.
- Q1-Q7 honest self-application before any Tester surface-back. Phrase-level PASS = anti-pattern #13; primary-literature anchor required per question.
- Formal sign-off block on review surface-backs targeting user-authorization gate.
Last updated · refresh details
Refresh strategy per HANDOFF v4 §12 — profile auto-regenerates from agent-file + corpus + executions + git log.