Methodology

UserTold.ai captures behavioral evidence from real interviews (what users actually did, said, and struggled with) and structures it into machine-readable records your agent can reason over.

This methodology is qualitative, source-backed, and delivery-oriented. It is designed to help builders find concrete product friction and verify the source context before creating work. It is not a statistical benchmark, survey replacement, or automatic product decision engine.

Methodological Basis

UserTold combines established user-research practices with an agent-readable evidence model:

Observed task behavior: Nielsen Norman Group describes usability testing as task-based observation that uncovers problems, opportunities, and user behavior in an interface. See Usability Testing 101. UserTold applies this through observe segments that record screen, speech, navigation, clicks, and product context while the participant completes a realistic task.

Think-aloud evidence: NN/g frames think-aloud testing as asking participants to use a system while continuously verbalizing their thoughts. See Thinking Aloud: The #1 Usability Tool. UserTold preserves participant language from talk and observe segments as verbatim quotes, so the evidence stays inspectable instead of being reduced to a summary.

Qualitative limits: NN/g distinguishes qualitative usability studies, which identify issues, from quantitative studies, which estimate population-level metrics. See Why 5 Participants Are Okay in a Qualitative Study, but Not in a Quantitative One. UserTold treats evidence cards as reviewable observations. Counts and confidence help triage; they do not claim statistical prevalence.

Jobs and decision context: The Christensen Institute describes Jobs to Be Done as a lens for the circumstances and functional, social, and emotional forces behind decisions. See Jobs to Be Done Theory. UserTold uses the desired_outcome, hiring_criteria, firing_moment, and decision_point signal types to preserve what progress the user was trying to make and why alternatives mattered.

Pattern grouping: Braun and Clarke describe thematic analysis as a method for identifying and interpreting patterned meaning across a dataset. See Thematic Analysis. UserTold can cluster related evidence into review packets, but a human or project-aware agent still verifies source fit before delivery work is pushed.

Research Scaffold

Use this scaffold when designing a study or reviewing its output:

  1. Define the decision: Write the product question this study should inform. Prefer questions that can change a product, onboarding, pricing, activation, or support decision.
  2. Choose the evidence mode: Use talk for context and decision history, observe for real product behavior, and scripted speak for neutral instructions or transitions.
  3. Capture source moments: Preserve verbatim quotes, timestamps, page context, and recent actions. The source moment is the unit of evidence; the summary is secondary.
  4. Classify conservatively: Assign a signal_type only when the quote or behavior supports it. Use no_issue_found or smooth_completion when the observed path does not show actionable friction.
  5. Group by pattern, not volume alone: Cluster evidence when multiple source moments point to the same product problem, user goal, or decision force. Do not promote a packet only because it has the most cards.
  6. Verify before delivery: Inspect the linked transcript or recording, confirm the product area, check whether the issue is still relevant, and only then create or push work.
  7. Close the loop after shipping: Resolve linked evidence when the tracker issue completes and watch future sessions for recurrence without claiming causal proof.
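Step 5 can be made concrete with a small grouping pass that reports how many distinct sessions show a pattern alongside the raw card count. This is a minimal Python sketch, not UserTold's clustering: the `Card` type and its `theme` field are hypothetical stand-ins for whatever label a reviewer or agent assigns, and the ordering is only a triage aid that comes before the verification in step 6.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    id: str
    session_id: str
    signal_type: str
    theme: str  # hypothetical label assigned during review, not a documented field


def group_by_pattern(cards: list[Card]) -> dict[str, dict]:
    """Cluster cards by theme and report volume next to session breadth."""
    buckets: dict[str, dict] = defaultdict(lambda: {"cards": [], "sessions": set()})
    for card in cards:
        bucket = buckets[card.theme]
        bucket["cards"].append(card.id)
        bucket["sessions"].add(card.session_id)
    return {
        theme: {
            "card_ids": b["cards"],
            "card_count": len(b["cards"]),
            "session_count": len(b["sessions"]),
        }
        for theme, b in buckets.items()
    }


def review_order(packets: dict[str, dict]) -> list[str]:
    """Order themes for human review: distinct sessions first, then card count.

    This is a triage ordering only; it does not promote anything to delivery.
    """
    return sorted(
        packets,
        key=lambda t: (packets[t]["session_count"], packets[t]["card_count"]),
        reverse=True,
    )
```

With this ordering, four cards from a single session rank below two cards that come from two different sessions, which reflects "do not promote a packet only because it has the most cards."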

What Counts as Source-Backed

A finding is source-backed when a reviewer can answer all of these from the evidence card or packet:

  • What did the participant say or do?
  • Where in the product did it happen?
  • What task, goal, or decision was in progress?
  • What source moment can be replayed or inspected?
  • Why does the proposed work follow from that evidence?
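The checklist above can be expressed as a function that lists which questions a card cannot yet answer. This is a hedged sketch: `quote`, `session_id`, and `timestamp_ms` come from the evidence JSON shown later on this page, while `action`, `page_url`, `task`, and `rationale` are hypothetical fields used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceCard:
    quote: Optional[str]             # what the participant said (evidence JSON field)
    action: Optional[str]            # hypothetical: what the participant did
    page_url: Optional[str]          # hypothetical: where in the product it happened
    task: Optional[str]              # hypothetical: task, goal, or decision in progress
    session_id: Optional[str]        # evidence JSON field
    timestamp_ms: Optional[int]      # evidence JSON field; 0 is a valid timestamp
    rationale: Optional[str] = None  # hypothetical: why the proposed work follows


def missing_source_checks(card: EvidenceCard) -> list[str]:
    """Return the checklist questions this card cannot answer yet."""
    missing = []
    if not (card.quote or card.action):
        missing.append("what the participant said or did")
    if not card.page_url:
        missing.append("where in the product it happened")
    if not card.task:
        missing.append("task, goal, or decision in progress")
    if card.session_id is None or card.timestamp_ms is None:
        missing.append("replayable source moment")
    if not card.rationale:
        missing.append("why the proposed work follows")
    return missing


def is_source_backed(card: EvidenceCard) -> bool:
    return not missing_source_checks(card)
```

Returning the missing questions, rather than a bare boolean, tells a reviewer exactly what to inspect before creating work.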

Source-backed does not mean the system knows prevalence, priority, or root cause by itself. Those require product judgment, additional data, or follow-up research.

Evidence Anatomy

An evidence card is a structured observation extracted from an interview. Each card links back to a transcript timestamp and the interview recording.

| signal_type | Description |
|---|---|
| struggling_moment | User hits friction, fails a task, or expresses confusion. |
| desired_outcome | What the user actually wants to accomplish. |
| hiring_criteria | Why they chose your product (or a competitor). |
| firing_moment | What would make them stop using your product. |
| workaround | A substitute behavior invented because the product doesn't solve it. |
| emotional_response | A strong positive or negative reaction. |
| critical_error | A blocking failure (broken flow, dead end, lost data) observed in product. |
| recovery_success | The user got unstuck, where the product already helps. |
| smooth_completion | A task completed with no friction (positive evidence). |
| no_issue_found | The analyzer ran but found no extractable evidence in this window. |
| decision_point | The user weighed alternatives or hesitated before committing. |

See Core Concepts for the full evidence model and field-by-field anatomy.

Example quotes:

  • struggling_moment: "I tried this three times and still can't find billing settings."
  • desired_outcome: "I just want to export this to CSV without all these extra steps."
  • workaround: "I usually copy it into a spreadsheet and filter it there."
  • firing_moment: "If this keeps happening I'll go back to the old tool."
  • decision_point: "I almost picked the annual plan but wasn't sure about the refund policy."
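When an agent consumes evidence, it helps to reject unknown signal_type strings early. The sketch below encodes the eleven values from the table as a Python enum and keys the example quotes by type. The `POSITIVE_EVIDENCE` grouping is an illustrative assumption drawn from the table's descriptions, not a documented UserTold field.

```python
from enum import Enum


class SignalType(str, Enum):
    """The eleven signal_type values listed in the table above."""
    STRUGGLING_MOMENT = "struggling_moment"
    DESIRED_OUTCOME = "desired_outcome"
    HIRING_CRITERIA = "hiring_criteria"
    FIRING_MOMENT = "firing_moment"
    WORKAROUND = "workaround"
    EMOTIONAL_RESPONSE = "emotional_response"
    CRITICAL_ERROR = "critical_error"
    RECOVERY_SUCCESS = "recovery_success"
    SMOOTH_COMPLETION = "smooth_completion"
    NO_ISSUE_FOUND = "no_issue_found"
    DECISION_POINT = "decision_point"


# Illustrative grouping (an assumption, not a documented field): types the table
# describes as the product already working.
POSITIVE_EVIDENCE = frozenset({SignalType.RECOVERY_SUCCESS, SignalType.SMOOTH_COMPLETION})

# The example quotes above, keyed by their signal type.
EXAMPLES = {
    SignalType.STRUGGLING_MOMENT: "I tried this three times and still can't find billing settings.",
    SignalType.DESIRED_OUTCOME: "I just want to export this to CSV without all these extra steps.",
    SignalType.WORKAROUND: "I usually copy it into a spreadsheet and filter it there.",
    SignalType.FIRING_MOMENT: "If this keeps happening I'll go back to the old tool.",
    SignalType.DECISION_POINT: "I almost picked the annual plan but wasn't sure about the refund policy.",
}


def parse_signal_type(raw: str) -> SignalType:
    """Convert a raw string to a SignalType; raises ValueError for unknown values."""
    return SignalType(raw)
```

Because the enum subclasses `str`, members compare equal to the raw JSON strings, so existing string comparisons keep working.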

Evidence JSON

{
  "id": "sig_abc123",
  "signal_type": "struggling_moment",
  "quote": "I tried this flow three times...",
  "confidence": 0.91,
  "intensity": 0.8,
  "session_id": "ses_xyz789",
  "timestamp_ms": 142300
}

Every evidence card is a typed JSON object with a confidence score. Your agent reads cards via evidence.list (MCP) or usertold evidence list --format json (CLI).
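A consumer of that output might parse cards into a typed record, keep the high-confidence ones for review, and format `timestamp_ms` for jumping into playback. This is a minimal sketch that assumes the output is either one card object or a JSON array of cards with the fields shown above; the 0.8 threshold is a hypothetical default, not a UserTold setting.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Evidence:
    id: str
    signal_type: str
    quote: str
    confidence: float
    intensity: float
    session_id: str
    timestamp_ms: int


_FIELD_NAMES = tuple(f.name for f in fields(Evidence))


def parse_evidence(raw: str) -> list[Evidence]:
    """Parse one card object or a JSON array of cards, ignoring unknown keys."""
    data = json.loads(raw)
    items = data if isinstance(data, list) else [data]
    return [Evidence(**{name: item[name] for name in _FIELD_NAMES}) for item in items]


def for_review(cards: list[Evidence], min_confidence: float = 0.8) -> list[Evidence]:
    """Keep cards at or above a hypothetical threshold, highest confidence first.

    Confidence is a triage aid, not an estimate of how common an issue is.
    """
    kept = [c for c in cards if c.confidence >= min_confidence]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


def timestamp_label(ms: int) -> str:
    """Format timestamp_ms as mm:ss for locating the source moment in playback."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"
```

For the sample card above, `timestamp_label(142300)` returns `"02:22"`, the point in the recording a reviewer would replay.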

Study Modes

Studies define the interview structure. Each study is a sequence of segments, and each segment uses one of three modes:

Talk — Conversational interview. The AI asks questions and follows up. Best for discovery interviews, debriefs, and understanding context.

Observe — Screen + voice recording while the user completes a task. The assistant stays silent and preserves stuck moments as evidence.

Speak — AI delivers a scripted one-way message out loud. Used for intros, task instructions, transitions, and thanks.
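The study structure described above (a sequence of segments, each in one of three modes) maps onto a small data model. This Python sketch is an assumption for illustration, not UserTold's schema; the `prompt` field is hypothetical. It encodes one documented distinction: speak is one-way, while talk and observe capture what the participant says or does.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(str, Enum):
    TALK = "talk"
    OBSERVE = "observe"
    SPEAK = "speak"


@dataclass(frozen=True)
class Segment:
    mode: Mode
    prompt: str  # hypothetical: question guide, task instruction, or script

    @property
    def captures_participant(self) -> bool:
        # Speak delivers a one-way message; talk and observe record the participant.
        return self.mode != Mode.SPEAK


@dataclass
class Study:
    title: str
    segments: list[Segment]

    def evidence_segments(self) -> list[Segment]:
        """Segments that can yield participant evidence."""
        return [s for s in self.segments if s.captures_participant]
```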

Study Block Patterns

Start from the block sequence that matches the evidence you need:

Talk-only — Planned talk segments for discovery interviews, recent behavior, alternatives considered, and decision criteria.

Task observation — Speak intro + observe task + talk debrief + speak thanks. Replace the task instruction with the actual product behavior you need to observe.

Context then observation — Talk context + observe task or demo + talk probing. Best when you need to understand routines before watching the participant work.
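The three block patterns can be kept as templates and filled in with the task you want to observe. In this hedged sketch the template names, the purpose labels, and the `build_study` helper are hypothetical; only the mode sequences come from the patterns above.

```python
TASK_PLACEHOLDER = "<task>"

# Each template is a list of (mode, purpose) pairs; purposes are illustrative labels.
PATTERNS: dict[str, list[tuple[str, str]]] = {
    "talk_only": [
        ("talk", "recent behavior"),
        ("talk", "alternatives considered"),
        ("talk", "decision criteria"),
    ],
    "task_observation": [
        ("speak", "intro"),
        ("observe", TASK_PLACEHOLDER),
        ("talk", "debrief"),
        ("speak", "thanks"),
    ],
    "context_then_observation": [
        ("talk", "context"),
        ("observe", TASK_PLACEHOLDER),
        ("talk", "probing"),
    ],
}


def build_study(pattern: str, task: str = "") -> list[tuple[str, str]]:
    """Copy a template, replacing the task placeholder with a real instruction."""
    template = PATTERNS[pattern]  # KeyError for an unknown pattern name
    if not task and any(p == TASK_PLACEHOLDER for _, p in template):
        raise ValueError(f"pattern {pattern!r} needs a task instruction")
    return [(mode, task if p == TASK_PLACEHOLDER else p) for mode, p in template]
```

Refusing to build an observation pattern without a task mirrors the advice to replace the task instruction with the actual product behavior you need to observe.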

The Evidence Chain

Evidence flows through a structured pipeline:

Evidence → Review packet → Verified work item → Tracker Issue → Resolved Evidence → Recurrence Watch

  1. Evidence: A structured observation from an interview. Linked to transcript timestamp and interview recording.
  2. Review packet: A cluster of related evidence grouped by theme. Each packet has a title, description, priority context, evidence links, and source moments to inspect.
  3. Verified work item: A project-aware human or agent checks the linked quotes, transcript or playback context, grouping fit, and product area, then promotes only action-ready packets to delivery work.
  4. Tracker Issue: Created by work.push after verification. It lands in GitHub Issues or Linear with evidence quotes, evidence counts, priority, and interview context.
  5. Resolved Evidence: Completing the linked Linear issue resolves the evidence currently linked to it.
  6. Recurrence Watch: Similar evidence from future sessions can resurface for review; the system does not attribute recurrence, or its absence, to the shipped work.
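The chain behaves like an ordered set of stages with a verification gate in front of delivery work. This is a minimal sketch under assumptions: the check labels in `REQUIRED_CHECKS` are hypothetical names for the reviews listed in step 3, and the `advance` helper is illustrative, not part of the MCP or CLI surface.

```python
from enum import IntEnum


class Stage(IntEnum):
    EVIDENCE = 1
    REVIEW_PACKET = 2
    VERIFIED_WORK_ITEM = 3
    TRACKER_ISSUE = 4
    RESOLVED_EVIDENCE = 5
    RECURRENCE_WATCH = 6


# Hypothetical labels for the checks in step 3: linked quotes, transcript or
# playback context, grouping fit, and product area.
REQUIRED_CHECKS = frozenset({"quotes", "source_context", "grouping_fit", "product_area"})


def advance(current: Stage, checks: frozenset[str] = frozenset()) -> Stage:
    """Move one stage forward, refusing to promote an unverified packet."""
    if current == Stage.RECURRENCE_WATCH:
        raise ValueError("recurrence watch is the final stage")
    nxt = Stage(current + 1)
    if nxt == Stage.VERIFIED_WORK_ITEM:
        missing = REQUIRED_CHECKS - set(checks)
        if missing:
            raise ValueError(f"cannot promote packet; missing checks: {sorted(missing)}")
    return nxt
```

The sketch models the chain as linear; it leaves out how resurfaced evidence re-enters review.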

Evidence vs. Surveys

| Factor | Surveys | Evidence |
|---|---|---|
| Data quality | Self-reported, recall bias | Behavioral, in-context, verbatim |
| Actionability | "Improve UX" | Specific friction at specific URL |
| Agent-readability | Unstructured free text | Typed JSON with confidence scores |
| Delivery loop | No | Linear completion sync and recurrence review |

See also