Research methodology
Before recruiting real people, we generate synthetic participants drawn from probability distributions. Each participant gets a consistent persona, realistic speech patterns, and responses that reflect their role, industry, and company. The result is a complete transcript dataset we can use to test and refine the entire research pipeline.
Real recruitment takes weeks; scheduling and conducting 80 interviews takes weeks more. By the time the first real transcripts arrive, any problems in the interview guide, the coding pipeline, or the analysis framework are expensive to fix.
Simulation inverts that order. We run the full research workflow on synthetic data first, identify and fix every structural problem, and arrive at real recruitment with a proven setup. Simulation also provides a clean dataset for internal methodology demonstrations and client-facing process documentation.
Run the full coding and analysis workflow before a single real interview is scheduled. Catch problems in the guide, codebook, or report structure while changes are cheap.
Simulate 20 participants first. If certain questions produce thin, uniform answers, the guide needs work. Better to learn that from synthetic data than from 80 real interviews.
Real recruitment rarely hits exact targets. Simulation lets us verify that our probability distributions produce the intended sample composition before committing to a recruiting spec.
Simulated transcripts are a clean, concrete demonstration of how the approach works. They power methodology documentation, client presentations, and internal training.
Every participant is defined by four independently sampled dimensions. The first three establish who the participant is. The fourth controls how they talk.
The first three dimensions are the structural characteristics of the participant: for B2B studies, typically seniority, industry, and company size. Each variable is drawn from a defined probability distribution that reflects the target population.
Industry shapes vocabulary, tooling, regulatory context, and the specific pain points a participant will describe. Weight industries to reflect the target market, not equal splits.
Verboseness controls how much each participant talks and determines the total interview length. It is sampled independently of all other dimensions — a VP can be terse, a manager can be expansive. Response depth is calibrated against a speaking pace of 150 words per minute.
- Very Verbose: Tells stories, shares multiple examples, goes on tangents, elaborates without prompting.
- Somewhat Verbose: Reasonable depth; answers with context but does not over-share.
- Not Verbose: Short, direct answers. Few stories unless prompted.
Before generating any transcripts, generate the complete list of all N participants with their assigned values. This locks in the sample distribution so you can verify it matches targets before running the expensive transcript generation step.
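The roster-generation step can be sketched in plain Python. The distributions below are placeholders, not values from any real study spec, and `generate_roster` and `realized_distribution` are illustrative names:

```python
import random
from collections import Counter

# Placeholder distributions for illustration -- replace with the
# study's participant spec.
DIMENSIONS = {
    "seniority":    {"Manager": 0.5, "Director": 0.3, "VP": 0.2},
    "industry":     {"Tech/Software": 0.4, "Healthcare": 0.3, "Finance": 0.3},
    "company_size": {"100-500": 0.5, "500-5000": 0.3, "5000+": 0.2},
    "verboseness":  {"Not Verbose": 0.5, "Somewhat Verbose": 0.3, "Very Verbose": 0.2},
}

def generate_roster(n: int, seed: int = 0) -> list[dict]:
    """Draw each dimension independently for all n participants."""
    rng = random.Random(seed)  # seeded so the roster is reproducible
    roster = []
    for pid in range(1, n + 1):
        participant = {"participant_id": pid}
        for dim, dist in DIMENSIONS.items():
            values, weights = zip(*dist.items())
            participant[dim] = rng.choices(values, weights=weights, k=1)[0]
        roster.append(participant)
    return roster

def realized_distribution(roster: list[dict], dim: str) -> Counter:
    """Count how the sampled roster actually landed, for the pre-flight check."""
    return Counter(p[dim] for p in roster)
```

Comparing `realized_distribution(roster, dim)` against the target weights for each dimension is the checkpoint that locks in the sample before transcript generation.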
Simulation follows a structured sequence. Each step has a clear input and output, and steps 2 and 4 include explicit checkpoints before proceeding.
Create a participant spec document for the study. This is the single source of truth for the simulation. It specifies the total number of participants, the four dimensions, and the probability distribution for each dimension.
Ask Claude to randomly assign each participant their dimension values by drawing from the defined distributions. Claude outputs a complete table of all N participants. Before proceeding, verify that the realized distribution matches the target distributions.
Feed Claude the participant roster, the interview guide, and the simulation rules. Claude generates transcripts in batches of 25 participants. Each batch is saved as a separate JSON file to keep file sizes manageable for downstream processing.
For each participant, Claude must answer every question in the interview guide and calibrate response depth to hit the verboseness target.
Batches are saved to research/Interview Projects/{Study Name}/ as transcripts-001-025.json, transcripts-026-050.json, transcripts-051-075.json, and transcripts-076-100.json (for a 100-participant study). After each batch, review a sample of transcripts manually. Check three things: every question in the guide has a response, estimated duration falls within the acceptable range for the participant's verboseness group, and responses stay consistent with the participant's assigned persona.
If any transcript fails validation, ask Claude to redo that participant before saving the batch.
These rules apply to every participant in every study. They are included verbatim in the simulation prompt so Claude applies them consistently.
Every question in the interview guide must be asked to every participant. No questions may be skipped. Participants may say they do not know the answer, but they cannot skip it.
Calibrate response depth so estimated interview duration (word count / 150 words per minute) falls within the acceptable range for the participant's verboseness group. Redo if outside range.
Simulate at Maze-output level — cleaned but conversational. Omit um, uh, hmm, and heavy false starts (Maze strips these). Include hedge markers like "I think," "I mean," "I guess," "kind of," "you know" (2–4% of words). Target 15–22 words per sentence. Allow 0–1 subtle self-repairs per participant.
Each participant's answers must reflect their assigned seniority, industry, and company size. A VP at a 5,000-person tech company has different vocabulary, concerns, and frustrations than a manager at a 200-person healthcare company.
Studies are general market research unless specified otherwise. Do not skew awareness, sentiment, or behavior toward any specific brand, product, or vendor.
When participants mention company HQ location, weight toward high-density business states (California, Texas, New York, Florida, Illinois, Massachusetts) while including some spread across others.
Any spend figures, budget ranges, or willingness-to-pay amounts should scale realistically with company size. A 150-person company and a 5,000-person company operate on different budget scales entirely.
Larger companies know enterprise tools. Smaller companies know SMB tools. Mid-market companies know a mix. VPs and Directors have broader awareness than Managers. Awareness should follow these patterns, not be uniformly distributed.
After completing each participant, verify that every question has a response before moving to the next. If any question is missing, add it before proceeding.
Estimate interview duration after finishing each participant. If outside the acceptable range for their verboseness group, redo the participant before proceeding. Do not batch up out-of-range transcripts and fix them later.
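A minimal validator for the two per-participant checks above (question completeness and duration calibration) might look like the sketch below. The duration bands per verboseness group are assumptions for illustration; the real ranges live in each study's participant spec.

```python
# Assumed duration bands (minutes) per verboseness group -- illustrative
# only; substitute the ranges defined in the study's participant spec.
DURATION_RANGE = {
    "Not Verbose": (10, 20),
    "Somewhat Verbose": (20, 35),
    "Very Verbose": (35, 60),
}
WORDS_PER_MINUTE = 150  # speaking pace used throughout the methodology

def validate_participant(participant: dict, question_ids: list[str]) -> list[str]:
    """Return a list of validation problems (empty list == passes)."""
    problems = []
    # Check 1: every question in the guide has a response.
    answered = {turn["question_id"] for turn in participant["transcript"]}
    missing = [q for q in question_ids if q not in answered]
    if missing:
        problems.append(f"missing questions: {missing}")
    # Check 2: estimated duration (word count / 150 wpm) within range.
    total_words = sum(len(t["response"].split()) for t in participant["transcript"])
    minutes = total_words / WORDS_PER_MINUTE
    lo, hi = DURATION_RANGE[participant["verboseness"]]
    if not (lo <= minutes <= hi):
        problems.append(f"duration {minutes:.1f} min outside {lo}-{hi}")
    return problems
```

Running this after each participant, rather than batching fixes, matches the redo-before-proceeding rule.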
Our interviews are recorded on Maze and AI-transcribed by Maze's platform. Maze's transcription pipeline does two things that shape the output significantly: it strips vocal disfluencies (um, uh, hmm) almost entirely, and it merges spoken clauses — so sentences in the transcript are longer than they were in real speech.
This means the right simulation target is Maze-output-level speech, not raw unedited speech. We simulate directly at that level rather than generating raw speech and running a separate cleaning pass.
Raw speech: "Um, so we were on Greenhouse — actually no, we were still on Lever at that point — and, uh, the biggest pain was just, like, getting the hiring managers to actually fill out their scorecards."

Maze output: "So we were on Lever at that point and the biggest pain was getting the hiring managers to fill out their scorecards."
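The style targets (2–4% hedge words, 15–22 words per sentence) can be spot-checked with a rough script like the following. The hedge list and both metrics are approximations for review purposes, not part of the documented pipeline:

```python
import re

# Hedge markers named in the style rule; the list is illustrative, not exhaustive.
HEDGES = ["i think", "i mean", "i guess", "kind of", "you know"]

def style_metrics(text: str) -> dict:
    """Rough hedge-rate and sentence-length metrics for one response."""
    words = text.lower().split()
    n_words = len(words)
    joined = " ".join(words)
    # Each hedge phrase contributes its word length, since the 2-4% target
    # is expressed as a share of total words.
    hedge_words = sum(joined.count(h) * len(h.split()) for h in HEDGES)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_len = n_words / len(sentences) if sentences else 0.0
    return {
        "hedge_pct": 100.0 * hedge_words / n_words if n_words else 0.0,
        "avg_sentence_words": avg_len,
    }
```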
All transcripts are saved as JSON. Each file contains a batch of participants. The structure is consistent across studies, which means the coding pipeline can ingest simulated and real transcripts through the same process.
{
"study": "Study Name",
"participants": [
{
"participant_id": 1,
"seniority": "Director",
"industry": "Tech/Software",
"company_size": "100-500",
"verboseness": "Not Verbose",
"transcript": [
{
"question_id": "Q1.1",
"question": "Full question text...",
"response": "Participant's simulated response..."
},
{
"question_id": "Q1.2",
"question": "Next question...",
"response": "Next response..."
}
]
},
{
"participant_id": 2,
...
}
]
}

25 participants per file. Larger batches approach context window limits and make it harder to spot errors in any individual participant.
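The batch filename convention follows mechanically from N and the batch size. This helper is a sketch, not part of the documented tooling:

```python
def batch_filenames(n_participants: int, batch_size: int = 25) -> list[str]:
    """Filenames following the transcripts-001-025.json convention."""
    names = []
    for start in range(1, n_participants + 1, batch_size):
        end = min(start + batch_size - 1, n_participants)  # final batch may be short
        names.append(f"transcripts-{start:03d}-{end:03d}.json")
    return names
```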
Participant-level fields (seniority, industry, company_size, verboseness) are stored alongside the transcript so coding agents can access them without joining to a separate roster file.
Every question carries both an ID and the full question text. IDs enable exact joins to the codebook; full text is available for coding agents that need it without a lookup.
The same structure is used for real participant transcripts. This means the coding pipeline, analysis scripts, and report templates work identically on both simulated and real data.
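Because simulated and real transcripts share one JSON structure, a loader for the downstream coding steps can be as simple as the sketch below (the function name is illustrative):

```python
import json
from pathlib import Path

def load_study_transcripts(study_dir: str) -> list[dict]:
    """Read every transcript batch file in a study folder into one list.

    Works identically for simulated and real data, since both use the
    same JSON structure.
    """
    participants = []
    # Sorted glob preserves batch order: transcripts-001-025.json first.
    for path in sorted(Path(study_dir).glob("transcripts-*.json")):
        batch = json.loads(path.read_text())
        participants.extend(batch["participants"])
    return participants
```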
Synthetic participants are useful only if they produce data that exercises the research pipeline the same way real participants would. These are the quality signals we look for.
If multiple simulated participants give thin or nearly identical answers to a question, the question is probably too narrow, too leading, or positioned too late in the guide. This is exactly what simulation is for. Fix the guide and regenerate before real recruitment begins.
Similarly, if the codebook discovery pipeline produces very few distinct themes from a simulated dataset, the interview questions may not be generating enough variation to support meaningful segmentation.
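One way to surface thin or uniform questions is a per-question response-length report over the whole simulated sample. This is a rough sketch, not the documented codebook pipeline:

```python
from statistics import mean, pstdev

def question_depth_report(participants: list[dict]) -> dict[str, dict]:
    """Mean and spread of response length per question.

    A low mean flags thin answers; a low standard deviation flags
    uniform answers -- both signs the question needs rework.
    """
    by_question: dict[str, list[int]] = {}
    for p in participants:
        for turn in p["transcript"]:
            by_question.setdefault(turn["question_id"], []).append(
                len(turn["response"].split())
            )
    return {
        qid: {"mean_words": mean(lengths), "stdev_words": pstdev(lengths)}
        for qid, lengths in by_question.items()
    }
```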
| File type | Location | Notes |
|---|---|---|
| Participant spec / sampling framework | research/00 How to simulate participants/examples/ | Reusable methodology reference, not study-specific data |
| Participant roster | Appended to the participant spec document | The realized sample — stays with the spec |
| Transcript JSON files | research/Interview Projects/{Study Name}/ | One folder per study; batched by 25 |
| Coded data and analysis scripts | research/Interview Projects/{Study Name}/ | Output of the coding pipeline |
| Interview guide (study-specific) | research/Interview Projects/{Study Name}/ | Or in research/0 How to prepare interview guides/ if reusable |
Starting a new simulation requires two inputs: an interview guide and a participant spec. Everything else follows from those two documents. Here is exactly what to hand Claude to kick off a new study.
The guide should be a numbered list of questions with question IDs (Q1, Q2.1, Q3, etc.). If you have a guide already, that is your input. If not, create one first using the interview guide preparation workflow.
Save the guide to research/Interview Projects/{Study Name}/ before you begin.
Hand Claude the following prompt, filling in the bracketed values for your study. Claude will build the participant spec, generate the roster, and begin producing transcripts in batches of 25.
    I want to simulate [N] interview participants for a study called "[Study Name]".

    PARTICIPANT POPULATION
    [Describe who these people are — role, function, company type]

    DIMENSIONS AND DISTRIBUTIONS
    Seniority:
      [Level]: [%]
      [Level]: [%]
    Industry:
      [Industry]: [%]
      [Industry]: [%]
    Company size:
      [Range]: [%]
      [Range]: [%]
    Verboseness: 50% Not Verbose / 30% Somewhat Verbose / 20% Very Verbose [Adjust if needed]

    INTERVIEW GUIDE
    [Paste your full interview guide here, with question IDs]

    INSTRUCTIONS
    1. First, build the participant spec document and save it to research/00 How to simulate participants/examples/[study-name]-participants.md
    2. Generate the participant roster (all [N] participants with assigned dimension values). Show me the roster table and the realized distribution before proceeding to transcripts.
    3. Once I approve the roster, generate transcripts in batches of 25, saving each batch to research/Interview Projects/[Study Name]/transcripts-001-025.json (and so on for each batch).
    4. Apply the Maze-transcript speech style and all simulation rules from research/00 How to simulate participants/README.md
Claude will pause after generating the roster and show you the realized distribution. This is your checkpoint to verify the sample looks right before committing to the full transcript generation. Check that each dimension's realized distribution is close to its target and that the roster contains all N participants.
Once you approve, Claude generates transcripts batch by batch. Each batch of 25 takes roughly 10–20 minutes and one API call.
Once all transcript batches are saved, run the coding pipeline (research/2 How to code transcripts/) pointing at your new study folder. The pipeline reads the same JSON format that simulation produces, so there is no conversion step. Simulated and real transcripts go through exactly the same downstream process.