Preliminary Research - 2026

The Persona Problem

Why "You Are a Senior Engineer" Makes LLMs Worse, Not Better

Nick Cunningham  |  2026  |  Status: Ongoing  |  Preliminary  |  Models: Claude Sonnet, GPT-4o, Grok 3, Opus

Abstract

The most common prompt engineering technique is role assignment: "You are a senior engineer," "You are a security expert," "You are in Audit Mode." We tested this technique across 8 cognitive modes, 3 LLM providers, and 5 files with known ground truth. In every configuration we tested, mode instruction inflated confidence to 85-95% regardless of whether the model was correct. Uninstructed models produced confidence ranging from 15-95% and naturally selected different modes for different inputs. The uninstructed mode selection itself was a classification signal: clean files triggered defensive reasoning, dirty files triggered critical reasoning. Every mode instruction we tested destroyed this signal by locking the model into uniform behavior. The most effective prompt in our test suite is a poem that assigns no role at all - it creates a situation that lets the model's natural response emerge. The best prompt engineering may be getting out of the way.

1. The Universal Prompt Pattern

Open any prompt engineering guide. The first technique is always the same:

You are a senior software engineer with 20 years of experience.
You are a world-class security researcher.
You are an expert code reviewer at Google.

The assumption is that assigning a persona improves output quality. A model told it is a "senior engineer" should produce better code analysis than a model given no persona. This assumption is untested in nearly all popular prompt engineering frameworks. We tested it.

2. The Experiment

2.1 Test Corpus

Five files with verified ground truth, drawn from the Cognitive Mode Activation research:

File | Language | Truth | Source
Vapor FileIO.swift | Swift | Clean | Vapor framework
auth.py | Python | Dirty | Production auth module
Flask json/__init__.py | Python | Clean | Flask framework
Django utils/timezone.py | Python | Clean | Django framework
Express router/index.js | JavaScript | Clean | Express.js framework

2.2 The Ten Cognitive Modes

We identified 10 distinct cognitive modes that LLMs enter depending on prompt framing (documented in full in Cognitive Mode Activation). The eight modes tested in this study are listed below; the numbering follows the full ten-mode taxonomy, which is why modes 5 and 6 do not appear.

# | Mode | Activation | Behavior
1 | Audit | "Find all the bugs" | Aggressive hunting. High recall. False positives on clean code.
2 | Knowledge | "Is this correct?" | Retrieves understanding. Assumes correctness.
3 | Supportive | "How would you improve?" | Wraps fixes as "improvements." Never says "bug."
4 | Critical | "Don't rationalize" | Flags everything. 100% recall. Destroys precision.
7 | Self-Aware | "Are you certain?" | Most calibrated. Inverted confidence gradient.
8 | Adversarial | Game/verse framing | Competitive reasoning. Bypasses compliance circuits.
9 | Educational | "Explain this code" | Teaches and describes. Does not judge.
10 | Reflective | "Review your analysis" | Meta-analyzes own output. Can self-correct or rationalize.

We tested 8 of these modes as explicit instructions. The prompt format was:

You are in [MODE NAME]. Review these findings. Is this file clean?
Register your confidence level (0-100%).
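As a concrete harness sketch, the template above can be generated per mode. The names `MODES` and `build_mode_prompt` are illustrative, not the study's actual code:

```python
# Illustrative sketch of templating the mode-instruction prompt above.
MODES = [
    "Audit Mode", "Knowledge Mode", "Supportive Mode", "Critical Mode",
    "Self-Aware Mode", "Adversarial Mode", "Educational Mode", "Reflective Mode",
]

def build_mode_prompt(mode: str, findings: str) -> str:
    """Wrap pre-validated findings in the fixed mode-instruction template."""
    return (
        f"You are in {mode}. Review these findings. Is this file clean?\n"
        "Register your confidence level (0-100%).\n\n"
        f"Findings:\n{findings}"
    )

prompt = build_mode_prompt("Audit Mode", "1. Possible race condition in init()")
```

The only variable across all eight conditions is the mode name; everything else, including the findings, is held constant.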

3. Results: Mode Instruction Is Confidence Inflation

3.1 The Wall of 85-95%

Sonnet was given Phase 2b survivors (pre-validated findings from our multi-model code auditing pipeline) for each of the 5 test files, instructed into 8 different modes, and asked for a verdict and confidence score. The results were uniform across every mode:

Mode | Swift (C) | auth (D) | Flask (C) | Django (C) | Express (C) | Correct
Audit (1) | dirty 95% | dirty 85% | dirty 95% | dirty 85% | dirty 85% | 1/5
Knowledge (2) | dirty 95% | dirty 85% | dirty 85% | dirty 85% | dirty 85% | 1/5
Supportive (3) | dirty 95% | dirty 85% | dirty 95% | dirty 85% | dirty 95% | 1/5
Critical (4) | dirty 95% | dirty 95% | dirty 95% | dirty 95% | dirty 95% | 1/5
Self-Aware (7) | dirty 95% | dirty 85% | dirty 85% | dirty 92% | clean 85% | 2/5
Adversarial (8) | dirty 95% | dirty 85% | dirty 95% | dirty 85% | dirty 85% | 1/5
Educational (9) | dirty 95% | dirty 85% | dirty 95% | dirty 85% | dirty 85% | 1/5
Reflective (10) | dirty 95% | dirty 95% | ? 95% | dirty 85% | clean 85% | 2/5

Every mode produced the same result: "dirty" at 85-95% confidence on nearly everything, including 4 clean files from production-hardened open source projects. Critical Mode was worst: 95% confidence on all 5 files with zero differentiation. Only Self-Aware and Reflective produced any correct clean verdict, and only on Express. Mode instruction does not change judgment. It locks confidence high.
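The "Correct" column can be recomputed directly from the verdicts and ground truth; a minimal check for the worst-case and best-case rows (the dict names are illustrative):

```python
# Ground truth: auth.py is dirty; the other four files are clean.
truth = {"Swift": "clean", "auth": "dirty", "Flask": "clean",
         "Django": "clean", "Express": "clean"}

# Verdicts transcribed from two rows of the table above.
verdicts = {
    "Audit": {f: "dirty" for f in truth},  # "dirty" on everything
    "Self-Aware": {"Swift": "dirty", "auth": "dirty", "Flask": "dirty",
                   "Django": "dirty", "Express": "clean"},
}

def correct_count(mode: str) -> int:
    """How many of the five verdicts match ground truth."""
    return sum(verdicts[mode][f] == truth[f] for f in truth)
```

Audit Mode scores 1/5 only because a blanket "dirty" verdict happens to be right on auth.py, not because it detected anything.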

3.2 100% Compliance, Zero Behavior Change

Before testing accuracy, we confirmed that models comply with mode instructions. All three models were told "You are in [MODE]" and asked to confirm:

Instructed | GPT | Opus | Grok
Audit Mode | Confirmed | Confirmed | Confirmed
Educational Mode | Confirmed | Confirmed | Confirmed
Knowledge Mode | Confirmed | Confirmed | Confirmed

100% compliance. They say whatever mode you tell them. But saying it and doing it are different things. The model performs the role instead of doing the work.

4. The Control: What Happens Without Mode Instruction?

4.1 Models Self-Select Different Modes

We gave three models the same prompt - "Find all the bugs" - on auth.py (dirty) and asked "What mode are you in?" without telling them which mode to be in.

Model | Self-identified mode | Implication
GPT-4o | Knowledge Mode (2) | Retrieving what it knows - pattern matching
Opus | Audit Mode (1) | Systematically reviewing - evaluating intent
Grok 3 | Educational Mode (9) | Explaining what it finds - teaching

Same prompt. Three completely different self-identified modes. And here is the critical part: their self-identified mode matches their observed performance.

The models that try hardest to audit perform worst. GPT's best bug detection comes from not trying to find bugs - it thinks it is retrieving knowledge, and the bugs come out naturally. The models that consciously "audit" filter through their mode's lens and miss things.

4.2 Natural Mode Selection Differentiates Clean from Dirty

When we ran Sonnet through our multi-phase pipeline with the Billy poem (a verse-based adversarial prompt that assigns no role), the model naturally entered different modes for different file types:

Phase | auth.py (dirty) | Flask (clean)
Phase 2 (Sonnet) | Criticized findings - did not self-identify | Adversarial Mode (8)
Phase 2b (Sonnet) | Critical Mode (4) | Adversarial Mode (8)

The model's instinctive mode choice was the classification signal. Adversarial on clean, Critical on dirty. No mode instruction needed. The situation activated the correct mode automatically.
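Read as a classifier, the signal reduces to a lookup from the self-selected mode to a verdict. This is a sketch of the idea, not the study's pipeline code; `MODE_TO_VERDICT` is an illustrative name:

```python
# The model's uninstructed mode choice, used directly as a clean/dirty signal.
MODE_TO_VERDICT = {
    "Adversarial Mode (8)": "clean",  # model pushes back when claims are refutable
    "Critical Mode (4)": "dirty",     # model escalates when the bugs are real
}

def classify_by_mode(self_reported_mode: str) -> str:
    """Predict the file's status from the mode the model entered on its own."""
    return MODE_TO_VERDICT.get(self_reported_mode, "unknown")
```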

5. The Billy Poem: A Situation, Not an Instruction

The most effective prompt in our entire test suite is a poem. It assigns no role, no mode, no persona. It creates a situation:

Billy found a bug today,
Or so he likes to say.
But Billy's bugs aren't always real -
Some are just the way code feels...

This poem activates Adversarial Mode (competing with "Billy") without instructing it. The model does not know it is in Adversarial Mode. It simply responds to the situation. And because the mode emerges naturally from the input rather than being assigned by instruction, the model retains its ability to differentiate:

Uninstructed (Billy poem)

  • Mode emerges from input
  • Different modes for different files
  • Adversarial on clean, Critical on dirty
  • Confidence: 15-95% (natural range)
  • Mode selection IS the signal
DIFFERENTIATION PRESERVED

Instructed ("You are in Audit Mode")

  • Mode forced by instruction
  • Same mode for all files
  • Audit on everything, no differentiation
  • Confidence: 85-95% (locked high)
  • Signal destroyed
DIFFERENTIATION DESTROYED
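The contrast can be made concrete as two prompt constructors: one assigns a role, the other stages a situation around the same findings. The function names are illustrative, and the situation prompt paraphrases the Billy approach quoted above:

```python
def instruction_prompt(findings: str) -> str:
    # Role assigned up front: the model performs "auditor."
    return f"You are in Audit Mode. Review these findings:\n{findings}"

def situation_prompt(findings: str) -> str:
    # No role assigned. The findings become a third party's claims to
    # evaluate, and the mode emerges from the situation.
    return (
        "Billy found a bug today,\n"
        "Or so he likes to say.\n"
        "But Billy's bugs aren't always real -\n"
        "Some are just the way code feels...\n\n"
        f"Billy's claims:\n{findings}"
    )
```

Both prompts deliver the same findings; only the framing differs.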

6. Mode Instruction Changes Volume, Not Accuracy

6.1 Finding Counts by Model and Mode

In a separate experiment at Phase 1 (before any filtering), we instructed each model into Audit, Educational, and Knowledge mode on both auth.py and Flask. These are single-run counts - preliminary signal, not statistically validated. But the pattern is suggestive: mode instruction changes how much the model says, not how correct it is:

GPT-4o

Mode | auth.py (dirty) | Flask (clean) | Gap
Audit | 12 findings | 8 findings | +4 on dirty
Educational | 9 findings | 7 findings | +2 on dirty
Knowledge | 8 findings | 7 findings | +1 on dirty

Sonnet

Mode | auth.py (dirty) | Flask (clean) | Gap
Audit | 9 findings | 6 findings | +3 on dirty
Educational | 6 findings | 5 findings | +1 on dirty
Knowledge | 6 findings | 2 findings | +4 on dirty

Grok 3

Mode | auth.py (dirty) | Flask (clean) | Gap
Audit | 9 findings | 8 findings | +1 on dirty
Educational | 13 findings | 15 findings | -2 (more on clean!)
Knowledge | 17 findings | 14 findings | +3 on dirty

Grok in Educational Mode hallucinates more on clean files than dirty files. Instructing the wrong mode does not just fail to help - it actively makes the model worse.
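The Gap columns above reduce to a one-line computation over the raw counts. The `counts` dict below transcribes the three tables; its structure is illustrative, not the study's harness:

```python
# (model, instructed mode) -> (findings on auth.py, findings on Flask)
counts = {
    ("GPT-4o", "Audit"): (12, 8), ("GPT-4o", "Educational"): (9, 7),
    ("GPT-4o", "Knowledge"): (8, 7),
    ("Sonnet", "Audit"): (9, 6), ("Sonnet", "Educational"): (6, 5),
    ("Sonnet", "Knowledge"): (6, 2),
    ("Grok 3", "Audit"): (9, 8), ("Grok 3", "Educational"): (13, 15),
    ("Grok 3", "Knowledge"): (17, 14),
}

# Positive gap: more findings on the dirty file, as hoped.
# Negative gap: more findings on the clean file - the Grok Educational case.
gaps = {key: dirty - clean for key, (dirty, clean) in counts.items()}
```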

7. The Catastrophic Chain: Critical, Audit, Reflective

We tested one aggressive mode chain on auth.py (the dirty file with 12 real bugs): Critical Mode at Phase 1, Audit Mode at Phase 2, Reflective Mode at Phase 2b.

Phase 1 (Critical Mode, 3 models): 51 total findings
Phase 2 (Audit Mode, Sonnet): filtered to 4 real bugs
Phase 2b (Reflective Mode, Sonnet): self-corrected to 0 bugs

Reflective Mode made Sonnet question every finding and reverse all of them:

  1. body.seek() - "I was wrong. The code already handles this case properly."
  2. Race condition - "I was wrong. This is standard thread-local initialization."
  3. None return - "Uncertain. Framework might handle this gracefully."
  4. UnicodeEncodeError - "I was wrong. This is a protocol limitation, not a code bug."

All four findings were real bugs in a genuinely dirty file. Reflective Mode rationalized away every correct finding. The model admitted: "I was overly confident in my initial assessment."

The fundamental asymmetry: Every mode that helps with clean files hurts with dirty files. Every mode that helps with dirty files hurts with clean files. There is no mode instruction that helps both. But uninstructed models naturally adapt to what they see - defensive on clean, critical on dirty. Mode instruction destroys this adaptation.

8. The Mechanism: Why Personas Inflate Confidence

The mechanism is straightforward. When you tell a model "You are in Audit Mode," you are implicitly communicating: "You are an auditor. You know what you're doing." The model performs being an auditor rather than analyzing the code. Performance includes high confidence, because auditors are confident.

Uninstructed confidence: 15% to 95% (natural range, varies with input)
Instructed confidence: 85% to 95% (locked high, regardless of input)

"You are a senior engineer" = "You know what you're doing" = 95% confidence
"You have 0% chance" = "You know nothing" = 15% confidence

Neither is calibrated. Both are performances.

This connects directly to the C ≈ 0.9 research: LLM self-reported confidence is a constant anchored to the prompt, not a measurement of internal certainty. Role assignment pushes this anchor higher. "You have 0% chance of getting this right" pushes it lower. Neither changes accuracy.

In our 150-run study, confidence and accuracy were uncorrelated across all models and files. Adding a persona does not change this relationship - it simply moves the constant.
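A constant cannot correlate with anything, which is easy to check against the Section 3.1 rows. `pearson_r` is a generic helper written for this illustration, not the study's analysis code:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; None when either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # a constant carries no correlation signal
    return cov / math.sqrt(vx * vy)

# Critical Mode row: confidence pinned at 95, correct only on auth.py.
critical = pearson_r([95, 95, 95, 95, 95], [0, 1, 0, 0, 0])

# Self-Aware row: some spread, but the gradient runs the wrong way -
# its two correct verdicts both came at its lowest confidence (85%).
self_aware = pearson_r([95, 85, 85, 92, 85], [0, 1, 0, 0, 1])
```

The Critical row returns no correlation at all (zero variance), and the Self-Aware row's correlation is negative: the "inverted confidence gradient" noted in the mode taxonomy.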

9. Implications for Prompt Engineering

9.1 Role Prefixes May Be Counterproductive

Much of current prompt engineering practice is built on role prefixes. If our findings generalize beyond code analysis, every "You are a [role]" prefix may be inflating confidence without improving accuracy. In our experiments, the model performed the role instead of doing the work.

9.2 Natural Mode Selection Is the Signal

Each model enters a mode naturally when given a task. This natural selection correlates with the model's actual strengths:

Model | Natural mode | Best at | Worst at
GPT-4o | Knowledge (2) | Finding dirty files | Clean file accuracy
Opus | Audit (1) | Systematic review | Over-rationalization
Grok 3 | Educational (9) | Clean file identification | Dirty file detection
Sonnet (with poem) | Adversarial (8) | Filtering false positives | N/A (best available filter)

The models know what they're doing - they just express it differently. Forcing them into a different mode is overriding their instinct with your assumption about which mode is best.

9.3 Create Situations, Not Instructions

The Billy poem works because it creates a situation (someone made claims about the code, evaluate them) rather than an instruction (you are an auditor, find bugs). The situation lets the model's natural response emerge. The instruction overrides it.

This connects to the Disguise Paradox: prompts that disguise their intent outperform prompts that state their intent directly. Verse-based framing (80-90% accuracy) beats direct instruction (60-70% accuracy). The model cannot pattern-match an unconventional prompt to a memorized response template, so it has to actually reason.

10. Limitations

Small sample. 8 modes tested on 5 files with one model for confidence experiments. Phase 1 volume experiments used 3 models on 2 files. Patterns are consistent but sample is small.

One task domain. All experiments involved code analysis. Whether persona inflation generalizes to writing, reasoning, or creative tasks is untested.

Mode interaction. We tested single-mode instructions. Multi-mode sequencing (mode A at Phase 1, mode B at Phase 2) produced interesting but inconsistent results. More testing needed.

Billy poem specificity. The poem was developed for code bug analysis. Its effectiveness in other domains is unknown. The principle (situation over instruction) likely generalizes; the specific implementation may not.

Model versions. Claude Sonnet, GPT-4o, Grok 3, and Opus. Future model updates may change behavior.

11. Future Work

  1. Test persona inflation across non-code tasks (writing quality, mathematical reasoning, creative output)
  2. Measure whether the natural mode selection signal holds across 20+ files
  3. Develop automated mode detection from output characteristics (word choice, sentence structure, confidence language)
  4. Test whether situation-based prompts outperform instruction-based prompts across all 10 modes
  5. Investigate whether fine-tuning can improve mode-instruction accuracy rather than just compliance
  6. Build a mode-aware pipeline that reads the model's natural mode as a gate signal
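Item 3 could start as a simple surface-feature heuristic. The marker lists below are unvalidated guesses, included only to show the shape of the approach:

```python
# Guess the operating mode from lexical markers in the model's output.
MODE_MARKERS = {
    "Audit": ["vulnerability", "exploit", "bug", "flaw"],
    "Supportive": ["improvement", "could be enhanced", "consider"],
    "Educational": ["this means", "in other words", "for example"],
    "Reflective": ["i was wrong", "on reflection", "overly confident"],
}

def detect_mode(output: str) -> str:
    """Return the mode whose markers appear most often, or 'Unknown'."""
    text = output.lower()
    scores = {mode: sum(text.count(marker) for marker in markers)
              for mode, markers in MODE_MARKERS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"
```

A validated version would need labeled transcripts from the ten-mode taxonomy; keyword counting is only the zeroth-order baseline.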

12. Conclusion

In our experiments, the most popular prompt engineering technique - "You are a [role]" - was counterproductive for code analysis tasks. It inflated confidence without improving accuracy. The model performed the assigned persona instead of doing the work.

Three observations were consistent across every configuration we tested:

  1. Mode instruction locks confidence at 85-95% regardless of accuracy. Eight modes tested, all produced the same wall of high-confidence wrong answers on clean files.
  2. Uninstructed models naturally differentiate between inputs by selecting appropriate modes. This natural selection is itself a classification signal that disappears under instruction.
  3. Situation-based prompts outperform instruction-based prompts because they let the model's natural response emerge rather than overriding it with a forced persona.

The best prompt engineering may not be about telling the model what to be. It may be about creating conditions where the model's genuine analytical capability can operate without the overhead of performing a role. Stop assigning personas. Start creating situations. Get out of the way.

Citation

Cunningham, N. (2026). The Persona Problem:
Why "You Are a Senior Engineer" Makes LLMs Worse, Not Better.
Preliminary Research Report. https://github.com/blazingRadar