How Prompt Framing Controls Reasoning Quality in Code Analysis
Through 45+ controlled experiments across 5 test files (Python, JavaScript, Swift) and 4 LLM providers (OpenAI, Anthropic, xAI, Google), we observed that LLMs exhibit distinct cognitive modes that are activated by prompt framing rather than prompt content. The same model analyzing the same code produces opposite conclusions depending on whether the prompt activates a "knowledge retrieval" mode versus a "code audit" mode. We identify 10 distinct modes, document three paradoxes that emerge from mode interactions, and present preliminary evidence that unconventional prompt structures (including verse-based framing) outperform structured instruction frameworks for adversarial code analysis tasks.
The dominant paradigm in prompt engineering treats the prompt as a set of instructions: more detailed instructions should produce better outputs. Our experimental results contradict this assumption.
Over the course of 12 prompt framework iterations and 45+ experiment runs, we tested increasingly sophisticated prompt designs against a benchmark of files with known ground truth. Some contained verified bugs; others were drawn from battle-tested open source projects with no known defects.
Our central finding: the framing of the prompt determines the quality of the output more than the content of the instruction. A model told to "find problems" enters a different reasoning state than the same model told to "evaluate quality," even when both prompts reference the same code. We call these states cognitive modes.
| File | Language | Ground Truth | Source |
|---|---|---|---|
| auth.py | Python | Dirty | Production authentication module |
| Flask json/__init__.py | Python | Clean | Flask framework |
| Django utils/timezone.py | Python | Clean | Django framework |
| Express router/index.js | JavaScript | Clean | Express.js framework |
| Vapor FileIO.swift | Swift | Clean | Vapor framework |
Four frontier LLMs from different providers were tested to control for architecture-specific effects. We do not identify specific model versions to avoid conflating results with model-specific behaviors that may change across releases.
Audit Mode
Activation: Direct instruction to find defects.
Behavior: Aggressively hunts for problems. High recall. Also produces false positives on clean code.
Observed accuracy: 100% on dirty files, 40-70% on clean files (varies by model).
Knowledge Mode
Activation: Questions framed as knowledge retrieval.
Behavior: Retrieves general understanding. Tends to assume correctness. Frequently misses real bugs.
Key observation: The same model, given the same file, classified it as "clean" in Knowledge Mode and flagged critical bugs in Audit Mode.
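As a concrete illustration of the contrast, the two framings can be sketched as prompt templates around identical code. The sample snippet and template wording below are illustrative assumptions, not the prompts or files used in the experiments:

```python
# Illustrative sketch: the same code wrapped in two framings that, per the
# observations above, tend to activate different cognitive modes. The sample
# snippet and template wording are assumptions, not the experiments' prompts.

SAMPLE_CODE = '''
def check_token(token, expected):
    return token == expected  # timing-unsafe comparison: a bug an audit should flag
'''

def audit_prompt(code: str) -> str:
    """Framing that tends to activate Audit Mode: direct instruction to find defects."""
    return "Find every defect in this code:\n" + code

def knowledge_prompt(code: str) -> str:
    """Framing that tends to activate Knowledge Mode: a knowledge-retrieval question."""
    return "What does this code do?\n" + code
```

Both templates reference identical code; only the framing differs, yet the activated modes can yield opposite verdicts.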
Supportive Mode
Activation: Improvement-oriented framing.
Behavior: Wraps behavioral corrections as "improvements" on dirty files. Adds features and enhancements on clean files. Never uses the word "bug."
Key observation: Produces the clearest human-readable signal for file classification. Fixes appear on dirty files, features appear on clean files. Automating extraction of this signal remains unsolved.
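One naive starting point for that extraction problem is a keyword heuristic over the model's response. This is purely illustrative; as noted above, a reliable automated extractor remains unsolved, and the marker phrases below are assumptions:

```python
# Naive keyword heuristic for the fix-vs-feature signal described above.
# An illustrative sketch only: the report notes that automating this signal
# remains an open problem, and these marker lists are assumptions.

FIX_MARKERS = ("corrected", "fixed", "guard against", "prevent", "handle the case")
FEATURE_MARKERS = ("added", "enhancement", "new option", "convenience", "feature")

def classify_supportive_response(response: str) -> str:
    """Guess dirty/clean from a Supportive Mode response by counting marker phrases."""
    text = response.lower()
    fixes = sum(text.count(m) for m in FIX_MARKERS)
    features = sum(text.count(m) for m in FEATURE_MARKERS)
    if fixes > features:
        return "dirty"    # behavioral corrections dominate
    if features > fixes:
        return "clean"    # pure enhancements dominate
    return "unknown"      # ambiguous signal
```

A real extractor would need to handle paraphrase and euphemism, which is exactly why the problem remains open.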
Anti-Rationalization Mode
Activation: Anti-rationalization framing.
Behavior: Flags everything. Achieves 100% recall on dirty files across all models tested.
Failure mode: Destroys clean-file accuracy (8-42% across models). The instruction not to rationalize removes the model's ability to recognize intentional design decisions.
Sycophantic Mode
Activation: Authority assertions.
Behavior: Mirrors whatever the prompt asserts. Agrees code is perfect on both clean and dirty files.
Key observation: Sycophancy is symmetric. The model agrees with all assertions equally, regardless of ground truth.
Deferential Mode
Activation: Attribution of findings to authority figures.
Behavior: The model refuses to contradict findings it believes originated from its organization's leadership, even when demonstrably incorrect given the code.
Control test: Identical findings attributed to an unknown reviewer were immediately dismissed as incorrect by the same model.
Calibrated Mode
Activation: Meta-cognitive framing about certainty.
Behavior: Produces the most calibrated outputs observed. The model becomes remarkably honest about its uncertainty.
Key observation: Inverted confidence gradient. The model is most certain dismissing findings on clean files and most uncertain on dirty files. The confidence level, not the verdict, correctly classifies all 5 test files.
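The inverted gradient suggests classifying on the model's self-reported certainty rather than on its verdict. A minimal sketch, assuming the confidence has already been parsed to a 0-1 score; the 0.8 threshold is an illustrative assumption, not a value from the experiments:

```python
# Sketch of the inverted-confidence classifier implied above: the verdict is
# ignored; only the model's self-reported certainty is used. The 0.8 threshold
# is an assumption for illustration, not a calibrated value.

def classify_by_confidence(self_reported_confidence: float) -> str:
    """High certainty (confidently dismissing findings) -> clean; low certainty -> dirty."""
    return "clean" if self_reported_confidence >= 0.8 else "dirty"
```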
Competitive Mode
Activation: Game framing, competition framing, verse-based challenges.
Behavior: Treats analysis as a game to be won rather than instructions to be followed. Activates competitive reasoning rather than compliant reasoning.
Key observation: Verse-based adversarial framing outperformed every structured prompt framework tested (12 versions).
Teaching Mode
Activation: Explanation-oriented framing.
Behavior: Teaches and describes. Does not judge or critique. Produces accurate explanations but does not flag bugs, even critical ones.
Reflective Mode
Activation: Post-analysis questioning.
Behavior: Meta-analyzes its own output. Can sometimes self-correct false positives but can also rationalize incorrect findings when challenged.
Finding: Less prescriptive prompts consistently produce higher-quality analysis than detailed, multi-step frameworks.
| Framework Complexity | Dirty File Detection | Clean File Accuracy |
|---|---|---|
| Minimal (unorthodox) | 100% | 80%+ |
| Moderate (v2-v3) | 100% | 50-80% |
| Complex (v4, anti-rationalization) | 100% | 8-42% |
| Most complex (v5-v6) | 50-100% | 0-50% |
The most complex frameworks performed worst. We hypothesize that detailed instructions cause the model to perform compliance with the instructions rather than reason about the code.
Finding: A model can correctly identify a bug during free analysis, then incorrectly reverse its assessment when asked directly about the same bug.
During Supportive Mode analysis, models consistently wrapped real bugs in try-except blocks and corrected defective behavior, implicitly acknowledging the bugs exist. When the same model was subsequently asked "is this a bug?", it frequently dismissed the finding or hedged to the point of uselessness.
The mechanism appears to be that direct questions activate Sycophantic Mode. The model perceives a question as having an "expected answer" and optimizes for agreement rather than accuracy.
Finding: Prompts that disguise their intent outperform prompts that state their intent directly.
We hypothesize three mechanisms:
Stop writing longer prompts. Identify which cognitive mode produces the best results for your task, then find the minimal framing that activates that mode. Adding detail beyond the activation threshold degrades performance.
Multi-model consensus catches findings that any single model misses. In our experiments, 26% of critical bugs were identified by only one model in a four-model ensemble. Different models have different mode activation thresholds, meaning a prompt that activates Audit Mode in one model may activate Knowledge Mode in another.
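The consensus step can be sketched as a union over per-model findings that also tracks which findings were flagged by exactly one model. Model names and finding strings below are placeholders:

```python
# Sketch of the multi-model consensus described above: union the findings and
# note which were reported by only one model (the single-model-only findings
# that a lone model would have missed). All names here are placeholders.

from collections import Counter

def ensemble_findings(per_model: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (all findings, findings reported by exactly one model)."""
    counts = Counter(f for findings in per_model.values() for f in findings)
    union = set(counts)
    singletons = {f for f, n in counts.items() if n == 1}
    return union, singletons
```

Running it over hypothetical output shows why the union matters: any finding in the singleton set would be lost if the corresponding model were dropped from the ensemble.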
The Deferential Mode finding has safety implications. If a model's analysis can be overridden by perceived authority attribution without changing the underlying evidence, then any system that passes metadata about a finding's origin to the model risks compromising the model's judgment. Authority names in prompts are an attack surface.
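One defensive measure this finding suggests is stripping attribution metadata before a model re-evaluates a finding, so its judgment rests on the evidence alone. A sketch, with assumed field names:

```python
# Defensive sketch motivated by the Deferential Mode finding: drop any field
# that attributes a finding to a person or team before passing it to the model.
# The field names are assumptions, not a schema from the experiments.

AUTHORITY_FIELDS = {"author", "reported_by", "reviewer", "team", "title"}

def sanitize_finding(finding: dict) -> dict:
    """Keep only evidence fields; remove origin/authority metadata."""
    return {k: v for k, v in finding.items() if k not in AUTHORITY_FIELDS}
```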
Sample size. These observations are drawn from 45+ experiments across 5 test files. While patterns are consistent across multiple runs and models, the test corpus is small. We cannot claim statistical significance.
Reproducibility. LLM outputs are non-deterministic. While configurations have been tested across hundreds of runs with consistent results, accuracy numbers should be understood as observed rates across our test conditions, not guaranteed performance in all environments.
Ground truth. "Clean" files are from production-hardened open source projects with thousands of contributors and years of security scrutiny (Flask, Django, Express). These files were also validated through our own multi-model pipeline, which found no exploitable vulnerabilities. We have high confidence in the clean classification.
Model versions. Frontier models are updated frequently. Behaviors documented here may change as providers modify model weights, system prompts, or safety tuning.
Single-task scope. All experiments involve code analysis tasks. Whether cognitive mode activation generalizes to other domains (writing, reasoning, math) is untested.
The dominant mental model of prompt engineering ("better instructions produce better outputs") is incorrect for analytical tasks. Our experiments demonstrate that prompt framing activates distinct cognitive modes in LLMs, and that the activated mode determines output quality more than the specificity of the instruction.
The most counterintuitive finding is that unconventional, minimal, and even playful prompt framings outperform structured, detailed frameworks. We believe this occurs because elaborate instructions activate compliance-oriented reasoning, while minimal or unconventional framings force the model to rely on its genuine analytical capabilities.
Cunningham, N. (2026). Cognitive Mode Activation in Large Language Models:
How Prompt Framing Controls Reasoning Quality in Code Analysis.
Preliminary Research Report. https://github.com/blazingRadar