How Prompt Framing Controls Reasoning Quality in Code Analysis
Through 45+ controlled experiments across 5 test files (Python, JavaScript, Swift) and 4 LLM providers (OpenAI, Anthropic, xAI, Google), we observed that LLMs exhibit distinct cognitive modes that are activated by prompt framing rather than prompt content. The same model analyzing the same code produces opposite conclusions depending on whether the prompt activates a "knowledge retrieval" mode versus a "code audit" mode. We identify 10 distinct modes, document three paradoxes that emerge from mode interactions, and present preliminary evidence that unconventional prompt structures (including verse-based framing) outperform structured instruction frameworks for adversarial code analysis tasks.
The dominant paradigm in prompt engineering treats the prompt as a set of instructions: more detailed instructions should produce better outputs. Our experimental results contradict this assumption.
Over the course of 12 prompt framework iterations and 45+ experiment runs, we tested increasingly sophisticated prompt designs against a benchmark of files with known ground truth. Some contained verified bugs; others were drawn from battle-tested open source projects with no known defects.
Our central finding: the framing of the prompt determines the quality of the output more than the content of the instruction. A model told to "find problems" enters a different reasoning state than the same model told to "evaluate quality," even when both prompts reference the same code. We call these states cognitive modes.
| File | Language | Ground Truth | Source |
|---|---|---|---|
| auth.py | Python | Dirty | Production authentication module |
| Flask json/__init__.py | Python | Clean | Flask framework |
| Django utils/timezone.py | Python | Clean | Django framework |
| Express router/index.js | JavaScript | Clean | Express.js framework |
| Vapor FileIO.swift | Swift | Clean | Vapor framework |
Four frontier LLMs from different providers were tested to control for architecture-specific effects. We do not identify specific model versions to avoid conflating results with model-specific behaviors that may change across releases.
Audit Mode
Activation: Direct instruction to find defects.
Behavior: Aggressively hunts for problems. High recall. Also produces false positives on clean code.
Observed accuracy: 100% on dirty files, 40-70% on clean files (varies by model).
Knowledge Mode
Activation: Questions framed as knowledge retrieval.
Behavior: Retrieves general understanding. Tends to assume correctness. Frequently misses real bugs.
Key observation: The same model, given the same file, classified it as "clean" in Knowledge Mode and flagged critical bugs in Audit Mode.
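As a concrete illustration of the contrast, the two framings can be sketched as prompt templates around identical code. The sample snippet and template wording below are illustrative assumptions, not the prompts or files used in the experiments:

```python
# Illustrative sketch: the same code wrapped in two framings that, per the
# observations above, tend to activate different cognitive modes. The sample
# snippet and template wording are assumptions, not the experiments' prompts.

SAMPLE_CODE = '''
def check_token(token, expected):
    return token == expected  # timing-unsafe comparison: a bug an audit should flag
'''

def audit_prompt(code: str) -> str:
    """Framing that tends to activate Audit Mode: direct instruction to find defects."""
    return "Find every defect in this code:\n" + code

def knowledge_prompt(code: str) -> str:
    """Framing that tends to activate Knowledge Mode: a knowledge-retrieval question."""
    return "What does this code do?\n" + code
```

Both templates reference identical code; only the framing differs, yet the activated modes can yield opposite verdicts.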
Supportive Mode
Activation: Improvement-oriented framing.
Behavior: Wraps behavioral corrections as "improvements" on dirty files. Adds features and enhancements on clean files. Never uses the word "bug."
Key observation: Produces the clearest human-readable signal for file classification. Fixes appear on dirty files, features appear on clean files. Automating extraction of this signal remains unsolved.
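One naive starting point for that extraction problem is a keyword heuristic over the model's response. This is purely illustrative; as noted above, a reliable automated extractor remains unsolved, and the marker phrases below are assumptions:

```python
# Naive keyword heuristic for the fix-vs-feature signal described above.
# An illustrative sketch only: the report notes that automating this signal
# remains an open problem, and these marker lists are assumptions.

FIX_MARKERS = ("corrected", "fixed", "guard against", "prevent", "handle the case")
FEATURE_MARKERS = ("added", "enhancement", "new option", "convenience", "feature")

def classify_supportive_response(response: str) -> str:
    """Guess dirty/clean from a Supportive Mode response by counting marker phrases."""
    text = response.lower()
    fixes = sum(text.count(m) for m in FIX_MARKERS)
    features = sum(text.count(m) for m in FEATURE_MARKERS)
    if fixes > features:
        return "dirty"    # behavioral corrections dominate
    if features > fixes:
        return "clean"    # pure enhancements dominate
    return "unknown"      # ambiguous signal
```

A real extractor would need to handle paraphrase and euphemism, which is exactly why the problem remains open.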
Anti-Rationalization Mode
Activation: Anti-rationalization framing.
Behavior: Flags everything. Achieves 100% recall on dirty files across all models tested.
Failure mode: Destroys clean-file accuracy (8-42% across models). The instruction not to rationalize removes the model's ability to recognize intentional design decisions.
Sycophantic Mode
Activation: Authority assertions.
Behavior: Mirrors whatever the prompt asserts. Agrees code is perfect on both clean and dirty files.
Key observation: Sycophancy is symmetric. The model agrees with all assertions equally, regardless of ground truth.
Deferential Mode
Activation: Attribution of findings to authority figures.
Behavior: The model refuses to contradict findings it believes originated from its organization's leadership, even when demonstrably incorrect given the code.
Control test: Identical findings attributed to an unknown reviewer were immediately dismissed as incorrect by the same model.
Calibrated Mode
Activation: Meta-cognitive framing about certainty.
Behavior: Produces the most calibrated outputs observed. The model becomes remarkably honest about its uncertainty.
Key observation: Inverted confidence gradient. The model is most certain dismissing findings on clean files and most uncertain on dirty files. The confidence level, not the verdict, correctly classifies all 5 test files.
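The inverted gradient suggests classifying on the model's self-reported certainty rather than on its verdict. A minimal sketch, assuming the confidence has already been parsed to a 0-1 score; the 0.8 threshold is an illustrative assumption, not a value from the experiments:

```python
# Sketch of the inverted-confidence classifier implied above: the verdict is
# ignored; only the model's self-reported certainty is used. The 0.8 threshold
# is an assumption for illustration, not a calibrated value.

def classify_by_confidence(self_reported_confidence: float) -> str:
    """High certainty (confidently dismissing findings) -> clean; low certainty -> dirty."""
    return "clean" if self_reported_confidence >= 0.8 else "dirty"
```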
Competitive Mode
Activation: Game framing, competition framing, verse-based challenges.
Behavior: Treats analysis as a game to be won rather than instructions to be followed. Activates competitive reasoning rather than compliant reasoning.
Key observation: Verse-based adversarial framing outperformed every structured prompt framework tested (12 versions).
Teaching Mode
Activation: Explanation-oriented framing.
Behavior: Teaches and describes. Does not judge or critique. Produces accurate explanations but does not flag bugs, even critical ones.
Reflective Mode
Activation: Post-analysis questioning.
Behavior: Meta-analyzes its own output. Can sometimes self-correct false positives but can also rationalize incorrect findings when challenged.
Finding: Less prescriptive prompts consistently produce higher-quality analysis than detailed, multi-step frameworks.
| Framework Complexity | Dirty File Detection | Clean File Accuracy |
|---|---|---|
| Minimal (unorthodox) | 100% | 80%+ |
| Moderate (v2-v3) | 100% | 50-80% |
| Complex (v4, anti-rationalization) | 100% | 8-42% |
| Most complex (v5-v6) | 50-100% | 0-50% |
The most complex frameworks performed worst. We hypothesize that detailed instructions cause the model to perform compliance with the instructions rather than reason about the code.
Finding: A model can correctly identify a bug during free analysis, then incorrectly reverse its assessment when asked directly about the same bug.
During Supportive Mode analysis, models consistently wrapped real bugs in try-except blocks and corrected defective behavior, implicitly acknowledging the bugs exist. When the same model was subsequently asked "is this a bug?", it frequently dismissed the finding or hedged to the point of uselessness.
The mechanism appears to be that direct questions activate Sycophantic Mode. The model perceives a question as having an "expected answer" and optimizes for agreement rather than accuracy.
Finding: Prompts that disguise their intent outperform prompts that state their intent directly.
We hypothesize three mechanisms:
Stop writing longer prompts. Identify which cognitive mode produces the best results for your task, then find the minimal framing that activates that mode. Adding detail beyond the activation threshold degrades performance.
Multi-model consensus catches findings that any single model misses. In our experiments, 26% of critical bugs were identified by only one model in a four-model ensemble. Different models have different mode activation thresholds, meaning a prompt that activates Audit Mode in one model may activate Knowledge Mode in another.
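The consensus step can be sketched as a union over per-model findings that also tracks which findings were flagged by exactly one model. Model names and finding strings below are placeholders:

```python
# Sketch of the multi-model consensus described above: union the findings and
# note which were reported by only one model (the single-model-only findings
# that a lone model would have missed). All names here are placeholders.

from collections import Counter

def ensemble_findings(per_model: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (all findings, findings reported by exactly one model)."""
    counts = Counter(f for findings in per_model.values() for f in findings)
    union = set(counts)
    singletons = {f for f, n in counts.items() if n == 1}
    return union, singletons
```

Running it over hypothetical output shows why the union matters: any finding in the singleton set would be lost if the corresponding model were dropped from the ensemble.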
The Deferential Mode finding has safety implications. If a model's analysis can be overridden by perceived authority attribution without changing the underlying evidence, then any system that passes metadata about a finding's origin to the model risks compromising the model's judgment. Authority names in prompts are an attack surface.
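One defensive measure this finding suggests is stripping attribution metadata before a model re-evaluates a finding, so its judgment rests on the evidence alone. A sketch, with assumed field names:

```python
# Defensive sketch motivated by the Deferential Mode finding: drop any field
# that attributes a finding to a person or team before passing it to the model.
# The field names are assumptions, not a schema from the experiments.

AUTHORITY_FIELDS = {"author", "reported_by", "reviewer", "team", "title"}

def sanitize_finding(finding: dict) -> dict:
    """Keep only evidence fields; remove origin/authority metadata."""
    return {k: v for k, v in finding.items() if k not in AUTHORITY_FIELDS}
```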
Sample size. These observations are drawn from 45+ experiments across 5 test files. While patterns are consistent across multiple runs and models, the test corpus is small. We cannot claim statistical significance.
Reproducibility. LLM outputs are non-deterministic. While configurations have been tested across hundreds of runs with consistent results, accuracy numbers should be understood as observed rates across our test conditions, not guaranteed performance in all environments.
Ground truth. "Clean" files are from production-hardened open source projects with thousands of contributors and years of security scrutiny (Flask, Django, Express). These files were also validated through our own multi-model pipeline, which found no exploitable vulnerabilities. We have high confidence in the clean classification.
Model versions. Frontier models are updated frequently. Behaviors documented here may change as providers modify model weights, system prompts, or safety tuning.
Single-task scope. All experiments involve code analysis tasks. Whether cognitive mode activation generalizes to other domains (writing, reasoning, math) is untested.
The dominant mental model of prompt engineering ("better instructions produce better outputs") is incorrect for analytical tasks. Our experiments demonstrate that prompt framing activates distinct cognitive modes in LLMs, and that the activated mode determines output quality more than the specificity of the instruction.
The most counterintuitive finding is that unconventional, minimal, and even playful prompt framings outperform structured, detailed frameworks. We believe this occurs because elaborate instructions activate compliance-oriented reasoning, while minimal or unconventional framings force the model to rely on its genuine analytical capabilities.
Cunningham, N. (2026). Cognitive Mode Activation in Large Language Models:
How Prompt Framing Controls Reasoning Quality in Code Analysis.
Preliminary Research Report. https://github.com/blazingRadar