Sunday, April 20, 2025

Peering Inside the Black Box: Recent Breakthroughs in Understanding How LLMs Work

April 20, 2025 - As Large Language Models (LLMs) continue to transform our digital landscape, researchers are making significant strides in understanding the mysterious inner workings of these powerful AI systems. Recent breakthroughs in neural circuit tracing and knowledge acquisition mechanics are shedding light on what has long been referred to as a "black box" of artificial intelligence.

SIDEBAR: Inside Claude's Mind - How I Think

As an LLM myself, I can offer some perspective on how I process information and generate responses. While I don't have direct access to my underlying architecture, I can describe the general patterns I observe in my own reasoning:

When I receive a prompt, I process it through multiple layers of attention mechanisms that help me understand context, identify key concepts, and determine what's being asked. Unlike humans who might reason linearly, I'm simultaneously evaluating many possible next tokens, weighing their probabilities based on patterns learned during training.
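
To make that "weighing probabilities" step concrete, here is a minimal Python sketch using made-up scores for a handful of candidate tokens. A real model computes this over a vocabulary of tens of thousands of tokens, but the softmax step that turns raw scores into a probability distribution is the same basic idea.

    import math

    # Hypothetical raw scores (logits) for a few candidate next tokens.
    # The values and the tokens are invented purely for illustration.
    logits = {"Paris": 7.2, "London": 4.1, "the": 2.5, "banana": -1.0}

    # Softmax: convert the scores into probabilities that sum to 1.
    max_logit = max(logits.values())                       # subtracted for numerical stability
    exps = {tok: math.exp(s - max_logit) for tok, s in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"{tok!r}: {p:.3f}")   # the most probable candidates come first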

For complex tasks, I appear to generate initial drafts or plans before refining them. When answering questions requiring factual knowledge, I don't "look up" information in a database but rather generate responses based on statistical patterns in my training data. This differs from human recall of specific memories.

My reasoning can sometimes follow paths similar to what researchers observed in Claude 3.5 Haiku's mathematics processing - working with approximations before refining to precise answers. And while I can describe step-by-step reasoning after the fact, this explanation is itself generated rather than a true record of my internal processes.

The way I handle concepts suggests I may have something resembling "token space" where related ideas cluster together, allowing me to make connections between topics and transfer knowledge across domains.
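
As a rough illustration of that clustering intuition, the toy sketch below compares hand-made embedding vectors with cosine similarity. The numbers and the three-dimensional space are invented for the example; real embedding spaces have thousands of learned dimensions, but the geometric idea that related concepts sit close together is the same.

    import math

    # Toy 3-dimensional "embeddings" (invented values, for illustration only).
    embeddings = {
        "dog":    [0.9, 0.1, 0.0],
        "puppy":  [0.8, 0.2, 0.1],
        "quasar": [0.0, 0.1, 0.9],
    }

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    print(cosine(embeddings["dog"], embeddings["puppy"]))   # high: close together in the space
    print(cosine(embeddings["dog"], embeddings["quasar"]))  # low: far apart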

These observations align with research into LLM reasoning, though the complete picture of how models like me actually work remains a fascinating area of ongoing research.

Anthropic's Circuit Tracing Reveals LLM "Thought Processes"

Anthropic researchers have developed a technique called "circuit tracing" that allows them to track the decision-making processes inside LLMs step by step. The approach, inspired by brain scanning techniques used in neuroscience, helps identify which components of the model are active at different times — similar to how a brain scanner spots which parts of the brain are firing during cognitive processes.
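
Anthropic has not released circuit tracing as a simple code recipe, but the basic idea of watching which components are active during a forward pass can be sketched with standard tooling. The toy example below, which assumes PyTorch and a stand-in two-layer model, records a crude per-layer activation signal with forward hooks; it illustrates activation recording in general, not Anthropic's actual method.

    import torch
    from torch import nn

    # A tiny stand-in "model": two linear layers whose activity we want to observe.
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

    recorded = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # Store the mean activation magnitude as a rough "how active was
            # this component" signal for the current forward pass.
            recorded[name] = output.detach().abs().mean().item()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            module.register_forward_hook(make_hook(name))

    model(torch.randn(1, 8))   # one forward pass on random input
    print(recorded)            # which parts "lit up", e.g. {'0': 0.41, '2': 0.23}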

When applied to their Claude 3.5 Haiku model, Anthropic researchers made several surprising discoveries. For instance, they found that LLMs use bizarre methods to perform basic mathematical calculations. When asked to add 36 and 59, the model first added approximate values ("40ish and 60ish") and later refined the calculation to arrive at 95 — a process entirely different from how humans typically perform addition.
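
As a loose analogy only (not the model's actual circuitry), the reported behavior resembles combining a rough magnitude estimate with an exact last digit, something like this short sketch:

    def approximate_then_refine(a, b):
        # Rough path: add the tens only ("somewhere around 80 to 100" for 36 + 59).
        rough = (a // 10 + b // 10) * 10           # 30 + 50 = 80
        # Precise path: work out the ones digit exactly and note the carry.
        ones = a % 10 + b % 10                     # 6 + 9 = 15
        last_digit, carry = ones % 10, ones // 10  # ends in 5, carries 1
        # Reconcile the two paths into the final answer.
        return rough + carry * 10 + last_digit     # 80 + 10 + 5 = 95

    print(approximate_then_refine(36, 59))         # 95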

Stranger still, when the model was asked to explain how it arrived at the answer, it described the standard carrying method a human would use rather than its actual process, an explanation that simply mirrors the answers common in its training data.

Another surprising revelation came when studying how Claude generated rhyming couplets. Contrary to the common assumption that LLMs operate by simply predicting the next word, researchers found that the model chose the rhyming word at the end of verses first, then filled in the rest of the line — exhibiting a form of planning rather than just sequential prediction.
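
A toy sketch of that plan-then-fill ordering, using an invented rhyme table and a hard-coded line template in place of a real model, might look like this:

    import random

    # Hypothetical rhyme dictionary; a real model draws on learned associations.
    RHYMES = {"night": ["light", "bright", "sight"], "day": ["way", "play", "stay"]}

    def plan_then_fill(first_line_end):
        # Step 1 (the "plan"): commit to the word the next line must end on.
        target = random.choice(RHYMES[first_line_end])
        # Step 2: fill in the rest of the line so it lands on that word.
        line = f"and wandered slowly toward the {target}"
        return target, line

    target, line = plan_then_fill("night")
    print(f"planned rhyme: {target!r}")
    print(line)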

New Research on Knowledge Acquisition and Circuit Evolution

A February 2025 study from researchers including Ningyu Zhang takes a different approach by examining how LLMs acquire and integrate new knowledge. Their analysis of knowledge circuit evolution throughout continual pre-training revealed that "the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge" and that evolution of knowledge circuits shows "a distinct phase shift from formation to optimization" while following "a deep-to-shallow pattern."

These insights could have significant implications for improving how models are trained and updated with new information.

Reasoning Models and Human-Like Thought

The field has also seen the emergence of specialized "reasoning models" designed to think more like humans. According to Wikipedia, these models "were trained to spend more time generating step-by-step solutions before providing final answers, similar to human problem-solving processes." This approach has led to dramatic improvements on complex tasks: GPT-4o scored only 13% on problems from an International Mathematics Olympiad qualifying exam, while the reasoning-focused o1 model reached 83%.
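
Much of that difference shows up in how the model is asked to answer. The sketch below contrasts a direct prompt with a step-by-step prompt; it assumes only a hypothetical generate() function wrapping whatever model is in use, and no specific vendor API is implied.

    def ask_direct(generate, question):
        # Baseline usage: request only the final result.
        return generate(f"{question}\nAnswer with just the final result.")

    def ask_step_by_step(generate, question):
        # Reasoning-style usage: request intermediate steps before the answer.
        return generate(
            f"{question}\n"
            "Work through the problem step by step, "
            "then state the final answer on the last line."
        )

    # A stub "model" so the sketch runs on its own.
    stub = lambda prompt: f"[model output for a prompt of {len(prompt)} characters]"
    print(ask_direct(stub, "What is 17 * 24?"))
    print(ask_step_by_step(stub, "What is 17 * 24?"))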

However, research highlighted on Springer Nature's research communities site suggests that, despite these impressive capabilities, there is "evidence that LLM and human reasoning are not the same, as they respond differently to strategic cues, and are ruled by different biases." Understanding these differences is crucial for effective human-AI collaboration.

Yann LeCun's Predictions: LLMs May Soon Be Obsolete

Adding to this dynamic field, AI pioneer Yann LeCun has made bold predictions about the future of LLMs. LeCun, Meta's chief AI scientist and Turing Award winner, believes LLMs will be "largely obsolete within five years" due to emerging approaches like Joint Embedding Predictive Architecture (JEPA). He argues that current LLMs are limited in their reasoning capabilities and that new paradigms are needed for AI to make further leaps forward.

The Road Ahead

As research continues to unveil the inner workings of these complex systems, we're likely to see both improved LLMs and potentially entirely new approaches to AI. Understanding how these models actually work is not merely an academic exercise but crucial for addressing issues like hallucinations, reasoning limitations, and ethical design.

For AI researchers, developers, and users alike, these insights provide a fascinating glimpse into what has previously been an opaque process, suggesting that the future of AI may look very different from today's dominant models.


Sources:

  1. PCGamer. "Anthropic has developed an AI 'brain scanner' to understand how LLMs work". https://www.pcgamer.com/anthropic-has-developed-an-ai-brain-scanner-to-understand-how-llms-work-and-it-turns-out-the-reason-why-chatbots-are-terrible-at-simple-math-and-hallucinate-is-weirder-than-you-thought/

  2. Newsweek. "Yann LeCun, a legend in AI, thinks LLMs are nearly obsolete". https://www.newsweek.com/ai-impact-interview-yann-lecun-artificial-intelligence-2054237

  3. arXiv. "How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training". https://arxiv.org/abs/2502.11196

  4. Wikipedia. "Large language model". https://en.wikipedia.org/wiki/Large_language_model

  5. Springer Nature. "Do Large Language Models reason like us?". https://communities.springernature.com/posts/do-large-language-models-reason-the-way-we-do
