
Water Cooler Small Talk, Ep. 9: What “Thinking” and “Reasoning” Really Mean in AI and LLMs


Water cooler talk is a special kind of small talk, typically observed in office spaces around a water cooler. There, employees frequently share all kinds of corporate gossip, myths, legends, inaccurate scientific opinions, indiscreet personal anecdotes, or outright lies. Anything goes. In my Water Cooler Small Talk posts, I discuss strange and usually scientifically invalid opinions that I, my friends, or some acquaintance of mine have overheard in the office and that have literally left us speechless.

So, here’s the water cooler opinion of today’s post:

I was really disappointed by using ChatGPT the other day for reviewing Q3 results. This is not Artificial Intelligence — this is just a search and summarization tool, but not Artificial Intelligence.

🤷‍♀️

We often talk about AI, imagining some superior kind of intelligence straight out of a 90s sci-fi movie. It’s easy to drift away and think of it as some cinematic singularity like Terminator’s Skynet or Dune’s dystopian AI. Commonly used illustrations of AI-related topics, full of robots, androids, and intergalactic portals ready to transport us to the future, only mislead us further about what AI actually is.

Some of the top results appearing for ‘AI’ on Unsplash; from left to right: 1) photo by Julien Tromeur on Unsplash, 2) photo by Luke Jones on Unsplash, 3) photo by Xu Haiwei on Unsplash

Nevertheless, for better or for worse, AI systems operate in a fundamentally different way — at least for now. For the time being, there is no omnipresent superintelligence waiting to solve all of humanity’s insolvable problems. That’s why it’s essential to understand what current AI models actually are and what they can (and can’t) do. Only then can we manage our expectations and make the best possible use of this powerful new technology.


🍨 DataCream is a newsletter about what I learn, build, and think about in AI and data. If you are interested in these topics, subscribe here.


Deductive vs Inductive Thinking

In order to get our heads around what AI in its current state is and is not, and what it can and cannot do, we first need to understand the difference between deductive and inductive thinking.

Psychologist Daniel Kahneman dedicated his life to studying how our minds operate, how they reach conclusions and decisions, and how they shape our actions and behaviors, a vast and groundbreaking body of research that ultimately won him the Nobel Memorial Prize in Economic Sciences. His work is beautifully summarized for the general reader in Thinking, Fast and Slow, where he describes two modes of human thought:

  • System 1: fast, intuitive, and automatic, essentially unconscious.
  • System 2: slow, deliberate, and effortful, requiring conscious effort.

From an evolutionary standpoint, we tend to prefer operating on System 1 because it saves time and energy, kind of like living life on autopilot, not thinking about things too much. Nonetheless, System 1’s speed and efficiency often come at the cost of accuracy, leading to mistakes.


Inductive reasoning aligns closely with Kahneman’s System 1: it moves from specific observations to general conclusions. This type of thinking is pattern-based and thus stochastic; in other words, its conclusions always carry a degree of uncertainty, even if we don’t consciously recognize it.

For example:

Pattern: The sun has risen every day in my life.
Conclusion: Therefore, the sun will rise tomorrow.

As you may imagine, this type of thinking is prone to bias and error because it generalizes from limited data. In other words, the sun is most probably going to also rise tomorrow, since it has risen every day in my life, but not necessarily.

To reach this conclusion, we silently also assume that ‘all days will follow the same pattern as those we’ve experienced’, which may or may not be true. In other words, we implicitly assume that the patterns observed in a small sample are going to apply everywhere.

Such silent assumptions, made in order to reach a conclusion, are exactly what makes inductive reasoning produce results that are highly plausible, yet never certain. Much like fitting a function through a few data points, we can guess what the underlying relationship might be, but we can never be sure, and being wrong is always a possibility. We build a plausible model of what we observe and simply hope it’s a good one.

Image by author

Or, put another way, different people operating on different data or under different conditions will produce different results when using induction, as in the sketch below.
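To make the curve-fitting analogy concrete, here is a minimal Python sketch; the underlying function, sample size, noise level, and polynomial degree are all arbitrary choices for illustration. Two “observers” each fit a curve to their own small, noisy sample of the same hidden relationship and then extrapolate beyond the data they have seen. Their induced rules, and therefore their predictions, disagree.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "true" relationship is hidden from us; we only ever see noisy samples of it.
def true_fn(x):
    return np.sin(x)

# Two observers each collect a small, different sample of (x, y) points.
for observer in range(2):
    x = rng.uniform(0, 3, size=8)
    y = true_fn(x) + rng.normal(scale=0.1, size=x.shape)

    # Each observer induces a general rule (here: a cubic polynomial fit)
    # from their limited data.
    coeffs = np.polyfit(x, y, deg=3)

    # Both then extrapolate to x = 5, well outside the range they observed.
    prediction = np.polyval(coeffs, 5.0)
    print(f"Observer {observer + 1} predicts y(5) ≈ {prediction:.2f}")

print(f"Actual value:        y(5) ≈ {true_fn(5.0):.2f}")
```

Run it with different random seeds and the disagreement only grows; that variability is exactly the uncertainty that induction carries.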


On the flip side, deductive reasoning moves from general principles to specific conclusions — that is, essentially Kahneman’s System 2. It’s rule-based, deterministic, and logical, following the structure of “if A, then for sure B”.

For example:

Premise 1: All humans are mortal.
Premise 2: Socrates is human.
Conclusion: Therefore, Socrates is mortal.

This type of thinking is less prone to errors, since every step of the reasoning is deterministic. There are no silent assumptions; if the premises are true, the conclusion must be true.

Back to the function-fitting analogy, we can imagine deduction as the reverse process: calculating a data point given the function. Since we know the function, we can calculate the data point with certainty, and unlike the case of multiple curves fitting the same data points better or worse, there is exactly one correct answer. Most importantly, deductive reasoning is consistent and robust: we can repeat the calculation at a specific point of the function a million times, and we will always get the exact same result.

Image by author
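Continuing the analogy in code, a minimal sketch with an arbitrary, made-up function: once the rule is fully specified, applying it is deterministic, and repeating the calculation a million times produces a single, identical answer.

```python
# If the rule itself is known, applying it is deterministic: evaluating
# the same function at the same point always yields the same result.
def f(x):
    return 3 * x ** 2 + 2 * x + 1   # an arbitrary, fully specified rule

# Collect the results of a million evaluations into a set of unique values.
results = {f(5.0) for _ in range(1_000_000)}
print(results)  # {86.0} -- one definitive answer, no matter how often we recompute it
```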

Of course, even when using deductive reasoning, humans can make mistakes. For instance, we may mess up the calculation of the specific value of the function and get the result wrong. But that is just a random error. In contrast, the error in inductive reasoning is systematic: the reasoning process itself is prone to error, since we are including silent assumptions without ever knowing to what extent they hold true.


So, how do LLMs work?

It’s easy, especially for people without a tech or computer science background, to imagine today’s AI models as an extraterrestrial, godly intelligence, able to provide wise answers to all of humanity’s questions. Nonetheless, this is not (yet) the case, and today’s AI models, as impressive and advanced as they are, remain limited by the principles they operate on.

Large Language Models (LLMs) don’t “think” or “understand” in the human sense. Instead, they rely on patterns in the data they’ve been trained on, much like Kahneman’s System 1 or inductive reasoning. Simply put, they work by predicting the most plausible next word for a given input.
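As a toy illustration (the vocabulary, context, and scores below are entirely made up), this is what “predicting the next most plausible word” boils down to: the model assigns a score to every candidate word and turns those scores into a probability distribution, from which the most likely continuation is picked.

```python
import numpy as np

# Toy illustration of next-word prediction. The vocabulary, context, and scores
# are made up; a real LLM produces scores over tens of thousands of tokens
# using a neural network, but the final step is essentially the same.
vocab = ["mat", "moon", "roof", "banana"]
context = "the cat sat on the"

# Hypothetical scores (logits) the model might assign to each candidate word.
logits = np.array([4.1, 1.3, 2.7, -1.0])

# Softmax turns the scores into a probability distribution over the next word.
probs = np.exp(logits) / np.exp(logits).sum()

# The most plausible continuation is simply the highest-probability word.
for word, p in sorted(zip(vocab, probs), key=lambda wp: -wp[1]):
    print(f"P({word!r} | {context!r}) = {p:.3f}")
```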

You can think of an LLM as a very diligent student who memorized vast amounts of text and learned to reproduce patterns that sound correct without necessarily understanding why they’re correct. Most of the time this works, because sentences that sound correct have a higher chance of actually being correct. This means that such models can generate human-like text and speech with impressive quality and essentially sound like a very smart human. Nonetheless, generating human-like text and producing arguments and conclusions that sound correct does not guarantee they really are correct. Even when LLMs generate content that sounds like deductive reasoning, it is not. You can easily see this in the nonsense that AI tools like ChatGPT occasionally produce.

Image by author

It is also important to understand how LLMs arrive at these next most probable words. Naively, we may assume that such models just count the frequencies of word sequences in existing text and then somehow reproduce these frequencies to generate new text. But that’s not how it works. There are about 50,000 commonly used words in English, which results in a practically infinite number of possible word combinations. For instance, even for a short sentence of 10 words, the number of possible combinations is 50,000^10, roughly 10^47, an astronomically large number. On the flip side, all the English text in books and on the internet amounts to a few hundred billion words (on the order of 10^11 to 10^12). As a result, there isn’t even nearly enough text in existence to cover every possible phrase and generate text with this approach.
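A quick back-of-the-envelope check of these numbers (the figures are rough orders of magnitude, not precise counts):

```python
# Back-of-the-envelope check: why "just look the phrase up" cannot work.
vocab_size = 50_000              # roughly the number of commonly used English words
sentence_length = 10

possible_sequences = vocab_size ** sentence_length     # 50,000^10
words_in_existence = 10 ** 12                          # generous estimate of available English text

print(f"possible 10-word sequences:      {possible_sequences:.2e}")   # ~9.8e+46
print(f"words of English text available: {words_in_existence:.2e}")   # ~1.0e+12
```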

Instead, LLMs use statistical models built from existing text to estimate the probability of words and phrases that may never have appeared before. Like any model of reality, though, this is a simplified approximation, resulting in AI making mistakes or fabricating information.
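As a very rough sketch of the difference, the toy bigram model below (assuming a made-up tiny “corpus” and simple add-one smoothing as a crude stand-in for the vastly more sophisticated statistical machinery inside an LLM) shows how pure counting assigns zero probability to a phrase it has never seen, while even a minimal statistical model can still assign it some probability:

```python
from collections import Counter

# Tiny "corpus" standing in for training text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

vocab = sorted(set(corpus))
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def prob_counting(prev, word):
    """Pure frequency counting: zero for any pair never seen in the corpus."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def prob_smoothed(prev, word):
    """Add-one smoothing: a crude statistical model that still assigns
    some probability to combinations absent from the training text."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

# "dog ... mat" never occurs in the corpus, yet it is a perfectly plausible phrase.
print(prob_counting("dog", "mat"))   # 0.0   -- counting alone cannot generalize
print(prob_smoothed("dog", "mat"))   # ~0.11 -- small but non-zero
```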


What about Chain of Thought?

So, what about ‘the model is thinking’, or ‘Chain of Thought (CoT) reasoning’? If LLMs can’t really think like humans do, what do those fancy terms mean? Is it just a marketing trick? Well, kind of, but not exactly.

Chain of Thought (CoT) is primarily a prompting technique that allows LLMs to answer questions by breaking them down into smaller, step-by-step reasoning sequences. In this way, instead of jumping to the user’s answer in a single step, with a larger risk of generating an incorrect answer, the model makes multiple smaller generation steps, each with higher confidence. Essentially, the user ‘guides’ the LLM to break the initial question into intermediate steps that it works through one after the other. For example, a very simple form of CoT prompting can be implemented by appending something like ‘let’s think step by step’ to the prompt, as in the sketch below.
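A minimal sketch of what this looks like in practice; `call_llm` is a hypothetical placeholder for whichever LLM API or client you actually use, and the question is made up:

```python
# A minimal sketch of Chain-of-Thought prompting. `call_llm` is a hypothetical
# stand-in for whatever LLM client you use (OpenAI, Anthropic, a local model, ...).
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API of choice here")

question = "A train leaves at 14:10 and arrives at 16:45. How long is the trip?"

# Plain prompt: the model answers in a single step.
direct_prompt = question

# CoT prompt: the same question, plus an instruction to reason step by step
# before committing to a final answer.
cot_prompt = (
    f"{question}\n"
    "Let's think step by step, and state the final answer on the last line."
)

# answer = call_llm(cot_prompt)  # uncomment once call_llm is wired to a real API
```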

Taking this concept a step further, instead of requiring the user to break down the initial question into smaller questions, models with ‘long thinking’ can perform this process by themselves. In particular, such reasoning models break down the user’s query into a sequence of smaller, step-by-step queries, resulting in better answers. CoT reasoning was one of the most significant recent advances in AI, allowing models to handle complex reasoning tasks effectively. OpenAI’s o1 model was the first major example to demonstrate its power.

Image by author

On my mind

Understanding the underlying principles that enable today’s AI models to work is essential in order to have realistic expectations of what they can and can’t do, and to make the best use of them. Neural networks and AI models inherently operate on inductive-style reasoning, even if they often sound as though they are performing deduction. Even techniques like Chain of Thought reasoning, while producing impressive results, still fundamentally operate on induction and can still produce information that sounds correct but, in reality, is not.


Loved this post? Let’s be friends! Join me on:

📰Substack 💌 Medium 💼LinkedIn Buy me a coffee!

