
In-depth thinking: Are current AI models really performing reasoning?


Current models can break a complex problem into smaller sub-problems, solve them step by step, and then produce a response. This process is known in the industry as "chain-of-thought reasoning." These models sometimes perform astonishingly well, solving logic puzzles and math problems and quickly writing correct code, yet they sometimes fumble extremely simple problems.
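
To make the mechanism concrete, here is a minimal sketch of what chain-of-thought prompting can look like in practice. The prompt wording and function names below are illustrative assumptions, not any particular vendor's API; the technique amounts to little more than how the question is phrased.

# A minimal sketch of chain-of-thought prompting (illustrative only; the
# model call itself is left out, since the technique lives in the prompt).

def build_prompts(question: str) -> tuple[str, str]:
    # Direct prompting: ask for the answer alone.
    direct = f"{question}\nAnswer with only the final result."
    # Chain-of-thought prompting: ask the model to decompose the problem
    # and work through it step by step before answering.
    chain_of_thought = (
        f"{question}\n"
        "Let's think step by step: break the problem into smaller parts, "
        "solve each part in order, then state the final answer."
    )
    return direct, chain_of_thought

direct_prompt, cot_prompt = build_prompts(
    "A train travels 60 km in 40 minutes. What is its speed in km/h?"
)
print(cot_prompt)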

This is why opponents argue that so-called "chain-of-thought reasoning" is not reasoning at all, while supporters maintain that it is reasoning: not as flexible as human reasoning, but moving toward the real thing.

Opponents: It's just guessing the answer using heuristics.

Scientists have long studied how the human brain works, and there is broad consensus that it employs multiple distinct types of reasoning.

Reasoning includes deductive reasoning, which starts from a general statement and arrives at a specific conclusion; inductive reasoning, which uses specific observations to make broader generalizations; as well as analogical, causal, and common-sense reasoning.

Compared to human reasoning, current AI reasoning is still very narrow.

Melanie Mitchell, a professor at the Santa Fe Institute (SFI), wrote in a paper: "Finding a rule or a pattern from limited data and experience and applying it to new and unseen situations is a kind of reasoning that we value highly in the real world. Even very young children can learn and discover abstract rules from just a few examples."

Children can do it, but can current AI do it? Many people are skeptical.

Shannon Vallor, a philosopher of technology at the University of Edinburgh, said of OpenAI o1: "What AI does is just a kind of meta-mimicry."

What does this mean? The older ChatGPT models mimicked the statements humans had written in their training data, while the new o1 mimics the process by which humans arrive at those statements. Its output may give the impression of reasoning, but it is not reasoning in the true sense.

For example, here's a question for ChatGPT: "A person is carrying a wolf, a sheep, and a bundle of hay across a river. If the person is present, the wolf won't eat the sheep, and the sheep won't eat the hay. However, the boat on the river can only carry one item at a time. How can you get everything to the other side of the river in the fewest number of crossings?"
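
For reference, this puzzle has an exact shortest solution that a small mechanical search finds without any "understanding" at all: track which bank each item is on, and never leave the wolf alone with the sheep or the sheep alone with the hay. The following plain-Python sketch (the state encoding and names are my own) finds the minimal crossing sequence by breadth-first search:

from collections import deque

ITEMS = ("wolf", "sheep", "hay")

def bank_is_safe(bank):
    # A bank with no person on it is unsafe if the wolf is alone with
    # the sheep, or the sheep is alone with the hay.
    return not ({"wolf", "sheep"} <= bank or {"sheep", "hay"} <= bank)

def solve():
    # State: (items on the left bank, which side the person is on).
    start = (frozenset(ITEMS), "L")
    goal = (frozenset(), "R")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, side), path = queue.popleft()
        if (left, side) == goal:
            return path
        here = left if side == "L" else frozenset(ITEMS) - left
        # The person crosses alone (None) or with one item from their bank.
        for cargo in (None,) + tuple(here):
            new_left = set(left)
            if cargo is not None:
                if side == "L":
                    new_left.discard(cargo)
                else:
                    new_left.add(cargo)
            new_left = frozenset(new_left)
            new_side = "R" if side == "L" else "L"
            # The bank the person just left must remain safe.
            behind = new_left if new_side == "R" else frozenset(ITEMS) - new_left
            state = (new_left, new_side)
            if bank_is_safe(behind) and state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo or "nothing"]))

print(solve())
# Seven crossings, e.g.: sheep over, return empty, wolf over,
# sheep back, hay over, return empty, sheep over.

A solver like this is doing exhaustive search, not reasoning, yet it never gets the puzzle wrong; the debate below is about what it means when a language model, which cannot search this way, sometimes does.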

While o1 is an improvement over the previous version, its architecture hasn't changed much, and it still makes mistakes on questions like this. Vallor believes, "When it fails to answer questions, we see that the model isn't actually doing any reasoning."

The subsequent release of o3 surprised Mitchell, but what surprised her even more was the sheer amount of computing power o3 consumed in solving problems. Because OpenAI's internal workings are not transparent, it is impossible to know what those large models did with all that compute. Without such transparency, no one can be certain that the models truly break a large problem into steps and then arrive at a better overall answer.

Last year, researchers at New York University questioned AI reasoning in a paper titled "Let's Think Dot by Dot." They found that replacing the specific intermediate steps of chain-of-thought (CoT) reasoning with meaningless "..." filler tokens produced largely the same results on the tasks they studied.
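
The paper's actual experiments involve models specifically trained to make use of such filler tokens; purely as a toy illustration of the substitution itself, assuming nothing about any real model, the manipulation looks like this:

import re

def replace_cot_with_filler(transcript: str) -> str:
    # Replace every token of an intermediate reasoning trace with a
    # meaningless "." filler token, preserving only the token count:
    # a simplified version of the substitution studied in the paper.
    return re.sub(r"\S+", ".", transcript)

cot_trace = "First take the sheep across. Return alone. Then take the wolf."
print(replace_cot_with_filler(cot_trace))
# Prints a string of "." filler tokens, one per original token.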

Mitchell argues that "AI is more of a collection of heuristics than a reasoning model." Heuristics can help you guess the correct answer to a question, but they don't actually arrive at the answer through thinking.

For example, researchers once built a computer-vision model to analyze skin cancer. At first glance, the model seemed able to determine whether spots on the skin were malignant lesions. It turned out, however, that in the training data photos of malignant spots often had a ruler next to them; the model was flagging spots as malignant simply because a ruler was present. That is heuristic shortcut-taking, not reasoning.

This raises the suspicion that what looks like AI solving problems through "reasoning" is really just heuristic pattern-matching over memorized information.

Supporters: It's not pure reasoning, but it's not memorization either.

Redwood Research, an organization focused on mitigating the risks of artificial intelligence, takes the opposite view. Its chief scientist, Ryan Greenblatt, argues that current AI is clearly performing some form of reasoning.

Greenblatt said, "Machines don't process things the way humans do. They rely more on memory and knowledge than on the reasoning and judgment humans use, but they are still processing things."

Since AI models can solve difficult problems that do not appear in their training data, and perform well on them, it is fair to say they are performing some kind of reasoning.

The "river crossing problem" is a classic problem, and the AI ​​should have learned it many times in the training data. However, when the user asks the question, the AI ​​does not give the correct answer. It may know the answer, but it has engaged in complex and unnecessary "thinking," a mistake that humans sometimes make.

Greenblatt gives an example: suppose you spend a month studying color theory, from complementary colors to the psychological effects of different hues to the historical significance of certain Renaissance pigments, and are then asked on a test, "Why is the sky painted blue in this landscape?" You might be misled into writing an elaborate but unnecessary answer: the blue represents the sacred heavens, or the scene was painted in the early morning to symbolize new life... when in reality the answer is simple: the sky is blue.

Ajeya Cotra, an analyst at Open Philanthropy, believes AI will perform better and better in the areas humans call reasoning. Critics say AI is "merely" engaging in meta-mimicry; for her, the questionable word is not "meta-mimicry" but "merely." It implies that the technology won't have a significant impact on the world and that superintelligence is still far away, an assertion she considers doubtful.

For example, in a university physics class, students handle the same problem very differently. Some cheat and copy the answer outright. A few are geniuses who answer from deep understanding and intuition without even reaching for the formulas. Most memorize the formulas and struggle to figure out which one the problem calls for.

Cotra believes that current AI, like most students, combines memorized information with some reasoning. AI may not be very intelligent, but it is diligent and can memorize countless equations. It combines powerful memory with a small amount of understanding to find the right combination of equations for a given problem and then provide the answer.

At first glance, AI appears to be as intelligent as a gifted student, but closer analysis reveals flaws in its answers. Nevertheless, this does not mean that AI lacks reasoning ability.

In other words, these models are neither purely based on reasoning nor purely based on memorization.

Cotra said, "I think it's somewhere in between. People get confused because they want to put it in one bucket or the other, either simple memorization or deep reasoning. In reality, there's a limit to the depth of the reasoning."

Conclusion: Jagged Intelligence

Researchers use the term "jagged intelligence" to describe today's AI: it can solve some mathematical problems brilliantly, yet stumbles over simple ones.

Humans love to measure AI intelligence against human intelligence. Perhaps we should look at it from a different angle and treat artificial intelligence as a "different" kind of intelligence, instead of getting hung up on whether it is "smarter than humans" or "dumber than humans."

Artificial intelligence keeps evolving and may one day become powerful enough to encompass, and perhaps even surpass, human intelligence. That transformation is worth looking forward to.
