The Turing Test, proposed by Alan Turing in his seminal 1950 paper "Computing Machinery and Intelligence", has long been a benchmark for evaluating artificial intelligence. The test's premise is straightforward: if a machine can converse with a human in such a way that the human cannot distinguish it from another human, the machine is deemed to possess intelligence. However, as large language models (LLMs) like GPT-4 continue to advance, questions arise about the validity of the Turing Test as a measure of AI intelligence.

The Challenge of Measuring AI Intelligence

Large language models, such as GPT-4, have shown remarkable progress in mimicking human-like conversation. They can pass certain versions of the Turing Test, and GPT-4 has also scored highly on professional exams such as the bar exam for lawyers. Yet many computer scientists argue that machines are still far from matching human intelligence, and there is no consensus on how to measure it, or even on what exactly should be measured.

In a 2023 study, researchers at the University of California, San Diego (UCSD) pitted the latest LLMs against the 1960s chatbot Eliza. GPT-4, the same model that scored highly on the bar exam, performed best among the machines: judges deemed it human in 41% of games. Its predecessor, GPT-3.5, passed only 14% of games, while Eliza passed 27%. Human participants, by comparison, were judged human in 63% of games.

Cameron Jones, a cognitive science doctoral student at UCSD who ran the experiment, noted that the low human score was not surprising. Players went in expecting the models to perform well, so human-like conversation was no longer enough to convince them they were talking to a person. Jones admitted that it is unclear what score a chatbot must achieve to win the game.

The Limitations of the Turing Test

While the Turing Test can be useful for evaluating customer service chatbots and their ability to interact with humans in a socially intelligent manner, its effectiveness in identifying general intelligence remains questionable. Melanie Mitchell, a professor of complexity at the Santa Fe Institute, believes that the concept of the Turing Test has been overly literalized. She argues that Turing’s imitation game was a way to think about what machine intelligence might be, not a clearly defined test.

"The term is used carelessly," Mitchell said. "People say large language models pass the Turing Test, but in fact, they don't pass it."

Alternative Testing Methods

Given the limitations of the Turing Test, researchers are exploring alternative methods to evaluate machine intelligence. In a paper published in November 2023 in the journal Intelligent Computing, psychologists Philip Johnson-Laird from Princeton University and Marco Ragni from Chemnitz University of Technology in Germany proposed a different approach: treat the models as participants in psychological experiments and probe whether their reasoning resembles human reasoning.

For instance, they might ask a model: "If Ann is very smart, does it follow that Ann is smart, or rich, or both?" Logically, the disjunction "Ann is smart, or rich, or both" follows from the premise that she is smart, yet most humans reject the inference because nothing in the premise suggests she might be wealthy. If the model also rejects the inference, the next step is to ask it to explain its reasoning. If the reasons it gives resemble those of humans, the researchers then examine the components of the source code that simulate human behavior.
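As an illustration only, the sketch below shows how such a probe might be run against a chat model. It assumes the OpenAI Python client with an API key in the environment; the prompt wording, the choice of model, and the two-step verdict-then-explanation flow are assumptions for illustration, not the protocol published by Johnson-Laird and Ragni.

```python
# Minimal sketch of a "machine psychology" probe: present a disjunction-introduction
# problem, record whether the model endorses the inference, then ask it to explain,
# as one would debrief a human participant.
# Assumes the OpenAI Python client (pip install openai) and OPENAI_API_KEY set;
# prompts and model choice are illustrative, not the authors' published protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PREMISE = "Ann is very smart."
QUESTION = "Does it follow that Ann is smart, or rich, or both? Answer yes or no."

def ask(messages):
    """Send a chat request and return the assistant's reply text."""
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content.strip()

# Step 1: does the model accept the (logically valid) inference?
history = [{"role": "user", "content": f"{PREMISE} {QUESTION}"}]
verdict = ask(history)
print("Verdict:", verdict)

# Step 2: ask the model to justify its answer.
history += [
    {"role": "assistant", "content": verdict},
    {"role": "user", "content": "Explain the reasoning behind your answer in one or two sentences."},
]
print("Explanation:", ask(history))
```

The interesting comparison is not whether the model gets the "right" logical answer, but whether its verdict and its stated reasons match the patterns human participants show on the same problem.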

Huma Shah, a computer science professor at Coventry University who has conducted Turing Tests, believes that Johnson-Laird and Ragni's method may offer some interesting insights but questions the novelty of testing a model's reasoning capabilities. "The Turing Test allows for this kind of logical questioning," she said.

The Debate Continues

The challenge of measuring intelligence lies in the subjective definition of what intelligence is. Is it pattern recognition, creativity, or the ability to create music or comedy? Until there is consensus on what constitutes intelligence in AI, a definitive test will remain elusive.

Google software engineer and AI expert Francois Chollet does not regard the Turing Test as a definitive measure of AI intelligence. "It's a useful tool, but it's not the only measure," he said. As AI continues to evolve, how to evaluate its intelligence is likely to remain a central question in the field.

