Ethereum co-founder Vitalik Buterin reports that the generative artificial intelligence (AI) model GPT-4 from OpenAI has passed the Turing test.
The Turing test is an elusive benchmark for gauging how human-like a conversational AI model is. It takes its name from renowned mathematician Alan Turing, who proposed it in 1950.
Turing believed that a machine capable of producing text that deceives people into thinking they are conversing with another human would demonstrate the capacity to think.
Now, nearly 75 years later, the person best known for creating the world’s second most popular cryptocurrency has read recent preprint research from the University of California, San Diego suggesting that a production model has at last passed the Turing test.
The University of California San Diego researchers recently released a preprint paper titled “People cannot distinguish GPT-4 from a human in a Turing test.” In a blind test, about 500 human subjects interacted with AI models and with other humans to see whether they could tell which was which.
According to the study, humans incorrectly judged GPT-4 to be a “human” 56% of the time. In other words, more often than not, the machine misled people into believing it was one of them.
Buterin argues that an AI system that can fool more than 50% of the humans who interact with it meets the bar of the Turing test.
“Ok, not quite, because humans get guessed as humans 66% of the time vs. 54% for bots, but a 12% difference is tiny; in any real-world setting, that basically counts as passing,” Buterin clarified.
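For reference, the gap Buterin describes is a straightforward subtraction of the two detection rates he cites. The short calculation below is purely illustrative and uses only the figures quoted above.

```python
# Detection rates cited by Buterin from the study.
humans_judged_human = 0.66  # how often real humans were identified as human
gpt4_judged_human = 0.54    # how often GPT-4 was identified as human

# The "tiny" gap he describes: 12 percentage points.
gap = humans_judged_human - gpt4_judged_human
print(f"Gap: {gap:.0%}")  # Gap: 12%
```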
Responding to criticism of his initial cast, he later added that the Turing test is “by far the single most famous socially accepted milestone for ‘AI is serious shit now’.” It is therefore worth reminding ourselves, he argued, that the milestone has now been reached.
The Turing Test and GPT-4’s Performance
The Turing Test is a deceptively simple way of determining whether a machine can exhibit human intelligence: if a machine can engage in a conversation with a human without being detected as a machine, it has demonstrated human intelligence.
The Turing Test was proposed in a 1950 paper by mathematician and computing pioneer Alan Turing and has since become a foundational benchmark in the theory and development of artificial intelligence.
Though it is not without its critics, the Turing Test is still used to gauge the progress of artificial intelligence projects. A revised version of the test has a human and a computer questioned and conversed with by several human judges. If, after five minutes of conversation, more than 30% of the judges believe the computer to be a human, the machine is deemed to have passed, as sketched in the example below.
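As a rough illustration of that criterion, the sketch below tallies judge verdicts against the 30% threshold. The function name and sample verdicts are hypothetical, not drawn from any actual evaluation.

```python
def passes_revised_turing_test(judge_verdicts, threshold=0.30):
    """Return True if more than `threshold` of judges mistook the machine for a human.

    judge_verdicts: one boolean per judge; True means that, after the
    five-minute conversation, the judge believed the computer was a human.
    """
    if not judge_verdicts:
        return False
    fooled_share = sum(judge_verdicts) / len(judge_verdicts)
    return fooled_share > threshold


# Hypothetical run: 4 of 10 judges (40%) were fooled, so the machine passes.
verdicts = [True, True, True, True, False, False, False, False, False, False]
print(passes_revised_turing_test(verdicts))  # True
```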
Google Duplex showcased its ability to perform tasks over the phone in 2018. In demonstrations, Duplex called a restaurant to make a reservation and booked a hair salon appointment without the humans on the other end of the line realizing they were speaking with a machine. Critics counter that such interactions do not amount to the real Turing Test and that no computer had yet managed to pass it.
When GPT-4 was released in March 2023, its developer, OpenAI, put it through a series of tests of skills such as reading comprehension, math, and coding, and GPT-4 delivered the desired results on many of them.
GPT-4 achieved surprising success on the bar exam used across many US states, finishing in the top 10% of test takers. It also performed well on exams designed for US high school students, on tests measuring the clinical skills of US doctors, and on a graduate school admissions exam.