Are LLMs Like ChatGPT Passing The Turing Test?

In November 2022, OpenAI released ChatGPT, and its impact is still reorganizing our economy and society. Large Language Models (LLMs) are now competing with highly skilled workers; they have been used to draft lawsuits, diagnose medical issues, and answer search queries better than Google. Google reportedly issued a ‘Code Red’, acknowledging that ChatGPT poses an existential threat. Students began using ChatGPT to write essays for school, and teachers feared they would have no way to differentiate between human-generated and AI-generated text. In January 2023, OpenAI tried to quell these concerns by releasing a tool to classify human-generated vs AI-generated text. Problem solved, right? As of July 2023, that tool is no longer available due to low accuracy. It turns out there may not be a reliable way to distinguish between AI-generated and human-generated text. And if that’s the case, a logical conclusion is that LLMs are passing at least one form of the Turing Test, making them ‘intelligent’.

How do you define intelligence? Is there a clear way to distinguish between a human and a parrot that is just repeating phrases it has heard? When, if ever, does a set of subroutines become sentient? It is surprisingly difficult to come up with a formal definition. In 1950, computer science visionary Alan Turing formulated what is now known as the Turing Test as one way to answer this question. In the commonly described version, a human in a closed room has conversations with different entities without seeing them, round after round. Sometimes the human is talking to another human, and sometimes to an AI chatbot. After each round of conversation, the human submits a guess as to whether they were talking to a human or a chatbot. When the human can no longer reliably say which one they are talking to, the chatbot is indistinguishable from a human in terms of conversational ability. And if we cannot tell the difference between a chatbot and an intelligent human, the argument goes, the chatbot must also be intelligent. Once a chatbot is intelligent, is it still moral to unplug its computer at the end of the session?
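As a rough illustration of the pass criterion described above, the sketch below scores a judge's guesses across rounds and asks whether the judge is doing better than a coin flip. The function names, the 5% threshold, and the sample data are all assumptions made for illustration; none of them come from Turing's proposal, and a real evaluation would need many more rounds.

```python
from math import comb

def judge_accuracy(truth, guesses):
    """Fraction of rounds in which the judge's guess matched the true label."""
    if len(truth) != len(guesses) or not truth:
        raise ValueError("truth and guesses must be non-empty and of equal length")
    correct = sum(t == g for t, g in zip(truth, guesses))
    return correct / len(truth)

def p_value_vs_chance(correct, rounds):
    """One-sided binomial tail: the probability of getting at least
    `correct` rounds right out of `rounds` by flipping a fair coin."""
    return sum(comb(rounds, k) for k in range(correct, rounds + 1)) / 2 ** rounds

def judge_can_tell(truth, guesses, alpha=0.05):
    """True if the judge beats random guessing at the (assumed) alpha level."""
    correct = sum(t == g for t, g in zip(truth, guesses))
    return p_value_vs_chance(correct, len(truth)) < alpha

# Hypothetical results from ten rounds (made-up data, not a real experiment).
truth   = ["human", "ai", "ai", "human", "ai", "human", "human", "ai", "ai", "human"]
guesses = ["human", "human", "ai", "ai", "ai", "human", "ai", "ai", "human", "human"]

if __name__ == "__main__":
    print("accuracy:", judge_accuracy(truth, guesses))
    print("judge can tell human from AI:", judge_can_tell(truth, guesses))
```

With this made-up data the judge is right in 6 of 10 rounds, which is not enough to rule out guessing, so under these assumptions the chatbot would count as indistinguishable.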

Even cutting-edge chatbots still hallucinate, meaning they respond with inaccurate or fabricated information, and they often repeat content, generating the same sentence multiple times. So there is still a ways to go before chatbots are truly indistinguishable from humans. At the same time, there is an entire industry of AI content providers that claim they can boost the SEO of your site without manual intervention or vetting of content.

If it IS possible to create intelligent text at the click of a button, it changes the types of activities that are worthwhile for us humans to do manually. Consider running a website that has thousands of pages of SEO content to pull in visitors from Google. Previously you would hire a team of content writers to handle this. However, it doesn’t make sense to spend the time or money on writers when a computer program can do the same work for much less. Surely Google is penalizing AI content in search results, right? Well, no. Google has stated that it rewards high-quality content however it is produced. This is an acknowledgment of two things:

  • It may not be possible to tell the difference between human-generated and AI-generated content
  • AI-generated content is, in some cases, valuable, interesting, and non-trivial

In the September 2023 Google SEO Office Hours, John Mueller gave some specific guidance on this:

Is content like this missing from the web and your site, and your site could add significant value for users overall, or is it just rehashed content that already exists on the rest of the web?

This month Google published a challenge on Kaggle, its platform for crowdsourcing the building of AI models, to find a way to detect whether text is AI-generated. This suggests that Google is struggling to find a solution to this problem. Amazon’s Kindle Direct Publishing is asking authors to disclose whether content was AI-generated as a quality control measure, which indicates that Amazon does not have a way to detect this reliably on its own.

There is an old saying, ‘There is nothing new under the sun’. How many of the articles humans write are also rehashed knowledge that was read and reorganized from other sources? Such content may pass a word-for-word plagiarism detector, but it is still just a summary of existing information from a set of sources. Is it possible that there is no way to distinguish between human and AI authors because LLMs are doing the exact same thing human brains are? That would mean they are passing the Turing Test and have reached human-level intelligence in terms of writing ability.

One must wonder what to do about all this. When Deep Blue beat Garry Kasparov at chess, did it make playing chess a futile effort, like doing math with Roman numerals? Does the existence of cars make running no longer worth doing? While chess and running are not meaningless pursuits, with the advancement of technology their niche is now more ‘for the art’ than for productivity. The advancement of LLMs is just another example of a machine being able to do what used to require manual human effort. Before the advent of the tractor, a single farmer could tend just a few acres of land. With a tractor, a single farmer can grow crops on thousands of acres. And suddenly, farmers not using a tractor are scrambling to either upgrade their technology or find a new line of work. This is no different. Amazon now limits the number of books an author can self-publish to three per day, which may limit the amount of AI content, but still allows a pace far beyond what any human author could sustain.

As always with technological advancements, the work doesn’t end, but it does move to a higher level of abstraction. There are and will continue to be plenty of higher-level jobs creating, training, and learning to better utilize AI models. For overall human productivity, this should be a net positive. But in the short term, some of us may need to adapt. Some teachers have responded to the fact that bot detection does not work reliably by using like against like: they are using their own LLMs to read students’ essays and generate questions about the content that the student must answer on the spot. This reveals whether the student is really an expert on 16th century philosophy, or whether they submitted the work of a chatbot without reviewing it. With this approach, instead of banning the use of new technologies, we are incorporating them and training students to use them more effectively.





