Ask your favorite AI chatbot, "How many ‘R’s are in the word strawberry?"
Most will respond with, "The word 'strawberry' contains two ‘R’s." Obviously, that's wrong; the correct answer is three.
This is the difference between knowing and understanding.
AI models don't read words letter by letter; they tokenize text. Tokenization breaks a stream of text, such as a sentence, into chunks called tokens (often whole words or pieces of words), and each token is mapped to a numeric ID and then to an embedding, a vector of values across many dimensions. Because the model only ever sees those token IDs, it has no direct way to introspect which letters make up a word. A model could be built to work on individual characters instead, but the juice is not worth the squeeze: character-level input makes every sequence much longer, driving up memory and compute requirements.
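To make this concrete, here is a minimal, purely illustrative sketch in Python. The vocabulary and token IDs are made up for the example (real models learn subword vocabularies with tens of thousands of entries), but the point carries over: the model receives numbers, not letters.

```python
# Purely illustrative: a toy "tokenizer" with a made-up vocabulary.
# Real models use learned subword vocabularies, but the idea is the same:
# the model receives numeric IDs, not the letters inside each token.

toy_vocab = {"How": 101, "many": 102, "Rs": 103, "in": 104, "strawberry": 105, "?": 106}

def tokenize(text: str) -> list[int]:
    """Map each whitespace-separated chunk to its (made-up) token ID."""
    return [toy_vocab[piece] for piece in text.split()]

prompt = "How many Rs in strawberry ?"
print(tokenize(prompt))  # [101, 102, 103, 104, 105, 106]

# The model sees only those IDs; the spelling of "strawberry" is invisible
# at this level, which is why counting letters is hard for it.
# Ordinary code, by contrast, can inspect the characters directly:
print("strawberry".count("r"))  # 3
```

Notice that plain code gets the count right trivially, because it operates on characters; the language model never sees them.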
In the world of AI, this kind of confident-sounding but incorrect guess is called a hallucination.