A new study from Columbia Journalism Review showed that AI search engines and chatbots, such as OpenAI's ChatGPT Search, Perplexity, DeepSeek Search, Microsoft Copilot, Grok and Google's Gemini, are wrong far too often.
This does seem to be exactly the problem. It is solvable, but I haven't seen any system that does it. They should be able to calculate a confidence value based on the number of corroborating sources, the quality ranking of those sources, and how much interpolation of data is being done versus straightforward regurgitation of facts.
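A minimal sketch of what that scoring might look like. The weights, the 0-to-1 quality ranking, and the interpolation ratio are all illustrative assumptions, not a known method from any of these products:

```python
from dataclasses import dataclass

@dataclass
class Source:
    quality: float        # assumed 0.0-1.0 quality ranking (hypothetical)
    corroborates: bool    # whether this source supports the claim

def confidence(sources: list[Source], interpolation_ratio: float) -> float:
    """Toy confidence score: more high-quality corroborating sources and
    less interpolation (vs. direct regurgitation) -> higher confidence.
    The 0.4/0.3/0.3 weights are arbitrary illustrative choices."""
    if not sources:
        return 0.0
    supporting = [s for s in sources if s.corroborates]
    agreement = len(supporting) / len(sources)
    avg_quality = sum(s.quality for s in supporting) / max(len(supporting), 1)
    directness = 1.0 - interpolation_ratio  # 1.0 = pure quotation of a fact
    return 0.4 * agreement + 0.3 * avg_quality + 0.3 * directness

# Example: three sources, two corroborating, answer mostly quoted verbatim
srcs = [Source(0.9, True), Source(0.7, True), Source(0.4, False)]
print(f"{confidence(srcs, interpolation_ratio=0.2):.2f}")  # ~0.75
```

A real system would then surface the score alongside the answer, or suppress answers below some threshold.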
I haven’t seen any evidence that this is solvable. You can feed in more training data, but that doesn’t mean generative AI technology is capable of using that in the way you describe.
I've been saying this for a while. They need to train it to be able to say "I don't know". They need to add questions to the dataset that don't contain enough information to answer, so the model can learn the difference between stating facts and hallucinating.
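A minimal sketch of what that data augmentation might look like, assuming a simple prompt/completion instruction-tuning format. The field names, file name, and refusal text are illustrative, not from any real training pipeline:

```python
import json
import random

# Hypothetical fine-tuning examples: answerable questions keep their answers,
# while deliberately unanswerable ones are mapped to an explicit refusal,
# so the model sees "I don't know" as a valid completion.
answerable = [
    {"question": "What year did Apollo 11 land on the Moon?", "answer": "1969"},
]
unanswerable = [
    {"question": "What did the captain eat on the day the ship sank?"},
    {"question": "How many coins were in the jar?"},  # no context provided
]

REFUSAL = "I don't know. The question doesn't give me enough information to answer."

dataset = [
    {"prompt": ex["question"], "completion": ex["answer"]} for ex in answerable
] + [
    {"prompt": ex["question"], "completion": REFUSAL} for ex in unanswerable
]

random.shuffle(dataset)  # mix refusals in with the normal examples
with open("sft_with_refusals.jsonl", "w") as f:
    for ex in dataset:
        f.write(json.dumps(ex) + "\n")
```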