AI Tools

Google AI Overviews and ChatGPT can be wrong (or very wrong) about your product

[If you’re here for the worst AI fail, I’ll spoil it: ChatGPT hallucinated my app’s purpose and basically described it as some sort of keylogger, which it absolutely isn’t 🤷‍♂️. Google AI Overviews also failed both of my tests. Now, the article:]

Google search isn’t what it used to be.

Google now displays AI Overviews, short AI-generated snippets that attempt to satisfy your query, above its search results for over 20% of informational queries in over 100 countries, reaching 1B+ users.

The feature quietly delivers subtle inaccuracies to Google users every day, undermining Google's own search results. Here are a few examples others have noted, including one study from October claiming that AI Overviews provided misleading or inaccurate responses in 43% of finance-related searches. Here's a distressed business owner complaining about Google AI Overviews misrepresenting their product. The feature had a rocky debut last year, when several blatant inaccuracies led Google to pull it down temporarily.

If you are a creator building something new, one thing has been essential for ~20 years: users need to be able to find your thing via Google. If AI Overviews feed users misinformation that prevents that, that's bad. Similarly, when potential customers ask AI assistants like OpenAI's ChatGPT about your product, you don't want them to be misinformed. Ideally, users could even discover your product through these tools.

I wanted to see how potential users might discover my app these days, so I ran these tests. I found that Google and OpenAI confidently pass along total inaccuracies to potential customers. As you'll see below, the AI search tools failed 5 out of 6 times, and failed egregiously once (via ChatGPT's default prompt box).

Examples #1 and #2, AI Overviews:

A few months ago, my app gained a simple new feature: being able to display a custom image in your Mac’s menu bar. As far as I can tell, it’s the first app that lets you do that.

Here it is showing up in Google search results, as one might expect 👍:

On pages 2 and 4 of the results. That's great for a new feature!
Ok, so what’s the problem?

This is the AI Overview sitting above all the search results for the same query:

So if I’m a person Googling for a product that does X, and Google says matter-of-factly “that doesn’t exist” (even when it does): What are the odds I’ll push past that misinformation and see that it’s actually present in the search results? Pretty low, I’d imagine.

(The second part of the AI Overview is true, but that doesn't make up for the fact that the highlighted first part is false.)

As far as I can tell, Google’s AI Overviews feature is undermining search results and dramatically hurting the discovery of long tail information.

(BTW, here’s the feature in question)

When asked to generally describe my app, Google AI Overviews provided an accurate but incomplete description — one that missed the app’s primary function that it’s known for (the ability to jump between specific Spaces on a Mac and assign names/icons to them in the menu bar). 🤷‍♂️ Not great.

Examples #3 and #4, ChatGPT free tier (very wrong):

Meanwhile, the highest-use free tier of ChatGPT fails the same test and spectacularly fails a separate one:

First, when asked the same question from above ("how can I put a custom picture in the menu bar of my mac"), it suggests some apps that do other things but not "put a custom image in the menu bar." They're cool apps, but the exercise is largely a waste of time given the query.

The much bigger fail is that, when asked more generally about my app, it gives a flat-out wrong, hallucinated description of what my app is and does, one that paints it in a negative light.

So not only does ChatGPT fail to describe my app, it says the app's purpose is to keep stats on your keystrokes(?!). That's libel as far as I'm concerned. It seems to have hallucinated that description because my app's name is CurrentKey Stats, and it made an incorrect guess based on the name alone. Like all of these AI search tools, its output reads as totally confident.

There is fine print along the bottom of the page that says "ChatGPT can make mistakes. Check important info." Right … and that raises some obvious questions about its value as a search tool.

[Some background: my app isn't new, and the chatbots should know about it (and some do). It has been around for 6 years, has 120+ ratings in the Mac App Store globally with a 4.5 average, and has been covered by bloggers and in popular Reddit posts (it's definitely in ChatGPT's training data). One would hope an AI assistant like ChatGPT could get a basic description of a years-old app correct.]

Examples #5 and #6, ChatGPT "Search":

If you log into ChatGPT, specifically select "Search", and ask "how can I add a custom image to my menu bar mac", it performs the same as the free tier and suggests some apps that don't accomplish the task. That's a fail, but whatever.

However, with the query "i own a mac, would currentkey stats be good for me?" (the same one used in example #4), it actually delivers a useful, accurate, and adequately complete description of the app, unlike the basic free tier and unlike AI Overviews. It pulls from three authoritative sources and provides links; here it is:

So how many people click “Search” in the logged in experience vs. simply using the basic ChatGPT prompt as a search engine? Only OpenAI employees know, but I’d guess far more people use the basic ChatGPT prompt.

Conclusions

How has all of this impacted the business side of my app? It's impossible to say, because you can't easily prove a negative. There has definitely been some impact, though: AI tools are taking huge bites out of the search market. It's easy to forget how popular ChatGPT has remained (even after its record-breaking launch that got a bunch of press). The app has consistently been among the top 5 most-downloaded apps in the US for about three years.

Of course, search engines have always had an incomplete picture of the world's information and are prone to missing things. But whereas search engines used to simply omit info that had yet to be indexed, they now very clearly and confidently offer wrong information much of the time. The latter is far worse, especially for discovering new things. Over time, one has to wonder whether the average person will lose the skill of finding non-obvious information. Given how widespread Google AI Overviews are, it's surprising the topic doesn't appear to have been covered extensively in academia (at least based on a few arxiv.org and scholar.google.com searches).

So what can be done if you find AI search tools passing along wrong info about your creation? Maybe contact Google Support. But I think the best thing you can do is publish more correct info to the web [say, in a blog post ;)] and hope the models train on it correctly in their next pass.