Personal finance app provider WalletHub is pushing back on Google and other AI companies that have been ingesting its content to train their large language models. According to WalletHub, these companies have been hoovering up its content and presenting slightly modified versions of it to users without attribution. In response, WalletHub recently took steps to protect 40,000 pages of its content from AI companies’ web crawlers.
“Google went from organizing the world’s information to stealing it, taking a sledgehammer to the implied contract the open web has relied on for so long,” said Odysseas Papadimitriou, WalletHub CEO. “The deal has always been that publishers give search engines access to their content in return for visibility within the search results. Now, Google and other AI-powered companies are taking the content and giving almost nothing back in return. A lot of AI content is just copyright infringement hiding behind the veil of technological progress.”
Google did not respond to a request for an interview or comment.
WalletHub’s move follows copyright infringement lawsuits that news publishers have brought against AI companies over the unlicensed use of their articles to train models.
Book authors have filed similar suits. In recent rulings from the Northern District of California — Bartz v. Anthropic and Kadrey v. Meta — courts found that training AI models on copyrighted material can qualify as fair use, a legal doctrine that permits certain limited uses of a work without the copyright holder’s permission.
“They rejected the idea that creators have a blanket right to license their work for AI training,” Ranjit Singh, program director at the Data & Society Research Institute, told American Banker. “What matters most, legally, is whether AI-generated outputs actually displace the original, if they substitute for or replicate the market value of that work.”
While the Northern California rulings were generally favorable to AI companies, the courts’ reasoning could provide an opening for firms like WalletHub, Singh said. “If they can show that Google’s AI-generated summaries are diverting traffic, replacing their original content in search, or reducing user engagement in concrete ways, they will be on stronger legal footing,” Singh said.
In the bigger picture, this issue affects every business that posts information on the web and wants to protect its intellectual property, or that wants people to visit its site. It also affects the companies that are pushing the use of generative AI on employees in a quest for efficiency and cost savings. If information creators manage to block generative AI models from sucking up their content, those models become less effective. They are, after all, pattern generators that spit out responses based on the content they consume. Less content means less useful answers.
Unlike the plaintiffs in those earlier lawsuits, WalletHub does not accuse Google of outright plagiarism.
“We’re not saying there’s this one passage or whole article they plagiarized or anything like that,” John Kiernan, managing editor of Miami-based WalletHub, told American Banker. “They’re essentially putting a handful of websites’ content in the blender and producing content in their own search results, and essentially pushing the publishers themselves aside. So they’re basically repackaging the content as their own.”
What WalletHub discovered
Two years ago, WalletHub’s search engine optimization analysts saw a shift in Google from a meritocracy — “if you put forth the best results, you’ll rank at the top of the search results,” Kiernan said — to an environment where the top search results are links to the biggest brands and to Reddit. For instance, a search for “credit cards” would yield results from Visa, Mastercard and Reddit. This was true even if WalletHub was explicitly included in the search query, he said. (Reddit does well in this environment because in February 2024, Google and Reddit struck a $60 million deal that allows the search giant to use posts from the online discussion site to train its artificial intelligence models.)
“Those types of things got our hairs up, and we’re trying to figure out, what can we do to really improve and get better?” Kiernan said. “We’ve been trying to do everything we can to make things better for the user, which Google says is supposed to be a North Star, and nothing really seems to move the needle. As we watched the progression with AI, it started seeming to us like they were trying to make the search results worse, just to boost their clicks and keep people in the ecosystem. And then it has progressed to being what people are calling the zero-click environment: you search for something, you don’t click anywhere else.”
WalletHub’s research has shown that Google’s AI Overviews, which summarize search results and sometimes provide links to source articles, have very low click-through rates, Kiernan said. People read the summary and move on without ever clicking on a cited article.
“They’re essentially not returning any value for the content that they’re hijacking,” he said.
Google’s “People also ask” sections are another issue for WalletHub. WalletHub tracks certain terms on an hourly and daily basis and watches the results that come up under this header. “People also ask” used to present paragraphs from articles with links to the underlying articles, Kiernan said. But recently WalletHub has seen instances in which, under “People also ask,” Google displays an AI Overviews-style paragraph with “very, very similar content” and no link to WalletHub, Kiernan said.
“It’s not going to be word for word, so that if you put it into Copyscape it’ll come up, but it’s the same ideas without a direct link back to the original source,” he said. (Copyscape is a plagiarism checker.)
The same problems that occur in Google searches come up with the web crawlers of large language model providers like Perplexity, OpenAI and Anthropic.
“A lot of those are even worse than Google,” he said. “They don’t have the scale. But some are alleged to be ignoring website directives not to crawl them, and the large language model providers are using undeclared agents to still do it.”
The challenges of hiding from AI
As the WalletHub team looked for ways to counteract this effect, it decided to take some of its content away from Google.
Kiernan declined to share specifics about exactly how WalletHub is blocking Google’s AI bots from crawling its content, but he did say the company is working with Cloudflare, a content delivery network provider that in September 2024 introduced an option to block AI crawlers. According to Cloudflare, more than a million of its customers have chosen this option.
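Kiernan did not detail WalletHub’s setup, but the standard, publicly documented way for a site to tell crawlers to stay away is a robots.txt file listing the AI companies’ declared user-agent tokens. The sketch below is illustrative only, not WalletHub’s configuration; it uses Python’s standard library to show how a compliant crawler should interpret such directives. The tokens shown, such as GPTBot, Google-Extended, ClaudeBot, PerplexityBot and CCBot, are ones the crawler operators publish.

from urllib.robotparser import RobotFileParser

# Illustrative robots.txt directives: ask the major AI crawlers not to fetch
# anything, while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved AI crawler should see that it is barred; a regular browser is not.
print(parser.can_fetch("GPTBot", "https://example.com/credit-cards"))       # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/credit-cards"))  # True

The catch, as Kiernan notes, is that robots.txt is purely advisory; crawlers that ignore it or use undeclared agents can only be stopped by network-level enforcement such as Cloudflare’s blocking option.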
The downside of blocking AI bots is that a company becomes less visible, and potential new customers or partners can’t find it.
“If you completely lock down content, then you’ve achieved your objective, but you’ve probably cut your nose off to spite your face,” John Thompson, an adjunct professor at the University of Michigan School of Information and the author of five books, told American Banker.
WalletHub “spent a long time holding onto the cliff,” Kiernan said. “We’ve relied on Google for a long time, and it’s taken a while to come to the realization that we need to figure out a way to engage with our customer base in other ways and figure out how to survive without Google. If we can get Google traffic back to where it was, great, but that’s unlikely to happen. This is kind of that other side of the double-edged sword.”
WalletHub’s leaders have tried many times to talk with people at Google about these issues. “I think they’re sick of us,” Kiernan said. “We have been trying to be helpful in our discovery process, saying, this looks like an issue you might want to fix, thinking that they want it to be the best result. And we basically got stonewalled.”
WalletHub recently surveyed Americans to learn what their misgivings about AI were. It found that 62% think it should be illegal for AI companies to use people’s work without compensating them, and nearly 90% think all AI content should have a prominent disclosure about the source of the content or that it’s AI-generated.
“People understand it’s a fancy tool they can play with right now, but they’re concerned that true expertise and transparency are going to get erased in the process,” Kiernan said.
Going forward, Thompson expects more companies to try to fight AI web crawlers.
“People are going to move towards walled gardens and control and as many limitations as they can,” he said.
As more information is blocked from generative AI models, “I think it’ll slow their momentum,” Thompson said. “Will it dull their effectiveness? Maybe a little.”
Meanwhile, companies can keep trying to defend their intellectual property. One antidote Thompson suggested is watermarking, or superimposing a logo, piece of text or signature on a piece of content.
“Watermarking your information really helps a lot, because those crawlers are so indiscriminate, they will hoover up watermarks and different things that you put in the text as well,” he said.
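Thompson did not spell out a specific scheme, and a visible logo does not translate directly to plain text, so the sketch below is only one hedged illustration of the idea: embed a publisher-specific marker in published text using zero-width characters, so that if the marker later surfaces in scraped or repackaged copies, the original source can be demonstrated. The function names and the “wallethub” identifier are hypothetical.

# One possible text-watermarking sketch, not Thompson's or WalletHub's method:
# hide a publisher id in zero-width characters that survive copy-paste and
# indiscriminate scraping, then look for it in any suspect text.
ZERO_WIDTH = {"0": "\u200b", "1": "\u200c"}   # zero-width space / non-joiner
REVERSE = {v: k for k, v in ZERO_WIDTH.items()}

def embed_marker(text: str, publisher_id: str) -> str:
    """Append an invisible bit-encoding of publisher_id to the text."""
    bits = "".join(f"{ord(c):08b}" for c in publisher_id)
    return text + "".join(ZERO_WIDTH[b] for b in bits)

def extract_marker(text: str) -> str:
    """Recover an embedded publisher id, if one survived copying or scraping."""
    bits = "".join(REVERSE[ch] for ch in text if ch in REVERSE)
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))

marked = embed_marker("The best balance-transfer credit cards charge no annual fee.", "wallethub")
print(extract_marker(marked))  # -> "wallethub" if the marker survived intact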