Fighting against water: AI is going to get the answer one way or another

When I ask GPT for a summary of an article, it gets Heisman’d by TechCrunch and immediately finds the summary from two other sources. This issue is about a lot more than chatbots. Duplicate content “provenance” is the key. Who broke the story? Who owns it? If it’s a press release, publishers are toast. If it’s original reporting and we block the bots… we’re invisible. If others paraphrase (the Huffington Post model)… and don’t block… the syndicators drink the SEO milkshake. Is that the AI’s fault? Or are publishers just squeezing the audience engagement balloon?

I have a custom GPT that summarizes text into a single paragraph, which I use as a first draft of my newsletter. I can paste text in or give it a URL. In this case, The Verge blocks GPT… so it found two other ways to get my summary. I don’t trust that summary, so I am going to copy and paste The Verge article into GPT. All the corporate policies in the world are not going to keep one goober (me) from messing up their effort by pasting in articles by hand. Sorry, folks.
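
For anyone wondering what "blocks GPT" can look like in practice, here is a minimal sketch. It assumes the block is done with a robots.txt rule, which is one common approach; I have not confirmed that any site named here does it this way. The robots.txt text, the example.com URL, and the can_fetch helper are all made up for illustration. GPTBot is the crawler token OpenAI publishes, but which token a given tool actually sends is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example.
# It is NOT copied from any real publisher.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-article"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if can_fetch(SAMPLE_ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

Note that robots.txt is only a request to well-behaved crawlers. It does nothing about a second site that republishes the story without a block, or about a person pasting the text in by hand.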

Google allows GPT to crawl it… duh. But it still shows the difference in behavior between GPT hitting a dead end and GPT being able to read a page. If it can read a site, it says ‘browsing’ in the status bar.

In a classic twist, Archive.ph hosts copies of paywalled articles… and is completely invisible to GPT.

Plot twist! Who’s dissing who here?

“Wall Street Journal owner News Corp has a content-licensing partnership with OpenAI.” You’d think GPT could read the WSJ? Nope. And the article wasn’t even paywalled.