> 1. One is building the index, which is a lot harder without a google offering its own API to boot. If other tech companies really wanted to break this monopoly, why can't they just do it?
FTA:
> Context matters: Google built its index by crawling the open web before robots.txt was a widespread norm, often over publishers’ objections. Today, publishers “consent” to Google’s crawling because the alternative - being invisible on a platform with 90% market share - is economically unacceptable. Google now enforces ToS and robots.txt against others from a position of monopoly power it accumulated without those constraints. The rules Google enforces today are not the rules it played by when building its dominance.
robots.txt was being enforced in court before google even existed, let alone before google got so huge:
> The robots.txt played a role in the 1999 legal case of eBay v. Bidder's Edge,[12] where eBay attempted to block a bot that did not comply with robots.txt, and in May 2000 a court ordered the company operating the bot to stop crawling eBay's servers using any automatic means, by legal injunction on the basis of trespassing.[13][14][12] Bidder's Edge appealed the ruling, but agreed in March 2001 to drop the appeal, pay an undisclosed amount to eBay, and stop accessing eBay's auction information.[15][16]
Not only was eBay v. Bidder's Edge technically after Google existed, not before. More critically, the slippery-slope interpretation of California trespass to chattels law that the District Court relied on was considered and rejected by the California Supreme Court in Intel v. Hamidi (2003), and similar logic applied to other states' trespass to chattels laws has been rejected by other courts since. eBay v. Bidder's Edge was an early aberration in the application of the law, not something that established or reflected a lasting norm.
The point is, robots.txt was definitely a thing that people expected to be respected before and during google's early existence. This Kagi claim seems to be at least partially false:
> Google built its index by crawling the open web before robots.txt was a widespread norm, often over publishers’ objections.
Perhaps it wasn't a widespread norm though. But I don't really see why that matters as much. Is the issue that sites with robots.txt today only allow Googlebot and not other search engines? Or is Google somehow benefitting from having two-decade-old content that is now blocked by robots.txt because the website operators don't want it indexed?
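For anyone wondering what "only allow Googlebot" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name and the example.com URL are made up for illustration, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that admits Googlebot and turns everyone else away.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if the given user agent may fetch the URL under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)


googlebot_ok = allowed("Googlebot")
other_ok = allowed("ExampleBot")  # hypothetical new crawler
print(googlebot_ok, other_ok)
```

A well-behaved new crawler that honors this file gets nothing from the site, while Googlebot gets everything.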
Agree. It was not standard in the late 90s or early 00s. Most sites were custom built and relied on the _webmaster_ knowing and understanding how robots.txt worked. I'd heard plenty of examples where people had inadvertently blocked crawlers from their site because they didn't know the syntax. CMSs (e.g. WordPress) probably helped drive widespread adoption.
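One easy way to get the syntax wrong, sketched with Python's standard `urllib.robotparser` (the two file bodies and the URL are illustrative assumptions, not real examples from that era): a single slash is the difference between blocking nothing and blocking the whole site.

```python
from urllib.robotparser import RobotFileParser

# "Disallow:" with an empty value blocks nothing.
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"
# "Disallow: /" blocks the entire site.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"


def crawlable(robots_txt: str, url: str = "https://example.com/index.html") -> bool:
    """Return True if an arbitrary crawler may fetch the URL under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("AnyBot", url)  # "AnyBot" is a hypothetical agent name


open_site = crawlable(ALLOW_EVERYTHING)
closed_site = crawlable(BLOCK_EVERYTHING)
print(open_site, closed_site)
```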
> robots.txt was definitely a thing that people expected to be respected before and during google's early existence
As someone who was a web developer at that time, robots.txt wasn't a "widespread norm" by a large margin, even if some individuals "expected it to be respected". Google's use of robots.txt plus Google's own growth made robots.txt a "widespread norm", but I don't think many people who were active in the web-dev space at that time would agree that it was a widespread norm before Google.
A classic case of climbing the wall, and pulling the ladder up afterward. Others try to build their own ladder, and Google uses their deep pockets and political influence to knock the ladder over before it reaches the top.
Why does Google even need to know about your ladder? Build the bot, scale it up, save all the data, then release. You can now remove the ladder and obey robots.txt just like G. Just like G, once you have the data, you have the data.
Why would you tell G that you are doing something? Why tell a competitor your plans at all? Just launch your product when the product is ready. I know that's anathema to SV startup logic, but in this case it's good business
Running the bot nowadays is hard, because a lot of sites will now block you - not just by asking nicely via robots.txt, but by checking your actual source IP. Once they see it's not Google, they send you a 403.
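Roughly what that server-side check amounts to, as a sketch only. The allowlisted network below is a documentation-only range (RFC 5737) standing in for whatever crawler ranges a site actually trusts, and `status_for` is a hypothetical helper, not any real server's API:

```python
import ipaddress

# Placeholder allowlist: 192.0.2.0/24 is reserved for documentation (RFC 5737).
# A real site would load the ranges it trusts from its own configuration.
ALLOWED_CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def status_for(source_ip: str) -> int:
    """Return the HTTP status a crawler request from source_ip would receive."""
    addr = ipaddress.ip_address(source_ip)
    if any(addr in net for net in ALLOWED_CRAWLER_NETWORKS):
        return 200  # source address is inside a trusted crawler range
    return 403  # any other crawler is refused, whatever User-Agent it sends


trusted = status_for("192.0.2.10")
untrusted = status_for("198.51.100.7")
print(trusted, untrusted)
```

The point is that the decision is made on the source address, so setting a Googlebot User-Agent string doesn't get a new crawler past it.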
> Microsoft spent roughly $100 billion over 20 years on Bing and still holds single-digit share. If Microsoft cannot close the gap, no startup can do it alone.
This is incorrect. Kagi does not use the Bing index, as detailed in the article:
> Bing: Their terms didn’t work for us from the start. Microsoft’s terms prohibited reordering results or merging them with other sources - restrictions incompatible with Kagi’s approach. In February 2023, they announced price increases of up to 10x on some API tiers. Then in May 2025, they retired the Bing Search APIs entirely, effective August 2025, directing customers toward AI-focused alternatives like Azure AI Agents.
There's one great example of a company that did that and managed to go viral on release: Cuil. They claimed to have a Google-sized search index. Unfortunately for them, their search results weren't good, so that visibility quickly disappeared.
Going further back, AlltheWeb was actually pretty decent but was eventually bought by Overture and then Yahoo and ended up in their graveyard.
For everyone else it's the longer grind trying to gain visibility.
True. But the thing is, if someone says "We will make sure your site is in a worldwide, freely available index" that is kept fresh, Google's monopoly ship already begins to take on water. Here is an appropriate line from a completely different domain, from The Economist on the Chinese government's weaponization of rare earths[1]:
> Reducing its share from 90% to 80% may not sound like much, but it would imply a doubling in size of alternative sources of supply, giving China’s customers far more room for manoeuvre.
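The arithmetic behind that quote, as a quick sketch. It assumes the total market stays the same size, which is my assumption and not something the quote states:

```python
def alternative_growth(share_before_pct: int, share_after_pct: int) -> float:
    """Factor by which everyone else's combined share grows when the
    dominant player's share moves from share_before_pct to share_after_pct.
    Assumes the overall market size is unchanged."""
    others_before = 100 - share_before_pct
    others_after = 100 - share_after_pct
    return others_after / others_before


growth = alternative_growth(90, 80)
print(growth)  # 2.0: the alternatives go from 10% to 20% of the market
```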
This was on hn this year, and it was, in classic HN fashion, dismissed as a problem in search of a solution. Well, perhaps people in this thread will think differently
In my case, as you said it may not have exacerbated it, but for me it certainly perpetuated it.
A retreat into the online world seems like a comfort in difficult times but it is a retreat, and the longer you stay retreated, the less likely it is you'll regain the ground again.
Yeah, but I'm glad you don't consider the iPad a toy. It's not a toy. I predict that we're going to look back at this time of 'iPad + headphone kids' and roll our eyes as much as we roll our eyes at bloodletting.
Screen devices can be a toy but it takes very intentional use of them.
My child has had an old iPhone SE since age 4. It has no network connection. I load music on it. It only has music, camera, and voice recorder apps. Like most toys it gets intense periods of play and then goes back in the toy box with a dead battery for weeks.
It's my assertion that the problem with tablets/phones as toys for kids is the endless stream of new content. It's addictive and never gets old. If you find a way to cut off the firehose of new content (and keep the addictive apps off) then they eventually become just another boring toy. Us adults could learn from this too.
It's more like inviting the troops besieging your walls inside and celebrating because "we have fewer enemies outside the walls to deal with now!"
It's the thin end of the wedge, you give it an inch and it will take a mile. The productivity boost you gain will quickly become an expectation, and then you'll be finding "liberation" by working on Saturday to get ahead of Sunday.
The governments where the offices of a software company are physically located exert control over them. Surely following this logic to its end and applying it even-handedly results in nation-based NIH syndrome?
You are talking about an entity whose ownership is 99.8% Russian nationals and state companies, whose employees are for the most part Russian nationals, whose main market is Russia, and which has very few tangible assets that can be arrested in the Netherlands. The only reason for this "divestment" is sanctions evasion.