I mean, that’s the way the capitalist, stock-return-driven economy works. The market expects a company to constantly grow to pump its stock price, so it has to find new revenue or cut costs somewhere. But it can’t do that forever…
The founders build a great product to pull in users, then they go public, then the MBAs turn to enshittification to drive more revenue and get rich while they can. The rest of us then move on to the next platform, if it even exists…
Thanks for the link to Common Crawl; I didn’t know about that project but it looks interesting.
That’s also an interesting point about heavily curated data sets. Would something like that be able to overcome some of the bias in current models? For example, if you were training a facial recognition model, you could use a curated, open-source dataset that has representative samples of all races and genders to try to reduce racial bias. Anyone training a facial recognition model for any purpose would then have a training set that can be peer reviewed for accuracy.