Edit: obligatory explanation (thanks mods for squaring me away)…
What you see via the UI isn’t “all that exists”. Unlike Reddit, where everything is a black box, there are a lot more eyeballs who can see “under the hood”. Any instance admin, proper or rogue, gets a ton of information that users won’t normally see. The attached example demonstrates that while users will only see upvote/downvote tallies, admins can see who actually performed those actions.
Edit: To clarify, not just YOUR instance admin gets this info. This is ANY instance admin across the Fediverse.
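To make that concrete, here is a rough sketch of what a federated vote can look like on the wire. A vote travels as an ActivityStreams `Like` or `Dislike` activity, and that activity names its `actor`. The payload below is made up for illustration: the URLs, the IDs and the `describe_vote` helper are all hypothetical, and a real Lemmy activity will carry more fields than this.

```python
import json

# Illustrative payload only: every URL and ID here is invented.
raw_activity = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Like",
  "actor": "https://instance-a.example/u/alice",
  "object": "https://instance-b.example/post/123",
  "id": "https://instance-a.example/activities/like/abc"
}
"""

def describe_vote(activity_json: str) -> tuple[str, str, str]:
    """Return (voter, direction, target) from a vote activity (hypothetical helper)."""
    activity = json.loads(activity_json)
    # Map the activity type to the vote direction a user would recognise.
    direction = {"Like": "upvote", "Dislike": "downvote"}[activity["type"]]
    return activity["actor"], direction, activity["object"]

voter, direction, target = describe_vote(raw_activity)
print(f"{voter} -> {direction} on {target}")
```

The point is only that the voter's identity is part of the message itself, so whichever server receives the activity can read it, whether or not the UI ever shows it.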
Obviously, this isn’t ideal. But this isn’t as damning as some of the other commenters believe.
The way Reddit operates is that they are “trusted” with all our data. They can (and do) sell any data they like to whomever they like. They store much more information than simply who upvoted what. They can’t simply allow upvotes with no claimant, because they’d have no way of stopping or identifying bots or illegitimate upvotes.
This system is not ideal, but it’s also not necessarily worse. We’re still operating under that system; the only real difference is that we get to choose who that trusted party is. We get to move instances if the hoster’s interests become misaligned with our own.
Ultimately, there needs to be a smart solution to this problem to ensure it’s not abused. We can’t completely remove collection of the data, otherwise upvotes will be meaningless and hijacked by agendas. We can’t simply encrypt the data either: if there’s a genuine use for it (which we’ve discussed), who SHOULD be allowed to decrypt it?
I completely understand the concern, and I share it. But this isn’t an issue so much with Lemmy, it’s an issue with upvotes on distributed social media.
Edit: Okay, ANY instance admin is where the issue lies. That much I agree with.
There’s a huge difference between Reddit keeping our data “locked away” on their private server vs. a system that puts it all out in public view. You can bet your behind that Big Tech and governments are harvesting ALL of it as we speak. This is MUCH worse than Reddit just selling some data to a few third-party actors.
I completely agree that sharing it with other instances is a problem.
This is super nitpicky, but assuming it exposed even a minute amount of the data that Reddit freely ships to whoever buys it (including governments), I actually think it’s far less likely to be seen. Social media companies are well known to freely give access to anything law enforcement, governments or advertisers would like. Most, if not all, have exposed APIs which allow law enforcement, at least, to collect almost any data at their leisure. This data is packaged up by the orgs who have the data.
Scraping Lemmy for this information would require them to build their own solutions and backends to handle all the data. Here in the UK, our technically inept government famously broke their multi-billion COVID test-and-trace system because the Excel spreadsheet they used as a database ran out of rows…
Even assuming it’s true that all of these groups have bothered to make their own solutions and bought server space to store the data themselves for a relatively tiny platform (certainly until very recently), the only data they get is who liked what post/comment.
That is a small snowflake compared to the iceberg that other social media organizations collect, package and sell. Facebook, for example, collects enough data that it earns more per user than Netflix.
Certainly, as Lemmy and ActivityPub gain more traction, this is a privacy hole which deserves some consideration, and should be immediately plugged. But I just don’t think it’s in the same solar system as exposing data to any social media site.
I think that the best solution is probably “best practices” and defederation used to enforce some sort of minimal Code of Conduct wrt the actual mechanics of running an instance.
Otherwise, the only other way I could see to address this is to lump some data at the instance level. I.e. each instance simply reports a total of upvotes and downvotes from its own users, and you just have to trust the instances to behave. There might be some checks to make sure the vote totals are plausible.
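A minimal sketch of that idea, purely as a thought experiment and not anything Lemmy actually does: the home instance collapses its users' votes into two totals, and the receiving side runs a crude plausibility check. All names here (`VoteReport`, `build_report`, `is_plausible`, `known_active_users`) are hypothetical, and how a remote instance would learn a trustworthy user count is left open.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class VoteReport:
    """What an instance would federate: totals only, no usernames."""
    instance: str
    upvotes: int
    downvotes: int

def build_report(instance: str, votes: list[tuple[str, int]]) -> VoteReport:
    """Collapse per-user votes (user, +1 or -1) into totals on the home instance."""
    latest = dict(votes)  # one vote per user; the most recent one wins
    tally = Counter(latest.values())
    return VoteReport(instance, tally[1], tally[-1])

def is_plausible(report: VoteReport, known_active_users: int) -> bool:
    """Reject negative counts or more votes than the instance has users.

    known_active_users is an assumption: some figure the receiver trusts.
    """
    if report.upvotes < 0 or report.downvotes < 0:
        return False
    return report.upvotes + report.downvotes <= known_active_users

report = build_report(
    "small.example",
    [("alice", 1), ("bob", -1), ("alice", 1), ("carol", 1)],
)
print(report)
print(is_plausible(report, known_active_users=50))
print(is_plausible(VoteReport("farm.example", 5000, 0), known_active_users=12))
```

This keeps usernames on the home instance, but the check only catches blatant inflation. An instance that lies about its user count, or stays just under it, still gets through, which is where the "you just have to trust the instances" part comes in.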
In reality, this will be the end of small instances. The only feasible way to enforce this is federation whitelists, and it will be very hard to get whitelisted. Not necessarily a bad thing in the big scheme of things when we weigh the positives and negatives, but it still sucks for anyone “self hosting” an instance.
True. Any random unverified instance could be set up just to harvest data from the Fediverse.