Revealed: The Authors Whose Pirated Books Are Powering Generative AI

Pete Hahnloser@beehaw.org · 1 year ago

blindsight@beehaw.org · 1 year ago

I think it’s two sides of the same point: the downstream effect of LLMs is devaluing writing, and they’re trained on copyrighted works.

So, for instance, if you train an LLM on everything written by Stephen King, then ask it to generate stories “in the style of Stephen King”, you could potentially get verbatim text from his books (probabilistically, it’s bound to happen with the way LLMs chain words) and/or books similar enough to his style to be direct competition to his writing.
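To make the "chaining words" point concrete, here is a toy sketch. It is not how a real LLM works: it is a plain word-level bigram chain, and the function names (`train_bigrams`, `generate`) and the made-up training sentence are just assumptions for illustration. It only shows that a model which picks the likeliest next word can hand its training text straight back when that text is all it has seen.

```python
from collections import defaultdict

def train_bigrams(text):
    """Map each word to the list of words that followed it in the training text."""
    table = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, max_words=20):
    """Starting from `start`, repeatedly append the most frequent next word."""
    out = [start]
    # Stop at the length cap or when the last word never had a follower.
    while len(out) < max_words and out[-1] in table:
        followers = table[out[-1]]
        out.append(max(set(followers), key=followers.count))
    return " ".join(out)

# Hypothetical one-sentence "corpus" standing in for an author's text.
corpus = "the quick brown fox jumps over a lazy dog"
model = train_bigrams(corpus)
sample = generate(model, "the")
print(sample == corpus)
```

With a training set this small, every word has exactly one possible follower, so the output is the training sentence word for word. A real model trained on a huge, varied corpus is far less likely to do this for any given passage, which is part of why the legal question is murky.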

It’s up to the courts to decide if that argument has any legal weight, and legislators (and the public voting for them) to decide if the laws should change.

And, based on the mess that is Bill C-18 in Canada, I have absolutely no confidence in new copyright laws having a lick of sense.

Beej Jorgensen@lemmy.sdf.org · 1 year ago

If it generates verbatim output, then we have a good old copyright violation, which courts could latch onto for standing.

But if I hire people to write books in the style of Stephen King and then train an AI with them, where’s King’s recourse?

And the AI could be trained on public domain data and still be a competitor to authors. It seems like the plaintiffs would have to be equally against this usage if they’re worried about their jobs.

But in those two cases, I don’t think any laws are broken.

I just think, aside from a plain old piracy violation, it’s going to be a tricky one in court. Sure, you can’t just copy the book, but running a copy of a book through an algorithm is tougher to ban, and it’s not something that necessarily should be illegal.