• TWeaK@lemm.ee
    10 months ago

    OpenAI’s approach hasn’t really been proven legal. They claim it is, and it’s very difficult to mount a challenge, but there definitely is an argument that they have no fair use protection - their “research” is in fact development of a commercial product.

    • givesomefucks@lemmy.world
      10 months ago

      Using it to train is a grey area, if you paid for the works. If you didn’t, it’s still illegal.

      What it does is output copyrighted works, which is copyright infringement. That is the legal issue. It’s very easy to prompt it into giving the full text of copyrighted works they never even paid to look at, let alone to give to other people.

      “AI” can’t even handle switching synonyms to make it technically different, like a college kid cheating on an essay.

      • TWeaK@lemm.ee
        10 months ago

        Their argument is that the copying to their training database is “research”. This would be a legal fair use of unauthorised copying. However, normally with research you make a prototype, and that prototype is distinctly different from the final commercial product. With LLMs the prototype is the finished commercial product and they just keep adding to it, so it isn’t normal fair use.

        When a court considers fair use, the first step is the type of use. The exemptions are education, research, news, comment, or criticism. Next, they consider the nature of the use, in particular whether it is commercial. Calling their copying “research” is a bit of a stretch - it’s not like they’re writing academic papers and making their data publicly available for review by other scientists - and their use is absolutely commercial. However, it needs to go before a judge to make the decision, and it’s very difficult for someone to show a cause of action, if only because all their copying is done secretly behind closed doors.

        The output of the AI itself is a bit more difficult. The database ChatGPT runs off does not include the whole works it learned from - it’s in the training database where all the copying occurs. However, ChatGPT and other LLMs can sometimes still manage to reproduce the original works, and arguably this should be an offense. If a human being reads a book and then later writes a story that replicates significant parts of the book, then they would be guilty of plagiarism and copyright infringement, regardless of whether they genuinely believe they were coming up with original ideas.