These instructions are a small part of what makes a model answer the way it does. Much more important is the training data. If you want to make a racist model, training it on racist text is sufficient.
AI companies put great care into the training data of these models to ensure that their biases are socially acceptable. If you train an LLM on the internet without care, a user will easily be able to prompt it into producing racist text.
Gab is forced to use this prompt because they’re unable to train a model, but as other comments show, it’s a pretty weak way to force a bias.
The ideal solution for transparency would be public sharing of the training data.
Access to training data wouldn’t help. People are too stupid. You give the public access to that, and all you’ll get is hundreds of articles saying “This company used (insert horrible thing) as part of its training data!” while ignoring that it’s one of millions of data points and that its inclusion is necessary and not an endorsement.
I’m afraid that would not be sufficient.