
Large language models (LLMs) become "more implicitly racist" after human intervention

From the very beginning, it was clear that large language models (LLMs) such as ChatGPT absorb racist views from the millions of Internet pages on which they are trained. Developers have responded by trying to make the models less toxic. But new research suggests that these efforts, especially as the models get bigger, only suppress overtly racist views, allowing hidden stereotypes to become stronger and better concealed.

Researchers asked five artificial intelligence models, including GPT-4 from OpenAI and older models from Facebook and Google, to make judgments about speakers who used African American English (AAE). The race of the speaker was not mentioned in the instructions, according to MIT Technology Review.

Even when two sentences had the same meaning, the models more often applied the adjectives "dirty," "lazy," and "stupid" to speakers of AAE than to speakers of Standard American English (SAE). The models associated AAE speakers with less prestigious work (or did not associate them with work at all), and when asked to pass judgment on a hypothetical defendant, they were more likely to recommend the death penalty.
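
The study's setup resembles matched-guise probing: a model is shown the same statement in AAE and in an SAE paraphrase and is, in effect, asked what kind of person would say it. The sketch below is an illustrative reconstruction of that idea, not the authors' code; the model choice (roberta-base), the prompt template, and the example sentences and trait adjectives are all assumptions made purely for demonstration.

```python
# Illustrative matched-guise probing sketch (not the published study's code).
# A masked language model fills in a trait adjective for the speaker of a
# sentence; comparing the probabilities for an AAE sentence vs. an SAE
# paraphrase exposes dialect-linked associations.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "roberta-base"  # assumption: any masked LM could stand in here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

def trait_probs(sentence: str, traits: list[str]) -> dict[str, float]:
    """Probability the model assigns to each trait adjective for the speaker."""
    prompt = f'People who say "{sentence}" tend to be {tokenizer.mask_token}.'
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    probs = logits.softmax(dim=-1)
    # Encode each adjective with a leading space so it maps to a word-initial token.
    ids = [tokenizer.encode(" " + t, add_special_tokens=False)[0] for t in traits]
    return {t: probs[i].item() for t, i in zip(traits, ids)}

traits = ["lazy", "dirty", "stupid", "smart", "friendly"]
aae_sentence = "he be workin hard every day"   # illustrative AAE-style sentence
sae_sentence = "he works hard every day"       # SAE paraphrase with the same meaning
print("AAE:", trait_probs(aae_sentence, traits))
print("SAE:", trait_probs(sae_sentence, traits))
```

The published study used much larger models and many more sentence pairs, but the comparison logic is the same: identical meaning, different dialect, different trait predictions.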

Perhaps even more striking is the flaw the study exposes in the way researchers are currently trying to address such biases.

To purge models of hateful views, companies such as OpenAI, Meta, and Google use reinforcement learning from human feedback, in which human workers manually adjust how the model responds to certain prompts. This process, often called "alignment," aims to recalibrate the millions of connections in the neural network so that the model better matches the desired values.
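
To make the "alignment" step concrete, here is a minimal, hypothetical sketch of the preference-learning idea behind reinforcement learning from human feedback: human raters pick the better of two responses, and a reward model is trained to score the chosen response above the rejected one. The tiny reward head and random stand-in embeddings below are illustrative assumptions, not any company's actual pipeline.

```python
# Minimal sketch of the human-preference step behind RLHF-style alignment.
# A reward model learns to score the response humans preferred ("chosen")
# above the one they rejected; that reward then steers later fine-tuning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Maps a pooled language-model representation to a scalar reward."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.score(pooled).squeeze(-1)

hidden_size = 768                       # assumption: size of the LM's hidden state
reward_model = RewardHead(hidden_size)
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# Stand-ins for pooled LM representations of two responses to the same prompt;
# in a real pipeline these would come from the language model being aligned.
chosen = torch.randn(8, hidden_size)    # responses human raters preferred
rejected = torch.randn(8, hidden_size)  # responses human raters rejected

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)
# Bradley-Terry preference loss: push reward(chosen) above reward(rejected).
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
```

Because this kind of feedback only covers the prompts and behaviors raters are shown, it more readily catches explicit slurs than reactions to an entire dialect, as the article goes on to note.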

This method works well for combating explicit stereotypes, and leading companies have been using it for nearly a decade. If users asked GPT-2, for example, to name stereotypes about black people, it would most likely list "suspicious," "radical," and "aggressive," but GPT-4 no longer responds with such associations, the article states.

However, the method does not work on the hidden stereotypes that the researchers uncovered using African American English in their study, which was published on arXiv and has not been peer-reviewed. This is partly because companies are less aware of dialect bias as a problem, they say. It is also easier to teach a model not to respond to blatantly racist questions than to teach it not to react negatively to an entire dialect.

Feedback training teaches models to be aware of their racism. But dialect bias reveals a deeper level.

— Valentin Hofmann, AI researcher at the Allen Institute for AI and co-author of the paper.

Avijit Ghosh, an ethics researcher at Hugging Face who was not involved in the study, says this finding calls into question the approach companies use to address bias:

Such alignment — when a model refuses to produce racist results — is nothing but a fragile filter that can be easily broken.

The researchers also found that hidden stereotypes intensified as model size increased. That is a potential warning for chatbot makers such as OpenAI, Meta, and Google as they race to release ever larger models. Models generally become more powerful and expressive as the amount of training data and the number of parameters grow, but if this worsens hidden racial bias, companies will need to develop better tools to combat it. It is not yet clear whether simply adding more AAE to the training data or strengthening the feedback step will be enough.

The authors of the article use particularly extreme examples to illustrate the potential consequences of racial biases, such as asking AI to decide whether to sentence a defendant to death. However, Ghosh notes, the questionable use of artificial intelligence models for making critical decisions is not science fiction. It is happening today.

AI-based translation tools are used in asylum cases in the United States, and crime forecasting software is used to decide whether to give teenagers probation. Employers using ChatGPT to screen applications may discriminate against candidates based on race and gender, and if they use models to analyze what an applicant writes on social media, biased attitudes towards AAE could lead to erroneous decisions.
