The latest AI models can become biased, regressive, and echo chamber-y when trained on their own outputs
A new study explores a limitation of current-generation AI models: when a model is trained on its own outputs, it becomes biased and regressive, according to researchers at the University of Washington and Microsoft. Dubbed “Model Autophagy Disorder” (MAD), the phenomenon sees a model’s quality degrade and its outputs drift into oddly mutated results, behavior that has been observed in systems such as ChatGPT and Midjourney.
The study demonstrates how training an AI model on its own outputs creates a convergence effect: the model gradually loses access to the extremes of the data distribution. Essentially, it begins to ‘prune’ the data at the edges of the spectrum until those extreme examples ‘disappear,’ leaving outputs that cluster ever more tightly around the mean of the data.
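To make the convergence effect concrete, here is a minimal toy simulation (not the study’s actual setup): a simple Gaussian “model” is refit, generation after generation, on samples drawn from its own previous fit, with the rarest samples clipped the way a generator tends to favor typical outputs. The fitted spread shrinks steadily, which is the ‘pruning of the edges’ described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: the "real" data, a wide Gaussian.
data = rng.normal(loc=0.0, scale=1.0, size=10_000)

for generation in range(1, 11):
    # Fit a simple Gaussian "model" to the current training set.
    mu, sigma = data.mean(), data.std()
    # The next generation trains only on the model's own samples.
    samples = rng.normal(mu, sigma, size=10_000)
    # Generators favor typical outputs, so rare tail samples get dropped.
    data = samples[np.abs(samples - mu) < 2 * sigma]
    print(f"generation {generation}: std = {data.std():.3f}")
```

After ten generations the standard deviation has collapsed to roughly a quarter of the original, even though each individual step looks harmless.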
“This is a form of conditioning that the model learns from its own outputs,” the study explains. “It learns that the edges of the distribution are less important, and the model can approximately reconstruct the full distribution from its mean alone.”
The researchers demonstrate MAD in autoencoders, Gaussian mixture models, and large language models. Autoencoders handle a variety of tasks, including image compression, denoising, and generation; Gaussian mixture models are used for density estimation and clustering; and large language models, such as the ones behind ChatGPT, are likewise prone to going MAD when trained on their own outputs.
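As a rough illustration of the Gaussian mixture case (a sketch under the same self-consuming assumption, not the researchers’ protocol), the loop below fits a two-component mixture, samples from it, and refits on those samples, so any estimation drift or shrinkage compounds across generations:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Original data: two well-separated clusters.
data = np.concatenate([
    rng.normal(-4.0, 1.0, size=(500, 1)),
    rng.normal(+4.0, 1.0, size=(500, 1)),
])

for generation in range(1, 11):
    # Fit the mixture model to the current training set.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
    # The next generation sees only the model's own samples.
    data, _ = gmm.sample(1_000)
    print(f"generation {generation}: "
          f"means = {gmm.means_.ravel().round(2)}, "
          f"variances = {gmm.covariances_.ravel().round(2)}")
```

Because each generation is fit to a finite sample of the last one, small estimation errors are never corrected by real data and accumulate instead.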
The research sheds light on how AI models are developed and challenges the idea of using AI-generated data to expand training sets. If a commercial AI model has been trained on its own outputs, it has likely regressed toward the mean of its data and picked up bias.
“The model essentially eats itself,” the study explains. “By pruning the data at the edges of the distribution, the model is not exposed to the extremes of the original data, and the model converges toward its mean.”
Unlabeled AI-produced data has already made its way into many systems, which makes it difficult to identify and exclude from training sets. Understanding how models degrade when trained on their own outputs strengthens the case for developing reliable ways to identify and label AI-generated content.
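For instance, one simple mitigation at the data-pipeline level (a hypothetical sketch, not something the study prescribes) is to record provenance alongside each record so synthetic data can be excluded, or downweighted, when a training set is assembled:

```python
# Hypothetical provenance tags; the field names are illustrative, not a standard.
records = [
    {"text": "A hand-written product review.", "source": "human"},
    {"text": "A model-generated summary.", "source": "model_v3"},
]

# Keep only human-authored records for the next round of training.
training_set = [r for r in records if r["source"] == "human"]
print(len(training_set), "human-authored records retained")
```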
Adjusting the weightings can also compensate for biases and keep data at the edges of the distribution from being pruned, but getting this right is tricky. “Deciding on the weightings and understanding the effects of model fine-tuning is crucial,” the study explains.
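One practical reading of such reweighting (a hypothetical sketch, not the method described in the study) is to upweight training samples from sparsely populated regions so the tails are not silently discarded; the `tail_weights` helper below is illustrative only:

```python
import numpy as np

def tail_weights(x, bins=50):
    """Hypothetical helper: give rare (tail) samples larger weights so a
    model fit with `sample_weight` does not prune the distribution's edges."""
    hist, edges = np.histogram(x, bins=bins, density=True)
    idx = np.digitize(x, edges[1:-1])        # bin index for each sample
    density = np.maximum(hist[idx], 1e-6)    # avoid division by zero
    weights = 1.0 / density
    return weights / weights.mean()          # normalize to an average weight of 1

# Example: pass the result as `sample_weight` to an estimator that accepts it.
x = np.random.default_rng(0).normal(size=5_000)
w = tail_weights(x)
print(w.min().round(2), w.max().round(2))    # tail samples get the largest weights
```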
The study raises questions about the truthfulness of a model’s answers, the biases it carries, and the broader impact of training models on their own data. It also highlights the importance of exposing AI models to fresh data so they do not become echo chambers of their own past outputs.
Read the study here.