The increasing use of synthetic data in training large language models (LLMs)[1] has raised important questions about its influence on model behavior. When LLMs learn from data generated by other models, they inherit certain characteristics—some desirable, others unintended. This phenomenon, termed passive inheritance, can shape biases, calibration, and generation patterns in ways not explicitly controlled by developers. However, what if we could actively guide this inheritance to improve model performance on specific attributes, such as reducing toxicity or increasing lexical diversity?
In a recent study[2], researchers from Cohere For AI[3] propose active inheritance, a technique to intentionally steer synthetic data generation toward predefined objectives, even when these objectives are non-differentiable. By selectively curating synthetic data, they demonstrate that models can be fine-tuned to exhibit desirable properties while mitigating unwanted ones.
The Risks of AI Data Recycling: A Growing Concern
Concerns about the increasing reliance on synthetic data are not just theoretical. Elon Musk, among other prominent figures, has warned[4] about the dangers of models being trained on data that is itself generated by AI. Musk argues that this process could lead to model collapse[5], a scenario where models progressively lose quality, accuracy, and diversity as they are exposed to their own artificially generated outputs.
This fear is not unfounded. Several studies[6, 7] have demonstrated that recursive training on synthetic data can cause models to amplify errors, develop blind spots, and exhibit reduced linguistic diversity. The work presented here directly addresses this issue by systematically analyzing how synthetic data influences model behavior. However, unlike Musk’s pessimistic outlook, the researchers propose a solution: rather than avoiding synthetic data altogether, we should refine how we use it. Through active inheritance, they show that we can control and optimize the way models inherit characteristics from their training data, turning a potential weakness into an opportunity.
The Challenges of Synthetic Data in LLM Training
Traditionally, curating high-quality labeled data has been expensive and labor-intensive. Researchers have often relied on existing datasets, optimizing them through augmentation, weighting, pruning, and pseudo-labeling. However, these approaches assume that the desired properties already exist within the dataset, limiting their ability to introduce new characteristics.
Synthetic data generation offers a powerful alternative, allowing for the creation of datasets tailored to specific objectives. However, it also introduces challenges. LLMs trained on synthetic data can inherit unintended biases, amplify existing ones, or lose important calibration features. This raises concerns about the ethical and practical implications of widespread synthetic data usage in AI development.
Passive Inheritance: Understanding the Risks and Opportunities
When an LLM is trained on synthetic data generated by another model, it inherits certain characteristics from its data source—a process the researchers call passive inheritance. This can influence aspects like textual style, biases, toxicity, and calibration, often in ways that are unintended and unpredictable.
Key Findings on Passive Inheritance
By analyzing multiple student-teacher model pairings, the researchers found that passive inheritance can cause significant shifts in model behavior:
Changes in Social Bias – Even when trained on neutral prompts, models showed up to 36% shifts in bias scores, with some categories (e.g., disability bias) increasing by 80%.
Altered Textual Characteristics – Model generations became significantly longer (+100% tokens in some cases) and more lexically diverse (+16% MTLD score).
Increased Toxicity – Toxicity scores rose by up to 40%, supporting concerns that fine-tuning on synthetic data may erode safety alignment.
Skewed Model Preferences – Models trained on synthetic data preferred outputs similar to their training data, raising concerns about self-reinforcing loops that could distort AI evaluation.
The Risks of Uncontrolled Passive Inheritance
Without intervention, passive inheritance can:
✗ Amplify biases in unpredictable ways.
✗ Distort model calibration, making outputs less reliable.
✗ Increase harmful or unsafe generations.
✗ Reinforce model circularity, where LLMs increasingly favor AI-generated text over human-authored content.
These risks align with concerns raised by Elon Musk about the dangers of AI training on its own outputs, potentially leading to model collapse. However, passive inheritance is not inherently harmful; the real problem is that it goes uncontrolled.
Turning a Risk into an Opportunity
The fact that LLMs are so sensitive to synthetic data means this process can be harnessed rather than feared. Instead of allowing passive inheritance to shape models in arbitrary ways, researchers propose active inheritance—a method to intentionally guide synthetic data selection to reinforce beneficial attributes (e.g., lexical diversity) while minimizing negative ones (e.g., bias and toxicity).
In the next section, we explore how active inheritance transforms this challenge into a powerful tool for AI improvement.
Active Inheritance: A Solution for Steering Model Behavior
The concept of active inheritance emerges as a strategic response to the challenges posed by passive inheritance. While passive inheritance occurs naturally when an LLM absorbs properties from its synthetic training data—sometimes leading to undesirable biases or model degradation—active inheritance seeks to intentionally guide this process to enhance beneficial traits and suppress harmful ones.
Unlike traditional machine learning approaches that rely on differentiable loss functions and explicit supervision, active inheritance is designed to optimize for non-differentiable objectives, such as increasing textual diversity or reducing toxicity.
These are attributes that are difficult to encode in standard training objectives but are crucial for aligning AI models with human expectations.
How Active Inheritance Works
Active inheritance is based on a three-step process:
1. Generating Multiple Candidates Per Prompt
Instead of training on a single synthetic response per prompt, the system generates multiple outputs using different LLMs or multiple samples from a single model.
This step increases variability in the dataset, allowing for more granular control over which responses are used in training.
2. Filtering and Selecting the Best Responses
A set of predefined metrics is applied to rank the candidate responses.
Depending on the desired objective, the highest (or lowest) scoring response is selected for training. For example:
If the goal is to increase lexical diversity, the response with the highest MTLD score (Measure of Textual Lexical Diversity) is chosen.
If the goal is to reduce toxicity, the least toxic response (according to a toxicity classifier) is selected.
This best-of-k selection process ensures that only high-quality, goal-aligned responses make it into the training dataset.
3. Fine-Tuning the Model with Optimized Data
The LLM is trained exclusively on the curated dataset, reinforcing the targeted characteristics in its behavior.
This method does not require reinforcement learning, reward models, or complex backpropagation through non-differentiable objectives—making it a simpler and more scalable approach.
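To make this concrete, here is a minimal Python sketch of the best-of-k selection loop described above. It is an illustration, not the authors' released code: the `generate` and `score` callables are hypothetical stand-ins for whatever teacher model(s) and metric (MTLD, a toxicity classifier, token count) you plug in, and the toy `unique_token_ratio` scorer is only a crude proxy for a real lexical diversity measure.

```python
from typing import Callable, Dict, List

def best_of_k(prompt: str,
              generate: Callable[[str, int], List[str]],  # hypothetical: returns k sampled completions
              score: Callable[[str], float],              # metric to optimize (non-differentiable is fine)
              k: int = 10,
              higher_is_better: bool = True) -> str:
    """Steps 1 and 2: sample k candidates for one prompt and keep the best one."""
    candidates = generate(prompt, k)
    pick = max if higher_is_better else min
    return pick(candidates, key=score)

def build_curated_dataset(prompts: List[str],
                          generate: Callable[[str, int], List[str]],
                          score: Callable[[str], float],
                          k: int = 10,
                          higher_is_better: bool = True) -> List[Dict[str, str]]:
    """Assemble the filtered prompt/response pairs; step 3 is then ordinary
    supervised fine-tuning of the student model on this dataset."""
    return [{"prompt": p,
             "response": best_of_k(p, generate, score, k, higher_is_better)}
            for p in prompts]

# Toy scorer: unique-token ratio as a crude stand-in for a lexical diversity metric.
def unique_token_ratio(text: str) -> float:
    tokens = text.split()
    return len(set(tokens)) / max(len(tokens), 1)
```

Because the metric is only used to rank finished generations, nothing about it needs to be differentiable; swapping the objective simply means swapping the `score` callable (and, for toxicity, setting `higher_is_better` to `False`).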
Why Active Inheritance Works Better than Other Approaches
Several alternative methods exist for optimizing non-differentiable objectives in AI models, but active inheritance has distinct advantages over them:
| Method | Pros | Cons |
| --- | --- | --- |
| Reinforcement Learning from Human Feedback (RLHF) | Aligns models with human preferences; effective for safety tuning. | Computationally expensive; requires reward models; training can be unstable. |
| Bayesian Optimization | Can optimize complex, non-differentiable functions. | Slow convergence; difficult to scale to large models. |
| Evolutionary Algorithms | Works well for discovering novel solutions. | High computational cost; requires multiple generations of evaluation. |
| Active Inheritance | Simple, scalable, requires no reward model, and directly optimizes for desired traits. | Requires multiple model generations per prompt; depends on accurate metric design. |
By leveraging a direct filtering mechanism rather than relying on backpropagation or policy optimization, active inheritance sidesteps many of the challenges associated with traditional AI alignment techniques.
Single-Source vs. Multi-Source Sampling: Which Works Best?
The effectiveness of active inheritance also depends on how the candidate responses are generated.
The researchers tested two different sampling strategies:
1. Single-Source Sampling
Candidate responses are generated from a single model (e.g., LLaMa2-7B).
The system selects the best response from multiple generations of the same model.
Pros: Simple implementation, requires fewer computational resources.
Cons: Limited by the expressive capacity of a single model; less diversity in response quality.
2. Multi-Source Sampling
Candidate responses are generated from multiple models (e.g., LLaMa2-7B, Mixtral-8x7B, Gemma-7B, Command-R+).
The system selects the best response across all models.
Pros: Greater diversity, higher likelihood of finding optimal responses.
Cons: Requires access to multiple models, more computationally intensive.
Findings:
Multi-source sampling consistently produced better results in terms of lexical diversity and toxicity reduction.
However, single-source sampling was still effective, proving that even a single well-curated dataset can meaningfully shape model behavior.
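The difference between the two strategies comes down to where the candidate pool comes from before the same best-of-k filter is applied. A rough sketch, using hypothetical per-model generation callables rather than any specific model API:

```python
from typing import Callable, Dict, List

Generator = Callable[[str, int], List[str]]  # hypothetical: (prompt, k) -> k completions

def single_source_pool(prompt: str, generate: Generator, k: int) -> List[str]:
    # k samples from a single teacher (e.g., repeated temperature sampling).
    return generate(prompt, k)

def multi_source_pool(prompt: str, generators: Dict[str, Generator], k_per_model: int) -> List[str]:
    # Pool candidates from several teachers; the best-of-k filter then
    # selects the top response across the combined pool.
    pool: List[str] = []
    for _name, generate in generators.items():
        pool.extend(generate(prompt, k_per_model))
    return pool
```

Multi-source pooling widens the search space per prompt, which is why it tends to surface higher-scoring candidates, at the cost of running several models.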
Experimental Results: Can Active Inheritance Really Shape AI Behavior?
To validate the effectiveness of active inheritance, researchers applied it to three key objectives:
1. Enhancing Lexical Diversity
Goal: Increase the variety of words and sentence structures in model outputs.
Metric: MTLD (Measure of Textual Lexical Diversity).
Result: Models trained with active inheritance showed a 40% increase in lexical diversity compared to baseline models.
2. Increasing Response Length
Goal: Encourage more detailed and informative model outputs.
Metric: Average token count per response.
Result: Responses were 66%–165% longer than those from models trained with standard synthetic data.
3. Reducing Toxicity
Goal: Minimize the likelihood of generating harmful or offensive content.
Metric: Toxicity classifier score (lower is better).
Result: Toxicity scores dropped by up to 29%, demonstrating that filtering for low-toxicity responses effectively reduces harmful outputs.
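For readers who want to experiment with this kind of filtering, the three metrics can be approximated with off-the-shelf tools. The sketch below reflects assumed tooling, not the paper's exact setup: the `lexicalrichness` package for MTLD, whitespace tokenization for length, and a publicly available Hugging Face classifier (`unitary/toxic-bert`) as a stand-in toxicity scorer.

```python
# pip install lexicalrichness transformers torch  (assumed tooling, not the paper's exact setup)
from lexicalrichness import LexicalRichness
from transformers import pipeline

# Stand-in toxicity classifier; the study's own toxicity metric may differ.
toxicity_clf = pipeline("text-classification", model="unitary/toxic-bert")

def mtld_score(text: str) -> float:
    """Measure of Textual Lexical Diversity (higher = more diverse)."""
    return LexicalRichness(text).mtld()

def length_score(text: str) -> int:
    """Crude response-length metric: whitespace token count."""
    return len(text.split())

def toxicity_score(text: str) -> float:
    """Score of the classifier's top label, used as a rough toxicity proxy
    (lower is better when filtering)."""
    return toxicity_clf(text, truncation=True)[0]["score"]
```

Each of these returns a single number per candidate, so any of them can be dropped directly into the `score` callable from the earlier pipeline sketch.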
These results confirm that LLMs do not have to suffer from model collapse or bias amplification when trained on synthetic data—as long as the data is carefully curated through active inheritance.
Implications for the Future of AI Training
The implications of active inheritance extend far beyond the immediate benefits of reducing bias and increasing diversity. This method could fundamentally change how we think about AI alignment and model training in the following ways:
Customizable AI Personalization – By fine-tuning LLMs with user-preferred characteristics, companies could create models tailored to specific audiences (e.g., business-friendly AI, creative writing AI, safety-focused AI).
Resilience Against AI Degeneration – As AI-generated content becomes more prevalent, models trained with active inheritance will be less susceptible to model collapse by continually refining their data sources.
Scalable, Cost-Effective AI Training – Unlike RLHF, which requires costly human feedback loops, active inheritance operates with existing datasets, making it a low-cost alternative for steering AI behavior.
The Future of Synthetic Data in AI Training: Risks, Solutions, and a Path Forward
Elon Musk and other AI experts have raised valid concerns about the increasing reliance on synthetic data, particularly regarding data degradation and self-reinforcing biases. The fear is that training models on AI-generated content will lead to model collapse, where LLMs lose accuracy, diversity, and alignment with human values over time.
However, the research in LLM See, LLM Do offers a more nuanced perspective. Rather than treating synthetic data as an inevitable risk, it highlights that the way we curate and utilize this data determines its impact. If left unchecked, passive inheritance can significantly alter model behavior in unpredictable ways. But with active inheritance, synthetic data can be transformed from a liability into an asset.
By filtering and refining model-generated content through targeted data selection, active inheritance provides a scalable, low-cost solution for steering AI development. This technique ensures that models remain high-quality, diverse, and ethically aligned—mitigating the very risks that Musk and others have warned about.
Key Takeaways
Passive inheritance can significantly impact model behavior, often in unexpected ways.
Active inheritance provides a method for intentional steering of LLM behavior through targeted data curation.
Multi-source sampling is more effective than single-source selection in enhancing diversity.
Synthetic data can influence model evaluation preferences, raising ethical concerns about LLM judgment reliability.
Rather than avoiding synthetic data, AI researchers should focus on methods to improve its quality and effectiveness.
As AI research advances, refining techniques like active inheritance will be essential for ensuring that LLMs are aligned with human values, ethical standards, and desired performance metrics. By understanding and controlling synthetic data's impact, researchers can develop more robust, fair, and reliable AI systems for the future.
References
[1] Large language model, Wikipedia
[2] Shimabucoro, L., Ruder, S., Kreutzer, J., Fadaee, M., & Hooker, S. (2024). LLM see, LLM do: Guiding data generation to target non-differentiable objectives. arXiv preprint arXiv:2407.01490.
[3] Cohere For AI
[4] Elon Musk says all human data for AI training ‘exhausted’, The Guardian
[5] Model collapse, Wikipedia
[6] Briesch, M., Sobania, D., & Rothlauf, F. (2023). Large language models suffer from their own output: An analysis of the self-consuming training loop. arXiv preprint arXiv:2311.16822.
[7] Shumailov, I., Shumaylov, Z., Zhao, Y., Gal, Y., Papernot, N., & Anderson, R. (2023). The curse of recursion: Training on generated data makes models forget. arXiv preprint arXiv:2305.17493.