AI models are choking on junk data
https://fortune.com/2026/05/03/ai-models-are-choking-on-junk-data/
Publish Date: 2026-05-03 09:30:00
Source Domain: fortune.com
- Progress from language models like ChatGPT to sophisticated humanoid robots hinges on the quality of the training data.
- Historically, more data led to smarter models, but this approach may not suffice for physical AI, which requires richer, multifaceted data.
- A crisis is looming: an overabundance of “junk data,” meaning data that does not help AI models improve.
- The demand for data has fueled the rise of expensive AI data startups, but the data they supply is often useless for high-stakes physical AI applications.
- Effective physical AI requires significantly more data, which is harder to obtain and often has to come from elaborate simulations because the physical world is so complex.
- Training on junk data can degrade model performance, delay market readiness, and lead to unpredictable outcomes, especially in applications like self-driving cars.
- Examples like OpenAI’s sunset of its AI video app highlight issues stemming from inadequate training data for physical AI.
- To realize AI’s full potential, machine learning teams must prioritize filtering out junk data so that the models they train are effective.
- High-quality data is now seen as the critical constraint, and those who adapt to this challenge first will likely lead in the development of successful AI systems.