AI models are choking on junk data

https://fortune.com/2026/05/03/ai-models-are-choking-on-junk-data/

Publish Date: 2026-05-03 09:30:00

Source Domain: fortune.com

  • Progress from language models like ChatGPT to sophisticated humanoid robots hinges on the quality of training data.
  • Historically, more data led to smarter models, but this approach may not suffice for physical AI, which requires richer, multifaceted data.
  • A looming crisis involves an overabundance of “junk data”: data that does not aid AI model development.
  • Demand for data has fueled the rise of expensive AI data startups, but these often produce data that is useless for high-stakes physical AI applications.
  • Effective physical AI requires significantly more data, which is harder to obtain and, because the physical world is so intricate, often has to come from complex simulations.
  • Using junk data can degrade performance, delay market-readiness, and lead to unpredictable outcomes, especially for applications like self-driving cars.
  • Examples such as OpenAI’s sunsetting of its AI video app highlight problems stemming from inadequate training data for physical AI.
  • To realize AI’s full potential, machine learning teams must prioritize filtering out junk data so that the models they train are effective.
  • High-quality data is now seen as the critical constraint, and those who adapt to this challenge first will likely lead in the development of successful AI systems.