Improving AI models’ ability to explain their predictions | MIT News

https://news.mit.edu/2026/improving-ai-models-ability-explain-predictions-0309

Publish Date: 2026-03-09 00:00:00

Source Domain: news.mit.edu

  • Concept Bottleneck Modeling: Uses an intermediate “bottleneck” step to improve AI explainability by forcing deep-learning models to predict understandable concepts before making a final prediction.

  • New Method Development: MIT researchers developed a method that extracts concepts the model has already learned during training and uses them to produce more accurate explanations.

  • Extraction of Learned Concepts: The researchers use a sparse autoencoder to extract relevant learned features and convert them into human-understandable concepts with a multimodal LLM.

  • Improved Accuracy and Explanations: The MIT approach outperformed other concept bottleneck methods in accuracy and provided clearer, more concise explanations, while also generating concepts better suited to the training dataset.

  • Limitations and Future Work: The method improves interpretability, but a trade-off between interpretability and model performance remains. Future work includes addressing information leakage and scaling the method up to larger datasets.

  • Researchers’ Goals: The goal is to build interpretable AI models from the internal mechanisms the models have already learned, making AI reasoning more transparent and accountable.

  • Potential Benefits: The proposed method could push AI interpretability forward, creating pathways for integrating it with symbolic AI and knowledge graphs while reducing reliance on human-defined concepts.

  • Supporting Organizations: The research was funded by several entities, including the Progetto Rocca Doctoral Fellowship, the National Recovery and Resilience Plan, Thales Alenia Space, and the European Union’s NextGenerationEU project.
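
The bottleneck idea summarized in the points above can be sketched in a few lines of plain Python. This is a toy illustration, not the MIT researchers' implementation: the concept names, the weights, and the top-k sparsity step (a crude stand-in for a sparse autoencoder) are all assumptions invented for this example. What it shows is the general shape of a concept bottleneck: the final label is computed only from concept activations, so each prediction can be explained by the concepts that contributed to it.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict_concepts(features, concept_weights):
    """Bottleneck step: map raw features to concept activations in (0, 1)."""
    return [sigmoid(dot(features, w)) for w in concept_weights]

def keep_top_k(activations, k):
    """Zero all but the k largest activations (assumed stand-in for sparsity)."""
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]

def predict_label(concepts, label_weights):
    """Final step: label scores depend only on the concept activations."""
    scores = [dot(concepts, w) for w in label_weights]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores

def explain(concepts, label_weights, label, concept_names):
    """List the active concepts by their contribution to the chosen label."""
    pairs = [(name, c * w)
             for name, c, w in zip(concept_names, concepts, label_weights[label])]
    active = [p for p in pairs if p[1] != 0.0]
    return sorted(active, key=lambda p: p[1], reverse=True)

# Hypothetical concept names and hand-picked weights, for illustration only.
CONCEPT_NAMES = ["striped fur", "pointed ears", "wings"]
CONCEPT_WEIGHTS = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.0], [-1.0, 0.0, -1.0]]
LABEL_WEIGHTS = [[1.5, 1.0, -2.0],    # label 0
                 [-1.0, -0.5, 2.5]]   # label 1

features = [1.0, 0.0, 2.0]
concepts = predict_concepts(features, CONCEPT_WEIGHTS)
sparse_concepts = keep_top_k(concepts, k=2)
label, scores = predict_label(sparse_concepts, LABEL_WEIGHTS)
explanation = explain(sparse_concepts, LABEL_WEIGHTS, label, CONCEPT_NAMES)

print("predicted label:", label)
for name, contribution in explanation:
    print(f"  {name}: {contribution:+.3f}")
```

Because the label scores are a function of the sparse concept vector alone, the printed explanation accounts for the whole decision in this toy setup. In a real model the weights would be learned, and the concept names would come from a separate naming step rather than being written by hand.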