Improving AI models’ ability to explain their predictions | MIT News
https://news.mit.edu/2026/improving-ai-models-ability-explain-predictions-0309
Publish Date: 2026-03-09 00:00:00
Source Domain: news.mit.edu
- Concept Bottleneck Modeling: Uses an intermediate “bottleneck” step to improve AI explainability by forcing deep-learning models to predict understandable concepts before making a final prediction.
- New Method Development: MIT researchers developed a method to extract and utilize concepts already learned by the model during training for more precise and accurate explanations.
- Extraction of Learned Concepts: The researchers use a sparse autoencoder to extract relevant learned features and convert them into human-understandable concepts with a multimodal LLM.
- Improved Accuracy and Explanations: The MIT approach outperformed other concept bottleneck methods in accuracy and provided clearer, more concise explanations, while also generating concepts better suited to the training dataset.
- Limitations and Future Work: The method improves interpretability, but a trade-off between interpretability and model performance remains. Future work includes addressing information leakage and scaling up the method with larger datasets.
- Researchers’ Goals: The researchers aim to build interpretable AI models from the internal mechanisms the models have already learned, making AI reasoning more transparent and accountable.
- Potential Benefits: The proposed method could push AI interpretability forward, creating pathways for integrating it with symbolic AI and knowledge graphs while reducing reliance on human-defined concepts.
- Supporting Organizations: The research was funded by several entities, including the Progetto Rocca Doctoral Fellowship, the National Recovery and Resilience Plan, Thales Alenia Space, and the European Union’s NextGenerationEU project.
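
The concept bottleneck idea summarized above can be illustrated with a minimal Python sketch. It is not the MIT researchers' implementation: the concept names, class names, weights, and the top-k sparsity step are all hypothetical choices made for illustration, and the extraction step (sparse autoencoder plus multimodal LLM) is replaced by hard-coded stand-ins. The sketch only shows the general shape of such a model, in which the final prediction depends solely on a small set of named concept scores that can then be read back as an explanation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict_concepts(features, concept_weights):
    """Bottleneck step: map backbone features to one score in (0, 1) per concept."""
    return [sigmoid(dot(w, features)) for w in concept_weights]

def keep_top_k(scores, k):
    """Zero all but the k highest scores (an illustrative sparsity assumption)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:k])
    return [s if i in top else 0.0 for i, s in enumerate(scores)]

def predict_label(concept_scores, class_weights):
    """Final step: a linear layer that sees only the concept scores."""
    logits = [dot(w, concept_scores) for w in class_weights]
    return max(range(len(logits)), key=lambda c: logits[c]), logits

def explain(concept_scores, class_weights, label, names):
    """Return (concept name, contribution) pairs for active concepts, largest first."""
    contribs = [
        (names[i], class_weights[label][i] * s)
        for i, s in enumerate(concept_scores)
        if s > 0.0
    ]
    return sorted(contribs, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical stand-ins: in the described method, concepts would come from a
# sparse autoencoder and be named by a multimodal LLM, not written by hand.
concept_names = ["yellow legs", "blue wings", "curved beak"]
class_names = ["species_a", "species_b"]

features = [2.0, -1.0, 0.5]             # toy backbone features
concept_weights = [[1.0, 0.0, 0.0],     # one weight row per concept
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]]
class_weights = [[2.0, -1.0, 0.5],      # one weight row per class
                 [-1.0, 2.0, 0.0]]

scores = predict_concepts(features, concept_weights)
sparse_scores = keep_top_k(scores, k=2)
label, logits = predict_label(sparse_scores, class_weights)
explanation = explain(sparse_scores, class_weights, label, concept_names)

print("prediction:", class_names[label])
for name, contribution in explanation:
    print(f"  {name}: {contribution:+.3f}")
```

Because the classifier never sees the raw features, every prediction can be traced to the few concepts that survived the top-k step. Any information that reaches the final layer through other routes would be the kind of leakage the summary lists as future work.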