A better method for planning complex visual tasks | MIT News
https://news.mit.edu/2026/better-method-planning-complex-visual-tasks-0311
Publish Date: 2026-03-11 00:00:00
Source Domain: news.mit.edu
- MIT researchers have created a generative AI framework for long-term visual task planning, like robot navigation, that is about twice as effective as some existing techniques.
- The framework uses a specialized vision-language model to perceive the scenario and simulate the actions needed, then translates that simulation into a planning problem expressed in a formal language.
- The approach generates actionable plans with a 70 percent success rate, outperforming baseline models.
- The system can solve new, unseen problems, making it well-suited for dynamic environments.
- The researchers combined the strengths of vision-language models and formal planners, which allowed the system to generalize to new instances.
- The system, called VLM-guided formal planning (VLMFP), includes two models: SimVLM, which simulates actions, and GenVLM, which generates the formal planning files.
- The framework achieved significant success rates in multiple planning tasks, including 2D and 3D scenarios such as multirobot collaboration and robotic assembly.
- In future work, the researchers aim to extend the system to more complex scenarios and to reduce errors, or "hallucinations," produced by the vision-language models.
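The points above describe a two-stage idea: perceive a visual scene, express it as a formal planning problem, and let a planner search for a plan. The sketch below is a toy illustration of that division of labor, not VLMFP's implementation. It replaces both vision-language models with a hypothetical `perceive_scene` function that parses an ASCII grid standing in for an image, encodes the result as a grounded STRIPS-style problem (the kind of preconditions/add/delete representation that formal planning languages such as PDDL express), and solves it with breadth-first search. All function and class names here are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for perception: VLMFP uses a vision-language model on
# an image; here we parse an ASCII grid instead.
# 'R' = robot, 'G' = goal, '#' = wall, anything else = free cell.
def perceive_scene(grid: list[str]) -> dict:
    scene = {"free": set(), "robot": None, "goal": None}
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch != "#":
                scene["free"].add((r, c))
            if ch == "R":
                scene["robot"] = (r, c)
            elif ch == "G":
                scene["goal"] = (r, c)
    return scene


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # facts that must hold before the action applies
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


@dataclass
class Problem:
    init: frozenset
    goal: frozenset
    actions: list


def to_formal_problem(scene: dict) -> Problem:
    """Encode the perceived scene as a grounded STRIPS-style problem."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    actions = []
    for (r, c) in scene["free"]:
        for name, (dr, dc) in moves.items():
            nxt = (r + dr, c + dc)
            if nxt in scene["free"]:
                actions.append(Action(
                    name=f"move-{name}",
                    pre=frozenset({("at", (r, c))}),
                    add=frozenset({("at", nxt)}),
                    delete=frozenset({("at", (r, c))}),
                ))
    return Problem(
        init=frozenset({("at", scene["robot"])}),
        goal=frozenset({("at", scene["goal"])}),
        actions=actions,
    )


def bfs_plan(problem: Problem) -> Optional[list[str]]:
    """Breadth-first search over states; returns a shortest plan or None."""
    frontier = deque([(problem.init, [])])
    seen = {problem.init}
    while frontier:
        state, plan = frontier.popleft()
        if problem.goal <= state:  # every goal fact holds in this state
            return plan
        for a in problem.actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None  # goal unreachable


if __name__ == "__main__":
    grid = ["R.#",
            ".##",
            "..G"]
    print(bfs_plan(to_formal_problem(perceive_scene(grid))))
    # → ['move-down', 'move-down', 'move-right', 'move-right']
```

Breadth-first search returns a shortest plan because it expands states in order of plan length; practical formal planners typically use heuristic search to scale to larger problems. The point of the split is that once a scene is written down formally, the search is exact and works unchanged on new instances, which is the property the article attributes to pairing vision-language models with formal planners.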