A better method for planning complex visual tasks | MIT News

https://news.mit.edu/2026/better-method-planning-complex-visual-tasks-0311

Publish Date: 2026-03-11 00:00:00

Source Domain: news.mit.edu

  • MIT researchers have created a generative AI framework for long-term visual task planning, like robot navigation, that is about twice as effective as some existing techniques.
  • The framework uses a specialized vision-language model to perceive the scenario and simulate the actions needed, then translates that simulation into a planning problem written in a formal language.
  • The approach generates actionable plans with a 70 percent success rate, outperforming baseline models.
  • The system can solve new, unseen problems, making it well-suited for dynamic environments.
  • By combining the strengths of vision-language models and formal planners, the researchers built a system that generalizes successfully to new problem instances.
  • The system, called VLM-guided formal planning (VLMFP), comprises two models: SimVLM, which simulates actions, and GenVLM, which generates the formal planning files.
  • The framework achieved significant success rates across multiple 2D and 3D planning tasks, including multirobot collaboration and robotic assembly.
  • In future work, the researchers aim to extend the system to more complex scenarios and to reduce errors, or “hallucinations,” from the vision-language models.
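
The division of labor described above (perceive the scene, turn it into a formal problem, hand that problem to a planner, then check the plan by simulating it) can be illustrated with a toy sketch. This is not the VLMFP implementation: the article gives no interfaces, so every name here (`PlanningProblem`, `describe_scene`, `solve`, `simulate`) is hypothetical, the "image" is a plain text grid, a simple parser stands in for the vision-language models, and breadth-first search stands in for a formal planner.

```python
from collections import deque
from dataclasses import dataclass

# All names below are hypothetical illustrations, not part of VLMFP.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass(frozen=True)
class PlanningProblem:
    """A tiny stand-in for a formal planning file: a grid, walls, start, goal."""
    width: int
    height: int
    walls: frozenset
    start: tuple
    goal: tuple


def describe_scene(rows):
    """Stand-in for the perception and file-generation steps.

    A real system would use vision-language models on an image; here the
    scene is already text ('S' start, 'G' goal, '#' wall, '.' free).
    """
    walls, start, goal = set(), None, None
    for y, row in enumerate(rows):
        for x, ch in enumerate(row):
            if ch == "#":
                walls.add((x, y))
            elif ch == "S":
                start = (x, y)
            elif ch == "G":
                goal = (x, y)
    return PlanningProblem(len(rows[0]), len(rows), frozenset(walls), start, goal)


def _step(problem, pos, move):
    """Apply one move; return the new cell, or None if the move is illegal."""
    dx, dy = MOVES[move]
    nxt = (pos[0] + dx, pos[1] + dy)
    in_bounds = 0 <= nxt[0] < problem.width and 0 <= nxt[1] < problem.height
    return nxt if in_bounds and nxt not in problem.walls else None


def solve(problem):
    """Stand-in for a formal planner: breadth-first search for a shortest plan."""
    queue = deque([(problem.start, [])])
    seen = {problem.start}
    while queue:
        pos, plan = queue.popleft()
        if pos == problem.goal:
            return plan
        for move in MOVES:
            nxt = _step(problem, pos, move)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [move]))
    return None  # no plan exists


def simulate(problem, plan):
    """Stand-in for action simulation: replay the plan and check the goal."""
    pos = problem.start
    for move in plan:
        pos = _step(problem, pos, move)
        if pos is None:
            return False
    return pos == problem.goal


if __name__ == "__main__":
    problem = describe_scene(["S.#", ".##", "..G"])
    plan = solve(problem)
    print(plan, simulate(problem, plan))
```

The point of the split is that the search itself is exact once the problem is written down formally, so errors can only enter through the description step. That is consistent with the article's note that reducing vision-language model "hallucinations" is a focus of future work.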