RL without TD learning – The Berkeley Artificial Intelligence Research Blog

https://bair.berkeley.edu/blog/2025/11/01/rl-without-td-learning/


A reinforcement learning (RL) algorithm based on the “divide and conquer” paradigm has been proposed as an alternative to traditional temporal difference (TD) learning, and it shows promising scalability on long-horizon tasks in off-policy RL. Common approaches such as Q-learning rely on TD learning, which bootstraps each value estimate from the estimate at the next step, so errors accumulate over long horizons. The new method instead uses a divide-and-conquer strategy, so the number of recursions it needs grows logarithmically, rather than linearly, with the horizon. The algorithm, named Transitive RL (TRL), was applied to goal-conditioned RL and outperformed many strong baselines on challenging OGBench tasks without requiring extensive hyperparameter tuning. These results highlight TRL's potential to address the longstanding problem of building scalable off-policy RL algorithms that can handle complex, real-world tasks. The authors plan to extend TRL to general reward-based RL, handle stochastic environments, and improve its practical performance.
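A minimal sketch of why splitting helps, under assumptions that are not from the post: a hypothetical deterministic chain of n states, tabular shortest-path distances d(s, g), and synchronous sweeps over all (s, g) pairs. It contrasts a one-step, TD-style backup with a transitive backup that joins two known segments through an intermediate state w (classical min-plus repeated squaring). The function names are illustrative, and this is a toy stand-in for the divide-and-conquer idea, not TRL's actual learned update.

```python
import math

INF = float("inf")


def one_step_distances(n):
    """Hypothetical toy task: a deterministic chain 0 -> 1 -> ... -> n-1.

    d[s][g] is the number of steps from s to g: 0 on the diagonal, 1 for the
    immediate successor, and INF (not yet known) everywhere else.
    """
    d = [[INF] * n for _ in range(n)]
    for s in range(n):
        d[s][s] = 0
        if s + 1 < n:
            d[s][s + 1] = 1
    return d


def td_style(n):
    """One-step backup d(s, g) <- min(d(s, g), 1 + d(s + 1, g)).

    Each synchronous sweep extends the known horizon by one step.
    Returns (number of sweeps that changed the table, final table).
    """
    d = one_step_distances(n)
    sweeps = 0
    while True:
        new = [row[:] for row in d]
        for s in range(n - 1):
            for g in range(n):
                new[s][g] = min(d[s][g], 1 + d[s + 1][g])
        if new == d:
            return sweeps, d
        d, sweeps = new, sweeps + 1


def divide_and_conquer(n):
    """Transitive backup d(s, g) <- min over w of d(s, w) + d(w, g).

    Joining two known segments at an intermediate state w doubles the known
    horizon on every sweep (min-plus repeated squaring).
    Returns (number of sweeps that changed the table, final table).
    """
    d = one_step_distances(n)
    sweeps = 0
    while True:
        new = [[min(d[s][w] + d[w][g] for w in range(n)) for g in range(n)]
               for s in range(n)]
        if new == d:
            return sweeps, d
        d, sweeps = new, sweeps + 1


n = 64  # a 63-step horizon from state 0 to state 63
td_sweeps, td_d = td_style(n)
dc_sweeps, dc_d = divide_and_conquer(n)
print(td_sweeps, dc_sweeps, math.ceil(math.log2(n - 1)))
```

With n = 64 this prints `62 6 6`: the one-step backup needs 62 sweeps that change the table, the transitive backup needs 6, and both end with the same distances. The trade-off in this toy is that each transitive sweep does O(n^3) work instead of O(n^2), exchanging fewer sequential recursions (where bootstrapping errors would compound) for more work per recursion.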

Key Points:
– Divide and conquer in RL can scale well to long-horizon tasks by reducing the number of recursions logarithmically.
– TRL outperforms strong baselines on complex goal-conditioned RL tasks.
– The method promises scalable off-policy RL without extensive tuning.
– Future directions include extending the method to reward-based RL, handling stochastic environments, and improving TRL’s practical aspects.
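The logarithmic claim in the first point can be checked with a short count. This derivation rests on simplifying assumptions that are not in the post: deterministic dynamics, one-step distances known at the start, d(s, g) as the shortest number of steps, and the optimal goal-conditioned value written as V(s, g) = γ^{d(s, g)}. Under those assumptions, splitting at an intermediate state w is additive for distances and multiplicative for discounted values:

```latex
% One-step backups extend the known horizon by 1; a split at a midpoint w doubles it.
h_k^{\text{TD}} = h_{k-1}^{\text{TD}} + 1 = k + 1,
\qquad
h_k^{\text{split}} = 2\,h_{k-1}^{\text{split}} = 2^{k}
\qquad (h_0 = 1)

% Rounds needed to cover a horizon of H steps:
k^{\text{TD}} = H - 1,
\qquad
k^{\text{split}} = \lceil \log_2 H \rceil

% With V(s,g) = \gamma^{d(s,g)} and 0 < \gamma < 1, the additive split becomes
% multiplicative, because \gamma^{a+b} = \gamma^{a}\gamma^{b} and x \mapsto \gamma^{x} is decreasing:
d(s,g) = \min_{w} \bigl[ d(s,w) + d(w,g) \bigr]
\;\Longleftrightarrow\;
V(s,g) = \max_{w} V(s,w)\, V(w,g)
```

For example, a horizon of H = 1000 steps takes 999 one-step rounds but only ⌈log2 1000⌉ = 10 splitting rounds.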