Diverse Policy Learning via Random Obstacle Deployment for Zero-Shot Adaptation

IEEE Robotics and Automation Letters (RA-L), 2025

SeokJin Choi*1, Yonghyeon Lee*2, Seungyeon Kim1, Che-Sang Park1, Himchan Hwang1, Frank C. Park1
1Seoul National University, 2Korea Institute for Advanced Study
*Equal contribution

TL;DR: We propose a diverse policy learning framework leveraging random obstacle deployment for zero-shot adaptation in dynamic environments.

Abstract

In this paper, we propose a novel reinforcement learning framework that enables zero-shot policy adaptation in environments with unseen, dynamically changing obstacles. Adopting the idea that learning a policy capable of generating diverse actions is key to achieving such adaptability, our primary contribution is a novel learning algorithm that incorporates random obstacle deployment, enabling the policy to explore and learn diverse actions. This method overcomes the limitations of existing diverse policy learning approaches, which primarily rely on mutual information maximization to increase diversity. Experiments demonstrate that the proposed method generates significantly more diverse actions and adapts better to dynamically changing environments, making it highly effective for tasks with varying constraints such as moving obstacles.
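For context, the mutual-information objective that existing diversity methods rely on can be made concrete. One widely used instance, DIAYN (Eysenbach et al.), trains a discriminator q(z|s) to recognize which skill z produced state s, and rewards the policy with log q(z|s) - log p(z). The sketch below computes that reward for a discrete set of K skills under an assumed uniform prior p(z) = 1/K. The function names are illustrative. This is the baseline family the paper contrasts against, not the proposed method.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def mi_diversity_reward(disc_logits: Sequence[float], skill: int) -> float:
    """DIAYN-style intrinsic reward log q(z|s) - log p(z) (illustrative baseline).

    disc_logits: discriminator logits over K discrete skills for the current state s.
    skill: index of the skill z the policy was conditioned on.
    Assumes a uniform skill prior p(z) = 1/K.
    """
    k = len(disc_logits)
    log_q = log_softmax(disc_logits)[skill]
    log_p = -math.log(k)
    return log_q - log_p


# Usage: an uninformative discriminator gives zero reward,
# while a confident, correct one gives close to log K.
print(mi_diversity_reward([0.0, 0.0, 0.0, 0.0], skill=2))   # 0.0
print(mi_diversity_reward([50.0, 0.0, 0.0, 0.0], skill=0))  # about 1.386 (log 4)
```

Under this objective the reward only grows when skills reach states the discriminator can tell apart. That makes diversity a by-product of distinguishability rather than of the constraints the agent will face.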

Core Components

1. Policy

Our policy is designed to generate diverse actions in response to dynamically changing environments. During training, obstacles are deployed at random, which pushes the policy to explore and learn a rich set of distinct action strategies.
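The page does not specify how obstacles are sampled, so the following is a minimal sketch under stated assumptions. It assumes a 2D planar workspace, circular obstacles, and a uniformly random obstacle count per episode. It also uses rejection sampling so that no obstacle covers designated keep-clear points, such as the agent's initial position. The names `Obstacle` and `deploy_random_obstacles` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Obstacle:
    """Hypothetical circular obstacle in a 2D workspace."""
    x: float
    y: float
    r: float


def deploy_random_obstacles(
    rng: random.Random,
    max_obstacles: int,
    bounds: Tuple[Tuple[float, float], Tuple[float, float]],
    radius_range: Tuple[float, float],
    keep_clear: Sequence[Tuple[float, float]],
    margin: float = 0.0,
    max_tries: int = 100,
) -> List[Obstacle]:
    """Sample a fresh obstacle layout for one training episode (assumed scheme)."""
    (xmin, xmax), (ymin, ymax) = bounds
    n = rng.randint(0, max_obstacles)  # inclusive, so obstacle-free episodes also occur
    obstacles: List[Obstacle] = []
    for _ in range(n):
        for _ in range(max_tries):
            r = rng.uniform(*radius_range)
            x = rng.uniform(xmin, xmax)
            y = rng.uniform(ymin, ymax)
            # Reject candidates that would cover a keep-clear point (plus margin).
            if all((x - px) ** 2 + (y - py) ** 2 > (r + margin) ** 2
                   for px, py in keep_clear):
                obstacles.append(Obstacle(x, y, r))
                break
    return obstacles


# Usage: draw a new layout at every episode reset.
rng = random.Random(0)
layout = deploy_random_obstacles(rng, max_obstacles=5, bounds=((-1.0, 1.0), (-1.0, 1.0)),
                                 radius_range=(0.05, 0.2), keep_clear=[(0.0, 0.0)], margin=0.1)
print(len(layout), layout[:1])
```

Calling this at every reset exposes the policy to a different layout each episode. Obstacles may overlap one another in this sketch, which is harmless for the purpose of blocking some routes and leaving others open.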

2. Motion Predictor

The motion predictor forecasts future state trajectories, enabling the system to evaluate the feasibility of actions in real time. This supports safe and effective adaptation to dynamic obstacles.
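The page does not describe the predictor's architecture or how feasibility is scored. The minimal sketch below makes three assumptions: 2D positions, circular obstacles, and a constant-velocity rollout standing in for the actual motion predictor. Candidate velocities play the role of candidate actions. `straight_line_predictor`, `is_feasible`, and `first_feasible` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (center x, center y, radius)


def straight_line_predictor(start: Point, velocity: Point, horizon: int,
                            dt: float = 0.1) -> List[Point]:
    """Stand-in for the motion predictor: constant-velocity rollout."""
    x, y = start
    vx, vy = velocity
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, horizon + 1)]


def is_feasible(trajectory: Sequence[Point], obstacles: Sequence[Circle],
                agent_radius: float = 0.0) -> bool:
    """True if no predicted state falls inside an obstacle inflated by the agent radius."""
    for px, py in trajectory:
        for ox, oy, r in obstacles:
            if (px - ox) ** 2 + (py - oy) ** 2 <= (r + agent_radius) ** 2:
                return False
    return True


def first_feasible(start: Point, candidates: Sequence[Point],
                   obstacles: Sequence[Circle], horizon: int = 20) -> Optional[int]:
    """Index of the first candidate whose predicted rollout is collision-free, else None."""
    for i, velocity in enumerate(candidates):
        if is_feasible(straight_line_predictor(start, velocity, horizon), obstacles):
            return i
    return None


# Usage: an obstacle sits on the straight-ahead path, so the second candidate is chosen.
obstacles = [(1.0, 0.0, 0.2)]
print(first_feasible((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)], obstacles))  # 1
```

Because the check only needs forward predictions, a newly appeared obstacle can be handled by re-running it over the candidates with no retraining. This is one way adaptation can be zero-shot.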

3. Latent Skill Sampler

The latent skill sampler draws diverse latent skill variables from a distribution conditioned on the current state. Making this distribution state-dependent improves both adaptability and action diversity.
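The form of the state-dependent distribution is not given on this page. The sketch below assumes a diagonal Gaussian whose mean and log standard deviation are affine functions of the state. This is a common choice, used here purely for illustration. Sampling uses the reparameterization z = mu(s) + sigma(s) * eps. The class name and weights are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def affine(weights: Matrix, bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute W x + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class StateConditionedSkillSampler:
    """Hypothetical sampler: z ~ N(mu(s), diag(sigma(s)^2)) with affine mu and log sigma."""

    def __init__(self, w_mu: Matrix, b_mu: Sequence[float],
                 w_logstd: Matrix, b_logstd: Sequence[float]):
        self.w_mu, self.b_mu = w_mu, b_mu
        self.w_logstd, self.b_logstd = w_logstd, b_logstd

    def sample(self, state: Sequence[float], n: int, rng: random.Random) -> List[List[float]]:
        mu = affine(self.w_mu, self.b_mu, state)
        # Clamp log-std to a fixed range to keep the spread numerically sane.
        std = [math.exp(min(max(v, -5.0), 2.0))
               for v in affine(self.w_logstd, self.b_logstd, state)]
        # Reparameterized draw: z = mu + std * eps, eps ~ N(0, I).
        return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, std)] for _ in range(n)]


# Usage: 2D state, 2D skill; the mean tracks the state and the spread grows with s_0.
sampler = StateConditionedSkillSampler(
    w_mu=[[1.0, 0.0], [0.0, 1.0]], b_mu=[0.0, 0.0],
    w_logstd=[[0.5, 0.0], [0.5, 0.0]], b_logstd=[-2.0, -2.0],
)
skills = sampler.sample(state=[1.0, 2.0], n=8, rng=random.Random(0))
print(len(skills), len(skills[0]))  # 8 2
```

Letting the spread depend on the state lets the sampler stay narrow where only a few behaviors make sense and widen where many do. How the paper parameterizes this trade-off is not stated here.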

Experiments

We conducted extensive experiments across four reinforcement learning environments: Push-T, Ant, Reacher, and Swimmer.

Figure: Experimental results.

Citation


      @article{choi2025diverse,
        title={Diverse Policy Learning via Random Obstacle Deployment for Zero-Shot Adaptation},
        author={Choi, Seokjin and Lee, Yonghyeon and Kim, Seungyeon and Park, Che-Sang and Hwang, Himchan and Park, Frank C},
        journal={IEEE Robotics and Automation Letters},
        year={2025}
      }