Research Overview
Recent advances in Large Language Models (LLMs) have enabled language-guided multi-robot systems, allowing robots to execute tasks based on natural language instructions. However, achieving effective coordination in distributed multi-agent environments remains challenging due to two key issues:
Key Challenges
- Instruction-Task Misalignment: Natural language instructions may not perfectly align with the specific requirements of a task.
- Behavioral Inconsistency: Independent interpretation of ambiguous instructions by multiple robots leads to inconsistent behaviors.
To address these challenges, we propose the Instruction-Conditioned Coordinator (ICCO), a Multi-Agent Reinforcement Learning (MARL) framework designed to improve coordination in language-guided multi-robot systems.
ICCO consists of a Coordinator agent and multiple Local Agents, where the Coordinator generates Task-Aligned and Consistent Instructions (TACI) by integrating language instructions with environmental states, ensuring task alignment and behavioral consistency.
Our Approach

[Figure: ICCO System Architecture]
ICCO Architecture
The ICCO framework consists of two main components:
- Coordinator Agent: Processes language instructions and environmental states to generate Task-Aligned and Consistent Instructions (TACI).
- Local Agents: Individual robot controllers that execute actions based on the TACI received from the Coordinator.
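The data flow between the two components can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names, the per-agent TACI vectors, the action encoding, and the toy keyword-based policies are all assumptions made for the example. In ICCO these policies are learned, and the source does not specify how TACI is represented.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class Coordinator:
    """Hypothetical coordinator: maps (instruction, global state) to TACI."""
    policy: Callable[[str, Vector, int], List[Vector]]

    def generate_taci(self, instruction: str, global_state: Vector,
                      n_agents: int) -> List[Vector]:
        taci = self.policy(instruction, global_state, n_agents)
        # Assumption: one TACI vector is produced per local agent.
        if len(taci) != n_agents:
            raise ValueError("expected one TACI per local agent")
        return taci


@dataclass
class LocalAgent:
    """Hypothetical robot controller: acts on its observation and its TACI."""
    policy: Callable[[Vector, Vector], int]

    def act(self, observation: Vector, taci: Vector) -> int:
        return self.policy(observation, taci)


def step(coordinator: Coordinator, agents: List[LocalAgent], instruction: str,
         global_state: Vector, observations: List[Vector]) -> List[int]:
    """One control step: the Coordinator emits TACI, each Local Agent acts."""
    taci = coordinator.generate_taci(instruction, global_state, len(agents))
    return [agent.act(obs, t) for agent, obs, t in zip(agents, observations, taci)]


# Toy stand-ins for the learned policies (illustration only).
def toy_coordinator_policy(instruction: str, global_state: Vector,
                           n_agents: int) -> List[Vector]:
    directions = {"go right": [1.0, 0.0], "move top": [0.0, 1.0]}
    direction = directions.get(instruction.lower(), [0.0, 0.0])
    return [list(direction) for _ in range(n_agents)]


def toy_local_policy(observation: Vector, taci: Vector) -> int:
    # Assumed action encoding: 0 = right, 1 = up, 2 = left, 3 = stay.
    dx, dy = taci
    if dx > 0:
        return 0
    if dy > 0:
        return 1
    if dx < 0:
        return 2
    return 3


coordinator = Coordinator(toy_coordinator_policy)
agents = [LocalAgent(toy_local_policy) for _ in range(3)]
actions = step(coordinator, agents, "Go Right", [0.0], [[0.0, 0.0]] * 3)
```

Because every Local Agent receives its guidance from the same Coordinator rather than interpreting the raw instruction itself, ambiguous wording is resolved once, which is the consistency property described above.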
The Coordinator and Local Agents are jointly trained to optimize a reward function that balances task efficiency and instruction following. This approach ensures that robots not only complete their tasks efficiently but also adhere to human instructions in a coordinated manner.
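One simple way to express such a balance is a weighted sum of a task reward and an instruction-following reward. The sketch below is an assumption for illustration: the source does not state the exact form of the reward, and the function name and the `weight` parameter are hypothetical.

```python
def combined_reward(task_reward: float, instruction_reward: float,
                    weight: float = 0.5) -> float:
    """Blend task efficiency and instruction following.

    `weight` is a hypothetical trade-off parameter in [0, 1]:
    0 ignores the instruction, 1 ignores the task.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * task_reward + weight * instruction_reward


# A team that completes the task but ignores the instruction
# is penalized relative to one that does both.
task_only = combined_reward(task_reward=1.0, instruction_reward=0.0, weight=0.25)
both = combined_reward(task_reward=1.0, instruction_reward=1.0, weight=0.25)
```

With a blend of this kind, setting the weight too low yields agents that complete tasks while ignoring the instruction, and setting it too high yields agents that follow the instruction at the expense of the task.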
Demonstrations
Demonstration Comparison
Baseline Method
The model understood the instruction but could not follow it while also completing the tasks.
ICCO Method
The model performed the tasks and followed the instruction.
ICCO Demonstrations over a Sequence of Four Instructions
1. First Instruction: "Go Right"
Agents in the upper area prioritize invader defense, agents that spawned in the lower area prioritize resource collection, and agents on the right prioritize following the instruction.
2. Second Instruction: "Move Top"
All robots are moving to the upper area, performing invader defense and other related tasks as they go.
3. Third Instruction: "Gather Center"
All robots are moving from the upper part of the environment toward the center, performing resource collection and invader defense as needed.
4. Fourth Instruction: "Spread Out"
All robots are moving away from the center to carry out the instruction.
Key Findings
Our demonstrations show that ICCO successfully addresses the challenges of instruction-task misalignment and behavioral inconsistency.
Complete Movement Sequence
The video below shows the full sequence of ICCO coordinating multiple robots through the four instructions in the environment.