Fair Multi-Agent Reinforcement Learning

Achieving equitable outcomes in cooperative systems via welfare optimization and attention mechanisms.

Multi-agent systems increasingly govern critical domains like autonomous networks and resource allocation, yet traditional methods often prioritize efficiency over fairness. This project introduces FAPPO and AT-FAPPO, novel algorithms that integrate welfare functions and attention mechanisms to ensure equitable reward distribution among cooperative agents while maintaining high performance.

Core innovations: (Left) Welfare-driven policy updates for balanced rewards, (Middle) Self-attention for inter-agent communication, (Right) Addressing the "rich-get-richer" phenomenon in cooperative tasks.

Approach

FAPPO extends Proximal Policy Optimization (PPO) to optimize a generalized Gini welfare function, prioritizing agents with lower rewards during training. AT-FAPPO enhances this with a multi-head self-attention mechanism, enabling agents to share context-aware signals for coordinated action selection. The framework operates under centralized training with decentralized execution (CTDE), ensuring scalability and practicality.
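
To make the welfare-driven update concrete, the sketch below shows one way a generalized Gini welfare function can re-weight per-agent PPO objectives so that worse-off agents receive more weight in the centralized update. It is a minimal illustration in Python, not the released implementation; the function names (gini_welfare_weights, welfare_weighted_objective), the rank-based weighting with exponent omega, and the example numbers are assumptions for exposition.

    import numpy as np

    def gini_welfare_weights(agent_returns, omega=2.0):
        """Rank-based generalized Gini weights (hypothetical form):
        the k-th worst-off agent gets weight proportional to
        (n - k)^omega - (n - k - 1)^omega, so lower returns -> larger weight."""
        returns = np.asarray(agent_returns, dtype=float)
        n = len(returns)
        ranks = np.argsort(np.argsort(returns))      # 0 = lowest return
        raw = (n - ranks) ** omega - (n - ranks - 1) ** omega
        return raw / raw.sum()                       # weights sum to 1

    def welfare_weighted_objective(per_agent_objectives, agent_returns, omega=2.0):
        """Combine per-agent clipped PPO surrogate objectives with the welfare
        weights, so the joint update emphasizes the worst-off agents."""
        w = gini_welfare_weights(agent_returns, omega)
        return float(np.dot(w, per_agent_objectives))

    # Example: the agent with the lowest return dominates the weighted objective.
    print(gini_welfare_weights([1.0, 5.0, 9.0]))                        # ~[0.56, 0.33, 0.11]
    print(welfare_weighted_objective([0.2, 0.1, 0.05], [1.0, 5.0, 9.0]))

In this sketch the weights depend only on the ranking of recent per-agent returns, which keeps the modification compatible with a standard CTDE training loop.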

Experimental validation: (Left) FAPPO outperforms baselines in reward balance (coefficient of variation: 0.21 vs. 0.45 for QMIX); (Right) AT-FAPPO mitigates the Matthew effect, achieving 32% higher minimum agent rewards than IPPO.
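
For reference, the reward-balance metric quoted above, the coefficient of variation (CV), is the standard deviation of per-agent returns divided by their mean; lower values indicate a more even split. A minimal computation (function name assumed for illustration):

    import numpy as np

    def coefficient_of_variation(agent_returns):
        # Lower CV means per-agent returns are more evenly distributed.
        r = np.asarray(agent_returns, dtype=float)
        return r.std() / r.mean()

    print(coefficient_of_variation([4.8, 5.1, 5.3]))  # near-equal returns -> small CV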

Impact

  • Equity-Driven Design: Policies reduce reward inequality by up to 53% in multi-agent grid worlds.
  • Scalability: Compatible with decentralized execution, suitable for real-world systems like traffic control or drone swarms.
  • Theoretical Rigor: The welfare function satisfies the Pigou-Dalton transfer principle, so transferring reward from a better-off agent to a worse-off one never decreases welfare (a formal statement follows this list).
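
As an illustration of the last point (a standard statement of the principle, not quoted from the paper), the Pigou-Dalton transfer principle for a welfare function $W$ over per-agent rewards $r_1, \dots, r_n$ requires that an inequality-reducing transfer never decreases welfare:

    W(r_1, \dots, r_i + \delta, \dots, r_j - \delta, \dots, r_n) \ge W(r_1, \dots, r_n)
    \quad \text{whenever } r_i < r_j \text{ and } 0 < \delta \le \tfrac{r_j - r_i}{2}.

A generalized Gini welfare function satisfies this property because its weights decrease with an agent's rank in the reward ordering.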

Future work explores continuous action spaces and applications in ethical AI systems, aiming to democratize benefits across heterogeneous agent populations.
