Fair Multi-Agent Reinforcement Learning
Achieving equitable outcomes in cooperative systems via welfare optimization and attention mechanisms.
Multi-agent systems increasingly govern critical domains like autonomous networks and resource allocation, yet traditional methods often prioritize efficiency over fairness. This project introduces FAPPO and AT-FAPPO, novel algorithms that integrate welfare functions and attention mechanisms to ensure equitable reward distribution among cooperative agents while maintaining high performance.



Approach
FAPPO extends Proximal Policy Optimization (PPO) to optimize a generalized Gini welfare function, which weights worse-off agents more heavily and therefore prioritizes agents with lower rewards during training. AT-FAPPO augments this with a multi-head self-attention mechanism that lets agents exchange context-aware signals for coordinated action selection. The framework follows centralized training with decentralized execution (CTDE): global information is used only during training, so learned policies remain scalable and deployable without centralized communication at run time.
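To make the welfare objective concrete, here is a minimal sketch of a generalized Gini welfare function: rewards are sorted in ascending order and combined with decreasing weights, so the worst-off agents contribute most. The geometric weight schedule `omega ** i` and the function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def generalized_gini_welfare(rewards, omega=0.9):
    """Illustrative generalized Gini welfare.

    Sorts rewards ascending (worst-off agent first) and applies
    geometrically decreasing weights, so increasing a low reward
    raises welfare more than increasing a high one.
    NOTE: the weight schedule omega**i is an assumption for this sketch.
    """
    r = np.sort(np.asarray(rewards, dtype=float))  # ascending order
    w = omega ** np.arange(len(r))                 # decreasing weights
    w /= w.sum()                                   # normalize to sum to 1
    return float(w @ r)
```

For example, two agents sharing a total reward of 10 equally score higher welfare than an unequal 2/8 split, which is exactly the pressure that steers training toward equitable policies.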


Impact
- Equity-Driven Design: Policies reduce reward inequality by up to 53% in multi-agent grid worlds.
- Scalability: Decentralized execution makes the approach practical for real-world systems such as traffic control and drone swarms.
- Theoretical Rigor: The welfare functions satisfy the Pigou-Dalton transfer principle, so shifting reward from a better-off agent to a worse-off one never decreases welfare.
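The Pigou-Dalton property can be checked numerically: a mean-preserving transfer from a richer agent to a poorer one should raise the welfare score. The welfare function and weight schedule below are the same illustrative assumptions as above, not the paper's exact definition.

```python
import numpy as np

def gini_welfare(rewards, omega=0.9):
    # Illustrative generalized Gini welfare: decreasing weights over
    # ascending-sorted rewards (the omega**i schedule is an assumption).
    r = np.sort(np.asarray(rewards, dtype=float))
    w = omega ** np.arange(len(r))
    return float((w / w.sum()) @ r)

# Pigou-Dalton check: move 1 unit of reward from the better-off agent
# to the worse-off one, keeping the total (and their ordering) fixed.
before = [2.0, 8.0]
after = [3.0, 7.0]
print(gini_welfare(before) < gini_welfare(after))  # True
```

Because the weights are strictly decreasing, any such equity-increasing transfer strictly raises the weighted sum, which is the mathematical sense in which the learned objective favors fair allocations.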
Future work explores continuous action spaces and applications in ethical AI systems, aiming to democratize benefits across heterogeneous agent populations.