Fair Multi-Agent Reinforcement Learning

Achieving equitable outcomes in cooperative systems via welfare optimization and attention mechanisms.

Multi-agent systems increasingly govern critical domains like autonomous networks and resource allocation, yet traditional methods often prioritize efficiency over fairness. This project introduces FAPPO and AT-FAPPO, novel algorithms that integrate welfare functions and attention mechanisms to ensure equitable reward distribution among cooperative agents while maintaining high performance.

Core innovations: (Left) Welfare-driven policy updates for balanced rewards, (Middle) Self-attention for inter-agent communication, (Right) Addressing the "rich-get-richer" phenomenon in cooperative tasks.

Approach

FAPPO extends Proximal Policy Optimization (PPO) to optimize a generalized Gini welfare function, prioritizing agents with lower rewards during training. AT-FAPPO enhances this with a multi-head self-attention mechanism, enabling agents to share context-aware signals for coordinated action selection. The framework operates under centralized training with decentralized execution (CTDE), ensuring scalability and practicality.
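
To make the welfare-driven update concrete, the sketch below shows one way a generalized Gini welfare function can re-weight per-agent PPO objectives so that worse-off agents receive more weight in the centralized update. It is a minimal illustration in Python, not the released implementation; the function names (gini_welfare_weights, welfare_weighted_objective), the rank-based weighting with exponent omega, and the example numbers are assumptions for exposition.

    import numpy as np

    def gini_welfare_weights(agent_returns, omega=2.0):
        """Rank-based generalized Gini weights (hypothetical form):
        the k-th worst-off agent gets weight proportional to
        (n - k)^omega - (n - k - 1)^omega, so lower returns -> larger weight."""
        returns = np.asarray(agent_returns, dtype=float)
        n = len(returns)
        ranks = np.argsort(np.argsort(returns))      # 0 = lowest return
        raw = (n - ranks) ** omega - (n - ranks - 1) ** omega
        return raw / raw.sum()                       # weights sum to 1

    def welfare_weighted_objective(per_agent_objectives, agent_returns, omega=2.0):
        """Combine per-agent clipped PPO surrogate objectives with the welfare
        weights, so the joint update emphasizes the worst-off agents."""
        w = gini_welfare_weights(agent_returns, omega)
        return float(np.dot(w, per_agent_objectives))

    # Example: the agent with the lowest return dominates the weighted objective.
    print(gini_welfare_weights([1.0, 5.0, 9.0]))                        # ~[0.56, 0.33, 0.11]
    print(welfare_weighted_objective([0.2, 0.1, 0.05], [1.0, 5.0, 9.0]))

In this sketch the weights depend only on the ranking of recent per-agent returns, which keeps the modification compatible with a standard CTDE training loop.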

Experimental validation: (Left) FAPPO outperforms baselines in reward balance (coefficient of variation: 0.21 vs. 0.45 for QMIX); (Right) AT-FAPPO mitigates the Matthew effect, achieving 32% higher minimum agent rewards than IPPO.
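
For reference, the reward-balance metric quoted above, the coefficient of variation (CV), is the standard deviation of per-agent returns divided by their mean; lower values indicate a more even split. A minimal computation (function name assumed for illustration):

    import numpy as np

    def coefficient_of_variation(agent_returns):
        # Lower CV means per-agent returns are more evenly distributed.
        r = np.asarray(agent_returns, dtype=float)
        return r.std() / r.mean()

    print(coefficient_of_variation([4.8, 5.1, 5.3]))  # near-equal returns -> small CV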

Impact

  • Equity-Driven Design: Policies reduce reward inequality by up to 53% in multi-agent grid worlds.
  • Scalability: Compatible with decentralized execution, suitable for real-world systems like traffic control or drone swarms.
  • Theoretical Rigor: The welfare function satisfies the Pigou-Dalton transfer principle, so transferring reward from a better-off agent to a worse-off one never decreases welfare (a formal statement follows this list).
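
As an illustration of the last point (a standard statement of the principle, not quoted from the paper), the Pigou-Dalton transfer principle for a welfare function $W$ over per-agent rewards $r_1, \dots, r_n$ requires that an inequality-reducing transfer never decreases welfare:

    W(r_1, \dots, r_i + \delta, \dots, r_j - \delta, \dots, r_n) \ge W(r_1, \dots, r_n)
    \quad \text{whenever } r_i < r_j \text{ and } 0 < \delta \le \tfrac{r_j - r_i}{2}.

A generalized Gini welfare function satisfies this property because its weights decrease with an agent's rank in the reward ordering.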

Future work explores continuous action spaces and applications in ethical AI systems, aiming to democratize benefits across heterogeneous agent populations.
