MAEBE: Multi-Agent Emergent Behavior Framework


Presented at HICSS 59

Explainability findings from evaluations of isolated large language models (LLMs) likely do not transfer to multi-agent systems (MAS), as MAS introduce novel emergent interaction and decision-making behaviors among agents. To systematically assess differences in decision behavior between isolated and ensemble agents, we present the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework. Using MAEBE with the Greatest Good Benchmark, a double-inversion question technique, and explainability analysis, we demonstrate that: (1) Decision preferences are markedly brittle in MAS LLM ensembles, just as in isolated LLMs, shifting significantly with changes to question framing. (2) Ensemble behavior is not directly predictable from isolated agent behavior, owing to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena such as peer pressure that drive decision convergence, even when guided by a supervisor. Our findings underscore the necessity of evaluating the explainability of multi-agent AI systems in their interactive context to properly assess MAS-generated results, with implications for AI safety and alignment.