The Over-Complication Anomaly: Quantitative Evaluation of the Dynamic Hierarchical Multi-Agent Framework for Large Language Models
Abstract
Current AI frameworks struggle with scalability, validation, and accuracy: most rely on single-model execution, lack structured verification, and use a global context memory, which leads to error propagation. This thesis implements and quantifies the performance-versus-cost trade-off of the Dynamic Hierarchical Multi-Agent Framework (DHMAF), a system that employs a hierarchy of a Prime Meta Agent, Meta Agents, and Checker Agents to recursively instantiate, execute, and validate sub-tasks. The framework was benchmarked on established AI benchmarks using the Gemini 1.5 Flash model; the evaluation covered 14,725 questions, consumed approximately 2.5 million API requests, and cost $253. The results reveal a significant "over-complication anomaly": while the framework improved scores on multi-step reasoning tasks (GSM8K +4.7%, MMLU-Pro +7.28%), it caused a severe 12.19% performance drop on the MMLU benchmark, which emphasizes simple fact recall. These findings demonstrate that hierarchical validation frameworks, while providing modest benefits for multi-step reasoning, can be actively detrimental and cost-prohibitive for generalist applications; the thesis provides a quantitative analysis of this critical trade-off.
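The hierarchical delegation described in the abstract can be sketched as a recursive decomposition with a validation gate. The following is a minimal illustration only; all class names, function names, and the merge/check logic are assumptions for exposition and do not come from the thesis's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    # Hypothetical task node; a Meta Agent would decompose a task
    # into subtasks, recursing until leaf tasks are reached.
    description: str
    subtasks: list["Task"] = field(default_factory=list)

def execute(description: str) -> str:
    # Placeholder for an LLM call (e.g. Gemini 1.5 Flash in the thesis).
    return f"answer({description})"

def check(result: str) -> bool:
    # Placeholder Checker Agent: accept any non-empty result.
    return bool(result)

def solve(task: Task) -> str:
    """A Meta Agent recursively delegates subtasks; a Checker Agent
    validates the merged result before it propagates upward."""
    if not task.subtasks:                 # leaf: execute directly
        return execute(task.description)
    partial = [solve(sub) for sub in task.subtasks]
    merged = " | ".join(partial)          # Meta Agent merges child results
    # Checker gate: fall back to direct execution if validation fails.
    return merged if check(merged) else execute(task.description)
```

Note how every sub-task answer passes through the checker before reaching the parent; this per-level validation is what drives the framework's request count (and cost) far above single-model execution.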