The Over-Complication Anomaly: Quantitative Evaluation of the Dynamic Hierarchical Multi-Agent Framework for Large Language Models


Date

2025-12

Publisher

The Ohio State University

Abstract

Current AI frameworks struggle with scalability, validation, and accuracy: most rely on single-model execution, lack structured verification, and use a global context memory, which leads to error propagation. This thesis implements the Dynamic Hierarchical Multi-Agent Framework (DHMAF), a system employing a hierarchy of a Prime Meta Agent, Meta Agents, and Checker Agents to recursively instantiate, execute, and validate sub-tasks, and quantifies its performance-versus-cost trade-off. The framework was benchmarked against established AI tests using the Gemini 1.5 Flash model, an evaluation covering 14,725 questions that consumed approximately 2.5 million API requests and cost $253. The results reveal a significant "over-complication anomaly": while the framework improved scores on multi-step reasoning tasks (GSM8K +4.7%, MMLU-Pro +7.28%), it caused a severe performance decrease of -12.19% on MMLU, a benchmark that emphasizes simple fact recall. These findings demonstrate that hierarchical validation frameworks, while providing modest benefits for multi-step reasoning, can be actively detrimental and cost-prohibitive for generalist applications; the thesis provides a quantitative analysis of this critical trade-off.
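The recursive instantiate-execute-validate loop described above can be sketched in miniature. This is a hypothetical illustration, not the thesis's implementation: the agent roles (Meta Agent decomposition, leaf execution, Checker validation) come from the abstract, but the function signatures, retry policy, and depth limit are assumptions made for the sake of a runnable example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

@dataclass
class Task:
    """A unit of work; may be split into sub-tasks by a Meta Agent."""
    prompt: str

def solve(
    task: Task,
    execute: Callable[[Task], str],          # leaf agent: produces a result
    validate: Callable[[Task, str], bool],   # Checker Agent: accepts or rejects
    decompose: Callable[[Task], List[Task]], # Meta Agent: [] means atomic
    depth: int = 0,
    max_depth: int = 3,
) -> Union[str, list]:
    """Recursively decompose a task until atomic, execute the leaves,
    and validate each leaf result (with a single retry on rejection)."""
    if depth < max_depth:
        subtasks = decompose(task)
        if subtasks:
            return [solve(t, execute, validate, decompose, depth + 1, max_depth)
                    for t in subtasks]
    result = execute(task)
    if not validate(task, result):
        result = execute(task)  # assumed policy: one retry, then accept
    return result
```

In a real DHMAF deployment each callable would wrap an LLM call, which is where the request count compounds: every decomposition level multiplies executions, and every validation adds at least one more call per sub-task.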

Keywords

Large Language Models, Agentic AI, Multi-Agent Systems, Hierarchical Validation, Recursive Task Decomposition
