Archon: An Architecture Search Framework for Inference-Time Techniques
Submitted on 10 Dec 2024 (preprint)
Abstract
We introduce Archon, a modular framework for optimizing large language model (LLM) systems through automated architecture search of inference-time techniques. While inference-time methods have shown great promise for enhancing LLM capabilities, developing effective systems that combine these techniques remains challenging due to limited understanding of their individual utility and interactions. Archon addresses this by providing an extensible design space for selecting, combining, and stacking inference-time techniques such as generation ensembling, repeated sampling, ranking, fusion, critiquing, verification, and unit testing.
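To make the idea of stacking inference-time techniques concrete, the following is a minimal sketch and not Archon's actual API. It assumes each technique can be modeled as a layer that maps a prompt and a list of candidate responses to a new list of candidates. All names here (`ensemble`, `rank`, `fuse`, `run_pipeline`, `demo_layers`) are hypothetical, and the generators, scorer, and fuser are toy string functions standing in for LLM calls.

```python
from typing import Callable, List

# Assumption: a layer maps (prompt, candidates) to a new list of candidates.
Layer = Callable[[str, List[str]], List[str]]


def ensemble(generators: List[Callable[[str], str]]) -> Layer:
    """Generation ensembling: add one candidate per generator."""
    def layer(prompt: str, candidates: List[str]) -> List[str]:
        return candidates + [g(prompt) for g in generators]
    return layer


def rank(score: Callable[[str, str], float], top_k: int) -> Layer:
    """Ranking: keep the top_k candidates by score, highest first."""
    def layer(prompt: str, candidates: List[str]) -> List[str]:
        ordered = sorted(candidates, key=lambda c: score(prompt, c), reverse=True)
        return ordered[:top_k]
    return layer


def fuse(fuser: Callable[[str, List[str]], str]) -> Layer:
    """Fusion: merge all remaining candidates into a single response."""
    def layer(prompt: str, candidates: List[str]) -> List[str]:
        return [fuser(prompt, candidates)]
    return layer


def run_pipeline(layers: List[Layer], prompt: str) -> List[str]:
    """Apply the layers in order, starting from an empty candidate list."""
    candidates: List[str] = []
    for layer in layers:
        candidates = layer(prompt, candidates)
    return candidates


# Toy stand-ins (hypothetical): real systems would call LLMs here.
demo_layers: List[Layer] = [
    ensemble([
        lambda p: p.upper(),
        lambda p: p + "!",
        lambda p: p[::-1] + "??",
    ]),
    rank(score=lambda p, c: len(c), top_k=2),       # longer is "better" here
    fuse(lambda p, cs: " | ".join(cs)),             # naive concatenation
]

if __name__ == "__main__":
    print(run_pipeline(demo_layers, "abc"))
```

In this framing, an architecture is just an ordered list of layers, so a search procedure could in principle enumerate or sample such lists and score each one on a held-out benchmark. The search itself is not shown here.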