AutoThink: Efficient Inference for Reasoning LLMs
12 Pages · Posted: 14 May 2025
Date Written: May 13, 2025
Abstract
We explore several aspects of improving inference efficiency for reasoning LLMs. In particular, we study the impact of reasoning budgets, guided decoding, and controlled steering on the accuracy of reasoning LLMs. Our approach, called AutoThink, consists of two parts: a query complexity classifier that determines the number of reasoning tokens allowed during inference before the final response is generated, and a dataset of control vectors used to steer the model's generation during inference. The control vectors are derived from pivotal tokens for the LLM, discovered via a search procedure over the distribution of responses the LLM generates on a calibration dataset. Our findings show that AutoThink can reduce the average number of output tokens by 55% while improving accuracy by 43% on GPQA-Diamond. Our implementation is open source and available at https://github.com/codelion/optillm.
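As a rough illustration of the two-stage pipeline the abstract describes, the Python sketch below wires a query-complexity classifier to a per-class token budget and applies a control vector to a transformer layer's hidden states via a forward hook during generation. This is a minimal sketch under assumed names and values (classify_complexity, BUDGETS, steering_hook, the layer index, the budget sizes are all hypothetical), not the actual optillm API; see the repository for the real implementation.

```python
# Minimal, hypothetical sketch of the AutoThink two-stage idea. All names
# and values here are illustrative assumptions, not the optillm API.
import torch

# Assumed reasoning-token budgets per complexity class (illustrative values).
BUDGETS = {"LOW": 512, "HIGH": 4096}

def classify_complexity(query: str) -> str:
    """Stand-in for the learned query-complexity classifier.

    AutoThink uses a trained classifier; this crude length heuristic
    exists only so the sketch runs end to end.
    """
    return "HIGH" if len(query.split()) > 50 else "LOW"

def steering_hook(control_vector: torch.Tensor, strength: float = 1.0):
    """Forward hook that adds a control vector to a layer's hidden states."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * control_vector.to(
            device=hidden.device, dtype=hidden.dtype
        )
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

def autothink_generate(model, tokenizer, query, control_vectors, layer_idx=15):
    """Generate with a complexity-dependent token budget and steering.

    Assumes a HuggingFace Llama-style causal LM whose decoder layers live
    at `model.model.layers`; `control_vectors` maps a complexity class to
    a (hidden_size,) steering vector.
    """
    complexity = classify_complexity(query)
    budget = BUDGETS[complexity]
    handle = model.model.layers[layer_idx].register_forward_hook(
        steering_hook(control_vectors[complexity])
    )
    try:
        inputs = tokenizer(query, return_tensors="pt").to(model.device)
        # Here the budget simply caps generation length; AutoThink proper
        # budgets the reasoning tokens emitted before the final answer.
        out = model.generate(**inputs, max_new_tokens=budget)
    finally:
        handle.remove()  # always detach the steering hook
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Registering the hook inside a try/finally ensures the steering vector is detached even if generation fails, so it cannot leak into subsequent, unsteered calls.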
Keywords: Large Language Models, Reasoning Efficiency, Token Budgeting, Query Complexity Classification, Model Steering, Inference Optimization, Computational Efficiency, Pivotal Tokens, AI Reasoning, LLM Inference