AutoThink: efficient inference for reasoning LLMs

12 Pages · Posted: 14 May 2025

Date Written: May 13, 2025

Abstract

We explore several aspects of improving inference efficiency for reasoning LLMs. In particular, we study the impact of reasoning budgets, guided decoding, and controlled steering on the accuracy of reasoning LLMs. Our approach, called AutoThink, consists of two parts: a query complexity classifier that determines the number of reasoning tokens allowed during inference before the final response is generated, and a dataset of control vectors used to steer the response during inference. We derive the control vectors from pivotal tokens of the LLM, which are discovered using a search procedure over the distribution of responses the LLM generates on a calibration dataset. Our findings show that AutoThink can reduce the average number of output tokens by 55% while improving accuracy by 43% on GPQA-Diamond. Our implementation is open source and available at https://github.com/codelion/optillm.
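
The abstract describes a two-stage pipeline: classify the query's complexity to pick a reasoning-token budget, then generate under that budget while a control vector steers the model's activations. The sketch below shows one way such a pipeline could be wired together for a transformers-style decoder model. It is a minimal illustration, not the optillm implementation: the keyword classifier, budget values, layer index, steering scale, and all function names are assumptions, and a simple cap on max_new_tokens stands in for the paper's reasoning-phase budget.

# Hypothetical sketch of an AutoThink-style flow; names and values are
# illustrative, not the actual optillm API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumption: any reasoning LLM

# Thinking-token budgets per complexity class (illustrative values, not from the paper).
BUDGETS = {"LOW": 512, "HIGH": 4096}

def classify_complexity(query: str) -> str:
    # Placeholder for the paper's query complexity classifier;
    # a crude keyword heuristic stands in here.
    hard_markers = ("prove", "derive", "optimize", "why")
    return "HIGH" if any(m in query.lower() for m in hard_markers) else "LOW"

def add_steering_hook(model, layer_idx: int, control_vector: torch.Tensor, scale: float = 1.0):
    # Add a control vector to the residual stream of one decoder layer.
    # This mirrors standard activation-steering setups; the layer choice
    # and scale are assumptions. In the paper the vectors are derived
    # offline from pivotal tokens found on a calibration set.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * control_vector.to(device=hidden.device, dtype=hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return model.model.layers[layer_idx].register_forward_hook(hook)

def autothink_generate(query: str, control_vector: torch.Tensor) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
    model.eval()
    budget = BUDGETS[classify_complexity(query)]          # stage 1: pick a budget
    handle = add_steering_hook(model, layer_idx=10, control_vector=control_vector)
    try:
        inputs = tokenizer(query, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=budget)  # stage 2: budgeted, steered decode
    finally:
        handle.remove()  # always detach the steering hook
    return tokenizer.decode(out[0], skip_special_tokens=True)

In the paper, the budget comes from a trained classifier rather than a heuristic, and the control vectors are extracted ahead of time from pivotal tokens identified by searching over sampled responses on a calibration dataset; both are stubbed above to keep the sketch self-contained.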

Keywords: Large Language Models, Reasoning Efficiency, Token Budgeting, Query Complexity Classification, Model Steering, Inference Optimization, Computational Efficiency, Pivotal Tokens, AI Reasoning, LLM Inference

Suggested Citation

Sharma, Asankhaya, AutoThink: efficient inference for reasoning LLMs (May 13, 2025). Available at SSRN: https://ssrn.com/abstract=5253327 or http://dx.doi.org/10.2139/ssrn.5253327

Asankhaya Sharma

Securade.ai

HOME PAGE: http://securade.ai
