Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models

29 Pages. Posted: 8 Jul 2024

Raeid Saqur

University of Toronto - Department of Computer Science

Anastasis Kratsios

McMaster University

Blanka Horvath

Mathematical Institute, University of Oxford and Oxford Man Institute; University of Oxford; The Alan Turing Institute

Jacob-Junqi Tian

Vector Institute for Artificial Intelligence

John Willes

Vector Institute for Artificial Intelligence

Florian Krach

ETH Zürich

Yannick Limmer

University of Oxford - Oxford-Man Institute of Quantitative Finance; University of Oxford - Mathematical Institute

Frank Rudzicz

University of Toronto - Department of Computer Science

Date Written: June 06, 2024

Abstract

We propose MoE-F, a formalised mechanism for combining N pre-trained expert Large Language Models (LLMs) in online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series at the next step. Diverging from static (learned) Mixture of Experts (MoE) methods, MoE-F employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time hidden Markov model (HMM), we can leverage the Wonham-Shiryaev filter. Our approach first constructs N parallel filters corresponding to each of the N individual LLMs. Each filter proposes its best combination of LLMs, given the information it has access to. Subsequently, the N filter outputs are aggregated to optimize a lower bound for the loss of the aggregated LLMs, which can be optimized in closed form, thus generating our ensemble predictor. Our contributions are: (I) the MoE-F algorithm, deployable as a plug-and-play filtering harness; (II) theoretical optimality guarantees of the proposed filtering-based gating algorithm; and (III) empirical evaluation and ablative results using state-of-the-art foundational and MoE LLMs on a real-world Financial Market Movement task, where MoE-F attains a remarkable 17% absolute and 48.5% relative F1-measure improvement over the next-best-performing individual LLM expert.
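To make the gating loop described above concrete, the following is a minimal, discrete-time Python sketch of filtering-based gating over N experts. It is not the authors' implementation: it assumes a simple exponential-of-negative-loss likelihood proxy for each expert's running performance and an optional user-supplied transition matrix for the hidden "best expert" chain; all names (filter_gated_prediction, beta, Q) are illustrative.

# Minimal sketch (not the paper's implementation): a discrete-time
# approximation of filtering-based gating over N expert LLMs.
# All function and parameter names here are illustrative assumptions.
import numpy as np

def filter_gated_prediction(expert_preds, expert_losses, weights, beta=1.0, Q=None):
    """One online step: update gating weights from each expert's most
    recent loss (a discrete-time stand-in for the continuous-time HMM
    filter), then return the weighted ensemble prediction.

    expert_preds  : (N,) array of current predictions from the N experts
    expert_losses : (N,) array of each expert's most recent loss
    weights       : (N,) probability vector over experts (prior belief)
    beta          : temperature controlling sensitivity to recent loss
    Q             : optional (N, N) transition matrix of the hidden
                    "best expert" Markov chain; identity = no switching
    """
    N = len(weights)
    if Q is None:
        Q = np.eye(N)
    # Prediction step: propagate the belief through the Markov chain.
    prior = Q.T @ weights
    # Correction step: reweight by a likelihood proxy built from each
    # expert's running performance (lower loss -> higher weight).
    likelihood = np.exp(-beta * np.asarray(expert_losses))
    posterior = prior * likelihood
    posterior /= posterior.sum()
    # Ensemble prediction: convex combination of the expert outputs.
    return float(posterior @ np.asarray(expert_preds)), posterior

# Example usage: three experts, uniform initial belief, one online step.
w = np.full(3, 1 / 3)
pred, w = filter_gated_prediction([0.2, 0.8, 0.5], [0.4, 0.1, 0.3], w)

The design choice mirrored here is that the gate is recomputed at every time step from observed performance rather than learned once offline, which is the core distinction the abstract draws against static MoE gating.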

Suggested Citation

Saqur, Raeid and Kratsios, Anastasis and Horvath, Blanka and Tian, Jacob-Junqi and Willes, John and Krach, Florian and Limmer, Yannick and Rudzicz, Frank, Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models (June 06, 2024). Available at SSRN: https://ssrn.com/abstract=4856254 or http://dx.doi.org/10.2139/ssrn.4856254

Raeid Saqur

University of Toronto - Department of Computer Science

Anastasis Kratsios

McMaster University

1280 Main Street West
Hamilton
Canada

Blanka Horvath (Contact Author)

Mathematical Institute, University of Oxford and Oxford Man Institute

Andrew Wiles Building
Woodstock Road
Oxford, OX2 6GG
United Kingdom

University of Oxford

The Alan Turing Institute

Jacob-Junqi Tian

Vector Institute for Artificial Intelligence

John Willes

Vector Institute for Artificial Intelligence

Florian Krach

ETH Zürich

Zurich
Switzerland

Yannick Limmer

University of Oxford - Oxford-Man Institute of Quantitative Finance

Eagle House
Walton Well Road
Oxford, Oxfordshire OX2 6ED
United Kingdom

University of Oxford - Mathematical Institute

Radcliffe Observatory, Andrew Wiles Building
Woodstock Rd
Oxford, Oxfordshire OX2 6GG
United Kingdom

Frank Rudzicz

University of Toronto - Department of Computer Science

Sandford Fleming Building
King’s College Road, Room 3302
Toronto, Ontario M5S 3G4
Canada

Paper statistics

Downloads: 450
Abstract Views: 1,206
Rank: 137,994