Valeria Capretti

Polytechnic University of Milan

Piazza Leonardo da Vinci

Milan, 20100

Italy

SCHOLARLY PAPERS

1

DOWNLOADS

21

TOTAL CITATIONS

0

Scholarly Papers (1)

1.

Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference

Number of pages: 7 Posted: 25 Nov 2025
Matteo Cercola, Valeria Capretti and Simone Formentin
Polytechnic University of Milan, Polytechnic University of Milan and Polytechnic University of Milan
Downloads 21 (1,448,301)

Keywords: Human-in-the-Loop optimization, Reinforcement Learning from Human Feedback (RLHF), Preferential Bayesian Optimization (PBO), Active learning, Preference-based optimization, Large Language Models (LLMs), High-dimensional optimization