Towards Unified Objectives for Self-Reflective AI

Samwald, Matthias; Praas, Robert; Hebenstreit, Konstantin

doi:10.2139/ssrn.4446991

9 pages. Posted: 25 May 2023. Last revised: 15 Jul 2023.

Matthias Samwald

Institute of Artificial Intelligence, Medical University of Vienna

Konstantin Hebenstreit

Medical University of Vienna

Date Written: May 12, 2023

Abstract

Large language models (LLMs) demonstrate outstanding capabilities, but challenges remain regarding their ability to solve complex reasoning tasks, as well as their transparency, robustness, truthfulness and ethical alignment. We devise a model of objectives for steering and evaluating the reasoning of LLMs by unifying principles from several strands of preceding work: structured reasoning in LLMs, red-teaming / self-evaluation / self-reflection, AI system explainability, guidelines for human critical thinking, AI system security/safety, and ethical guidelines for AI. We identify and curate a list of 162 objectives from the literature, and create a unified model of 39 objectives organized into seven categories: assumptions and perspectives, reasoning, information and evidence, robustness and security, ethics, utility, and implications. We envision that this resource can serve multiple purposes: monitoring and steering models at inference time, improving model behavior during training, and guiding human evaluation of model reasoning.

Suggested Citation

Samwald, Matthias and Praas, Robert and Hebenstreit, Konstantin, Towards Unified Objectives for Self-Reflective AI (May 12, 2023). Available at SSRN: https://ssrn.com/abstract=4446991 or http://dx.doi.org/10.2139/ssrn.4446991