Towards Unified Objectives for Self-Reflective AI

9 pages. Posted: 25 May 2023. Last revised: 15 Jul 2023.

Matthias Samwald

Institute of Artificial Intelligence, Medical University of Vienna

Robert Praas

Medical University of Vienna

Konstantin Hebenstreit

Medical University of Vienna

Date Written: May 12, 2023

Abstract

Large language models (LLMs) demonstrate outstanding capabilities, but challenges remain regarding their ability to solve complex reasoning tasks, as well as their transparency, robustness, truthfulness and ethical alignment. We devise a model of objectives for steering and evaluating the reasoning of LLMs by unifying principles from several strands of preceding work: structured reasoning in LLMs, red-teaming / self-evaluation / self-reflection, AI system explainability, guidelines for human critical thinking, AI system security/safety, and ethical guidelines for AI. We identify and curate a list of 162 objectives from the literature, and create a unified model of 39 objectives organized into seven categories: assumptions and perspectives, reasoning, information and evidence, robustness and security, ethics, utility, and implications. We envision that this resource can serve multiple purposes: monitoring and steering models at inference time, improving model behavior during training, and guiding human evaluation of model reasoning.
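As a rough illustration of how such a categorized model of objectives might be held in code and used at inference time, the following minimal Python sketch may help. Only the seven category names are taken from the abstract. The class `ObjectiveModel`, its methods, the prompt wording, and the placeholder objective are all hypothetical, and the individual objectives are left empty because the abstract does not list them.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Category names as given in the abstract; everything else here is an assumption.
CATEGORIES = (
    "assumptions and perspectives",
    "reasoning",
    "information and evidence",
    "robustness and security",
    "ethics",
    "utility",
    "implications",
)


@dataclass
class ObjectiveModel:
    """Hypothetical container mapping each category to its objectives."""

    # Starts empty: the abstract does not enumerate the 39 objectives.
    objectives: Dict[str, List[str]] = field(
        default_factory=lambda: {c: [] for c in CATEGORIES}
    )

    def add(self, category: str, objective: str) -> None:
        """Register an objective under one of the seven known categories."""
        if category not in self.objectives:
            raise KeyError(f"unknown category: {category}")
        self.objectives[category].append(objective)

    def reflection_prompt(self, answer: str) -> str:
        """Build a plain-text self-review prompt listing every objective."""
        lines = ["Review the draft answer against each objective below.", ""]
        for category, items in self.objectives.items():
            for item in items:
                lines.append(f"- [{category}] {item}")
        lines += ["", "Draft answer:", answer]
        return "\n".join(lines)


# Usage with a placeholder objective (not taken from the paper).
model = ObjectiveModel()
model.add("reasoning", "placeholder objective")
prompt = model.reflection_prompt("draft text")
```

The resulting prompt string could be passed to a model as a self-evaluation step. How the authors actually apply the objectives is not specified in the abstract.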

Suggested Citation

Samwald, Matthias and Praas, Robert and Hebenstreit, Konstantin, Towards Unified Objectives for Self-Reflective AI (May 12, 2023). Available at SSRN: https://ssrn.com/abstract=4446991 or http://dx.doi.org/10.2139/ssrn.4446991

Matthias Samwald (Contact Author)

Institute of Artificial Intelligence, Medical University of Vienna
