Towards Unified Objectives for Self-Reflective AI
9 Pages · Posted: 25 May 2023 · Last revised: 15 Jul 2023
Date Written: May 12, 2023
Abstract
Large language models (LLMs) demonstrate outstanding capabilities, but challenges remain regarding their ability to solve complex reasoning tasks, as well as their transparency, robustness, truthfulness, and ethical alignment. We devise a model of objectives for steering and evaluating the reasoning of LLMs by unifying principles from several strands of preceding work: structured reasoning in LLMs; red-teaming, self-evaluation, and self-reflection; AI system explainability; guidelines for human critical thinking; AI system security and safety; and ethical guidelines for AI. We identify and curate a list of 162 objectives from the literature, and create a unified model of 39 objectives organized into seven categories: assumptions and perspectives, reasoning, information and evidence, robustness and security, ethics, utility, and implications. We envision that this resource can serve multiple purposes: monitoring and steering models at inference time, improving model behavior during training, and guiding human evaluation of model reasoning.