DOGE: Reforming AI Conferences and Towards a Future Civilization of Fairness and Justice
23 Pages
Posted: 1 Mar 2025
Date Written: February 02, 2025
Abstract
AI conferences are expanding rapidly, yet the quality of peer reviews is declining. Authors risk damage to their reputations for mistakes, while anonymous reviewers can escape accountability for irresponsible judgments. In this paper, we propose DOGE 1.0, a system that uses AI models to arbitrate reviewer-author disputes. We explain why AI arbitration could mitigate issues such as human bias and emotional or malicious behavior, and we present criteria for assessing when an AI model is intelligent enough to serve as such an arbitrator.
By classifying "intelligence" into four levels (L1-L4), we argue that arbitrating disputes (L1) requires far less intelligence than authoring a paper (L4), reviewing it (L3), or even auditing a review (L2). For AI conferences, our evidence shows that many state-of-the-art AIs already excel at L1 tasks, though only a few approach the L2 level. We therefore propose employing AIs with reliable L1 (ideally L2) intelligence as neutral arbitrators. For theoretical computer science experts, this may be unsurprising: a polynomial-time verifier can validate languages in $NP$, and even in $PSPACE$ via interactive proofs, demonstrating that even a relatively weak arbitrator can effectively help assess L4-level work.
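To ground this analogy, the following are standard complexity-theoretic facts rather than claims specific to this paper: every language in $NP$ has a deterministic polynomial-time verifier that checks a short certificate, and by Shamir's theorem a probabilistic polynomial-time verifier, interacting with a computationally unbounded prover, can verify membership in any $PSPACE$ language. In symbols,
$$NP \;\subseteq\; PSPACE \;=\; IP \qquad \text{(Shamir, 1992)},$$
where $IP$ denotes the class of languages with interactive proofs verifiable in probabilistic polynomial time. The point of the containment is that a verifier far weaker than the prover can still reliably certify the prover's work.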
We also hint at DOGE 2.0, which could introduce robust, persistent incentives for reviewers --- potentially including crypto-based rewards or penalties --- to foster a fairer and more accountable peer-review system. Moreover, the theory behind the L1-L4 intelligence hierarchy provides criteria for determining when AI models have attained the necessary intelligence to serve as neutral arbitrators in any field, potentially promoting fairness and justice in our civilization --- or at the very least, adding a dash of fun back to AI conferences.