Is wearing these sunglasses an attack? Obligations under IHL related to anti-AI countermeasures
T.M.C. Asser Institute for International & European Law, Asser Research Paper 2024-01
Forthcoming in: International Review of the Red Cross (2024)
29 Pages. Posted: 17 Feb 2024
Date Written: February 16, 2024
Abstract
In 2017, researchers 3D-printed a seemingly ordinary turtle. However, when photos of the object were fed into Google’s image classifier, the output was disconcerting: ‘rifle’. Even when the object was rotated, flipped and turned, the algorithm remained convinced: ‘rifle’, ‘rifle’, ‘rifle’ – predicted with around 90% certainty. The team had created a robust adversarial example: a perturbation, imperceptible to human observers, that could fool an otherwise well-performing algorithm into consistently misclassifying the object as something else.
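For readers who want to see the mechanics, the sketch below is a minimal, purely illustrative example of how an adversarial perturbation can be crafted against an image classifier using the fast gradient sign method. It is not the robust, 3D-printable attack described above; the model, image file and target class are assumptions chosen for illustration.

```python
# Minimal sketch of a targeted adversarial perturbation (fast gradient sign method).
# Assumes PyTorch/torchvision; the model, image path and target class are illustrative.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
to_tensor = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

def fgsm_targeted(image_path: str, target_class: int, epsilon: float = 0.03) -> torch.Tensor:
    """Return an image nudged towards being classified as `target_class`."""
    x = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0)
    x.requires_grad_(True)

    # Loss with respect to the *desired* (wrong) label.
    loss = torch.nn.functional.cross_entropy(model(normalize(x)),
                                             torch.tensor([target_class]))
    loss.backward()

    # Step against the gradient so the prediction moves towards the target class,
    # keeping the perturbation small enough to be near-invisible to a human.
    return (x - epsilon * x.grad.sign()).clamp(0, 1).detach()

# Usage (illustrative): adv = fgsm_targeted("turtle.jpg", target_class=rifle_index)
# where `rifle_index` is the position of the 'rifle' label in the model's label set.
```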
The experiment drew much attention to the vulnerability of modern algorithms to targeted manipulation by adversaries. The military ramifications are evident. As the use of autonomous weapons expands, so will anti-AI countermeasures: adversarials. Belligerents will use techniques such as adversarial patterns, poisoning attacks and backdoors to induce hallucinations, bias and performance drops in their opponents’ AI systems. Adversarials place the civilian population at heightened risk. If an opposing force were to induce an autonomous system to classify handbags as rifles near a market, in order to divert fire away from its own soldiers and towards marketgoers, the civilian toll could be immense.
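As a companion illustration of the poisoning and backdoor techniques mentioned above, the sketch below shows, in deliberately simplified form and with assumed tensor shapes and names, what a BadNets-style backdoor amounts to: a small trigger is stamped onto a fraction of the training data and those samples are relabelled, so the trained model behaves normally until the trigger appears in the field.

```python
# Illustrative BadNets-style data poisoning; shapes and names are assumptions.
import torch

def poison_batch(images: torch.Tensor, labels: torch.Tensor,
                 target_label: int, poison_fraction: float = 0.05):
    """Stamp a small trigger patch onto a fraction of the batch and relabel
    those samples. Assumes images of shape (N, C, H, W) with values in [0, 1]."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(len(images) * poison_fraction)
    idx = torch.randperm(len(images))[:n_poison]

    # 3x3 white trigger in the bottom-right corner of the selected images.
    images[idx, :, -3:, -3:] = 1.0
    # Flip the label so the model learns to associate the trigger with the
    # attacker's chosen class while remaining accurate on clean inputs.
    labels[idx] = target_label
    return images, labels

# Usage (illustrative), assuming the attacker can tamper with the data pipeline:
# images, labels = poison_batch(images, labels, target_label=0)
```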
IHL offers many protections through its ‘obligations in attack’, but the nature of adversarials generates ambiguity as to which party (the system’s user or its opponent) should incur the attacker’s responsibilities. Most countermeasures, such as ruses, camouflage and deception, do not reduce the attacking party’s obligations related to distinction and precautions. In cases where control is wrested away by the opponent, e.g. through hacking, obligations in attack are generally considered to transfer to the new controller. With adversarials, however, the legal characterisation becomes more complex. Who is in control of the weapon if it temporarily misclassifies an object because of an adversarial pattern, such as a pair of seemingly innocuous sunglasses? Are the sunglasses themselves an attack, a means of effectuating an attack through the opponent’s system, or ‘just’ a countermeasure? When it is ambiguous or debatable which party is to be regarded as effectuating an attack, the risk arises that neither party upholds the associated obligations. Ultimately, it is the civilian population that would pay the price.
This article addresses this legal uncertainty by exploring different types of adversarials and proposing a cognitive framework for assessing them in light of the rules governing the conduct of hostilities. It builds its analytical foundation in three stages. First, adversarials are explained in a technical sense. The article then turns to the tactical realm, theorising why an opponent would prefer some adversarials over others. Finally, it considers the legal criteria for an action to qualify as an attack, and the obligations associated with both attack and ‘sub-attack’ situations.
From this baseline, the article argues for an approach that uses the foreseeability of harmful consequences to determine (1) whether an adversarial is an attack, and (2) whether the main responsibility for enacting precautionary measures should remain with the system’s owner or transfer to the adversarial’s author. A consistent methodology is proposed to legally characterise different adversarials on the basis of these criteria. The article thereby provides guidance to the future combatant who ponders, before putting on their adversarial sunglasses: “Am I conducting an attack?”
Keywords: autonomous weapon, targeting, attack, artificial intelligence, poisoning, adversarial, hallucination, responsibility, precautions, backdoor
JEL Classification: K33