affiliation not provided to SSRN
Keywords: aerial video classification, video transformer, local semantic enhancement, video class attention, video feature representation