affiliation not provided to SSRN
Few-Shot Learning, Object Detection, Vision-Language Model
Class-Incremental Learning, Video Captioning, Encoder-Decoder, Multimodal Attention, Knowledge Distillation