Video Text Spotting, Small Text, Text Tracking, Dense Text
Cross-Modal Retrieval, Text Reading, Contrastive Learning