Machine Learning, Deep Learning, Vision transformer, Interpretability, Weak supervision, Object localization