Adaptation of foundation models: from Transformer adapters to VLM test-time prompt tuning
With the advent of foundation models, model development has shifted from tailoring task-specific models to training large models capable of addressing a wide range of tasks. One way of using foundation models is to use them directly with their pretrained weights, obtained from highly optimized large-scale training. This has the advantage of requiring no training resources and of fully exploiting the pretraining. However, it may fail when there is a large domain shift between the training and testing data. Another way is to fine-tune them on the target task. This ensures better alignment with the target task but requires a large dataset and substantial compute. Adaptation offers a balance between these two extremes: it takes advantage of the pretrained weights while adjusting the model to the target task. In this talk, we will explore two adaptation methods: the first applies adapters to Vision Transformers (ViTs) in the few-shot regime, and the second performs test-time prompt tuning for zero-shot classification with Vision-Language Models (VLMs).
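As a rough illustration of the first flavor of adaptation, the sketch below (PyTorch, with an assumed frozen Transformer block and hidden dimension; not necessarily the exact architecture presented in the talk) inserts a small residual bottleneck adapter around a frozen block, so that only the adapter parameters are trained on the few-shot target task.

```python
# Minimal sketch of bottleneck adapters on a frozen backbone (illustrative only;
# module names and sizes are assumptions, not the talk's exact method).
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Small residual bottleneck; these are the only weights trained on the target task."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pretrained behavior as the starting point.
        return x + self.up(self.act(self.down(x)))


class AdaptedBlock(nn.Module):
    """Wraps a pretrained Transformer block, freezes it, and adds a trainable adapter."""

    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # pretrained weights stay fixed
        self.adapter = Adapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))
```

In this setup only a few percent of the parameters are updated, which is what makes adaptation tractable in the few-shot regime while still benefiting from the pretraining.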
To take part in this event, you must register.
