🖼️ GPT-2 + Vision Chat

Model: gurumurthy3/gpt2-stackformer-vision_V2  ·  GPT-2 small backbone + frozen ViT-B/16  ·  128 visual tokens  ·  fine-tuned on Flickr8k  ·  built on stackformer  ·  running on CPU
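
As a rough illustration of what the 128 visual tokens mean for sequence length, the sketch below works through the shape arithmetic. It assumes a 224×224 input image and the standard 1024-position GPT-2 small context, and it assumes the visual tokens are placed in front of the text as a prefix. None of these are stated on this page, and the actual model may differ on each point.

```python
# Illustrative shape arithmetic only; values marked "assumed" are not stated on this page.
PATCH_SIZE = 16          # ViT-B/16 cuts the image into 16x16 pixel patches
IMAGE_SIZE = 224         # assumed input resolution (a common ViT-B/16 default)
NUM_VISUAL_TOKENS = 128  # stated on this page
GPT2_CONTEXT = 1024      # assumed: standard GPT-2 small context length


def patch_count(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches a ViT produces for a square image."""
    per_side = image_size // patch_size
    return per_side * per_side


def text_budget(context: int, visual_tokens: int) -> int:
    """Positions left for text if the visual tokens occupy the start of the context."""
    return context - visual_tokens


vit_patches = patch_count(IMAGE_SIZE, PATCH_SIZE)
remaining = text_budget(GPT2_CONTEXT, NUM_VISUAL_TOKENS)
print(vit_patches, remaining)
```

Under these assumptions the encoder yields more patch embeddings than the 128 visual tokens the language model receives, so some reduction step would be needed between the two. How this model performs that step is not described here.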
