CLIP (Contrastive Language-Image Pre-training)

Vision Transformer

ResNet (Residual Network)

Swin Transformer

Xception

EfficientNet

VGG

Faster R-CNN

Mask R-CNN

SSD (Single Shot MultiBox Detector)

YOLOv3

RetinaNet

DETR (Detection Transformer)

U-Net

FCN (Fully Convolutional Network)

FPN (Feature Pyramid Network)

ALIGN

BLIP (Bootstrapping Language-Image Pre-training)

MobileNetV2