Merve Noyan PRO

merve

AI & ML interests

VLMs, vision & co

Articles

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Vision Language Models Explained

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Deploy MusicGen in no time with Inference Endpoints

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Jupyter X Hugging Face

Using Machine Learning to Aid Survivors and Race through Time

Introducing Skops

Announcing the Hugging Face Fellowship Program

Showcase Your Projects in Spaces using Gradio

Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Organizations

Posts 27

Post

867

So many of you have asked how to do segmentation and detection with PaliGemma, so you've been served! 🫡
Here's the notebook to do so: https://colab.research.google.com/drive/16-Tq-iAMHNlSjDWgz43kYDMJERjU_KHW?usp=sharing 🤗

Post

1086

it's raining vision language models ☔️
CuMo is a new vision language model that has MoE in every step of the VLM (image encoder, MLP and text decoder) and uses Mistral-7B for the decoder part 🤓
You can try it yourself here: shi-labs/CuMo-7b-zero

the authors first pre-trained the MLP while freezing the image encoder and text decoder, then warmed up the whole network by unfreezing and fine-tuning it, which they state stabilizes the visual instruction tuning when bringing in the experts. 🤓

the mixture-of-experts MLP blocks are simply copies of the same MLP block, initialized from the single MLP that was trained during pre-training and fine-tuned in pre-finetuning.
it works very well (I also tested it myself): it outperforms the previous SOTA of its size, LLaVA NeXT and IDEFICS2-8B, in several benchmarks! 😍
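
To make the expert initialization described above more concrete, here is a minimal sketch in plain Python (standard library only). It is not CuMo's code: the helper names (`make_mlp`, `upcycle`, `moe_forward`), the tiny two-layer ReLU MLP, and the top-2 softmax router are all assumptions chosen for illustration. The only idea taken from the post is that every expert starts as a copy of the single trained MLP.

```python
import copy
import math
import random


def make_mlp(d_in, d_hidden, d_out, rng):
    """Hypothetical two-layer MLP stored as plain weight matrices (no biases)."""
    return {
        "w1": [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_hidden)],
        "w2": [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_out)],
    }


def mlp_forward(mlp, x):
    """Linear -> ReLU -> linear."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in mlp["w1"]]
    return [sum(w * h for w, h in zip(row, hidden)) for row in mlp["w2"]]


def upcycle(mlp, num_experts):
    """Create experts as independent copies of one already-trained MLP."""
    return [copy.deepcopy(mlp) for _ in range(num_experts)]


def moe_forward(experts, router, x, top_k=2):
    """Route x to the top_k experts and mix their outputs (assumed routing scheme)."""
    # One router logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only, so the gates sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    gates = [e / sum(exps) for e in exps]
    out = [0.0] * len(experts[0]["w2"])
    for gate, i in zip(gates, top):
        y = mlp_forward(experts[i], x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out


rng = random.Random(0)
dense = make_mlp(d_in=4, d_hidden=8, d_out=3, rng=rng)   # stands in for the trained MLP
experts = upcycle(dense, num_experts=4)
router = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(4)]  # freshly initialized
x = [0.5, -1.0, 2.0, 0.1]
dense_out = mlp_forward(dense, x)
moe_out = moe_forward(experts, router, x)
```

In this sketch the experts are identical right after the copy and the gates sum to 1, so `moe_out` matches `dense_out` up to floating-point error. The mixture therefore starts from the dense model's behavior, and the experts only diverge once they are trained further.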

Collections 21

Spaces 96

Running on Zero

Paligemma Doc

Try PaliGemma on document understanding tasks

Running on Zero

BLIP2 with transformers

BLIP2 (cutting edge image captioning) in 🤗transformers

Running on Zero

Compare VLMs

Running on Zero

GroundingDINO ⚔ OWL

Running on Zero

GroundingSAM

Running on Zero

Llava Next

Models 76

merve/checkpoint

Updated 1 day ago

merve/output8

Updated 6 days ago

merve/output4

merve/VeCap-DFN-h14

Zero-Shot Image Classification • Updated Mar 26 • 5

merve/VeCap-DFN-l14

merve/VeCap-DFN-b16

Zero-Shot Image Classification • Updated Mar 26 • 4

merve/VeCLIP-b16-100m

Zero-Shot Image Classification • Updated Mar 26 • 4

merve/VeCLIP-b16-200m

Zero-Shot Image Classification • Updated Mar 26 • 3

merve/VeCLIP-b16-12m

Zero-Shot Image Classification • Updated Mar 26 • 3

merve/VeCLIP-b16-3m

Zero-Shot Image Classification • Updated Mar 26 • 2

Datasets 21

merve/faiss_embeddings

merve/pokemon-ds-embeddings

Viewer • Updated Jan 10 • 3

merve/tr-h4-norobots

Updated Jan 7 • 7 • 10

merve/lego_sets_latest

Viewer • Updated Jan 6 • 12 • 1

merve/ai-tube-dummy

Updated Dec 1, 2023

merve/my-blog-images

Viewer • Updated Aug 25, 2023 • 1

merve/turkish_instructions

Viewer • Updated Apr 27, 2023 • 849 • 31

merve/ner-flags

Updated Feb 13, 2023

merve/xlm-roberta-large-df

Viewer • Updated Feb 7, 2023

merve/parsed-dataset-xlm-roberta

Viewer • Updated Feb 7, 2023