41 OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments · 17 authors
28 Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models · 11 authors
10 Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models · 6 authors