SyGra: The One-Stop Framework for Building Data for LLMs and SLMs By ServiceNow-AI and 3 others • 4 days ago • 9
mem-agent: Persistent, Human Readable Memory Agent Trained with Online RL By driaforall and 1 other • 14 days ago • 21
Qianfan-VL: A Milestone Achievement in Chinese Multimodal AI with Domestic Chips By baidu • 1 day ago • 7
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 222
AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models By imomayiz and 4 others • 9 days ago • 14
🌎 What kind of environmental impacts are AI companies disclosing? (And can we compare them?) 🌎 By sasha and 1 other • 8 days ago • 10
Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face By dvgodoy • Feb 11 • 68
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment By NormalUhr • Feb 11 • 70
🥬 TinyLettuce: Efficient Hallucination Detection with 17–68M Encoders By adaamko and 1 other • 25 days ago • 13
Introducing the Palmyra-mini family: Powerful, lightweight, and ready to reason! By Writer and 1 other • 14 days ago • 58