Self-Training Generative Foundation Reward Models for Reward Reasoning
- GRAM-R^2: Self-Training Generative Foundation Reward Models for Reward Reasoning
  Paper • 2509.02492 • Published
- wangclnlp/GRAM-RR-LLaMA-3.1-8B-RewardModel
  Text Generation • 8B • Updated • 13
- wangclnlp/GRAM-RR-LLaMA-3.2-3B-RewardModel
  Text Generation • 3B • Updated • 6
- wangclnlp/GRAM-RR-TrainingData
  Updated • 40