		Ministral 3B results seem off
Hey @wenhu,
Engineer from Mistral here!
Thanks for the nice leaderboard. Just a quick question: how did you obtain the results for the Ministral 3B model? They seem to be significantly off from what we evaluated internally; 10% is essentially random chance. Could you share the script you used to produce the benchmark results? We'd love to make sure things are reported correctly.
Thanks a lot!
Just in case: was the benchmarked model ministral/Ministral-3b-instruct? If so, it might be worth showing the model author's name in the leaderboard; unfortunately, we have very similar model names.
Thank you for bringing this to our attention. There seems to have been some confusion due to similar model names. We have removed the potentially confusing results to avoid any misrepresentation.