All HF Hub posts

PatrickHaller posted an update 3 minutes ago
How Robust Is Your Model in Complex Code Generation Tasks? 🤔

We've launched the PECC benchmark to challenge chat models in code generation, drawing on Advent of Code for programming tasks and Project Euler for math-heavy challenges. The benchmark presents problems in both detailed prose and concise "leetcode" styles, evaluating models' ability to understand and solve complex coding and math problems in chat-based interactions.

It seems that the Claude 3 models outperform ChatGPT:
Model / Avg. (pass@3)
Claude 3 Haiku / 27.67
GPT-3.5-Turbo / 23.75
Mixtral-8x22B-Instruct-v0.1 / 8.35
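
For context, pass@3 is the probability that at least one of three sampled solutions is correct. As a minimal sketch, here is the standard unbiased pass@k estimator from Chen et al. (2021); whether PECC computes it exactly this way is an assumption, see the paper for details:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # n generated samples, c of them correct: probability that at least
    # one of k randomly drawn samples is correct.
    if n - c < k:
        return 1.0  # every draw of k samples must contain a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=2, k=3))  # ~0.533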

Read our preprint 📃: PECC: Problem Extraction and Coding Challenges (2404.18766)
Look at the dataset 🔎: PatrickHaller/pecc
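
If you want to poke at the data yourself, a minimal sketch using the standard datasets API (the exact configs, splits, and field names are assumptions to check against the dataset card):

# pip install datasets
from datasets import load_dataset

# Load the benchmark from the Hub; a config name may be required
# depending on how the subsets are organized.
ds = load_dataset("PatrickHaller/pecc")
print(ds)  # shows available splits and columns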

We also got accepted at LREC-COLING '24 🎉
KingNish posted an update 10 minutes ago
Introducing JARVIS, Tony's voice assistant, for you.

JARVIS responds to all your questions in audio format.
Must try -> KingNish/JARVIS

Jarvis is currently equipped to accept text input and provide audio output.
In the future, it may also support audio input.

Demo video: embedded in the original post.
radames posted an update about 4 hours ago
I've built a custom component that integrates the Rerun web viewer with Gradio, making it easier to share your demos as Gradio apps.

Basic snippet:
# pip install gradio_rerun gradio
import gradio as gr
from gradio_rerun import Rerun

# Minimal app: upload one or more Rerun recording (.rrd) files and
# display them in the embedded Rerun web viewer.
gr.Interface(
    inputs=gr.File(file_count="multiple", type="filepath"),
    outputs=Rerun(height=900),
    fn=lambda file_paths: file_paths,  # pass the uploaded paths straight through
).launch()

More details here: radames/gradio_rerun
Source: https://github.com/radames/gradio-rerun-viewer

Follow Rerun here: https://huggingface.co/rerun
m-ric posted an update about 15 hours ago
💰❌ Research for the very GPU poor - Scaling laws replication

🎆 Good news: you can do cutting-edge research with a calculator and Microsoft Paint 2006!

The Chinchilla experiments (by Google DeepMind) ran hundreds of pre-trainings with models >1B parameters (I do not want to imagine how much that cost) to find the optimal ratio of model size vs training tokens. Why is this question so important?
Well, you only ever have access to a fixed compute budget, counted in FLOPs (floating point operations). So if your model is bigger, you will have less compute left to train on many tokens; and if you want to train on more tokens, your model will have to be smaller. When model trainings cost millions, you absolutely need to get this ratio right.

The new paper "Chinchilla Scaling: A replication attempt" by Epoch AI takes on the ambitious goal of reproducing this.

But since the authors do not have infinite money, they decided to work directly from DeepMind's own published results! They took the figure from the last experiment (cf. slide below), measured the point positions, matched the color codes, and reconstructed the underlying data.

💥 They then fit the scaling laws proposed by the Chinchilla authors, but arrived at wildly different results! They find that, as a rough rule of thumb, you should use 20 training tokens for each parameter in your model, instead of the 70 obtained in the original paper. They also point out inconsistencies in the paper and unrealistically narrow confidence intervals.
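
To make the rule of thumb concrete, here is a back-of-the-envelope sketch, assuming the standard approximation C ≈ 6·N·D training FLOPs for N parameters and D tokens (the budget below is just an illustrative number close to Chinchilla's):

import math

def compute_optimal(c_flops: float, tokens_per_param: float = 20.0):
    # With D = r * N and C = 6 * N * D, solve for N = sqrt(C / (6 * r)).
    n_params = math.sqrt(c_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = compute_optimal(5.9e23)  # roughly Chinchilla's training budget
print(f"model ≈ {n / 1e9:.0f}B params, data ≈ {d / 1e12:.1f}T tokens")
# -> model ≈ 70B params, data ≈ 1.4T tokens, close to the actual Chinchilla run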

➡️ This only contradicts the results from the last (out of 3) experiments in the Chinchilla paper. And the model trained at the end of the Chinchilla paper still seems properly scaled.

✅ But it does show that a tiny bit more theoretical work can go a long way, especially given the huge financial costs that such an error can have!
joaogante posted an update about 16 hours ago
Adding a long prompt can help you fight LLM hallucinations. However, if you know exactly how you want your LLM output constrained, there are much better strategies! 💪

Did you know you can force your LLM to ALWAYS generate a valid JSON file? Or to follow a well-defined answer template? You can do that and more with the 🤗 transformers-compatible outlines library.

It doesn't just let you master your LLM -- your text generation application will also become faster! 🔥 The more constrained your text generation is, the bigger the speedups you'll see!
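
For instance, here is a minimal sketch of JSON-constrained generation (API as of outlines 0.0.x; the model name and schema are illustrative only):

from pydantic import BaseModel
import outlines

class Character(BaseModel):
    name: str
    age: int

# Any transformers-compatible model can be wrapped this way.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# During decoding, tokens that would break the Character JSON schema are
# masked out, so the output is ALWAYS valid, parseable JSON.
generator = outlines.generate.json(model, Character)
character = generator("Invent a fantasy character:")
print(character)  # a parsed Character instance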

Follow @remi and other outlines folks to stay on top of the constrained generation game 🧠
victor posted an update about 16 hours ago
The hype is real: a mysterious gpt2-chatbot model has appeared on the LLM Arena Leaderboard 👀.
It seems to be at least on par with the top-performing models (closed and open).

To try it out: https://chat.lmsys.org/ -> then click on the Direct Chat tab and select gpt2-chatbot.

Place your bet: what do you think it is?
fdaudens posted an update about 17 hours ago
Should media organizations strike deals with big tech companies? Here are two colliding news stories about licensing:

1. The Financial Times has struck a deal with OpenAI to license its material both for training and for queries on ChatGPT. It is the fifth such deal, following similar agreements with the Associated Press, Axel Springer, Le Monde and Prisa Media. "Financial terms were not disclosed."

"Apart from the benefits to the FT, there are broader implications for the industry. Itโ€™s right, of course, that AI platforms pay publishers for the use of their material. OpenAI understands the importance of transparency, attribution, and compensation โ€“ all essential for us."

2. Meanwhile, French media outlet Mediapart is refusing to take money from Google that it is entitled to under so-called "neighbouring rights" for the display of its news content online.

Why? Due to issues with disclosing financial terms: "The confidentiality clauses imposed by Google today prevent us from publicizing to our readers not only the total amount paid, but also the amount Mediapart is entitled to receive."

"In our view, financial dependence on platforms is incompatible with our public service mission, which is to make the powerful face up to their responsibilities. It also seems extremely dangerous economically."

Two positions at opposite ends of the spectrum.

- The Financial Times and OpenAI strike content licensing deal
https://www.ft.com/content/33328743-ba3b-470f-a2e3-f41c3a366613

- Droits voisins : Mediapart lance la bataille de la transparence contre Google (in French) https
victor posted an update about 19 hours ago
Am I the only one who thinks Command R+ is a better daily assistant than ChatGPT-4? (and it's not even close :D)
DavidVivancos posted an update about 20 hours ago
#ICLR 2024 is almost here 🔥🔥🔥 Counting the days until I'm back in the beautiful city of Vienna, participating in The Twelfth International Conference on Learning Representations. Hope to see many of the Hugging Face community there!

I would like to contribute 🎁 by releasing the second Knowledge Vault, with 100 lectures visualized from the last 10 years of ICLR (2014 to 2023), including knowledge graphs for all the invited lectures and some extras, with almost 3,000 topics represented. (Of course, using several AI tools, including Llama 3.)

You can explore it here:
🌐 https://theendofknowledge.com/Vaults/2/ICLR2014-2023.html

And you can learn more about the Vaults here:
๐Ÿ“https://www.linkedin.com/pulse/knowledge-vaults-david-vivancos-lbjef/

Hope you like the Knowledge Vault!
victor posted an update about 21 hours ago