Hugging Face – Posts

phenixrhyder

posted an update 25 minutes ago

May the Fourth be with you... AI Yoda #starwarsday

gsarti

posted an update about 3 hours ago

🔍 Today's (self-serving) pick in Interpretability & Analysis of LMs:

A Primer on the Inner Workings of Transformer-based Language Models
by @javifer @gsarti @arianna-bis and M. R. Costa-jussà
( @mt-upc , @GroNLP , @facebook )

This primer can serve as a comprehensive introduction to recent advances in interpretability for Transformer-based LMs for a technical audience, employing a unified notation to introduce network modules and present state-of-the-art interpretability methods.

Interpretability methods are presented with detailed formulations and categorized as either localizing the inputs or model components responsible for a particular prediction or decoding information stored in learned representations. Then, various insights on the role of specific model components are summarized alongside recent work using model internals to direct editing and mitigate hallucinations.

Finally, the paper provides a detailed picture of the open-source interpretability tools landscape, supporting the need for open-access models to advance interpretability research.

📄 Paper: A Primer on the Inner Workings of Transformer-based Language Models (2405.00208)

🔍 All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9

Taylor658

posted an update about 8 hours ago

The Open Medical-LLM Leaderboard is now up on HF Spaces. 🤗

openlifescienceai/open_medical_llm_leaderboard

It will be interesting to add the results of the just-announced Med-Gemini model to the leaderboard to see how it compares and whether its stated 91.1% MedQA score holds up.

Capabilities of Gemini Models in Medicine (2404.18416)

fdaudens

posted an update about 10 hours ago

I've added new collections to the Journalists on 🤗 community, focusing on Data Visualization, Optical Character Recognition, and Multimodal Models:

- TinyChart-3B: This model interprets data visualizations based on your prompts. It can generate the underlying data table from a chart or recreate the chart with Python code.
- PDF to OCR: Convert your PDFs to text—ideal for FOI records sent as images.
- Idefics-8b: A multimodal model that allows you to ask questions about images.

Explore these tools here: 👉 https://huggingface.co/JournalistsonHF

Locutusque

posted an update about 13 hours ago

Introducing llama-3-neural-chat-v2.2-8b! This powerful conversational AI model builds on Meta's Llama 3, fine-tuned by Locutusque for enhanced performance in coding, math & writing.

Locutusque/llama-3-neural-chat-v2.2-8B

nateraw

posted an update about 13 hours ago

I just shared a blog post on https://nateraw.com explaining the motivation and process behind training nateraw/musicgen-songstarter-v0.2, including training details, WandB logs, hparams, and notes on previous experiments.

Check it out here ⤵️
https://nateraw.com/posts/training_musicgen_songstarter.html

:) still kinda a WIP so if there's anything else you want to see, let me know.

AmelieSchreiber

posted an update about 14 hours ago

santiviquez

posted an update about 16 hours ago

Looking for someone with 10+ years of experience training Deep Kolmogorov-Arnold Networks.

Any suggestions?

Undi95

posted an update about 16 hours ago

Hello!
The 8B/70B OG Llama-3 models made with the Orthogonal Activation Steering (OAS) script have been pushed privately.

After multiple tests with an empty system prompt, I can confirm it's not uncensored enough, but I wanted to try all the GGUFs first (and it takes time to do, lmao).

If you want to try it yourself, here is the script: https://gist.github.com/wassname/42aba7168bb83e278fcfea87e70fa3af
And here is the same script, modified so it can run on multiple GPUs for the 70B: https://files.catbox.moe/ya4rto.ipynb

Llama3-Unholy-8B-OAS doesn't have this problem, as it was already trained to be less censored, but the OG one was really too censored.

I will try to redo this soon, as it seems to HAVE WORKED for some prompts (as seen in the log, for example), but it's not enough.

32 dataset entries are clearly not enough, but that's okay; I really wanted to try this as it was something new.
I could go the Unholy way and retrain the 70B before using OAS, but it should work without retraining; that's not the goal.

not-lain

posted an update about 17 hours ago

one thousand Miku ヾ( ˃ᴗ˂ )◞ • *✰
parsee-mizuhashi/miku
by @parsee-mizuhashi
