Jaward Sesay

Jaward

https://github.com/Jaykef

Jaykef_

Jaykef

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Articles

On Coding Your First Attention

26 days ago

• 7

Posts 25

Post

When untrained tokens play "catch me if you can", the Fishing for Magikarp paper is the detective :)
The playbook:
- Inspect the token vocab and study its encode/decode patterns.
- Brute-force over architecture-dependent indicators (e.g. whether the same matrix is used for the token embeddings and the final layer) to identify untrained tokens.
- Then verify that the identified tokens are out of distribution by prompting a target LLM (with no tied threshold).
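
A toy sketch of the "indicator" step above, not the paper's actual method: it ranks tokens by cosine distance to the mean embedding of a few tokens already known to be unused, on the assumption that untrained tokens sit close together in embedding space. The function names, the hand-made 2-D embeddings, and the choice of cosine distance are all illustrative assumptions.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def undertrained_candidates(embeddings, reference_ids, top_k):
    """Return the top_k token ids closest to the mean of the reference rows.

    `reference_ids` are tokens assumed to be unused (hypothetical input);
    they are excluded from the returned candidates.
    """
    reference = mean_vector([embeddings[i] for i in reference_ids])
    scored = [
        (cosine_distance(vec, reference), tok)
        for tok, vec in enumerate(embeddings)
        if tok not in reference_ids
    ]
    scored.sort()
    return [tok for _, tok in scored[:top_k]]


# Hand-made toy embeddings: tokens 3 and 4 play the role of known-unused
# tokens, and token 2 points in almost the same direction as them.
toy_embeddings = [
    [1.0, -0.5],    # token 0: looks trained
    [-0.7, 0.9],    # token 1: looks trained
    [0.11, 0.10],   # token 2: suspiciously similar to the unused ones
    [0.10, 0.10],   # token 3: known unused (assumption)
    [0.12, 0.09],   # token 4: known unused (assumption)
]

candidates = undertrained_candidates(toy_embeddings, reference_ids=[3, 4], top_k=1)
print(candidates)
```

Candidates found this way would still need the verification step from the playbook: prompt the model with each one and check whether it can reproduce the token at all.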

Quite a bait huh, Cohere:)

Post

Build your own GPT-4 Tokenizer! - @karpathy's minbpe exercise.
Step 1: BasicTokenizer
Got "close" to beating minbpe's train speed :(
Step 2: RegexTokenizer coming soon.

Notes on lessons learned:
- tokenization is the assembly language of LLMs :) It's not a healthy choice to code it lol.
- encoding can literally drive you mad.
- merging is where sh*t gets real - moment of truth:)
- training requires precision.
- decoding is trivial.
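
For reference, a minimal sketch of what a byte-level BasicTokenizer along these lines could look like. This is not the code from the post and not minbpe's own implementation; the helper names and the tie-breaking rule (first most frequent pair wins) are assumptions made for illustration.

```python
def get_stats(ids):
    """Count how often each adjacent pair of ids occurs."""
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def merge(ids, pair, idx):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `idx`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(idx)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


class BasicTokenizer:
    def __init__(self):
        self.merges = {}                                   # (id, id) -> new id
        self.vocab = {i: bytes([i]) for i in range(256)}   # id -> raw bytes

    def train(self, text, vocab_size):
        """Learn vocab_size - 256 merges from the UTF-8 bytes of `text`."""
        assert vocab_size >= 256
        ids = list(text.encode("utf-8"))
        for idx in range(256, vocab_size):
            stats = get_stats(ids)
            if not stats:
                break  # nothing left to merge
            pair = max(stats, key=stats.get)  # most frequent pair
            ids = merge(ids, pair, idx)
            self.merges[pair] = idx
            self.vocab[idx] = self.vocab[pair[0]] + self.vocab[pair[1]]

    def encode(self, text):
        """Apply learned merges in the order they were learned."""
        ids = list(text.encode("utf-8"))
        while len(ids) >= 2:
            stats = get_stats(ids)
            # Pick the pair whose merge was learned earliest.
            pair = min(stats, key=lambda p: self.merges.get(p, float("inf")))
            if pair not in self.merges:
                break  # no remaining pair can be merged
            ids = merge(ids, pair, self.merges[pair])
        return ids

    def decode(self, ids):
        """Concatenate the bytes of each id and decode as UTF-8."""
        data = b"".join(self.vocab[i] for i in ids)
        return data.decode("utf-8", errors="replace")


tokenizer = BasicTokenizer()
tokenizer.train("aaabdaaabac", vocab_size=256 + 3)  # learn 3 merges
encoded = tokenizer.encode("aaabdaaabac")
print(encoded)                     # [258, 100, 258, 97, 99]
print(tokenizer.decode(encoded))   # aaabdaaabac
```

The notes above line up with the code: `decode` is a one-liner, while `encode` has to replay the merges in exactly the order `train` learned them.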
