# Mistral LoRA - BitNet 1.58 Q&A Expert
This is a LoRA fine-tuned adapter for [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on a custom Q&A dataset derived from the paper "The Era of 1-bit LLMs" (BitNet b1.58).
## Model Details
- Base model: `mistralai/Mistral-7B-Instruct-v0.2`
- LoRA fine-tuning: 4-bit quantization (bitsandbytes) + PEFT
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- Rank: 8, Alpha: 16, Dropout: 0.05
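As a rough sketch, the adapter setup above could be reproduced with PEFT and bitsandbytes along these lines. Only the hyperparameters listed above come from this card; the quantization type, compute dtype, and task type are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the base model via bitsandbytes.
# NF4 and bfloat16 compute dtype are assumptions, not values recorded in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```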
## Dataset
Q&A pairs were auto-generated from the BitNet b1.58 paper. Each instruction asked about architectural and performance details of 1-bit LLMs.
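The examples follow the same instruction/response template used in the Usage section below. A minimal sketch of what one record might look like; the field names and answer wording are hypothetical, not taken from the dataset (the question is borrowed from the comparison table below).

```python
# Hypothetical record; field names and answer text are illustrative only.
example = {
    "instruction": "What is a 1-bit LLM?",
    "response": "A large language model whose weights are quantized to extremely low precision.",
}

# Rendered into the prompt template used at inference time.
text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
print(text)
```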
## Before vs. After Comparison
| Question | Base Model Output | Fine-tuned Model Output |
|---|---|---|
| What is a 1-bit LLM? | ❌ Talks about hardware cache lines | ✅ Correctly defines quantized LLM |
| How does BitNet b1.58 differ from standard 1-bit models? | ❌ Talks about legacy networking | ✅ Talks about ternary weights (-1, 0, 1) |
| At what size does it outperform FP16? | ❌ Refers to wrong paper | ✅ Refers to performance table |
| Why is it more memory/latency efficient? | ❌ Talks about DHT routing | ✅ Highlights no FP multiplication |
| Edge deployment and hardware design? | ❌ Talks about old protocols | ✅ References new 1-bit hardware potential |
## Usage
```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model in fp16 with automatic device placement, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
tokenizer = AutoTokenizer.from_pretrained("ogflash/mistral-lora-qa-1bit")

# Prompt in the instruction/response format used during fine-tuning.
prompt = "### Instruction:\nwhat is 1 bit llm\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
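If you prefer a standalone checkpoint without the PEFT wrapper, the LoRA weights can be merged into the full-precision base model. A minimal sketch using PEFT's `merge_and_unload`, assuming `model` and `tokenizer` from the snippet above; the output directory name is hypothetical.

```python
# Fold the LoRA weights into the base model and drop the adapter wrapper.
merged = model.merge_and_unload()
merged.save_pretrained("mistral-7b-qa-1bit-merged")      # hypothetical output path
tokenizer.save_pretrained("mistral-7b-qa-1bit-merged")
```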