# Mistral LoRA - BitNet 1.58 Q&A Expert
This is a LoRA fine-tuned adapter for [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on a custom Q&A dataset derived from the paper "The Era of 1-bit LLMs" (BitNet b1.58).
## Model Details
- Base model: `mistralai/Mistral-7B-Instruct-v0.2`
- LoRA fine-tuning: 4-bit quantization (bitsandbytes) + PEFT
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- Rank: 8, Alpha: 16, Dropout: 0.05
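As a rough sketch, the adapter setup above could be reproduced with PEFT and bitsandbytes along these lines. Only the hyperparameters listed above come from this card; the quantization type, compute dtype, and task type are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the base model via bitsandbytes.
# NF4 and bfloat16 compute dtype are assumptions, not values recorded in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```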
## Dataset
Q&A pairs were auto-generated from the BitNet b1.58 paper. Each instruction asked about architectural and performance details of 1-bit LLMs.
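The examples follow the same instruction/response template used in the Usage section below. A minimal sketch of what one record might look like; the field names and answer wording are hypothetical, not taken from the dataset (the question is borrowed from the comparison table below).

```python
# Hypothetical record; field names and answer text are illustrative only.
example = {
    "instruction": "What is a 1-bit LLM?",
    "response": "A large language model whose weights are quantized to extremely low precision.",
}

# Rendered into the prompt template used at inference time.
text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
print(text)
```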
## Before vs. After Comparison
| Question | Base Model Output | Fine-tuned Model Output |
|---|---|---|
| What is a 1-bit LLM? | ❌ Talks about hardware cache lines | ✅ Correctly defines quantized LLM |
| How does BitNet b1.58 differ from standard 1-bit models? | ❌ Talks about legacy networking | ✅ Talks about ternary weights (-1, 0, 1) |
| At what size does it outperform FP16? | ❌ Refers to wrong paper | ✅ Refers to performance table |
| Why is it more memory/latency efficient? | ❌ Talks about DHT routing | ✅ Highlights no FP multiplication |
| Edge deployment and hardware design? | ❌ Talks about old protocols | ✅ References new 1-bit hardware potential |
## Usage
```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model in fp16 with automatic device placement, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
tokenizer = AutoTokenizer.from_pretrained("ogflash/mistral-lora-qa-1bit")

# Prompt in the instruction/response format used during fine-tuning.
prompt = "### Instruction:\nwhat is 1 bit llm\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
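If you prefer a standalone checkpoint without the PEFT wrapper, the LoRA weights can be merged into the full-precision base model. A minimal sketch using PEFT's `merge_and_unload`, assuming `model` and `tokenizer` from the snippet above; the output directory name is hypothetical.

```python
# Fold the LoRA weights into the base model and drop the adapter wrapper.
merged = model.merge_and_unload()
merged.save_pretrained("mistral-7b-qa-1bit-merged")      # hypothetical output path
tokenizer.save_pretrained("mistral-7b-qa-1bit-merged")
```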