Mistral LoRA - BitNet 1.58 Q&A Expert

This is a LoRA adapter for mistralai/Mistral-7B-Instruct-v0.2, fine-tuned on a custom Q&A dataset derived from the paper "The Era of 1-bit LLMs" (BitNet b1.58).

Model Details

  • Base model: mistralai/Mistral-7B-Instruct-v0.2
  • LoRA fine-tuning: 4-bit quantization (bitsandbytes) + PEFT
  • Target modules: q_proj, k_proj, v_proj, o_proj
  • Rank: 8, Alpha: 16, Dropout: 0.05 (see the configuration sketch below)
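
The adapter was trained with PEFT on top of a 4-bit (bitsandbytes) base model. Below is a minimal configuration sketch consistent with the settings listed above; the actual training script is not included in this repo, and any value not listed above (e.g. the NF4 quantization type and compute dtype) is an assumption.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization of the base model (bitsandbytes).
# NF4 and bfloat16 compute are assumptions, not documented settings.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA settings from "Model Details" above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()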

Dataset

Q&A pairs were auto-generated from the BitNet b1.58 paper. Each instruction asked about architectural and performance details of 1-bit LLMs.
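
The generation pipeline and dataset are not published with this card. A hypothetical record, formatted with the same instruction/response template the Usage prompt below expects, might look like this (the example text paraphrases the comparison table):

# Hypothetical example record; the actual dataset is not included here.
example = {
    "instruction": "How does BitNet b1.58 differ from standard 1-bit models?",
    "response": "It uses ternary weights (-1, 0, 1) rather than binary weights, ...",
}

def format_example(ex):
    # Collapse an instruction/response pair into a single training string,
    # matching the prompt template used in the Usage section below.
    return f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['response']}"

print(format_example(example))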

Before vs. After Comparison

| Question | Base Model Output | Fine-tuned Model Output |
|---|---|---|
| What is a 1-bit LLM? | ❌ Talks about hardware cache lines | ✅ Correctly defines a quantized LLM |
| How does BitNet b1.58 differ from standard 1-bit models? | ❌ Talks about legacy networking | ✅ Talks about ternary weights (-1, 0, 1) |
| At what size does it outperform FP16? | ❌ Refers to the wrong paper | ✅ Refers to the paper's performance table |
| Why is it more memory/latency efficient? | ❌ Talks about DHT routing | ✅ Highlights the absence of FP multiplication |
| Edge deployment and hardware design? | ❌ Talks about old protocols | ✅ References the potential of new 1-bit hardware |

Usage

from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the base model, then attach the LoRA adapter from this repo.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
tokenizer = AutoTokenizer.from_pretrained("ogflash/mistral-lora-qa-1bit")

# Prompt template used during fine-tuning.
prompt = "### Instruction:\nWhat is a 1-bit LLM?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
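
If GPU memory is limited, the base model can also be loaded in 4-bit with bitsandbytes before attaching the adapter, mirroring the training setup. This is a sketch; the NF4 and compute-dtype settings are assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization config (assumed values, consistent with the training sketch).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
tokenizer = AutoTokenizer.from_pretrained("ogflash/mistral-lora-qa-1bit")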