view post Post 2161 Reply Maybe of interest, I just finished a long writeup of my weekend project exploring Qwen 2 7B Instruct's Chinese censorship: https://huggingface.co/blog/leonardlin/chinese-llm-censorship-analysisI also have an accompanying model and dataset (and codebase) for those curious to poke around:* augmxnt/Qwen2-7B-Instruct-deccp* augmxnt/deccp
view post Post 1899 Reply Interesting, I've just seen the my first HF spam on one of my new model uploads: shisa-ai/shisa-v1-llama3-70b - someone has an SEO spam page as a HF space attached to the model!?! Wild. Who do I report this to?
speed LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 255 PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 40 Accelerating LLM Inference with Staged Speculative Decoding Paper • 2308.04623 • Published Aug 8, 2023 • 21 LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Paper • 2208.07339 • Published Aug 15, 2022 • 4
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 255
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 40
Accelerating LLM Inference with Staged Speculative Decoding Paper • 2308.04623 • Published Aug 8, 2023 • 21
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Paper • 2208.07339 • Published Aug 15, 2022 • 4
quantize QuIP: 2-Bit Quantization of Large Language Models With Guarantees Paper • 2307.13304 • Published Jul 25, 2023 • 1 SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression Paper • 2306.03078 • Published Jun 5, 2023 • 3 OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models Paper • 2308.13137 • Published Aug 25, 2023 • 14 AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration Paper • 2306.00978 • Published Jun 1, 2023 • 5
QuIP: 2-Bit Quantization of Large Language Models With Guarantees Paper • 2307.13304 • Published Jul 25, 2023 • 1
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression Paper • 2306.03078 • Published Jun 5, 2023 • 3
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models Paper • 2308.13137 • Published Aug 25, 2023 • 14
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration Paper • 2306.00978 • Published Jun 1, 2023 • 5