Accelerating Speculative Decoding using Dynamic Speculation Length Paper • 2405.04304 • Published 27 days ago • 2
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published 11 days ago • 14
Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon Article • 25 days ago • 7
Improving Classification Performance With Human Feedback: Label a few, we label the rest Paper • 2401.09555 • Published Jan 17 • 6
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Paper • 2306.14048 • Published Jun 24, 2023 • 10