From bb62d8d62cac79acc20b610e53053eec2ef0a39f Mon Sep 17 00:00:00 2001
From: Yu Li
Date: Fri, 1 Dec 2023 21:20:23 -0600
Subject: [PATCH] update readme

---
 air_llm/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/air_llm/README.md b/air_llm/README.md
index 1fe9102..b7c1749 100644
--- a/air_llm/README.md
+++ b/air_llm/README.md
@@ -81,7 +81,7 @@ Note: During inference, the original model will first be decomposed and saved la
 
 ### 3. Compression - 3x Inference Speed!
 
-We just added model compression based on block-wise quantization based model compression. Which can further **speed up the inference speed** for up to **3x** , with almost ignorable accuracy loss(see more performance evaluation and why we use block-wise quantization in [this paper](https://arxiv.org/abs/2212.09720))!
+We just added model compression based on block-wise quantization, which can further **speed up inference** by up to **3x**, with **almost negligible accuracy loss!** (See more performance evaluation, and why we use block-wise quantization, in [this paper](https://arxiv.org/abs/2212.09720).)
 
 ![speed_improvement](https://github.com/lyogavin/Anima/blob/main/assets/airllm2_time_improvement.png?v=2&raw=true)
 