refine readme

This commit is contained in:
Yu Li
2023-12-25 17:26:21 -06:00
parent 6e3eaabef0
commit aba11e32cd


@@ -1,9 +1,16 @@
![airllm_logo](https://github.com/lyogavin/Anima/blob/main/assets/airllm_logo_sm.png?v=3&raw=true)
[**Quickstart**](#quickstart) |
[**Configurations**](#configurations) |
[**MacOS**](#macos) |
[**Example notebooks**](#example-python-notebook) |
[**FAQ**](#faq)
**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed.
AirLLM optimizes inference memory so that a single 4GB GPU can run 70B large language model inference, with no quantization, distillation, pruning, or other model compression that would degrade model performance.
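The core idea behind this memory saving is layered inference: load one transformer layer's weights at a time, run it, and free it before loading the next, so peak memory is one layer rather than the whole model. A minimal toy sketch of that idea (illustrative only, not the AirLLM API; all names here are made up):

```python
# Toy sketch of layered inference: only one layer's weights are
# resident at a time, so peak memory is roughly one layer's worth,
# not the whole model's.
import numpy as np

n_layers, dim = 8, 16

def load_layer(i):
    # Stand-in for reading one layer's weights from disk on demand.
    rng_i = np.random.default_rng(i)
    return rng_i.standard_normal((dim, dim)) / np.sqrt(dim)

def layered_forward(x):
    for i in range(n_layers):
        w = load_layer(i)    # bring one layer into memory
        x = np.tanh(x @ w)   # run it
        del w                # release it before loading the next layer
    return x

out = layered_forward(np.random.default_rng(0).standard_normal(dim))
print(out.shape)  # prints (16,)
```

The trade-off is extra disk I/O per layer in exchange for a much smaller GPU memory footprint, which is why no accuracy-losing compression is required.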
## Updates
[2023/12/25] v2.8.2: Support running 70B large language models on MacOS.
@@ -350,6 +357,23 @@ input_tokens = model.tokenizer(input_text,
)
```
## Citing AirLLM
If you find AirLLM useful in your research and wish to cite it, please use the following BibTeX entry:
```
@software{airllm2023,
  author = {Gavin Li},
  title = {AirLLM: scaling large language models on low-end commodity computers},
  url = {https://github.com/lyogavin/Anima/tree/main/air_llm},
  version = {0.0},
  year = {2023},
}
```
## Contribution