diff --git a/air_llm/README.md b/air_llm/README.md
index 1906363..93b4be8 100644
--- a/air_llm/README.md
+++ b/air_llm/README.md
@@ -1,9 +1,16 @@
 ![airllm_logo](https://github.com/lyogavin/Anima/blob/main/assets/airllm_logo_sm.png?v=3&raw=true)
 
+[**Quickstart**](#quickstart) |
+[**Configurations**](#configurations) |
+[**MacOS**](#macos) |
+[**Example notebooks**](#example-python-notebook) |
+[**FAQ**](#faq)
+
 **AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card. No quantization, distillation, pruning or other model compression techniques that would result in degraded model performance are needed. AirLLM优化inference内存,4GB单卡GPU可以运行70B大语言模型推理。不需要任何损失模型性能的量化和蒸馏,剪枝等模型压缩。
 
+
 ## Updates
 
 [2023/12/25] v2.8.2: Support MacOS running 70B large language models.
@@ -350,6 +357,23 @@ input_tokens = model.tokenizer(input_text,
 )
 ```
 
+## Citing AirLLM
+
+If you find
+AirLLM useful in your research and wish to cite it, please use the following
+BibTeX entry:
+
+```
+@software{airllm2023,
+  author = {Gavin Li},
+  title = {AirLLM: scaling large language models on low-end commodity computers},
+  url = {https://github.com/lyogavin/Anima/tree/main/air_llm},
+  version = {0.0},
+  year = {2023},
+}
+```
+
+
 ## Contribution