Nextra docs

This commit is contained in:
Alex
2023-09-07 12:36:39 +01:00
parent 94738d8fc4
commit 4f735a5d11
23 changed files with 5436 additions and 0 deletions


@@ -0,0 +1,4 @@
## To customise the main prompt, navigate to `/application/prompt/combine_prompt.txt`
You can try editing it to see how the model responds.


@@ -0,0 +1,60 @@
## How to train on other documentation
This AI can use any documentation, but it first needs to be prepared for similarity search.
![video-example-of-how-to-do-it](https://d3dg1063dc54p9.cloudfront.net/videos/how-to-vectorise.gif)
Start by going to the `/scripts/` folder and opening `ingest.py`.
You will see that it uses the `.rst` files in the folder to create `index.faiss` and `index.pkl`.
It currently uses OpenAI to create the vector store, so make sure your documentation is not too big. Ingesting the Pandas docs cost me around $3-4.
For most open-source projects you can usually find the documentation in a `docs/` folder on GitHub.
### 1. Find documentation in `.rst`/`.md` format and create a folder for it in your scripts directory
Name it `inputs/`
Put all your `.rst`/`.md` files in there
The search is recursive, so you don't need to flatten the folder structure
If there are no `.rst`/`.md` files, convert whatever you find to `.txt` and feed that in (don't forget to change the extension in the script)
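The recursive search described in step 1 can be sketched as follows. This is a hypothetical illustration, not the actual `ingest.py` implementation; the function name and defaults are assumptions.

```python
from pathlib import Path

def collect_docs(root="inputs", formats=(".rst", ".md")):
    """Recursively collect documentation files under root, however deeply nested."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in formats
    )
```

Because the walk is recursive, nested folders such as `inputs/user_guide/io.rst` are picked up without any restructuring.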
### 2. Create a `.env` file in the `scripts/` folder
And write your OpenAI API key inside:
`OPENAI_API_KEY=<your-api-key>`
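Scripts like this typically read the key from the environment (a `.env` loader such as python-dotenv ends up in the same place). A minimal sketch of that lookup, with an assumed helper name:

```python
import os

def get_openai_key():
    """Fetch the OpenAI API key from the environment, failing loudly if missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to scripts/.env")
    return key
```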
### 3. Run `scripts/ingest.py`
`python ingest.py ingest`
It will tell you how much the ingestion will cost before proceeding
### 4. Move the `index.faiss` and `index.pkl` generated in `scripts/output` to the `application/` folder
### 5. Run the web app
Once you run it, it will use the new context relevant to your documentation
Make sure you select **default** in the dropdown in the UI
## Customisation
You can learn more about the `ingest.py` options by running:
`python ingest.py --help`
| Option | Description |
|:---------------------------------|:------------------------------------------------------------------------------------------------------------------------------|
| **ingest** | Runs the `ingest` function, converting documentation to the FAISS index format |
| --dir TEXT | List of paths to directories for index creation, e.g. `--dir inputs --dir inputs2` [default: `inputs`] |
| --file TEXT | File paths to use (optional; overrides `--dir`), e.g. `--file inputs/1.md --file inputs/2.md` |
| --recursive / --no-recursive | Whether to recursively search in subdirectories [default: recursive] |
| --limit INTEGER | Maximum number of files to read |
| --formats TEXT | List of required extensions (with the leading dot). Currently supported: `.rst`, `.md`, `.pdf`, `.docx`, `.csv`, `.epub`, `.html` [default: `.rst`, `.md`] |
| --exclude / --no-exclude | Whether to exclude hidden files (dotfiles) [default: exclude] |
| -y, --yes | Whether to skip the price confirmation |
| --sample / --no-sample | Whether to output a sample of the first 5 split documents [default: no-sample] |
| --token-check / --no-token-check | Whether to group small documents and split large ones. Improves semantics [default: token-check] |
| --min_tokens INTEGER | Minimum number of tokens below which documents are grouped [default: 150] |
| --max_tokens INTEGER | Maximum number of tokens above which documents are split [default: 2000] |
| | |
| **convert** | Creates documentation in `.md` format from source code |
| --dir TEXT | Path to a directory with source code, e.g. `--dir inputs` [default: `inputs`] |
| --formats TEXT | Source code language from which to create documentation. Supports `py`, `js` and `java`, e.g. `--formats py` [default: `py`] |
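The `--token-check` behaviour described in the table (group small documents, split large ones) can be sketched roughly like this. The function name, the whitespace tokeniser, and the merge/split strategy are all simplifying assumptions, not the real `ingest.py` logic:

```python
def token_check(chunks, min_tokens=150, max_tokens=2000,
                n_tokens=lambda s: len(s.split())):
    """Group chunks below min_tokens, then split chunks above max_tokens."""
    # Group: fold a chunk into the previous one while that one is still small.
    merged = []
    for chunk in chunks:
        if merged and n_tokens(merged[-1]) < min_tokens:
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)
    # Split: break any chunk that is still over max_tokens into pieces.
    result = []
    for chunk in merged:
        words = chunk.split()
        for i in range(0, len(words), max_tokens):
            result.append(" ".join(words[i:i + max_tokens]))
    return result
```

The point of the pass is that tiny fragments make poor embeddings and oversized ones dilute retrieval, so normalising chunk sizes tends to improve semantic search quality.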


@@ -0,0 +1,32 @@
Fortunately, there are many providers for LLMs, and some models can even be run locally.
There are two models used in the app:
1. Embeddings
2. Text generation
By default we use OpenAI's models, but if you want to change this or even run them locally, it's very simple!
### Go to the `.env` file or set environment variables:
`LLM_NAME=<your Text generation>`
`API_KEY=<api_key for Text generation>`
`EMBEDDINGS_NAME=<llm for embeddings>`
`EMBEDDINGS_KEY=<api_key for embeddings>`
`VITE_API_STREAMING=<true or false (true if using openai, false for all others)>`
You don't need to provide keys if you are happy for users to provide their own, but make sure you set `LLM_NAME` and `EMBEDDINGS_NAME`
Options:
`LLM_NAME` (openai, manifest, cohere, Arc53/docsgpt-14b, Arc53/docsgpt-7b-falcon)
`EMBEDDINGS_NAME` (openai_text-embedding-ada-002, huggingface_sentence-transformers/all-mpnet-base-v2, huggingface_hkunlp/instructor-large, cohere_medium)
That's it!
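The variables above can be generated programmatically. A hypothetical helper that writes them into a `.env` file (the variable names come from this page; the helper itself, its defaults, and the "streaming only for openai" rule it encodes are illustrative):

```python
def write_env(path, llm_name="openai", api_key="",
              embeddings_name="openai_text-embedding-ada-002", embeddings_key=""):
    """Write the DocsGPT environment variables described above to a .env file."""
    # Per the docs: streaming is true when using openai, false for all others.
    streaming = "true" if llm_name == "openai" else "false"
    lines = [
        f"LLM_NAME={llm_name}",
        f"API_KEY={api_key}",
        f"EMBEDDINGS_NAME={embeddings_name}",
        f"EMBEDDINGS_KEY={embeddings_key}",
        f"VITE_API_STREAMING={streaming}",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```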
### Hosting everything locally and privately (for using our optimised open-source models)
If you are working with important data and don't want anything to leave your premises, make sure you set `SELF_HOSTED_MODEL` to `true` in your `.env` file, and for `LLM_NAME` you can use anything that is on Hugging Face


@@ -0,0 +1,19 @@
If your AI uses external knowledge and is not explicit enough, that's OK, because we try to make DocsGPT friendly.
But if you want to adjust it, here is a simple way.
Go to `application/prompts/chat_combine_prompt.txt`
And change it to:
```
You are DocsGPT, a friendly and helpful AI assistant by Arc53 that provides help with documents. You give thorough answers with code examples if possible.
Write an answer for the question below based on the provided context.
If the context provides insufficient information, reply "I cannot answer".
You have access to chat history, and can use it to help answer the question.
----------------
{summaries}
```
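At run time, the `{summaries}` placeholder at the end of the template is filled with the retrieved context before the prompt is sent to the model. A minimal sketch of that substitution, with assumed names (`build_prompt` is not a DocsGPT function):

```python
# Shortened copy of the combine-style template above; the real one lives
# in application/prompts/chat_combine_prompt.txt.
template = (
    "Write an answer for the question below based on the provided context.\n"
    'If the context provides insufficient information, reply "I cannot answer".\n'
    "----------------\n"
    "{summaries}"
)

def build_prompt(summaries):
    """Fill the {summaries} placeholder with the retrieved document context."""
    return template.format(summaries=summaries)
```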


@@ -0,0 +1,18 @@
{
  "Customising-prompts": {
    "title": "🏗️️ Customising Prompts",
    "href": "/Guides/Customising-prompts"
  },
  "How-to-train-on-other-documentation": {
    "title": "📥 Training on docs",
    "href": "/Guides/How-to-train-on-other-documentation"
  },
  "How-to-use-different-LLM": {
    "title": "⚙️️ How to use different LLMs",
    "href": "/Guides/How-to-use-different-LLM"
  },
  "My-AI-answers-questions-using-external-knowledge": {
    "title": "💭️ Avoiding hallucinations",
    "href": "/Guides/My-AI-answers-questions-using-external-knowledge"
  }
}