Mirror of https://github.com/arc53/DocsGPT.git, synced 2025-12-01 01:23:14 +00:00

Commit: Nextra docs
docs/pages/Guides/Customising-prompts.md (new file, +4)
## To customise the main prompt, navigate to `/application/prompt/combine_prompt.txt`

You can try editing it to see how the model responds.
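For example, as a purely illustrative sketch (not the shipped prompt, which lives in `combine_prompt.txt`), you could tighten it to be stricter about grounding:

```
You are DocsGPT, an AI assistant for answering questions about documentation.
Use only the provided context to answer. If the context does not contain the
answer, say you cannot answer instead of guessing.
```

Small wording changes like this can noticeably shift how cautious or verbose the model is.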
docs/pages/Guides/How-to-train-on-other-documentation.md (new file, +60)
## How to train on other documentation
This AI can use any documentation, but first it needs to be prepared for similarity search.
Start by going to the `/scripts/` folder.

If you open `ingest.py` there, you will see that it uses the RST files from the folder to create `index.faiss` and `index.pkl`.
It currently uses OpenAI to create the vector store, so make sure your documentation is not too big; ingesting the Pandas docs cost me around $3-4.
You can usually find documentation on GitHub in the `docs/` folder of most open-source projects.
### 1. Find documentation in .rst/.md and create a folder with it in your scripts directory
Name it `inputs/`
Put all your .rst/.md files in there
The search is recursive, so you don't need to flatten them

If there are no .rst/.md files, just convert whatever you find to .txt and feed it in (don't forget to change the extension in the script).
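The step above can be sketched like this (the folder and file names below are made-up examples; point them at your own docs):

```shell
# Create the inputs/ folder inside scripts/ and put your docs in it.
mkdir -p inputs/getting-started
# A placeholder file stands in for your real .rst/.md documentation here:
printf '# Intro\n\nHello docs.\n' > inputs/getting-started/index.md
# The ingest search is recursive, so nested folders like this are fine:
find inputs -name '*.md' -o -name '*.rst'
```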
### 2. Create .env file in `scripts/` folder
And write your OpenAI API key inside:
`OPENAI_API_KEY=<your-api-key>`
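In shell terms, that amounts to something like (the key below is a placeholder, not a real key):

```shell
# Create scripts/.env holding the OpenAI key.
echo 'OPENAI_API_KEY=sk-your-key-here' > .env
# Verify it was written:
cat .env
```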
### 3. Run scripts/ingest.py
`python ingest.py ingest`
It will tell you how much the ingestion will cost before it runs.
### 4. Move `index.faiss` and `index.pkl` generated in `scripts/output` to `application/` folder.
### 5. Run web app
Once you run it, the app will use the new context that is relevant to your documentation.
Make sure you select `default` in the dropdown in the UI.
## Customisation
You can learn more about the options of `ingest.py` by running:
`python ingest.py --help`
| Options | |
|:---------------------------------|:---------------------------------------------------------------------------------------------------------------------------------|
| **ingest** | Runs the 'ingest' function, converting documentation to the FAISS plus index format |
| --dir TEXT | List of paths to directories for index creation. E.g. --dir inputs --dir inputs2 [default: inputs] |
| --file TEXT | File paths to use (optional; overrides directory). E.g. --file inputs/1.md --file inputs/2.md |
| --recursive / --no-recursive | Whether to recursively search in subdirectories [default: recursive] |
| --limit INTEGER | Maximum number of files to read |
| --formats TEXT | List of required extensions (with the leading dot). Currently supported: .rst, .md, .pdf, .docx, .csv, .epub, .html [default: .rst, .md] |
| --exclude / --no-exclude | Whether to exclude hidden files (dotfiles) [default: exclude] |
| -y, --yes | Whether to skip price confirmation |
| --sample / --no-sample | Whether to output a sample of the first 5 split documents [default: no-sample] |
| --token-check / --no-token-check | Whether to group small documents and split large ones. Improves semantics. [default: token-check] |
| --min_tokens INTEGER | Documents with fewer tokens than this are grouped [default: 150] |
| --max_tokens INTEGER | Documents with more tokens than this are split [default: 2000] |
| | |
| **convert** | Creates documentation in .md format from source code |
| --dir TEXT | Path to a directory with source code. E.g. --dir inputs [default: inputs] |
| --formats TEXT | Source code language from which to create documentation. Supports py, js and java. E.g. --formats py [default: py] |
docs/pages/Guides/How-to-use-different-LLM.md (new file, +32)
Fortunately, there are many LLM providers, and some models can even be run locally.
There are two models used in the app:
1. Embeddings
2. Text generation
By default we use OpenAI's models, but if you want to change that or even run everything locally, it's very simple!
### Go to the .env file, or set environment variables:
`LLM_NAME=<your Text generation>`
`API_KEY=<api_key for Text generation>`
`EMBEDDINGS_NAME=<llm for embeddings>`
`EMBEDDINGS_KEY=<api_key for embeddings>`
`VITE_API_STREAMING=<true or false (true if using openai, false for all others)>`
You don't need to provide keys if you are happy with users providing theirs; just make sure you set `LLM_NAME` and `EMBEDDINGS_NAME`.
Options:
- `LLM_NAME`: openai, manifest, cohere, Arc53/docsgpt-14b, Arc53/docsgpt-7b-falcon
- `EMBEDDINGS_NAME`: openai_text-embedding-ada-002, huggingface_sentence-transformers/all-mpnet-base-v2, huggingface_hkunlp/instructor-large, cohere_medium
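Putting the variables together, a minimal `.env` sketch for the default OpenAI setup might look like this (keys are placeholders; pick the model names matching your provider from the options above):

```shell
# Write a sample .env using the OpenAI defaults.
cat > .env <<'EOF'
LLM_NAME=openai
API_KEY=sk-your-key-here
EMBEDDINGS_NAME=openai_text-embedding-ada-002
EMBEDDINGS_KEY=sk-your-key-here
VITE_API_STREAMING=true
EOF
cat .env
```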
That's it!
### Hosting everything locally and privately (for using our optimised open-source models)
This is for when you are working with important data and don't want anything to leave your premises.

Make sure you set `SELF_HOSTED_MODEL` to `true` in your `.env` file, and for `LLM_NAME` you can use anything that's on Hugging Face.
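As a sketch, the relevant `.env` fragment could look like this (the model name is taken from the `LLM_NAME` options listed earlier and is only an example):

```
SELF_HOSTED_MODEL=true
LLM_NAME=Arc53/docsgpt-7b-falcon
```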
docs/pages/Guides/My-AI-answers-questions-using-external-knowledge.md (new file, +19)
If your AI uses external knowledge and is not explicit enough, that's OK, because we try to make DocsGPT friendly.
But if you want to adjust it, here is a simple way.
Go to `application/prompts/chat_combine_prompt.txt`
And change it to:
```
You are a DocsGPT, friendly and helpful AI assistant by Arc53 that provides help with documents. You give thorough answers with code examples if possible.
Write an answer for the question below based on the provided context.
If the context provides insufficient information, reply "I cannot answer".
You have access to chat history, and can use it to help answer the question.
----------------
{summaries}
```
docs/pages/Guides/_meta.json (new file, +18)
```json
{
  "Customising-prompts": {
    "title": "🏗️️ Customising Prompts",
    "href": "/Guides/Customising-prompts"
  },
  "How-to-train-on-other-documentation": {
    "title": "📥 Training on docs",
    "href": "/Guides/How-to-train-on-other-documentation"
  },
  "How-to-use-different-LLM": {
    "title": "⚙️️ How to use different LLM's",
    "href": "/Guides/How-to-use-different-LLM"
  },
  "My-AI-answers-questions-using-external-knowledge": {
    "title": "💭️ Avoiding hallucinations",
    "href": "/Guides/My-AI-answers-questions-using-external-knowledge"
  }
}
```