diff --git a/docs/pages/Models/cloud-providers.mdx b/docs/pages/Models/cloud-providers.mdx
index 36e737fa..3dad7e6e 100644
--- a/docs/pages/Models/cloud-providers.mdx
+++ b/docs/pages/Models/cloud-providers.mdx
@@ -14,7 +14,15 @@ The primary method for configuring your LLM provider in DocsGPT is through the `
 To connect to a cloud LLM provider, you will typically need to configure the following basic settings in your `.env` file:

 * **`LLM_PROVIDER`**: This setting is essential and identifies the specific cloud provider you wish to use (e.g., `openai`, `google`, `anthropic`).
-* **`LLM_NAME`**: Specifies the exact model you want to utilize from your chosen provider (e.g., `gpt-4o`, `gemini-2.0-flash`, `claude-3-5-sonnet-latest`). Refer to your provider's documentation for a list of available models.
+* **`LLM_NAME`**: Specifies the exact model you want to utilize from your chosen provider (e.g., `gpt-5.1`, `gemini-flash-latest`, `claude-3-5-sonnet-20241022`). Refer to your provider's documentation for a list of available models.
 * **`API_KEY`**: Almost all cloud LLM providers require an API key for authentication. Obtain your API key from your chosen provider's platform and securely store it in your `.env` file.
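+
+For example, a minimal `.env` sketch for OpenAI (the key below is a placeholder, not a real credential):
+
+```
+LLM_PROVIDER=openai
+LLM_NAME=gpt-5.1
+API_KEY=your-openai-api-key
+```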

 ## Explicitly Supported Cloud Providers

@@ -24,12 +32,15 @@ DocsGPT offers direct, streamlined support for the following cloud LLM providers
 | Provider | `LLM_PROVIDER` | Example `LLM_NAME` |
 | :--------------------------- | :------------- | :-------------------------- |
 | DocsGPT Public API | `docsgpt` | `None` |
-| OpenAI | `openai` | `gpt-4o` |
-| Google (Vertex AI, Gemini) | `google` | `gemini-2.0-flash` |
-| Anthropic (Claude) | `anthropic` | `claude-3-5-sonnet-latest` |
-| Groq | `groq` | `llama-3.1-8b-instant` |
+| OpenAI | `openai` | `gpt-5.1` |
+| Google (Vertex AI, Gemini) | `google` | `gemini-flash-latest` |
+| Anthropic (Claude) | `anthropic` | `claude-3-5-sonnet-20241022` |
+| Groq | `groq` | `llama-3.3-70b-versatile` |
 | HuggingFace Inference API | `huggingface` | `meta-llama/Llama-3.1-8B-Instruct` |
-| Azure OpenAI | `azure_openai` | `gpt-4o` |
+| Azure OpenAI | `azure_openai` | `azure-gpt-4` |
+| Prem AI | `premai` | (See Prem AI docs) |
+| AWS SageMaker | `sagemaker` | (See SageMaker docs) |
+| Novita AI | `novita` | (See Novita docs) |

 ## Connecting to OpenAI-Compatible Cloud APIs

diff --git a/docs/pages/Models/local-inference.mdx b/docs/pages/Models/local-inference.mdx
index 0bba907b..811cd433 100644
--- a/docs/pages/Models/local-inference.mdx
+++ b/docs/pages/Models/local-inference.mdx
@@ -20,13 +20,24 @@ To connect to a local inference engine, you will generally need to configure the
 * **`OPENAI_BASE_URL`**: This is essential. Set this to the base URL of your local inference engine's API endpoint. This tells DocsGPT where to find your local LLM server.
 * **`API_KEY`**: Generally, for local inference engines, you can set `API_KEY=None` as authentication is usually not required in local setups.

+## Native llama.cpp Support
+
+DocsGPT includes native support for llama.cpp without requiring an OpenAI-compatible server. To use it, set:
+
+```
+LLM_PROVIDER=llama.cpp
+LLM_NAME=your-model-name
+```
+
+This provider integrates directly with the llama.cpp Python bindings.
+
 ## Supported Local Inference Engines (OpenAI API Compatible)

-DocsGPT is readily configurable to work with the following local inference engines, all communicating via the OpenAI API format. Here are example `OPENAI_BASE_URL` values for each, based on default setups:
+DocsGPT is also readily configurable to work with the following local inference engines, all communicating via the OpenAI API format. Here are example `OPENAI_BASE_URL` values for each, based on default setups:

 | Inference Engine | `LLM_PROVIDER` | `OPENAI_BASE_URL` |
 | :---------------------------- | :------------- | :------------------------- |
-| LLaMa.cpp | `openai` | `http://localhost:8000/v1` |
+| LLaMa.cpp (server mode) | `openai` | `http://localhost:8000/v1` |
 | Ollama | `openai` | `http://localhost:11434/v1` |
 | Text Generation Inference (TGI)| `openai` | `http://localhost:8080/v1` |
 | SGLang | `openai` | `http://localhost:30000/v1` |
@@ -39,6 +50,79 @@ DocsGPT is readily configurable to work with the following local inference engin

 The `OPENAI_BASE_URL` examples above use `http://localhost`. If you are running DocsGPT within Docker and your local inference engine is running on your host machine (outside of Docker), you will likely need to replace `localhost` with `http://host.docker.internal` to ensure Docker can correctly access your host's services. For example, `http://host.docker.internal:11434/v1` for Ollama.

+## How the Model Registry Works
+
+DocsGPT uses a **Model Registry** to automatically detect and register available models based on your environment configuration. Understanding this system helps you configure models correctly.
+
+### Automatic Model Detection
+
+When DocsGPT starts, the Model Registry scans your environment variables and automatically registers models from providers that have valid API keys configured:
+
+| Environment Variable | Provider Models Registered |
+| :--------------------- | :------------------------- |
+| `OPENAI_API_KEY` | OpenAI models (`gpt-5.1`, `gpt-5-mini`, etc.) |
+| `ANTHROPIC_API_KEY` | Anthropic models (Claude family) |
+| `GOOGLE_API_KEY` | Google models (Gemini family) |
+| `GROQ_API_KEY` | Groq models (Llama, Mixtral) |
+| `HUGGINGFACE_API_KEY` | HuggingFace models |
+
+You can also use the generic `API_KEY` variable with `LLM_PROVIDER` to configure a single provider.
+
+### Custom OpenAI-Compatible Models
+
+When you set `OPENAI_BASE_URL` along with `LLM_PROVIDER=openai` and `LLM_NAME`, the registry automatically creates a custom model entry pointing to your local inference server. This is how local engines like Ollama, vLLM, and others get registered.
+
+### Default Model Selection
+
+The registry determines the default model in this priority order:
+
+1. If `LLM_NAME` is set and matches a registered model, that model becomes the default.
+2. Otherwise, the first model from the configured `LLM_PROVIDER` is selected.
+3. If neither is set, the first available model in the registry is used.
+
+### Multiple Providers
+
+You can configure multiple API keys simultaneously (e.g., both `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`). The registry will load models from all configured providers, giving users the ability to switch between them in the UI.
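+
+For example, a `.env` sketch that enables two providers at once (both keys are placeholders, and `LLM_NAME` selects the default model per the priority order above):
+
+```
+OPENAI_API_KEY=your-openai-key
+ANTHROPIC_API_KEY=your-anthropic-key
+LLM_NAME=gpt-5.1
+```
+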
 ## Adding Support for Other Local Engines

-While DocsGPT currently focuses on OpenAI API compatible local engines, you can extend its capabilities to support other local inference solutions. To do this, navigate to the `application/llm` directory in the DocsGPT repository. Examine the existing Python files for examples of LLM integrations. You can create a new module for your desired local engine, and then register it in the `llm_creator.py` file within the same directory. This allows for custom integration with a wide range of local LLM servers beyond those listed above.
\ No newline at end of file
+While DocsGPT currently focuses on OpenAI API compatible local engines, you can extend its capabilities to support other local inference solutions. To do this, navigate to the `application/llm` directory in the DocsGPT repository. Examine the existing Python files for examples of LLM integrations. You can create a new module for your desired local engine, and then register it in the `llm_creator.py` file within the same directory. This allows for custom integration with a wide range of local LLM servers beyond those listed above. A sketch of what such a module might look like is shown below.
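+
+The following is a minimal, hypothetical sketch of such a module. The class name, the `gen` method, and the HTTP endpoint are illustrative assumptions rather than the actual DocsGPT interface; mirror the base class and method signatures used by the existing modules in `application/llm` instead.
+
+```python
+# my_engine.py: hypothetical example, not actual DocsGPT code.
+# The `gen` method and the /generate endpoint are assumptions; copy the
+# real interface from an existing module in application/llm instead.
+import requests
+
+
+class MyEngineLLM:
+    """Minimal wrapper around a hypothetical local HTTP inference server."""
+
+    def __init__(self, api_key=None, base_url="http://localhost:9000"):
+        self.api_key = api_key
+        self.base_url = base_url
+
+    def gen(self, model, messages, **kwargs):
+        # Post the chat history to the local server and return its reply text.
+        headers = {"Authorization": f"Bearer {self.api_key}"} if self.api_key else {}
+        response = requests.post(
+            f"{self.base_url}/generate",
+            headers=headers,
+            json={"model": model, "messages": messages},
+            timeout=60,
+        )
+        response.raise_for_status()
+        return response.json()["text"]
+```
+
+After creating the module, register it in `llm_creator.py` (for example, by mapping a hypothetical provider key such as `"my_engine"` to `MyEngineLLM`, following whatever registration pattern that file uses) so that `LLM_PROVIDER=my_engine` resolves to your new class.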