Merge branch 'freqtrade:develop' into develop

2026-02-28 23:36:05 +00:00 · 2023-05-20 19:50:31 +02:00
parent 209eb63ede fcb75560c4
commit 5142b6bc0d
16 changed files with 221 additions and 136 deletions
--- a/docs/freqai-configuration.md
+++ b/docs/freqai-configuration.md
@@ -248,9 +248,11 @@ The easiest way to quickly run a pytorch model is with the following command (fo
 freqtrade trade --config config_examples/config_freqai.example.json --strategy FreqaiExampleStrategy --freqaimodel PyTorchMLPRegressor --strategy-path freqtrade/templates 
 ```

-!!! note "Installation/docker"
+!!! Note "Installation/docker"
    The PyTorch module requires large packages such as `torch`, which should be explicitly requested during `./setup.sh -i` by answering "y" to the question "Do you also want dependencies for freqai-rl or PyTorch (~700mb additional space required) [y/N]?".
    Users who prefer docker should ensure they use the docker image appended with `_freqaitorch`.
+    We do provide an explicit docker-compose file for this in `docker/docker-compose-freqai.yml` - which can be used via `docker compose -f docker/docker-compose-freqai.yml run ...` - or can be copied to replace the original docker file.
+    This docker-compose file also contains a (disabled) section to enable GPU resources within docker containers. This obviously assumes the system has GPU resources available.

 ### Structure

--- a/docs/freqai-reinforcement-learning.md
+++ b/docs/freqai-reinforcement-learning.md
@@ -145,94 +145,94 @@ As you begin to modify the strategy and the prediction model, you will quickly r
    The best reward functions are ones that are continuously differentiable, and well scaled. In other words, adding a single large negative penalty to a rare event is not a good idea, and the neural net will not be able to learn that function. Instead, it is better to add a small negative penalty to a common event. This will help the agent learn faster. Not only this, but you can help improve the continuity of your rewards/penalties by having them scale with severity according to some linear/exponential functions. In other words, you'd slowly scale the penalty as the duration of the trade increases. This is better than a single large penalty occuring at a single point in time.

 ```python
-    from freqtrade.freqai.prediction_models.ReinforcementLearner import ReinforcementLearner
-    from freqtrade.freqai.RL.Base5ActionRLEnv import Actions, Base5ActionRLEnv, Positions
+from freqtrade.freqai.prediction_models.ReinforcementLearner import ReinforcementLearner
+from freqtrade.freqai.RL.Base5ActionRLEnv import Actions, Base5ActionRLEnv, Positions


-    class MyCoolRLModel(ReinforcementLearner):
+class MyCoolRLModel(ReinforcementLearner):
+    """
+    User created RL prediction model.
+
+    Save this file to `freqtrade/user_data/freqaimodels`
+
+    then use it with:
+
+    freqtrade trade --freqaimodel MyCoolRLModel --config config.json --strategy SomeCoolStrat
+
+    Here the users can override any of the functions
+    available in the `IFreqaiModel` inheritance tree. Most importantly for RL, this
+    is where the user overrides `MyRLEnv` (see below), to define custom
+    `calculate_reward()` function, or to override any other parts of the environment.
+
+    This class also allows users to override any other part of the IFreqaiModel tree.
+    For example, the user can override `def fit()` or `def train()` or `def predict()`
+    to take fine-tuned control over these processes.
+
+    Another common override may be `def data_cleaning_predict()` where the user can
+    take fine-tuned control over the data handling pipeline.
+    """
+    class MyRLEnv(Base5ActionRLEnv):
        """
-        User created RL prediction model.
+        User made custom environment. This class inherits from BaseEnvironment and gym.env.
+        Users can override any functions from those parent classes. Here is an example
+        of a user customized `calculate_reward()` function.

-        Save this file to `freqtrade/user_data/freqaimodels`
-
-        then use it with:
-
-        freqtrade trade --freqaimodel MyCoolRLModel --config config.json --strategy SomeCoolStrat
-
-        Here the users can override any of the functions
-        available in the `IFreqaiModel` inheritance tree. Most importantly for RL, this
-        is where the user overrides `MyRLEnv` (see below), to define custom
-        `calculate_reward()` function, or to override any other parts of the environment.
-
-        This class also allows users to override any other part of the IFreqaiModel tree.
-        For example, the user can override `def fit()` or `def train()` or `def predict()`
-        to take fine-tuned control over these processes.
-
-        Another common override may be `def data_cleaning_predict()` where the user can
-        take fine-tuned control over the data handling pipeline.
+        Warning!
+        This is function is a showcase of functionality designed to show as many possible
+        environment control features as possible. It is also designed to run quickly
+        on small computers. This is a benchmark, it is *not* for live production.
        """
-        class MyRLEnv(Base5ActionRLEnv):
-            """
-            User made custom environment. This class inherits from BaseEnvironment and gym.env.
-            Users can override any functions from those parent classes. Here is an example
-            of a user customized `calculate_reward()` function.
+        def calculate_reward(self, action: int) -> float:
+            # first, penalize if the action is not valid
+            if not self._is_valid(action):
+                return -2
+            pnl = self.get_unrealized_profit()

-            Warning!
-            This is function is a showcase of functionality designed to show as many possible
-            environment control features as possible. It is also designed to run quickly
-            on small computers. This is a benchmark, it is *not* for live production.
-            """
-            def calculate_reward(self, action: int) -> float:
-                # first, penalize if the action is not valid
-                if not self._is_valid(action):
-                    return -2
-                pnl = self.get_unrealized_profit()
+            factor = 100

-                factor = 100
+            pair = self.pair.replace(':', '')

-                pair = self.pair.replace(':', '')
+            # you can use feature values from dataframe
+            # Assumes the shifted RSI indicator has been generated in the strategy.
+            rsi_now = self.raw_features[f"%-rsi-period_10_shift-1_{pair}_"
+                            f"{self.config['timeframe']}"].iloc[self._current_tick]

-                # you can use feature values from dataframe
-                # Assumes the shifted RSI indicator has been generated in the strategy.
-                rsi_now = self.raw_features[f"%-rsi-period_10_shift-1_{pair}_"
-                                f"{self.config['timeframe']}"].iloc[self._current_tick]
+            # reward agent for entering trades
+            if (action in (Actions.Long_enter.value, Actions.Short_enter.value)
+                    and self._position == Positions.Neutral):
+                if rsi_now < 40:
+                    factor = 40 / rsi_now
+                else:
+                    factor = 1
+                return 25 * factor

-                # reward agent for entering trades
-                if (action in (Actions.Long_enter.value, Actions.Short_enter.value)
-                        and self._position == Positions.Neutral):
-                    if rsi_now < 40:
-                        factor = 40 / rsi_now
-                    else:
-                        factor = 1
-                    return 25 * factor
-
-                # discourage agent from not entering trades
-                if action == Actions.Neutral.value and self._position == Positions.Neutral:
-                    return -1
-                max_trade_duration = self.rl_config.get('max_trade_duration_candles', 300)
-                trade_duration = self._current_tick - self._last_trade_tick
-                if trade_duration <= max_trade_duration:
-                    factor *= 1.5
-                elif trade_duration > max_trade_duration:
-                    factor *= 0.5
-                # discourage sitting in position
-                if self._position in (Positions.Short, Positions.Long) and \
-                action == Actions.Neutral.value:
-                    return -1 * trade_duration / max_trade_duration
-                # close long
-                if action == Actions.Long_exit.value and self._position == Positions.Long:
-                    if pnl > self.profit_aim * self.rr:
-                        factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
-                    return float(pnl * factor)
-                # close short
-                if action == Actions.Short_exit.value and self._position == Positions.Short:
-                    if pnl > self.profit_aim * self.rr:
-                        factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
-                    return float(pnl * factor)
-                return 0.
+            # discourage agent from not entering trades
+            if action == Actions.Neutral.value and self._position == Positions.Neutral:
+                return -1
+            max_trade_duration = self.rl_config.get('max_trade_duration_candles', 300)
+            trade_duration = self._current_tick - self._last_trade_tick
+            if trade_duration <= max_trade_duration:
+                factor *= 1.5
+            elif trade_duration > max_trade_duration:
+                factor *= 0.5
+            # discourage sitting in position
+            if self._position in (Positions.Short, Positions.Long) and \
+            action == Actions.Neutral.value:
+                return -1 * trade_duration / max_trade_duration
+            # close long
+            if action == Actions.Long_exit.value and self._position == Positions.Long:
+                if pnl > self.profit_aim * self.rr:
+                    factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
+                return float(pnl * factor)
+            # close short
+            if action == Actions.Short_exit.value and self._position == Positions.Short:
+                if pnl > self.profit_aim * self.rr:
+                    factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
+                return float(pnl * factor)
+            return 0.
 ```

-### Using Tensorboard
+## Using Tensorboard

 Reinforcement Learning models benefit from tracking training metrics. FreqAI has integrated Tensorboard to allow users to track training and evaluation performance across all coins and across all retrainings. Tensorboard is activated via the following command:

@@ -245,32 +245,30 @@ where `unique-id` is the `identifier` set in the `freqai` configuration file. Th

 ![tensorboard](assets/tensorboard.jpg)

-
-### Custom logging
+## Custom logging

 FreqAI also provides a built in episodic summary logger called `self.tensorboard_log` for adding custom information to the Tensorboard log. By default, this function is already called once per step inside the environment to record the agent actions. All values accumulated for all steps in a single episode are reported at the conclusion of each episode, followed by a full reset of all metrics to 0 in preparation for the subsequent episode.

-
 `self.tensorboard_log` can also be used anywhere inside the environment, for example, it can be added to the `calculate_reward` function to collect more detailed information about how often various parts of the reward were called:

-```py
-        class MyRLEnv(Base5ActionRLEnv):
-            """
-            User made custom environment. This class inherits from BaseEnvironment and gym.env.
-            Users can override any functions from those parent classes. Here is an example
-            of a user customized `calculate_reward()` function.
-            """
-            def calculate_reward(self, action: int) -> float:
-                if not self._is_valid(action):
-                    self.tensorboard_log("invalid")
-                    return -2
+```python
+    class MyRLEnv(Base5ActionRLEnv):
+        """
+        User made custom environment. This class inherits from BaseEnvironment and gym.env.
+        Users can override any functions from those parent classes. Here is an example
+        of a user customized `calculate_reward()` function.
+        """
+        def calculate_reward(self, action: int) -> float:
+            if not self._is_valid(action):
+                self.tensorboard_log("invalid")
+                return -2

 ```

 !!! Note
    The `self.tensorboard_log()` function is designed for tracking incremented objects only i.e. events, actions inside the training environment. If the event of interest is a float, the float can be passed as the second argument e.g. `self.tensorboard_log("float_metric1", 0.23)`. In this case the metric values are not incremented.

-### Choosing a base environment
+## Choosing a base environment

 FreqAI provides three base environments, `Base3ActionRLEnvironment`, `Base4ActionEnvironment` and `Base5ActionEnvironment`. As the names imply, the environments are customized for agents that can select from 3, 4 or 5 actions. The `Base3ActionEnvironment` is the simplest, the agent can select from hold, long, or short. This environment can also be used for long-only bots (it automatically follows the `can_short` flag from the strategy), where long is the enter condition and short is the exit condition. Meanwhile, in the `Base4ActionEnvironment`, the agent can enter long, enter short, hold neutral, or exit position. Finally, in the `Base5ActionEnvironment`, the agent has the same actions as Base4, but instead of a single exit action, it separates exit long and exit short. The main changes stemming from the environment selection include:

--- a/docs/freqai.md
+++ b/docs/freqai.md
@@ -78,6 +78,9 @@ pip install -r requirements-freqai.txt

 If you are using docker, a dedicated tag with FreqAI dependencies is available as `:freqai`. As such - you can replace the image line in your docker compose file with `image: freqtradeorg/freqtrade:develop_freqai`. This image contains the regular FreqAI dependencies. Similar to native installs, Catboost will not be available on ARM based devices.

+!!! note "docker-compose-freqai.yml"
+   We do provide an explicit docker-compose file for this in `docker/docker-compose-freqai.yml` - which can be used via `docker compose -f docker/docker-compose-freqai.yml run ...` - or can be copied to replace the original docker file. This docker-compose file also contains a (disabled) section to enable GPU resources within docker containers. This obviously assumes the system has GPU resources available.
+
 ### FreqAI position in open-source machine learning landscape

 Forecasting chaotic time-series based systems, such as equity/cryptocurrency markets, requires a broad set of tools geared toward testing a wide range of hypotheses. Fortunately, a recent maturation of robust machine learning libraries (e.g. `scikit-learn`) has opened up a wide range of research possibilities. Scientists from a diverse range of fields can now easily prototype their studies on an abundance of established machine learning algorithms. Similarly, these user-friendly libraries enable "citzen scientists" to use their basic Python skills for data exploration. However, leveraging these machine learning libraries on historical and live chaotic data sources can be logistically difficult and expensive. Additionally, robust data collection, storage, and handling presents a disparate challenge. [`FreqAI`](#freqai) aims to provide a generalized and extensible open-sourced framework geared toward live deployments of adaptive modeling for market forecasting. The `FreqAI` framework is effectively a sandbox for the rich world of open-source machine learning libraries. Inside the `FreqAI` sandbox, users find they can combine a wide variety of third-party libraries to test creative hypotheses on a free live 24/7 chaotic data source - cryptocurrency exchange data.