Results
Modelling
- In progress
Unity Model Analysis
With the system modelled inside Unity, the model needed to be analysed: parameters such as mass, drag and angular drag were changed, for both the cart and the pendulum, to check whether the model behaves as expected from the mathematical modelling.
The base parameters were:
Body | Mass [kg] | Drag | Angular Drag |
Cart | 1 | 0 | 0.05 |
Pendulum | 1 | 0.5 | 0.05 |
The same force was applied in all the tests. It was shaped as an impulse, so each test records an impulse response.
Cart parameters analysis
Mass of the cart
The first simulation varied the mass of the cart. Simulations were run with cart masses of 1 kg, 5 kg, 10 kg and 50 kg, and the position of the cart and the angle of the pendulum were recorded in each one.
As expected, increasing the mass of the cart while applying the same force reduces the acceleration of the cart, so the cart travels a shorter distance. The same logic applies to the pendulum: as the mass of the cart increases, the displacement of the cart decreases and the angle of the pendulum decreases as well.
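The trend described above can be reproduced with a minimal sketch outside Unity. It assumes a point-mass pendulum on a massless rod hanging from a frictionless cart, with the angle measured from the downward vertical, no drag terms, and a short constant push standing in for the impulse. The pendulum length (1 m), the push (10 N for 0.1 s) and all function names are assumptions made for illustration, not values taken from the Unity model.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]


def derivatives(state, force, cart_mass, bob_mass, length):
    """Time derivatives of (x, x_dot, theta, theta_dot).

    theta is measured from the downward vertical (assumption), so
    theta = 0 is the pendulum hanging at rest. Drag is left out.
    """
    _, x_dot, theta, theta_dot = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    x_acc = (force + bob_mass * sin_t * (G * cos_t + length * theta_dot ** 2)) / (
        cart_mass + bob_mass * sin_t ** 2
    )
    theta_acc = -(x_acc * cos_t + G * sin_t) / length
    return (x_dot, x_acc, theta_dot, theta_acc)


def rk4_step(state, dt, *args):
    """Advance the state by one classical Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * d for s, d in zip(state, k))

    k1 = derivatives(state, *args)
    k2 = derivatives(shifted(k1, dt / 2), *args)
    k3 = derivatives(shifted(k2, dt / 2), *args)
    k4 = derivatives(shifted(k3, dt), *args)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(cart_mass, bob_mass=1.0, length=1.0, push=10.0,
             push_time=0.1, duration=5.0, dt=0.001):
    """Return (final cart position [m], peak |theta| [rad])."""
    state = (0.0, 0.0, 0.0, 0.0)
    peak_angle = 0.0
    push_steps = int(round(push_time / dt))
    for i in range(int(round(duration / dt))):
        force = push if i < push_steps else 0.0  # short push, then free motion
        state = rk4_step(state, dt, force, cart_mass, bob_mass, length)
        peak_angle = max(peak_angle, abs(state[2]))
    return state[0], peak_angle


# Same push applied to each of the cart masses used in the tests.
results = {m: simulate(m) for m in (1.0, 5.0, 10.0, 50.0)}
for mass, (position, angle) in results.items():
    print(f"cart mass {mass:5.1f} kg: x = {position:.3f} m, peak angle = {angle:.4f} rad")
```

Under these assumptions, both the final cart position and the peak pendulum angle shrink as the cart mass grows. The absolute numbers are not expected to match Unity, since its drag and solver are not modelled here.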
Drag of the cart
Side note: because the cart and the pendulum are rigid bodies in Unity, their drag and angular drag can be set directly.
Pendulum parameter analysis
Length of the pendulum
Angular drag of the pendulum
Model training - Stabilization
mlagents-learn ./trainer_0.16.1.yaml --run-id stable_004 --resume
tensorboard --logdir summaries
The final values of mass, drag and angular drag used for the training were:
Body | Mass [kg] | Drag | Angular Drag |
Cart | 5 | 2 | 0 |
Pendulum | 1 | 0 | 0.5 |
The configuration file used for the training:
CartPole:
  trainer_type: ppo
  hyperparameters:
    batch_size: 64
    buffer_size: 12000
    learning_rate: 0.0003
    beta: 0.001
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    learning_rate_schedule: linear
  network_settings:
    normalize: true
    hidden_units: 128
    num_layers: 2
    vis_encode_type: simple
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
  keep_checkpoints: 5
  max_steps: 5.0e6
  time_horizon: 1000
  summary_freq: 1000
  threaded: true
Cumulative reward
Rationale
Training response
Open question: how many points does the graph contain, and at what frequency are they logged?
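A rough way to approach this question, as a sketch: assuming `summary_freq` is the number of steps between the statistics written for TensorBoard, and assuming training runs all the way to `max_steps`, the number of points per curve follows from the two values in the training configuration. The variable names are illustrative.

```python
# Values copied from the training configuration.
max_steps = int(5.0e6)
summary_freq = 1000

# Assumption: one summary point is written every `summary_freq` steps.
summary_points = max_steps // summary_freq
print(summary_points)  # 5000
```

A run stopped before `max_steps` would produce proportionally fewer points.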
Parameter variation
The model was trained with the parameters in table x. To test whether the trained model is robust, the parameters of the cart and the pendulum were changed to check whether it could still stabilize the system without retraining.
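One possible way to organise such a test is to change a single parameter at a time around the trained values. The sketch below only enumerates the test cases. The scale factors and all names are hypothetical and are not the values used in the experiments; the trained values are the ones listed for the training.

```python
# Trained values, as listed in the training parameter table.
TRAINED = {
    "cart_mass": 5.0,
    "cart_drag": 2.0,
    "cart_angular_drag": 0.0,
    "pendulum_mass": 1.0,
    "pendulum_drag": 0.0,
    "pendulum_angular_drag": 0.5,
}

# Hypothetical scale factors, chosen only for illustration.
SCALES = (0.5, 2.0)


def one_at_a_time(base, scales):
    """Yield (name, new_value, params) with exactly one parameter changed."""
    for name, value in base.items():
        if value == 0.0:
            continue  # scaling a zero-valued parameter would change nothing
        for scale in scales:
            varied = dict(base)
            varied[name] = value * scale
            yield name, varied[name], varied


cases = list(one_at_a_time(TRAINED, SCALES))
for name, value, _ in cases:
    print(f"{name} = {value}")
```

Each case would then be run against the trained model without retraining. Parameters that are zero in training would need absolute offsets rather than scale factors.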