Results

July 26, 2021

Modelling

In progress

Unity Model Analysis

With the system modelled inside Unity, an analysis of the model was needed: changing parameters such as the mass, drag and angular drag of both the cart and the pendulum, and checking whether the model behaves as expected from the mathematical modelling.

The base parameters were

	Mass [kg]	Drag	Angular Drag
Cart	1	0	0.05
Pendulum	1	0.5	0.05

The same force, with the characteristics of an impulse, was applied in all the tests.

Cart parameters analysis

Mass of the cart

The first simulation varied the mass of the cart. Simulations were run with cart masses of 1 kg, 5 kg, 10 kg and 50 kg, and both the position of the cart and the angle of the pendulum were recorded in each.

As expected, when the mass of the cart increases and the same force is applied, the cart ends up at a different position, because its acceleration is smaller than in the previous, lighter case. The same logic applies to the pendulum: when the mass of the cart increases, the displacement of the cart decreases and the angle of the pendulum decreases as well.
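The relationship between cart mass and displacement can be illustrated with a minimal sketch. It assumes a frictionless 1-D cart with no pendulum attached and no drag, pushed by a short constant-force pulse; the force value, pulse length and time step are placeholder assumptions, not the values used in the Unity tests.

```python
def cart_displacement(mass, force=10.0, pulse_time=0.1, total_time=2.0, dt=0.001):
    """Integrate a frictionless 1-D cart pushed by a short constant-force pulse.

    All default values are illustrative assumptions, not the Unity settings.
    """
    position, velocity = 0.0, 0.0
    steps = int(round(total_time / dt))
    for i in range(steps):
        t = i * dt
        # The force acts only during the pulse, approximating an impulse.
        applied = force if t < pulse_time else 0.0
        acceleration = applied / mass  # Newton's second law: a = F / m
        velocity += acceleration * dt
        position += velocity * dt
    return position


# Same cart masses as in the simulations described above.
masses = [1.0, 5.0, 10.0, 50.0]
displacements = {m: cart_displacement(m) for m in masses}

for m, x in displacements.items():
    print(f"mass = {m:5.1f} kg -> displacement after 2 s = {x:.4f} m")
```

In this simplified model the displacement is inversely proportional to the mass, so the 5 kg cart travels one fifth of the distance of the 1 kg cart. The Unity model also includes the pendulum and drag, so its curves will not match these numbers exactly.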

Drag of the cart

Sidenote: in Unity, because the cart and the pendulum are rigid bodies, both the drag and the angular drag can be set directly.
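To get a feeling for what the drag value does, the sketch below uses a commonly described approximation of rigid-body linear damping, in which the velocity is scaled by (1 - drag * dt) at every physics step. This is an assumption for illustration only; the exact integration that Unity performs is not confirmed here. The function name, the initial velocity and the drag values are hypothetical, and dt = 0.02 s corresponds to Unity's default fixed timestep.

```python
def velocity_after(drag, v0=1.0, duration=1.0, dt=0.02):
    """Velocity left after `duration` seconds under an assumed per-step damping rule.

    Assumed model (not confirmed against Unity): v <- v * max(0, 1 - drag * dt).
    """
    velocity = v0
    for _ in range(int(round(duration / dt))):
        # Clamp at zero so a very large drag cannot reverse the velocity.
        velocity *= max(0.0, 1.0 - drag * dt)
    return velocity


# Illustrative drag values; 0 and 0.5 appear in the base parameter table.
for drag in (0.0, 0.5, 2.0):
    print(f"drag = {drag:3.1f} -> velocity after 1 s = {velocity_after(drag):.3f} m/s")
```

Under this assumption a drag of 0 leaves the velocity unchanged, and larger drag values make the cart lose its velocity faster after the impulse.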

Pendulum parameter analysis

Length of the pendulum

Angular drag of the pendulum

Model training - Stabilization

mlagents-learn ./trainer_0.16.1.yaml --run-id stable_004 --resume
tensorboard --logdir summaries

The final mass, drag and angular drag values used for the training were

	Mass [kg]	Drag	Angular Drag
Cart	5	2	0
Pendulum	1	0	0.5

The configuration file for the training

CartPole:
  trainer_type: ppo
  hyperparameters:
    batch_size: 64
    buffer_size: 12000
    learning_rate: 0.0003
    beta: 0.001
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    learning_rate_schedule: linear
  network_settings:
    normalize: true
    hidden_units: 128
    num_layers: 2
    vis_encode_type: simple
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
  keep_checkpoints: 5
  max_steps: 5.0e6
  time_horizon: 1000
  summary_freq: 1000
  threaded: true
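A few quantities follow directly from the values in this configuration. The sketch below is a rough reading of them, not the ML-Agents implementation: it assumes that a linear schedule decays the learning rate from its initial value towards zero at max_steps (any minimum value the trainer may enforce is not modelled), and that one policy update happens each time the buffer fills. The dictionary and function names are hypothetical.

```python
# Values copied from the configuration above.
config = {
    "batch_size": 64,
    "buffer_size": 12000,
    "learning_rate": 3.0e-4,
    "num_epoch": 3,
    "max_steps": 5.0e6,
}


def linear_learning_rate(step, initial=config["learning_rate"],
                         max_steps=config["max_steps"]):
    """Assumed linear decay: full rate at step 0, zero at max_steps."""
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return initial * fraction_left


# Minibatches drawn from one full buffer in a single epoch.
minibatches_per_epoch = config["buffer_size"] / config["batch_size"]
# Approximate number of times the buffer fills during the whole run.
buffer_fills = config["max_steps"] / config["buffer_size"]

print(f"minibatches per epoch: {minibatches_per_epoch}")
print(f"buffer fills over the run: {buffer_fills:.1f}")
for step in (0, 1_000_000, 2_500_000, 5_000_000):
    print(f"step {step:>9,d}: learning rate = {linear_learning_rate(step):.2e}")
```

With these numbers the buffer does not divide evenly into batches of 64, which may be worth keeping in mind when comparing runs with different batch sizes.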

Cumulative reward

Rationale

Training response

Open question: how many points are in the graph, and at what frequency are they recorded?

Parameter variation

The model was trained with the parameters in table x. To test whether the trained model is robust, the parameters of the cart and the pendulum were changed to check whether it could still stabilize the system without retraining.
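One way to organise such a robustness test is to change a single parameter at a time while keeping the rest at the trained values. The sketch below only generates the list of test cases; the scale factors 0.5 and 2.0 and all names are hypothetical choices, and the base values are the final training parameters listed earlier (cart: mass 5, drag 2, angular drag 0; pendulum: mass 1, drag 0, angular drag 0.5).

```python
import copy

# Final training parameters from the table above.
TRAINED = {
    "cart": {"mass": 5.0, "drag": 2.0, "angular_drag": 0.0},
    "pendulum": {"mass": 1.0, "drag": 0.0, "angular_drag": 0.5},
}

# Assumed scale factors, chosen only for illustration.
SCALES = (0.5, 2.0)


def one_at_a_time_variations(base, scales):
    """Return (label, parameters) pairs, each changing exactly one value."""
    cases = []
    for body, params in base.items():
        for name, value in params.items():
            if value == 0.0:
                # Scaling a zero value changes nothing, so skip it.
                continue
            for scale in scales:
                case = copy.deepcopy(base)
                case[body][name] = value * scale
                cases.append((f"{body}.{name} x{scale}", case))
    return cases


cases = one_at_a_time_variations(TRAINED, SCALES)
for label, params in cases:
    print(label, params)
```

Each generated case would then be applied to the Unity scene and run with the already trained model, without retraining. Parameters that are zero in the trained setup would need absolute test values instead of scale factors.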

Jhonatan da Silva
