To escape the daily routine a little and to provide an opportunity to combine individual research interests in a joint effort, the R&D team spent a few weeks together on a project. The members went into a huddle and came up with the following idea: let’s build and train a self-driving, car-like robot that is able to navigate to a given target while avoiding any obstacles on its way.
The hardware
To keep things simple, the setup should use as few sensors as possible and rely solely on commodity hardware. For the car-like robot we could build upon a movable platform from a previous project. The platform is equipped with two powered front wheels and two freely movable caster wheels for support at the rear. The vehicle uses skid steering, similar to a wheelchair. This is possibly the simplest configuration, as it allows changing direction on the spot. The two motors are driven by an L298N motor driver and powered by a 12 V battery.
The mobile robot
The environment was captured by a LiDAR sensor. LiDAR (short for Light Detection and Ranging) is a prominent distance measurement sensor used in geodesy and autonomous driving. Distance measurement is based on time of flight: the sensor emits laser pulses and records the time elapsed until backscattered light is detected. The LiDAR we used was an inexpensive “starter” model. The distances the sensor can detect range from 0.11 to about 11 meters (any distance outside this range yields 0), which was sufficient for the project. The LiDAR rotates, so it can collect distances all around the car. Per 360° rotation the sensor takes about 720 measurements, i.e. its angular resolution is about 0.5°. The mount for the LiDAR and the wheel suspensions were custom-made with a 3D printer. The motor driver and the LiDAR are connected to a Raspberry Pi that processes the LiDAR data, performs inference on it (i.e. determines direction changes) and controls the motor driver. For now, no further sensors (e.g. for odometry) are used.
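As an illustration of the motor control side, the sketch below drives the two motors through an L298N board from the Raspberry Pi using the RPi.GPIO library. The GPIO pin assignments and the PWM frequency are assumptions and do not reflect the robot’s actual wiring.

```python
import RPi.GPIO as GPIO

# Hypothetical pin assignments (BCM numbering) for the two L298N channels.
LEFT_EN, LEFT_IN1, LEFT_IN2 = 18, 23, 24
RIGHT_EN, RIGHT_IN1, RIGHT_IN2 = 13, 5, 6

GPIO.setmode(GPIO.BCM)
for pin in (LEFT_EN, LEFT_IN1, LEFT_IN2, RIGHT_EN, RIGHT_IN1, RIGHT_IN2):
    GPIO.setup(pin, GPIO.OUT)

# PWM on the enable pins controls the motor speed.
left_pwm = GPIO.PWM(LEFT_EN, 1000)
right_pwm = GPIO.PWM(RIGHT_EN, 1000)
left_pwm.start(0)
right_pwm.start(0)

def set_motor(in1, in2, pwm, speed):
    """speed in [-1, 1]: the sign selects the direction, the magnitude the duty cycle."""
    GPIO.output(in1, speed > 0)
    GPIO.output(in2, speed < 0)
    pwm.ChangeDutyCycle(abs(speed) * 100)

def drive(left, right):
    """Set the speeds of the left and right motor (skid steering)."""
    set_motor(LEFT_IN1, LEFT_IN2, left_pwm, left)
    set_motor(RIGHT_IN1, RIGHT_IN2, right_pwm, right)
```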
We created a small web interface that not only provides a GUI for manual motor control but also an API through which the machine learning component can control the motors. The GUI resembles a joystick with the following configuration.
Steering layout
For example, a displacement of the joystick (the inner circle) of 1.0 along the x-axis and 0 along the y-axis results in an on-the-spot rotation to the right.
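The exact mixing used by the web interface is not detailed here; the sketch below shows one plausible way to translate such a joystick displacement into the two wheel speeds of a skid-steered vehicle.

```python
def joystick_to_wheels(x, y):
    """Map a joystick displacement (x: turn, y: forward) to (left, right) wheel speeds.

    A plausible skid-steer mixing, not necessarily the mapping the
    project's web interface actually uses.
    """
    left = max(-1.0, min(1.0, y + x))
    right = max(-1.0, min(1.0, y - x))
    return left, right

# x = 1.0, y = 0.0 -> (1.0, -1.0): the wheels spin in opposite directions,
# i.e. an on-the-spot rotation to the right.
print(joystick_to_wheels(1.0, 0.0))
```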
Training the models
The main focus of the machine learning part was to train the robot to avoid obstacles along its way. To drastically simplify matters, we started with a constant forward velocity; that is, the velocity was not part of the training. Instead, the ML inference should suggest whether the robot should move to the left, go straight ahead or move to the right. We also reduced the data provided by the LiDAR: only data points between 60° to the left and 60° to the right of the current direction were considered. Within that 120° segment only 20 data points were sampled, yielding a resolution of six degrees.
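As a rough sketch of this reduction (the sampling actually used on the robot may differ, e.g. by averaging neighbouring readings), one could pick 20 readings out of a full 360° scan like this:

```python
import numpy as np

def reduce_scan(scan):
    """Sample 20 distances between -60° and +60° of the heading, 6° apart.

    `scan` is assumed to be a full 360° array of distances with index 0
    pointing straight ahead; the real sensor delivers ~720 readings at
    roughly 0.5° resolution.
    """
    scan = np.asarray(scan)
    deg_per_reading = 360.0 / len(scan)
    angles = np.arange(-60, 60, 6)                               # 20 angles, 6° apart
    indices = np.round(angles / deg_per_reading).astype(int) % len(scan)
    return scan[indices]
```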
Reinforcement Learning
Our general approach was to employ reinforcement learning, where training takes place in a simulated environment. Reinforcement learning is a form of machine learning in which so-called agents (the entities that are supposed to behave intelligently, e.g. NPCs in games) interact with the environment so as to maximise the rewards they receive for performing “good” actions. For our project we used Unity, a game engine that, in combination with the ML-Agents plugin, allows setting up environments, training agents and using trained agents in games. The process is roughly as follows: during training the simulation is asked to provide current state information. The data is forwarded to an external ML-Agents process, which uses TensorFlow for training. The ML-Agents process responds with an action, which the simulation rewards or punishes. After training is completed, the TensorFlow model is converted to a custom format and can be used for inference with Unity’s built-in inference engine.
For training, we created a corridor delimited by walls in which obstacles are arranged randomly. A car with LiDAR capabilities is supposed to reach the end of that corridor without hitting the walls or the obstacles. As soon as a collision is detected, the car agent is punished and the episode ends. Whenever the car reduces its distance to the end of the corridor it is rewarded; this should prevent it from driving in circles. When the car reaches the end of the corridor it receives a large reward.
Sample course in which the model is trained
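The actual reward logic lives in the Unity (C#) agent; the following Python sketch merely illustrates the reward scheme described above, with arbitrary example constants.

```python
def step_reward(collided, reached_goal, prev_dist, curr_dist,
                collision_penalty=-1.0, goal_reward=1.0, progress_scale=0.01):
    """Illustrative reward scheme (constants are arbitrary assumptions).

    Returns (reward, episode_done).
    """
    if collided:
        # punish collisions with walls or obstacles and end the episode
        return collision_penalty, True
    if reached_goal:
        # large reward for reaching the end of the corridor
        return goal_reward, True
    # small reward for progress towards the goal, discouraging driving in circles
    return progress_scale * (prev_dist - curr_dist), False
```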
We first trained a recurrent neural network, which is capable of learning paths from start to end. Within the simulation, this model was able to drive the course perfectly. In reality, however, the results weren’t as good as we had hoped, since real-world parameters, especially the frequency at which the hardware was able to feed the model, differed greatly from the simulation.
Charts documenting the training progress in reinforcement learning
In a second run, we replaced the recurrent neural network with a simpler non-recurrent one. We are still working on achieving good results with reinforcement learning; intermediate results so far have been quite disillusioning. For some models it seemed as if they had learned that the best way to reach the goal is to act completely at random.
A simpler approach
Meanwhile, we created a drastically simpler model using a classical regression approach instead of reinforcement learning. Unlike the reinforcement learning paradigm, the classical approach requires a solid base of pre-labeled training data, which we also generated artificially. We simplified the LiDAR data to binary vectors in which a “1” indicates that there is an obstacle within a given range and a “0” indicates free space. For example, the observation (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) indicates an obstacle on the left of the car’s moving direction. The model should then learn to find the center of the largest free space in the observation: (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, [0], 0, 0, 0) in the previous example. We created 960,000 such artificial observations (~75 MB of data) and for each of them determined the center of the largest free space as the respective target value. By trial and error we found that a network with two hidden layers of 256 units each performed best.
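The following sketch illustrates this setup: generating binary observations, computing the center of the largest run of free space as the label, and training a network with two hidden layers of 256 units. The output encoding (a 20-way classification over the target index), the random generation scheme and all training parameters are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

VEC_LEN = 20  # 20 binary LiDAR samples per observation

def largest_free_space_center(obs):
    """Index of the center of the longest run of 0s (free space) in a binary vector."""
    best_start, best_len, start = 0, 0, None
    for i, v in enumerate(list(obs) + [1]):      # trailing 1 acts as a sentinel
        if v == 0 and start is None:
            start = i
        elif v == 1 and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start + best_len // 2            # e.g. 16 for the example above

def make_dataset(n):
    """Artificial observations and labels; the real generation scheme may differ."""
    X = (np.random.rand(n, VEC_LEN) < 0.5).astype(np.float32)
    y = np.array([largest_free_space_center(x) for x in X], dtype=np.int32)
    return X, y

X, y = make_dataset(100_000)   # the project used 960,000 observations

# Two hidden layers with 256 units each; here the target index is predicted
# as a 20-way classification (the actual output encoding is an assumption).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(VEC_LEN,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(VEC_LEN, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=256)
```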
Inference with that model works as follows: we convert each LiDAR observation to a binary vector, in which a “1” indicates an obstacle within a 0.4 m range and a “0” indicates free space. At regular intervals we read the LiDAR and apply the model to the observation in order to get the next action, which the robot performs until a new LiDAR reading is processed.
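Put together, the control loop on the robot could look roughly like the sketch below. It reuses reduce_scan and model from the sketches above; read_lidar() and steer_towards() are hypothetical stand-ins for the actual LiDAR driver and the web API that commands the motors.

```python
import time
import numpy as np

OBSTACLE_RANGE_M = 0.4   # a "1" means an obstacle closer than 0.4 m

def observation_from_scan(scan):
    """Turn a full scan into the 20-element binary obstacle vector."""
    distances = reduce_scan(scan)           # sampling sketch from above
    # readings of 0 mean "out of range" and are treated as free space here
    return ((distances > 0) & (distances < OBSTACLE_RANGE_M)).astype(np.float32)

while True:
    obs = observation_from_scan(read_lidar())      # hypothetical LiDAR driver
    target_index = int(np.argmax(model.predict(obs[None, :], verbose=0)))
    steer_towards(target_index)                    # hypothetical motor API: 0 = far left, 19 = far right
    time.sleep(0.1)                                # keep the action until the next reading is processed
```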
Lessons Learned
Although it was a small introductory project, we gained valuable insights. For one, a proper ML-enabled system requires far more CPU power and RAM than a standard Raspberry Pi can provide; faster inference calls for dedicated hardware accelerators. Regarding the learning paradigm, we saw that models trained with reinforcement learning in simulation are quite difficult to transfer to the real world.
Thus, applying systematic testing in RL is quite likely to improve outcomes. Regarding simulation, testing could focus on the following questions:
- Is the real-world (field of application) properly reflected in the simulation environment?
- Is the simulation environment still simple enough?
Also, special care has to be taken in the design of the reward function. Testing might support this process by focusing on questions such as the following:
- Does the reward function direct agents as expected? Does it incentivise and punish in a way that really makes agents behave as desired?
- Are there any disincentives included in the reward function?
Finally, testing might significantly support validating and optimising models:
- How to find suitable hyper-parameters?
- Does a trained agent behave as expected, both in simulation and in the real world, especially when stochasticity is involved (which is the case in many RL scenarios)?
Written by Alexander Pohl.