Using CNNs in Cars
Bryce Johnstone | Imagination Technologies
Convolutional neural networks (CNNs) are going to play a crucial role in the development of self-driving cars. Applications include driver alertness monitoring, driver gaze tracking, seat occupancy, road-sign detection, drivable path analysis, road user detection, driver recognition and much more. As the number of autonomous vehicles and smart transportation systems increases in the coming years, so too will the applications.
As we move through the levels of autonomous driving, the need for more sensors, and cameras in particular, becomes critical. Autonomous vehicles need to understand and interpret their surroundings through LIDAR, RADAR and cameras, as well as infrared and other technologies. Sensor fusion is the point at which the information extracted from each of these sensor types is merged into a single picture of what is happening in the car's immediate environment, so that the vehicle or driver can react to it.
Today, we are at Level 2 but moving to Level 3, where the driver's hands are still on the steering wheel and the driver is still engaged. Progressing to Levels 4 and 5 means the processing has to handle far more data from more sensors, and it has to process that data fast enough for appropriate actions to be taken, such as automatic braking or automatic steering.
Hardware is key
To address the challenge of extracting useful information from the numerous camera inputs, system designers leverage the inherent efficiency of CNNs. CNNs are a form of machine learning that processes data in a way loosely modelled on the human brain. Neural networks use successive layers of mathematical processing to make increasing sense of the information they receive, from images to speech to text and beyond. For neural networks to work, they must first be trained on massive amounts of data.
With a CNN, this might be done for image classification. In the case of a road-sign recognition system, hundreds of thousands, if not millions, of images of road signs at different angles, distances, exposure levels, colours and shapes would be run through the system. This builds up a set of coefficients (weights) for the various nodes and layers of the CNN so that a particular road sign can be identified. The second stage is inference: the trained network takes new data arriving from the sensor and decides whether or not it contains a road sign. The network is trained offline initially and then applies what it has learned in real time through inference. Training a network to high accuracy takes a substantial amount of compute power and is often done on large arrays of GPUs in data centres.
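To make the two stages concrete, below is a minimal sketch of such a road-sign classifier in PyTorch. Everything specific in it is an assumption for illustration: the 43-class count (as in the public GTSRB benchmark), the 32x32 input size, the network shape and the `loader` that would supply labelled images. Real automotive networks are far larger.

```python
# Minimal sketch of offline training plus inference for a road-sign
# classifier. The class count, image size and network depth are
# illustrative assumptions, not any particular production setup.
import torch
import torch.nn as nn

class RoadSignCNN(nn.Module):
    def __init__(self, num_classes: int = 43):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = RoadSignCNN()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training: repeatedly adjust the weights ("coefficients") against
# labelled images. `loader` is assumed to yield (images, labels) batches.
def train_epoch(loader):
    model.train()
    for images, labels in loader:
        optimiser.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimiser.step()

# Inference: a single forward pass classifies a new image from the sensor.
@torch.no_grad()
def infer(image):                        # image: (3, 32, 32) tensor
    model.eval()
    return model(image.unsqueeze(0)).argmax(dim=1).item()
```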
CNNs can run on a generic computing resource such as a CPU, or on a GPU using its compute capabilities. GPUs are inherently better suited than CPUs to neural network workloads, providing a huge boost in performance as well as a large reduction in power dissipation. However, if we target dedicated hardware blocks optimised specifically for the types of networks and algorithms being run, there is an opportunity for another step-change in performance and a further reduction in power. By driving these two vectors, power and performance, we enable highly complex neural network calculations to be executed at the network edge without requiring the work to be done in the cloud.
Keeping it local
In many cases, inferencing could be run on powerful hardware in the cloud, but for several reasons it's now time to move it to edge devices. Where a fast response is required, it's not practical to run neural networks over the mobile network because of latency issues and, worse, connection drop-outs. Moving inference on-device also eliminates many of the potential security issues that are becoming the bane of the auto industry. And as cellular networks may not always be available, be they 3G, 4G or 5G, dedicated local hardware will be more reliable, offer greater performance and, crucially, consume less power.
A car on a British motorway typically travels at 70mph (around 31m/s). To avoid a collision, it needs to anticipate obstacles 20-50m ahead, and the latency, bandwidth and availability of mobile networks make it impossible to guarantee that over the cloud. With dedicated local hardware, the car can run multiple neural networks simultaneously to identify and track objects right down to a distance of one metre.
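A back-of-envelope calculation makes the latency point concrete; the round-trip times below are illustrative assumptions, not measurements of any particular network:

```python
# Back-of-envelope: how far a car travels while waiting for a cloud
# round trip. The latency figures are illustrative assumptions.
SPEED_MPH = 70
speed_ms = SPEED_MPH * 1609.344 / 3600       # ~31.3 m/s

for rtt_ms in (50, 200, 500):                # assumed round-trip times
    distance = speed_ms * rtt_ms / 1000
    print(f"{rtt_ms:>3} ms round trip -> {distance:4.1f} m travelled blind")

# 50 ms -> ~1.6 m, 200 ms -> ~6.3 m, 500 ms -> ~15.6 m: at motorway
# speed, even modest network latency eats into the 20-50 m window
# available for reacting to an obstacle.
```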
These days, our smartphones are repositories for our pictures, and typically there might be 1,000 or more photos on a device, sorted for us automatically in a variety of ways, including, for example, identifying all photos featuring a particular person. This requires analysis that a premium GPU running a neural network could complete in around 60 seconds, but a dedicated CNN accelerator can do it in just two seconds.
Then there's battery life. A GPU could process around 2,400 pictures using 1% of the battery's charge. Consuming the same amount of energy, the PowerVR Series2NX will handle 428,000 images, which it claims as the highest inferences-per-milliwatt in the industry.
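Taking those published figures at face value, the implied efficiency gap works out as follows:

```python
# Implied efficiency gap from the figures above (images per 1% battery).
gpu_images = 2_400           # premium GPU
nna_images = 428_000         # PowerVR Series2NX
print(f"~{nna_images / gpu_images:.0f}x more inferences per unit of energy")
# prints: ~178x
```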
Flexible bit-depth support
To make these use cases possible, neural network accelerators such as the 2NX have been designed from the ground-up for efficient neural network inferencing. So what makes hardware acceleration technology different from other neural network solutions, such as DSPs and GPUs?
First, there are the ultra-low power consumption characteristics, which leverage our expertise in designing for mobile platforms. Secondly, there is flexible bit-depth support, available on a per-layer basis. A neural network is commonly trained at full 32-bit precision, but running inference at that precision would require a large amount of bandwidth and consume a lot of power, which would be prohibitive within mobile power envelopes; even if you had the performance, your battery would take a huge hit.
To deal with this, advanced accelerators allow the bit-depth to be varied for both weights and data, making it possible to maintain high inference accuracy while reducing bandwidth and, with it, power requirements. The 2NX is currently the only solution on the market to support bit-depths from 16-bit, as required by use cases that mandate it, such as automotive, all the way down to 4-bit.
Unlike other solutions, the 2NX does not take a brute-force approach to reduced bit-depth. Precision can be varied on a per-layer basis for both weights and data, so developers can fully optimise the performance of their networks, and higher precision is maintained internally to preserve accuracy. The result is higher performance at lower bandwidth and power.
In practice, such implementations require as little as 25% of the bandwidth of competing solutions. Moving from 8-bit down to 4-bit precision, where appropriate, lets devices consume just 69% of the power with less than a 1% drop in accuracy.
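The 2NX's internal quantisation scheme isn't described here, but a generic sketch of per-layer symmetric weight quantisation shows why fewer bits translate directly into less bandwidth. The layer names and per-layer bit assignments below are hypothetical:

```python
# Generic sketch of per-layer symmetric weight quantisation. This is
# NOT the 2NX's actual scheme, just an illustration of how per-layer
# bit-depths trade bandwidth against accuracy.
import numpy as np

def quantise(weights: np.ndarray, bits: int):
    """Map float weights to signed integers of the given bit-depth."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.abs(weights).max() / qmax     # one scale factor per layer
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32)
    return q, scale                          # q is stored `bits` wide

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

# Each layer can use a different precision: sensitive layers keep more
# bits, tolerant ones drop to 4-bit, quartering their traffic vs 16-bit.
layer_bits = {"conv1": 8, "conv2": 4, "fc": 16}   # hypothetical layers
rng = np.random.default_rng(0)
for name, bits in layer_bits.items():
    w = rng.standard_normal(1000).astype(np.float32)
    q, s = quantise(w, bits)
    err = np.abs(dequantise(q, s) - w).mean()
    print(f"{name}: {bits}-bit, mean abs error {err:.4f}")
```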
Looking to the future
As our world comes to expect computers and devices to have a greater understanding of their surroundings, neural network acceleration represents the most efficient way to extract meaningful information from camera-captured images. This capability sits at the heart of autonomous driving solutions as we move to partially and fully autonomous driving over the next decade. The need to process these heavy algorithms in real time and within ever stricter power budgets means that edge devices will be crucial to the success of autonomous vehicle technology.
The content & opinions in this article are the author’s and do not necessarily represent the views of RoboticsTomorrow