Autonomous Driving (AD) requires complex software frameworks capable of real-time processing of massive amounts of diverse data (on the order of gigabytes per second) streaming continuously from a score of on-board sensors, such as cameras and Light Detection and Ranging (LiDAR) devices. This data feeds the critical AD decision-making process, from perception to motion planning, which relies on advanced AI algorithms. One of the most widely used software frameworks in AD is Apollo.
Apollo uses a variant of YOLO (You Only Look Once), a widely used object detection system, for camera-based object detection, one of the main parts of its perception module. YOLO's most computationally intensive function is Convolutional Neural Network (CNN) inference. Each camera captures multiple frames per second, and the object detector processes them frame by frame.
Neural networks are, by their very nature, highly tolerant of errors and imperfections. Exploiting this trait, researchers from Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC) aim, within the framework of the DRAC project, to design a hardware accelerator that consumes less power by applying approximate computing principles. Two approaches are considered:
- Use reduced precision in the number representation. By default, the YOLO implementation uses a 32-bit floating-point representation (FP32). Reducing the number of bits reduces power consumption, since fewer signal lines and logic elements consume power during each computation.
- Use approximate arithmetic units. These units simplify the add and multiply operations so that they are faster, at the cost of inexact results for certain input combinations. Given the inherent error tolerance of CNNs, this speed margin is exploited by operating the arithmetic units at a lower voltage, so that they run at the same speed as the exact units while consuming less power.
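To make the idea of an approximate arithmetic unit concrete, the sketch below models a lower-part OR adder (LOA), a well-known approximate adder from the literature. It is used here purely as an illustration: the text does not say which approximate designs the project evaluates, and the function name `loa_add` and the parameters `k` and `width` are assumptions made for this example.

```python
import random


def loa_add(a: int, b: int, k: int = 4, width: int = 16) -> int:
    """Approximate unsigned addition in the style of a lower-part OR adder.

    The k least significant bits are produced by a bitwise OR (no carry
    chain), and only the upper bits go through an exact adder.
    """
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask
    # The carry into the exact part is guessed from the top bit of each
    # operand's lower part instead of being propagated through it.
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1 if k > 0 else 0
    high = ((a >> k) + (b >> k) + carry_in) << k
    return (high | low) & ((1 << width) - 1)


# Compare against exact addition on random 15-bit operands, so that the
# exact sum always fits in the 16-bit result.
rng = random.Random(0)
pairs = [(rng.randrange(1 << 15), rng.randrange(1 << 15)) for _ in range(10_000)]
errors = [abs(loa_add(a, b) - (a + b)) for a, b in pairs]

print(f"max abs error:  {max(errors)}")
print(f"mean abs error: {sum(errors) / len(errors):.3f}")
```

In this model the result is exact whenever the two lower parts share no set bits, and the absolute error never exceeds 2^(k-1). In hardware, shortening the carry chain in this way is what creates the timing slack that can then be traded for a lower supply voltage.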
Alternative formats such as FP16 (16 bits), FP8 (8 bits), and Posit-16 (16 bits) are being evaluated in terms of detection accuracy using the same YOLO network. In parallel, several integer and floating-point arithmetic units have been designed in RTL, and their accuracy and energy savings have been studied under different voltage and clock frequency conditions.
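The effect of a narrower number format on a CNN-style computation can be sketched with a single dot product, the basic operation inside a convolution. The example below is a minimal illustration, not the project's evaluation flow: it emulates FP32 and FP16 by rounding every intermediate value through Python's `struct` module, and the helper names `round_to` and `dot` as well as the random weights and activations are assumptions made for this sketch. FP8 and posits are not covered because the standard library has no encoding for them.

```python
import random
import struct


def round_to(fmt: str, x: float) -> float:
    """Round x to the nearest value representable in the given struct format.

    "<f" is IEEE 754 single precision (FP32), "<e" is half precision (FP16).
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def dot(weights, activations, fmt: str) -> float:
    """Dot product in which inputs, products and the running sum are all
    rounded to the chosen format, emulating a reduced-precision datapath."""
    acc = 0.0
    for w, a in zip(weights, activations):
        product = round_to(fmt, round_to(fmt, w) * round_to(fmt, a))
        acc = round_to(fmt, acc + product)
    return acc


# Hypothetical data: 256 weights in [-1, 1] and activations in [0, 1].
rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(256)]
activations = [rng.uniform(0.0, 1.0) for _ in range(256)]

# Python floats are double precision, used here as the reference result.
reference = sum(w * a for w, a in zip(weights, activations))
err_fp32 = abs(dot(weights, activations, "<f") - reference)
err_fp16 = abs(dot(weights, activations, "<e") - reference)

print(f"FP32 abs error: {err_fp32:.3e}")
print(f"FP16 abs error: {err_fp16:.3e}")
```

The FP16 error is typically several orders of magnitude larger than the FP32 error. Whether that matters is decided by end-to-end detection accuracy, which is why the formats are compared on the full YOLO network and not on isolated operations.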
The results of these analyses will guide the implementation of a hardware accelerator based on these approximations, which is the final goal of the P4 project in DRAC.