In WP1 we study methods for evaluating, selecting, and configuring a platform, and for mapping a given, trained DNN onto the selected platform. This work package covers the hardware-specific topics; the embedded platforms considered are designed and optimized for inference, not for training. For estimation techniques, we will consider and build upon existing analytic models and layer-wise energy estimation. In cases where analytic models do not provide sufficient fidelity and accuracy, we apply a data-driven approach. This work package’s research question is: Given a network, what is the best platform to use?
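To make the layer-wise estimation idea concrete, the following minimal sketch scores each layer with a simple roofline-style analytic model. The hardware parameters and layer shapes are illustrative assumptions only; they do not reflect the actual models inside ANNETTE or Blackthorn.

```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    h: int       # output height
    w: int       # output width
    c_in: int    # input channels
    c_out: int   # output channels
    k: int       # kernel size

    def macs(self) -> int:
        # multiply-accumulate count of a standard convolution
        return self.h * self.w * self.c_out * self.c_in * self.k * self.k

    def bytes_moved(self) -> int:
        # rough traffic estimate: input activations + weights + outputs (fp16)
        acts_in = self.h * self.w * self.c_in * 2
        weights = self.c_out * self.c_in * self.k * self.k * 2
        acts_out = self.h * self.w * self.c_out * 2
        return acts_in + weights + acts_out

def layer_latency(layer, peak_macs_per_s, mem_bw_bytes_per_s):
    # roofline: a layer is compute-bound or memory-bound; the slower term dominates
    return max(layer.macs() / peak_macs_per_s,
               layer.bytes_moved() / mem_bw_bytes_per_s)

# hypothetical accelerator: 1 TMAC/s peak compute, 10 GB/s memory bandwidth
net = [ConvLayer(112, 112, 3, 32, 3), ConvLayer(56, 56, 32, 64, 3)]
total_s = sum(layer_latency(l, 1e12, 10e9) for l in net)
print(f"estimated latency: {total_s * 1e3:.2f} ms")
```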
Further development of the estimators ANNETTE and Blackthorn to include power estimation, to generalize them to more platforms, to make them more robust and mature, and to extend them to RNNs.
By working on a stream of images, detection performance can be increased by exploiting temporal patterns. We extend processing on embedded hardware to Video Object Segmentation and Tracking (VOST) applications. This includes handling sparse 3D convolutions as well as efficient point cloud processing. In cooperation with WP3.2 and WP3.3, we will find ways to apply these networks on embedded platforms in an optimized way.
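As a minimal illustration of why sparsity matters for point cloud processing, the following sketch implements a submanifold sparse 3D convolution in plain Python/NumPy, computing outputs only at occupied voxels. A production system would use an optimized sparse-convolution library; all shapes and values here are illustrative.

```python
import numpy as np
from itertools import product

def sparse_conv3d(features, weights, kernel=3):
    """features: dict mapping active voxel coords (x, y, z) -> np.ndarray [C_in].
    weights: np.ndarray [kernel, kernel, kernel, C_in, C_out].
    Outputs are computed only at already-active sites, so the sparsity
    pattern is preserved (the 'submanifold' variant)."""
    r = kernel // 2
    out = {}
    for (x, y, z), _ in features.items():
        acc = np.zeros(weights.shape[-1])
        for dx, dy, dz in product(range(-r, r + 1), repeat=3):
            nb = features.get((x + dx, y + dy, z + dz))
            if nb is not None:  # skip empty voxels: cost scales with occupancy
                acc += nb @ weights[dx + r, dy + r, dz + r]
        out[(x, y, z)] = acc
    return out

# toy input: 4 occupied voxels with 8-channel features, 3x3x3 kernel, 16 out channels
rng = np.random.default_rng(0)
feats = {(0, 0, 0): rng.normal(size=8), (1, 0, 0): rng.normal(size=8),
         (0, 1, 0): rng.normal(size=8), (5, 5, 5): rng.normal(size=8)}
w = rng.normal(size=(3, 3, 3, 8, 16))
print(len(sparse_conv3d(feats, w)), "active output voxels")
```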
A toolbox for platform-specific optimization and mapping for our target platforms. The techniques developed in phase 1 for mapping CNNs onto different platforms will be developed further, automating them to a higher degree and covering more platform variants. Moreover, the toolbox will be extended to RNNs such as Long Short-Term Memories (LSTMs) and Gated Recurrent Units (GRUs).
We study methods for evaluating, selecting, and optimizing deep neural networks to be mapped onto a given target platform for inference. We explore the space of architectural choices as well as the process of network exploration and mapping. Based on estimation techniques, various DNN candidates, derived by top-down and bottom-up approaches, can be evaluated for candidate target platforms. This work package’s research question is: Given a target platform and an application, what is the “best” DNN design and optimization strategy?
Focus is placed on the following challenges:
As motivated in the introduction, we concentrate on combinations of optimization methods rather than on individual methods such as pruning or quantization (see the sketch following this list).
We cover CNNs and RNNs, driven by application use cases. Time series applications are a new focus area in this project.
In addition to single-node platforms, we target distributed implementations in which the front-end of the processing pipeline is mapped to the embedded platform and the back-end to a server.
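As a concrete illustration of combining optimization methods (the first challenge above), the following sketch chains magnitude pruning with post-training dynamic quantization using standard PyTorch utilities. The model and the 50% sparsity level are placeholders, not a recommended configuration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# step 1: prune 50% of the smallest-magnitude weights in each Linear layer
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

# step 2: quantize the remaining weights to int8 for inference
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10])
```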
The main platforms under consideration are Xilinx boards (e.g. UltraScale and Versal AI Core Series with Vitis AI), Nvidia (Jetson), ARM-based boards (e.g. Raspberry Pi, STM32 with Arm NN), and Intel (e.g. Myriad X). In addition, NXP’s i.MX 8M Plus has been identified as another platform of interest.
The focus is on image- and video-driven hardware-aware optimization use cases, considering appropriate DNNs.
The focus is on time-series-driven hardware-aware optimization use cases, considering appropriate DNNs.
This task considers hardware, algorithms, and software in the TinyML range, capable of performing on-device sensor data analytics at extremely low power, enabling always-on use cases and targeting battery-operated devices. If the sensor node is too limited to host the complete DNN, the DNN must be split into a front-end, mapped onto the platform of the sensor node, and a back-end, mapped onto a server. We develop platform-aware DNN transformation and partitioning methods under the constraints of the sensor-node platform, the communication link, and the application requirements.
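The following sketch illustrates such a front-end/back-end split at a fixed cut layer. In practice the cut point would be chosen from the platform, link, and application constraints; the network and cut index here are placeholders.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # cheap early layers
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # candidate cut point
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)

cut = 4  # number of leading modules that stay on the sensor node
front_end = nn.Sequential(*list(backbone.children())[:cut])  # runs on-device
back_end = nn.Sequential(*list(backbone.children())[cut:])   # runs on the server

x = torch.randn(1, 3, 64, 64)
intermediate = front_end(x)  # this tensor is what crosses the communication link
print("transmitted floats:", intermediate.numel())
assert torch.allclose(back_end(intermediate), backbone(x))
```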
We develop a toolbox of platform-aware DNN optimization, partitioning, and mapping methods for our target platforms.
Our focus is on continuous learning systems implementable on embedded systems, with autonomous driving as an important use case. A main scenario under consideration is the following: suppose a semi-autonomous car uses a pre-trained network for object detection and recognition. The system was trained on a specific dataset, e.g. one collected in the USA. While you drive, the car continuously collects data; the border cases, i.e. situations where the recognition system fails, are of special interest. After you return home, this data is used to re-train the system, so the system gets better each day you take the car out (a minimal sketch of this collect-and-retrain loop follows the list below). We have chosen this scenario because it is:
Close to on-line learning, but feasible to implement on the currently available hardware.
A realistic application scenario for our industrial partner.
Providing us with the freedom to experiment with different on-line methods.
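The following sketch outlines the collect-and-retrain loop of this scenario. The detector interface (detections as dicts with a "score" key), the low-confidence border-case criterion, and the label_fn and fine_tune functions are hypothetical placeholders, not our actual pipeline.

```python
def drive_and_collect(detector, frames, threshold=0.5):
    # keep frames where the detector is unsure: these are likely border cases
    border_cases = []
    for frame in frames:
        detections = detector(frame)  # assumed: list of dicts with a 'score' key
        if not detections or min(d["score"] for d in detections) < threshold:
            border_cases.append(frame)
    return border_cases

def overnight_update(detector, border_cases, label_fn, fine_tune):
    # labels for the small border-case set could come from a stronger
    # server-side model or from manual annotation (placeholder label_fn)
    labeled = [(frame, label_fn(frame)) for frame in border_cases]
    return fine_tune(detector, labeled)  # the system improves with each trip
```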
For life-long learning we need a running system that we can use to constantly collect data for retraining and re-evaluating our models. The goal of the demonstrator is two-fold. First, it serves as a realistic data collection platform that we have access to; for our scientific goals this is of great importance. Second, it gives us a constant exchange platform with our industrial partner: we can easily transfer our research results, which becomes more important as the project progresses.
Temporal information in 2D and 3D proves to be a valuable source for mining additional labels, whether for improving existing detectors, for domain transfer, or for adaptation. Both short-term temporal information (e.g. scene flow) and long-term tracking information, as obtained from multiple object trackers (MOT), are important. In particular, we extend ongoing work that focuses on unsupervised domain adaptation through re-training with pseudo-labels.
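As an illustration of mining labels from tracking, the following sketch promotes detections on long, confident tracks to pseudo-labels. The track format and thresholds are assumptions for illustration, not our actual pipeline.

```python
def mine_pseudo_labels(tracks, min_track_len=10, min_score=0.8):
    """tracks: list of tracks; each track is a list of per-frame detections,
    each detection a dict with 'frame', 'box', and 'score' keys."""
    pseudo_labels = []
    for track in tracks:
        if len(track) < min_track_len:
            continue  # short tracks are likely spurious
        mean_score = sum(d["score"] for d in track) / len(track)
        if mean_score >= min_score:
            # every detection on a long, confident track becomes a label,
            # including low-score frames the single-frame detector missed
            pseudo_labels.extend((d["frame"], d["box"]) for d in track)
    return pseudo_labels
```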
Unsupervised domain transfer is a highly active research topic, yet domain transfer for 3D data has hardly been addressed so far. Given the shortage of labeled 3D data (as compared to images), this is a pressing problem. Our initial experiments show an extreme sensitivity of existing methods to changes in the LiDAR sensor, its mounting, the environment, and even the annotation style.
There is a need to perform short-term learning or adaptation; typical examples are changing weather conditions or fast illumination changes. We study methods that can perform this fast learning, i.e. learning from a single example (or from short video sequences). A particular example is a detector that has been trained in good weather conditions but needs to adapt quickly to rainy conditions. Prior work shows that simple auxiliary tasks, such as predicting image rotation, can be used to retrain from a single image. In fact, we can formulate this problem as a feature re-weighting problem using auxiliary tasks. If this is successful, it will open up a whole new research area.
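The following sketch illustrates this idea in the spirit of test-time adaptation with a rotation-prediction auxiliary task: the shared features are updated from a single unlabeled image. The architecture, learning rate, and number of steps are illustrative assumptions, not our actual method.

```python
import torch
import torch.nn as nn

# shared feature extractor plus two heads: the main task head and an
# auxiliary rotation-prediction head (all sizes are illustrative)
features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
rot_head = nn.Linear(16, 4)   # predicts rotation class in {0, 90, 180, 270} degrees
det_head = nn.Linear(16, 10)  # stand-in for the main recognition head

def adapt_on_single_image(image, steps=10, lr=1e-3):
    # update only the shared features (and the auxiliary head) using the
    # self-supervised rotation task; no ground-truth label is needed
    opt = torch.optim.SGD(list(features.parameters()) + list(rot_head.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        k = torch.randint(0, 4, (1,)).item()            # rotation label
        rotated = torch.rot90(image, k, dims=(-2, -1))  # apply k * 90 degrees
        loss = loss_fn(rot_head(features(rotated.unsqueeze(0))), torch.tensor([k]))
        opt.zero_grad()
        loss.backward()
        opt.step()

img = torch.randn(3, 64, 64)  # a single test image, e.g. from rainy conditions
adapt_on_single_image(img)
out = det_head(features(img.unsqueeze(0)))  # main head now uses adapted features
print(out.shape)  # torch.Size([1, 10])
```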