APROPOS/DELTA Winter School Report

The APROPOS/Delta Winter School focused on Machine Learning techniques and the challenges involved in using them. As the APROPOS project aims to reduce the energy consumption of computers, the consequences of the massive use of Machine Learning in today's processing were reviewed with respect to processing requirements and energy consumption. During the school, 14 experts in fields such as Machine Learning, Internet-of-Things (IoT), and embedded systems presented their state-of-the-art research on the energy and processing-speed challenges of Machine Learning. The school was intended for a broad audience, from people relatively new to the concepts of Machine Learning to more experienced users, and more than 50 people attended.

Basics of Machine Learning

Machine Learning is one of the most trending research topics in computer engineering today, and it is used almost everywhere. Although it has gained much momentum in recent years, its first concepts date back more than 60 years, to the beginning of the computer era. During his presentation on “Machine learning and deep neural networks,” Luca Benini (UNIBO) started by showing the evolution of interest in Machine Learning since 1957, when the idea of the “perceptron” was first invented. It was the first algorithm based on learning, meaning that the actions it must perform are not explicitly coded by a programmer but are instead learned by the program itself through a training procedure. What must be designed is the code that learns, not the specific action. However, perceptrons proved too limited in capability, as they can only learn linearly separable patterns. Interest therefore declined around 1969 and only recovered around 1979, when the concept of the multi-layer perceptron arose and pushed the limits of ML further.
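To illustrate what learning, as opposed to explicit coding, means in practice, the sketch below (not from the talk) trains a classic perceptron on a linearly separable toy problem, the logical AND; it is exactly the restriction to linearly separable patterns that later motivated multi-layer networks. The learning rate and epoch count are arbitrary choices.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Classic perceptron learning rule: nudge the weights whenever a
    sample is misclassified. Works only for linearly separable data."""
    w = np.zeros(X.shape[1] + 1)                   # weights plus bias term
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append constant input for the bias
    for _ in range(epochs):
        for xi, target in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            w += lr * (target - pred) * xi         # update only on mistakes
    return w

# A linearly separable toy problem: logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
print([1 if np.append(x, 1) @ w > 0 else 0 for x in X])   # -> [0, 0, 0, 1]
```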

The concept of self-writing code was quite revolutionary. It meant that complex problems that cannot be fully and explicitly defined by a model could still be handled by a program after it has been properly trained. A “basic” example is an image classifier whose purpose is to recognize the contents of a picture. While this is a tough problem to solve with conventional programming, it is one of the first application cases of Machine Learning: given a proper training set, a program can be trained to recognize patterns in images and classify their content. A direct application is the number recognition performed every day by automatic check deposit machines in banks, where a trained ML model integrated into the machine recognizes hand-written digits.

The explosion of ML took off around 2006; before that, it had always been limited by the available processing power and storage. With new processing capacities, ML is now in almost any application that deals with significant amounts of data. However, with increasing processing power also came power consumption challenges. ML models and applications nowadays range from data-center-level systems, with Peta-Ops/s (10¹⁵ Ops/s) and up to 10⁴ W peak power, all the way down to embedded systems, with Giga-Ops/s and 10⁻² W peak power. With such a wide range of capacities, it is easy to see that ML is a very vast world and that model characteristics depend on the application. Its massive usage in the computing world leads to significant energy consumption. As energy usage is one of the world's top challenges for this century, the need for such consumption can be questioned, and new, more efficient and optimized ways of performing ML should be explored.

Summary of the presentations

  1. Jari Nurmi (TAU) Opening, introducing DELTA and APROPOS.
  2. Giuseppe Tagliavini (UNIBO) Approximation by word length adjustments.
  3. Anil Kanduri (UTU) Exploiting accuracy tradeoffs in Edge orchestration.
  4. Luca Benini (UNIBO) Machine learning and deep neural networks.
  5. Nili Guy & Benjamin Mandler (IBM Haifa Research Laboratory) Autonomous Drone Inspections and real-time AI analysis.
  6. Enrique Quintana-Orti, Giovanni Agosta & William Fornaciari (UPV & POLIMI) Machine learning applications.
  7. Irem Boybat-Kara (IBM Zurich Research Laboratory) In-memory computing for AI.
  8. Vidhi Zalani (IBM Yorktown Research Laboratory) Hardware acceleration for Machine Learning.
  9. Jukka Talvitie (TAU) Machine learning for communications and positioning.
  10. Jani Boutellier (UWASA) Efficient machine learning for imaging applications.
  11. Dimitrios Stathis (KTH) BCPNN (model of cortex) and SOM-based genome recognition as approximate computing benchmarks.
  12. Ahmed Hemani (KTH) SiLago platform and plans to extend it for approximate computing.

 

The second lecture was given by Giuseppe Tagliavini from the University of Bologna (UNIBO). The focus was the precision of computation: how to define it and the consequences of reducing bitwidth through quantization. Precision tuning can be applied to both floating-point and fixed-point numbers, and the pros and cons of each were discussed. He also introduced libraries such as MPFR, FlexFloat, and FloatX for reduced-precision floating-point arithmetic. The primary use of this kind of quantization is in machine learning applications, which are mostly compute-intensive.
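As a concrete, purely illustrative picture of bitwidth tuning, the following sketch quantizes values to a signed fixed-point format with a configurable fractional bitwidth and reports the resulting error; the function name and the chosen bit splits are assumptions, not anything presented in the lecture.

```python
import numpy as np

def to_fixed_point(x, int_bits=3, frac_bits=12):
    """Quantize values to a signed fixed-point format with the given
    integer/fractional bit split, saturating on overflow."""
    scale = 2 ** frac_bits
    lo = -(2 ** (int_bits + frac_bits))        # most negative integer code
    hi = 2 ** (int_bits + frac_bits) - 1       # most positive integer code
    codes = np.clip(np.round(x * scale), lo, hi)
    return codes / scale                        # back to real values

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, 1000)

for frac_bits in (4, 8, 12):
    q = to_fixed_point(weights, int_bits=3, frac_bits=frac_bits)
    err = np.max(np.abs(weights - q))
    print(f"frac_bits={frac_bits:2d}  max abs error={err:.6f}")
```

The quantization error roughly halves with every extra fractional bit, which is the trade-off that precision-tuning tools try to balance against energy and memory savings.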

Anil Kanduri from the University of Turku presented the third lecture, “Exploiting Accuracy Trade-offs in Edge Orchestration.” In a traditional cloud infrastructure, the main processing and calculations take place on a cloud server; user-end devices only sample data and send it over the network to the cloud. Because of network latency, bandwidth, energy, and privacy concerns, an edge layer is added to the infrastructure, so data processing is now divided between the edge and the cloud. The task distribution between them depends on the application and the required performance of the system. The final part of the talk explained this orchestration in an eHealth pain-monitoring application.

The second day started with a presentation on “Autonomous Drone Inspections and real-time AI analysis” by Nili Guy and Benjamin Mandler from the IBM Haifa Research Laboratory. It covered how Artificial Intelligence can work in drones; they explained the drones they designed and presented their results.

Next, Enrique Quintana-Orti, Giovanni Agosta, and William Fornaciari, from UPV and POLIMI, talked about IoT devices and how to decrease the energy consumption of computation, which matters because these devices run on batteries. Real-time processing is also a critical issue in IoT devices. They introduced a precision tuning framework called TAFFO (Tuning Assistant for Floating-point to Fixed-point Optimization). The other part of the talk was about monitoring the power of a system in software or hardware; hardware monitoring was noted to be more accurate and efficient. They used regression as an ML technique for accurate power modeling, but this can increase the possibility of side-channel attacks, so they discussed solutions that have been proposed to address this issue.

One of the most interesting topics of the winter school was “In-memory computing for AI” by Irem Boybat, a research staff member at IBM Research – Zurich. The main challenge of implementing NNs in hardware architectures is loading and storing data from and to memory, which is the most power-consuming part. One solution for reducing this power consumption is in-memory computing, achieved with analog memristive devices. Phase-change memory can perform matrix multiplications directly, so memristive arrays can be used for the matrix multiplications in DNNs. Digital-to-analog and analog-to-digital converters are then needed alongside the memory array of unit cells. The precision of the operation depends on the physical properties of the resistive devices. At the end, she discussed some in-memory computing architectures.
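As a rough, purely illustrative model of the idea (not the architectures shown in the talk), the sketch below treats an analog crossbar multiply as a matrix-vector product with noisy conductances and a limited-resolution readout; the noise level and ADC resolution are arbitrary assumptions.

```python
import numpy as np

def crossbar_matvec(weights, x, noise_std=0.02, adc_bits=8):
    """Toy model of an analog in-memory matrix-vector multiply:
    weights are stored as (imprecisely programmed) conductances, inputs
    are applied as voltages, currents sum along each line, and the
    result is read out through a coarse ADC."""
    # Conductance programming is imprecise: add relative noise.
    g = weights * (1.0 + np.random.normal(0.0, noise_std, weights.shape))
    currents = g @ x                                   # analog accumulation
    # Quantize the readout to emulate a limited-resolution ADC.
    full_scale = max(np.max(np.abs(currents)), 1e-12)
    step = 2 * full_scale / (2 ** adc_bits)
    return np.round(currents / step) * step

np.random.seed(0)
W = np.random.randn(16, 64)
x = np.random.randn(64)

exact = W @ x
analog = crossbar_matvec(W, x)
print("relative error:", np.linalg.norm(exact - analog) / np.linalg.norm(exact))
```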

The last talk of the second day was “Hardware Acceleration for Machine Learning” by Vidhi Zalani from IBM. Like the other presentations, the focus of this talk was deep learning. She analyzed deep learning from three angles: approximate computing, hardware accelerators, and a custom compiler. Precision scaling is used for approximate computing: quantization changes the bitwidth of the data representation, which saves compute, memory footprint, and bandwidth. Deep learning operations also require a mix of precisions to maintain accuracy.
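A minimal sketch of this kind of precision scaling, assuming a simple symmetric int8 scheme rather than Zalani's actual method, is shown below: weights and inputs are stored in 8 bits, while accumulation is done in 32-bit integers to preserve accuracy.

```python
import numpy as np

def quantize_int8(x):
    """Uniform symmetric quantization of a float32 tensor to int8,
    returning the integer codes and the scale needed to dequantize."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

weights = np.random.randn(256, 256).astype(np.float32)
x = np.random.randn(256).astype(np.float32)

wq, wscale = quantize_int8(weights)
xq, xscale = quantize_int8(x)

# Low-precision storage, higher-precision accumulation for accuracy.
y_int = wq.astype(np.int32) @ xq.astype(np.int32)   # int32 accumulation
y = y_int.astype(np.float32) * wscale * xscale       # dequantize the result
print("memory: %d -> %d bytes" % (weights.nbytes, wq.nbytes))
print("max abs error:", np.max(np.abs(weights @ x - y)))
```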

The last day started with “Machine learning for communications and positioning” by Jukka Talvitie from TAU. This lecture was about how ML can be used in wireless networks; he presented some views on how ML will affect future wireless networks, where it is currently applied mainly in 5G. He then discussed the challenges of ML in communication networks and, at the end, presented ML applications and use cases in this area.

The following presentation was “Efficient Machine Learning Inference with emphasis on image recognition” by Jani Boutellier from the University of Vaasa. His focus was on how to map NNs onto hardware accelerators: since GPUs are power-hungry, dedicated hardware accelerators are an efficient way to implement NNs.

The last two presentations were from KTH. The first was “BCPNN (model of cortex) and SOM-based genome recognition as approximate computing benchmarks,” presented by Dimitrios Stathis.

BCPNN stands for Bayesian Confidence Propagation Neural Network. Its learning rule is based on Bayes' theorem and Hebbian learning, and its neuron behavior is based on the LIF (Leaky Integrate-and-Fire) model and a winner-take-all mechanism. He described the structure of the BCPNN: the atomic unit is the Mini-Column Unit (MCU), and a group of MCUs is called a Hyper-Column Unit (HCU). HCUs are connected via synapses, and MCUs generate and receive spikes. He then presented the HCU data structure and discussed models of computation. The structure was implemented on eBrainII, which has a custom 3D-DRAM. As the results showed, most of the power is consumed by the DRAM, so they proposed Column Update Elimination (CUE) for the model of computation. He also mentioned opportunities for approximation and the techniques that can be applied.
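For readers unfamiliar with these terms, the sketch below shows generic leaky integrate-and-fire dynamics with a winner-take-all spike selection; the parameters and group size are illustrative assumptions and do not reflect the actual BCPNN implementation on eBrainII.

```python
import numpy as np

def lif_wta_step(v, input_current, dt=1.0, tau=20.0, v_thresh=1.0):
    """One step of leaky integrate-and-fire dynamics for a group of
    units, followed by a winner-take-all spike selection: only the
    unit with the highest membrane potential above threshold fires."""
    # Leaky integration of the input current.
    v = v + dt / tau * (-v + input_current)
    spikes = np.zeros_like(v)
    winner = np.argmax(v)
    if v[winner] >= v_thresh:
        spikes[winner] = 1.0
        v[winner] = 0.0            # reset the firing unit
    return v, spikes

rng = np.random.default_rng(1)
v = np.zeros(8)                    # 8 units competing within one group
for t in range(100):
    v, spikes = lif_wta_step(v, rng.uniform(0.0, 2.0, 8))
    if spikes.any():
        print(f"t={t:3d}  unit {int(np.argmax(spikes))} spiked")
```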

The second part of his presentation was about the SOM (self-organizing map) algorithm used for bacterial identification. He first described the algorithm and then how quantization works in a SOM. He also mentioned other possible approximations: approximate arithmetic, heavy quantization of the weights during inference, and dynamic scaling to solve the problem of score accumulation.
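As a reminder of how the basic SOM algorithm works (this is a generic sketch, not the genome-recognition implementation from the talk), the code below trains a small map by repeatedly finding the best-matching unit and pulling it and its neighbors toward each sample; the grid size, learning rate, and neighborhood width are arbitrary.

```python
import numpy as np

def som_train(data, grid_shape=(8, 8), epochs=20, lr=0.5, sigma=2.0):
    """Train a small self-organizing map: for each sample, find the
    best-matching unit (BMU) and pull it and its grid neighbors toward
    the sample, with neighborhood and learning rate decaying over epochs."""
    rows, cols = grid_shape
    dim = data.shape[1]
    weights = np.random.default_rng(0).random((rows, cols, dim))
    # Pre-compute grid coordinates for the neighborhood function.
    yy, xx = np.mgrid[0:rows, 0:cols]
    for epoch in range(epochs):
        decay = np.exp(-epoch / epochs)
        for x in data:
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            grid_d2 = (yy - bmu[0]) ** 2 + (xx - bmu[1]) ** 2
            h = np.exp(-grid_d2 / (2 * (sigma * decay) ** 2))
            weights += (lr * decay) * h[..., None] * (x - weights)
    return weights

samples = np.random.default_rng(1).random((200, 3))   # e.g. 3-D feature vectors
trained = som_train(samples)
print("trained map shape:", trained.shape)             # (8, 8, 3)
```

Heavy quantization of the weights or approximate distance arithmetic, as mentioned in the talk, would be applied inside this inner loop.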

The last presentation was “SiLago platform and plans to extend it for approximate computing” by Ahmed Hemani from KTH. First, he gave a brief review of approximation and mentioned that energy and latency are dominated by wires and storage. He then introduced the concept of synchoricity: if space is discretized using a virtual grid, blocks can be composed spatially, provided the number of grid cells in each dimension matches and their interconnect edges are abuttable. Lego bricks, for example, form a synchoros system. According to this definition, SiLago blocks are the new standard cells. Next, he discussed the SiLago architecture and its enhancements, and he described ongoing projects on BCPNN and memristors.

Antoine Grenier

  • ESR 11
  • Tampere University - EXAFORE
More information

Saba Yousefzadeh

  • ESR 5
  • Royal Institute of Technology (KTH)
More information