Defence Jie Lei: Accelerating Deep Learning on Low-Power Processors and Heterogeneous Platforms

Jie will defend his thesis, titled Deep Learning Inference on Low-Power Commodity Processors and the AMD Versal AI Engine, on 15 October 2024!

As deep learning continues to shape modern applications ranging from natural language processing (NLP) to computer vision with convolutional neural networks (CNNs), the demand for energy-efficient, high-performance inference systems is growing rapidly. In his doctoral dissertation, MSc Jie Lei advanced the field by developing innovative techniques to accelerate General Matrix Multiply (GEMM) operations on low-power processors and on specialized hardware platforms such as the AMD Versal AI Engine (AIE), making deep learning inference more feasible in resource-limited environments.
Improving GEMM Operations in Deep Learning
GEMM operations play a critical role in deep learning inference, but optimizing them for power- and resource-constrained systems remains a challenge. In his research, MSc Jie Lei developed cutting-edge methods for enhancing GEMM performance across various architectures, including ARM, RISC-V, and AMD’s Versal AIE.
– “My research focuses on improving the efficiency of GEMM operations, which are foundational for deep learning tasks, by developing energy-efficient, high-performance kernels,” Jie Lei explains.
Key advancements of the study include:
ARM and RISC-V architectures: Developing mixed-precision GEMM kernels that leverage SIMD units to boost performance while lowering power consumption.
Template-based micro-kernel generation: Creating a flexible tool for generating optimized GEMM kernels, enabling performance tuning for ARM Neon and RISC-V Vector extensions.
AMD Versal AIE: Designing architecture-specific kernels that achieve up to 86.7% of the peak performance on a single AIE tile through low-level intrinsics.
Multi-tile scalability: Proposing a parallel GEMM design that achieves a speedup of up to 31.5x across multiple AIE tiles.
Applications in Power-Constrained Systems
The optimization of GEMM operations has wide-ranging applications across industries that rely on embedded systems and low-power processors. These advancements enable more efficient deep learning inference in edge computing, mobile AI, and IoT networks, where both power and performance are critical.
– “These methods have the potential to improve a variety of AI-driven systems, from mobile applications to IoT devices, by offering high-performance solutions within power-constrained environments,” Jie Lei says.
Public Defence on Tuesday, 15 October
The doctoral dissertation of MSc Jie Lei in the field of Computer Science, titled Accelerating Deep Learning on Low-Power Processors and Heterogeneous Platforms, will be publicly examined at the Universitat Politècnica de València at 11:30 (UTC+2) on Tuesday, 15 October 2024, in Building 1H, room “Sala de Presentaciones 1” (second floor). The examination committee is composed of Roman Iakymchuk (Uppsala University, Sweden), Jorge Gonzalez (Universidad de A Coruña, Spain), and Pedro Alonso (Universitat Politècnica de València, Spain).