As deep learning continues to shape modern applications such as natural language processing (NLP) and computer vision systems built on convolutional neural networks (CNNs), the demand for energy-efficient, high-performance inference systems is growing rapidly. In his doctoral dissertation, MSc Jie Lei advanced the field by developing innovative techniques to accelerate General Matrix Multiply (GEMM) operations on low-power processors and specialized hardware platforms such as the AMD Versal AI Engine (AIE), making deep learning more feasible in resource-limited environments.
Improving GEMM Operations in Deep Learning
GEMM operations play a critical role in deep learning inference, but optimizing them for power- and resource-constrained systems remains a challenge. In his research, MSc Jie Lei developed cutting-edge methods for enhancing GEMM performance across various architectures, including ARM, RISC-V, and AMD’s Versal AIE.
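For readers unfamiliar with the operation, GEMM computes C ← αAB + βC over matrices A, B, and C. A minimal reference sketch of those semantics (for illustration only; the dissertation's kernels are vectorized, blocked, and architecture-specific) might look like:

```python
# Reference GEMM: C <- alpha*A*B + beta*C, written as a plain triple loop.
# This only fixes the semantics of the operation being optimized; real
# kernels restructure these loops for SIMD units and the cache hierarchy.
def gemm(alpha, A, B, beta, C):
    m, k = len(A), len(A[0])
    n = len(B[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]
    return C

# Example: 2x2 product with alpha=1, beta=0 reduces to C = A*B.
C = gemm(1.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.0, [[0, 0], [0, 0]])
```

The loop order and blocking of exactly this computation are what separates a naive implementation from a high-performance kernel.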
– “My research focuses on improving the efficiency of GEMM operations, which are foundational for deep learning tasks, by developing energy-efficient, high-performance kernels,” Jie Lei explains.
Key advancements of the study include:
ARM and RISC-V architectures: Developing mixed-precision GEMM kernels that leverage SIMD units to boost performance while lowering power consumption.
Template-based micro-kernel generation: Creating a flexible tool for generating optimized GEMM kernels, enabling performance tuning for ARM Neon and RISC-V Vector extensions.
AMD Versal AIE: Designing architecture-specific kernels that achieve up to 86.7% of the peak performance on a single AIE tile through low-level intrinsics.
Multi-tile scalability: Proposing a parallel GEMM design that reduces computation time by up to 31.5x across multiple AIE tiles.
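In spirit, the first two advancements can be sketched in plain Python: narrow (e.g. 8-bit) inputs are multiplied and accumulated into a wider 32-bit accumulator, and a small micro-tile of C is kept "in registers" for the whole inner loop. This is a hedged illustration under assumed names, not the dissertation's actual kernels, which use SIMD intrinsics and are generated from templates per architecture.

```python
# Sketch of the mixed-precision idea: int8 x int8 products accumulated
# into a (notionally 32-bit) integer accumulator, as SIMD dot-product
# instructions do several lanes at a time. Names here are illustrative.
def dot_i8_acc_i32(a_row, b_col):
    acc = 0  # stands in for a 32-bit accumulator register
    for a, b in zip(a_row, b_col):
        acc += a * b
    return acc

def gemm_microtile(A, B, mr=4, nr=4):
    # Compute one (mr x nr) micro-tile of C. Register-blocked micro-kernels
    # keep this small accumulator tile live across the entire k loop; the
    # template-based generator specializes mr, nr, and the SIMD width.
    k = len(A[0])
    C = [[0] * nr for _ in range(mr)]
    for i in range(mr):
        for j in range(nr):
            C[i][j] = dot_i8_acc_i32(A[i], [B[p][j] for p in range(k)])
    return C
```

The multi-tile design then distributes independent micro-tiles of C across AIE tiles, which is what yields the reported parallel speedup.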
Applications in Power-Constrained Systems
The optimization of GEMM operations has wide-ranging applications across industries that rely on embedded systems and low-power processors. These advancements enable more efficient deep learning inference in edge computing, mobile AI, and IoT networks, where both power and performance are critical.
– “These methods have the potential to improve a variety of AI-driven systems, from mobile applications to IoT devices, by offering high-performance solutions within power-constrained environments,” Jie Lei says.
Public Defence on Tuesday, 15 October
The doctoral dissertation of MSc Jie Lei in the field of Computer Science, titled Accelerating Deep Learning on Low-Power Processors and Heterogeneous Platforms, will be publicly examined at the Universitat Politècnica de València at 11:30 (UTC+2) on Tuesday, 15 October 2024, in Building 1H, room “Sala de Presentaciones 1” (second floor). The examination committee is composed of: Roman Iakymchuk (Uppsala University, Sweden); Jorge Gonzalez (Universidad de A Coruña, Spain); and Pedro Alonso (Universitat Politècnica de València, Spain).