APROPOS at HiPEAC ACACES 10-16.07.2022

The 18th HiPEAC Summer School on Advanced Computer Architecture and Compilation for High-Performance Embedded Systems (ACACES) 2022 took place in Fiuggi (Italy) and consisted of courses focusing on sustainability, computer architecture, hardware and software security, compilation and hardware-software co-design, high-performance computing, and artificial intelligence (AI). In addition to these, the school also provided an inspiring keynote on quantum computing, an invited talk on entrepreneurship, a career session with representatives from both academia and industry, and a poster session for attendees to present their work.

More details about the event are available at https://www.hipeac.net/acaces/2022/#/

Courses

The summer school’s main component was its courses, which were arranged into four slots of three parallel courses each day. Students could select one course per slot to match their personal research interests. Our ESR number 10, Hans Jakob Damsgaard, attended the courses summarized below:

“Sustainable Computing” lectured by Lieven Eeckhout (Ghent University, Belgium)*

The information and communications technology (ICT) sector currently accounts for more than 2% of global greenhouse gas emissions, and its contribution is expected to increase. Addressing this now is crucial! Doing so means that computer architects and system designers should treat sustainability as a key parameter when designing future computing systems. Until recently, however, it has been unclear which factors affect the sustainability of computing the most. This course focused on exactly this issue and provided a clearer definition of sustainability, models for sustainable decision making, and first-order models of emissions over an electronic device’s lifespan.

The emission models for different products led to some particularly interesting observations. Firstly, computing seems to be affected by the Jevons paradox (improving the efficiency with which a resource is used leads to increased consumption of that resource), much like other technologies throughout history. Secondly, emissions from manufacturing and end-of-life treatment dominate operational emissions for battery-powered devices, while the opposite is true for always-on devices. Moreover, manufacturing emissions for both product categories are trending upward, as modern transistor technologies require more manufacturing steps. Lastly, shifting all manufacturing and operation to renewable energy sources is insufficient to reduce emissions from the ICT sector enough to meet international development goals. The most impactful factor is die size, which is good news: reducing it only requires limiting the area of the circuits being produced, a trend which occurs naturally due to Moore’s law.
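The contrast between battery-powered and always-on devices can be illustrated with a first-order model that splits lifetime emissions into an embodied part (manufacturing and end-of-life) and an operational part (energy used times the carbon intensity of the grid). The sketch below is only an illustration of that idea and not the model presented in the course; the `Device` class, the function names, and every number in it are hypothetical placeholders chosen to make the two cases visible.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """First-order lifecycle description of a device (all values hypothetical)."""
    name: str
    embodied_kg: float     # manufacturing + end-of-life emissions, kg CO2e
    avg_power_w: float     # average power draw while in use, watts
    hours_per_day: float   # hours of use per day
    lifetime_years: float  # years in service


def operational_kg(device: Device, grid_kg_per_kwh: float) -> float:
    """Operational emissions: lifetime energy use times grid carbon intensity."""
    energy_kwh = (device.avg_power_w / 1000
                  * device.hours_per_day * 365 * device.lifetime_years)
    return energy_kwh * grid_kg_per_kwh


def total_kg(device: Device, grid_kg_per_kwh: float) -> float:
    """Total lifetime emissions: embodied plus operational."""
    return device.embodied_kg + operational_kg(device, grid_kg_per_kwh)


def embodied_share(device: Device, grid_kg_per_kwh: float) -> float:
    """Fraction of lifetime emissions that is embodied rather than operational."""
    return device.embodied_kg / total_kg(device, grid_kg_per_kwh)


# Placeholder inputs, NOT measured data: a battery-powered and an always-on device.
GRID_KG_PER_KWH = 0.4
phone = Device("phone", embodied_kg=50, avg_power_w=2, hours_per_day=4, lifetime_years=3)
server = Device("server", embodied_kg=1000, avg_power_w=300, hours_per_day=24, lifetime_years=5)

for device in (phone, server):
    share = embodied_share(device, GRID_KG_PER_KWH)
    print(f"{device.name}: {share:.0%} of lifetime emissions are embodied")
```

With these placeholder inputs, the embodied part dominates for the battery-powered device and the operational part dominates for the always-on one, which mirrors the qualitative observation above.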

*course adapted from https://studiekiezer.ugent.be/studiefiche/en/E034500/2022

“RISC-V: Open ISA, Processors and Systems from AI-enabled IoT to HPC” lectured by Francesco Conti (University of Bologna, Italy, and ETH Zürich, Switzerland)

RISC-V has transformed computer architecture. Its license-free flexibility and broad industry adoption make it well suited to teaching, research, and production alike. This course first presented the instruction set architecture (ISA) in detail and demonstrated some simple and advanced core microarchitectures. Research groups at the University of Bologna and ETH Zürich have collaborated to provide an extensive catalogue of core and cluster designs as part of their PULP project, all of which are open source. The lectures proceeded to cover some of these designs and the supporting tools in more detail.

The overarching goal of PULP is to achieve very high energy efficiency with highly optimized general-purpose cores and specialized accelerators. Clustering such cores in highly parallel systems, in turn, enables applications ranging from small AI models in embedded systems to heavy streaming applications in HPC-like environments. The designs also make full use of RISC-V’s extensibility by implementing custom instructions to support hardware loops, fine-grained bit manipulation, and sub-word arithmetic. A corresponding compiler toolchain targets the clusters and emits the custom instructions whenever possible. Most recently, the project has focused on developing neural engines and matching tools for quantizing, tiling, and scheduling AI workloads on embedded designs. The lecturer demonstrated these developments with a nano-drone capable of running automatic path generation and obstacle avoidance at extremely low power consumption.
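To give a feel for what sub-word arithmetic means, the sketch below emulates in software a packed dot product: four signed 8-bit values share one 32-bit word, and a single operation multiplies the lanes pairwise and accumulates the sum. This is a generic illustration of the concept only. The function names are hypothetical, and the code does not reproduce the encoding or exact semantics of any actual PULP instruction.

```python
def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 in the low byte)."""
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    word = 0
    for lane, value in enumerate(values):
        if not -128 <= value <= 127:
            raise ValueError("value does not fit in a signed 8-bit lane")
        word |= (value & 0xFF) << (8 * lane)
    return word


def unpack_int8x4(word):
    """Split a 32-bit word back into four signed 8-bit lanes."""
    lanes = []
    for lane in range(4):
        byte = (word >> (8 * lane)) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)  # sign-extend the byte
    return lanes


def dot_int8x4(word_a, word_b, acc=0):
    """Emulate one packed multiply-accumulate: four 8-bit products added to acc."""
    lanes_a = unpack_int8x4(word_a)
    lanes_b = unpack_int8x4(word_b)
    return acc + sum(a * b for a, b in zip(lanes_a, lanes_b))


a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
print(hex(a), dot_int8x4(a, b))
```

In hardware, the appeal is that one instruction on one register pair does the work of four multiplications and four additions, which suits the quantized (low-precision) AI workloads mentioned above.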

“Software-hardware Co-designs: The Compiler Science Behind the Spark” lectured by Alexandra Jimborean (Universidad de Murcia, Spain, and Uppsala University, Sweden)

Historically, hardware and software have been developed by separate teams of researchers and engineers, and each side has been able to trust the other to find significant performance improvements every development cycle. Recently, however, the performance of single processing cores has stagnated, and new development goals have been introduced, namely energy efficiency and security. These goals imply a need for software-hardware co-design, in which the two sides work more closely together to optimize software and hardware for each other.

Many of the resulting techniques are based in the compiler and require only minimal microarchitectural changes. The lecturer covered state-of-the-art research in the field, including performance and energy-efficiency optimizations such as software prefetching to better utilize existing voltage-frequency scaling hardware; load reordering to improve throughput in both in-order and out-of-order cores through greater memory-level parallelism; and identifying and automatically parallelizing independent code regions, supported by a custom cache coherence policy. On the security side, the compiler can reorder instructions to significantly reduce the impact of techniques such as shadowing, which can otherwise nearly cancel the benefits of speculative execution.

“Compiler Challenges for Heterogeneous Architectures” lectured by Henri-Pierre Charles (CEA Grenoble, France)

Modern systems are complex and implement many processing architectures beyond traditional general-purpose cores. Compiling programs to target such diverse architectures is difficult and demands flexible tools. The course addressed this issue by promoting compiler research to its participants, from defining a domain-specific language and writing a parser and compiler passes to support it, to extending existing tool flows to target new architectures and simulating the resulting programs. Examples of such architectures are in-memory computing-based accelerators, which are often applied to parallel, arithmetic-heavy applications such as machine learning.

After the more general introduction, the lecturer proceeded to their own field of research: dynamic compilation at runtime, a new technique to improve program adaptability. By including a small compiler (a so-called “compilette”) in an application binary, the application can, at runtime, generate different versions of specific functions in response to varying input data or precision requirements. While this remains a new field of research, the first results are promising, particularly in embedded systems with limited storage space for program binaries.

Keynote summary

The keynote “The Quantum Decade” by Andrea Corbelli (IBM Infrastructure Technical Sales, Italy) motivated research into quantum computing as a potential way of solving exponentially complex problems. Quantum computing can be considered a third essential technology for solving modern problems, the other two being traditional (math-inspired) computing based on bits and neuromorphic (biology-inspired) computing based on neurons. Quantum computing differs greatly from traditional computing in that its computations, based on qubits, are quantum mechanical, and qubit states are reasoned about through probabilities. The interesting feature is that each qubit exists in several states concurrently and that the number of states grows exponentially as more qubits are entangled, i.e., linked quantum mechanically. This enables executing hugely parallel workloads rapidly and efficiently.
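The exponential growth is easy to make concrete: the joint state of n qubits is described by 2^n complex amplitudes, so simulating it on a classical computer quickly becomes infeasible. The short sketch below counts amplitudes and estimates the memory needed to store them, assuming 16 bytes per amplitude (two 64-bit floats). The function names are hypothetical, and the uniform superposition is used only because it is the simplest state to write down.

```python
import math


def num_amplitudes(n_qubits):
    """Number of complex amplitudes describing the joint state of n qubits."""
    return 2 ** n_qubits


def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Classical memory needed to store the full state vector."""
    return num_amplitudes(n_qubits) * bytes_per_amplitude


def uniform_superposition(n_qubits):
    """State vector with equal amplitude on every basis state."""
    size = num_amplitudes(n_qubits)
    return [1 / math.sqrt(size)] * size


for n in (10, 30, 50):
    gib = state_vector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {num_amplitudes(n)} amplitudes, {gib:g} GiB")
```

Each added qubit doubles both numbers, which is why a few dozen qubits already exceed what can be stored as a full state vector on an ordinary machine.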

The speaker highlighted research areas that need further development to make quantum computing more accessible to traditional developers. These include high-performance low-level quantum circuits, efficient quantum algorithms that make use of them, and applications built on libraries of these algorithms. To smooth the transition, languages and tools should resemble those already used by developers. Following these developments, IBM expects to have practical quantum applications demonstrating quantum supremacy by the end of the 2020s. The greatest challenge in quantum computing is not technology but hiring enough people!

Invited talk summary

The invited talk “Silexica: From University Spin-off to Exit” by Maximilian Odendahl (Xilinx/AMD, USA) presented the entrepreneurial career of the speaker and a group of their doctoral colleagues. After finishing their doctoral studies, the group founded the company Silexica, which worked on optimizing algorithms for electronic design automation flows. They grew the company, licensing their developments to other companies in the same line of business, before eventually being bought by Xilinx (now part of AMD). The speaker described in detail how the team had struggled in the early stages of running the company, from simple issues like selecting a suitable slogan and raising funds to more troublesome ones like identifying the right customer companies, but also how they had succeeded through continuous pivoting. Overall, the talk was highly motivational!

Acronyms:

  • AI: Artificial Intelligence
  • HPC: High-Performance Computing
  • ICT: Information and Communications Technology
  • IoT: Internet of Things
  • ISA: Instruction Set Architecture
  • PULP: Parallel Ultra-Low Power
  • RISC: Reduced Instruction Set Computer

Jie Lei

  • ESR 9
  • Universitat Politecnica de Valencia - IBM

Hans Jakob Damsgaard

  • ESR 10
  • Tampere University - ISW

Lev Denisov

  • ESR 8
  • Politecnico di Milano - IBT Systems