ESR explains – Using TAFFO for precision tuning of floating-point computations

17.10.2022

Automated precision tuning

Floating-point arithmetic is a major contributor to the energy consumption of computer programs [1,2]. It is more complex than integer arithmetic and consequently requires a more elaborate hardware implementation, more software operations, or both. Higher-precision types such as IEEE double (64 bits) require more memory than float (32 bits) or half (16 bits), and their more complex implementations take more CPU cycles and consume more energy. Programs that rely heavily on floating-point computations can in some cases be made faster or more energy efficient by using reduced precision [3,4].

Most programs are written with uniform precision, using float or double for every variable. However, not every operation needs that much precision: many can be performed in a lower-precision floating-point type or in fixed point without a significant loss of numeric accuracy. Identifying and changing these types manually is usually hard because it requires a deep analysis of the operations. Automated methods help to overcome this challenge. They are usually based on analysing variable ranges and accumulated error, using approaches such as interval arithmetic or affine arithmetic.

What is TAFFO?

TAFFO [5,6] is a framework for tuning the precision of floating-point calculations. The name stands for “Tuning Assistant for Floating to Fixed point Optimization”. It is based on the LLVM compiler framework and works as a step in the compilation process. TAFFO is an open-source tool, available on GitHub [7].

TAFFO works with programs written in C/C++, and it requires the user to annotate the range of the input variables. From this information, it derives the ranges for the intermediate variables and finds the data type allocation that reduces precision without significantly affecting the accuracy of the result. If the static analysis is unsuccessful, it is possible to override its result with additional annotations.

Internally, TAFFO consists of multiple LLVM passes such as VRA (Value Range Analysis), DTA (Data Type Allocation), and Conversion (which converts the program to the selected types). They analyse and modify the program’s LLVM intermediate representation (IR), which is then compiled into a binary executable file. The passes exchange information in the form of annotations in the LLVM IR, which the programmer can inspect.

TAFFO implements two models for data type allocation: a fixed-point allocation, and an Integer Linear Programming (ILP) model that tries to allocate a mix of fixed-point and floating-point types to achieve the best balance of performance and accuracy.

In general, using TAFFO involves two steps: annotating the program and compiling it with the selected parameters. The ILP model requires trade-off parameters to be provided at compile time, so it might take several tries to find the best configuration for your use case.

The compilation step uses the taffo binary to compile the sources. It is designed as a drop-in replacement for clang: it accepts the same arguments plus some TAFFO-specific ones, which can be listed by calling `taffo --help`.

Annotations in TAFFO

Precision tuning with TAFFO requires annotating the input and output variables. Every floating-point variable can be annotated with a small DSL (Domain-Specific Language):

scalar(): for homogeneous data types (float, float[]). The whitespace-separated fields inside the parentheses describe the variable:
- range(min, max): specifies the base range
- disabled: disables type conversion of the variable
- final: prevents the range from being changed
struct[]: for heterogeneous data types (structs, classes, etc.). The brackets contain one data type pattern per element, separated by commas and nested recursively; use void for an element that is left unspecified.
target('var_name'): specifies a variable considered as an output. It is mandatory to have at least one target.

The following listing shows an excerpt of a program containing annotations on the input variables.

__attribute((annotate("scalar(range(-100,100))"))) float A[NI][NK];
__attribute((annotate("scalar(range(-100,100))"))) float B[NK][NJ];
__attribute((annotate("scalar()"))) float C[NI][NJ] = { 0 };
__attribute((annotate("scalar()"))) float alpha = 1.5, beta = 1.2;

Example

In this post, we use the simpler data type allocation, fixed point: we compile a simple program with TAFFO and look at the result. The program reads up to five numbers from the input into an array and accumulates their sum, difference, quotient and product. We constrain the input to the (-3000, 3000) range, specify that the division result stays in the same range, and use the final keyword to prevent TAFFO from overriding that range.

#include <stdio.h>

#define MAX_N (5)

int main(int argc, char *argv[])
{
  float numbers[MAX_N] __attribute((annotate("scalar()")));
  int n = 0;
  float tmp __attribute((annotate("scalar(disabled range(-3000, 3000))")));

  for (int i=0; i<MAX_N; i++) {
    if (scanf("%f", &tmp) < 1)
      break;
    numbers[n++] = tmp;
  }

  float add __attribute((annotate("target('add') scalar()"))) = 0.0;
  float sub __attribute((annotate("target('sub') scalar()"))) = 0.0;
  float div 
    __attribute((annotate("target('div') scalar(range(-3000, 3000) final)"))) = 1.0;
  float mul __attribute((annotate("target('mul') scalar()"))) = 1.0;

  for (int i=0; i<n; i++) {
    add += numbers[i];
    sub -= numbers[i];
    if (numbers[i] != 0.0)
      div /= numbers[i];
    mul *= numbers[i];
  }

  printf("add: %f\nsub: %f\ndiv: %f\nmul: %f\n", add, sub, div, mul);
  return 0;
}

Next, we set the paths to the TAFFO binaries and libraries as well as the path to LLVM. For convenience, we also create a build directory.

export TAFFO_LIB_DIR="/home/denisovlev/Projects/TAFFO/dist/lib"
export TAFFO_BIN_DIR="/home/denisovlev/Projects/TAFFO/dist/bin"
export LLVM_DIR="/home/denisovlev/Data/llvm-12-debug/"
export PATH="$TAFFO_BIN_DIR:$TAFFO_LIB_DIR:$PATH"
mkdir -p build

We run the compilation with the taffo binary.

taffo -temp-dir build \
	-float-output build/array1.out.float \
	-o build/array1.out.fixp \
	-O3 \
	-debug \
	array1.c \
	2> build/taffo.log

The options we specify:

-temp-dir build: use the build directory we created for the intermediary files TAFFO creates
-float-output build/array1.out.float: build a floating-point binary with the specified name
-o build/array1.out.fixp: output fixed-point binary with this name
-O3: standard -O3 optimization option
-debug: output more debug information about the precision tuning process (this works better if TAFFO and LLVM are built in debug mode)
2> build/taffo.log: redirect debug output to a log file

Finally, we run the program’s floating-point and fixed-point versions and compare the results.

echo "Floating point:"
echo '0.123 200.74 6.989 0.7 2.3' | ./build/array1.out.float 
echo '-----'
echo "Fixed point:"
echo '0.123 200.74 6.989 0.7 2.3' | ./build/array1.out.fixp

We can see that the outputs differ slightly, but the fixed-point results stay within a reasonable margin of the floating-point ones.

Floating point:
add: 210.852005
sub: -210.852005
div: 0.003599
mul: 277.830505
-----
Fixed point:
add: 210.852001
sub: -210.852001
div: 0.003597
mul: 277.828070

We can see what optimizations TAFFO made by looking into the intermediary files it created. In this case, we look at the final LLVM IR of the program after TAFFO converted it. It is located in the file build/array1.out.fixp.5.taffotmp.ll.

%s13_19fixp = alloca [5 x i32], align 16, !taffo.info !9
%3 = alloca float, align 4, !taffo.info !12, !taffo.initweight !13
...
%11 = load float, float* %3, align 4, !taffo.info !12, !taffo.initweight !18
%12 = fmul float 5.242880e+05, %11, !taffo.info !12
%13 = fptosi float %12 to i32, !taffo.info !12
%14 = add nsw i32 %.03, 1, !taffo.info !16, !taffo.constinfo !22
%15 = sext i32 %.03 to i64, !taffo.info !23
%s13_19fixp14 = getelementptr inbounds [5 x i32], 
    [5 x i32]* %s13_19fixp, i64 0, i64 %15, !taffo.info !9
store i32 %13, i32* %s13_19fixp14, align 4, !taffo.info !25
...
%s13_19fixp23 = add i32 %.05.s13_19fixp, %s13_19fixp19, !taffo.info !9, !taffo.target !31
...
%s13_19fixp22 = sub i32 %.06.s13_19fixp, %s13_19fixp18, !taffo.info !9, !taffo.target !32
...
%39 = sdiv i64 %37, %38, !taffo.info !35, !taffo.target !30
...
%44 = mul i64 %42, %43, !taffo.info !37, !taffo.target !28

We can see that TAFFO allocated the array of five numbers as integers instead of floats. The label that TAFFO assigned to the register “%s13_19fixp” indicates that TAFFO interprets it as a fixed-point number with 13 integer bits and 19 fractional bits. Furthermore, if we look at the rest of the program we can see that TAFFO automatically handles the conversion between floating-point and fixed-point representations and performs arithmetic operations in fixed-point.

What is next?

With this simple example, we have just scratched the surface of precision tuning. It converted the program to fixed point without any regard for the target platform, and the result will most likely be slower than the floating-point version on the x86 architecture.

TAFFO also offers the ILP model, which takes into account the performance of different arithmetic operations on the target architecture and exposes parameters to control the error-to-performance trade-off.

Nevertheless, this simple example introduces the basic features of TAFFO. In future posts, we will discuss using the ILP model for your target architecture as well as using TAFFO to tune programs for embedded platforms such as STM32.

References

[1] Daniel Molka, Daniel Hackenberg, Robert Schöne, and Matthias S. Müller. Characterizing the energy consumption of data transfers and arithmetic operations on x86-64 processors. In International Green Computing Conference 2010, Chicago, IL, USA, 15-18 August 2010, pages 123–133. IEEE Computer Society, 2010. doi: 10.1109/GREENCOMP.2010.5598316. URL https://doi.org/10.1109/GREENCOMP.2010.5598316.

[2] Kiran Kumar Matam, Hoang Le, and Viktor K. Prasanna. Evaluating energy efficiency of floating point matrix multiplication on FPGAs. In 2013 IEEE High Performance Extreme Computing Conference (HPEC), pages 1–6, 2013. doi: 10.1109/HPEC.2013.6670345.

[3] Fabio Montagna, Simone Benatti, and Davide Rossi. Flexible, scalable and energy efficient bio-signals processing on the PULP platform: A case study on seizure detection. Journal of Low Power Electronics and Applications, 7:16, 06 2017. doi: 10.3390/jlpea7020016.

[4] D. Guenther, A. Bytyn, R. Leupers, and G. Ascheid. Energy-efficiency of floating-point and fixed-point SIMD cores for MIMO processing systems. In 2014 International Symposium on System-on-Chip (SoC), pages 1–7, 2014. doi: 10.1109/ISSOC.2014.6972429.

[5] Stefano Cherubin, Daniele Cattaneo, Michele Chiari, Antonio Di Bello, and Giovanni Agosta. TAFFO: Tuning assistant for floating to fixed point optimization. IEEE Embedded Systems Letters, 2019. ISSN 1943-0663. doi: 10.1109/LES.2019.2913774.

[6] Stefano Cherubin, Daniele Cattaneo, Michele Chiari, and Giovanni Agosta. Dynamic precision autotuning with TAFFO. ACM Trans. Archit. Code Optim., 17(2), May 2020. ISSN 1544-3566. doi: 10.1145/3388785. URL https://doi.org/10.1145/3388785.

[7] TAFFO-org. Taffo. https://github.com/TAFFO-org/TAFFO, 2019.