sum = 0.
for j in 1:n
    for i in 1:m
        sum = sum + A[i,j]
    end
end
Type stability for performance
This page provides an introduction to performance enhancement in Julia. First, we take a look at the fundamentals of computer hardware, to understand how processors and memory impact the speed of your code. Once we grasp these hardware concepts, we explore how type stability in Julia can significantly enhance performance by enabling the compiler to generate more efficient machine code.
Simplified hardware
(Ref. Jakob Nissen, Hardware for scientists)
Before we dive into the software side of performance, let us gain a basic understanding of computer hardware. The following schematic provides a simplified representation of a computer.
[CPU] \Leftrightarrow [RAM] \Leftrightarrow [DISK]
In this figure, where the arrows represent bidirectional data flows, the three important components are:
- The central processing unit (CPU) where all the computations occur;
- Random access memory (RAM) which is a temporary memory. Data stored here is lost in case of power failure;
- The disk which is the persistent storage of the computer.
The operations executed by the CPU are coordinated by its internal clock. A clock cycle refers to the time required to execute a basic operation. To be executed, a program has to be broken down into CPU instructions, a set of elementary operations. This translation is done by the compiler. Any optimisation performed at compile time generates more efficient CPU instructions, which speeds up the program at runtime.
The execution of a program requires data stored in the RAM. Note that transferring data from the RAM to the CPU takes on the order of 500 clock cycles.
In order to limit data transfers between the CPU and RAM, there are intermediate storage spaces of limited capacity. Closest to the CPU compute cores are the registers, which hold very little data compared to the various levels of cache further away. The simplified computer schematic can be updated accordingly.
[CPU] \Leftrightarrow [Registers] \Leftrightarrow [Caches] \Leftrightarrow [RAM] \Leftrightarrow [DISK]
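The memory hierarchy has a visible impact on how code should traverse data. Julia stores matrices in column-major order, so running the inner loop over the rows of a column reads memory contiguously and makes good use of the caches, whereas running it over the columns of a row jumps around in RAM. Below is a minimal benchmark sketch, assuming the BenchmarkTools package is installed; the function names sum_by_columns and sum_by_rows are ours and the timings depend on your machine.

using BenchmarkTools

A = rand(5_000, 5_000)

# Inner loop over i: consecutive elements of a column, contiguous in memory.
function sum_by_columns(A)
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    return s
end

# Inner loop over j: elements of a row, far apart in memory.
function sum_by_rows(A)
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    return s
end

@btime sum_by_columns($A)   # typically several times faster ...
@btime sum_by_rows($A)      # ... than the cache-unfriendly traversal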
Type-stability
(Ref. Chris Rackauckas, Parallel Computing and Scientific Machine Learning (SciML): Methods and Applications)
Common knowledge has it that Julia is fast because it is Just-in-Time (JIT) compiled. The actual reason is the type inference algorithm run before compilation, combined with type specialization in functions.
Recall the hardware point of view. Suppose a CPU instruction reads a value from a register. How should the memory holding that value be interpreted?
This is straightforward if the instruction carries the type information. Take a 64-bit integer and a 64-bit floating point number. They require the same memory space, yet the associated memory chunk is interpreted differently. The binary representation of the integer is
[sign bit][value in two's complement on 63 bits]
and
[sign bit][11 exponent bits][52 mantissa bits]
for the floating point number. This is what happens when writing code in the C or Fortran languages, which require explicit typing.
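This ambiguity is easy to observe in Julia: with the Base functions reinterpret and bitstring we can read the very same 64 bits as an integer or as a floating point number and obtain completely different values.

x = 6.5                        # a 64-bit floating point number
y = reinterpret(Int64, x)      # read the very same 64 bits as a 64-bit integer

bitstring(x) == bitstring(y)   # true: the bits are identical
x == y                         # false: they are interpreted differently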
Let b \geq 2 be a base used for numerical encoding. Any positive number can be expressed in this base as a sum of the form
\sum_{k=0}^{q} c_k b^k.
Computers typically use binary encoding, where the base b is 2. The coefficients c_k, known as bits (and often grouped in sets of 8 called bytes), represent the digits in this base.
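In Julia, the coefficients c_k of an integer in base 2 can be listed with the digits function, for instance for 13 = 2^0 + 2^2 + 2^3:

digits(13, base = 2)   # [1, 0, 1, 1], i.e. c_0 = 1, c_1 = 0, c_2 = 1, c_3 = 1
string(13, base = 2)   # "1101", the same coefficients written from c_3 down to c_0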
Integers
When encoding an integer in binary using 64 bits, the first bit indicates the sign:
- 0 represents a non-negative (positive or zero) number,
- 1 represents a negative number.
The remaining 63 bits encode the value of the integer using the binary representation of its coefficients c_k (negative numbers are stored in two's complement).
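The bitstring function displays these 64 bits and lets us check the role of the sign bit (the exact strings are 64 characters long and are abbreviated in the comments):

bitstring(6)    # "0…0110": sign bit 0 followed by the binary digits of 6
bitstring(-6)   # "1…1010": sign bit 1; the value is stored in two's complement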
Floating Point Numbers
Floating-point number encoding was standardized in 1985 by the IEEE 754 specification. In this format, a number x is represented as:
x = (-1)^s \times M \times 2^e
where:
- s is the sign bit (0 for positive, 1 for negative),
- M is the mantissa (also called the significand),
- e is the exponent.
Example: Encoding x = 6.5
First, we convert 6.5 into binary:
- 6 = 4 + 2 = 2^2 + 2^1, so in binary: 110
- 0.5 = 2^{-1}, so in binary: 0.1
Thus, 6.5 in binary is 110.1.
To express this in normalized binary form, we rewrite it as:
6.5 = 1.101 \times 2^2
Here, the exponent e = 2, and the mantissa M = 1.101.
IEEE 754 uses a bias to store only non-negative exponent values. For double-precision (64-bit) floating-point numbers:
- The exponent is stored using 11 bits,
- The bias is 1023.
So the encoded exponent is:
e_{\text{encoded}} = e + \text{bias} = 2 + 1023 = 1025
This value (1025) is written in binary and stored in the 11-bit exponent field.
The mantissa stores only the fractional part after the leading 1 (which is implicit), so we store 101 (from 1.101) and pad it with zeros to fill the 52-bit mantissa field.
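This decomposition can be verified directly in Julia; the sketch below slices the output of bitstring and also uses the Base functions exponent and significand:

x = 6.5
bits = bitstring(x)

bits[1]                            # '0': the sign bit, x is positive
parse(Int, bits[2:12]; base = 2)   # 1025: the biased exponent, 2 + 1023
bits[13:15]                        # "101": start of the 52-bit mantissa field

exponent(x)                        # 2
significand(x)                     # 1.625, i.e. 1.101 in binary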
In a dynamically typed language such as Python, the interpreter must perform type checks at runtime for every operation, and only then load the set of instructions associated with that type. Not having to think about types makes Python accessible for beginners, but it introduces a significant execution time overhead.
Hinting the type
In the Julia programming language, type information is not mandatory. Let us examine, from the software perspective, how providing hints to the type inference algorithm allows us to obtain fast executing code. First, let us try to understand what the type inference algorithm is trying to do.
a = 2
b = 3
a + b
5
The type of a and b was found here by human knowledge. The type inference algorithm would have associated Any to both a and b. Indeed, the global scope of the REPL is too dynamic to be optimized. In the rest of this section we will therefore only consider type inference in functions.
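To see why the global scope defeats inference, we can wrap the addition of the two globals defined above in a small function (the name g is ours) and inspect it with @code_warntype:

g() = a + b        # a and b are non-constant globals

@code_warntype g() # reports Main.a::Any, Main.b::Any and Body::Any:
                   # the globals may be rebound to any type at any time,
                   # so the compiler cannot specialize the addition

Inside a function, by contrast, the argument types are known at call time, as the following examples show.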
function f(a,b)
    return a+b
end
f (generic function with 1 method)
Function f adds the input variables a and b, for which no type hints are provided. Using @code_llvm, let us have a look at the associated CPU instructions for input variables of different types. The instructions generated by the compiler below are not in assembly language but in the intermediate representation (IR), which can be seen as a hardware-agnostic assembly language.
First, we run f to sum two integers, 1+2.
@code_llvm f(1,2)
; Function Signature: f(Int64, Int64)
; @ In[4]:1 within `f`
define i64 @julia_f_8979(i64 signext %"a::Int64", i64 signext %"b::Int64") #0 {
top:
; @ In[4]:2 within `f`
; ┌ @ int.jl:87 within `+`
%0 = add i64 %"b::Int64", %"a::Int64"
; └
ret i64 %0
}
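As an aside, the hardware-specific assembly behind this IR can be inspected with @code_native; the exact instructions depend on your CPU:

@code_native f(1, 2)   # native assembly for the (Int64, Int64) method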
Then, we run f to sum two floating point numbers, 1.5+2.5.
@code_llvm f(1.5,2.5)
; Function Signature: f(Float64, Float64)
; @ In[4]:1 within `f`
define double @julia_f_9061(double %"a::Float64", double %"b::Float64") #0 {
top:
; @ In[4]:2 within `f`
; ┌ @ float.jl:491 within `+`
%0 = fadd double %"a::Float64", %"b::Float64"
; └
ret double %0
}
Let us have a look at what happens when we mix types in the function call. We run f to add an integer 1 and a floating point number 2.5.
@code_llvm f(1,2.5)
; Function Signature: f(Int64, Float64)
; @ In[4]:1 within `f`
define double @julia_f_9065(i64 signext %"a::Int64", double %"b::Float64") #0 {
top:
; @ In[4]:2 within `f`
; ┌ @ promotion.jl:429 within `+`
; │┌ @ promotion.jl:400 within `promote`
; ││┌ @ promotion.jl:375 within `_promote`
; │││┌ @ number.jl:7 within `convert`
; ││││┌ @ float.jl:239 within `Float64`
%0 = sitofp i64 %"a::Int64" to double
; │└└└└
; │ @ promotion.jl:429 within `+` @ float.jl:491
%1 = fadd double %0, %"b::Float64"
; └
ret double %1
}
We can see the term promote in the output: the compiler found a type, here Float64, that works as a common ground for the computation, and converted the integer argument to it before the addition.
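The promotion machinery can be called by hand, which makes the compiler's choice explicit; promote and promote_type are Base functions:

promote(1, 2.5)                # (1.0, 2.5): both values converted to a common type
promote_type(Int64, Float64)   # Float64: the common ground found for the addition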
Type instability
In the previous examples, identifying the type of the returned value was relatively straightforward. Let us examine the function h, which breaks type stability. We are interested in understanding the reasons behind this problem and in techniques to prevent it in our own code, with the aim of achieving optimal performance.
function h(x,y)
    z = x + y
    if z < 2
        return z
    else
        return Float64(z)
    end
end
h (generic function with 1 method)
Using @code_warntype provides type labels for all the variables of a function. We are particularly interested in the type returned by h.
@code_warntype h(2,5)
MethodInstance for h(::Int64, ::Int64)
from h(x, y) @ Main In[8]:1
Arguments
#self#::Core.Const(Main.h)
x::Int64
y::Int64
Locals
z::Int64
Body::Union{Float64, Int64}
1 ─ (z = x + y)
│ %2 = z::Int64
│ %3 = (%2 < 2)::Bool
└── goto #3 if not %3
2 ─ %5 = z::Int64
└── return %5
3 ─ %7 = z::Int64
│ %8 = Main.Float64(%7)::Float64
└── return %8
Another way to determine the type-stability of a function is to use @inferred.
using Test
@inferred h(2,5)
return type Float64 does not match inferred return type Union{Float64, Int64}

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] top-level scope
   @ In[10]:2
The error raised shows that @inferred is a simple yes-or-no test on inference: it throws an error as soon as the actual return type differs from the inferred return type.
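One way to restore type stability, sketched below under the name h_stable (ours), is to make every branch return the same type, here a Float64:

function h_stable(x, y)
    z = x + y
    if z < 2
        return Float64(z)   # converted here as well, so both branches agree
    else
        return Float64(z)
    end
end

@inferred h_stable(2, 5)    # 7.0 — no error: the return type is inferred as Float64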
From this analysis we conclude the following rule.