Haskell is Faster than Rust! … Wait a Sec!

To evaluate the impact of memory management in Rust, I implemented a short benchmark in Rust and Kotlin. You can find all the details here and here. Measurements showed that Rust is roughly a factor of 10 faster than Kotlin, most probably because of its more efficient memory management. Being a big fan of Haskell, I was curious to see how the grand old lady of functional programming would compete against these two. So, I implemented the benchmark, did some measurements, and was surprised.

The Haskell Implementation of the Benchmark

If you already know how to program in Haskell, you can skip this section; there is nothing surprising in it. If you are curious how development is done with a pure functional programming language, this section may be of interest. The idea behind this benchmark was an enterprisey use case: fetch a lot of entities from a database, store them in idiomatic objects/structures, and do some computation on them. To keep things manageable, the entities are generated randomly instead of being loaded from a database. The objective of the whole experiment is to measure how well the programming language performs garbage collection.

Haskell, being a pure functional programming language, supports only immutable data and pure functions. Thus, accessing databases and using random number generators is done differently than in almost every other programming language. Haskell uses a concept called monads, which easily raises fear in anyone not used to it. So let us see how fearsome these monads are in a toy benchmark.

Let us quickly go through the code bottom-up. First, we need some data structures for our entities:

data Address = Address
  { street :: String
  , postalCode :: String
  , city :: String
  , country :: String
  } deriving (Eq)

data Employee = Employee
  { firstName :: String
  , lastName :: String
  , address :: Address
  , salary :: Int
  } deriving (Eq)

Nothing surprising here. Almost every attribute is a String. To fill them with random characters, you need a random number generator (RNG).

Randomness and Monads

The standard RNG lives in the IO monad. Hm, this sounds pretty esoteric. It means that the System.Random library stores a global RNG in the IO monad and that code executed in the IO monad can access it. That probably still sounds esoteric. This is what you have to do to use it:

import System.Random (newStdGen, randomRs)

chars :: String
chars = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']

nrOfChars :: Int
nrOfChars = length chars

createRandomStringOf80Chars :: IO String
createRandomStringOf80Chars = do
  random <- newStdGen
  return $ take 80
         $ map (chars !!)
         $ randomRs (0, nrOfChars - 1) random

I store all available characters in the list chars. An array would be more efficient, but lists are IMHO the more idiomatic choice in Haskell. I split a new RNG off the global one and bind it to random. Finally, I create a stream of random numbers, use the first 80 of them as indices into chars, build a String (which in Haskell is just a list of characters), and return it. Do not get confused by the operator $; it is just function application and is used to save some parentheses.
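As a side note for readers new to the operator: here is a minimal, self-contained illustration of how $ saves parentheses (the names are mine, not from the benchmark):

```haskell
-- `$` is ordinary function application with the lowest possible
-- precedence, so everything to its right is grouped into one argument:
-- `f $ g x` means `f (g x)`.
withParens :: Int
withParens = negate (sum (map (* 2) [1, 2, 3]))

withDollar :: Int
withDollar = negate $ sum $ map (* 2) [1, 2, 3]   -- same value, fewer parentheses
```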

The code is readable. But at second sight you probably start to notice some confusing things, such as: What is the type of this function that looks like a block of statements with side effects? How can an RNG work in a pure functional language, where every function call returns the same result for the same input parameters? And probably several other questions…

To be able to develop in Haskell, you need a good understanding of the underlying concepts. These are usually not taught at university, so you have to acquire them on your own. This is similar to Rust with its approach to handling references.

Putting It All Together

Once we can create a String of random characters, it is easy to create a randomly filled address:

createRandomAddress :: IO Address
createRandomAddress = do
  streetV <- createRandomStringOf80Chars
  postalCodeV <- createRandomStringOf80Chars
  cityV <- createRandomStringOf80Chars
  countryV <- createRandomStringOf80Chars
  return Address
    { street = streetV
    , postalCode = postalCodeV
    , city = cityV
    , country = countryV
    }

I omit the code for createRandomEmployee because it is pretty much the same. This code is readable. If you are used to programming languages such as Kotlin, you may have the urge to inline the calls to createRandomStringOf80Chars: getting rid of streetV and setting the attribute street directly would reduce the noise. But because the code only looks like statements with side effects and actually defines a pure function, the calls cannot simply be inlined.
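To give a rough idea of why: a do-block is only syntactic sugar for chained calls to the bind operator >>=, which sequences monadic actions and binds only their *results* to names. A small, self-contained sketch of this desugaring, using the simpler Maybe monad to stay runnable on its own (my own example, not from the benchmark):

```haskell
-- Both functions describe the exact same computation; the do-notation
-- in the first one desugars to the explicit >>= chain in the second.
-- Each "statement" runs in the monad; only its result is bound.
halveTwiceDo :: Int -> Maybe Int
halveTwiceDo n = do
  a <- halve n
  b <- halve a
  return b
  where
    halve x = if even x then Just (x `div` 2) else Nothing

halveTwiceDesugared :: Int -> Maybe Int
halveTwiceDesugared n =
  halve n >>= \a -> halve a >>= \b -> return b
  where
    halve x = if even x then Just (x `div` 2) else Nothing
```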

To explain this fully, I would have to delve deeper into monads and the do-notation, and most probably confuse any reader who is not familiar with these concepts. If you are interested, you can find a very good introduction in this book.

Being able to create one random employee, I can generate arbitrarily many:

import Control.Monad (replicateM)

lookupAllEmployees :: Int -> IO [Employee]
lookupAllEmployees numberOfAllEmployees =
  replicateM numberOfAllEmployees createRandomEmployee

This code is so short because it just uses a library function. This is IMHO typical for Haskell: once you start to use the more abstract structures, such as functors, monoids, and monads, it suddenly becomes pretty easy to build complex things by combining simple functions.
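replicateM n action simply runs a monadic action n times and collects the results in a list. A self-contained way to see the repeated execution (the counter is my own illustration, not part of the benchmark):

```haskell
import Control.Monad (replicateM)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- The action increments a mutable counter and returns its new value,
-- which makes each of the n executions visible in the result list.
countUp :: Int -> IO [Int]
countUp n = do
  ref <- newIORef (0 :: Int)
  replicateM n $ do
    modifyIORef' ref (+ 1)
    readIORef ref
```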

In the benchmark, the average income is computed over all employees. This is easily done with a “functional” loop, a.k.a. a fold:

computeAverageIncome :: [Employee] -> Float
computeAverageIncome employees =
  let (nrEmployees, sumOfAllSalaries) =
        foldr (\employee (counter, total) -> (counter + 1, total + salary employee))
              (0, 0) employees
  in fromIntegral sumOfAllSalaries / fromIntegral nrEmployees

Not much to explain here if you know how folding works; if not, have a look at this chapter. The function fromIntegral converts the integer results to Float here. You may notice that there is no IO in the type signature and no do-notation. That is because there are no side effects, such as accessing an RNG, that would need to be encapsulated in a monad.
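For readers new to folding: the same count-and-sum pattern works on a plain list of numbers. A self-contained sketch (the helper is my own, not from the benchmark):

```haskell
-- The accumulator is a (count, total) pair threaded through the list;
-- at the end we divide total by count to get the average.
averageOf :: [Int] -> Float
averageOf xs =
  let (count, total) = foldr (\x (c, s) -> (c + 1, s + x)) (0, 0) xs
  in  fromIntegral total / fromIntegral count
```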

Now it is easy to put everything together:

computeAverageIncomeOfAllEmployees :: Int -> IO Float
computeAverageIncomeOfAllEmployees numberOfAllEmployees = do
  employees <- lookupAllEmployees numberOfAllEmployees
  return $ computeAverageIncome employees
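All that is missing is an entry point that runs and times the pipeline. The benchmark repository has its own harness; as a self-contained sketch, a timing wrapper could look like this (timeIt is my own hypothetical helper, not the code used for the measurements):

```haskell
import System.CPUTime (getCPUTime)

-- Run an IO action, demand its result, and report the elapsed CPU
-- time. getCPUTime returns picoseconds, hence the division by 10^9
-- to get milliseconds. `seq` forces the result before the end time
-- is read, so the measured work cannot be deferred past the timer.
timeIt :: Show a => String -> IO a -> IO Double
timeIt label action = do
  start <- getCPUTime
  result <- action
  end <- result `seq` getCPUTime
  let millis = fromIntegral (end - start) / 1.0e9
  putStrLn $ label ++ ": " ++ show result ++ " (" ++ show millis ++ " ms)"
  return millis
```

Note that under lazy evaluation `seq` only forces the outermost constructor; for a single Float average that is enough, but deeper structures would need a deeper forcing strategy.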

The Moment of Truth

After coding the benchmark in Haskell, let us do some measurements to see how it compares to the implementations in Kotlin and Rust. Here are the results. The x-axis denotes the number of employees created, the y-axis the time spent by the benchmark. Both axes are logarithmically scaled.

Benchmark executions of Kotlin, Rust, and Haskell

OMG, Haskell is 50% faster than Rust! Even though I am a big fan of Haskell, this did not sound plausible. Rust is simply too optimized for performance, both computationally and memory-wise. Haskell's approach of compiling to an intermediate representation that is evaluated by graph reduction should not be faster.

After a short moment of confusion, it became clear that the good performance of Haskell in this benchmark is caused by its laziness. Expressions are only evaluated (by the reduction mentioned above) if their results are needed. To compute the average income, only the salary field is required; the other fields are never used and thus never computed. I did these benchmarks to get an impression of the efficiency of memory handling, or more precisely garbage collection, in different programming languages. Not creating the garbage in the first place, thanks to lazy evaluation, therefore gives Haskell an unfair advantage from my point of view.
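Laziness is easy to demonstrate in isolation: a value that is never demanded is never computed, even if evaluating it would crash. A minimal, self-contained example (mine, not from the benchmark):

```haskell
-- `undefined` throws an exception *when evaluated*. Because Haskell is
-- lazy, constructing the pair does not evaluate its components, and a
-- function that only demands the first component never touches the
-- second one.
firstOnly :: (Int, Int) -> Int
firstOnly (a, _) = a

lazyDemo :: Int
lazyDemo = firstOnly (21 + 21, undefined)   -- undefined is never forced
```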

I Need to be More Strict

Though Haskell is lazy by default, it can be forced to evaluate expressions strictly. The simplest way for this benchmark is to add the {-# LANGUAGE Strict #-} pragma to the whole module. Let us look at the differences:
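The module-wide pragma is the bluntest tool; strictness can also be requested locally, for example with bang patterns, which force a value at every step instead of letting a chain of thunks build up. A small sketch (my own example, assuming the BangPatterns extension, not how the benchmark was changed):

```haskell
{-# LANGUAGE BangPatterns #-}

-- The bang on `acc` forces the accumulator at each recursive call, so
-- the sum is computed incrementally instead of as one huge deferred
-- expression at the end.
strictSum :: [Int] -> Int
strictSum = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```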

Benchmark executions of Kotlin, Rust, and Haskell (strict and lazy)

Now the results are more plausible. The strict Haskell implementation is up to 30% slower than the Rust implementation. I am still surprised that Haskell is a factor of 6 faster than the Kotlin variant. I double-checked that only one core is used and that there is actually a lot of garbage collection going on: roughly 75% of the whole execution time in Haskell is spent in the garbage collector.

The reason is probably that garbage collection in Haskell can be much more efficient than in JVM-based applications because of Haskell's data immutability. You can find a short and good explanation here: GHC/Memory Management. The main point is that older data can never reference newer data: the newer data did not exist when the older data was created, and the older data cannot be changed afterwards. So if you track the age of every allocation (for example by storing allocations in order), you can simplify garbage collection a lot. This is especially true for loops generating data, such as our benchmark.

ADDED: I got this summary of the strategy behind garbage collection in Haskell from Hekate, and it is much better than mine: GHC's (the famous Haskell compiler) garbage collector is optimized for generating lots of garbage. Instead of specifically deleting garbage data, it takes the non-garbage, moves it elsewhere, and frees the previous memory space where all the garbage resided.

To Sum it Up

I invented my little benchmark to see how efficiently different programming languages handle memory garbage and how high the cognitive cost for the developer is. Haskell shows a surprisingly good performance, being only 30% slower than Rust but a factor of 6 faster than Kotlin. This is due to Haskell's functional purity with its immutable data. It puts the burden on the developer to learn how to handle side effects in a pure language, but IMHO this is an effort well spent. 🙂

You can find the code and the measurements here: https://github.com/akquinet/GcRustVsJvm

2 thoughts on “Haskell is Faster than Rust! … Wait a Sec!”

  1. I agree. I am usually really impressed how well Java performs.

    The measurements were done with
    JVM: 1.8.0_102 (Oracle Corporation 25.102-b14)

    I just redid the measurements with
    16.0.1 (AdoptOpenJDK 16.0.1+9)

    The results were slightly better, but Kotlin/JVM is still by far the slowest solution.

  2. This is very interesting!
    I would however like to know _which_ JVM was used, as that has a tremendous impact on performance. In fact, due to JIT compilation, there are some algorithms which, on some JVMs, can be more performant than GCC-compiled C.
