Haskell/Performance examples

Goal: Explain optimizations step by step with examples that actually happened.

Tight loop edit

dons: Write Haskell as fast as C: exploiting strictness, laziness and recursion.

dons: Write Haskell as fast as C: exploiting strictness, laziness and recursion.

CSV Parsing edit

haskell-cafe: another Newbie performance question I hope he doesn't mind if I post his code here, I still have to ask him. -- apfeλmus 08:46, 18 May 2008 (UTC)

type CSV = [[String]]

main = do
                  args <- getArgs
                  file <- readFile (head args)
                  writeFile (head args ++ "2") (processFile (args !! 1) file)

processFile s     = writeCSV . doInteraction s . readCSV
doInteraction line csv = insertLine (show line) (length csv - 1) csv
writeCSV          = (\x -> x ++ "\n") . concat . intersperse "\n" . (map (concat . intersperse "," . (map show)))
insertLine line pos csv = (take pos csv) ++ [readCSVLine line] ++ drop pos csv
readCSVLine       = read . (\x -> "["++x++"]")
readCSV           = map readCSVLine . lines

I think there was another cvs parsing thread on the mailing list which I deemed appropriate, but I can't remember.

Space Leak edit

jkff asked about some code in #haskell which was analyzing a logfile. Basically, it was building a histogram

foldl' (\m (x,y) -> insertWith' x (\[y] ys -> y:ys) [y] m) M.empty
  [(ByteString.copy foo, ByteString.copy bar) | (foo,bar) <- map (match regex) lines]

The input was a 1GB logfile and the program blew the available memory mainly because the ByteString.copy weren't forced and the whole file lingered around in memory.

Branch Prediction Failure edit

This code demonstrates how your CPU may fail or succeed in predicting code branches (aka if clauses). The code generates a large random Vector of Ints. From this we compute the sum of all numbers larger than 999. If the data is sorted the CPU can easily deduce the probable outcome of the next branch. If the data is not sorted this will be much more difficult and less successful. Unfortunately, sorting for one purpose only is seldom a good idea. The code uses the criterion package for time measurement. Compile this e.g. with -Wall -keep-llvm-files -O2 -optlo-O3 -fllvm -rtsopts -threaded -eventlog -feager-blackholing -fmax-simplifier-iterations=20 -fsimplifier-phases=3 -fno-liberate-case -fspecialise-aggressively. The idea is from github.com/Kobzol/hardware-effects. (tkx68hamburg)

{-# LANGUAGE FlexibleInstances, RankNTypes, BangPatterns, CPP, PartialTypeSignatures, UnicodeSyntax, OverloadedStrings #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}
module Main (main) where

import           Control.DeepSeq
import           Control.Exception (evaluate)
import           Control.Monad.ST
import           Criterion.Main
import qualified Data.Vector.Algorithms.Radix as Radix
import qualified Data.Vector.Unboxed as UV
import           System.Random

#ifdef DEFINE_NFDATA_VECTOR
instance (NFData a) => NFData (V.Vector a)              where
  rnf v = V.foldr (\a b -> rnf a `seq` b) () v
#endif

setupUnsorted :: IO [UV.Vector Int]
setupUnsorted = do
  small <- randomSampleUVVector 1024
  med <- randomSampleUVVector $ 1024*1024
  large <- randomSampleUVVector $ 20*1024*1024
  return [small, med, large]

setupSorted :: IO [UV.Vector Int]
setupSorted = fmap (fmap sortUVec) setupUnsorted

main :: IO ()
main = do
  defaultMainWith
    defaultConfig [
      env setupUnsorted
          (\ ~[small, med, large] -> bgroup "Branch without sort" [
             bench "small" $ whnf core1 small,
             bench "medium" $ whnf core1 med,
             bench "large" $ whnf core1 large
             ]
          ),
      env setupUnsorted
          (\ ~[small, med, large] -> bgroup "Branch without sort using filter" [
             bench "small" $ whnf core1a small,
             bench "medium" $ whnf core1a med,
             bench "large" $ whnf core1a large
             ]
          ),
      env setupUnsorted
          (\ ~[small, med, large] -> bgroup "Branch including sort" [
             bench "small" $ whnf core2 small,
             bench "medium" $ whnf core2 med,
             bench "large" $ whnf core2 large
             ]
          ),
      env setupSorted
          (\ ~[small, med, large] -> bgroup "Branch after sort" [
             bench "small" $ whnf core1 small,
             bench "medium" $ whnf core1 med,
             bench "large" $ whnf core1 large
             ]
          )
    ]

{- INLINE -}
core1 :: UV.Vector Int -> Int
core1 = UV.foldl' condSum 0

{- INLINE -}
core2 :: UV.Vector Int -> Int
core2 = UV.foldl' condSum 0 . sortUVec

{- INLINE -}
core1a :: UV.Vector Int -> Int
core1a = UV.sum . (UV.filter (>999))

{- INLINE -}
condSum :: Int -> Int -> Int
condSum !x !y = if y>999 then x+y else x

randomSampleUVVector :: Int -> IO (UV.Vector Int)
randomSampleUVVector i = evaluate $ force $ UV.fromList (take i (randoms (mkStdGen 0) :: [Int]))

sortUVec :: forall a . (Ord a, Radix.Radix a, UV.Unbox a) => UV.Vector a -> UV.Vector a
sortUVec vec =
  runST
    (do mv <- UV.thaw vec
        Radix.sort mv
        UV.unsafeFreeze mv)