Haskell Pitfalls

Authors:: Chris Allen; Mark Wotton

Mark and I were discussing some of the newbie traps in the Haskell ecosystem, particularly in libraries. They're a recurring topic for us and we finally decided to round some of the more common ones up into a list. Hopefully this prevents some pain in the future.

Partial functions

This is a classic case of snatching defeat from the jaws of victory. The language has handed you all the tools you need for confidence in your design, and you ruin it all by trying to fill an inside straight. A partial function is one that isn't defined for every input of its type: head [], for example, throws an exception at runtime rather than returning a value. What's especially frustrating is that the Prelude contains a bunch of partial functions that newcomers find tempting to use.

At a minimum, be wary of these particularly egregious ones (all in the Prelude except fromJust, which lives in Data.Maybe):

fromJust
head / tail / !! / init / last
maximum / minimum
read

These may be OK when applied to literals, or in a test suite where any exception will be flagged anyway. Even then, you can often avoid them with quasiquoters.
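Outside of literals and tests, base already ships total replacements for most of these. The sketch below uses listToMaybe, readMaybe, and NonEmpty from base; firstOr, parsePort, and largest are our own illustrative helpers, not library functions.

```haskell
module Main where

import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import Data.Maybe (fromMaybe, listToMaybe)
import Text.Read (readMaybe)

-- Instead of head: listToMaybe returns Nothing for an empty list,
-- and the caller supplies a default.
firstOr :: a -> [a] -> a
firstOr def = fromMaybe def . listToMaybe

-- Instead of read: readMaybe returns Nothing when parsing fails.
parsePort :: String -> Either String Int
parsePort s = case readMaybe s of
  Just n | n > 0 && n < 65536 -> Right n
  _ -> Left ("invalid port: " ++ s)

-- Instead of maximum on a possibly empty list: demand a NonEmpty,
-- so the empty case is ruled out by the type.
largest :: NonEmpty Int -> Int
largest = maximum

main :: IO ()
main = do
  print (firstOr 0 ([] :: [Int]))
  print (parsePort "8080")
  print (parsePort "eighty")
  print (largest (3 :| [1, 4, 1, 5]))
  -- NE.nonEmpty is the total way to get a NonEmpty from a list.
  print (fmap largest (NE.nonEmpty ([] :: [Int])))
```

Note that NE.fromList and NE.head on a list you built with fromList just move the partiality around; NE.nonEmpty forces you to handle the empty case once, at the boundary.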

Sometimes you'll see test code with something like the following:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.Aeson
import Data.Maybe (fromJust)

john :: Value
john = fromJust $ decode "tasmanian devil"

main :: IO ()
main = print john

Here they're using fromJust to ignore the possibility that the JSON parse fails. This has several problems, including trading what could be a compile-time error for a runtime exception. It's also not a good habit to get into, since you definitely don't want to do anything like this in your programs. A good alternative is aeson-qq, where the following fails at compile time:

{-# LANGUAGE QuasiQuotes #-}

module Main where

import Data.Aeson
import Data.Aeson.QQ

john :: Value
john = [aesonQQ|tasmanian devil|]

main :: IO ()
main = do
  print john

Whereas this compiles without error, and john is a well-formed Aeson Value:

john = [aesonQQ|
  {"age": 23,
   "name": "John",
   "likes": ["linux", "Haskell"]}
|]
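In program code, the result of a parse should be handled rather than discarded. A minimal sketch using aeson's eitherDecode, which returns the parser's error message instead of a bare Nothing; parseValue and report are our own illustrative names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.Aeson (Value, eitherDecode)
import qualified Data.ByteString.Lazy as BL

-- eitherDecode keeps the failure, and its message, in the type,
-- so the caller has to decide what a parse failure means.
parseValue :: BL.ByteString -> Either String Value
parseValue = eitherDecode

report :: BL.ByteString -> IO ()
report input = case parseValue input of
  Left err -> putStrLn ("could not parse JSON: " ++ err)
  Right v -> print v

main :: IO ()
main = do
  report "tasmanian devil"
  report "{\"name\": \"John\", \"age\": 23}"
```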

Control.Concurrent.Chan

This point isn't limited to Haskell; it applies to designing any robust concurrent system. Control.Concurrent.Chan is an unbounded FIFO, and there's never a good reason for an unbounded queue. If the producer is slower than the consumer, the queue never fills up, so you could have used a blocking, limited-capacity channel at no cost. If the consumer is slower than the producer, your memory footprint will grow until something uncontrolled like the OOM killer puts your straining program out of its misery. We recommend unagi-chan's bounded channels or the stm package's TBQueue.
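A minimal sketch of a bounded queue using TBQueue from the stm package (it ships with GHC, but you still have to list it as a dependency). sumThroughQueue is a hypothetical helper, and the capacity and item count are arbitrary. The producer blocks inside writeTBQueue whenever the queue is full, which is back pressure in its simplest form.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (atomically, newTBQueueIO, readTBQueue, writeTBQueue)
import Control.Monad (forM_, replicateM)

-- Push the numbers 1..n through a queue of the given capacity and
-- return the sum the consumer saw.
sumThroughQueue :: Int -> Int -> IO Int
sumThroughQueue capacity n = do
  -- Once `capacity` items are waiting, writeTBQueue retries (blocks)
  -- until the consumer frees up a slot.
  queue <- newTBQueueIO (fromIntegral capacity)
  done <- newEmptyMVar

  -- Producer: never more than `capacity` items ahead of the consumer.
  _ <- forkIO $ forM_ [1 .. n] $ \i ->
    atomically (writeTBQueue queue i)

  -- Consumer: reads exactly n items, then reports their sum.
  _ <- forkIO $ do
    xs <- replicateM n (atomically (readTBQueue queue))
    putMVar done (sum xs)

  takeMVar done

main :: IO ()
main = sumThroughQueue 16 1000 >>= print
```

Swapping in Control.Concurrent.Chan here would work just as well for this toy input, but nothing would stop the producer from racing arbitrarily far ahead of the consumer.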

What you're looking for here is back pressure, a design principle common to software and physical engineering disciplines. You want the signal that your consumers are saturated to be available within your program, not delivered by the kernel's OOM assassin. For that to happen, your channels need a bounded capacity. If they're bounded, your producers have a chance to do something different when consumers are taking too long to process data. You can wrap your bounded FIFO insert in a timeout and return something like Either TimeoutError (), where () indicates the insert succeeded. From there you can take some remedial action.

A simple option is to start by blocking your producers on the insert until the consumers catch up. This can push the problem upstream: your web server's worker pool, for example, may become starved. Your web app's worker pool is bounded too, right? That's a good thing, as your load balancer can use response times as a signal to direct work to less overloaded servers. Another option, usually a last resort, is load shedding: you drop work until there's capacity to process all the data again.
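The timeout-wrapped insert described above might look like this sketch. TimeoutError and writeWithTimeout are hypothetical names; timeout comes from base's System.Timeout and takes microseconds, and TBQueue again comes from stm. Interrupting a blocked STM transaction this way is safe, because the transaction is aborted rather than partially applied.

```haskell
module Main where

import Control.Concurrent.STM (TBQueue, atomically, newTBQueueIO, writeTBQueue)
import System.Timeout (timeout)

-- Hypothetical error type: the insert didn't complete in time.
data TimeoutError = TimeoutError
  deriving (Show, Eq)

-- Try to enqueue x, giving up after the given number of microseconds.
-- Right () means the insert succeeded; Left TimeoutError means the queue
-- stayed full for the whole window, i.e. the consumers are saturated.
-- When the timeout fires, the blocked transaction is aborted, so the
-- item is not enqueued.
writeWithTimeout :: Int -> TBQueue a -> a -> IO (Either TimeoutError ())
writeWithTimeout micros queue x =
  maybe (Left TimeoutError) Right
    <$> timeout micros (atomically (writeTBQueue queue x))

main :: IO ()
main = do
  -- Capacity 2 and no consumer, so the third insert has nowhere to go.
  queue <- newTBQueueIO 2
  results <- mapM (writeWithTimeout 100000 queue) "abc"
  print results
```

What to do with a Left TimeoutError is the policy decision: retry, keep blocking, propagate an error to the caller, or shed the item.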


String

String values are sometimes unavoidable, such as when you're working with types in base like FilePath. When this happens, grin and bear it or use a more structured alternative. One of the main reasons Haskellers aggressively warn folks off String is that, unlike strings in most other programming languages, String isn't backed by an array. It's a type synonym for [Char], a linked list in which each character sits in its own heap-allocated cell costing several machine words, which can lead to absurd memory consumption for seemingly innocuous quantities of data.

One alternative to the Prelude's String-based IO functions is the Data.Text.IO module. String is fine for small, infrequent bits of data, so there's no need to go on a crusade. Otherwise, you want Text for human-readable text and ByteString for arbitrary binary data, data going over a network, or UTF-8-encoded bytes.

That said, a benchmark won't always show Text beating String. Short-cut fusion and other optimizations can go a long way if you never actually materialize the String values.
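To make the Text/ByteString split concrete, here is a minimal sketch using the text and bytestring packages: text stays as Text while the program works on it and becomes UTF-8 ByteString only at the boundary. shout, toWire, and fromWire are our own illustrative names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO

-- Human-readable text stays as Text while the program works on it.
shout :: Text -> Text
shout = T.toUpper . T.strip

-- At a network boundary, encode Text to UTF-8 bytes ...
toWire :: Text -> BS.ByteString
toWire = TE.encodeUtf8

-- ... and decode incoming bytes, turning invalid UTF-8 into a Left
-- instead of an exception (decodeUtf8' returns an Either).
fromWire :: BS.ByteString -> Either String Text
fromWire = either (Left . show) Right . TE.decodeUtf8'

main :: IO ()
main = do
  -- Data.Text.IO stands in for the Prelude's String-based putStrLn.
  TIO.putStrLn (shout "  hello, world  ")
  -- The accented letter is one character but two bytes in UTF-8.
  let cafe = "café" :: Text
  print (T.length cafe, BS.length (toWire cafe))
  print (fromWire (toWire cafe) == Right cafe)
  -- 0xFF never appears in valid UTF-8, so this is a Left.
  print (fromWire (BS.pack [0xFF]))
```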

foldl

foldl' is always what you want; don't use foldl! foldl has to traverse your whole list before it can return anything, so there's no benefit to keeping the accumulator lazy: it just builds up a chain of unevaluated thunks. If you accidentally use foldl, you might use a lot more memory than necessary to do the same work. If your Prelude doesn't export foldl', import it from Data.List. The Haskell Wiki has a good article on this topic.
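A small sketch of the difference. With optimizations GHC's strictness analysis often rescues a foldl over Int, but in GHCi or at -O0 the lazy version can use memory proportional to the list length. The mean function also shows a common follow-up trap: foldl' only forces the accumulator to weak head normal form, so a tuple accumulator needs its fields forced as well. sumLazy, sumStrict, and mean are illustrative names.

```haskell
{-# LANGUAGE BangPatterns #-}

module Main where

import Data.List (foldl')

-- foldl defers every addition, building the thunk
-- (((0 + 1) + 2) + 3) ... before evaluating any of it.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- foldl' forces the accumulator at every step, so it runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- For a pair, WHNF is just the (,) constructor, not its fields.
-- The bang patterns in step force the running total and count too.
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    step (!s, !n) x = (s + x, n + 1)

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])
  print (mean [1, 2, 3, 4])
```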

Lazy IO

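Most newcomers use lazy IO without meaning to, and discover it as an unpleasant surprise when their program breaks. The Prelude's readFile, getContents, and interact, along with System.IO's hGetContents, all read their input lazily, on demand, as the resulting String is consumed. There are several ways to avoid lazy IO: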

Use the strict IO APIs for ByteString or Text (Data.ByteString, Data.Text.IO), whichever fits your use case better. Their .Lazy counterparts read lazily too.
Use a library that offers a strict IO API but still returns String values.
When the work you're doing fits a streaming model, use Conduit, Pipes, or Streaming.
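The classic way lazy IO bites is a lazy read inside withFile, sketched below. readLazyBroken and readStrict are illustrative names; the file name in main is arbitrary. The exact symptom of the broken version depends on your base version, so the comment describes it rather than promising a specific output.

```haskell
module Main where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Pitfall: System.IO.hGetContents is lazy. withFile closes the handle
-- as soon as this action returns, before any of the String has been
-- demanded, so the contents are lost: depending on the base version
-- you get an empty String or an exception when it's forced.
readLazyBroken :: FilePath -> IO String
readLazyBroken path = withFile path ReadMode hGetContents

-- Fix: Data.Text.IO.hGetContents reads the whole file strictly, so the
-- Text is fully in memory before withFile closes the handle.
readStrict :: FilePath -> IO T.Text
readStrict path = withFile path ReadMode TIO.hGetContents

main :: IO ()
main = do
  writeFile "lazy-io-demo.txt" "hello\nworld\n"
  contents <- readStrict "lazy-io-demo.txt"
  print (T.lines contents)
```

For whole files, Data.Text.IO.readFile does the same strict read without managing the handle yourself.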

N.B. Lazy IO has its defenders in the Haskell community, but we consider it an unnecessary risk.

Abandonware

Haskell has been around since the late 80s, GHC Haskell as a project since the early 90s. The package ecosystem mostly exists on Hackage and has been around for quite some time as well. Accordingly, there are a lot of packages on Hackage and not all of them are maintained or up to date.

You'll need to watch out for abandoned or disused packages. We can't tell you every single package not to use, so we've written up some guidelines and techniques for gauging how well maintained a package is.

First, look at the date of the last upload, and factor in whether the package might simply be "done". A package last uploaded six months ago might be fine if it's complete and there hasn't been a GHC release since then, but you'll have to calibrate for how stable the package is meant to be. Something that had a release a year ago but is 10% done is probably a no-go except as a source of code snippets.

A listing of packages by how many other packages depend on them (their reverse dependencies) can be useful for identifying how widely used a package is, but it works better for foundational libraries than for things only used in applications. That's because the data is calculated from Hackage itself, and applications usually aren't published there. Accordingly, it works better for a package like errors than for esqueleto, but it's still a good quick read.

You can also use the download counts and package download rankings on Hackage. Download counts complement reverse dependencies well, as they're more a measure of what application developers are using. You can get the listing of top packages by downloads, or inspect an individual package's page to see its downloads. At the time of writing, Bloodhound had 3161 total downloads, 186 in the last 30 days. Keep in mind that Hackage's content delivery network hides downloads from the Hackage app server, so the numbers are only meaningful relative to each other, not as exact magnitudes.

If you really want to dig in, you can also check the source control history. Usually this'll be Git, often hosted on GitHub. Bloodhound's package page, for example, links to the Bloodhound GitHub project. From there you can inspect the commit history and pull requests to see whether it's actively developed. A graveyard of unmerged pull requests with no replies from the maintainers is a bad sign!

Another really useful way to gauge a package's popularity is to ask the Haskell community itself! Search the Haskell subreddit, the mailing list, and the haskell tag on Stack Overflow for the name of the library or its modules to see how people are getting on with it. This works best for fairly unique names like Conduit. For a library with a generic name like time, you might want to search for its module names, such as Data.Time or Data.Time.Clock, to narrow the results.