Hierarchy of documentation dysfunction

Author:: Chris Allen

Any similarity to real library documentation, living or dead, is purely coincidental.

Library documentation is a common topic in programming language communities. The topic is complex. On the one hand, documentation imposes a cost on the person writing the documentation, often a maintainer of an open-source library who is already giving unpaid time to write and maintain code others use. On the other hand, undocumented libraries impose real costs on their users, in particular on inexperienced users. So, the topic of documentation -- whether to write docs, how to write docs, what information docs should contain -- comes up frequently.

It comes up somewhat more frequently in the Haskell community, in part because there isn't a single body of knowledge, or even a consistent idiomatic style, that is common to most of the libraries on Hackage. Haskell tends to attract a motley group of programmers with diverse interests, from astrophysics to natural language to category theory, so the background knowledge assumed by library authors may already be unfamiliar to potential users. Furthermore, many Haskellers write code in quite idiosyncratic styles; some prefer what is called "mtl style"; some may shun lenses while others embrace them. This divergence means that the new user of a library may have quite a lot to learn in order to use the library.

However, if you're sharing your code with the community in the hope that people who may not share your experiences, or the head space you were in when you wrote the library, will be able to use it, it behooves you to write quality documentation. With that in mind, let us try to slice the potato of library documentation into some potato chips and get a sense of what is required to make library documentation more accessible.

The goal of thorough documentation is to show off at least some of what your project can do. The lower tiers of documentation are important, but it's the upper tiers that most projects fail to provide. It's those upper tiers that will be the most helpful to your users and allow you to provide insight into your project, to them and perhaps to your later self.

Tier 0: Dumping the bag of tools at your user's feet

Does your documentation look something like this?

-- context/origin stripped for demonstration purposes.
backback :: Monad m =>
            T x' x b' b m a'
         -> (b -> T x' x c' c m b')
         -> T x' x c' c m a'

This is the least helpful tier of documentation. Usually this looks like a Haddock documentation page with a listing of types and names -- and little else. It is a necessary base layer of documentation, but it is rarely sufficient for either the user or the maintainer. It's fine if you don't have time to do better, but if someone offers to improve on the situation, try to be cooperative about it.

Tier 1: Cataloging individual pieces

In some cases, you'll see documentation that is a listing of names, types, and then a short description of what each piece does. That'll look like a sequence of the following:

runE :: Monad m => E m r -> m r
-- Run a self-contained E,
-- converting it back to the base monad

This is a good addition. It increases the usability of the library while imposing only small costs on the author. You're writing comments about what your code does anyway, right?
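For readers who haven't written one of these before, here is a minimal sketch of how such a description is attached to a declaration using Haddock's `-- |` comment syntax. The `E` type below is a stand-in defined only so the snippet compiles on its own; it is an assumption made for illustration, not the API of any real library.

```haskell
-- A hypothetical wrapper type, defined here only so the example compiles.
newtype E m r = E (m r)

-- | Run a self-contained 'E', converting it back to the base monad.
runE :: Monad m => E m r -> m r
runE (E action) = action

main :: IO ()
main = do
  n <- runE (E (return (42 :: Int)))
  print n  -- prints 42
```

Placing the comment directly above the signature, with the leading `|`, is what lets Haddock pick it up and render it next to the type on the generated page.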

Tier 2: Combining individual pieces into common reusable components

This one is surprisingly uncommon. I think it doesn't occur to many library authors that it might be helpful. But just because it's obvious (to an experienced Haskeller) that:

f :: (a -> b)
g :: (b -> c)

f and g can be combined, that doesn't mean they're right next to each other in the documentation. In fact, they often won't be, because documentation is usually structured in terms of modules and subsections. Those subsections and modules are usually groupings of roughly the same thing, which isn't helpful when you need to figure out how to combine different components.
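To make the point concrete, here is a small sketch of what a "combining" note might look like in practice. The functions `parsePort`, `describePort`, and `portMessage` are hypothetical, invented for this illustration. The only idea being shown is that the documentation for one piece names the other and spells out the composition.

```haskell
import Data.Char (isDigit)

-- | Parse a port number from a string of digits.
-- Hypothetical function, for illustration only.
--
-- Commonly combined with 'describePort':
--
-- > describePort . parsePort
parsePort :: String -> Maybe Int
parsePort s
  | not (null s) && all isDigit s = Just (read s)
  | otherwise                     = Nothing

-- | Render the result of 'parsePort' for a log message.
describePort :: Maybe Int -> String
describePort (Just p) = "listening on port " ++ show p
describePort Nothing  = "invalid port"

-- | The composition a user is likely to want, shown next to its parts.
portMessage :: String -> String
portMessage = describePort . parsePort

main :: IO ()
main = do
  putStrLn (portMessage "8080")  -- listening on port 8080
  putStrLn (portMessage "http")  -- invalid port
```

Even if `parsePort` and `describePort` ended up in different modules, the cross-reference in the comment would still point a reader from one to the other.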

An example of this from the Conduit library looks like this:

isolate :: Monad m => Int -> Conduit a m a

-- Ensure that the inner sink consumes no more than the
-- given number of values. Note that this does not ensure
-- that the sink consumes all of those values. To get the
-- latter behavior, combine with sinkNull, e.g.:

src $$ do
    x <- isolate count =$ do
        x <- someSink
        sinkNull
        return x
    someOtherSink
    ...

The reference to sinkNull links to the documentation for that function, but they're in quite separate locations and would not necessarily be easy to find and incorporate without this example.

Tier 3: Minimal working demonstrations

Minimal working demonstrations aren't too rare for the more popular libraries and frameworks, but you'll still encounter libraries that don't have one at all. A minimal working demonstration is an example which will compile and work, but which is pretty bare and does one thing one way. This sometimes leads to frustration later, when a programmer tries to use the library for something at work and has no idea how to do anything slightly off the beaten path.

One example of this would be the Wiki chat example from the Yesod project. Another would be the synopsis example in the Conduit project README.

A somewhat more elaborate example of this would be the mini-projects from the Servant tutorial.

The hard part is usually getting people to add an examples directory which is part of the build. Once you get a project to that point (sometimes difficult), then you can usually add more examples as long as they're novel. Getting the examples built as part of the project is important so that if the author changes the library in a way that invalidates the examples, they have to fix the documentation too. If someone offers to do this for the project, ideally the maintainer would be cooperative. I find this approach is often easier to consume as a learner and easier to maintain as a developer than doctests.
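As a rough sketch of what a file in such an examples directory might contain, consider the following. The path `examples/Minimal.hs` and the `Counter` API are hypothetical. In a real project the functions would be imported from the library being demonstrated, and the file would be listed as an executable or test-suite in the package's .cabal file so that it is compiled along with everything else. They are defined inline here only so the snippet stands on its own.

```haskell
-- examples/Minimal.hs (hypothetical path)
module Main (main) where

-- In a real project these would be imported from the library under
-- demonstration. They are defined inline here, as stand-ins, so the
-- sketch compiles on its own.
newtype Counter = Counter Int

newCounter :: Counter
newCounter = Counter 0

increment :: Counter -> Counter
increment (Counter n) = Counter (n + 1)

currentValue :: Counter -> Int
currentValue (Counter n) = n

-- The example does one thing, one way: build a counter, bump it
-- twice, and print the result.
main :: IO ()
main = print (currentValue (increment (increment newCounter)))  -- prints 2
```

Because the file is compiled with the project, changing the type of something like `increment` would break the build until the example is updated too, which is the whole point.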

Tier 4: Ensemble demonstrations

The most useful and rarest form of documentation is a project that doesn't just compile and work, but goes beyond a minimal demonstration. It uses multiple components from what it is demonstrating, potentially using them in different ways. These are quite rare, but if you haven't tried to make something end to end with your library, how do you know your library is nice to use? If you have made something end to end with your library, why not make the example(s) part of the project?

I don't believe they were written for this purpose, but Snowdrift and Thoughtbot's Carnival are among the only examples of this kind available on GitHub if you're looking to use Yesod for writing a Haskell web application.

If you know of more examples (for any library or framework), please send them to us and we'll add them to this post!

Take-aways

You really need every tier. You need listings of what's in the API, names and types. You need to describe any declaration that isn't painfully obvious. You need to demonstrate common combinations of individual functions in your API. You need to demonstrate a minimal working project. You need to demonstrate a more comprehensive example that would cover more of what a production project would need.

I talk about this not to shame anyone who hasn't had or taken the time to do these things, but to offer a tiered checklist of sorts of where to start, what to do, and what's most critical. However, I will say that if you only have time to do one or two of the things listed here, try to at least check these boxes: