Structured concurrency and pure functions

Posted on

A function or a method is pure, if:

  • it has no side effects
  • the returned value depends only on the provided arguments

We will be most interested in the first property, that is lack of side effects. An example of a side effect is anything that changes the “state of the world”, such as modifying a global variable or performing any kind of I/O.

Pure functions are widely described, see for example the article on Wikipedia, a blog post using Java, or Haskell docs page.

Purity doesn’t have to be a binary property, though. There are some side effects that we care more about than others. For example, logging isn’t often counted as a side effect (even though it performs I/O), and hence a function which makes some computations and performs logging might be still considered pure.

Structured concurrency is a property of concurrent programs, where (following the definition by Martin Sústrik) the lifetimes of concurrent functions are cleanly nested. That is, if a function foo starts running a function bar in the background (bar runs concurrently to foo), then bar must finish before foo completes.

This ensures that the scopes of functions running concurrently are cleanly nested. This might seem like a reasonable property, however concurrent code is rarely written in that style in practice. Computations, in the forms of threads, fibers, coroutines or processes — depending on the language and framework used — are started in the background and left running without supervision or joining.

Nathaniel J. Smith makes a successful comparison between goto and go (which here stands for unconstrained forking of concurrent computations) in his blog “Notes on structured concurrency, or: Go statement considered harmful“. The article provides an alternative introduction to structured concurrency; by avoiding goto we make sure we can read and try to understand the source code without arbitrary jumps. Similarly, by avoiding go we can avoid even worse kid of jumping: to multiple locations at once, again making the code easier to read.

Another way of looking at the structured concurrency property is that lifetimes of concurrently running functions must follow the syntactic layout of the program. Hence the name: the scope of concurrency follows the structure of the code.

Do the two properties described above relate? Yes! Starting a thread and leaving it running after the function completes is a side-effect. Not only we change the state of the world by creating a new thread, the thread itself must perform some state-changing actions to do any meaningful work (if all it did was running a pure function, it wouldn’t make sense to start it at all). That’s all under the assumption that the lifetime of the thread exceeds the lifetime of the function call.

What structured concurrency specifies is that this cannot happen: thread-related side effects are not permitted. Any threads that are started by the function, must complete before the function completes. In other words:

A function that satisfies the structured concurrency property is a function that is pure wrt threading side-effects.

That way, threading becomes an implementation detail of the function. It’s the responsibility of the function to decide whether threads must be started (either for performance, or to implement its logic), and it’s also its responsibility to ensure that these threads to terminate (by waiting for them to complete, or interrupting them).

A pure function might still use side-effecting logic internally, such as using mutable state for performance, but it cannot mutate anything that’s global. Similarly, a threading-pure function might create concurrent threads of execution internally, but they cannot be left running after the function completes.

Other side effects might happen in a threading-pure function; it doesn’t have to be pure wrt to all kinds of side effects.

What if the function at hand returns a Future or a Promise: that is, a value representing a computation running in the background, that will eventually yield a result? Can such functions be structurally concurrent?

They might be, but we need to adjust our definition a bit; requiring that a function returning a Future doesn’t leave any concurrent threads running after it completes, would defeat the purpose of using Futures in the first place.

Hence, a Futue-returning function will satisfy the structural concurrency property as long as the returned future will complete only once all other futures/promises created by the function are complete as well (successfully or with an error).

Again, this translates to the fact that no threads of execution are left running in the background, but this time when the computation returned by the function completes. In the same way, we can extend the definition of threading-purity.

What about functional effects, that is various forms of IOs? Unlike Futures, such values are descriptions of computations (not running computations), which might have side-effects once interpreted. That is, IOs are lazy, while Futures are eager.

By a similar extension to the one we’ve done before, a function returning an IO is structurally concurrent if the computation that is described by the value doesn’t leave any threads running as a side-effect (once it is complete). This makes the described computation “self-contained”.

Structured concurrency is a promising concept, which provides a great tool for writing correct concurrent programs and reading them afterwards. It’s being implemented in Java (see project Loom), Kotlin, Scala, Python, C and other languages.

Pure functions are the basic building block of Functional Programming (FP). That’s where FP takes its power from; code which uses pure functions is easier to read, follow and understand.

Structured concurrency, seen as a specialisation of function purity to the domain of threading, is one more example of this property. It’s easier to understand and write correct code where threads can’t exceed the lifetime of the function in which they are started.

Which might make you wonder: maybe the same applies to other side effects, such as I/O or global state? That’s what the FP community has been saying for a long time, and depending on the complexity of the problem at hand, indeed avoiding such side effects in functions might be beneficial to the maintainability and long-term evolution of a codebase.

Leave a Reply

Your email address will not be published. Required fields are marked *