Let it crash!

Oct 23 2016

In this blog post we'll discuss one of Elixir's greatest features: supervisors and supervision trees, the engine behind Elixir's "let it crash" motto.

What is a supervisor?

A supervisor is a process that watches other processes, which we refer to as child processes, and makes sure they stay alive by restarting them when they die.

What is a supervision tree?

Supervisors are used to build a hierarchical process structure called a supervision tree.
Supervision trees are a nice way to structure fault-tolerant applications.
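To make the tree idea concrete, here is a minimal sketch of a two-level tree: a root supervisor with one worker and one nested supervisor, which in turn watches two workers. Agent stands in for a real worker so the snippet runs on its own, and the ids (:a, :b, :c, :inner_sup) are made up for illustration. The child specification maps used here assume Elixir 1.5 or later.

```elixir
# Hypothetical helper: builds a child specification for a trivial Agent worker.
leaf = fn id -> %{id: id, start: {Agent, :start_link, [fn -> id end]}} end

# A supervisor can itself be a child; type: :supervisor marks it as such.
inner = %{
  id: :inner_sup,
  type: :supervisor,
  start: {Supervisor, :start_link, [[leaf.(:b), leaf.(:c)], [strategy: :one_for_one]]}
}

# The root watches one worker (:a) and the nested supervisor.
{:ok, root} = Supervisor.start_link([leaf.(:a), inner], strategy: :one_for_one)

# count_children/1 reports how many workers and supervisors sit directly under root.
IO.inspect(Supervisor.count_children(root))
```

If a worker under the nested supervisor crashes, only that nested supervisor has to deal with it; the root is not involved unless the nested supervisor itself goes down.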

Supervision strategies

The supervision strategy dictates what happens when one of the children crashes.
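As a rough mental model of the three strategies used in this post (:one_for_one, :one_for_all and :rest_for_one), the sketch below computes which children get restarted when one of them crashes. StrategySketch is a hypothetical module written for illustration only. It is not part of the Supervisor API, and it ignores details such as restart limits.

```elixir
defmodule StrategySketch do
  # Given the child ids in start order and the id of the crashed child,
  # return the ids that would be restarted under each strategy.

  # :one_for_one restarts only the crashed child.
  def restarted(:one_for_one, ids, crashed) do
    Enum.filter(ids, &(&1 == crashed))
  end

  # :one_for_all restarts every child.
  def restarted(:one_for_all, ids, crashed) do
    if crashed in ids, do: ids, else: []
  end

  # :rest_for_one restarts the crashed child and every child started after it.
  def restarted(:rest_for_one, ids, crashed) do
    Enum.drop_while(ids, &(&1 != crashed))
  end
end

ids = Enum.map(1..10, fn i -> "child_#{i}" end)

IO.inspect(StrategySketch.restarted(:rest_for_one, ids, "child_6"))
# prints ["child_6", "child_7", "child_8", "child_9", "child_10"]
```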

Demo

First, we'll define a simple "hello world" GenServer.

defmodule App.Child do
  use GenServer
  require Logger

  ################## Client API ##################

  def start_link(name) do
    GenServer.start_link(__MODULE__, name, [])
  end

  def say_hey(server) do
    GenServer.call(server, {:say_hey})
  end

  def kill(server) do
    GenServer.cast(server, {:announce_death})
    GenServer.stop(server)
  end


  ################## Server Callbacks ##################

  def init(name) do
    Logger.debug "Starting App.Child with name #{name}!"
    {:ok, %{name: name}}
  end

  def handle_call({:say_hey}, _from, state) do
    msg = "'#{state[:name]}' says hey!"
    Logger.info msg
    {:reply, {:ok, msg}, state}
  end

  def handle_cast({:announce_death}, state) do
    Logger.warn "'#{state[:name]}' is dying!"
    {:noreply, state}
  end
end

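Now we can describe a child process with a worker specification. The worker/3 helper comes from Supervisor.Spec, so that module has to be imported first, and the start arguments are passed as a list:

  worker(App.Child, ["child_1"], [id: "child_1"])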

Creating a supervisor process is very simple too:

  import Supervisor.Spec

  children = [
    worker(App.Child, ["child_1"], [id: "child_1"]),
    worker(App.Child, ["child_2"], [id: "child_2"])
  ]
  # strategy is one of :one_for_one, :one_for_all or :rest_for_one
  Supervisor.start_link(children, strategy: strategy)
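A note for readers on newer Elixir versions: Supervisor.Spec (and its worker/3 helper) has been deprecated since Elixir 1.5 in favor of plain child specifications. The sketch below shows roughly the same two-child supervisor written with child specification maps. Agent stands in for App.Child so the snippet runs on its own, and :one_for_one is picked only as an example strategy.

```elixir
# Each child is described by a map: a unique id plus the
# {module, function, args} tuple used to start it.
children = [
  %{id: "child_1", start: {Agent, :start_link, [fn -> "child_1" end]}},
  %{id: "child_2", start: {Agent, :start_link, [fn -> "child_2" end]}}
]

{:ok, sup_pid} = Supervisor.start_link(children, strategy: :one_for_one)

# count_children/1 reports how many children are specified and running.
IO.inspect(Supervisor.count_children(sup_pid))
```

With the real worker, the start tuple would presumably look like {App.Child, :start_link, ["child_1"]}.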

To put it all together, we'll create a simple program that launches a few child processes and randomly kills them. Very G.R.R. Martin-like :)

  defp create_supervision_tree(strategy) do
    children = create_children(10)
    {:ok, sup_pid} = Supervisor.start_link(children, strategy: strategy)
    sup_pid
  end

  defp create_children(n) do
    (1..n)
      |> Enum.map(fn(i) ->
        name = "child_#{i}"
        worker(App.Child, [name], [id: name])
      end)
  end

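  def main_loop(sup_pid) do
    # The child slot can hold :restarting or :undefined instead of a pid, so keep only pids.
    children_pids =
      Supervisor.which_children(sup_pid) |> Enum.map(&elem(&1, 1)) |> Enum.filter(&is_pid/1)
    kill_round(children_pids)
    main_loop(sup_pid)
  end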

  defp kill_round(children_pids) do
    children_pids
      |> Enum.each(fn(c) ->
          random_killer(c)
      end)
  end

  defp random_killer(child_pid) do
    if Process.alive? child_pid do
      case Enum.random 1..10 do
        1 -> App.Child.kill(child_pid)
        _ -> App.Child.say_hey(child_pid)
      end
      :timer.sleep(1000)
    end
  end

That's it. Whenever a child process dies, the supervisor reacts and restarts it, making sure all children keep running.

Let's see how the different strategies behave:

  def one_for_one do
    sup_pid = create_supervision_tree(:one_for_one)
    spawn_link(fn() -> main_loop(sup_pid) end)
  end

  def one_for_all do
    sup_pid = create_supervision_tree(:one_for_all)
    spawn_link(fn() -> main_loop(sup_pid) end)
  end

  def rest_for_one do
    sup_pid = create_supervision_tree(:rest_for_one)
    spawn_link(fn() -> main_loop(sup_pid) end)
  end

Launch the console by running iex -S mix

iex(1)> App.one_for_one

11:45:27.304 [debug] Starting App.Child with name child_1!

11:45:27.304 [debug] Starting App.Child with name child_2!

11:45:27.304 [debug] Starting App.Child with name child_3!

11:45:27.304 [debug] Starting App.Child with name child_4!

11:45:27.304 [debug] Starting App.Child with name child_5!

11:45:27.304 [debug] Starting App.Child with name child_6!

11:45:27.304 [debug] Starting App.Child with name child_7!

11:45:27.305 [debug] Starting App.Child with name child_8!

11:45:27.305 [debug] Starting App.Child with name child_9!

11:45:27.305 [debug] Starting App.Child with name child_10!

11:45:27.311 [info]  'child_10' says hey!
 
11:45:28.314 [info]  'child_9' says hey!

11:45:29.315 [info]  'child_8' says hey!

11:45:30.316 [info]  'child_7' says hey!

11:45:31.317 [info]  'child_6' says hey!

...

11:45:39.325 [warn]  'child_8' is dying!

11:45:39.325 [debug] Starting App.Child with name child_8!

11:45:40.326 [info]  'child_7' says hey!

11:45:46.660 [info]  'child_6' says hey!


We can see that when child_8 dies, a new process immediately takes its place!
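The logs only show names, so it can be worth checking that the replacement really is a different process. Here is a minimal sketch of that check under a :one_for_one supervisor: it remembers the child's pid, kills it, and polls until the supervisor reports a new pid for the same id. RestartCheck and wait_for_new_pid are hypothetical names made up for this example, and Agent again stands in for App.Child.

```elixir
defmodule RestartCheck do
  # Poll the supervisor until child `id` has a pid other than `old_pid`.
  # Returns the new pid, or :timeout after `attempts` tries.
  def wait_for_new_pid(sup, id, old_pid, attempts \\ 50)
  def wait_for_new_pid(_sup, _id, _old_pid, 0), do: :timeout

  def wait_for_new_pid(sup, id, old_pid, attempts) do
    case List.keyfind(Supervisor.which_children(sup), id, 0) do
      {^id, pid, _type, _modules} when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        wait_for_new_pid(sup, id, old_pid, attempts - 1)
    end
  end
end

children = [%{id: "child_8", start: {Agent, :start_link, [fn -> "child_8" end]}}]
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

[{"child_8", old_pid, :worker, _modules}] = Supervisor.which_children(sup)

# Kill the child abruptly; the supervisor is linked to it and gets notified.
Process.exit(old_pid, :kill)

new_pid = RestartCheck.wait_for_new_pid(sup, "child_8", old_pid)
IO.inspect({old_pid, new_pid}, label: "old/new pid")
```

The id stays the same while the pid changes, which is why code that holds on to a raw pid (as main_loop does within a single round) can end up talking to a dead process.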

iex(2)> App.one_for_all

11:50:44.951 [debug] Starting App.Child with name child_1!

11:50:44.951 [debug] Starting App.Child with name child_2!

11:50:44.951 [debug] Starting App.Child with name child_3!

11:50:44.951 [debug] Starting App.Child with name child_4!

11:50:44.951 [debug] Starting App.Child with name child_5!

11:50:44.951 [debug] Starting App.Child with name child_6!

11:50:44.951 [debug] Starting App.Child with name child_7!

11:50:44.951 [debug] Starting App.Child with name child_8!

11:50:44.951 [debug] Starting App.Child with name child_9!

11:50:44.951 [debug] Starting App.Child with name child_10!

11:50:44.953 [info]  'child_10' says hey!

11:50:45.956 [info]  'child_9' says hey!

...

11:51:10.981 [warn]  'child_4' is dying!

11:51:10.981 [debug] Starting App.Child with name child_1!

11:51:10.981 [debug] Starting App.Child with name child_2!

11:51:10.981 [debug] Starting App.Child with name child_3!

11:51:10.981 [debug] Starting App.Child with name child_4!

11:51:10.981 [debug] Starting App.Child with name child_5!

11:51:10.981 [debug] Starting App.Child with name child_6!

11:51:10.981 [debug] Starting App.Child with name child_7!

11:51:10.981 [debug] Starting App.Child with name child_8!

11:51:10.981 [debug] Starting App.Child with name child_9!

11:51:10.981 [debug] Starting App.Child with name child_10!

11:51:11.982 [info]  'child_10' says hey!

This time, whenever a child process dies, all the other child processes are terminated and the whole batch is restarted.

iex(1)> App.rest_for_one

11:54:12.123 [debug] Starting App.Child with name child_1!

11:54:12.123 [debug] Starting App.Child with name child_2!

11:54:12.123 [debug] Starting App.Child with name child_3!

11:54:12.123 [debug] Starting App.Child with name child_4!

11:54:12.123 [debug] Starting App.Child with name child_5!

11:54:12.123 [debug] Starting App.Child with name child_6!

11:54:12.123 [debug] Starting App.Child with name child_7!

11:54:12.123 [debug] Starting App.Child with name child_8!

11:54:12.123 [debug] Starting App.Child with name child_9!

11:54:12.123 [debug] Starting App.Child with name child_10!

11:54:12.125 [info]  'child_10' says hey!

11:54:13.128 [info]  'child_9' says hey!

...

11:54:26.141 [warn]  'child_6' is dying!

11:54:26.141 [debug] Starting App.Child with name child_6!

11:54:26.141 [debug] Starting App.Child with name child_7!

11:54:26.141 [debug] Starting App.Child with name child_8!

11:54:26.141 [debug] Starting App.Child with name child_9!

11:54:26.141 [debug] Starting App.Child with name child_10!

11:54:27.142 [info]  'child_5' says hey!

This time, whenever a process dies, it is restarted together with all the child processes that were started after it (here child_6 through child_10), while the earlier ones keep running.

Conclusion

Elixir supervision trees allow us to create robust applications with very little code.
Instead of worrying about each process and making sure it never fails, let a supervisor watch it, and simply let it crash.

Links:

 - The full code is available on GitHub
 - Supervisor docs

Guy Y.
Software Developer