Quick Note

This is just a written version of the talk my colleague and I delivered at Code BEAM Lite Amsterdam 2019. So if you have already seen the video, you can just ignore this and throw it in the trash can.

What is this about?

Years ago, if we needed something, let's say, a payment gateway, we would have to design and build it on our own. But now, there are a lot of companies and tools dedicated to providing these services. If we need access to weather data, we don't need to build a weather station. We can just get the data from a provider.

The thing is, most of the time, we cannot just access the data. We need a way to authenticate ourselves to their platform/service. One of the common ways is by using an authentication token. Most authentication tokens are not forever. They expire. This talk (nay, writeup) is about dealing with this kind of token.

What are we building upon?

A very beautiful diagram created by my co-speaker, Bruno

In this writeup, we're going to assume that we have a weather app that communicates with an external provider (in this case, FakeProvider). This weather app has a very simple API that simply gets the current weather from the external provider.

Now, let's look into this FakeProvider.

defmodule FakeProvider do
  def authenticate(credentials) do
    Process.sleep(2_000) # Simulates network latency
    token = generate_token(credentials)
    {:ok, token}
  end

  def get_weather(token) do
    with :ok <- validate_token(token) do
      weather = fetch_weather()
      {:ok, weather}
    end
  end
end

Notice that we have a Process.sleep/1; this is just for simulating network latency. You can lower your eyebrows now. This FakeProvider has two public functions (the private helpers they call are sketched right after this list):

  • authenticate/1 - it accepts some sort of credentials and returns an authentication token
  • get_weather/1 - it accepts an authentication token, validates it, and then returns the current weather

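The private helpers these functions call (generate_token/1, validate_token/1, and fetch_weather/0) are not shown. Purely as an illustration, here is one way they could look – the token format, the validation rule, and the weather data below are all made up:

# lib/fake_provider.ex (illustrative sketch of the private helpers)

  defp generate_token(credentials) do
    # Fake token: a hash of the credentials plus a timestamp
    :crypto.hash(:sha256, "#{inspect(credentials)}-#{System.system_time()}")
    |> Base.encode16()
  end

  defp validate_token(token) when is_binary(token), do: :ok
  defp validate_token(_token), do: {:error, :invalid_token}

  defp fetch_weather() do
    # Fake weather data
    %{temperature: 21, condition: :cloudy}
  end
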
Now it's time to talk about the first strategy.

Strategy #1: Not holding it at all

Holding a token is an optimization already. You may not need it at all. If you are just sending, let's say, 1 request a day, forget about holding the authentication token. You can just re-authenticate yourself on each request. Let's put it into action.

# lib/not_holding.ex

...

  def get_current_weather() do
    with {:ok, token} <- FakeProvider.authenticate(@key),
         {:ok, weather} <- FakeProvider.get_weather(token) do
      weather
    end
  end

...

It's very straightforward. You might be tempted to just go with this solution every time you need to communicate with an external provider. One of the drawbacks of this strategy is that the provider would definitely hate you. You will be flooding them with requests.

Strategy #2: Sleeper Worker

In this strategy, we will actually hold on to the token. Let's start by creating a test.

# test/token_holder_test.exs

defmodule TokenHolderTest do
  test "Test that token is set" do
    assert {:ok, token} = TokenHolder.fetch_token()
  end
end

This test just verifies that you can actually get the token once the application has started. Let's dive into the code.

# lib/sleeper_worker.ex

...

  def get_current_weather() do
    with {:ok, token} <- TokenHolder.fetch_token(),
         {:ok, weather} <- FakeProvider.get_weather(token) do
      weather
    end
  end

...

It's almost the same as the one from the first strategy. But if you look closer, you'll notice that instead of calling the authenticate/1 function from the FakeProvider, we are fetching the token from the TokenHolder.

So what is this TokenHolder?

# lib/token_holder.ex

defmodule TokenHolder do
  # Example values only: replace with your real credentials and refresh interval
  @credentials "my-api-key"
  @interval :timer.minutes(50)

  def fetch_token() do
    {:ok, Application.get_env(:sleeper_worker, :token)}
  end

  def refresh_token() do
    case FakeProvider.authenticate(@credentials) do
      {:ok, token} ->
        Application.put_env(:sleeper_worker, :token, token)
        Process.sleep(@interval)
        refresh_token()

      _ ->
        Process.sleep(1_000)
        refresh_token()
    end
  end
end

To fetch the token, we just get it from the application's environment. So how did it end up in the app's environment? Take a look at the refresh_token/0 function. It's the one responsible for getting the token from the external provider. After getting the token, it puts it into the app's environment. Then we let the process sleep for a specific interval. The interval depends on your use case. If you have a token that expires every hour, maybe it's a good idea to refresh it every 50 minutes. After the interval, the process proceeds by calling the same function. This is _theoretically_ an infinite cycle.

Of course, we need to start this process somehow. We put it in our application's supervision tree. It's basically a Task that starts with the function we pass along.

# lib/sleeper_worker/application.ex

...

  def start(_type, _args) do
    children = [
      %{
        id: SleeperWorker.TokenHolder,
        start: {Task, :start_link, [&TokenHolder.refresh_token/0]}
      }
    ]

    opts = [strategy: :one_for_one, name: SleeperWorker.Supervisor]
    Supervisor.start_link(children, opts)
  end

...

There are actually big problems with this.

First, app environment is global. So if you want to have multiple instances of TokenHolder, all the instances would update the same key in the app environment.

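To see why this bites, imagine two TokenHolder instances (say, one per provider account) refreshing their own tokens – a sketch with made-up values:

# Both instances write to the same global key, so the last writer wins
Application.put_env(:sleeper_worker, :token, "token-for-account-a")
Application.put_env(:sleeper_worker, :token, "token-for-account-b")

Application.get_env(:sleeper_worker, :token)
# => "token-for-account-b" – the first account's token is gone
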
Second, Process.sleep/1. Yes, yes, yes. A quote straight from the Elixir documentation of Process.sleep/1:

Use this function with extreme care. For almost all situations where you would use sleep/1 in Elixir, there is likely a more correct, faster and precise way of achieving the same with message passing.

Maybe there's a better way to do this without the Elixir gods frowning upon you.

Strategy #3: GenServer State

In this strategy, we're still going to start something on the supervision tree. But instead of a Task, we'll start a GenServer.

# lib/genserver_state/application.ex

...

  def start(_type, _args) do
    children = [
      %{
        id: GenserverState.TokenHolder,
        start: {GenserverState.TokenHolder, :start, []}
      }
    ]

    opts = [strategy: :one_for_one, name: GenserverState.Supervisor]
    Supervisor.start_link(children, opts)
  end
...

How should we fetch the token from the GenServer?

# lib/genserver_state/token_holder.ex

...

  def fetch_token() do
    token = GenServer.call(__MODULE__, :fetch_token)

    {:ok, token}
  end

  def handle_call(:fetch_token, _from, token) do
    {:reply, token, token}
  end
  
...

On our fetch_token/0 function, we make a synchronous call to the GenServer with the message :fetch_token.

Then, on our handle_call/3 – the callback that handles the synchronous message – we get the token from our state and reply with it to the calling process. Remember, this is a synchronous call. The calling process is blocked until it gets a reply from the GenServer.

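One detail the snippets above rely on: the child spec starts the server via a start function that we don't show here, and for GenServer.call(__MODULE__, :fetch_token) to work, that function has to register the process under the module name. A minimal sketch of what it could look like:

# lib/genserver_state/token_holder.ex (sketch, not the actual start function)

  def start() do
    # Registering under the module name lets clients call
    # GenServer.call(__MODULE__, ...) without knowing the pid
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end
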
Now, let us take a look at the implementation of the GenServer that holds our token.

# lib/genserver_state/token_holder.ex

...

  use GenServer

  def init(_) do
    with {:ok, token} <- FakeProvider.authenticate(@key) do
      Process.send_after(self(), :refresh, @interval)
      {:ok, token}
    end
  end
  
  def handle_info(:refresh, previous_token) do
    case FakeProvider.authenticate(@key) do
      {:ok, new_token} ->
        Process.send_after(self(), :refresh, @interval)
        {:noreply, new_token}

      _error ->
        # handle failure: keep the previous token and try again shortly
        # (the 1-second delay here is an arbitrary choice)
        Process.send_after(self(), :refresh, 1_000)
        {:noreply, previous_token}
    end
  end
  
...

On the init/1 callback – a callback that's invoked when the GenServer is started – we get the token from the FakeProvider. Once we have the token from the provider, we schedule a message addressed to ourselves to refresh the token. Then we set the token as the initial state of our server.

When we receive the message that we scheduled before, it is handled by the handle_info/2 callback – a callback that is invoked to handle all other messages. On this callback we basically do the same things we did in the init/1: we authenticate ourselves, schedule the :refresh message, and set the token as the new state.

And yayyy, with this strategy, we completely stopped using Process.sleep/1. Sounds good, right? But there's another problem. Remember, we are starting this on our supervision tree. The application startup would only continue with its work once the init/1 callback of our GenServer returns. We're doing expensive and unpredictable work in our init/1 callback. We're completely unsure how much time it would take for the FakeProvider to return a token. We're basically halting our application startup. That's not a good idea.

What you can do to unblock the application startup is to just let the handle_info/2 callback do the work instead of the init/1 callback.

# lib/genserver_state/token_holder.ex

...
 
  def init(_) do
    Process.send_after(self(), :refresh, 0)
    {:ok, nil}
  end
  
...

On our init/1 callback, we just schedule the :refresh message without any delay. Doing this makes the init/1 callback return immediately, thus unblocking the application startup.

But again, this feels like a hacky solution. There might be a better way to do this.

Since Elixir 1.7 and Erlang/OTP 21+, we can use the handle_continue/2 callback – a callback that is invoked to handle continue instructions. It's useful for performing work after server initialization and for breaking down the work in a callback into multiple chunks.

# lib/genserver_state/token_holder.ex

...

  use GenServer

  def init(_key) do
    {:ok, nil, {:continue, :refresh}}
  end

  def handle_continue(:refresh, previous_token) do
    case FakeProvider.authenticate(@key) do
      {:ok, new_token} ->
        Process.send_after(self(), :refresh, @interval)
        {:noreply, new_token}

      _error ->
        # handle failure: keep the previous token and try again shortly via send_after
        # (the 1-second delay here is an arbitrary choice)
        Process.send_after(self(), :refresh, 1_000)
        {:noreply, previous_token}
    end
  end

...

On our init/1 callback, we specify a continue instruction with a :refresh message. Right after the process enters its loop, the handle_continue/2 callback is invoked, ensuring that it's the first thing the process does after initialization.

So now, we are not blocking the application startup and we are sure that :refresh is the first message the GenServer would consume. Bad news though, there are still some problems with this strategy.

First, since we are using a GenServer, access to the token is sequential. If many clients want to access the token at the same time, they all have to go through the GenServer bottleneck. Well, this might not even be a problem. GenServers are really fast and can take quite a load. Maybe it's not even a problem for your use case! But if it is, we still have another strategy to look at that solves this problem.

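To picture the bottleneck: if lots of processes fetch the token at the same time, every single call is funneled through the one TokenHolder process. A purely illustrative sketch:

# Illustrative only: 1_000 concurrent reads, all serialized by one GenServer
1..1_000
|> Task.async_stream(fn _ -> TokenHolder.fetch_token() end)
|> Stream.run()
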
Strategy #4: GenServer & ETS

Yes, we're still going to use GenServer. But the difference from the previous strategy is that we make use of Erlang Term Storage (ETS).

# lib/ets_token_holding/token_holder.ex

...

  use GenServer

  def init(_key) do
    :ets.new(@token_store, [:set, :protected, :named_table])

    {:ok, nil, {:continue, :refresh}}
  end

...

Instead of doing nothing in the init/1 callback, we would now be creating an ETS table. We are setting the table's access to protected since we want the GenServer process to be able to write to the table and any other process to just read from it.

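As a quick illustration of what protected means (assuming the table is named :token_store – the actual value of @token_store isn't shown), any process can read from it, but only the owner can write:

# Reads from any process are fine on a :protected named table
:ets.lookup(:token_store, :token)
# => [{:token, "..."}]

# Writes from any process other than the owner raise
:ets.insert(:token_store, {:token, "forged"})
# => ** (ArgumentError) – only the TokenHolder process may insert
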
# lib/ets_token_holding/token_holder.ex

...

  def handle_continue(:refresh, nil) do
    case FakeProvider.authenticate(@key) do
      {:ok, new_token} ->
        :ets.insert(@token_store, {@token_key, new_token})
        {:noreply, nil}

      _error ->
        # handle failure: try again shortly via send_after
        # (the 1-second delay here is an arbitrary choice)
        Process.send_after(self(), :refresh, 1_000)
        {:noreply, nil}
    end
  end

...

On our handle_continue/2 callback, instead of setting the token on our state, we just insert it into our ETS table.

So how do we get the token now?

# lib/ets_token_holding/token_holder.ex

...

  def fetch_token() do
    [{:token, token}] = :ets.lookup(@token_store, @token_key)

    {:ok, token}
  end

...

Erlang's :ets module has a function called lookup/2, which looks up @token_key in @token_store – in our case, the name of our ETS table. It returns a list, so we just need to pattern-match to get the actual token.

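For example, assuming @token_store is :token_store and @token_key is :token (both are guesses at the attribute values), a lookup looks like this:

iex> :ets.lookup(:token_store, :token)
[{:token, "a-very-real-token"}]

iex> :ets.lookup(:token_store, :something_else)
[]
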
Let's take a look back at our test.

# test/token_holder_test.exs

defmodule TokenHolderTest do
  test "Test that token is set" do
    assert {:ok, token} = TokenHolder.fetch_token()
  end
end

Running this test would result in this:

1) test Test that token is set (TokenHoldingTest)
    test/token_holding_test.exs:5
     ** (MatchError) no match of right hand side value: []
     code: assert {:ok, token} = TokenHolder.fetch_token()
     stacktrace:
       (token_holding) lib/token_holder.ex:63: TokenHolder.fetch_token/0
       test/token_holding_test.exs:6: (test)

Remember, the ETS table was created during the server initialization – before we execute the handle_continue/2 callback. We do not have any guarantee that, by the time the tests run, the handle_continue/2 callback has finished doing its job.

Again, the fix for this depends on your use case. In our use case, once the application boots up, we expect it to have a token in there. Your use case might be different. Maybe it's totally fine for you to not have the token immediately in the beginning.

# lib/ets_token_holding/token_holder.ex

...

  def fetch_token() do
    case :ets.lookup(@token_store, @token_key) do
      [] -> fetch_token()
      [{:token, token}] -> {:ok, token}
    end
  end

...

Since in our use case we expect the token to be there immediately, if the token is still not in the ETS table by the time we ask for it, we just keep trying until we actually get it.

It feels _hacky_. But it works. 🤷🏽‍♂️

Conclusion

We explored other options and solutions regarding this topic, but we understand that it really depends on the business case that you are trying to solve.

The strategies we enumerated here are not fit for every problem you are trying to solve. These are not set in stone. There might be better ways to do it. We would love to hear your opinions and/or suggestions.

Vince Urag | [LinkedIn] [Twitter]
Bruno Azenha | [LinkedIn] [Twitter]