Foreword

In this post, we'll talk about tracing a function to know its input arguments and return value. For demonstrations of the principles, we'll be using Erlang's dbg module (but of course, using Elixir syntax) then later on we will use a library called SmartTracer.

There are a lot of opinions whether you should or should not trace on a live production node. This post is not about it. Personally, I think it's fine as long as you know what you're doing and you added ample safeguards to your traces.

Anyway, let's jump in.

Using Erlang's dbg

The dbg module is included in the runtime_tools app which is part of OTP. It contains tools that could help you in debugging/tracing and is suitable to be included in production systems.

For demonstration purposes, we'll be using the Foo module defined below as an example.

defmodule Foo do
  @moduledoc false
  
  @doc false
  def bar(name) do
    "Hello, my name is #{get_name(name)}"
  end
  
  defp get_name(name) do
    "Fizz" <> name  
  end
end

Fire up your IEx and load the Foo module.

To start a tracer process we just need to invoke :dbg.tracer/0. This process will be the recipient of all trace messages.

iex(2)> :dbg.tracer()
{:ok, #PID<0.121.0>}

Now we need to configure which processes or ports to trace and what. For this example, let's trace all the function calls in existing and future processes.

iex(3)> :dbg.p(:all, :c)
{:ok, [{:matched, :nonode@nohost, 56}]}

:all means we want to trace existing processes before this was set and all processes that would be created. The second argument specifies what we want to trace, in this case :c, which means, function calls.

Erlang documentation has a long list of flags to configure your tracer. You can take a look at those here.

You might be wondering about the somehow cryptic return value of :dbg.p/2. It's just saying that it's matching 56 processes based on your configuration. To verify this, You can run this:

iex(4)> length(Process.list())
56

Now we need to setup a trace pattern. We'll tell the tracer which function to trace.

iex(5)> :dbg.tp(Foo, :bar, 1, [{:_, [], [{:return_trace}]}])
{:ok, [{:matched, :nonode@nohost, 1}, {:saved, 1}]}

The first to third argument is the MFA. The last argument is called MatchSpec. We won't dive into it but Erlang has a comprehensive document that you can read here. Basically, we told the tracer that we want to trace calls to Foo.bar/1 and we also want the return values.

We're quite done with setting up the tracer. Let's try it out.

iex(6)> Foo.bar("Buzz")
"Hello, my name is FizzBuzz"

iex(7)> (<0.104.0>) call 'Elixir.Foo':bar(<<"Buzz">>)
(<0.104.0>) returned from 'Elixir.Foo':bar/1 -> <<"Hello, my name is FizzBuzz">>

Woohoo! 🎉

Remember to stop the tracer since it also consumes memory.

iex(8)> :dbg.stop_clear()
:ok

Let's try tracing a local function call. In this case, a call to get_name/1 which is a private function.

iex(9)> :dbg.tracer()
{:ok, #PID<0.130.0>}

iex(10)> :dbg.p(:all, :c)
{:ok, [{:matched, :nonode@nohost, 56}]}

iex(11)> :dbg.tp(Foo, :get_name, 1, [{:_,  [], [{:return_trace}]}])
{:ok, [{:matched, :nonode@nohost, 0}, {:saved, 1}]}

iex(12)> Foo.bar("Buzz")
"Hello, my name is FizzBuzz"

Nothing happened. It's because :dbg.tp/4 does not trace local function calls. To trace local function calls, we need to use :dbg.tpl/4. I assume the l there stands for local. 🤷‍♂️

iex(13)> :dbg.tpl(Foo, :get_name, 1, [{:_,  [], [{:return_trace}]}])
{:ok, [{:matched, :nonode@nohost, 1}, {:saved, 1}]}

iex(14)> Foo.bar("Buzz")
(<0.104.0>) call 'Elixir.Foo':get_name(<<"Buzz">>)
(<0.104.0>) returned from 'Elixir.Foo':get_name/1 -> <<"FizzBuzz">>
"Hello, my name is FizzBuzz"

That's about everything you need to know to start tracing functions in Elixir. In the following section we'll talk about using the library called SmartTracer by yours truly.

Using SmartTracer

SmartTracer is an Elixir library I built to aid developers with tracing function calls. It has some added safeguards and rich features like trace recording and defining your own custom trace handlers.

Design

As show in the diagram above, every process spawned by the SmartTracer has the IEx shell as its parent process. We don't want to have dangling/rogue processes in our system. This way, if the shell exits, all SmartTracer-related processes will be brought down as well. The shell also trap exits so it won't die when the tracer and trace handler processes exit.

The recorder process is not in the tracer process chain since we want it to be independent from the tracer processes. If ever the tracer process terminates, we'd still be able to playback the recorded function calls.

Safeguards

Rate Limiting

As I have mentioned before, tracing function calls consumes resources so it's not ideal to trace unlimited function calls as it may degrade your app's performance. Most of the time, you just need a small sample data. When using the SmartTracer library, a limit is necessary. When the rate limit is tripped, the tracer processes are killed. You would still be able to playback the recorded calls.

Shell as the Parent Process

Having the IEx shell as the parent process of all SmartTracer-related processes, we could be sure that once the IEx shell exits, there will be no rogue processes left behind.

Features

Easy to use interface

Instead of manually starting the tracers and setting the pattern, starting a trace can be done via a one-liner:

iex(2)> SmartTracer.trace(&Foo.bar/1, 5)
:ok

iex(3)> Foo.bar("Buzz")
"Hello, my name is FizzBuzz" # <- return value from shell, ignore

03:31:34.429 [info] Elixir.Foo.bar/1 is being called with: ["Buzz"]

You just need to provide the function reference and the number of calls you want to trace.

If you need the return value, it's just an option away.

iex(7)> SmartTracer.trace(&Foo.bar/1, 5, return: true)
:ok

iex(8)> Foo.bar("Buzz")
03:33:00.636 [info] Elixir.Foo.bar/1 is being called with: ["Buzz"]
"Hello, my name is FizzBuzz" # <- return value from shell, ignore
03:33:00.636 [info] Elixir.Foo.bar/1 returns: "Hello, my name is FizzBuzz"

To trace local functions, you just need to set the scope to :local.

iex(13)> SmartTracer.trace(&Foo.get_name/1, 5, scope: :local)
:ok

iex(14)> Foo.bar("Buzz")
"Hello, my name is FizzBuzz" # <- return value from shell, ignore

03:39:42.136 [info] Elixir.Foo.get_name/1 is being called with: ["Buzz"]

Call Recording

Let's say you would be tracing a function that is being called very frequently, there's a chance that the tracer's output will be buried deep in the shell. To avoid this, you can record the calls and play it back when you need to (as long as the IEx and recorder agent is alive).

iex(15)> SmartTracer.trace(&Foo.bar/1, 5, record: true)
:ok

iex(16)> Foo.bar("Buzz")                                     
03:47:35.238 [info]  Elixir.Foo.bar/1 is being called with: ["Buzz"]
"Hello, my name is FizzBuzz" # <- return value from shell, ignore

iex(17)> SmartTracer.playback()
[
  %SmartTracer.Utils.Recorder.Call{
    args: ["Buzz"],
    arity: 1,
    datetime: ~U[2021-02-05 19:47:35Z],
    function: :bar,
    module: Foo,
    type: :call
  }
]

Custom Tracer

Another big feature of SmartTracer is the ability to define your own tracer. By default, SmartTracer would just use Logger to output the function calls. When you define your own tracer, you would have full control on what to do with the traces. You can write it to a file, send it to an API, etc. Just be mindful of any performance impact.

Here's a very simple definition of a CustomTracer.

defmodule MyApp.CustomTracer do
  @moduledoc false
  
  use SmartTracer.Custom
  
  require Logger
  
  def handle(:call, {_module, _fun, args}) do
    # maybe write to a file?
    Logger.info("The function was called with #{inspect(args)}")
  end
  
  def handle(:return, {_module, _fun, _arity, return_value}) do
    Logger.info("The function returned: #{inspect(return_value)}")
  end
end

Now, instead of calling SmartTracer directly, you would now invoke the trace/3 function from your CustomTracer.

iex(4)> MyApp.CustomTracer.trace(&Foo.bar/1, 5)
:ok

iex(5)> Foo.bar("Buzz")
"Hello, my name is FizzBuzz" # <- return value from shell, ignore
03:59:26.435 [info]  The function was called with ["Buzz"]

That's all! To learn more, refer to SmartTracer's documentation. Also, feel free to contribute into this project. [Github]

See you on the next blog post!

If you have any questions, you can reach me via [Twitter] / [E-mail].