Elixir: A Journey to Simplifying System Architecture

Elixir is a fundamental building block at Invision AI:

  1. Provides infrastructure that would require many frameworks in other languages.
  2. Developers who have never touched Elixir can be productive within 2 weeks.
  3. Phoenix / Phoenix LiveView: we can develop and ship back and front-end functionality ultra fast.
  4. Its concurrency model makes handling parallel tasks straightforward.
  5. Unit testing, integration testing, and live debugging are incredibly powerful thanks to the BEAM.

This article explains how and why we introduced Elixir, and the features we like that make our software easier to maintain and extend.

About Invision AI

At Invision, we develop real-time systems for mobility. We combine AI, Computer Vision, and Software Engineering to deliver high-quality products.

We own market-leading products in Vehicle Occupancy Detection (VOD), Automatic Incident Detection (AID) and camera analytics for smart cities.

Our VOD Edge Computer

A large part of our processing runs on the edge. One or more Edge Computers (ECs) are installed along highways and monitor vehicle occupancy for HOT (High Occupancy Tolling) or HOV (High Occupancy Vehicle) lanes.

Each Edge Computer does a few things:

  1. Detect presence of vehicles from LIDAR measurements.
  2. Trigger, acquire, and process raw images from cameras (USB, GigE).
  3. Run ML models to count the number of people in each section of the car.
  4. Anonymize images with our patented anonymization technique.
  5. Provide a web interface to configure the system, and visualize and debug real-time operation.
  6. Connect to a cloud service to upload images and metadata for enforcement and/or statistics.
  7. Record and provide monitoring metrics and logs.

It is a mix of real-time critical LIDAR and Computer Vision components, as well as data fusion, storage and monitoring ones.

We run Docker containers to keep components isolated and therefore simpler to debug and maintain.

A bit of history

Our system started as one large C++ process, due to real-time constraints. Most developers at the time were experts in C++, so it seemed the most natural choice.

We soon moved the parts that don’t need C++ to Golang, leaving C++ mostly for Signal Processing (Eigen, OpenCV).

Next, we adopted NATS.io to handle inter-process communication: a success for simplification and observability. NATS is a messaging system that supports pub/sub, request/reply, and streaming with persistence. It is open source and suits both edge and cloud computing (we hope to write an article about it sometime soon).

While NATS made life a lot easier for us, we still had a few pain points:

  1. The web interface was, due to historical reasons, a mix of Python (Django), Go, C++, HTML and JavaScript — a pain to maintain.
  2. Isolated modules in C++ and Go work well, but harm observability. One cannot just ‘launch a shell’ and debug. NATS with nats-cli helped, but it’s far from a full-fledged REPL.
  3. Go makes parallelism easy, but it’s still easy to create race conditions in concurrent code.
  4. We can use Go for the web interface, but we end up with the same lack of observability and introspection.

Why Elixir

We began testing Elixir in June 2025, to create a full-stack application for another product vertical.

It quickly proved promising: a single person implemented, in two months, a production-ready and fully functional web app, with tests, significant concurrency, and great frontend features.

Most impressive was realizing how many separate frameworks it takes in other stacks to match what a single Elixir + Phoenix setup provides out of the box (see Saša Jurić’s Soul of Erlang and Elixir talk).

In November 2025 we wrote a detailed plan with a simple goal: To simplify EC architecture, monitoring and development by swapping large parts of our tech stack to Elixir.

Four months later, it is a success. The migration is almost done, and we see how much more confidently we can handle concurrency and develop web interfaces with much less pain.

Let’s go into detail about what made Elixir stand out for us:

Concurrency friendly and fault tolerant

Elixir compiles to bytecode that runs on the BEAM, the Erlang virtual machine.

A few unique features of the BEAM:

  • The BEAM allows its processes (conceptually similar to goroutines, but fully isolated) to communicate only through message passing. Data is copied when passed, so there is no shared mutable state. This means no need for mutexes and a reduced risk of race conditions.
  • Fault tolerance is achieved by making processes crash on errors. A process supervisor catches the problem and can restart the process if needed.
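
As a minimal sketch of what message passing looks like in practice (the Echo module and its message shapes are made up for illustration), two processes can exchange data without sharing any memory:

```elixir
defmodule Echo do
  # Spawns an isolated process that waits for one request and replies to the sender.
  def start do
    spawn(fn ->
      receive do
        {:double, caller, n} -> send(caller, {:result, n * 2})
      end
    end)
  end

  # Sends a request to `pid` and blocks until the reply arrives (or 1 second passes).
  def double(pid, n) do
    send(pid, {:double, self(), n})

    receive do
      {:result, value} -> value
    after
      1_000 -> :timeout
    end
  end
end

pid = Echo.start()
IO.inspect(Echo.double(pid, 21))
# prints 42
```

The only way the two processes interact is through `send` and `receive`, so there is nothing to lock.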

Coming from other languages requires some paradigm shift, but it’s worth it: handling concurrency and failure becomes more natural and simpler. If the connection to the database or a component breaks, the corresponding process may exit, and the supervisor will restart it on its own. If this fails too many times, the whole Elixir application crashes, indicating a big problem, and we can monitor that through Prometheus.
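
The restart behavior described above can be sketched with the standard Supervisor module. This is an illustration, not our production setup: FlakyWorker and RestartDemo are hypothetical names, and the restart limits (3 restarts within 5 seconds) are arbitrary values chosen for the example.

```elixir
defmodule FlakyWorker do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, :connected}
end

defmodule RestartDemo do
  # Returns the pid of the single child currently running under `sup`, or nil.
  defp child_pid(sup) do
    case Supervisor.which_children(sup) do
      [{_id, pid, _type, _modules}] when is_pid(pid) -> pid
      _ -> nil
    end
  end

  # Polls until the supervisor reports a child pid different from `old_pid`.
  defp await_new_child(_sup, _old_pid, 0), do: :timeout

  defp await_new_child(sup, old_pid, retries) do
    case child_pid(sup) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_child(sup, old_pid, retries - 1)
    end
  end

  def run do
    # Allow at most 3 restarts within 5 seconds; beyond that the supervisor itself exits.
    {:ok, sup} =
      Supervisor.start_link([FlakyWorker],
        strategy: :one_for_one,
        max_restarts: 3,
        max_seconds: 5
      )

    old_pid = child_pid(sup)
    # Simulate a crash, for example a lost database connection.
    Process.exit(old_pid, :kill)
    new_pid = await_new_child(sup, old_pid, 100)

    result = {old_pid, new_pid, is_pid(new_pid) and Process.alive?(new_pid)}
    Supervisor.stop(sup)
    result
  end
end

IO.inspect(RestartDemo.run())
```

The caller never restarts anything itself. It only observes that a fresh process with a new pid has taken the place of the crashed one.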

Learning Elixir

Our team had experience with C++, Go and Python. Learning Elixir was a pretty smooth journey with the right resources:

  • We recommended that everyone start with Elixir in Action, by Saša Jurić, which offers a great intro to the syntax of Elixir and the philosophy of the BEAM.
  • Next, Elixir School goes deeper into more advanced topics and popular Elixir packages, including testing and debugging.

We also maintain a small list of interesting articles that will grow over time.

For more advanced techniques, we also bought a copy of Metaprogramming Elixir, by Chris McCord.

Within two weeks of starting to learn Elixir, team members started posting their first merge requests. A month later, we all spoke the same language and both development and reviewing speed improved.

LLMs have been reported to do well writing Elixir code, achieving up to 82% on AutoCodeBench, compared to 40% for JavaScript or 42% for Python. We use LLMs in our everyday work at Invision, which helped us speed up learning Elixir. However, one must be careful, as they may give a false sense of confidence before one really understands the language. They can also be a hindrance by introducing unnecessarily complex code. We have seen both.

Full-stack with Phoenix LiveView

Replacing the spaghetti of Python (Django), C++, Go, HTML and JS with ‘just’ Phoenix LiveView was an incredible experience.

Phoenix/LiveView is now the only web framework we use, and that keeps it simple:

  1. CSS, JS and hooks are in a single place.
  2. Forms, validation, and sending such data to the respective container are easy to maintain and evolve.
  3. Sending async updates from the device to the browser is part of what LiveView does best: it keeps a persistent WebSocket connection with the browser, re-renders HTML server-side on state changes, and pushes only the diff, with no client-side JS needed.
  4. Render HTML directly from Elixir code with HEEx.

We can do all of this with Django or other frameworks, but the simplicity of BEAM processes and concurrency has proven to be a big plus: if we need a running process to communicate with any other component (web or backend), we simply send messages or use Phoenix PubSub between them. In Python/Django, that would require much more setup (inter-process communication, threading with mutexes or queues), and would be prone to deadlocks and concurrency bugs.
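
To illustrate the publish/subscribe idea without pulling in dependencies, the sketch below uses the standard library Registry as a stand-in for Phoenix PubSub. It is not the Phoenix.PubSub API: the Events module, the "captures" topic and the message shapes are assumptions made for this example.

```elixir
defmodule Events do
  @registry Events.Registry

  # Starts the registry; :duplicate keys allow many subscribers per topic.
  def start_link, do: Registry.start_link(keys: :duplicate, name: @registry)

  # Subscribes the calling process to `topic`.
  def subscribe(topic) do
    {:ok, _owner} = Registry.register(@registry, topic, nil)
    :ok
  end

  # Sends `message` to every process currently subscribed to `topic`.
  def broadcast(topic, message) do
    Registry.dispatch(@registry, topic, fn subscribers ->
      for {pid, _value} <- subscribers, do: send(pid, message)
    end)
  end
end

{:ok, _registry} = Events.start_link()
parent = self()

# Stands in for a web-facing process waiting for device updates.
spawn(fn ->
  Events.subscribe("captures")
  send(parent, :subscribed)

  receive do
    {:capture_done, id} -> send(parent, {:rendered, id})
  end
end)

# Wait until the subscriber is registered, then publish from the "backend" side.
receive do
  :subscribed -> :ok
end

Events.broadcast("captures", {:capture_done, 42})

receive do
  {:rendered, id} -> IO.puts("subscriber handled capture #{id}")
after
  1_000 -> IO.puts("no subscriber answered")
end
```

The publisher does not need to know who is listening, which is what keeps web and backend components decoupled.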

Furthermore, with Django or similar frameworks, a real-time UI means maintaining both a backend API and client-side JS to update the DOM. LiveView’s server-owned rendering loop eliminates that entirely. This does not work for all types of applications, but fits our use case well.

Introspection with the BEAM

In both prod and dev, we can jump into a REPL within the running BEAM and inspect the state of components, call functions, analyze telemetry, and inspect per-process memory and CPU usage.

The best demo for this is in Saša Jurić’s presentation, at the 21-minute mark.

GenServer state inspection

GenServers are a generic building block in Elixir/Erlang, and it’s likely most processes will be running one. A GenServer is a process that keeps state by calling a function recursively, with its argument being the latest state. This function handles incoming messages from other processes and evolves its own internal state.
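
To make the "recursive function" idea concrete, here is a hand-rolled sketch of such a loop without GenServer. The LoopCounter module and its messages are made up for illustration; GenServer builds calls, casts, timeouts and OTP integration on top of the same idea.

```elixir
defmodule LoopCounter do
  def start(initial), do: spawn(fn -> loop(initial) end)

  # The process state lives only in the argument of this tail-recursive function.
  defp loop(count) do
    receive do
      {:add, n} ->
        loop(count + n)

      {:get, caller} ->
        send(caller, {:count, count})
        loop(count)
    end
  end

  def add(pid, n), do: send(pid, {:add, n})

  def get(pid) do
    send(pid, {:get, self()})

    receive do
      {:count, count} -> count
    after
      1_000 -> :timeout
    end
  end
end

pid = LoopCounter.start(0)
LoopCounter.add(pid, 2)
LoopCounter.add(pid, 3)
IO.inspect(LoopCounter.get(pid))
# prints 5
```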

The BEAM understands this pattern and allows state introspection. For example, take the GenServer from the Elixir docs that emulates a stack:

defmodule Stack do
  use GenServer

  @impl true
  def init(elements) do
    initial_state = String.split(elements, ",", trim: true)
    {:ok, initial_state}
  end

  @impl true
  def handle_call(:pop, _from, state) do
    [to_caller | new_state] = state
    {:reply, to_caller, new_state}
  end

  @impl true
  def handle_cast({:push, element}, state) do
    new_state = [element | state]
    {:noreply, new_state}
  end
end

In production, we can open a REPL and print its internal state with

:sys.get_state(stack_pid) |> IO.inspect()

and debug potential issues. We can also use the same REPL in production to place an element in the stack:

GenServer.cast(stack_pid, {:push, "my element"})

This can happen while other processes send concurrent requests to the same Stack GenServer.
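
A self-contained sketch of that scenario follows. CaptureBuffer is a hypothetical GenServer, not one of our modules. Ten concurrent clients push items while we inspect the state from the outside, and `:sys.replace_state/2` shows that the state can also be patched, which is a debugging aid rather than something to build features on.

```elixir
defmodule CaptureBuffer do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  @impl true
  def init(items), do: {:ok, items}

  @impl true
  def handle_call({:push, item}, _from, items), do: {:reply, :ok, [item | items]}
end

{:ok, pid} = CaptureBuffer.start_link()

# Ten concurrent client processes push items through synchronous calls.
1..10
|> Task.async_stream(fn i -> GenServer.call(pid, {:push, i}) end)
|> Stream.run()

# Inspect the state from outside the process, as one would from a remote shell.
IO.inspect(length(:sys.get_state(pid)))
# prints 10

# Patch the state in place (debugging only): keep just the first three items.
:sys.replace_state(pid, fn items -> Enum.take(items, 3) end)
IO.inspect(length(:sys.get_state(pid)))
# prints 3
```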

Process inspection

The list of processes and their information is queried with the Process module, with Process.list() and Process.info().

For example, we can easily query which process has used the most CPU since the BEAM started (measured in reductions, the BEAM’s unit of work), and show which function and line it is executing:

Process.list() 
  |> Enum.map(fn p -> {p, Process.info(p, [:reductions, :current_stacktrace])} end)
  |> Enum.max_by(fn {_, info} -> info[:reductions] || 0 end)
  |> then(fn {pid, info} -> 
       stack = info[:current_stacktrace] |> List.first()
       {pid, reductions: info[:reductions], current_location: stack}
     end)
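
The same approach works for memory. As a sketch (ProcTop is a hypothetical helper name), the following lists the processes holding the most memory and tolerates processes that exit while we iterate:

```elixir
defmodule ProcTop do
  # Returns up to `n` `{pid, bytes, registered_name}` tuples, largest memory first.
  def by_memory(n \\ 3) do
    Process.list()
    |> Enum.flat_map(fn pid ->
      # Process.info/2 returns nil if the process exited in the meantime.
      case Process.info(pid, [:memory, :registered_name]) do
        nil -> []
        info -> [{pid, info[:memory], info[:registered_name]}]
      end
    end)
    |> Enum.sort_by(fn {_pid, bytes, _name} -> bytes end, :desc)
    |> Enum.take(n)
  end
end

IO.inspect(ProcTop.by_memory())
```

For processes without a registered name, the third element is an empty list.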

Oban: Job scheduling and monitoring

Before using Elixir, we developed our own Rust-based Docker process scheduling tool, called Alfred. Alfred provides Prometheus telemetry to identify failed and succeeded jobs, and logging to the Docker journal. This was handy but was designed for static job specifications.

Alfred is likely to be another tool we will no longer need: enter Oban.

Oban does much more than Alfred and provides extra handy features:

  1. Queues support: define multiple queues, their parallelism, and the jobs that run on each.
  2. Job retry on failure.
  3. Job data is retained for historic metrics and inspection.
  4. Inspection web interface.

Oban can handle both scheduled jobs and jobs launched on demand. We use both for our system:

  • Oban schedules jobs to expire old captures and any other data that must be cleaned up.
  • We provide a vehicle push service, where clients register zero or more REST endpoints to push data to when a capture is done. We use a dedicated Oban queue with limited parallelism to cap request rates, while still monitoring and ensuring failed requests get the desired number of retries.
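
A configuration along these lines could express both use cases. This is a sketch, not our actual configuration: the application, repo, queue and worker names are hypothetical, and the hourly cron expression is an arbitrary choice.

```elixir
# config/config.exs (sketch; all names below are hypothetical)
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [
    # At most 2 push jobs run concurrently on each node.
    vehicle_push: 2,
    cleanup: 1
  ],
  plugins: [
    # Enqueue the (hypothetical) expiry worker at the top of every hour.
    {Oban.Plugins.Cron, crontab: [{"0 * * * *", MyApp.Workers.ExpireCaptures}]}
  ]
```

The retry count is set per worker, for example with `use Oban.Worker, queue: :vehicle_push, max_attempts: 5`.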

Telemetry and monitoring

Many existing Elixir packages like Phoenix and Req provide telemetry out of the box, and define telemetry events that can be exported with OpenTelemetry or a Prometheus exporter. The common building block is the Telemetry package.

Reliability and monitoring are a priority to us. Having telemetry already implemented within Elixir packages is a big plus. Prometheus scrapes Elixir and other processes, and Alertmanager notifies us of any issues.

Powerful Tests with the BEAM

Writing unit tests is straightforward with ExUnit, similar to other languages. A big plus is that Elixir macros keep the test .exs files very readable.

Another great advantage is the ability to run tests from outside the BEAM running the application we want to test.

Let’s say you want to end-to-end test a production-ready Docker image for a website. There are no extra debug endpoints, and no possibility to inspect internal state from HTTP requests or similar.

Just create another BEAM instance to supervise the tests, connect the two so they can see each other (e.g. same Docker network), and ensure they have the same cookie. Give each BEAM instance a name. Ready! Now we can run commands from one BEAM to the other, and get the response back, with something as simple as

# calls `MyApp.Ecto.get_by(MyApp.SomeTable, id: 4)` on the remote node
assert :erpc.call(:production_node, MyApp.Ecto, :get_by, [MyApp.SomeTable, [id: 4]]) != nil

This is handy in so many situations: create a temporary user, check that an API call wrote something to the database, or verify an internal state. We can write simpler end-to-end tests faster, invoking any function within the production Elixir code.
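
A small helper can wrap the connection and the remote call. The sketch below assumes both nodes were started with a name and the same cookie (for example `iex --sname tests --cookie secret`). RemoteCheck and its function names are hypothetical; only the `Node` and `:erpc` calls come from the standard library.

```elixir
defmodule RemoteCheck do
  # Connects the test node to `target`. Both nodes must be named and share `cookie`.
  def connect(target, cookie) when is_atom(target) and is_atom(cookie) do
    Node.set_cookie(target, cookie)
    Node.connect(target)
  end

  # Runs `apply(mod, fun, args)` on `target`.
  # Returns {:ok, result}, or {:error, reason} for erpc-level failures
  # such as :noconnection or :timeout.
  def call(target, mod, fun, args, timeout \\ 5_000) do
    {:ok, :erpc.call(target, mod, fun, args, timeout)}
  catch
    :error, {:erpc, reason} -> {:error, reason}
  end
end

# Calling the local node keeps this sketch runnable without a second BEAM.
IO.inspect(RemoteCheck.call(node(), String, :upcase, ["ok"]))
```

In a real end-to-end test, `node()` would be replaced by the name of the production node after a successful `RemoteCheck.connect/2`.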

Other great features

Parallel processing

Use Task.async_stream() to process a list in parallel:

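file_names = ["f1.txt", "f2.txt", "f3.txt"]

lines_per_file =
  file_names
  # Each file is counted in its own process; results arrive in input order.
  |> Task.async_stream(fn fname ->
    fname
    |> File.stream!()
    |> Enum.count()
  end)
  # Every result is wrapped as {:ok, value}; unwrap to get plain counts.
  |> Enum.map(fn {:ok, count} -> count end)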

Writing SQL queries with Ecto

Express what you need close to SQL, but within Elixir. Compose queries while keeping them easy to read:

# The query macros (from, where, select) live in Ecto.Query
import Ecto.Query

# Create a query
query_1 = from u in User, where: u.age > 18

# Extend the query
query_2 = from u in query_1, select: u.name

Also check out Backpex for admin panels, and Sagents for calling LLMs.

What’s Next?

Beyond new features and integrations, our next step is to integrate LLMs with the BEAM’s observability and telemetry to reduce response times for support tickets and monitoring alerts.

As soon as an alert or ticket is created, we launch an LLM to inspect and analyze potential problems and report them back to a support engineer, saving time and improving client experience.

Elixir has already proven to be an ideal tool for AI and Agents, and has recently been adopted by OpenAI for orchestration in Symphony.

There have also been recent efforts to use LLMs for self-healing supervision trees with Beamlens. The idea is to feed process crash reasons, BEAM health, OS-level metrics, and other available information into an LLM that can diagnose failures and suggest or apply corrective actions, like adjusting a GenServer’s initial state or reconfiguring a supervisor’s restart strategy.

This is still experimental, but worth keeping an eye on.

Wrap up

Adopting Elixir at Invision has let us replace a fragmented stack of C++, Go, Python, and JavaScript with a cohesive platform built on the BEAM.

The concurrency model eliminated entire classes of bugs, Phoenix LiveView unified our front and back-end development, and the BEAM’s introspection capabilities gave us production debugging we never had before.

If your team deals with real-time systems, concurrent workloads, and/or full-stack web development, Elixir is worth a serious look. The learning curve is real but short, and the payoff in simplicity and reliability has been significant for us.

We’re happy to share more about our experience — feel free to reach out at contact@invision.ai, and stay tuned for more updates!

— A big thank you to the tech team at Invision AI for helping review this article.