
Publish Interactive Data Visualizations for Free with Python and Marimo


In data science, it can be hard to share insights from complex datasets using only static figures. A handful of pre-generated figures does not always capture every facet that describes the shape and meaning of interesting data. We have powerful technologies for presenting interactive figures, where a viewer can rotate, filter, zoom, and generally explore complex data, but they always come with tradeoffs.

Here I present my experience using marimo, a recently released Python library that opens up exciting new opportunities for publishing interactive visualizations across the entire field of data science.

Interactive Data Visualization

The tradeoffs to consider when selecting an approach for presenting data visualizations can be broken into three categories:

  • Capabilities: what visualizations and interactivity am I able to present to the user?
  • Publication Cost: what resources are needed to display this visualization to users (e.g. running servers, hosting websites)?
  • Ease of Use: how much of a new skillset or codebase do I need to learn upfront?

JavaScript is the foundation of portable interactivity. Every user has a web browser installed on their computer and there are many different frameworks available for displaying any degree of interactivity or visualization you might imagine (for example, this gallery of amazing things people have made with three.js). Since the application is running on the user’s computer, no costly servers are needed. However, a significant drawback for the data science community is ease of use, as JS does not have many of the high-level (i.e. easy-to-use) libraries that data scientists use for data manipulation, plotting, and interactivity.

Python provides a useful point of comparison. Because of its continually growing popularity, some have called this the “Era of Python”. For data scientists in particular, Python stands alongside R as one of the foundational languages for quickly and effectively wielding complex data. While Python may be easier to use than JavaScript, there are fewer options for presenting interactive visualizations. Some popular projects providing interactivity and visualization have been Flask, Dash, and Streamlit (also worth mentioning: Bokeh, HoloViews, Altair, and Plotly). The biggest tradeoff for using Python has been the cost of publishing, that is, delivering the tool to users. In the same way that Shiny apps require a running computer to serve up the visualization, these Python-based frameworks have exclusively been server-based. This is by no means prohibitive for authors with a budget to spend, but it does limit the number of users who can take advantage of a particular project.

Pyodide is an intriguing middle ground: Python code running directly in the web browser using WebAssembly (WASM). There are resource limitations (only 1 thread and 2 GB of memory) that make this impractical for doing the heavy lifting of data science. However, this can be more than sufficient for building visualizations and updating them based on user input. Because it runs in the browser, no servers are required for hosting. Tools that use Pyodide as a foundation are interesting to explore because they give data scientists an opportunity to write Python code which runs directly on users’ computers without their having to install or run anything outside of the web browser.
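To make the resource limits above concrete, here is a minimal sketch of how a notebook might notice that it is running under Pyodide and thin out a large series before plotting it. The check on `sys.platform` is the usual way to detect the Emscripten build that Pyodide uses; the point budget and the helper names are my own assumptions for illustration, not part of any library.

```python
import sys

# Assumption for illustration: a point budget that keeps in-browser plots responsive.
BROWSER_MAX_POINTS = 5_000


def running_in_pyodide() -> bool:
    """Return True when the interpreter is the Emscripten (WASM) build used by Pyodide."""
    return sys.platform == "emscripten"


def thin_for_display(values, max_points=BROWSER_MAX_POINTS, in_browser=None):
    """Downsample a sequence by regular striding, but only when running in the browser."""
    if in_browser is None:
        in_browser = running_in_pyodide()
    values = list(values)
    if not in_browser or len(values) <= max_points:
        return values
    stride = -(-len(values) // max_points)  # ceiling division
    return values[::stride]
```

The heavy lifting (for example, aggregating raw data) would still happen ahead of time on a regular machine; the browser only receives something small enough to draw.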

As an aside, I’ve been interested previously in one project that has tried this approach: stlite, an in-browser implementation of Streamlit that lets you deploy these flexible and powerful apps to a broad range of users. However, a core limitation is that Streamlit itself is distinct from stlite (the port of Streamlit to WASM), which means that not all features are supported and that advancement of the project is dependent on two separate groups working along compatible lines.

Introducing: Marimo

This brings us to marimo.

The first public announcements of marimo were in January 2024, so the project is very new, and it has a unique combination of features:

  • The interface resembles a Jupyter notebook, which will be familiar to users.
  • Execution of cells is reactive, so that updating one cell will rerun all cells which depend on its output.
  • User input can be captured with a flexible set of UI components.
  • Notebooks can be quickly converted into apps, hiding the code and showing only the input/output elements.
  • Apps can be run locally or converted into static webpages using WASM/Pyodide.
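To illustrate what reactive execution means in the list above, here is a toy model written with only the standard library. It is not marimo's implementation, and the `ToyNotebook` class and its methods are hypothetical names of my own; the sketch only shows the idea that changing one cell reruns that cell and everything downstream of it, while unrelated cells are left alone.

```python
from graphlib import TopologicalSorter


class ToyNotebook:
    """Toy model of reactive cells (an illustration, not marimo's implementation)."""

    def __init__(self):
        self.cells = {}   # name -> (function, names of the cells it reads)
        self.values = {}  # name -> most recent output

    def cell(self, name, func, reads=()):
        self.cells[name] = (func, tuple(reads))

    def _order(self):
        # Map each cell to the cells it depends on, then sort dependencies first.
        graph = {name: set(reads) for name, (_, reads) in self.cells.items()}
        return list(TopologicalSorter(graph).static_order())

    def _rerun(self, stale):
        ran = []
        for name in self._order():
            if name in stale:
                func, reads = self.cells[name]
                self.values[name] = func(*(self.values[r] for r in reads))
                ran.append(name)
        return ran

    def run_all(self):
        return self._rerun(set(self.cells))

    def update(self, name, func):
        """Replace one cell, then rerun it and every cell downstream of it."""
        self.cells[name] = (func, self.cells[name][1])
        stale = {name}
        # Walking in dependency order lets staleness propagate transitively.
        for other in self._order():
            if stale & set(self.cells[other][1]):
                stale.add(other)
        return self._rerun(stale)


nb = ToyNotebook()
nb.cell("threshold", lambda: 10)
nb.cell("data", lambda: [4, 8, 15, 16, 23, 42])
nb.cell("filtered", lambda data, threshold: [x for x in data if x > threshold],
        reads=("data", "threshold"))
nb.cell("count", lambda filtered: len(filtered), reads=("filtered",))
nb.run_all()
first_count = nb.values["count"]            # 4 values exceed 10

ran = nb.update("threshold", lambda: 20)    # "data" is not rerun
second_count = nb.values["count"]           # 2 values exceed 20
```

In a real notebook the "threshold" cell would typically be a UI component such as a slider, so that moving the slider plays the role of `update` here.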

marimo balances the tradeoffs of technology in a way that is well suited to the skill set of the typical data scientist:

  • Capabilities: user input and visual display features are rather extensive, including user input via Altair and Plotly plots.
  • Publication Cost: deploying as static webpages is basically free, since no servers are required.
  • Ease of Use: for users familiar with Python notebooks, marimo will feel very familiar and be easy to pick up.

Publishing Marimo Apps on the Web

The best place to start with marimo is their extensive documentation.

I have created a barebones GitHub repository as a simple example of the type of publication that can be useful in data science: explanatory text interspersed with interactive displays. Try it out yourself here.

Example publication created with marimo (image created by author)

Using just a little bit of code, users can:

  • Attach source datasets
  • Generate visualizations with flexible interactivity
  • Write narrative text describing their findings
  • Publish to the web for free (e.g. using GitHub Pages)

For more details, read their documentation on web publishing and see the template repository for deploying to GitHub Pages.

Public App / Private Data

This new technology offers an exciting opportunity for collaboration: the app is published publicly to the world, but each user can see only the specific datasets they have permission to access.

Rather than requiring a dedicated data backend for every app, user data can be stored in a generic backend that is securely authenticated and accessed using a Python client library, all contained within the user’s web browser. For example, the user is given an OAuth login link that authenticates them with the backend and allows the app to temporarily access input data.
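As a rough sketch of what generating such a login link can involve, the example below builds an OAuth 2.0 authorization URL using PKCE, a variant commonly recommended for apps that run entirely in the browser and cannot keep a client secret. This is an assumption on my part: a given backend may use a different OAuth flow, and its client library would normally handle these details. The endpoint, client ID, redirect URI, and scope are all hypothetical placeholders.

```python
import base64
import hashlib
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical values for illustration only; a real backend documents its own.
AUTHORIZE_ENDPOINT = "https://auth.example.org/oauth2/authorize"
CLIENT_ID = "my-public-viz-app"
REDIRECT_URI = "https://example.github.io/my-app/"


def make_pkce_pair():
    """Create a PKCE code verifier and its S256 code challenge."""
    verifier = secrets.token_urlsafe(64)  # 86 URL-safe characters
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url without padding, as PKCE requires.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_login_link(scope="openid"):
    """Return (login_url, verifier, state); the last two stay in the browser session."""
    verifier, challenge = make_pkce_pair()
    state = secrets.token_urlsafe(16)  # checked on return to reject forged redirects
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", verifier, state
```

After the user logs in, the backend redirects back to the app with a short-lived code, which the app exchanges (together with the verifier) for a temporary access token. No long-lived secret is ever embedded in the public app.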

As a proof of concept, I built a simple visualization app that connects to the Cirro data platform, which is used at my institution to manage scientific data. Full disclosure: I was part of the team that built this platform before it spun out as an independent company. With this app, users can:

  • Load the public visualization app (hosted on GitHub Pages)
  • Connect securely to their private data store
  • Load the appropriate dataset for display
  • Share a link which will direct authorized collaborators to the same data

Try it out yourself here.

Example visualization app sourcing user controlled data (image created by author)
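The link-sharing step in the list above can be sketched with nothing more than URL query parameters. The app URL and the parameter names (`project`, `dataset`) below are hypothetical; the important design point is that the link carries only identifiers, never data or credentials, so a collaborator who opens it still has to log in and be authorized by the backend.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of the published app.
APP_URL = "https://example.github.io/my-app/"


def make_share_link(project_id, dataset_id):
    """Encode which dataset is being viewed into a link to the public app."""
    query = urlencode({"project": project_id, "dataset": dataset_id})
    return f"{APP_URL}?{query}"


def read_share_link(url):
    """Recover the identifiers from a shared link when the app loads."""
    query = parse_qs(urlsplit(url).query)
    return query["project"][0], query["dataset"][0]
```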

As a data scientist, I find this approach of publishing free and open-source visualization apps which can be used to interact with private datasets extremely exciting. Building and publishing a new app can take hours or days instead of weeks or years, letting researchers quickly share their insights with collaborators and then publish them to the wider world.

