Deploying Streamlit Apps to Dash Enterprise with Databricks

Feb 6, 2024

📌 Plotly on Databricks Blog Series — Article 7

Authors: Ivan Trusov (Databricks), Cal Reynolds (Plotly)
Contributors: Dave Gibbon (Plotly)

This Streamlit application utilizes Databricks SQL as its back end for complex calculations and Plotly charts on the front end for dynamic visualizations.

If you haven’t already, check out our Plotly Dash / Databricks webpage for more interesting use cases, points of inspiration, and joint customer stories.

We encourage you to hear directly from our common customers (CIBC, Uniper, S&P, Molson Coors, Collins Aerospace, Ballard Power Systems, and more).

Background

Dash Enterprise — Plotly’s commercial offering for developing and operationalizing data apps — emerged initially due to the industry’s need to self-host open-source Dash applications reliably. Since then, Plotly has added numerous enterprise-exclusive features to lower the barrier to entry for app development, including the Dash Enterprise Design Kit for lower-code app design, Snapshot Engine for report generation, our brand new notebook-to-app tool, App Studio, and more.

Lately at Plotly, we have been hearing that our customers are looking to deploy other Python data app frameworks on Dash Enterprise, including but not limited to:

  • Django
  • Flask
  • Streamlit

You may ask yourself:

Why are Plotly’s customers interested in deploying non-Dash apps to Dash Enterprise, especially when Dash is literally in the product’s title?

Here are a couple of use cases that we hear about frequently from our customers:

1. Varied skill sets across different data science teams

Most Dash Enterprise customers are large enterprises that have many different data application teams. Some teams may choose to use Dash, while others may have more expertise with Streamlit, Flask, or Django. Aligning those teams on Dash Enterprise as a platform allows organizations to centralize their data app workflows, allowing for simpler resource management and better collaboration.

2. Leveraging Dash Enterprise features

For organizations that are utilizing Flask, Django, or Streamlit, Dash Enterprise provides tons of features to make developers’ lives easier, regardless of app framework. We cover these features in the “Dash Enterprise” section below.

At Plotly, many of our Dash Enterprise customers use Databricks in some way as their application back end. With that in mind, we decided to collaborate with Databricks on this workflow. In this article, we will cover, with the help of Databricks Senior Specialist Solution Architect Ivan Trusov:

  1. How users can develop and deploy Streamlit applications using Dash Enterprise.
  2. How users can leverage Databricks SQL as their back end for those applications.

Let’s get started!

Introduction

Here’s a scenario: you bought flights to New York for a romantic getaway with your partner.

The issue is… you have been too busy at work to do any planning! There is a seemingly infinite number of shops, events, and generally… places to go in New York. What should you spend your time doing?

Let’s build a simple Streamlit app to help you figure out where to go in New York!

We will leverage the Databricks Marketplace, powered by Delta Sharing, to check out our dataset — Foursquare’s free “Places in NYC” dataset.

Our dataset covers tons of different types of NYC places, including bars, restaurants, roofers, toilet repair shops… even souvlaki shops! We will have plenty of inspiration and ideas to ensure we don’t blow our romantic getaway with this app.

Because this sample dataset is free via the Databricks Marketplace, you can follow along live as we walk through this workflow.

Above is a basic architecture for this application. Databricks works as the application back end, with a Streamlit-built front end deployed and managed on Dash Enterprise.

The Streamlit App

Our Streamlit app leverages the Databricks Python SQL Connector to read data from Databricks. We will cover how this workflow integrates with Databricks in the following section.

Our app has a few key features:

1. Plotly Map Chart

We include a Plotly geospatial visualization that plots the number of places in each NYC zip code, based on GeoJSON boundaries. Plotly Express allows for this kind of complex geospatial visualization out of the box.
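
For readers who want to reproduce a chart like this, here is a minimal sketch of a Plotly Express choropleth over zip code boundaries. The sample DataFrame, column names, and GeoJSON file are illustrative placeholders, not the app’s actual code:

import json

import pandas as pd
import plotly.express as px

# Hypothetical inputs: one row per zip code, plus a GeoJSON file with NYC
# zip code boundaries (file name and property key are placeholders).
places_per_zip = pd.DataFrame({"zipcode": ["10001", "10002"], "num_places": [420, 365]})
with open("nyc_zipcodes.geojson") as f:
    nyc_zip_geojson = json.load(f)

fig = px.choropleth_mapbox(
    places_per_zip,
    geojson=nyc_zip_geojson,
    locations="zipcode",                   # column matched against each GeoJSON feature
    featureidkey="properties.postalCode",  # where the zip code lives inside the feature
    color="num_places",
    mapbox_style="carto-positron",
    center={"lat": 40.73, "lon": -73.99},
    zoom=9,
)

In a Streamlit script, the figure is then rendered with st.plotly_chart(fig, use_container_width=True).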

2. Plotly Sankey Diagram

Next, we have a Plotly Sankey diagram — a categorical visualization that links places based on shared categories. Again, out-of-box our Sankey diagram is interactive, meaning an end user can shift categories around on the screen and any associated linkages will respond dynamically.
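
As a rough illustration of the API involved (the labels, links, and values below are made up rather than taken from the dataset):

import plotly.graph_objects as go

# Each link connects a source node to a target node; indices refer to `labels`.
labels = ["Dining and Drinking", "Restaurant", "Cafe", "Bar"]
fig = go.Figure(
    go.Sankey(
        node=dict(label=labels, pad=15, thickness=20),
        link=dict(
            source=[0, 0, 0],     # origin of each link
            target=[1, 2, 3],     # destination of each link
            value=[120, 80, 60],  # link width, e.g. number of shared places
        ),
    )
)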

3. Plotly Bar + Line Chart

This dual-threat Plotly chart shows both the Google Maps popularity and provenance rating of each place in a certain zip code.
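
Here is a sketch of how such a combined chart can be built with a secondary y-axis; the sample data and column names are assumptions for illustration only:

import pandas as pd
import plotly.graph_objects as go

# Hypothetical sample of places in one zip code.
df = pd.DataFrame({
    "name": ["Joe's Pizza", "Katz's Deli", "Blue Bottle"],
    "popularity": [0.95, 0.90, 0.70],
    "provenance_rating": [4.5, 4.7, 4.2],
})

fig = go.Figure()
fig.add_trace(go.Bar(x=df["name"], y=df["popularity"], name="Popularity"))
fig.add_trace(go.Scatter(x=df["name"], y=df["provenance_rating"],
                         name="Provenance rating", yaxis="y2", mode="lines+markers"))
fig.update_layout(yaxis=dict(title="Popularity"),
                  yaxis2=dict(title="Provenance rating", overlaying="y", side="right"))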

Plotly charts empower this Streamlit app with interactivity and sophistication, allowing easy navigation. The workflow for this app is as follows:

  1. A user selects the types of NYC places they’re interested in seeing (i.e., cafes, museums, or more).
  2. That selection triggers a filtered Databricks SQL query, sent via the Python SQL Connector and executed directly on your Databricks Serverless Compute.
  3. That queried data is cached via Streamlit’s st.cache functionality (see the sketch after this list).
  4. If a user wants to zero in on places in a certain zip code, they can use the Streamlit text input to type one in. This kicks off a different Databricks SQL query, which then populates our joint provenance + popularity chart.
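
Below is a minimal sketch of steps 1 through 3, assuming the DataProvider class shown in the next section lives in a data_provider.py module (that module name, and the use of st.cache_data as the modern successor to st.cache, are our assumptions):

import streamlit as st

from data_provider import DataProvider  # module name is our assumption

data_provider = DataProvider()

# st.cache_data memoizes the result, so reruns with the same arguments
# don't issue another query to the SQL warehouse.
@st.cache_data
def load_top_categories(n: int = 50):
    return data_provider.get_top_categories(n)

selected = st.multiselect("Place categories", data_provider.get_all_categories())
zipcode_input = st.text_input("Zoom in on a zip code (optional)")
top_categories = load_top_categories()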

The workflow in this case is simple. We leverage simple Streamlit APIs to build the app layout. Then, we utilize sophisticated data visualization from Plotly and the full horsepower of Databricks SQL Warehouse to make it powerful, compelling for storytelling, and fully interactive for end users.

(NB: In the “Dash Enterprise” section below, we’ll show how we can leverage it to publish the app for others to utilize.)

Databricks back end

Databricks offers several interfaces for data application developers that can be used to retrieve and send data in Databricks’ data intelligence platform. Users can use Databricks Connect V2 for Spark API access, the SQL Execution API for REST operations, or the DBSQL interface for ODBC/JDBC access to the data.

One of the simplest, most useful, and most popular interfaces is DBSQL, which allows users to connect via the familiar ODBC/JDBC protocols and send queries against data stored in cloud storage in the Delta format.

In our example, we’re using a structured dataset that any user can download via the Databricks Marketplace.

To follow this example, please go to your Databricks workspace, open the Marketplace tab on the left side of the screen, and search for the “Places — Free New York City sample” dataset:

Click on the dataset and follow the instructions on the page to save this dataset into your catalog.

After doing this, create a new Databricks SQL endpoint (we recommend the Serverless option), or use an existing one. Click on “Connection details” to find the server hostname and HTTP path, and create a token via the “Create a personal access token” button on the same page. Then put all of these details into a .env file, following the structure of the .env.sample file.
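
For reference, a .env file for this app might look like the following; every value is a placeholder to be replaced with the details from your own workspace, and the variable names match those read by the DataProvider class below:

DATABRICKS_SERVER_HOSTNAME=your-workspace.cloud.databricks.com
DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/your-warehouse-id
DATABRICKS_TOKEN=your-personal-access-token
DATABRICKS_CATALOG=your_catalog
DATABRICKS_SCHEMA=your_schema
DATABRICKS_TABLE=your_table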

Now it’s time to add all of the relevant data operations to our application.

To keep the code well organized, we’ve factored out all of the data-related actions into a separate class called DataProvider. It contains all of the necessary methods to retrieve data using a DBSQL connection. The Databricks Python DBSQL connector also provides a built-in capability to immediately transform the data into a Pandas DataFrame object, making the overall process simple and straightforward for developers.

Here is part of the code of this class:

import os

import pandas as pd
import pyarrow as pa
from databricks import sql


class DataProvider:
    def __init__(self) -> None:
        self._connection = self._get_connection()

    def _get_connection(self):
        # Connection details come from the environment variables defined in .env
        return sql.connect(
            server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
            http_path=os.getenv("DATABRICKS_HTTP_PATH"),
            access_token=os.getenv("DATABRICKS_TOKEN"),
            use_cloud_fetch=True,
        )

    @property
    def source_table(self) -> str:
        # Fully qualified table name: catalog.schema.table
        return f'{os.getenv("DATABRICKS_CATALOG")}.{os.getenv("DATABRICKS_SCHEMA")}.{os.getenv("DATABRICKS_TABLE")}'

    def get_all_categories(self) -> list[str]:
        with self._connection.cursor() as cursor:
            cursor.execute(f"select distinct explode(fsq_category_labels[0]) FROM {self.source_table}")
            pa_table: pa.Table = cursor.fetchall_arrow()
            return pa_table.to_pandas().squeeze().dropna().to_list()

    def get_top_categories(self, n=50) -> pd.DataFrame:
        with self._connection.cursor() as cursor:
            cursor.execute(
                f"""
                select category, count(1) as cnt
                from (
                    select explode(fsq_category_labels[0]) as category
                    FROM {self.source_table}
                ) group by 1 order by 2 desc limit {n}
                """
            )
            pa_table: pa.Table = cursor.fetchall_arrow()
            return pa_table.to_pandas()

As you can see, in the class initialization step we prepare a connection object that will be used for further queries. Users can easily execute new queries over this connection by using the cursor.execute method. To improve data retrieval performance, we’re using the fetchall_arrow() method, which allows us to directly access the result tables in Arrow format and convert them on the fly into Pandas DataFrames.

This class encapsulates the logic related to data retrieval, making it easy to use this with other parts of the code, for instance:

with st.spinner("Loading data..."):
    df2 = data_provider.get_popular_places(selected, zipcode_input)

Dash Enterprise

What is Dash Enterprise?

We have now built out our simple Streamlit application with Databricks as its back end. How should we publish/host our app so that users can access it?

Dash Enterprise is a centralized data application platform for building, hosting, and securing organizations’ apps. With Dash Enterprise comes a variety of back-end features that become crucial for the Fortune 500 firms that we work with every day, including but not limited to:

  • Versatile CI/CD Integration (including GitHub Actions, GitLab, Azure DevOps, etc.)
  • Two-click instantiation of middleware for data & query caching.
  • A persistent filesystem for large-scale data processing and caching.
  • Version history of data applications, with a one-click ability to roll back changes.
  • An in-browser IDE with a GUI-based deployment mechanism.
  • Encrypted environment variables for connections to data sources (e.g. Databricks).
  • Reporting, UI-based app-theming, AI-integrated Plotly widgets, and more.
  • Ability to serve many concurrent app users without extra configuration.

For this exercise, we will leverage Dash Enterprise’s simple app deployment and hosting functionality for operationalizing our Streamlit app to a URL that users can access. We will do minimal extra configuration, and many concurrent end users will be able to access the app without impeding each other’s progress and/or session.

Deploying our Streamlit App to DE

First, we will leverage Dash Enterprise’s App Manager UI for deploying an application.

We will then create a new application. Dash Enterprise allows users to start from pre-built templates, or to directly push their apps to the platform.

Dash Enterprise’s built-in Streamlit app template within the Enterprise App Catalog.

Since we have an existing Streamlit app, we will instead directly push our application to Dash Enterprise with Git.

Dash Enterprise will provide instructions for doing so, as seen below.
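
For illustration, the push typically boils down to adding the Git remote that Dash Enterprise displays for your app and pushing to it; the remote name and URL below are placeholders, not the actual values shown in the App Manager:

# Placeholder values; copy the real remote URL from the App Manager instructions
git remote add dash-enterprise https://your-dash-enterprise-host/your-app-name
git push dash-enterprise main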

When we follow those Git instructions and push our application to Dash Enterprise, it will automatically deploy to a specified URL, provided our application does not have critical Python errors. When our app has initialized, we will want to upload our Databricks environment variables to Dash Enterprise’s encrypted environment variable UI. This will ensure a seamless integration with Databricks at the app level.

Dash Enterprise’s encrypted environment variable UI

As a part of Dash Enterprise, we also offer an in-browser IDE, called a workspace. It allows for a dual-screen app development environment, where code is on the left-hand side of the screen, and a live, dynamic preview of the application you’re building is on the right.

Dash Enterprise’s dual-screen app development environment

When we are happy with our deployed application, we can utilize Dash Enterprise to govern and restrict viewer access to the app. In this case, we made our app publicly accessible (i.e., no access restrictions). Most customers choose to integrate with Azure Active Directory, recently renamed Microsoft Entra ID, for user and group access management.

As you can see, Dash Enterprise as a platform, whether we utilize Dash, Streamlit, or any other app framework, makes deploying data apps a whole lot easier.

Conclusion

With little effort, we have saved your romantic getaway weekend: using Dash Enterprise, Streamlit, and Databricks, we built an app that provides tons of ideas for places to go in NYC.

To remind you of where each of these fits:

  1. Dash Enterprise is our centralized platform software for hosting, building, and distributing data applications. It can be installed on any major cloud provider or on-prem (including in air-gapped networks).
  2. Streamlit is the framework that we leveraged to build this app.
  3. Databricks is the back end, acting as a warehouse for our data as well as an engine for our data computation.

We are excited to see how our customers deploy Streamlit, Flask, and Django apps on Dash Enterprise as a complement to their use of Dash.

Questions?

Keep an eye out for future articles! We’re dedicated to expanding our partnership offerings with Databricks and beyond.

Questions? Or want to discuss how this workflow may apply to your team’s needs? Please email info@plotly.com.
