Amplify Your Organization’s Custom LLM Strategy Using Databricks with Plotly (Part 1)

Published in Plotly · 6 min read · Mar 25, 2024

Author: Sachin Seth · Contributors: Cal Reynolds, Dave Gibbon

TL;DR: Nontechnical readers can rejoice (and feel affirmed by Jensen Huang’s prediction). You too can participate in the LLM revolution by leveraging an intuitive, interactive front end to derive full value from your Data/ML/AI stack, all without having to look at any code. Send this article to your data engineering colleagues.

Overview

It has never been easier to develop and deploy custom LLMs tailored to your organization. Gone are the days of parsing through pages of documentation and code across different websites and services to leverage an open-source LLM catered to your data. Instead, this article demonstrates how end users (i.e., non-coders!) can use a Plotly Dash application to check out, register, manage, and deploy LLMs on Databricks, all from the comfort of an interactive web application front end.

This app in particular acts as a device for exploring models that might be too large to run on your local machine by leveraging high-powered and easily accessible Databricks GPUs. Moving forward, we want this app to be for Databricks what desktop solutions such as LM Studio or Oobabooga are for those with the hardware on hand.

Note: This article is part one of a two-part series. In this article, we provide the barebones implementation and back end information for our workflow. In article two, we will demonstrate what this workflow is capable of — think complex Plotly visualizations, big data, cascading LLMs, and more!

High-level steps

Behind the scenes, this app is running a notebook as a workflow in a Databricks workspace. On a broader level, the process can be broken down into a few steps. This application:

  1. Allows users to dynamically select either a model from Hugging Face or one of the Databricks Foundation Models.
  2. Automatically registers the model to Unity Catalog inside of Databricks with MLflow and the Databricks SDK.
  3. Dynamically deploys the model to a Serving Endpoint hosted on a Databricks GPU.
  4. Provides an interface for querying that model from a sleek, full-stack, Python-based Plotly Dash application.

Below is a simple visual architecture that shows this application’s flow:

Our application’s architecture visualized

In the following steps, we’ll break down this application’s workflow even more granularly. If you are less interested in the nitty-gritty details, skip to our Conclusion section below.

For a step-by-step guide to running this application, see this article’s corresponding GitHub repository.

Browse Hugging Face for a model you want to chat with

Hugging Face is a platform best known for the development and maintenance of its Transformers library. The library itself provides a vast collection of pre-trained models for a wide range of NLP tasks as well as refined techniques for fine-tuning and pipeline construction. This article’s Dash app allows end users to interact with any of Hugging Face’s models without needing to build an interface, host the model, or handle any of the other technical back-end steps.

The Hugging Face platform

An end user of this application can browse Hugging Face for a model and enter its name into the Dash app. Hugging Face provides high-level, nontechnical descriptions of each of its hosted models’ capabilities. Currently, this app is designed for text-generation tasks, but keep an eye out for more NLP tasks coming soon.

Using this Dash app to register and deploy a Hugging Face model on Databricks

In your Plotly Dash application, here’s what you will need to do to provision and use a Hugging Face model on Databricks:

  1. Provide some information: You’ll need to fill in a few things, like the name of the model you found on Hugging Face, the maximum amount of text you want the model to generate (max tokens), and a starting setting for how creative it should be with its answers (called temperature).
  2. Optionally, pick your resources: If your model requires more GPU resources than the defaults of GPU Medium (A10G) with a Small workload size (4 concurrency, 24 DBU), you can make those changes here as well.
  3. Start the process: Once you’ve filled in everything, just click a button to get the notebook working!
Register a Hugging Face model
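As an illustration, the form inputs above can be bundled into one config that the rest of the workflow consumes. This is a minimal stdlib sketch under our own assumptions; the field names (`hf_model_name`, `workload_type`, and so on) are illustrative, not the app’s actual internals:

```python
# Defaults matching the article's stated values: GPU Medium (A10G),
# Small workload size (4 concurrency, 24 DBU).
DEFAULT_WORKLOAD_TYPE = "GPU_MEDIUM"
DEFAULT_WORKLOAD_SIZE = "Small"

def build_model_config(model_name, max_tokens=200, temperature=0.7,
                       workload_type=DEFAULT_WORKLOAD_TYPE,
                       workload_size=DEFAULT_WORKLOAD_SIZE):
    """Validate the Dash form inputs and bundle them for the provisioning notebook."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature should be between 0.0 and 2.0")
    return {
        "hf_model_name": model_name,
        # Notebook widgets receive parameters as strings, so we stringify here.
        "max_tokens": str(max_tokens),
        "temperature": str(temperature),
        "workload_type": workload_type,
        "workload_size": workload_size,
    }

config = build_model_config("mistralai/Mistral-7B-Instruct-v0.2")
```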

What happens when you click “Register”?

When an end user clicks the “Register…” button in this Dash app, a few crucial steps happen behind the scenes:

What happens when you click “Register…”
  1. Firing up the notebook: Our application uses a special type of cluster, a spot-type job cluster, to run a Databricks notebook that provisions our Hugging Face model. It only runs when you need it (like a temporary worker!). Utilizing the Jobs API via the Databricks SDK for Python, our application pushes the parameters the end user entered into the now-running Databricks notebook, which then does the hard work of interfacing with Hugging Face.
Firing up the notebook
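Under the hood, a one-time run request along these lines is what gets submitted. This sketch builds a Jobs 2.1 `runs/submit`-shaped payload by hand for illustration only; in the app the equivalent call goes through the Databricks SDK for Python, and the notebook path and cluster settings below are placeholders, not the app’s real values:

```python
def build_run_submit_payload(notebook_path, params, node_type="g5.4xlarge"):
    """One-time (ephemeral) job run on a spot-backed single-node cluster that
    exists only for the duration of the provisioning notebook."""
    return {
        "run_name": f"register-{params['hf_model_name']}",
        "tasks": [{
            "task_key": "register_model",
            "notebook_task": {
                "notebook_path": notebook_path,
                # base_parameters surface inside the notebook as widget values.
                "base_parameters": params,
            },
            "new_cluster": {
                "spark_version": "14.3.x-gpu-ml-scala2.12",  # illustrative
                "node_type_id": node_type,
                "num_workers": 0,  # single-node
                "aws_attributes": {"availability": "SPOT_WITH_FALLBACK"},
            },
        }],
    }

payload = build_run_submit_payload(
    "/Workspace/Shared/register_hf_model",  # hypothetical notebook path
    {"hf_model_name": "mistralai/Mistral-7B-Instruct-v0.2", "temperature": "0.7"},
)
```

In the real app, an equivalent request is issued through the Databricks SDK for Python rather than assembled by hand, but the parameter flow is the same: form inputs become `base_parameters`, which the notebook reads as widgets.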

2. Databricks back end interface: This single Databricks notebook not only registers our new LLM to Unity Catalog for us, but it also logs it in MLflow so that we can create a model serving endpoint. Here, we can specify what kind of GPU we need to serve the endpoint. Keep in mind that bigger models might need more memory!
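In the notebook itself, the model is logged with MLflow (for example via the `transformers` flavor with a `registered_model_name` pointing at Unity Catalog) and the endpoint is created through the serving-endpoints API. The sketch below illustrates just two details of that step — the three-level Unity Catalog name and the endpoint config body — with catalog/schema names that are purely illustrative:

```python
def uc_registered_name(catalog, schema, hf_model_name):
    """Unity Catalog model names follow <catalog>.<schema>.<model>. Hugging Face
    repo ids contain '/', '.', and '-', which we normalize to underscores."""
    model = hf_model_name.split("/")[-1].replace(".", "_").replace("-", "_").lower()
    return f"{catalog}.{schema}.{model}"

def build_endpoint_config(registered_name, version, workload_type, workload_size):
    """Config body in the serving-endpoints served-entities shape."""
    return {
        "served_entities": [{
            "entity_name": registered_name,
            "entity_version": version,
            "workload_type": workload_type,   # e.g. GPU_MEDIUM (A10G)
            "workload_size": workload_size,   # e.g. Small (4 concurrency)
            "scale_to_zero_enabled": False,
        }]
    }

name = uc_registered_name("main", "llm_models", "mistralai/Mistral-7B-Instruct-v0.2")
endpoint = build_endpoint_config(name, "1", "GPU_MEDIUM", "Small")
```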

If completed exclusively on Databricks, these steps are technical by nature: they require a Python-literate analyst or engineer to run notebooks in Databricks. By packaging this workflow into a Dash application, we remove the need for a Python interface entirely.

Heads up! Depending on the size of your model, this whole process could take up to a few hours before it’s ready to go.

Chatting with the model from your Dash application

Once your model is all set up, here’s how to interact with it:

  1. Pick your model: A dropdown menu will appear near the chat window, where you can choose the model you registered.
  2. Check the status: A badge will show you if your chosen model is ready to use (“READY” means go!).
  3. Start the conversation: Now, you can simply type your message and hit send. The system will send your message to the model and the response will appear in the chat window. All three Databricks Foundation Model types (chat, embedding, and completion) are supported natively.

Remember: choose the right model from the dropdown and send the appropriate message type to get the best results!
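As an illustration, the request body the chat tab sends differs by task type. The shapes below follow the documented query formats for chat, completions, and embeddings serving endpoints; the helper function itself is our sketch, not the app’s code:

```python
def build_query(task, text, max_tokens=200, temperature=0.7):
    """Build the request body for a Databricks serving endpoint by task type."""
    if task == "chat":
        return {"messages": [{"role": "user", "content": text}],
                "max_tokens": max_tokens, "temperature": temperature}
    if task == "completions":
        return {"prompt": text, "max_tokens": max_tokens,
                "temperature": temperature}
    if task == "embeddings":
        return {"input": text}
    raise ValueError(f"unsupported task type: {task}")

body = build_query("chat", "Summarize our Q3 sales trends.")
```

The app then POSTs the body to the endpoint’s invocations URL (`https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations`) with a bearer token, which is why picking the right model and message type in the dropdown matters.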

This application logs chat history in a separate tab that is accessible at any time. And, because each endpoint request consumes a certain amount of computational resources, we track resource consumption as well to ensure that the end user isn’t racking up inordinate costs on non-business-critical queries.
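That consumption tracking might look something like this minimal in-memory sketch. The real app persists its history; the `usage` block is the OpenAI-style field these endpoint responses typically include, and the endpoint name here is hypothetical:

```python
from collections import defaultdict

class UsageTracker:
    """Tally queries and token consumption per serving endpoint."""

    def __init__(self):
        self._stats = defaultdict(lambda: {"queries": 0, "total_tokens": 0})

    def record(self, endpoint, response):
        """Accumulate the OpenAI-style `usage` block from an endpoint response."""
        usage = response.get("usage", {})
        stats = self._stats[endpoint]
        stats["queries"] += 1
        stats["total_tokens"] += usage.get("total_tokens", 0)

    def summary(self):
        return dict(self._stats)

tracker = UsageTracker()
tracker.record("mistral-chat", {"usage": {"total_tokens": 512}})
tracker.record("mistral-chat", {"usage": {"total_tokens": 256}})
```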

Conclusion

The power of Databricks with Plotly is clear in this workflow. Databricks provides the most powerful data platform on the market, allowing users to build, develop, and derive value from generative AI securely.

Plotly Dash provides both a seamless front-end visual experience and a strong back end capable of complex API connections to Databricks in Python. Thanks to intuitive, full-stack web UIs, Dash enables even the least technical end users to derive value from their organization’s Databricks platform, perhaps without even knowing it.

This workflow shows just the beginning of what is possible using Dash as a front end for the robust model-serving capabilities provided by Databricks. In the future, this architecture could be used to build a fully fledged, Databricks-powered application to explore large, open-source LLMs and additionally configure and fine-tune these models into custom solutions.

This guide aims to empower Dash and Databricks users to envision and actualize LLM-based data science solutions within their organizations. We are eager to see how organizations large and small will build off of this workflow.

Looking forward

In part two of this article, we will cover how we can extend this application to build Dash applications complete with Plotly charts utilizing Databricks LLMs. See below for a preview:

Build Dash applications with Plotly charts utilizing Databricks LLMs

Please reach out if you want to learn how to stream OpenAI API responses and responses from Databricks endpoints into your Dash app.

Stay tuned for more detailed tutorials, code snippets, and best practices for integrating Dash and Databricks.
