Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Main Summary

The article presents a practical guide to hosting artificial intelligence models and datasets on Hugging Face Spaces, using Streamlit as the development tool. This combination lets researchers, developers, and AI enthusiasts create and share interactive applications in an accessible and efficient way. Hugging Face Spaces offers a cloud-based platform that facilitates collaboration and rapid deployment of machine learning projects, while Streamlit provides an intuitive framework for building web interfaces without frontend development experience. The approach aligns with the broader goal of democratizing access to artificial intelligence through open science and open source practices, so that technological innovations reach a wider audience. The integration of these technologies offers a scalable solution for model demos, data visualization, and community interaction in AI projects.

Key Elements

Hugging Face Spaces: Cloud platform designed for deploying and sharing machine learning applications, facilitating collaboration and public access to models and datasets.
Streamlit: Open source framework for building interactive web applications in Python quickly and simply, well suited to prototypes and demos of AI models.
Models and Datasets: Core resources that can be loaded, managed, and visualized directly on the platform, promoting transparency and reuse in research.
Open Science and Open Source: Guiding principles of the initiative, which aims to remove barriers to AI access and foster collaborative innovation through accessible tools.

Analysis and Implications

This integration makes artificial intelligence more accessible by letting researchers and developers share their work without technical or financial barriers. It also supports reproducibility in data science by offering configurable, versionable environments for each project. The ability to create interactive demos improves communication between technical and non-technical audiences, which eases the adoption of AI-based solutions across sectors.

Additional Context

Hugging Face has established itself as one of the leading platforms in the AI ecosystem, actively promoting the collaborative exchange of models and knowledge. Combining it with tools like Streamlit reinforces its commitment to democratizing technology and strengthening open source communities around the world.

Showcase your Datasets and Models using Streamlit on Hugging Face Spaces

Streamlit allows you to visualize datasets and build demos of Machine Learning models in a neat way. In this blog post we will walk you through hosting models and datasets and serving your Streamlit applications in Hugging Face Spaces.

Building demos for your models

You can load any Hugging Face model and build cool UIs using Streamlit. In this particular example we will recreate "Write with Transformer" together. It's an application that lets you write anything using transformers like GPT-2 and XLNet.

[Image: write-with-transformers]

We will not dive deep into how the inference works. All you need to know is that this application requires a few hyperparameter values. Streamlit provides many components for building custom applications, and we will use some of them to collect the hyperparameters that the inference code needs.

The st.text_area component creates a text box where the user enters the sentences to be completed.
st.sidebar lets you place input widgets in a sidebar.
The slider takes numeric values from a range. Pass float values for the bounds and the step to get continuous values; with integer arguments the slider returns integers.
You can let the end user input integer values with number_input.

import streamlit as st

# default text shown in the text box
default_value = "See how a modern neural network auto-completes your text 🤗 This site, built by the Hugging Face team, lets you write a whole document directly from your browser, and you can trigger the Transformer anywhere using the Tab key. It's like having a smart machine that completes your thoughts 😀 Get started by typing a custom snippet, check out the repository, or try one of the examples. Have fun!"

# main text box for the prompt
sent = st.text_area("Text", default_value, height=275)

# generation hyperparameters, collected in the sidebar
max_length = st.sidebar.slider("Max Length", min_value=10, max_value=30)
temperature = st.sidebar.slider("Temperature", value=1.0, min_value=0.0, max_value=1.0, step=0.05)
top_k = st.sidebar.slider("Top-k", min_value=0, max_value=5, value=0)
top_p = st.sidebar.slider("Top-p", min_value=0.0, max_value=1.0, step=0.05, value=0.9)
num_return_sequences = st.sidebar.number_input("Number of Return Sequences", min_value=1, max_value=5, value=1, step=1)

The inference code returns the generated output. You can print it with a simple st.write:

st.write(generated_sequences[-1])
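
The post does not show the inference code itself, so here is only a minimal sketch of how the sidebar values could be turned into arguments for it. It assumes the inference step accepts sampling arguments named like those of the transformers `generate` method; the helper name `build_generation_kwargs` and the temperature floor are hypothetical. The floor is there because the temperature slider above allows 0.0, while sampling divides the logits by the temperature.

```python
def build_generation_kwargs(max_length, temperature, top_k, top_p, num_return_sequences):
    """Map the sidebar values to keyword arguments for a sampling-based generation call."""
    # Temperature must be strictly positive for sampling; the slider allows 0.0,
    # so clamp it to a small floor (0.01 is an assumed, arbitrary choice).
    safe_temperature = max(float(temperature), 1e-2)
    return {
        "max_length": int(max_length),
        "do_sample": True,
        "temperature": safe_temperature,
        "top_k": int(top_k),  # 0 means no top-k filtering
        "top_p": float(top_p),
        "num_return_sequences": int(num_return_sequences),
    }


# Example with the default widget values from the snippet above
kwargs = build_generation_kwargs(10, 1.0, 0, 0.9, 1)
print(kwargs)
```

In the app, the resulting dictionary would be unpacked into whatever generation call the inference code makes.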

Here's what our replicated version looks like.

[Image: streamlit-rep]

You can check out the full code here.

Showcase your Datasets and Data Visualizations

Streamlit provides many components to help you visualize datasets. It works seamlessly with 🤗 Datasets, pandas, and visualization libraries such as matplotlib, seaborn and bokeh.

Let's start by loading a dataset. A new feature in Datasets, called streaming, allows you to work immediately with very large datasets, eliminating the need to download all of the examples and load them into memory.

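from datasets import load_dataset
import pandas as pd
import streamlit as st

# stream the dataset instead of downloading it up front
dataset = load_dataset("merve/poetry", streaming=True)
# build a pandas DataFrame from the training split
df = pd.DataFrame.from_dict(dataset["train"])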
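
For very large datasets you may only want a preview rather than every example. The sketch below shows one way to take the first few examples from any iterable of dictionaries. The helper `take_examples` and the generator `fake_stream` are hypothetical stand-ins so that the snippet runs without downloading anything; a streamed split is assumed to behave the same way, yielding one dictionary per example.

```python
from itertools import islice


def take_examples(stream, n):
    """Return the first n examples of an iterable without consuming the rest."""
    return list(islice(stream, n))


def fake_stream():
    """Stand-in for a streamed split: yields one dict per example."""
    for i in range(1_000_000):
        yield {"author": f"author_{i % 3}", "content": f"poem number {i}"}


preview = take_examples(fake_stream(), 5)
print(len(preview), preview[0])
```

With the real dataset, something like `pd.DataFrame(take_examples(dataset["train"], 500))` should give a small DataFrame to display while the rest of the data stays remote.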

If you have structured data like mine, you can simply use st.dataframe(df) to show your dataset. There are many Streamlit components to plot data interactively. One such component is st.bar_chart(), which I used to visualize the most used words in the poem contents.

# `words` holds the word counts computed from the poem contents (not shown here)
st.write("Most appearing words including stopwords")
st.bar_chart(words[0:50])
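
The snippet above does not show how the word counts are computed. One possible way, as a sketch rather than the author's actual code, is to tokenize each poem and count words with `collections.Counter`. The helper name `top_words` and the simple regular expression used as a tokenizer are assumptions.

```python
import re
from collections import Counter


def top_words(texts, n=50):
    """Count lowercase word occurrences across texts and keep the n most common."""
    counts = Counter()
    for text in texts:
        # naive tokenizer: runs of letters and apostrophes
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return dict(counts.most_common(n))


counts = top_words(["The rose is red", "the violet is blue"], n=3)
print(counts)
```

In the app, the texts would come from the poem column of the DataFrame (assumed here to be called `content`), and wrapping the result as `pd.Series(counts)` gives something st.bar_chart can plot.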

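If you'd like to use libraries like matplotlib or seaborn, all you have to do is pass your figure to st.pyplot() at the end of your plotting script. Calling st.pyplot() without a figure relies on matplotlib's global state and is deprecated.

import matplotlib.pyplot as plt
import seaborn as sns

st.write("Number of poems for each author")
# catplot returns a FacetGrid; keep it so its figure can be passed to Streamlit
g = sns.catplot(x="author", data=df, kind="count", aspect=4)
plt.xticks(rotation=90)
st.pyplot(g.figure)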

You can see the interactive bar chart, dataframe component and hosted matplotlib and seaborn visualizations below. You can check out the code here.

[Image: spaces-streamlit-dataset-demo]

Hosting your Projects in Hugging Face Spaces

You can simply drag and drop your files as shown below. Note that you need to list any additional dependencies in requirements.txt. Also make sure that the Streamlit version you use locally matches the one your Space runs. For details, refer to the Spaces API reference.
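
To keep local and hosted environments aligned, it can help to pin the package versions you developed with. The sketch below builds requirements.txt lines from the locally installed versions using the standard library. The helper name `pinned_requirements` and the package list are hypothetical and should be replaced with what your app actually imports. As far as I know, the Streamlit version itself is set through the `sdk_version` field of the Space's README metadata rather than through requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages, lookup=version):
    """Return 'name==version' lines for packages, using the locally installed versions."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except PackageNotFoundError:
            # Not installed locally: list it unpinned so the omission is visible.
            lines.append(name)
    return lines


# Hypothetical dependency list for the two demos in this post
print("\n".join(pinned_requirements(["transformers", "datasets", "pandas", "seaborn"])))
```

The printed lines can be pasted into requirements.txt before uploading the files.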

[Image: spaces-streamlit]

There are so many components and packages you can use to demonstrate your models, datasets, and visualizations. You can get started here.
