Tutorial: Data Science
In this tutorial, we will introduce Solara from the perspective of a data scientist, or of anyone thinking of using Solara for a data science app. It therefore focuses on data (Pandas), visualizations (plotly) and how to add interactivity.
You should know
This tutorial will assume:
- You have successfully installed Solara
- You know how to display a Solara component in a notebook or script
If not, please follow the Quick start.
Extra packages you need to install
For this tutorial, you need plotly and pandas, which you can install using pip:
```bash
$ pip install plotly pandas
```
Note: You might want to refresh your browser after installing plotly when using Jupyter.
You will learn
In this tutorial, you will learn:
- Create a scatter plot using plotly.express.
- Display your plot in a Solara component.
- Build a UI to configure the X and Y axes.
- Handle a click event and record which point was clicked on.
- Refactor your code to build a reusable Solara component.
- Compose your newly built component into a larger application.
The dataset
For this tutorial, we will use the Iris flower data set, which contains the lengths and widths of the petals and sepals of three species of Iris (setosa, virginica and versicolor).
This dataset ships with many packages, but since we are going to use plotly.express for this tutorial, we will load it from there:
```python
import plotly.express as px
df = px.data.iris()
```
Our first scatter plot
We use plotly express to create our scatter plot with just a single line:
```python
fig = px.scatter(df, "sepal_length", "sepal_width")
```
To display this figure in a Solara component, we need an element that can render a plotly figure. solara.FigurePlotly does exactly that.
Putting this together:
```python
import plotly.express as px
import solara

df = px.data.iris()


@solara.component
def Page():
    fig = px.scatter(df, "sepal_length", "sepal_width")
    solara.FigurePlotly(fig)
```
Configuring the X-axis
To configure the X-axis, first create a piece of global application state:
```python
x_axis = solara.reactive("sepal_length")
```
This code creates a reactive variable. You can use this reactive variable in your component and pass it to a Select component to control the selected column.
```python
columns = list(df.columns)
solara.Select(label="X-axis", values=columns, value=x_axis)
```
Now, when the Select component's value changes, it will also update the reactive variable x_axis.
If your component uses the reactive value to create the plot, for example:
```python
fig = px.scatter(df, x_axis.value, "sepal_width")
```
the component will automatically re-execute the render function when the x_axis value changes, updating the figure accordingly.
```python
import plotly.express as px
import solara

df = px.data.iris()
columns = list(df.columns)
x_axis = solara.reactive("sepal_length")


@solara.component
def Page():
    # Create a scatter plot by passing "x_axis.value" to px.scatter
    # This will automatically make the component listen to changes in x_axis
    # and re-execute this function when x_axis value changes
    fig = px.scatter(df, x_axis.value, "sepal_width")
    solara.FigurePlotly(fig)
    # Pass x_axis to Select component
    # The select will control the x_axis reactive variable
    solara.Select(label="X-axis", value=x_axis, values=columns)
```
Understanding (optional)
State
Understanding state management and how Solara re-renders components is crucial for building larger applications. If you don't fully grasp it now, that is OK. Get used to the pattern first, and consider reading About state management later on to get a deeper understanding.
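If it helps, here is a deliberately simplified, pure-Python model of the pattern. ToyReactive and run are hypothetical illustrations, not Solara's actual implementation. The idea is that a reactive variable remembers which render functions read it, and re-runs them when its value is set.

```python
class ToyReactive:
    """A toy stand-in for solara.reactive (hypothetical, for illustration only)."""

    _current_reader = None  # the render function currently executing, if any

    def __init__(self, value):
        self._value = value
        self._readers = set()

    @property
    def value(self):
        # Reading .value during a render registers that render as a listener
        if ToyReactive._current_reader is not None:
            self._readers.add(ToyReactive._current_reader)
        return self._value

    @value.setter
    def value(self, new_value):
        if new_value == self._value:
            return  # toy behaviour: no change, nothing to re-render
        self._value = new_value
        for reader in list(self._readers):
            run(reader)

    def set(self, new_value):
        self.value = new_value


def run(render):
    # Execute a render function while tracking which reactive values it reads
    previous = ToyReactive._current_reader
    ToyReactive._current_reader = render
    try:
        render()
    finally:
        ToyReactive._current_reader = previous


x_axis = ToyReactive("sepal_length")
rendered = []


def page():
    rendered.append(f"scatter with x={x_axis.value}")


run(page)                   # first render reads x_axis.value and subscribes
x_axis.set("petal_length")  # roughly what a Select does when you pick a column
print(rendered)
```

In Solara, reading x_axis.value inside a component plays roughly the role of the read tracking here. Setting the variable (for example, through Select) triggers the re-render.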
Configuring the Y-axis
Now that we can configure the X-axis, we can do the same for the Y-axis. As practice, try to do this yourself before looking at the code.
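```python
import plotly.express as px
import solara

df = px.data.iris()
columns = list(df.columns)
x_axis = solara.reactive("sepal_length")
y_axis = solara.reactive("sepal_width")

@solara.component
def Page():
    fig = px.scatter(df, x_axis.value, y_axis.value)
    solara.FigurePlotly(fig)
    solara.Select(label="X-axis", value=x_axis, values=columns)
    solara.Select(label="Y-axis", value=y_axis, values=columns)
```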
Interactive plot
We have now built a small UI to control a scatter plot. Often, however, we also want to interact with the data, for instance by selecting a point in the scatter plot.
We could look up in the plotly documentation exactly how to extract the right data, but let's take a different approach. We simply store the data we get from on_click in a new reactive variable (click_data) and display the raw data in a Markdown component.
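```python
import plotly.express as px
import solara

df = px.data.iris()
columns = list(df.columns)
x_axis = solara.reactive("sepal_length")
y_axis = solara.reactive("sepal_width")
click_data = solara.reactive(None)

@solara.component
def Page():
    fig = px.scatter(df, x_axis.value, y_axis.value)
    solara.FigurePlotly(fig, on_click=click_data.set)
    solara.Select(label="X-axis", value=x_axis, values=columns)
    solara.Select(label="Y-axis", value=y_axis, values=columns)
    # display the raw click data pre-formatted, using Markdown backticks
    solara.Markdown(f"`{click_data.value}`")
```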
Inspecting the on_click data
Click a point and you should see data like this:
```
{'event_type': 'plotly_click', 'points': {'trace_indexes': [0], 'point_indexes': [32], 'xs': [5.2], 'ys': [4.1]}, 'device_state': {'alt': False, 'ctrl': False, 'meta': False, 'shift': False, 'button': 0, 'buttons': 1}, 'selector': None}
```
From this, we can get the index of the clicked point and its x and y coordinates. With a single trace, as here, the point index is also the row index in df.
```python
import plotly.express as px
import solara

df = px.data.iris()
columns = list(df.columns)
x_axis = solara.reactive("sepal_length")
y_axis = solara.reactive("sepal_width")
click_data = solara.reactive(None)

@solara.component
def Page():
    fig = px.scatter(df, x_axis.value, y_axis.value)
    solara.FigurePlotly(fig, on_click=click_data.set)
    solara.Select(label="X-axis", value=x_axis, values=columns)
    solara.Select(label="Y-axis", value=y_axis, values=columns)
    if click_data.value:
        # "points" holds lists; take the first (usually the only) clicked point
        row_index = click_data.value["points"]["point_indexes"][0]
        x = click_data.value["points"]["xs"][0]
        y = click_data.value["points"]["ys"][0]
        # display it pre-formatted using Markdown backticks
        solara.Markdown(f"`Click on index={row_index} x={x} y={y}`")
```
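The lookup in the code above can also live in a small, separately testable helper. parse_click is a hypothetical name. The keys it reads are exactly the ones in the sample output shown earlier, where "points" holds parallel lists with one entry per point in the event.

```python
def parse_click(click_data):
    """Return (point_index, x, y) for the first clicked point, or None if there is no click yet."""
    if not click_data:
        return None
    points = click_data["points"]
    # Entry 0 of each parallel list describes the first point in the event
    return points["point_indexes"][0], points["xs"][0], points["ys"][0]


sample = {
    "event_type": "plotly_click",
    "points": {"trace_indexes": [0], "point_indexes": [32], "xs": [5.2], "ys": [4.1]},
    "device_state": {"alt": False, "ctrl": False, "meta": False, "shift": False, "button": 0, "buttons": 1},
    "selector": None,
}
print(parse_click(sample))  # (32, 5.2, 4.1)
print(parse_click(None))    # None
```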
Displaying the nearest neighbours
Now that we have the point we clicked on, we will use it to improve our component. We will:
- Add an indicator in the scatter plot to highlight which point we clicked on.
- Find the nearest neighbours and display them in a table.
For the first item, we simply use plotly express again, and add the single trace it generated to the existing figure (instead of displaying two separate figures).
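We add a function to find the n nearest neighbours:
```python
def find_nearest_neighbours(df, xcol, ycol, x, y, n=10):
    df = df.copy()
    # Euclidean distance from (x, y) in the plane of the two plotted columns
    df["distance"] = ((df[xcol] - x) ** 2 + (df[ycol] - y) ** 2) ** 0.5
    # Skip the closest row (assumed to be the clicked point itself) and keep the next n
    return df.sort_values("distance").iloc[1 : n + 1]
```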
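find_nearest_neighbours drops the first sorted row on the assumption that it is the clicked point. Several Iris rows can share the same coordinates in the two plotted columns, because measurements are rounded to 0.1 cm. In that case, the dropped row may be a twin of the clicked point rather than the point itself. One alternative, assuming you know the clicked row's index label, is to drop that label explicitly. find_nearest_neighbours_by_label is a hypothetical helper, not part of Solara, and the points below are made up.

```python
import numpy as np
import pandas as pd


def find_nearest_neighbours_by_label(df, xcol, ycol, row_label, n=10):
    """Return the n rows closest to row `row_label`, excluding that row itself (hypothetical helper)."""
    x = df.at[row_label, xcol]
    y = df.at[row_label, ycol]
    # Euclidean distance in the (xcol, ycol) plane
    distance = np.hypot(df[xcol] - x, df[ycol] - y)
    return (
        df.assign(distance=distance)
        .drop(index=row_label)  # exclude the clicked row by label, not by position
        .sort_values("distance")
        .head(n)
    )


# Made-up points: rows 0 and 1 share the same coordinates
points = pd.DataFrame({"a": [1.0, 1.0, 2.0, 5.0], "b": [1.0, 1.0, 1.0, 5.0]})
nearest = find_nearest_neighbours_by_label(points, "a", "b", row_label=0, n=2)
print(nearest)
```

Because the clicked row is removed by label, its duplicate (row 1) stays in the result with distance 0.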
We now only find the nearest neighbours if click_data.value is not None, and display the resulting dataframe using the DataFrame component.
```python
import plotly.express as px
import solara

df = px.data.iris()
columns = list(df.columns)
x_axis = solara.reactive("sepal_length")
y_axis = solara.reactive("sepal_width")
click_data = solara.reactive(None)


def find_nearest_neighbours(df, xcol, ycol, x, y, n=10):
    df = df.copy()
    df["distance"] = ((df[xcol] - x) ** 2 + (df[ycol] - y) ** 2) ** 0.5
    return df.sort_values("distance").iloc[1 : n + 1]


@solara.component
def Page():
    # color by species; custom_data stores each point's original row label
    fig = px.scatter(df, x_axis.value, y_axis.value, color="species", custom_data=[df.index])

    if click_data.value is not None:
        x = click_data.value["points"]["xs"][0]
        y = click_data.value["points"]["ys"][0]

        # add an indicator
        fig.add_trace(px.scatter(x=[x], y=[y], text=["⭐️"]).data[0])
        df_nearest = find_nearest_neighbours(df, x_axis.value, y_axis.value, x, y, n=3)
    else:
        df_nearest = None

    solara.FigurePlotly(fig, on_click=click_data.set)
    solara.Select(label="X-axis", value=x_axis, values=columns)
    solara.Select(label="Y-axis", value=y_axis, values=columns)
    if df_nearest is not None:
        solara.Markdown("## 3 nearest neighbours")
        solara.DataFrame(df_nearest)
    else:
        solara.Info("Click to select a point")
```