Scatter Plot Gallery
Scatter plots show the relationship between two numeric variables, one observation per point. They're the chart of choice when the question is "are these two things correlated" or "which observations are outliers". Adding colour by group reveals whether the relationship differs across categories, and adding bubble sizing folds in a third dimension without crowding the chart.
Scatter plots excel at:
- Relationship analysis: see whether two variables move together, against each other, or independently
- Group comparison: contrast the shape of a relationship across product categories or store formats
- Outlier identification: spot the SKU, store, or customer that sits well off the cluster
- Bubble charts: encode a third variable as point size for richer comparison without a second chart
import matplotlib.pyplot as plt
import pandas as pd
from openretailscience.plots import scatter
from openretailscience.plots.styles.graph_utils import set_axis_shorthand
Basic Scatter Plot
The simplest call: pick an x_col and a value_col, one point per row. Use this shape when each row is an independent observation (a SKU, a store, a transaction) and you want to see how the two variables relate. With no grouping or sizing, the dots all share one colour and one size.
scatter.plot(
    sku_price_volume,
    x_col="price",
    value_col="units_sold",
    eyebrow="Price elasticity",
    title="Higher prices map cleanly onto lower unit volumes across 8 SKUs",
    subtitle="Monthly units sold versus list price ($), single category",
    x_label="Price ($)",
    y_label="Units sold",
)
plt.show()
Multiple Scatter Groups
Pass group_col to colour each point by a category, drawing the groups as overlaid scatter clouds. Pick this when the analytical question is whether the relationship between the two axes differs across groups, not just whether it exists overall. Each group's slope, intercept, and spread become visible at a glance.
scatter.plot(
    category_sku_panel,
    x_col="price",
    value_col="units_sold",
    group_col="category",
    eyebrow="Price elasticity",
    title="Electronics is most sensitive to price; Home holds up best",
    subtitle="Monthly units sold versus list price ($) by product category",
    x_label="Price ($)",
    y_label="Units sold",
    legend_title="Category",
    move_legend_outside=True,
)
plt.show()
Multiple Value Columns
If the data is already in wide format with several metrics per x-value, pass a list to value_col to overlay them on a shared axis. A list-typed value_col is mutually exclusive with group_col: passing both raises ValueError. As with line charts, the shared y-axis can squash the smaller series, which is often the point of the chart (showing how disparate the magnitudes are).
ax = scatter.plot(
    store_monthly,
    x_col="month",
    value_col=["revenue", "profit"],
    eyebrow="Store performance",
    title="Revenue and profit move together but at very different scales",
    subtitle="Monthly revenue and profit ($), first half of the year",
    y_label="Amount ($)",
    move_legend_outside=True,
)
set_axis_shorthand(ax.yaxis)
plt.show()
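Wide and long layouts carry the same information, so a wide frame can be reshaped when a grouped call is preferred. The sketch below uses pandas' `DataFrame.melt`; the frame and the column names `metric` and `amount` are hypothetical. It is an assumption, based on the Multiple Scatter Groups section, that the long frame could then be plotted with a single value_col and group_col="metric".

```python
import pandas as pd

# Hypothetical wide-format frame: one row per month, one column per metric.
wide = pd.DataFrame(
    {
        "month": [1, 2, 3],
        "revenue": [120_000, 135_000, 128_000],
        "profit": [18_000, 21_000, 19_500],
    }
)

# Reshape to long format: one row per (month, metric) pair.
long = wide.melt(
    id_vars="month",
    value_vars=["revenue", "profit"],
    var_name="metric",     # holds the former column names
    value_name="amount",   # holds the former cell values
)
print(long)
```

Either layout works for one chart; the long form tends to be easier to extend when more metrics arrive later.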
Bubble Chart with size_col
Pass size_col to encode a third numeric variable as point area, turning the scatter into a bubble chart. Use size_scale to multiply the raw values up or down until the bubbles are visually balanced (large enough to compare, small enough to avoid overlap).
scatter.plot(
    store_economics,
    x_col="avg_order_value",
    value_col="avg_daily_visits",
    size_col="avg_daily_revenue",
    size_scale=0.15,
    eyebrow="Store economics",
    title="High-AOV stores convert fewer visits but earn the most revenue per day",
    subtitle="Average daily visits versus average order value (\\$), bubble area = average daily revenue (\\$)",
    x_label="Average order value ($)",
    y_label="Average daily visits",
)
plt.show()
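Choosing size_scale by trial and error can be shortened with a rough starting value. The helper below is hypothetical (it is not part of the library) and assumes, as the description above suggests, that size_scale acts as a plain multiplier on the raw size_col values. The target area of 400 is an arbitrary assumption to adjust by eye.

```python
def suggest_size_scale(values, target_max_area=400.0):
    """Return a multiplier that maps the largest value to target_max_area.

    Hypothetical helper: assumes the scaled value is used directly as the
    marker area, so the largest bubble ends up at target_max_area.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("values must contain at least one positive number")
    return target_max_area / largest


# Hypothetical average daily revenue per store (illustrative values only).
avg_daily_revenue = [1_200.0, 2_400.0, 3_000.0, 800.0]

scale = suggest_size_scale(avg_daily_revenue)
scaled = [value * scale for value in avg_daily_revenue]
print(f"suggested size_scale = {scale:.4f}")
```

Because the multiplier is applied uniformly, the ratios between bubbles are preserved; only the overall size changes.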
Point Labels with label_col
Pass label_col to write each point's name next to it, using textalloc to nudge labels apart so they don't overlap. Use this for small datasets (under ~20 points) where the reader needs to identify each observation, like a product comparison or a regional benchmark. Note: label_col is mutually exclusive with a list-typed value_col, because labels only make sense when there's one value per point.
scatter.plot(
    product_satisfaction,
    x_col="price",
    value_col="satisfaction",
    label_col="product_name",
    eyebrow="Price vs satisfaction",
    title="MacBook Air leads satisfaction at the highest price point",
    subtitle="Customer satisfaction (1 to 5) versus price ($), product-level",
    x_label="Price ($)",
    y_label="Satisfaction",
)
plt.show()
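The argument constraints described in this gallery (a list-typed value_col cannot be combined with group_col or with label_col) can be summarised as a small pre-flight check. The function below is a hypothetical illustration, not the library's own validation; raising ValueError for the label_col case is an assumption, since the text only states that exception for group_col.

```python
def check_scatter_args(value_col, group_col=None, label_col=None):
    """Hypothetical pre-flight check mirroring the documented constraints."""
    multi_value = isinstance(value_col, list)
    if multi_value and group_col is not None:
        raise ValueError("group_col cannot be combined with a list-typed value_col")
    if multi_value and label_col is not None:
        # Assumed exception type; the gallery text does not specify it.
        raise ValueError("label_col cannot be combined with a list-typed value_col")


# A single value column with labels satisfies both constraints.
check_scatter_args("satisfaction", label_col="product_name")

# A list of value columns combined with labels is rejected.
try:
    check_scatter_args(["revenue", "profit"], label_col="product_name")
except ValueError as exc:
    print(f"Rejected: {exc}")
```

Running a check like this before building a chart in a pipeline surfaces a bad argument combination early, with a message that names the conflicting parameters.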