Index Plot Gallery
Index plots compare categories or segments against a baseline or average value, typically set at 100. They suit retail analytics well because they show which categories over- or underperform relative to that baseline.
Index plots excel at:
- Performance Comparison: Compare sales performance across product categories or regions against a baseline
- Segment Analysis: Analyze customer segments against overall average behavior
- Benchmarking: Evaluate stores or regions relative to company-wide metrics
- Opportunity Identification: Highlight high-potential areas for growth or investment
import matplotlib.pyplot as plt
import pandas as pd
from openretailscience.plots import index
Basic Index Plot
The simplest call: pick a baseline category (value_to_index="Electronics") and the plot computes each group's share of that category versus its share of overall volume. Bars to the right of 100 over-index the baseline category; bars to the left under-index it.
Setting sort_by="value" (rather than the alphabetical default) puts the strongest over-indexer at the top so the eye lands on the headline first.
index.plot(
    regional_sales,
    value_col="sales",
    group_col="region",
    index_col="product_category",
    value_to_index="Electronics",
    sort_by="value",
    sort_order="ascending",
    eyebrow="Regional performance",
    title="North leads Electronics share by 90 index points over East",
    subtitle="Sales index by region, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
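The index described above (a group's share of the baseline category divided by its share of overall volume, scaled to 100) can be worked through by hand. The following is a minimal sketch in plain Python, not the library's implementation; the `rows` data and the `compute_index` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical (region, product_category, sales) rows, invented for illustration.
rows = [
    ("North", "Electronics", 60.0),
    ("North", "Clothing", 40.0),
    ("East", "Electronics", 20.0),
    ("East", "Clothing", 80.0),
]


def compute_index(rows, value_to_index):
    """Return {group: index}; 100 means the group's share of the baseline
    category equals its share of overall volume."""
    in_category = defaultdict(float)  # group totals within the baseline category
    overall = defaultdict(float)      # group totals across all categories
    for group, category, value in rows:
        overall[group] += value
        if category == value_to_index:
            in_category[group] += value

    category_total = sum(in_category.values())
    grand_total = sum(overall.values())

    result = {}
    for group, total in overall.items():
        share_of_category = in_category[group] / category_total
        share_of_overall = total / grand_total
        result[group] = share_of_category / share_of_overall * 100
    return result


indexes = compute_index(rows, "Electronics")
```

With this made-up data, North holds 75% of Electronics sales but only 50% of all sales, so it indexes at 150; East holds 25% of Electronics against the same 50% overall share and indexes at 50.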
Multiple Series Index Plot
Pass series_col to break each group into a stacked set of bars, one per series value (e.g. one bar per quarter or per store type). Useful for tracking how each group's index moves across time periods or segments.
Note: sort_by="value" is unsupported here because there's no single value to sort on (each group has multiple), so groups stay in the data's natural order.
index.plot(
    quarterly_revenue,
    value_col="revenue",
    group_col="store_type",
    index_col="product_category",
    value_to_index="Electronics",
    series_col="quarter",
    eyebrow="Channel mix",
    title="Online captures more Electronics share each quarter while Mall slips",
    subtitle="Performance index by store type and quarter, baseline of 100 set by Electronics",
    x_label="Index",
    legend_title="Quarter",
    move_legend_outside=True,
)
plt.show()
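One way to read a multi-series index is as the same share-versus-share calculation repeated inside each series value. The sketch below assumes exactly that (shares are computed within each quarter separately); this is an assumption about the calculation, not a statement of how the library does it, and the `rows` data and `index_by_series` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical (store_type, quarter, product_category, revenue) rows.
rows = [
    ("Online", "Q1", "Electronics", 30.0),
    ("Online", "Q1", "Clothing", 70.0),
    ("Mall", "Q1", "Electronics", 50.0),
    ("Mall", "Q1", "Clothing", 50.0),
    ("Online", "Q2", "Electronics", 60.0),
    ("Online", "Q2", "Clothing", 40.0),
    ("Mall", "Q2", "Electronics", 20.0),
    ("Mall", "Q2", "Clothing", 80.0),
]


def index_by_series(rows, value_to_index):
    """Return {(group, series): index}, with shares computed within each series."""
    in_category = defaultdict(float)     # per (group, series), baseline category only
    overall = defaultdict(float)         # per (group, series), all categories
    category_total = defaultdict(float)  # per series, baseline category only
    series_total = defaultdict(float)    # per series, all categories
    for group, series, category, value in rows:
        overall[(group, series)] += value
        series_total[series] += value
        if category == value_to_index:
            in_category[(group, series)] += value
            category_total[series] += value

    return {
        (group, series): (in_category[(group, series)] / category_total[series])
        / (total / series_total[series])
        * 100
        for (group, series), total in overall.items()
    }


series_indexes = index_by_series(rows, "Electronics")
```

In this invented data, Online moves from 75 in Q1 to 150 in Q2 while Mall moves from 125 to 50, which is the kind of movement the series bars are meant to expose.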
Different Aggregation Function
Use agg_func="mean" to index on the average value per group rather than the sum. Pick this when row counts differ between groups and you care about the per-row magnitude — e.g. average order value across customer segments — rather than total volume.
Other supported functions: "max", "min", "nunique". Default is "sum".
index.plot(
    segment_aov,
    value_col="avg_order_value",
    group_col="customer_segment",
    index_col="product_category",
    value_to_index="Electronics",
    agg_func="mean",
    sort_by="value",
    sort_order="ascending",
    eyebrow="Customer segments",
    title="VIP and Premium spend disproportionately on Electronics; Budget barely touches it",
    subtitle="Mean order value index by customer segment, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
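To see why the choice between sum and mean matters when row counts differ, the sketch below compares the two aggregations on made-up order values. It illustrates only the aggregation step that precedes indexing, not how the library applies agg_func internally; the `orders` data is hypothetical.

```python
from statistics import mean

# Hypothetical order values per customer segment; row counts differ on purpose.
orders = {
    "Budget": [20.0, 25.0, 30.0, 25.0, 20.0, 30.0],  # many small orders
    "VIP": [60.0, 80.0],                              # few large orders
}

# Sum rewards volume: a segment with many rows can dominate.
totals = {segment: sum(values) for segment, values in orders.items()}

# Mean measures per-row magnitude, independent of how many rows there are.
averages = {segment: mean(values) for segment, values in orders.items()}

largest_by_sum = max(totals, key=totals.get)
largest_by_mean = max(averages, key=averages.get)
```

Here Budget has the larger total (150 against 140) purely because it has more orders, while VIP has the larger average order (70 against 25). Which of the two views is the right input to the index depends on the question being asked.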
Sort Order: Worst-First
Pair sort_by="value" with sort_order="descending" to flip the default leaderboard. The lowest indexes land at the top — useful when the chart's job is triage rather than celebration, and you want the eye to land on the worst case first.
# sort_order="descending" puts the lowest indexes at the top, useful for triaging underperformers.
index.plot(
    region_performance,
    value_col="performance_score",
    group_col="region",
    index_col="product_category",
    value_to_index="Electronics",
    sort_by="value",
    sort_order="descending",
    eyebrow="Regional ranking",
    title="South and Mountain need the most attention on Electronics performance",
    subtitle="Performance index by region, ranked worst-first, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
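It can look backwards that "ascending" puts the strongest group at the top and "descending" puts the weakest there. A plausible explanation, assumed here rather than confirmed by the text above, is that rows are drawn bottom-to-top in sorted order, as matplotlib's barh does when the y-axis is not inverted. The sketch below models only that ordering; the `indexes` values and the helper name are hypothetical.

```python
# Hypothetical index values per region.
indexes = {"North": 135.0, "South": 62.0, "Mountain": 71.0, "East": 104.0}


def chart_order_top_to_bottom(indexes, sort_order):
    """Return group names as they would read from the top of the chart down.

    Assumes rows are drawn bottom-to-top in sorted order (first sorted row
    at the bottom of the chart).
    """
    reverse = sort_order == "descending"
    drawn_bottom_to_top = sorted(indexes, key=indexes.get, reverse=reverse)
    return drawn_bottom_to_top[::-1]


leaders_first = chart_order_top_to_bottom(indexes, "ascending")
worst_first = chart_order_top_to_bottom(indexes, "descending")
```

Under that assumption, an ascending sort reads North, East, Mountain, South from the top, and a descending sort reads South, Mountain, East, North.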
Custom Highlight Range
The shaded band marks an "acceptable" performance range. The default is (80, 120); tighten it (e.g. (90, 110)) when small deviations matter, or widen it when you only care about extremes. Bars whose tips fall outside the band stand out visually as the ones that need attention.
index.plot(
    dept_efficiency,
    value_col="efficiency_metric",
    group_col="department",
    index_col="product_category",
    value_to_index="Electronics",
    sort_by="value",
    sort_order="ascending",
    highlight_range=(90, 110),
    eyebrow="Efficiency",
    title="Three of six departments fall outside the tighter 90 to 110 efficiency band",
    subtitle="Efficiency index by department, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
Color by Threshold
With color_by_threshold=True, bars are colored by where they fall relative to highlight_range: values at or above the upper threshold use plot.color.positive (green by default), values at or below the lower threshold use plot.color.negative (red), and anything in between uses plot.color.neutral (gray). The reader no longer needs to compare bar tips to the band; color does the work.
index.plot(
    region_performance,
    value_col="performance_score",
    group_col="region",
    index_col="product_category",
    value_to_index="Electronics",
    sort_by="value",
    sort_order="ascending",
    color_by_threshold=True,
    eyebrow="Regional ranking",
    title="Color flags which regions clear the 80 to 120 performance band",
    subtitle="Performance index by region, color-coded by threshold, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
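The threshold rule described above can be written as a small function. This is an illustrative sketch of the rule as stated, not the library's code; `classify` is a hypothetical helper, and the returned strings stand in for the plot.color.positive, plot.color.negative, and plot.color.neutral options.

```python
def classify(index_value, highlight_range=(80, 120)):
    """Map an index value to the color role implied by highlight_range."""
    lower, upper = highlight_range
    if index_value >= upper:
        return "positive"  # at or above the upper threshold
    if index_value <= lower:
        return "negative"  # at or below the lower threshold
    return "neutral"       # strictly inside the band


# Boundary values count as outside the band in this reading of the rule.
roles = {value: classify(value) for value in (135, 120, 100, 80, 62)}

# Tightening the band reclassifies values that were neutral before.
default_band = classify(115)
tight_band = classify(115, highlight_range=(90, 110))
```

An index of 115 is neutral under the default (80, 120) band but positive under (90, 110), which is why narrowing the band makes more bars stand out.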
Top N Filtering
Pass top_n=N to keep only the N highest-indexing groups. Useful when there are too many groups to fit comfortably and you only care about the leaders. Combine with sort_by="value" and sort_order="ascending" to put the strongest at the top of the chart.
index.plot(
    subcategory_volume,
    value_col="sales_volume",
    group_col="product_subcategory",
    index_col="product_category",
    value_to_index="Electronics",
    top_n=8,
    sort_by="value",
    sort_order="ascending",
    eyebrow="Subcategory leaders",
    title="Top 8 subcategories all over-index Electronics",
    subtitle="Sales volume index, top 8 subcategories shown, baseline of 100 set by Electronics",
    x_label="Index",
    source_text="Source: Product Sales Database - Q4 2024",
)
plt.show()
Bottom N Filtering
Mirror of top_n — bottom_n=N keeps only the N lowest-indexing groups. Pair with sort_order="descending" to surface the worst case at the top of the chart for triage. You can use top_n and bottom_n together to show the extremes and drop everything in the middle.
index.plot(
    london_stores,
    value_col="sales_performance",
    group_col="store_location",
    index_col="product_category",
    value_to_index="Electronics",
    bottom_n=4,
    sort_by="value",
    sort_order="descending",
    eyebrow="Underperformers",
    title="Four outer-London locations under-index Electronics by 12 to 50 points",
    subtitle="Sales performance index, bottom 4 locations shown, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
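The effect of top_n and bottom_n, alone and together, can be sketched as a selection over already-computed index values. The `indexes` data and the `keep_extremes` helper below are hypothetical; the sketch mirrors the behavior described above rather than the library's internals.

```python
# Hypothetical index values per store location.
indexes = {"A": 150.0, "B": 130.0, "C": 105.0, "D": 98.0, "E": 70.0, "F": 50.0}


def keep_extremes(indexes, top_n=None, bottom_n=None):
    """Keep the top_n highest and bottom_n lowest groups; drop the middle."""
    if top_n is None and bottom_n is None:
        return dict(indexes)  # nothing to filter
    ranked = sorted(indexes, key=indexes.get, reverse=True)
    keep = set()
    if top_n:
        keep.update(ranked[:top_n])
    if bottom_n:
        keep.update(ranked[-bottom_n:])
    # Preserve highest-to-lowest order in the result.
    return {group: indexes[group] for group in ranked if group in keep}


leaders = keep_extremes(indexes, top_n=2)
laggards = keep_extremes(indexes, bottom_n=2)
both_ends = keep_extremes(indexes, top_n=2, bottom_n=2)
```

With both set to 2, the middle groups C and D are dropped and only the two strongest and two weakest remain.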
Filter by Threshold
filter_above=N keeps only groups whose raw index exceeds N (e.g. filter_above=115 keeps groups indexing above 115); filter_below=N is the mirror, keeping only groups whose raw index is below N. Both thresholds are expressed in the same units as the chart axis, so 100 is the baseline and 120 reads as "20 points above". Use them to surface the standouts on either end and hide everything close to baseline.
When both parameters are set, a group must satisfy both conditions (a logical AND), so you can't get both extremes in a single chart; render two charts side by side instead, one per direction. (Pair filter_below with sort_order="descending" to put the worst case at the top.)
index.plot(
    sales_channels,
    value_col="performance_metric",
    group_col="sales_channel",
    index_col="product_category",
    value_to_index="Electronics",
    filter_above=115,
    sort_by="value",
    sort_order="ascending",
    eyebrow="Channel ranking",
    title="All four digital channels over-index Electronics by 15 or more points",
    subtitle="Performance index, channels indexing above 115, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
index.plot(
    sales_channels,
    value_col="performance_metric",
    group_col="sales_channel",
    index_col="product_category",
    value_to_index="Electronics",
    filter_below=80,
    sort_by="value",
    sort_order="descending",
    eyebrow="Channel ranking",
    title="All four traditional channels under-index Electronics by 20 or more points",
    subtitle="Performance index, channels indexing below 80, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
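The reason a single call cannot show both extremes follows directly from the AND rule: no value is simultaneously above 115 and below 80. A minimal sketch of that logic, using hypothetical `indexes` data and a hypothetical `apply_filters` helper (not the library's code):

```python
# Hypothetical index values per sales channel.
indexes = {"Web": 140.0, "App": 125.0, "Store": 101.0, "Catalog": 75.0, "Phone": 60.0}


def apply_filters(indexes, filter_above=None, filter_below=None):
    """Keep groups that satisfy every threshold that is set (logical AND)."""
    return {
        group: value
        for group, value in indexes.items()
        if (filter_above is None or value > filter_above)
        and (filter_below is None or value < filter_below)
    }


standouts_high = apply_filters(indexes, filter_above=115)
standouts_low = apply_filters(indexes, filter_below=80)

# Both thresholds together: nothing is above 115 AND below 80.
both = apply_filters(indexes, filter_above=115, filter_below=80)
```

Here `both` comes back empty, whereas the two single-threshold calls return the high and low standouts separately, matching the two-chart approach used above.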
Include Only Specific Groups
Pass include_only_groups=[...] to restrict the chart to a hand-picked subset of group values. Useful when the underlying data covers many regions/channels/segments but the analysis is about a specific competitive set, geography, or peer group. Anything not in the list is silently dropped before plotting.
index.plot(
    market_regions,
    value_col="market_share",
    group_col="market_region",
    index_col="product_category",
    value_to_index="Electronics",
    include_only_groups=["North America", "Europe", "Asia Pacific", "Latin America"],
    sort_by="value",
    sort_order="ascending",
    eyebrow="Market regions",
    title="North America leads Electronics share; Latin America trails the other three majors",
    subtitle="Market share index by region, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
Exclude Specific Groups
Pass exclude_groups=[...] to drop specific group values from the chart. Useful for keeping out test/internal segments, deprecated regions, or any outliers that distort the visual. Everything else is kept.
index.plot(
    customer_types,
    value_col="revenue_contribution",
    group_col="customer_type",
    index_col="product_category",
    value_to_index="Electronics",
    exclude_groups=["Test", "Internal"],
    sort_by="value",
    sort_order="ascending",
    eyebrow="Customer types",
    title="VIP and Corporate customers skew toward Electronics; Wholesale skews away",
    subtitle="Revenue contribution index by customer type, test data excluded, baseline of 100 set by Electronics",
    x_label="Index",
)
plt.show()
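include_only_groups and exclude_groups behave like an allow-list and a deny-list over the group values. The sketch below shows each in isolation with hypothetical helpers and data; it does not cover passing both at once, since the text above does not say how the library treats that case.

```python
def include_only(groups, include_only_groups):
    """Allow-list: keep only the named groups, preserving input order."""
    allowed = set(include_only_groups)
    return [group for group in groups if group in allowed]


def exclude(groups, exclude_groups):
    """Deny-list: drop the named groups and keep everything else."""
    denied = set(exclude_groups)
    return [group for group in groups if group not in denied]


# Hypothetical group values as they might appear in the data.
customer_types = ["VIP", "Corporate", "Retail", "Wholesale", "Test", "Internal"]

peer_set = include_only(customer_types, ["VIP", "Corporate"])
without_test = exclude(customer_types, ["Test", "Internal"])
```

The allow-list suits a hand-picked comparison set; the deny-list suits removing a few known-bad values while keeping whatever else the data contains.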