Price Plot

Bubble chart visualizations for price distribution analysis across categories.

The bubble chart stacks price bands along the y-axis and places categories (retailers, countries, and so on) along the x-axis. Each bubble's size represents the percentage of a category's products that fall in that price band.

Core Features

  • Price Band Analysis: Automatically bins price data into ranges using pandas.cut()
  • Categorical Grouping: Groups data by categorical columns (retailers, countries, etc.)
  • Bubble Sizing: Bubble sizes represent percentage of products in each price band per group
  • Flexible Binning: Supports both integer (equal-width bins) and array (custom boundaries) inputs
  • Grid Layout: X-axis shows categories, Y-axis shows price bands
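
The two binning modes can be illustrated directly with pandas.cut(). This is a minimal sketch, not the library's own code: the sample prices are made up, and it only assumes the include_lowest=True setting that appears in the source listing further down.

```python
import pandas as pd

# Hypothetical sample prices, for illustration only.
prices = pd.Series([1.0, 2.5, 4.0, 7.5, 10.0], name="unit_price")

# Integer input: pandas splits the observed price range into
# that many equal-width bins.
equal_width = pd.cut(prices, bins=3, include_lowest=True)

# List input: the values are used as explicit bin boundaries.
custom = pd.cut(prices, bins=[0, 2, 5, 10], include_lowest=True)

print(equal_width.cat.categories)
print(custom.cat.categories)
```

With an integer, the band edges depend on the minimum and maximum of the data, so two datasets binned separately may end up with different bands. A list of boundaries keeps the bands fixed across datasets.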

Use Cases

  • Retailer Price Comparison: Compare price distributions across different retailers
  • Regional Price Analysis: Analyze price positioning by country/region
  • Competitive Pricing: Identify pricing gaps and opportunities
  • Price Architecture Visualization: Visualize competitive pricing landscapes

Limitations

  • Pandas DataFrame Only: No Ibis table support
  • Product-Level Data: Expects one row per product; the function counts rows per price band itself
  • Numeric Price Column: Requires numeric price/value column for binning
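
One way to get raw data into the expected shape is sketched below. The column names (product_id, retailer, unit_price) and the cleaning rules are assumptions for illustration; the right de-duplication key depends on your own data.

```python
import pandas as pd

# Hypothetical raw extract: one duplicated row and one non-numeric price.
raw = pd.DataFrame(
    {
        "product_id": ["p1", "p1", "p2", "p3"],
        "retailer": ["A", "A", "A", "B"],
        "unit_price": ["4.99", "4.99", "n/a", "12.50"],
    }
)

# Keep one row per product within each retailer.
products = raw.drop_duplicates(subset=["retailer", "product_id"]).copy()

# Coerce prices to numbers; unparseable values become NaN and are dropped.
products["unit_price"] = pd.to_numeric(products["unit_price"], errors="coerce")
products = products.dropna(subset=["unit_price"])

print(products)
```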

plot(df, value_col, group_col, bins, title=None, eyebrow=None, subtitle=None, x_label=None, y_label=None, legend_title=None, ax=None, source_text=None, move_legend_outside=False, **kwargs)

Creates a bubble chart visualization showing price distribution analysis across categories.

The chart displays price bands as vertical layers with bubble sizes representing the percentage of products in each price range for different groups (retailers, countries, etc.).
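
The bubble-size calculation can be reproduced outside the function. The sketch below mirrors the groupby steps in the source listing further down, using made-up data; the scale factor of 2000 is the default the listing uses for the s keyword.

```python
import pandas as pd

# Hypothetical product-level data: four products at retailer A, two at B.
df = pd.DataFrame(
    {
        "retailer": ["A", "A", "A", "A", "B", "B"],
        "unit_price": [1.0, 2.0, 3.0, 8.0, 2.0, 9.0],
    }
)
df["price_bin"] = pd.cut(df["unit_price"], bins=[0, 5, 10], include_lowest=True)

# Products per retailer, and products per (retailer, price band).
group_totals = df.groupby("retailer", observed=True).size()
bin_counts = (
    df.groupby(["retailer", "price_bin"], observed=True).size().unstack(fill_value=0)
)

# Share of each retailer's products in each band (each row sums to 1).
proportions = bin_counts.div(group_totals, axis=0)

# Marker areas passed to scatter: proportion times the scale factor.
bubble_sizes = proportions * 2000
print(proportions)
```

Because each share is relative to its own group's total, a retailer with few products still gets bubbles comparable in size to a retailer with many.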

Parameters:

  • df (DataFrame, required): Input DataFrame containing product-level data.
  • value_col (str, required): Column containing the price/value data (e.g., "unit_price").
  • group_col (str, required): Column containing the categorical grouping (e.g., "retailer").
  • bins (int | list[float], required): Either number of equal-width bins (int) or custom bin boundaries (list).
  • title (str, default None): The title of the plot.
  • eyebrow (str, default None): Small uppercase label rendered above the title.
  • subtitle (str, default None): Supporting copy rendered below the title.
  • x_label (str, default None): The label for the x-axis.
  • y_label (str, default None): The label for the y-axis.
  • legend_title (str, default None): The title for the legend.
  • ax (Axes, default None): The Matplotlib Axes object to plot on.
  • source_text (str, default None): Text to be displayed as a source at the bottom of the plot.
  • move_legend_outside (bool, default False): Whether to move the legend outside the plot area.
  • **kwargs (Any, default {}): Additional keyword arguments for the scatter plot function.

Returns:

  • SubplotBase: The Matplotlib Axes object with the generated bubble chart.

Raises:

  • ValueError: If DataFrame is empty, columns don't exist, or bins parameter is invalid.
  • KeyError: If specified columns are not found in DataFrame.
  • TypeError: If bins parameter has invalid type.
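
A call might look like the sketch below. The import path is inferred from the source file location (openretailscience/plots/price.py) and should be treated as an assumption, as should the sample data and the wrapper name draw_price_plot. The import is deferred into the wrapper so the data setup runs even where the package is not installed.

```python
import pandas as pd

# Hypothetical product-level data: one row per product.
df = pd.DataFrame(
    {
        "retailer": ["A", "A", "A", "B", "B", "B"],
        "unit_price": [2.5, 7.0, 12.0, 4.0, 9.5, 18.0],
    }
)


def draw_price_plot(data: pd.DataFrame):
    """Draw the bubble chart for `data` and return the Axes."""
    # Assumed import path, based on the source file location.
    from openretailscience.plots import price

    return price.plot(
        data,
        value_col="unit_price",
        group_col="retailer",
        bins=[0, 5, 10, 20],  # custom price band boundaries
        title="Price distribution by retailer",
        x_label="Retailer",
        y_label="Price band",
        legend_title="Retailer",
        move_legend_outside=True,
    )
```

Calling draw_price_plot(df) returns the Axes, which can then be saved through its figure, for example with ax.figure.savefig("price_plot.png").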

Source code in openretailscience/plots/price.py
def plot(
    df: pd.DataFrame,
    value_col: str,
    group_col: str,
    bins: int | list[float],
    title: str | None = None,
    eyebrow: str | None = None,
    subtitle: str | None = None,
    x_label: str | None = None,
    y_label: str | None = None,
    legend_title: str | None = None,
    ax: Axes | None = None,
    source_text: str | None = None,
    move_legend_outside: bool = False,
    **kwargs: Any,  # noqa: ANN401
) -> SubplotBase:
    """Creates a bubble chart visualization showing price distribution analysis across categories.

    The chart displays price bands as vertical layers with bubble sizes representing the percentage
    of products in each price range for different groups (retailers, countries, etc.).

    Args:
        df (pd.DataFrame): Input DataFrame containing product-level data.
        value_col (str): Column containing the price/value data (e.g., "unit_price").
        group_col (str): Column containing the categorical grouping (e.g., "retailer").
        bins (int | list[float]): Either number of equal-width bins (int) or custom bin boundaries (list).
        title (str, optional): The title of the plot. Defaults to None.
        eyebrow (str, optional): Small uppercase label rendered above the title. Defaults to None.
        subtitle (str, optional): Supporting copy rendered below the title. Defaults to None.
        x_label (str, optional): The label for the x-axis. Defaults to None.
        y_label (str, optional): The label for the y-axis. Defaults to None.
        legend_title (str, optional): The title for the legend. Defaults to None.
        ax (Axes, optional): The Matplotlib Axes object to plot on. Defaults to None.
        source_text (str, optional): Text to be displayed as a source at the bottom of the plot. Defaults to None.
        move_legend_outside (bool, optional): Whether to move the legend outside the plot area. Defaults to False.
        **kwargs (Any): Additional keyword arguments for the scatter plot function.

    Returns:
        SubplotBase: The Matplotlib Axes object with the generated bubble chart.

    Raises:
        ValueError: If DataFrame is empty, columns don't exist, or bins parameter is invalid.
        KeyError: If specified columns are not found in DataFrame.
        TypeError: If bins parameter has invalid type.
    """
    # Validate inputs and get clean data
    df_clean, bins = _validate_inputs(df, value_col, group_col, bins)

    # Create price bins
    df_clean["price_bin"] = pd.cut(df_clean[value_col], bins=bins, include_lowest=True)

    # Calculate percentage distribution for each group
    group_totals = df_clean.groupby(group_col, observed=True).size()
    bin_counts = df_clean.groupby([group_col, "price_bin"], observed=True).size().unstack(fill_value=0)

    # Convert to proportions (0-1 range)
    proportions = bin_counts.div(group_totals, axis=0)

    ax = ax or plt.gca()

    # Get unique groups and bins
    groups = proportions.index.tolist()
    price_bins = proportions.columns.tolist()

    # Set up color mapping
    colors = get_plot_colors(len(groups))

    alpha = kwargs.pop("alpha", 0.7)
    s_scale = kwargs.pop("s", 2000)
    edge_color = kwargs.pop("edgecolor", "black")  # black stroke around bubbles
    line_width = kwargs.pop("linewidth", 1.5)  # Stroke width

    # Validate that we have some data
    if proportions.max().max() == 0 or pd.isna(proportions.max().max()):
        raise ValueError("All proportions are zero - no data falls within the specified bins")

    # Melt to get all (group, price_bin) combinations with their proportions
    melted = proportions.reset_index().melt(id_vars=group_col, var_name="price_bin", value_name="proportion")
    # Filter out zero proportions to avoid invisible bubbles
    melted = melted[melted["proportion"] > 0]

    if len(melted) > 0:  # Only plot if there are non-zero proportions
        x_positions = [groups.index(group) for group in melted[group_col]]
        y_positions = [price_bins.index(price_bin) for price_bin in melted["price_bin"]]
        # Calculate bubble sizes using absolute proportion values for cross-group comparison
        bubble_sizes = (melted["proportion"] * s_scale).to_numpy()
        bubble_colors = [colors[groups.index(group)] for group in melted[group_col]]

        ax.scatter(
            x_positions,
            y_positions,
            s=bubble_sizes,
            c=bubble_colors,
            alpha=alpha,
            edgecolor=edge_color,
            linewidth=line_width,
            **kwargs,
        )

    ax.set_xticks(range(len(groups)))
    ax.set_xticklabels(groups)
    ax.set_yticks(range(len(price_bins)))

    # Reserve half a unit on each side to give bubbles breathing room even at high s_scale.
    ax.set_xlim(-0.5, len(groups) - 0.5)
    ax.set_ylim(-0.5, len(price_bins) - 0.5)

    # pd.cut(..., include_lowest=True) extends the lowest bin's left edge by ~0.1% of the data range below the minimum
    # so the minimum value is included; when that epsilon lands within rounding distance of zero, naive formatting
    # produces "-0.0", which is meaningless to readers.
    formatted_labels = [f"{_fmt_bin_edge(bin_.left)} - {_fmt_bin_edge(bin_.right)}" for bin_ in price_bins]

    ax.set_yticklabels(formatted_labels)

    # The single ax.scatter() call above draws all bubbles in one collection, so
    # matplotlib's legend auto-discovery has no per-group handles to find. Seed
    # one invisible labeled marker per group so standard_graph_styles can build
    # the legend, and so tight_layout can reserve room for it when
    # move_legend_outside=True. The bubble's edge stroke is omitted on the
    # proxies — at legend marker size the stroke would dominate and wash out
    # the fill color.
    if len(groups) > 1:
        for i, group in enumerate(groups):
            ax.scatter([], [], c=[colors[i]], alpha=alpha, linewidths=0, label=group)

    return standard_graph_styles(
        ax=ax,
        title=title,
        eyebrow=eyebrow,
        subtitle=subtitle,
        x_label=x_label,
        y_label=y_label,
        legend_title=legend_title,
        move_legend_outside=move_legend_outside,
        show_legend=len(groups) > 1,
        source_text=source_text,
        grid_axis="y",
    )
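
The listing calls a helper, _fmt_bin_edge, whose body is not shown here. As the comment in the listing explains, it exists to avoid "-0.0" labels when the lowest bin edge sits just below zero. The sketch below shows one way such a formatter could work. The name fmt_bin_edge, the decimals parameter and the implementation are all hypothetical and are not the library's code.

```python
def fmt_bin_edge(edge: float, decimals: int = 1) -> str:
    """Format a bin edge, rendering negative zero as plain zero.

    Hypothetical helper for illustration; not the library's _fmt_bin_edge.
    """
    rounded = round(edge, decimals)
    # Adding 0.0 turns IEEE negative zero into positive zero and
    # leaves every other value unchanged.
    rounded = rounded + 0.0
    return f"{rounded:.{decimals}f}"


# A slightly negative lower edge, as produced when the first bin
# boundary is 0 and include_lowest=True nudges it downward.
naive_label = f"{-0.001:.1f}"
clean_label = fmt_bin_edge(-0.001)
print(naive_label, clean_label)
```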