Histogram Plot
This module creates histograms from pandas DataFrames or Series.
It visualizes the distribution of one or more value columns and can optionally split the data by a categorical column, producing one histogram per category so that distributions can be compared side by side.
Core Features
- Single or Multiple Histograms: Plot one or more value columns (`value_col`) as histograms. For example, visualize the distribution of a single metric or compare multiple metrics simultaneously.
- Grouped Histograms: Create separate histograms for each unique value in `group_col` (e.g., product categories or regions), allowing for easy comparison of distributions across groups.
- Outlier-Preserving Clipping: Use `clip_range=(lower, upper)` to clamp out-of-range values to the boundary so they pile up at the edge bins instead of being dropped. Pass `None` on either side for one-sided clipping (e.g., `clip_range=(0, None)` to clamp negatives only). To drop out-of-range values instead, pass matplotlib's native `range=(lower, upper)` through `**kwargs`.
- Comprehensive Customization: Customize plot titles, axis labels, and legends, with the option to move the legend outside the plot.
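The difference between clamping and dropping can be sketched with NumPy alone. This is an illustration of the two behaviours described above, not the module's actual implementation; the sample values and bin count are made up for the example.

```python
import numpy as np

# Illustrative data: two values fall outside the 0..10 window.
values = np.array([-5.0, 1.0, 2.0, 3.0, 50.0])
lower, upper = 0.0, 10.0

# Clamping (the clip_range idea): outliers are moved to the nearest boundary,
# so they are counted in the first and last bins.
clipped = np.clip(values, lower, upper)
clipped_counts, edges = np.histogram(clipped, bins=5, range=(lower, upper))

# Dropping (the native range idea): values outside the window are ignored.
dropped_counts, _ = np.histogram(values, bins=5, range=(lower, upper))

print(clipped_counts.tolist())  # all five observations are counted
print(dropped_counts.tolist())  # only the three in-range observations remain
```

With clamping, the total count still equals the number of observations, which is what keeps the outlier mass visible at the edges.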
Use Cases
- Distribution Analysis: Visualize the distribution of key metrics like revenue, sales, or user activity using single or multiple histograms.
- Group Comparisons: Compare distributions across different groups, such as product categories, geographic regions, or customer segments. For instance, plot histograms to show how sales vary across different product categories.
- Outlier Visibility: Use `clip_range` to keep extreme values visible at the edge bins rather than dropping them, so the shape of the central mass is readable without hiding how much sits beyond it.
Limitations and Handling of Data
- Pre-Aggregated Data Required: This module does not perform any data aggregation, so all data must be pre-aggregated before being passed in for plotting.
- Grouped Histograms: If `group_col` is provided, the data will be pivoted so that each unique value in `group_col` becomes a separate histogram. Otherwise, a single histogram is plotted.
- Series Support: The module can also handle pandas Series, though `group_col` cannot be provided when plotting a Series.
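The pivot step for grouped histograms can be pictured with plain pandas. The exact call the module uses is not shown here, so the snippet below is only an assumed equivalent; the column names `category` and `revenue` are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per observation.
df = pd.DataFrame(
    {
        "category": ["A", "A", "B", "B", "B"],
        "revenue": [10.0, 12.0, 7.0, 9.0, 11.0],
    }
)

# Pivot so each unique category becomes its own column.
# Rows that belong to another category are filled with NaN.
wide = df.pivot(columns="category", values="revenue")

print(wide.columns.tolist())
print(wide.count().to_dict())  # non-null observations per category
```

Each column of the wide frame then corresponds to one histogram, and the NaN padding is simply ignored when the values are binned.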
Additional Features
- Outlier-Preserving Clipping: `clip_range=(lower, upper)` clamps values outside the bounds to the nearest boundary so the edge bins absorb the outlier mass. This differs from matplotlib's native `range`, which drops out-of-range values entirely. The two are mutually exclusive.
- Legend Customization: For multiple histograms, you can add legends, including the option to move the legend outside the plot for clarity.
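One way to picture one-sided bounds and the mutual exclusivity with `range` is the pure-Python sketch below. The helper `clamp_values` is hypothetical and not part of the module, and raising `ValueError` on the conflict is an assumption made for the example.

```python
def clamp_values(values, clip_range=None, **kwargs):
    """Hypothetical helper: clamp values to clip_range, allowing None bounds."""
    # Assumption: supplying both options is treated as an error.
    if clip_range is not None and "range" in kwargs:
        raise ValueError("clip_range and range are mutually exclusive")
    if clip_range is None:
        return list(values)

    lower, upper = clip_range
    clamped = []
    for value in values:
        if lower is not None and value < lower:
            value = lower  # pull low outliers up to the lower bound
        if upper is not None and value > upper:
            value = upper  # pull high outliers down to the upper bound
        clamped.append(value)
    return clamped


one_sided = clamp_values([-3, 4, 120], clip_range=(0, None))
two_sided = clamp_values([-3, 4, 120], clip_range=(0, 100))
print(one_sided, two_sided)
```

A `None` bound leaves that side untouched, so `clip_range=(0, None)` only affects negative values.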
plot(df, value_col=None, group_col=None, title=None, eyebrow=None, subtitle=None, x_label=None, y_label=None, legend_title=None, ax=None, source_text=None, move_legend_outside=False, clip_range=None, use_hatch=False, **kwargs)
Plots a histogram of value_col, optionally split by group_col.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame \| Series` | The dataframe (or series) to plot. | required |
| `value_col` | `str` or `list of str` | The column(s) to plot. Can be a list of columns for multiple histograms. | `None` |
| `group_col` | `str` | The column used to define different histograms. | `None` |
| `title` | `str` | The title of the plot. | `None` |
| `eyebrow` | `str` | Small uppercase label rendered above the title. | `None` |
| `subtitle` | `str` | Supporting copy rendered below the title. | `None` |
| `x_label` | `str` | The x-axis label. | `None` |
| `y_label` | `str` | The y-axis label. | `None` |
| `legend_title` | `str` | The title of the legend. | `None` |
| `ax` | `Axes` | Matplotlib axes object to plot on. | `None` |
| `source_text` | `str` | The source text to add to the plot. | `None` |
| `move_legend_outside` | `bool` | Move the legend outside the plot. | `False` |
| `clip_range` | `tuple[float \| None, float \| None]` | `(lower, upper)` bounds. Values outside the bounds are clamped to the nearest boundary so they accumulate in the edge bins instead of being dropped. Pass `None` on either side for one-sided clipping. Mutually exclusive with matplotlib's native `range`. | `None` |
| `use_hatch` | `bool` | Whether to use hatching for the bars. | `False` |
| `**kwargs` | `Any` | Additional keyword arguments for Pandas' … | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `SubplotBase` | `SubplotBase` | The matplotlib axes object. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If both … |
Source code in openretailscience/plots/histogram.py