Scatter Plot
This module provides functionality for creating scatter plots and bubble charts from pandas DataFrames.
It is designed to visualize relationships between variables, highlight distributions, and compare categories using scatter points, with optional variable point sizes for bubble charts.
Core Features
- Flexible X-Axis Handling: Uses the index or a specified x-axis column (x_col) for plotting.
- Multiple Scatter Groups: Supports plotting multiple columns (value_col) or groups (group_col).
- Bubble Chart Support: Variable point sizes via the size_col and size_scale parameters.
- Point Labels: Text labels with automatic positioning to avoid overlaps.
- Dynamic Color Mapping: Automatically selects a colormap based on the number of groups.
Use Cases
- Category-Based Scatter Plots: Compare different categories using scatter points.
- Bubble Charts: Visualize three dimensions of data with x, y positions and point sizes.
- Labeled Scatter Plots: Identify specific data points with text labels (e.g., product names, store IDs).
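As a rough sketch of the bubble chart and labeled scatter use cases, the three dimensions can be mapped onto the documented parameters as shown here. The column names (units_sold, revenue, store_count, product_name) are hypothetical, and the import path is an assumption inferred from the source file location openretailscience/plots/scatter.py. The call is wrapped in a function so the snippet can be read and loaded without the library installed.

```python
# Hypothetical column names; only the parameter names come from the signature.
bubble_kwargs = {
    "value_col": "revenue",       # y position
    "x_col": "units_sold",        # x position
    "size_col": "store_count",    # bubble size (third dimension)
    "size_scale": 2.0,            # multiplier applied to store_count
    "label_col": "product_name",  # text label per point (value_col is a single str)
    "title": "Revenue vs. units sold",
}


def draw_bubble_chart(df):
    """Plot a pre-aggregated DataFrame as a labeled bubble chart."""
    # Assumed import path, inferred from openretailscience/plots/scatter.py.
    from openretailscience.plots import scatter

    return scatter.plot(df, **bubble_kwargs)
```

Keeping value_col as a single string matters here, because label_col is documented as unsupported when value_col is a list.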
Limitations and Warnings
- Pre-Aggregated Data Required: Data should be pre-aggregated before being passed to the function.
- Label Limitations: Point labels are not supported when value_col is a list (raises ValueError).
- Size Column Requirements: size_col must contain numeric, non-negative values.
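The pre-aggregation requirement means each plotted point should correspond to one already-summarized row. The following is a minimal, dependency-free sketch of that step using plain Python records; the field names and the aggregate_by helper are hypothetical and are not part of the module.

```python
from collections import defaultdict

# Hypothetical raw transactions: several rows per category.
transactions = [
    {"category": "Dairy", "units": 3, "revenue": 6.0},
    {"category": "Dairy", "units": 2, "revenue": 4.5},
    {"category": "Bakery", "units": 1, "revenue": 2.5},
    {"category": "Bakery", "units": 4, "revenue": 10.0},
]


def aggregate_by(rows, key, value_fields):
    """Sum value_fields per distinct key, returning one row per key."""
    totals = defaultdict(lambda: dict.fromkeys(value_fields, 0))
    for row in rows:
        for field in value_fields:
            totals[row[key]][field] += row[field]
    return [{key: k, **sums} for k, sums in totals.items()]


# One row per category, ready to become one scatter point each.
aggregated = aggregate_by(transactions, "category", ["units", "revenue"])
```

With a pandas DataFrame, the same result would typically come from a groupby followed by a sum, for example `df.groupby("category", as_index=False)[["units", "revenue"]].sum()`.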
plot(df, value_col, x_label=None, y_label=None, title=None, x_col=None, group_col=None, size_col=None, size_scale=1.0, ax=None, source_text=None, legend_title=None, move_legend_outside=False, label_col=None, label_kwargs=None, **kwargs)
Plots a scatter chart for the given value_col over x_col or index, with optional grouping by group_col.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame or Series | The dataframe or series to plot. | required |
| value_col | str or list of str | The column(s) to plot. | required |
| x_label | str | The x-axis label. | None |
| y_label | str | The y-axis label. | None |
| title | str | The title of the plot. | None |
| x_col | str | The column to be used as the x-axis. If None, the index is used. | None |
| group_col | str | The column used to define different scatter groups. | None |
| size_col | str | The column name containing values to determine point sizes. If None, all points have uniform size. Creates bubble charts when specified. When used with multiple value_col columns, the same size values apply to all series. | None |
| size_scale | float | Scaling factor for point sizes. Default: 1.0. Actual size = size_col_value * size_scale. | 1.0 |
| ax | Axes | Matplotlib axes object to plot on. | None |
| source_text | str | The source text to add to the plot. | None |
| legend_title | str | The title of the legend. | None |
| move_legend_outside | bool | Move the legend outside the plot. | False |
| label_col | str | Column name containing text labels for each point. Not supported when value_col is a list. Defaults to None. | None |
| label_kwargs | dict | Keyword arguments passed to textalloc.allocate(). Common options: textsize, nbr_candidates, min_distance, max_distance, draw_lines. By default, draw_lines=False to avoid lines connecting labels to points. Defaults to None. | None |
| **kwargs | Any | Additional keyword arguments for matplotlib scatter function. | {} |
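The size rule stated for size_scale (actual size = size_col_value * size_scale) can be sketched in plain Python. The helper names and the store_counts values are hypothetical. The second helper shows one way to choose size_scale so that the largest bubble reaches a chosen size, which is a convenience for illustration and not something the module is documented to do.

```python
def scaled_sizes(size_values, size_scale=1.0):
    """Apply the documented rule: size_col_value * size_scale."""
    return [value * size_scale for value in size_values]


def scale_for_max_size(size_values, target_max):
    """Pick size_scale so that the largest value maps to target_max."""
    largest = max(size_values)
    if largest <= 0:
        raise ValueError("need at least one positive size value")
    return target_max / largest


store_counts = [10, 40, 200]  # hypothetical size_col values
size_scale = scale_for_max_size(store_counts, target_max=400)
sizes = scaled_sizes(store_counts, size_scale)
```

If the scaled values are passed to matplotlib's scatter as `s` (an assumption about the implementation), they are interpreted as marker areas in points squared, so doubling size_scale doubles a bubble's area, not its diameter.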
Returns:
| Name | Type | Description |
|---|---|---|
| SubplotBase | SubplotBase | The matplotlib axes object. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
| ValueError | If |
| KeyError | If |
| KeyError | If |
| ValueError | If |
| ValueError | If |
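Two of the conditions named under Limitations and Warnings can be checked before calling plot. The sketch below is a hypothetical pre-flight helper, not the module's own validation. It assumes ValueError for the size checks, which the documentation states only for the label_col case, and it does not cover the KeyError conditions listed above.

```python
from numbers import Real


def check_scatter_args(value_col, label_col=None, size_values=None):
    """Pre-flight checks mirroring the documented limitations."""
    if label_col is not None and isinstance(value_col, list):
        raise ValueError("label_col is not supported when value_col is a list")
    if size_values is not None:
        for value in size_values:
            # bool is a Real subclass, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                raise ValueError(f"size value {value!r} is not numeric")
            if value < 0:
                raise ValueError(f"size value {value!r} is negative")
```

Calling `check_scatter_args("revenue", label_col="product_name", size_values=[10, 40])` returns without error, while passing a list for value_col together with label_col raises ValueError.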
Source code in openretailscience/plots/scatter.py