
Distribution Metrics

ACV (All Commodity Volume) metric.

ACV measures total dollar sales across all products in a set of stores, expressed in millions ($MM).
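Here is a worked illustration of the definition. The store totals are made up and are not library output. Three stores selling $1.25M, $0.83M and $0.42M across all products give an ACV of 2.5 ($MM):

```python
# Hypothetical total dollar sales (all products) for three stores.
store_sales = [1_250_000, 830_000, 420_000]

# ACV: total dollar sales across all products, divided by 1,000,000 to express it in $MM.
acv_mm = sum(store_sales) / 1_000_000
print(acv_mm)  # 2.5
```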

Acv

Calculates ACV (All Commodity Volume) for a set of stores.

ACV represents total dollar sales across all products. With the default acv_scale_factor of 1,000,000, the result is expressed in millions ($MM). NaN values in the spend column are excluded from the sum.

Results are accessible via the table attribute (ibis Table) or the df property (materialized pandas DataFrame).
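The aggregation can be approximated in plain pandas for readers who want to check results by hand. This is a minimal sketch and not the library API. `acv_sketch` is a hypothetical name, and the sketch assumes the spend column is literally called `unit_spend`. It follows the same steps as the constructor: it validates the scale factor, normalizes a string `group_col` to a list, optionally groups, and divides the summed spend by `acv_scale_factor`.

```python
from __future__ import annotations

import pandas as pd


def acv_sketch(
    df: pd.DataFrame,
    group_col: str | list[str] | None = None,
    acv_scale_factor: float = 1_000_000,
    spend_col: str = "unit_spend",  # assumed column name
) -> pd.DataFrame:
    """Hypothetical pandas stand-in for the Acv aggregation (not the library API)."""
    if acv_scale_factor <= 0:
        raise ValueError("acv_scale_factor must be positive.")
    if group_col is None:
        # Series.sum() skips NaN by default, so missing spend is excluded.
        return pd.DataFrame({"acv": [df[spend_col].sum() / acv_scale_factor]})
    if isinstance(group_col, str):
        group_col = [group_col]
    # groupby(...).sum() also skips NaN within each group.
    summed = df.groupby(group_col)[spend_col].sum() / acv_scale_factor
    return summed.reset_index(name="acv")


sales = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 3],
        "unit_spend": [400_000.0, 600_000.0, float("nan"), 500_000.0],
    }
)
total = acv_sketch(sales)  # single row, acv == 1.5 ($MM)
by_store = acv_sketch(sales, group_col="store_id")  # one row per store
in_thousands = acv_sketch(sales, acv_scale_factor=1_000)  # acv == 1500.0 ($K)
```

Note one difference from a SQL backend. pandas returns 0 for a group whose spend is entirely NaN, as for store 2 here. A SQL `SUM` over only NULL values returns NULL, so the library's result for such a group may differ.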

Parameters:

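| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame \| Table | Transaction data containing at least a unit_spend column (the column name is read from the column.unit_spend option). | required |
| group_col | str \| list[str] \| None | Optional column(s) to group the ACV calculation by (e.g., store_id). Defaults to None, which returns a single total ACV. | None |
| acv_scale_factor | float | Divisor applied to the summed spend. The default of 1,000,000 expresses ACV in millions ($MM). Must be positive. | 1000000 |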

Raises:

| Type | Description |
| --- | --- |
| TypeError | If df is not a pandas DataFrame or an Ibis Table. |
| ValueError | If required columns are missing from the data or if acv_scale_factor is not positive. |

Source code in openretailscience/metrics/distribution/acv.py
```python
class Acv:
    """Calculates ACV (All Commodity Volume) for a set of stores.

    ACV represents total dollar sales across all products, expressed in millions ($MM).
    NaN values in the spend column are excluded from the sum.

    Results are accessible via the `table` attribute (ibis Table) or the `df` property
    (materialized pandas DataFrame).

    Args:
        df (pd.DataFrame | ibis.Table): Transaction data containing at least a unit_spend column.
        group_col (str | list[str] | None, optional): Optional column(s) to group the ACV calculation by
            (e.g., store_id). Defaults to None for total ACV.
        acv_scale_factor (float, optional): Factor to scale the ACV result (default is 1,000,000 for $MM).

    Raises:
        TypeError: If df is not a pandas DataFrame or an Ibis Table.
        ValueError: If required columns are missing from the data or if acv_scale_factor is not positive.
    """

    def __init__(
        self,
        df: pd.DataFrame | ibis.Table,
        *,
        group_col: str | list[str] | None = None,
        acv_scale_factor: float = 1_000_000,
    ) -> None:
        """Initializes the ACV calculation."""
        self._df: pd.DataFrame | None = None
        self.table: ibis.Table

        df = ensure_ibis_table(df)

        if acv_scale_factor <= 0:
            raise ValueError("acv_scale_factor must be positive.")

        unit_spend_col = get_option("column.unit_spend")

        if isinstance(group_col, str):
            group_col = [group_col]

        required_cols = [unit_spend_col]
        if group_col is not None:
            required_cols.extend(group_col)
        validate_columns(df, required_cols)

        if group_col is not None:
            df = df.group_by(group_col)

        self.table = df.aggregate(acv=_[unit_spend_col].sum() / acv_scale_factor)

    @property
    def df(self) -> pd.DataFrame:
        """Returns the materialized pandas DataFrame of ACV results.

        Returns:
            pd.DataFrame: DataFrame with ACV values. Cached after first access.
        """
        if self._df is None:
            self._df = self.table.execute()
        return self._df
```

df: pd.DataFrame property

Returns the materialized pandas DataFrame of ACV results.

Returns:

| Type | Description |
| --- | --- |
| DataFrame | pd.DataFrame: DataFrame with an acv column, plus the group_col column(s) when grouping. Computed on first access and cached. |
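The df property uses a lazy-caching pattern. The ibis expression stored in table is executed only the first time df is read, and the resulting DataFrame is reused after that. Below is a stripped-down sketch of the pattern. LazyResult and the execute callable are hypothetical stand-ins for the class and for table.execute, and the call counter exists only to make the caching visible.

```python
class LazyResult:
    """Hypothetical class mirroring the lazy, cached df property pattern."""

    def __init__(self, execute):
        self._execute = execute  # stands in for self.table.execute
        self._df = None
        self.execute_calls = 0  # instrumentation for this sketch only

    @property
    def df(self):
        # Run the (potentially expensive) query only on first access.
        if self._df is None:
            self.execute_calls += 1
            self._df = self._execute()
        return self._df


result = LazyResult(lambda: {"acv": [2.5]})
first = result.df
second = result.df  # served from the cache; execute is not called again
```

The result is cached on the instance, so df keeps returning the earlier result even if the underlying backend data changes. Executing table again, or creating a new object, reflects the new data.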

__init__(df, *, group_col=None, acv_scale_factor=1000000)

Initializes the ACV calculation.

% of Stores (Numeric Distribution) metric.

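% of Stores measures the share of stores that sell a given product. The store universe is the set of distinct stores that appear in the input data, so stores with no transactions are not counted. Every store counts equally regardless of its sales volume.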
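Here is a small illustration of the "every store counts equally" rule. The stores and sales figures are hypothetical. The product is sold by one very large store and one very small store, and each counts as exactly one of four stores. The sketch reports the share as a fraction. How the library scales its percentage column is determined by ratio_metric, which is not shown on this page.

```python
# Hypothetical stores with very different sales volumes.
store_sales = {"S1": 5_000_000, "S2": 40_000, "S3": 900_000, "S4": 15_000}

# Stores that sell the product: one very large store and one very small one.
stores_selling_product = {"S1", "S4"}

# Numeric distribution: every store counts once, so sales volume plays no role.
pct_of_stores = len(stores_selling_product) / len(store_sales)
print(pct_of_stores)  # 0.5
```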

PctOfStores

Calculates the percentage of stores selling each product.

This is the simplest, unweighted distribution metric (numeric distribution). It answers the question: "What fraction of stores carry this product?"

Results are accessible via the table attribute (ibis Table) or the df property (materialized pandas DataFrame).
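The calculation without group_col can be sketched in pandas. The steps mirror the constructor: deduplicate store/product pairs, count distinct stores per product, and divide by the number of distinct stores in the data. This is an illustration under assumptions, not the library API. The columns store_id and product_id are assumed names, and n_stores and pct_stores are illustrative output names rather than the library's configured ones.

```python
import pandas as pd

# Hypothetical transactions; column names are assumptions.
tx = pd.DataFrame(
    {
        "store_id": ["S1", "S1", "S2", "S3", "S3", "S4"],
        "product_id": ["A", "A", "A", "B", "A", "B"],
    }
)

# One row per store/product pair, so repeat purchases in a store count once.
store_product = tx[["store_id", "product_id"]].drop_duplicates()

# Numerator: number of distinct stores selling each product.
result = store_product.groupby("product_id")["store_id"].count().reset_index(name="n_stores")

# Denominator: distinct stores that appear anywhere in the data.
total_stores = store_product["store_id"].nunique()
result["pct_stores"] = result["n_stores"] / total_stores
```

In this data, product A is sold in 3 of the 4 stores (0.75) and product B in 2 of 4 (0.5). S1 buying A twice still counts once.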

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame \| Table | Transaction-level data containing at least store_id and product_id columns (or the columns given by product_col). | required |
| product_col | str \| list[str] \| None | Column(s) defining product granularity. Defaults to get_option("column.product_id"). | None |
| group_col | str \| list[str] \| None | Additional grouping dimensions (e.g., "category_0_name"). Defaults to None. | None |
| within_group | bool | Controls the denominator when group_col is specified. When False (default), the denominator is the number of distinct stores in the whole dataset. When True, the denominator is computed separately for each group_col value as the number of distinct stores with at least one transaction in that group. Has no effect when group_col is None. | False |
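The effect of within_group can be shown with a small pandas sketch that uses a hypothetical category column. The sketch reproduces the two denominator branches of the constructor. The column names (store_id, product_id, category, n_stores, total_stores, pct_stores) are illustrative assumptions, not the library's configured names.

```python
import pandas as pd

# Hypothetical data: S1-S3 sell snacks; only S3 and S4 sell drinks.
tx = pd.DataFrame(
    {
        "store_id": ["S1", "S2", "S3", "S3", "S4", "S4"],
        "product_id": ["chips", "chips", "chips", "cola", "cola", "water"],
        "category": ["snacks", "snacks", "snacks", "drinks", "drinks", "drinks"],
    }
)
keys = ["product_id", "category"]
store_product = tx[["store_id", *keys]].drop_duplicates()
counts = store_product.groupby(keys)["store_id"].count().reset_index(name="n_stores")

# within_group=False: a single denominator, distinct stores in the whole dataset.
overall = counts.assign(pct_stores=counts["n_stores"] / store_product["store_id"].nunique())

# within_group=True: per-category denominator, distinct stores with any row in that category.
totals = store_product.groupby("category")["store_id"].nunique().reset_index(name="total_stores")
within = counts.merge(totals, on="category", how="inner")
within["pct_stores"] = within["n_stores"] / within["total_stores"]
```

With within_group=False, water reaches 1 of the 4 stores in the data (0.25). With within_group=True, it reaches 1 of the 2 stores that sell any drinks (0.5).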

Raises:

| Type | Description |
| --- | --- |
| TypeError | If df is not a pandas DataFrame or an Ibis Table. |
| ValueError | If required columns are missing from the data, or if any product_col column also appears in group_col. |
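The overlap check runs after both arguments have been normalized to lists, because the constructor concatenates product_col and group_col into one list of grouping keys. A column listed in both would therefore appear twice. Below is a minimal pure-Python sketch of that normalization and check. normalize_cols is a hypothetical name and is not part of the library.

```python
def normalize_cols(product_col, group_col):
    """Hypothetical helper: normalize str/list inputs and reject overlaps."""
    product_cols = [product_col] if isinstance(product_col, str) else list(product_col)
    if group_col is None:
        return product_cols, None
    group_cols = [group_col] if isinstance(group_col, str) else list(group_col)
    # A column in both lists would appear twice in the combined grouping keys.
    overlap = set(product_cols) & set(group_cols)
    if overlap:
        raise ValueError(f"product_col {overlap} must not also appear in group_col")
    return product_cols, group_cols


ok = normalize_cols("product_id", "category_0_name")
try:
    normalize_cols(["product_id", "brand"], ["brand"])
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```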

Source code in openretailscience/metrics/distribution/pct_of_stores.py
```python
class PctOfStores:
    """Calculates the percentage of stores selling each product.

    This is the simplest, unweighted distribution metric (numeric distribution).
    It answers the question: "What fraction of stores carry this product?"

    Results are accessible via the ``table`` attribute (ibis Table) or the ``df`` property
    (materialized pandas DataFrame).

    Args:
        df (pd.DataFrame | ibis.Table): Transaction-level data containing at least
            store_id and product_id columns.
        product_col (str | list[str] | None, optional): Column(s) defining product granularity.
            Defaults to ``get_option("column.product_id")``.
        group_col (str | list[str] | None, optional): Additional grouping dimensions
            (e.g., ``"category_0_name"``). Defaults to None.
        within_group (bool, optional): Controls the denominator when ``group_col`` is specified.
            When ``False`` (default), the percentage is relative to all stores in the dataset.
            When ``True``, the percentage is relative to stores within each group independently.
            Has no effect when ``group_col`` is None. Defaults to False.

    Raises:
        TypeError: If df is not a pandas DataFrame or an Ibis Table.
        ValueError: If required columns are missing from the data, or if product_col
            appears in group_col.
    """

    def __init__(
        self,
        df: pd.DataFrame | ibis.Table,
        *,
        product_col: str | list[str] | None = None,
        group_col: str | list[str] | None = None,
        within_group: bool = False,
    ) -> None:
        """Initializes the % of Stores calculation."""
        self._df: pd.DataFrame | None = None
        self.table: ibis.Table

        df = ensure_ibis_table(df)

        store_id_col = get_option("column.store_id")

        if product_col is None:
            product_col = [get_option("column.product_id")]
        elif isinstance(product_col, str):
            product_col = [product_col]

        if isinstance(group_col, str):
            group_col = [group_col]

        required_cols = [store_id_col, *product_col]
        group_cols = list(product_col)
        if group_col is not None:
            overlap = set(product_col) & set(group_col)
            if len(overlap) > 0:
                msg = f"product_col {overlap} must not also appear in group_col"
                raise ValueError(msg)
            required_cols.extend(group_col)
            group_cols.extend(group_col)
        validate_columns(df, required_cols)

        store_product = df.select([store_id_col, *group_cols]).distinct()

        agg_stores_col = get_option("column.agg.store_id")
        per_group = store_product.group_by(group_cols).aggregate(
            **{agg_stores_col: _[store_id_col].count()},
        )

        use_within_group = within_group and group_col is not None
        if use_within_group:
            total_stores = store_product.group_by(group_col).aggregate(
                **{_TEMP_TOTAL_STORES: _[store_id_col].nunique()},
            )
            per_group = per_group.inner_join(total_stores, group_col)
            denominator = _[_TEMP_TOTAL_STORES]
        else:
            denominator = store_product[store_id_col].nunique()

        pct_stores_col = ColumnHelper.join_options("column.agg.store_id", "column.suffix.percent")
        final_cols = [*group_cols, agg_stores_col, pct_stores_col]
        self.table = per_group.mutate(
            **{pct_stores_col: ratio_metric(_[agg_stores_col], denominator)},
        ).select(final_cols)

    @property
    def df(self) -> pd.DataFrame:
        """Returns the materialized pandas DataFrame of % of Stores results.

        Returns:
            pd.DataFrame: DataFrame with % of stores values. Cached after first access.
        """
        if self._df is None:
            self._df = self.table.execute()
        return self._df
```

df: pd.DataFrame property

Returns the materialized pandas DataFrame of % of Stores results.

Returns:

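| Type | Description |
| --- | --- |
| DataFrame | pd.DataFrame: DataFrame with the product_col and group_col columns, a distinct-store count column (named by the column.agg.store_id option), and a percentage column (named from the column.agg.store_id and column.suffix.percent options). Computed on first access and cached. |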

__init__(df, *, product_col=None, group_col=None, within_group=False)

Initializes the % of Stores calculation.
