Distribution Metrics

ACV (All Commodity Volume) metric.

ACV measures total dollar sales across all products in a set of stores, expressed in millions ($MM).

`Acv`

Calculates ACV (All Commodity Volume) for a set of stores.

ACV represents total dollar sales across all products, expressed in millions ($MM). NaN values in the spend column are excluded from the sum.
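
The arithmetic can be illustrated with plain pandas. This is a minimal sketch rather than the library's implementation: it assumes the spend column is named `unit_spend`, and the transaction values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative transactions; the column name `unit_spend` and the values are assumptions.
transactions = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 3],
        "unit_spend": [1_200_000.0, 800_000.0, np.nan, 500_000.0],
    }
)

# Sum dollar sales while skipping the missing value, then scale to $MM.
acv_mm = transactions["unit_spend"].sum(skipna=True) / 1_000_000
print(acv_mm)
```

The missing spend for store 2 contributes nothing to the total, so the result reflects only the three recorded amounts.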

Results are accessible via the `table` attribute (ibis Table) or the `df` property (materialized pandas DataFrame).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame \| Table` | Transaction data containing at least a unit_spend column. | required |
| `group_col` | `str \| list[str] \| None` | Optional column(s) to group the ACV calculation by (e.g., store_id). Defaults to None for total ACV. | `None` |
| `acv_scale_factor` | `float` | Factor to scale the ACV result (default is 1,000,000 for $MM). | `1000000` |
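
To make the effect of `group_col` and `acv_scale_factor` concrete, the sketch below reproduces the same grouping and scaling in plain pandas. The helper `acv_by_group` is hypothetical and is not part of the library; the column name `unit_spend` and the data are assumptions.

```python
import pandas as pd

# Illustrative data; column names and values are assumptions.
transactions = pd.DataFrame(
    {
        "store_id": ["A", "A", "B", "B", "B"],
        "unit_spend": [300_000.0, 200_000.0, 1_000_000.0, 250_000.0, 250_000.0],
    }
)


def acv_by_group(df, group_col, scale_factor=1_000_000):
    """Hypothetical helper: per-group spend total divided by scale_factor."""
    # Accept a single column name or a list of names.
    cols = [group_col] if isinstance(group_col, str) else list(group_col)
    totals = df.groupby(cols, as_index=False)["unit_spend"].sum()
    totals["acv"] = totals["unit_spend"] / scale_factor
    return totals[[*cols, "acv"]]


# Default scale: results in $MM.
per_store_mm = acv_by_group(transactions, "store_id")

# A scale factor of 1,000 reports the same totals in thousands of dollars.
per_store_k = acv_by_group(transactions, "store_id", scale_factor=1_000)
print(per_store_mm)
print(per_store_k)
```

Changing the scale factor only changes the unit of the reported value; the underlying totals are identical.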

Raises:

| Type | Description |
| --- | --- |
| `TypeError` | If df is not a pandas DataFrame or an Ibis Table. |
| `ValueError` | If required columns are missing from the data or if acv_scale_factor is not positive. |
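
The documented error conditions can be pictured with a small stand-in. The function `check_acv_inputs` below is hypothetical and simplified: it accepts only pandas DataFrames (the real class also accepts Ibis Tables) and assumes the spend column is named `unit_spend`.

```python
import pandas as pd


def check_acv_inputs(df, spend_col="unit_spend", acv_scale_factor=1_000_000):
    """Hypothetical stand-in for the documented checks; not the library's code."""
    # Simplification: only pandas is accepted in this sketch.
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame.")
    if acv_scale_factor <= 0:
        raise ValueError("acv_scale_factor must be positive.")
    if spend_col not in df.columns:
        raise ValueError(f"Missing required column: {spend_col}")


def error_name(**kwargs):
    """Return the name of the exception raised by the checks, or None."""
    try:
        check_acv_inputs(**kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


good = pd.DataFrame({"unit_spend": [10.0]})
bad_columns = pd.DataFrame({"revenue": [10.0]})

print(error_name(df=[1, 2, 3]))                   # wrong input type
print(error_name(df=good, acv_scale_factor=0))    # non-positive scale factor
print(error_name(df=bad_columns))                 # spend column missing
print(error_name(df=good))                        # valid input
```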

Source code in openretailscience/metrics/distribution/acv.py

```python
class Acv:
    """Calculates ACV (All Commodity Volume) for a set of stores.

    ACV represents total dollar sales across all products, expressed in millions ($MM).
    NaN values in the spend column are excluded from the sum.

    Results are accessible via the `table` attribute (ibis Table) or the `df` property
    (materialized pandas DataFrame).

    Args:
        df (pd.DataFrame | ibis.Table): Transaction data containing at least a unit_spend column.
        group_col (str | list[str] | None, optional): Optional column(s) to group the ACV calculation by
            (e.g., store_id). Defaults to None for total ACV.
        acv_scale_factor (float, optional): Factor to scale the ACV result (default is 1,000,000 for $MM).

    Raises:
        TypeError: If df is not a pandas DataFrame or an Ibis Table.
        ValueError: If required columns are missing from the data or if acv_scale_factor is not positive.
    """

    def __init__(
        self,
        df: pd.DataFrame | ibis.Table,
        *,
        group_col: str | list[str] | None = None,
        acv_scale_factor: float = 1_000_000,
    ) -> None:
        """Initializes the ACV calculation."""
        self._df: pd.DataFrame | None = None
        self.table: ibis.Table

        df = ensure_ibis_table(df)

        if acv_scale_factor <= 0:
            raise ValueError("acv_scale_factor must be positive.")

        unit_spend_col = get_option("column.unit_spend")

        if group_col is not None:
            group_col = ensure_columns(df, group_col, "group_col")

        # group_col is already validated above; only the function's hard-coded requirement remains.
        ensure_data_has_columns(df, [unit_spend_col])

        if group_col is not None:
            df = df.group_by(group_col)

        self.table = df.aggregate(acv=_[unit_spend_col].sum() / acv_scale_factor)

    @property
    def df(self) -> pd.DataFrame:
        """Returns the materialized pandas DataFrame of ACV results.

        Returns:
            pd.DataFrame: DataFrame with ACV values. Cached after first access.
        """
        if self._df is None:
            self._df = self.table.execute()
        return self._df
```

`df: pd.DataFrame` `property`

Returns the materialized pandas DataFrame of ACV results.

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame with ACV values. Cached after first access. |
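
"Cached after first access" refers to a compute-once pattern: the result is materialized on the first read and reused afterwards. The sketch below shows that pattern in isolation with a hypothetical `LazyResult` class; it is a generic illustration, not the library's code.

```python
class LazyResult:
    """Hypothetical stand-in showing compute-once, reuse-afterwards caching."""

    def __init__(self, compute):
        self._compute = compute  # callable that produces the result
        self._value = None       # nothing materialized yet

    @property
    def value(self):
        # Run the computation only on the first access.
        if self._value is None:
            self._value = self._compute()
        return self._value


calls = []


def expensive():
    """Record each invocation so the number of computations can be inspected."""
    calls.append(1)
    return {"acv": 2.5}


result = LazyResult(expensive)
first = result.value
second = result.value
print(len(calls), first is second)
```

Both reads return the same object, and the computation runs only once.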

`__init__(df, *, group_col=None, acv_scale_factor=1000000)`

Initializes the ACV calculation.

% of Stores (Numeric Distribution) metric.

% of Stores measures the share of total stores in the dataset that sell a given product. Every store counts equally regardless of its sales volume.

`PctOfStores`

Calculates the percentage of stores selling each product.

This is the simplest, unweighted distribution metric (numeric distribution). It answers the question: "What fraction of stores carry this product?"
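
The calculation can be worked through in plain pandas. This is a minimal sketch under assumptions: the column names `store_id` and `product_id` and the data are illustrative, and the result is expressed as a fraction (whether the library reports a fraction or a 0-100 value is not stated here).

```python
import pandas as pd

# Illustrative transactions; store 1 buys P1 twice.
transactions = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 2, 3, 4],
        "product_id": ["P1", "P1", "P1", "P2", "P2", "P2"],
    }
)

# One row per (store, product) pair, so repeat purchases do not inflate the count.
store_product = transactions[["store_id", "product_id"]].drop_duplicates()

# Denominator: every store in the dataset counts once.
total_stores = store_product["store_id"].nunique()

# Numerator: number of distinct stores selling each product.
stores_per_product = store_product.groupby("product_id")["store_id"].nunique()

pct_of_stores = stores_per_product / total_stores
print(pct_of_stores)
```

Here P1 is sold in 2 of 4 stores and P2 in 3 of 4, regardless of how much each store sells.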

Results are accessible via the `table` attribute (ibis Table) or the `df` property (materialized pandas DataFrame).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame \| Table` | Transaction-level data containing at least store_id and product_id columns. | required |
| `product_col` | `str \| list[str] \| None` | Column(s) defining product granularity. Defaults to `get_option("column.product_id")`. | `None` |
| `group_col` | `str \| list[str] \| None` | Additional grouping dimensions (e.g., `"category_0_name"`). Defaults to None. | `None` |
| `within_group` | `bool` | Controls the denominator when `group_col` is specified. When `False` (default), the percentage is relative to all stores in the dataset. When `True`, the percentage is relative to stores within each group independently. Has no effect when `group_col` is None. | `False` |
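
The two denominators selected by `within_group` can be compared side by side in plain pandas. This sketch is illustrative only: the column names and data are assumptions, and "stores within each group" is taken to mean stores with at least one transaction in that group.

```python
import pandas as pd

# Illustrative transactions across three stores and two categories.
transactions = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 3, 3],
        "category_0_name": ["Snacks", "Drinks", "Snacks", "Drinks", "Drinks"],
        "product_id": ["S1", "D1", "S1", "D1", "D2"],
    }
)

pairs = transactions.drop_duplicates()

# Numerator: distinct stores selling each product within its category.
stores_selling = (
    pairs.groupby(["product_id", "category_0_name"])["store_id"]
    .nunique()
    .rename("stores")
    .reset_index()
)

# within_group=False: divide by all stores in the dataset.
all_stores = pairs["store_id"].nunique()
stores_selling["pct_all"] = stores_selling["stores"] / all_stores

# within_group=True: divide by the stores present in the same category.
stores_in_category = pairs.groupby("category_0_name")["store_id"].nunique()
stores_selling["pct_within"] = stores_selling["stores"] / stores_selling[
    "category_0_name"
].map(stores_in_category)

print(stores_selling)
```

D1 is sold by two of the three stores overall, but by both of the stores that sell any Drinks product, so its share rises when the denominator is restricted to the category.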

Raises:

| Type | Description |
| --- | --- |
| `TypeError` | If df is not a pandas DataFrame or an Ibis Table. |
| `ValueError` | If required columns are missing from the data, or if product_col appears in group_col. |

Source code in openretailscience/metrics/distribution/pct_of_stores.py

```python
class PctOfStores:
    """Calculates the percentage of stores selling each product.

    This is the simplest, unweighted distribution metric (numeric distribution).
    It answers the question: "What fraction of stores carry this product?"

    Results are accessible via the ``table`` attribute (ibis Table) or the ``df`` property
    (materialized pandas DataFrame).

    Args:
        df (pd.DataFrame | ibis.Table): Transaction-level data containing at least
            store_id and product_id columns.
        product_col (str | list[str] | None, optional): Column(s) defining product granularity.
            Defaults to ``get_option("column.product_id")``.
        group_col (str | list[str] | None, optional): Additional grouping dimensions
            (e.g., ``"category_0_name"``). Defaults to None.
        within_group (bool, optional): Controls the denominator when ``group_col`` is specified.
            When ``False`` (default), the percentage is relative to all stores in the dataset.
            When ``True``, the percentage is relative to stores within each group independently.
            Has no effect when ``group_col`` is None. Defaults to False.

    Raises:
        TypeError: If df is not a pandas DataFrame or an Ibis Table.
        ValueError: If required columns are missing from the data, or if product_col
            appears in group_col.
    """

    def __init__(
        self,
        df: pd.DataFrame | ibis.Table,
        *,
        product_col: str | list[str] | None = None,
        group_col: str | list[str] | None = None,
        within_group: bool = False,
    ) -> None:
        """Initializes the % of Stores calculation."""
        self._df: pd.DataFrame | None = None
        self.table: ibis.Table

        df = ensure_ibis_table(df)

        store_id_col = get_option("column.store_id")

        if product_col is None:
            product_col = [get_option("column.product_id")]
        else:
            product_col = ensure_columns(df, product_col, "product_col")

        if group_col is not None:
            group_col = ensure_columns(df, group_col, "group_col")

        group_cols = list(product_col)
        if group_col is not None:
            overlap = set(product_col) & set(group_col)
            if len(overlap) > 0:
                msg = f"product_col {overlap} must not also appear in group_col"
                raise ValueError(msg)
            group_cols.extend(group_col)
        # store_id_col + any unvalidated product_col defaults still need to exist in df;
        # already-validated user inputs (group_col, an explicitly-passed product_col) are
        # excluded to avoid redundant set-difference work.
        ensure_data_has_columns(df, [store_id_col, *product_col])

        store_product = df.select([store_id_col, *group_cols]).distinct()

        agg_stores_col = get_option("column.agg.store_id")
        per_group = store_product.group_by(group_cols).aggregate(
            **{agg_stores_col: _[store_id_col].count()},
        )

        use_within_group = within_group and group_col is not None
        if use_within_group:
            total_stores = store_product.group_by(group_col).aggregate(
                **{_TEMP_TOTAL_STORES: _[store_id_col].nunique()},
            )
            per_group = per_group.inner_join(total_stores, group_col)
            denominator = _[_TEMP_TOTAL_STORES]
        else:
            denominator = store_product[store_id_col].nunique()

        pct_stores_col = ColumnHelper.join_options("column.agg.store_id", "column.suffix.percent")
        final_cols = [*group_cols, agg_stores_col, pct_stores_col]
        self.table = per_group.mutate(
            **{pct_stores_col: ratio_metric(_[agg_stores_col], denominator)},
        ).select(final_cols)

    @property
    def df(self) -> pd.DataFrame:
        """Returns the materialized pandas DataFrame of % of Stores results.

        Returns:
            pd.DataFrame: DataFrame with % of stores values. Cached after first access.
        """
        if self._df is None:
            self._df = self.table.execute()
        return self._df
```

`df: pd.DataFrame` `property`

Returns the materialized pandas DataFrame of % of Stores results.

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame with % of stores values. Cached after first access. |

`__init__(df, *, product_col=None, group_col=None, within_group=False)`

Initializes the % of Stores calculation.
