btyd.models#

class btyd.models.BaseModel#

Abstract class defining all base model methods as well as methods to be overridden when creating a model subclass.

fit(rfm_df: DataFrame, tune: int = 1200, draws: int = 1200) SELF#

Fit a custom pymc model with parameter prior definitions to observed RFM data.

Parameters:
  • rfm_df (pandas.DataFrame) – Pandas dataframe containing customer ids, frequency, recency, T and monetary value columns.

  • tune (int) – Number of starting ‘burn-in’ samples for posterior parameter distribution convergence. These are discarded after the model is fit.

  • draws (int) – Number of samples drawn from the posterior parameter distributions after the tune period. These are retained for model usage.

Returns:

Fitted model with _idata attribute for model evaluation and predictions.

Return type:

self
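As a rough illustration of where the rfm_df columns come from, the sketch below collapses a raw transaction log into one row per customer using only the standard library. It assumes the usual BTYD convention: frequency counts repeat purchase days, recency is the time from first to last purchase, and T is the time from first purchase to the end of the observation window. The `summarize` helper, the `monetary_value` column name, and the sample transactions are illustrative assumptions, not part of btyd.

```python
from collections import defaultdict
from datetime import date


def summarize(transactions, observation_end):
    """Collapse (customer_id, date, amount) rows into one RFM row per customer."""
    spend = defaultdict(lambda: defaultdict(float))
    for customer_id, day, amount in transactions:
        if day <= observation_end:
            spend[customer_id][day] += amount  # purchases on the same day are merged

    rows = []
    for customer_id in sorted(spend):
        days = sorted(spend[customer_id])
        first, last = days[0], days[-1]
        repeat_days = days[1:]  # the first purchase day is not a repeat purchase
        if repeat_days:
            value = sum(spend[customer_id][d] for d in repeat_days) / len(repeat_days)
        else:
            value = 0.0
        rows.append({
            "customer_id": customer_id,
            "frequency": len(repeat_days),
            "recency": (last - first).days,
            "T": (observation_end - first).days,
            "monetary_value": value,  # assumed column name
        })
    return rows


# Made-up transactions for two customers.
transactions = [
    ("a", date(2023, 1, 1), 20.0),
    ("a", date(2023, 1, 11), 30.0),
    ("a", date(2023, 1, 21), 50.0),
    ("b", date(2023, 1, 5), 15.0),
]
rfm_rows = summarize(transactions, observation_end=date(2023, 1, 31))
```

Passing `rfm_rows` to `pandas.DataFrame` would give a dataframe with one row per customer, which is the general shape fit() describes.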

abstract generate_rfm_data() None#

Generate synthetic RFM data from parameters of fitted model. This is useful for posterior predictive checks and hyperparameter optimization.

load_json(filename: str) SELF#

Load InferenceData from an external file.

Parameters:

filename (str) – Path and/or filename of InferenceData.

Returns:

Model object containing _idata attribute for model evaluation and predictions.

Return type:

self

save_json(filename: str) None#

Dump InferenceData from fitted model into a JSON file.

Parameters:

filename (str) – Path and/or filename where model data will be saved.

class btyd.models.PredictMixin(*args, **kwds)#

Abstract class defining predictive methods for all models except GammaGamma. In the research literature these are commonly referred to as quantities of interest. These are internal methods and are not intended to be called directly. Docstrings for each method are provided in the respective model subclass.

predict(method: str, rfm_df: Optional[DataFrame] = None, t: Optional[int] = None, n: Optional[int] = None, sample_posterior: bool = False, posterior_draws: int = 100, join_df=False) ndarray#

Base method for running model predictions.

Parameters:
  • method (str) – Predictive quantity of interest; accepts ‘cond_prob_alive’, ‘cond_n_prchs_to_time’, ‘n_prchs_to_time’, or ‘prob_n_prchs_to_time’.

  • rfm_df (pandas.DataFrame) – Dataframe containing recency, frequency, monetary value, and time period columns.

  • t (int) – Number of time periods for predictions.

  • n (int) – Number of transactions predicted.

  • sample_posterior (bool) – Flag for sampling from parameter posteriors. Set to ‘True’ to return predictive probability distributions instead of point estimates.

  • posterior_draws (int) – Number of draws from parameter posteriors.

  • join_df (bool) – NOT SUPPORTED IN 0.1beta2. Flag to add columns to rfm_df containing predictive outputs.

Returns:

predictions – Numpy arrays containing predictive quantities of interest.

Return type:

np.ndarray
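To make the sample_posterior flag more concrete, the sketch below contrasts a point estimate with a per-draw distribution for one quantity of interest, the conditional probability of being alive. It uses the closed-form BG/NBD expression from the literature and is not btyd's implementation. The posterior draws are randomly generated stand-ins, and treating the point estimate as an evaluation at the posterior means is an assumption made for illustration.

```python
import random


def bgnbd_prob_alive(frequency, recency, T, r, alpha, a, b):
    """P(alive | frequency, recency, T) under BG/NBD for a single customer."""
    if frequency == 0:
        return 1.0  # BG/NBD treats customers with no repeat purchase as alive
    ratio = (alpha + T) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))


# Hypothetical posterior draws of (r, alpha, a, b); a fitted model would supply these.
rng = random.Random(0)
draws = [
    {
        "r": rng.uniform(0.2, 0.3),
        "alpha": rng.uniform(4.0, 5.0),
        "a": rng.uniform(0.7, 0.9),
        "b": rng.uniform(2.0, 3.0),
    }
    for _ in range(100)
]
customer = {"frequency": 2, "recency": 30.0, "T": 39.0}

# Point estimate: evaluate once at the mean of each parameter (an assumption).
mean_params = {name: sum(d[name] for d in draws) / len(draws) for name in draws[0]}
point_estimate = bgnbd_prob_alive(**customer, **mean_params)

# Posterior sampling: evaluate once per draw to obtain a distribution.
distribution = [bgnbd_prob_alive(**customer, **d) for d in draws]
```

The spread of `distribution` is what a point estimate hides: it shows how much the prediction for this customer moves as the parameters vary across the posterior.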

btyd.models.beta_geo_model#

class btyd.models.beta_geo_model.BetaGeoModel(hyperparams: Dict[float] = None)#

Also known as the BG/NBD model. Based on [1], this model has the following assumptions:

  1. Each individual, i, has a hidden lambda_i and p_i parameter.

  2. These come from a population-wide Gamma and Beta distribution, respectively.

  3. Individuals' purchases follow a Poisson process with rate \(\lambda_i\), so the expected number of purchases in time \(t\) is \(\lambda_i*t\).

  4. After each purchase, an individual has a p_i probability of dying (never buying again).
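The assumptions above can be read as a generative process. The following is a minimal simulation of that process using only the standard library; it is a sketch, not the code behind generate_rfm_data(). The function name and the parameter values (r, alpha for the Gamma distribution, a, b for the Beta distribution) are illustrative assumptions.

```python
import random


def simulate_bgnbd_customer(rng, T, r, alpha, a, b):
    """Simulate one customer's repeat purchases over an observation window of length T."""
    lam = rng.gammavariate(r, 1.0 / alpha)  # purchase rate: Gamma(shape=r, rate=alpha)
    p = rng.betavariate(a, b)               # dropout probability after each purchase
    t, frequency, recency, alive = 0.0, 0, 0.0, True
    while alive:
        t += rng.expovariate(lam)  # Poisson process: exponential waiting times
        if t > T:
            break                  # next purchase falls outside the window
        frequency += 1
        recency = t
        alive = rng.random() >= p  # customer drops out with probability p
    return {
        "frequency": frequency,
        "recency": recency,
        "T": T,
        "lambda": lam,
        "p": p,
        "alive": alive,
    }


rng = random.Random(42)
customers = [
    simulate_bgnbd_customer(rng, T=52.0, r=0.5, alpha=4.0, a=0.8, b=2.5)
    for _ in range(1000)
]
```

Because dropout is only possible after a purchase, every simulated customer with zero repeat purchases is still alive at the end of the window.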

Parameters:

hyperparams (dict) – Dictionary containing hyperparameters for model prior parameter distributions.

_hyperparams#

Hyperparameters of prior parameter distributions for model fitting.

Type:

dict

_param_list#

List of estimated model parameters.

Type:

list

_model#

Hierarchical Bayesian model to estimate model parameters.

Type:

pymc.Model

_idata#

InferenceData object of the fitted or loaded model. Used for predictions, evaluation plots, and model metrics via the ArviZ library.

Type:

arviz.InferenceData

References

generate_rfm_data(size: int = 1000) DataFrame#

Generate synthetic RFM data from fitted model parameters. Useful for posterior predictive checks of model performance.

Parameters:

size (int) – Rows of synthetic RFM data to generate. Default is 1000.

Returns:

self.synthetic_df – dataframe containing [“frequency”, “recency”, “T”, “lambda”, “p”, “alive”, “customer_id”] columns.

Return type:

pd.DataFrame
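One simple posterior predictive check is to compare the share of customers at each purchase frequency in the observed data against the synthetic data. The sketch below shows that comparison with plain lists standing in for the "frequency" columns; the `frequency_shares` helper and both lists are made up for illustration.

```python
from collections import Counter


def frequency_shares(frequencies, max_bin=5):
    """Share of customers at each frequency, with max_bin acting as a 'max_bin or more' bucket."""
    counts = Counter(min(f, max_bin) for f in frequencies)
    total = len(frequencies)
    return [counts.get(k, 0) / total for k in range(max_bin + 1)]


# Stand-ins for the observed and synthetic "frequency" columns.
observed = [0, 0, 1, 2, 0, 7, 1, 0]
synthetic = [0, 1, 1, 0, 0, 3, 9, 0]

observed_shares = frequency_shares(observed)
synthetic_shares = frequency_shares(synthetic)

# Largest absolute difference across bins; smaller suggests a closer match.
max_gap = max(abs(o - s) for o, s in zip(observed_shares, synthetic_shares))
```

With a real dataset, the two lists would be the frequency column of the training data and of the dataframe returned by generate_rfm_data().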

btyd.models.mod_beta_geo_model#

class btyd.models.mod_beta_geo_model.ModBetaGeoModel(hyperparams: Dict[float] = None)#

Also known as the MBG/NBD model.

Based on [5], [6], this model has the following assumptions:

  1. Each individual, i, has a hidden lambda_i and p_i parameter.

  2. These come from a population-wide Gamma and Beta distribution, respectively.

  3. Individuals' purchases follow a Poisson process with rate \(\lambda_i\), so the expected number of purchases in time \(t\) is \(\lambda_i*t\).

  4. At the beginning of their lifetime and after each purchase, an individual has a p_i probability of dying (never buying again).
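The distinguishing assumption of this model is the extra dropout opportunity at the start of the customer's lifetime. The sketch below simulates the process with the standard library to show its effect; it is an illustration under assumed parameter values, not the code behind generate_rfm_data(), and the function name is hypothetical.

```python
import random


def simulate_mbgnbd_customer(rng, T, r, alpha, a, b):
    """Simulate one customer under the MBG/NBD assumptions over a window of length T."""
    lam = rng.gammavariate(r, 1.0 / alpha)  # purchase rate: Gamma(shape=r, rate=alpha)
    p = rng.betavariate(a, b)               # dropout probability
    t, frequency, recency = 0.0, 0, 0.0
    alive = rng.random() >= p  # extra dropout opportunity at time zero
    while alive:
        t += rng.expovariate(lam)  # exponential waiting time to the next purchase
        if t > T:
            break
        frequency += 1
        recency = t
        alive = rng.random() >= p  # dropout opportunity after each purchase
    return {"frequency": frequency, "recency": recency, "T": T, "alive": alive}


rng = random.Random(7)
customers = [
    simulate_mbgnbd_customer(rng, T=52.0, r=0.5, alpha=4.0, a=1.0, b=2.0)
    for _ in range(500)
]

# Customers who dropped out before making any repeat purchase.
dead_without_repeat = [c for c in customers if c["frequency"] == 0 and not c["alive"]]
```

Unlike the BG/NBD process, a customer with no repeat purchases can already be inactive here, which is why `dead_without_repeat` is non-empty.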

Parameters:

hyperparams (dict) – Dictionary containing hyperparameters for model prior parameter distributions.

_hyperparams#

Hyperparameters of prior parameter distributions for model fitting.

Type:

dict

_param_list#

List of estimated model parameters.

Type:

list

_model#

Hierarchical Bayesian model to estimate model parameters.

Type:

pymc.Model

_idata#

InferenceData object of the fitted or loaded model. Used for predictions, evaluation plots, and model metrics via the ArviZ library.

Type:

arviz.InferenceData

References

generate_rfm_data(size: int = 1000) DataFrame#

Generate synthetic RFM data from fitted model parameters. Useful for posterior predictive checks of model performance.

Parameters:

size (int) – Rows of synthetic RFM data to generate. Default is 1000.

Returns:

self.synthetic_df – dataframe containing [“frequency”, “recency”, “T”, “lambda”, “p”, “alive”, “customer_id”] columns.

Return type:

pd.DataFrame

btyd.models.gamma_gamma_model#

class btyd.models.gamma_gamma_model.GammaGammaModel(hyperparams: Dict[float] = None)#

The Gamma-Gamma model is used to estimate the average monetary value of customer transactions.

This implementation is based on the Excel spreadsheet found in [3]. More details on the derivation and evaluation can be found in [4].

Parameters:

hyperparams (dict) – Dictionary containing hyperparameters for model prior parameter distributions.

_hyperparams#

Hyperparameters of prior parameter distributions for model fitting.

Type:

dict

_param_list#

List of estimated model parameters.

Type:

list

_model#

Hierarchical Bayesian model to estimate model parameters.

Type:

pymc.Model

_idata#

InferenceData object of the fitted or loaded model. Used for predictions, evaluation plots, and model metrics via the ArviZ library.

Type:

arviz.InferenceData

References

penalizer_coef#

The coefficient applied to an L2 norm on the parameters.

Type:

float

params_#

The fitted parameters of the model.

Type:

Series

data#

A DataFrame with the values given in the call to fit.

Type:

DataFrame

variance_matrix_#

A DataFrame with the variance matrix of the parameters.

Type:

DataFrame

confidence_intervals_#

A DataFrame with the 95% confidence intervals of the parameters.

Type:

DataFrame

standard_errors_#

A Series with the standard errors of the parameters.

Type:

Series

summary#

A DataFrame containing information about the fitted parameters.

Type:

DataFrame

generate_rfm_data()#

Not currently supported for GammaGammaModel.

predict(method: str, rfm_df: pd.DataFrame = None, sample_posterior: bool = False, posterior_draws: int = 100, join_df=False, transaction_prediction_model: btyd.Model = None, time: int = 12, discount_rate: float = 0.01, freq: str = 'D') np.ndarray#

Base method for running Gamma-Gamma model predictions.

Parameters:
  • method (str) – Predictive quantity of interest; accepts ‘avg_value’ or ‘clv’.

  • rfm_df (pandas.DataFrame) – Dataframe containing recency, frequency, monetary value, and time period columns.

  • sample_posterior (bool) – Flag for sampling from parameter posteriors. Set to ‘True’ to return predictive probability distributions instead of point estimates.

  • posterior_draws (int) – Number of draws from parameter posteriors.

  • join_df (bool) – NOT SUPPORTED IN 0.1beta2. Flag to add columns to rfm_df containing predictive outputs.

  • transaction_prediction_model (btyd.models) – Model used to predict future transactions. The literature uses Pareto/NBD models, but other models such as BetaGeoModel can also be used.

  • time (int, optional) – Expected customer lifetime in months. Default: 12

  • discount_rate (float, optional) – Monthly adjusted discount rate. Default: 0.01

  • freq (string, optional) – {“D”, “H”, “M”, “W”} for day, hour, month, week. The unit of time in which T is measured.

Returns:

predictions – Numpy arrays containing predictive quantities of interest.

Return type:

np.ndarray
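For intuition about the two quantities of interest, the sketch below shows the Gamma-Gamma conditional expected average value from the literature, followed by a simple monthly discounting of expected purchases into a lifetime value. It is not btyd's implementation. The parameter names p, q and gamma follow the usual Gamma-Gamma notation, and the function names, the monthly purchase forecasts and the discounting scheme are assumptions made for illustration.

```python
def gamma_gamma_avg_value(frequency, monetary_value, p, q, gamma):
    """Expected average transaction value for one customer (requires q > 1).

    The result is a weighted average of the population mean, gamma * p / (q - 1),
    and the customer's observed mean spend; the weight on the observed mean
    grows with frequency.
    """
    return p * (gamma + monetary_value * frequency) / (p * frequency + q - 1.0)


def discounted_clv(expected_monthly_purchases, avg_value, discount_rate=0.01):
    """Sum the discounted value of expected purchases, one entry per future month."""
    return sum(
        avg_value * purchases / (1.0 + discount_rate) ** month
        for month, purchases in enumerate(expected_monthly_purchases, start=1)
    )


# Hypothetical parameters and customer history.
avg_value = gamma_gamma_avg_value(frequency=1, monetary_value=20.0, p=2.0, q=3.0, gamma=10.0)

# Hypothetical forecasts from a transaction model: expected purchases in each of 12 months.
monthly_forecast = [0.5] * 12
clv = discounted_clv(monthly_forecast, avg_value, discount_rate=0.01)
```

In this sketch the monthly forecasts play the role that transaction_prediction_model plays in predict(): they supply the expected number of future purchases that the average value is multiplied by.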