Also known as the Beta-Geometric/Beta-Binomial Model [1].
Future purchase opportunities are treated as discrete points in time.
In the literature, the model provides a better fit than the Pareto/NBD
model for a nonprofit organization with regular giving patterns.
The model is estimated with a recency-frequency matrix with n transaction
opportunities.
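As an illustration of what a recency-frequency matrix contains, the sketch below condenses per-customer purchase flags into (frequency, recency, n_periods) patterns and counts how many customers share each pattern. The helper name `summarize_history` and the sample data are hypothetical and not part of the library.

```python
from collections import Counter


def summarize_history(purchases):
    """Return (frequency, recency, n_periods) for one customer's 0/1 purchase flags."""
    n_periods = len(purchases)
    frequency = sum(purchases)
    # Recency is the 1-based index of the last opportunity with a purchase, 0 if none.
    recency = max((i + 1 for i, bought in enumerate(purchases) if bought), default=0)
    return frequency, recency, n_periods


# Hypothetical data: one list of 0/1 flags per customer, four opportunities each.
histories = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

# Each distinct pattern becomes one row; the count plays the role of `weights`.
pattern_counts = Counter(summarize_history(h) for h in histories)
for (frequency, recency, n_periods), weight in sorted(pattern_counts.items()):
    print(frequency, recency, n_periods, weight)
```

Note that `[1, 0, 1, 0]` and `[0, 1, 1, 0]` fall into the same pattern, since only the number of purchases and the position of the last one are kept.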
Parameters:
penalizer_coef (float) – The coefficient applied to an L2 norm on the parameters.
Conditional expected purchases in future time period.
The expected number of future transactions across the next m_periods_in_future
transaction opportunities by a customer with purchase history
(x, tx, n).
frequency (array_like) – Total periods with observed transactions
recency (array_like) – Period of most recent transaction
n_periods (array_like) – Number of transaction opportunities. Previously called n.
weights (None or array_like) – Number of customers with given frequency/recency/n_periods,
defaults to 1 if not specified. Fader and
Hardie condense the individual RFM matrix into all
observed combinations of frequency/recency/n_periods. This
parameter represents the count of customers with a given
purchase pattern. Instead of calculating a log-likelihood
per individual, the log-likelihood is calculated for each
pattern and multiplied by the number of customers with
that pattern. Previously called n_custs.
verbose (boolean, optional) – Set to true to print out convergence diagnostics.
tol (float, optional) – Tolerance for termination of the function minimization process.
index (array_like, optional) – Index for the resulting DataFrame, which is accessible via self.data
kwargs – Keyword arguments to pass to the scipy.optimize.minimize
function as options dict
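The role of weights and penalizer_coef can be sketched with a small standard-library implementation of the BG/BB likelihood for a pattern (x, tx, n). This is a minimal sketch under stated assumptions, not the library's objective function: the function names, the parameter values, and the exact way the penalty is added to the objective are illustrative.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bgbb_likelihood(alpha, beta, gamma, delta, x, tx, n):
    """Likelihood of one purchase pattern (x, tx, n) under the BG/BB model."""
    denom_ab = log_beta(alpha, beta)
    denom_gd = log_beta(gamma, delta)
    # Term 1: the customer is still active after all n opportunities.
    total = exp(log_beta(alpha + x, beta + n - x) - denom_ab
                + log_beta(gamma, delta + n) - denom_gd)
    # Term 2: the customer became inactive right after opportunity tx + i.
    for i in range(n - tx):
        total += exp(log_beta(alpha + x, beta + tx - x + i) - denom_ab
                     + log_beta(gamma + 1, delta + tx + i) - denom_gd)
    return total


def penalized_neg_log_likelihood(params, patterns, weights=None, penalizer_coef=0.0):
    """Weighted negative log-likelihood plus an L2 penalty (assumed form)."""
    if weights is None:
        weights = [1] * len(patterns)
    ll = sum(w * log(bgbb_likelihood(*params, x, tx, n))
             for (x, tx, n), w in zip(patterns, weights))
    penalty = penalizer_coef * sum(p * p for p in params)
    return -ll + penalty


params = (1.2, 0.8, 0.7, 2.5)  # illustrative alpha, beta, gamma, delta
patterns = [(0, 0, 4), (2, 3, 4), (3, 4, 4)]
weights = [2, 3, 1]

condensed = penalized_neg_log_likelihood(params, patterns, weights)
# Same six customers listed one by one, each with an implicit weight of 1.
expanded = penalized_neg_log_likelihood(
    params, [(0, 0, 4)] * 2 + [(2, 3, 4)] * 3 + [(3, 4, 4)])
print(abs(condensed - expanded) < 1e-9)
```

Multiplying each pattern's log-likelihood by its customer count gives the same total as summing over individuals, which is why the condensed matrix is sufficient for estimation.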
Conditional expected number of purchases up to time.
Calculate the expected number of repeat purchases up to time t for a
randomly chosen individual from the population, given they have
purchase history (frequency, recency, T).
frequency (array_like) – the frequency vector of customers’ purchases
(denoted x in literature).
recency (array_like) – the recency vector of customers’ purchases
(denoted t_x in literature).
T (array_like) – customers’ age (time units since first purchase)
weights (None or array_like) – Number of customers with given frequency/recency/T,
defaults to 1 if not specified. Fader and
Hardie condense the individual RFM matrix into all
observed combinations of frequency/recency/T. This
parameter represents the count of customers with a given
purchase pattern. Instead of calculating individual
loglikelihood, the loglikelihood is calculated for each
pattern and multiplied by the number of customers with
that pattern.
initial_params (array_like, optional) – set the initial parameters for the fitter.
verbose (bool, optional) – set to true to print out convergence diagnostics.
tol (float, optional) – tolerance for termination of the function minimization process.
index (array_like, optional) – index for the resulting DataFrame, which is accessible via self.data
kwargs – keyword arguments to pass to the scipy.optimize.minimize
function as options dict
Returns:
The fitted model, with additional properties like params_ and methods like predict
frequency (array_like, optional) – a vector containing the customers’ frequencies.
Defaults to the whole set of frequencies used for fitting the model.
monetary_value (array_like, optional) – a vector containing the customers’ monetary values.
Defaults to the whole set of monetary values used for
fitting the model.
Returns:
The conditional expectation of the average profit per transaction
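The conditional expectation in the Gamma-Gamma model is a weighted average of the population mean and the customer's observed average value. The sketch below shows that calculation for a single customer. The function name and the values of p, q and v are illustrative assumptions, not fitted parameters.

```python
def conditional_expected_average_profit(p, q, v, frequency, monetary_value):
    """Expected average profit per transaction for one customer (Gamma-Gamma)."""
    # Mean transaction value across the whole population; requires q > 1 to be positive.
    population_mean = v * p / (q - 1)
    # The more purchases observed, the more weight the customer's own average gets.
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary_value


p, q, v = 6.0, 4.0, 15.0  # illustrative parameter values
print(conditional_expected_average_profit(p, q, v, frequency=0, monetary_value=0.0))
print(conditional_expected_average_profit(p, q, v, frequency=2, monetary_value=50.0))
print(conditional_expected_average_profit(p, q, v, frequency=20, monetary_value=50.0))
```

With no observed purchases the estimate equals the population mean, and it moves toward the customer's own average as frequency grows.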
This method computes the average lifetime value for a group of one
or more customers.
Parameters:
transaction_prediction_model (model) – the model used to predict future transactions. The literature uses
Pareto/NBD models, but a different model, such as a beta-geometric model, can also be used
frequency (array_like) – the frequency vector of customers’ purchases
(denoted x in literature).
recency (array_like) – the recency vector of customers' purchases (denoted t_x in literature).
T (array_like) – customers’ age (time units since first purchase)
monetary_value (array_like) – the monetary value vector of customer’s purchases
(denoted m in literature).
time (float, optional) – the lifetime expected for the user in months. Default: 12
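One way to picture a lifetime value calculation is as a discounted sum of expected monthly revenue. The sketch below is a simplified illustration, not the library's implementation: `discounted_clv`, the `expected_purchases` callable, `discount_rate` and `days_per_month` are all assumptions introduced here and do not appear in the parameter list above.

```python
def discounted_clv(expected_purchases, average_value, time=12,
                   discount_rate=0.01, days_per_month=30):
    """Discounted value of expected purchases over `time` months.

    `expected_purchases(t)` is a hypothetical callable returning the cumulative
    expected number of purchases up to day t for one customer.
    """
    clv = 0.0
    for month in range(1, time + 1):
        # Expected purchases that fall inside this month only.
        purchases_this_month = (expected_purchases(month * days_per_month)
                                - expected_purchases((month - 1) * days_per_month))
        # Discount this month's expected revenue back to the present.
        clv += average_value * purchases_this_month / (1 + discount_rate) ** month
    return clv


# Toy stand-in for a fitted model: a steady 0.1 expected purchases per day.
steady = lambda t: 0.1 * t
print(discounted_clv(steady, average_value=20.0, time=12, discount_rate=0.0))
print(discounted_clv(steady, average_value=20.0, time=12, discount_rate=0.01))
```

In practice the expected purchase counts would come from the transaction model and the average value from a monetary-value model such as Gamma-Gamma.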
frequency (array_like) – the frequency vector of customers’ purchases
(denoted x in literature).
monetary_value (array_like) – the monetary value vector of customer’s purchases
(denoted m in literature).
weights (None or array_like) – Number of customers with given frequency/monetary_value,
defaults to 1 if not specified. Fader and
Hardie condense the individual RFM matrix into all
observed combinations of frequency/monetary_value. This
parameter represents the count of customers with a given
purchase pattern. Instead of calculating individual
loglikelihood, the loglikelihood is calculated for each
pattern and multiplied by the number of customers with
that pattern.
initial_params (array_like, optional) – set the initial parameters for the fitter.
verbose (bool, optional) – set to true to print out convergence diagnostics.
tol (float, optional) – tolerance for termination of the function minimization process.
index (array_like, optional) – index for the resulting DataFrame, which is accessible via self.data
q_constraint (bool, optional) – when q < 1, the population mean is negative,
leading to negative CLV outputs. If True, we penalize negative values of q to avoid this issue.
kwargs – keyword arguments to pass to the scipy.optimize.minimize
function as options dict
Based on [5], [6], this model has the following assumptions:
1) Each individual, i, has a hidden lambda_i and p_i parameter.
2) These come from a population-wide Gamma and a Beta distribution,
respectively.
3) An individual's purchases follow a Poisson process with rate \(\lambda_i*t\).
4) At the beginning of their lifetime and after each purchase, an
individual has a p_i probability of dying (never buying again).
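The assumptions above can be sketched as a small simulation using only the standard library. The function names and parameter values are hypothetical, and the Gamma distribution is parameterized here with shape r and rate alpha, which is an assumption about notation rather than something stated in this section.

```python
import random


def simulate_customer(lam, p, T, rng):
    """Simulate repeat-purchase times up to time T for one customer."""
    purchase_times = []
    t = 0.0
    while True:
        # Dropout check at the start of the lifetime and after each purchase.
        if rng.random() < p:
            break
        # Poisson purchasing: exponential waiting time until the next purchase.
        t += rng.expovariate(lam)
        if t > T:
            break
        purchase_times.append(t)
    return purchase_times


def simulate_population(n_customers, r, alpha, a, b, T, seed=0):
    """Draw (frequency, recency, T) for customers with heterogeneous lam and p."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n_customers):
        lam = rng.gammavariate(r, 1.0 / alpha)  # shape r, scale 1/alpha
        p = rng.betavariate(a, b)
        times = simulate_customer(lam, p, T, rng)
        frequency = len(times)
        recency = times[-1] if times else 0.0
        customers.append((frequency, recency, T))
    return customers


population = simulate_population(5, r=1.0, alpha=2.0, a=1.0, b=3.0, T=10.0, seed=42)
for frequency, recency, T in population:
    print(frequency, round(recency, 2), T)
```

Because the dropout check also happens before the first repeat purchase, some simulated customers have a frequency of zero even when their purchase rate is high.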
Conditional expected number of repeat purchases up to time t.
Calculate the expected number of repeat purchases up to time t for a
randomly chosen individual from the population, given they have
purchase history (frequency, recency, T).
See Wagner, U. and Hoppe D. (2008).
Parameters:
t (array_like) – times to calculate the expectation for.
frequency (array_like) – historical frequency of customer.
recency (array_like) – historical recency of customer.
frequency (array_like) – the frequency vector of customers’ purchases
(denoted x in literature).
recency (array_like) – the recency vector of customers’ purchases
(denoted t_x in literature).
T (array_like) – customers’ age (time units since first purchase)
weights (None or array_like) – Number of customers with given frequency/recency/T,
defaults to 1 if not specified. Fader and
Hardie condense the individual RFM matrix into all
observed combinations of frequency/recency/T. This
parameter represents the count of customers with a given
purchase pattern. Instead of calculating individual
log-likelihood, the log-likelihood is calculated for each
pattern and multiplied by the number of customers with
that pattern.
verbose (bool, optional) – set to true to print out convergence diagnostics.
tol (float, optional) – tolerance for termination of the function minimization process.
index (array_like, optional) – index for the resulting DataFrame, which is accessible via self.data
kwargs – keyword arguments to pass to the scipy.optimize.minimize
function as options dict
Returns:
The fitted model, with additional properties like params_ and methods like predict
Conditional expected number of purchases up to time.
Calculate the expected number of repeat purchases up to time t for a
randomly chosen individual from the population, given they have
purchase history (frequency, recency, T).
frequency (array_like) – the frequency vector of customers’ purchases
(denoted x in literature).
recency (array_like) – the recency vector of customers’ purchases
(denoted t_x in literature).
T (array_like) – customers’ age (time units since first purchase)
weights (None or array_like) – Number of customers with given frequency/recency/T,
defaults to 1 if not specified. Fader and
Hardie condense the individual RFM matrix into all
observed combinations of frequency/recency/T. This
parameter represents the count of customers with a given
purchase pattern. Instead of calculating individual
log-likelihood, the log-likelihood is calculated for each
pattern and multiplied by the number of customers with
that pattern.
save_data (bool, optional) – Whether to save data from fitter.data to the pickle object
values_to_save (list, optional) – Placeholder values stored in place of the original attributes
when saving the object. If None, defaults to [None] * len(attr_list)
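The placeholder behaviour described for values_to_save can be sketched as follows. This is an illustrative stand-in, not the library's code: `dump_without_attrs` is a hypothetical helper, a `SimpleNamespace` plays the role of a fitted model, and the object is serialized to bytes in memory rather than to a file.

```python
import pickle
from types import SimpleNamespace


def dump_without_attrs(obj, attr_list, values_to_save=None):
    """Pickle obj with the listed attributes swapped for placeholder values."""
    if values_to_save is None:
        values_to_save = [None] * len(attr_list)
    stashed = {}
    for attr, placeholder in zip(attr_list, values_to_save):
        if attr in vars(obj):
            stashed[attr] = getattr(obj, attr)
            setattr(obj, attr, placeholder)
    try:
        payload = pickle.dumps(obj)
    finally:
        # Restore the original attributes on the in-memory object.
        for attr, original in stashed.items():
            setattr(obj, attr, original)
    return payload


# Hypothetical fitted model holding parameters and a large data attribute.
model = SimpleNamespace(params_={"r": 0.5, "alpha": 2.0},
                        data=[(1, 2.0, 10.0)] * 1000)
payload = dump_without_attrs(model, ["data"])
restored = pickle.loads(payload)
print(restored.data, restored.params_, len(model.data))
```

Dropping the data attribute keeps the saved object small while leaving the in-memory model unchanged.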