Generate RFM Summaries From a SQL Store#
Example SQL statement to transform transactional data into RFM data#
Let’s review what our variables mean:
frequency
represents the number of repeat purchases the customer has made, or one less than the total number of purchases. Repeat purchases made within the same time period are only counted as one purchase.recency
represents the age of the customer when they made their most recent purchases. This is equal to the duration between a customer’s first purchase and their latest purchase. (Thus if they have made only 1 purchase, the recency is 0.)monetary_value
represents the average value of a given customer’s repeat purchases. Customers who have only made a single purchase have monetary values of zero.T
represents the age of the customer in whatever time units chosen (weekly, in the above dataset). This is equal to the duration between a customer’s first purchase and the end of the period under study.
Thus, executing a query against a transactional dataset, called orders
, in a SQL-store may look like:
SELECT
customer_id,
COUNT(DISTINCT DATE(transaction_at)) - 1 as frequency,
datediff('day', MIN(transaction_at), MAX(transaction_at)) as recency,
CASE -- MONETARY VALUE CALCULATION
WHEN COUNT(DISTINCT transaction_at) = 1 THEN 0 -- 0 if only one order
ELSE
SUM(
CASE WHEN first_transaction = transaction_at THEN 0 -- daily average of all but first order
ELSE salesamount
END
) / (COUNT(DISTINCT transaction_at) - 1)
END as monetary_value
datediff('day', CURRENT_DATE, MIN(transaction_at)) as T
FROM orders
GROUP BY customer_id
Create table with RFM summary matrix with holdout#
Variables frequency
, T
and recency
have the same meaning as in previous section.
Two variables to set before executing:
duration_holdout
- holdout duration in days.CURRENT_DATE
- current date, could be changed to final date of the transactional data.
select
a.*,
COALESCE(b.frequency_holdout, 0) as frequency_holdout,
duration_holdout as duration_holdout
from (
select
customer_id,
datediff(max(event_date), min(event_date)) as recency,
count(*) - 1 as frequency,
datediff(date_sub(CURRENT_DATE, duration_holdout), min(event_date)) as T
from orders
where event_date < date_sub(CURRENT_DATE, duration_holdout)
group by customer_id
) a
left join (
select
customer_id,
count(*) as frequency_holdout
from orders
where event_date >= date_sub(CURRENT_DATE, duration_holdout)
and event_date < CURRENT_DATE
group by customer_id
) b
on a.customer_id = b.customer_id