Generate RFM Summaries From a SQL Store#

Example SQL statement to transform transactional data into RFM data#

Let’s review what our variables mean:

  • frequency represents the number of repeat purchases the customer has made, or one less than the total number of purchases. Repeat purchases made within the same time period are only counted as one purchase.

  • recency represents the age of the customer when they made their most recent purchases. This is equal to the duration between a customer’s first purchase and their latest purchase. (Thus if they have made only 1 purchase, the recency is 0.)

  • monetary_value represents the average value of a given customer’s repeat purchases. Customers who have only made a single purchase have monetary values of zero.

  • T represents the age of the customer in whatever time units chosen (weekly, in the above dataset). This is equal to the duration between a customer’s first purchase and the end of the period under study.

Thus, executing a query against a transactional dataset, called orders, in a SQL-store may look like:

SELECT
  customer_id,
  COUNT(DISTINCT DATE(transaction_at)) - 1 as frequency,
  datediff('day', MIN(transaction_at), MAX(transaction_at)) as recency,
  CASE                                              -- MONETARY VALUE CALCULATION
      WHEN COUNT(DISTINCT transaction_at) = 1 THEN 0    -- 0 if only one order
      ELSE
        SUM(
          CASE WHEN first_transaction = transaction_at THEN 0  -- daily average of all but first order
          ELSE salesamount
          END
          ) / (COUNT(DISTINCT transaction_at) - 1)
      END as monetary_value  
  datediff('day', CURRENT_DATE, MIN(transaction_at)) as T
FROM orders
GROUP BY customer_id

Create table with RFM summary matrix with holdout#

Variables frequency, T and recency have the same meaning as in previous section.

Two variables to set before executing:

  • duration_holdout - holdout duration in days.

  • CURRENT_DATE - current date, could be changed to final date of the transactional data.

select
    a.*,
    COALESCE(b.frequency_holdout, 0) as frequency_holdout,
    duration_holdout as duration_holdout
from (
    select
        customer_id,
        datediff(max(event_date), min(event_date)) as recency,
        count(*) - 1 as frequency,
        datediff(date_sub(CURRENT_DATE, duration_holdout), min(event_date)) as T
    from orders
    where event_date < date_sub(CURRENT_DATE, duration_holdout)
    group by customer_id
) a
left join (
    select
        customer_id,
        count(*) as frequency_holdout
    from orders
    where event_date >= date_sub(CURRENT_DATE, duration_holdout)
      and event_date < CURRENT_DATE
    group by customer_id
) b
on a.customer_id = b.customer_id