Analyzing Customer Purchase Intervals with LAG()
Analyzing Customer Purchase Intervals with LAG()
You already know how to rank rows with ROW_NUMBER().
But what if you want to compare a row against the previous one?
That's exactly what LAG() is for.
The Problem
You have an orders table. You want to know:
How many days passed between each customer's purchases?
This kind of question appears constantly in e-commerce:
- Repurchase cycle analysis
- Churn detection ("it's been 60 days since their last order")
- Revenue trend ("did this customer buy more or less than last time?")
Sample Data
CREATE TABLE orders ( order_id INT, customer_id INT, order_date DATE, amount INT ); INSERT INTO orders VALUES (1, 101, '2024-01-05', 30000), (2, 101, '2024-01-20', 45000), (3, 101, '2024-02-10', 20000), (4, 102, '2024-01-08', 15000), (5, 102, '2024-02-01', 60000), (6, 102, '2024-02-25', 25000);
LAG() Syntax
LAG(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)
| Parameter | Meaning |
|---|---|
column | The value you want to look back at |
offset | How many rows back (default: 1) |
default | What to return when there is no previous row |
Step 1 — Bring in the Previous Order Date
SELECT customer_id, order_date, amount, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS prev_order_date FROM orders;
Result:
| customer_id | order_date | amount | prev_order_date |
|---|---|---|---|
| 101 | 2024-01-05 | 30000 | NULL |
| 101 | 2024-01-20 | 45000 | 2024-01-05 |
| 101 | 2024-02-10 | 20000 | 2024-01-20 |
| 102 | 2024-01-08 | 15000 | NULL |
| 102 | 2024-02-01 | 60000 | 2024-01-08 |
| 102 | 2024-02-25 | 25000 | 2024-02-01 |
PARTITION BY customer_id ensures each customer's history is independent.
The first purchase per customer returns NULL — there is no previous row.
Step 2 — Calculate the Interval
Wrap the result in a CTE and compute the gap.
WITH lagged AS ( SELECT customer_id, order_date, amount, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS prev_order_date FROM orders ) SELECT customer_id, order_date, prev_order_date, DATEDIFF(order_date, prev_order_date) AS days_since_last_order FROM lagged;
Result:
| customer_id | order_date | prev_order_date | days_since_last_order |
|---|---|---|---|
| 101 | 2024-01-05 | NULL | NULL |
| 101 | 2024-01-20 | 2024-01-05 | 15 |
| 101 | 2024-02-10 | 2024-01-20 | 21 |
| 102 | 2024-01-08 | NULL | NULL |
| 102 | 2024-02-01 | 2024-01-08 | 24 |
| 102 | 2024-02-25 | 2024-02-01 | 24 |
Step 3 — Add Amount Change
You can call LAG() multiple times in one query.
WITH lagged AS ( SELECT customer_id, order_date, amount, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS prev_order_date, LAG(amount) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS prev_amount FROM orders ) SELECT customer_id, order_date, amount, DATEDIFF(order_date, prev_order_date) AS days_since_last_order, amount - prev_amount AS amount_change FROM lagged;
Result:
| customer_id | order_date | amount | days_since_last_order | amount_change |
|---|---|---|---|---|
| 101 | 2024-01-05 | 30000 | NULL | NULL |
| 101 | 2024-01-20 | 45000 | 15 | +15000 |
| 101 | 2024-02-10 | 20000 | 21 | -25000 |
| 102 | 2024-01-08 | 15000 | NULL | NULL |
| 102 | 2024-02-01 | 60000 | 24 | +45000 |
| 102 | 2024-02-25 | 25000 | 24 | -35000 |
What About LEAD()?
LEAD() is the mirror image — it looks forward instead of backward.
SELECT customer_id, order_date, LEAD(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS next_order_date FROM orders;
Use LEAD() when you want to answer:
- "When did this customer come back after this purchase?"
- "Was this the customer's last order?"
LAG vs LEAD — Quick Reference
| LAG() | LEAD() | |
|---|---|---|
| Direction | Looks back | Looks forward |
| Typical use | Previous purchase, prior month | Next purchase, upcoming event |
| NULL appears on | First row per partition | Last row per partition |
Key Takeaways
LAG()pulls a value from a previous row within the same partition.PARTITION BYkeeps each customer's history separate — without it, rows bleed across customers.- The first row per partition always returns
NULL; handle it withCOALESCE()if needed. LEAD()does the same thing in the opposite direction.- Combine with
DATEDIFF()for interval analysis, or subtract amounts to track spend trends.
Practice Query
Find customers whose repurchase interval shortened compared to the gap before it.
(Hint: you'll need two levels of LAG() or a CTE with two passes.)
SQL Growth
