Reviewer journey transitions (>2 reviews)
Generated: 2026-06-26 13:55
Script: analyze_reviewer_transitions.py
Cohort: users with >2 reviews (≥3 total), chronological pairs with days_gap > 0
Report block convention
Each analytical section uses up to three executable layers:
| Layer | Tag | Role |
|---|---|---|
| Results | [result:…]…[/result] |
Markdown table — findings |
| Data SQL | [babysql:polars]…[/babysql] |
Polars SQL on parquet — reproduce inputs/aggregates |
| ML / stats | [ml:…]…[/ml] |
Python estimator spec (sklearn, statsmodels, scipy) |
Prose How we found this (math) gives the formulas. Cached artifacts live in cluster_output/.
Executive summary
[result:transition_summary] | Metric | Value | | --- | --- | | Repeat reviewers | 1,425 | | Reviews (deduped) | 6,257 | | Valid pairs | 4,184 | | Brand switch rate | 25.8% | | Modality switch rate | 5.3% | | Median gap (days) | 110 | | Brand switch median gap | 168d | | Same-brand median gap | 98d | [/result]
[babysql:polars] WITH per_user AS ( SELECT r.user_pk, COUNT() AS n_reviews FROM '/home/sidmishra/ontologer/data/raw/hk_market/hktvmall/2026-06-19/reviews.parquet' AS r JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl ON r.review_id = cl.review_id GROUP BY r.user_pk ) SELECT COUNT() AS n_users, SUM(n_reviews) AS n_reviews FROM per_user WHERE n_reviews >= 3 [/babysql]
[babysql:polars] WITH base AS ( SELECT r.user_pk, cl.review_id, CAST(LEFT(cl.date, 10) AS DATE) AS review_date, cl.product_id, v.v2_brand, v.v2_modality, v.v2_lens_type FROM '/home/sidmishra/ontologer/data/raw/hk_market/hktvmall/2026-06-19/reviews.parquet' AS r JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl ON r.review_id = cl.review_id JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/product_id_to_attributes_v2.parquet' AS v ON cl.product_id = v.product_id WHERE r.user_pk IS NOT NULL ), dedup AS ( SELECT * FROM base QUALIFY ROW_NUMBER() OVER ( PARTITION BY user_pk, review_date, product_id ORDER BY review_id ) = 1 ), lagged AS ( SELECT user_pk, review_date, v2_brand, v2_modality, LAG(review_date) OVER (PARTITION BY user_pk ORDER BY review_date, review_id) AS prev_date, LAG(v2_brand) OVER (PARTITION BY user_pk ORDER BY review_date, review_id) AS prev_brand, LAG(v2_modality) OVER (PARTITION BY user_pk ORDER BY review_date, review_id) AS prev_modality FROM dedup ) SELECT COUNT(*) AS n_pairs, SUM(CASE WHEN v2_brand != prev_brand THEN 1 ELSE 0 END) AS n_brand_switch, SUM(CASE WHEN v2_modality != prev_modality THEN 1 ELSE 0 END) AS n_modality_switch, MEDIAN(DATEDIFF('day', prev_date, review_date)) AS median_days_gap FROM lagged WHERE prev_date IS NOT NULL AND review_date > prev_date [/babysql]
[ml:polars-pairs] library: polars (window functions) script: analyze_reviewer_transitions.py :: build_transitions cohort: user_pk with n_reviews >= 3 dedupe: collapse (user, date, product_id) to one event pairs: LAG attributes over (user_pk ORDER BY date, review_id); keep days_gap > 0 artifact: cluster_output/reviewer_transition_pairs.parquet [/ml]
Top brand flows (from → to)
[result:brand_flows] | From | To | N | Median days | | --- | --- | --- | --- | | BAUSCHLOMB | BAUSCHLOMB | 1062 | 114 | | ACUVUE | ACUVUE | 1050 | 98 | | OLENS | OLENS | 264 | 98 | | ALCON | ALCON | 230 | 78 | | COOPERVISION | COOPERVISION | 211 | 118 | | BAUSCHLOMB | ACUVUE | 148 | 230 | | ACUVUE | BAUSCHLOMB | 115 | 170 | | FRESHKON | FRESHKON | 78 | 60 | | SHOBI | SHOBI | 58 | 20 | | CANDYMAGIC | CANDYMAGIC | 54 | 84 | | ACUVUE | OLENS | 45 | 170 | | DELIGHT | DELIGHT | 44 | 126 | [/result]
[babysql:polars] SELECT prev_v2_brand_top, v2_brand_top, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_brand_top, v2_brand_top ORDER BY n DESC LIMIT 12 [/babysql]
Top modality flows
- daily → daily: n=3631, median 106d
- monthly → monthly: n=90, median 127d
- 2-week → 2-week: n=79, median 181d
- monthly → daily: n=71, median 174d
- daily → monthly: n=67, median 174d
- 2-week → daily: n=39, median 294d
- daily → 2-week: n=29, median 107d
[babysql:polars] SELECT prev_v2_modality, v2_modality, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_modality, v2_modality ORDER BY n DESC LIMIT 10 [/babysql]
Top lens-type flows
- spherical → spherical: n=3110, median 106d
- color → color: n=322, median 92d
- spherical → color: n=204, median 202d
- color → spherical: n=142, median 134d
- multifocal → multifocal: n=140, median 77d
- spherical → multifocal: n=122, median 216d
- multifocal → spherical: n=75, median 91d
- multifocal → color: n=24, median 202d
- color → multifocal: n=20, median 174d
- spherical → toric: n=9, median 181d
[babysql:polars] SELECT prev_v2_lens_type, v2_lens_type, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_lens_type, v2_lens_type ORDER BY n DESC LIMIT 10 [/babysql]
Top pack-size flows
- 30 → 30 (same pack): n=2882, median 107d
- 30 → other: n=198, median 205d
- other → other (same pack): n=188, median 70d
- other → 30: n=147, median 158d
- 10 → 10 (same pack): n=126, median 32d
- 6 → 6 (same pack): n=123, median 167d
- 2 → 2 (same pack): n=102, median 83d
- 30 → 2: n=57, median 123d
- 2 → 30: n=55, median 138d
- 30 → 6: n=53, median 156d
[babysql:polars] SELECT prev_v2_pack_bucket, v2_pack_bucket, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_pack_bucket, v2_pack_bucket ORDER BY n DESC LIMIT 10 [/babysql]
Brand loyalty (stay rate on next review)
- BAUSCHLOMB: stay 79.7% (1062/1333 steps)
- ACUVUE: stay 79.3% (1050/1324 steps)
- OLENS: stay 65.3% (264/404 steps)
- ALCON: stay 71.0% (230/324 steps)
- COOPERVISION: stay 71.0% (211/297 steps)
- CANDYMAGIC: stay 35.8% (54/151 steps)
- FRESHKON: stay 62.9% (78/124 steps)
- DELIGHT: stay 59.5% (44/74 steps)
- _other: stay 48.5% (32/66 steps)
- SHOBI: stay 92.1% (58/63 steps)
[babysql:polars] SELECT prev_v2_brand_top AS brand, COUNT() AS n_out, SUM(CASE WHEN prev_v2_brand_top = v2_brand_top THEN 1 ELSE 0 END) AS n_stay, ROUND(100.0 * SUM(CASE WHEN prev_v2_brand_top = v2_brand_top THEN 1 ELSE 0 END) / COUNT(), 1) AS stay_rate_pct FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_brand_top ORDER BY n_out DESC LIMIT 10 [/babysql]
Modality loyalty
- daily: stay 97.4% (3631/3727)
- monthly: stay 55.6% (90/162)
- 2-week: stay 64.8% (79/122)
[babysql:polars] SELECT prev_v2_modality AS modality, COUNT() AS n_out, SUM(CASE WHEN prev_v2_modality = v2_modality THEN 1 ELSE 0 END) AS n_stay, ROUND(100.0 * SUM(CASE WHEN prev_v2_modality = v2_modality THEN 1 ELSE 0 END) / COUNT(), 1) AS stay_rate_pct FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_modality ORDER BY n_out DESC [/babysql]
Time between reviews
[result:review_timing] | Transition | N pairs | Median days | Mean days | P25 | P75 | | --- | --- | --- | --- | --- | --- | | All steps | 4,184 | 110 | 228 | 48 | 269 | | Brand switch | 1,080 | 168 | 309 | 64 | 412 | | Same brand | 3,104 | 98 | 199 | 44 | 231 | | Modality switch | 211 | 174 | 337 | 73 | 450 | | Same modality | 3,800 | 106 | 219 | 46 | 262 | | SKU switch | 1,943 | 168 | 314 | 63 | 414 | | Same SKU | 2,241 | 87 | 152 | 39 | 187 | [/result]
[babysql:polars] SELECT CASE WHEN switched_brand THEN 'brand_switch' ELSE 'same_brand' END AS transition, COUNT(*) AS n_pairs, MEDIAN(days_gap) AS median_days, AVG(days_gap) AS mean_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY switched_brand ORDER BY transition [/babysql]
How we found this (math): Median/mean Δt on consecutive pairs conditional on switch vs stay.
Notable cross-brand switches (excluding loyal diagonal)
- BAUSCHLOMB → ACUVUE: n=148, median 230d
- ACUVUE → BAUSCHLOMB: n=115, median 170d
- ACUVUE → OLENS: n=45, median 170d
- ALCON → ACUVUE: n=42, median 100d
- OLENS → ACUVUE: n=41, median 230d
- BAUSCHLOMB → OLENS: n=37, median 141d
- CANDYMAGIC → OLENS: n=36, median 194d
- COOPERVISION → ACUVUE: n=35, median 218d
- ACUVUE → CANDYMAGIC: n=33, median 197d
- ACUVUE → ALCON: n=32, median 80d
- OLENS → CANDYMAGIC: n=32, median 94d
- OLENS → BAUSCHLOMB: n=31, median 171d
- FRESHKON → BAUSCHLOMB: n=21, median 193d
- ACUVUE → COOPERVISION: n=20, median 109d
- CANDYMAGIC → ACUVUE: n=20, median 279d
[babysql:polars] SELECT prev_v2_brand_top, v2_brand_top, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' WHERE prev_v2_brand_top != v2_brand_top GROUP BY prev_v2_brand_top, v2_brand_top ORDER BY n DESC LIMIT 15 [/babysql]
How we found this (math)
Cohort: Users with review count (n_u \geq 3). Reviews sorted by ((\text{user_pk}, \text{date}, \text{review_id})).
Dedupe: Collapse rows sharing ((\text{user}, \text{date}, \text{product_id})) to one event (same-day re-reviews of one SKU).
Consecutive pairs: For ordered reviews (r_{i-1}, r_i), keep pairs with gap (\Delta t_i = \text{date}i - \text{date}{i-1} > 0) days.
Switch indicators: e.g. (\mathbb{1}[\text{brand}i \neq \text{brand}{i-1}]). Stay rate from brand (b):
[\text{stay}(b) = \frac{#{(i): \text{brand}{i-1}=\text{brand}_i=b}}{#{(i): \text{brand}{i-1}=b}}]
Flow counts: (n_{b\rightarrow b'} = #{(i): \text{brand}_{i-1}=b,\, \text{brand}_i=b'}).
Timing: Report median (\tilde{\Delta t}), mean (\bar{\Delta t}), quartiles on (\Delta t_i) conditional on switch vs stay.
Output files
| File | Description |
|---|---|
cluster_output/reviewer_transition_pairs.parquet |
All valid consecutive pairs |
cluster_output/reviewer_brand_flows.csv |
Brand transition counts |
cluster_output/reviewer_modality_flows.csv |
Modality transitions |
cluster_output/reviewer_transitions_report.json |
Full payload |