Reviewer journey transitions (>2 reviews)

Generated: 2026-06-26 13:55
Script: analyze_reviewer_transitions.py
Cohort: users with >2 reviews (≥3 total), chronological pairs with days_gap > 0


Report block convention

Each analytical section uses up to three executable layers:

Layer Tag Role
Results [result:…]…[/result] Markdown table — findings
Data SQL [babysql:polars]…[/babysql] Polars SQL on parquet — reproduce inputs/aggregates
ML / stats [ml:…]…[/ml] Python estimator spec (sklearn, statsmodels, scipy)

Prose How we found this (math) gives the formulas. Cached artifacts live in cluster_output/.


Executive summary

[result:transition_summary] | Metric | Value | | --- | --- | | Repeat reviewers | 1,425 | | Reviews (deduped) | 6,257 | | Valid pairs | 4,184 | | Brand switch rate | 25.8% | | Modality switch rate | 5.3% | | Median gap (days) | 110 | | Brand switch median gap | 168d | | Same-brand median gap | 98d | [/result]

[babysql:polars] WITH per_user AS ( SELECT r.user_pk, COUNT() AS n_reviews FROM '/home/sidmishra/ontologer/data/raw/hk_market/hktvmall/2026-06-19/reviews.parquet' AS r JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl ON r.review_id = cl.review_id GROUP BY r.user_pk ) SELECT COUNT() AS n_users, SUM(n_reviews) AS n_reviews FROM per_user WHERE n_reviews >= 3 [/babysql]

[babysql:polars] WITH base AS ( SELECT r.user_pk, cl.review_id, CAST(LEFT(cl.date, 10) AS DATE) AS review_date, cl.product_id, v.v2_brand, v.v2_modality, v.v2_lens_type FROM '/home/sidmishra/ontologer/data/raw/hk_market/hktvmall/2026-06-19/reviews.parquet' AS r JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl ON r.review_id = cl.review_id JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/product_id_to_attributes_v2.parquet' AS v ON cl.product_id = v.product_id WHERE r.user_pk IS NOT NULL ), dedup AS ( SELECT * FROM base QUALIFY ROW_NUMBER() OVER ( PARTITION BY user_pk, review_date, product_id ORDER BY review_id ) = 1 ), lagged AS ( SELECT user_pk, review_date, v2_brand, v2_modality, LAG(review_date) OVER (PARTITION BY user_pk ORDER BY review_date, review_id) AS prev_date, LAG(v2_brand) OVER (PARTITION BY user_pk ORDER BY review_date, review_id) AS prev_brand, LAG(v2_modality) OVER (PARTITION BY user_pk ORDER BY review_date, review_id) AS prev_modality FROM dedup ) SELECT COUNT(*) AS n_pairs, SUM(CASE WHEN v2_brand != prev_brand THEN 1 ELSE 0 END) AS n_brand_switch, SUM(CASE WHEN v2_modality != prev_modality THEN 1 ELSE 0 END) AS n_modality_switch, MEDIAN(DATEDIFF('day', prev_date, review_date)) AS median_days_gap FROM lagged WHERE prev_date IS NOT NULL AND review_date > prev_date [/babysql]

[ml:polars-pairs] library: polars (window functions) script: analyze_reviewer_transitions.py :: build_transitions cohort: user_pk with n_reviews >= 3 dedupe: collapse (user, date, product_id) to one event pairs: LAG attributes over (user_pk ORDER BY date, review_id); keep days_gap > 0 artifact: cluster_output/reviewer_transition_pairs.parquet [/ml]


Top brand flows (from → to)

[result:brand_flows] | From | To | N | Median days | | --- | --- | --- | --- | | BAUSCHLOMB | BAUSCHLOMB | 1062 | 114 | | ACUVUE | ACUVUE | 1050 | 98 | | OLENS | OLENS | 264 | 98 | | ALCON | ALCON | 230 | 78 | | COOPERVISION | COOPERVISION | 211 | 118 | | BAUSCHLOMB | ACUVUE | 148 | 230 | | ACUVUE | BAUSCHLOMB | 115 | 170 | | FRESHKON | FRESHKON | 78 | 60 | | SHOBI | SHOBI | 58 | 20 | | CANDYMAGIC | CANDYMAGIC | 54 | 84 | | ACUVUE | OLENS | 45 | 170 | | DELIGHT | DELIGHT | 44 | 126 | [/result]

[babysql:polars] SELECT prev_v2_brand_top, v2_brand_top, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_brand_top, v2_brand_top ORDER BY n DESC LIMIT 12 [/babysql]

Top modality flows

  • daily → daily: n=3631, median 106d
  • monthly → monthly: n=90, median 127d
  • 2-week → 2-week: n=79, median 181d
  • monthly → daily: n=71, median 174d
  • daily → monthly: n=67, median 174d
  • 2-week → daily: n=39, median 294d
  • daily → 2-week: n=29, median 107d

[babysql:polars] SELECT prev_v2_modality, v2_modality, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_modality, v2_modality ORDER BY n DESC LIMIT 10 [/babysql]

Top lens-type flows

  • spherical → spherical: n=3110, median 106d
  • color → color: n=322, median 92d
  • spherical → color: n=204, median 202d
  • color → spherical: n=142, median 134d
  • multifocal → multifocal: n=140, median 77d
  • spherical → multifocal: n=122, median 216d
  • multifocal → spherical: n=75, median 91d
  • multifocal → color: n=24, median 202d
  • color → multifocal: n=20, median 174d
  • spherical → toric: n=9, median 181d

[babysql:polars] SELECT prev_v2_lens_type, v2_lens_type, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_lens_type, v2_lens_type ORDER BY n DESC LIMIT 10 [/babysql]

Top pack-size flows

  • 30 → 30 (same pack): n=2882, median 107d
  • 30 → other: n=198, median 205d
  • other → other (same pack): n=188, median 70d
  • other → 30: n=147, median 158d
  • 10 → 10 (same pack): n=126, median 32d
  • 6 → 6 (same pack): n=123, median 167d
  • 2 → 2 (same pack): n=102, median 83d
  • 30 → 2: n=57, median 123d
  • 2 → 30: n=55, median 138d
  • 30 → 6: n=53, median 156d

[babysql:polars] SELECT prev_v2_pack_bucket, v2_pack_bucket, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_pack_bucket, v2_pack_bucket ORDER BY n DESC LIMIT 10 [/babysql]

Brand loyalty (stay rate on next review)

  • BAUSCHLOMB: stay 79.7% (1062/1333 steps)
  • ACUVUE: stay 79.3% (1050/1324 steps)
  • OLENS: stay 65.3% (264/404 steps)
  • ALCON: stay 71.0% (230/324 steps)
  • COOPERVISION: stay 71.0% (211/297 steps)
  • CANDYMAGIC: stay 35.8% (54/151 steps)
  • FRESHKON: stay 62.9% (78/124 steps)
  • DELIGHT: stay 59.5% (44/74 steps)
  • _other: stay 48.5% (32/66 steps)
  • SHOBI: stay 92.1% (58/63 steps)

[babysql:polars] SELECT prev_v2_brand_top AS brand, COUNT() AS n_out, SUM(CASE WHEN prev_v2_brand_top = v2_brand_top THEN 1 ELSE 0 END) AS n_stay, ROUND(100.0 * SUM(CASE WHEN prev_v2_brand_top = v2_brand_top THEN 1 ELSE 0 END) / COUNT(), 1) AS stay_rate_pct FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_brand_top ORDER BY n_out DESC LIMIT 10 [/babysql]

Modality loyalty

  • daily: stay 97.4% (3631/3727)
  • monthly: stay 55.6% (90/162)
  • 2-week: stay 64.8% (79/122)

[babysql:polars] SELECT prev_v2_modality AS modality, COUNT() AS n_out, SUM(CASE WHEN prev_v2_modality = v2_modality THEN 1 ELSE 0 END) AS n_stay, ROUND(100.0 * SUM(CASE WHEN prev_v2_modality = v2_modality THEN 1 ELSE 0 END) / COUNT(), 1) AS stay_rate_pct FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY prev_v2_modality ORDER BY n_out DESC [/babysql]

Time between reviews

[result:review_timing] | Transition | N pairs | Median days | Mean days | P25 | P75 | | --- | --- | --- | --- | --- | --- | | All steps | 4,184 | 110 | 228 | 48 | 269 | | Brand switch | 1,080 | 168 | 309 | 64 | 412 | | Same brand | 3,104 | 98 | 199 | 44 | 231 | | Modality switch | 211 | 174 | 337 | 73 | 450 | | Same modality | 3,800 | 106 | 219 | 46 | 262 | | SKU switch | 1,943 | 168 | 314 | 63 | 414 | | Same SKU | 2,241 | 87 | 152 | 39 | 187 | [/result]

[babysql:polars] SELECT CASE WHEN switched_brand THEN 'brand_switch' ELSE 'same_brand' END AS transition, COUNT(*) AS n_pairs, MEDIAN(days_gap) AS median_days, AVG(days_gap) AS mean_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' GROUP BY switched_brand ORDER BY transition [/babysql]

How we found this (math): Median/mean Δt on consecutive pairs conditional on switch vs stay.

Notable cross-brand switches (excluding loyal diagonal)

  • BAUSCHLOMB → ACUVUE: n=148, median 230d
  • ACUVUE → BAUSCHLOMB: n=115, median 170d
  • ACUVUE → OLENS: n=45, median 170d
  • ALCON → ACUVUE: n=42, median 100d
  • OLENS → ACUVUE: n=41, median 230d
  • BAUSCHLOMB → OLENS: n=37, median 141d
  • CANDYMAGIC → OLENS: n=36, median 194d
  • COOPERVISION → ACUVUE: n=35, median 218d
  • ACUVUE → CANDYMAGIC: n=33, median 197d
  • ACUVUE → ALCON: n=32, median 80d
  • OLENS → CANDYMAGIC: n=32, median 94d
  • OLENS → BAUSCHLOMB: n=31, median 171d
  • FRESHKON → BAUSCHLOMB: n=21, median 193d
  • ACUVUE → COOPERVISION: n=20, median 109d
  • CANDYMAGIC → ACUVUE: n=20, median 279d

[babysql:polars] SELECT prev_v2_brand_top, v2_brand_top, COUNT(*) AS n, MEDIAN(days_gap) AS median_days FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/reviewer_transition_pairs.parquet' WHERE prev_v2_brand_top != v2_brand_top GROUP BY prev_v2_brand_top, v2_brand_top ORDER BY n DESC LIMIT 15 [/babysql]


How we found this (math)

Cohort: Users with review count (n_u \geq 3). Reviews sorted by ((\text{user_pk}, \text{date}, \text{review_id})).

Dedupe: Collapse rows sharing ((\text{user}, \text{date}, \text{product_id})) to one event (same-day re-reviews of one SKU).

Consecutive pairs: For ordered reviews (r_{i-1}, r_i), keep pairs with gap (\Delta t_i = \text{date}i - \text{date}{i-1} > 0) days.

Switch indicators: e.g. (\mathbb{1}[\text{brand}i \neq \text{brand}{i-1}]). Stay rate from brand (b):

[\text{stay}(b) = \frac{#{(i): \text{brand}{i-1}=\text{brand}_i=b}}{#{(i): \text{brand}{i-1}=b}}]

Flow counts: (n_{b\rightarrow b'} = #{(i): \text{brand}_{i-1}=b,\, \text{brand}_i=b'}).

Timing: Report median (\tilde{\Delta t}), mean (\bar{\Delta t}), quartiles on (\Delta t_i) conditional on switch vs stay.


Output files

File Description
cluster_output/reviewer_transition_pairs.parquet All valid consecutive pairs
cluster_output/reviewer_brand_flows.csv Brand transition counts
cluster_output/reviewer_modality_flows.csv Modality transitions
cluster_output/reviewer_transitions_report.json Full payload