Wearer landscape — new vs existing themes
Generated: 2026-06-26 13:55
Script: analyze_wearer_landscape.py
Report block convention
Each analytical section uses up to three executable layers:
| Layer | Tag | Role |
|---|---|---|
| Results | [result:…]…[/result] |
Markdown table — findings |
| Data SQL | [babysql:polars]…[/babysql] |
Polars SQL on parquet — reproduce inputs/aggregates |
| ML / stats | [ml:…]…[/ml] |
Python estimator spec (sklearn, statsmodels, scipy) |
Prose How we found this (math) gives the formulas. Cached artifacts live in cluster_output/.
What else we can do (total landscape roadmap)
This corpus supports a full VoC × product × people × time stack. Completed or in progress:
| Layer | Status | Examples |
|---|---|---|
| Category health & stars vs LLM | Done | initial_assessment_analytics.md |
| v2 segment trends & GLM drivers | Done | product_people_analytics.md |
| Reviewer journeys (brand/modality switches) | Done | reviewer_transitions.md |
| New vs existing wearer themes | This report | Text-declared + platform-first proxies |
Next extensions (not yet built):
- Causal mix decomposition — is category sentiment fall driven by OLENS/color mix shift or within-SKU deterioration? (Shapley or fixed-effects on segment×month.)
- Competitive switching attribution — link detractor Cluster 3 to prior brand/SKU (from transition pairs).
- Store × wearer cohort — are new wearers concentrated in outlier stores (
H7249002)? - Prescription / toric / multifocal onboarding — first-review topic paths for medical vs cosmetic entry.
- LLM re-prompt for wearer intent — add
wearer_type: new|existing|unknownto classification schema (cleaner than regex). - Survival / repeat hazard — P(second review within 90d | first-review themes).
- Price-regret vs product-failure split — separate mixed-sentiment price timing from fit/eye_health negatives.
Cohort definitions
[result:wearer_cohort_counts] | Cohort | Rule | N | Interpretation | | --- | --- | --- | --- | | declared_new_wearer | Scored novice / first-lens trial | 37 | Self-identified novice | | declared_existing_wearer | Scored repurchase / veteran | 3,885 | Self-identified repeat | | platform_first_review | User's 1st review (no declared signal) | 6,678 | New to reviewing on HKTV | | platform_repeat_reviewer | User's 2+ review (proxy) | 6,224 | Engaged repeat reviewer | [/result]
[ml:wearer_cohort-rules] library: wearer_cohort.py (regex scoring, not sklearn) script: classify_review(comment, m_topics, m_praise, review_ordinal) scores: S_n novice patterns, S_e veteran patterns, veto penalties gates: wearing context, ordinal caps for weak trial language audit: cluster_output/wearer_audit_sample.md [/ml]
How we found this (math): If S_e ≥ 3 and S_e ≥ S_n → existing; elif S_n clears thresholds → new; else proxy by ordinal o_u within user_pk.
(a) Top themes — new wearers (declared_new_wearer)
Sentiment: 54.1% positive, 27.0% negative, 16.2% mixed · avg ★3.78 · avg comment 37 chars
LLM topics (mention rate per review)
[result:new_wearer_topics] | Topic | Rate | vs existing (pp) | | --- | --- | --- | | repurchase | 0.0% | -59.5 | | customer_service | 0.0% | -48.8 | | delivery | 2.7% | -44.7 | | price | 5.4% | -42.0 | | expiry | 2.7% | -40.3 | | other | 0.0% | -39.6 | | packaging | 16.2% | -37.2 | | prescription | 10.8% | -33.8 | | brand | 16.2% | -33.6 | | eye_health | 21.6% | -29.5 | [/result]
[babysql:polars] SELECT cohort, topic, rate FROM read_csv('/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/wearer_topic_rates.csv') WHERE cohort = 'declared_new_wearer' ORDER BY rate DESC LIMIT 12 [/babysql]
[ml:scipy-chi2] library: scipy.stats.chi2_contingency script: analyze_wearer_landscape.py :: topic enrichment test: topic mention independent of cohort [/ml]
Top complaint phrases (LLM-extracted)
- 唔好expect 上眼會有驚喜效果 (2)
- 唔舒服 (2)
- 盒裝濕 (2)
- con packing破損 (2)
- 遇到開左就崩左 (1)
- 啲碎留係眼入邊係咩後果 (1)
- 對Con變細咗第一次買到咁嘅Con 好難想像長期帶會點 (1)
- 眼乾眼澀 (1)
- un著un著感覺 (1)
- 第一次試戴呢個牌子 (1)
Top praise phrases
- 舒服 (3)
- 舒適 (2)
- 顏色自然嘅全深啡色 (2)
- 唔會直徑太大嘅大眼仔 (2)
- 第一次買,度數算準,配戴也比較舒服 (1)
- 度數準確 (1)
- 配戴舒服 (1)
- 顏色唔错 (1)
New-wearer narrative: Themes cluster on fit_sizing, comfort, and eye_health — onboarding anxiety, not repurchase. Negative rate is 27% vs 8% for declared existing — failure modes are adaptation (异物感, dry eyes) not loyalty erosion.
(b) Top themes — existing wearers (declared_existing_wearer)
Sentiment: 86.2% positive, 7.8% negative · avg ★4.54
LLM topics (mention rate per review)
| Topic | Rate |
|---|---|
| fit_sizing | 81.4% |
| comfort | 68.9% |
| appearance | 60.7% |
| repurchase | 59.5% |
| quality | 57.1% |
| packaging | 53.5% |
| eye_health | 51.1% |
| brand | 49.8% |
| customer_service | 48.8% |
| delivery | 47.4% |
| price | 47.4% |
| prescription | 44.7% |
Top complaint phrases
- 冇有 (111)
- 唔乾 (10)
- 唔舒服 (7)
- not available in all sizes (6)
- 眼已覺攰及乾 (4)
- 不能久帶 (4)
- 冇有其他問題 (3)
- 唔好用 (3)
Top praise phrases
- 回購 (398)
- 多次回購 (263)
- 用開 (248)
- 已多次回購 (160)
- good quality (119)
- comfortable fit (94)
- good (82)
- 一直用開 (66)
- good fit (64)
- 已回購多次 (61)
Existing-wearer narrative: repurchase and brand dominate — advocacy and habit. Comfort/quality are table stakes; negatives are rare and often operational (delivery, expiry).
[babysql:polars] SELECT cohort, topic, rate FROM read_csv('/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/wearer_topic_rates.csv') WHERE cohort = 'declared_existing_wearer' ORDER BY rate DESC LIMIT 12 [/babysql]
Platform proxy comparison (no declared signal in text)
| Cohort | N | Pos | Neg | Top-3 topics |
|---|---|---|---|---|
| platform_first_review | 6,678 | 69.5% | 15.5% | comfort(60%), quality(60%), fit_sizing(59%) |
| platform_repeat_reviewer | 6,224 | 78.3% | 8.4% | comfort(71%), quality(69%), appearance(68%) |
[babysql:polars] SELECT wearer_cohort, COUNT() AS n, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'positive' THEN 1 ELSE 0 END) / COUNT(), 1) AS pos_pct, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'negative' THEN 1 ELSE 0 END) / COUNT(*), 1) AS neg_pct FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/wearer_tags.parquet' AS w JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl ON w.review_id = cl.review_id JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet' AS m ON cl.review_id = m.review_id WHERE wearer_cohort IN ('platform_first_review', 'platform_repeat_reviewer') GROUP BY wearer_cohort [/babysql]
First-time reviewers on the platform show higher negative rate than repeat reviewers — even without saying "first time wearing lenses." Likely mix of trialists, disappointed switchers, and motivated complainers.
Product mix — what new vs existing wearers buy
Declared new — top modalities / lens types
modality: daily(100%) lens: spherical(68%), color(24%), multifocal(8%)
Declared existing — top modalities / lens types
modality: daily(93%), monthly(4%), 2-week(3%) lens: spherical(77%), color(14%), multifocal(8%), toric(1%), color_toric(0%)
[babysql:polars] SELECT w.wearer_cohort, v.v2_modality, v.v2_lens_type, COUNT() AS n, ROUND(100.0 * COUNT() / SUM(COUNT(*)) OVER (PARTITION BY w.wearer_cohort), 1) AS pct FROM '/home/sidmishra/ontologer/subprojects/review_sequencing_sentiment/cluster_output/wearer_tags.parquet' AS w JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl ON w.review_id = cl.review_id JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/product_id_to_attributes_v2.parquet' AS v ON cl.product_id = v.product_id WHERE w.wearer_cohort IN ('declared_new_wearer', 'declared_existing_wearer') GROUP BY w.wearer_cohort, v.v2_modality, v.v2_lens_type ORDER BY w.wearer_cohort, n DESC [/babysql]
Topic enrichment — new vs existing (χ²)
| Topic | New rate | Existing rate | Δ pp | p |
|---|---|---|---|---|
| repurchase | 0.0% | 59.5% | -59.5 | 8.31e-13* |
| customer_service | 0.0% | 48.8% | -48.8 | 9.41e-09* |
| delivery | 2.7% | 47.4% | -44.7 | 1.46e-07* |
| price | 5.4% | 47.4% | -42.0 | 8.35e-07* |
| expiry | 2.7% | 43.0% | -40.3 | 1.84e-06* |
| other | 0.0% | 39.6% | -39.6 | 2.11e-06* |
| packaging | 16.2% | 53.5% | -37.2 | 1.34e-05* |
| prescription | 10.8% | 44.7% | -33.8 | 7.54e-05* |
| brand | 16.2% | 49.8% | -33.6 | 9.66e-05* |
| eye_health | 21.6% | 51.1% | -29.5 | 0.000657* |
| appearance | 35.1% | 60.7% | -25.6 | 0.00274* |
| fit_sizing | 62.2% | 81.4% | -19.3 | 0.00545* |
How we found this (math): Per topic (t): (\hat\pi_t = #{\text{reviews mentioning }t}/N). Test (H_0): topic mention independent of cohort using χ² on 2×2 table (mention × {new, existing}).
Output files
| File | Role |
|---|---|
cluster_output/wearer_tags.parquet |
Per-review cohort tags |
cluster_output/wearer_topic_rates.csv |
Topic rates by cohort |
cluster_output/wearer_landscape_report.json |
Full payload |