Initial Assessment Analytics — HKTVMall Contact Lens Reviews
Author: Composer (Cursor AI Agent)
Date: 2026-06-26
Status: Initial exploratory assessment — LLM-enriched VoC analysis
Sources
| # | Source | Path / reference | Role |
|---|---|---|---|
| 1 | Base review corpus | ~/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet |
16,835 contact lens reviews (rating, comment, brand, product, date) |
| 2 | LLM enrichment run | ~/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet |
Sentiment, score, topics, complaints, praise from Ternary-Bonsai-4B (run 6091e) |
| 3 | Classification pipeline | classify_reviews.py |
Produces {run_id}_cl_reviews_model_enriched.parquet via local llama-server |
| 4 | Analytics script | analyze_reviews.py |
Polars-based exploratory analysis (this assessment) |
| 5 | Pipeline manual | README.md |
Server config, runbook, model requirements |
Methodology notes:
- Analysis performed with Polars (not pandas).
- Base data deduplicated on
review_id(keep last) before join → 16,835 rows (77 duplicate IDs in raw file). - LLM labels:
m_sentiment,m_score,m_topics,m_complaints,m_praise. - Topic sentiment heatmap cells = % of mentions of that topic falling into each sentiment bucket (rows sum to ~100% per topic).
Descriptive formulas used in this document:
- Positive rate (year t): (\hat\pi_{pos,t} = N_{pos,t} / N_t)
- Brand share (period P): (s_b = N_{b,P} / \sum_{b'} N_{b',P})
- Period share shift: (\Delta s_b = s_{b,\text{late}} - s_{b,\text{early}})
- Star–sentiment decoupling: count of rows with
rating ≥ 4ANDm_sentiment = negative
For multivariate segment tests, reviewer clustering, store anomalies, and controlled logistic models, see product_people_analytics.md (sections a–e, each with explicit math).
1. Executive summary
This corpus is a high-volume, superficially healthy category (mean ★4.42, 76.5% LLM-positive) that exhibits clear structural deterioration since 2022 and a systematic decoupling of star ratings from textual sentiment.
Three findings dominate:
- Category erosion — positive sentiment fell from ~80% (2020) to ~70% (2026); negative sentiment doubled from ~6% to ~21%.
- Star inflation — 342 reviews with ★≥4 carry LLM-negative sentiment; 240 five-star reviews tag price as negative/mixed (price-timing regret, not product failure).
- Brand bifurcation — incumbents (ACUVUE, 博士倫, Bausch + Lomb) dominate volume but OLENS and 博士倫 show acute 2025–2026 sentiment collapse; Bausch + Lomb's review share collapsed even as its per-brand sentiment stayed relatively stable.
The dimensions extracted by the LLM (comfort, fit, eye_health, price, repurchase, etc.) are actionable: negative reviews fingerprint to fit_sizing and eye_health, not comfort; mixed reviews cluster on price; repurchase is the strongest advocacy signal.
2. Dataset overview
| Metric | Value |
|---|---|
| Reviews (deduped) | 16,835 |
| Brands | 36 |
| Products (SKUs) | 479 |
| Date range | 2018 – 2026 |
| Mean star rating | 4.42 |
| Mean comment length | 14.7 characters |
[babysql:polars] SELECT COUNT(*) AS reviews, COUNT(DISTINCT brand) AS brands, COUNT(DISTINCT product_id) AS products, AVG(rating) AS avg_rating, AVG(LENGTH(comment)) AS avg_comment_len, MIN(LEFT(date, 4)) AS earliest_year, MAX(LEFT(date, 4)) AS latest_year FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' [/babysql]
Sentiment landscape
| m_sentiment | Count | Share | Avg ★ | Avg comment length |
|---|---|---|---|---|
| positive | 12,895 | 76.6% | 4.85 | 9 chars |
| negative | 1,869 | 11.1% | 1.99 | 42 chars |
| mixed | 1,515 | 9.0% | 3.69 | 27 chars |
| neutral | 548 | 3.2% | 4.51 | 6 chars |
| unknown | 8 | 0.05% | 3.75 | 60 chars |
[babysql:polars] SELECT m.m_sentiment, COUNT() AS n, ROUND(100.0 * COUNT() / SUM(COUNT(*)) OVER (), 2) AS pct, AVG(cl.rating) AS avg_rating, AVG(LENGTH(cl.comment)) AS avg_len FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet' AS m ON cl.review_id = m.review_id GROUP BY m.m_sentiment ORDER BY n DESC [/babysql]
Interpretation: Negative reviewers write ~4.7× longer comments. Volume-weighted dashboards understate pain. Complaints are near-perfectly diagnostic: 100% of negative rows carry an extracted complaint; 98% of mixed rows do; only 5.4% of positive rows mention a complaint.
3. Temporal dynamics — category health
| Year | Reviews | Positive rate | Negative rate | Avg ★ |
|---|---|---|---|---|
| 2018 | 599 | 75.1% | 10.4% | 4.53 |
| 2019 | 2,124 | 79.1% | 6.7% | 4.58 |
| 2020 | 2,515 | 80.4% | 6.1% | 4.60 |
| 2021 | 1,980 | 79.9% | 8.9% | 4.52 |
| 2022 | 2,103 | 76.6% | 10.5% | 4.43 |
| 2023 | 2,520 | 75.3% | 12.4% | 4.38 |
| 2024 | 2,389 | 74.8% | 12.8% | 4.36 |
| 2025 | 1,739 | 72.1% | 18.2% | 4.14 |
| 2026 | 866 | 70.0% | 21.1% | 4.02 |
[babysql:polars] SELECT CAST(LEFT(cl.date, 4) AS INT) AS year, COUNT() AS reviews, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'positive' THEN 1 ELSE 0 END) / COUNT(), 1) AS positive_rate, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'negative' THEN 1 ELSE 0 END) / COUNT(*), 1) AS negative_rate, AVG(cl.rating) AS avg_rating FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet' AS m ON cl.review_id = m.review_id GROUP BY year ORDER BY year [/babysql]
Long-run shift (2018 → 2026): positive −5.1 pp, negative +10.7 pp, avg ★ −0.51.
Interpretation: 2020 marks a peak health year. Decline accelerates from 2022 onward — coinciding with OLENS share growth and ACUVUE share concentration. This is consistent with (a) category mix shift toward cosmetic/color lenses, (b) post-COVID price sensitivity, and/or (c) platform review cohort change — not yet decomposed causally here.
4. Brand time-series composition
4.1 Volume mix shift — early vs late period
Share of total review volume among the top 10 brands:
| Brand | Share 2018–2020 | Share 2024–2026 | Δ (pp) |
|---|---|---|---|
| ACUVUE | 26.3% | 38.7% | +12.4 |
| 博士倫 | 17.9% | 22.7% | +4.8 |
| Bausch + Lomb | 23.0% | 3.6% | −19.4 |
| OLENS | 6.4% | 10.9% | +4.5 |
| CooperVision | 9.2% | 3.4% | −5.8 |
| Alcon | 9.0% | 3.8% | −5.2 |
| Candymagic | 1.1% | 3.5% | +2.4 |
| FreshKon | 3.0% | 1.2% | −1.8 |
| Johnson & Johnson | 0.7% | 2.2% | +1.5 |
| Delight | 2.3% | 1.4% | −0.9 |
[babysql:polars] WITH tagged AS ( SELECT brand, CASE WHEN CAST(LEFT(date, 4) AS INT) BETWEEN 2018 AND 2020 THEN 'early' WHEN CAST(LEFT(date, 4) AS INT) BETWEEN 2024 AND 2026 THEN 'late' END AS period FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' WHERE CAST(LEFT(date, 4) AS INT) BETWEEN 2018 AND 2020 OR CAST(LEFT(date, 4) AS INT) BETWEEN 2024 AND 2026 ), totals AS ( SELECT period, COUNT() AS n_period FROM tagged GROUP BY period ), brand_period AS ( SELECT period, brand, COUNT() AS n FROM tagged GROUP BY period, brand ) SELECT bp.brand, ROUND(100.0 * SUM(CASE WHEN bp.period = 'early' THEN bp.n ELSE 0 END) / MAX(CASE WHEN t.period = 'early' THEN t.n_period END), 1) AS share_early, ROUND(100.0 * SUM(CASE WHEN bp.period = 'late' THEN bp.n ELSE 0 END) / MAX(CASE WHEN t.period = 'late' THEN t.n_period END), 1) AS share_late FROM brand_period bp CROSS JOIN totals t GROUP BY bp.brand HAVING share_early IS NOT NULL AND share_late IS NOT NULL ORDER BY share_late DESC LIMIT 10 [/babysql]
Key narrative:
- ACUVUE is consolidating share of voice — from ~1-in-4 reviews to nearly 2-in-5. Category health metrics increasingly are ACUVUE metrics.
- Bausch + Lomb's review volume collapsed (−19.4 pp) without disappearing from the market — likely reflects SKU/listing shift on HKTVMall rather than brand exit. Per-brand sentiment for remaining B+L reviews stays relatively strong.
- OLENS gained share (+4.5 pp) into a period of worsening category sentiment — cosmetic lens brands disproportionately contribute to fit/eye_health negative topics (see §6).
4.2 ACUVUE — dominant but not immune
| Year | ACUVUE reviews | Vol share | Pos rate | Neg rate | Avg ★ |
|---|---|---|---|---|---|
| 2018 | 220 | 36.7% | 78.6% | 7.7% | 4.63 |
| 2020 | 597 | 23.7% | 83.1% | 4.0% | 4.63 |
| 2022 | 587 | 27.9% | 77.0% | 9.4% | 4.41 |
| 2024 | 841 | 35.2% | 76.2% | 13.2% | 4.31 |
| 2025 | 652 | 37.5% | 76.5% | 14.9% | 4.29 |
| 2026 | 440 | 50.8% | 75.9% | 17.3% | 4.18 |
ACUVUE's negative rate tripled from 4.0% (2020) to 17.3% (2026 YTD) while its share of all reviews hit 50.8% in 2026. The category decline is not happening despite ACUVUE dominance — it is increasingly driven by ACUVUE's review base.
4.3 OLENS — share growth, sentiment collapse
| Year | OLENS reviews | Vol share | Pos rate | Neg rate | Avg ★ |
|---|---|---|---|---|---|
| 2019 | 76 | 3.6% | 85.5% | 3.9% | 4.78 |
| 2022 | 450 | 21.4% | 78.0% | 9.1% | 4.53 |
| 2024 | 298 | 12.5% | 69.1% | 17.4% | 4.18 |
| 2025 | 179 | 10.3% | 67.0% | 21.8% | 3.91 |
| 2026 | 65 | 7.5% | 47.7% | 44.6% | 3.05 |
OLENS shows the steepest sentiment cliff in the dataset. 2026 YTD (small n=65, treat cautiously) suggests nearly coin-flip negative sentiment — consistent with cosmetic lens expectation gaps (appearance, fit, eye irritation).
4.4 博士倫 — volume leader under stress
| Year | 博士倫 reviews | Vol share | Pos rate | Neg rate | Avg ★ |
|---|---|---|---|---|---|
| 2020 | 374 | 14.9% | 80.7% | 7.8% | 4.61 |
| 2022 | 286 | 13.6% | 75.2% | 13.6% | 4.33 |
| 2024 | 581 | 24.3% | 76.2% | 12.7% | 4.40 |
| 2025 | 433 | 24.9% | 66.1% | 21.2% | 4.01 |
| 2026 | 122 | 14.1% | 57.4% | 31.1% | 3.75 |
[babysql:polars] SELECT CAST(LEFT(cl.date, 4) AS INT) AS year, cl.brand, COUNT() AS reviews, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'positive' THEN 1 ELSE 0 END) / COUNT(), 1) AS pos_rate, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'negative' THEN 1 ELSE 0 END) / COUNT(*), 1) AS neg_rate, AVG(cl.rating) AS avg_rating FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet' AS m ON cl.review_id = m.review_id WHERE cl.brand IN ('ACUVUE', 'OLENS', '博士倫') GROUP BY year, cl.brand ORDER BY cl.brand, year [/babysql]
博士倫's negative rate quadrupled from 7.8% (2020) to 31.1% (2026 YTD). This aligns with high-volume polarized SKUs (e.g. Lacelle Enchanting Gold, 909 reviews, 34% dissatisfaction pressure in product-level analysis).
4.5 Bausch + Lomb — the silent exit from review share
Bausch + Lomb went from 24.5% of 2020 reviews to 3.6% of 2024–2026 reviews, while brands like ACUVUE absorbed share. Remaining B+L reviews retain relatively high sentiment (e.g. 2024: 83.8% pos, 5.0% neg on n=80). Interpretation: this is a distribution/channel story on HKTVMall, not necessarily a product quality crisis.
5. Brand equity snapshot (full period, n≥100)
| Brand | Reviews | Pos rate | Neg rate | Net sentiment* | Loyalty index** | Avg ★ |
|---|---|---|---|---|---|---|
| SHO-BI | 149 | 85.2% | 1.3% | +0.84 | 49.7% | 4.89 |
| Johnson & Johnson | 384 | 82.0% | 7.8% | +0.74 | 46.6% | 4.49 |
| CooperVision | 1,118 | 79.4% | 8.3% | +0.71 | 42.1% | 4.47 |
| Alcon | 1,053 | 79.4% | 10.4% | +0.69 | 43.6% | 4.48 |
| Bausch + Lomb | 1,875 | 76.7% | 9.4% | +0.67 | 41.0% | 4.51 |
| ACUVUE | 5,227 | 77.7% | 10.9% | +0.67 | 42.6% | 4.43 |
| 博士倫 | 3,179 | 76.1% | 12.1% | +0.64 | 44.7% | 4.39 |
| OLENS | 2,010 | 73.6% | 12.6% | +0.61 | 30.5% | 4.33 |
| FreshKon | 390 | 68.7% | 15.9% | +0.53 | 37.7% | 4.25 |
[babysql:polars] SELECT cl.brand, COUNT() AS reviews, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'positive' THEN 1 ELSE 0 END) / COUNT(), 1) AS pos_rate, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'negative' THEN 1 ELSE 0 END) / COUNT(*), 1) AS neg_rate, AVG(cl.rating) AS avg_rating FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet' AS m ON cl.review_id = m.review_id GROUP BY cl.brand ORDER BY reviews DESC LIMIT 10 [/babysql]
* Net sentiment = positive rate − negative rate
* Loyalty index = share of all brand reviews that are repurchase topic + positive sentiment*
OLENS paradox: 73.6% positive but lowest loyalty index (30.5%) among major brands — high trial, lower repeat. Repurchase language is a stronger retention signal than raw positive %.
6. Topic–sentiment heatmap
Each cell = percentage of that topic's mentions tagged with the given sentiment.
Only topics with ≥200 mentions shown. Sorted by volume.
| Topic | Mentions | Positive | Negative | Mixed | Neutral |
|---|---|---|---|---|---|
| fit_sizing | 11,386 | 76.9% | 9.9% | 8.8% | 4.4% |
| comfort | 11,159 | 83.3% | 5.5% | 6.8% | 4.4% |
| quality | 10,563 | 81.0% | 7.3% | 7.1% | 4.7% |
| appearance | 10,342 | 83.3% | 4.8% | 7.0% | 4.9% |
| packaging | 9,342 | 79.4% | 7.9% | 7.2% | 5.5% |
| eye_health | 8,639 | 77.9% | 9.9% | 6.5% | 5.7% |
| price | 8,554 | 84.4% | 3.2% | 6.6% | 5.8% |
| delivery | 8,430 | 81.9% | 6.4% | 5.8% | 5.9% |
| customer_service | 8,100 | 84.6% | 4.7% | 4.7% | 6.0% |
| repurchase | 7,928 | 86.7% | 3.0% | 4.2% | 6.2% |
| brand | 7,909 | 86.3% | 3.1% | 4.4% | 6.2% |
| expiry | 7,573 | 81.4% | 6.4% | 5.6% | 6.5% |
| prescription | 7,449 | 84.4% | 4.4% | 4.6% | 6.6% |
| other | 7,056 | 84.7% | 2.7% | 5.7% | 6.9% |
How to read the heatmap
Hygiene factors (high volume, high positive skew):
comfort, appearance, quality, repurchase, brand — mentioned constantly, overwhelmingly positive. These are table stakes, not differentiators.
Failure modes (elevated negative share):
fit_sizing and eye_health tie at 9.9% negative — the highest negative rates among high-volume topics. When contact lens customers are unhappy, they talk about fit and eye damage, not generic "discomfort."
Ambivalence channel:
price has the lowest negative share (3.2%) but elevated mixed (6.6%) — customers express price-timing regret ("買貴左", "價格已減") rather than pure hatred. This is structurally invisible to star ratings.
Operational risk:
expiry (6.4% neg) and packaging (7.9% neg) point to fulfillment/QC issues — batch expiry and damaged packaging appear in negative fingerprint analysis.
Negative review topic fingerprint (absolute counts)
When m_sentiment = negative, which topics appear?
| Topic | Negative mentions |
|---|---|
| fit_sizing | 1,140 |
| eye_health | 864 |
| quality | 778 |
| packaging | 744 |
| comfort | 626 |
| delivery | 549 |
| expiry | 491 |
| customer_service | 385 |
7. Rating–sentiment decoupling
Calibration matrix (selected cells)
| Star bucket | LLM sentiment | Count |
|---|---|---|
| ★≥4 (positive stars) | positive | 12,936 |
| ★≥4 | negative | 342 |
| ★≥4 | mixed | 957 |
| ★≤2 (negative stars) | negative | 1,364 |
| ★≤2 | positive | 40 |
[babysql:polars] SELECT CASE WHEN cl.rating >= 4 THEN 'stars_ge_4' WHEN cl.rating <= 2 THEN 'stars_le_2' ELSE 'mid' END AS star_bucket, m.m_sentiment, COUNT(*) AS n FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet' AS m ON cl.review_id = m.review_id WHERE cl.rating >= 4 OR cl.rating <= 2 GROUP BY star_bucket, m.m_sentiment ORDER BY star_bucket, n DESC [/babysql]
9.0% of all reviews sit in rating–sentiment tension.
Hidden detractors — 5★ + price topic + negative/mixed sentiment
| Brand | Count |
|---|---|
| ACUVUE | 72 |
| 博士倫 | 43 |
| OLENS | 30 |
| Bausch + Lomb | 23 |
| Candymagic | 17 |
| CooperVision | 16 |
| Total | 240 |
[babysql:polars] SELECT cl.brand, COUNT(*) AS n FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet' AS m ON cl.review_id = m.review_id WHERE cl.rating >= 4 AND m.m_sentiment = 'negative' GROUP BY cl.brand ORDER BY n DESC LIMIT 10 [/babysql]
These are customers who award five stars but encode price regret in text. Star averages overstate true advocacy.
8. Product-level hotspots (n≥30 reviews)
| Product | Brand | n | Neg rate | Mixed rate | Dissatisfaction* |
|---|---|---|---|---|---|
| 1-DAY ACUVUE MOIST Daily | ACUVUE | 32 | 40.6% | 15.6% | 56.3% |
| O2 EDITION 1 Day | OLENS | 196 | 21.9% | 15.3% | 37.2% |
| Enchanting Gold Lacelle | 博士倫 | 909 | 21.1% | 12.9% | 34.0% |
| 1-DAY ACUVUE MOIST Multifocal | ACUVUE | 695 | 15.8% | 9.8% | 25.6% |
| (BC 8.5) Moist 1-Day | ACUVUE | 811 | 16.0% | 11.0% | 27.0% |
[babysql:polars] SELECT cl.product_title, cl.brand, COUNT() AS n, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'negative' THEN 1 ELSE 0 END) / COUNT(), 1) AS neg_rate, ROUND(100.0 * SUM(CASE WHEN m.m_sentiment = 'mixed' THEN 1 ELSE 0 END) / COUNT(), 1) AS mixed_rate, ROUND(100.0 * ( SUM(CASE WHEN m.m_sentiment = 'negative' THEN 1 ELSE 0 END) + SUM(CASE WHEN m.m_sentiment = 'mixed' THEN 1 ELSE 0 END) ) / COUNT(), 1) AS dissatisfaction_pressure FROM '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet' AS cl JOIN '/home/sidmishra/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet' AS m ON cl.review_id = m.review_id GROUP BY cl.product_title, cl.brand HAVING COUNT(*) >= 30 ORDER BY dissatisfaction_pressure DESC LIMIT 10 [/babysql]
* Dissatisfaction pressure = neg rate + mixed rate
Lacelle Enchanting Gold (博士倫, n=909) is the highest-volume polarized SKU — a priority for qualitative review (color expectation vs reality).
9. Voice of customer — extracted language
Top complaint themes
| Complaint phrase (LLM-extracted) | Approx. count |
|---|---|
| 價格已減 (price already reduced) | ~71 |
| 買貴左 (bought too expensive) | frequent |
| 乾 / 眼乾 (dry eyes) | frequent |
| 異物感 (foreign body sensation) | frequent |
| 眼珠太深色,無效果圖咁明顯 | ~10 |
Dominant complaint class: price timing / promotion regret — a pricing architecture problem, not purely product quality.
Top praise themes
| Praise phrase | Approx. count |
|---|---|
| Good / good | ~2,786 combined |
| 回購 (repurchase) | ~895 |
| high quality | ~829 |
| comfortable | ~761 |
| 已多次回購 (repurchased many times) | ~162 |
10. Strategic implications
| Finding | So what |
|---|---|
| Category sentiment eroding since 2022 | Investigate mix shift (cosmetic vs corrective), promo calendar, and platform cohort effects |
| ACUVUE → 50.8% of 2026 reviews | Category KPIs should be ACUVUE-segmented; aggregate metrics mask brand concentration |
| OLENS 2025–2026 sentiment cliff | Fit + eye_health education, QC on color lenses, expectation management on appearance |
| 博士倫 Lacelle polarized SKU | Deep-dive Enchanting Gold — 909 reviews, 34% dissatisfaction pressure |
| fit_sizing + eye_health = failure modes | Comfort marketing misses the actual detractor language |
| price → mixed not negative | Build true sentiment score; consider price-match / post-purchase adjustment |
| 342 high-star / negative-LLM reviews | NPS and star KPIs are inflated; text-derived score needed for ops |
| Repurchase = strongest advocacy | Use repurchase-topic-positive rate as north-star retention KPI |
11. Recommended next analyses
- Causal decomposition of 2022+ decline — brand mix vs within-brand deterioration (Shapley or Oaxaca-style)
- Join to
products.parquet— lens type (daily/monthly/color), BC, diameter as covariates - Complaint topic modeling — embed
m_complaintsfor finer-grained price vs fit vs health clusters - True Sentiment Score — composite of LLM sentiment, topic weights, and star-rating penalty for hidden detractors
- Dedup fix upstream — 77 duplicate
review_ids incl_reviews.parquetshould be resolved at ETL
Appendix A — Reproducing this analysis
cd /home/sidmishra/ontologer/subprojects/review_sequencing_sentiment
source venv/bin/activate
pip install polars # if not installed
python analyze_reviews.py
Data inputs hardcoded in analyze_reviews.py:
- Base:
~/ontologer/data/processed/hk_market/hktvmall/cl_reviews.parquet - Enrichment:
~/ontologer/data/processed/hk_market/hktvmall/6091e_cl_reviews_model_enriched.parquet
Appendix B — LLM classification provenance
| Property | Value |
|---|---|
| Run ID | 6091e |
| Model | Ternary-Bonsai-4B-Q2_0 (PrismML llama.cpp) |
| Server config | -c 4098 -np 8 (effective 6144 KV, 768 tokens/slot) |
| Throughput | ~6.4 reviews/sec |
| Runtime | ~44 minutes for 16,835 reviews |
| Output | 6091e_cl_reviews_model_enriched.parquet (1.4 MB) |
See README.md for full pipeline reproduction.
Appendix C — Extended analytics (product × people)
Follow-on analysis with per-section mathematical methodology:
| Report | Script |
|---|---|
product_people_analytics.md |
analyze_product_people_clusters.py |
Covers: reviewer clusters in product-mix space, Mar–May 2026 trend deltas, store z-scores, k-NN coherence, χ² topic heterogeneity, logistic regression (ORs with modality/lens/pack/brand controls).
End of initial assessment.