Case Study: Scaling MetricsFlow to 10k Users
MetricsFlow is a real-time analytics dashboard for e-commerce brands. When their user base grew past five thousand, the dashboard started locking up under heavy data loads. They came to us to fix it without rewriting the whole product.
The original bottleneck
Every dashboard view fetched a single huge JSON payload, parsed it on the main thread, and then handed it to a charting library that re-rendered on every state change. On a busy account this meant 3–4 seconds of frozen UI on every navigation.
Profiling traced the cost to three things: parsing the large JSON payload on the main thread, layout thrash from chart re-renders, and redundant re-fetching whenever the user toggled filters.
What we changed
- Migrated data fetching to TanStack Query with aggressive caching keyed by filter state, eliminating duplicate requests.
- Streamed large datasets in pages and rendered placeholder skeletons so the UI never blocked.
- Moved heavy aggregation off the main thread into a Web Worker, posting only render-ready summaries back.
- Replaced the off-the-shelf charting library with a custom Canvas-based renderer for the largest views.
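The sketch below illustrates the idea behind the first bullet: cache keyed by filter state, with duplicate requests sharing one result. It is a minimal, dependency-free illustration, not TanStack Query's internals and not MetricsFlow's code. The `DashboardFilters` shape, the `FilterKeyedCache` class, and the `staleTimeMs` parameter are all hypothetical.

```typescript
// Hypothetical filter shape for one dashboard view (illustrative, not MetricsFlow's schema).
interface DashboardFilters {
  dateRange: string;
  storeId: string;
  channel?: string;
}

// Stable key: drop undefined fields and sort by name, so {a, b} and {b, a} hit the same entry.
function stableKey(view: string, filters: DashboardFilters): string {
  const entries = Object.entries(filters)
    .filter(([, value]) => value !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify([view, entries]);
}

type Fetcher<T> = (filters: DashboardFilters) => Promise<T>;

// Minimal filter-keyed cache: callers with the same key share one promise
// (in flight or resolved) until it is older than staleTimeMs.
class FilterKeyedCache<T> {
  private readonly entries = new Map<string, { promise: Promise<T>; fetchedAt: number }>();
  private readonly fetcher: Fetcher<T>;
  private readonly staleTimeMs: number;

  constructor(fetcher: Fetcher<T>, staleTimeMs: number) {
    this.fetcher = fetcher;
    this.staleTimeMs = staleTimeMs;
  }

  get(view: string, filters: DashboardFilters, now: number = Date.now()): Promise<T> {
    const key = stableKey(view, filters);
    const hit = this.entries.get(key);
    if (hit && now - hit.fetchedAt < this.staleTimeMs) {
      return hit.promise; // fresh hit or duplicate request: no new network call
    }
    const promise = this.fetcher(filters);
    this.entries.set(key, { promise, fetchedAt: now });
    // Forget failed requests so the next call retries, unless a newer entry replaced this one.
    promise.catch(() => {
      if (this.entries.get(key)?.promise === promise) this.entries.delete(key);
    });
    return promise;
  }
}
```

With TanStack Query, you get the equivalent by including the filter object in the query key, for example `useQuery({ queryKey: ["dashboard", view, filters], queryFn, staleTime })`. Here `view`, `filters`, and `queryFn` are placeholders. The library hashes query keys deterministically, so object key order does not matter.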
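The paging-plus-skeleton pattern in the second bullet can be sketched as an async generator that yields each page as it arrives, plus a loader that shows a skeleton before the first page lands. The case study does not say how MetricsFlow's endpoints paginate. The cursor-based `Page` shape and the `fetchPage` and `render` callbacks are assumptions for illustration.

```typescript
// Hypothetical page shape for a cursor-paginated endpoint (illustrative only).
interface Page<Row> {
  rows: Row[];
  nextCursor: string | null;
}

type FetchPage<Row> = (cursor: string | null) => Promise<Page<Row>>;

// Yield each page as soon as it arrives instead of waiting for the whole dataset.
async function* streamPages<Row>(fetchPage: FetchPage<Row>): AsyncGenerator<Row[]> {
  let cursor: string | null = null;
  do {
    const page: Page<Row> = await fetchPage(cursor);
    yield page.rows;
    cursor = page.nextCursor;
  } while (cursor !== null);
}

type ViewState<Row> =
  | { kind: "skeleton" }
  | { kind: "partial"; rows: Row[] }
  | { kind: "complete"; rows: Row[] };

// Show a skeleton immediately, then grow the view page by page.
async function loadIntoView<Row>(
  fetchPage: FetchPage<Row>,
  render: (state: ViewState<Row>) => void,
): Promise<void> {
  render({ kind: "skeleton" });
  let rows: Row[] = [];
  for await (const chunk of streamPages(fetchPage)) {
    rows = rows.concat(chunk); // new array each time, so renderers can compare by reference
    render({ kind: "partial", rows });
  }
  render({ kind: "complete", rows });
}
```

Because the skeleton renders before any network await, the user sees structure immediately. Each later render only adds the rows that just arrived.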
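The worker in the third bullet might look roughly like the sketch below. The `RawOrder` and `DailySummary` shapes and the daily-revenue aggregation are hypothetical stand-ins. The point is that the expensive pass over raw rows runs inside the worker, and only a small, render-ready array is posted back.

```typescript
// Hypothetical raw row shape (illustrative only).
interface RawOrder {
  day: string; // e.g. "2024-05-01"
  revenue: number;
}

// Render-ready summary: one point per day, already sorted, small enough to post cheaply.
interface DailySummary {
  day: string;
  revenue: number;
  orders: number;
}

// The heavy part: a single pass over potentially very many rows.
function summarize(rows: RawOrder[]): DailySummary[] {
  const byDay = new Map<string, DailySummary>();
  for (const row of rows) {
    const bucket = byDay.get(row.day) ?? { day: row.day, revenue: 0, orders: 0 };
    bucket.revenue += row.revenue;
    bucket.orders += 1;
    byDay.set(row.day, bucket);
  }
  return Array.from(byDay.values()).sort((a, b) => (a.day < b.day ? -1 : a.day > b.day ? 1 : 0));
}

// Worker wiring: only attach the handler when this file runs inside a Web Worker.
const scope = globalThis as unknown as {
  WorkerGlobalScope?: unknown;
  postMessage(message: unknown): void;
  onmessage: ((event: { data: RawOrder[] }) => void) | null;
};
if (scope.WorkerGlobalScope !== undefined) {
  scope.onmessage = (event) => scope.postMessage(summarize(event.data));
}
```

On the main thread, a page would create the worker, for example with `new Worker(new URL("./aggregate.worker.ts", import.meta.url), { type: "module" })` (the path is hypothetical). It would then call `worker.postMessage(rows)` and draw from `event.data` in its `onmessage` handler. `postMessage` copies data by structured cloning. Fetching and parsing the payload inside the worker as well would therefore keep both the copy and the `JSON.parse` cost off the main thread.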
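The case study does not describe the custom renderer's internals. As a hedged illustration of why Canvas helps with large series, the sketch below draws a whole line series as a single path with one `stroke()` call. `Ctx2D` is a hypothetical interface that lists only real `CanvasRenderingContext2D` methods. A browser context therefore satisfies it, and a test can pass in a recorder.

```typescript
// The subset of CanvasRenderingContext2D this sketch uses.
interface Ctx2D {
  clearRect(x: number, y: number, w: number, h: number): void;
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  stroke(): void;
}

// Hypothetical point shape: x is an index or timestamp, y is the metric value.
interface Point {
  x: number;
  y: number;
}

// Draw an entire series as ONE path and ONE stroke, not one DOM/SVG node per point.
function drawLineSeries(ctx: Ctx2D, points: Point[], width: number, height: number): void {
  ctx.clearRect(0, 0, width, height);
  if (points.length === 0) return;

  // Data bounds, computed once per draw.
  let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
  for (const p of points) {
    minX = Math.min(minX, p.x);
    maxX = Math.max(maxX, p.x);
    minY = Math.min(minY, p.y);
    maxY = Math.max(maxY, p.y);
  }
  const spanX = maxX - minX || 1; // avoid dividing by zero for a single point or flat series
  const spanY = maxY - minY || 1;

  // Map data space to pixels; canvas y grows downward, so flip the y axis.
  const toPx = (p: Point): [number, number] => [
    ((p.x - minX) / spanX) * width,
    height - ((p.y - minY) / spanY) * height,
  ];

  ctx.beginPath();
  const [x0, y0] = toPx(points[0]);
  ctx.moveTo(x0, y0);
  for (let i = 1; i < points.length; i++) {
    const [x, y] = toPx(points[i]);
    ctx.lineTo(x, y);
  }
  ctx.stroke();
}
```

In a browser, `drawLineSeries(canvas.getContext("2d")!, points, canvas.width, canvas.height)` would draw onto a real canvas. The points become pixels on a single element rather than DOM or SVG nodes. Redrawing therefore adds nothing new for the browser to lay out.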
The result
Time-to-interactive dropped from 3.4s to 0.6s on the heaviest dashboard. P95 navigation time fell by 78%. User retention in the analytics section improved 40% month over month.
Just as importantly, the team can now ship new chart types without worrying about hitting performance cliffs, because the rendering and data layers are now decoupled.
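One way to picture that decoupling is a renderer registry keyed by chart type, where every renderer consumes the same summary shape the data layer produces. This is a hypothetical sketch of the architecture, not MetricsFlow's code. `SeriesSummary`, `ChartRenderer`, `registerChart`, and `renderChart` are illustrative names.

```typescript
// Hypothetical contract between the layers: the data layer emits summaries,
// renderers only consume them, and neither depends on the other's internals.
interface SeriesSummary {
  label: string;
  values: number[];
}

interface ChartRenderer {
  // Returns a description so the sketch is testable; a real renderer would draw.
  render(summary: SeriesSummary): string;
}

// Registry keyed by chart type: a new chart type is a new entry here,
// with no change to fetching, caching, or aggregation code.
const renderers = new Map<string, ChartRenderer>();

function registerChart(type: string, renderer: ChartRenderer): void {
  renderers.set(type, renderer);
}

function renderChart(type: string, summary: SeriesSummary): string {
  const renderer = renderers.get(type);
  if (renderer === undefined) {
    throw new Error(`No renderer registered for chart type "${type}"`);
  }
  return renderer.render(summary);
}

// Two illustrative chart types sharing the same summary shape.
registerChart("total", {
  render: (s) => `${s.label}: ${s.values.reduce((sum, v) => sum + v, 0)}`,
});
registerChart("peak", {
  render: (s) => `${s.label} peak: ${Math.max(...s.values)}`,
});
```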
Lessons we carry forward
Most performance problems are not framework problems. They are architecture problems hiding behind a framework. Caching, streaming, and moving heavy work off the main thread will fix most frontend slowness, regardless of the stack you started with.
Frequently asked questions
- What was MetricsFlow's original performance bottleneck?
- Each dashboard view fetched a single huge JSON payload, parsed it on the main thread, and handed it to a charting library that re-rendered on every state change. The result was 3–4 seconds of locked UI on every navigation.
- What did Aqib Ops change to fix it?
- We migrated data fetching to TanStack Query with filter-keyed caching, streamed large datasets with skeleton placeholders, moved heavy aggregation into a Web Worker, and replaced the off-the-shelf charts with a custom Canvas renderer for the largest views.
- How big was the performance improvement?
- Time-to-interactive on the heaviest dashboard dropped from 3.4s to 0.6s, P95 navigation time fell by 78%, and user retention in the analytics section improved 40% month over month.
- Did Aqib Ops have to rewrite the whole product?
- No. The fix was scoped to the data-fetching, aggregation, and rendering layers. The rest of the product stayed in place, which is why the engagement was weeks instead of months.
- What's the broader lesson?
- Most frontend performance problems are architecture problems hiding behind a framework. Caching, streaming, and moving heavy work off the main thread will fix most slowness, regardless of the stack you started with.