Dashboard Quality Vs. Product Quality
"Why do you put the canary availability metric on top of your dashboard?" I asked an SDM while reviewing his web application's operational dashboard.
"The canary can tell us if our application is still running ..." he explained.
"But the canary can only measure your web application as a machine would. It is essentially measuring your application's uptime. Is uptime the most important part of the user experience to monitor? Web applications are designed for human users after all, not machines," I probed further.
I often ask SDMs during interviews how they approach building dashboards. Their answers give me insight into their perspective on quality, their customer obsession, and the depth of their technical understanding of the product they oversee.
The quality of a dashboard often mirrors the quality of the product it monitors. It reflects the SDM's attention to detail, their customer focus, and their insistence on upholding the highest standards. Hence, a key question to pose when interviewing an SDM is, "What are the most important metrics or Key Performance Indicators (KPIs) that you use to monitor user experience?"
An SDM who truly understands their users' needs will focus on metrics that measure customer experience. They will also be able to differentiate these from metrics used to monitor internal components for troubleshooting. This ability demonstrates an understanding of the crucial balance between symptom monitoring and root-cause monitoring.
Establishing baselines for each metric is also critical. A proficient SDM will know the importance of identifying "normal" behavior for each KPI, using historical data and industry benchmarks. This knowledge allows them to create a foundational reference point, against which future data can be compared.
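As a rough illustration of what "establishing a baseline" can mean in practice, the sketch below summarizes historical samples of a single KPI as a center and a spread. The function name, the mean-plus-standard-deviation summary, and the sample values are all assumptions made for this example; a real dashboard might prefer medians, seasonal windows, or industry benchmarks instead.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Baseline:
    center: float  # typical value of the KPI
    spread: float  # how much the KPI normally varies


def compute_baseline(history: list[float]) -> Baseline:
    """Summarize historical samples of one KPI (hypothetical helper)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    # Sample standard deviation is one simple choice of spread, assumed here.
    return Baseline(center=mean(history), spread=stdev(history))


# Made-up daily checkout success rates (percent) for one week.
history = [99.1, 99.3, 98.9, 99.2, 99.0, 99.4, 99.1]
baseline = compute_baseline(history)
print(f"normal is about {baseline.center:.2f}% +/- {baseline.spread:.2f}")
```

Future readings of the same KPI can then be compared against this reference point rather than judged in isolation.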
Alerts for abnormal signals are another essential dashboard feature. A good SDM will have mechanisms in place to detect deviation from the established baselines, allowing them to swiftly troubleshoot and resolve issues.
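One simple way to detect deviation from "normal" is to flag any reading that sits too many standard deviations away from the historical mean. The sketch below assumes that approach; the function name, the threshold of 3, and the sample latencies are hypothetical, and production alerting usually also accounts for seasonality and requires the deviation to persist before paging anyone.

```python
from statistics import mean, stdev


def is_abnormal(value: float, history: list[float], threshold: float = 3.0) -> bool:
    """Return True when value is more than `threshold` standard deviations
    away from the mean of the historical samples (hypothetical helper)."""
    center = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != center
    return abs(value - center) / spread > threshold


# Made-up page-load times in milliseconds.
history = [200.0, 210.0, 190.0, 205.0, 195.0]
print(is_abnormal(204.0, history))  # within normal variation
print(is_abnormal(450.0, history))  # far outside normal variation
```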
The way an SDM discusses latency, a common performance metric, can also be revealing. If they talk about latency as a single number, such as an average, they may not fully grasp its nature. Latency is a distribution of values, not a single figure, and an average can hide the slow requests that a small share of users actually experience. Therefore, several percentiles (p50, p90, p99, p99.9) should be shown together to convey the shape of the latency distribution.
The quality of an SDM can be observed from their approach to building and managing their operational and business dashboards. The metrics they prioritize, their understanding of baselines, how they monitor for abnormal signals, and their knowledge of latency distributions all provide valuable insights. As the old adage goes, "the devil is in the details," and it's these details that distinguish the good from the great when it comes to SDMs.
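To make the point about distributions concrete, here is a small sketch that computes percentiles with the nearest-rank method. The helper name and the sample latencies are invented for this example, and monitoring systems often use interpolated or approximate percentiles instead.

```python
import math
from statistics import mean


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Round before ceil so floating-point noise cannot bump the rank up by one.
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]


# Made-up request latencies in milliseconds: mostly fast, with a slow tail.
latencies = [100.0] * 90 + [300.0] * 9 + [2000.0]

print("mean:", mean(latencies))
for p in (50, 90, 99, 99.9):
    print(f"p{p}:", percentile(latencies, p))
```

In this made-up data the mean and the median both look healthy, while the higher percentiles expose the slow requests that some users are hitting.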