Jin's Muse on Engineering
Queuing's Impact on High Percentile Latency

Operating a large-scale cloud service entails vigilance over many performance metrics, with latency at the forefront. As these services commit to high availability and rapid responsiveness, engineers must consider not just the usual suspects like network and hardware bottlenecks, but also the often-overlooked queuing latency.

Dissecting Queuing Latency

Queuing latency represents the duration a request idles in a queue prior to processing. This can manifest at various junctures:

  1. Client-side Queuing: In scenarios where synchronous clients block awaiting a response before initiating subsequent requests, a delay in one response can backlog the requests behind it, escalating latency.

  2. Load Balancer Queuing: Load balancers distribute incoming traffic across several servers. If a server lags or is swamped, requests might queue, even if other servers remain underutilized.

  3. Service-side Queuing: Within a service, numerous processing stages can cause requests to queue, whether they're vying for database access, CPU cycles, or other resources.
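Service-side queuing is the easiest of the three to observe directly. Here is a minimal Python sketch (illustrative, not from the original post): each request is timestamped on enqueue, and its wait is measured when a single worker finally dequeues it. The 10 ms processing time and the burst of five requests are assumptions chosen to make the effect visible.

```python
import queue
import threading
import time

# Timestamp each request as it is enqueued, then compute how long
# it waited once the worker dequeues it.
requests = queue.Queue()
waits = []

def submit(payload):
    requests.put((time.monotonic(), payload))

def worker():
    while True:
        item = requests.get()
        if item is None:              # sentinel: shut down
            return
        enqueued_at, _payload = item
        waits.append(time.monotonic() - enqueued_at)
        time.sleep(0.01)              # simulate 10 ms of processing

t = threading.Thread(target=worker)
t.start()
for i in range(5):                    # a burst of 5 requests, 1 worker
    submit(i)
requests.put(None)
t.join()
# Each request waits behind the ones ahead of it, so the recorded
# queuing latency grows by roughly 10 ms per position in the backlog.
```

The same enqueue-timestamp trick works in a real service: emit the measured wait as a metric, and queue depth problems show up long before they dominate your latency graphs.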

High Percentile Latency: Where Queuing Really Hurts

Although average latency provides a broad performance perspective, high percentile latency - the upper extremes - dictates user experience quality. Delays, especially those stemming from sequential queuing at the client, load balancer, and service levels, can aggregate, pushing requests into the high latency bracket.
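A quick back-of-the-envelope calculation shows why sequential queuing points aggregate. Assuming (for illustration) that each of the three hops independently delays a request with 1% probability:

```python
# Assumption: each of three hops (client, load balancer, service)
# independently delays a request with probability 1%.
p_hop = 0.01
p_delayed_somewhere = 1 - (1 - p_hop) ** 3
# 1 - 0.99**3 = 0.029701: roughly 3% of requests hit at least one
# queuing delay, even though each hop alone looks 99% clean.
```

Three "rare" delay sources quietly triple the fraction of slow requests, which is exactly the population that shows up in the high percentiles.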

Mitigating the Queuing Effect

Recognizing the ramifications of queuing latency is the prelude to its mitigation. Here's how to tackle it:

  • Asynchronous Clients: By shifting to an asynchronous response-handling mode, clients can dispatch multiple requests without awaiting prior responses, cutting down client-side queuing.

  • Load Balancer Intelligence: Advanced load balancers can leverage algorithms to sidestep slower or overloaded servers, diminishing load balancer queuing.

  • Service Scalability, Monitoring, and Asynchronous Processing: Ensuring the service scales and monitoring queue depth in real time at each processing stage enables dynamic resource allocation. Furthermore, embracing asynchronous processing can drastically reduce server-side queuing latency: instead of processing requests strictly one after another, the service handles multiple tasks concurrently, so the CPU stays busy and resources aren't lying idle while other work waits.
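As a sketch of the first mitigation, the asyncio snippet below (illustrative, not from the post; `fetch()` is a stand-in for a real remote call with an assumed ~50 ms round trip) contrasts a client that blocks on each response with one that dispatches all requests up front:

```python
import asyncio

# fetch() stands in for a hypothetical remote call taking ~50 ms.
async def fetch(i):
    await asyncio.sleep(0.05)
    return i

# Synchronous style: block on each response before sending the next
# request, so ten calls cost roughly ten round trips.
async def sequential(n):
    return [await fetch(i) for i in range(n)]

# Asynchronous style: dispatch all requests up front and gather the
# responses, so ten calls cost roughly one round trip.
async def concurrent(n):
    return await asyncio.gather(*(fetch(i) for i in range(n)))

results = asyncio.run(concurrent(10))
```

Under these assumptions, `sequential(10)` takes about ten times as long as `concurrent(10)`; no request queues up behind an earlier response on the client side.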

When evaluating the latency of a large-scale cloud service, it's not uncommon to observe certain peculiarities in latency percentiles. One such phenomenon is when the p99 latency (99th percentile) is significantly higher (say, 5-10 times) than the p90 latency (90th percentile), while the p90 and p50 (median) are close. This behavior often signals underlying issues related to queuing at various stages of the request lifecycle. Let's break this down:

1. Understanding Percentiles:

  • p50 (Median): Half of the requests have a latency less than this, and the other half have more.

  • p90: 90% of the requests are processed within this latency.

  • p99: 99% of the requests are processed within this latency, indicating the tail-end performance, which can be much worse than the median or even the p90.
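These percentiles are straightforward to compute from raw samples. The sketch below uses the simple nearest-rank definition; real monitoring systems typically interpolate or use streaming estimators instead.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value such that at least
    p% of the samples are less than or equal to it."""
    s = sorted(samples)
    k = math.ceil(p * len(s) / 100) - 1
    return s[max(0, k)]

# 100 latency samples of 1..100 ms, one request each, so the
# percentiles fall exactly on the obvious values.
latencies = list(range(1, 101))
p50 = percentile(latencies, 50)   # 50
p90 = percentile(latencies, 90)   # 90
p99 = percentile(latencies, 99)   # 99
```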

2. Interpreting the Observations:

  • p90 and p50 Being Close: This indicates that the majority (90%) of the requests are being processed consistently and relatively quickly. Most requests don't face significant queuing delays.

  • p99 Being 5-10 Times Higher than p90: This suggests that the slowest requests - those beyond the 90th percentile, and especially the top 1% - face substantial delays. These delays can be attributed to queuing at different stages.

3. Queuing At Different Hops:

  • Client-side Queuing: If synchronous clients block on responses before sending subsequent requests, occasional delays in some responses can cause subsequent requests to queue up. This backlog might affect only a minority of requests but can lead to high p99 latencies.

  • Load Balancer Queuing: In scenarios where particular servers are slower or overloaded, even occasionally, the load balancer might queue requests awaiting those servers. If this behavior is intermittent but severe when it occurs, it can have a pronounced effect on the p99 latency without majorly impacting p90.

  • Server-side Queuing: When the server processes requests, certain non-uniform resource contention scenarios can emerge. For instance, occasional database locks, cache misses, or other sporadic resource contentions can cause significant delays for a small fraction of requests, pushing up the p99 latency.
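The pattern described above is easy to reproduce in a toy simulation. The numbers here are illustrative assumptions: a ~10 ms baseline, with 5% of requests hitting an extra ~200 ms of contention-induced queuing.

```python
import random

random.seed(7)  # make the toy run repeatable

def sample_latency():
    base = random.uniform(8, 12)            # normal processing, ms
    if random.random() < 0.05:              # 5% hit sporadic contention
        base += random.uniform(150, 250)    # queuing delay, ms
    return base

lat = sorted(sample_latency() for _ in range(100_000))
p50 = lat[50_000]
p90 = lat[90_000]
p99 = lat[99_000]
# With these assumed numbers, p50 and p90 both land in the 10-12 ms
# range, while p99 lands well above 100 ms: the median looks healthy,
# yet the tail is an order of magnitude slower.
```

Note that the 5% contention rate barely moves p90 at all; it is entirely absorbed by the gap between p90 and p99, which is why tail percentiles are where queuing problems first surface.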

The substantial difference between p90 and p99 latencies in this scenario underscores the importance of closely monitoring and understanding tail latencies. While the majority of requests (up to the 90th percentile) are served efficiently, there's a small fraction that experiences significant delays due to queuing at various stages. Addressing these queuing challenges can lead to a more uniformly high-performing system, improving user experience even for those edge cases that fall in the high-latency percentiles.

Conclusion

In the intricate world of large-scale cloud services, queuing latency can quietly but substantially influence performance. By understanding its roots and implementing strategies like asynchronous processing, services can ensure not just swifter response times but also a more uniform and enhanced user experience, especially in the high-latency percentiles.
