Revolutionizing Performance Management

Dan Kuebrich



Top Stories by Dan Kuebrich

You don’t have to be a pre-cog to find and deal with infrastructure and application problems; you just need good monitoring. We had quite a day Monday during the EC2 EBS availability incident. Thanks to some early alerts, which started coming in about 2.5 hours before AWS started reporting problems, our ops team was able to intervene and make sure that our customers’ data was safe and sound. I’ll start with screenshots of what we saw and experienced, then get into what metrics to watch and alert on in your environment, as well as how to do so in TraceView.

10:30 AM EST: Increased disk latency, data pipeline backup

Around 10 am, we started to notice that writes weren’t moving through our pipeline as smoothly as before. Sure enough, pretty soon we started seeing alerts about elevated DB load and disk latency. Here’s what it looked like: Figure 1: At 10 AM, we s... (more)

Performing Under Pressure | Part 1

Many types of performance problems can result from the load created by concurrent users of web applications, and all too often these scalability bottlenecks go undetected until the application has been deployed in production. Load testing, the generation of simulated user requests, is a great way to catch these types of issues before they get out of hand. I recently presented on load testing with Canonical's Corey Goldberg at the Boston Python Meetup and thought the topic deserved blog discussion as well. In this two-part series, I'll walk through generating lo... (more)
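To make "generation of simulated user requests" concrete, here is a minimal sketch that uses only the Python standard library. It is not the multi-mechanize setup the series goes on to describe. The `SlowHandler` stand-in application, its 10 ms delay, and the `run_load_test` and `demo` helpers are all assumptions made for illustration.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in application: every request sleeps briefly."""

    def do_GET(self):
        time.sleep(0.01)  # pretend to do some work
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet during the test


def run_load_test(host, port, users, requests_per_user):
    """Issue GET / from `users` concurrent threads; return response times in seconds."""
    timings = []
    lock = threading.Lock()

    def simulated_user():
        for _ in range(requests_per_user):
            start = time.perf_counter()
            conn = http.client.HTTPConnection(host, port, timeout=10)
            try:
                conn.request("GET", "/")
                conn.getresponse().read()
            finally:
                conn.close()
            elapsed = time.perf_counter() - start
            with lock:
                timings.append(elapsed)

    threads = [threading.Thread(target=simulated_user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timings


def demo(users=5, requests_per_user=4):
    """Start the stand-in app on a free local port and load test it."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        return run_load_test(host, port, users, requests_per_user)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns one response time per request, which can then be summarized (mean, percentiles) at different values of `users` to see how latency changes as concurrency grows.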

End-User Monitoring: RUM or Synthetic?

Performance for end-users is the metric by which most businesses judge their web applications' performance: is the responsiveness of the application an asset or a liability to the business? Studies show that users are growing more and more demanding, while average pageloads are getting bigger and bigger, more than doubling in weight since 2010. Combine that with frequent releases and updates from marketing, and pretty soon the optimization job is never quite done. Ongoing monitoring of application performance from the end-user's perspective is therefore critical; fortunately, ther... (more)

The Taming of the Queue

A few weeks back webserver request queueing came under heightened scrutiny as rapgenius blasted Heroku for not using as much autotune as promised in their “intelligent load balancing”. If you somehow missed the write-up (or response), check it out for its great simulations of load balancing strategies on Heroku. What if you’re not running on Heroku? Well, the same wisdom still applies – know your application’s load balancing and concurrency and measure its performance. Let’s explore how request queueing affects applications in the non-PaaS world and what you can do about it. Fu... (more)
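The effect of routing strategy on request queueing can be sketched with a small simulation. This is not the simulation from the write-up mentioned above, nor a model of Heroku's router; it is a toy under stated assumptions: exponentially distributed arrivals and service times, four workers that each handle one request at a time, and a target utilization of 0.8. The function names `mean_queue_wait` and `compare` are made up for this example.

```python
import random


def mean_queue_wait(arrivals, service_times, num_workers, pick_worker):
    """Average time requests spend queued before a worker starts on them."""
    free_at = [0.0] * num_workers  # when each worker clears its backlog
    total_wait = 0.0
    for arrived, service in zip(arrivals, service_times):
        w = pick_worker(free_at)
        started = max(arrived, free_at[w])
        total_wait += started - arrived
        free_at[w] = started + service
    return total_wait / len(arrivals)


def compare(num_requests=2000, num_workers=4, utilization=0.8, seed=1):
    """Return (random_routing_wait, least_backlog_wait) for the same traffic."""
    rng = random.Random(seed)
    mean_service = 1.0
    arrival_rate = utilization * num_workers / mean_service

    # Generate one traffic trace and replay it under both strategies.
    arrivals, now = [], 0.0
    for _ in range(num_requests):
        now += rng.expovariate(arrival_rate)
        arrivals.append(now)
    service_times = [rng.expovariate(1.0 / mean_service)
                     for _ in range(num_requests)]

    router_rng = random.Random(seed + 1)

    def pick_random(free_at):
        # Route without looking at how busy each worker is.
        return router_rng.randrange(len(free_at))

    def pick_least_backlog(free_at):
        # Route to the worker that will be free soonest.
        return min(range(len(free_at)), key=free_at.__getitem__)

    random_wait = mean_queue_wait(arrivals, service_times, num_workers, pick_random)
    least_wait = mean_queue_wait(arrivals, service_times, num_workers, pick_least_backlog)
    return random_wait, least_wait
```

Running `compare()` replays identical traffic through both strategies, so any difference in the two averages comes from routing alone. Varying `utilization` shows how quickly queue time grows as the workers approach saturation.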

Performing Under Pressure | Part 2

In part 1 of this article, we covered writing web app load tests using multi-mechanize. This post picks up where the other left off and will discuss how to gather interesting and actionable performance data from a load test, using (of course) TraceView as an example. The big problem we had after writing load tests was that timing data gathered by multi-mechanize is inherently external to the application. This means it can tell us the response times of requests when the app is under load but doesn't identify bottlenecks or configuration problems. So we need to be gathering a bi... (more)