Important cloud metrics look remarkably like your usual network metrics: latency, packet loss, QoS, jitter, and capacity all play important roles in measuring your cloud.
As the network has moved from a physical entity to a more abstract one, IT has valiantly kept pace by researching and deploying new network devices and functionality. IT teams have had to shift workloads and mindsets to the cloud and to SaaS providers, while remaining responsible for end-user experience — which is ever more important.
It’s a tangled web in the networking world today, but all is not lost. We recommend these five metrics as the building blocks for measuring end-user experience and network performance in the cloud. They’ll likely look familiar: they’re the same metrics that were useful in the pre-cloud world, and they still offer valuable insights now (though you’ll need to be able to see into the cloud to get them).
Latency
This metric measures the time it takes for packets to travel from source to destination, measured separately in each direction to match the asymmetric nature of the internet. Latency is ultimately about perception: how much of it is tolerable differs across users and applications. In the old days, when 100 people were using a local application, 10 milliseconds of latency wasn’t a problem. But now, there are thousands of people using an app in the cloud, and a tenfold increase in latency (100 milliseconds) isn’t acceptable.
In addition, web applications are very chatty. A web app comprises a series of requests and responses between the client and the web server, and every one of them pays the latency cost. An increase in latency is therefore multiplied across each request and each object downloaded. For a business-critical SaaS app, the increased latency can have a big effect on productivity.
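To make the multiplication concrete, here is a minimal back-of-the-envelope sketch. It assumes the requests run strictly one after another, which real browsers do not do (they parallelize and reuse connections), so treat the result as an upper bound. The function name and the figure of 50 round trips are illustrative assumptions; the 10 ms and 100 ms latencies are the ones used above.

```python
def network_wait_ms(round_trips: int, rtt_ms: float) -> float:
    """Total time spent waiting on the network, assuming sequential requests."""
    if round_trips < 0 or rtt_ms < 0:
        raise ValueError("round_trips and rtt_ms must be non-negative")
    return round_trips * rtt_ms


# Hypothetical page that needs 50 request/response round trips.
lan_wait = network_wait_ms(round_trips=50, rtt_ms=10)     # local application
cloud_wait = network_wait_ms(round_trips=50, rtt_ms=100)  # cloud-hosted app

print(f"LAN: {lan_wait:.0f} ms of waiting, cloud: {cloud_wait:.0f} ms of waiting")
```

Under these assumptions, the same page goes from half a second of network waiting to five seconds.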
Once you identify that latency is a problem, you’ll need to figure out where it’s occurring. It could be in your network, your Wi-Fi, your WAN connection, over the open internet or even in your service provider’s environment. Beyond basic troubleshooting, you’ll need better tools. Simple methods like traceroute can give a rough idea of the route from the user to the app, but the routes will likely be different each time.
Packet Loss
This is the percentage of network packets that are lost between the source and destination. Depending on the protocol in use, packet loss can lead to network congestion, wasted time and frustrated users. In small bursts, networks can handle loss, but if loss compounds then it can have severe effects on end users.
Back when apps were all hosted internally over the LAN, packet loss wasn’t really a concern. If there was packet loss, it was pretty straightforward to find and fix. But on the open internet, it’s a different story. TCP guarantees delivery, but it does so by retransmitting any data it detects as lost, and those retransmissions add latency and can contribute to congestion.
Packet loss is particularly noticeable in today’s VoIP and video streaming applications, where it shows up as dropped calls and poor quality. It’s possible to track packet loss independently on both data and voice if you’re supporting those applications.
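As a rough illustration of tracking loss separately for voice and data, the sketch below computes the loss percentage from sent and received packet counts. The counter values and the class names are made up for the example; a real monitoring tool would supply its own measurements.

```python
def loss_percent(sent: int, received: int) -> float:
    """Percentage of packets lost between source and destination."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


# Hypothetical (sent, received) counters per traffic class.
counters = {
    "voice": (2000, 1960),
    "data": (10000, 9990),
}

loss_by_class = {
    name: loss_percent(sent, received)
    for name, (sent, received) in counters.items()
}

for name, pct in loss_by_class.items():
    print(f"{name}: {pct:.2f}% packet loss")
```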
Capacity
Capacity has taken over from bandwidth, which used to be a primary network metric. Now, though, path-based metrics like capacity are more indicative of user experience. Capacity is an end-to-end metric measuring the maximum possible transit rate between source and destination, limited by the most congested hop along the application delivery path. This becomes especially important when considering cloud services, since you don’t have control over a provider’s network, and don’t know how fast the connection really is.
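The idea that end-to-end capacity is set by the most constrained hop can be sketched in a few lines. The hop names and rates below are invented for illustration, and the sketch assumes per-hop rates are already known, which in practice requires a path-aware measurement tool.

```python
from typing import Mapping


def path_capacity_mbps(hop_rates_mbps: Mapping[str, float]) -> float:
    """End-to-end capacity is bounded by the slowest hop on the path."""
    if not hop_rates_mbps:
        raise ValueError("at least one hop is required")
    return min(hop_rates_mbps.values())


def bottleneck_hop(hop_rates_mbps: Mapping[str, float]) -> str:
    """Name of the hop that limits the whole path."""
    if not hop_rates_mbps:
        raise ValueError("at least one hop is required")
    return min(hop_rates_mbps, key=hop_rates_mbps.get)


# Hypothetical application delivery path, rates in Mbps.
hops = {
    "office Wi-Fi": 120.0,
    "WAN link": 50.0,
    "ISP transit": 1000.0,
    "provider edge": 400.0,
}

print(f"Path capacity: {path_capacity_mbps(hops)} Mbps, "
      f"limited by: {bottleneck_hop(hops)}")
```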
Measuring capacity gauges the actual application path, including Wi-Fi. Because routes over the internet change constantly, capacity needs to be monitored continuously rather than checked once.
There are two varieties of capacity: available and utilized. Available capacity is the most accurate measure of the network resources available to applications and can help identify the root cause of degradation. High utilized capacity is a strong indicator of performance degradation. Tracking capacity can also reduce troubleshooting time by isolating where the slowest hops are.
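The relationship between the two varieties can be sketched as follows, assuming available capacity is simply total path capacity minus what is currently in use. The 80% alert threshold and the sample figures are illustrative assumptions, not recommendations from this article.

```python
def available_capacity_mbps(total_mbps: float, utilized_mbps: float) -> float:
    """Capacity left over for applications; never reported below zero."""
    return max(total_mbps - utilized_mbps, 0.0)


def utilization_percent(total_mbps: float, utilized_mbps: float) -> float:
    """Share of total path capacity currently in use."""
    if total_mbps <= 0:
        raise ValueError("total_mbps must be positive")
    return 100.0 * utilized_mbps / total_mbps


HIGH_UTILIZATION_PERCENT = 80.0  # hypothetical alert threshold

total, utilized = 50.0, 42.0  # hypothetical measurements in Mbps
available = available_capacity_mbps(total, utilized)
utilization = utilization_percent(total, utilized)

print(f"Available: {available} Mbps, utilization: {utilization:.0f}%")
if utilization >= HIGH_UTILIZATION_PERCENT:
    print("High utilization: expect performance degradation")
```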
Jitter
This metric reflects the variation in packet delay between source and destination. When jitter is problematic, it’s highly visible. The quality of a call or online meeting can be affected by as little as 30 to 40 ms of jitter.
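One simple way to put a number on delay variation is the mean absolute difference between consecutive delay samples, sketched below. This is only one of several definitions in use (RTP tools, for example, report a smoothed estimate instead), and the delay samples are invented. The 30 ms threshold is the lower bound mentioned above.

```python
from statistics import mean
from typing import Sequence


def mean_jitter_ms(delays_ms: Sequence[float]) -> float:
    """Mean absolute difference between consecutive packet delays."""
    if len(delays_ms) < 2:
        raise ValueError("need at least two delay samples")
    return mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))


JITTER_CONCERN_MS = 30.0  # lower end of the 30 to 40 ms range cited above

# Hypothetical one-way delay samples in milliseconds.
delays = [20.0, 25.0, 70.0, 22.0, 65.0, 21.0]
jitter = mean_jitter_ms(delays)

print(f"Jitter: {jitter:.1f} ms")
if jitter >= JITTER_CONCERN_MS:
    print("Jitter is high enough to affect call quality")
```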
Quality of Service
This metric ties to routing priority for traffic over specific ports or protocols. It matters when congestion hits a network, since QoS is what ensures a good experience on business-critical applications. For some apps like VoIP or video, if those routing priorities are demoted or re-marked, the network can experience jitter, data loss and latency.
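The article does not say how priorities are marked, but one common mechanism is the DSCP value carried in the upper six bits of the IP header's type-of-service byte. Assuming that mechanism, the sketch below converts between DSCP and the TOS byte and flags a packet whose marking was changed in transit. The helper names are hypothetical, and the observed value would come from a packet capture or monitoring tool.

```python
DSCP_EF = 46       # Expedited Forwarding, commonly used for voice
DSCP_DEFAULT = 0   # best effort


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the eight-bit TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def tos_to_dscp(tos: int) -> int:
    """Drop the two low-order (ECN) bits to recover the DSCP value."""
    return (tos & 0xFF) >> 2


def was_remarked(expected_dscp: int, observed_tos: int) -> bool:
    """True if the marking seen at the destination differs from what was sent."""
    return tos_to_dscp(observed_tos) != expected_dscp


sent_tos = dscp_to_tos(DSCP_EF)
observed_tos = dscp_to_tos(DSCP_DEFAULT)  # hypothetical: marking stripped en route

print(f"Sent TOS byte: {sent_tos:#04x}, observed: {observed_tos:#04x}")
print("Re-marked in transit:", was_remarked(DSCP_EF, observed_tos))
```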
In this world of cloud challenges and possibilities, it’s easy to lose sight of the old metrics. But these five metrics are a good way to regain the right amount of control over your apps and users’ experience.