After Achievers built a load testing framework to help our engineering teams understand how our platform performs under unexpected load, the team was able to improve overall platform performance, allow for 4x more traffic throughput, and scale clusters more effectively. This blog reviews goals set from the initial baseline results, and how we resolved bottlenecks we encountered during the testing phase. Now that we had a good report of the current state of the platform, it was time to beat that baseline.
Beating the baseline
Let’s do a quick review of the metrics we monitored and reported on.
- Throughput: Refers to the number of requests that a system can handle per second.
- Errors: Number of failed requests that occurred during the test period.
- Latency: The time taken for a request to be processed from the time it was sent out.
- Scalability: The platform's ability to handle increasing loads without significant performance degradation.
Having highlighted the metrics above, we then set a goal for each. This was done to effectively measure if we can successfully improve our system.
To measure testing results and track expectations, we created observability dashboards built in New Relic. Dashboards prepared the team to quickly identify bottlenecks and issues that arise during the testing period.
After reviewing the dashboards it was apparent bottlenecks were limiting our platform performance.
Throughput optimization
Istio Load balancing
gRPC keeps TCP sessions open as long as possible to maximize throughput and minimize overhead, but the long-lived sessions make load balancing complex. This is particularly an issue in auto-scaled Kubernetes environments. When load increases new pods are added, however, the client will stay connected to the old gRPC pods, resulting in unequal load distribution.
Below is a great example of gRPC traffic travelling to a deployment and not balancing.
You can see we are getting sporadic requests to each pod, including one pod getting no traffic at all. Istio helps us distribute requests by sharing connection information between Envoy proxies. The Envoy proxies share the number of requests being received, Istio can determine which pod is less busy and can better serve the request.
Configuring load balancing on your destinationRule to use LEAST_REQUEST instead of the default of round-robin, helped solve the issue of unbalanced traffic.
After the new loadBalancer configuration was deployed, the results show an even distribution of requests among pods and a large bump in throughput across all services.
Errors
Client-side
Not all bottlenecks are on the server. We noticed a few bottlenecks on the load testing client generating the load resulting in a high error rate. While we had enough CPU/memory to run our tests, our network had some limitations which revealed themselves.
Google Cloud virtual machines without an external IP address have traffic travelling outbounds through a Cloud NAT. When running a load test from inside another GCP project, egress traffic spiked OUT_OF_RESOURCES errors during the testing period.
The team added extra IP addresses and bumped the “ports allowed to a destination address” to the Google Cloud NAT. You can see from the image below we hit a flat line limit before solving the issue at the end of the graph.
A long-term solution would be to fully move to IPV6 to remove outbound connection limits. Achievers plans to be on IPV6 for all our networking in 2024.
Latency
Istio config
Since all cluster traffic travels through the Istio Envoy proxy, we thought there would be room to improve latency. During the investigation we did not see any long transactions hanging in Envoy, however, we thought this would be a good time to see if we could improve Istio by tweaking some configuration. Istio Concurrency is the number of worker threads which run on the Envoy proxy sidecar. If unset, the default is 2. If set to 0, this will be configured to use all cores on the machine using CPU requests and limits to choose a value, with limits taking precedence over requests.
Concurrency Results:
4 workers => 4434.11/s + p(95)=114.67ms
6 workers => 4230.42/s + p(95)=98.3ms
2 workers => 4519.10/s + p(95)=100.75ms
In our case, Istio concurrency didn’t seem to impact performance at all. It was worth the exercise to see if it would benefit us, although in the end there were no significant gains.
Code improvements
Often shared code can introduce issues across services. One example of this is when a bug in our database connector caused quite a bit of extra latency in outbound connections.
Using New Relic tracing we were able to easily spot a pattern. In the image above you can see latency results were high on endpoints which had these slow database calls. Database locks were used to switch to the schema needed before executing the query. This costs additional time to maintain those locks. The issue was resolved by removing a call to switch schemas and instead appending the schema name to the query.
After the library was optimized, the service saw an overall improvement in the average response time.
Another code issue was revealed on an old endpoint that lived in our monolith. The endpoint was seeing very poor performance, often causing requests to take longer than 10 seconds to complete.
After we upgraded the endpoint and optimized database queries, we managed to get that back down to milliseconds.
Caching
Large caching improvements came by moving some images to the CDN to improve response times. A few bottlenecks were located on services which rely heavily on Redis cache. While Redis should not be a dependency for services, it does greatly improve performance for applications.
Reviewing resource graphs on New Relic revealed Redis was being exhausted of resources in a couple of critical namespaces. The two purple lines exceed the Kubernetes limit, pointing to memory exhaustion and out-of-memory (OOM) errors.
A quick PR to bump Redis Kubernetes requests quickly solved memory exhaustion. Overall, this was a great exercise to test how much of a hit on performance heavily cached services had when Redis resources became saturated.
Scaling
Nodes are the managed virtual machines a Kubernetes cluster will run workloads on. Even though the virtual machines may be ephemeral and come and go, you need to guarantee your cluster is performing well and scaling accordingly. Once you go through basic improvements such as tweaking node CPU, memory, and disk, you should review some of the other modifications made below which ensure your infrastructure is scaling effectively.
IP Address limitations:
Clusters are created with IP ranges, and often values are never revisited. At the beginning of the testing, we quickly noticed our cluster could soon hit a limit with the number of worker nodes in the cluster. Kubernetes will take the pod network and divide it across the worker node network, restricting the number of workers in your nodepools. Depending on the max_pods_per_node value, node creation is bound by the number of addresses in the Pod address range.
For example, if you set the default maximum number of Pods to 110
and the secondary IP address range for Pods to /21
, Kubernetes assigns a /24
CIDR range to nodes on the cluster. This allows a maximum of 2(24-21) = 23 = 8
nodes on the cluster.
The limit of IP addresses may result in your cluster’s nodes being unable to scale and workloads being throttled, or worse, crashing due to resource exhaustion. This was an easy fix for us as we just needed to adjust some of the IP ranges assigned to pods and workers. Google has also introduced “secondary IP ranges” which can assist teams who need to rebuild their clusters to address this issue.
TopologySpreadConstraints:
High-utilized Kubernetes Deployment may need to be distributed amongst different worker nodes to ensure resources are not exhausted. Using topologySpreadConstraints allows Pods to spread across the clusters’ failure domains such as regions, zones, hosts, and other user-defined topologies.
We often use this feature on our critical services to ensure our Kubernetes clusters scale effectively. When Unsatisfiable is set to DoNotSchedule it can force the cluster to scale if more than two pods live on all workers.
If we allow the cluster to scale with critical services, we ensure we have enough CPU and memory during a spike in traffic. Tweaking this value along with average scaling values, allowed for the cluster to scale 10x more effectively.
Conclusion
After we resolved all the larger bottlenecks, it was time to review our initial goals. The results were in!
The team was able to meet our goals in every case, often surpassing expectations. It was relieving to see we did not impact any of our defined SLOs for errors or latency by the end of our initial testing. By improving throughput by 4x we were able to confidently handle failover traffic in a single region during an outage. Overall the cluster scaling had also improved by introducing properly sized resources, along with Istio handling gRPC request load balancing.
While we will always have room for improvement, the Achievers platform now has a flexible framework to continuously load test and measure performance.
This article was originally published on Medium on November 22, 2023.
The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve or endorse the information, views or products available on such sites.