A few months ago, LinkedIn surpassed the 50% IPv6 traffic milestone. In this post, we will look into the methodology we adopted to measure performance as we enabled IPv6 on our content delivery networks (CDNs), and share some key results of our performance analysis. We hope this information will help readers who are undertaking similar networking changes.
The Edge SRE team runs LinkedIn’s edge infrastructure, including four external CDNs, an in-house CDN, three Domain Name System (DNS) platforms, and all LinkedIn points of presence (PoPs). We build and manage tools to automate all aspects of our Edge stack.
We began to transfer our network traffic from IPv4 to IPv6 for several reasons, including the fact that the internet is running out of IPv4 addresses, and that IPv6 can be faster than IPv4, especially on mobile networks (the source of a majority of our member traffic). In 2013, we enabled IPv6 dual stack on our production mail servers. In 2014, we enabled it across all our data centers and CDNs, except for our CDNs in China. However, due to limited IPv6 coverage on some of our CDNs, the performance of their dual stack networks was not on par with the IPv4-only networks. In 2016, LinkedIn onboarded two new CDN partners, but we decided to hold off on enabling IPv6 until we had analyzed the performance of their dual stack networks and addressed any issues that we found.
Enabling IPv6 on a third-party CDN is straightforward: the CDN will typically either convert your provisioned CNAME to support a dual stack configuration or provide a new CNAME that is dual stack enabled. For our pre-ramp performance analysis, our CDNs provided dual stacked test CNAMEs to work with.
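Before ramping, it is worth sanity-checking that a test CNAME really resolves to both address families. The sketch below is illustrative, not LinkedIn's actual tooling: in practice the address list would come from resolving the CNAME (e.g., via `socket.getaddrinfo`); here the helper simply classifies a list of literal addresses.

```python
import ipaddress

def classify_addresses(addrs):
    """Partition resolved addresses into IPv4 and IPv6 groups."""
    result = {"IPv4": [], "IPv6": []}
    for a in addrs:
        ip = ipaddress.ip_address(a)
        result["IPv4" if ip.version == 4 else "IPv6"].append(a)
    return result

def is_dual_stack(addrs):
    """True only if the hostname resolved to at least one A and one AAAA record."""
    groups = classify_addresses(addrs)
    return bool(groups["IPv4"]) and bool(groups["IPv6"])

# Example with documentation-reserved addresses:
print(is_dual_stack(["198.51.100.7", "2001:db8::1"]))  # True
print(is_dual_stack(["198.51.100.7"]))                 # False
```

A check like this, run from vantage points in the regions you care about, catches CNAMEs that were supposed to be dual stacked but are still serving only A records.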
Our objective was to enable IPv6 without impacting our members. Site reliability is important to LinkedIn and part of our “Members First” company values. We wanted to ensure that there was no negative impact to member experience on the site as a result of us starting to serve content over dual stack networks.
We leveraged a mix of third-party real-user measurement (RUM) using Cedexis and synthetic monitoring (Catchpoint) during the pre-ramp phase:
- We used Cedexis to measure the performance and availability of a test object on our CDNs, as experienced by members. We grouped the results by country and then by the ASNs that carry a majority of LinkedIn's traffic.
- We used Catchpoint to dig deeper into the performance and availability issues that surfaced during the experiment.
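The per-country, per-ASN comparison can be sketched as a simple aggregation over RUM samples. The record layout and numbers below are made up for illustration; they are not Cedexis's schema or real LinkedIn measurements.

```python
from statistics import median

# Each sample: (country, asn, stack, latency_ms) — illustrative fields only.
samples = [
    ("IN", 45609, "ipv4", 180), ("IN", 45609, "dual", 150),
    ("IN", 45609, "ipv4", 200), ("IN", 45609, "dual", 170),
    ("US", 7922,  "ipv4", 60),  ("US", 7922,  "dual", 65),
]

def median_by_group(samples):
    """Median latency per (country, asn, stack) group."""
    groups = {}
    for country, asn, stack, latency in samples:
        groups.setdefault((country, asn, stack), []).append(latency)
    return {key: median(vals) for key, vals in groups.items()}

medians = median_by_group(samples)

# Compare dual stack against IPv4-only for each (country, ASN) pair.
for country, asn in sorted({(c, a) for c, a, _, _ in samples}):
    v4 = medians.get((country, asn, "ipv4"))
    dual = medians.get((country, asn, "dual"))
    if v4 and dual:
        delta_pct = 100.0 * (dual - v4) / v4
        print(f"{country} AS{asn}: dual vs IPv4 median delta {delta_pct:+.1f}%")
```

A large positive delta for a high-traffic (country, ASN) pair is the signal to hold the ramp and investigate that network with synthetic tests before exposing members to it.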
We uncovered a number of possible member-impacting issues over the course of this testing.
We noticed DNS resolution issues over IPv6 with one of our CDN partners in a major region in India. We worked closely with the provider on this issue, and they eventually set up an IPv6-enabled DNS PoP in the region, which significantly improved resolution times.