Linode has had a tough time of it lately, from a data breach to crippling DDoS attacks over the recent Christmas period. Despite some self-criticism in their update, they now seem to be on top of the situation, and with significant upgrades planned for their network, they should be in an excellent position to mitigate future attacks.
The DDoS attacks began on Dec. 26, 2015, and Linode published its first major update on their severity on Dec. 31, 2015. The first announcement that the attacks had been successfully mitigated came in a short incident report on Jan. 6, 2016. The following weeks nevertheless saw several further DDoS attacks, causing connectivity issues at various data centers, but each was mitigated within hours, presumably thanks to the additional safeguards Linode had already put in place.
Now that the situation is firmly under control, Linode has provided a detailed review of what happened, what went wrong, and what they are doing to ensure more resilience against future attacks.
Attacks targeted multiple infrastructure points
The DDoS attacks evolved throughout the incident, presumably to adapt to whatever mitigation Linode carried out. We shall cover each of the different attack vectors in turn, but first, Linode has provided a helpful diagram showing the various infrastructure points:
- Layer 7 (“400 Bad Request”) attacks toward their public-facing websites — These can be difficult to mitigate, as Layer 7 traffic typically seeks to mimic human behavior. In this case, Linode indicates the attackers repeatedly made bad requests designed to exhaust server resources, causing the servers to become unresponsive.
- Volumetric attacks toward their websites, authoritative nameservers, and other public services — This is where a significant volume of traffic is directed toward a particular IP or website, overwhelming the limited resources of the server and causing it to become unresponsive.
- Volumetric attacks toward Linode network infrastructure — In a change of vector, the DDoS targeted "secondary addresses". Linode segments customers into individual /24 subnets, which requires a secondary address inside the subnet for each individual customer. This means there are hundreds of secondary IP addresses that could be used as an attack vector. It is common for any one of these secondary addresses to be attacked, which is easily mitigated by null-routing (black holing) the IP, causing just that one customer to have issues; in this case, however, many different secondary IPs were attacked simultaneously. The problem was compounded by the fact that the upstream provider could only accept a limited number of "black hole" advertisements. It then took several days of "cat-and-mouse" games to black hole all the affected secondary addresses or drop the traffic at the edges of their transit networks.
- Volumetric attacks toward their colocation provider’s network infrastructure — While the attacks against the colocation provider were equally simple in theory to mitigate, in practice having to communicate with third parties, sometimes four degrees removed, caused significant delays in dealing with the issue. The attacks specifically targeted the IP addresses of "cross-connects", essentially the physical links between routers. This caused the longest outage, 30 hours, felt in the Atlanta data center. Linode shows some frustration here, describing some transit providers as "stubborn".
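The article does not detail Linode's exact Layer 7 countermeasures, but a common first line of defense against the kind of request flood described in the first item above is per-client rate limiting. A minimal token-bucket sketch (all names and rates here are hypothetical, not Linode's implementation):

```python
import time

class TokenBucket:
    """Per-client token bucket: allows short bursts but caps the sustained request rate."""
    def __init__(self, rate, burst):
        self.rate = rate           # tokens refilled per second
        self.capacity = burst      # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # client is over budget; drop or delay the request

# One bucket per client IP: a flood from one source exhausts its own
# budget rather than the server's resources.
buckets = {}

def handle_request(client_ip):
    bucket = buckets.setdefault(client_ip, TokenBucket(rate=5, burst=10))
    return bucket.allow()
```

In practice this kind of limiting is usually enforced at the load balancer or CDN edge (as Linode's CloudFlare protection now does) rather than in application code.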
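Null-routing a single attacked secondary address, as described above, is typically a one-line operation; a sketch using a documentation-range address (203.0.113.7 is illustrative, not an actual Linode IP):

```shell
# Drop all traffic to one attacked secondary address on a Linux router:
ip route add blackhole 203.0.113.7/32

# With BGP, the same effect is pushed upstream by advertising the /32
# tagged with the provider's blackhole community, e.g. 65535:666 (RFC 7999).
# Linode's difficulty was that their upstream accepted only a limited
# number of such advertisements.
```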
Lessons Learned from the Linode DDoS attacks
Linode goes on to describe three major lessons that they learned:
- Don’t depend on middlemen — Linode relied on their colocation partners for IP transit, but believe that doing so made the attacks significantly harder to mitigate, for two reasons. Firstly, they believed their colocation partners had more IP transit capacity than was actually the case, which led to the entire Linode network being de-peered by a colocation partner. Secondly, some of the most sophisticated attacks required involvement from senior network engineers at multiple Tier 1 providers. Having to contact them through a colocation partner, on a Christmas holiday weekend, added extra barriers and delays to getting the issues resolved.
- Absorb larger attacks — Linode's strategy was never to use more than 50 percent of their capacity. With smaller data centers having only 40 Gbps of capacity, that leaves just 20 Gbps spare to absorb additional traffic (such as traffic from a DDoS). With attacks now growing ever larger, that headroom offers no options against, say, an 80 Gbps DDoS.
- Let customers know what’s happening — Linode comments "It’s important that we acknowledge when we fail, and our lack of detailed communication during the early days of the attack was a big failure." They have now implemented a designated technical point person who will be responsible for providing detailed information to customers during any serious events such as this.
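The capacity arithmetic behind the second lesson is easy to check, using the figures quoted above:

```python
capacity_gbps = 40                    # total transit at a smaller data center
normal_usage = capacity_gbps * 0.5    # 50-percent utilization policy

headroom = capacity_gbps - normal_usage
print(headroom)                       # 20.0 Gbps spare for attack traffic

attack_gbps = 80
# Even a completely idle data center could not absorb this attack:
print(attack_gbps > capacity_gbps)    # True
```

Hence the planned jump to 200 Gbps per location, which restores meaningful headroom even under the 50-percent policy.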
Linode to upgrade their infrastructure
In light of all the issues and lessons learned, Linode is carrying out the following improvements to its infrastructure:
- Their front-facing servers now have powerful CloudFlare-based DDoS mitigation to keep them active no matter what.
- Linode's nameservers are also protected by CloudFlare.
- All emergency mitigation techniques used throughout this crisis are now permanent.
- The amount of transit and peering capacity will be increased to 200 Gbps at each of Linode's locations, delivered from multiple major regional points of presence.
Linode sums up the benefits: "Compared to our existing architecture, the benefits of this upgrade are obvious. We will be taking control of our entire infrastructure, right up to the very edge of the internet. This means that, rather than depending on middlemen for IP transit, we will be in direct partnership with the carriers who we depend on for service. Additionally, Linode will quintuple the amount of bandwidth available to us currently, allowing us to absorb extremely large DDoS attacks until properly mitigated. As attack sizes grow in the future, this architecture will quickly scale to meet their demands without any major new capital investment."
Customers rally round Linode during their troubles
Clearly, Linode has taken the recent disruption to its services and reputation extremely seriously. This response, and how Linode has dealt with the matter, seems to have gone down very well with customers, with many comments praising them for their service:
Can’t thank the Linode team enough for your dedication. The livelihood of thousands rest in your hands, I feel like this whole event further proves how well qualified you guys are to be doing what you’re doing.
... this is a huge thing to us. I was honestly feeling that it was going the usual corporate way with silence and deniability, just waiting for the furor to die down. It really makes a difference to hear not only the details of the response/mitigation activities, which we appreciate, but also acknowledgment of the position we were put into when communication was sparse.
Thanks for this post, Alex. This was a rough period for everyone involved and affected, but I am extremely impressed by Linode making the effort to hopefully prevent the same scenario from happening again.