CoDoNS Report Transcript
INTRODUCTION
Translation of names to network addresses is an essential precursor to communication in networked systems. The Domain Name System (DNS) performs this translation on the Internet and constitutes a critical component of the Internet infrastructure. While the DNS has sustained the growth of the Internet through static, hierarchical partitioning of the namespace and widespread caching, recent increases in malicious behavior, explosion in client population, and the need for fast reconfiguration pose difficult problems. The existing DNS architecture is fundamentally unsuitable for addressing these issues. The foremost problem with DNS is that it is susceptible to denial of service (DoS) attacks.
This vulnerability stems from limited redundancy in name servers, which provide name-address mappings and whose overload, failure, or compromise can lead to low performance, failed lookups, and misdirected clients. Approximately 80% of domain names are served by just two name servers, and a surprising 0.8% by only one. At the network level, all servers for 32% of domain names are connected to the Internet through a single gateway and can thus be compromised by a single failure. The top levels of the hierarchy are served by a relatively small number of servers,
which serve as easy targets for denial of service attacks. A recent DoS attack on the DNS crippled nine of the thirteen root servers at that time, while another recent DoS attack on Microsoft’s DNS servers severely affected the availability of Microsoft’s web services for several hours. DNS name servers are easy targets for malicious agents, partly because approximately 20% of name server implementations contain security flaws that can be exploited to take over the name servers. Second, name-address translation in the DNS incurs long delays. Recent studies have shown that DNS lookup time contributes more than one second to up to 30% of web object retrievals.
The explosive growth of the namespace has decreased the effectiveness of DNS caching. The skewed distribution of names under popular domains, such as .com, has flattened the name hierarchy and increased load imbalance. The use of short timeouts for popular mappings, as commonly employed by content distribution networks, further reduces DNS cache hit rates. Further, manual configuration errors, such as lame delegations, can introduce latent performance problems. Finally, widespread caching of mappings in the DNS prohibits fast propagation of unanticipated changes. Since the DNS does not keep track of the locations of cached mappings, but relies on timeout-based invalidation of stale copies, it cannot guarantee cache coherency. Lack of cache coherency in the DNS implies that changes may not be visible to clients for long durations,
effectively preventing quick service relocation in response to attacks or emergencies. A fresh design of the legacy DNS provides an opportunity to address these shortcomings. A replacement for the DNS should exhibit the following properties.
High Performance: Decouple the performance of DNS from the number of name servers. Achieve lower latencies than legacy DNS and improve lookup performance in the presence of high loads and unexpected changes in popularity.
Resilience to Attacks: Remove vulnerabilities in the system and provide resistance against denial of service attacks through decentralization and dynamic load balancing. Self-organize automatically in response to host and network failures.
Fast Update Propagation: Enable changes in name-address mappings to quickly propagate to clients. Support secure delegation to preserve the integrity of DNS records, and prohibit rogue nodes from corrupting the system.
This paper describes the Cooperative Domain Name System (CoDoNS), a backwards-compatible replacement for the legacy DNS that achieves these properties. CoDoNS combines two recent advances, namely, structured peer-to-peer overlays and analytically informed proactive caching. Structured peer-to-peer overlays, which create and maintain a mesh of cooperating nodes, have been used previously to implement wide-area distributed hash tables (DHTs). While their self-organization, scalability, and failure resilience provide a strong foundation for robust large-scale distributed services, their high lookup costs render them inadequate for demanding,
latency-sensitive applications such as DNS. CoDoNS achieves high lookup performance on a structured overlay through an analytically-driven proactive caching layer. This layer, called Beehive, automatically replicates the DNS mappings throughout the network to match anticipated demand and provides a strong performance guarantee. Specifically, Beehive achieves a targeted average lookup latency with a minimum number of replicas. Overall, the combination of Beehive and structured overlays provides the requisite properties for a large-scale name service, suitable for deployment over the Internet.
2. DNS: OPERATION AND PROBLEMS
2.1 Overview of Legacy DNS
The legacy DNS is organized as a static, distributed tree. The namespace is hierarchically partitioned into non-overlapping regions called domains. For example, cs.cornell.edu is a sub-domain of the domain cornell.edu, which in turn is a sub-domain of the top-level domain edu. Top-level domains are sub-domains of a global root domain. Domain names, such as www.cs.cornell.edu, belong to name-owners.
Figure 1: Name Resolution in Legacy DNS: Resolvers translate names to addresses by following a chain of delegations iteratively (2-5) or recursively (6-9).
Extensible data structures, called resource records, are used to associate values of different types with domain names. These values may include the corresponding IP address, mail host, owner name, and the like. The DNS query interface allows these records to be retrieved by a query containing a domain name and a type. The legacy DNS delegates the responsibility for each domain to a set of replicated name servers called authoritative name servers. The authoritative name servers of a domain manage all information for names in that domain,
keep track of the authoritative name servers of the sub-domains rooted at their domain, and are administered by namespace operators. At the top of the legacy DNS hierarchy are root name servers, which keep track of the authoritative name servers for the top-level domains (TLDs). The top-level domain namespace consists of generic TLDs (gTLDs), such as .com, .edu, and .net, and country-code TLDs (ccTLDs), such as .uk, .tr, and .in. Name servers are statically configured with thirteen IP addresses for the root servers. BGP-level anycast is used in parts of the Internet to reroute queries destined for these thirteen IP addresses to a local root server. Resolvers in the legacy DNS operate on behalf of clients to map queries to matching resource records. Clients typically issue DNS queries to local resolvers within their own administrative domain. Resolvers follow a chain of authoritative name servers in order to resolve a query. The local resolver contacts a root name server to find the top-level domain name server. It then issues the query to the TLD name server and obtains the authoritative name server of the next sub-domain. The authoritative name server of the sub-domain replies with the response to the query. This process continues recursively or iteratively until the authoritative name server of the queried domain is reached. Figure 1 illustrates the different stages in the resolution of an example domain name, www.cs.cornell.edu. While this figure provides a simple overview of the communication involved in name resolution, in practice each query may trigger additional lookups to resolve intermediate name servers. Pursuing a chain of delegations to resolve a query naturally incurs significant delay. The legacy DNS incorporates aggressive caching in order to reduce the latency of query resolution.
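As a concrete illustration of this delegation chain, the sketch below follows referrals from a root server down to the authoritative server for a name, the way a local resolver does on a cache miss. It is only a minimal sketch assuming the dnspython library; the single root server address, the referral limit, and the glue-only referral handling are simplifications for illustration, not part of the DNS specification.

import dns.message
import dns.query
import dns.rdatatype

# 198.41.0.4 is a.root-servers.net; real resolvers are configured with all
# thirteen root server addresses, this sketch just uses one of them.
ROOT_SERVER = "198.41.0.4"

def resolve_iteratively(name, server=ROOT_SERVER, max_referrals=10):
    # Follow the chain of delegations (root -> TLD -> authoritative) for an
    # A record. Only referrals that carry glue A records in the additional
    # section are followed in this sketch.
    for _ in range(max_referrals):
        query = dns.message.make_query(name, dns.rdatatype.A)
        response = dns.query.udp(query, server, timeout=3)
        if response.answer:
            # The authoritative name server answered the query.
            return [rrset.to_text() for rrset in response.answer]
        glue = [rdata for rrset in response.additional
                if rrset.rdtype == dns.rdatatype.A for rdata in rrset]
        if not glue:
            return None  # referral without glue; a full resolver would recurse
        server = glue[0].address  # ask the next name server down the chain
    return None

# Example: resolve_iteratively("www.cs.cornell.edu")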
Resolvers cache responses to the queries they issue, and use the cached responses to answer future queries. Since records may change dynamically, legacy DNS provides a weak form of cache coherency through a time-to-live (TTL) field. Each record carries a TTL assigned by the authoritative name server, and may be cached until the TTL expires.
2.2 Problems with Legacy DNS
Failure Resilience - Bottlenecks
The legacy DNS is highly vulnerable to network failures, compromise by malicious agents, and denial of service attacks, because domains are typically served by a very small number of name servers. We first examine delegation bottlenecks in DNS; a delegation bottleneck is the minimum number of name servers in the delegation chain of each domain that need to be compromised in order to control that domain. Table 1 shows the percentage of domains that are bottlenecked on different numbers of name servers. 78.63% of domains are restricted by two name servers, the minimum recommended by the standard; surprisingly, 0.82% of domains are served by only one name server. Even highly popular domains are not exempt from severe bottlenecks in their delegation chains. Some domains (0.43%) spoof the minimum requirement by having two name servers map to the same IP address.
Overall, over 90% of domain names are served by three or fewer name servers and can be disabled by relatively small-scale DoS attacks. Failure and attack resilience of the legacy DNS is even more limited at the network level. We examined physical bottlenecks, that is, the minimum number of network gateways or routers between clients and name servers that need to be compromised in order to control a domain. We measured physical bottlenecks by performing traceroutes to 10,000 different name servers, which serve about 5,000 randomly chosen domain names, from fifty globally distributed sites on PlanetLab. Figure 2 plots the percentage of domains that have different numbers of bottlenecks at the network level, and shows that about 33% of domains are bottlenecked at a single gateway or router.
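The network-level measurement can be approximated with a short script: run traceroute to each of a domain's name servers and intersect the router addresses seen on the paths; any router shared by every path is a physical bottleneck. This is only a rough sketch of the idea, not the measurement code used in the survey; it assumes a Unix traceroute binary is available, that probed routers respond, and the example name server hostnames are hypothetical.

import subprocess

def trace_path(host):
    # Router IPs on the path to `host`; assumes a Unix traceroute binary.
    out = subprocess.run(["traceroute", "-n", host], capture_output=True,
                         text=True, timeout=120).stdout
    hops = []
    for line in out.splitlines()[1:]:          # skip the header line
        fields = line.split()
        if len(fields) > 1 and fields[1] != "*":
            hops.append(fields[1])             # second field is the router IP
    return hops

def shared_gateways(name_servers):
    # Routers that appear on the path to every name server of a domain.
    # A non-empty result indicates a single network-level point of failure.
    paths = [set(trace_path(ns)) for ns in name_servers]
    return set.intersection(*paths) if paths else set()

# Example (hypothetical name servers):
# shared_gateways(["ns1.example.com", "ns2.example.com"])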
While this number is not surprising, since domains are typically served by a few name servers all located in the same sub-network, it highlights that a large number of domains are vulnerable to network outages. These problems are significant and affect many top-level domains and popular web sites. Recently, Microsoft suffered a DoS attack on its name servers that rendered its services unavailable. The primary reason for the success of this attack was that all of Microsoft’s DNS servers were in the same part of the network. Overall, a large portion of the namespace can be compromised by infiltrating a small number of gateways or routers.
Figure 2
Failure Resilience - Implementation Errors
The previous section showed that the legacy DNS suffers from limited redundancy and various bottlenecks. In this section, we examine the feasibility of attacks that target these bottlenecks through known vulnerabilities in commonly deployed name servers. Early studies identified several implementation errors in legacy DNS servers that can lead to compromise. While many of these have been fixed, a significant percentage of name servers continue to use buggy implementations.
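Surveys of this kind typically identify a server's implementation by asking for its version string through the conventional version.bind TXT record in the CHAOS class. The sketch below shows one hedged way to issue that query using the dnspython library (an assumed dependency); many operators disable or falsify the response, so the result is only indicative.

import dns.message
import dns.query
import dns.rdataclass
import dns.rdatatype

def bind_version(server_ip):
    # Query the CHAOS-class TXT record version.bind, which BIND and several
    # other implementations use to report their version string.
    query = dns.message.make_query("version.bind", dns.rdatatype.TXT,
                                   dns.rdataclass.CH)
    try:
        response = dns.query.udp(query, server_ip, timeout=3)
    except Exception:
        return None                      # no response within the timeout
    for rrset in response.answer:
        for rdata in rrset:
            # TXT rdata is a tuple of byte strings; join and decode it.
            return b"".join(rdata.strings).decode(errors="replace")
    return None                          # answered, but without a version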
We surveyed 150,000 name servers to determine whether they contain any known vulnerabilities, based on the Berkeley Internet Name Daemon (BIND) exploit list maintained by the Internet Systems Consortium (ISC). Table 2 summarizes the results of this survey. Approximately 18% of servers do not respond to version queries, and about 14% do not report valid BIND versions. About 2% of name servers have the tsig bug, which permits a buffer overflow that can enable malicious agents to gain access to the system. 19% of name servers have the negcache problem, which can be exploited to launch a DoS attack by providing negative responses with large TTL values from a malicious name server. Overall, exploiting the bottlenecks identified in the previous section is practical.
Table 2: Vulnerabilities in BIND
Performance - Latency
Name resolution latency is a significant component of the time required to access web services. The low performance is due mainly to low cache hit rates, stemming from the heavy-tailed, Zipf-like query distribution in DNS. It is well known from studies on Web caching that heavy-tailed query distributions severely limit cache hit rates. Widespread deployment of content distribution networks, which perform dynamic server selection, has further strained the performance of the legacy DNS. A study on the impact of short TTLs on caching shows that cache hit rates decrease significantly for TTLs lower than fifteen minutes. Another study on the adverse effect of server selection reports that name resolution latency can increase by two orders of magnitude.
Performance - Misconfigurations
DNS performance is further affected by the presence of a large number of broken (lame) or inconsistent delegations. In our survey,
address resolution failed for about 1.1% of name servers due to timeouts or non-existent records, mostly stemming from spelling errors. For 14% of domains, authoritative name servers returned inconsistent responses; a few authoritative name servers reported that the domain does not exist, while others provided valid records. Failures stemming from lame delegations and timeouts can translate into significant delays for the end user. Since these failures and inconsistencies largely stem from human errors,
it is clear that manual configuration and administration of such a large-scale system is expensive and leads to a fragile structure.
Performance - Load Imbalance
DNS measurements at root and TLD name servers show that they handle a large load and are frequently subjected to denial of service attacks. A massive distributed DoS attack in November 2002 rendered nine of the thirteen root servers unresponsive. Partly as a result of this attack, the root is now served by more than sixty name servers through special-case support for BGP-level anycast. While this approach fixes the superficial problem at the topmost level,
the static DNS hierarchy fundamentally implies greater load at the higher levels than at the leaves. The special-case handling does not provide automatic replication of hot spots, and sustained growth in client population will require continued future expansions. In addition to creating exploitable vulnerabilities, load imbalance poses performance problems, especially for lookups higher in the name hierarchy.
Update Propagation
Large-scale caching in DNS poses problems for maintaining the consistency of cached records in the presence of dynamic changes. Selection of a suitable value for the TTL is an administrative dilemma; short TTLs adversely affect lookup performance and increase network load, while long TTLs interfere with service relocation. For instance, a popular online brokerage firm uses a TTL of thirty minutes. Its users do not incur DNS latencies when accessing the brokerage for thirty minutes at a time, but they may experience outages of up to half an hour if the brokerage firm needs to relocate its services in response to an emergency. Nearly 40% of domain names use TTLs of one day or higher, which prohibits fast dissemination of unanticipated changes to records.
3. COOPERATIVE DOMAIN NAME SYSTEM
3.1 Overview of Beehive
CoDoNS derives its performance characteristics from a proactive caching layer called Beehive. Beehive is a proactive replication framework that enables prefix-matching DHTs to achieve O(1) lookup performance. In these DHTs, both objects and nodes have randomly assigned identifiers from the same circular space, and each object is stored at the nearest node in the identifier space, called the home node. Each node routes a request for an object, say 2101, by successively matching prefixes; that is, by routing the request to a node that matches one more digit with the object until the home node, say 2100, is reached. Overlay routing by matching prefixes in this manner incurs O(logN) hops in the worst case to reach the home node. Figure 3 illustrates the prefix-matching routing algorithm in Pastry.
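The prefix-matching routing described above can be made concrete with a toy simulation: starting from an arbitrary node, each hop forwards the query to some node that shares one more identifier digit with the object, until the node with the longest match (the home node) is reached. The 4-digit identifiers and the node set below are made up for illustration, and the routing-table details of Pastry are omitted.

def matching_digits(a, b):
    # Length of the common identifier prefix of a node and an object.
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def route(start, object_id, nodes):
    # Forward the query hop by hop; every hop must reach a node that shares
    # at least one more digit with the object identifier, until the node
    # with the longest match (the home node) is reached.
    home = max(nodes, key=lambda n: matching_digits(n, object_id))
    current, hops = start, 0
    while matching_digits(current, object_id) < matching_digits(home, object_id):
        needed = matching_digits(current, object_id) + 1
        current = next(n for n in nodes if matching_digits(n, object_id) >= needed)
        hops += 1
    return hops

nodes = ["0221", "1300", "2023", "2130", "2100"]
print(route("0221", "2101", nodes))   # 3 hops: 0221 -> 2023 -> 2130 -> 2100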
A routing table of O(logN) size provides each overlay node with pointers to nodes with matching prefixes. In a large system, logN translates to several hops across the Internet and is not sufficient to meet the performance requirements of latency-critical applications such as DNS. Beehive proposes a novel technique based on controlled proactive caching to reduce the average lookup latency of structured DHTs. Figure 3 illustrates how Beehive applies proactive caching to decrease lookup latency in Pastry.
In the example mentioned above, where a query is issued for the object 2101, Pastry incurs three hops to find a copy of the object. By placing copies of the object at all nodes one hop prior to the home node in the request path, the lookup latency can be reduced by one hop. In this example, the lookup latency can be reduced from three hops to two hops by replicating the object at all nodes that start with 21. Similarly, the lookup latency can be reduced to one hop by replicating the object at all nodes that start with 2. Thus, we can vary the lookup latency of the object between 0 and logN hops by systematically replicating the object more extensively. In Beehive, an object replicated at all nodes with at least i matching prefix digits incurs at most i hops for a lookup, and is said to be replicated at level i.
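The effect of level-i replication on lookup cost reduces to simple arithmetic: a query stops as soon as it reaches any node holding a replica, and each prefix-matching hop adds one matching digit. The short sketch below, using the same hypothetical 4-digit identifiers as before, reproduces the example's hop counts for different replication levels.

def lookup_hops(client_node, object_id, replication_level):
    # With the object replicated at every node sharing `replication_level`
    # prefix digits with it, the query stops as soon as it reaches such a
    # node; each prefix-matching hop adds one matching digit.
    matched = 0
    for x, y in zip(client_node, object_id):
        if x != y:
            break
        matched += 1
    return max(0, replication_level - matched)

# Object 2101 looked up from node 0221 (no digits in common):
print(lookup_hops("0221", "2101", 3))  # stored only near the home node -> 3 hops
print(lookup_hops("0221", "2101", 2))  # replicated at all nodes in 21.. -> 2 hops
print(lookup_hops("0221", "2101", 1))  # replicated at all nodes in 2... -> 1 hop
print(lookup_hops("0221", "2101", 0))  # replicated at every node -> 0 hops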