July 17, 1997
By PETER WAYNER


The error caused e-mail to be returned to senders with messages akin to "address unknown" and also prevented many people from finding Web sites.
Alex Gadea, the president of Vscape International, a Web site hosting company, was among the hundreds of systems people rousted out of bed Thursday morning when server alarms started going off around the world. Gadea's company pays a private service to check its 10 main Web servers every 10 minutes and beep him if they fail to respond. The beeps started at 5:30 in the morning.
The problem occurred when the database used by the Internet's top-level domain name servers became corrupted. This database, part of the Domain Name System (DNS), acts as the telephone book for the Internet, matching the names in e-mail addresses and Web URLs to four-byte numbers known as IP (for Internet Protocol) addresses, which steer data through the Internet to the appropriate computer.
For instance, the name "www.nytimes.com" corresponds to nine different computers that answer requests for The New York Times on the Web, one of which is 199.181.172.242. The Internet uses this four-byte address much as the Postal Service uses a nine-digit zip code, to route packets of information to the correct computer.
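To make the "four-byte address" concrete, the short Python sketch below uses only the standard library's `ipaddress` module to split the example address from the paragraph above into its four bytes and into the single 32-bit number those bytes form. It is an illustration, not code taken from any DNS software.

```python
import ipaddress

# The example address for www.nytimes.com quoted in the article.
addr = ipaddress.IPv4Address("199.181.172.242")

# .packed gives the raw four bytes, the form used in a packet's address fields.
four_bytes = list(addr.packed)

# The same four bytes read as one 32-bit unsigned integer.
as_integer = int(addr)

print(four_bytes)   # [199, 181, 172, 242]
print(as_integer)   # 3350572274
```

Each dotted number is one byte (0 to 255), which is why the dotted form never shows a value above 255.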
Each day at 2:30 a.m., Network Solutions updates its directory, known as a "lookup table," to reflect the changes made during the day. These include adding new addresses for anyone who has registered a new domain name and deleting addresses for sites that have surrendered their names. Network Solutions also shuts down the names of entities that haven't paid the $100 fee required to register the name.
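The nightly update described above amounts to applying three kinds of change to yesterday's table. The Python sketch below shows that idea under simplifying assumptions: the `Registration` record, its fields, and the `regenerate_table` function are hypothetical, and the real system (an Ingres database feeding DNS software) is not described in the article at this level of detail.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass
class Registration:
    name: str      # hypothetical field: the domain name, e.g. "example.com"
    address: str   # hypothetical field: the IP address the name points to
    paid: bool     # hypothetical field: whether the registration fee is paid

def regenerate_table(previous: Dict[str, Registration],
                     added: Iterable[Registration],
                     surrendered: Iterable[str]) -> Dict[str, Registration]:
    """Build tonight's lookup table from yesterday's table plus the day's changes."""
    table = dict(previous)            # copy, so yesterday's table is left untouched
    for reg in added:                 # names registered today
        table[reg.name] = reg
    for name in surrendered:          # names given up by their owners
        table.pop(name, None)
    # Shut down names whose fee has not been paid.
    return {name: reg for name, reg in table.items() if reg.paid}
```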
In this case, the database became corrupted when it was regenerated, effectively wiping out the entries for all of the more than one million companies in the .com and .net domains. The .edu, .gov and .org domains, which are also administered by Network Solutions, were not affected.
As a result, when a computer queried Network Solutions for the IP address of a service like www.nytimes.com, it was told that the site didn't exist, because no record was found.
Dave Grave, the Internet Business Manager of Network Solutions, said that the program responsible for converting the Ingres database information into the tables used by the DNS software failed. Internal error-checking software detected the problem, but the employee responsible for watching over the process ignored the warning and released the corrupted database to the network.
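The failure described above was a warning that a person could override. One common safeguard is to make a sanity check block the release outright. The Python sketch below assumes a simple rule, refusing to publish a table that lost more than a set fraction of its records; the function name, the exception, and the 5 percent threshold are all hypothetical, and the article does not describe how Network Solutions' own check worked.

```python
class CorruptTableError(Exception):
    """Raised when a regenerated table looks too damaged to release."""

def check_before_release(old_count: int, new_count: int, max_shrink: float = 0.05) -> None:
    """Block release if the table shrank by more than max_shrink (a fraction).

    The 5 percent default is a hypothetical threshold chosen for illustration.
    """
    if old_count > 0 and new_count < old_count * (1 - max_shrink):
        raise CorruptTableError(
            f"table shrank from {old_count} to {new_count} records; release blocked"
        )
```

A table that grows by a few hundred names passes silently, while one that suddenly drops every .com and .net entry raises an error instead of being sent to the network.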
The problem was complicated by the fact that there are nine major top level servers responsible for answering calls to look up domain names and return the corresponding IP address. These are run by institutions both public and private throughout the network as a service. One of the top level servers is run by Network Solutions and is the authoritative record, but the other eight make copies of the Network Solutions database each night soon after the day's changes are made.
Grave says that Network Solutions detected the problem and corrected it by 6:30 a.m. Eastern Time. The system administrators of the other eight machines, however, had to be notified individually to update their versions.
In the four hours it took to repair one simple mistake, a problem had spread netwide.
Adding to the resulting chaos was the fact that the impact of the problem on individual users was essentially random. Each computer tries to keep a cache of the latest IP addresses it has requested. People who sent e-mail to -- or requested Web documents from -- common locations they frequently used may never have realized there was a problem with the Internet, because their own machines had stored the IP addresses they were using.
But if the local machine does not have the IP address cached, or if it decides that its current copy is too old to be trustworthy, it will send a request to a higher level DNS server, usually at the level of the Internet service provider. If this server does not have the IP address in question, it will kick it up another level.
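The caching behavior described in the last two paragraphs can be sketched in a few lines of Python. This is a toy model, not BIND: the `CachingResolver` class, the `upstream` function standing in for the "higher level DNS server," and the fixed time-to-live are assumptions for illustration, and this sketch does not cache failed lookups.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class CachingResolver:
    """Toy resolver: answer from the local cache while an entry is fresh,
    otherwise pass the question up to the next server and remember the answer."""

    def __init__(self, upstream: Callable[[str], Optional[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.upstream = upstream      # hypothetical stand-in for the ISP's DNS server
        self.ttl = ttl_seconds        # how long a cached answer is trusted
        self.clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (address, time stored)

    def resolve(self, name: str) -> Optional[str]:
        entry = self._cache.get(name)
        if entry is not None:
            address, stored_at = entry
            if self.clock() - stored_at < self.ttl:
                return address        # fresh copy: no query leaves this machine
        address = self.upstream(name) # missing or too old: kick it up a level
        if address is not None:
            self._cache[name] = (address, self.clock())
        return address
```

Under this model, a user whose cache already held an address keeps reaching the site even while the upstream server answers "unknown"; the failure only shows up once the cached entry is judged too old.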
Eventually, the request will reach the highest level, which is one of the nine top level machines named with the letters A through I. Older software will simply choose one of the nine machines at random for its request and then search through the nine machines in a round-robin fashion. Newer versions of the software will attempt to balance the load by looking for the machine that responds the fastest.
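The two selection strategies in the paragraph above can be written as orderings of the nine root servers. In the Python sketch below, the function names are hypothetical, and sorting by a single measured round-trip time is a simplification of how real resolver software tracks server speed.

```python
import random
from typing import Dict, List

ROOT_SERVERS = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]

def round_robin_order(servers: List[str], rng: random.Random) -> List[str]:
    """Older behavior as described: start at a random server, then go around in turn."""
    start = rng.randrange(len(servers))
    return servers[start:] + servers[:start]

def fastest_first_order(servers: List[str], rtt_ms: Dict[str, float]) -> List[str]:
    """Newer behavior as described: prefer the server that has answered fastest.

    Servers with no measurement yet go last, in their original order.
    """
    return sorted(servers, key=lambda s: rtt_ms.get(s, float("inf")))
```

A resolver would send its query to the first server in the list and move to the next one only if that server failed to respond.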
The responses each user received this morning depended upon which of the top level machines processed their requests for IP addresses. If the request happened to reach a machine loaded with the corrupted database, their request came back address unknown. If it reached a machine with the corrected version, the request was handled successfully. This explains why the problem was often sporadic and affected people randomly throughout the morning.
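The "sporadic" pattern follows directly from random server choice. The Python simulation below assumes, hypothetically, that servers A through D had already reloaded the corrected tables while E through I still held the corrupted copy, and that each query goes to one root server picked at random; the names and the snapshot are illustrative, not a record of that morning.

```python
import random
from typing import Dict, Set

def lookup_via_random_root(name: str, root_tables: Dict[str, Set[str]],
                           rng: random.Random) -> bool:
    """Send the query to one randomly chosen root server; True if it knows the name."""
    server = rng.choice(sorted(root_tables))
    return name in root_tables[server]

def failure_rate(name: str, root_tables: Dict[str, Set[str]],
                 trials: int, seed: int = 1) -> float:
    """Fraction of simulated lookups that came back 'address unknown'."""
    rng = random.Random(seed)
    failures = sum(not lookup_via_random_root(name, root_tables, rng)
                   for _ in range(trials))
    return failures / trials

# Hypothetical snapshot: A-D reloaded, E-I still serving the corrupted copy.
fixed, corrupted = {"nytimes.com"}, set()
morning = {s: (fixed if s in "ABCD" else corrupted) for s in "ABCDEFGHI"}
rate = failure_rate("nytimes.com", morning, 10_000)  # close to 5/9 in this snapshot
```

The same user could succeed on one try and fail on the next, and the failure rate falls as more servers pick up the corrected tables.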
Asim Mughal, the technical manager for network systems at NASA's Ames Research Center at Moffett Field, near San Jose, Calif., is responsible for the E root server. He said that NASA's server takes more than an hour to download the latest copies of the tables published by the A server at Network Solutions. His server checks the A server every three hours to determine whether a new edition of the tables has been released; if one is available, it downloads a new copy. Each root server waits a different amount of time between checks, which is why some took longer than others to be updated.
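The delay a polling schedule introduces can be worked out directly. The Python sketch below computes how long after the fix a subordinate server would start serving the corrected tables. The three-hour interval comes from the E server's schedule above; the check times and the one-hour download (the article says "more than an hour," so this is a lower bound) are assumptions for illustration.

```python
import math

def hours_until_updated(fix_hour: float, first_check_hour: float,
                        check_interval_hours: float, download_hours: float) -> float:
    """Hours from the fix at the primary until a subordinate serves the fixed copy.

    The subordinate checks at first_check_hour, first_check_hour + interval, ...;
    the first check at or after fix_hour finds the new edition, and the copy
    becomes usable once the download finishes.
    """
    if fix_hour <= first_check_hour:
        next_check = first_check_hour
    else:
        periods = math.ceil((fix_hour - first_check_hour) / check_interval_hours)
        next_check = first_check_hour + periods * check_interval_hours
    return next_check + download_hours - fix_hour

# Fix at 6:30 a.m. (6.5); hypothetical checks at midnight, 3, 6, 9 ...; one-hour download.
delay = hours_until_updated(6.5, 0.0, 3.0, 1.0)  # 3.5 hours: next check at 9, ready at 10
```

In the worst case the delay approaches one full check interval plus the download time, so servers with longer intervals stayed wrong longer.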
The nine top level machines run versions of the Berkeley Internet Name Daemon (BIND), and the structure of this software was partially responsible for the magnitude of the problem. When each of the eight subordinate top level servers (servers B through I) asks the primary machine (server A) for a copy, BIND sends a complete copy of the database instead of just a list of the changed records. The .com database is about 142 megabytes and the .net database is about 10 megabytes.
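The gap between a full copy and "just a list of the changed records" can be shown with a small diff. In the Python sketch below, the tables are simplified to name-to-address dictionaries and `zone_diff` is a hypothetical helper; the full-copy figure uses only the sizes given above. The DNS standards do define an incremental transfer (IXFR, in RFC 1995) that sends only changes, but the article does not say whether the root servers could have used it at the time.

```python
from typing import Dict, Set, Tuple

def zone_diff(old: Dict[str, str], new: Dict[str, str]) -> Tuple[Dict[str, str], Set[str]]:
    """Return (records added or changed, names deleted) between two tables.

    This is roughly what an incremental update would need to send,
    as opposed to shipping the whole table.
    """
    changed = {name: addr for name, addr in new.items() if old.get(name) != addr}
    deleted = set(old) - set(new)
    return changed, deleted

# Full-copy cost from the article's sizes: .com about 142 MB and .net about 10 MB,
# each copied in full to the eight subordinate servers (B through I).
FULL_COPY_MB = (142 + 10) * 8  # 1216 MB per round of updates
```

A diff of a corrupted table against the previous night's would list nearly every name as deleted, which is also an easy signal to test for before release.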
The fact that a simple error could wreak so much havoc with the global computer network caused many people to reflect on how crucial the Internet has become in the daily life of businesses and individuals around the world and how vulnerable the system is.
"Tens of millions of people are using the Internet, and very few of them know how truly complex it is, almost like how people who fly on jet planes don't realize how complex that technology is," said Adam Clayton Powell 3d, head of the Freedom Forum, a New York-based foundation that studies new-media journalism. "The Web is deceptively simple. It's so easy to use it makes you think it's simpler than it really is. Even e-mail takes more work than it seems. Although it's easy and transparent to the user, there's a lot of technology."
Business Problems
The glitch at Network Solutions is sure to compound the problems of the company, which has been a lightning rod for complaints since it began charging $100 for registering a domain name last year. Previously, registration had been free under the company's contract with the National Science Foundation.
The fees can be quite lucrative. In the three months ending March 31, Network Solutions reported gross revenues of $8.6 million, 76.5 percent of which came from fees for registering domain names. This number does not include the 30 percent of the fees that are set aside in a fund for research into improving the Internet's infrastructure. The rest came from fees for setting up and running intranets for companies.
The company is wholly owned by SAIC, the technical powerhouse that receives much of its revenue from government contracts. After SAIC assumed control of Network Solutions, SAIC absorbed some of Network Solutions' government contracts and began to spin off Network Solutions to handle the commercial side of the Internet.
This information comes from the company's S-1 form filed on July 3 with the Securities and Exchange Commission, announcing an initial public offering led by Hambrecht and Quist. If the offering is successful, SAIC will still maintain effective control of Network Solutions.
Many people on the Internet assert that competition would substantially reduce the cost of registering domain names. The structure of the Internet makes this difficult, because most machines turn to the nine root servers for the ultimate answer to requests for address queries, and these machines mirror the list of people who have paid Network Solutions $100.
Some rebels are fighting back with stunts and by providing competitive services. Eugene Kashpureff, who runs the AlterNIC service for some of his network customers out of lower Manhattan, boasts that the 2 percent of the network that use his machines as their root server did not have problems on Thursday, because his database was not corrupted.
In the past, Kashpureff has been a vocal critic of Network Solutions and has pushed hard for the establishment of alternative domains with three-letter endings like ".nic", ".ltd" or ".xxx". He charges only $24 per year to maintain registration in these domains, about half of the $50 per year charged by Network Solutions.
The major problem with paying Kashpureff for registration in his databases is that most DNS servers don't know that these top-level domains exist. Earlier this week, Kashpureff pulled a stunt by fooling many servers around the nation into updating themselves to include his top-level domains. Network Solutions said that the problems this evening were not related in any way to Kashpureff's stunt.
Karl Denninger, the president of the Chicago-based ISP MCSNET, is pushing for a different approach, which he calls "enhanced DNS," or eDNS. His company maintains a list of top-level domains like '.nic' or '.xxx' that are backed by real companies and real efforts. He makes an effort to screen out fakers who are simply hoping to reserve a top-level name, and he does not charge anything for a listing in the table. He feels that this quiet, efficient approach will draw the affections of more Internet users with time.
In its S-1 filing, Network Solutions acknowledges that competition from services like AlterNIC is a potential problem. Each ISP chooses its own root domain server, and some may choose to use AlterNIC's machine because it provides access to a greater number of top-level domains. Companies that point to Network Solutions' server don't get access to URLs that end with the three letters .xxx, for instance.
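Why the choice of root server matters can be shown with two lookup tables. In the Python sketch below, the `ROOTS` table and `tld_known` function are hypothetical, and the domain lists are abbreviated for illustration (country codes and other real domains are omitted); the point is only that whether a name like one ending in .xxx resolves depends on which root an ISP consults.

```python
from typing import Dict, Set

# Hypothetical, abbreviated root tables: the top-level domains each root lists.
ROOTS: Dict[str, Set[str]] = {
    "network_solutions": {"com", "net", "org", "edu", "gov"},
    "alternic": {"com", "net", "org", "edu", "gov", "nic", "ltd", "xxx"},
}

def tld_known(hostname: str, root: str) -> bool:
    """True if the chosen root lists the top-level domain at the end of hostname."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in ROOTS[root]
```

For example, `tld_known("www.example.xxx", "network_solutions")` is False while the same call with `"alternic"` is True; a name ending in .com is known to both.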
This problem has proven to be a bit of a legal morass for Network Solutions. On March 20, the company announced that PG Media, a New York-based company, had sued, "alleging that the company had restricted access to the Internet by not adding [top level domains] in violation of the Sherman Act." But the National Science Foundation requested that Network Solutions not recognize these domains until a coherent policy emerged.
Worries About Terrorism
In many ways, this shows how the automated world of computers reflects the epigram about how the "want of a nail" resulted in losing a war. A small human error at Network Solutions led to the failure of many Web sites and the loss of a great deal of e-mail.
The vulnerability of the nation to accidents like these has attracted the attention of the Pentagon, which is actively investigating the 21st century notion of information warfare. Knocking out a nation's DNS service would be one way to cripple it in time of attack.
Network Solutions and many other companies are actively investigating how to defend against such problems in the future. One solution would be to bind a public key to each name, which would allow companies to control how their domain names were used. In the past, some people have masqueraded as others and directed Network Solutions to redirect mail and requests to their own machines.
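Binding a public key to a name would let the registry accept a change only if it is signed with the owner's matching private key. The Python sketch below illustrates that idea with textbook RSA and deliberately tiny primes. It is insecure and for illustration only; the article does not describe a specific scheme, and the request text and function names are hypothetical. A real system would use a vetted cryptography library. The modular-inverse form of `pow` requires Python 3.8 or later.

```python
import hashlib

# Textbook RSA with tiny primes: for illustration only, trivially breakable.
P, Q = 61, 53
N = P * Q                           # public modulus (3233), kept on file with the name
E = 17                              # public exponent, also on file
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to the owner

def digest(message: bytes) -> int:
    """Hash the request and reduce it into the range the toy key can sign."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Owner side: only the holder of D can compute this value."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Registry side: check the request against the public key (N, E) on file."""
    return pow(signature, E, N) == digest(message)

# Hypothetical change request from a domain's owner.
request = b"example.com: point mail to 192.0.2.25"
sig = sign(request)
```

A masquerader without D cannot produce a value that verifies for the request, which is the property the proposal relies on. With primes this small, though, anyone could recover D by factoring 3233.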
Peter Neumann, a scientist at SRI, a nonprofit research foundation in Menlo Park, Calif., and the editor of the highly influential comp.risks newsgroup, calls Thursday's wipeout an example of "inadvertigo" -- when a small, inadvertent problem spirals out of control. "We're going to see this type of problem replicated ad nauseam," Neumann predicted.
Peter Wayner at [email protected] welcomes your comments and suggestions.