Zen and the Art of the Internet
Part 1
Part A
Copyright (c) 1992 Brendan P. Kehoe
Permission is granted to make and distribute verbatim copies of this guide provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this booklet under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this booklet into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the author.
Zen and the Art of the Internet A Beginner's Guide to the Internet First Edition January 1992
by Brendan P. Kehoe
This is revision 1.0 of February 2, 1992. Copyright (c) 1992 Brendan P. Kehoe
The composition of this booklet was originally started because the Computer Science department at Widener University was in desperate need of documentation describing the capabilities of this "great new Internet link" we obtained.
It's since grown into an effort to acquaint the reader with much of what's currently available over the Internet. Aimed at the novice user, it attempts to remain operating system "neutral"---little information herein is specific to Unix, VMS, or any other environment. This booklet will, hopefully, be usable by nearly anyone.
A user's session is usually offset from the rest of the paragraph, as such:
prompt> command
The results are usually displayed here.
The purpose of this booklet is two-fold. First, it's intended to serve as a reference piece that someone can grab on the fly to look something up. Second, it forms a foundation from which people can explore the vast expanse of the Internet. Zen and the Art of the Internet doesn't spend a significant amount of time on any one point; rather, it provides enough for people to learn the specifics of what their local system offers.
One warning is perhaps in order---this territory we are entering can become a fantastic time-sink. Hours can slip by, people can come and go, and you'll be locked into Cyberspace. Remember to do your work!
With that, I welcome you, the new user, to The Net.
[email protected] Chester, PA
Acknowledgements
Certain sections in this booklet are not my original work---rather, they are derived from documents that were available on the Internet and already aptly stated their areas of concentration. The chapter on Usenet is, in large part, made up of what's posted monthly to news.announce.newusers, with some editing and rewriting. Also, the main section on archie was derived from whatis.archie by Peter Deutsch of the McGill University Computing Centre. It's available via anonymous FTP from archie.mcgill.ca. Much of what's in the telnet section came from an impressive introductory document put together by SuraNet. Some definitions in the glossary are from an excellent glossary put together by Colorado State University.
This guide would not be the same without the aid of many people on The Net, and the providers of resources that are already out there. I'd like to thank the folks who gave this a read-through and returned some excellent comments, suggestions, and criticisms, and those who provided much-needed information on the fly. Glee Willis deserves particular mention for all of his work; this guide would have been considerably less polished without his help.
Andy Blankenbiller <[email protected]> Andy Blankenbiller, Army at Aberdeen
[email protected] Alan Emtage, McGill University Computer Science Department
Brian Fitzgerald <[email protected]> Brian Fitzgerald, Rensselaer Polytechnic Institute
John Goetsch <[email protected]> John Goetsch, Rhodes University, South Africa
[email protected] Jeff Kellem, Boston University's Chemistry Department
[email protected] Bill Krauss, Moravian College
Steve Lodin <[email protected]> Steve Lodin, Delco Electronics
Mike Nesel <[email protected]> Mike Nesel, NASA
Bob <[email protected]> Bob Neveln, Widener University Computer Science Department
[email protected] (Wanda Pierce) Wanda Pierce, McGill University Computing Centre
[email protected] Joshua Poulson, Widener University Computing Services
[email protected] Dave Sill, Oak Ridge National Laboratory
[email protected] Bob Smart, CitiCorp/TTI
[email protected] Ed Vielmetti, Vice President of MSEN
Craig E. Ward <[email protected]> Craig Ward, USC/Information Sciences Institute (ISI)
Glee Willis <[email protected]> Glee Willis, University of Nevada, Reno
Charles Yamasaki <[email protected]> Chip Yamasaki, OSHA
Network Basics
We are truly in an information society. Now more than ever, moving vast amounts of information quickly across great distances is one of our most pressing needs. From small one-person entrepreneurial efforts, to the largest of corporations, more and more professional people are discovering that the only way to be successful in the '90s and beyond is to realize that technology is advancing at a break-neck pace---and they must somehow keep up. Likewise, researchers from all corners of the earth are finding that their work thrives in a networked environment. Immediate access to the work of colleagues and a "virtual" library of millions of volumes and thousands of papers affords them the ability to incorporate a body of knowledge heretofore unthinkable. Work groups can now conduct interactive conferences with each other, paying no heed to physical location---the possibilities are endless.
You have at your fingertips the ability to talk in "real-time" with someone in Japan, send a 2,000-word short story to a group of people who will critique it for the sheer pleasure of doing so, see if a Macintosh sitting in a lab in Canada is turned on, and find out if someone happens to be sitting in front of their computer (logged on) in Australia, all inside of thirty minutes. No airline (or tardis, for that matter) could ever match that travel itinerary.
The largest problem people face when first using a network is grasping all that's available. Even seasoned users find themselves surprised when they discover a new service or feature that they'd never known even existed. Once you are acquainted with the terminology and sufficiently comfortable with making occasional mistakes, the learning process will speed up drastically.
Domains
Getting where you want to go can often be one of the more difficult aspects of using networks. The variety of ways that places are named will probably leave a blank stare on your face at first. Don't fret; there is a method to this apparent madness.
If someone were to ask for a home address, they would probably expect a street, apartment, city, state, and zip code. That's all the information the post office needs to deliver mail in a reasonably speedy fashion. Likewise, computer addresses have a structure to them. The general form is:
a person's email address on a computer: user@somewhere.domain
a computer's name: somewhere.domain
The user portion is usually the person's account name on the system, though it doesn't have to be. somewhere.domain tells you the name of a system or location, and what kind of organization it is. The trailing domain is often one of the following:
com Usually a company or other commercial institution or organization, like Convex Computers (convex.com).
edu An educational institution, e.g. New York University, named nyu.edu.
gov A government site; for example, NASA is nasa.gov.
mil A military site, like the Air Force (af.mil).
net Gateways and other administrative hosts for a network (it does not mean all of the hosts in a network). {The Matrix, 111.} One such gateway is near.net.
org This is a domain reserved for private organizations, who don't comfortably fit in the other classes of domains. One example is the Electronic Frontier Foundation named eff.org.
Each country also has its own top-level domain. For example, the us domain includes each of the fifty states. Other countries represented with domains include:
au Australia
ca Canada
fr France
uk The United Kingdom. These also have sub-domains of things like ac.uk for academic sites and co.uk for commercial ones.
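The address structure described above can be sketched in a few lines of Python. This is only an illustration: the function name describe_address and the lookup table are hypothetical, and the table covers just the domains listed in this section.

```python
# Hypothetical lookup table built from the domains listed in this section.
TOP_LEVEL = {
    "com": "commercial organization",
    "edu": "educational institution",
    "gov": "government site",
    "mil": "military site",
    "net": "network gateway or administrative host",
    "org": "private organization",
    "us": "United States",
    "au": "Australia",
    "ca": "Canada",
    "fr": "France",
    "uk": "United Kingdom",
}


def describe_address(address):
    """Split user@somewhere.domain into (user, host, kind of domain).

    A bare computer name (no "@") gives user=None.
    """
    user, at_sign, host = address.rpartition("@")
    if not at_sign:
        user = None
    # The trailing domain is whatever follows the last period.
    top = host.rsplit(".", 1)[-1].lower()
    return user, host, TOP_LEVEL.get(top, "unknown")


print(describe_address("user@nyu.edu"))
print(describe_address("eff.org"))
```

Anything not in the table comes back as "unknown", which is a reminder that the list above is only a sample of the domains in use.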
FQDN (Fully Qualified Domain Name)
The proper terminology for a site's domain name (somewhere.domain above) is its Fully Qualified Domain Name (FQDN). It is usually selected to give a clear indication of the site's organization or sponsoring agent. For example, the Massachusetts Institute of Technology's FQDN is mit.edu; similarly, Apple Computer's domain name is apple.com. While such obvious names are usually the norm, there are the occasional exceptions that are ambiguous enough to mislead---like vt.edu, which on first impulse one might surmise is an educational institution of some sort in Vermont; not so. It's actually the domain name for Virginia Tech. In most cases it's relatively easy to glean the meaning of a domain name---such confusion is far from the norm.
Internet Numbers
Every single machine on the Internet has a unique address, {At least one address, possibly two or even three---but we won't go into that.} called its Internet number or IP Address. It's actually a 32-bit number, but is most commonly represented as four numbers joined by periods (.), like 147.31.254.130. This is sometimes also called a dotted quad; since each of the four numbers can range from 0 to 255, there are over four billion possible dotted quads. The ARPAnet (the mother to today's Internet) originally only had the capacity to have up to 256 systems on it because of the way each system was addressed. In the early eighties, it became clear that things would fast outgrow such a small limit; the 32-bit addressing method was born, freeing up vastly more host numbers.
Each piece of an Internet address (like 192) is called an "octet," representing one of four sets of eight bits. The first two or three pieces (e.g. 192.55.239) represent the network that a system is on, called its subnet. For example, all of the computers for Wesleyan University are in the subnet 129.133. They can have numbers like 129.133.10.10, 129.133.230.19, up to 65 thousand possible combinations (possible computers).
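The relationship between a dotted quad and its 32-bit number is plain arithmetic, as the following Python sketch shows. The function names are made up for this illustration; they are not part of any standard tool.

```python
def quad_to_int(quad):
    """Convert a dotted quad such as "147.31.254.130" to its 32-bit value."""
    octets = [int(piece) for piece in quad.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("not a dotted quad: %r" % quad)
    value = 0
    for octet in octets:
        # Each octet supplies the next eight bits of the number.
        value = (value << 8) | octet
    return value


def int_to_quad(value):
    """Convert a 32-bit value back into dotted quad form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def hosts_in_network(network_octets):
    """Count the combinations left once the leading octets name the network."""
    return 2 ** (8 * (4 - network_octets))


print(quad_to_int("147.31.254.130"))
print(int_to_quad(quad_to_int("129.133.10.10")))
print(hosts_in_network(2))   # two octets fixed, as in 129.133
```

With two octets naming the network, sixteen bits remain for individual computers, which is where the figure of roughly 65 thousand combinations comes from.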
IP addresses and domain names aren't assigned arbitrarily---that would lead to unbelievable confusion. An application must be filed with the Network Information Center (NIC), either electronically (to [email protected]) or via regular mail.
Resolving Names and Numbers
Ok, computers can be referred to by either their FQDN or their Internet address. How can one user be expected to remember them all?
They aren't. The Internet is designed so that one can use either method. Since humans find it much more natural to deal with words than numbers in most cases, the FQDN for each host is mapped to its Internet number. Each domain is served by a computer within that domain, which provides all of the necessary information to go from a domain name to an IP address, and vice-versa. For example, when someone refers to foosun.bar.com, the resolver knows that it should ask the system foovax.bar.com about systems in bar.com. It asks what Internet address foosun.bar.com has; if the name foosun.bar.com really exists, foovax will send back its number. All of this "magic" happens behind the scenes.
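The lookup just described can be imitated with a toy model in Python. Everything here is hypothetical: the two tables stand in for information that real name servers hold, and the Internet numbers are invented for the example.

```python
# Hypothetical data: which system answers for a domain, and what it knows.
DOMAIN_SERVERS = {"bar.com": "foovax.bar.com"}
SERVER_TABLES = {
    "foovax.bar.com": {
        "foovax.bar.com": "192.0.2.1",
        "foosun.bar.com": "192.0.2.10",
    },
}


def resolve(name):
    """Return the Internet number for name, or None if it can't be found."""
    host, _, domain = name.partition(".")
    server = DOMAIN_SERVERS.get(domain)
    if server is None:
        return None   # no system is known to answer for that domain
    # Ask the domain's server; it answers only if the name really exists.
    return SERVER_TABLES[server].get(name)


print(resolve("foosun.bar.com"))
print(resolve("nosuch.bar.com"))
```

A real resolver consults servers over the network rather than a local table; in Python, the standard socket.gethostbyname function performs an actual lookup of that kind.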
Rarely will a user have to remember the Internet number of a site (although often you'll catch yourself remembering an apparently obscure number, simply because you've accessed the system frequently). However, you will remember a substantial number of FQDNs. You will eventually reach a point where you are able to make a reasonably accurate guess at what domain name a certain college, university, or company might have, given just their name.
The Networks
Internet The Internet is a large "network of networks." There is no one network known as The Internet; rather, regional nets like SuraNet, PrepNet, NearNet, et al., are all inter-connected (nay, "inter-networked") together into one great living thing, communicating at amazing speeds with the TCP/IP protocol. All activity takes place in "real-time."
UUCP The UUCP network is a loose association of systems all communicating with the UUCP protocol. (UUCP stands for `Unix-to-Unix Copy Program'.) It's based on two systems connecting to each other at specified intervals, called polling, and executing any work scheduled for either of them. Historically most UUCP was done with Unix equipment, although the software's since been implemented on other platforms (e.g. VMS). For example, the system oregano polls the system basil once every two hours. If there's any mail waiting for oregano, basil will send it at that time; likewise, oregano will at that time send any jobs waiting for basil.
BITNET BITNET (the "Because It's Time Network") is composed of systems connected by point-to-point links, all running the NJE protocol. It's continued to grow, but has found itself suffering at the hands of the falling costs of Internet connections. Also, a number of mail gateways are in place to reach users on other networks.
The Physical Connection
The actual connections between the various networks take a variety of forms. The most prevalent for Internet links are 56k leased lines (dedicated telephone lines carrying 56 kilobit-per-second connections) and T1 links (special phone lines with 1.5Mbps connections). Also installed are T3 links, acting as backbones between major locations to carry a massive 45Mbps load of traffic.
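To get a feel for what these speeds mean in practice, here is a rough Python calculation of how long a one-megabyte file would take on each kind of link. It is a back-of-the-envelope sketch: it assumes the standard T1 rate of 1.544 Mbps, treats a megabyte as one million bytes, and ignores protocol overhead and competing traffic.

```python
# Assumed raw line rates in bits per second.
LINK_SPEEDS_BPS = {
    "56k leased line": 56_000,
    "T1": 1_544_000,
    "T3": 45_000_000,
}


def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: eight bits per byte, no overhead."""
    return size_bytes * 8 / bits_per_second


for name, speed in LINK_SPEEDS_BPS.items():
    print("%-16s %8.2f seconds" % (name, transfer_seconds(1_000_000, speed)))
```

The same file that ties up a 56k line for well over two minutes passes through a T3 backbone in a fraction of a second.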
These links are paid for by each institution to a local carrier (for example, Bell Atlantic owns PrepNet, the main provider in Pennsylvania). Also available are SLIP connections, which carry Internet traffic (packets) over high-speed modems.
UUCP links are made with modems (for the most part), that run from 1200 baud all the way up to as high as 38.4Kbps. As was mentioned in The Networks, the connections are of the store-and-forward variety. Also in use are Internet-based UUCP links (as if things weren't already confusing enough!). The systems do their UUCP traffic over TCP/IP connections, which give the UUCP-based network some blindingly fast "hops," resulting in better connectivity for the network as a whole. UUCP connections first became popular in the 1970's, and have remained in wide-spread use ever since. Only with UUCP can Joe Smith correspond with someone across the country or around the world, for the price of a local telephone call.
BITNET links mostly take the form of 9600bps modems connected from site to site. Often places have three or more links going; the majority, however, look to "upstream" sites for their sole link to the network.
"The Glory and the Nothing of a Name" Byron, {Churchill's Grave}
Electronic Mail
The desire to communicate is the essence of networking. People have always wanted to correspond with each other in the fastest way possible, short of normal conversation. Electronic mail (or email) is the most prevalent application of this in computer networking. It allows people to write back and forth without having to spend much time worrying about how the message actually gets delivered. As technology grows closer and closer to being a common part of daily life (as has been evidenced by the ISDN effort), the need to understand the many ways it can be utilized and how it works, at least to some level, is vital.
Email Addresses
Electronic mail is hinged around the concept of an address; the section on Network Basics made some reference to it while introducing domains. Your email address provides all of the information required to get a message to you from anywhere in the world. An address doesn't necessarily have to go to a human being. It could be an archive server, {See Archive Servers, for a description.} a list of people, or even someone's pocket pager. These cases are the exception to the norm---mail to most addresses is read by human beings.
%@!.: Symbolic Cacophony
Email addresses usually appear in one of two forms---using the Internet format which contains @, an "at"-sign, or using the UUCP format which contains !, an exclamation point, also called a "bang." The latter of the two, UUCP "bang" paths, is more restrictive, yet more clearly dictates how the mail will travel.
To reach Jim Morrison on the system south.america.org, one would address the mail as jm@south.america.org. But if Jim's account were on a UUCP site named brazil, then his address would be brazil!jm. If it's possible (and one exists), try to use the Internet form of an address; bang paths can fail if an intermediate site in the path happens to be down. There is a growing trend for UUCP sites to register Internet domain names, to help alleviate the problem of path failures.
Another symbol that enters the fray is %---it acts as an extra "routing" method. For example, if the UUCP site dream is connected to south.america.org, but doesn't have an Internet domain name of its own, a user debbie on dream can be reached by writing to the address
debbie%dream@south.america.org
The form is significant. This address says that the local system should first send the mail to south.america.org. There the address debbie%dream will turn into debbie@dream, which will hopefully be a valid address. Then south.america.org will handle getting the mail to the host dream, where it will be delivered locally to debbie.
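The way the three symbols steer a message can be traced with a small Python sketch. It is a simplification, not how any real mailer is written: the function names are hypothetical, the sample address is reconstructed from the description above, and addresses that mix ! and @ (which real mailers may interpret differently) are not handled carefully.

```python
def next_hop(address):
    """Return (host, remaining address) for the next step of delivery.

    host is None when the address is purely local.
    """
    if "@" in address:
        local, _, host = address.rpartition("@")
        # Once the mail reaches host, the rightmost % turns into an @.
        if "%" in local:
            user, _, inner = local.rpartition("%")
            local = user + "@" + inner
        return host, local
    if "!" in address:
        # A bang path names each system in order, left to right.
        host, _, rest = address.partition("!")
        return host, rest
    return None, address


def route(address):
    """List the systems a message visits, then the final local user."""
    hops = []
    host, rest = next_hop(address)
    while host is not None:
        hops.append(host)
        host, rest = next_hop(rest)
    return hops, rest


print(route("debbie%dream@south.america.org"))
print(route("brazil!jm"))
```

The first call shows the mail going to south.america.org and then on to dream before reaching debbie; the second shows the single explicit hop of a short bang path.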
All of the intricacies of email addressing methods are fully covered in the book "!%@:: A Directory of Electronic Mail Addressing and Networks" published by O'Reilly and Associates, as part of their Nutshell Handbook series. It is a must for any active email user. Write to [email protected] for ordering information.
Sending and Receiving Mail
We'll make one quick diversion from being OS-neutral here, to show you what it will look like to send and receive a mail message on a Unix system. Check with your system administrator for specific instructions related to mail at your site.
A person sending the author mail would probably do something like this:
% mail [email protected]
Subject: print job's stuck
I typed `print babe.gif' and it didn't work! Why??
The next time the author checked his mail, he would see it listed in his mailbox as:
% mail
"/usr/spool/mail/brendan": 1 messages 1 new 1 unread
 U  1 [email protected] Tue May 5 20:36 29/956 print job's stuck
?
which gives information on the sender of the email, when it was sent, and the subject of the message. He would probably use the reply command of Unix mail to send this response:
? r
To: joeuser@foo.widener.edu
Subject: Re: print job's stuck
You shouldn't print binary files like GIFs to a printer!
Brendan
Try sending yourself mail a few times, to get used to your system's mailer. It'll save a lot of wasted aspirin for both you and your system administrator.
Anatomy of a Mail Header
An electronic mail message has a specific structure to it that's common across every type of computer system. {The standard is written down in RFC-822. See also RFCs for more info on how to get copies of the various RFCs.} A sample would be:
From [email protected] Sat May 25 17:06:01 1991
Received: from hq.mil by house.gov with SMTP id AA21901
  (4.1/SMI for [email protected]); Sat, 25 May 91 17:05:56 -0400
Date: Sat, 25 May 91 17:05:56 -0400
From: The President <[email protected]>
Message-Id: <[email protected]>
To: [email protected]
Subject: Meeting
Hi Dan .. we have a meeting at 9:30 a.m. with the Joint Chiefs. Please don't oversleep this time.
The first line, with From and the two lines for Received: are usually not very interesting. They give the "real" address that the mail is coming from (as opposed to the address you should reply to, which may look much different), and what places the mail went through to get to you. Over the Internet, there is always at least one Received: header and usually no more than four or five. When a message is sent using UUCP, one Received: header is added for each system that the mail passes through. This can often result in more than a dozen Received: headers. While they help with dissecting problems in mail delivery, odds are the average user will never want to see them. Most mail programs will filter out this kind of "cruft" in a header.
The Date: header contains the date and time the message was sent. Likewise, the "good" address (as opposed to "real" address) is laid out in the From: header. Sometimes it won't include the full name of the person (in this case The President), and may look different, but it should always contain an email address of some form.
The Message-ID: of a message is intended mainly for tracing mail routing, and is rarely of interest to normal users. Every Message-ID: is guaranteed to be unique.
To: lists the email address (or addresses) of the recipients of the message. There may be a Cc: header, listing additional addresses. Finally, a brief subject for the message goes in the Subject: header.
The exact order of a message's headers may vary from system to system, but it will always include these fundamental headers that are vital to proper delivery.
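For readers who want to poke at headers themselves, Python's standard email package can pick a message apart. The message below is a made-up stand-in with placeholder names and hosts (the .example domain is reserved for illustrations); it is not the sample shown earlier.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message; every name and host here is a placeholder.
RAW = """\
Received: from hq.example by house.example with SMTP id AA21901;
    Sat, 25 May 91 17:05:56 -0400
Date: Sat, 25 May 91 17:05:56 -0400
From: The President <president@hq.example>
Message-Id: <9105252105.AA21901@hq.example>
To: dan@house.example
Subject: Meeting

Hi Dan .. we have a meeting at 9:30 a.m.
"""

msg = message_from_string(RAW)

# Header names are matched without regard to case.
for name in ("Date", "From", "To", "Subject", "Message-ID"):
    print("%-12s %s" % (name + ":", msg[name]))

# The From: header carries an optional full name plus the address itself.
full_name, address = parseaddr(msg["From"])
print(full_name, "/", address)

# There may be several Received: headers, one per system passed through.
print(len(msg.get_all("Received")), "Received: header(s)")
```

The blank line after the headers is what separates them from the body of the message, which msg.get_payload() would return.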
Bounced Mail
When an email address is incorrect in some way (the system's name is wrong, the domain doesn't exist, whatever), the mail system will bounce the message back to the sender, much the same way that the Postal Service does when you send a letter to a bad street address. The message will include the reason for the bounce; a common error is addressing mail to an account name that doesn't exist. For example, writing to Lisa Simpson at Widener University's Computer Science department will fail, because she doesn't have an account. {Though if she asked, we'd certainly give her one.}
From: Mail Delivery Subsystem
Date: Sat, 25 May 91 16:45:14 -0400
To: [email protected]
Cc: [email protected]
Subject: Returned mail: User unknown

----- Transcript of session follows -----
While talking to cs.widener.edu:
>>> RCPT To:<[email protected]>
<<< 550 <[email protected]>... User unknown
550 lsimpson... User unknown
As you can see, a carbon copy of the message (the Cc: header entry) was sent to the postmaster of Widener's CS department. The Postmaster is responsible for maintaining a reliable mail system at his site. Usually postmasters at sites will attempt to aid you in getting your mail where it's supposed to go. If a typing error was made, then try re-sending the message with the corrected address. If you're sure that the address is correct, contact the postmaster of the site directly and ask him how to properly address it.
The message also includes the text of the mail, so you don't have to retype everything you wrote.
----- Unsent message follows -----
Received: by cs.widener.edu id AA06528; Sat, 25 May 91 16:45:14 -0400
Date: Sat, 25 May 91 16:45:14 -0400
From: Matt Groening <[email protected]>
Message-Id: <[email protected]>
To: [email protected]
Subject: Scripting your future episodes
Reply-To: [email protected]
.... verbiage ...
The full text of the message is returned intact, including any headers that were added. This can be cut out with an editor and fed right back into the mail system with a proper address, making redelivery a relatively painless process.
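The cut-and-resend step can also be done by a short program instead of an editor. The Python sketch below makes some assumptions: the marker wording is taken from the bounce shown above and differs between mailers, and the function names are invented for this example.

```python
from email import message_from_string

MARKER = "Unsent message follows"   # assumed wording; mailers differ


def unsent_message(bounce_text):
    """Return the original message quoted in a bounce, or None."""
    lines = bounce_text.splitlines()
    for index, line in enumerate(lines):
        if MARKER in line:
            # Everything after the marker line is the returned message.
            return "\n".join(lines[index + 1:]).lstrip("\n")
    return None


def readdress(bounce_text, new_recipient):
    """Rebuild the returned message with a corrected To: header."""
    original = unsent_message(bounce_text)
    if original is None:
        return None
    message = message_from_string(original)
    del message["To"]               # drop the address that bounced
    message["To"] = new_recipient
    return message
```

The result of readdress can be printed with str() and handed back to the mail system; how that hand-off is done depends on the mailer at your site.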
Mailing Lists