The web, a multilingual encyclopedia
Part 1
THE WEB, A MULTILINGUAL ENCYCLOPEDIA
MARIE LEBERT, 2012
TABLE OF CONTENTS
1974 > The internet “took off”
1990 > The invention of the web
1990 > The LINGUIST List
1991 > From ASCII to Unicode
1994 > Travlang, travel and languages
1995 > The Internet Dictionary Project
1995 > NetGlos, a glossary of the internet
1995 > Various languages on our screen
1995 > Global Reach, promoting localization
1996 > OneLook Dictionaries, a “fast finder”
1997 > 82.3% of the web in English
1997 > The internet, a tool for minority languages
1997 > A European terminology database
1997 > Babel Fish, a free translation software
1997 > The tools of the translation company Logos
1997 > Specialized terminology databases
1998 > The need for a “linguistic democracy”
1999 > Bilingual dictionaries in WordReference.com
1999 > The internet, a mandatory tool for translators
1999 > The need for bilingual information online
2000 > Online encyclopedias and dictionaries
2000 > The web portal yourDictionary.com
2000 > Project Gutenberg and languages
2001 > Wikipedia, a collaborative encyclopedia
2001 > UNL, a digital metalanguage project
2001 > A market for language translation software
2004 > The web 2.0, community and sharing
2007 > The ISO 639-3 standard to identify languages
2007 > Google Translate
2009 > 6,909 languages in the Ethnologue
2010 > A UNESCO atlas for endangered languages
INTRODUCTION
"The web will be an encyclopedia of the world by the world for the world. There will be no information or knowledge that anyone needs that will not be available. The major hindrance to international and interpersonal understanding, personal and institutional enhancement, will be removed. It would take a wilder imagination than mine to predict the effect of this development on the nature of humankind." (Robert Beard, founder of A Web of Online Dictionaries, September 1998)
This book is a chronology in 31 chapters from 1974 to 2010. Many thanks to all those who are quoted here, for their time and their friendship. Unless specified otherwise, the quotes are excerpts from the interviews conducted by the author over several years and published in the same collection.
1974 > THE INTERNET "TOOK OFF"
[Summary] The internet “took off” in 1974 with the creation of TCP/IP (Transmission Control Protocol / Internet Protocol) by Vinton Cerf and Bob Kahn, fifteen years before the invention of the web. The internet expanded as a network linking U.S. governmental agencies, universities and research centers, before spreading worldwide in 1983. The internet got its first boost in 1990 with the invention of the web by Tim Berners-Lee, and its second boost in 1993 with the release of Mosaic, the first browser for the general public. The Internet Society (ISOC) was founded in 1992 by Vinton Cerf to promote the development of the internet as a medium that was becoming part of our lives. There were 100 million internet users in December 1997, with one million new users per month, and 300 million users in December 2000.
***
The internet “took off” in 1974 with the creation of TCP/IP (Transmission Control Protocol / Internet Protocol) by Vinton Cerf and Bob Kahn, fifteen years before the invention of the web.
# A new medium
The internet expanded as a network linking U.S. governmental agencies, universities and research centers, before spreading worldwide in 1983.
The internet got its first boost in 1990 with the invention of the web by Tim Berners-Lee, and its second boost in 1993 with the release of Mosaic, the first browser for the general public.
Vinton Cerf founded the Internet Society (ISOC) in 1992 to promote the development of the internet as a medium that was becoming part of our lives. When interviewed by the French daily Libération on 16 January 1998, he explained that the network was doing two things. Like books, it could accumulate knowledge. But, more importantly, it presented knowledge in a way that connected it with other information whereas, in a book, information stayed isolated.
Because the web was easy to use with hyperlinks going from one document to the next, the internet could now be used by anyone, and not only by computer-literate users. There were 100 million internet users in December 1997, with one million new users per month, and 300 million users in December 2000.
# A worldwide expansion
North America was leading the way in computer science and communication technology, with significant funding and cheap computers compared to Europe. A connection to the internet was much cheaper too.
In some European countries, internet users (including the author of these lines) needed to surf the web at night, when per-minute phone rates were cheaper, to cut their expenses. In late 1998 and early 1999, some users in France, Germany and Italy launched a movement to boycott the internet one day per week, as a way to force internet providers and phone companies to set up a special monthly rate. This action paid off, and providers began to offer "internet rates".
In summer 1999, the number of internet users living outside the U.S. reached 50%.
In summer 2000, the number of internet users having a mother tongue other than English also reached 50%, and went on steadily increasing afterwards. According to statistics regularly published on the website of Global Reach, a marketing consultancy promoting internationalization and localization, they accounted for 52.5% in summer 2001, 57% in December 2001, 59.8% in April 2002, 64.4% in September 2003 (including 34.9% non-English-speaking Europeans and 29.4% Asians), and 64.2% in March 2004 (including 37.9% non-English-speaking Europeans and 33% Asians).
Broadband became the norm over the years. Jean-Paul, webmaster of the hypermedia website cotres.net, summarized things in January 2007: “I feel that we are experiencing a ‘floating’ period between the heroic ages, when we were moving forward while waiting for the technology to catch up, and the future, when high-speed broadband will unleash forces that just begin to move, for now only in games.”
# The internet of the future
The internet of the future could be a “pervasive” network allowing us to connect in any place and at any time on any device through a single omnipresent network.
The concept of a “pervasive” network was developed by Rafi Haladjian, founder of the European company Ozone, who explained on its website in 2007 that “the new wave would affect the physical world, our real environment, our daily life in every moment. We will not access the network any more, we will live in it. The future components of this network (wired parts, non wired parts, operators) will be transparent to the final user. The network will always be open, providing a permanent connection anywhere. It will also be agnostic in terms of applications, as a network based on the internet protocols themselves.” We do look forward to this.
As for the content of the internet, Timothy Leary, a visionary writer, described it in 1994 in his book “Chaos & Cyber Culture” as gigantic glass towers containing all the world's information, with free access, through cyberspace, not only to all books, but also to all pictures, all movies, all TV shows, and all other data. In 2011, we are not there yet, but we are getting there.
1990 > THE INVENTION OF THE WEB
[Summary] The World Wide Web was invented in 1990 by Tim Berners-Lee at CERN (European Center for Nuclear Research), Geneva, Switzerland. In 1989, Tim Berners-Lee networked documents using hypertext. In 1990, he developed the first HTTP (HyperText Transfer Protocol) server and the first web browser. In 1991, the web was operational and radically changed the way people were using the internet. Hypertext links allowed us to move from one textual or visual document to another with a simple click of the mouse. Information became interactive, thus more attractive to many users. Later on, this interactivity was further enhanced with hypermedia links that could link texts and images with video and sound. The World Wide Web Consortium (W3C) was founded in October 1994 to develop protocols for the web.
***
The World Wide Web was invented in 1990 by Tim Berners-Lee, a researcher at CERN (European Center for Nuclear Research), Geneva, Switzerland, who made the internet accessible to all.
# How the web started
In 1989, Tim Berners-Lee networked documents using hypertext. In 1990, he developed the first HTTP (HyperText Transfer Protocol) server and the first web browser. In 1991, the web was operational and made the internet accessible to all. Hypertext links allowed us to move from one textual or visual document to another with a simple click of the mouse. Information became interactive, thus more attractive to many users. Later on, this interactivity was further enhanced with hypermedia links that could link texts and images with video and sound.
Developed by NCSA (National Center for Supercomputing Applications) at the University of Illinois (USA) and distributed free of charge in November 1993, Mosaic was the first browser for the general public, and contributed greatly to the development of the web. In early 1994, part of the Mosaic team migrated to the Netscape Communications Corporation to develop a new browser called Netscape Navigator. In 1995, Microsoft launched its own browser, Internet Explorer. Other browsers were launched later, like Opera and Safari, Apple's browser.
The World Wide Web Consortium (W3C) was founded in October 1994 to develop interoperable technologies (specifications, guidelines, software, other tools) for the web, for example specifications for markup languages (HTML, XML and others). It also acted as a forum for information, commerce, communication and collective understanding. In 1998, the section Internationalization/Localization gave access to some protocols for creating a multilingual website: HTML, base character set, new tags and attributes, HTTP, language negotiation, URLs and other identifiers including non-ASCII characters, etc.
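As an illustration of the "language negotiation" mentioned above, here is a minimal sketch in Python. It assumes a browser announcing its preferred languages in the HTTP Accept-Language header and a website available in a few languages. The function names (parse_accept_language, pick_language) and the sample header are hypothetical, chosen for this example only, and the matching rule is deliberately simplified compared with what real servers do.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # default weight when no q= parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # unreadable weight: treat as "not wanted"
        weighted.append((tag, quality))
    # sorted() is stable, so equal weights keep the order sent by the browser
    ranked = sorted(weighted, key=lambda pair: -pair[1])
    return [tag for tag, quality in ranked if quality > 0]


def pick_language(header, available, default="en"):
    """Pick the first preferred language the site can serve (simplified)."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "fr-ch" falls back to "fr"
        if primary in available:
            return primary
    return default


# A Swiss French-speaking reader visiting a site published in three languages.
print(pick_language("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7", ["en", "de", "fr"]))
```

In this sketch the reader gets the French pages, because the regional tag falls back to its primary language, and a reader whose languages are not available gets the default.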
# Tim Berners-Lee’s dream
Pierre Ruetschi, a journalist for the Swiss daily “Tribune de Genève”, asked Tim Berners-Lee on 20 December 1997: "Seven years later, are you satisfied with the way the web has evolved?" He answered that, although he was pleased with the richness and diversity of information, the web still lacked the power planned in its original design. He would like "the web to be more interactive, and people to be able to create information together", and not only to be information consumers. The web was supposed to become a "medium for collaboration, a world of knowledge that we share."
In an essay posted on his webpage, Tim Berners-Lee wrote in May 1998: "The dream behind the web is of a common information space in which we communicate by sharing information. Its universality is essential: the fact that a hypertext link can point to anything, be it personal, local or global, be it draft or highly polished. There was a second part of the dream, too, dependent on the web being so generally used that it became a realistic mirror (or in fact the primary embodiment) of the ways in which we work and play and socialize. That was that once the state of our interactions was online, we could then use computers to help us analyze it, make sense of what we are doing, where we individually fit in, and how we can better work together." (excerpt from "The World Wide Web: A very short personal history")
# The web 2.0
According to Netcraft, a company tracking data on the internet, the number of websites went from one million (April 1997) to 10 million (February 2000), 20 million (September 2000), 30 million (July 2001), 40 million (April 2003), 50 million (May 2004), 60 million (March 2005), 70 million (August 2005), 80 million (April 2006), 90 million (August 2006) and 100 million (November 2006), with a growing number of personal websites and blogs.
The term “web 2.0” was invented in 2004 by Tim O’Reilly, a publisher of computer books, as a title for a series of conferences he was organizing. The web 2.0 may begin to answer Tim Berners-Lee’s dream as a web based on community and sharing, with many collaborative projects across borders and languages.
Fifteen years after the invention of the web, Wired stated in its August 2005 issue that less than half of the web was commercial, with the rest being run by passion. As for the internet, according to the French daily Le Monde dated 19 August 2005, its three powers -- ubiquity, variety and interactivity -- made its potential use almost infinite.
Robert Beard, a language teacher at Bucknell University, Pennsylvania, and the founder of A Web of Online Dictionaries in 1995, wrote as early as September 1998: "The web will be an encyclopedia of the world by the world for the world. There will be no information or knowledge that anyone needs that will not be available. The major hindrance to international and interpersonal understanding, personal and institutional enhancement, will be removed. It would take a wilder imagination than mine to predict the effect of this development on the nature of humankind."
1990 > THE LINGUIST LIST
[Summary] The LINGUIST List was founded by Anthony Rodrigues Aristar in 1990 at the University of Western Australia, with 60 subscribers, before moving to Texas A&M University in 1991, with Eastern Michigan University established as the main editing site for the list. In 1997, emails sent to the distribution list were also available on the list's own website, in several sections: the profession (conferences, linguistic associations, programs), research and research support (papers, dissertation abstracts, projects, bibliographies, topics, texts), publications, pedagogy, language resources (languages, language families, dictionaries, regional information), and computer support (fonts and software). The LINGUIST List is a component of the WWW Virtual Library for linguistics.
***
The LINGUIST List was founded by Anthony Rodrigues Aristar in 1990 at the University of Western Australia, as a mailing list for academic linguists.
It started with 60 subscribers, then moved to Texas A&M University in 1991, with Eastern Michigan University established as the main editing site for the list.
In 1997, emails sent to the distribution list were also available on the list's own website, in several sections: the profession (conferences, linguistic associations, programs), research and research support (papers, dissertation abstracts, projects, bibliographies, topics, texts), publications, pedagogy, language resources (languages, language families, dictionaries, regional information), and computer support (fonts and software). The LINGUIST List is a component of the WWW Virtual Library for linguistics.
Helen Dry, co-moderator of the LINGUIST List since 1991, wrote in August 1998: "The LINGUIST List, which I moderate, has a policy of posting in any language, since it is a list for linguists. However, we discourage posting the same message in several languages, simply because of the burden extra messages put on our editorial staff. (We are not a bounce-back list, but a moderated one. So each message is organized into an issue with like messages by our student editors before it is posted.) Our experience has been that almost everyone chooses to post in English. But we do link to a translation facility that will present our pages in any of five languages; so a subscriber need not read LINGUIST in English unless s/he wishes to. We also try to have at least one student editor who is genuinely multilingual, so that readers can correspond with us in languages other than English."
She added in July 1999: "We are beginning to collect some primary data. For example, we have searchable databases of dissertation abstracts relevant to linguistics, of information on graduate and undergraduate linguistics programs, and of professional information about individual linguists. The dissertation abstracts collection is, to my knowledge, the only freely available electronic compilation in existence."
1991 > FROM ASCII TO UNICODE
[Summary] Used since the beginning of computing, ASCII (American Standard Code for Information Interchange) is a 7-bit coded character set for information interchange in English. It was published in 1963 by ANSI (American National Standards Institute). With the internet spreading worldwide, communicating in English (and Latin) was no longer enough. The accented characters of several European languages and characters of some other languages were taken into account from 1986 onwards with 8-bit variants of ASCII, also called extended ASCII, that provided sets of 256 characters. But problems were not over until the publication of Unicode in January 1991 as a new universal encoding system. Unicode provided "a unique number for every character, no matter what the platform, no matter what the program, no matter what the language", and could handle 65,000 characters or ideograms.
***
With the internet spreading worldwide, ASCII and extended ASCII were no longer enough. All languages needed to be taken into account, which led to Unicode, whose first version was published in January 1991.
Used since the beginning of computing, ASCII (American Standard Code for Information Interchange) is a 7-bit coded character set for information interchange in English (and Latin). It was published in 1963 by ANSI (American National Standards Institute). The 7-bit plain ASCII, also called Plain Vanilla ASCII, is a set of 128 characters with 95 printable unaccented characters (A-Z, a-z, numbers, punctuation and basic symbols), the ones that are available on the American / English keyboard.
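The figures above (128 characters, 95 of them printable) can be checked with a short Python sketch. It assumes only that the printable range runs from the space character (code 32) to the tilde (code 126); the variable names and sample words are chosen for this example.

```python
# 7 bits give 2**7 = 128 possible codes, numbered 0 to 127.
ascii_chars = [chr(code) for code in range(2 ** 7)]

# Codes 32 (space) to 126 (tilde) are printable; the others are control codes.
printable = [c for c in ascii_chars if 32 <= ord(c) <= 126]

print(len(ascii_chars), len(printable))

# Unaccented text fits in ASCII; accented text does not.
print("resume".encode("ascii"))
try:
    "résumé".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)
```

The last lines show the limitation described in this chapter: a word with accents cannot be stored in plain ASCII at all.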
With computer technology spreading outside North America, the accented characters of several European languages and characters of some other languages were taken into account from 1986 onwards with 8-bit variants of ASCII, also called extended ASCII, that provided sets of 256 characters.
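A short Python sketch can show what an 8-bit variant brings and where it stops. The two code pages used here, ISO 8859-1 (Western European) and ISO 8859-5 (Cyrillic), are examples picked for this illustration and are not named in the text above.

```python
# One byte holds 8 bits, so an 8-bit code page has 2**8 = 256 slots.
slots = 2 ** 8

raw = bytes([0xE9])                      # a single byte above the 7-bit range
as_latin1 = raw.decode("latin-1")        # read with the Western European page
as_cyrillic = raw.decode("iso8859_5")    # read with the Cyrillic page

print(slots)
print(as_latin1, as_cyrillic)            # same byte, two different characters

# The lower half (codes 0 to 127) is plain ASCII in both code pages.
same_lower_half = all(
    bytes([b]).decode("latin-1") == bytes([b]).decode("iso8859_5")
    for b in range(128)
)
print(same_lower_half)
```

Because the same byte stands for a different character in each code page, a text has to commit to one page at a time, which is why mixing several writing systems in one document remained difficult with extended ASCII.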
Brian King, director of the WorldWide Language Institute (WWLI), explained in September 1998: “Computer technology has traditionally been the sole domain of a 'techie' elite, fluent in both complex programming languages and in English -- the universal language of science and technology. Computers were never designed to handle writing systems that couldn't be translated into ASCII. There wasn't much room for anything other than the 26 letters of the English alphabet in a coding system that originally couldn't even recognize acute accents and umlauts -- not to mention non-alphabetic systems like Chinese. But tradition has been turned upside down. Technology has been popularized. (…)
An extension of (local) popularization is the export of information technology around the world. Popularization has now occurred on a global scale and English is no longer necessarily the lingua franca of the user. Perhaps there is no true lingua franca, but only the individual languages of the users. One thing is certain -- it is no longer necessary to understand English to use a computer, nor it is necessary to have a degree in computer science. A pull from non-English-speaking computer users and a push from technology companies competing for global markets has made localization a fast growing area in software and hardware development. This development has not been as fast as it could have been. The first step was for ASCII to become extended ASCII. This meant that computers could begin to start recognizing the accents and symbols used in variants of the English alphabet -- mostly used by European languages. But only one language could be displayed on a page at a time. (...)
The most recent development [in 1998] is Unicode. Although still evolving and only just being incorporated into the latest software, this new coding system translates each character into 16 bits. Whereas 8-bit extended ASCII could only handle a maximum of 256 characters, Unicode can handle over 65,000 unique characters and therefore potentially accommodate all of the world's writing systems on the computer. So now the tools are more or less in place. They are still not perfect, but at last we can surf the web in Chinese, Japanese, Korean, and numerous other languages that don't use the Western alphabet. As the internet spreads to parts of the world where English is rarely used -- such as China, for example, it is natural that Chinese, and not English, will be the preferred choice for interacting with it. For the majority of the users in China, their mother tongue will be the only choice.”
First published in January 1991, Unicode "provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language" (excerpt from the website). This double-byte platform-independent encoding provides a basis for the processing, storage and interchange of text data in any language. Unicode is maintained by the Unicode Consortium, with its variants UTF-8, UTF-16 and UTF-32 (UTF: Unicode Transformation Format), and is a component of the specifications of the World Wide Web Consortium (W3C). Unicode has replaced ASCII for text files on Windows platforms since 1998. Unicode surpassed ASCII on the internet in December 2007.
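The idea of "a unique number for every character" and the role of the UTF variants can be sketched in Python as follows. The helper name encoded_sizes and the three sample characters are assumptions made for this example.

```python
import unicodedata


def encoded_sizes(ch):
    """Return the number of bytes one character takes in each UTF variant."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),  # -be: no byte order mark added
        "utf-32": len(ch.encode("utf-32-be")),
    }


# 16 bits give 2**16 distinct values, the "over 65,000" mentioned above.
sixteen_bit_values = 2 ** 16
print(sixteen_bit_values)

# Each character has one number (its code point), whatever the language.
for ch in ["A", "é", "中"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), encoded_sizes(ch))
```

The code point of a character never changes; only the number of bytes used to store it differs between UTF-8, UTF-16 and UTF-32. In UTF-8 the unaccented letter takes a single byte, identical to its ASCII code, so plain ASCII files are already valid UTF-8.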
1994 > TRAVLANG, TRAVEL AND LANGUAGES
[Summary] Travlang was the first website to offer links to free basic translation dictionaries, intended for travelers and the general public. As a first step, Michael C. Martin created in 1994 a “Foreign Languages for Travelers” section on his university website when he was a physics student in New York. One year later, he launched Travlang, a site that quickly became a major portal for travel and languages, and won a best travel site award in 1997. Travlang was still maintained in 1998 by Michael C. Martin, now a researcher in experimental physics at the Lawrence Berkeley National Laboratory in California. The section “Translating Dictionaries” gave access to free basic online dictionaries in a number of languages (Afrikaans, Czech, Danish, Dutch, Esperanto, Finnish, French, Frisian, German, Hungarian, Italian, Latin, Norwegian, Portuguese, Spanish, Swedish). Other sections offered links to language dictionaries, translation services and language schools.
***
Travlang was the first website to offer links to free basic translation dictionaries, intended for travelers and the general public.
As a first step, Michael C. Martin created in 1994 a “Foreign Languages for Travelers” section on his university website when he was a physics student in New York. One year later, he launched Travlang, a site that quickly became a major portal for travel and languages, and won a best travel site award in 1997.
Travlang was still maintained in 1998 by Michael C. Martin, now a researcher in experimental physics at the Lawrence Berkeley National Laboratory in California.
The section “Foreign Languages for Travelers” gave links to online tools to learn 60 languages. The section “Translating Dictionaries” gave access to free basic online dictionaries in a number of languages (Afrikaans, Czech, Danish, Dutch, Esperanto, Finnish, French, Frisian, German, Hungarian, Italian, Latin, Norwegian, Portuguese, Spanish, Swedish). Other sections offered links to translation services, language schools and multilingual bookstores. People could also book their hotel, car or plane ticket, look up exchange rates and browse an index of 7,000 links to other language and travel sites.