The web, a multilingual encyclopedia

Part 3

Chapter 3 · 3,693 words · Public domain

In summer 2000, the share of internet users whose mother tongue was not English also reached 50%, and it kept rising steadily afterwards. According to statistics regularly published online by Global Reach, these users accounted for 52.5% in summer 2001, 57% in December 2001, 59.8% in April 2002, 64.4% in September 2003 (including 34.9% non-English-speaking Europeans and 29.4% Asians), and 64.2% in March 2004 (including 37.9% non-English-speaking Europeans and 33% Asians).

1997 > THE INTERNET, A TOOL FOR MINORITY LANGUAGES

[Summary] Despite the so-called hegemony of the English language, the internet was also a good tool for minority languages, as stated by Caoimhín Ó Donnaíle, who has taught computing at the Institute Sabhal Mòr Ostaig, on the Isle of Skye, in Scotland. Caoimhín has maintained the trilingual (Scottish Gaelic, Irish Gaelic, English) college website, as the main site worldwide with information on Scottish Gaelic, with a trilingual list of European minority languages. The internet could be a tool to develop a "cultural identity" for any language, while using the English language for this, as stated by Guy Antoine, who founded Windows on Haiti in April 1998 to promote the Haitian culture and language.

***

Despite the so-called hegemony of the English language, the internet was also a good tool for minority languages, as stated by Caoimhín Ó Donnaíle, who has taught computing at the Institute Sabhal Mòr Ostaig, on the Isle of Skye, in Scotland.

Caoimhín has maintained the trilingual (Scottish Gaelic, Irish Gaelic, English) college website, as the main site worldwide with information on Scottish Gaelic, with a trilingual list of European minority languages.

Interviewed in August 1998, Caoimhín saw four main points for the growth of a multilingual web: “(a) The internet has contributed and will contribute to the wildfire spread of English as a world language. (b) The internet can greatly help minority languages, but this will not happen by itself. It will only happen if people want to maintain the language as an aim in itself. (c) The web is very useful for delivering language lessons, and there is a big demand for this. (d) The Unicode (ISO 10646) character set standard is very important and will greatly assist in making the Internet more multilingual.”

How about the Gaelic language? Caoimhín wrote in May 2001: "Students do everything by computer, use Gaelic spell-checking, a Gaelic online terminology database. There are more hits on our website. There is more use of sound. Gaelic radio (both Scottish and Irish) is now available continuously worldwide via the internet. A major project has been the translation of the Opera web-browser into Gaelic -- the first software of this size available in Gaelic."

What about endangered languages? "I would emphasize the point that as regards the future of endangered languages, the internet speeds everything up. If people don't care about preserving languages, the internet and accompanying globalization will greatly speed their demise. If people do care about preserving them, the internet will be a tremendous help."

Robert Beard, co-founder of the web portal yourDictionary.com, wrote in January 2000: "While English still dominates the web, the growth of monolingual non-English websites is gaining strength with the various solutions to the font problems. Languages that are endangered are primarily languages without writing systems at all (only 1/3 of the world's 6,000+ languages have writing systems). I still do not see the web contributing to the loss of language identity and still suspect it may, in the long run, contribute to strengthening it. More and more Native Americans, for example, are contacting linguists, asking them to write grammars of their language and help them put up dictionaries. For these people, the web is an affordable boon for cultural expression."

The internet could be a tool to develop a "cultural identity" for any language, while using the English language for this, as stated by Guy Antoine, who founded Windows on Haiti in April 1998 to promote the Haitian culture and language.

Guy wrote in November 1999: "In Windows on Haiti, the primary language of the site is English, but one will equally find a center of lively discussion conducted in 'Kreyòl'. In addition, one will find documents related to Haiti in French, in the old colonial Creole, and I am open to publishing others in Spanish and other languages. I do not offer any sort of translation, but multilingualism is alive and well at the site, and I predict that this will increasingly become the norm throughout the web. (…)

The internet can serve, first of all, as a repository of useful information on minority languages that might otherwise vanish without leaving a trace. Beyond that, I believe that it provides an incentive for people to learn languages associated with the cultures about which they are attempting to gather information. One soon realizes that the language of a people is an essential and inextricable part of its culture. (...) ‘Kreyòl’ (Creole for the non-initiated) is primarily a spoken language, not a widely written one. I see the web changing this situation more so than any traditional means of language dissemination."

Guy added in June 2001: "Kreyòl is the only national language of Haiti, and one of its two official languages, the other being French. It is hardly a minority language in the Caribbean context, since it is spoken by eight to ten million people. (...) I have taken the promotion of Kreyòl as a personal cause, since that language is the strongest of bonds uniting all Haitians. (...) I have created two discussion forums on my website Windows on Haiti, held exclusively in Kreyòl. One is for general discussions on just about everything but obviously more focused on Haiti's current socio-political problems. The other is reserved only to debates of writing standards for Kreyòl. Those debates have been quite spirited and have met with the participation of a number of linguistic experts. The uniqueness of these forums is their non-academic nature."

1997 > A EUROPEAN TERMINOLOGY DATABASE

[Summary] Launched in 1997 by the Translation Service of the European Commission, Eurodicautom was a multilingual terminology database of economic, scientific, technical and legal terms and expressions, with language pairs for the eleven official languages of the European Union (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, Swedish), and Latin. There were 120,000 daily visits on average in 2003. In late 2003, Eurodicautom announced its integration into a larger terminology database in partnership with other institutions of the European Union. The new database, called IATE (InterActive Terminology for Europe), would be available in more than 20 languages, because of the enlargement of the European Union planned in 2004. IATE was launched on the intranet of some European institutions in spring 2004 and on the internet for free in March 2007.

***

Eurodicautom was a multilingual terminology database of economic, scientific, technical and legal terms and expressions, with language pairs for the eleven official languages of the European Union, and Latin.

Eurodicautom was initially developed to assist in-house translators. A free online version was available on the web in 1997 for European Union officials and for language professionals throughout the world.

Eurodicautom covered "a broad spectrum of human knowledge", mainly relating to economy, science, technology and legislation in the European Union (EU), to answer the needs of the 15 member countries in 11 official languages (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, Swedish), plus Latin.

A larger terminology database was under study as early as 1999, with the aim of merging the existing databases for better inter-institutional cooperation between the European organizations. The project partners were the European Commission, the European Parliament, the Council of the European Union, the Court of Justice, the European Court of Auditors, the European Economic and Social Committee, the Committee of the Regions, the European Investment Bank, the European Central Bank, and the Translation Centre for the Bodies of the European Union.

Eurodicautom had 120,000 visits a day in late 2003, when it closed to prepare for a larger terminology database that would include the databases of other official European institutions. The new database would be available in many more languages, more than 20 instead of 12, because of the enlargement of the European Union planned in 2004 to include new countries from Central and Eastern Europe. The European Union went from 15 member countries to 25 in May 2004, and to 27 in January 2007.

IATE (InterActive Terminology for Europe) was launched in March 2007 as an eagerly awaited free service on the web, after being launched in summer 2004 on the intranet of the participating European institutions, with 1.4 million entries in the 23 official languages of the European Union (Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish, Swedish), plus Latin.

The website has been maintained by the Translation Center of the European Union institutions in Luxembourg. According to the IATE brochure, also available in the 23 official languages, IATE offered 8.4 million words in 2010, including 540,000 abbreviations and 130,000 expressions.

1997 > BABEL FISH, A FREE TRANSLATION SOFTWARE

[Summary] In December 1997, the search engine AltaVista launched the first free machine translation software called Babel Fish or AltaVista Translation, which could translate webpages or short texts from English into French, German, Italian, Portuguese or Spanish, and vice versa. The software was developed by Systran (an acronym for "System Translation"), a company specializing in automated language solutions. Babel Fish was a “hit” among the 12 million internet users of the time, who included more and more non-English-speaking users, and greatly contributed to a plurilingual web. Other tools were developed then by Alis Technologies, Globalink, Lernout & Hauspie and Softissimo, with free and/or paid versions available on the web.

***

In December 1997, the search engine AltaVista launched Babel Fish as the first free machine translation software from English to five other languages.

At the time, the interface of Yahoo! was available in seven languages (English, French, German, Japanese, Korean, Norwegian, Swedish), to take into account a growing number of non-English-speaking users. When a search didn't give any result in Yahoo!, it was automatically shunted to AltaVista, and vice versa.

Babel Fish, also called AltaVista Translation, could translate webpages from English into French, German, Italian, Portuguese or Spanish, and vice versa, with the original page and the translation displayed side by side on the screen. Translating any short text was also possible with a “copy and paste”. The result was far from perfect but helpful, and it was instantaneous and free, unlike a high-quality professional translation. Non-English-speaking users were thrilled. Babel Fish greatly contributed to a plurilingual web.

Backed up by plurilingual dictionaries with 12.5 million entries, Babel Fish was developed by Systran (an acronym for "System Translation"), a company specializing in automated language solutions. As explained on Systran’s website: "Machine translation software translates one natural language into another natural language. MT takes into account the grammatical structure of each language and uses rules to transfer the grammatical structure of the source language (text to be translated) into the target language (translated text). MT cannot replace a human translator, nor is it intended to."
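The rule-based transfer described in this quote can be illustrated with a very small sketch. This is a toy example only, not Systran's software: the four-word lexicon, the part-of-speech tags and the single reordering rule (English adjective before the noun, French adjective usually after it) are all assumptions made for illustration.

```python
# Toy rule-based transfer from English to French.
# The lexicon and the single reordering rule are illustrative assumptions,
# not Systran's actual data or rules.

LEXICON = {
    "the": ("le", "DET"),
    "cat": ("chat", "NOUN"),
    "black": ("noir", "ADJ"),
    "sleeps": ("dort", "VERB"),
}


def translate(sentence):
    """Translate a lowercase English sentence word by word, then reorder."""
    # Lexical step: look up each source word; unknown words pass through.
    tagged = [LEXICON.get(word, (word, "UNK")) for word in sentence.split()]

    # Transfer step: English puts the adjective before the noun,
    # French usually puts it after, so swap each ADJ NOUN pair.
    result = []
    i = 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "ADJ"
                and tagged[i + 1][1] == "NOUN"):
            result.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            result.append(tagged[i])
            i += 1

    return " ".join(word for word, _tag in result)


print(translate("the black cat sleeps"))  # prints "le chat noir dort"
```

A real system needs far larger dictionaries and many more rules (agreement, gender, idioms), which is one reason the output of the time was "far from perfect but helpful".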

Machine translation was defined as such on the website of the European Association for Machine Translation (EAMT): "Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful for certain specific applications, usually in the domain of technical documentation. In addition, translation software packages which are designed primarily to assist the human translator in the production of translations are enjoying increasing popularity within professional translation organizations."

Other translation software was developed then by Alis Technologies, Globalink, Lernout & Hauspie and Softissimo, with paid and/or free versions available on the web. As for Babel Fish, it moved to Yahoo!’s website in May 2008.

1997 > THE TOOLS OF THE TRANSLATION COMPANY LOGOS

[Summary] In December 1997, Logos, a global translation company based in Modena, Italy, decided to put on the web for free the professional tools used by its translators, for the internet community to be able to use them as well. These tools were the Logos Dictionary, a multilingual dictionary with 7.5 billion words (in fall 1998); the Logos Wordtheque, a multilingual library with 328 billion words extracted from translated novels, technical manuals, and other texts; the Logos Linguistic Resources, a database of 553 glossaries; and the Logos Universal Conjugator, a database for verbs in 17 languages. In 2007, the Logos Library (formerly Wordtheque) included 710 billion words, Linguistic Resources (no change of name) included 1,215 glossaries, and the Universal Conjugator (formerly Conjugation of Verbs) included verbs in 36 languages.

***

In December 1997, Logos, a global translation company, decided to put on the web all the professional tools used by its translators, for the internet community to freely use them as well.

Logos was founded by Rodrigo Vergara in 1979, with headquarters in Modena, Italy. In 1997, Logos had 300 in-house translators and 2,500 free-lance translators worldwide, who processed around 200 texts per day.

The linguistic tools available online were the Logos Dictionary, a multilingual dictionary with 7.5 billion words (in fall 1998); the Logos Wordtheque, a multilingual library with 328 billion words extracted from translated novels, technical manuals, and other texts, that could be searched by language, word, author or title; the Logos Linguistic Resources, a database of 553 glossaries; and the Logos Universal Conjugator, a database for verbs in 17 languages.

When interviewed by Annie Kahn in an article of the French daily Le Monde dated 7 December 1997, Rodrigo Vergara, head of Logos, explained: "We wanted all our translators to have access to the same translation tools. So we made them available on the internet, and while we were at it we decided to make the site open to the public. This made us extremely popular, and also gave us a lot of exposure. This move has in fact attracted many customers, and also allowed us to widen our network of translators, thanks to contacts made in the wake of the initiative."

In the same article, called “Les mots pour le dire” (The words to tell it), Annie Kahn wrote: "The Logos site is much more than a mere dictionary or a collection of links to other online dictionaries. The cornerstone is the document search program, which processes a corpus of literary texts available free of charge on the web. If you search for the definition or the translation of a word ('didactique', for example), you get not only the answer sought, but also a quote from one of the literary works containing the word (in our case, an essay by Voltaire). All it takes is a click on the mouse to access the whole text or even to order the book, including in foreign translations, thanks to a partnership agreement with the famous online bookstore Amazon.com. However, if no text containing the required word is found, the program acts as a search engine, sending the user to other web sources containing this word. In the case of certain words, you can even hear the pronunciation. If there is no translation currently available, the system calls on the public to contribute. Everyone can make suggestions, after which Logos translators check the suggested translations they receive."

Ten years later, in 2007, the Logos Library (formerly Wordtheque) included 710 billion words, Linguistic Resources (no change of name) included 1,215 glossaries, and the Universal Conjugator (formerly Conjugation of Verbs) included verbs in 36 languages.

1997 > SPECIALIZED TERMINOLOGY DATABASES

[Summary] Some international organizations have run terminology databases in their own field of expertise for their translation services. In 1997, some databases were freely available on the web, to be used by language professionals throughout the world and by the internet community at large, for example ILOTERM, maintained by the International Labor Organization (ILO), TERMITE (ITU Telecommunication Terminology Database), maintained by the International Telecommunication Union (ITU), and WHOTERM (WHO Terminology Information System), maintained by the World Health Organization (WHO).

***

In 1997, some specialized terminology databases maintained by international organizations in their own field of expertise were freely available on the web, to be used by language professionals throughout the world and by the internet community at large.

Here are three examples with ILOTERM, maintained by the International Labor Organization (ILO), TERMITE (ITU Telecommunication Terminology Database), maintained by the International Telecommunication Union (ITU), and WHOTERM (WHO Terminology Information System), maintained by the World Health Organization (WHO).

ILOTERM is a quadrilingual (English, French, German, Spanish) terminology database maintained by the Terminology and Reference Unit of the Official Documentation Branch (OFFDOC) at the International Labor Office (ILO) in Geneva, Switzerland. As explained on its website, the primary purpose of ILOTERM is to provide solutions, reflecting current usage, to terminology issues in the social and labor fields. Terms are available in English with their French, Spanish and German equivalents. The database also includes the ILO structure and programs, official names for international institutions, national bodies and employers' and workers' organizations, and names of international meetings and symposiums.

TERMITE, which stands for “Telecommunication Terminology Database”, is a quadrilingual (English, French, Spanish, Russian) terminology database maintained by the Terminology, References and Computer Aids to Translation Section of the Conference Department at the International Telecommunication Union (ITU) in Geneva, Switzerland. This database has been built on the content of all ITU printed glossaries since 1980, and regularly updated with recent entries.

WHOTERM, which stands for “WHO Terminology Information System”, is a trilingual (English, French, Spanish) database maintained by the World Health Organization (WHO) in Geneva, Switzerland. It has included: (a) the WHO General Dictionary Index in English, with the French and Spanish equivalents; (b) three glossaries in English: Health for All, Programme Development and Management, and Health Promotion; (c) the WHO TermWatch, an awareness service from the Technical Terminology, reflecting the current WHO usage, but not necessarily terms officially approved by WHO, with links to health-related terminology.

1998 > THE NEED FOR A “LINGUISTIC DEMOCRACY”

[Summary] Brian King, director of the WorldWide Language Institute (WWLI), brought up the concept of "linguistic democracy" in September 1998: "Whereas 'mother-tongue education' was deemed a human right for every child in the world by a UNESCO report in the early '50s, 'mother-tongue surfing' may very well be the Information Age equivalent. If the internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don't."

***

Brian King, director of the WorldWide Language Institute (WWLI), brought up the concept of "linguistic democracy" in September 1998: "Whereas 'mother-tongue education' was deemed a human right for every child in the world by a UNESCO report in the early '50s, 'mother-tongue surfing' may very well be the Information Age equivalent.

If the internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don't."

For Brian King, one factor contributing to the development of a multilingual internet is the “competition for a chunk of the 'global market' by major industry players”, with “the export of information technology around the world. Popularization has now occurred on a global scale and English is no longer necessarily the lingua franca of the user. Perhaps there is no true lingua franca, but only the individual languages of the users. One thing is certain -- it is no longer necessary to understand English to use a computer, nor is it necessary to have a degree in computer science. A pull from non-English-speaking computer users and a push from technology companies competing for global markets has made localization a fast growing area in software and hardware development.”

Another factor is the development of electronic commerce. “Although a multilingual web may be desirable on moral and ethical grounds, such high ideals are not enough to make it other than a reality on a small-scale. As well as the appropriate technology being available so that the non-English speaker can go, there is the impact of 'electronic commerce' as a major force that may make multilingualism the most natural path for cyberspace. Sellers of products and services in the virtual global marketplace into which the internet is developing must be prepared to deal with a virtual world that is just as multilingual as the physical world. If they want to be successful, they had better make sure they are speaking the languages of their customers!”

Bill Dunlap, founder of Euro-Marketing Associates and its virtual branch Global Reach, championed the assets of e-commerce in Europe among his compatriots in the U.S., promoting the internationalization and localization of their websites. He wrote in December 1998: "There are so few people in the U.S. interested in communicating in many languages -- most Americans are still under the delusion that the rest of the world speaks English. However, in Europe, the countries are small enough so that an international perspective has been necessary for centuries."

Peter Raggett, deputy-head (and then head) of the Central Library of OECD (Organization for Economic Cooperation and Development), wrote in August 1999: "I think it is incumbent on European organizations and businesses to try and offer websites in three or four languages if resources permit. In this age of globalization and electronic commerce, businesses are finding that they are doing business across many countries. Allowing French, German, Japanese speakers to easily read one's website as well as English speakers will give a business a competitive edge in the domain of electronic trading."