The new Rosetta stone

July 21, 2011 | Filed under: Misc

The Long Now Foundation, the driving force behind the 10,000 Year Clock project, has a few other projects worth a look. The Rosetta Project is an initiative to document every human language in current use. The intent is to build, over the very long run, a repository that gives those who follow us clean and simple access to as much information as possible.

This will be an obvious treasure trove for anthropologists and linguists, and the implication is that in some post-apocalyptic world, future generations will need as much help as we can give them…

If you are interested in Isaac Asimov’s Foundation series of science fiction books, this will make a lot of sense…

The Rosetta Project Is Preserving Every Language Ever Spoken, On One Nano-Etched Piece of Metal
http://www.fastcompany.com/1763851/the-long-nows-laura-welcher-on-time-language-and-a-rosetta-stone-for-the-future

The Rosetta Project is a project of the Long Now Foundation, and its aim is to make sure we preserve the knowledge contained in dying languages: "If languages are our how-to guides for living on planet Earth, we are handing our descendants an encyclopedia with almost all of the pages ripped out."

You’re trying to make a record of every language in the world. How do you go about that?

There are about 7,000 languages spoken in the world today, and it is likely that we will lose at least half of them–and some say up to 90%–in the next 100 years. With all our resources combined (money, experts in the field, community initiatives) we have a hope of maybe documenting 500 languages in the foreseeable future, but we need to scale this to about 5,000. The only way I can see to do this is by engaging speakers of languages themselves to produce their own language documentation. So, the question then becomes: What is the minimal amount of useful language documentation the average person might produce? I would argue it would be a verbal text–ideally a short video–and then I’d need to know what language the user thinks the recording is in (detailed identification can be done later).

The realization I’ve come to in the past year or so is most of us are carrying around language documentation devices in our own bag or back pocket–video enabled cell phones, cameras, laptops. If you project out 10 years, these devices become globally ubiquitous, and then anyone can create and contribute language documentation to a central repository. Then, as we assemble a collection of videos for any given language, we can start enriching them with transcriptions, translations, annotations–that is, building a corpus.

How will future researchers use that data, and what insights will they be able to glean?

A corpus can be used in many different ways–a small corpus can provide language learning and teaching materials, as well as materials for the building of linguistic resources such as grammars and dictionaries (this is the kind of language documentation linguists are producing today). Then, with a larger corpus–say tens of hours of transcribed speech, we can start building acoustic models for speech recognition. With a few million words we can start to do machine translation. And these are the tools that enable a language to be used online–which I would argue is a crucial new domain for language use in the modern world.

How will the corpus collected by the Rosetta Project differ from other archives of natural language?

Most language archives focus on languages of a particular region, or data collected under the umbrella of a particular project. The Rosetta Project is quite different in that we aim to assemble information on and in all human languages–all 7,000 of them. Not only is this a big effort, it is also a big challenge for how you organize all that information and make it usable to many different groups of people, from language specialists, to endangered language speech communities, to the interested general public, to an elementary school teacher or student.
