LORELEI Imagines Rapid Automated Language Toolkit

Low-cost approach to extracting critical concepts from public information sources in unfamiliar languages would support disaster relief and other quick-response missions

Understanding local languages is essential for effective situational awareness in military operations, and particularly in humanitarian assistance and disaster relief efforts that require immediate and close coordination with local communities. With more than 7,000 languages spoken worldwide, however, the U.S. military frequently encounters languages for which translators are rare and no automated translation capabilities exist. DARPA's Low Resource Languages for Emergent Incidents (LORELEI) program aims to change this state of affairs by providing real-time essential information in any language to support emergent missions such as humanitarian assistance/disaster relief, peacekeeping and infectious disease response. The program recently awarded Phase 1 contracts to 13 organizations.


"The global diversity of languages makes it virtually impossible to ensure that U.S. personnel will be able to understand the situation on the ground when they go into new environments," said Boyan Onyshkevych, DARPA program manager. "Through LORELEI, we envision a system that could quickly pick out key information—things such as names, events, sentiment and relationships—from public news and social media sources in any language, based on the systems understanding of other languages. The goal is to provide immediate, evolving situational awareness that helps decision makers assess and respond as intelligently as possible to dynamic, difficult situations."

The conventional system of developing automated language technology—which requires years of effort and tens of millions of dollars to manually translate, transcribe and annotate individual words and phrases for each language—is adequate for languages in widespread use or in high demand. It is neither flexible enough to meet constantly changing language needs, however, nor specialized enough to account for the specific communication challenges involved in military-level emergency response.

LORELEI seeks to dramatically advance computational linguistics and human language technology to identify the elements that different languages have in common, and use that knowledge to enable rapid, low-cost development of automated language capabilities. The program would apply these automated capabilities via an easy-to-use interface that would assimilate, integrate and analyze real-time incident data in the local language(s). The envisioned system would provide useful response-related material as quickly as 24 hours after an incident occurs and fully automated language capabilities within days or weeks after that.

While LORELEI technologies could include partially or fully automated speech recognition and/or machine translation, the program does not primarily seek to comprehensively translate low-resource languages into English. Instead, LORELEI would provide situational awareness by identifying and correlating elements of information in foreign-language and English sources. LORELEI technology would be applicable to any incident where a sudden need emerges for assimilation of information by U.S. government entities about a region of the world where low-resource languages are frequently used.

"Our goal with LORELEI isnt rote translation based on libraries, but instead to provide idiomatic understanding of language as a whole, and specifically disaster-response vocabulary, to improve cooperation and speed response to dangerous situations worldwide," Onyshkevych said.

DARPA has awarded Phase 1 contracts for LORELEI to the following organizations:

Appen
Carnegie Mellon University
Columbia University
Johns Hopkins University
Next Century Corporation
Raytheon BBN
University of Illinois Urbana-Champaign
University of Massachusetts
University of Pennsylvania
University of Pennsylvania Linguistic Data Consortium
University of Texas at El Paso
University of Washington
University of Southern California Information Sciences Institute

LORELEI plans to explore three principal technical areas:

* Algorithm Research and Development Environment: LORELEI plans to target research and development of human language technology that would reduce the current reliance on huge, manually translated, transcribed or annotated bodies of knowledge. Instead, LORELEI would leverage what related and unrelated languages have in common and take advantage of a broad range of language-specific resources. The program also seeks to develop the LORELEI Technology Development Environment (LTDE), which would synthesize language data and integrate it with Web services that would provide named-entity recognition, topic spotting and other language technology capabilities.

* Run-time Framework Development: The program aims to develop a prototype tool, the LORELEI Run-Time Framework (LRTF), which would pull together various open-source data feeds in English and incident languages and send this data compilation through the LTDE's Web services. The processed results would return to the Framework, where numerous analytics tools would aggregate, summarize and organize them.

The LRTF would not produce reports or situational awareness documents automatically, but would present users with easy-to-understand summaries, visualizations and other useful products that would greatly help in the creation of such documents. The Framework would be able to generate initial results 24 hours after an incident and provide progressively more detailed results at one-week and one-month intervals.

* Linguistic Resource Creation: LORELEI plans to collect, create and annotate linguistic resources in multiple languages to support the work in the first two technical areas listed above. These resources would include standard language resources (dictionaries, etc.), subject-specific resources (disaster relief terminology, etc.) and other data-enabling research, development and evaluation.
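
As a rough illustration of the data flow described for the run-time framework (feeds in, language services applied, results aggregated for an analyst), the following Python sketch may help. It is not based on any published LORELEI code or interface. Every name in it (Document, Annotated, recognize_entities, spot_topics, process, summarize) is hypothetical, the "services" are toy string matchers standing in for the LTDE's Web services, and the sample feed items, including the town name Riverton and the language code "xx", are made-up placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """One item from an open-source feed (hypothetical structure)."""
    language: str
    text: str


@dataclass
class Annotated:
    """A document plus what the language services returned for it."""
    doc: Document
    entities: list = field(default_factory=list)
    topics: list = field(default_factory=list)


# Toy stand-ins for language services (assumption: real services would be
# language-aware models behind a network API, not fixed string lookups).
KNOWN_ENTITIES = {"Red Cross", "Riverton"}
TOPIC_KEYWORDS = {"flood": "flooding", "shelter": "shelter", "water": "water supply"}


def recognize_entities(text):
    """Return the known entity names that appear verbatim in the text."""
    return sorted(name for name in KNOWN_ENTITIES if name in text)


def spot_topics(text):
    """Return the topics whose keyword appears in the lowercased text."""
    lowered = text.lower()
    return sorted({topic for kw, topic in TOPIC_KEYWORDS.items() if kw in lowered})


def process(feeds):
    """Send every document from every feed through the service stand-ins."""
    return [
        Annotated(doc, recognize_entities(doc.text), spot_topics(doc.text))
        for feed in feeds
        for doc in feed
    ]


def summarize(annotated):
    """Aggregate per-document results into counts an analyst could scan."""
    return {
        "entities": Counter(e for a in annotated for e in a.entities),
        "topics": Counter(t for a in annotated for t in a.topics),
        "languages": Counter(a.doc.language for a in annotated),
    }


# Placeholder feeds: one English source and one incident-language source.
feeds = [
    [Document("en", "Red Cross opens shelter in Riverton")],
    [Document("xx", "Riverton flood: water shortage reported")],
]
summary = summarize(process(feeds))
```

In this sketch the summary is a set of counts by entity, topic and language. That mirrors the article's point that the framework would present material to help people write reports, not write the reports itself.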
