The United States Department of Homeland Security plans to develop software that analyses and summarises opinions expressed in articles, providing a possible tool for better monitoring of what is written about the US in the global press.
The department says it will spend $2,4-million over the next three years supporting research at three US universities using computer science to analyse human language in texts.
“The work is really designed to get information extraction that would help the DHS review statements for sentiments or beliefs contained in statements, and to provide intelligence analysts within DHS,” said Homeland Security spokesperson Christophe Kelly.
Kelly said the software would offer department staff “another resource to conduct their work”, even though the project has raised eyebrows among press freedom advocates.
Janyce Wiebe of the University of Pittsburgh in Pennsylvania, who will direct the research project, said that the funding will go towards basic research and not any monitoring of the global press.
The research will seek to “develop accurate and robust techniques for extracting and summarising information about events and opinions described in a text,” Wiebe said.
Researchers from Cornell University and the University of Utah will also participate in the work, in a field computer scientists call “natural language processing”.
“Their focus is to develop simpler, more efficient software, algorithms and mathematical architectures for use in a broad range of computing applications,” Kelly at the DHS said.
The research team has gathered more than 270 000 articles, dating from June 2001 to May 2002, from 180 news sources around the world, including Agence France-Presse. The articles cover a range of subjects, including elections in Zimbabwe, relations between China and Taiwan, the treatment of detainees at Guantanamo and the Kyoto protocol.
Each article has been manually annotated “with the meanings we want the software to learn to understand,” Wiebe said.
The software envisaged by the DHS-funded research would be capable of tracking the ambiguities of human language, distinguishing the meaning of a sentence depending on context and summarising descriptions and opinions that appear in several different texts.
The researchers and DHS officials declined to discuss the possible uses of the software.
“It’s just too early to speculate about what it would evolve into,” Kelly said.
Several press freedom organisations have expressed concern that the US government wants to create a database of certain media, particularly the outlets most critical of Washington.
“We’re taking a very hard look to make sure that the outcomes of this are really in line with the missions of the DHS” to protect the United States from attack, said Kelly.
Asked if the software under development could one day allow authorities to determine which media or journalists appeared hostile to the United States, Kelly said it was too early to say. – Sapa-AFP