TACC's Ranger supercomputer assists Guatemalan human rights effort
Powerful system converts 10 million image files in days to meet election deadline
In December 2011, The University of Texas at Austin launched a digital archive of 10 million pages of records from the Guatemalan Historical National Police Archive (known as Archivo Histórico de la Policía Nacional or AHPN). The documents are now publicly accessible to researchers, human rights activists, prosecutors and ordinary citizens online through a UT library website.
The Guatemalan government denied the existence of records relating to state repression until a chance investigation in Guatemala City in 2005 led to the discovery of the AHPN and its nearly 80 million pages of police records dating from 1882 to 1996. For those searching for disappeared friends and family, studying state repression, or exploring the legacy of U.S. involvement in Guatemala, the documents are invaluable.
The AHPN's staff have labored since 2005 to preserve, digitize, and catalogue the Archive's contents. As of May 2011, they had processed 12.5 million documents, predominantly those from the most severe years of the civil conflict, 1975-1985. Beyond this archiving work, the AHPN is quickly becoming a central actor and catalyst in prosecutions of wartime human rights violations and in preserving Guatemala's historical memory.
Until now, researchers had to travel to the physical archive in Guatemala to investigate these materials. Today, they are accessible to anyone via the Internet. However, making the archive available was no small feat. It required the assistance of digital archivists at The University of Texas at Austin and the processing power of the Ranger supercomputer at the Texas Advanced Computing Center (TACC) to speed the process along.
"The Guatemala project has been unique for us because of its scale," said Ladd Hanson, head of library systems at UT Austin. "It's over 10 million TIFF images that were brought to us from Guatemala early this year. They're things like police records, hand-written orders, all kinds of governmental-type documents, scanned in by the Guatemalans over the last five or six years."
TIFFs are what archivists call "master files": the archival images that are preserved as is and from which derivatives are made. They are typically too large to post online, so while the TIFFs are archived, JPEGs are used for reference purposes.
Converting from one format to the other is routine and indispensable for access purposes, but it is very time-consuming. The UT libraries had a tight deadline: the Guatemalan archivists had asked that the reference materials be online in time for the upcoming Guatemalan election. Hanson did a rough calculation and determined that it would take months to create reference JPEG files from the TIFFs.
"It was obvious it would take longer than we had," Hanson said, "so I contacted TACC. We used Ranger to convert the files and we did it in a day or two."
By using hundreds of computer processors simultaneously, they cut months from the processing time. This matters because the 10 million images archived so far are only a fraction of the 80 million total files that make up the archive and will have to be processed in the near future.
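For readers curious about what this kind of batch conversion looks like in practice, the following is a minimal sketch, not the workflow actually run on Ranger, which the article does not describe. It assumes the Pillow imaging library is installed; the function names, the JPEG quality setting and the worker count are all hypothetical. It shows the single-machine version of the idea: because each image can be converted independently, the work can be split across as many processors as are available.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def jpeg_path_for(tiff_path: Path, src_root: Path, dst_root: Path) -> Path:
    """Mirror the master file's relative path under dst_root, with a .jpg suffix."""
    return (dst_root / tiff_path.relative_to(src_root)).with_suffix(".jpg")


def convert_one(job: tuple[Path, Path]) -> Path:
    """Convert a single TIFF master into a JPEG reference copy."""
    # Pillow is an assumed dependency; it is imported inside the worker function.
    from PIL import Image

    src, dst = job
    dst.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(src) as img:
        # Normalise to RGB because JPEG cannot store every TIFF pixel mode.
        img.convert("RGB").save(dst, "JPEG", quality=85)  # quality is a guess
    return dst


def convert_all(src_root: Path, dst_root: Path, workers: int = 8) -> int:
    """Convert every .tif/.tiff file under src_root in parallel; return the count."""
    jobs = [
        (p, jpeg_path_for(p, src_root, dst_root))
        for p in sorted(src_root.rglob("*"))
        if p.suffix.lower() in {".tif", ".tiff"}
    ]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # The images are independent, so jobs can be handed out in batches.
        return sum(1 for _ in pool.map(convert_one, jobs, chunksize=64))
```

Calling `convert_all(Path("masters"), Path("reference"), workers=16)` from a script's main guard would write a JPEG for each TIFF found under a hypothetical `masters` directory. On a system like Ranger the same per-image function would be distributed across many nodes by the center's job scheduler rather than by a single process pool.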
Using parallel processing in routine digital library activities is a novel approach. Making derivatives from the first large set of files has helped Hanson and his team develop a workflow for high-throughput image processing. In the future, he plans to use TACC's supercomputers to do more sophisticated activities, like extracting meaningful information from the documents and analyzing them in multiple ways.
"If we were doing this ourselves, it would take us so long to try one thing that we wouldn't try many things," Hanson said. "Having TACC at our disposal means we can try many more approaches."
Guatemala's internal armed conflict from 1960 to 1996 killed or forcibly disappeared 200,000 people and displaced one million others. For families and friends, the archive offers the hope of finding answers to the circumstances surrounding those deaths and disappearances.
The archive also contains documentation relating to decades of United States involvement in Guatemala, including human experimentation on Guatemalan citizens in connection with syphilis research in the 1940s.
As AHPN Coordinator Gustavo Meoño noted at a recent conference at UT Austin: "This alliance secures the perpetual public availability of the archive, which is so important for Guatemala. The University of Texas at Austin's prestige and commitment to academic inquiry gives us an opportunity to guarantee the right to information in the most democratic and permanent manner possible."
December 19, 2011
The Texas Advanced Computing Center (TACC) at The University of Texas at Austin is one of the leading centers of computational excellence in the United States. The center's mission is to enable discoveries that advance science and society through the application of advanced computing technologies. To fulfill this mission, TACC identifies, evaluates, deploys, and supports powerful computing, visualization, and storage systems and software. TACC's staff experts help researchers and educators use these technologies effectively, and conduct research and development to make these technologies more powerful, more reliable, and easier to use. TACC staff also help encourage, educate, and train the next generation of researchers, empowering them to make discoveries that change the world.
- A chance investigation in Guatemala City in 2005 led to the discovery of the Guatemalan Historical National Police Archive (AHPN) and nearly 80 million pages of police records dating from 1882 to 1996.
- AHPN staff have labored since 2005 to preserve, digitize, and catalogue the Archive's contents. To make the Archive available online, AHPN partnered with The University of Texas at Austin and the Texas Advanced Computing Center.
- The Ranger supercomputer was able to create reference files for the first 10 million documents in just a few days. By speeding up the process, UT and the AHPN researchers were able to get the archive online in time for the Guatemalan elections.
Science and Technology Writer