November 30, 2021
Old Phone Books Teach New Lessons in Digital Scholarship
LC Labs Releases Report on Humans-in-the-Loop Machine Learning Research Framework
Press Contact: Kelley McNabb, email@example.com
Public Contact: Meghan Ferriter, firstname.lastname@example.org
Website: Humans in the Loop Report
Library of Congress innovation specialists examining the role of human expertise and experience in developing machine-powered research tools today released a report on their findings. The “Humans in the Loop” recommendation report from LC Labs details the potential and responsibility of the Library of Congress in its ongoing work to deepen access to its vast collections and share knowledge with other institutions.
The Library’s digital experiments have resulted in popular public initiatives such as By the People, the crowdsourcing platform powered by volunteer transcription; Citizen DJ, a music discovery and mixing app; and Newspaper Navigator, a machine learning algorithm that uncovered more than a million images in the Chronicling America newspaper collection. To discover the combined power of machine learning and crowdsourcing, the “Humans in the Loop” experiment investigated each step of creating a machine learning algorithm, building an engaging crowdsourcing program, and launching a prototype web experience for potential users. Together, these approaches could transform access to and discovery of the Library’s vast resources by combining human expertise with machine learning outputs.
“As the cultural heritage community has used more digital approaches to help our users access and discover large collections, people have wondered about the role of real humans in the future study of humanities,” said Kate Zwaard, director of Digital Strategy at the Library of Congress. “We wanted to answer that question in a way that promises to engage people, remain mindful of ethical and privacy impacts, and make our collections useful. We want to offer this report as a resource for other scholars and institutions who share these goals.”
The Library’s popular U.S. Telephone Directory collection, with its consistent layouts and fonts and unique snapshots of American communities over time, provided the ideal test sample for “Humans in the Loop.” LC Labs staff, Library subject matter experts, and partners from AVP, a data solutions provider, designed an experiment based on machine learning and crowdsourcing processes that could be created with the telephone directories’ contents. Using bounding boxes drawn around business listings and addresses in the phone books, along with transcriptions of these segments, the experiment team created training data to teach an algorithm to keep drawing such boxes. Wireframe mockups of sample web presentations were created for testing with potential users and for showcasing how volunteers might engage with and learn more about the collection.
Although the telephone directories are organized alphabetically, with businesses categorized by industry, the research team quickly found that machine learning catalyzed a workflow for identifying the specific business name and address data that can enable flexible searching that incorporates geographic data and other Library collections. The experiment revealed the value of human expertise, from volunteers and staff alike, at every step. Validating contributions and feedback on workflow design set the stage not only to improve the discovery of related information and context, but also to return outsized dividends: people following manual workflows processed 119 directory listings, and that careful analysis seeded the machine learning workflow, which generated 15,000 listings in just four days.
While the complete findings are available in the “Humans in the Loop” report, two major themes emerged: flexible, informed approaches and major investment in staffing and resources will enable sustained success. No two collections are exactly the same, so the processes outlined in “Humans in the Loop” are not a one-size-fits-all solution, and there is no substitute for human enthusiasm for problem solving.
To learn more about the challenges, ethical considerations, and the potential to expand access to Library of Congress collections at a major scale, read the report. To learn more about Library of Congress digital experiments, visit LC Labs.
About LC Labs
Through experimentation, research, collaboration, and reflection, LC Labs works to realize the Library’s vision that “all Americans are connected to the Library of Congress” by enabling the Library’s Digital Strategy. LC Labs is home to the Library of Congress Innovator in Residence Program; has nurtured experiments in machine learning and the use of collections as data; and has incubated the Library’s popular crowdsourced transcription program, By the People. Learn more and subscribe to our monthly newsletter at labs.loc.gov.
About the Library of Congress
The Library of Congress is the world’s largest library, offering access to the creative record of the United States — and extensive materials from around the world — both on-site and online. It is the main research arm of the U.S. Congress and the home of the U.S. Copyright Office. Explore collections, reference services and other programs and plan a visit at loc.gov; access the official site for U.S. federal legislative information at congress.gov; and register creative works of authorship at copyright.gov.