CHICAGO—May 18, 2023—Researchers at Illinois Institute of Technology have secured a $1.6 million contract to develop a groundbreaking system for authentic authorship attribution and anonymization. Using natural language processing and machine learning, the program, known as AUTHOR, promises to create “stylistic fingerprints” for reliable identification, while also providing robust solutions for anonymization. With broad applications including counterintelligence, combating misinformation, and even investigating the origins of ancient religious texts, the project marks a significant leap in computational analysis.
Credit: Illinois Institute of Technology
The project—a collaboration with Charles River Analytics, Rensselaer Polytechnic Institute, Aston University, and the Howard Brain Sciences Foundation—has received the funding from an $11.3 million pool allocated by the Human Interpretable Attribution of Text Using Underlying Structure (HIATUS) program of the Intelligence Advanced Research Projects Activity (IARPA), a research organization within the Office of the Director of National Intelligence.
AUTHOR (Attribution, and Undermining the Attribution, of Text While Providing Human-Oriented Rationales) aims to accurately capture the unique writing styles of authors through a sophisticated blend of natural language processing and machine learning. The project is being led by Illinois Tech’s Shlomo Argamon, professor of computer science and chair of the Department of Computer Science, and Kai Shu, Gladwin Development Chair Assistant Professor of Computer Science.
“There are a number of different types of authorship attribution tasks,” says Argamon, who has more than 25 years of research experience in the field. “One is where there is a particular author who we want to identify in different texts. Another is where we have a specific text which we want to attribute to one of a number of candidate authors. A third is simply to determine when two texts have been written by the same person or not.”
Argamon and Shu also aim to address the increasingly urgent problems posed by malicious online activity and machine-generated misinformation.
“With large language models, such as GPT-3, it is possible that human-like texts can be generated from these ’bots,” says Shu. “Our work will explore deep generative models and style transfer techniques to explore the boundary of human-written and machine-generated texts.”
A central challenge for the team is overcoming the limitations of current methods of authorship analysis and obfuscation. Part of the difficulty lies in identifying authorship when the questioned document differs in type from the known documents, because style varies inherently between forms of writing such as a personal letter, an academic article, or a short story.
“The best current methods do very poorly when test documents are of a different type than the training documents,” says Argamon. “We will develop author models that incorporate such stylistic domain dependence to enable more generally effective attribution.”
The project will also tackle the challenge of authorship obfuscation: altering the style of a text while preserving its meaning. To do so, the team will integrate deep learning with semantic knowledge representation. This dual functionality, attribution and obfuscation, sets AUTHOR apart from existing algorithms.
Unlike existing systems, AUTHOR will provide a clear rationale for its author identifications, adding a layer of transparency and reliability.