Credit: Photo by Peter Means for Virginia Tech.
“Sticks and stones may break my bones,” the old adage goes. “But words will never hurt me.”
Tell that to Eugenia Rho, assistant professor in the Department of Computer Science, and she will show you extensive data that prove otherwise.
Her Society + AI & Language Lab has shown that:
- Police language is an accurate predictor of violent interactions with Black motorists.
- Broadcast media bias and social media echo chambers have put American democracy at risk.
Now, Rho’s research team in the College of Engineering has turned to another question: What effects did social media rhetoric have on COVID-19 infection and death rates across the United States, and what can policymakers and public health officials learn from that?
“A lot of studies just describe what’s happening online. Often they do not show a direct link with offline behaviors,” Rho said. “But there is a tangible way to connect online behavior with offline decision making.”
Cause and effect
During the COVID-19 pandemic, social media became a mass gathering place for opposition to public health guidance, such as mask wearing, social distancing, and vaccines. Escalating misinformation encouraged widespread disregard for preventive measures and led to soaring infection rates, overwhelmed hospitals, health care worker shortages, preventable deaths, and economic losses.
During a one-month period between November and December 2021, more than 692,000 preventable hospitalizations were reported among unvaccinated patients, according to a 2022 study published in the Yale Journal of Biology and Medicine. Those hospitalizations alone cost a staggering $13.8 billion.
In the study, Rho’s team, including Ph.D. student Xiaohan Ding, developed a technique that prompted the large language model GPT-4 to analyze posts in several banned subreddit discussion groups that opposed COVID-19 prevention measures. The team focused on Reddit because its data was available, Rho said. Many other social media platforms have barred outside researchers from using their data.
Rho’s work is grounded in a social science framework called Fuzzy Trace Theory that was pioneered by Valerie Reyna, a Cornell University professor of psychology and a collaborator on this Virginia Tech project. Reyna has shown that individuals learn and recall information better when it is expressed as a cause-and-effect relationship, and not just as rote information. This holds true even if the information is inaccurate or the implied connection is weak. Reyna calls this cause-and-effect construction a “gist.”
The researchers worked to answer four fundamental questions related to gists on social media:
- How can we efficiently predict gists across social media discourse at a national scale?
- What kind of gists characterize how and why people oppose COVID-19 public health practices, and how do these gists evolve over time across key events?
- Do gist patterns significantly predict patterns in online engagement across users in banned subreddits that oppose COVID-19 health practices?
- Do gist patterns significantly predict trends in national health outcomes?
The missing link
Rho’s team used prompting techniques in large language models (LLMs) — a type of artificial intelligence (AI) program — along with advanced statistics to search for and then track these gists across banned subreddit groups. The model then compared them to COVID-19 milestones, such as infection rates, hospitalizations, deaths, and related public policy announcements.
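To make the idea of prompt-based gist detection concrete, here is a minimal sketch in Python. It is not the team’s code, which had not been released at the time of writing. The prompt wording, the answer format, and the names FEW_SHOT, build_gist_prompt, and parse_gist_answer are all assumptions made for illustration, and the sketch stops short of calling any model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical few-shot examples. The first is adapted from the example
# quoted in this article; the second is invented for illustration.
FEW_SHOT = [
    (
        "I got the COVID vaccine and I've felt like death ever since.",
        "GIST: yes | CAUSE: got the COVID vaccine | EFFECT: felt like death ever since",
    ),
    ("Anyone know when the clinic opens tomorrow?", "GIST: no"),
]


def build_gist_prompt(post: str) -> str:
    """Build a few-shot prompt asking a model whether a post contains a gist."""
    lines = [
        "Decide whether the post states a cause-and-effect relationship (a gist).",
        "Answer 'GIST: yes | CAUSE: ... | EFFECT: ...' or 'GIST: no'.",
        "",
    ]
    for example_post, example_answer in FEW_SHOT:
        lines += [f"Post: {example_post}", f"Answer: {example_answer}", ""]
    lines += [f"Post: {post}", "Answer:"]
    return "\n".join(lines)


@dataclass
class Gist:
    cause: str
    effect: str


def parse_gist_answer(answer: str) -> Optional[Gist]:
    """Parse the assumed answer format; return None when no gist is reported."""
    fields = {}
    for part in answer.split("|"):
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip().upper()] = value.strip()
    if fields.get("GIST", "").lower() != "yes":
        return None
    if "CAUSE" not in fields or "EFFECT" not in fields:
        return None
    return Gist(fields["CAUSE"], fields["EFFECT"])
```

In a pipeline of this shape, each post would be turned into a prompt, sent to a language model, and the parsed answers counted per day. Those daily counts are the kind of series that could then be compared with health data.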
The results show that, indeed, social media posts that linked a cause, such as “I got the COVID vaccine,” with an effect, such as “I’ve felt like death ever since,” quickly showed up in people’s beliefs and affected their offline health decisions. In fact, the total and new daily COVID-19 cases in the U.S. could be significantly predicted by the volume of gists on banned subreddit groups.
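One simple way to picture what “predicted by the volume of gists” could mean is a lagged correlation: does the number of gists on one day track the number of new cases some days later? The Python sketch below illustrates that idea only. It is a simplified stand-in, not the statistical method used in the study, and the function names and the numbers in the usage example are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(
    gists: Sequence[float], cases: Sequence[float], lag: int
) -> float:
    """Correlate gist counts on day t with case counts on day t + lag."""
    if len(gists) != len(cases):
        raise ValueError("series must have equal length")
    if lag < 0 or lag >= len(gists) - 1:
        raise ValueError("lag must leave at least two overlapping days")
    if lag == 0:
        return pearson(gists, cases)
    return pearson(gists[:-lag], cases[lag:])


# Toy, made-up numbers: cases follow gist counts with a two-day delay.
toy_gists = [3, 5, 4, 8, 6, 9, 7, 10]
toy_cases = [120, 125, 130, 150, 140, 180, 160, 190]
toy_r = lagged_correlation(toy_gists, toy_cases, lag=2)
```

A high value at some lag would only suggest that the two series move together. Establishing that one significantly predicts the other, as the researchers report, requires formal statistical testing beyond this sketch.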
This is the first AI research to empirically link social media linguistic patterns to real-world public health trends, highlighting the potential of these large language models to identify critical online discussion patterns and point to more effective public health communication strategies.
“This study solves a daunting problem: how to connect the cognitive building blocks of meaning that people actually use to the flow of information across social media and into the world of health outcomes,” Reyna said. “This prompt-based LLM framework that identifies gists at scale has many potential applications that can promote better health and wellbeing.”
Big data, big impact
Rho said she hopes this study will encourage other researchers to bring these methods to bear on important questions. To that end, the code used in this project will be made freely available when the paper is published in the Proceedings of the Association for Computing Machinery Conference on Human Factors in Computing Systems. The paper also compares the costs of several approaches to analyzing big datasets, so that researchers can extract meaningful conclusions at lower cost. The team will present its findings May 11-16 in Honolulu, Hawaii.
Outside of academia, Rho said she hopes this work will encourage social media platforms and other stakeholders to find alternatives to deleting or banning groups that discuss controversial topics.
“Simply banning people in online communities altogether, especially in spaces where they are already exchanging and learning health information, can risk driving them deeper into conspiracy theories and force them onto platforms that don’t moderate content at all,” Rho said. “I hope this study can inform how social media companies work hand-in-hand with public health officials and organizations to better engage and understand what’s going on in the public’s mind during public health crises.”
Article Publication Date
11-May-2024