Online data can tell us a lot about the places where we live — if we can develop ways to analyze it
BUFFALO, N.Y. — Every day, people share a dizzying amount of information about local communities online. They talk about whether their neighbors are friendly, how well the buses run, what kinds of restaurants are in an area, and much, much more.
A new study by University at Buffalo researcher Yingjie Hu shows how we can sort through this vast trove of digital data to improve cities and people’s quality of life.
The research, published on March 8 in the Annals of the American Association of Geographers, analyzed neighborhood reviews from more than 7,600 unique users who posted about New York City on the website Niche.com.
The goal was to efficiently sort through thousands of opinions to learn about people’s perceptions of their communities. The study combined spatial analysis, machine learning and natural language processing techniques to identify features of neighborhoods that people talked about online, and what reviewers’ general feelings were about those attributes.
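As a rough illustration of what identifying discussed features and reviewers' feelings can mean in practice, the sketch below tags each review with the neighborhood aspects it mentions and a crude sentiment score, then averages the scores by neighborhood and aspect. It is a toy example, not the study's method: the study used machine learning and natural language processing models, while this sketch relies on small hand-made keyword lists. The names ASPECT_KEYWORDS, POSITIVE, NEGATIVE, score_review and summarize are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists, invented for illustration only.
# The study used trained models rather than hand-made lexicons.
ASPECT_KEYWORDS = {
    "safety": {"safe", "crime", "dangerous"},
    "transit": {"bus", "buses", "subway", "train"},
    "jobs": {"job", "jobs", "employment"},
}
POSITIVE = {"safe", "friendly", "reliable", "great", "good"}
NEGATIVE = {"dangerous", "late", "unreliable", "bad", "poor"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_review(text):
    """Return {aspect: sentiment} for every aspect the review mentions.

    Sentiment is the count of positive words minus the count of
    negative words in the whole review (a deliberately crude measure).
    """
    tokens = tokenize(text)
    sentiment = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    mentioned = set(tokens)
    return {
        aspect: sentiment
        for aspect, keywords in ASPECT_KEYWORDS.items()
        if mentioned & keywords
    }


def summarize(reviews):
    """Average sentiment per (neighborhood, aspect) pair.

    `reviews` is an iterable of (neighborhood, review_text) tuples.
    """
    scores = defaultdict(list)
    for neighborhood, text in reviews:
        for aspect, sentiment in score_review(text).items():
            scores[(neighborhood, aspect)].append(sentiment)
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}
```

Calling summarize on a list of (neighborhood, review text) pairs returns one average score per neighborhood and aspect, which is the kind of table a planner could scan for aspects that draw consistently negative comments.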
This kind of research could help urban planners spot problems they might otherwise miss, Hu says.
“It takes a lot of time to learn about neighborhoods by surveys or face-to-face interviews. So although the data from surveys or interviews are highly valuable, if you are a planner, you may not be able to collect this kind of data frequently or for large geographic areas,” says Hu, PhD, an assistant professor of geography in the University at Buffalo College of Arts and Sciences and an expert in the field of geographic information science (GIScience).
“In contrast, the online data we are researching is updated constantly,” Hu adds. “People are always posting new neighborhood reviews. We want to use this resource to discover new information that can support a variety of applications.”
Co-authors on the study included Chengbin Deng, PhD, and Zhou Zhou, both at Binghamton University.
Online reviews give planners access to thousands of opinions
The researchers’ analysis found that in some cases, people’s perceptions of a community did not align with the picture presented by other data. For example, while reviewers’ feelings about safety generally matched up with crime statistics, other categories did not correlate so neatly, Hu says.
Discrepancies occur for many reasons, and some can be revealing. When it came to employment, neighborhood reviews discussed the quality of jobs in an area, a problem that isn’t captured by broad statistics like the unemployment rate. Likewise, when it came to public transportation, people were concerned about reliability: Were buses on time or late? Planners might miss this issue if they are focused on metrics like the number of bus lines or transit hubs in a neighborhood.
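One simple way to check whether perceptions line up with official figures is to correlate the two across neighborhoods. The sketch below computes a Pearson correlation coefficient; it is an assumed illustration of that kind of comparison, not necessarily the statistic the researchers used, and the function name pearson is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Here xs could hold one average sentiment score per neighborhood
    and ys the matching official statistic (both are assumptions).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)
```

A coefficient near -1 between safety sentiment and crime rates would mean reviewers feel less safe where more crime is recorded, which is the kind of agreement the article describes. A coefficient near 0, as might occur for jobs or transit, would flag a category where reviews capture something the statistic does not.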
Hu cautions that online reviews should not be used in isolation: People who post are self-selected, so they may not represent the views of all the people in a neighborhood.
Still, Hu says data harvesting can complement surveys, face-to-face interviews and other existing methods for learning about neighborhoods. Analyzing online data can give planners access to the insights of many individuals quickly.
“One advantage of online neighborhood review data is that it becomes relatively easy to get access to the opinions of many people — in the case of our study, more than 7,600,” Hu says. “If you think about what it takes to interview even 100 people, it takes a lot of time and resources. The neighborhood review data is far from perfect, but it can help to complement other types of data we already have.”