In an era marked by the rapid evolution of artificial intelligence (AI), the visual arts face unprecedented challenges. One of the critical issues arising in this technological landscape is the non-consensual use of artists’ work by generative AI tools. These tools, including widely discussed models like ChatGPT, rely on vast datasets harvested from the internet, which often include copyrighted creations and original artwork. This situation has left many artists feeling vulnerable and unprotected, as their works are utilized without their permission or consideration, significantly disrupting their livelihoods.
To address this growing concern, researchers at the University of California San Diego and the University of Chicago investigated how much ability content creators actually have to protect their work from AI crawlers, the programs that collect data from online sources to train machine learning models. The study reveals that while artists want to control how their content is used, they often lack both the technical knowledge and the tools needed to keep AI crawlers away from their work.
The researchers presented their findings at the 2025 Internet Measurement Conference, outlining the vital need for artists to assert greater control over their creative output. They surveyed over 200 visual artists to gauge their awareness of tools designed to block AI crawlers, exploring the artists’ varying levels of technical expertise and their perceptions of the effectiveness of existing methods. Overall, it became clear that a significant gap exists between the desire for protection and the ability to execute it.
Finding ways to restrict access to their work is a high priority for about 80% of the artists surveyed, but the question remains: how can they keep their creations out of the datasets that fuel generative AI models? The survey results indicated that roughly two-thirds of artists reported using tools like “Glaze,” which subtly alters an image so that AI models trained on it cannot accurately reproduce the artist’s style. While this approach represents a step forward, it is only a partial defense, since crawlers can still collect the work, albeit in an altered form.
Furthermore, a staggering 96% of participants expressed a desire for a straightforward tool to deter AI crawlers from accessing their content. A solution frequently discussed within technical circles is the use of the “robots.txt” file, a standard text file placed in a website’s root directory, which specifies which pages should be accessible to crawlers. It’s a tool that can potentially play a crucial role in governing how automated systems interact with web content.
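For illustration only (this example is not drawn from the study), a robots.txt file that asks AI training crawlers to stay away while leaving the rest of the site open might look like the following; GPTBot, CCBot, and Bytespider are user-agent names that crawlers associated with AI training have publicly used, listed here purely as examples:

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Bytespider
    Disallow: /

    User-agent: *
    Disallow:

The empty Disallow line under the wildcard user agent leaves the site open to ordinary search-engine crawlers. The file itself, however, is only a request, not a technical barrier.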
Despite its simplicity, robots.txt remains underutilized, particularly among artists. The researchers found that more than 60% of artists were not familiar with this basic tool, highlighting a significant gap in understanding of how to safeguard work online. While some major websites have begun to explicitly disallow AI crawlers in their robots.txt files, the practice is far from universal, and some sites that have signed licensing agreements with AI companies have removed those prohibitions, increasing the likelihood that their content ends up in AI training datasets.
The alarming reality is that many artists lack control over their robots.txt files: over 75% of artist websites are hosted on third-party platforms that do not allow these access rules to be modified. The problem is compounded by a lack of transparency from content management systems (CMS) about which crawlers are blocked or allowed. Squarespace stands out as a rare exception, offering an easy-to-use setting for blocking AI crawlers, yet only about 17% of its users take advantage of the feature.
While some AI crawlers respect the directives in robots.txt files, compliance is inconsistent. Major companies typically honor these guidelines, but notable exceptions exist, such as “Bytespider,” the crawler operated by TikTok’s parent company ByteDance. This inconsistency creates a landscape of uncertainty in which artists cannot rely on robots.txt alone for protection. Current measures are insufficient, as they provide neither the specificity nor the enforcement that content creators want.
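To make the voluntary nature of this mechanism concrete, here is a minimal sketch, not code from the study, using Python’s standard urllib.robotparser module, which is what a cooperative crawler might use to check a site’s rules before fetching pages; the robots.txt content and page URL are hypothetical:

    from urllib import robotparser

    # Hypothetical robots.txt content, similar to the example shown earlier.
    rules = """
    User-agent: GPTBot
    Disallow: /

    User-agent: Bytespider
    Disallow: /

    User-agent: *
    Disallow:
    """.splitlines()

    # A cooperative crawler parses the rules (a real one would fetch them
    # from the site with set_url() and read()).
    parser = robotparser.RobotFileParser()
    parser.parse(rules)

    # It then asks whether a given user agent may fetch a given page.
    page = "https://example-artist-portfolio.com/gallery/"
    for agent in ("GPTBot", "Bytespider", "*"):
        verdict = "allowed" if parser.can_fetch(agent, page) else "disallowed"
        print(f"{agent}: {verdict}")

The answer the parser returns is purely advisory: a crawler that never performs this check encounters no technical obstacle, which is why protection ultimately depends on the crawler operator’s policy rather than on the file itself.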
Beyond technical barriers, the evolving legal landscape surrounding the use of artistic content for AI training adds another layer of complexity. Artists are caught in an ambiguous web of legal protections as courts continue to grapple with questions of copyright and fair use in AI-generated content. In the United States, ongoing litigation is testing what obligations AI companies have to artists whose work has been used without consent. In contrast, the recently passed AI Act in the European Union signals a shift toward requiring AI developers to obtain authorization from copyright holders who have reserved their rights before scraping their work.
In conclusion, while the study’s findings illuminate the pressing need for visual artists to protect their work from AI crawlers, the path forward is fraught with challenges. Effective control over digital content must be paired with tools that are accessible and user-friendly, and legislative changes must align with these technological solutions so that artists are not merely passive participants in the digital landscape but active guardians of their creative rights. The ongoing discourse around artists, AI, and copyright must account for evolving technology and its implications for creative expression and ownership.
This research emphasizes the urgent call for better awareness and tools to protect artists’ rights, as well as the importance of advocating for changes that empower creators in the face of rapidly advancing AI technologies. The intersection of art and technology will only become more intricate, and the voices of artists must be central in shaping this evolving narrative.
Article Title: SomeSite I Used To Crawl: Awareness, Agency, and Efficacy in Protecting Content Creators From AI Crawlers
News Publication Date: 28-Oct-2025
Image Credits: University of California San Diego
Keywords
Generative AI, Artificial intelligence, Computer science, Visual arts, Fine arts.
Tags: AI and copyright issues, AI crawlers and content theft, artists’ rights in the digital age, challenges for digital artists, content creators and AI tools, generative AI and art, Internet Measurement Conference presentations, non-consensual use of artwork, protecting visual artists’ work, safeguarding artistic creations, technological impact on visual arts, University of California San Diego research