While West Virginia University researchers see potential in educational settings for the newest official ChatGPT plugin, called Code Interpreter, they’ve found limitations for its use by scientists who apply computational methods to biological data to prioritize targeted treatments for cancer and genetic disorders.
Credit: WVU Illustration/Aira Burkhart
“Code Interpreter is a good thing and it’s helpful in an educational setting as it makes coding in the STEM fields more accessible to students,” said Gangqing “Michael” Hu, assistant professor in the Department of Microbiology, Immunology and Cell Biology at the WVU School of Medicine and director of the Bioinformatics Core. “However, it doesn’t have the features you need for bioinformatics. These are technical issues that can be overcome. Future developments of Code Interpreter are likely to extend its use to many fields such as bioinformatics, finance and economics.”
Since its release in November 2022, the popular artificial intelligence chatbot ChatGPT has gained the attention of businesses, educators and the general public. However, it fell short of the needs of people working in biomedical research, including bioinformatics (the field where computer science meets biology), who eagerly awaited OpenAI’s Code Interpreter plugin in the hope that it would fill the gaps.
Hu and his team put Code Interpreter to the test on a variety of tasks to evaluate its features. Their findings, published in Annals of Biomedical Engineering, show the plugin breaks down some of the barriers, but not all of them.
For example, Code Interpreter gives people without a science background easier access to coding, or computer programming. Hu said it’s also cost-effective, sparks students’ curiosity to explore data analysis and boosts their interest in learning. He points out, though, that users will still need to know how to interact with the chatbot, interpret data and recognize whether the results are accurate.
Bioinformaticians rely on precise coding, computer software programs and internet access to store, analyze and interpret biological data, such as DNA and the human genome, used for advances in modern medicine.
Despite the need for improvements specific to bioinformatics, Hu said, Code Interpreter helps users determine whether a response is accurate or is a fictitious answer presented with confidence, known as a hallucination.
“People know that ChatGPT can do many impressive things, but it is not good at providing a citation or reference to support its answer. If it is asked about the source to support the claim of a response, it may start to make up references,” Hu explained. “Code Interpreter provides a solution to minimize hallucinations. For questions that can be addressed through coding, the code itself serves as the source or citation. That is a significant step forward.”
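Hu’s point that the code itself can serve as the citation can be illustrated with a minimal sketch. The example below is hypothetical and is not taken from the study: it assumes a simple question ("What fraction of this DNA sequence is G or C?") and uses an illustrative function name, `gc_content`, and a made-up sequence. Anyone can rerun the code to check the answer rather than trusting an unsupported claim.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    # Count bases that are either G or C.
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Hypothetical example sequence, chosen only for illustration.
example = "ATGCGCTA"
print(f"GC content: {gc_content(example):.3f}")
```

Because the calculation is visible, a reader who doubts the reported number can inspect or rerun each step.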
Working with Hu were Lei Wang, a postdoctoral fellow in the WVU Department of Microbiology, Immunology and Cell Biology; Xijin Ge, of South Dakota State University; and Li Liu, of Arizona State University.
The team found positive results in Code Interpreter’s ability to convert data to charts and graphs.
In testing data analysis, they discovered several limitations. The plugin supports only one programming language, Python, and few of its software packages are dedicated to bioinformatics. In addition, it doesn’t allow access to internet data and lacks the capacity to work with large files.
“It allows for 100 megabytes or so, but the files we’re handling are at a gigabyte level,” Hu said. “Also, it doesn’t support parallel processing needed for large datasets which results in slow performance.”
Suggestions for upgrades to Code Interpreter include internet access for downloading genome data, installation of software specific to bioinformatics, expansion of storage capacity and support for additional programming languages. In addition, researchers found a need for privacy and security applications to comply with regulations such as HIPAA.
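The file-size gap Hu describes can be made concrete with a short sketch. It is an illustration under stated assumptions, not part of the study: the 100-megabyte figure is Hu’s approximate number treated here as a fixed threshold, and the function names `exceeds_limit` and `count_fasta_records` are hypothetical. The second function reads a FASTA file one line at a time, a common way to handle files too large to load into memory at once.

```python
import os
import tempfile

# Assumption: Hu's "100 megabytes or so" treated as an exact threshold.
UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def exceeds_limit(path: str, limit_bytes: int = UPLOAD_LIMIT_BYTES) -> bool:
    """Return True if the file at `path` is larger than the assumed limit."""
    return os.path.getsize(path) > limit_bytes


def count_fasta_records(path: str) -> int:
    """Count sequences in a FASTA file by streaming it line by line."""
    count = 0
    with open(path, "r") as handle:
        for line in handle:
            # Each FASTA record starts with a header line beginning with '>'.
            if line.startswith(">"):
                count += 1
    return count


if __name__ == "__main__":
    # Tiny made-up FASTA file, written to a temporary location for the demo.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">seq1\nATGC\n>seq2\nGGCC\n")
        demo_path = tmp.name
    print("records:", count_fasta_records(demo_path))
    print("too large:", exceeds_limit(demo_path))
    os.remove(demo_path)
```

A gigabyte-scale genome file would fail the size check many times over, which is why the team lists expanded storage among its suggested upgrades.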
Hu said that while he anticipates more upgrades for Code Interpreter, he plans to help students learn more about the advantages of the current plugin.
“In my class next spring, I plan to introduce this plugin to help students learn about data visualization,” Hu said. “AI is a fast-moving field. I hope by that time OpenAI may overcome some of the limitations so it can be used for a broad range of bioinformatics coding.”
Earlier this year, Hu led another study to prepare high school and college students to harness the power of ChatGPT by learning more about coding. The process employed OPTIMAL — Optimization of Prompts Through Iterative Mentoring and Assessment — to improve communication with a chatbot.
In the long run, Hu said he will continue to monitor and test new AI programming and features.
“As new products develop, I’ll just keep going,” Hu said. “There are certainly many other innovative uses awaiting to be discovered.”
Journal: Annals of Biomedical Engineering
DOI: 10.1007/s10439-023-03324-9
Article Title: Code Interpreter for Bioinformatics: Are We There Yet?
Article Publication Date: 23-Jul-2023