By Alistair Jones
Credit: Singapore Management University
SMU Office of Research & Tech Transfer – Despite hero moments in movies where fingers clatter at dizzying speed across computer keyboards, not everyone in the real world finds code fascinating, nor algorithms intriguing.
In fact, there is a worldwide shortage of skilled data scientists and software engineers.
David Lo, a Professor of Computer Science at Singapore Management University (SMU), suggests two reasons for the shortfall.
“First, software today is everywhere; organisations, companies, governments and society rely on software,” he says.
“There is increasing demand for new software and upgrades to existing software [which] translates to the demand for skilled software engineers.
“Second, technologies continue to advance at a rapid pace. We see new technologies such as blockchain, self-driving cars, IoT, drones, etc. This advancement increases the complexity of software systems built on top of these technologies.
“This complexity, in turn, increases the demand for software engineers with the necessary skills to handle the different technologies and their interactions. With the rapidly increasing demand for skilled software engineers and the not-as-rapid increase in the number of newly trained software engineers, there is a shortage,” Professor Lo says.
New technical support is on the way in the form of TrustedSEERs, a research project led by Professor Lo, which has recently been awarded a National Research Foundation (NRF) Investigatorship grant. The NRF Investigatorship is designed to support a small number of excellent Principal Investigators with a track record of research achievements that identify them as leaders in their respective fields of research. Professor Lo is the second SMU faculty member to have secured the grant since the Investigatorship was launched in 2015.
An acronym for Trusted Software Engineering Expert AdvisoRs, TrustedSEERs will address the shortage of software engineers needed to create and maintain the software that society needs. It will do so by improving the workflow of software engineers and the quality of software systems already in the field, creating trusted automation bots to act as concierges and interactive advisors: digital assistants informed by Artificial Intelligence (AI).
Evolving knowledge
The key to TrustedSEERs is software analytics (SA), a research area that has developed during the past two decades.
“SA seeks to automate software engineering tasks [and] has introduced novel and specialised AI-based solutions that analyse and learn from software engineers’ activity data (software artefacts). Much of such data is available on open-source software repositories such as GitHub,” Professor Lo says.
“SA’s beginning was fuelled by the high availability of data in closed and open-source software repositories, the development of AI algorithms that make sense of data, and challenges that have plagued software development.
“Simply put, SA processes data to help address challenges faced by software developers and companies.”
One challenge is that software engineering knowledge is ever evolving, and engineers can struggle to keep up to date on the many different technologies.
“There is too much material and information to read. Outdated knowledge, wrong knowledge, or lack of knowledge can affect the quality of upgrades software engineers make to software systems,” Professor Lo says.
“SA can help software engineers by ‘digesting’ much of the data available in many repositories and making recommendations that can help software engineers in making constant, urgent and trusted upgrades to software systems and applications.”
Engineering data
Even though SA is a comparatively new field, it too needs updating. The TrustedSEERs project aims to bring about the next generation of SA to address mismatches and limitations that Professor Lo has identified in present solutions.
“Software engineers have high expectations of the effectiveness of SA tools’ recommendations before they are willing to adopt them,” he says.
“To boost effectiveness, many studies have predominantly focused on model-centric innovations by designing ever more sophisticated AI models that can crunch ever larger amounts of data – typically from a specific large data source.
“However, there is a limit on how much we can push forward by designing ever more sophisticated models and using ever larger (and noisier) data. Much more improvement can potentially be gained by focusing on data-centric innovations.”
Data-centric innovations involve engineering better data, but what does this entail?
“Better data corresponds to data that is more comprehensive (contains all helpful information and covers all essential cases), is relevant to a task at hand, is accessible (transformed to a representation that is more amenable to AI learning), and is labelled more consistently and accurately,” Professor Lo says.
“For engineering better data, there is a need for novel and effective solutions that can systematically select, label, synthesise, link and transform data from diverse software artefacts and harness them to learn effective SA solutions.”
Finding such solutions is a key focus of the TrustedSEERs project.
Intelligent and trustworthy
Also integral to the project is engendering trust in the automated bots it creates.
“Although trust has been highlighted as a critical component for effective human-machine collaborations, studies on investigating and improving the trustworthiness of SA solutions are limited,” Professor Lo says.
“Two sets of factors may affect a software engineer’s trust in an SA solution: intrinsic (the solution engenders trust by being able to provide explanations for its outputs); and extrinsic (the solution abides by regulations set by external authorities and is robust to external attacks). Neither has been addressed much in SA research.
“Current-generation SA tools typically produce recommendations (such as patches to fix a bug, source code to be written, or a third-party library to use) without explanations. The lack of explanations limits the trust that software engineers have in these recommendations and can hamper the adoption of SA solutions,” Professor Lo says.
The use of open-source software and crowdsourced data is widespread. But do we need to be cautious in trusting open-source software data?
“Yes. Many software artefacts are low-quality (they contain bugs and even security vulnerabilities) or outdated (they are using older technologies that are not optimal),” Professor Lo says.
“The SA solution that we want to build needs to be able to identify such low-quality and outdated data so that the output it produces does not introduce trustworthiness issues.”
The project, which will also consider privacy and copyright issues, aims to champion six new directions in data-centric and trustworthy SA.
“It will be a step towards my long-term dream of realising a symbiotic workforce of autonomous bots and engineers working together productively to build high-quality software for the betterment of industry and society,” Professor Lo says.