Abstract
Introduction

Large Language Models such as ChatGPT have become the fastest-growing web platforms to date, exceeding one hundred million users in under two months. As the capability of ChatGPT and its use to inform decisions increase, it is crucial to understand the platform's inherent biases regarding sustainability in the context of infrastructure and utilities. Sustainability, defined for the purposes of this paper as the intersection of social, environmental, and economic factors that ensure a better future for tomorrow, is a critical consideration when making decisions about infrastructure and utilities. The intent of this paper is to determine and assess what biases, if any, ChatGPT expresses when responding to prompts regarding infrastructure and utilities. If advice is sought from a platform like ChatGPT in the context of utilities and infrastructure, does it prioritize sustainability? What about regenerative development? Perhaps more interestingly, since ChatGPT is essentially a representation of an amalgam of human thought, what does it reveal about our own societal biases toward sustainability in utility data? These guiding questions shaped our analysis, which is intended to provide perspective on how the powerful new tools emerging today require awareness of their eccentricities to be used appropriately.

Methodology

A list of over one hundred questions was developed for use as ChatGPT prompts. The questions were designed to fall along three axes: public versus private, discipline (water resources, transportation, site development), and function (operations and maintenance, planning, design, financing, and construction). Each question was worded carefully so as not to lead ChatGPT directly to mention sustainability in its response, but to be open ended and to inherently require consideration of economics, the environment, and society.
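The three question axes described above span a small grid of categories. As a minimal sketch, the axis labels below are taken from the text, while the Cartesian-product enumeration itself is an illustrative assumption about how the question set could be organized, not the authors' actual question-generation method:

```python
from itertools import product

# Axis labels come from the paper; enumerating their Cartesian product is
# an illustrative assumption, not the authors' documented procedure.
sectors = ["public", "private"]
disciplines = ["water resources", "transportation", "site development"]
functions = ["operations and maintenance", "planning", "design",
             "financing", "construction"]

# Every (sector, discipline, function) cell a question could target.
cells = list(product(sectors, disciplines, functions))
print(len(cells))  # 2 x 3 x 5 = 30 cells to spread the 100+ questions across
```

With roughly one hundred questions over thirty cells, each combination can be probed by several independently worded prompts, which helps separate a model-level bias from the phrasing of any single question.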
Context was provided in the form of a preamble for each question, and ChatGPT was asked to respond in a list format and to limit each response to 150 tokens; this helped weight each question evenly and control cost. The engine selected for this analysis was ChatGPT 3.5-Turbo, accessed via OpenAI's Application Programming Interface (API). This engine was selected for its similarity to OpenAI's non-premium, publicly available interface, as well as its advanced ability to understand and respond to text prompts. Each of ChatGPT's responses was scored as follows: 0, not directly addressed; 1, indirectly addressed; and 2, directly addressed. The scores were used to assess whether the response adequately addressed the following criteria: 1) the social aspect of sustainability, 2) the economic aspect of sustainability, 3) the environmental aspect of sustainability, 4) sustainability in general, and 5) regenerative development. Two different methods were used to score the responses. The first used ChatGPT to score itself: ChatGPT was fed the questions and corresponding answers and instructed to score the answers against the criteria listed above, and the resulting scores were then reviewed by a human for quality assurance and control. The second method required a human to evaluate the questions and responses and assign scores manually.

Results

Sustainability was addressed to some degree across all criteria. However, how sustainability was addressed varied among the social, environmental, and economic components. The social component was addressed directly more often than it was addressed indirectly or not at all. Both the economic and environmental components had far more indirect responses than direct responses, and far more responses that did not address the respective topic.
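The self-scoring pass described in the Methodology could be sketched as below. The 0/1/2 rubric and the five criteria come from the paper; the prompt wording, function names, and the `criterion: score` reply format are hypothetical assumptions, and the `chat` parameter stands in for a call to the ChatGPT 3.5-Turbo chat-completions endpoint (capped at 150 tokens), which is omitted here so the sketch runs offline:

```python
# Hedged sketch of the self-scoring pass: rubric values and criteria are
# from the paper; prompt wording and reply format are illustrative only.
CRITERIA = [
    "social aspect of sustainability",
    "economic aspect of sustainability",
    "environmental aspect of sustainability",
    "sustainability in general",
    "regenerative development",
]
VALID_SCORES = {0, 1, 2}  # 0 = not addressed, 1 = indirect, 2 = direct

def build_scoring_prompt(question: str, answer: str) -> str:
    """Ask the model to grade an earlier answer, one line per criterion."""
    rubric = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        f"Question: {question}\nAnswer: {answer}\n"
        f"Score the answer 0 (not addressed), 1 (indirect), or 2 (direct) "
        f"for each criterion, one 'criterion: score' line each:\n{rubric}"
    )

def score_response(chat, question: str, answer: str) -> dict:
    """`chat` is any str -> str completion callable (e.g. a thin wrapper
    around the gpt-3.5-turbo API); a stub works for offline testing."""
    reply = chat(build_scoring_prompt(question, answer))
    scores = {}
    for line in reply.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() in CRITERIA and value.strip().isdigit():
            score = int(value.strip())
            if score in VALID_SCORES:
                scores[name.strip()] = score
    return scores

# Stub standing in for the API call, so the sketch runs without a key.
stub = lambda prompt: ("social aspect of sustainability: 2\n"
                       "economic aspect of sustainability: 1\n"
                       "environmental aspect of sustainability: 0")
print(score_response(stub, "How should a city plan a new water main?", "..."))
```

Structuring the scorer around a pluggable `chat` callable mirrors the paper's two scoring methods: the same parsing and rubric apply whether the grades come from the model or are entered by a human reviewer.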
This indicates that the social and communal aspects of infrastructure are well represented by ChatGPT, but that the environmental and economic components require further prompting or weighting from decision-makers to support balanced and informed decisions. All results were then broken down by functional group, public/private, and discipline to explore more specific biases; these results are given in Figures 4-8. A deeper analysis will be performed comparing human scores and ChatGPT-generated scores to determine potential bias in scoring and to assess whether ChatGPT exhibits a different bias when generating text than when understanding it with respect to sustainability.

Conclusion

It is expected that ChatGPT will contain bias: it is trained on human-generated data, which will inherently contain some level of skew. Understanding this bias qualitatively and quantitatively will give decision-makers the context needed to evaluate the reliability of results from ChatGPT, especially with respect to sustainability in infrastructure, a complex, evolving field that lends itself to the powers of AI and ChatGPT. The bias observed skews overall toward the social component and often fails to address, directly or at all, the economic and environmental components where these aspects would be expected to be incorporated.
This paper was presented at the WEF/AWWA Utility Management Conference, February 13-16, 2024.
Author(s): W. Kuehne (1), L. Basler (1)
Author affiliation(s): (1) Ardurra Group
Source: Proceedings of the Water Environment Federation
Document type: Conference Paper
Print publication date: Feb 2024
DOI: 10.2175/193864718825159252
Content source: Utility Management Conference