Misinformation and Biases
The output of a large language model like ChatGPT is only as good as its training data. What is that training data? According to OpenAI, ChatGPT learns to produce natural language outputs from publicly available information on the internet, information licensed from third parties, and information provided by users and human trainers. Biases and misinformation found in these texts are therefore reflected in its outputs. Human trainers do adjust the model to make some outputs more appropriate (in their estimation), but given the enormous volume of training data, this requires continual maintenance. Furthermore, it is often unclear to users exactly where the training data comes from.
AI Hallucinations
ChatGPT and other generative AI chatbots may state false information in a confident-sounding manner. Because these models predict likely sequences of text rather than retrieve verified facts, they may fabricate citations and cite reference materials that do not exist. These false statements are sometimes referred to as hallucinations. In a summer 2023 study, McGowan et al. (2023) found that ChatGPT-3.5 and Bard 2.0 generated largely inaccurate citations during literature searches. Users may be inclined to believe these outputs because generative AI chatbots appear to have access to vast amounts of information and, as computer programs, are often assumed to be unbiased. However, ChatGPT and similar large language models can produce inaccurate and biased information.
Dangers of Providing AI Tools with Private Information
Users may intentionally or inadvertently provide ChatGPT with personal information while expediting everyday tasks. OpenAI takes measures to remove personally identifiable information from ChatGPT's training data, but there remains a concern that this information could be accidentally disclosed or extracted by unauthorized parties through "membership inference attacks" (Derner & Batistič, 2023); a toy sketch of this attack class appears below.
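To make the idea of a membership inference attack concrete, here is a minimal, self-contained Python sketch using a deliberately overfit toy classifier. Everything in it (the model, the data, the 0.9 threshold) is an illustrative assumption: it demonstrates the general attack class, not ChatGPT's internals or the specific methods analyzed by Derner and Batistič (2023).

```python
# A minimal sketch of a confidence-based membership inference attack
# on a deliberately overfit toy model. All details here are
# illustrative assumptions, not a description of any real system.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy records: "members" are used to train the model, "non-members" are not.
members = rng.normal(size=(100, 5))
non_members = rng.normal(size=(100, 5))
labels = rng.integers(0, 2, size=100)  # random labels force pure memorization

# bootstrap=False lets every tree memorize the full training set, so the
# model becomes far more confident on members than on unseen records.
model = RandomForestClassifier(n_estimators=50, bootstrap=False, random_state=0)
model.fit(members, labels)

def top_confidence(x):
    # The attacker's only signal: the model's highest class probability.
    return model.predict_proba(x).max(axis=1)

# Attack rule: guess "member" whenever confidence exceeds a threshold.
threshold = 0.9
print(f"Flagged as members (actual members):     {(top_confidence(members) > threshold).mean():.0%}")
print(f"Flagged as members (actual non-members): {(top_confidence(non_members) > threshold).mean():.0%}")
# A large gap between the two rates means membership leaks: the model's
# own behavior reveals which records were in its training data.
```

A large-language-model version of the same idea would replace class probabilities with the model's confidence on a candidate passage of text, but the principle is identical: unusually confident behavior on a specific record is evidence that the record was present in the training data.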
Derner, E., & Batistič, K. (2023). Beyond the safeguards: Exploring the security risks of ChatGPT. arXiv. https://doi.org/10.48550/arXiv.2305.08005
McGowan, A., Gui, Y., Dobbs, M., Shuster, S., Cotter, M., Selloni, A., Goodman, M., Srivastava, A., Cecchi, G. A., & Corcoran, C. M. (2023). ChatGPT and Bard exhibit spontaneous citation fabrication during psychiatry literature search. Psychiatry Research, 326, 115334. https://doi.org/10.1016/j.psychres.2023.115334