Tuesday, May 16, 2023

This Citation Does Not Exist

In 2019, engineer Phillip Wang launched a website called "This Person Does Not Exist," which harnessed the StyleGAN AI system to generate realistic-looking photographs of nonexistent people. Although Wang's original website is now defunct, scores of similar sites do exist, including a variation of This Person Does Not Exist and a variety of generators designed for such diverse uses as helping programmers look busy at work, creating a geography guessing game out of Google Street View, and introducing made-up dictionary definitions into the lexicon.

Just a few years later, large language model (LLM) chatbots have become the hottest trend in generative AI technology, with OpenAI's ChatGPT, Microsoft's Bing Chat (built on OpenAI's GPT-4), and Google's Bard the best known of their kind. It's easy to see their appeal: type a quick prompt and almost instantly generate a wall of convincing-sounding text, complete with citations.

Sound too good to be true? Alas, it certainly can be. Since the public unveiling of ChatGPT and similar systems, many users have noticed that some of the supporting citations…well, just don't exist. These citations often look like perfectly plausible references to books, articles, or even court opinions and statutes, until a researcher actually tries to retrieve the cited material. Programmers call these dead ends "hallucinations" (or "hallu-citations," as USC professor Kate Crawford has dubbed them), and they have become a growing problem for researchers as these systems explode in popularity.

The New York Times reported earlier this month on these perils in "When A.I. Chatbots Hallucinate." Commentators have also highlighted damaging falsehoods in hallucinated biographies, including one that falsely accused a law professor of sexual harassment and another that implicated an Australian mayor in a bribery scandal in which he had actually been the whistleblower.

Google's Bard, which removed its waitlist just last week, displays a clear warning about potentially inaccurate information and encourages users to double-check results: "Bard will not always get it right: Bard may give inaccurate or inappropriate responses. When in doubt, use the 'Google it' button to check Bard's responses." (Bard results also include links to the sources from which a response is drawn, at least in the case of a direct quotation.) OpenAI similarly acknowledges concerns about inaccuracy: "ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging..."

These tools will continue to improve over time, and their potential applications for quick search, summarization, and drafting assistance are truly exciting. For legal research, Lexis has recently announced the planned launch of Lexis+ AI, which will roll out to law school faculty later this summer. Another tool to watch in this space is Casetext’s CoCounsel, powered by GPT-4 and unveiled in March. As these tools develop, though, researchers need to stay alert to potential inaccuracies in their results, especially in these early days.

Of course, researchers have always had to watch for erroneous citations, whether due to sloppy notetaking by authors, poor cite-checking by editors, or transcription errors by typists and publishers. But now it's more important than ever to double-check cited sources for accuracy, to avoid spreading nonexistent citations.

So where can a researcher confirm the existence of a cited source? There is no one-size-fits-all answer, but a few bookmarks to keep handy for verification purposes include:

  • The existence of a particular book title can usually be verified via a search of WorldCat.org and/or Google Books. To obtain a copy for substantiation purposes, consult the Duke Libraries Catalog.
  • Journal article citations can likely be verified quickly via the library's Articles search, E-Journals title search, and/or Google Scholar. (Tip: When using Google Scholar, visit Settings > Library Links to ensure "Duke University Libraries – Get it at Duke" is enabled to allow you to log in to many paywalled full-text resources with your NetID.)
  • Court opinions from US jurisdictions can usually be retrieved in your preferred legal research service (e.g., Westlaw, Lexis, Fastcase). Two free resources for accessing the full text of case law are Google Scholar and the Harvard Caselaw Access Project. Note that coverage of some time periods and jurisdictions may be limited, depending on which source you choose.
  • For statutes, try retrieving the cited provision by citation in your preferred legal research service. Many jurisdictions provide free access to current codes and historical session laws on their websites, which are linked from Cornell's Legal Information Institute. If searching by citation fails, search for selected keywords from the quoted text.
  • To locate content from cited websites that now appear to be defunct, try entering the URL into the Internet Archive Wayback Machine.

For help with locating other specific types of materials that you discover in your research, be sure to Ask a Librarian.