1 of 47

Skeptical Approaches to AI Research Tools

Anna Mills, College of Marin

MyFest via Equity Unbound

July 2, 2024

Licensed CC BY NC 4.0

2 of 47

Welcome! Please share

  • Your name and where you are
  • Something about your professional life
  • Your favorite celebratory drink! Mine is Thai iced tea
  • Slides, open for commenting: https://bit.ly/SkepticalAIresearch

Photo by Kim Cruz

3 of 47

Welcome! What to expect

  • When search and chat combine: AI + real sources
  • Teaching students to approach AI research assistance skeptically
  • AI with academic research: pros and cons of Elicit.org
  • Other research assistance apps with different specialities

  • Slides: https://bit.ly/SkepticalAIresearch
  • Feel free to put questions in the chat as we go

4 of 47

Could AI be the best research assistant ever?

“I have basically found that it is the best research assistant I’ve ever had. So now if I’m looking up something for a column or preparing for a podcast interview, I do consult with generative AI almost every day for ideas and brainstorming.

And just things like research — make me a timeline of all the major cyber attacks in the last 10 years, or something like that. And of course I will fact-check that research before I use it in a piece, just like I would with any research assistant.”

-Kevin Roose on the Hard Fork podcast for The New York Times Opinion, April 5, 2024

5 of 47

Share in the chat: On a scale of 1 to 4, how much have you used AI for research, specifically for help finding sources and working with them?

1: I haven’t used it

2: I’ve used it a little

3: I’ve used it quite a few times

4: I use it consistently in my search/research process

6 of 47

When AI combines with search and works with real sources

7 of 47

Remember the warning that “AI makes up sources”? AI systems still make them up, but not as often as a year ago.

8 of 47

AI + real sources = good and bad news

  • It sounds like a dream! At long last, could we just explain what we are looking for and be in dialogue with the papers themselves?
  • But it doesn’t work reliably. These AI research assistants mimic academic citation practices, but they don’t always summarize correctly, and they may cite a source that is not really the origin of their output. (A sketch of how such systems are typically wired together follows below.)
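
To see why citations can drift from their sources, it helps to look at how these search-plus-chat systems are typically wired together (retrieval-augmented generation). Here is a minimal sketch of the general pattern; the snippets are hard-coded stand-ins for a real search step, and the model name and prompt wording are illustrative assumptions, not any vendor’s actual implementation.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# The snippets stand in for results from a real search step, and the
# OpenAI chat API is used only as an example backend.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# In a real system these would come from a search engine index.
snippets = [
    {"id": 1, "text": "Foggy, windy sites favor cool-season crops "
                      "such as kale, lettuce, and broccoli."},
    {"id": 2, "text": "Melons and peppers need sustained heat to ripen."},
]
context = "\n".join(f"[{s['id']}] {s['text']}" for s in snippets)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": (
            "Answer using ONLY the numbered snippets below and cite them "
            "like [1]. If they don't answer the question, say so.\n\n"
            + context)},
        {"role": "user", "content": (
            "Which vegetables struggle in a foggy San Francisco garden?")},
    ],
)
print(response.choices[0].message.content)
```

Note what the sketch makes visible: the model is merely instructed to ground its answer in the retrieved snippets. Nothing enforces that a bracketed citation actually supports the sentence it follows, which is exactly the failure mode described above.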

9 of 47

  • Perplexity is a popular combination of search and chat.
  • Google has introduced AI overviews (they show up in response to some queries and not others).


10 of 47

Google AI overviews have included some striking misinformation, as in this test by Casey Newton shared on Threads.


11 of 47

Will we examine these AI results for bias and consider where they are coming from? I asked Google, “What makes life worth living?” Its answer was secular and individualistic, in keeping with its training data. The top source it cited was Psychology Today.


12 of 47

I asked Perplexity “Which vegetables are not likely to grow well in a San Francisco garden where there is moderate fog and wind?”

  • Its answer seemed decent: melons, corn, eggplants, peppers, winter squash. (Though I’ve grown peppers and seen pumpkins in San Francisco.)
  • However, the top source it gave me was a Reddit thread.
  • Google gave me links, and the first one was about vegetables that grow well in the winter. Not relevant.

13 of 47

But the AI answer sometimes misrepresents the sources. After listening to Ezra Klein’s podcast, I asked Perplexity, “What does Ezra Klein think AI will do to the Internet?”

Perplexity.AI: [screenshot of Perplexity’s answer attributing the prediction to Klein]

But no! His guest Nilay Patel said that, as the footnoted source indicates!

14 of 47

Let’s test: Ask Perplexity.ai a question on a topic you know a lot about. Share in the chat:

  • How good is the AI-generated answer?
  • How good is the selection of sources? Are they trustworthy?
  • Now try the same question in Google. How do the results compare? Does it give an AI overview or just search results?

15 of 47

Teaching students to approach AI research assistance skeptically

16 of 47

Let’s make sure students practice checking how AI handles information

Invite students to try out one of these systems that purports to cite its sources and/or aid with research. Ask them to find something the AI missed.

  • Is each source really talking about what the AI summary says it’s talking about?
  • Did the AI summary miss anything central?
  • Is the summary or synthesis accurate?

17 of 47

One lesson: “Fact-Checking Auto-Generated AI Hype”

I asked students to fact-check a list of claims and sources generated by ChatGPT. They commented in the margins of a chat session transcript, speaking back to and correcting ChatGPT’s handling of sources.

See this description of the assignment with materials and samples, published in TextGenEd: Teaching with Text Generation Technologies from the Writing Across the Curriculum Clearinghouse.

18 of 47

ChatGPT misinformation from a chat session on surprising AI facts

“AI Can Decode Ancient Scripts:

  • Fact: Researchers used AI to decipher the ancient script known as Linear B.
  • Source: Jucha, M. A., Arjovsky, M., & Bengio, Y. (2017). Decipherment of the Linear B script with deep learning. arXiv preprint arXiv:1708.08731.
  • Credibility: arXiv is a repository for electronic preprints, and while not peer-reviewed, many significant findings are first reported here. Yann Bengio, one of the authors, is a Turing Award winner.”

There’s no such paper and no such author!
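
A quick programmatic check would have exposed this citation. The sketch below queries arXiv’s public API for the cited preprint ID and compares what is actually posted there against ChatGPT’s claimed title; the helper name and matching logic are my own illustrative choices.

```python
# Look up an arXiv ID and compare it against a claimed citation.
# Uses only the standard library and arXiv's public Atom API.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def arxiv_entry(arxiv_id: str) -> dict | None:
    """Return the title and authors posted at this arXiv ID, or None."""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())
    entry = feed.find(f"{ATOM}entry")
    if entry is None:
        return None
    title = entry.findtext(f"{ATOM}title", "").strip()
    if not title or title == "Error":  # arXiv reports bad IDs this way
        return None
    authors = [a.findtext(f"{ATOM}name", "")
               for a in entry.findall(f"{ATOM}author")]
    return {"title": title, "authors": authors}

claimed = "Decipherment of the Linear B script with deep learning"
actual = arxiv_entry("1708.08731")
if actual is None:
    print("No paper found at that arXiv ID.")
elif claimed.lower() not in actual["title"].lower():
    print("That ID holds a different paper:", actual["title"])
    print("Authors:", ", ".join(actual["authors"]))
```

Either branch exposes the fabricated citation; students can run the same check by hand just by pasting the ID into arxiv.org.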

19 of 47

What happened? Yann LeCun + Yoshua Bengio = Yann Bengio?

Yann LeCun and Yoshua Bengio are computer scientists considered “godfathers” of AI who have collaborated. ChatGPT combined their names.

20 of 47

ChatGPT generated the claim, “AI creates original art and music.” I annotated its supposed source and shared this with students.

21 of 47

Students also practiced assessing ChatGPT’s explanations for why sources were credible

ChatGPT output cited the Facebook AI blog: “While a company blog might not be a traditional academic source, it's a primary source in this case because it's directly from the team that conducted the research.”

The students pushed back on the idea that a company blog is credible just because it contains internal company information.

22 of 47

AI for Academic Research Apps: A look at Elicit.org

23 of 47

How does this work for academic research? Let’s look at Elicit: “The AI Research Assistant”

  • With Elicit, you write “your search query in natural language… There's no need to try to think of every possible keyword or synonym to find relevant papers. Just ask Elicit a question like you would ask an expert in the field.”
  • “Elicit's Find papers step can handle filter criteria directly in your query. If you try a query like ‘papers about the benefits of taking l-theanine published after 2020’ Elicit will automatically filter to papers published after 2020.” (Elicit Help) A rough sketch of this kind of query parsing appears below.
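
Elicit’s internals aren’t public, but the underlying idea (turning a natural-language question into a filtered paper search) can be sketched against a public scholarly API. In the sketch below, a hypothetical helper pulls a “published after YYYY” filter out of the query with a regex and passes it to the Semantic Scholar Graph API; the regex and field choices are my assumptions, not Elicit’s method.

```python
# Sketch: natural-language query -> filtered paper search.
# Uses the public Semantic Scholar Graph API (no key needed for light use);
# the "published after YYYY" parsing is a deliberately simple stand-in
# for whatever Elicit actually does.
import re
import requests

def search_papers(query: str, limit: int = 10) -> list[dict]:
    params = {"query": query, "fields": "title,year", "limit": limit}
    # Pull a "published after YYYY" filter out of the question itself.
    match = re.search(r"published after (\d{4})", query, re.IGNORECASE)
    if match:
        params["year"] = f"{int(match.group(1)) + 1}-"  # e.g. "2021-"
        params["query"] = query[:match.start()].strip()
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("data", [])

for paper in search_papers(
        "papers about the benefits of taking l-theanine published after 2020"):
    print(paper.get("year"), "-", paper.get("title"))
```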

24 of 47

Elicit answers your research question in its own words

Instead of searching on “teacher shortages students effects” and then retrying with “educator shortages,” “instructor shortages,” and “impacts,” the student can just ask the question once, in plain language.

25 of 47

Elicit lists papers and summarizes their elements

26 of 47

My students enjoyed Elicit’s intuitive interface and immediate response to their questions.

  • They found it easier to navigate than the academic databases they had just learned to search.
  • I want to keep sharing it as a way to open research to students who find it intimidating and who shut down due to cognitive overload when they have to tweak search terms, filters, and databases.
  • But I can see the potential for harm as well as good.

27 of 47

In one test, Elicit’s synthesis addressed a different question from the one I asked.

Question: Do language models trained partly on AI-generated text perform worse than ones trained only on human text?

Its answer was about detecting AI text and comparing the quality of human and AI writing, not about how training data affects performance.

Can we help students practice catching this kind of misinterpretation?

28 of 47

Elicit’s one-sentence summaries of papers sometimes miss key points.

Elicit’s summary of Student Perceptions of AI-Powered Writing Tools: Towards Individualized Teaching Strategies by Michael Burkhard: “AI-powered writing tools can be used by students for text translation, to improve spelling or for rewriting and summarizing texts.”

But the real abstract includes this other central point: “[S]tudents may need guidance from the teacher in interacting with those tools, to prevent the risk of misapplication. Depending on the different student types, individualized teaching strategies might be helpful to promote or urge caution in the use of these tools.”

Elicit’s “main findings” column describes this better, but the user has to specifically choose that option.

29 of 47

Elicit promises efficiency, but even where it delivers, we should ask whether that efficiency means we are skipping important thinking and reading.

Elicit.org’s taglines are

  • “Analyze research papers at superhuman speed.”
  • “Automate time-consuming research tasks like summarizing papers, extracting data, and synthesizing your findings.”

30 of 47

Emily Bender and Chirag Shah have argued that essential aspects of research may be lost with such efficiencies.

In “Situating Search,” Bender and Shah argue that “removing or reducing interactions in an effort to retrieve presumably more relevant information can be detrimental to many fundamental aspects of search, including information verification, information literacy, and serendipity.”

Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, March 2022

31 of 47

Problems with student use of AI research assistance

  • AI research assistance may create an illusion of easy, efficient access to and understanding of the field.
  • Maybe the student isn’t really getting the sources they need or understanding them accurately.
  • Even if the student reads the original text, are their reading skills adequate to help them discern whether the summary is accurate?

32 of 47

AI for Academic Research: An array of apps with different specialties

33 of 47

A variety of AI research apps offer functionality similar to Elicit’s.

34 of 47

Consensus.AI attempts to assess the level of agreement among scholars

  • You ask it a research question.
  • It offers a brief AI-generated synthesis answer based on ten papers.
  • For yes or no questions, it estimates what percent of the research leans “yes,” “possibly,” and “no” on the question. (A sketch of this kind of tallying follows below.)
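
Consensus hasn’t published how the meter is computed, but the final tallying step is easy to picture. Below is a minimal sketch that assumes each retrieved paper has already been labeled with a stance; in a real system an AI classifier would assign those labels, and that classification is the hard, error-prone part.

```python
# Sketch of a "consensus meter": tally per-paper stance labels into percents.
# The labels here are made-up inputs, not real Consensus output.
from collections import Counter

labeled_papers = [
    ("Paper A", "yes"), ("Paper B", "yes"), ("Paper C", "possibly"),
    ("Paper D", "no"),  ("Paper E", "yes"), ("Paper F", "possibly"),
    ("Paper G", "yes"), ("Paper H", "yes"), ("Paper I", "no"),
    ("Paper J", "yes"),
]

counts = Counter(stance for _, stance in labeled_papers)
for stance in ("yes", "possibly", "no"):
    share = 100 * counts[stance] / len(labeled_papers)
    print(f"{stance:>8}: {share:.0f}%")
```

The percentages look precise, but they inherit every upstream error: which ten papers were retrieved, and whether each stance label is right.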

35 of 47

The “Consensus Meter”

36 of 47

The “Consensus Meter” looks authoritative and quantitative with its percent ratings, but it comes with warnings and would be hard to double-check.

  • “This feature is powered by generative AI and may generate incorrect information. Please read the papers and consult other sources.”
  • “Important Reminder: A result appearing at the top does NOT mean it is true. Interpreting research is complicated.”

37 of 47

Undermind

  • “Undermind highlights the precise papers you should focus on and gives a clear explanation for each decision.”
  • Analyzes the full text of research articles.

38 of 47

If you have one research paper, Keenious helps you find related ones

  1. Upload a paper
  2. Keenious suggests topics based on the paper
  3. Keenious suggests related research and allows you to filter by topic. (A rough API sketch of this related-papers step follows below.)
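
Keenious’s own pipeline isn’t public, but the “given this paper, find related ones” step can be sketched with the public Semantic Scholar Recommendations API; using that API as a stand-in for Keenious is my assumption.

```python
# Sketch: related-paper lookup from one seed paper, using the public
# Semantic Scholar Recommendations API as a stand-in for Keenious.
import requests

def related_papers(paper_id: str, limit: int = 10) -> list[dict]:
    """paper_id accepts prefixed IDs, e.g. 'ARXIV:1706.03762' or 'DOI:...'."""
    resp = requests.get(
        "https://api.semanticscholar.org/recommendations/v1/papers/"
        f"forpaper/{paper_id}",
        params={"fields": "title,year", "limit": limit}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("recommendedPapers", [])

# Example seed: a well-known machine learning paper.
for paper in related_papers("ARXIV:1706.03762"):
    print(paper.get("year"), "-", paper.get("title"))
```

(Keenious also works from an uploaded draft rather than a published paper; that full-text step isn’t sketched here.)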

39 of 47

Scite.ai

  • Similar to Elicit, but with a special focus on citations
  • “Read what research articles say about each other”
  • “Smart Citations allow users to see how a publication has been cited by providing the context of the citation and a classification describing whether it provides supporting or contrasting evidence for the cited claim”
  • How accurate are these assessments? (A sketch of the underlying classification task follows below.)
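
Scite hasn’t said its classifier works this way; the sketch below only shows the shape of the underlying task (labeling a citation’s context as supporting, contrasting, or merely mentioning) using an LLM prompt, with made-up passage text.

```python
# Sketch of the citation-context classification task (not scite's method).
# The OpenAI chat API is an example backend; the prompt is illustrative.
from openai import OpenAI

client = OpenAI()

citation_context = (
    "In contrast to Smith et al. (2019), who reported improved recall, "
    "we observed no significant effect of the intervention."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": (
            "Classify how this passage cites the referenced work. "
            "Answer with exactly one word: supporting, contrasting, "
            "or mentioning.")},
        {"role": "user", "content": citation_context},
    ],
)
print(response.choices[0].message.content)  # likely: "contrasting"
```

Auditing a sample of such labels against the actual passages is one way to start answering the accuracy question on this slide.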

40 of 47

SciSpace

  • Similar to Elicit, but adds “AI Chat for Scientific PDFs”
  • “Copilot” mode takes a chat-with-the-research approach: it shows a scholarly paper on one side and a chat pane on the other, with suggested prompts.
  • It will do an automated “literature review” on the basis of a user-entered question. What’s the quality? (A bare-bones sketch of the chat-with-a-PDF pattern follows below.)
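
None of these vendors publish their pipelines, but chat-with-a-PDF tools generally follow one pattern: extract the text, split it into chunks, retrieve the chunks most relevant to the question, and hand those to a language model as context. Here is a deliberately bare-bones sketch of the retrieval half, with simple word overlap standing in for the embedding search a real product would use.

```python
# Bare-bones "chat with a PDF" retrieval step: find the page most
# relevant to a question. Real products use embedding-based search;
# word overlap keeps this sketch dependency-light.
from pypdf import PdfReader

def best_page(pdf_path: str, question: str) -> tuple[int, str]:
    reader = PdfReader(pdf_path)
    q_words = set(question.lower().split())
    scored = []
    for i, page in enumerate(reader.pages):
        text = page.extract_text() or ""
        overlap = len(q_words & set(text.lower().split()))
        scored.append((overlap, i, text))
    _, page_index, text = max(scored)
    return page_index + 1, text[:500]  # 1-based page number, short excerpt

page, excerpt = best_page("paper.pdf", "What were the main findings?")
print(f"Most relevant page: {page}\n{excerpt}")
# A chat model would then be prompted with this excerpt as its context.
```

The quality of the final “literature review” can be no better than this retrieval step plus the model’s summarizing, which is why spot-checking against the actual paper matters.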

41 of 47

ResearchRabbit.AI, “Spotify for papers”

  • ResearchRabbit helps scholars organize papers and discover connections between them.
  • Its maps of networks of connections between authors and between papers could give students a visual representation of research as conversation. Connected Papers also does this. (A toy version of such a citation map is sketched below.)
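
ResearchRabbit builds its maps from citation data it gathers behind the scenes; a toy version of the same idea, assuming we already know each paper’s reference list, can be built with networkx. The paper names and edges below are made up, except that the Zawacki-Richter example echoes the next slide.

```python
# Toy citation map: papers are nodes, "cites" relationships are edges.
# Made-up data; a real tool pulls references from a citation index.
import networkx as nx

cites = {
    "Paper A (2023)": ["Zawacki-Richter et al. (2019)", "Paper B (2021)"],
    "Paper B (2021)": ["Zawacki-Richter et al. (2019)"],
    "Paper C (2022)": ["Zawacki-Richter et al. (2019)", "Paper B (2021)"],
}

G = nx.DiGraph()
for paper, refs in cites.items():
    for ref in refs:
        G.add_edge(paper, ref)  # edge direction: paper -> work it cites

# Which paper anchors the conversation? Count incoming citations.
for paper, n_cites in sorted(G.in_degree, key=lambda pair: -pair[1]):
    print(f"{n_cites} citations <- {paper}")
```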

42 of 47

ResearchRabbit.AI, “Spotify for papers”

When I put in my own paper, ResearchRabbit showed me a network of papers on related topics. I could see that the Zawacki-Richter one was cited by the others.

43 of 47

MoxieLearn.ai, an all-inclusive app for supporting academic research and writing. Not cheap.

  • ~$60/month, intended to replace multiple premium subscriptions to Claude, Perplexity Pro, Gemini Pro, and ChatGPT. Offers access to those models.
  • Writing assistance, feedback, and research assistance organized in a proprietary system developed by a group of academic writing consultants.

44 of 47

AI features common in these apps are now appearing within academic databases themselves. AI functionality is going mainstream in the research process.

45 of 47

Share in the chat: What seems most useful in AI research assistance? (Let's imagine the functionality works decently.)

  • Papers related to a particular paper
  • Analysis of how a publication has been cited
  • Map of connections between papers and authors
  • Chat pane next to papers
  • Summaries
  • Synthesis of multiple sources
  • Search with natural language rather than by keywords or filters
  • Analysis of how the research leans on yes or no questions

46 of 47

The outlook is for ongoing ambivalence, gray areas, and exploration

Can we use AI research assistance wisely?

Can we guide students to use it in ways that don’t detract from their learning?

Can we make sure they practice noticing where AI is wrong and where its use gets in the way of important reading and reflection?

47 of 47

  • What will you try next?
  • What will you avoid?
  • What will you tell students about AI research assistance?
  • Questions? Comments?

AnnaRMills.com

Slides open for commenting: https://bit.ly/SkepticalAIresearch

This presentation is shared under a CC BY NC 4.0 license.