
Demo 4: Summarizing Qualitative Data in ChatGPT without Python

Problem: My boss asked me to extract all the learning from my organization’s evaluation briefs, which are PDFs. How do I do that?

Demo: I don’t know any Python, but I’ve played with ChatGPT. Can I get it to help me with this?

  • The demo will focus on how to prepare the data and the possibilities and limits of what ChatGPT can do with copied and pasted qualitative datasets.
  • The demo will give you a chance to see what is possible and assess how useful it might be.


Summarizing Documents

  1. Prepare the documents: they must be machine readable (garbage in, garbage out).
  2. Break documents into appropriately sized pieces. For GPT-3, the limit was 2048 tokens, shared between the prompt and the response (a token can be as short as one character or as long as one word in English).

a. The more detailed you want your summary to be, the smaller the files should be.

b. ChatGPT will also read all the metadata, tables, headers, and footers, so you may want to clean these out first.
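As a rough illustration of step 2, the sketch below groups paragraphs into chunks that stay under a token budget. It is only a sketch under stated assumptions: the 0.75 words-per-token ratio is a rule of thumb for English rather than an exact tokenizer count, and the names estimate_tokens and split_into_chunks are hypothetical.

```python
import re

# Assumption: roughly 0.75 English words per token. This is a rule of
# thumb only; an actual tokenizer would give exact counts.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate derived from the word count (heuristic)."""
    words = len(text.split())
    return int(words / WORDS_PER_TOKEN) + 1 if words else 0


def split_into_chunks(text: str, max_tokens: int = 1500) -> list[str]:
    """Group paragraphs into chunks whose estimated size stays under max_tokens.

    A single paragraph that is larger than max_tokens is kept whole as its
    own chunk, so very long paragraphs still need manual splitting.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and estimate_tokens(candidate) > max_tokens:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append("\n\n".join(current))
            current = [para]
        else:
            current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A smaller max_tokens gives smaller chunks and therefore more detailed summaries, at the cost of more copying and pasting. The default of 1500 is an arbitrary choice that leaves room for the question and the answer.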

  • To do this with multiple documents (for example, 100 evaluation reports), you will need to use Python to set up batch processing and/or use an API.
  • Using an LLM to summarize publicly available information is fine, but understand the privacy issues of sharing PII, or information that was collected for an evaluation (not to train an AI), with these models.
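For readers curious what batch processing might look like, here is a minimal sketch. It assumes the cleaned documents are saved as .txt files in one folder. The function names are hypothetical, and placeholder_summarize is a stand-in that makes no API call; in practice it would be replaced by a function that sends the text to whichever LLM API you are permitted to use.

```python
from pathlib import Path
from typing import Callable


def summarize_folder(folder: str, summarize: Callable[[str], str]) -> dict[str, str]:
    """Apply summarize() to every .txt file in folder.

    Returns a dictionary of {filename: summary}, in filename order.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        results[path.name] = summarize(text)
    return results


def placeholder_summarize(text: str) -> str:
    """Stand-in for a real API call: returns the first 200 characters."""
    return text[:200]
```

Keeping the API call behind a single function argument means the loop can be tested on public documents first, before any decision is made about sending sensitive evaluation data to a model.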


Scenario and preparation

Scenario: your supervisor asks you to summarize the MCC learning from education evaluations.

1. Either scrape the evaluation briefs or download the PDFs and export them to plain text.

2. Make sure your docs don’t exceed the limit (hint: you can ask ChatGPT to count the tokens for you, but treat its answer as an estimate).

3. To get the best results, you actually need to do a lot of data cleaning, so that the LLM (or machine) is only looking at what is important.

  • For example: I removed everything EXCEPT the MCC Learning section (and I only used 10 evaluation briefs).
  • When converting from PDF to text, the export brought in table and image descriptions, so I took these out. Also, not every PDF converted all of its content to text. Some produced gibberish and had to be discarded. Look at your data!
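The cleaning in this demo was done by hand, but the two steps above (keeping only one section, and discarding gibberish) could be sketched in Python roughly as follows. The stop headings, the 0.6 threshold, and both function names are assumptions made for illustration; real briefs may use different headings and need a different threshold.

```python
def extract_section(
    text: str,
    heading: str = "MCC Learning",
    stop_headings: tuple[str, ...] = ("Evaluation Questions", "Next Steps"),
) -> str:
    """Return the lines after `heading`, up to the next stop heading.

    The stop_headings defaults are hypothetical examples; use the headings
    that actually appear in your own documents.
    """
    collected = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip().lower()
        if not inside:
            if stripped.startswith(heading.lower()):
                inside = True  # start collecting after the heading line
            continue
        if any(stripped.startswith(s.lower()) for s in stop_headings):
            break
        collected.append(line)
    return "\n".join(collected).strip()


def looks_like_gibberish(text: str, min_letter_ratio: float = 0.6) -> bool:
    """Crude check: flag text where too few characters are letters or spaces."""
    if not text.strip():
        return True
    readable = sum(ch.isalpha() or ch.isspace() for ch in text)
    return readable / len(text) < min_letter_ratio
```

A check like this only flags candidates. It does not replace reading through the converted text yourself.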


Process and Questions to ask

  • Open the .rtf file from the Google file here.
  • Select all the text in the file and copy it.
  • In ChatGPT, type one of these questions (or your own) and then paste all the text.

What countries did MCC work in according to the following text? (paste the text)

What gender or sex differences were mentioned in the following text? (paste the text)

What did MCC learn according to the following text? Please summarize. (paste the text)
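The copy-and-paste pattern above (a question followed by the pasted text) can also be assembled programmatically, which helps when the same questions are asked of several documents. This is a minimal sketch; build_prompts is a hypothetical helper, and it only builds the prompt strings without sending them anywhere.

```python
# The three questions from the demo.
QUESTIONS = [
    "What countries did MCC work in according to the following text?",
    "What gender or sex differences were mentioned in the following text?",
    "What did MCC learn according to the following text? Please summarize.",
]


def build_prompts(text: str, questions: list[str] = QUESTIONS) -> list[str]:
    """Return one prompt per question: the question, a blank line, the text."""
    body = text.strip()
    return [f"{question}\n\n{body}" for question in questions]
```

Each returned string can be pasted into ChatGPT as a single message.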


Lessons

  • Only relatively small files can be given to ChatGPT to summarize directly in the chat window.
  • Beyond that, you need to use Python and an API to provide the data.
  • Data cleaning and preparation are essential; otherwise the machine will easily misinterpret the text.
  • It can be a great timesaving device, but it is probably only faster with large datasets.
  • Please be careful about what data you put into the system.


Commercial Options

  • Evidencebase.ai
  • Aialyze.com

Other do-it-yourself guidance

https://beebom.com/how-train-ai-chatbot-custom-knowledge-base-chatgpt-api/

And a post that tried to use that guidance https://www.linkedin.com/pulse/training-ai-chatbot-chatgpt-api-evaluative-purposes-part-dennis-bours%3FtrackingId=ZZLZMaf4DKqLmT59FjiTYA%253D%253D/?trackingId=ZZLZMaf4DKqLmT59FjiTYA%3D%3D