1 of 99

AI for Software and Analysis Development

Candace Savonen, Carrie Wright and Elizabeth Humphries

https://bit.ly/AI_ITN

2 of 99

Schedule for today

Introduction

Get to know each other activity
What’s the ITN?

Ways we can use AI for writing
Ways we can use AI for analysis code
Bias in AI
AI activity
Free work and question time

3 of 99

Join at slido.com #208 2139

4 of 99

Have your phone

(or a separate tab) handy for interactive polls!

Join at slido.com #208 2139

5 of 99

What is your favorite tv show?

6 of 99

What's your email?

7 of 99

What do you hope to learn from this workshop?

8 of 99

What do you use AI for in your daily work currently?

9 of 99

Informatics Technology for Cancer Research (ITCR)

itcr.cancer.gov

10 of 99

Informatics Technology for Cancer Research (ITCR)

… and more!

ITCR tools: itcr.cancer.gov/informatics-tools

11 of 99

What is the ITN?

ITCR Training Network

Catalyzing informatics research through training opportunities

itcrtraining.org

12 of 99

User preparedness

Gap

Tool usability

itcrtraining.org/courses

Informatics research is hindered by a gap between different types of experts

CC-BY jhudatascience.org - Image made by Candace Savonen using https://getavataaars.com/ and https://thenounproject.com/ a

13 of 99

User preparedness

Gap

Tool usability

itcrtraining.org/courses

Catalyzing Informatics for Research

CC-BY jhudatascience.org - Image made by Candace Savonen using https://getavataaars.com/ and https://thenounproject.com/ a

14 of 99

Elements of ITN:

Make courses about informatics

Make tools for researchers to do outreach

Provide live education opportunities

Enhance community engagement in cancer research

15 of 99

ITN courses

itcrtraining.org/courses

16 of 99

Current ITN courses: itcrtraining.org/courses

Management: Leadership for Cancer Informatics Research
Software Development: Documentation & Usability; AI for Software Development; Software Development beyond Coding (coming soon!)
Tools and Resources: Computing for Cancer Informatics; Introduction to Overleaf and LaTeX for Writing Scientific Articles; Choosing Genomics Tools
Best Practices: Introduction to Reproducibility; Advanced Reproducibility; Ethical Data Handling; NIH Data Management and Sharing Policy

17 of 99

Who is the AI course for?

For individuals who:

Develop software or want to start

Are interested in using large language model AI tools to help with their work

Want to use AI tools responsibly

Icons from iconpacks, iconmonstr, and openmoji

18 of 99

Concepts discussed in the AI for Software Development course:

Using AI to annotate code

Introduction to Large Language Model AI tools

Using AI to refactor code

Using AI to write code from scratch

Using AI to understand unfamiliar code

Icons from Openmoji.org, Nounproject.org and Iconpacks.net

Ethics of AI tool use for software development

19 of 99

20 of 99

The media talking about AI:

21 of 99

AI can enhance your work!

22 of 99

Skills you need for working with AI

Query (re)construction (new skill)
Awareness of AI’s limitations
Creativity*
Ability to break problems down into small parts*
Ability to troubleshoot*

*Skills we currently use

23 of 99

Which LLM should you use?

Wow, there are already so many options!

24 of 99

LLM	What is it really good at?	What does it struggle with?
Bard	Most human-like interaction; answers oddball questions; willing to answer “I don’t know”	Gives the least detail in its answers; has been known to give incorrect answers
ChatGPT	Most popular, and therefore most tested; good all-around LLM	Unlikely to change its answer even when told the previous answer was wrong; invents citations; known hallucination issues
Claude	Good all-around LLM; offers specific advice when editing a writing sample for tone; best understanding of clever wordplay	Sometimes requires prodding to give additional detail; doesn’t easily save threads at this time (but this is changing!)
Phind	Great for technical programming questions; provides links to sources unprompted; offers many programming options at once	Tends to plagiarize sources directly when used for writing

25 of 99

Using AI as an editor

26 of 99

AI can help edit and clean up your writing

DON’T use it to do all your writing
LLMs can act as copy editors

Check for grammar
Change writing tone
Make writing “smoother”

Not all LLMs are very good at this

27 of 99

AI as an editor - Demonstration using Claude

https://poe.com/Claude-instant

28 of 99

Example sentence:

Human activities, like urbanization, road building, or other activities that alter the landscape, are likely to have an effect on the bacteria in the soil.

Let’s make the sentence more concise.

29 of 99

https://poe.com/

Claude-instant

30 of 99

Example paragraph:

Activities altering the landscape through urbanization, road building, and other means likely impact soil bacteria. One way is an increased load of heavy metal contaminants such as lead and arsenic in the soil adjacent to areas of increased human activity. When cars drive on roads, compounds from the exhaust, oil, and other fluids might settle onto the roads and be washed into the soil. When we put salt on roads, parking lots, and sidewalks, the salts themselves will eventually be washed away and enter the ecosystem through both water and soil. Chemicals from factories and other businesses also leach into our environment. Previous research has demonstrated that in areas with more human activity, like cities, soils have greater concentrations of heavy metals than found in rural areas with fewer people.

Let’s rewrite this rough draft

31 of 99

Query:

Can you improve the paragraph?
OKAY: not specific about what improvements are needed

Can you make the paragraph more concise?
BETTER: we’ve specified a goal

Can you write this for an elementary school student?
BEST: we’ve specified an audience or tone

32 of 99

Can you improve the paragraph?

33 of 99

Can you make the paragraph smoother?

34 of 99

Can you write this for an elementary school student?

35 of 99

AI cannot read your mind!

Be specific about your goals.
Start with a simple task.
Give the LLM enough data.
Be patient: queries often require iteration to get the answer you need.

36 of 99

AI for writer’s block:

How can I expand on the ideas in this paragraph?

37 of 99

What helpful prompts have you used to get an LLM to help you with writing?

38 of 99

Using LLMs reproducibly

39 of 99

Be mindful of reproducibility!

There is some randomness in how an LLM answers your query

Keep records of your prompts
Keep records of the LLM answers
Ask the question in multiple ways

Some LLMs keep logs, but a simple Markdown file or Google Doc works too!
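If you prefer a plain file, a small helper like the one below can append each prompt and answer to a Markdown log. This is a minimal sketch under stated assumptions: the function name log_exchange and the log layout are hypothetical, not part of any LLM tool, and you paste the prompt and answer in yourself.

```python
from datetime import datetime, timezone


def log_exchange(log_path, model, prompt, response):
    """Append one prompt/response pair to a Markdown log file (hypothetical helper)."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = (
        f"## {timestamp} ({model})\n\n"
        f"**Prompt:**\n\n{prompt}\n\n"
        f"**Response:**\n\n{response}\n\n"
    )
    # Append mode, so earlier entries are never overwritten.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

For example, calling log_exchange("llm_log.md", "Claude-instant", prompt_text, answer_text) after each query gives a dated, model-labeled record. That record is what lets you compare answers to the same question over time.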

40 of 99

We previously asked Bard about R:

Is here::here("raw", "data.csv") equivalent to here::here("raw/data.csv")?
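A question like this can be settled by running both versions rather than trusting a chatbot. In R, wrapping both calls in identical() performs that check. As a rough Python analog (an assumption: pathlib is a different library from R's here package), you can compare the two ways of building the path:

```python
from pathlib import PurePosixPath

# Path built from separate components, like here::here("raw", "data.csv")
from_parts = PurePosixPath("raw", "data.csv")
# Path built from one string containing a slash, like here::here("raw/data.csv")
from_string = PurePosixPath("raw/data.csv")

# Both forms normalize to the same path object and the same string.
print(from_parts == from_string)  # True
print(str(from_parts))            # raw/data.csv
```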

41 of 99

AI chatbots are constantly being updated and “learning” when they are mistaken.

Answer two weeks earlier:

42 of 99

AI changes - Demonstration using Bard

https://bard.google.com/

43 of 99

Query:

If you were a person, what would you look like?

44 of 99

Bard’s answer

45 of 99

Another answer from Bard

46 of 99

And a third:

47 of 99

The answers are not always the same, even when given the same prompt
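One reason for this variation is that many LLMs sample each next word from a probability distribution instead of always picking the single most likely word. The toy sketch below illustrates the idea. The vocabulary and scores are made up, and real models work over far larger vocabularies with sampling settings that vary by product.

```python
import math
import random


def softmax(scores, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(words, scores, temperature, rng):
    """Draw one word at random, weighted by its softmax probability."""
    probs = softmax(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical continuations for "If I were a person, I would have ... hair"
words = ["brown", "curly", "short", "silver"]
scores = [2.0, 1.5, 1.0, 0.5]

rng = random.Random()  # unseeded, so repeated runs can differ, like re-asking a prompt
for attempt in range(3):
    print(attempt + 1, sample_next_word(words, scores, temperature=1.0, rng=rng))
```

Fixing the seed (for example random.Random(42)) makes this toy reproducible. Chat interfaces generally don't offer that control, which is why keeping records of prompts and answers matters.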

48 of 99

Using AI for code!

49 of 99

50 of 99

Can people learn how to program using LLMs?

Code from LLMs nearly always has bugs that need fixing
Sometimes LLMs make up packages!
Sometimes LLM-written code is simply incorrect

For now, LLMs can supplement traditional coding education, not replace it
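Before running code that an LLM wrote, you can cheaply check whether the packages it imports actually exist in your environment. This is a minimal sketch. The suggested list is hypothetical, and "totally_real_stats_pkg" stands in for a made-up package.

```python
import importlib.util


def check_imports(module_names):
    """Report which suggested top-level modules can be found in this environment."""
    report = {}
    for name in module_names:
        # find_spec returns None when no installed module matches the name.
        report[name] = importlib.util.find_spec(name) is not None
    return report


# Hypothetical imports copied from an LLM's answer
suggested = ["json", "csv", "totally_real_stats_pkg"]
for name, found in check_imports(suggested).items():
    print(f"{name}: {'found' if found else 'NOT FOUND - check that this package exists'}")
```

A missing module might simply not be installed yet, so also search the package index. A module that does resolve is still no proof that it does what the LLM claims.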

51 of 99

Test and validate EVERYTHING an LLM writes

Test your results (build in checkpoints).
Make sure the code is secure.
Have humans review your code.

You are responsible for code you put out into the world, even if an LLM wrote it.
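One way to build in checkpoints is to test LLM-written code on inputs where you already know the right answer, including an edge case the LLM may not have considered. This sketch assumes an LLM handed you the hypothetical helper rescale below.

```python
def rescale(values):
    """Rescale numbers to the 0-1 range (hypothetical LLM-suggested helper)."""
    low, high = min(values), max(values)
    if high == low:
        # Edge case: the naive formula would divide by zero when all values match.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Checkpoint 1: a hand-checkable input with a known answer.
assert rescale([0, 5, 10]) == [0.0, 0.5, 1.0]
# Checkpoint 2: the output spans exactly the expected range.
result = rescale([3, 7, 1, 9])
assert min(result) == 0.0 and max(result) == 1.0
# Checkpoint 3: the edge case is handled instead of crashing.
assert rescale([4, 4, 4]) == [0.0, 0.0, 0.0]
print("All checkpoints passed")
```

Returning zeros for constant input is one reasonable choice, not the only one. The value of the test is that it forces you to make that decision deliberately.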

52 of 99

Good ideas for using AI for code

Getting a basic strategy for how to write code for something

How might I go about doing ______ ?
How could I structure code that would do ______ ?
Is it possible to create a package that does ______?
What packages could I use to make code that does _______ ?

53 of 99

54 of 99

This code did NOT work at all, but the basic structure of it got me started

55 of 99

AI as a pair programmer:

Demonstration using Phind

https://www.phind.com/

56 of 99

Good ideas for using AI for code

Getting a basic strategy for how to write code for something

How might I go about doing ______ ?
How could I structure code that would do ______ ?
Is it possible to create a package that does ______?
What packages could I use to make code that does _______ ?

Reviewing existing code for improvements

Can you tell me how I could make this code more readable?
Can you help fix the formatting, styling, and indent errors in this code?
Can you recommend how I could make this code more reproducible?

57 of 99

58 of 99

LLMs can be good at reviewing already existing code:

59 of 99

Image created by Candace Savonen using Avataars.

def my_function(x):
    result = x
    for i in range(10):
        for j in range(5):
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result

Ruby the Researcher

Wait, what is this code even?

plot-data-2020-9-11.tsv

plot-data-20-10-2020.tsv

plot-data-20-10-2020-clean.tsv

plot_final.R

plot_final_FINAL.R

plot_final_old.R

plot.py

functions.R

functions-old.R

plot-final.png

plot-new.png

60 of 99

AI is really good at helping you understand unfamiliar code!

It’s like having a pair programmer explain things to you.

61 of 99

Good ideas for using AI for code

Getting a basic strategy for how to write code for something

How might I go about doing ______ ?
How could I structure code that would do ______ ?
Is it possible to create a package that does ______?
What packages could I use to make code that does _______ ?

Reviewing existing code for improvements

Can you tell me how I could make this code more readable?
Can you help fix the formatting, styling, and indent errors in this code?
Can you recommend how I could make this code more reproducible?

Annotating code or improving documentation

Can you annotate this code?
Can you explain to me what this code is doing?
Can you create a README for this code?

62 of 99

Image created by Candace Savonen using Avataars.

Ruby’s code

ERROR

Ruby’s code

Now Ruby

Future Ruby

63 of 99

Code annotation improves readability

64 of 99

Code annotation improves maintainability

65 of 99

Code annotation improves quality

66 of 99

https://hutchdatascience.org/AI_for_software/annotating-your-code.html

There are many benefits to annotating code:

Improves readability
Improves maintainability
Improves quality
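To illustrate what annotation can add, here is Ruby's my_function with a docstring and comments of the kind you might ask an LLM to draft. The behavior is unchanged. The comments are our own description of the code, not LLM output, and any comments an LLM writes still need checking against what the code actually does.

```python
def my_function(x):
    """Add an offset to x, accumulated over every pair of loop indices (i, j).

    i runs over 0-9 and j over 0-4, giving 50 iterations in total.
    """
    result = x
    for i in range(10):  # i = 0, 1, ..., 9
        for j in range(5):  # j = 0, 1, ..., 4
            # (i % 2 == 0 and j % 2 == 0) is a bool, so it acts as 1 or 0 here:
            # the product term only counts when both indices are even.
            # The trailing "- 1" is applied on every iteration.
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result
```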

67 of 99

Ethics alert

Remember, you don’t know what an LLM does with the code you provide in your query.

Keep in mind:

Is the code proprietary?
Are you planning to commercialize the code summary? Who owns the code?
Does the code contain sensitive data?

68 of 99

AI as a code explainer:

Demonstration using ChatGPT 3.5

https://poe.com/ChatGPT

https://chat.openai.com/

69 of 99

Query:

What does this function do?

def my_function(x):
    result = x
    for i in range(10):
        for j in range(5):
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result
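Whatever explanation the LLM gives can be checked. Working through the loops by hand, the "- 1" runs 10 × 5 = 50 times. The product term only contributes when both i and j are even, so i + 1 takes the values 1, 3, 5, 7, 9 and j + 1 takes the values 1, 3, 5. That term therefore adds 2 × (1 + 3 + 5 + 7 + 9) × (1 + 3 + 5) = 2 × 25 × 9 = 450, so the function should reduce to x + 400. The sketch below compares the original against that simplified version (simplified is our own name, not from the slides).

```python
def my_function(x):
    result = x
    for i in range(10):
        for j in range(5):
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result


def simplified(x):
    # 450 from the even/even product terms, minus 50 from the fifty "- 1" steps
    return x + 400


# Compare the two on a range of inputs, including negatives and a float.
for value in [-1000, -3, 0, 1, 2.5, 10**6]:
    assert my_function(value) == simplified(value), value
print(my_function(0))  # 400
```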

70 of 99

What helpful prompts have you used to get LLMs to help you with code or programming?

71 of 99

A Good README

72 of 99

A Good README

Link to template README: https://raw.githubusercontent.com/jhudsl/Reproducibility_in_Cancer_Informatics/main/resources/README-template.md

Writing READMEs: https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/documenting-analyses.html#readmes

Info that should be included in a README:

General purpose of the project
Instructions on how to re-run the project
Lists of any software required by the project
Input and output file descriptions
Descriptions of any additional tools included in the project
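Once an LLM drafts a README, you can quickly confirm it covers most of the items above. This is a rough sketch. It assumes the README uses Markdown headings and that the section keywords below, which are our own hypothetical choices, match how you title each section.

```python
# Hypothetical heading keywords, mapped to the README items listed above.
REQUIRED_SECTIONS = {
    "purpose": "General purpose of the project",
    "usage": "Instructions on how to re-run the project",
    "requirements": "Software required by the project",
    "input": "Input file descriptions",
    "output": "Output file descriptions",
}


def missing_sections(readme_text):
    """Return descriptions of required sections whose keyword appears in no Markdown heading."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in readme_text.splitlines()
        if line.startswith("#")
    ]
    return [
        description
        for keyword, description in REQUIRED_SECTIONS.items()
        if not any(keyword in heading for heading in headings)
    ]


draft = """# My analysis
## Purpose
## Requirements
## Inputs and outputs
"""
print(missing_sections(draft))  # ['Instructions on how to re-run the project']
```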

73 of 99

Choosing an LLM

74 of 99

Which LLM should you use?

Wow, there are already so many options!

75 of 99

AI is only as good as its training data

LLMs aren’t really “talking” to you

They are simply putting words together based on patterns learned from the training data.
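Here is a toy illustration of "putting words together based on patterns". The sketch counts which word follows which in a tiny made-up training text, then generates text by repeatedly choosing the most common next word. Real LLMs use neural networks that look at far more context than the single previous word, so treat this only as an analogy.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often every other word follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, length):
    """Greedily extend `start` with the most frequent next word seen in training."""
    output = [start]
    for _ in range(length):
        options = follows.get(output[-1])
        if not options:
            break  # this word never had a follower in the training text
        output.append(options.most_common(1)[0][0])
    return " ".join(output)


# Hypothetical training text
training_text = "the code runs . the code fails . the code runs ."
model = train_bigrams(training_text)
print(generate(model, "the", 3))  # the code runs .
```

The output can only echo patterns that were in the training text. The model has no idea what "code" or "runs" mean.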

76 of 99

AI is only as good as its training data

Biased training data

Biased LLM responses
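This toy sketch shows how skewed training data turns into skewed output. It estimates next-word probabilities by counting, using a deliberately lopsided, made-up corpus. Because "said" is followed by "she" in 9 of 10 sentences, a model that learned only from this data would overwhelmingly predict "she".

```python
from collections import Counter


def next_word_probabilities(sentences, word):
    """Estimate P(next word | word) by counting occurrences in the training sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            if current == word:
                counts[nxt] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


# Hypothetical, deliberately skewed training data: 9 of 10 sentences use "she".
training = ["the nurse said she would help"] * 9 + ["the nurse said he would help"]
print(next_word_probabilities(training, "said"))  # {'she': 0.9, 'he': 0.1}
```

Nothing in the counting step is biased on its own. The skew comes entirely from the data.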

77 of 99

The bias often lies with us

78 of 99

79 of 99

Bias exists, so it is our responsibility to make sure our work doesn’t make things worse!

80 of 99

Being aware of Bias in AI

Be aware of the potential biases in the data used to train AI systems

What data is this based on?
How inclusive would this data be?

Consider the possible outcomes
Be VERY critical of the output from LLMs
Call out the LLM and ask it to combat its own biases

81 of 99

Let’s strive to become more aware of the biases everywhere!

(Especially the ones that don’t affect us but do affect others!)

Fred Hutch Bias training
Inclusivity resources

82 of 99

Calling out an LLM:

Part 1

83 of 99

Calling out an LLM:

Part 2

84 of 99

Calling out an LLM:

Part 3

85 of 99

Algorithmic Justice League

Dr. Joy Buolamwini

AI needs to be:

Equitable

Agency and control over how individuals interact with AI
Affirmative consent: an individual must opt in to a service after being given all the information

Accountable

Meaningful transparency into the creation of an AI system and its known limitations
Continuous oversight by third parties as the AI evolves
Provide a path to contest and correct

86 of 99

Knowing that LLMs have these biases, what strategies can we use to make sure we don't perpetuate their biases when we use them?

87 of 99

Let’s talk about data privacy

Assume that nothing you give an LLM is kept private or secure*

*unless you created the LLM

Unclear what companies behind the LLMs do with query data
Limited information on how the companies secure query information

88 of 99

NEVER submit code that contains PHI or PII to AI tools
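A crude pre-submission scan can catch some obvious slips, such as an email address or an SSN-shaped number left in a script. The sketch below uses a few illustrative regular expressions of our own. It will miss most PHI (names, dates, record numbers in other formats), so treat it as a reminder to look carefully, never as de-identification.

```python
import re

# Illustrative patterns only; real PHI/PII takes many more forms than these.
PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US SSN-like number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "US phone-like number": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def flag_possible_pii(code_text):
    """Return (line number, pattern name, matched text) for each suspicious match."""
    hits = []
    for line_no, line in enumerate(code_text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((line_no, name, match.group()))
    return hits


snippet = 'patient_id = "123-45-6789"\ncontact = "jane.doe@example.org"\nx = 42\n'
for hit in flag_possible_pii(snippet):
    print(hit)
```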

89 of 99

Activity: Make a README with an LLM

Have some code that is not as well annotated as it could be or does not have a README. (here’s some bad code from me you can use)
Go to ChatGPT
Ask ChatGPT to create a README
Take a look at the output and refine your prompt to describe how you’d like the README changed.
Put that README with your code base!

90 of 99

Do this!	Do NOT do this!
Be transparent about how you use LLMs wherever you use them! Say which one you used. Keep records of your prompts!	Do not hide the fact that you’ve used an LLM for your work
Look for LLM guidelines and requirements specific to your field or a particular submission or application!	Do not use LLMs without considering the application! (People who review what you’ve submitted can often tell!)
Use LLMs to draft emails or other kinds of correspondence, but make sure you read and edit what they have written	Do not use LLMs to act as a journal reviewer in your stead!
Verify the accuracy (and sources!) of LLM output	Do not use LLMs without verifying their output!
Feel free to give an LLM code if it does not contain PHI or PII	NEVER submit code that contains PHI or PII! And don’t share anything with an LLM that you wouldn’t share with someone not on the project (or IRB)

91 of 99

What helpful things have you used LLMs for successfully?

92 of 99

Activity options:

Make a README for code
Alter or summarize a piece of writing

Steps:

Try at least two different LLMs
Log your queries and results in a Word or Markdown document
Try multiple iterations of queries
Be ready to tell the group what you did or didn’t find useful!

93 of 99

What was your goal in your work with the LLM?
What queries/strategies worked with the LLM?
What didn’t work with the LLM?
Did one LLM work better than another?

94 of 99

LLM	What is it really good at?	What does it struggle with?
Bard	Most human-like interaction Answers oddball questions Willing to answer “I don’t know”	Gives least amount of detail in answers Has been known to give incorrect answers
ChatGPT	Most popular, which means most tested Good all-around LLM	Unlikely to change answer even when told previous answer was wrong Invents citations Known hallucination issues
Claude	Good all-around LLM Offers specific advice when editing a writing sample for tone Best understanding of clever word play	Can sometimes require prodding to give additional detail Doesn’t easily save threads at this time (but this is changing!)
Phind	Great for technical programming questions Provides links to sources unprompted Offers many programming options at once	Tends to plagiarize sources directly when used for writing

95 of 99

How likely are you to use what you learned in your daily work?

96 of 99

How likely would you be to recommend this workshop?

97 of 99

What did you like most about the workshop?

98 of 99

Please share any recommendations you have for improvements.

99 of 99

https://bit.ly/itn_demo

Demographics Survey