1 of 99

AI for Software and Analysis Development

Candace Savonen, Carrie Wright and Elizabeth Humphries

https://bit.ly/AI_ITN

2 of 99

Schedule for today

  • Introduction
    • Get to know each other activity
    • What’s the ITN?
  • Ways we can use AI for writing
  • Ways we can use AI for analysis code
  • Bias in AI
  • AI activity
  • Free work and question time

3 of 99

Join at slido.com #208 2139

4 of 99

Have your phone (or a separate tab) handy for interactive polls!

Join at slido.com #208 2139

5 of 99

What is your favorite tv show?

6 of 99

What's your email?

7 of 99

What do you hope to learn from this workshop?

8 of 99

What do you use AI for in your daily work currently?

9 of 99

Informatics Technology for Cancer Research (ITCR)

10 of 99

Informatics Technology for Cancer Research (ITCR)

… and more!

11 of 99

What is the ITN?

ITCR Training Network

Catalyzing informatics research through training opportunities

12 of 99

User preparedness

Gap

Tool usability

Informatics research is hindered by a gap between different types of experts

CC-BY jhudatascience.org - Image made by Candace Savonen using https://getavataaars.com/ and https://thenounproject.com/

13 of 99

User preparedness

Gap

Tool usability

Catalyzing Informatics for Research

CC-BY jhudatascience.org - Image made by Candace Savonen using https://getavataaars.com/ and https://thenounproject.com/

14 of 99

Elements of ITN:

  • Make courses about informatics
  • Make tools for researchers to do outreach
  • Provide live education opportunities
  • Enhance community engagement in cancer research

15 of 99

ITN courses

16 of 99

Current ITN courses: itcrtraining.org/courses

Management

Software Development

Tools and Resources

Best Practices

Leadership for Cancer Informatics Research

Documentation & Usability

Computing for Cancer Informatics

Introduction to Reproducibility

AI for Software Development

Introduction to Overleaf and LaTeX for Writing Scientific Articles

Advanced Reproducibility

Software Development beyond Coding (coming soon!)

Choosing Genomics Tools

Ethical Data Handling

NIH Data management and Sharing Policy

17 of 99

Who is the AI course for?

For individuals who:

  • Develop software or want to start

  • Are interested in using large language model AI tools to help you with your work

  • Want to use AI tools responsibly

Icons from iconpacks, iconmonstr, and openmoji

18 of 99

Concepts discussed in the AI for Software Development course:

Using AI to annotate code

Introduction to large language model AI tools

Using AI to refactor code

Using AI to write code from scratch

Using AI to understand unfamiliar code

Ethics of AI tool use for software development

Icons from Openmoji.org, Nounproject.org and Iconpacks.net

19 of 99

20 of 99

The media talking about AI:

21 of 99

AI can enhance your work!

22 of 99

Skills you need for working with AI

  • Query (re)construction (new skill)
  • Awareness of AI’s limitations
  • Creativity*
  • Ability to break problems down into small parts*
  • Ability to troubleshoot*

*Skills we currently use

23 of 99

Which LLM should you use?

Wow, there are already so many options!

24 of 99

Bard

  • What is it really good at?
    • Most human-like interaction
    • Answers oddball questions
    • Willing to answer “I don’t know”
  • What does it struggle with?
    • Gives least amount of detail in answers
    • Has been known to give incorrect answers

ChatGPT

  • What is it really good at?
    • Most popular, which means most tested
    • Good all-around LLM
  • What does it struggle with?
    • Unlikely to change answer even when told previous answer was wrong
    • Invents citations
    • Known hallucination issues

Claude

  • What is it really good at?
    • Good all-around LLM
    • Offers specific advice when editing a writing sample for tone
    • Best understanding of clever word play
  • What does it struggle with?
    • Can sometimes require prodding to give additional detail
    • Doesn’t easily save threads at this time (but this is changing!)

Phind

  • What is it really good at?
    • Great for technical programming questions
    • Provides links to sources unprompted
    • Offers many programming options at once
  • What does it struggle with?
    • Tends to plagiarize sources directly when used for writing

25 of 99

Using AI as an editor

26 of 99

AI can help edit and clean up your writing

  • DON’T use it to do all your writing
  • LLMs can act as copy editors
    • Check for grammar
    • Change writing tone
    • Make writing “smoother”
  • Not all LLMs are very good at this

27 of 99

AI as an editor - Demonstration using Claude

https://poe.com/Claude-instant

28 of 99

Example sentence:

Human activities, like urbanization, road building, or other activities that alter the landscape, are likely to have an effect on the bacteria in the soil.

Let’s make the sentence more concise.

29 of 99

30 of 99

Example paragraph:

Activities altering the landscape through urbanization, road building, and other means likely impact soil bacteria. One way is an increased load of heavy metal contaminants such as lead and arsenic in the soil adjacent to areas of increased human activity. When cars drive on roads, compounds from the exhaust, oil, and other fluids might settle onto the roads and be washed into the soil. When we put salt on roads, parking lots, and sidewalks, the salts themself will eventually be washed away and enter the ecosystem through both water and soil. Chemicals from factories and other businesses also leech into our environment. Previous research has demonstrated that in areas with more human activity, like cities, soils have greater concentrations of heavy metals than found in rural areas with fewer people.

Let’s rewrite this rough draft

31 of 99

Query:

  • Can you improve the paragraph?
    • OKAY: not specific about what improvements are needed
  • Can you make the paragraph more concise?
    • BETTER: we’ve specified a goal
  • Can you write this for an elementary school student?
    • BEST: we’ve specified an audience or tone

32 of 99

Can you improve the paragraph?

33 of 99

Can you make the paragraph smoother?

34 of 99

Can you write this for an elementary school student?

35 of 99

AI cannot read your mind!

  • Be specific about your goals.
  • Start with a simple task.
  • Give the LLM enough data.
  • Be patient.
  • Queries often require iteration to get the answer you need

36 of 99

AI for writer’s block:

How can I expand on the ideas in this paragraph?

37 of 99

What helpful prompts have you used to get an LLM to help with your writing?

38 of 99

Reproducibly using LLMs

39 of 99

Be mindful of reproducibility!

There is some randomness in how an LLM answers your query

  • Keep records of your prompts
  • Keep records of the LLM answers
  • Ask the question in multiple ways

Some LLMs keep logs, but a simple Markdown file or Google Doc works too!
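One way to keep these records is a small helper that appends every prompt and answer to a Markdown file. This is only a minimal sketch: the function name `log_exchange`, the file name, and the entry layout are illustrative assumptions, not part of any LLM tool.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_exchange(log_path, model_name, prompt, answer):
    """Append one prompt/answer pair to a Markdown log file (hypothetical helper)."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"## {timestamp} ({model_name})\n\n"
        f"**Prompt:**\n\n{prompt}\n\n"
        f"**Answer:**\n\n{answer}\n\n"
    )
    # Append mode keeps earlier records instead of overwriting them.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


if __name__ == "__main__":
    # Paste the answer in by hand; nothing here calls an LLM.
    log_exchange(
        "llm_log.md",
        "ExampleLLM",
        "Can you make the paragraph more concise?",
        "(paste the LLM's answer here)",
    )
```

Recording the model name and a timestamp with each entry makes it easier to notice later when the same prompt starts producing a different answer.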

40 of 99

We previously asked Bard about R:

Is here::here("raw", "data.csv") equivalent to here::here("raw/data.csv")?

41 of 99

AI chatbots are constantly being updated and “learning” when they are mistaken.

Answer two weeks earlier:

42 of 99

AI changes - Demonstration using Bard

https://bard.google.com/

43 of 99

Query:

If you were a person, what would you look like?

44 of 99

Bard’s answer

45 of 99

Another answer from Bard

46 of 99

And a third:

47 of 99

The answers are not always the same even if given the same prompt

48 of 99

Using AI for code!

49 of 99

50 of 99

Can people learn how to program using LLMs?

  • LLM-generated code nearly always has bugs that need fixing
  • Sometimes LLMs make up packages!
  • Sometimes LLM-generated code is simply incorrect

For now, LLMs can supplement traditional coding education, not replace it
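A quick first check for made-up packages is to ask Python whether each suggested module can be found in your environment. The sketch below assumes top-level module names only, and `missing_packages` is a hypothetical helper name. A name that is missing locally is not proof the package is invented, so also look it up in the package index before installing anything.

```python
import importlib.util


def missing_packages(package_names):
    """Return the top-level module names that cannot be found locally."""
    missing = []
    for name in package_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "json" ships with Python; the second name is deliberately fictional.
    print(missing_packages(["json", "hypothetical_made_up_pkg_xyz"]))
```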

51 of 99

Test and validate EVERYTHING an LLM writes

  • Test your results (build in checkpoints).
  • Make sure the code is secure.
  • Have humans review your code

You are responsible for code you put out into the world, even if an LLM wrote it.
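As a sketch of what "build in checkpoints" can look like, the example below pairs a small function (standing in for something an LLM drafted) with hand-computed cases it must reproduce. Both `column_means` and `run_checkpoints` are hypothetical names chosen for illustration.

```python
def column_means(rows):
    """Hypothetical LLM-drafted helper: mean of each column in a list of rows."""
    if not rows:
        raise ValueError("rows must not be empty")
    n_cols = len(rows[0])
    # Checkpoint inside the function: reject ragged input early.
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    return [sum(row[i] for row in rows) / len(rows) for i in range(n_cols)]


def run_checkpoints():
    """Compare the function against small cases worked out by hand."""
    assert column_means([[1, 2], [3, 4]]) == [2.0, 3.0]
    assert column_means([[5]]) == [5.0]
    try:
        column_means([[1, 2], [3]])
    except ValueError:
        pass
    else:
        raise AssertionError("ragged input should be rejected")
    return True


if __name__ == "__main__":
    print("all checkpoints passed:", run_checkpoints())
```

Checks like these do not replace human review or a security check, but they catch the "just incorrect" cases quickly.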

52 of 99

Good ideas for using AI for code

  • Getting a basic strategy for how to write code for something
    1. How might I go about doing ______ ?
    2. How could I structure code that would do ______ ?
    3. Is it possible to create a package that does ______?
    4. What packages could I use to make code that does _______ ?

53 of 99

54 of 99

This code did NOT work at all, but the basic structure of it got me started

55 of 99

AI as a pair programmer - Demonstration using Phind

https://www.phind.com/

56 of 99

Good ideas for using AI for code

Getting a basic strategy for how to write code for something

    • How might I go about doing ______ ?
    • How could I structure code that would do ______ ?
    • Is it possible to create a package that does ______?
    • What packages could I use to make code that does _______ ?

Reviewing existing code for improvements

    • Can you tell me how I could make this code more readable?
    • Can you help fix the formatting, styling, and indent errors on this code?
    • Can you recommend how I could make this code more reproducible?

57 of 99

58 of 99

LLMs can be good at reviewing already existing code:

59 of 99

Image created by Candace Savonen using Avataars.

def my_function(x):
    result = x
    for i in range(10):
        for j in range(5):
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result

Ruby the Researcher

Wait, what is this code even?

plot-data-2020-9-11.tsv

plot-data-20-10-2020.tsv

plot-data-20-10-2020-clean.tsv

plot_final.R

plot_final_FINAL.R

plot_final_old.R

plot.py

functions.R

functions-old.R

plot-final.png

plot-new.png

60 of 99

AI is really good for understanding unfamiliar code!

It’s like having a pair programmer explain things to you.

61 of 99

Good ideas for using AI for code

  • Getting a basic strategy for how to write code for something
    • How might I go about doing ______ ?
    • How could I structure code that would do ______ ?
    • Is it possible to create a package that does ______?
    • What packages could I use to make code that does _______ ?
  • Reviewing existing code for improvements
    • Can you tell me how I could make this code more readable?
    • Can you help fix the formatting, styling, and indent errors on this code?
    • Can you recommend how I could make this code more reproducible?
  • Annotating or improving documentation
    • Can you annotate this code?
    • Can you explain to me what this code is doing?
    • Can you create a README for this code?

62 of 99

Image created by Candace Savonen using Avataars.

Ruby’s code

ERROR

Ruby’s code

Now Ruby

Future Ruby

63 of 99

Code annotation improves readability

64 of 99

Code annotation improves maintainability

65 of 99

Code annotation improves quality

66 of 99

There are many benefits to annotating code:

  • Improves readability
  • Improves maintainability
  • Improves quality

67 of 99

Ethics alert

Remember, you don’t know what an LLM does with the code you provide in your query.

Keep in mind:

  • Is the code proprietary?
  • Are you planning to commercialize the code summary? Who owns the code?
  • Does the code contain sensitive data?

68 of 99

AI as a code explainer - Demonstration using ChatGPT 3.5

https://poe.com/ChatGPT

https://chat.openai.com/

69 of 99

Query:

What does this function do?

def my_function(x):
    result = x
    for i in range(10):
        for j in range(5):
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result
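Whatever explanation an LLM gives for this function, it can be checked by hand. The sketch below is one annotated reading of it, assuming `return result` sits after both loops (the indentation is not visible in the slide text). The names `both_even` and `my_function_simplified` are added here for illustration.

```python
def my_function(x):
    """Same logic as the function in the query, with annotations added."""
    result = x
    for i in range(10):      # i runs over 0..9
        for j in range(5):   # j runs over 0..4
            # True counts as 1 and False as 0 in arithmetic.
            both_even = (i % 2 == 0 and j % 2 == 0)
            # Add 2*(i+1)*(j+1) only when i and j are both even;
            # subtract 1 on every one of the 50 iterations.
            result = result + 2 * (i + 1) * (j + 1) * both_even - 1
    return result


def my_function_simplified(x):
    """Closed form: 2 * (1+3+5+7+9) * (1+3+5) - 50 = 450 - 50 = 400."""
    return x + 400


if __name__ == "__main__":
    # Checkpoint: the simplified version must agree with the original.
    assert all(my_function(x) == my_function_simplified(x) for x in range(-5, 6))
    print(my_function(0))
```

Under that reading, the whole function is an obscure way of adding 400 to its input. A comparison like the one in the last lines is an easy way to validate an LLM's explanation or refactoring.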

70 of 99

What helpful prompts have you used to get LLMs to help you with code or programming?

71 of 99

A Good README

72 of 99

A Good README

Info that should be included in a README:

  1. General purpose of the project
  2. Instructions on how to re-run the project
  3. Lists of any software required by the project
  4. Input and output file descriptions
  5. Descriptions of any additional tools included in the project
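The five items above can be turned into a reusable outline, which is also handy for checking that an LLM-drafted README has not skipped a section. This is only a sketch: `readme_skeleton` and the section titles are illustrative choices, and the Markdown layout is an assumption.

```python
def readme_skeleton(project_name):
    """Return a Markdown README outline covering the five items listed above."""
    sections = [
        ("Purpose", "What is the general purpose of this project?"),
        ("How to re-run", "Step-by-step instructions to re-run the project."),
        ("Software requirements", "List every piece of software the project needs."),
        ("Inputs and outputs", "Describe each input and output file."),
        ("Additional tools", "Describe any additional tools included in the project."),
    ]
    lines = [f"# {project_name}", ""]
    for title, hint in sections:
        # Each hint is an HTML comment, so it stays hidden when rendered.
        lines += [f"## {title}", "", f"<!-- {hint} -->", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(readme_skeleton("my-analysis"))
```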

73 of 99

Choosing an LLM

74 of 99

Which LLM should you use?

Wow, there are already so many options!

75 of 99

AI is only as good as its training data

LLMs aren’t really “talking” to you

They are simply putting words together based on patterns learned from the training data.
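To make "patterns learned from the training data" concrete, here is a deliberately tiny toy: it counts which word follows which, then samples the next word in proportion to those counts. Real LLMs are large neural networks, not word-pair tables, so treat this only as an illustration of the idea. The names `train_bigrams` and `generate` are made up for this sketch.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts


def generate(counts, start, length, rng):
    """Repeatedly sample a next word in proportion to how often it was seen."""
    output = [start]
    for _ in range(length - 1):
        followers = counts.get(output[-1])
        if not followers:
            break  # No known continuation for this word.
        choices, weights = zip(*followers.items())
        output.append(rng.choices(choices, weights=weights)[0])
    return " ".join(output)


if __name__ == "__main__":
    model = train_bigrams("the soil has bacteria the soil has metals")
    # Different runs can print different sentences from the same start word.
    print(generate(model, "the", 4, random.Random()))
```

The toy can only ever produce word pairs it has seen, and the random sampling step is one reason the same prompt can produce different answers.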

76 of 99

AI is only as good as its training data

Biased training data

Biased LLM responses

77 of 99

The bias often lies with us

78 of 99

79 of 99

Bias exists, so it is our responsibility to make sure our work doesn’t make things worse!

80 of 99

Being aware of Bias in AI

  • Be aware of the potential biases in the data that is used to train AI systems
    • What data is this based on?
    • How inclusive would this data be?
  • Consider the possible outcomes
  • Be VERY critical of the output from LLMs
  • Call out the LLM and ask it to combat its own biases

81 of 99

Let’s strive to become more aware of the biases everywhere!

(Especially the ones that don’t affect us but do affect others!)

  • Fred Hutch Bias training
  • Inclusivity resources

82 of 99

Calling out an LLM:

Part 1

83 of 99

Calling out an LLM:

Part 2

84 of 99

Calling out an LLM:

Part 3

85 of 99

Algorithmic Justice League

Dr. Joy Buolamwini

AI needs to be:

Equitable

  • Agency and control on how individuals interact with AI
  • Affirmative consent: an individual must opt in to a service after being given all the information

Accountable

  • Meaningful transparency into the creation of an AI system and its known limitations
  • Continuous oversight by third parties as the AI evolves
  • Provide a path to contest and correct

86 of 99

Knowing that LLMs have these biases, what strategies can we use to make sure we don't perpetuate their biases when we use them?

87 of 99

Let’s talk about data privacy

Assume that nothing you give an LLM is kept private or secure*

*unless you created the LLM

  • Unclear what companies behind the LLMs do with query data
  • Limited information on how the companies secure query information

88 of 99

NEVER submit code that contains PHI or PII to AI tools
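A rough pre-flight scan can catch obvious slips before you paste code into an AI tool. The sketch below uses only three illustrative patterns (email addresses, US Social Security numbers, and US phone numbers). The names `PATTERNS` and `flag_possible_pii` are assumptions made for this example. A clean result does NOT mean the code is free of PHI or PII, so a careful human read is still required.

```python
import re

# Illustrative patterns only; real PHI/PII takes many more forms.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def flag_possible_pii(text):
    """Return (line_number, pattern_name) pairs for lines that look risky."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, name))
    return hits


if __name__ == "__main__":
    sample = 'x = 1\ncontact = "jane.doe@example.org"\n'
    for line_number, name in flag_possible_pii(sample):
        print(f"line {line_number}: possible {name}")
```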

89 of 99

Activity: Make a README with an LLM

  1. Have some code that is not as well annotated as it could be or does not have a README. (here’s some bad code from me you can use)
  2. Go to ChatGPT
  3. Ask ChatGPT to create a README
  4. Take a look at the output, then revise the prompt to describe how you’d like the README changed.
  5. Put that README with your code base!

90 of 99

Do this!

  • Be transparent about how you use LLMs wherever you use them! Say which one you used. Keep records of your prompts!
  • Look for LLM guidelines and requirements specific to your field or a particular submission or application!
  • Use LLMs for drafting emails or other kinds of correspondence, but make sure you read and edit what they have written
  • Verify the accuracy (and sources!) of LLM output
  • Feel free to give an LLM code if it does not contain PHI or PII

Do NOT do this!

  • Do not hide the fact that you’ve used an LLM for your work
  • Do not use LLMs without considering the application! (People who review what you’ve submitted can often tell!)
  • Do not use LLMs to be a journal reviewer in your stead!
  • Do not use LLMs without verifying their output!
  • NEVER submit code that contains PHI or PII! And don’t share anything with an LLM that you wouldn’t share with someone not on the project (or IRB)

91 of 99

What helpful things have you used LLMs for successfully?

92 of 99

Activity options:

  1. Make a README for code
  2. Alter or summarize a piece of writing

Steps:

  1. Try at least two different LLMs
  2. Log your queries and results in a Word or Markdown document
  3. Try multiple iterations of queries
  4. Be ready to tell the group what you did or didn’t find useful!

93 of 99

  1. What was your goal in your work with the LLM?
  2. What queries/strategies worked with the LLM?
  3. What didn’t work with the LLM?
  4. Did one LLM work better than another?

94 of 99

Bard

  • What is it really good at?
    • Most human-like interaction
    • Answers oddball questions
    • Willing to answer “I don’t know”
  • What does it struggle with?
    • Gives least amount of detail in answers
    • Has been known to give incorrect answers

ChatGPT

  • What is it really good at?
    • Most popular, which means most tested
    • Good all-around LLM
  • What does it struggle with?
    • Unlikely to change answer even when told previous answer was wrong
    • Invents citations
    • Known hallucination issues

Claude

  • What is it really good at?
    • Good all-around LLM
    • Offers specific advice when editing a writing sample for tone
    • Best understanding of clever word play
  • What does it struggle with?
    • Can sometimes require prodding to give additional detail
    • Doesn’t easily save threads at this time (but this is changing!)

Phind

  • What is it really good at?
    • Great for technical programming questions
    • Provides links to sources unprompted
    • Offers many programming options at once
  • What does it struggle with?
    • Tends to plagiarize sources directly when used for writing

95 of 99

How likely are you to use what you learned in your daily work?

96 of 99

How likely would you be to recommend this workshop?

97 of 99

What did you like most about the workshop?

98 of 99

Please share any recommendations you have for improvements.

99 of 99

Demographics Survey