1 of 24

Dataframe

2 of 24

pandas

  • pandas is a Python package for working with tabular data. It is widely used in biomedical data analysis.
  • pandas provides a dedicated data type, the DataFrame, for storing a table with labelled rows and columns.
  • https://hackmd.io/@wiimax/10-minutes-to-pandas
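
As a minimal sketch of what a DataFrame looks like, the snippet below builds one from a dictionary. The column names, row labels, and values are invented for illustration and are not from the slides.

```python
import pandas as pd

# Toy data: each dictionary key becomes a column, `index` supplies the row labels.
df = pd.DataFrame(
    {"gene": ["TP53", "BRCA1", "EGFR"], "expression": [5.2, 3.1, 8.7]},
    index=["s1", "s2", "s3"],
)

print(df)        # the table itself
print(df.shape)  # (number of rows, number of columns)
```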

3 of 24

Call indices and columns
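
The row labels and column labels of a DataFrame can be read from its `index` and `columns` attributes. A small sketch on an invented toy table:

```python
import pandas as pd

# Toy table with row labels x, y, z and columns A, B (illustrative values).
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

row_labels = df.index.tolist()       # labels of the rows
column_labels = df.columns.tolist()  # labels of the columns
values = df.to_numpy()               # the underlying values as a NumPy array

print(row_labels, column_labels)
print(values)
```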

4 of 24

Transpose
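
Transposing swaps rows and columns. A minimal sketch using the `T` attribute on an invented toy table:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

# Rows become columns and columns become rows; the original df is unchanged.
transposed = df.T

print(transposed)
```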

5 of 24

Sort
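
A table can be sorted by its labels with `sort_index` or by its values with `sort_values`. The sketch below uses an invented toy table; both methods return a new DataFrame by default.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 2], "B": [9, 7, 8]}, index=["x", "y", "z"])

# Sort rows by row label, in descending order.
by_index = df.sort_index(ascending=False)

# Sort columns by column label, in descending order.
by_columns = df.sort_index(axis=1, ascending=False)

# Sort rows by the values in column A, in ascending order.
by_value = df.sort_values(by="A")

print(by_value)
```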

6 of 24

Select rows/columns

Select columns

Select rows

Select rows and columns
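
Label-based selection can be sketched as follows, assuming an invented toy table. Square brackets select columns, while `loc` selects rows, or rows and columns together, by label.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Select columns
col_a = df["A"]           # one column, returned as a Series
cols_ab = df[["A", "B"]]  # several columns, returned as a DataFrame

# Select rows
row_x = df.loc["x"]        # one row, returned as a Series
rows_xy = df.loc["x":"y"]  # label slices include both end points

# Select rows and columns
sub = df.loc[["x", "z"], ["A", "C"]]
cell = df.loc["y", "B"]    # a single value
```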

7 of 24

Select rows/columns

Selected by position

Boolean indexing
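
A minimal sketch of the two approaches on an invented toy table: `iloc` selects by integer position (counting from 0), and a Boolean mask keeps only the rows where a condition is true.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Selected by position
first_two_rows = df.iloc[0:2]        # positions 0 and 1; the end point is excluded
cell = df.iloc[1, 2]                 # second row, third column
corner = df.iloc[[0, 2], [0, 1]]     # rows 0 and 2, columns 0 and 1

# Boolean indexing
large_a = df[df["A"] > 1]                   # rows where A is greater than 1
both = df[(df["A"] > 1) & (df["C"] < 9)]    # combine conditions with & and parentheses
```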

8 of 24

Re-assign values

By index and column

By matching value
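
Values can be overwritten by assigning through `loc` or `iloc`. The sketch below, on an invented toy table, changes single cells by label and by position, then changes every value that matches a condition.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["x", "y", "z"],
)

# By index and column
df.loc["x", "A"] = 10   # by label
df.iloc[1, 1] = 50      # by position: second row, second column

# By matching value: set C to 0 wherever C is greater than 7
df.loc[df["C"] > 7, "C"] = 0

print(df)
```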

9 of 24

Exercise

  • We had an exam on the Python language. The scores are shown below:

Student ID   Q1   Q2   Q3   Q4
M0001        25   23   20   22
M0002        15   22   23   17
M0003        15   19   25   25
M0004        13   20   22   25
M0005        23   23   20   15

  • List the students who scored higher than 20 points on Q1.
  • The Q4 score for M0005 was mistyped; the student actually scored 25. Re-assign the score.
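
One possible approach to this exercise, offered as a sketch rather than the official answer. It assumes the table is entered by hand with the student IDs as the row labels; the variable names are illustrative.

```python
import pandas as pd

# Scores copied from the table above, with Student ID as the index.
scores = pd.DataFrame(
    {
        "Q1": [25, 15, 15, 13, 23],
        "Q2": [23, 22, 19, 20, 23],
        "Q3": [20, 23, 25, 22, 20],
        "Q4": [22, 17, 25, 25, 15],
    },
    index=["M0001", "M0002", "M0003", "M0004", "M0005"],
)

# Boolean indexing: students who scored higher than 20 on Q1.
high_q1 = scores[scores["Q1"] > 20].index.tolist()
print(high_q1)

# Re-assign the mistyped Q4 score of M0005 by index and column.
scores.loc["M0005", "Q4"] = 25
print(scores)
```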

10 of 24

11 of 24

Add rows/columns

Add a row

Add a column
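
A minimal sketch on an invented toy table: assigning to a new row label through `loc` adds a row, and assigning to a new column name adds a column.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Add a row: one value per existing column, under a new row label.
df.loc["z"] = [5, 6]

# Add a column: here computed from the existing columns.
df["C"] = df["A"] + df["B"]

print(df)
```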

12 of 24

Remove the rows/columns

Remove rows

Remove columns
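
Rows and columns can be removed with `drop`. In this sketch, on an invented toy table, `drop` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Remove rows by label.
without_rows = df.drop(index=["x", "z"])

# Remove columns by label.
without_columns = df.drop(columns=["B"])

print(without_rows)
print(without_columns)
```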

13 of 24

Rename columns/indices and reset index
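
A short sketch on an invented toy table: `rename` takes old-to-new label mappings, and `reset_index` replaces the row labels with the default 0, 1, 2, ... numbering.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Rename selected column and row labels; unlisted labels are kept.
renamed = df.rename(columns={"A": "alpha"}, index={"x": "first"})

# Reset the index; the old labels are kept in a new column named "index".
reset_kept = df.reset_index()

# Reset the index and discard the old labels.
reset_dropped = df.reset_index(drop=True)
```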

14 of 24

Manage missing values

Missing values
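
pandas marks missing values as `NaN`. The sketch below, on an invented toy table, shows three common operations: detecting missing values, dropping rows that contain them, and filling them with a fixed value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# Detect: True where a value is missing, then count per column.
missing_per_column = df.isna().sum()

# Drop every row that contains at least one missing value.
complete_rows = df.dropna()

# Replace missing values with 0 (the choice of 0 is only for illustration).
filled = df.fillna(0)

print(missing_per_column)
print(complete_rows)
print(filled)
```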

15 of 24

Merge dataframe
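
Two ways of combining tables, sketched on invented toy data: `pd.merge` joins tables on a shared key column, and `pd.concat` stacks tables that have the same columns. The `id` column name is an assumption made for this example.

```python
import pandas as pd

left = pd.DataFrame({"id": ["M0001", "M0002", "M0003"], "Q1": [25, 15, 15]})
right = pd.DataFrame({"id": ["M0002", "M0003", "M0004"], "Q2": [22, 19, 20]})

# Inner join (the default): keep only ids present in both tables.
inner = pd.merge(left, right, on="id")

# Outer join: keep all ids; scores missing from one table become NaN.
outer = pd.merge(left, right, on="id", how="outer")

# Stack rows of tables that share the same columns.
extra = pd.DataFrame({"id": ["M0004"], "Q1": [13]})
stacked = pd.concat([left, extra], ignore_index=True)
```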

16 of 24

Simple statistics

columns
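
By default, summary methods such as `mean` and `sum` work down each column and return one value per column. A sketch on an invented toy table:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

column_means = df.mean()  # one mean per column
column_sums = df.sum()    # one sum per column
summary = df.describe()   # count, mean, std, min, quartiles, max per column

print(column_means)
print(summary)
```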

17 of 24

Simple statistics

rows
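
Passing `axis=1` makes the same methods work across each row instead, returning one value per row. A sketch on an invented toy table:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

row_means = df.mean(axis=1)  # one mean per row
row_sums = df.sum(axis=1)    # one sum per row

print(row_means)
print(row_sums)
```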

18 of 24

Simple statistics

A specific column
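
Selecting a single column first gives a Series, and the same methods then return a single number. A sketch on an invented toy table:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

mean_a = df["A"].mean()
max_a = df["A"].max()
median_a = df["A"].median()
std_a = df["A"].std()  # sample standard deviation (divides by n - 1)

print(mean_a, max_a, median_a, std_a)
```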

19 of 24

Exercise

  • The same exam scores are shown below:

Student ID   Q1   Q2   Q3   Q4
M0001        25   23   20   22
M0002        15   22   23   17
M0003        15   19   25   25
M0004        13   20   22   25
M0005        23   23   20   15

  • Add a new column that stores the total score of each student, and a new row that stores the average score of each question.
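
One possible approach to this exercise, offered as a sketch rather than the official answer. It uses the scores exactly as printed in the table above; the labels `Sum` and `Average` are names chosen for this example.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "Q1": [25, 15, 15, 13, 23],
        "Q2": [23, 22, 19, 20, 23],
        "Q3": [20, 23, 25, 22, 20],
        "Q4": [22, 17, 25, 25, 15],
    },
    index=["M0001", "M0002", "M0003", "M0004", "M0005"],
)

# New column: total score per student (sum across the four questions).
scores["Sum"] = scores[["Q1", "Q2", "Q3", "Q4"]].sum(axis=1)

# New row: average of each column. Computed after adding "Sum", so the
# row also holds the average total score.
scores.loc["Average"] = scores.mean()

print(scores)
```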

20 of 24

21 of 24

Read file
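
A minimal sketch of reading a CSV file with `pd.read_csv`. To stay self-contained, it first writes a small file into a temporary directory; the file name `scores.csv` and its contents are invented for illustration.

```python
import os
import tempfile

import pandas as pd

csv_text = "Student ID,Q1,Q2\nM0001,25,23\nM0002,15,22\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")
    with open(path, "w") as handle:
        handle.write(csv_text)

    # index_col=0 uses the first column as the row labels.
    # For a tab-delimited file, pass sep="\t" as well.
    df = pd.read_csv(path, index_col=0)

print(df)
```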

22 of 24

Write file
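
A minimal sketch of writing a DataFrame to a CSV file with `to_csv`. The table and the file name `scores.csv` are invented for illustration, and the file is written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Q1": [25, 15]}, index=["M0001", "M0002"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")

    # The row labels are written as the first column by default.
    df.to_csv(path)

    with open(path) as handle:
        written_lines = handle.read().splitlines()

# With no path, to_csv returns the text; index=False leaves out the row labels.
without_index = df.to_csv(index=False).splitlines()

print(written_lines)
```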

23 of 24

Iteration

Iterate columns

Iterate rows
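
A sketch of both loops on an invented toy table: `items()` yields each column name with its column, and `iterrows()` yields each row label with its row. For large tables, built-in methods such as `df.sum()` are usually much faster than explicit loops.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Iterate columns: (column name, column as a Series)
column_totals = {}
for name, column in df.items():
    column_totals[name] = int(column.sum())

# Iterate rows: (row label, row as a Series)
row_totals = {}
for label, row in df.iterrows():
    row_totals[label] = int(row.sum())

print(column_totals)
print(row_totals)
```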

24 of 24

Exercise

  • Log-transform the gene expression values in GSE166046.

https://drive.google.com/file/d/1q_OBMRDod_tfwEthFpnTHSXZacyaytCS/view?usp=drive_link
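
A hedged sketch of one common way to log-transform an expression table. The layout of the GSE166046 file is not described here, so this example uses an invented toy table (genes as rows, samples as columns) in place of the real data. The base-2 logarithm and the pseudocount of 1, added so that zero values stay defined, are assumptions and not requirements of the exercise.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the expression table; the real file would be loaded
# with pd.read_csv using arguments that match its actual format.
expression = pd.DataFrame(
    {"sample_1": [0, 3, 7], "sample_2": [1, 15, 31]},
    index=["gene_a", "gene_b", "gene_c"],
)

# log2(x + 1) applied to every value; labels are preserved.
log_expression = np.log2(expression + 1)

print(log_expression)
```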