Download long.csv (see “CSV format” under “Files used in the examples”) long <- read.csv('long.csv') long First Last Age Group Type Plant 1 John Sims 30 A Veggie Artichoke 2 Mary Sims 25 A Veggie Asparagus 3 John Kent 40 A Veggie 4 John Sims 30 B Veggie Broccoli 5 Mary Sims 25 B Veggie 6 John Kent 40 B Veggie 7 John Sims 30 C Veggie Carrot 8 Mary Sims 25 C Veggie Celery 9 John Kent 40 C Veggie 10 John Sims 30 A Fruit Apple 11 Mary Sims 25 A Fruit Apricot 12 John Kent 40 A Fruit Avocado 13 John Sims 30 B Fruit Banana 14 Mary Sims 25 B Fruit 15 John Kent 40 B Fruit Blueberry 16 John Sims 30 C Fruit Cherry 17 Mary Sims 25 C Fruit Cranberry 18 John Kent 40 C Fruit Clementine Option 1: Transform using pivot_table |
import pandas as pd # Set index as variables we won’t transform # aggfunc = ‘first’ means that we take the first observation per group. long.pivot_table(index=['First','Last','Group','Age'], columns='Type', values='Plant', aggfunc='first').reset_index().rename_axis(None, axis=1) # Rows with NaN in both types are removed automatically First Last Group Age Fruit Veggie 0 John Kent A 40 Avocado NaN 1 John Kent B 40 Blueberry NaN 2 John Kent C 40 Clementine NaN 3 John Sims A 30 Apple Artichoke 4 John Sims B 30 Banana Broccoli 5 John Sims C 30 Cherry Carrot 6 Mary Sims A 25 Apricot Asparagus 7 Mary Sims C 25 Cranberry Celery |
# Specify the level of index to unstack. For mixed left, the index is ‘Type’, which is level=4 mixedL = long.set_index(['First','Last','Age','Group','Type']).unstack(level=4) mixedL.reset_index() First Last Age Group Plant Fruit Veggie 0 John Kent 40 A Avocado NaN 1 John Kent 40 B Blueberry NaN 2 John Kent 40 C Clementine NaN 3 John Sims 30 A Apple Artichoke 4 John Sims 30 B Banana Broccoli 5 John Sims 30 C Cherry Carrot 6 Mary Sims 25 A Apricot Asparagus 7 Mary Sims 25 B NaN NaN 8 Mary Sims 25 C Cranberry Celery |
Functions referenced:
More information:
Data reshaping with pandas explained
--- The End ---