2 of 36

This session will cover

The software
How to get data in
Cleaning data
Calculating descriptive statistics
Making graphs

3 of 36

The Software

Installing Stata

The interface

.do files

4 of 36

Installing Stata

Stata is installed on all university IT-managed computers on campus.
The University has a licence that lets you install Stata on your personal laptop if you are in one of the departments in the Social Science Faculty.
To install, go to the IT software Stata pages

5 of 36

Stata Interface

Open up Stata
Results window (centre)
Review window (left)
Menus (top)
Command window (bottom)
The Results window shows each command you run, followed by its output
Separate windows will appear for viewing data, plotting graphs, and writing commands

6 of 36

.do Files

Type your commands and comments in a .do file.
They are a record of what you have done. (Using the menus instead of commands leaves no such record.)
To open a .do file either:

Click the pencil-and-pad icon for a new file or
File > Open and navigate to a previously saved .do file

Open the Google Doc and copy and paste the commands into the .do file
Remember to save your .do file - this is not done automatically

7 of 36

Getting started

Using commands

Importing and viewing data

Basic commands

8 of 36

Running commands

To run a command from the .do file, highlight it and press Ctrl+D:

display "hello"
display 4 + 4

To add a comment use * or //
Note the use of /* and */ for a multiple line comment.
Use help followed by a command name (e.g. help display) if you want more guidance on a command

9 of 36

Command notation

Variable names are case sensitive
* or // can be used to create a comment
= assigns a value
== is equal to
!= is not equal to
< less than
<= less than or equal to
> greater than
>= greater than or equal to
& and
| or
! not
if can be used to restrict a command to the observations that meet a condition
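As a rough illustration of these operators, the sketch below uses auto, an example dataset that ships with Stata, rather than the workshop data. The variables price and foreign belong to that dataset and are not part of this session's files.

```stata
* Load a built-in example dataset (not the workshop data)
sysuse auto, clear

* == tests equality; & requires both conditions to hold
count if foreign == 1 & price > 10000

* | requires at least one condition to hold
count if price < 4000 | price > 12000

* != and ! both express negation
count if foreign != 1
count if !(price >= 5000)
```

Each count reports how many observations satisfy the condition after if.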

10 of 36

Importing data

You need to tell Stata what drive and folder the data file is in
Use the command cd to change the directory (i.e. the drive and folder)

cd "___DIRECTORY PATH___"

Read in the data

import excel using Name_Of_File.xlsx, firstrow

firstrow specifies that the first row of the spreadsheet holds the variable names
You can also use the menu options - File > Import > Excel
You can read in other types of data (.csv, .sav, etc)
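For a .csv file, one possible approach is import delimited. The sketch below first writes a small .csv from Stata's built-in auto dataset so that it can run on its own; the file name auto_example.csv is made up for illustration. SPSS .sav files have their own command, import spss, in recent versions of Stata.

```stata
* Create a small .csv file to read back in (for illustration only)
sysuse auto, clear
export delimited using "auto_example.csv", replace

* varnames(1) plays the role of firstrow: row 1 holds the variable names
import delimited using "auto_example.csv", varnames(1) clear
describe
```

The clear option discards the data currently in memory before the new file is read.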

11 of 36

Viewing the data

Variables window (top right)
Properties window (bottom right)
Click through the variables to check them
To see the data, click the grid-with-magnifier icon (top)
To edit the data, click the grid-with-pencil icon (top) (not recommended)

12 of 36

Some basic commands

View your data:

browse

Check the dataset using describe

describe

Or use describe to check one variable in particular

describe gdp

To display the values of variables

list

13 of 36

Cleaning data

Naming and labelling variables

Replacing values

Creating new variables

14 of 36

Naming and labelling

To change the name of the variable

rename debtgdpratio ratio

Stata defaults to using the variable name as the label
To change a label

label variable nameofvariabletochange "Text to put in as label"

Run the commands in the do file to change the labels
Note: variable can be shortened to var

15 of 36

Replace a data point

In Stata a missing numeric value is denoted by a full stop (.)

Browse data to check this
The gdp for the UK is missing

If you wanted to change a piece of data

replace gdp = 2631.23 if country == "United Kingdom"

16 of 36

Create a new variable

You might want to use the data in your current variables to create a new variable

For example, you could create a variable that separates European from non-European countries: 0 for non-European and 1 for European.

Create a new column, with a name, filled in with 0s.

generate european = 0

Change the value to 1 where the variable region is Europe

replace european = 1 if region == "Europe"

17 of 36

Number code a variable: Method 1

Convert string variables into numbers - this makes it easier for the software to identify groupings and reduces the risk of human error
For example 1 for Europe, 2 for Asia, etc.

gen regioncode = 0
replace regioncode = 1 if region == "Europe"
replace regioncode = 2 if region == "Asia"
replace regioncode = 3 if region == "North America"
replace regioncode = 4 if region == "Oceania"
replace regioncode = 5 if region == "South America"

Add a value label to record what these codes mean

label define myvaluelabel 1 "Europe" 2 "Asia" 3 "North America" 4 "Oceania" 5 "South America"
label values regioncode myvaluelabel

18 of 36

Number code a variable: Method 2

Similar number coding can be achieved with

encode region, gen(regioncode2)

The new variable has the storage type long, which is numeric, so Stata will treat it as a number. The original text is kept as value labels.
Note that the groupings are ordered automatically alphabetically
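To see which number encode has given each group, one option is to list the value label it creates. The sketch below uses a few invented rows rather than the workshop data, so it can be run on its own.

```stata
* Toy data, invented for illustration
clear
input str15 region
"Europe"
"Asia"
"Oceania"
"Asia"
end

encode region, gen(regioncode2)

* Show which number stands for which region
label list regioncode2

* Display the underlying numbers rather than the labels
list region regioncode2, nolabel
```

Because the codes follow alphabetical order, Asia is coded 1 and Europe 2 here, which differs from the hand-made coding in Method 1.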

19 of 36

Calculate a new variable

To calculate a new variable based on the values in current variables use generate or gen
For example, if you wanted to calculate each country’s debt as a percentage of the total debt

First calculate the total debt

total debt

Then you can use that total debt to calculate the percent of debt for each country in a new variable.

gen debtpercent = (debt/45498.86)*100
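Typing the total in by hand works, but it has to be updated whenever the data change. As a sketch of one alternative, summarize stores the sum of a variable in r(sum), which can be used directly. The three rows below are invented for illustration.

```stata
* Toy data, invented for illustration
clear
input str10 country debt
"A" 10
"B" 30
"C" 60
end

* summarize leaves the sum of the variable in r(sum)
summarize debt

* Use the stored total instead of typing it in by hand
gen debtpercent = (debt / r(sum)) * 100
list
```

The gen line needs to come straight after summarize, because the next command that stores results will overwrite r(sum).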

20 of 36

Calculate a variable from multiple variables

You can use information from several variables to create a new variable.
You can also use criteria to select information
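For example, use region and gdp to code countries as 1 for European countries with gdp above $1000 million, 2 for European countries with gdp below $1000 million, or 3 for all other countries.

gen gdpeurope = 3
* Stata treats a missing gdp as larger than any number, so exclude missing values here
replace gdpeurope = 1 if region == "Europe" & gdp > 1000 & !missing(gdp)
replace gdpeurope = 2 if region == "Europe" & gdp < 1000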
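One thing to watch with conditions such as gdp > 1000 is that Stata treats a missing numeric value as larger than any number, so the condition is also true when gdp is missing. The sketch below shows this on three invented rows.

```stata
* Toy data, invented for illustration; "." marks a missing gdp
clear
input str10 country gdp
"A" 500
"B" 2000
"C" .
end

* Counts B and also C, because missing is treated as larger than any number
count if gdp > 1000

* Counts B only
count if gdp > 1000 & !missing(gdp)
```

Adding & !missing(gdp) to a greater-than condition keeps countries with no gdp figure out of the "above" group.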

21 of 36

Selecting variables

If you want to exclude a variable

drop regioncode2

If you want to exclude some rows/observations

drop in 1/4

Note: drop removes the data from memory and there is no undo!
If you want to only work with certain variables

keep country gdp ratio debt region income

This will delete the other variables, and is permanent!

22 of 36

Save your data as a .dta file

.dta files are Stata’s own data file format
Save the data as a .dta file e.g.

save "M:/Stata Workshop/gdpdebt.dta"

Next time you can read it in as a .dta file using something like

use "M:/Stata Workshop/gdpdebt.dta"
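Two options are often needed alongside these commands: replace on save, and clear on use. The sketch below uses the built-in auto dataset and a made-up file name, auto_copy.dta, saved in the current directory.

```stata
* Load a built-in example dataset (not the workshop data)
sysuse auto, clear

* replace allows save to overwrite an existing file of the same name
save "auto_copy.dta", replace

* clear allows use to discard unsaved data currently in memory
use "auto_copy.dta", clear
```

Without replace, save stops with an error if the file already exists, which protects against overwriting a file by accident.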

23 of 36

Calculating descriptive statistics

Summarising data

Temporarily selecting data

Creating frequency tables

24 of 36

Run summary descriptive statistics

To calculate the number of observations, mean, standard deviation, minimum and maximum values use summarize (with a z) or sum
For all variables

summarize

You can just summarise one variable

sum gdp

You can get descriptive statistics according to criteria or groups

sum if region == "Asia"
sum gdp if gdp < 1000
bysort region : sum gdp

25 of 36

Temporarily selecting data

The preserve command remembers the dataset while you select a portion of it to analyse. You can then recall the preserved data.

preserve
keep if region == "Europe"
summarize gdp
restore, preserve
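The same pattern can be tried on the built-in auto dataset, as sketched below with its variables foreign and price. This version uses plain restore, which brings the data back once; the preserve option on restore additionally keeps the preserved copy so that it can be restored again later.

```stata
* Load a built-in example dataset (not the workshop data)
sysuse auto, clear

* Remember the full dataset
preserve

* Work with a subset only
keep if foreign == 1
summarize price

* Bring the full dataset back
restore

* All observations are available again
count
```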

26 of 36

Creating a frequency table

Use table or tab to make a frequency table - also known as a contingency table or crosstabs
You can run these commands with one or two categorical variables

tab region
tab region income

tab is not useful for continuous variables - it gives you a table with every value listed

tab region gdp

tab region gdp
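When one variable is categorical and the other is continuous, one possible alternative is the summarize() option of tabulate, which reports the mean, standard deviation and frequency within each group. The sketch below uses the built-in auto dataset; foreign and price are its variables, standing in for region and gdp.

```stata
* Load a built-in example dataset (not the workshop data)
sysuse auto, clear

* Mean, standard deviation and frequency of price within each category of foreign
tabulate foreign, summarize(price)
```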

27 of 36

Making graphs

Types of graphs

Different bar graphs

Changing how the graph looks

28 of 36

Types of graphs

If you wanted to check for normality, create a histogram

hist ratio

A boxplot of gdp

graph box gdp

A scatterplot

scatter gdp debt

29 of 36

Different bar graphs

A bar graph

graph bar gdp, over (region)

A horizontal bargraph

graph hbar gdp debt

A horizontal stacked bargraph by region or country

graph hbar gdp debt, stack
graph hbar gdp debt, over (region) stack
graph hbar gdp debt, over (country) stack

30 of 36

Label axes

Start with a bar graph

graph bar gdp, over (region)

Add some x and y axes labels

graph bar gdp, over (region) b1title(Regions of Countries in the Dataset) ytitle(Mean GDP)

31 of 36

Change the colours

To make all the bars the same colour

graph bar gdp, over (region) bar(1, color (green))

To make each bar a different colour

graph bar gdp, asyvars over(region) bar(1, color (green)) bar(2, color(red)) bar(3, color(blue)) bar(4, color(orange)) bar(5, color(yellow))

Or use a preset scheme such as:

s1mono
economist
s2color

graph bar gdp, asyvars over(region) scheme(economist)

32 of 36

Save your graph

With the graph still open

graph export "My Graph.png"
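If a file of the same name already exists, graph export stops with an error unless the replace option is added. A minimal sketch using the built-in auto dataset and a made-up file name, price_hist.png:

```stata
* Load a built-in example dataset (not the workshop data)
sysuse auto, clear

* Draw a graph so that there is one open to export
histogram price

* replace overwrites an existing file of the same name
graph export "price_hist.png", replace
```

The file is written to the current working directory unless a full path is given.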

33 of 36

Summary

Today, we have looked at:

The software
How to get data in
Cleaning data
Calculating descriptive statistics
Making graphs

34 of 36

Resources

Academic Skills Community Workshops
One-to-one appointments with a maths or statistics tutor
Digital Skills Training (including Library skills)
Maths Skills Centre Practical Guides
Statology Stata Guides
Stata Cheat Sheets

35 of 36

Maths Skills Centre

One-to-one appointments available to all students during term-time and vacation
Personalised advice and guidance on maths and statistics topics
Online and on campus workshops covering a range of topics including statistical tests and software


36 of 36

Any questions?

Please book a one-to-one statistics appointment if you have questions you would like to discuss with a statistics tutor.

If you have any feedback you would like to share with us, please fill in this feedback form.