1 of 36

Getting Started with STATA

2 of 36

This session will cover

  • The software
  • How to get data in
  • Cleaning data
  • Calculating descriptive statistics
  • Making graphs

3 of 36

The Software

Installing STATA

The interface

.do files

4 of 36

Installing STATA

  • Stata is installed on all university IT-managed computers on campus.
  • The University has a licence that lets you install Stata on your personal laptop if you are in one of the departments in the Social Science Faculty
  • To install, go to the IT software Stata pages

5 of 36

STATA Interface

  • Open up STATA
  • Results window (centre)
  • Review window (left)
  • Menus (top)
  • Command window (bottom)
  • The Results window repeats each command you run, followed by its output
  • Separate windows will appear for viewing data, plotting graphs, and writing commands

6 of 36

.do Files

  • Type your commands and comments in a .do file.
  • They are a record of what you have done. (There would be no record if you use the menus instead of commands.)
  • To open a .do file either:
    • Click the pencil-and-pad icon for a new file or
    • File > Open and navigate to a previously saved .do file
  • Open the Google Doc and copy and paste the commands into the .do file
  • Remember to save your .do file - this is not done automatically

7 of 36

Getting started

Using commands

Importing and viewing data

Basic commands

8 of 36

Running commands

  • To run a command from a .do file, highlight it and press Ctrl + D:
    • display "hello"
    • display 4 + 4
  • To add a comment use * or //
  • Use /* and */ for a comment that spans multiple lines.
  • Use help followed by a command name (e.g. help display) if you want more guidance on a command
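
The three comment styles can be sketched in a short .do file. This is a minimal illustration only. Note that // and /* */ comments are meant for .do files, while * also works in the Command window.

```stata
* A whole-line comment starts with an asterisk
display "hello"   // a comment can follow a command after two slashes

/* A block comment
   can span several lines */
display 4 + 4
```

Leave a space before // when it follows a command on the same line.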

9 of 36

Command notation

  • Variable names are case sensitive
  • * or // (not #) can be used to create a comment
  • = assigns a value
  • == is equal to
  • != is not equal to
  • < less than
  • <= less than or equal to
  • > greater than
  • >= greater than or equal to
  • & and
  • | or
  • ! not
  • if can be used to restrict a command to observations that meet a condition
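
As a quick check of the notation above, the operators can be tried with display, which evaluates an expression and prints 1 for true and 0 for false. The numbers are arbitrary and chosen only for illustration.

```stata
display 4 == 4               // 1 (true)
display 4 != 4               // 0 (false)
display (3 < 5) & (2 >= 7)   // 0: both parts must be true
display (3 < 5) | (2 >= 7)   // 1: at least one part is true
display !(3 < 5)             // 0: not true
```

A common slip is writing = where == is needed: a single = assigns a value, while == tests for equality inside an if condition.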

10 of 36

Importing data

  • You need to tell Stata what drive and folder the data file is in
  • Use the command cd to change the directory (i.e. the drive and folder)
    • cd "___DIRECTORY PATH___"
  • Read in the data
    • import excel using Name_Of_File.xlsx, firstrow
  • The firstrow option specifies that the first row of the spreadsheet contains the variable names
  • You can also use the menu options - File > Import > Excel
  • You can read in other types of data (.csv, .sav, etc)
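
For .csv files the matching command is import delimited. The sketch below writes a tiny made-up dataset to a hypothetical file, toy_gdp.csv, in the current directory and reads it back, so it can be run without any existing file. The values are invented for illustration. For SPSS .sav files, Stata 16 and later provide import spss.

```stata
* Build a toy dataset (made-up values) and save it as a .csv file
clear
input str20 country gdp
"France" 2800
"Japan" 5000
"Brazil" 1800
end
export delimited using "toy_gdp.csv", replace

* Read the .csv file back in; clear discards the data currently in memory
import delimited using "toy_gdp.csv", clear
describe
```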

11 of 36

Viewing the data

  • Variables window (top right)
  • Properties window (bottom right)
  • Click through the variables to check them
  • To see data click grid with magnifier (top)
  • To edit data click grid with pencil (top) (not recommended)

12 of 36

Some basic commands

  • View your data:
    • browse
  • Check the dataset using describe
    • describe
  • Or use describe to check one variable in particular
    • describe gdp
  • To display the values of variables
    • list

13 of 36

Cleaning data

Naming and labelling variables

Replacing values

Creating new variables

14 of 36

Naming and labelling

  • To change the name of the variable
    • rename debtgdpratio ratio
  • Stata defaults to using the variable name as the label
  • To change a label
    • label variable nameofvariabletochange "Text to put in as label"
  • Run the commands in the .do file to change the labels
  • Note: variable can be shortened to var

15 of 36

Replace a data point

  • In Stata a missing data point is denoted by a .
    • Browse data to check this
    • The gdp for the UK is missing
  • If you wanted to change a piece of data
    • replace gdp = 2631.23 if country == "United Kingdom"
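
Rather than browsing, missing values can also be found with the missing() function. The sketch below uses a two-row toy dataset. The France figure is made up, and the UK value is the one from the slide.

```stata
clear
input str20 country gdp
"France" 2800
"United Kingdom" .
end

count if missing(gdp)            // how many observations lack a gdp value
list country if missing(gdp)     // which ones they are

replace gdp = 2631.23 if country == "United Kingdom"
count if missing(gdp)            // none remain after the replacement
```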

16 of 36

Create a new variable

  • You might want to use the data in your current variables to create a new variable
    • For example you could create a variable that separates European and non-European countries: 0 for non-European and 1 for European
  • Create a new column, with a name, filled in with 0s.
    • generate european = 0
  • Fill in with 1s if the variable region is equal to Europe
    • replace european = 1 if region == "Europe"

17 of 36

Number code a variable: Method 1

  • Convert string variables into numbers - this makes it easier for the software to identify groupings and reduces the risk of human error
  • For example 1 for Europe, 2 for Asia, etc.
    • gen regioncode = 0
    • replace regioncode = 1 if region == "Europe"
    • replace regioncode = 2 if region == "Asia"
    • replace regioncode = 3 if region == "North America"
    • replace regioncode = 4 if region == "Oceania"
    • replace regioncode = 5 if region == "South America"
  • Add a value label to note what these codes mean
    • label define myvaluelabel 1 "Europe" 2 "Asia" 3 "North America" 4 "Oceania" 5 "South America"
    • label values regioncode myvaluelabel

18 of 36

Number code a variable: Method 2

  • Similar number coding can be achieved with
    • encode region, gen(regioncode2)
  • The new variable has the storage type long, so Stata treats it as a number; the original text is kept as value labels.
  • Note that the groupings are automatically numbered in alphabetical order
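
To see which number encode gave each group, list the value label it created. By default the label takes the same name as the new variable. A minimal sketch on made-up rows:

```stata
clear
input str15 region
"Europe"
"Asia"
"Oceania"
"Asia"
end

encode region, generate(regioncode2)
label list regioncode2              // codes assigned alphabetically
list region regioncode2, nolabel    // show the underlying numbers
```

Because the codes follow alphabetical order, they will generally differ from the hand-assigned codes in Method 1.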

19 of 36

Calculate a new variable

  • To calculate a new variable based on the values in current variables use generate or gen
  • For example if you wanted to calculate the percent of debt for each country
    • First calculate the total debt
      • total debt
    • Then you can use that total debt to calculate the percent of debt for each country in a new variable.
      • gen debtpercent = (debt/45498.86)*100
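
Typing the total in by hand can be avoided: summarize stores the sum of the variable in r(sum), which can be used directly in the next command. A minimal sketch, assuming a toy dataset with made-up debt values:

```stata
clear
input str20 country debt
"France" 2700
"Japan" 9000
"Brazil" 1500
end

summarize debt
generate debtpercent = (debt / r(sum)) * 100
list country debt debtpercent
```

This way the percentages stay correct if the data change, because the total is recalculated each time the .do file runs.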

20 of 36

Calculate a variable from multiple variables

  • You can use information from several variables to create a new variable.
  • You can also use criteria to select information
  • For example use region and gdp to code countries as 1 for European countries with gdp above $1000 million, 2 for European countries with gdp below $1000 million, and 3 for all other countries.
    • gen gdpeurope = 3
    • replace gdpeurope = 1 if region == "Europe" & gdp > 1000
    • replace gdpeurope = 2 if region == "Europe" & gdp < 1000
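
One point to watch with conditions like gdp > 1000: Stata treats a missing value as larger than any number, so an observation with missing gdp would satisfy it. The sketch below, on made-up values, shows one possible way to guard against this. Setting the code to missing in that case is an assumption about what is wanted, not part of the original exercise.

```stata
display . > 1000     // 1: missing counts as larger than any number

clear
input str20 country str10 region gdp
"France" "Europe" 2800
"Malta" "Europe" 15
"United Kingdom" "Europe" .
"Japan" "Asia" 5000
end

generate gdpeurope = 3
replace gdpeurope = 1 if region == "Europe" & gdp > 1000 & !missing(gdp)
replace gdpeurope = 2 if region == "Europe" & gdp < 1000
replace gdpeurope = . if missing(gdp)    // assumption: unknown gdp gets no code
list
```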

21 of 36

Selecting variables

  • If you want to exclude a variable
    • drop regioncode2
  • If you want to exclude some rows/observations
    • drop in 1/4
  • Note drop is permanent!
  • If you want to only work with certain variables
    • keep country gdp ratio debt region income
  • This will delete the other variables, and is permanent!

22 of 36

Save your data as a .dta file

  • .dta files are Stata’s own data file format
  • Save the data as a .dta file (put the path in quotes because it contains a space), e.g.
    • save "M:/Stata Workshop/gdpdebt.dta"
  • Next time you can read it in as a .dta file using something like
    • use "M:/Stata Workshop/gdpdebt.dta"
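
Two options are often needed in practice: save will not overwrite an existing file unless replace is added, and use will not load a file over unsaved data unless clear is added. A minimal sketch, using a hypothetical file name in the current directory and made-up values:

```stata
clear
input str20 country gdp
"France" 2800
"Japan" 5000
end

save "gdpdebt_demo.dta", replace    // overwrite the file if it already exists
use "gdpdebt_demo.dta", clear       // discard data in memory and load the file
describe
```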

23 of 36

Calculating descriptive statistics

Summarising data

Temporarily selecting data

Creating frequency tables

24 of 36

Run summary descriptive statistics

  • To calculate the number of observations, mean, standard deviation, minimum and maximum values use summarize (with a z) or sum
  • For all variables
    • sum
  • You can just summarise one variable
    • sum gdp
  • You can get descriptive statistics according to criteria or groups
    • sum if region == "Asia"
    • sum gdp if gdp < 1000
    • bysort region : sum gdp

25 of 36

Temporarily selecting data

  • The preserve command remembers the dataset while you select a portion of it to analyse. You can then recall the preserved data.
    • preserve
    • keep if region == "Europe"
    • summarize gdp
    • restore, preserve
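
The sequence can be checked on a toy dataset (made-up values). Plain restore brings the full data back and discards the preserved copy, whereas restore, preserve brings it back and keeps the copy for later. Run the lines together, because data preserved inside a .do file are restored automatically when that run finishes.

```stata
clear
input str20 country str10 region gdp
"France" "Europe" 2800
"Malta" "Europe" 15
"Japan" "Asia" 5000
"India" "Asia" 3000
end

preserve
keep if region == "Europe"
summarize gdp        // statistics for the two European rows only
restore
count                // back to all four observations
```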

26 of 36

Creating a frequency table

  • Use tabulate or tab to make a frequency table - a two-way table is also known as a contingency table or crosstabs
  • You can run these commands with one or two categorical variables
    • tab region
    • tab region income
  • Avoid running tab on a continuous variable such as gdp - it gives you a table with each value listed which is not useful
    • tab region gdp
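
Two possible alternatives for a continuous variable are sketched below on made-up values: group it first and tabulate the groups, or ask tabulate for summary statistics within each category. The variable name highgdp and the cut-off of 1000 are illustrative choices.

```stata
clear
input str15 region gdp
"Europe" 2800
"Europe" 15
"Asia" 5000
"Asia" 3000
end

* Option 1: turn gdp into two groups, then cross-tabulate
generate highgdp = gdp > 1000 if !missing(gdp)
tabulate region highgdp

* Option 2: mean, standard deviation and frequency of gdp by region
tabulate region, summarize(gdp)
```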

27 of 36

Making graphs

Types of graphs

Different bar graphs

Changing how the graph looks

28 of 36

Types of graphs

  • If you wanted to check for normality, create a histogram
    • hist ratio
  • A boxplot of gdp
    • graph box gdp
  • A scatterplot
    • scatter gdp debt
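
For the normality check, histogram has a normal option that overlays a normal density curve on the bars. The sketch below uses simulated data, since it is only meant to show the option. The mean of 60 and standard deviation of 15 are arbitrary.

```stata
clear
set seed 12345
set obs 200
generate ratio = rnormal(60, 15)   // simulated values, not real data

histogram ratio, normal            // bars plus a normal curve for comparison
```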

29 of 36

Different bar graphs

  • A bar graph
    • graph bar gdp, over(region)
  • A horizontal bar graph
    • graph hbar gdp debt
  • A horizontal stacked bar graph by region or country
    • graph hbar gdp debt, stack
    • graph hbar gdp debt, over(region) stack
    • graph hbar gdp debt, over(country) stack

30 of 36

Label axes

  • Start with a bar graph
    • graph bar gdp, over(region)
  • Add some x and y axes labels
    • graph bar gdp, over(region) b1title(Regions of Countries in the Dataset) ytitle(Mean GDP)

31 of 36

Change the colours

  • To make all the bars the same colour
    • graph bar gdp, over(region) bar(1, color(green))
  • To make each bar a different colour
    • graph bar gdp, asyvars over(region) bar(1, color(green)) bar(2, color(red)) bar(3, color(blue)) bar(4, color(orange)) bar(5, color(yellow))
  • Or use a preset scheme such as:
    • s1mono
    • economist
    • s2color
      • graph bar gdp, asyvars over(region) scheme(economist)

32 of 36

Save your graph

  • With the graph still open
    • graph export "My Graph.png"
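
If the .do file is run more than once, graph export stops with an error because the file already exists. Adding the replace option overwrites it. A minimal sketch with made-up values and a hypothetical file name, which assumes a version of Stata that can draw graphs and export PNG files:

```stata
clear
input str15 region gdp
"Europe" 2800
"Europe" 15
"Asia" 5000
end

graph bar gdp, over(region) ytitle("Mean GDP")
graph export "gdp_by_region.png", replace   // saved in the current directory
```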

33 of 36

Summary

Today, we have looked at:

  • The software
  • How to get data in
  • Cleaning data
  • Calculating descriptive statistics
  • Making graphs

34 of 36

Resources

35 of 36

Maths Skills Centre

  • One-to-one appointments available to all students during term-time and vacation
  • Personalised advice and guidance on maths and statistics topics
  • Online and on campus workshops covering a range of topics including statistical tests and software

36 of 36

Any questions?

Please book a one-to-one statistics appointment if you have questions you would like to discuss with a statistics tutor.

If you have any feedback you would like to share with us, please fill in this feedback form.