1 of 43

Developing a daily time step individual level demographic simulation model

Presentation slides prepared for a CSAP research cluster meeting, School of Geography, University of Leeds, UK, 2012-12-11

2 of 43

Outline

  • Why?
  • What?
  • How?
  • Results
  • Plans and Next Steps
  • Feedback

3 of 43

Why?

  • Generally...
    • Demographic data is used in a wide range of applications
      • Epidemiology for estimating prevalence and incidence rates
      • Service planning
      • Risk management
      • Commercial
    • Census data tend to be years old by the time outputs are made available
    • Contemporary populations are assumed to be highly and increasingly mobile, and fertility (in terms of live births) is perhaps becoming more variable

4 of 43

    • Demographic forecasting is important
      • Planning is a key to sustainability
        • Dependency ratios are increasing in many countries with increasing aged populations
        • Pensions
        • Welfare
        • Services and infrastructure
    • Many countries (including the UK) do not have official residential registration data that tracks the location of people (between censuses)
    • Our ability to track where everyone has lived improves continually, but we need the models and the data to forecast and provide the best estimates

5 of 43

  • Why daily time steps?
    • People tend to be born, die and move residences on specific days
      • It intrinsically makes sense to model at this resolution
      • Modelling with multi-day time steps either misses important events or becomes much more complicated
        • Consider
          • migrations of individuals within a time period
          • births (same mother) at different times within a year
    • Allows for linkage with models of daily activity that work on sub-daily time steps

6 of 43

    • Allows for variation in mortality, fertility and migration rates over the year to be modelled
      • Mortality, fertility, miscarriage and migration are seasonal
        • Student migration
        • Holidays and fertility
        • Winter mortality
      • Power cuts/flood events and birth spikes
    • Allows for new and exciting aggregate data/statistics to be produced
      • Distribution of the total number of
        • births per month in a region
        • moves per person in a year
          • maximum, minimum, average, variance
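As an illustration of such statistics, the following Java sketch (Java being the model's implementation language) counts birth events by calendar month and summarises moves per person over a year. The class `AggregateStats` and its methods are hypothetical. The inputs are assumed to be a list of simulated birth dates and an array of per-person move counts.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of the model's code base.
class AggregateStats {

    /** Counts birth events (one date per birth) by calendar month. */
    static Map<Month, Integer> birthsPerMonth(List<LocalDate> birthDates) {
        Map<Month, Integer> counts = new EnumMap<>(Month.class);
        for (Month m : Month.values()) {
            counts.put(m, 0); // report empty months explicitly
        }
        for (LocalDate d : birthDates) {
            counts.merge(d.getMonth(), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns {min, max, mean, population variance} of moves per person; assumes a non-empty array. */
    static double[] moveStats(int[] movesPerPerson) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        double sum = 0;
        for (int m : movesPerPerson) {
            min = Math.min(min, m);
            max = Math.max(max, m);
            sum += m;
        }
        double mean = sum / movesPerPerson.length;
        double squares = 0;
        for (int m : movesPerPerson) {
            squares += (m - mean) * (m - mean);
        }
        return new double[] {min, max, mean, squares / movesPerPerson.length};
    }
}
```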

7 of 43

  • Why individual level?
    • Data can be linked with other individual level data
      • e.g. Disease data
    • Other individual level data could potentially be augmented or linked
      • Linkage and substitution with "real data"
    • Everybody is different
      • Individuals have their own mortality, fertility and migration probabilities and history
        • There is scope for specifying these in the model
          • In such a way as to keep overall counts of births and deaths at observed levels
        • Return migration

8 of 43

What?

  • Stages of development
  • The nature of the model
  • Initialisation
  • Daily Simulation
    • Death
    • Birth
    • Migration
  • Results

9 of 43

Stages of Development

  • Natural change simulation model
    • ESRC funded GENESIS Project
    • Leeds Output Area level results produced
  • e-Infrastructure
    • JISC funded NeISS Project
    • Web Portal based User Interface
    • Simulation Models configured, run and results stored on e-Science resources
  • Migration model component
    • Not externally funded
    • Developed since July 2012

10 of 43

The nature of the model

  • Open Source
    • Development repositories
  • Java

11 of 43

  • Dependencies
    • Generic Library
    • MoSeS Code
      • For loading 2001 UK Population Census Data
  • Grid enabled
    • Thanks to the NeISS collaboration with Tom Doherty, based at the University of Glasgow
  • Run for multiples of a year
  • Individual representation
    • Males
    • Females

12 of 43

  • Stochastic yet deterministic
    • Based on pseudo-random sequences
    • Results replicable
  • Study Region
    • Comprised of regions and subregions
  • 2 stages to modelling
    • Initialisation
    • Simulation
  • Simulation proceeds for each subregion in turn, and for each individual in turn
  • Synchronisation needed for each daily step
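A minimal Java sketch of "stochastic yet deterministic": `java.util.Random` is specified to use a particular 48-bit linear congruential algorithm, so a fixed seed reproduces the same sequence. The class `ReplicableRuns`, the per-subregion seed scheme in `subregionSeed` and the `deathsToday` helper are hypothetical assumptions for illustration, not the model's actual code.

```java
import java.util.Random;

// Hypothetical illustration of replicable stochastic simulation.
class ReplicableRuns {

    /** Draws n uniform values in [0, 1) from a generator seeded with the given seed. */
    static double[] draws(long seed, int n) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextDouble();
        }
        return out;
    }

    /**
     * Hypothetical scheme: derive a separate seed for each subregion from the run seed,
     * so a subregion's random sequence does not depend on the order subregions are processed.
     */
    static long subregionSeed(long runSeed, int subregionIndex) {
        return runSeed * 1_000_003L + subregionIndex;
    }

    /** Counts deaths among n people on one day, given a daily death probability. */
    static int deathsToday(long seed, int n, double dailyDeathProbability) {
        Random rng = new Random(seed);
        int deaths = 0;
        for (int i = 0; i < n; i++) {
            if (rng.nextDouble() < dailyDeathProbability) {
                deaths++;
            }
        }
        return deaths;
    }
}
```

Deriving one seed per subregion is one way (assumed here) to keep results replicable even if subregions are later processed in a different order or in parallel.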

13 of 43

  • There are many simplifying assumptions
    • Many things are assumed to be evenly distributed
    • Some things are not explicitly modelled
  • There are interesting model details
    • Pregnancy and miscarriage
    • Multiple births
  • Input data
    • Population count data by age and gender
    • Either birth and death counts or fertility and mortality probabilities
    • Migration data

14 of 43

  • Output data
    • Produced annually for study region, regions, subregions and aggregates of subregions
    • Includes raw ASCII data (XML, CSV), binary serialised Java object data, and images (PNG)
      • Population count estimates
      • Mortality and fertility estimates
      • Migration estimates
      • Comparisons with an annual time step model
        • Which uses mid-year population estimates
      • An individual level population data set

21 of 43

Initialisation

  • For each region
    • Daily survival probabilities are calculated for each age and gender
      • Death rate assumed to be even throughout the year
    • Daily pregnancy probabilities are calculated for each age of potential mother
      • Annual live birth fertility rates are adjusted for multiple births, miscarriage and death of the mother
      • Pregnancy rate and miscarriage rate assumed to be even throughout the year
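Assuming, as stated, that deaths are spread evenly over a 365-day year, a constant daily survival probability s satisfies s^365 = 1 − q, so s = (1 − q)^(1/365), where q is the annual death probability for a given age and gender. The sketch below applies the same even-hazard conversion to pregnancy. How the model actually factors live birth rates for multiple births, miscarriage and maternal death is not given here; `annualPregnancyProbability` is a hypothetical adjustment that divides the live birth rate by the expected live births per pregnancy. The sketch uses `double` for brevity, whereas the model uses `BigDecimal`.

```java
// Hypothetical helpers for converting annual rates into daily probabilities.
class DailyProbabilities {

    static final int DAYS_IN_YEAR = 365; // leap years ignored in this sketch

    /** Daily survival s with s^365 = 1 - q, i.e. deaths spread evenly over the year. */
    static double dailySurvival(double annualDeathProbability) {
        return Math.pow(1.0 - annualDeathProbability, 1.0 / DAYS_IN_YEAR);
    }

    /** Daily event probability equivalent to an annual probability under an even daily hazard. */
    static double dailyProbability(double annualProbability) {
        return 1.0 - Math.pow(1.0 - annualProbability, 1.0 / DAYS_IN_YEAR);
    }

    /**
     * Hypothetical adjustment: live births per woman per year divided by the expected live births
     * per pregnancy gives pregnancies per woman per year (capped at 1 and treated as a probability).
     * Ignores the fact that women who are already pregnant cannot become pregnant.
     */
    static double annualPregnancyProbability(double liveBirthRate, double miscarriageProbability,
            double meanBabiesPerBirth, double motherSurvivalToBirth) {
        double liveBirthsPerPregnancy =
                (1.0 - miscarriageProbability) * motherSurvivalToBirth * meanBabiesPerBirth;
        return Math.min(liveBirthRate / liveBirthsPerPregnancy, 1.0);
    }
}
```

Under these assumptions a daily pregnancy probability would be `dailyProbability(annualPregnancyProbability(...))`.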

22 of 43

    • Daily migration
      • Assumes migration evenly distributed throughout the year
      • General migration probability calculated
      • Internal migration rates are calculated for migration within the region
      • In migration rates are calculated for people moving into the study region from all regions outside it
    • Cumulative sums of migration are calculated to help determine
      • The region destination for each out migration
      • The subregion destination for each in migration
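Cumulative sums support a standard technique for choosing a destination in proportion to flows: draw a uniform number, scale it by the total, and binary-search the cumulative array for the first sum that exceeds it. The `DestinationChooser` class below is a hypothetical sketch of that technique; the model's own data structures may differ.

```java
import java.util.Random;

// Hypothetical helper: flows might be counts or rates to each destination region or subregion.
class DestinationChooser {
    private final double[] cumulative;
    private final double total;
    private final int lastPositive; // index of the last destination with a positive flow

    /** Precomputes cumulative sums of non-negative flows once, at initialisation. */
    DestinationChooser(double[] flows) {
        cumulative = new double[flows.length];
        double sum = 0;
        int last = -1;
        for (int i = 0; i < flows.length; i++) {
            if (flows[i] < 0) {
                throw new IllegalArgumentException("negative flow at index " + i);
            }
            if (flows[i] > 0) {
                last = i;
            }
            sum += flows[i];
            cumulative[i] = sum;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("flows must have a positive total");
        }
        total = sum;
        lastPositive = last;
    }

    /** Maps u in [0, 1) to destination i with probability flows[i] / total. */
    int choose(double u) {
        double target = u * total;
        if (target >= total) {
            return lastPositive; // guards against rounding when u is very close to 1
        }
        // Binary search for the first index whose cumulative sum exceeds target.
        // Zero-flow destinations repeat the previous sum, so they are never chosen.
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > target) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    /** Draws a destination using the supplied generator. */
    int choose(Random rng) {
        return choose(rng.nextDouble());
    }
}
```

Building the sums once and searching them in O(log n) per migrant matters when there are thousands of candidate subregions.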

23 of 43

  • Each person is initialised
    • Assigned a date of birth
    • Assigned to a subregion as usually resident
    • Females are assigned pregnancies and due dates
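A minimal sketch of assigning a date of birth, assuming the input ages are whole years on a reference date (for example 29 April 2001, the 2001 Census day) and that birthdays are spread evenly across the year. `BirthDateAssigner` is a hypothetical name, not the model's API.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Random;

// Hypothetical helper; assumes birthdays are evenly spread over the year of age.
class BirthDateAssigner {

    /** Returns a date of birth, uniform over all dates consistent with being ageInYears on referenceDate. */
    static LocalDate assignDateOfBirth(int ageInYears, LocalDate referenceDate, Random rng) {
        // Latest possible date: the person's birthday falls on the reference date.
        LocalDate latest = referenceDate.minusYears(ageInYears);
        // Earliest possible date: one day short of turning ageInYears + 1 on the reference date.
        LocalDate earliest = referenceDate.minusYears(ageInYears + 1L).plusDays(1);
        // Inclusive count of candidate dates (typically 365 or 366).
        int candidateDays = (int) ChronoUnit.DAYS.between(earliest, latest) + 1;
        return earliest.plusDays(rng.nextInt(candidateDays));
    }
}
```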

24 of 43

Daily Simulation

  • For each person
    • Do they die?
    • Is it their birthday?
      • If so update population statistics
    • Do they migrate?
      • If yes, find out where they move to
  • For each female
    • If pregnant, do they have a miscarriage?
    • If due, give birth
    • If not pregnant, determine if they become pregnant
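The daily steps above can be sketched for a single person as follows. This is a hypothetical Java outline, not the model's code: `Person`, `Rates`, the fixed 266-day gestation, single births only and a 50:50 sex split for newborns are all simplifying assumptions. Migrants are only recorded here, because moves are applied after the whole population has been processed.

```java
import java.time.LocalDate;
import java.time.MonthDay;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of one person's daily update; not the model's actual code.
class DailyStep {

    static final class Person {
        final boolean female;
        final LocalDate dateOfBirth;
        boolean alive = true;
        LocalDate dueDate; // null when not pregnant

        Person(boolean female, LocalDate dateOfBirth) {
            this.female = female;
            this.dateOfBirth = dateOfBirth;
        }
    }

    /** Daily probabilities; in the model these would depend on age, gender and region. */
    static final class Rates {
        double death;
        double migration;
        double miscarriage;
        double pregnancy;
        int gestationDays = 266; // assumed fixed pregnancy length
    }

    int deaths, birthdays, miscarriages, births;
    final List<Person> migrants = new ArrayList<>(); // moved after the daily pass, not immediately
    final List<Person> newborns = new ArrayList<>();

    void step(Person p, LocalDate today, Rates r, Random rng) {
        if (!p.alive) {
            return;
        }
        if (rng.nextDouble() < r.death) { // Do they die?
            p.alive = false;
            deaths++;
            return;
        }
        if (MonthDay.from(p.dateOfBirth).equals(MonthDay.from(today))) {
            birthdays++; // Is it their birthday? (29 February ignored in non-leap years)
        }
        if (rng.nextDouble() < r.migration) {
            migrants.add(p); // Do they migrate? Destination is chosen later.
        }
        if (!p.female) {
            return;
        }
        if (p.dueDate != null) {
            if (rng.nextDouble() < r.miscarriage) { // If pregnant, do they miscarry?
                p.dueDate = null;
                miscarriages++;
            } else if (today.equals(p.dueDate)) { // If due, give birth (single births only)
                newborns.add(new Person(rng.nextBoolean(), today));
                p.dueDate = null;
                births++;
            }
        } else if (rng.nextDouble() < r.pregnancy) { // If not pregnant, do they become pregnant?
            p.dueDate = today.plusDays(r.gestationDays);
        }
    }
}
```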

25 of 43

  • Having gone through the population for all regions in the study region
    • Migrate those migrating out of the study region
    • Migrate those migrating within the study region
    • For migration into the study region from outside of the study region
      • Create individuals
        • Assign date of birth
        • Record migration origin location
        • Assign usual residence subregion
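A hypothetical sketch of this end-of-day synchronisation, assuming residents are held as lists of person ids per subregion. Applying queued moves only after the daily pass means nobody is processed twice on the same day. `EndOfDaySync`, `Move` and `Arrival` are illustrative names, and date of birth assignment for in migrants is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the end-of-day synchronisation step.
class EndOfDaySync {

    /** A move decided during the daily pass; a null destination means leaving the study region. */
    static final class Move {
        final long personId;
        final String from;
        final String to;

        Move(long personId, String from, String to) {
            this.personId = personId;
            this.from = from;
            this.to = to;
        }
    }

    /** A new individual arriving from outside the study region. */
    static final class Arrival {
        final String origin;
        final String destination;

        Arrival(String origin, String destination) {
            this.origin = origin;
            this.destination = destination;
        }
    }

    final Map<String, List<Long>> residents = new HashMap<>(); // subregion -> person ids
    final Map<Long, String> migrationOrigin = new HashMap<>(); // in migrant id -> origin
    final List<Move> pending = new ArrayList<>();
    long nextId;

    void apply(List<Arrival> arrivals) {
        // 1. Out migration from the study region
        for (Move m : pending) {
            if (m.to == null) {
                residents.get(m.from).remove(Long.valueOf(m.personId)); // remove(Object), not remove(int)
            }
        }
        // 2. Internal migration within the study region
        for (Move m : pending) {
            if (m.to != null) {
                residents.get(m.from).remove(Long.valueOf(m.personId));
                residents.computeIfAbsent(m.to, k -> new ArrayList<>()).add(m.personId);
            }
        }
        pending.clear();
        // 3. In migration: create individuals, record origin, assign usual residence subregion
        for (Arrival a : arrivals) {
            long id = nextId++;
            migrationOrigin.put(id, a.origin);
            residents.computeIfAbsent(a.destination, k -> new ArrayList<>()).add(id);
            // A date of birth would also be assigned here (omitted).
        }
    }
}
```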

26 of 43

How?

  • Designed for scalability: simulating large populations with large numbers of regions and subregions
    • Individual level data are stored in collections that are swapped to and from slower access storage as required
    • Numerical indexes are stored in mapped collections
  • Computational demands are considerable
    • Consider simulating a single region, population ~1 million, with ~10 thousand subregions
      • Can all the data be stored in the available fast access memory?

27 of 43

    • For a simple model, a 10 year simulation might take many days with only one CPU
      • Each individual in the population is updated ~3650 times
    • The amount of persistent data produced that we want to store is on the order of tens of gigabytes
    • For a UK simulation there are on the order of 60 million individuals and 200 thousand subregions
  • Grid enabled
  • Parallelisation
  • Numerical precision
    • Java BigDecimal
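The kind of rounding error that motivates `BigDecimal` can be shown in a few lines: adding 0.1 ten times in `double` arithmetic does not give exactly 1.0, whereas `BigDecimal` does, which matters when many small probabilities are accumulated (for example in cumulative sums). The class and method names are hypothetical. One practical consequence of using `BigDecimal` is that `divide` needs a `MathContext` when the quotient does not terminate, as with most divisions by 365.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical illustration of why exact decimal arithmetic can matter for accumulated probabilities.
class PrecisionDemo {

    /** Adds value to itself n times in binary floating point. */
    static double sumDouble(double value, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += value;
        }
        return total;
    }

    /** Adds the decimal value n times with BigDecimal; exact for decimal inputs. */
    static BigDecimal sumBigDecimal(String value, int n) {
        BigDecimal v = new BigDecimal(value);
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) {
            total = total.add(v);
        }
        return total;
    }

    /**
     * Divides by 365 with 34 significant digits. Without a MathContext, BigDecimal.divide
     * throws ArithmeticException whenever the exact quotient has a non-terminating expansion.
     */
    static BigDecimal divideBy365(BigDecimal annual) {
        return annual.divide(BigDecimal.valueOf(365), MathContext.DECIMAL128);
    }
}
```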

28 of 43

Results

  • Results for simulations without migration
    • Provide confidence in daily probability calculations for natural processes
      • The expected numbers of deaths, pregnancies, miscarriages and births are reproduced at a regional level
    • Variation
      • Can be large at sub-regional level
      • Is generally small at regional level
      • Is intermediate at aggregated sub-regional level
      • Is greater for less frequently occurring events

29 of 43

Variation in results

  • 4 runs
    • Everything but the pseudo-random seed start point is the same
    • Usually there are 5 lines on each side of a plot
      • Min and Max are lightest grey
      • Q1 and Q3 are darker grey
      • Median is black
      • Some measures are highly variable
      • There is more variation for smaller regions
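The Min, Q1, Median, Q3 and Max lines can be computed at each time point from the values produced by the runs. The sketch below assumes linear interpolation between order statistics (R's default quantile definition, known as type 7); the slides do not say which quartile definition was used, and `RunSummary` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper for summarising one measure across simulation runs.
class RunSummary {

    /** Returns {min, Q1, median, Q3, max}; assumes a non-empty array. */
    static double[] fiveNumberSummary(double[] values) {
        double[] v = values.clone(); // leave the caller's array unsorted
        Arrays.sort(v);
        return new double[] {v[0], quantile(v, 0.25), quantile(v, 0.5), quantile(v, 0.75), v[v.length - 1]};
    }

    /** Quantile of sorted data by linear interpolation between neighbouring order statistics. */
    static double quantile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p;
        int lo = (int) Math.floor(h);
        int hi = (int) Math.ceil(h);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }
}
```

With only 4 runs, Q1 and Q3 fall between observed values, so the choice of quartile definition visibly affects the grey bands.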

37 of 43

Migration

  • Types of migration modelled
    • Immigration
    • In migration to Study Region
    • Out migration from Study Region
    • Internal migration within Study Region
  • Input data
      • 2001 UK Population Census Special Migration Statistics
      • LAD to LAD flows by age and gender
      • OA to OA flows by age and gender
  • Region (LAD to LAD) flows are primarily used

38 of 43

  • Subregion (OA to OA) flows are used to assign individuals to subregions within each region
  • Two parameters: a migration factor and a minimum flow

39 of 43

Plans and Next Steps

  • Add emigration to the model
  • Detailed results statistics for migration
  • Simulate population change in West Yorkshire from 2001 to 2011
    • Vary migration factor and minimum flow
    • Present results at an appropriate event
    • Publish a paper on the demographic simulation model and the results for West Yorkshire
  • Simulate population change for all of England from 2001 to 2011
    • Compare results with 2011 census data
    • More publications

40 of 43

  • Further modelling
    • Use Nik Lomax's estimated migration flows for 2001 to 2011
    • Constrain migration using subregion area classifications
    • Allow for variations in mortality, pregnancy, miscarriage and migration rates over the year
      • Student migration
    • Migrating groups (families/households)
    • Fathers

41 of 43

  • Seek data for more detailed simulations
    • Annual and regional miscarriage data
  • Seek collaboration with statistical offices
  • Seek further funding
    • Secondment to UK ONS funded by ESRC?

42 of 43

Feedback

  • Much can be done to improve this work
  • What has emerged is something like the simplest demographic model
    • There is much detail to add...
  • Anyone interested in writing this up or collaborating in any way?
  • Any questions?

43 of 43

Thank You