Contents - Updated 12/04/19

1. Information Technology Strategy
1.0 Foreword
1.0.1 Guidance on IT strategy for national statistical institutions
1.0.2 Target audience
1.0.3 Contributors
1.1 Introduction
1.2 Creating the IT Strategy - the importance of awareness
1.2.1 Mapping the Landscape
1.2.2 The Strategy Cycle
1.3 Principles (Doctrine)
1.3.1 Who are the users (e.g. customers, shareholders, regulators, staff)
1.3.2 Focus on user needs
1.3.3 Use a common language (necessary for collaboration)
1.3.4 Use a systematic mechanism of learning (a bias towards data)
1.3.5 Focus on high situational awareness (understand what is being considered)
1.3.6 Be transparent (a bias towards open)
1.3.7 Challenge assumptions (speak up and question)
1.3.8 Remove bias and duplication
1.3.9 Think small (as in know the details)
1.3.10 Use appropriate methods (e.g. agile vs lean vs six sigma)
1.3.11 Move fast (an imperfect plan executed today is better than a perfect plan executed tomorrow)
1.3.12 Be pragmatic (it doesn’t matter if the cat is black or white as long as it catches mice)
1.3.13 Manage inertia (e.g. existing practice, political capital, previous investment)
1.3.14 Effectiveness over efficiency
1.3.15 Leave no-one behind
1.4 Strategy - Integrating this into strategy
1.4.1 UN SDG Example
1.5 Technologies / Techniques
1.5.1 Big Data
1.5.2 Cloud (Todo)
1.5.3 IoT (Todo)
1.5.4 Artificial Intelligence
1.5.4.1 Recommendations
1.5.5 Data Science (Todo)
1.5.6 Serverless Technology
1.5.6.1 Serverless Security
1.5.6.2 Recommendations
1.5.7 Privacy Preserving Techniques
1.6 Data Management
1.6.1 Linked data (Todo)
1.6.2 Data integration / data linkage (Todo)
1.7 IT Management Models (Todo)
1.7.1 Intelligent Sourcing
1.7.2 In-house development
1.7.3 Outsourced development
1.7.4 Hybrid In-house/Outsource model
1.7.5 Collaborative model
1.8 Use of Standards and Frameworks (Todo)
1.8.1 The Importance of standards
1.8.2 Conceptual statistical industry standards
1.8.2.1 Generic Activity Model for Statistical Organisations (GAMSO)
1.8.2.2 Generic Statistical Business Process Model (GSBPM)
1.8.2.3 Generic Statistical Information Model (GSIM)
1.8.3 Open Standards for Metadata and Data
1.8.3.1 Statistical Data and Metadata Exchange
1.8.3.2 Data Documentation Initiative (DDI)
1.8.4 Architecture and Interoperability frameworks
1.8.4.1 Enterprise architecture
1.8.4.2 Common Statistical Processing Architecture (CSPA)
1.8.4.3 Common Statistical Data Architecture (CSDA)
1.8.4.4 European Interoperability Framework (EIF) (Todo)
1.9 Specialist statistical processing / analytical software
1.10 Dissemination tools
1.10.1 Channels
1.11 The Anti-Pattern Organisation
1.12 Terms Definitions


1. Information Technology Strategy

1.0 Foreword

This document has been prepared by the Technical Delivery Board of the UN Global Platform Committee, which is one of the sub-groups under the UN Global Working Group (GWG) on Big Data for Official Statistics.

The Statistical Commission created the GWG at its forty-fifth session, in 2014. In accordance with its terms of reference (see E/CN.3/2015/4) and decision 46/101 of the Statistical Commission (see E/2015/24), the GWG provides strategic vision, direction and coordination of a global programme on big data for official statistics, including for the compilation of the Sustainable Development Goal indicators in the 2030 Agenda for Sustainable Development.

In its decision 49/107 (see E/2018/24), the Statistical Commission reaffirmed that the use of Big Data and other new data sources is essential for the modernization of national statistical institutions so that they remain relevant in a fast-moving data landscape, and highlighted the opportunity for Big Data to fill gaps, make statistical operations more cost-effective, enable the replacement of surveys and provide more granularity in outputs. The Commission further endorsed the proposal of the GWG to develop a global platform as a collaborative research and development environment for trusted data, trusted methods and trusted learning.

1.0.1  Guidance on IT strategy for national statistical institutions

The GWG provides guidance on using Big Data for official statistics. It has so far produced handbooks on Earth Observations for official statistics (see https://marketplace.officialstatistics.org/earth-observations-for-official-statistics), on the use of mobile phone data for official statistics (see https://marketplace.officialstatistics.org/handbook-on-the-use-of-mobile-phone-data-for-official-statistics) and on privacy preserving techniques (see https://marketplace.officialstatistics.org/privacy-preserving-techniques-handbook).

This Handbook provides guidance and recommendations on IT strategy for national statistical institutions that are embarking on the use of Big Data in the production of their official statistics and indicators. It uses Wardley Maps to understand where an institution has to invest in development and where it can take off-the-shelf solutions. It works through a number of principles, from knowing your users (e.g. customers, shareholders, regulators, staff) and their needs, through challenging assumptions, removing bias, using appropriate methods (e.g. agile vs lean vs six sigma), being pragmatic and managing inertia (e.g. existing practice, political capital, previous investment), to choosing effectiveness over efficiency while leaving no one behind.

The Handbook will cover new topics like Cloud, IoT, Artificial Intelligence, Data Science and Serverless technology and security, as well as the use of standards and generic models for IT management.

1.0.2  Target audience

The Handbook aims to give a high-level overview of IT strategy for a modern statistical institution, which should be of benefit to senior managers, while providing sufficient detail to be of interest to IT professionals.

1.0.3 Contributors

Name, role and organisation:

  • Mark Craddock, Chair Technical Delivery Board, UN Global Platform
  • Rob McLellan, Member Technical Delivery Board, Statistics Canada
  • Matjaz Jug, Member Technical Delivery Board, Statistics Netherlands
  • Ronald Jansen, Member Technical Delivery Board, Statistics Division, Department of Economic and Social Affairs, United Nations
  • Bogdan Dragovic, Member Technical Delivery Board, Statistics Division, Department of Economic and Social Affairs, United Nations
  • Jan Murdoch, Strategy, Office for National Statistics
  • Simon Wardley, Contributor, Leading Edge Forum


1.1 Introduction

Information Technology (IT) continues to play an essential role in all aspects of statistical processing throughout the entire production life cycle, from data collection through to dissemination. This is a fast-moving and rapidly changing environment, with new innovations being developed at a breathtaking rate.

‘Change has never happened this fast before, and it will never be this slow again’

Since the publication of the last handbook in 2003 the IT landscape has changed almost beyond recognition, but in a predictable way: at that time many NSOs were just emerging from the mainframe era and have since evolved through the phases of personal computers, distributed databases, the explosion of the internet, smartphones, tablets, cloud technology and new data sources.

NSOs can expect a continued and accelerating rate of change in the years to come, with further advances in Artificial Intelligence, machine learning, increased computing power, smart data and the “Internet of Things”.

Figure 1

These developments combined with changing work practices, increased user expectations, competition from other data providers and a constant drive for modernisation and increased efficiency provide an ongoing challenge for NSOs.

Harnessing the power of IT can help to meet these challenges but only if supported by a robust strategy. A robust strategy consists of understanding the business and technology landscape, a strong set of principles and a clear delivery plan. However, to understand the landscape and communicate it to others requires some form of map. The purpose of producing a map is to help us to communicate our intentions, to allow others to challenge our assumptions and for an entire organisation to learn and then apply basic patterns of change (known as climatic patterns), principles of organisation (known as doctrine) and context specific forms of gameplay. Maps are our learning and communication tool for discovering these things and enabling us to make better decisions before acting.

1.2 Creating the IT Strategy - the importance of awareness

Throughout time, understanding and exploiting the landscape has been vital in battle as it acts as a force multiplier. Probably the most famously cited example is the battle of the pass of Thermopylae, where a strategy devised by the Athenian general Themistocles used the terrain to enable around 7,000 Greeks to hold back a Persian army of 300,000 for seven days, with King Leonidas and the “three hundred” reportedly holding the pass for two of those days.

Maps and situational awareness are always vital to the outcome of any conflict. Maps enable us to determine the why of action – cut off an enemy supply route, gain a geographical advantage over an enemy position or restrict an opponent’s movement. The what (capture this hill), the how (bombard with artillery followed by ground assault) and the when (tomorrow morning) all flow from this, though the specifics change as no plan generally survives first contact intact.

Military maps are traditionally thought of in terms of describing a geographical environment, the physical landscape in which the theatre of battle operates. However, business is equally a competitive engagement between “opponents” but in this case fought over a business landscape of consumers, suppliers, geographies, resources and changing technology.

In business and IT we almost never have a map of the landscape, and so we cannot know where to act. Unlike at Thermopylae, where the environment was exploited to restrict a foe, our reasons for action (the why) can only ever be vague.

The lack of any map forces us to focus on the what rather than the why.


1.2.1 Mapping the Landscape

A map has three basic characteristics: an anchor (e.g. magnetic North), the position of pieces relative to the anchor (this is North or South of that) and consistency of movement (e.g. heading North means heading North and not South). To understand the business landscape we use a Wardley Map, which has an anchor (the needs of the user), position described through visibility in a value chain, and movement described through evolution. A simplified example map of a tea shop is given below.

Figure 2 -

The map itself allows others to instantly challenge the position of components, add missing components and therefore collaborate in describing the landscape. Each of the components of a map has a stage of evolution. These are:

  • Genesis. This represents the unique, the very rare, the uncertain, the constantly changing and the newly discovered. Our focus is on exploration.
  • Custom built. This represents the very uncommon and that which we are still learning about. It is individually made and tailored for a specific environment. It is bespoke. It frequently changes. It is an artisan skill. You wouldn’t expect to see two of these that are the same. Our focus is on learning and our craft.
  • Product (including rental). This represents the increasingly common, manufactured through a repeatable process, the more defined, the better understood. Change becomes slower here. Whilst there exists differentiation particularly in the early stages there is increasing stability and sameness. You will often see many of the same product. Our focus is on refining and improving.
  • Commodity (including utility). This represents scale and volume operations of production, the highly standardised, the defined, the fixed, the undifferentiated, the fit for a specific known purpose and repetition, repetition and more repetition. Our focus is on ruthless removal of deviation, on industrialisation, and operational efficiency. With time we become habituated to the act, it is increasingly less visible and we often forget it’s even there.

This evolution is shown on the x-axis, and all the components on the map are moving from the left (an uncharted space of the novel and new) to the right (an industrialised space). This process of evolution is driven by supply and demand competition, and hence a map is fluid if competition exists. However, this doesn’t mean organisations will build things in the right way. In our tea shop example, questions need to be asked about why we are custom building kettles.
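Although maps are normally drawn and discussed visually, the same information (anchor, value chain visibility and evolution) can be captured in a simple machine-readable form so that maps can be shared, versioned and compared across teams. The sketch below is a minimal illustration in Python; the tea shop components and their numeric positions are assumptions for illustration only, not part of any prescribed tooling.

```python
from dataclasses import dataclass

# The four stages of evolution used on the x-axis of a Wardley Map.
STAGES = ["genesis", "custom built", "product", "commodity"]


@dataclass
class Component:
    name: str          # label shown on the map
    visibility: float  # y-axis: 1.0 = visible to the user (the anchor), 0.0 = hidden
    evolution: float   # x-axis: 0.0 = genesis, 1.0 = commodity


def stage(component: Component) -> str:
    """Translate the continuous x-position into one of the four stages."""
    return STAGES[min(3, int(component.evolution * 4))]


# Simplified tea shop example; positions are illustrative assumptions.
tea_shop = [
    Component("cup of tea", visibility=0.9, evolution=0.70),
    Component("kettle", visibility=0.5, evolution=0.35),   # custom built - why?
    Component("power", visibility=0.2, evolution=0.95),    # a commodity / utility
]

for c in tea_shop:
    print(f"{c.name}: {stage(c)}")
```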

Figure 3 -

However, the map also has some advanced features which are not so immediately obvious. There is a flow of risk, information and money between components. In fact, the components themselves can represent different forms of capital such as activities, practices, data and knowledge and the lines represent bidirectional flow of capital e.g. a public consumer exchanges financial flow (revenue) for a physical flow (a cup of tea).

Any single map can have many different types of components (e.g. activities and data) and the terms we use to describe the separate stages of evolution are different for each type. In order to keep the map simple, the x-axis of evolution shows the terms for activities alone. The terms used today for other types of things are provided below.

Figure 4 -

Lastly on a map, we can not only show different forms of capital flow (e.g. financial flow, useful in creating income statements) but also our intended actions (e.g. shifting the kettle to more commodity forms) along with climatic impacts such as anticipated changes to components caused by competition (e.g. staff being replaced by robots).

Figure 5 -

With the map above, we can start to discuss the landscape. For example, have we represented the user need reasonably and are we taking steps to meet that user need? Maybe we’re missing something such as an unmet need that we haven’t included? Are we treating components in the right way? Have we included all the relevant components on the map or are we missing key critical items? We can also start to discuss our anticipation of change and our plans for the future.

1.2.2 The Strategy Cycle

Maps are part of the strategy cycle, shown below. At the highest level, it consists of an iterative series of four phases. Our first phase establishes purpose: what “game” are we in and what do we wish to attain, leading to a clear “why” of purpose. The next phase is where we observe our landscape and climate (the system of forces) to establish the context and awareness in which we are operating. Failure to do this adequately means that we are blind to potential opportunities or challenges, with the risk of engaging in lower value or risky actions. Our landscape is captured in the form of maps.

We then proceed to the “orient” phase, where we incorporate important elements of doctrine, in the form of principles and norms directing our action. We identify a number of principles in the section that follows. These provide an important “north star” direction for our work. The final phase is that of deciding what we are going to do, resulting in a transition to “act”. Collectively, the Observe, Orient and Decide phases provide our “why of movement”, wherein we make decisions on an ongoing basis as we proceed, i.e. why we should be taking this action over that action.

As we continue to act we iterate further, assessing ongoing validity and changing our maps and actions accordingly. The maps themselves can be an important source of learning, provided that at some future point we review the map that described the context, together with the actions decided, and examine what actually happened.

Figure 6 -

The next section addresses doctrine in our Orient phase through a discussion of principles.

1.3 Principles (Doctrine)

The following section outlines the core doctrine of this handbook. This is not an exhaustive list but instead it covers the basic principles that should be applied. These principles are considered to be universal and applicable to all industries regardless of the landscape they operate in. This doesn’t mean that the doctrine is right but instead that it appears to be consistently useful for the time being. There will always exist better doctrine in the future and it is anticipated that this list will expand over time.

The principles are:

  1. Know your users (e.g. customers, shareholders, regulators, staff)
  2. Focus on user needs
  3. Use a common language (necessary for collaboration)
  4. Use a systematic mechanism of learning (a bias towards data)
  5. Focus on high situational awareness (understand what is being considered)
  6. Be transparent (a bias towards open)
  7. Challenge assumptions (speak up and question)
  8. Remove bias and duplication
  9. Think small (as in know the details)
  10. Use appropriate methods (e.g. agile vs lean vs six sigma)
  11. Move fast (an imperfect plan executed today is better than a perfect plan executed tomorrow)
  12. Be pragmatic (it doesn’t matter if the cat is black or white as long as it catches mice)
  13. Manage inertia (e.g. existing practice, political capital, previous investment)
  14. Effectiveness over efficiency
  15. Leave no-one behind

Consider SDG indicator 9.1.1 (the proportion of the rural population who live within 2 km of an all-season road) as an example. SDG 9.1.1 supports the target of developing quality, reliable, sustainable and resilient infrastructure, including regional and trans-border infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all.
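The indicator itself is a geospatial calculation. As a purely illustrative sketch (the file names, column names and chosen projection below are assumptions, not a prescribed method), it could be estimated from a rural population layer and an all-season road layer as follows:

```python
import geopandas as gpd

# Hypothetical input layers; the file and column names are assumptions.
population = gpd.read_file("rural_population.gpkg")  # points with a 'pop' column
roads = gpd.read_file("all_season_roads.gpkg")       # all-season road centrelines

# Reproject to a metric CRS so distances are in metres (a suitable national
# projection should be used in practice; EPSG:3857 is only for illustration).
population = population.to_crs(epsg=3857)
roads = roads.to_crs(epsg=3857)

# Buffer the all-season roads by 2 km and dissolve into one service area.
service_area = roads.buffer(2_000).unary_union

# Flag rural population points lying within 2 km of an all-season road.
population["served"] = population.geometry.within(service_area)

# SDG 9.1.1: proportion of the rural population within 2 km of an all-season road.
sdg_9_1_1 = population.loc[population["served"], "pop"].sum() / population["pop"].sum()
print(f"Estimated SDG 9.1.1: {sdg_9_1_1:.1%}")
```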

The rest of the IT Strategy section walks through creating an IT strategy using SDG 9.1.1 as an illustration, as seen in Figure 7.

Figure 7 - SDG 9.1.1 Wardley map

1.3.1 Who are the users (e.g. customers, shareholders, regulators, staff)

Understand who the users are: statistical consumers, data regulators, governing organisations such as the UN and national governments, staff, or statistics organisations.

Users exist within the wider community, such as businesses, other UN agencies and other NSOs. In this sense users are identified in the broadest possible context as those who directly or indirectly receive benefit from official statistics.

Other forms of users are data partnerships and collectives where an NSO, third sector organisations, and private sector companies operate together to achieve joint goals.

Data-based user research and analytics aid understanding of the groups of people who have a stake in official statistics. User interviews, focus groups and surveys are common techniques used in user research to gather data about users and user needs.

Knowing who users are is the first stage in identifying whether the value generated by the organisation will satisfy the needs of each particular user group.

Figure 8 - Knowing your users

In the SDG 9.1.1 example, within the wider landscape the key users are the UN, governments, NSOs, construction companies and the public. These groups all have needs for the SDG 9.1.1 indicator. Some users, such as the public, will have needs beyond the indicator itself; the public will be interested in the outcomes of using the SDG 9.1.1 indicator.

Some of these users will be more visible in the value chain than others, which explains their position on the map, but in the early stages of creating a map it is enough simply to mark who the users are. The following “cheat sheet” for the stages of evolution helps to categorise where on the horizontal axis users sit. This is always up for debate, which is the purpose of using a map. For example, governments have been placed in stage III (noted on the x-axis as product) as they have the characteristics of “increasingly common / disappointed if not available / feeling left behind if not used / advantage through implementation / competing models of”.

Figure 9 - SDG 9.1.1 Wardley map

1.3.2 Focus on user needs

Figure 10 - Matching users to needs

User needs reflect what a user requires and/or values. For example, a government may have a need for funding which in turn requires the UN and its agencies. The UN has needs to improve the global economy and to end poverty. The UN also has a need to create goodwill with Governments and much of that comes from trust and in turn it is the NSOs that must provide trust in the data services that regulators base decisions upon. The map therefore provides a chain of needs.

User needs can be determined by investigation, by looking at the transactions that the organisation makes with users and by examining the customer journey when interacting with those transactions. Questioning this journey and talking with customers will often reveal pointless steps, unmet needs or unnecessary needs being catered for. Mapping the landscape often clarifies what is really needed.

It is important to discuss user needs both with the users and to talk with experts in the field. Care must be taken to avoid bias, especially where legacy already exists. Be wary of the legacy mindset, the equivalent to a user saying to Henry Ford — “we don’t want a car; we want a faster horse!”

In rare and uncertain situations, where users and experts don’t actually know what is needed beyond vague hand-waving, take a considered guess at what the user needs are. These can be revisited with all stakeholders later.

Looking at the figure above, some examples of these interactions include: governments need to interact with construction companies for roads to be built; the public need access to jobs and services in order to end poverty (a UN need); to do this they need to move from A to B; all-season roads are needed for this; and construction companies will build those roads.

1.3.3 Use a common language (necessary for collaboration)

Using a common language is essential for collaboration. Collaboration is very difficult when differently skilled groups or NSO members each use their own arcane language and techniques.

For example, let us take a box representation of the 9.1.1 SDG map, translated into Elvish.

Figure 11 -  

If we were to ask an observer which components should be outsourced and where we should use standards, it would be difficult to provide any meaningful answer beyond simple guesswork. Let us now convert this to a map in the figure below. The map may still be in Elvish, but we can see which components we should be custom building and which components should be bought as a commodity or outsourced to others.

Figure 12 -

The map itself provides a common language for describing an environment, regardless of the labels that might be used. If you break any complex system into components, then some of those components will be in uncharted space and are going to be experimental. This is not a bad thing, it is just what they are. For those components you are likely to do the work in-house with agile techniques or use a specialist company focused on more agile processes. But you won’t give that company all the components, because the majority of components tend to be highly industrialised and hence you’ll use established utility providers such as Amazon, Microsoft, Google or Alibaba for computing infrastructure.

1.3.4 Use a systematic mechanism of learning (a bias towards data)

It is important for the governance system to provide a mechanism of consistent measurement against outcomes and for continuous improvement of measurements. Using maps means that strategy can be revisited at a later date to see what was done, to understand what the outcomes of the strategy were, and to carry out scenario planning to understand alternative outcomes.

However the strategic tools used are less important than the principle of revisiting previous strategy in order to understand where we have come from, what was intended, what was actually achieved and what could have been done differently.

With SDG 9.1.1, if the intention is to drive towards the commoditisation of components in the value chain, then revisiting the map at a future date will show how much progress there has been towards this goal, whether commoditising is still desirable, and what inertia (if any) has been faced.

Figure 13 -

1.3.5 Focus on high situational awareness (understand what is being considered)

In attempting to decide on a course of action with our components and strategy there are a number of approaches that can be used, yielding a broad range of results. What is notably absent in most approaches is context: an understanding of the landscape you are in, the series of forces (or climate) you find yourself in, and the fact that this is never static but changes over time.

Our strategy activity can answer a “why of purpose”, determining what outcome we wish to achieve; however, there is a second “why of movement” that underlies effective decision making on an ongoing basis. To do this you require maps that let you “see” the landscape and “understand” the forces at play, that can guide your “movement” decisions, and that capture the results and lessons learned as you move forward. Plenty of examples from military history highlight the strategic importance of having this complete situational awareness.

We need maps to provide this situational awareness to us, capturing the basic elements of a map: a visual representation, context specificity, the position of components relative to some form of anchor, and the movement of those components. Many strategies rely on a “story”, a linear progression of directions, without providing the contextual view and orienting landscape. A proper map and an understanding of climatic forces allow us to identify patterns that we can exploit to our advantage.

NSOs can benefit strongly from a change in approach, overcoming traditional approaches that tend to be linear, localised and inwardly focussed, and which therefore pose strategic risk through misreading, or lacking knowledge of, critical patterns in society, technology, data and marketplaces.

The SDG 9.1.1 map provides situational awareness of the rural all-season road indicator, covering user needs in a broad sense: the public, governments and construction companies, as well as the UN and NSOs.

Understanding the data, the sources of data, and the roads and population components that deliver SDG 9.1.1, and how rare or certain they are, allows decisions to be made about where to play.

Building trust is seen as important both in the way that data is collected and in the relationship between NSOs and the public. Understanding the landscape allows debate as to what types of trust there are and how each type of trust should be treated.

1.3.6 Be transparent (a bias towards open)

Transparency is difficult within organisations. Many people find being challenged uncomfortable. The downside of sharing and openness is that it allows others to challenge and question firmly held assumptions.

Sharing maps enables assumptions to be challenged and questioned. Challenge helps us to learn and to refine the maps.

In order to be open and transparent, maps must be published in one place in one format through a shared and public space.

1.3.7 Challenge assumptions (speak up and question)

There is little point in focusing on user needs, creating a common language through the use of a map and sharing it transparently if no-one is willing to challenge it. Challenging others’ assumptions is a key approach to communicating, innovating, creating and problem solving. Assumptions can too easily be made if not challenged or explained. Often these silent assumptions are based on an idea one believes to be true, on prior experience or on one’s belief systems.

Maps provide a way to visually display our assumptions of users, needs and relationships and a method for people to challenge them.


1.3.8 Remove bias and duplication

NSOs should avoid rebuilding what already exists and should re-use where possible. To support this effort, maps should be circulated and collated to enable the removal of duplication and bias.

Such efforts need to be iterative i.e. it is not necessary to map the entire NSO statistical landscape prior to taking any action. Collating maps to create profile diagrams can also be a mechanism for discovering what future services the NSOs should provide along with what core capabilities are needed.

An example profile diagram, built from many maps, highlighting both duplication and bias is provided below.

Figure 14 -

In the diagram above, website shows thirteen different references to website across multiple maps. This may be completely valid, but it could also mean many separate installations when only one is needed.

Collection - we see ten different references, of which two are bespoke and eight are products, with no use of commodity services. There might be perfectly valid reasons for customised collection, but the profile allows us to question this and also whether the eight product forms are stand-alone installations or would be better suited to provision by a common service.

Survey tools - there are 12 instances, with a bias towards custom building these tools. If some are custom and some are nearly commodity, can they all be right?

Collating maps often helps in creating a common lexicon. The same thing is often described with different terms across organisations or within a single organisation.
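One way to build such a profile is simply to collate the components from many maps and count how often the same component appears at each stage of evolution; a wide spread, or a skew towards custom building something that is widely available as a product or commodity, is a prompt for questions. The short sketch below illustrates the idea; the counts are made up to loosely match the profile discussed above.

```python
from collections import Counter

# (component, stage) pairs collated from many maps; the counts are illustrative
# and loosely follow the profile above (13 website, 10 collection, 12 survey tool references).
observations = (
    [("website", "product")] * 9 + [("website", "commodity")] * 4
    + [("collection", "custom built")] * 2 + [("collection", "product")] * 8
    + [("survey tools", "custom built")] * 9 + [("survey tools", "product")] * 3
)

profile = Counter(observations)

# Group the counts by component to expose duplication and bias across maps.
for name in sorted({component for component, _ in profile}):
    by_stage = {stage: n for (component, stage), n in profile.items() if component == name}
    print(f"{name}: {sum(by_stage.values())} references, by stage: {by_stage}")
```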


1.3.9 Think small (as in know the details)

When viewed from an airplane, objects on the ground can appear very similar, yet in fact they may have important distinctions that could be vital to the success of actions and decisions.

A mismatch between understanding the important characteristics of components (within the context of analysis, method, and action) and the level of knowledge and understanding resident in decision makers and actors can be very problematic. Couple this with the tendency in many organizations to push decision making higher up (to senior management or executive layers) and the results can be messy.

Decomposing components into smaller components allows you to better understand the parts and their relationships, and it allows for the use of powerful techniques where small teams are organized around the smaller component pieces with clear contracts with producer and consumer components around them. Two examples of successful small team strategies are Amazon’s Two Pizza model and Haier’s Cell based structure.

 The following figure shows how this has been done in a map.

Figure 15 -

The complexity of managing the contracts and interfaces between the components is more than offset by the benefits arising from having small, autonomous teams with clearly defined outputs who can be creative in their approach to realizing their components and services. This clearly supports the “use appropriate methods” principle, as it is evident that the components in the figure are in different phases of their maturity, thus supporting the ability to make effective choices.
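As a small, simplified illustration of what such a contract can look like in code, the team owning a component might publish a typed interface that consumer components depend on, leaving the owning team free to change the implementation behind it. The names below are hypothetical and for illustration only.

```python
from typing import Iterable, Protocol


class RoadSegmentProvider(Protocol):
    """Contract published by the (hypothetical) team owning the road-network component."""

    def all_season_segments(self, country_code: str) -> Iterable[dict]:
        """Return road segments classified as all-season for the given country."""
        ...


def count_all_season_segments(provider: RoadSegmentProvider, country_code: str) -> int:
    # A consumer component depends only on the contract, not on any implementation.
    return sum(1 for _ in provider.all_season_segments(country_code))
```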


1.3.10 Use appropriate methods (e.g. agile vs lean vs six sigma)

It can be tempting to apply a new or existing method to decisions and actions without regard for the specific nuances surrounding a component. No “one size fits all” approach can be successful; it is important to consider a number of factors, of which landscape, climatic forces and position on the map are key.

Techniques that might be appropriate for commodity phase components (with a bias to standardization and certainty) will not work well for genesis phase ones, and vice versa. In the genesis case there is no standardization because of the “new” and “discovery” nature of the component. Similarly not looking at the context for components will likely result in surprises and failure.

It is important to understand that components themselves are assembled from subordinate ones, and the constituent pieces may themselves be in different phases. Outsourcing a CRM platform that is perceived to be a commodity may overlook the fact that there are additional features added to the solution that are in the custom or product phase, with a potential result being cost growth and solution degradation through contract-based conflicts.

An example helps to clarify this principle. In the following diagram for our SDG 9.1.1 example, a sourcing decision is being discussed.

The component labels are shown in Elvish - in practice these could be English names, but for the decision-making participants the words are completely opaque, as if in another language. A meaningful discussion and decision will be very difficult to arrive at.

In contrast, we can look at the context and maps that we have as in the following picture.

In this case we see that a component of the SDG 9.1.1 is actually in the commodity space, which indicates that embarking on a custom build decision for that component will be a mistake. The map has provided the important lifecycle context to allow the right method to be chosen.

Preparation and analysis are required to provide the context for selecting the most appropriate methods for our components. Methods have intrinsic strengths, weaknesses and assumptions that influence their effectiveness in different situations, so situational awareness is required as well. Method selection covers a broad range of topics, including project management and delivery (e.g. agile), process effectiveness and efficiency (e.g. lean, kanban, six sigma), sourcing strategies (e.g. outsource, insource, open-source, commercial product), and more.

An effective practice is to indicate on your maps what methods are to be applied for each component, to decompose components further where they are composed of subcomponents in different phases, to understand the rate of change and climatic factors that will directly influence the creation, acquisition, and operation of components.

More importantly, it is critical that components are correctly identified as being in the product, commodity or other phase. Using the right methodology when the component is placed in the wrong phase does not provide an effective outcome. For example, the map below is from the HS2 project within the UK and shows which components should use the six-sigma methodology and be outsourced.

Using our SDG 9.1.1 example, we can see that NSOs are using the Agile methodology for developing their custom components, but the map shows us that these components should be commodity services and outsourced.

1.3.11 Move fast (an imperfect plan executed today is better than a perfect plan executed tomorrow)

Placing the emphasis on speed, inexpensive components, keeping it simple and restraint (FIST) ensures that the focus is on simplifying the problem wherever possible and building in smaller components. The use of prototyping will provide faster feedback loops, which will ultimately deliver greater benefit. Taking time to perfect a plan makes it more susceptible to technological, market and economic changes. Delaying action increases the risk of scope creep, which will increase the complexity of our design.

1.3.12 Be pragmatic (it doesn’t matter if the cat is black or white as long as it catches mice)

It is important to accept that not everything will fit perfectly into the model that has been described. There may be very good reasons, in theory, for a particular approach, and most cases will be applicable and would greatly benefit.

However, a one-size-fits-all approach often fails to work and can lead to inertia and resistance. It is more important to maintain a pragmatic approach than an ideological one.

It may well be more pragmatic to maintain an existing IT estate, auditing and sweating the existing assets until they are replaced.

The future estate may require a fundamentally different approach such as agile, open source, or local delivery.

In the example, the drive to commoditise survey tools to support SDG 9.1.1 may, in theory, be desirable. However, moving survey tools to a commodity may encounter inertia and resistance from NSOs who see survey tools as part of their reason for being. Therefore, in order to be pragmatic, survey tools may need to remain custom built.

1.3.13 Manage inertia (e.g. existing practice, political capital, previous investment)

In any established value chain, there exist interfaces between components along with accompanying practices. There is a significant cost associated with changing these interfaces and practices due to the upheaval caused to all the higher order systems that are built upon it e.g. changing standards in electrical supply impacts all the devices which use it. This cost creates resistance to the change.

You also find similar effects with data or more specifically our models for understanding data. As Bernard Barber once noted even scientists exhibit varying degrees of resistance to scientific discovery. For example, the cost associated with changing the latest hypothesis on some high level scientific concept is relatively small and often within the community we see vibrant debate on such hypotheses.

However changing a fundamental scientific law that is commonplace, well understood and used as a basis for higher level concepts will impact all those things built upon it and hence the level of resistance is accordingly greater. Such monumental changes in science often require new forms of data creating a crisis point in the community through unresolved paradoxes including things that just don’t fit our current models of understanding. In some cases, the change is so profound and the higher order impact is so significant that we even coin the phrase “a scientific revolution” to describe it.

The costs of change are always resisted and past paradigms are rarely surrendered easily, regardless of whether it is a model of understanding, a profitable activity provided as a product or a best practice of business. As Wilfred Trotter said, “the mind delights in a static environment”. Alas, this is not the world we live in. Life’s motto is “situation normal, everything must change” and the only time things stop changing is when they’re dead.

The degree of resistance to change will increase depending upon how well established and connected the past model is. A map can be used to anticipate not only a change but also the likely sources of resistance whether past activities (sunk capital) or practices. There are many forms of inertia from financial to political capital, from past best practice, to the cost of training in new practices. This inertia should be actively managed.


Consider, for example, a supplier with a successful, high-margin product business facing a shift to a more commoditised, utility model. Such a change is problematic for several reasons:

  • all the data the company has demonstrates the past success of the current business model, and concerns would be raised over cannibalisation of the existing business;
  • the rewards and culture of the company are likely to be built on the current business model, hence reinforcing internal resistance to change;
  • the expectations of external financial markets are likely to reinforce continual improvement of the existing model, i.e. it is difficult to persuade shareholders and financial investors to replace a high-margin and successful business with a more utility approach when that market has not yet been established.

For the reasons above, the existing business model resists change, and the more successful and established it is, the greater the resistance. This is why such change is usually initiated by those not encumbered by past success.

This resistance of existing suppliers will continue until it is abundantly clear that the past model is going to decline. However, by the time it has become abundantly clear and a decision is made, it is often too late for those past incumbents.

In the SDG 9.1.1 example we would want to move towards commoditising trust, methods, processing and survey tools, but may encounter inertia from a belief that these components rely on a customised approach based on existing practices within NSOs. NSOs may believe that custom surveys need to be created and processed to support independent statistics. Challenging the need to customise surveys will help to overcome this inertia and support the drive towards standardisation.


1.3.14 Effectiveness over efficiency

Consider, as an example, a company that runs its own data centres and is growing rapidly, which is creating a problem because of the time it takes to order and install new servers. The basic process is fairly straightforward: they order new servers, the servers arrive at goods-in, and they are then modified and mounted. The company analysed the flow (a rough diagram is provided in the figure below) and found that the bottleneck was caused by the modification of the servers, due to the number of changes which need to be made before mounting.

The solution? To automate the process further, including investigating the possible use of robotics. It turns out that the process of modification is not only time consuming but can cause errors, and a case for the cost of automating this process (given the rise of general-purpose robotic systems) can be made given the future expectation of growth. The figures seem encouraging. Getting rid of bottlenecks in a flow sounds like a good thing. It all sounds reasonable.

The first thing to do is map out their environment including the flow. The map is provided in figure xx below.

Now, as we can see from the map, goods-in and ordering are fairly commodity (i.e. well defined, commonplace) and that makes sense. Computing, on the other hand, is treated as a “product”, which seems suspect, but far worse is that they are making some fairly custom modifications to this “product” before mounting. Also, what are racks doing in the custom-built section? Racks are pretty standard things, aren’t they?

Well, not in this case. Here the company is using custom-built racks which are slightly smaller than the 19-inch standard, which is why they need to modify the servers. The company needs to take the cases off the servers, drill new holes and mount rails to make them fit. The proposed automation is about using robots to improve that process of drilling and modifying servers to fit into custom-built racks. Wait, can’t we just use standard racks?

If we take our map and mark on it how things should be treated, then surely it makes sense to use standard racks and get rid of the modification entirely.

In fact, if we look at the map, we should also mark compute as a utility (which is what it is now). This is the essence of effectiveness over efficiency: making the modification process more efficient would have been the wrong call, whereas the effective choice is to remove the need for modification altogether.

1.3.15 Leave no-one behind

With the adoption of the 2030 Agenda, UN Member States pledged to ensure “no one will be left behind” and to “endeavour to reach the furthest behind first”. This principle applies equally to modernisation. The data revolution, with new data sources, new innovative methods and technologies and new partnerships, can transform the operation of statistical systems in developing as well as developed countries. Without the burden of old and often complex methods and a legacy infrastructure and application landscape, developing countries are in some cases even better positioned for a fast leap forward. It is therefore very important that modernisation progress doesn’t leave any country behind and that its benefits are achieved across the global statistical community.

1.4 Strategy - Integrating this into strategy

1.4.1 UN SDG Example

In order to better illustrate this practice of creating an effective IT and business strategy we have created a partial series of maps addressing UN SDG 9.1.1 - access to roads (Proportion of the rural population who live within 2 km of an all-season road. Develop quality, reliable, sustainable and resilient infrastructure, including regional and trans-border infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all).

Our first phase is to establish purpose - this is expressed in the UN SDG 9.1.1 as noted above. For the SDG, our “why of purpose” is to ensure that all people have access to all-weather roads everywhere by 2030. There are a number of actors in this context. Our first map shows a subset of them - Government, National Statistical Organizations, UN, Public (at large), and construction industry. Each of these actors has “needs” that are to be addressed by this work.

Figure - 

We now observe our landscape to identify further components. The following map shows additional components in our landscape that are important as we strive to identify what our NSO strategy should be.

Figure -

In this map we see the actors above identified as red circles. There are relationships between these actors and components: governments provide funding along with NGOs and donors, government relies on votes from the public, the UN is a trusted independent agency that requires goodwill from governments, and so on. Importantly, the UN (with countries) wishes to improve the global economy and end poverty, and it has been determined that access to jobs and services is an important part of making that happen. The NSO in this case is an important trusted contributor, providing high-quality objective indicators to the other actors to support their needs and goals. The NSOs require access to data sources in order to make this happen.

We now look further at the data source component, as shown in the next map.

Figure -

In this map we see that we have a continuum of data sources, ranging from the well-known, standardized sources (those provided by underlying traditional survey mechanisms) to newly emerging “other” data sources. Satellite data is currently available as a product, but a significant amount of it is becoming a commodity. The red line shows that part of our strategy is to see this move from “genesis / custom” to “commodity” happen for all of our sources. By creating a global data platform and participating in the global marketplace we can identify and access standardized data to support all countries’ NSOs.

Looking beyond the data pipeline, the next map adds in the tools and methods used to create these data sources. We introduce survey tools (potentially across the GSBPM spectrum), methods, and underlying infrastructure components (processing, storage). We see that for many NSOs their survey tools are all in the “custom” phase: internally developed and used. Furthermore, for those with traditional data centres their processing is also a custom configuration, either at the agency level or at a “whole of government” service provider level. The fact that these tools are all custom is a challenge that has been at the centre of many years of collaborative activity to change (through the HLG and other work).

What shall we do with this? In the next map we identify that our strategy (through NSO collaboration, the UN Global Platform and the open-source community) is to take advantage of the fact that the elements in the red circle are actually commodities in the broader landscape, and to effect a change in how we source and use them. The red arrow signifies the move to the commodity phase.

At this point we should step back and look at the landscape from our earlier broader perspective. The following map shows how we view the overall context.

The value of this map is apparent in how it situates a technology strategy conversation within the broader business strategy context. We can picture how, in the context of the UN and SDG activities with member countries, there is a desire to attain our goal for 2030, and at the same time it is very likely that each NSO has some version of the elements in the red circle. The UN Global Platform provides a possible means for individual NSO “wins” and, importantly, a way to achieve a scalable, accessible and effective strategy at a global platform level at the same time.

 Looking at the bigger picture in our map, an important element in our landscape is that of “trust”. This is shown in the following map.

The value of our NSOs hinges on trust: in objectivity, quality, applicability and other elements of typical quality frameworks. This is an important component in the broader context. On the other hand, we see that our desire to move our tools, methods, processing and storage to commodity (cloud, standardized methods and tools) hinges on a related but different trust: are the public algorithms and methods sound, is there something unique to our practice, will there be sovereignty or residency issues with our data in the cloud, and so on. On the one hand our “trust” is a key differentiator and a value that we deliver to governments and the public. On the other hand, “trust” appears as an inhibitor in our move to establish a more effective approach to the elements in the red circle.

In our next map we focus again on key pipelines for data tools, methods, sources. Our red lines show our desire to move those elements to the commodity side to take advantage of all that the broader market has to offer - open-source, global data, collaborative communities. We also show a pipeline for the data itself, reflecting that our data space also has a span ranging from “well-known, standardized” elements such as address, to more custom and early elements to the left. Much of our standardization activity is focused on moving data to the right.

A further examination of the “trust” element noted earlier allows us to probe further into the impact and nature of the element. The black bars represent “inertia”: elements that act as impediments to our desire to standardize and commoditize access to data, methods and tools. It is important to identify these elements as we will need to address them individually (per NSO, or per department within an NSO) as well as more broadly. Clearly we want to take advantage of traditional mechanisms such as peer review, publishing, etc. as part of this process; contracts can also address inertia elements for the tools, computation and storage.

As a final step, we look again at our broader map and do a summary.

We see clearly the elements pertaining to helping realize the SDG by 2030, and the role of the NSOs and the UN in this broader context. We highlight a part of the landscape for NSOs that addresses the means by which we deliver trusted insights to the ecosystem. We have a continuum of data sources we draw on, all of which are moving to the right as time progresses (many governments are pursuing data marketplaces and the sharing of government registers, for example Denmark and Estonia). We see that we can “leave no one behind” (across all country NSOs) in our journey by driving the growth of, and accessibility to, trusted algorithms, methods, tools, data and infrastructure, and we can derive what actions will be required. As a result of this “observation” phase discussion we can now incorporate our doctrine (principle) elements, and then through leadership arrive at our “act” plan. We have identified that we will need to “unpack” trust as an inertial force, keeping the important value aspects while testing cultural and other resistance, to identify technical and non-technical actions that may need to be taken.

We have also identified the “win-win” opportunity for NSOs to realize the vision of the UN Global Platform. The Platform provides a mechanism by which we can create this open, trusted marketplace of methods, data, tools and infrastructure, either through direct use or through collaborative development (with subsequent importation to individual NSOs), to further their individual strategies and goals.

The use of our strategy cycle and maps allows us to create a coherent foundation from which to further elaborate and realize our goals.


1.5 Technologies / Techniques

All technologies and techniques go through the four phases, and so do practices, data and knowledge. If we place data, data science, artificial intelligence and serverless technology onto a Wardley Map, we can easily understand what methodologies and processes we need to handle these technologies and the impact on strategy.


1.5.1 Big Data

"Big data" is a field that treats ways to analyse, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate[1]. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. Big data was originally associated with three key concepts: volume, variety, and velocity.[2] Other concepts later attributed with big data are veracity (i.e., how much noise is in the data)[3] and value[4].

All data is structured and goes through four phases of evolution: unmodelled, divergent, convergent and modelled. Data described as “unstructured” is simply data whose structure we do not yet know; it will evolve from unmodelled to modelled.

Such data sources may not have predefined data models and often do not fit well into conventional relational databases. Many big data sources are still in the genesis, custom or product phases, and therefore the data is not yet modelled. This will require NSOs to focus effort on modelling the data, and will require the use of new skills and roles, such as data engineers or data wranglers.
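As a small illustration of what that modelling work can look like in practice, the sketch below flattens nested, unmodelled records (as they might arrive from a web API or a scraped source) into a modelled, tabular form that can be loaded into a conventional database. The records and field names are hypothetical and purely illustrative.

```python
import pandas as pd

# Hypothetical 'unmodelled' records, e.g. as returned by a web API or a scraper.
raw_records = [
    {"id": 1, "location": {"lat": 59.91, "lon": 10.75},
     "readings": [{"t": "2019-01-01", "v": 42}]},
    {"id": 2, "location": {"lat": 60.39, "lon": 5.32},
     "readings": [{"t": "2019-01-01", "v": 17}, {"t": "2019-01-02", "v": 19}]},
]

# Flatten the nested structure into one row per reading: a modelled, tabular form
# that can be loaded into a conventional relational database.
modelled = pd.json_normalize(
    raw_records,
    record_path="readings",
    meta=["id", ["location", "lat"], ["location", "lon"]],
).rename(columns={"t": "date", "v": "value",
                  "location.lat": "lat", "location.lon": "lon"})

print(modelled)
```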

Despite the high expectations for using Big Data, the reality is currently proving to be that while the technology needed to process these huge data sets is available and maturing, the biggest obstacle for an NSO is to actually gain access to the data. This lack of access can be due to the reluctance of a business to release their data, legal obstacles or concerns about privacy. See privacy preserving techniques section.

Implications of Big Data for an NSO IT infrastructure and skill requirements

At the time of writing there are very few cases of using Big Data in NSO production processes with the exception of geolocation data. This type of data has the advantage of being both relatively simple to access and also structured in a standardised way that allows integration with traditional datasets to provide additional and more granular information on location.

There has been much discussion on what the position of an NSO should be vis-à-vis Big Data. Scenarios include NSOs developing a role as ‘brokers’ of Big Data which has been integrated by a third party, or simply providing a stamp of official quality to such datasets once they have validated their content and methodologies. Once an NSO has mapped its landscape, it will be able to create a strategy for how to handle Big Data.

In order to make use of Big Data, an NSO will require access to large compute resources and staff with new skills. The processing of increasingly high-volume data sets for official statistics requires staff with statistical skills and an analytical mindset, strong IT skills and a nose for extracting valuable new insights from data – often referred to as “Data miners” or “Data scientists”.

NSOs need to develop these new analytical capabilities through specialised training. Skills will include how to adapt existing systems to integrate Big Data into existing datasets and processes using specific technology skills.

Recommendations

  • Allocate IT engineering resources to understanding Big Data and gaining experience in this space. The best way of understanding Big Data is through its use. The UN Global Platform Data Service (https://marketplace.officialstatistics.org/services) is a good place to start, as it allows easy access to Big Data and the analytic tools to process it, and is provided free to NSIs.
  • Engage with the Big Data world, including sending engineers and ideally some members of the methods team to attend events and conferences in order to acquire these emerging skills.
  • Seek advice from the existing NSO Data Centre Campuses[5].

1.5.2 Cloud (Todo)

Pros and cons of cloud technology for an NSO

Cloud computing offers solutions to NSOs of all capacity levels but is especially useful for allowing countries with weak infrastructure to access more advanced computing technology. As the cloud requires only reliable internet access, it can eliminate the high upfront costs of installing hardware and software infrastructure and reduce the resources needed for ongoing maintenance.

Cloud technology is scalable and can enable IT teams to more rapidly tailor resources to meet fluctuating and unpredictable demand and can remove the layers of complexity in setting up infrastructure. It can enable the sharing of software as applications can be saved on the cloud by one organisation and used by other NSOs as needed.

The inherent security capabilities of large-scale cloud computing services can provide a higher level of security than the in-house infrastructure of some lower capacity NSOs.

For some NSOs, cloud adoption raises confidentiality concerns, as data on citizens is no longer stored on site, which may conflict with national legislation.

1.5.3 IoT (Todo)


1.5.4 Artificial Intelligence

Artificial Intelligence (AI) is the development and application of computer systems that are able to perform tasks normally requiring human intelligence, such as learning, problem solving and pattern recognition.

Section 1.5.1 discusses the availability of big data. The increasing volume, velocity and variety of data presents new challenges to NSOs in terms of how to interpret larger, more frequent and more varied types of data.

A main component of AI is Machine Learning (ML), in which the computer learns from provided “training data” without being explicitly programmed. ML is separated into supervised and unsupervised learning techniques. Unsupervised learning uses training data without the desired output, whereas supervised learning uses training data that includes the desired output. Examples would be predicting happiness based on social media posts or predicting disease using healthcare claims data.
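A minimal sketch of the supervised learning idea described above, using the commonly used scikit-learn and pandas Python libraries (assumed to be available); the input file and column names are hypothetical, not a prescribed NSO dataset.

  # Supervised learning sketch: the training data includes the desired output ("label").
  import pandas as pd
  from sklearn.model_selection import train_test_split
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.metrics import accuracy_score

  df = pd.read_csv("labelled_records.csv")      # hypothetical training data
  X = df.drop(columns=["label"])                # features
  y = df["label"]                               # desired output (what makes it "supervised")

  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  model = RandomForestClassifier(n_estimators=100, random_state=0)
  model.fit(X_train, y_train)                   # learn from the training data

  print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

  # For unsupervised learning no desired output is supplied, e.g.
  # sklearn.cluster.KMeans(n_clusters=5).fit(X) groups records without labels.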

The field of AI is evolving and growing rapidly, with further subfields being added regularly. Deep Learning is one such field: it layers multiple algorithms to identify more relationships in complex data than humans could find with conventional programming alone.

AI has major potential for NSOs to process and interpret big data faster and with greater accuracy. The benefit to the end users of the statistics is richer, more accurate and more up to date information in order to inform decisions.

NSOs wanting to exploit the benefits of AI will need to engage staff with new skills including the various methods of Machine Learning such as Supervised Learning, Unsupervised Learning, Reinforcement Learning and Deep Learning.

AI-based applications can replace or augment certain tasks. This presents several challenges to brownfield organisations, such as upskilling existing staff from administrative functions to data analyst, data engineer and data scientist roles. A careful communication strategy will be required for such a game-changing technological transformation, to ensure staff are engaged and buy into the transformation and to reduce the friction of change.

1.5.4.1 Recommendations

  • No cloud, no AI. Start planning to sunset existing private IaaS efforts over the next five to ten years, with a view to moving up the stack into the serverless space.
  • Allocate IT engineering resources to understanding and building methods with AI and gaining experience in this space. The best way of understanding AI is through its use. The UN Global Platform Methods Service (https://marketplace.officialstatistics.org/methods-service) is a good place to start, as this allows the easy deployment of trained machine learning models (see the sketch after this list). At the time of writing, the Methods Service supports multiple frameworks (Caffe, Chainer, Dlib, Gensim, H2O.ai, Keras, MXNet, NLTK, OpenCV, PMML, PyTorch, Rusty Machine, Scikit-Learn, TensorFlow, Theano, Weka, XGBoost, spaCy) and is provided free to NSIs.
  • Engage with the machine learning world, including sending engineers and ideally some members of finance to attend events / conferences in order to acquire those emerging skills.
  • Seek advice from the existing NSO Data Centre Campuses.
  • Introduce a policy of serverless first, but do not embark on building your own serverless environments without robustly challenging the reasons for this.
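A common first step before deploying a trained model to any model-serving environment (such as the Methods Service mentioned above) is to persist the trained model as an artefact. The sketch below shows only this generic step, using scikit-learn and the joblib library (an assumed dependency); the exact packaging required by any particular service is not covered here.

  # Persist a trained model so it can be handed to a serving environment.
  import joblib
  from sklearn.datasets import load_iris
  from sklearn.linear_model import LogisticRegression

  X, y = load_iris(return_X_y=True)             # small built-in example dataset
  model = LogisticRegression(max_iter=1000).fit(X, y)

  joblib.dump(model, "model.joblib")            # serialise the trained model
  restored = joblib.load("model.joblib")        # reload it in the serving environment
  print(restored.predict(X[:5]))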


1.5.5 Data Science (Todo)

The job title of ‘data scientist’ has emerged in recent years in parallel with the growth of Big Data. The two fields go hand in hand in the pursuit of extracting knowledge and insights from new data collections in various forms, both huge structured datasets and unstructured alternative data sources.

Data science requires a mix of expertise. Technology skills are required to manipulate Big Data using techniques such as massive parallel processing to analyse volatile unstructured data, perform data cleansing and then distil it into a format suitable for analysis. Mathematical skills are needed to write the complex algorithms used in analysing these data. Statistical skills are needed to investigate the data, respond to questions and derive insights from them. Other skills like Machine Learning and Deep Learning may also be required (refer to para ..).

Data scientist candidates are consequently highly sought after, and many universities now offer data science courses. Given the range of skills involved in data science, the reality is that such tasks are carried out by a team rather than a single individual. Such a team typically consists of a data engineer who accesses the primary data source and renders it into a structured format, a software developer who writes the routines to clean and aggregate the data, and a data scientist who creates algorithms and uses statistical techniques to gain insights from the data.

A typical list of the required competencies for a Data Science position would include:

  1. Programming Skills
  2. Statistics
  3. Machine Learning
  4. Multivariable Calculus & Linear Algebra
  5. Data mining
  6. Data Visualisation & Communication
  7. Software Engineering

In practice, therefore, the data science function is an amalgam of the data engineer, software engineer and data scientist (algorithms etc.) roles.

Links to guidelines, best practices and examples:

  • UK Government Data Science Ethical Framework:

 https://www.gov.uk/government/publications/data-science-ethical-framework


1.5.6 Serverless Technology

Serverless is the next phase after cloud computing. It is enabled by the use of cloud components, removing the need to curate servers within the environment (e.g. patching). Serverless is a fundamental change in architecture for both IT and the business, allowing the organisation to focus on creating more value rather than maintaining hardware.

Serverless is an event-driven, utility-based, stateless, code execution environment.[6] Each of these properties is explained below; a minimal code sketch follows the list.

1.        Serverless is a code execution environment. This means that a developer using a serverless environment is solely concerned with writing code and consuming standard building blocks provided through the services. Any code written could represent a discrete function or an application, which is a logical namespace for a collection of functions. Within a serverless environment there is no concept of machines, operating systems or the mechanics of distribution or scaling as they are all dealt with in the background.

2.        Serverless is event-driven. This means that the initiation and execution of the code is caused by some event or trigger – for example, the calling of an API service or the storage of a file. The code is not running, listening and responding to some input, but instead is initiated and executed by the event.

3.        Serverless is utility-based. You pay for the code only when it is running, and the cost paid depends upon the resources the code consumes. If you build a function to respond to an event which is subsequently never called, your cost of running the function will be zero because it will never actually run. There is no payment for idle compute or hosting. This utility-based charging is known as billing per function.

4.        Serverless is stateless. The environment in which the code runs is constructed in response to the event, the code is then executed, and the environment deconstructed. This means that any information that needs to be passed between function calls must be stored or retrieved from services outside of the environment, such as a file storage, database or message queue service. If, for example, you call a function twice (through two different events), you cannot assume the same environment exists for both function calls, so no information can be passed between the two function occurrences by virtue of the environment they are running in.
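A minimal sketch of a serverless function in the style of an AWS Lambda Python handler (AWS Lambda is referenced below as an example of a utility stack). It illustrates the event-driven and stateless points: the function is invoked per event, and anything to be kept between invocations must be written to an external service. The payload field names are illustrative assumptions.

  # Minimal serverless function sketch (AWS Lambda-style Python handler).
  # Invoked per event (event-driven); nothing persists between invocations
  # (stateless), so results must be written to an external storage service.
  import json

  def handler(event, context):
      # 'event' carries the trigger payload, e.g. an uploaded-file notification
      # or an API request body; its exact shape depends on the event source.
      records = event.get("records", [])        # hypothetical payload field

      total = sum(r.get("value", 0) for r in records)

      # Any state to keep (e.g. a running total) must be stored outside the
      # function, for example in an object store, database or queue service.
      return {
          "statusCode": 200,
          "body": json.dumps({"count": len(records), "total": total}),
      }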

Creating a serverless architecture is more a mindset of stitching together Lego bricks to create a structure than designing your own Lego bricks. The result is significant benefits in efficiency.


Wardley Map[7]

Figure xxx

1.        Serverless is a shift of a code execution environment from a product stack (such as LAMP and .NET) to a utility stack (such as AWS Lambda). A necessity of being a true utility is the environment has no permanence (i.e. it is stateless) and is only invoked when needed (i.e. event driven). This change potentially impacts the entire application pipeline and a vast array of value chains above.

2.        Serverless lowers mean time to repair (MTTR) through both action (automation of the code execution environment, including provisioning and scaling) and observability (billing per function). A similar impact happened with IaaS.

3.        The change in MttR will lead to a new set of co-evolved practices.

4.        The provision of a serverless environment will increase efficiency, speed and access to new sources of worth by expanding the adjacent unexplored.

5.        Serverless will not only impact novel and new activities but also existing application pipelines.

6.        The focus of development will shift away from the lower-order systems including containers, VMs, DevOps and IaaS itself. Lower-order systems will increasingly become invisible and considered legacy.

7.        Inertia to change will be reinforced by pre-existing practices, so those who have invested heavily in DevOps and IaaS may be disproportionately affected by the change compared to those who have not. We expect a high proportion of ‘traditional’ NSIs to be among the early adopters of serverless as they seek to jump ahead.

1.5.6.1 Serverless Security

The UK’s National Cyber Security Centre (NCSC) recommends using serverless components, on the basis that architectures built from serverless components (on a good cloud platform) will be more secure than ones built on IaaS or on-premises infrastructure.

NCSC defines a serverless component as one that:

  • Performs a single task, such as storage, computation, or access control
  • Has an underlying platform which the customer cannot see or modify
  • Is driven by code (FaaS only) or configuration
  • Can be used or orchestrated programmatically using an API

1.5.6.2 Recommendations

We recommend that NSIs:

  • Allocate IT engineering resources to understanding and building methods with serverless and gaining experience in this space. The best way of understanding serverless is through its use. The UN Global Platform Methods Service (https://marketplace.officialstatistics.org/methods-service) is a good place to start. At the time of writing, this code execution environment supports multiple languages (Node.js, Java, R and Python) and is provided free to NSIs.
  • Apply the twelve-factor app methodology to serverless applications[8]. The Twelve-Factor App methodology[9] is a set of twelve best practices for building modern, cloud-native applications. With guidance on configuration, deployment, runtime and communication between multiple services, the Twelve-Factor model prescribes practices that apply to a diverse range of use cases, from web applications and APIs to data processing applications (see the configuration sketch after this list).
  • Engage with the serverless world, including sending engineers and ideally some members of finance to attend events / conferences in order to acquire those emerging skills.
  • Be careful of advice, especially from those who lack experience of building extensively with serverless.
  • Introduce a policy of serverless first, but do not embark on building your own serverless environments without robustly challenging the reasons for this.
  • Start planning to sunset existing private IaaS efforts over the next five to ten years, with a view to moving up the stack into the serverless space.
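One of the twelve factors is storing configuration in the environment rather than in code. A minimal sketch of that single practice applied to a serverless function is shown below; the setting names are hypothetical.

  # Twelve-factor "config in the environment" sketch: the same code runs
  # unchanged across development, test and production deployments because
  # settings are read from environment variables, not hard-coded.
  import os

  OUTPUT_BUCKET = os.environ["OUTPUT_BUCKET"]          # required setting (hypothetical)
  LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")      # optional, with a default

  def handler(event, context):
      # ... process the event and write results to OUTPUT_BUCKET ...
      return {"bucket": OUTPUT_BUCKET, "log_level": LOG_LEVEL}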

1.5.7 Privacy Preserving Techniques

As mentioned in the foreword, the GWG has also delivered a Handbook on Privacy Preserving Techniques[10], which describes the issues involved in privacy protection via a Wardley map, illustrates how various parties can work together in a controlled setting, such as the UN Global Platform, and sets out which techniques can be used to guarantee confidentiality.

The figure below shows a top-level Wardley map of the ecosystem of a national statistics office (NSO) computation. Wardley maps are widely used to visualise priorities, or to aid organizations in developing business strategy. A Wardley map is often shown as a two-dimensional chart, where the horizontal dimension represents readiness and the vertical dimension represents the degree to which the end user sees or recognizes value. Readiness typically increases from left to right, while recognized value by the end user increases from bottom to top in the chart. As shown, NSOs are charged to deliver diverse official statistical reports, which sometimes rely on sensitive data from various sources.

The next figure below illustrates a setting where multiple NSOs collaborate under the coordination of the United Nations. NSOs from individual nations act as Input Parties in this setting to share their results and methods with each other on the UN Global Platform. In this setting, the Global Platform takes on the role of the Computing Party. Also in this setting the Result Parties may be more diverse than in the first setting above: people, organizations, and governments across the world may receive and benefit from reports produced by the Global Platform.

Figure 2: Privacy-preserving statistics workflow for the UN Global Platform

The Privacy Preserving Techniques Handbook lists five techniques that help reduce the risk of data leakage when producing statistics; a minimal code sketch of one of them (differential privacy) follows the list.

  1. Secure multi-party computation is also known as secure computation, multi-party computation (MPC), or privacy-preserving computation. A subfield of cryptography, MPC deals with the problem of jointly computing an agreed-upon function among a set of possibly mutually distrusting parties, while preventing any participant from learning anything about the inputs provided by other parties[11]; and while guaranteeing that the correct output is achieved. Each data input is shared into two or more shares and distributed among the parties involved. These when combined produce the correct output of the computed function.
  2. Homomorphic encryption refers to a family of encryption schemes with a special algebraic structure that allows computations to be performed directly on encrypted data without requiring a decryption key. The advantage of this encryption scheme is that it enables computation on encrypted data without revealing the input data or results to the computing party. The result can only be decrypted by a specific party that has access to the secret key, typically it is the owner of the input data.
  3. Differential Privacy (DP) is a statistical technique that makes it possible to collect and share aggregate information about users, while also ensuring that the privacy of individual users is maintained. This technique was designed to address the pitfalls that previous attempts to define privacy suffered, especially in the context of multiple releases and when adversaries have access to side knowledge.
  4. Zero-knowledge proofs involve two parties: prover and verifier. The prover has to prove statements to the verifier based on secret information known only to the prover. ZKP allows you to prove that you know a secret or secrets to the other party without actually revealing it. This is why this technology is called “zero knowledge”, as in, “zero” information about the secret is revealed. But, the verifier is convinced that the prover knows the secret in question.
  5. Trusted Execution Environments (TEEs) provide secure computation capability through a combination of special-purpose hardware in modern processors and software built to use those hardware features. In general, the special-purpose hardware provides a mechanism by which a process can run on a processor without its memory or execution state being visible to any other process on the processor, even the operating system or other privileged code. Thus the TEE approach provides Input Privacy.
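To make the differential privacy item above concrete, the sketch below shows the Laplace mechanism for a simple counting query, using numpy (an assumed library); the count and epsilon values are purely illustrative and real deployments require careful privacy-budget management.

  # Differential privacy sketch: add Laplace noise calibrated to the query's
  # sensitivity and a privacy budget epsilon before releasing an aggregate.
  import numpy as np

  def dp_count(true_count: int, epsilon: float, rng=None) -> float:
      rng = rng or np.random.default_rng()
      sensitivity = 1.0                    # one person changes a count by at most 1
      noise = rng.laplace(0.0, sensitivity / epsilon)
      return true_count + noise

  print(dp_count(12345, epsilon=0.5))      # smaller epsilon -> more noise, more privacy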

The importance of data security        

Data security is of paramount importance for an NSO. Confidentiality and privacy are therefore among the most important of the Fundamental Principles (Ref. Chapter 3.2.6) and a major concern for citizens. Maintaining data security is vital for the good reputation of an NSO.

National statistics are aggregated from individual records and often contain personal or commercial information - thus security measures must be designed to preserve data confidentiality and ensure data is accessible only by authorised people and only on an as needed basis. Alongside public concerns with data confidentiality and privacy, there is a growing demand for researchers to access microdata – and this access is often limited by the fear that confidentiality protection cannot be guaranteed.

There are a number of ways an NSO can address data security.

Security measures can be implemented at the level of the data by using anonymisation techniques, removing personal details from individual records in a microdata set so that identification of individuals is highly unlikely.
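A minimal sketch of one such data-level step, removing direct identifiers and replacing the record key with a salted hash (pseudonymisation), using pandas and the standard hashlib module. The file and column names are hypothetical, and this step alone does not guarantee anonymity; further disclosure-control techniques (aggregation, suppression, noise) are usually needed.

  # Pseudonymisation sketch: drop direct identifiers, hash the record key.
  import hashlib
  import pandas as pd

  SALT = "a-secret-salt-held-by-the-NSO"   # illustrative; manage securely in practice

  def pseudonymise(value: str) -> str:
      return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

  micro = pd.read_csv("microdata.csv")                               # hypothetical file
  micro["person_key"] = micro["national_id"].astype(str).map(pseudonymise)
  micro = micro.drop(columns=["national_id", "name", "address"])     # remove direct identifiers
  micro.to_csv("microdata_pseudonymised.csv", index=False)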

Security measures can be put in place at the physical level by restricting access to where the data is stored and implementing strict data controls. Many NSOs have set up Data Laboratories where on-site access to microdata takes place under NSO supervision, with strict audit trails to ensure no confidential data leaves the premises.

An alternative to Data Laboratories are Remote Access Facilities (RAFs). RAFs are becoming increasingly important as a way of facilitating secure access to microdata: researchers do not have to travel to the NSO premises but can instead submit algorithms to be run on the microdata remotely via the internet. The job is then run by the NSO and the results returned to the researcher, while the microdata never leave the NSO.

Procedural measures include vetting processes to approve requests from individual researchers for access to microdata, and the signing of contractual agreements with these researchers that include penalties if security rules are breached.


1.6 Data Management

1.6.1 Linked data (Todo)

The growth of linked data

The term Linked Data refers to the method of publishing structured data so that it can be interlinked through semantic queries, connecting related data that weren’t formerly related. It is defined as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web”.

In practice Linked Data builds upon standard Web technologies such as HTTP, the Resource Description Framework (RDF) and Uniform Resource Identifiers (URI), but instead of using them to generate standard web content as pages to be read by users, it extends them to connect information in a way that can be read automatically by computers.

In this way data is linked to other data, fully exploiting these connections so its value increases exponentially. Thus, data becomes discoverable from other sources and is given a context through links to textual information via glossaries, dictionaries and other vocabularies.
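A minimal sketch of expressing a statistical observation as Linked Data using rdflib, a commonly used Python RDF library (an assumption, not mandated by this document); the URIs and vocabulary terms are illustrative rather than an endorsed model.

  # Linked Data sketch: describe one observation as RDF triples and serialise
  # it as Turtle so other systems can interlink and query it.
  from rdflib import Graph, Literal, Namespace, URIRef
  from rdflib.namespace import RDF, XSD

  EX = Namespace("http://stats.example.org/def/")                    # hypothetical vocabulary
  OBS = URIRef("http://stats.example.org/obs/unemployment-2019-Q4")  # hypothetical URI

  g = Graph()
  g.add((OBS, RDF.type, EX.Observation))
  g.add((OBS, EX.indicator, Literal("unemployment rate")))
  g.add((OBS, EX.period, Literal("2019-Q4")))
  g.add((OBS, EX.value, Literal(5.1, datatype=XSD.decimal)))

  print(g.serialize(format="turtle"))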

Uses of linked data for an NSO

A number of NSOs have published their statistics as Linked Data so that they can be interlinked with other sources and become more useful. The process is lengthy and requires significant investment.

1.6.2 Data integration / data linkage (Todo)

The importance of data integration and data linkage for an NSO

As new data sources are becoming available NSOs are facing the challenge of finding ways to integrate changeable and often unstructured data with traditional data maintained by the NSO in order to produce new and reliable outputs.  

Data integration provides the potential to augment existing datasets with new data sources, and produce timelier, more disaggregated statistics at higher frequencies than traditional approaches alone.

New and emerging technologies are available to support data integration. NSOs need to ensure that staff have the necessary new skills – in particular data science skills, new methods and new information technology approaches – and are able to design new concepts or align existing statistical concepts to the concepts in new data sources.

Examples of data integration and data linkage

There are many possible types of data integration which include using administrative sources with survey and other traditional data; new data sources (such as Big Data) with traditional data sources; geospatial data with statistical information; micro level data with data at the macro level; and validating data from official sources with data from other sources.
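A minimal sketch of one of these integration types, deterministic linkage of an administrative source to survey data on a shared (pseudonymised) key, using pandas. File and column names are hypothetical; real linkage often also requires probabilistic matching when no common key exists.

  # Data linkage sketch: join survey responses to an administrative extract.
  import pandas as pd

  survey = pd.read_csv("survey_responses.csv")       # e.g. reported income, household size
  admin = pd.read_csv("tax_register_extract.csv")    # e.g. registered earnings

  linked = survey.merge(admin, on="person_key", how="left", indicator=True)

  match_rate = (linked["_merge"] == "both").mean()
  print(f"linked {match_rate:.1%} of survey records to the administrative source")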

Links to guidelines, best practices and examples:

  • Statistics New Zealand Data Integration guidelines:

http://archive.stats.govt.nz/about_us/legisln-policies-protocols/data-integration-gdlns.aspx

http://archive.stats.govt.nz/methods/data-integration/data-integration-manual.aspx

  • HLG-MOS Review paper on Data Integration:

https://www.unece.org/stats/ces/in-depth-reviews/geospatial2.html


1.7 IT Management Models (Todo)

There are a number of models for managing IT staff and resources in an NSO. These range from in-house development, outsourcing and offshoring IT work to external companies, a hybrid in-house/outsourced approach, and more recently, a collaborative approach to development. There has been a continuous cycle of changing approaches to management over the years – these depend on a number of factors: the size of the organisation; government policy; budgetary issues; general management attitude toward managed IT or IT support models; overall staff resources; and global IT trends.

Insufficient resources for IT functions will create problems for the entire organisation, so it is vital to understand which support model is best to run an organisation effectively.

Each different approach has its advantages and challenges.

1.7.1 Intelligent Sourcing

Outsourcing is a global practice that is often disparaged in the popular press due to associations with excessive costs and failure. The problems are generally not with outsourcing per se but with what is outsourced.

The concept of outsourcing is based upon the premise that no organisation is entirely self-sufficient, none has unlimited resources, and some work can be conducted by others at a lower cost. The organisational focus should therefore not be on pursuing capabilities that third parties have the skills and technology to deliver better and with economies of scale.

This practice is common in all industries; the machine manufacturer doesn’t have to make its own nuts and bolts and can instead buy those from a supplier.

In IT, it is not uncommon to treat entire projects as single things. For example, take the Fotango value chain and imagine that we had decided to outsource the development and maintenance of Fotango to a third party, on the assumption that the Fotango system was a single thing and someone else could provide it better, with economies of scale.

As a consumer of these outsourced services, we would want to ensure that we are getting value for money and that the features we require are delivered when they are expected. Hence the process of outsourcing often requires a well-defined contract for delivery, based upon our desire for certainty, i.e. that we are getting what we expect and paid for.

As a result both parties will tend to treat the entire activity as more linear and hence structured techniques are often applied with formal specifications and change control processes.

However, looking at the Fotango system through the lens of value chain vs evolution, we can see that whilst some components are linear (e.g. compute resource, installation of a CRM system), other components are clearly not (e.g. image manipulation system).

The more chaotic components will inevitably change due to their uncertain nature and this will incur an associated change control cost. In a review of various studies over the last decade, the most common causes of “outsourcing” failure have been cited as buyer’s unclear requirements, changing specifications and excessive costs.

However, it’s the very act of treating large-scale systems as one thing that tends to set up an unfavourable situation whereby the more linear activities are treated effectively but the more chaotic activities cause excessive costs due to change. In any resultant disagreement, the third party can demonstrate that the costs were incurred due to the client’s changing of the specification, but in reality those more chaotic activities were always going to change (see figure below).

The “excessive cost” associated with these changes should be unsurprising, as a more structured technique is applied to a more chaotic activity. A better approach would be to subdivide the large-scale project into its components and outsource only the more linear components.

In today’s world, this is in effect happening: well-defined and common components such as compute resources are “outsourced” to utility providers of compute (known as IaaS – infrastructure as a service), and equivalently, well-defined and common systems (such as CRM) are “outsourced” to utility providers through software as a service.

Those more chaotic activities offer no opportunity for efficiencies through volume operations because of their uncertain and changing nature and hence they are best treated on a more agile basis with either an in-house development team or a contract development shop used to working on a cost plus basis. Outsourcing itself is not an inherently ineffective way of treating IT, on the contrary it can be highly effective. However, it’s important to outsource those more linear components that are suitable to outsourcing.

1.7.2 In-house development        

The In-house model is the case where all software development and maintenance are carried out within the NSO by staff of the IT department. This model was quite common in the past but is much less so today as IT and other support services are wholly or partially outsourced to external companies or individual freelance IT experts.

Statistical processing is a niche market for software vendors and very few ‘off the shelf’ products exist for managing the statistics processing life cycle; consequently NSOs often have a legacy of internally developed statistical software (unlike, say, Human Resources or budget planning, which have a large range of commercial software solutions). This can make maintenance and evolution more complex, as upgrades have to be coded rather than being provided by vendors.

Advantages of the in-house model include autonomy of development and stability of teams, which help ensure the stability of systems and keep the technical know-how of often complex processes within the NSO. A common challenge with this approach is the difficulty of attracting and retaining IT staff, as salaries in the NSO are often not competitive with those in the private sector, particularly in developing countries. This can result in high turnover as staff leave once they have been trained in IT skills that are in high demand in the marketplace.

In-sourced IT can be extremely costly, as it requires investment in time, training, employee salaries and benefits, and management.

1.7.3 Outsourced development

In this Outsource IT management model, the main part of development and support is carried out by external resources. External resources can be onsite and come from local suppliers, be offshore and coordinated remotely, or a mixture of the two.

Using external resources has the advantage of flexibility in that resources are used only when needed for specific tasks which can save costs. Points to take into consideration when using external resources include the loss of institutional knowledge when a consultant leaves and the lack of continuity in a project when an outsource provider changes personnel due to their own priorities. This is a particular risk for low capacity countries when external staff are brought in, often by a donor agency, to implement a system – once the work is completed and the consultants leave there is inadequate internal capacity to maintain and use the system.

One should also consider vendor incentives as it can be in the interest of external staff to extend a task as long as possible, so it is important to ensure a transparent and ethical relationship with vendors and close monitoring of projects.

This also applies to low capacity countries where donor aid can be linked to adopting a particular software package implemented by external consultants that the NSO is then unable to maintain themselves once the consultants leave. Cases exist where countries have been left with multiple tools meeting similar needs, particularly in the domain of dissemination software.

Outsourcing is typically a less expensive option, as minimal training and time are involved in IT management, and outsourcing typically converts fixed IT costs into variable costs, which allows for effective budgeting as you only pay for what you use when you need it. By outsourcing, IT administration is entrusted to experts who provide leadership and professional expertise for IT solutions.

1.7.4 Hybrid In-house/Outsource model

In the hybrid In-house/Outsource model, the NSO uses both its own staff and also external staff. A hybrid IT model requires internal and external IT professionals to support the business capabilities of the enterprise. With this model, in some cases only the IT managers are NSO staff members while all development and support are carried out by external staff. Other cases can include more of a mix of both managers and IT experts.

This model is widely adopted in NSOs and is the most common approach because it reflects the realities of the IT market – staff with the latest skills are highly mobile and difficult to retain.

This approach typically allows an organisation to maintain a centralised approach to IT governance, while using experts to deal with the functionality that is beyond the capabilities  of the organisation’s IT staff.

1.7.5 Collaborative model

The collaborative model of NSOs working together on IT projects is a trend that has increased considerably in recent years. Collaborations can take the form of several organisations working together to develop software that they will then all use, or a single organisation developing a software tool that is then adopted and, possibly, further developed by other NSOs.

In the past the vast majority of statistical software used by an NSO would have been developed within an NSO for use only by that NSO - today the trend is for there to be a mix of the older legacy software and common shareable tools.

The collaborative approach has many obvious advantages for an NSO. These include sharing the software development burden, and sharing experiences, knowledge and best practices through multilateral collaboration, which helps build collective capacity. It also reduces risk for new developments through additional scrutiny and testing according to open-source principles, with all members benefiting from each other in terms of ideas and methods.

Collaborating on projects does of course have its own challenges for an NSO – particularly in determining how to balance development priorities between the different partner organisations and the increased complexity of project management in the context of multiple partner collaboration. To achieve this model, partnership management capabilities will need to be developed in an NSO (Ref HR chapter).

Links to guidelines, best practices and examples:

  • UN Public Administrations Network paper “ICT Outsourcing: Inherent Risks, Issues and Challenges”:

http://unpan1.un.org/intradoc/groups/public/documents/un-dpadm/unpan043748.pdf

  • Singapore Government paper “Strategic Considerations for Outsourcing Service Delivery in the Public Sector”:

https://www.cscollege.gov.sg/Knowledge/Documents/Website/Strategic%20Considerations%20For%20Outsourcing%20Service%20Delivery%20in%20the%20Public%20Sector.pdf


1.8 Use of Standards and Frameworks (Todo)

1.8.1 The Importance of standards

Standards are enablers of modernisation - by using common standards, statistical systems can be modernised and “industrialised”, allowing internationally comparable statistics to be produced more efficiently. Standards facilitate the sharing of data and technology in the development of internationally shared solutions which generate economies of scale. A number of major statistical standards are in use today while others are emerging and maturing. But in a fast-changing and interconnected world it is not enough to rely only on statistical standards. Other useful categories are official standards (for example ISO/IEC 11179), industry/domain standards and taxonomies (for example XBRL), open standards (standards covered by W3C) and even widely used de-facto standards (such as the JSON and PDF formats). While the focus in this document is on statistical standards, it is strongly recommended to consider other types of standards and define their role in statistical organisations.

In the next sections we will describe the most relevant statistical industry standards, their purpose, evolution stage and their potential for modernisation activities.

Figure xx -

1.8.2 Conceptual statistical industry standards

1.8.2.1 Generic Activity Model for Statistical Organisations (GAMSO)

Refer to chapter 5 “The National Statistical Office”

The Generic Activity Model for Statistical Organisations (GAMSO) is the standard covering activities at the highest level of the statistics organisation. It describes and defines the activities that take place within a typical organisation that produces official statistics. GAMSO was launched in 2015 and extends and complements the Generic Statistical Business Process Model (GSBPM) by adding additional activities beyond business processes that are needed to support statistical production. It is part of the common vocabulary of collaboration.

The GAMSO standard covers four broad areas of activity within an NSO: production; strategy and leadership; capability management and corporate support. It provides a common vocabulary for these activities and a framework to support international collaboration activities, particularly in the field of modernisation and can be used as a basis for resource planning within an NSO. GAMSO can contribute to the development and implementation of Enterprise Architectures, including components such as capability architectures, and also support risk management systems.

GAMSO can be used as a basis for measuring the costs of producing official statistics in a standardised way allowing comparison between NSOs, and also as a tool for resource planning. It can help assess the readiness of organisations to implement different aspects of modernisation, in the context of a proposed “Modernisation Maturity Model” allowing NSOs to evaluate their levels of maturity against a standard framework, and to help them determine the priorities for the next steps based on a roadmap.

The GAMSO activities specifically concerned with IT management cover coordination and management of information and technology resources and solutions. They include the management of the physical security of data and shared infrastructures:

  • Manage IT assets and services
  • Manage IT security
  • Manage technological change

Links to guidelines, best practices and examples:

        

1.8.2.2 Generic Statistical Business Process Model (GSBPM)

The Generic Statistical Business Process Model (GSBPM) is a model that provides a standard terminology for describing the different steps involved in the production of official statistics. GSBPM can be considered the "Production" part of GAMSO. Since its launch in 2009 it has become widely adopted in NSOs and other statistical organisations. GSBPM allows an NSO to define, describe and map statistical processes in a coherent way, thereby making it easier to share expertise. GSBPM is part of a wider trend towards a process-oriented approach rather than one focused on a particular subject-matter topic. GSBPM is applicable to all activities undertaken by statistical organisations which lead to a statistical output. It accommodates data sources such as administrative data, register-based statistics, censuses and mixed sources.

GSBPM standardises process terminology. This allows an NSO to compare and benchmark processes within and between organisations. It can help identify synergies between processes in order to make informed decisions on systems architectures and organisation of resources. GSBPM is not a linear model – instead it should be seen as a matrix through which there are many possible paths, including iterative loops within and between processes and sub-processes.

GSBPM main processes and sub-processes

GSBPM covers the processes of specifying needs, survey design, building products, data collection, data processing, analysis, dissemination and evaluation. Within each process there are a number of sub-processes.

Using GSBPM in a statistical organisation

GSBPM contributes to a common vocabulary among statistical organisations - having a standard terminology makes it much easier to communicate on collaboration projects and its methodology allow for the re-use of concepts and definitions throughout the life cycle of statistical projects. It can be used as a reference in planning, mapping, documentation and self-assessment of capacity needs.

GSBPM plays an important role in the modernising of the statistical system, especially concerning the statistical project cycle and can accommodate emerging issues in data collection like the introduction of mobile data collection and Big Data.

Links to guidelines, best practices and examples:

  • Modernstats – the Generic Statistical Business Process Model:

https://statswiki.unece.org/display/GSBPM/Generic+Statistical+Business+Process+Model

  • Modernstats – the GSBPM resources repository:

https://statswiki.unece.org/display/GSBPM/GSBPM+resources+repository

1.8.2.3 Generic Statistical Information Model (GSIM)

The Generic Statistical Information Model (GSIM) standard was launched in 2012 and describes the information objects and flows within statistical business processes. GSIM is complementary to GSBPM, and the framework enables descriptions of the definition, management and use of data and metadata throughout the statistical information process.

GSIM information objects are grouped into four broad categories: Business; Production; Structures; and Concepts. It provides a set of standardized information objects, inputs and outputs in the design and production of statistics, regardless of subject matter. By using GSIM, NSOs are able to analyse how their business could be more efficiently organised.

As with the other standards, GSIM helps improve communication by providing a common vocabulary for conversations between different business and IT roles, between different business subject matter domains and between NSOs at national and international levels. This common vocabulary contributes towards the creation of an environment for reuse and sharing of methods, components and processes and the development of common tools. GSIM also allows NSOs to understand and map common statistical information and processes and the roles and relationships between other standards such as SDMX and DDI.

Links to guidelines, best practices and examples:

  • Modernstats – the Generic Statistical Information Model:

https://statswiki.unece.org/display/gsim/Generic+Statistical+Information+Model

  • Statistics Finland Project to adopt the GSIM Classification model:

https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.58/2018/mtg1/Finland_KAUKONEN_Paper.pdf

  • Statistics Sweden project to incorporate GSIM in the information architecture and the information models:

 https://statswiki.unece.org/display/GSBPM/Statistics+Sweden%3A+use+of+GSIM


1.8.3 Open Standards for Metadata and Data

https://standards.theodi.org/community/

1.8.3.1 Statistical Data and Metadata Exchange

The Statistical Data and Metadata Exchange (SDMX) standard for statistical data and metadata access and exchange was established in 2000 under the sponsorship of seven international organisations (IMF, World Bank, UNSD, Eurostat, BIS, ECB & OECD).

The importance of a standard for statistical data exchange is well known and cannot be overstated. The labour-intensive nature of mapping data collection and dissemination to different formats is a problem well known to NSOs, and in the context of the timely transmission of SDG indicators addressing it has become even more vital.

SDMX is a standard for both content and technology that standardises statistical data and metadata content and structure. SDMX facilitates data exchange between NSOs and international organisations – and also within a national statistical system. SDMX aims to reduce reporting burden for data providers and provide faster and more reliable data and metadata sharing. Using SDMX facilitates the standardisation of IT applications and infrastructure and can improve harmonisation of statistical business processes. There is much reusable software available to implement SDMX in an NSO which can reduce development & maintenance costs with shared technology and know-how.
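A minimal sketch of the kind of machine-to-machine exchange SDMX enables, retrieving data from an SDMX REST web service with the Python requests library (an assumed dependency). The endpoint, dataflow, series key and media type shown are placeholders; the target service's documentation defines the real values.

  # SDMX REST sketch: query a dataflow and request the response as SDMX-JSON.
  import requests

  BASE = "https://sdmx.example.org/rest"               # hypothetical SDMX endpoint
  flow, key, params = "DF_SDG", "A.SI_POV_DAY1.WORLD", {"startPeriod": "2015"}

  resp = requests.get(
      f"{BASE}/data/{flow}/{key}",
      params=params,
      headers={"Accept": "application/vnd.sdmx.data+json"},   # assumed SDMX-JSON media type
      timeout=30,
  )
  resp.raise_for_status()
  message = resp.json()
  print(message.keys())        # typically includes data and structural metadata sections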

SDMX ensures data quality as it incorporates data validation into its data structures and validation rules as well as with the many tools made freely available with the standard as part of its open source approach. SDMX is an ISO standard and has been adopted by the UNSC as the preferred standard for data exchange.

The SDMX sponsors continue to improve the SDMX standard and have identified four main priority areas for their "roadmap 2020": a) strengthening the implementation of SDMX, b) making data usage easier, especially for policy use, c) modernising statistical processes, improving the SDMX standard and the IT infrastructure and d) improving communication.

Consequently, the new version of SDMX (version 3) is going to be finalised in the near future. The new version will be backward compatible with the current 2.1 version in order to allow easy upgrade of the various SDMX artefacts. It will also address some semantic and technical issues, including better metadata integration.

A more recent addition to SDMX is the Validation and Transformation Language (VTL), a standard language for defining validation and transformation rules (a set of operators, their syntax and semantics) for any kind of statistical data. VTL builds on the generic framework for defining mathematical expressions that exists in the SDMX information model, but the intention is to provide a language that is usable with other standards as well, for expressing logical validation rules and transformations on data.

The present SDMX software products, packages and components are mostly designed for traditional data centres and server-based environments. The transition to native serverless cloud environments will require some additional work to adjust present SDMX software to this new environment.

SDMX and SDGs

A specific SDG indicator data structure (Data Structure Definition) has been defined which will be used to report and disseminate the indicators at national and international levels. SDMX compliance has been built into a number of internationally used dissemination platforms, such as the African Information Highway, the IMF web service and the OECD.Stat platform, to ensure efficient transmission of SDG indicator data and metadata.

Links to guidelines, best practices and examples:

  • The SDMX Content-Oriented Guidelines (COG) recommend practices for creating interoperable data and metadata sets using the SDMX technical standards. The guidelines are applicable to all statistical domains and focus on harmonising concepts and terminology that are common to a large number of statistical domains:  https://sdmx.org/?page_id=4345
  • Inventory of Software Tools for SDMX Implementers and Developers which have been developed by organisations involved in the SDMX initiative: https://sdmx.org/?page_id=4500

  • Guidelines for managing an SDMX design project:

https://statswiki.unece.org/display/SDMXPM/Checklist+for+SDMX+Design+Projects+Home

  • The SDMX Starter Kit - a resource for NSOs wishing to implement the SDMX technical standards and content-oriented guidelines for the exchange and dissemination of aggregate data and metadata:

https://sdmx.org/wp-content/uploads/SDMX_Starter_Kit_Version_23-9-2015.pdf

1.8.3.2 Data Documentation Initiative (DDI)

The Data Documentation Initiative (DDI) is an international standard for describing metadata from surveys, questionnaires, statistical data files, and social sciences study-level information. DDI focuses on microdata and tabulation of aggregates/indicators.

The DDI specification provides a format for content, exchange, and preservation of questionnaire and data file information. It fills a need related to the challenge of storing and distributing social science metadata, creating an international standard for the design of metadata about a dataset.

DDI is a membership-based alliance of NSOs, International organisations, academia and research bodies.

In many NSOs the exact processing in the production of aggregate data products is not well documented. DDI can be used to describe processing of data in a detailed way to document each step of a process. In this way DDI can be used not just as documentation but can help use metadata to automate throughout the entire process, thus creating “metadata-driven” systems. In this way DDI can also act as the institutional memory of an NSO.
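A minimal sketch of using DDI metadata programmatically: listing the variables documented in a DDI-Codebook XML file with Python's standard XML library. The namespace and element names (var, labl) follow DDI-Codebook conventions but are assumptions that should be checked against the DDI version actually in use; the file name is hypothetical.

  # DDI sketch: read a codebook and print variable names and labels.
  import xml.etree.ElementTree as ET

  NS = {"ddi": "ddi:codebook:2_5"}          # assumed DDI-Codebook 2.5 namespace

  tree = ET.parse("study_codebook.xml")     # hypothetical DDI codebook file
  for var in tree.getroot().iter("{ddi:codebook:2_5}var"):
      name = var.get("name")
      label_el = var.find("ddi:labl", NS)
      label = label_el.text.strip() if label_el is not None and label_el.text else ""
      print(name, "-", label)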

DDI is another standard that promotes greater process efficiency in the “industrialised” production of statistics. DDI can be used to facilitate data collection/microdata to support the GSBPM (with SDMX used for dissemination of aggregates). DDI can also be used for facilitating microdata access as well as for register data.

Links to guidelines, best practices and examples:

http://www.odaf.org/papers/DDI_and_SDMX.pdf

  • Guidelines on Mapping Metadata between SDMX and DDI:

http://www.oecd.org/sdd/44736290.pdf

1.8.4 Architecture and Interoperability frameworks

1.8.4.1 Enterprise architecture

Enterprise architecture (EA) is a conceptual blueprint that defines the structure and operation of an organisation; its role is to determine how the organisation can most effectively achieve its current and future objectives.

EA maps the goals and priorities of an organisation to Information Technology that is fit to support those goals, by managing information and delivering it accurately and on time, where and when it is needed, and in a way that is cost effective for the business. It seeks to guide the process of planning and designing the IT capabilities of an organisation, steering it through the business, information, process and technology changes necessary to meet desired organisational objectives.

EA helps enforce discipline and standardisation of business processes, and enables process consolidation, reuse and integration.

EA is basically designed for the whole system of systems across the "enterprise" - and like any design it has to start from the business requirements and specify the best fit IT solutions. The IT part of EA is split into systems and data and finally infrastructure such as servers and networks.

There is an emerging trend of organisations using storage repositories that hold vast amounts of raw data in native format (‘Data Lakes’) from disparate sources. These data lakes are used to respond to business questions by linking and querying relevant data and thus require a new type of EA to manage such linked datasets.

Benefits to the NSO of a well-designed EA include better business performance against business goals and reduced IT investment risk. EA also facilitates a more agile enterprise by making the IT architecture flexible enough to support transformed business models. Agile architecture practices support the evolution of the design and architecture of a system while new system capabilities are implemented, allowing the architecture of a system to evolve incrementally over time while simultaneously supporting the needs of current users.

Links to guidelines, best practices and examples:

  • European Statistical System (ESS) Enterprise Architecture Reference Framework:

https://ec.europa.eu/eurostat/cros/content/ess-enterprise-architecture-reference-framework_en

  • ISTAT project - Business Architecture model within an official statistical context:

https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.50/2014/Topic_4_Italy.pdf

  • Statistics Korea project - Building an Enterprise Architecture

https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.50/2010/wp.7.e.pdf

  • Statistics Poland project - Introducing Enterprise Architecture Framework:

http://csmm.wat.edu.pl/sites/default/files/articles/23_32_csmm3.pdf

1.8.4.2 Common Statistical Processing Architecture (CSPA)

The Common Statistical Production Architecture (CSPA) is a framework for developing statistical processing components that are reusable across projects, platforms and organisations - it is often referred to as ‘plug and play’. CSPA has been developed in recent years by the international statistical community under the auspices of the High-Level Group for the Modernisation of Official Statistics (HLG-MOS).

CSPA is an enabler of collaboration and modernisation and has potentially enormous advantages for NSOs of all capacity levels. It aims to align the enterprise architectures of different organisations to create an “industry architecture” for the whole “official statistics industry”.  CSPA provides guidance for building software services that can be shared and reused within and across statistical organisations and enables international collaboration initiatives for the development of common infrastructures and services. In addition, it encourages alignment with other statistical industry standards such as the Generic Statistical Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM).         

CSPA components focus on statistical production processes as defined by GSBPM and are based on the Service Oriented Architecture (SOA) approach wherein the components (Services) are self-contained and can be reused by a number of business processes either within or across statistical organisations without imposing a technology environment in terms of specific software platforms.
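To illustrate the idea of a self-contained, reusable service described above, the sketch below exposes a single processing step (naive mean imputation) behind a simple JSON-over-HTTP interface, using only the Python standard library, so that any GSBPM process could call it regardless of the caller's technology stack. This is illustrative only and is not an official CSPA service template; the port and message shape are assumptions.

  # Sketch of a small, reusable statistical service with a language-neutral interface.
  import json
  from http.server import BaseHTTPRequestHandler, HTTPServer

  def impute_mean(values):
      observed = [v for v in values if v is not None]
      mean = sum(observed) / len(observed) if observed else None
      return [mean if v is None else v for v in values]

  class ImputeHandler(BaseHTTPRequestHandler):
      def do_POST(self):
          body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
          values = json.loads(body)["values"]          # e.g. {"values": [1.0, null, 3.0]}
          payload = json.dumps({"values": impute_mean(values)}).encode("utf-8")
          self.send_response(200)
          self.send_header("Content-Type", "application/json")
          self.end_headers()
          self.wfile.write(payload)

  if __name__ == "__main__":
      HTTPServer(("", 8080), ImputeHandler).serve_forever()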

Advantages of CSPA for an NSO

There are great potential advantages for NSOs in using CSPA. Its goal is eventually to have components covering all processes and sub-processes covered by GSBPM. This is important for countries of all capacity levels but is especially useful for countries of low capacity with weak infrastructure as it would in theory allow any country to take advantage of developments by the international statistics community and assemble a complete statistical information system component by component according to their specific needs.

See also section 1.5.6 for the CSPA software inventory.

CSPA is an emerging standard for sharing statistical components that has been introduced in section 1.xx.xx.

For an NSO there are the obvious advantages of saving time and money by re-using or adapting an existing technical solution developed by another organisation to meet their own processing requirements.

Using standardised components will promote and enable the use of statistical standards, both in structure and content thus facilitating the exchange of data and metadata between organisations. It also serves to further strengthen ties and relationships among the statistical community.

Software available via CSPA inventory

Links to guidelines, best practices and examples:

https://statswiki.unece.org/display/CSPA/CSPA+Global+Artefacts+Catalogue

  • Common Statistical Production Architecture – list of CSPA-compliant services developed and under preparation:

https://statswiki.unece.org/display/CSPA/Implementing+CSPA+Projects

1.8.4.3 Common Statistical Data Architecture (CSDA)

Statistical organisations have to deal with many different external data sources. From (traditionally) primary data collection, via secondary data collection, to (more recently) Big Data. Each of these data sources has its own set of characteristics in terms of relationship, technical details and semantic content. At the same time the demand is changing, where besides creating output as "end products", statistical organisations create output together with other institutes.

In 2017 and 2018, the High-Level Group for the Modernisation of Official Statistics recognised that official statistics organisations were challenged by the capacities needed to incorporate new data sources into their statistical production processes. In response, the HLG Data Architecture project developed a framework, based on Principles, Capabilities, Building Blocks and Guidelines, and tested it on use cases involving traditional and new data sources.

Capabilities are abilities, typically expressed in general and high-level terms, that an organisation needs or possesses. Capabilities typically require a combination of organisation, people, processes and technology. Building Blocks represent (potentially re-usable) components of business, IT or architectural capability that can be combined with other Building Blocks to deliver architectures and solutions. Guidelines provide a maturity model, recommendations and templates that help statistical organisations establish their data architecture and assess gaps in data capabilities.

The purpose of the Common Statistical Data Architecture as a reference architecture is to act as a template for statistical organisations in the development of their own Enterprise Data Architectures. In turn, this will guide solution architects and builders in the development of systems that support users in doing their jobs (that is, the production of statistical products).

The CSDA shows organisations how to organise and structure their processes and systems for efficient and effective management of data and metadata, from the external sources, through internal storage and processing, up to the dissemination of the statistical end-products. In particular, in order to help organisations modernise, it shows how to deal with newer types of data sources such as Big Data, scanner data and web-scraped data.

The CSDA supports statistical organisations in the design, integration, production and dissemination of official statistics based on both traditional and new types of data sources.

Links to guidelines, best practices and examples:

1.8.4.4 European Interoperability Framework (EIF) (Todo)

To do - Importance of interoperability

Foundational Interoperability

Foundational interoperability enables one information system to exchange data with another information system. The receiving system does not need to be able to interpret the data; it is simply and instantly available for use.

Structural Interoperability

At an intermediate level, structural interoperability defines the format of the data exchange. This concerns the standards that govern the format of messages being sent from one system to another, so that the operational purpose of the information is evident and passes through without alteration. This is information at the level of data fields, as in a database.

Semantic Interoperability

Semantic interoperability is the highest level of connection. Two or more systems, or parts of systems, can exchange information and use it readily: both the structure of the exchange and the way the data itself is codified are shared, so data providers can exchange data meaningfully even when using completely different software solutions from different vendors.
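
As a purely illustrative sketch of the difference between structural and semantic interoperability, the following Python fragment shows the same observation exchanged in two forms; the field names, codes and code-list URLs are invented for illustration and do not correspond to any particular standard.

    # Structural interoperability: sender and receiver agree on the message
    # format (field names and types), but the codes are only meaningful to
    # the sender.
    structural_message = {
        "indicator": "UNEMP",   # sender-specific label
        "area": "X1",           # sender-specific area code
        "period": "2019-Q3",
        "value": 5.4,
    }

    # Semantic interoperability: coded fields additionally reference shared,
    # published code lists, so any receiver can resolve their meaning without
    # bilateral agreements.
    semantic_message = {
        "indicator": {"code": "UNEMP_RT", "codelist": "https://example.org/cl/indicators"},
        "area": {"code": "X1", "codelist": "https://example.org/cl/ref-areas"},
        "period": "2019-Q3",
        "value": 5.4,
    }

    def resolve(field):
        # Placeholder: in practice the receiver would dereference the code list
        # (e.g. in a registry) to obtain the agreed definition of the code.
        return "{} as defined in {}".format(field["code"], field["codelist"])

    print(resolve(semantic_message["indicator"]))

In the first message only the format is shared; in the second, the meaning of each coded value can be resolved against a common reference, which is what allows received data to be combined and processed meaningfully.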

Why Is Interoperability Important?

It is useful to think of interoperability as a philosophy rather than just a "standards-based interaction between computer systems". On the technical side, interoperability reduces the time it takes to establish useful data exchange between data providers and data consumers.

Improved Efficiency

The EIF gives specific guidance on how to set up interoperable digital public services. It offers public administrations 47 concrete recommendations on how to improve governance of their interoperability activities, establish cross-organisational relationships, streamline processes supporting end-to-end digital services, and ensure that both existing and new legislation do not compromise interoperability efforts.

The EIF's content and structure include a set of principles intended to establish general behaviours on interoperability, a layered interoperability model, and a conceptual model for interoperable public services. The model is aligned with the interoperability principles and promotes 'interoperability by design' as a standard approach for the design and operation of European public services.

The EIF is therefore applicable to any inter-organisational collaboration, including statistical data collaboratives, networks and platforms. The framework provides an overview of three aspects of interoperability that need to be considered for frictionless service delivery and data exchange:

  • Organisational interoperability aims at addressing the requirements of the user community by making services available, easily identifiable, accessible and user-oriented;
  • Semantic interoperability enables systems to combine received information with other information resources and to process it in a meaningful manner;
  • Technical interoperability covers the technical issues of linking computer systems and services. It includes key aspects such as open interfaces, interconnection services, data integration and middleware, data presentation and exchange, accessibility and security services.

Links to guidelines, best practices and examples:

  • European Interoperability Framework (EIF):

https://ec.europa.eu/isa2/eif_en 


1.9 Specialist statistical processing / analytical software

A wide range of statistical processing and analysis software has historically been used within NSO's, and the options continue to expand, with significant growth in open-source solutions as well as important advances in commercial offerings. NSO's have created rich statistical functions using these desktop and server-based solutions to perform common statistical processing tasks such as edit and imputation, auto-coding, estimation, tabulation, and record linkage. The NSO community has shared these functions over the years, first as licensed functions, and more recently as "free license" or open-source functions.

The growth in data science and data engineering roles within NSO's has seen a shift to more powerful processing platforms as well as increased adoption of powerful open-source solutions for artificial intelligence and machine learning. These emerging communities access extensive open-source libraries, modules, and packages via distribution mechanisms such as CRAN (the Comprehensive R Archive Network), Anaconda, and others. The growing number of data science communities and platforms is driving growth in the number of functions and solutions available in the public domain.

Open-source processing, analysis, and data science tools currently in use include:

  • R and Python toolsets - Jupyter Notebook and JupyterHub, plus a large variety of libraries for data manipulation, wrangling, integration, analysis, and visualization
  • Machine learning tools such as TensorFlow, Python libraries such as scikit-learn, and others (a minimal auto-coding sketch using scikit-learn follows this list)
  • Spark and Hadoop for acceleration and capacity, using in-memory and distributed processing and data techniques
  • Enhanced persistence and data management technologies that offer non-traditional approaches to data organization, management and access - including Dremio (data virtualization), NiFi (data orchestration), Postgres, Cassandra, Kafka, and more
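
As an illustration of the kind of workflow these open-source toolsets enable, the following minimal sketch trains a toy "auto-coding" classifier with scikit-learn; the descriptions, occupation codes and model choice are invented for illustration and are not a recommended production configuration.

    # Minimal auto-coding sketch (illustrative only): assign a classification
    # code to free-text descriptions using a simple text-classification pipeline.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny, invented training set: free-text job descriptions with occupation codes.
    descriptions = [
        "teaches mathematics in a secondary school",
        "prepares meals in a restaurant kitchen",
        "develops software for statistical processing",
        "teaches physics to high school students",
    ]
    codes = ["2330", "5120", "2512", "2330"]  # invented occupation codes

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(descriptions, codes)

    # The pipeline assigns the closest learned code to new descriptions.
    print(model.predict(["prepares meals in a small cafe"]))

In practice such a model would be trained on large, quality-assured corpora and combined with human review, but the same open-source building blocks apply.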

Many open-source tools are also available with an enterprise option that provides support services via a commercial company.

On the commercial front, many NSO's continue to use solutions including:

  • SAS - an integrated suite of solutions offering a broad spectrum of features; it can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on them
  • SPSS - a program for statistical analysis, widely used in the social sciences
  • Stata - a general-purpose statistical software package whose capabilities include data management, statistical analysis, graphics, simulations, regression, and custom programming
  • FAME - a database designed for economic time-series analysis, providing database management facilities for storing time-series data and the analytical tools used by central banks and statistical agencies
  • MATLAB (matrix laboratory) - a programming language and environment that allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages

There are a number of factors to weigh in choosing a future direction for processing and analysis tools. The current cohort of data scientists and statisticians arriving from universities and institutions is increasingly trained in open-source tools based on R and Python, signalling a demographic shift in the tools used. As more senior staff retire, NSO's will see a re-orientation in toolsets and technology.

There is important growth in NSO engagement with the external research community beyond traditional statistical domains, giving access to fresh or alternative sources of processing and analytic components and toolsets. This is driving a shift towards the supporting tools and technologies these external components are based on, with a strong bias towards R and Python.

Economic factors continue to drive change in NSO's. There is a long-standing desire to increase the level of sharing of components and functions amongst NSO's through activities with the HLG and others. Access to proven functions via reuse is an important alternative to custom-developed functions, and the open-source community is providing new sources of these functions, especially in the data science area. The increasing use of open-source solutions, and increasing levels of experience with them, is prompting important cost/benefit discussions within NSO's as they look for efficient and effective operations. This is particularly important where processing and analysis use is focused on core features of commercial tools that have equivalents in open-source tools, with little to no use of the advanced features in the commercial toolsets. Paying a premium for access to equivalent features is an indication that a robust review of benefit is required.

These considerations are especially true when looking for high-performance processing. Highly engineered commercial solutions are more expensive and complex when compared with solutions based on Spark and Hadoop, which are designed to operate on utility infrastructure. Encapsulated or containerized environments for analytic methods are a recent addition to the landscape, with Algorithmia providing fully encapsulated and instrumented auto-scalable methods and Databricks providing encapsulated and managed Spark processing.
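
For illustration only, the following minimal PySpark sketch shows the style of distributed, weighted tabulation that such platforms support; the dataset, column names and weights are invented, and a real deployment would read partitioned data from cluster storage rather than an in-memory list.

    # Minimal distributed tabulation sketch (illustrative): a weighted count by
    # region and labour-force status, expressed with the PySpark DataFrame API.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tabulation-sketch").getOrCreate()

    # Toy microdata standing in for a large survey or administrative dataset.
    df = spark.createDataFrame(
        [("R1", "employed", 1.2), ("R1", "unemployed", 0.8), ("R2", "employed", 1.0)],
        ["region", "status", "weight"],
    )

    # The same code scales out transparently when the data are partitioned
    # across a cluster.
    df.groupBy("region", "status").agg(F.sum("weight").alias("weighted_count")).show()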

As can be seen in the following Wardley map, core processing and analysis tools exist in the “commodity / utility” space - NSO IT strategies should look to profit from this.    

Insert tools Wardley map here

In the area of statistical functions for processing and analysis, most NSO's are using legacy solutions or solutions acquired from other NSO's. Many of these are large, monolithic functions that perform a number of different services and are candidates for refactoring, partitioning, or replacement so as to benefit from cloud, microservice, and serverless approaches to computation. From a Wardley map perspective these functions are in the Custom area, with on-going resource commitments to maintain them. NSO's creating new environments should look to select open-source functions at granularities aligned to their architectural strategies - most of these functions use algorithms that have not been modified for a considerable time. A Wardley map illustrating this follows, together with a sketch of exposing a single statistical function as a small stateless service.

Insert methods / functions Wardley Map here
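
As a minimal sketch of the refactoring direction described above, the following Python fragment wraps a single, well-defined statistical routine as a stateless handler; the function, payload shape and "event/context" handler convention are illustrative assumptions rather than a specific platform's API.

    import json
    import statistics

    def trimmed_mean(values, lower=0.05, upper=0.95):
        # Toy outlier-robust estimator standing in for a legacy routine.
        ordered = sorted(values)
        n = len(ordered)
        lo, hi = int(n * lower), max(int(n * upper), 1)
        return statistics.mean(ordered[lo:hi])

    def handler(event, context=None):
        # Stateless entry point: parse the request, apply the method, return the result.
        payload = json.loads(event["body"])
        result = trimmed_mean(payload["values"])
        return {"statusCode": 200, "body": json.dumps({"trimmed_mean": result})}

    if __name__ == "__main__":
        print(handler({"body": json.dumps({"values": [1, 2, 3, 4, 100]})}))

Because the handler is small, stateless and independently deployable, it can be scaled, versioned and shared between organisations far more easily than a monolithic processing suite.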

Links to guidelines, best practices and examples:

  • Comprehensive list of statistical packages (Wikipedia):

https://en.wikipedia.org/wiki/List_of_statistical_packages

Case Study - UN Global Platform

Some words on the Methods service, Data service, and the use of Algorithmia go here


1.10 Dissemination tools        

1.10.1 Channels

Dissemination of data and outputs is an important part of NSO operations, and one that has seen significant transformation since the last release of the Handbook. While paper and other outputs continue to be published, the mainstream approach uses web-based dissemination techniques for content and data distribution. Together these services form an important part of the public face of NSO's, requiring careful attention to brand, quality, accessibility, and related concerns. Dissemination is also a key component of an NSO's strategy and market positioning with its stakeholders. Technology-driven evolution of stakeholder needs, as well as opportunities for new means of engagement and production, are a priority consideration in a digital or IT strategy.

Current approaches utilize web hosting and content creation platforms to create static content and manage it through an "authoring to publishing" value chain. Visual graphics are published as static charts and graphs, but may be dynamically modifiable via sliders, parameters, or other UI elements to drill down into lower-level detail, modify graphic types, or change parameters. An important enhancement integrates geography-based interaction, allowing exploration and browsing of outputs via map-related controls and platforms. Public-use datasets are available through a variety of means, allowing users to acquire extracts of data for further use in their own environments. Users typically access these datasets via a download facility, receiving a CSV, Excel, or other formatted data set; the associated metadata describing record formats is a critical component for its use.

A variety of social media techniques are currently in use, which may include Facebook, syndication, Twitter, LinkedIn, (Chinese platforms), Reddit, and other social media channels. Typically these are event-driven approaches, highlighting new releases and reports, indices, thematic information linked to current events, and more.

Since NSO’s are an important source of high-quality objective information, there are close relationships with media organizations through a variety of print, television, and other media. NSO’s may create more advanced access channels that provide near-broadcast-ready content as a way to remove friction between NSO’s and media agencies. A good example is CBS (Netherlands) with its approach that uses on-site production facilities coupled with media-cycle timelines (as an alternative to the common practice of having a single release time - such as 8:30 am - for publishing results). Publishing information with a timeline that reflects stakeholder needs (such as media broadcast times) can be an important way to increase reach, engagement, and relevance.

An important component of information publishing is the delivery of all relevant metadata, standards, and other supporting information that are essential to ensuring interpretability and use of outputs by stakeholders and users. Stakeholders access metadata via web-page content or via UI elements supporting discovery activities in the metadata.

There are a number of challenges with current approaches. Using the data usually requires some degree of post-processing on the part of users to integrate it with other external data for the purposes of creating reports, digital dashboards, and further analysis. For example, users may download data sets, put them into a Postgres database, rearrange the data, and use a tool such as Tableau to create digital dashboards. This requires extra time and resources from the user, which may represent needless friction. A lack of standardization based on generally available open standards is part of the issue - the use of CSV as a download format is a very low-level, generic, structural approach to transfer that conveys no meaningful higher-level structural or semantic value.

Fixed viewing approaches based on static or "fixed dynamic" techniques limit the options available to users. If they wish to create different visualizations, this lack of flexibility forces them to extract data, ingest it into their own BI or visualization environments, and create their own visualizations. Updates of data may require rework. This represents friction for those wishing to create dynamic dashboards to support ministerial or policy analysis reporting.

API's and interoperability frameworks are increasingly becoming the alternative access methods of choice for stakeholders and users. The ability to make machine-to-machine connections is an alternative means of addressing the data download step in the aforementioned digital dashboard example. Open and transparent government initiatives (open data) are increasingly focused on providing frictionless access to publicly available data via programmatic interfaces (as "web" services), and NSO's need to ensure that they are well established in these spaces.
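
As a minimal sketch of such machine-to-machine access, assuming a hypothetical REST endpoint and query parameters (no real NSO API is implied), the following Python fragment retrieves published data programmatically instead of via a manual download:

    import io
    import pandas as pd
    import requests

    BASE_URL = "https://api.example-nso.org/v1/data"  # placeholder endpoint

    # Request the same tabular data the portal offers for manual download.
    response = requests.get(
        BASE_URL,
        params={"dataset": "labour-force", "period": "2019", "format": "csv"},
        timeout=30,
    )
    response.raise_for_status()

    # Consume the result directly in the analyst's (or an external platform's) tooling.
    df = pd.read_csv(io.StringIO(response.text))
    print(df.head())

Standards such as SDMX define richer, self-describing exchange formats than plain CSV, but even a simple, well-documented API of this kind removes much of the friction described above.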

Most if not all web-based dissemination platforms make use of open-source or related search capabilities to enable user-specific retrieval of information and data. Currently most techniques use a combination of indexing and relevancy ranking to provide a (potentially long) list of results to users, who must then browse to find the desired result. Advanced techniques are available that provide enhanced metadata searching and domain-specific ontology (vocabulary and knowledge) techniques and NSO’s should investigate how they can take advantage of emerging capabilities (for example, see http://smartcity.linkeddata.es,  https://spec.edmcouncil.org/static/ontology) for areas such as smart cities and financial and banking domains. “Knowledge layers” are embedded in common search tools such as Google and Bing, which may provide a source of inspiration and “knowledge technology as a service” capabilities.
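
As an illustration of the ontology-based search techniques mentioned above, the following sketch queries a SPARQL endpoint for concepts in a SKOS vocabulary; the endpoint URL is a placeholder and the vocabulary is assumed to label concepts with skos:prefLabel.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://example.org/sparql")  # placeholder endpoint
    sparql.setQuery("""
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        SELECT ?concept ?label WHERE {
            ?concept a skos:Concept ;
                     skos:prefLabel ?label .
            FILTER(CONTAINS(LCASE(STR(?label)), "unemployment"))
        }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["concept"]["value"], "-", row["label"]["value"])

Searching against a curated vocabulary in this way returns concepts, with their definitions and relationships, rather than a long, flat list of documents, which is the kind of enhanced discovery experience described above.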

In the future we expect an evolution of dissemination technologies. Enhanced user experiences and a desire for self-serve reporting and analysis may see dissemination platforms incorporate rich open-source or commercial BI solutions (e.g. Tableau, PowerBI) that support flexible analysis and visualization hosted by the dissemination platforms. The importance of geospatial-based activities is reflected in platform product evolution by companies such as ESRI, and by incorporation of geospatial reference features in open-source publishing (e.g. Drupal and CKAN) and analysis platforms.

We will see a significant increase in the use of interoperable API's based on open standards, allowing external platforms to access information, data, and metadata via their own solutions without human intervention. This also provides an easy path to increased NSO relevance by ensuring that NSO outputs are "built in" to the broader set of public and private solutions. Users will range from open-source and open-data communities and co-publishers through to demanding real-time trading information scrapers. A well-stocked API store also supports the creation of myriad "apps" on popular platforms such as Android and iOS without NSO's being responsible for their creation and support.

NSO's should continue to explore opportunities to streamline and enhance tools and techniques to create powerful narratives based on output data - effective and efficient creation of stories, infographics, and other multi-media-friendly outputs is essential to ensure relevance and engagement. These are often seen as separate and disjoint from mainstream statistical production, leading to gaps, friction, and ad hoc approaches.

An important brand and marketing consideration relates to how material is branded and presented to external audiences. The standard approach is via an NSO-branded and controlled web portal or presence, with all content contained in or subordinate to this entry point. The explosion of data portals and related activities at all levels of government (municipal, regional, provincial, territorial, or national) calls into question an exclusive reliance on this model. NSO's need to consider the use of "branded embedded data publishing", whereby API's and embeddable content (with branding) are made available for use within another agency's portal or web presence. The most common example is the way Google Maps frames are embedded in trip-planning tools, restaurant review and search tools, and so on. Technology choices need to reflect these directions, with associated terms of use and control guidelines.

An important technology-driven advancement occurring now is the adoption of "chatbots", conversational voice assistants, and other manifestations of AI-driven features, which are an important part of the evolution of dissemination and external-facing user experiences. Google Assistant, Amazon Alexa, Microsoft Cortana, and Apple Siri are all examples of voice-driven assistants that marry speech and natural language processing with platform API's to deliver enhanced, frictionless services.

It is important to address the current and emerging means by which NSO's provide access to "disaggregated data", either as public use microdata files (PUMF's) or as controlled access to "semi-sensitive" (de-identified) data. The latter typically makes use of physically controlled and operated spaces to which approved researchers go to perform analysis with selected data sets. In many NSO's this is seen as a separate function from dissemination functions and activities, with alternative physical and technology infrastructures. At a business level NSO's should reflect on whether this is still a useful approach, given the move to self-serve and more interactive dissemination activities. An alternative view sees a "data access continuum": standard exploration and visualization of published public-use data, advanced access to analysis and visualization of public-use microdata, and finally private engagement with strictly controlled access to data spaces and workbenches using virtual data lab techniques. The ability to execute on this continuum (either as an integrated experience or through separate pathways) is closely tied to privacy-enhancing techniques that address security, privacy, and confidentiality concerns and requirements. The UN Global Platform has released a handbook on Privacy Enhancing Techniques addressing input, computational, and output privacy via a range of techniques and emerging technologies (see https://marketplace.officialstatistics.org/privacy-preserving-techniques-handbook for further details).

To summarize, future directions in dissemination include:

  • multi-media integration with powerful narrative and infographic capabilities;
  • self-serve user experiences that allow users to create custom views and visualizations hosted by the platform;
  • standards-based, API-supported data acquisition and access by analysts for use on their own platforms;
  • media-friendly, near-production-quality outputs;
  • an increased role for geospatial-directed interactions;
  • API-driven integration with external data platforms and communities;
  • strong analytics capabilities addressing traffic and usage statistics, leveraging powerful machine-learning techniques as a semi-automated "flywheel" to improve dissemination.

Voice and natural language processing approaches with AI-driven reasoning and interaction are important emerging solutions that will augment the user experience.

UN Global Platform - new approaches to dissemination

The UN Global Platform is using a novel approach to establishing a frictionless, engaging starting point for user and stakeholder access. The UN Global Platform "marketplace" uses a "shopping-experience" approach based on the open-source e-commerce platform Magento (see http://marketplace.officialstatistics.org). This unique approach acts as a launchpad to access data, collaboratives, methods, services, learnings, and more. Its integrated analytics capability provides a means to understand and enhance marketplace offerings based on user activity and feedback. An example (alpha) of the marketplace is shown below.

Links to guidelines, best practices and examples:

  • CountrySTAT is a collection of software tools, methods, and standards for the analysis of data coming from different sources. Data can be manipulated and visualized directly online, and various types of charts allow users to perform further analysis: http://www.fao.org/in-action/countrystat/en/

Example of implementation in Uganda: http://uganda.countrystat.org/

  • DevInfo is a database system for monitoring human development. It was developed originally with the specific purpose of monitoring the Millennium Development Goals (MDGs). The system integrates management information systems, geographic information systems, software training, technical support services, data dissemination solutions and technical publications: http://www.devinfo.org

  • The .Stat system allows users to search for and extract data from OECD databases. The platform is used by a number of other NSOs and International Organisations. https://stats.oecd.org/

  • Knoema is a subscription-based dissemination platform that provides access and visualisation tools for use with national databases: https://knoema.com/


1.11 The Anti-Pattern Organisation

Anti-patterns can be defined as "a common response to a recurring problem that is usually ineffective and risks being highly counterproductive". The term was coined in 1995 by Andrew Koenig.

Organisations that fail to focus on the following topics will usually be unable to deliver, or will deliver products and services that fail to meet user needs.

Fails to focus on user needs

Has difficulty understanding who its users are and is unable to explain their needs.

Fails to use a common language

Uses multiple ways of describing the same problem space, e.g. box-and-wire diagrams, business process diagrams and stories. Often suffers from confusion and misalignment.

Fails to be transparent

Has difficulty in answering basic questions such as “How many data projects are we building?” Information tends to be guarded in silos.

Fails to challenge assumptions

Action is often taken based upon memes, the HiPPO (highest paid person's opinion), or popular articles in the HBR (Harvard Business Review).

Fails to remove duplication and bias

The scale of duplication is excessive and exceeds what people expect in practice. Any investigation will discover groups custom-building what exists as a commodity in the outside world. There is often resistance to changing this on the grounds that it is somehow unique, despite the group's inability to explain user needs.

Fails to use appropriate methods

Tends towards one-size-fits-all methods across the organisation, e.g. "outsource all of IT" or "use Agile everywhere".

Fails to think small

Tends toward large scale efforts (e.g. Deathstar projects) and big departments. This can include frequent major platform re-engineering efforts or major re-organisations.

Fails to design for constant evolution

Tends to bolt on new organisational structures as new technology fields are adopted, e.g. a cloud department, a digital department, a big data group, etc., rather than embedding constant evolution within the existing organisation.

Fails to enable purpose, mastery and autonomy

There is often confusion within the organisation over its purpose combined with feelings of lacking control and inability to influence.

Fails to understand basic economic patterns

Often conducts efficiency or innovation programmes without realising the connection between the two. Assumes it has a choice over change (e.g. cloud) where none exists. Fails to recognise and cope with its own inertia caused by past success.

Fails to understand context specific play

Has no existing language that enables it to understand context-specific play. Often uses terms as memes, e.g. open source, ecosystem, innovation, with no clear understanding of where they are appropriate.

Fails to understand the landscape

Tends not to fully grasp the components and complexity within its own organisation. Often cannot describe its own basic capabilities.

Fails to understand strategy

Tends to be dominated by statements that strategy is all about the "why", but cannot distinguish between the why of purpose and the why of movement. Has little discussion of position and movement, combined with an inability to describe where it should attack, or even the importance of understanding "where" before "why". Often strategy is little more than a tyranny of action statements based upon meme copying and external advice.

1.12 Terms Definitions

Context                Our purpose and the landscape

Environment                The context and how it is changing

Situational awareness        Our level of understanding of the environment

Actual                        The map in use

Domain                Uncharted vs Transitional vs Industrialised

Stage                        Of evolution e.g. Genesis, Custom, Product, Commodity

Type                        Activity, Practice, Data or Knowledge

Component                A single entity in a map

Anchor                        The user need

Position                Position of a component relative to the anchor in a chain of needs

Need                        Something a higher level system requires

Capability                High level needs you provide to others

Movement                How evolved a component is

Interface                Connection between components

Flow                        Transfer of money, risk & information between components

Climate                Rules of the game, patterns that are applied across contexts

Doctrine                Approaches which can be applied regardless of context

Strategy                A context specific approach

Smart Data                IoT-related data on which a decision can be made before it is sent downstream for processing


Co-evolution of Practice and Activity

This co-evolution of practice and activity can create significant inertia to change for consumers of that activity. In the case of infrastructure, if the consumers of large, powerful servers had developed estates of applications based upon the practices of scale-up and N+1, then as the activity evolved to more utility services those consumers would incur significant costs re-architecting the "legacy estate" for the new world.

Our current way of operating often creates resistance (or inertia) to change due to the costs of changing practices (see figure above). In many cases we attempt to circumvent this by applying the "old" best practice to the new world, or we attempt to persuade the new world to act more like the past. Today, cloud computing is an example of this, as it represents an evolution of parts of IT from product to utility services; the "legacy" estate is often cited as a key issue for adoption, or as a reason for creating services which mimic past models.

Figure x - Stages of Evolution



[1]  Breur, Tom (July 2016). "Statistical Power Analysis and the contemporary "crisis" in social sciences". Journal of Marketing Analytics. 4 (2–3): 61–65. doi:10.1057/s41270-016-0001-3. ISSN 2050-3318.

[2] Laney, Doug (2001). "3D data management: Controlling data volume, velocity and variety". META Group Research Note. 6 (70).

[3] Goes, Paulo B. (2014). "Design science research in top information systems journals". MIS Quarterly: Management Information Systems. 38 (1).

[4]  Marr, Bernard (6 March 2014). "Big Data: The 5 Vs Everyone Must Know".

[5] https://datasciencecampus.ons.gov.uk/, https://www.cbs.nl/en-gb/our-services/unique-collaboration-for-big-data-research

[6] Why the Fuss about ServerLess? by Simon Wardley, October 2018

[7] Why the Fuss about Serverless - Simon Wardley

[8] https://aws.amazon.com/blogs/compute/applying-the-twelve-factor-app-methodology-to-serverless-applications/

[9] https://12factor.net/

[10] See https://marketplace.officialstatistics.org/privacy-preserving-techniques-handbook

[11] Other than what can be inferred solely from the function’s output.