Contents - Updated 12/04/19
This document has been prepared by the Technical Delivery Board of the UN Global Platform Committee, which is one of the sub-groups under the UN Global Working Group (GWG) on Big Data for Official Statistics.
The Statistical Commission created the GWG at its forty-fifth session, in 2014. In accordance with its terms of reference (see E/CN.3/2015/4) and decision 46/101 of the Statistical Commission (see E/2015/24), the GWG provides strategic vision, direction and coordination of a global programme on big data for official statistics, including for the compilation of the Sustainable Development Goal indicators in the 2030 Agenda for Sustainable Development.
In its decision 49/107 (see E/2018/24), the Statistical Commission reaffirmed that the use of Big Data and other new data sources is essential for the modernization of national statistical institutions so that they remain relevant in a fast-moving data landscape and highlighted the opportunity for Big Data to fill gaps, make statistical operations more cost effective, enable the replacement of surveys and provide more granularities in outputs. The Commission further endorsed the proposal of the GWG to develop a global platform as a collaborative research and development environment for trusted data, trusted methods and trusted learning.
The GWG provides guidance on using Big Data for official statistics. It has so far produced handbooks on Earth Observations for official statistics (see https://marketplace.officialstatistics.org/earth-observations-for-official-statistics), on the use of mobile phone data for official statistics (see https://marketplace.officialstatistics.org/handbook-on-the-use-of-mobile-phone-data-for-official-statistics), and on privacy preserving techniques (see https://marketplace.officialstatistics.org/privacy-preserving-techniques-handbook).
This Handbook provides guidance and recommendations on IT strategy for national statistical institutions that are embarking on the use of Big Data in the production of their official statistics and indicators. It uses Wardley Maps to understand where an institution has to invest in development and where it can adopt off-the-shelf solutions. It runs through a number of principles, from knowing your users (e.g. customers, shareholders, regulators, staff) and their needs, through challenging assumptions, removing bias, using appropriate methods (e.g. agile vs lean vs six sigma), being pragmatic and managing inertia (e.g. existing practice, political capital, previous investment), to choosing effectiveness over efficiency while leaving no one behind.
The Handbook will cover new topics like Cloud, IoT, Artificial Intelligence, Data Science, Serverless technology, and security, as well as the use of standards and generic models for IT management.
The Handbook aims to give a high-level overview of IT strategy for a modern statistical institute, which should be of benefit to senior managers, while providing sufficient detail to be of interest to IT professionals.
Chair Technical Delivery Board
UN Global Platform
Member Technical Delivery Board
Member Technical Delivery Board
Member Technical Delivery Board
Statistics Division | Department of Economic and Social Affairs. United Nations
Member Technical Delivery Board
Statistics Division | Department of Economic and Social Affairs. United Nations
Office for National Statistics
Leading Edge Forum
Information Technology (IT) continues to play an essential role in all aspects of statistical processing throughout the entire production life cycle, from data collection through to dissemination. This is a fast-moving and rapidly changing environment, with new innovations being developed at a breathtaking rate.
‘Change has never happened this fast before, and it will never be this slow again’
Since the publication of the last handbook in 2003 the IT landscape has changed almost beyond recognition, but in a broadly predictable direction: at that time many NSOs were just emerging from the mainframe era and have since evolved through the phases of personal computers, distributed databases, the explosion of the internet, smartphones, tablets, cloud technology and new data sources.
NSOs can expect a continued and accelerating rate of change in the years to come, with further advances in Artificial Intelligence, machine learning, increased computing power, smart data and the “Internet of Things”.
These developments combined with changing work practices, increased user expectations, competition from other data providers and a constant drive for modernisation and increased efficiency provide an ongoing challenge for NSOs.
Harnessing the power of IT can help to meet these challenges but only if supported by a robust strategy. A robust strategy consists of understanding the business and technology landscape, a strong set of principles and a clear delivery plan. However, to understand the landscape and communicate it to others requires some form of map. The purpose of producing a map is to help us to communicate our intentions, to allow others to challenge our assumptions and for an entire organisation to learn and then apply basic patterns of change (known as climatic patterns), principles of organisation (known as doctrine) and context specific forms of gameplay. Maps are our learning and communication tool for discovering these things and enabling us to make better decisions before acting.
Throughout time, understanding and exploiting the landscape has been vital in battle as it acts as a force multiplier. Probably the most famously cited example is the battle of the pass of Thermopylae where the Athenian general Themistocles used the terrain to enable 7,000 Greeks to hold back a Persian Army of 300,000 for seven days with King Leonidas and the “three hundred” reportedly holding them back for two of those days.
Maps and situational awareness are always vital to the outcome of any conflict. Maps enable us to determine the why of action – cut off an enemy supply route, gain a geographical advantage over an enemy position or restrict an opponent’s movement. The what (capture this hill), the how (bombard with artillery followed by ground assault) and the when (tomorrow morning) all flow from this, though the specifics change as no plan generally survives first contact intact.
Military maps are traditionally thought of in terms of describing a geographical environment, the physical landscape in which the theatre of battle operates. However, business is equally a competitive engagement between “opponents” but in this case fought over a business landscape of consumers, suppliers, geographies, resources and changing technology.
In Business and IT we almost never have a map of the landscape, and so we cannot know where to act. Our reasons for action (the why) can only ever be vague, unlike at Thermopylae, where the environment was exploited to restrict a foe.
The lack of any map forces us to focus on the what rather than the why.
A map has three basic characteristics: an anchor (e.g. magnetic North), position of pieces relative to the anchor (this is North or South of that) and consistency of movement (e.g. heading North means to head North and not South). To understand the business landscape we use a Wardley Map, which has an anchor (the needs of the user), position described through visibility in a value chain, and movement described through evolution. A simplified example map of a tea shop is given below.
Figure 2 -
The map itself allows others to instantly challenge the position of components, add missing components and therefore collaborate with others in describing the landscape. Each component of a map has a stage of evolution. For activities these are: Genesis (stage I), Custom Built (stage II), Product, including rental (stage III), and Commodity, including utility (stage IV).
This evolution is shown as the x-axis and all the components on the map are moving from the left (an uncharted space of the novel and new) to the right (an industrialised space). This process of evolution is driven by supply and demand competition and hence a map is fluid if competition exists. However, this doesn’t mean organisations will build things in the right way. In our Tea Shop example, questions need to be asked over why we’re custom building kettles.
Figure 3 -
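As an illustrative sketch only, the components and evolution stages described above can be represented in code. The component names, positions and the expected stage used here are assumptions for demonstration, not part of any official schema.

```python
from dataclasses import dataclass
from enum import IntEnum

class Stage(IntEnum):
    """The four stages of evolution for activities (the x-axis)."""
    GENESIS = 1       # novel, uncertain, rapidly changing
    CUSTOM_BUILT = 2  # understood, but still built in-house
    PRODUCT = 3       # available as products or rental services
    COMMODITY = 4     # standardised utility, e.g. cloud compute

@dataclass
class Component:
    name: str
    stage: Stage       # x-axis position
    visibility: float  # y-axis: 1.0 = visible to the user, 0.0 = hidden

# Hypothetical components from the tea-shop example
kettle = Component("kettle", Stage.CUSTOM_BUILT, 0.4)
power = Component("power", Stage.COMMODITY, 0.1)

def lags_the_market(component: Component, market_stage: Stage) -> bool:
    """Flag a component whose treatment lags where the market has evolved to."""
    return component.stage < market_stage

# Kettles are widely available as products, so custom building one is questionable
print(lags_the_market(kettle, Stage.PRODUCT))  # → True
```

This is the tea-shop question from the text made mechanical: a component treated as custom-built while the market offers it as a product is worth challenging.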
However, the map also has some advanced features which are not so immediately obvious. There is a flow of risk, information and money between components. In fact, the components themselves can represent different forms of capital such as activities, practices, data and knowledge and the lines represent bidirectional flow of capital e.g. a public consumer exchanges financial flow (revenue) for a physical flow (a cup of tea).
Any single map can have many different types of components (e.g. activities and data) and the terms we use to describe the separate stages of evolution are different for each type. In order to keep the map simple, the x-axis of evolution shows the terms for activities alone. The terms used today for other types of things are provided below.
Figure 4 -
Lastly on a map, we can not only show different forms of capital flow (e.g. financial flow, useful in creating income statements) but also our intended actions (e.g. shifting the kettle to more commodity forms) along with climatic impacts such as anticipated changes to components caused by competition (e.g. staff being replaced by robots).
Figure 5 -
With the map above, we can start to discuss the landscape. For example, have we represented the user need reasonably and are we taking steps to meet that user need? Maybe we’re missing something such as an unmet need that we haven’t included? Are we treating components in the right way? Have we included all the relevant components on the map or are we missing key critical items? We can also start to discuss our anticipation of change and our plans for the future.
Maps are part of the strategy cycle, shown below. At the highest level, it consists of an iterative series of four phases. Our first phase establishes purpose - what “game” are we in, what do we wish to attain, leading to a clear “why” of purpose. The next phase is where we observe our landscape and climate (system of forces) to establish context and awareness that we are operating in. Failure to do this adequately means that we are blind to potential opportunities or challenges, with the risk of engaging in lower value or risky actions. Our landscape is captured in the form of maps.
We then proceed to the “orient” phase, where we incorporate important elements of doctrine, in the form of principles and norms directing our action. We identify a number of principles in the section to follow. These provide an important “north star” direction for our work. The final phase is deciding what we are going to do, resulting in a transition to “Act”. Collectively, the Observe, Orient and Decide phases provide our “why of movement”, wherein we make decisions on an ongoing basis as we proceed, i.e. why we should take this action over that action.
As we continue to act we further iterate to assess the on-going validity and change our maps and actions accordingly. The maps themselves can be an important source of learning assuming that at some future point we review the map which describes the context with the actions decided and examine what actually happened.
Figure 6 -
The next section addresses doctrine in our Orient phase through a discussion of principles.
The following section outlines the core doctrine of this handbook. This is not an exhaustive list but instead it covers the basic principles that should be applied. These principles are considered to be universal and applicable to all industries regardless of the landscape they operate in. This doesn’t mean that the doctrine is right but instead that it appears to be consistently useful for the time being. There will always exist better doctrine in the future and it is anticipated that this list will expand over time.
The principles are:
Consider SDG 9.1.1 (Proportion of the rural population who live within 2 km of an all-season road) as an example. SDG 9.1.1 is an indicator to support developing quality, reliable, sustainable and resilient infrastructure, including regional and trans-border infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all.
The rest of the IT Strategy section walks through creating an IT strategy using SDG 9.1.1 as an illustration, as seen in the figure below.
Figure 7 - SDG 9.1.1 Wardley map
Understand who the users are — statistical consumers, data regulators, governing organisations like the UN and national governments, staff, or statistics organisations.
Users exist within the wider community, such as businesses, other UN agencies, and other NSOs. In this sense users are identified in the broadest possible context as those who directly or indirectly receive benefit from official statistics.
Other forms of users are data partnerships and collectives where an NSO, third sector organisations, and private sector companies operate together to achieve joint goals.
Data-based user research and the use of analytics aid understanding of the groups of people who have a stake in official statistics. User interviews, focus groups and surveys are common techniques used in user research to gather data about users and their needs.
Knowing who users are is the first stage in identifying whether the value generated by the organisation will satisfy the needs of each particular user group.
Figure 8 - Knowing your users
In the SDG 9.1.1 example, within the wider landscape the key users are the UN, governments, NSOs, construction companies and the public. These groups all have needs for the SDG 9.1.1 indicator. Some users, such as the public, will have needs beyond the indicator itself; the public will be interested in the outcomes of using the indicator.
Some of these users will be more visible in the value chain than others, which explains their position on the map, but in the early stages of creating a map it is enough simply to mark who the user is. The following “cheat sheet” for Stages of Evolution helps to categorise where on the horizontal axis users sit. This is always up for debate, which is the purpose of using a map. For example, Governments have been placed in stage III (noted on the x-axis as product) as they have the characteristics of “increasingly common / disappointed if not available / feeling left behind if not used / advantage through implementation / competing models of”.
Figure 9 - SDG 9.1.1 Wardley map
Figure 10 - Matching users to needs
User needs reflect what a user requires and/or values. For example, a government may have a need for funding which in turn requires the UN and its agencies. The UN has needs to improve the global economy and to end poverty. The UN also has a need to create goodwill with Governments and much of that comes from trust and in turn it is the NSOs that must provide trust in the data services that regulators base decisions upon. The map therefore provides a chain of needs.
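The chain of needs described above can be sketched as a small directed graph and walked to see everything a user transitively depends on. The nodes and edges below are illustrative, drawn loosely from the SDG 9.1.1 discussion, not a definitive model.

```python
# A minimal sketch of a chain of needs as an adjacency list.
# Nodes and edges are illustrative examples from the SDG 9.1.1 discussion.
needs = {
    "UN": ["end poverty", "goodwill with governments"],
    "government": ["funding", "construction companies"],
    "goodwill with governments": ["trust"],
    "trust": ["NSO data services"],
    "public": ["access to jobs and services", "all-season roads"],
    "all-season roads": ["construction companies"],
}

def chain_of_needs(user: str) -> list[str]:
    """Depth-first walk returning everything a user transitively needs."""
    seen: list[str] = []
    stack = [user]
    while stack:
        node = stack.pop()
        for need in needs.get(node, []):
            if need not in seen:
                seen.append(need)
                stack.append(need)
    return seen

print(chain_of_needs("UN"))
```

Walking from the UN surfaces "trust" and "NSO data services" even though neither is a direct need of the UN, which is exactly what the map's chain of needs makes visible.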
User needs can be determined by investigation, by looking at the transactions that an organisation makes with users, and by examining the customer journey when interacting with those transactions. Questioning this journey and talking with customers will often uncover pointless steps, unmet needs, or unnecessary needs being catered for. Mapping the landscape often clarifies what is really needed.
It is important to discuss user needs both with the users and to talk with experts in the field. Care must be taken to avoid bias, especially where legacy already exists. Be wary of the legacy mindset, the equivalent to a user saying to Henry Ford — “we don’t want a car; we want a faster horse!”
With rare and uncertain situations where users and experts don’t actually know what is needed beyond vague hand waving, take a chance on what the user needs are. These can be revisited with all stakeholders.
Looking at the figure above, some examples of these interactions include: governments need to interact with construction companies for them to build roads; the public needs access to jobs and services in order to end poverty (a UN need); to do this they need to move from A to B; all-season roads are needed for this; and construction companies will build those roads.
Using a common language is essential for collaboration. Collaboration is very difficult when differently skilled groups or NSO members use their own arcane language and techniques.
For example, let us take a box representation of the 9.1.1 SDG map, translated into Elvish.
Figure 11 -
If we were to ask an observer which components should be outsourced, or where we should use standards, then it would be difficult to provide any meaningful answer beyond simple guesswork. Let us now convert this to a map in the figure below. The map may still be in Elvish, but we can see which components we should be custom building and which components should be bought as a commodity or outsourced to others.
Figure 12 -
The map itself provides a common language for describing an environment despite the labels that might be used. If you break any complex system into components, then some of those components will be uncharted space and are going to be experimental. This is not a bad thing, this is just what they are. For those components then you’re likely to do this in-house with agile techniques or use a specialist company focused on more agile processes. But you won’t give that company all the components because the majority of components tend to be highly industrialised and hence you’ll use established utility providers such as Amazon, Microsoft, Google or Alibaba for computing infrastructure.
It is important for the governance system to provide a mechanism of consistent measurement against outcomes and for continuous improvement of those measurements. Using maps means that strategy can be revisited at a later date to see what was done, to understand what the outcomes of the strategy were, and to use scenario planning to explore alternative outcomes.
However the strategic tools used are less important than the principle of revisiting previous strategy in order to understand where we have come from, what was intended, what was actually achieved and what could have been done differently.
With SDG 9.1.1, if the intention is to drive towards the commoditisation of components in the value chain, then revisiting the map at a future date will show how much progress there has been towards this goal, whether commoditising is still desirable, and what inertia (if any) has been faced.
Figure 13 -
In attempting to decide on a course of action with our components and strategy, there are a number of approaches that can be used, yielding a broad range of results. What is notably absent from most approaches is context: an understanding of the landscape you are in, the series of forces (or climate) you find yourselves in, and the fact that none of this is static but instead changes over time.
Our strategy activity can answer a “why of purpose”, determining what outcome we wish to achieve, however there is a second “why of movement” that underlies effective decision making on an ongoing basis. To do this you require maps that let you “see” the landscape, “understand” the forces at play, that can guide you in your “movement” decision making, and capture the results and lessons being learned as you move forward. Plenty of examples from military history highlight the strategic importance of having this complete situational awareness.
We need maps to provide this situational awareness, capturing the basic elements of a map: visual representation, context specificity, position of components relative to some form of anchor, and movement of those components. Many strategies rely on a “story”, a linear progression of directions, without providing the contextual view and orienting landscape. A proper map, together with climatic forces, allows us to identify patterns that we can exploit to our advantage.
NSOs can strongly benefit from a change in approach, overcoming traditional approaches that tend to be linear, localized, and inwardly focussed, thus posing strategy risk through a misreading and lack of knowledge of critical patterns in society, technology, data, and marketplaces.
The SDG 9.1.1 map provides situational awareness of the rural all-season road indicator, covering user needs in a broad sense: the public, governments and construction companies, as well as the UN and NSOs.
Understanding the data, the sources of data, and the roads and population components that deliver SDG 9.1.1, and how rare or well understood they are, allows decisions to be made about where to play.
Building trust is seen as important both in the way that data is collected, and in the relationship between NSO’s and the public. Understanding the landscape allows debate as to what types of trust there are, and how each type of trust is treated.
Transparency is difficult within organisations. Many people find being challenged uncomfortable. The flip side of sharing and openness is that it allows others to challenge and question firmly held assumptions.
Sharing maps enables challenges and questioning of assumptions. Challenging helps to learn and refine maps.
In order to be open and transparent, maps must be published in one place in one format through a shared and public space.
There is little point in focusing on user needs, creating a common language through the use of a map and sharing it transparently if no one is willing to challenge it. Challenging others' assumptions is a key approach to communicating, innovating, creating and problem solving. Assumptions can too easily be made if not challenged or explained. Often these silent assumptions are based on an idea one believes to be true, on prior experience, or on one's belief systems.
Maps provide a way to visually display our assumptions of users, needs and relationships and a method for people to challenge them.
The NSOs should avoid rebuilding what already exists and re-use where possible. To support this effort, maps should be circulated and collated to enable the removal of duplication and bias.
Such efforts need to be iterative i.e. it is not necessary to map the entire NSO statistical landscape prior to taking any action. Collating maps to create profile diagrams can also be a mechanism for discovering what future services the NSOs should provide along with what core capabilities are needed.
An example profile diagram, built from many maps, highlighting both duplication and bias is provided below.
Figure 14 -
In the diagram above, website shows thirteen different references across multiple maps. This may be completely valid, but it could also mean many separate installations when only one is needed.
Collection shows ten different references, of which two are bespoke, eight are products and none use commodity services. There might be perfectly valid reasons for customised collection, but the profile allows us to question this, and also whether the eight product forms are stand-alone installations or would be better suited for provision by a common service.
Survey tools shows twelve instances, but with a bias towards custom building these tools. If some are custom and some are nearly commodity, can they all be right?
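The profile analysis described above can be sketched as a simple aggregation over collated maps. The map entries and the duplication threshold below are illustrative assumptions, not real NSO data.

```python
from collections import Counter

# Each collated map contributes (component, stage) entries; data is illustrative.
collated = [
    ("website", "product"), ("website", "product"), ("website", "custom"),
    ("collection", "custom"), ("collection", "product"), ("collection", "product"),
    ("survey tools", "custom"), ("survey tools", "custom"), ("survey tools", "commodity"),
]

def profile(entries):
    """Count occurrences of each component per evolution stage."""
    counts: dict[str, Counter] = {}
    for name, stage in entries:
        counts.setdefault(name, Counter())[stage] += 1
    return counts

def flag(counts, dup_threshold=2):
    """Flag duplication (many instances) and bias (one thing spread across stages)."""
    report = {}
    for name, stages in counts.items():
        report[name] = {
            "duplication": sum(stages.values()) > dup_threshold,
            "bias": len(stages) > 1,  # same component treated in different stages
        }
    return report

print(flag(profile(collated)))
```

Flagged components are not automatically wrong; as the text notes, there may be valid reasons, but the profile gives a concrete list of questions to ask.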
Collating maps often helps in creating a common lexicon. The same thing is often described with different terms across organisations or within a single organisation.
When viewed from an airplane objects on the ground can appear to be very similar, yet they may in fact have important distinctions between themselves that could be vital to the success of actions and decisions.
A mismatch between understanding the important characteristics of components (within the context of analysis, method, and action) and the level of knowledge and understanding resident in decision makers and actors can be very problematic. Couple this with the tendency in many organizations to push decision making higher up (to senior management or executive layers) and the results can be messy.
Decomposing components into smaller components allows you to better understand the parts and their relationships, and it allows for the use of powerful techniques where small teams are organized around the smaller component pieces with clear contracts with producer and consumer components around them. Two examples of successful small team strategies are Amazon’s Two Pizza model and Haier’s Cell based structure.
The following figure shows how this has been done in a map.
Figure 15 -
The complexity of managing the contracts and interfaces between the components is more than offset by the benefits arising from having small, autonomous teams with clearly defined outputs who can be creative in their approach to realizing their components and services. This clearly supports the “use the right method” principle as it is evident that the components in the figure are in different phases of their maturity, thus supporting the ability to make effective choices.
It can be tempting to apply a new or existing method to decisions and actions without regard for the specific nuances surrounding a component. No “one size fits all” approach can be successful; it is important to consider a number of factors, of which landscape, climatic forces, and position in the map are key.
Techniques that might be appropriate for commodity phase components (with a bias to standardization and certainty) will not work well for genesis phase ones, and vice versa. In the genesis case there is no standardization because of the “new” and “discovery” nature of the component. Similarly not looking at the context for components will likely result in surprises and failure.
It is important to understand that components themselves are assembled from subordinate ones, and the constituent pieces may themselves be in different phases. Outsourcing a CRM platform that is perceived to be a commodity may overlook the fact that there are additional features added to the solution that are in the custom or product phase, with a potential result being cost growth and solution degradation through contract-based conflicts.
An example helps to clarify this principle. In the following diagram for our SDG 9.1.1 example, a sourcing decision is being discussed.
The component labels are shown in Elvish; in practice these could be English names, but for the decision-making participants the words are completely opaque, as if in another language. A meaningful discussion and decision will be very difficult to arrive at.
In contrast, we can look at the context and maps that we have as in the following picture.
In this case we see that a component of the SDG 9.1.1 is actually in the commodity space, which indicates that embarking on a custom build decision for that component will be a mistake. The map has provided the important lifecycle context to allow the right method to be chosen.
Preparation and analysis is required to provide the context for selecting the most appropriate methods to our components. Methods have intrinsic strengths, weaknesses, and assumptions that influence their effectiveness in situations - situational awareness is required as well. Method selection covers a broad range of topics, including project management and delivery (e.g. agile), process effectiveness and efficiency (e.g. lean, kanban, six sigma), sourcing strategies (e.g. outsource, insource, open-source, commercial product), and more.
An effective practice is to indicate on your maps what methods are to be applied for each component, to decompose components further where they are composed of subcomponents in different phases, to understand the rate of change and climatic factors that will directly influence the creation, acquisition, and operation of components.
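The stage-to-method pairing discussed here can be sketched as a simple lookup. The pairings follow the common Wardley-mapping rule of thumb (agile for the novel, lean for products, six sigma for commodities) and should be treated as a starting point for discussion, not a fixed rule.

```python
def method_for(stage: str) -> str:
    """Rule-of-thumb method selection by evolution stage (illustrative only)."""
    return {
        "genesis": "agile",        # experiment, expect constant change
        "custom": "agile / lean",  # reduce waste as understanding grows
        "product": "lean",         # improve a known, competing thing
        "commodity": "six sigma",  # drive deviation out of a standard thing
    }[stage]

def sourcing_for(stage: str) -> str:
    """Rule-of-thumb sourcing decision by stage (illustrative only)."""
    return {
        "genesis": "build in-house",
        "custom": "build in-house or use a specialist partner",
        "product": "buy a product",
        "commodity": "outsource to a utility provider",
    }[stage]

# Annotating each map component with its method makes the choice explicit
print(method_for("commodity"), "|", sourcing_for("commodity"))
```

Annotating each component on the map with the output of such a lookup makes mismatches, such as agile teams custom building what the map says is a commodity, immediately visible.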
More importantly, it is critical that components are placed in the correct phase (product, commodity, etc.). Using the right methodology for a component in the wrong phase does not provide an effective outcome. For example, the map below, from the HS2 project in the UK, shows which components should use the six-sigma methodology and be outsourced.
Using our SDG 9.1.1 example, we can see that NSOs are using the Agile methodology for developing their custom components, but the map shows us that these components should be commodity services and outsourced.
Placing the emphasis on being fast, inexpensive, simple and tiny (FIST) ensures that the focus is on simplifying the problem wherever possible and building in smaller components. The use of prototyping provides faster feedback loops, which ultimately deliver greater benefit. Taking time to perfect a plan makes it more susceptible to technological, market and economic changes, while delaying action increases the risk of scope creep, which increases the complexity of the design.
It is important to accept that not everything will fit perfectly into the model that has been described. There may, in theory, be very good reasons for a different approach, although most cases fit the model and would greatly benefit from it.
However, a one-size-fits-all approach often fails to work and can lead to inertia and resistance. It is more important to maintain a pragmatic approach than an ideological one.
It may well be more pragmatic to maintain an existing IT estate, auditing and sweating the existing assets until they are replaced.
The future estate may require a fundamentally different approach such as agile, open source, or local delivery.
In the example, the drive to commoditise survey tools to support SDG 9.1.1 may, in theory, be desirable. However, moving survey tools to a commodity may encounter inertia and resistance from NSOs who see survey tools as part of their reason for being. Therefore, in order to be pragmatic, survey tools may need to remain custom built.
In any established value chain, there exist interfaces between components along with accompanying practices. There is a significant cost associated with changing these interfaces and practices due to the upheaval caused to all the higher order systems that are built upon it e.g. changing standards in electrical supply impacts all the devices which use it. This cost creates resistance to the change.
You also find similar effects with data, or more specifically with our models for understanding data. As Bernard Barber once noted, even scientists exhibit varying degrees of resistance to scientific discovery. For example, the cost associated with changing the latest hypothesis on some high-level scientific concept is relatively small, and within the community we often see vibrant debate on such hypotheses.
However changing a fundamental scientific law that is commonplace, well understood and used as a basis for higher level concepts will impact all those things built upon it and hence the level of resistance is accordingly greater. Such monumental changes in science often require new forms of data creating a crisis point in the community through unresolved paradoxes including things that just don’t fit our current models of understanding. In some cases, the change is so profound and the higher order impact is so significant that we even coin the phrase “a scientific revolution” to describe it.
The costs of change are always resisted and past paradigms are rarely surrendered easily, regardless of whether it is a model of understanding, a profitable activity provided as a product or a best practice of business. As Wilfred Trotter said, “the mind delights in a static environment”. Alas, this is not the world we live in. Life's motto is “situation normal, everything must change” and the only time things stop changing is when they're dead.
The degree of resistance to change will increase depending upon how well established and connected the past model is. A map can be used to anticipate not only a change but also the likely sources of resistance whether past activities (sunk capital) or practices. There are many forms of inertia from financial to political capital, from past best practice, to the cost of training in new practices. This inertia should be actively managed.
Such a change is problematic for several reasons:
For the reasons above, the existing business model resists change and the more successful and established it is then the greater the resistance. This is why the change is usually initiated by those not encumbered by past success.
This resistance of existing suppliers will continue until it is abundantly clear that the past model is going to decline. However, by the time it has become abundantly clear and a decision is made, it is often too late for those past incumbents.
In the 9.1.1 SDG example we would want to move towards commoditising trust, methods, processing and survey tools, but may encounter inertia from a belief that these components rely on a customised approach based on existing practices within NSOs. NSOs may believe that custom surveys need to be created and processed to support independent statistics. Challenging the assumed need to customise surveys will help overcome this inertia and support the drive towards standardisation.
The company has its own data centres and is growing rapidly, which is creating a problem because of the time it takes to order and install new servers. The basic process is fairly straightforward: they order new servers, the servers arrive at goods-in, and they are then modified and mounted. The company analysed the flow (a rough diagram is provided in the figure below) and found that the bottleneck was the modification of the servers, due to the number of changes which need to be made before mounting.
The solution? To automate the process further, including investigating the possible use of robotics. It turns out that the process of modification is not only time consuming but can cause errors, and a case for automating this process (given the rise of general-purpose robotic systems) can be made given the future expectation of growth. The figures seem encouraging. Getting rid of bottlenecks in a flow sounds like a good thing. It all sounds reasonable.
The first thing to do is map out their environment including the flow. The map is provided in figure xx below.
Now, as we can see from the map, goods-in and ordering are fairly commodity (i.e. well defined, commonplace) and that makes sense. Computing, on the other hand, is treated as a "product", which seems suspect, but far worse is that they are making some fairly custom modifications to this "product" before mounting. Also, what are racks doing in the custom-built section? Racks are pretty standard things, aren't they?
Well, not in this case. Here the company is using custom-built racks which are slightly smaller than the 19-inch standard, which is why they need to modify the servers. The company needs to take the cases off the servers, drill new holes and mount rails to make them fit. The proposed automation is about using robots to improve that process of drilling and modifying servers to fit into the custom-built racks. Wait, can't we just use standard racks?
If we take our map, and mark on how things should be treated then surely it makes sense to use standard racks and get rid of the modification.
In fact if we look at the map, we should also mark compute as a utility (which is what it is now).
With the adoption of the 2030 Agenda, UN Member States pledged to ensure "no one will be left behind" and to "endeavour to reach the furthest behind first". This principle equally applies to modernisation. The data revolution, with new data sources, new innovative methods and technologies and new partnerships, can transform the operation of statistical systems in developing as well as developed countries. Unencumbered by old and often complex methods or by a legacy infrastructure and application landscape, developing countries are in some cases even better positioned for a fast leap forward. It is therefore very important that the modernisation process doesn't leave any country behind and that its benefits are achieved across the global statistical community.
In order to better illustrate this practice of creating an effective IT and business strategy we have created a partial series of maps addressing UN SDG 9.1.1 - access to roads (Proportion of the rural population who live within 2 km of an all-season road. Develop quality, reliable, sustainable and resilient infrastructure, including regional and trans-border infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all).
Our first phase is to establish purpose - this is expressed in the UN SDG 9.1.1 as noted above. For the SDG, our “why of purpose” is to ensure that all people have access to all-weather roads everywhere by 2030. There are a number of actors in this context. Our first map shows a subset of them - Government, National Statistical Organizations, UN, Public (at large), and construction industry. Each of these actors has “needs” that are to be addressed by this work.
We now observe our landscape to identify further components. The following map shows additional components in our landscape that are important as we strive to identify what our NSO strategy should be.
In this map we see the actors above identified as red circles. There are relationships between these actors and components: governments provide funding along with NGOs and donors, Government relies on votes from the Public, the UN is a trusted independent agency that requires goodwill from governments, and so on. Importantly, the UN (with countries) wishes to improve the global economy and end poverty, and it has been determined that access to jobs and services is an important part of making that happen. The NSO in this case is an important trusted contributor, providing high-quality objective indicators to the other actors to support their needs and goals. The NSOs require access to data sources in order to make this happen.
We now look further at the data source component, as shown in the next map.
In this map we see that we have a continuum of data sources, ranging from "well-known", standardized sources (those provided by underlying traditional survey mechanisms) to newly emerging "other" data sources. Satellite data is currently available as a product, but a significant amount of it is becoming a commodity. The red line shows that part of our strategy is to see this move from "genesis / custom" to "commodity" happen for all of our sources. By creating a global data platform and participating in the global marketplace we can identify and access standardized data to support all countries' NSOs.
If we look beyond the data pipeline we can now look at the next map, which adds in the tools and methods used to create these data sources. We introduce survey tools (potentially across the GSBPM spectrum), methods, and underlying infrastructure components (processing, storage). We see that for many NSOs their survey tools are all in the "Custom" phase - internally developed and used. Furthermore, for those with traditional data centres, their processing is also a custom configuration, either at the Agency level or at a "whole of government" service-provider level. The fact that these tools are all custom is a challenge that has been at the centre of many years of collaborative activity to change (through HLG and other work).
What shall we do with this? In the next map we identify that our strategy (through NSO collaboration, the UN Global Platform, and the open-source community) is to take advantage of the fact that the elements in the red circle are actually commodities in the broader landscape, and to identify that we wish to effect a change in how we source and use them. The red arrow signifies the move to the Commodity phase.
At this point we should step back and look at the landscape from our earlier broader perspective. The following map shows how we view the overall context.
The value of this map is apparent in how it situates a technology strategy conversation within the broader business strategy context. We can picture how, in the context of the UN and SDG activities with member countries, there is a desire to attain our goal for 2030, while at the same time it is very likely that each NSO has some version of the elements in the red circle. Our UN Global Platform provides a possible means for individual NSO "wins" and, importantly, for achieving a scalable, accessible, effective strategy at a global platform level at the same time.
Looking at the bigger picture in our map, an important element in our landscape is that of “trust”. This is shown in the following map.
The value of our NSOs hinges on our trust - of objectivity, quality, applicability, and other elements of typical quality frameworks. This is an important component in the broader context. At the same time, we see that our desire to move our tools, methods, processing, and storage to commodity (cloud, standardized methods and tools) hinges on a related but different trust: are the public algorithms and methods sound, is there something unique to our practice, will there be sovereignty or residency issues with our data in the cloud, and so on. On the one hand, our "trust" is a key differentiator and value that we deliver to governments and the public. On the other hand, "trust" is appearing as an inhibitor in our move to establish a more effective approach to the elements in the red circle.
In our next map we focus again on key pipelines for data tools, methods, sources. Our red lines show our desire to move those elements to the commodity side to take advantage of all that the broader market has to offer - open-source, global data, collaborative communities. We also show a pipeline for the data itself, reflecting that our data space also has a span ranging from “well-known, standardized” elements such as address, to more custom and early elements to the left. Much of our standardization activity is focused on moving data to the right.
A further examination of the "trust" element noted earlier allows us to probe further into the impact and nature of the element. The black bars represent "inertia" - elements that act as impediments to our desire to standardize and commoditize access to data, methods, and tools. It is important to identify these elements, as we will need to address them individually (per NSO, or per department within an NSO) as well as more broadly. Clearly we want to take advantage of traditional mechanisms such as peer review and publishing as part of this process; contracts can also address inertia elements for the tools, computation, and storage.
As a final step, we look again at our broader map and do a summary.
We see clearly the elements pertaining to helping realize the SDG by 2030, and the role of the NSOs and UN in this broader context. We highlight a part of the landscape for NSOs that addresses the means by which we deliver trusted insights to the ecosystem. We have a continuum of data sources we draw on, all of which are moving to the right as time progresses (many governments, such as Denmark and Estonia, are pursuing data marketplaces and the sharing of government registers, for example). We see that we can "leave no one behind" (across all country NSOs) in our journey by driving the growth of, and accessibility to, trusted algorithms, methods, tools, data, and infrastructure, and we can derive what actions will be required. As a result of this "observation" phase discussion we can now incorporate our doctrine (principle) elements, and then through leadership arrive at our "act" plan. We have identified that we will need to "unpack" trust as an inertial force, keeping the important value aspects while testing cultural and other resistance, to identify technical and non-technical actions that may need to be taken.
We have also identified the "win-win" opportunity for NSOs in realizing the vision of the UN Global Platform. The Platform provides a mechanism by which we can create an open, trusted marketplace of methods, data, tools and infrastructure, either by direct use or through collaborative development (with subsequent importation to individual NSOs), to further their individual strategies and goals.
The use of our strategy cycle and maps allows us to create a coherent foundation from which to further elaborate and realize our goals.
All technologies and techniques go through the four phases, and so do practices, data and knowledge. If we place data, data science, artificial intelligence and serverless technology onto a Wardley Map, we can easily understand what methodologies and processes we need to handle these technologies and the impact on strategy.
"Big data" is a field that treats ways to analyse, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. Big data was originally associated with three key concepts: volume, variety, and velocity. Other concepts later attributed with big data are veracity (i.e., how much noise is in the data) and value.
All data is structured and goes through four phases of evolution: unmodelled, divergent, convergent and modelled. Data we call "unstructured" is simply data whose structure we do not yet know; it is unmodelled, and over time it evolves towards being modelled.
Such data sources may not have predefined data models and often do not fit well into conventional relational databases. Many big data sources are still in the genesis, custom or product phases, therefore the data is not modelled. This will require the NSOs to focus efforts on modelling the data, and will require the use of new skills and roles, such as data engineers or data wranglers.
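The journey from unmodelled towards modelled data can be sketched in a few lines. The following illustrative Python fragment (the field names and records are invented for this example) shows semi-structured business records being wrangled into one convergent schema, the kind of task a data engineer or data wrangler performs:

```python
# Illustrative only: three records for the "same" concept arrive with
# inconsistent keys and types (unmodelled/divergent data).
raw = [
    {"Name": "Shop A", "turnover": "12000"},
    {"name": "shop b", "Turnover (EUR)": 9500},
    {"NAME": "Shop C"},                      # turnover missing entirely
]

def to_model(record):
    """Map the many observed spellings onto one convergent schema."""
    name = next(v for k, v in record.items() if k.lower() == "name")
    turnover = next((v for k, v in record.items()
                     if k.lower().startswith("turnover")), None)
    return {"name": str(name).title(),
            "turnover_eur": int(turnover) if turnover is not None else None}

modelled = [to_model(r) for r in raw]
print(modelled[1])  # → {'name': 'Shop B', 'turnover_eur': 9500}
```

Once every source conforms to the modelled schema, conventional relational storage and standard statistical processing become possible.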
Despite the high expectations for using Big Data, the reality is currently proving to be that while the technology needed to process these huge data sets is available and maturing, the biggest obstacle for an NSO is to actually gain access to the data. This lack of access can be due to the reluctance of a business to release their data, legal obstacles or concerns about privacy. See privacy preserving techniques section.
Implications of Big Data for an NSO IT infrastructure and skill requirements
At the time of writing there are very few cases of using Big Data in NSO production processes with the exception of geolocation data. This type of data has the advantage of being both relatively simple to access and also structured in a standardised way that allows integration with traditional datasets to provide additional and more granular information on location.
There has been much discussion on what the position of an NSO should be vis-à-vis Big Data – scenarios include NSOs developing a role as ‘brokers’ of Big Data which has been integrated by a third party, or simply providing a stamp of official quality to such datasets once they have validated their content and methodologies. Once the NSO has mapped its landscape, it will be able to create a strategy for handling big data.
In order to make use of Big Data, an NSO will require access to large compute resources and staff with new skills. The processing of increasingly high-volume data sets for official statistics requires staff with statistical skills and an analytical mindset, strong IT skills and a nose for extracting valuable new insights from data – often referred to as “Data miners” or “Data scientists”.
NSOs need to develop these new analytical capabilities through specialised training. Skills will include how to adapt existing systems to integrate Big Data into existing datasets and processes using specific technology skills.
Pros and cons of cloud technology for an NSO
Cloud computing offers solutions to NSOs of all capacity levels but is especially useful for allowing countries with weak infrastructure to access more advanced computing technology. As the cloud requires only a reliable internet access it can eliminate high upfront costs of installing hardware and software infrastructure and reduce the resources needed for ongoing maintenance.
Cloud technology is scalable and can enable IT teams to more rapidly tailor resources to meet fluctuating and unpredictable demand and can remove the layers of complexity in setting up infrastructure. It can enable the sharing of software as applications can be saved on the cloud by one organisation and used by other NSOs as needed.
The inherent security capabilities of large-scale cloud computing services can provide a higher level of security than some lower-capacity NSOs can achieve with their own infrastructure.
For some NSOs, problems could relate to confidentiality, as data on citizens is not stored on site, which may go against legislation.
Artificial Intelligence (AI) is the development and application of computer systems that are able to perform tasks normally requiring human intelligence, such as learning, problem solving, and pattern recognition.
In section 1.5.1 the availability of big data is discussed. The increasing volume, velocity and variety of data present new challenges to NSOs in terms of how to interpret this larger, more frequent and more varied data.
A main component of AI is Machine Learning (ML) which is the way the computer can learn from provided “training data” without being explicitly programmed. ML is separated into supervised and unsupervised learning techniques. Unsupervised learning uses training data without the desired output, whereas supervised learning uses training data that includes the desired output. An example would be to predict happiness based on social media posts or understand disease prediction using healthcare claims data.
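The distinction can be shown with a toy sketch. The plain-Python fragment below (invented data; a real project would use a library such as scikit-learn) contrasts a supervised nearest-neighbour classifier, whose training data includes the desired output, with a simple unsupervised two-means clustering that receives no labels at all:

```python
def knn_predict(train, labels, x, k=3):
    """Supervised: training data INCLUDES the desired output (labels)."""
    by_dist = sorted(zip(train, labels), key=lambda p: abs(p[0] - x))
    nearest = [lab for _, lab in by_dist[:k]]
    return max(set(nearest), key=nearest.count)  # majority vote

def two_means(points, iters=20):
    """Unsupervised: training data has NO labels; structure is discovered."""
    c1, c2 = min(points), max(points)  # naive initial centres
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return c1, c2

# Supervised: predict "happiness" from a (hypothetical) sentiment score,
# given labelled training examples.
train = [0.1, 0.2, 0.9, 0.8, 0.15]
labels = ["unhappy", "unhappy", "happy", "happy", "unhappy"]
print(knn_predict(train, labels, 0.85))  # → happy

# Unsupervised: find two natural groupings with no labels at all.
print(two_means([1.0, 1.2, 0.9, 8.0, 8.3, 7.9]))  # two centres, near 1.0 and 8.1
```

The supervised function can only answer questions its labels describe, whereas the unsupervised one reveals structure we did not specify in advance.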
The field of AI is evolving and growing rapidly, with further subfields being added regularly. Deep Learning is one such field. It layers multiple algorithms to identify more relationships in complex data than humans could attempt with programming alone.
AI has major potential for NSOs to process and interpret big data faster and with greater accuracy. The benefit to the end users of the statistics is richer, more accurate and more up to date information in order to inform decisions.
NSOs wanting to exploit the benefits of AI will need to engage staff with new skills including the various methods of Machine Learning such as Supervised Learning, Unsupervised Learning, Reinforcement Learning and Deep Learning.
AI-based applications can replace or augment certain tasks. This presents several challenges to brownfield organisations, such as upskilling existing staff from administrative functions to roles as data analysts, data engineers and data scientists. A careful communication strategy will be required for such a game-changing technological transformation, to ensure staff are engaged and buy into the transformation and to reduce the friction of change.
The job title of ‘Data scientist’ has emerged in recent years in parallel with the growth of Big Data. The two fields go hand in hand in the pursuit of extracting knowledge and insights from new data collections in various forms, whether huge structured datasets or unstructured alternative data sources.
Data science requires a mix of expertise. Technology skills are required to manipulate Big Data using techniques such as massive parallel processing, to analyse volatile unstructured data, perform data cleansing and then distil it into a format suitable for analysis. Mathematical skills are needed to write the complex algorithms used in analysing these data. Statistical skills are needed to investigate the data, respond to questions and derive insights from them. Other skills, such as Machine Learning and Deep Learning, may also be required (refer to para..).
Data Scientist candidates are consequently a highly sought-after species and many universities now offer data science courses. Given the range of skills involved in data science, the reality is that such tasks are carried out by a team rather than a single individual. Such teams consist of a data engineer who would access the primary data source and render it into a structured format, a software developer to write the routines to clean and aggregate the data and the data scientist who would create algorithms and use statistical techniques to gain insights from the data.
A typical list of the required competencies for a Data Science position would include:
An amalgam of data engineer, software engineer and data scientist (algorithms, etc.)
Links to guidelines, best practices and examples:
Serverless is the next phase after cloud computing. It is enabled by the use of cloud components, removing the need to curate servers within the environment (patching, etc.). Serverless is a fundamental change in architecture, both in IT and in the business, allowing the organisation to focus on creating more value rather than maintaining hardware.
Serverless is an event-driven, utility-based, stateless, code execution environment.
1. Serverless is a code execution environment. This means that a developer using a serverless environment is solely concerned with writing code and consuming standard building blocks provided through the services. Any code written could represent a discrete function or an application, which is a logical namespace for a collection of functions. Within a serverless environment there is no concept of machines, operating systems or the mechanics of distribution or scaling as they are all dealt with in the background.
2. Serverless is event-driven. This means that the initiation and execution of the code is caused by some event or trigger – for example, the calling of an API service or the storage of a file. The code is not running, listening and responding to some input, but instead is initiated and executed by the event.
3. Serverless is utility-based. You pay for the code only when it is running, and the cost paid depends upon the resources the code consumes. If you build a function to respond to an event which is subsequently never called, your cost of running the function will be zero because it will never actually run. There is no payment for idle compute or hosting. This utility-based charging is known as billing per function.
4. Serverless is stateless. The environment in which the code runs is constructed in response to the event, the code is then executed, and the environment deconstructed. This means that any information that needs to be passed between function calls must be stored or retrieved from services outside of the environment, such as a file storage, database or message queue service. If, for example, you call a function twice (through two different events), you cannot assume the same environment exists for both function calls, so no information can be passed between the two function occurrences by virtue of the environment they are running in.
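These four properties can be seen in a minimal sketch in the style of an AWS Lambda Python handler (the event shape and the external store are hypothetical placeholders): the code is triggered by an event, runs in a disposable environment, and must push any state that should survive between invocations to an external service.

```python
import json

def handler(event, context=None):
    # Event-driven: execution is initiated by the event (e.g. an API call
    # or a file landing in storage); no server sits listening for input.
    record = json.loads(event["body"])

    # Stateless: nothing persists between invocations, so state must go to
    # an external service (database, queue, file store).
    # save_to_external_store("survey-responses", record)  # hypothetical

    # Utility-based: we are billed only for this execution.
    return {"statusCode": 200,
            "body": json.dumps({"received": record["id"]})}

# Simulated invocation; two separate calls share no environment at all.
print(handler({"body": json.dumps({"id": "resp-001"})}))
```

Nothing in the handler refers to machines, operating systems or scaling; those concerns belong entirely to the platform.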
Creating a serverless architecture is more a mindset of stitching together Lego bricks to create a structure than designing your own Lego bricks. The result is significant benefits in efficiency.
1. Serverless is a shift of a code execution environment from a product stack (such as LAMP and .NET) to a utility stack (such as AWS Lambda). A necessity of being a true utility is the environment has no permanence (i.e. it is stateless) and is only invoked when needed (i.e. event driven). This change potentially impacts the entire application pipeline and a vast array of value chains above.
2. Serverless lowers Mean time to Repair (MttR) through both action (automation of the code execution environment, including provisioning and scaling) and observability (billing per function). A similar impact happened with IaaS.
3. The change in MttR will lead to a new set of co-evolved practices.
4. The provision of a serverless environment will increase efficiency, speed and access to new sources of worth by expanding the adjacent unexplored.
5. Serverless will not only impact novel and new activities but also existing application pipelines.
6. The focus of development will shift away from the lower-order systems including containers, VMs, DevOps and IaaS itself. Lower-order systems will increasingly become invisible and considered legacy.
7. Inertia to change will be reinforced by pre-existing assets and practices, so those who have invested heavily in DevOps and IaaS may be disproportionately affected by the change compared to those who have not. We expect a high proportion of ‘traditional’ NSIs to be among the early adopters of serverless as they seek to jump ahead.
The UK’s National Cyber Security Centre (NCSC) recommends using serverless components, believing that architectures that use serverless components (on a good cloud platform) will be more secure than ones built on IaaS or on-premises infrastructure.
NCSC defines serverless components as:
We recommend that NSIs:
As mentioned in the foreword, the GWG has also delivered a Privacy Preserving Techniques Handbook, which describes the issues involved in privacy protection via a Wardley map, illustrates how various parties can work together in a controlled setting, such as the UN Global Platform, and shows which techniques can be used to guarantee confidentiality.
The figure below shows a top-level Wardley map of the ecosystem of computation at a national statistics office (NSO). Wardley maps are widely used to visualise priorities, or to aid organizations in developing business strategy. A Wardley map is often shown as a two-dimensional chart, where the horizontal dimension represents readiness and the vertical dimension represents the degree to which the end user sees or recognizes value. Readiness typically increases from left to right, while recognized value increases from bottom to top. As shown, NSOs are charged with delivering diverse official statistical reports, which sometimes rely on sensitive data from various sources.
The next figure below illustrates a setting where multiple NSOs collaborate under the coordination of the United Nations. NSOs from individual nations act as Input Parties in this setting to share their results and methods with each other on the UN Global Platform. In this setting, the Global Platform takes on the role of the Computing Party. Also in this setting the Result Parties may be more diverse than in the first setting above: people, organizations, and governments across the world may receive and benefit from reports produced by the Global Platform.
Figure 2: Privacy-preserving statistics workflow for the UN Global Platform
The Privacy Preserving Techniques Handbook lists five techniques for statistics that will help reduce the risk of data leakage.
The importance of data security
Data security is of paramount importance for an NSO. Confidentiality and privacy are therefore among the most important of the Fundamental Principles (Ref. Chapter 3.2.6) and a major concern for citizens. Maintaining data security is vital for the good reputation of an NSO.
National statistics are aggregated from individual records and often contain personal or commercial information - thus security measures must be designed to preserve data confidentiality and ensure data is accessible only by authorised people and only on an as needed basis. Alongside public concerns with data confidentiality and privacy, there is a growing demand for researchers to access microdata – and this access is often limited by the fear that confidentiality protection cannot be guaranteed.
There are a number of ways an NSO can address data security.
Security measures can be implemented at the level of the data by using anonymisation techniques, removing personal details from individual records in a microdata set so that identification of individuals is highly unlikely.
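As a deliberately simplified illustration of the idea (field names are invented; real statistical disclosure control relies on established methods such as k-anonymity rather than this naive filter), the sketch below drops direct identifiers and coarsens a quasi-identifier:

```python
# Hypothetical direct identifiers to strip from each microdata record.
DIRECT_IDENTIFIERS = {"name", "national_id", "phone"}

def anonymise(record):
    # Drop fields that identify a person directly.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Coarsen exact age into a 10-year band to reduce re-identification risk.
    if "age" in out:
        lo = (out["age"] // 10) * 10
        out["age"] = f"{lo}-{lo + 9}"
    return out

record = {"name": "A. Person", "national_id": "X123", "age": 34,
          "region": "North", "income_band": "B"}
print(anonymise(record))  # → {'age': '30-39', 'region': 'North', 'income_band': 'B'}
```

Even after such treatment, combinations of remaining attributes can still identify individuals, which is why the techniques in the Privacy Preserving Techniques Handbook go considerably further.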
Security measures can be put in place at the physical level by restricting access to where the data is stored and implementing strict data controls. Many NSOs have set up Data Laboratories where on-site access to microdata is under NSO supervision, with strict audit trails to ensure no confidential data leaves the premises.
An alternative to Data Laboratories are Remote Access Facilities (RAFs). RAFs are becoming increasingly important as a way of facilitating secure access to microdata: rather than having to travel to the NSO premises, researchers can submit algorithms to be run against the microdata remotely via the internet. The job is then run by the NSO and the results returned to the researcher, while the microdata never actually leave the NSO.
Procedural measures include vetting processes to approve requests from individual researchers for access to microdata, and the signing of contractual agreements with these researchers that include penalties if security rules are breached.
The growth of linked data
The term Linked Data refers to the method of publishing structured data so that it can be interlinked through semantic queries, connecting related data that weren’t formerly related. It is defined as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web”.
In practice Linked Data builds upon standard Web technologies such as HTTP, the Resource Description Framework (RDF) and Uniform Resource Identifiers (URI), but instead of using them to generate standard web content as pages to be read by users, it extends them to connect information in a way that can be read automatically by computers.
In this way data is linked to other data, fully exploiting these connections so its value increases exponentially. Thus, data becomes discoverable from other sources and is given a context through links to textual information via glossaries, dictionaries and other vocabularies.
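The triple-based idea behind this can be illustrated without any RDF tooling. The toy sketch below (hypothetical URIs and values, in-memory store; real deployments use RDF stores and SPARQL queries) shows how a machine can follow an owl:sameAs link from a statistical area to an external source such as DBpedia:

```python
# Linked Data statements are subject-predicate-object triples whose terms
# are URIs, so machines can traverse them automatically.
triples = [
    ("http://stats.example/area/DK01",
     "http://purl.org/dc/terms/title", "Capital Region"),
    ("http://stats.example/area/DK01",
     "http://stats.example/def/population", "1846000"),
    ("http://stats.example/area/DK01",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Capital_Region_of_Denmark"),
]

def objects(subject, predicate):
    """Follow links: all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# The sameAs triple links our URI to an external one, so a crawler can hop
# to DBpedia and pull in context we never published ourselves.
print(objects("http://stats.example/area/DK01",
              "http://www.w3.org/2002/07/owl#sameAs"))
```

It is precisely this machine-followable linking, rather than human-readable pages, that lets the value of published statistics grow as other datasets connect to them.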
Uses of linked data for an NSO
There are a number of examples of NSOs publishing statistics as Linked Data so that they can be interlinked and become more useful. The process is lengthy and requires significant investment.
The importance of data integration and data linkage for an NSO
As new data sources are becoming available NSOs are facing the challenge of finding ways to integrate changeable and often unstructured data with traditional data maintained by the NSO in order to produce new and reliable outputs.
Data integration provides the potential to augment existing datasets with new data sources, and produce timelier, more disaggregated statistics at higher frequencies than traditional approaches alone.
New and emerging technologies are available to support data integration, and NSOs need to ensure that staff have the necessary new skills - in particular data scientists bringing new methods and new information technology approaches, able to design new concepts or align existing statistical concepts with the concepts in new data sources.
Examples of data integration and data linkage
There are many possible types of data integration which include using administrative sources with survey and other traditional data; new data sources (such as Big Data) with traditional data sources; geospatial data with statistical information; micro level data with data at the macro level; and validating data from official sources with data from other sources.
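As a minimal illustration of the first of these, the following sketch (field names and values invented for the example) augments survey records with an administrative register by joining on a shared person identifier:

```python
# Survey data collected by the NSO (illustrative).
survey = [
    {"person_id": "p1", "employment": "employed"},
    {"person_id": "p2", "employment": "unemployed"},
]

# Administrative register keyed by the same identifier (illustrative).
register = {"p1": {"region": "North"}, "p2": {"region": "South"}}

# Link the two sources: each survey row is enriched with register fields;
# rows with no register match are kept unchanged.
linked = [{**row, **register.get(row["person_id"], {})} for row in survey]
print(linked[0])  # → {'person_id': 'p1', 'employment': 'employed', 'region': 'North'}
```

In practice the hard problems are upstream of the join itself: agreeing on identifiers, aligning concepts between sources, and handling records that match imperfectly or not at all.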
Links to guidelines, best practices and examples:
There are a number of models for managing IT staff and resources in an NSO. These range from in-house development, outsourcing and offshoring IT work to external companies, a hybrid in-house/outsourced approach, and more recently, a collaborative approach to development. There has been a continuous cycle of changing approaches to management over the years – these depend on a number of factors: the size of the organisation; government policy; budgetary issues; general management attitude toward managed IT or IT support models; overall staff resources; and global IT trends.
Insufficient resources for IT functions will create problems for the entire organisation, so it is vital to understand which support model will run the organisation most effectively.
Each different approach has its advantages and challenges.
Outsourcing is a global practice that is often disparaged in the popular press due to associations with excessive costs and failure. The problems generally lie not with outsourcing per se, but with what is outsourced.
The concept of outsourcing is based upon the premise that no organisation is entirely self-sufficient, none has unlimited resources, and some work can be conducted by others at a lower cost. The organisational focus should therefore not be on pursuing capabilities that third parties have the skills and technology to deliver better and with greater economies of scale.
This practice is common in all industries; the machine manufacturer doesn’t have to make its own nuts and bolts and can instead buy those from a supplier.
In IT, it is not uncommon to treat entire projects as single things. For example, we will take the Fotango value chain and imagine that we had decided to outsource the development and maintenance of Fotango to a third party on the assumption that the Fotango system was a single thing and someone else could better provide it with economies of scale.
Being a consumer of these outsourced services, we’d want to ensure that we’re getting value for money and the features we require are delivered when they are expected. Hence the process of outsourcing often requires a well-defined contract for delivery based upon our desire for certainty i.e. we’re getting what we expect and paid for.
As a result both parties will tend to treat the entire activity as more linear and hence structured techniques are often applied with formal specifications and change control processes.
However, looking at the Fotango system through the lens of value chain vs evolution, we can see that whilst some components are linear (e.g. compute resource, installation of a CRM system), other components are clearly not (e.g. image manipulation system).
The more chaotic components will inevitably change due to their uncertain nature and this will incur an associated change control cost. In a review of various studies over the last decade, the most common causes of “outsourcing” failure have been cited as buyer’s unclear requirements, changing specifications and excessive costs.
However, it is the very act of treating large-scale systems as one thing that tends to set up an unfavourable situation whereby the more linear activities are treated effectively but the more chaotic activities cause excessive costs due to change. In any resulting disagreement, the third party can demonstrate that the costs were incurred due to the client changing the specification, but in reality those more chaotic activities were always going to change (see figure below).
The “excessive cost” associated with these changes should be unsurprising, as a more structured technique is being applied to a more chaotic activity. A better approach would be to subdivide the large-scale project into its components and outsource only the more linear components.
In today’s world this is in effect happening, with well-defined and common components such as compute resources “outsourced” to utility providers of compute (known as IaaS – infrastructure as a service) and, equivalently, well-defined and common systems (such as CRM) “outsourced” to utility providers through software as a service.
Those more chaotic activities offer no opportunity for efficiencies through volume operations because of their uncertain and changing nature and hence they are best treated on a more agile basis with either an in-house development team or a contract development shop used to working on a cost plus basis. Outsourcing itself is not an inherently ineffective way of treating IT, on the contrary it can be highly effective. However, it’s important to outsource those more linear components that are suitable to outsourcing.
The In-house model is the case where all software development and maintenance are carried out within the NSO by staff of the IT department. This model was quite common in the past but is much less so today as IT and other support services are wholly or partially outsourced to external companies or individual freelance IT experts.
Statistical processing is a niche market for software vendors and very few ‘off the shelf’ products exist for managing the statistics processing life cycle and consequently NSOs often have a legacy of internally developed statistical software (unlike, say Human Resources or budget planning which have a large range of commercial software solutions). This can make maintenance and evolution more complex as upgrades have to be coded rather than being provided by vendors.
An advantage of the in-house model is that the autonomy of development and the stability of teams can ensure the stability of systems, while the technical know-how of often complex processes is retained by the NSO. A common challenge with this approach is the difficulty of attracting and retaining IT staff, as salaries in the NSO are often not competitive with those in the private sector, particularly in developing countries. This can result in high turnover as staff leave once they have been trained in IT skills that are in high demand in the marketplace.
In-sourced IT can be extremely costly, as it requires investment in time, training, employee salaries and benefits, and management.
In this Outsource IT management model, the main part of development and support is carried out by external resources. External resources can be onsite and come from local suppliers, be offshore and coordinated remotely, or a mixture of the two.
Using external resources has the advantage of flexibility in that resources are used only when needed for specific tasks which can save costs. Points to take into consideration when using external resources include the loss of institutional knowledge when a consultant leaves and the lack of continuity in a project when an outsource provider changes personnel due to their own priorities. This is a particular risk for low capacity countries when external staff are brought in, often by a donor agency, to implement a system – once the work is completed and the consultants leave there is inadequate internal capacity to maintain and use the system.
One should also consider vendor incentives as it can be in the interest of external staff to extend a task as long as possible, so it is important to ensure a transparent and ethical relationship with vendors and close monitoring of projects.
This also applies to low capacity countries where donor aid can be linked to adopting a particular software package implemented by external consultants that the NSO is then unable to maintain themselves once the consultants leave. Cases exist where countries have been left with multiple tools meeting similar needs, particularly in the domain of dissemination software.
Outsourcing is typically a less expensive option, as minimal training and time are involved in IT management, and outsourcing typically converts fixed IT costs into variable costs, which allows for effective budgeting as you only pay for what you use when you need it. By outsourcing, IT administration is entrusted to experts who provide leadership and professional expertise for IT solutions.
In the hybrid In-house/Outsource model, the NSO uses both its own staff and also external staff. A hybrid IT model requires internal and external IT professionals to support the business capabilities of the enterprise. With this model, in some cases only the IT managers are NSO staff members while all development and support are carried out by external staff. Other cases can include more of a mix of both managers and IT experts.
This model is widely adopted in NSOs and is the most common approach because it reflects the realities of the IT market: staff with the latest skills are highly mobile and difficult to retain.
This approach typically allows an organisation to maintain a centralised approach to IT governance, while using experts to deal with the functionality that is beyond the capabilities of the organisation’s IT staff.
The collaborative model of NSOs working together on IT projects has grown considerably in recent years. Collaborations can take the form of several organisations working together to develop software that they will then all use, or of a single organisation developing a software tool that is then adopted and, possibly, further developed by other NSOs.
In the past the vast majority of statistical software used by an NSO would have been developed within an NSO for use only by that NSO - today the trend is for there to be a mix of the older legacy software and common shareable tools.
The collaborative approach has many obvious advantages for an NSO. These include sharing the software development burden, as well as sharing experiences, knowledge and best practices through multilateral collaboration, which helps build collective capacity. It also reduces the risk of new developments through additional scrutiny and testing according to open-source principles, with all members benefiting from each other's ideas and methods.
Collaborating on projects does of course have its own challenges for an NSO – particularly in determining how to balance development priorities between the different partner organisations and the increased complexity of project management in the context of multiple partner collaboration. To achieve this model, partnership management capabilities will need to be developed in an NSO (Ref HR chapter).
Links to guidelines, best practices and examples:
Standards are enablers of modernisation - by using common standards, statistical systems can be modernised and “industrialised”, allowing internationally comparable statistics to be produced more efficiently. Standards facilitate the sharing of data and technology in the development of internationally shared solutions, which generates economies of scale. A number of major statistical standards are in use today, while others are emerging and maturing. But in a fast-changing and interconnected world it is not enough to rely only on statistical standards. Other useful categories are official standards (for example ISO/IEC 11179), industry/domain standards and taxonomies (for example XBRL), open standards (such as those maintained by the W3C) and even widely used de facto standards (such as the JSON and PDF formats). While the focus in this document is on statistical standards, it is strongly recommended to consider other types of standards and define their role in statistical organisations.
In the next sections we will describe the most relevant statistical industry standards, their purpose, evolution stage and their potential for modernisation activities.
Figure xx -
Refer to chapter 5 “The National Statistical Office”
The Generic Activity Model for Statistical Organisations (GAMSO) is the standard covering activities at the highest level of the statistics organisation. It describes and defines the activities that take place within a typical organisation that produces official statistics. GAMSO was launched in 2015 and extends and complements the Generic Statistical Business Process Model (GSBPM) by adding additional activities beyond business processes that are needed to support statistical production. It is part of the common vocabulary of collaboration.
The GAMSO standard covers four broad areas of activity within an NSO: production; strategy and leadership; capability management and corporate support. It provides a common vocabulary for these activities and a framework to support international collaboration activities, particularly in the field of modernisation and can be used as a basis for resource planning within an NSO. GAMSO can contribute to the development and implementation of Enterprise Architectures, including components such as capability architectures, and also support risk management systems.
GAMSO can be used as a basis for measuring the costs of producing official statistics in a standardised way allowing comparison between NSOs, and also as a tool for resource planning. It can help assess the readiness of organisations to implement different aspects of modernisation, in the context of a proposed “Modernisation Maturity Model” allowing NSOs to evaluate their levels of maturity against a standard framework, and to help them determine the priorities for the next steps based on a roadmap.
The GAMSO activities specifically concerned with IT management cover coordination and management of information and technology resources and solutions. They include the management of the physical security of data and shared infrastructures:
Links to guidelines, best practices and examples:
The Generic Statistical Business Process Model (GSBPM) is a statistical model that provides a standard terminology for describing the different steps involved in the production of official statistics. GSBPM can be considered the "Production" part of GAMSO. Since its launch in 2009 it has become widely adopted in NSOs and other statistical organisations. GSBPM allows an NSO to define, describe and map statistical processes in a coherent way, thereby making it easier to share expertise. GSBPM is part of a wider trend towards a process-oriented approach rather than one focused on a particular subject-matter topic. It is applicable to all activities undertaken by statistical organisations that lead to a statistical output, and it accommodates data sources such as administrative data, registers, censuses and mixed sources.
GSBPM standardises process terminology. This allows an NSO to compare and benchmark processes within and between organisations. It can help identify synergies between processes in order to make informed decisions on systems architectures and organisation of resources. GSBPM is not a linear model – instead it should be seen as a matrix through which there are many possible paths, including iterative loops within and between processes and sub-processes.
GSBPM main processes and sub-processes
GSBPM covers processes for specifying needs, design, build, collection, processing, analysis, dissemination and evaluation. Within each process there are a number of sub-processes.
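The phase/sub-process hierarchy can be made concrete as a plain data structure. In the sketch below the eight phase names follow GSBPM, but the sub-process lists are abridged illustrative fragments, not the complete standard.

```python
# Abridged encoding of the GSBPM hierarchy as a dictionary.
# Phase names follow GSBPM; the sub-process lists are illustrative
# fragments only, not the full set defined by the standard.

GSBPM_PHASES = {
    "1 Specify Needs": ["1.1 Identify needs", "1.2 Consult and confirm needs"],
    "2 Design": ["2.1 Design outputs", "2.2 Design variable descriptions"],
    "3 Build": ["3.1 Reuse or build collection instruments"],
    "4 Collect": ["4.1 Create frame and select sample", "4.3 Run collection"],
    "5 Process": ["5.3 Review and validate", "5.4 Edit and impute"],
    "6 Analyse": ["6.1 Prepare draft outputs"],
    "7 Disseminate": ["7.2 Produce dissemination products"],
    "8 Evaluate": ["8.1 Gather evaluation inputs"],
}

def find_phase(sub_process_code):
    """Return the phase containing a sub-process with the given code prefix."""
    for phase, subs in GSBPM_PHASES.items():
        if any(s.startswith(sub_process_code) for s in subs):
            return phase
    return None
```

A mapping like this is a starting point for tagging internal systems and documentation against the model, which is how many organisations begin a GSBPM self-assessment.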
Using GSBPM in a statistical organisation
GSBPM contributes to a common vocabulary among statistical organisations - having a standard terminology makes it much easier to communicate on collaboration projects, and its methodology allows for the reuse of concepts and definitions throughout the life cycle of statistical projects. It can be used as a reference in planning, mapping, documentation and self-assessment of capacity needs.
GSBPM plays an important role in the modernising of the statistical system, especially concerning the statistical project cycle and can accommodate emerging issues in data collection like the introduction of mobile data collection and Big Data.
Links to guidelines, best practices and examples:
The Generic Statistical Information Model (GSIM) standard was launched in 2012 and describes the information objects and flows within statistical business processes. GSIM is complementary to GSBPM, and its framework enables descriptions of the definition, management and use of data and metadata throughout the statistical information process.
GSIM information objects are grouped into four broad categories: Business; Production; Structures; and Concepts. It provides a set of standardized information objects, inputs and outputs in the design and production of statistics, regardless of subject matter. By using GSIM, NSOs are able to analyse how their business could be more efficiently organised.
As with the other standards, GSIM helps improve communication by providing a common vocabulary for conversations between different business and IT roles, between different business subject matter domains and between NSOs at national and international levels. This common vocabulary contributes towards the creation of an environment for reuse and sharing of methods, components and processes and the development of common tools. GSIM also allows NSOs to understand and map common statistical information and processes and the roles and relationships between other standards such as SDMX and DDI.
Links to guidelines, best practices and examples:
The Statistical Data and Metadata Exchange (SDMX) standard for statistical data and metadata access and exchange was established in 2000 under the sponsorship of seven international organisations (IMF, World Bank, UNSD, Eurostat, BIS, ECB & OECD).
The importance of a standard for statistical data exchange is well known and cannot be overstated. The labour-intensive work of mapping collected and disseminated data to different formats is a problem well known to NSOs, and in the context of the timely transmission of SDG indicators it has become even more vital.
SDMX is a standard for both content and technology that standardises the content and structure of statistical data and metadata. SDMX facilitates data exchange between NSOs and international organisations - and also within a national statistical system. It aims to reduce the reporting burden for data providers and to provide faster and more reliable data and metadata sharing. Using SDMX facilitates the standardisation of IT applications and infrastructure and can improve the harmonisation of statistical business processes. Much reusable software is available to implement SDMX in an NSO, which can reduce development and maintenance costs through shared technology and know-how.
SDMX supports data quality by incorporating validation rules into its data structures, as well as through the many tools made freely available with the standard as part of its open-source approach. SDMX is an ISO standard and has been adopted by the UNSC as the preferred standard for data exchange.
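To make the machine-to-machine exchange concrete, the sketch below assembles an SDMX 2.1 RESTful data query from its standard parts (a dataflow reference, a dot-separated series key, and query parameters). The base URL, agency, dataflow and key shown are invented for illustration; each real SDMX web service defines its own flows and dimensions.

```python
from urllib.parse import urlencode

def sdmx_data_url(base_url, flow_ref, key="all", **params):
    """Build an SDMX 2.1 REST data query: {base}/data/{flowRef}/{key}?{params}.

    `flow_ref` is typically AGENCY,FLOW_ID,VERSION and `key` is a
    dot-separated series key in which empty positions act as wildcards.
    """
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url

# Hypothetical service, dataflow and key, shown only to illustrate the shape.
url = sdmx_data_url(
    "https://example-nso.org/sdmx",
    "NSO,DF_CPI,1.0",
    "M.ALL.INDEX",
    startPeriod="2020-01",
    format="csv",
)
```

Because the query structure is defined by the SDMX REST specification rather than by any single provider, the same helper works against any compliant endpoint.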
The SDMX sponsors continue to improve the SDMX standard and have identified four main priority areas for their "roadmap 2020": a) strengthening the implementation of SDMX, b) making data usage easier, especially for policy use, c) modernising statistical processes, improving the SDMX standard and the IT infrastructure and d) improving communication
Consequently, a new version of SDMX (version 3) is due to be finalised in the near future. The new version will be backward compatible with the current version 2.1 in order to allow easy upgrading of existing SDMX artefacts. It will also address a number of semantic and technical issues, including better metadata integration.
A further addition to SDMX is the Validation and Transformation Language (VTL), a standard language for defining validation and transformation rules (a set of operators together with their syntax and semantics) for any kind of statistical data. VTL builds on the generic framework for defining mathematical expressions in the SDMX information model, but the intention is to provide a language that is also usable with other standards for expressing logical validation rules and transformations on data.
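VTL has its own syntax, but the kind of rule it expresses can be sketched in ordinary code. The hypothetical check below flags records whose reported total does not equal the sum of its components - a typical logical validation rule, shown here in Python rather than actual VTL.

```python
def validate_total(records, total_field, component_fields, tolerance=0):
    """Flag records where total != sum(components): a VTL-style logical
    validation rule, sketched in Python (this is not VTL syntax)."""
    failures = []
    for i, rec in enumerate(records):
        expected = sum(rec[f] for f in component_fields)
        if abs(rec[total_field] - expected) > tolerance:
            failures.append(i)
    return failures

# Hypothetical trade records: total should equal exports + imports.
records = [
    {"total": 150, "exports": 90, "imports": 60},
    {"total": 100, "exports": 70, "imports": 20},  # inconsistent record
]
bad = validate_total(records, "total", ["exports", "imports"])
```

Expressing such rules in a standard language like VTL, rather than burying them in production code, is what makes them shareable between organisations.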
The present SDMX software products, packages and components are mostly designed for traditional data centres and server-based environments. The transition to cloud-native, serverless environments will require some additional work to adapt present SDMX software to this new environment.
SDMX and SDGs
A specific SDG indicator data structure (Data Structure Definition) has been defined which will be used to report and disseminate the indicators at national and international levels. SDMX compliance has been built into a number of internationally used dissemination platforms - such as the African Information Highway, the IMF web service and the OECD.Stat platform - to ensure efficient transmission of SDG indicator data and metadata.
Links to guidelines, best practices and examples:
The Data Documentation Initiative (DDI) is an international standard for describing metadata from surveys, questionnaires, statistical data files, and social sciences study-level information. DDI focuses on microdata and tabulation of aggregates/indicators.
The DDI specification provides a format for content, exchange, and preservation of questionnaire and data file information. It fills a need related to the challenge of storing and distributing social science metadata, creating an international standard for the design of metadata about a dataset.
DDI is a membership-based alliance of NSOs, International organisations, academia and research bodies.
In many NSOs the exact processing in the production of aggregate data products is not well documented. DDI can be used to describe processing of data in a detailed way to document each step of a process. In this way DDI can be used not just as documentation but can help use metadata to automate throughout the entire process, thus creating “metadata-driven” systems. In this way DDI can also act as the institutional memory of an NSO.
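The "metadata-driven" idea above can be sketched briefly: processing logic reads the metadata and acts on it, so changing the metadata changes the behaviour without changing the code. The XML below is a greatly simplified, DDI-inspired fragment invented for illustration - it is not the actual DDI schema.

```python
import xml.etree.ElementTree as ET

# Greatly simplified, DDI-inspired variable metadata (NOT the real DDI
# schema), used to drive processing: the metadata, not the code, says
# how each raw field should be typed.
METADATA_XML = """
<variables>
  <variable name="age" type="int"/>
  <variable name="income" type="float"/>
  <variable name="region" type="str"/>
</variables>
"""

CASTERS = {"int": int, "float": float, "str": str}

def load_schema(xml_text):
    """Read variable names and types from the metadata document."""
    root = ET.fromstring(xml_text)
    return {v.get("name"): CASTERS[v.get("type")] for v in root.iter("variable")}

def apply_schema(raw_record, schema):
    """Cast raw string values according to the metadata-defined types."""
    return {name: cast(raw_record[name]) for name, cast in schema.items()}

schema = load_schema(METADATA_XML)
record = apply_schema({"age": "42", "income": "31250.5", "region": "North"}, schema)
```

Adding a variable then means editing the metadata document, not the processing code, which is the essence of a metadata-driven system.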
DDI is another standard that promotes greater process efficiency in the “industrialised” production of statistics. DDI can be used to facilitate data collection and the management of microdata in support of the GSBPM (with SDMX used for the dissemination of aggregates). DDI can also be used to facilitate microdata access, as well as for register data.
Links to guidelines, best practices and examples:
Enterprise architecture (EA) is a conceptual blueprint that defines the structure and operation of an organisation; its role is to determine how the organisation can most effectively achieve its current and future objectives.
EA maps the goals and priorities of an organisation to information technology that is fit to support those goals, by managing information and delivering it accurately and on time, where and when it is needed, in a way that is cost effective for the business. It seeks to guide the planning and design of an organisation's IT capabilities in order to steer the business, information, process and technology changes necessary to meet the desired organisational objectives.
EA helps enforce discipline and standardisation of business processes, and enables process consolidation, reuse and integration.
EA is basically designed for the whole system of systems across the "enterprise" - and like any design it has to start from the business requirements and specify the best fit IT solutions. The IT part of EA is split into systems and data and finally infrastructure such as servers and networks.
There is an emerging trend of organisations using storage repositories that hold vast amounts of raw data in native format (‘Data Lakes’) from disparate sources. These data lakes are used to respond to business questions by linking and querying relevant data and thus require a new type of EA to manage such linked datasets.
Benefits to the NSO of a well-designed EA include better business performance against business goals and reduced IT investment risk. EA also facilitates a more agile enterprise by making the IT architecture flexible enough to support transformed business models. Agile architecture practices support the evolution of the design and architecture of a system while new system capabilities are implemented, allowing the architecture to evolve incrementally over time while supporting the needs of current users.
Links to guidelines, best practices and examples:
The Common Statistical Production Architecture (CSPA) is a framework for developing statistical processing components that are reusable across projects, platforms and organisations - it is often referred to as ‘plug and play’. CSPA has been developed in recent years by the international statistical community under the auspices of the High-Level Group for the Modernisation of Official Statistics (HLG-MOS).
CSPA is an enabler of collaboration and modernisation and has potentially enormous advantages for NSOs of all capacity levels. It aims to align the enterprise architectures of different organisations to create an “industry architecture” for the whole “official statistics industry”. CSPA provides guidance for building software services that can be shared and reused within and across statistical organisations and enables international collaboration initiatives for the development of common infrastructures and services. In addition, it encourages alignment with other statistical industry standards such as the Generic Statistical Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM).
CSPA components focus on statistical production processes as defined by GSBPM and are based on the Service Oriented Architecture (SOA) approach wherein the components (Services) are self-contained and can be reused by a number of business processes either within or across statistical organisations without imposing a technology environment in terms of specific software platforms.
Advantages of CSPA for an NSO
There are great potential advantages for NSOs in using CSPA. Its goal is eventually to have components covering all processes and sub-processes covered by GSBPM. This is important for countries of all capacity levels but is especially useful for countries of low capacity with weak infrastructure as it would in theory allow any country to take advantage of developments by the international statistics community and assemble a complete statistical information system component by component according to their specific needs.
See also section 1.5.6 for the CSPA software inventory.
CSPA is an emerging standard for sharing statistical components that has been introduced in section 1.xx.xx.
For an NSO there are the obvious advantages of saving time and money by re-using or adapting an existing technical solution developed by another organisation to meet their own processing requirements.
Using standardised components will promote and enable the use of statistical standards, both in structure and content thus facilitating the exchange of data and metadata between organisations. It also serves to further strengthen ties and relationships among the statistical community.
Software available via CSPA inventory
Links to guidelines, best practices and examples:
Statistical organisations have to deal with many different external data sources, ranging from (traditionally) primary data collection, via secondary data collection, to (more recently) Big Data. Each of these data sources has its own set of characteristics in terms of relationships, technical details and semantic content. At the same time, demand is changing: besides creating outputs as "end products", statistical organisations increasingly create outputs together with other institutes.
In 2017 and 2018, the High-Level Group for the Modernisation of Official Statistics recognised that official statistics organisations were challenged by the capacities needed to incorporate new data sources in their statistical production processes. In response, the HLG Data Architecture project developed a framework based on principles, capabilities, building blocks and guidelines, and tested it on use cases involving traditional and new data sources.
Capabilities are abilities, typically expressed in general and high-level terms, that an organisation needs or possesses; they typically require a combination of organisation, people, processes and technology. Building blocks represent (potentially reusable) components of business, IT or architectural capability that can be combined with other building blocks to deliver architectures and solutions. Guidelines provide a maturity model, recommendations and templates that help statistical organisations establish their data architecture and assess gaps in their data capabilities.
The purpose of the Common Statistical Data Architecture (CSDA) as a reference architecture is to act as a template for statistical organisations in the development of their own enterprise data architectures. In turn, this will guide solution architects and builders in the development of systems that will support users in doing their jobs (that is, the production of statistical products).
The CSDA shows organisations how to organise and structure their processes and systems for efficient and effective management of data and metadata, from external sources, through internal storage and processing, up to the dissemination of the statistical end products. In particular, to help organisations modernise, it shows how to deal with newer types of data sources such as Big Data, scanner data and web scraping.
The CSDA supports statistical organisations in the design, integration, production and dissemination of official statistics based on both traditional and new types of data sources.
Links to guidelines, best practices and examples:
To do - Importance of interoperability
Foundational interoperability enables one information system to exchange data with another. The receiving system does not need to interpret the data; it is instantly available for use.
At an intermediate level, structural interoperability defines the format of the data exchange. This concerns the standards that govern the format of messages sent from one system to another, so that the operational purpose of the information is evident and passes through without alteration. This is information at the level of data fields, as in a database.
Semantic interoperability is the highest level of connection: two or more systems, or parts of systems, can exchange and readily use information. Here, the structure of the data exchange and the way the data itself is codified let data providers share data even when using completely different software solutions from different vendors.
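A minimal illustration of the semantic level: two systems that code the same concept differently can still exchange usable data if both map onto a shared code list. The code lists and mappings below are invented for illustration.

```python
# Two hypothetical systems code 'employment status' differently.
# Semantic interoperability is sketched as mapping both onto a shared
# code list, so exchanged data remains directly comparable.

SHARED_CODELIST = {"EMP": "Employed", "UNE": "Unemployed", "INA": "Inactive"}

SYSTEM_A_TO_SHARED = {"1": "EMP", "2": "UNE", "3": "INA"}
SYSTEM_B_TO_SHARED = {"employed": "EMP", "jobless": "UNE", "inactive": "INA"}

def to_shared(value, mapping):
    """Translate a system-local code into the shared code list."""
    code = mapping.get(value)
    if code not in SHARED_CODELIST:
        raise ValueError(f"unmappable value: {value!r}")
    return code

# Records from both systems now compare meaningfully.
a_record = to_shared("2", SYSTEM_A_TO_SHARED)
b_record = to_shared("jobless", SYSTEM_B_TO_SHARED)
```

In practice the shared code list would itself be a governed artefact (for example an SDMX codelist) rather than a hard-coded dictionary.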
Why Is Interoperability Important?
It is useful to think of interoperability as a philosophy rather than just a “standards-based interaction between computer systems”. On the technical side, interoperability reduces the time it takes to achieve useful data exchange between data providers and data consumers.
The European Interoperability Framework (EIF) gives specific guidance on how to set up interoperable digital public services. It offers public administrations 47 concrete recommendations on how to improve governance of their interoperability activities, establish cross-organisational relationships, streamline processes supporting end-to-end digital services, and ensure that both existing and new legislation do not compromise interoperability efforts.
EIF content and structure includes a set of principles intended to establish general behaviours on interoperability, a layered interoperability model and a conceptual model for interoperable public services. The model is aligned with the interoperability principles and promotes the idea of ‘interoperability by design’ as a standard approach for the design and operation of European public services.
The EIF is therefore applicable to any inter-organisational collaboration, including statistical data collaboratives, networks and platforms. The framework provides an overview of three aspects of interoperability that need to be considered for frictionless service delivery and data exchange:
Links to guidelines, best practices and examples:
A wide range of statistical processing and analysis software has been used within NSOs historically, and options continue to grow, with significant growth in open-source solutions as well as important advances in commercial offerings. NSOs have created rich statistical functions using these desktop and server-based solutions to perform common statistical processing tasks such as editing and imputation, auto-coding, estimation, tabulation, and record linkage. The NSO community has shared these functions over the years, first as licensed functions and more recently as “free licence” or open-source functions.
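As a small example of the edit-and-imputation functions mentioned above, the sketch below applies simple mean imputation to a numeric variable using only the standard library. Real NSO production systems use far more sophisticated donor- or model-based methods; this only illustrates the shape of such a shareable function.

```python
# Simple mean imputation for a numeric variable. Production systems
# use more sophisticated donor- or model-based imputation; this sketch
# only illustrates the shape of a reusable imputation function.

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical income variable with two missing values.
incomes = [30000, None, 45000, None, 60000]
imputed = impute_mean(incomes)
```

Functions of this kind, once generalised and documented, are exactly the sort of components NSOs have historically shared with each other.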
The growth in data scientist and data engineering roles within NSO’s has seen a shift to more powerful processing platforms as well as increased adoption of powerful open-source solutions for artificial intelligence and machine learning. These emerging communities are accessing extensive open-source libraries, modules, and packages via distribution mechanisms such as CRAN (the Comprehensive R Archive Network), Anaconda, and others. The growing number of data science communities and platforms is driving growth in the number of functions and solutions available in the public domain.
Open-source processing, analysis, and data science tools currently in use include
Open-source tools are frequently available with an enterprise option that provides support services via a commercial company.
On the commercial front, many NSO’s continue to use solutions including
There are a number of factors to consider when choosing a future direction in processing and analysis tools. The current cohort of data scientists and statisticians arriving from universities and institutions is increasingly trained in open-source tools based on R and Python, meaning there is a demographic shift in the tools used. With the retirement of more senior staff, NSO’s will see a re-orientation in toolsets and technology.
There is important growth in NSO engagement with the external research community beyond traditional statistical domains, with access to fresh or alternative sources of processing and analytic components and toolsets. This is driving a shift to the supporting tools and technologies these external components are based on, with a strong bias to R and Python.
Economic factors continue to drive change in NSO’s. There is a long-standing desire to increase the level of sharing of components and functions amongst NSO’s through activities with HLG and others. Access to proven functions via reuse is an important alternative to custom-developed functions, and the open-source community is providing new sources of these functions, especially in the data science area. The increasing use of open-source solutions, and increasing levels of experience with them, is prompting important cost / benefit discussions within NSO’s as they look for efficient and effective operations. This is particularly important where processing and analysis use focuses on core features of commercial tools that have equivalents in open-source tools, with little to no use of the advanced features in the commercial toolsets. Paying a premium for access to equivalent features is an indication that a robust review of benefit is required.
These considerations are especially true when looking for high-performance processing. Highly engineered commercial solutions are more expensive and complex when compared with solutions based on Spark and Hadoop, which are designed for operation in utility infrastructure. Encapsulated or containerized environments for analytic methods are a recent addition to the landscape, with Algorithmia providing fully encapsulated and instrumented auto-scalable methods and Databricks providing encapsulated and managed Spark processing.
As can be seen in the following Wardley map, core processing and analysis tools exist in the “commodity / utility” space - NSO IT strategies should look to profit from this.
Insert tools Wardley map here
In the area of statistical functions for processing and analysis, most NSO’s are using legacy solutions or solutions acquired from other NSO’s. Many of these are large, monolithic functions that perform a number of different services and are candidates for refactoring, partitioning, or potential replacement so as to benefit from cloud, microservice, and serverless approaches to computation. From a Wardley map perspective these functions are in the Custom area, with on-going resource commitments to maintain them. NSO’s creating new environments should look to select open-source functions at granularities aligned to their architectural strategies - most of these functions use algorithms that have not been modified for a considerable time. A Wardley map illustrating this follows.
Insert methods / functions Wardley Map here
Links to guidelines, best practices and examples:
Case Study - UN Global Platform
Some words on the Methods service, Data service, and the use of Algorithmia go here
Dissemination of data and outputs is an important part of NSO operations, and one that has seen significant transformation since the last release of the Handbook. While paper and other alternative outputs continue to be published, the mainstream now uses web-based dissemination techniques for content and data distribution. Together these services are an important part of the public face of NSO’s, requiring careful attention to brand, quality, accessibility, and related concerns. Dissemination is also a key component of an NSO’s strategy and market positioning with its stakeholders. Technology-driven evolution of stakeholder needs, as well as opportunities for new means of engagement and production, are a priority consideration in a digital or IT strategy.
Current approaches utilize web hosting and content creation platforms to create static content and manage it through an “authoring to publishing” value chain. Visual graphics are published as static charts and graphs, but may be dynamically modifiable via sliders, parameters, or other UI elements to either drill down into lower level detail, modify graphic types, or change parameters. An important enhancement integrates geography-based interaction allowing exploration and browsing of outputs via map-related controls and platforms. Public use datasets are available through a variety of means, allowing users to acquire extracts of data for further use in their own environments. Users typically access these datasets via a download facility, receiving a CSV or Excel (or other) formatted data set for further use - associated metadata that describes record formats is a critical component for use.
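The “download plus metadata” pattern described above can be sketched as follows. The metadata dictionary and the CSV content are hypothetical stand-ins for a real codebook and a real downloaded extract; in practice the record layout would come from the NSO’s published metadata.

```python
# Sketch of the "CSV plus associated metadata" pattern: the download is
# a bare CSV, and a separate metadata record (here a hypothetical dict;
# in practice a codebook or formal metadata description) is needed to
# interpret the columns correctly.
import csv
import io

metadata = {  # hypothetical record layout accompanying the download
    "region": {"type": str, "label": "Region code"},
    "pop": {"type": int, "label": "Population estimate"},
}

raw = "region,pop\nA01,12500\nA02,9800\n"  # stands in for a downloaded file

rows = []
for row in csv.DictReader(io.StringIO(raw)):
    # apply the declared type of each column from the metadata
    rows.append({k: metadata[k]["type"](v) for k, v in row.items()})

print(rows)  # [{'region': 'A01', 'pop': 12500}, {'region': 'A02', 'pop': 9800}]
```

Without the metadata, every value in the CSV is just an untyped string, which is exactly the limitation discussed below.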
A variety of social media channels are currently in use, which may include Facebook, syndication, Twitter, LinkedIn, (chinese platforms), reddit, and others. Typically these are event-driven approaches, highlighting new releases and reports, indices, thematic information linked to current events, and more.
Since NSO’s are an important source of high-quality objective information, there are close relationships with media organizations through a variety of print, television, and other media. NSO’s may create more advanced access channels that provide near-broadcast-ready content as a way to remove friction between NSO’s and media agencies. A good example is CBS (Netherlands) with its approach that uses on-site production facilities coupled with media-cycle timelines (as an alternative to the common practice of having a single release time - such as 8:30 am - for publishing results). Publishing information with a timeline that reflects stakeholder needs (such as media broadcast times) can be an important way to increase reach, engagement, and relevance.
An important component of information publishing is the delivery of all relevant metadata, standards, and other supporting information that are essential to ensuring interpretability and use of outputs by stakeholders and users. Stakeholders access metadata via web-page content or via UI elements supporting discovery activities in the metadata.
There are a number of challenges with current approaches. The use of data usually requires some degree of data post-processing on the part of users to integrate data with other external data for the purposes of creating reports, digital dashboards, and further analysis. For example, users may download data sets, put them into a Postgres database, rearrange the data, and use a tool such as Tableau to create digital dashboards. This requires extra time and resources from the user, which may represent needless friction. A lack of standardization based on generally available open standards is part of the issue - the use of CSV for download formats is a very low-level generic structural approach to transfer that conveys no meaningful higher level structural or semantic value.
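The post-processing friction described above can be sketched in a few lines. Here sqlite3 (Python's built-in database) stands in for the Postgres step in the example, and the data are invented; the point is the extra load-and-reshape work the user must do before any analysis can start.

```python
# Sketch of the user-side post-processing step: load a downloaded CSV
# extract into a local database for further analysis. sqlite3 stands in
# for Postgres; the indicator values are invented.
import csv
import io
import sqlite3

raw = "year,value\n2017,101.2\n2018,103.5\n2019,105.1\n"  # downloaded extract

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicator (year INTEGER, value REAL)")
reader = csv.reader(io.StringIO(raw))
next(reader)  # skip the header row
conn.executemany("INSERT INTO indicator VALUES (?, ?)", reader)

# Only now can the user rearrange and query the data locally.
(avg,) = conn.execute("SELECT AVG(value) FROM indicator").fetchone()
print(round(avg, 2))  # 103.27
```

Every user repeating this load step independently is the “needless friction” that richer, standards-based access formats aim to remove.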
Fixed viewing approaches based on static or “fixed dynamic” techniques limit the options available for users. If they wish to create different visualizations this lack of flexibility forces them to extract data, ingest it into their own BI or visualization environments, and create their own visualizations. Updates of data may require rework. This represents friction for those wishing to create dynamic dashboards to support ministerial or policy analysis reporting.
API’s and interoperability frameworks are increasingly becoming alternative access methods of choice for stakeholders and users. The ability to make machine-to-machine connections is an alternative means of addressing the data download component of the aforementioned example of digital dashboards. Open and transparent government initiatives (opendata) are increasingly focusing on providing frictionless access to publicly available data via programmatic interfaces (as “web” services) and NSO’s need to ensure that they are well established in these spaces.
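A machine-to-machine exchange of the kind described above can be sketched with a JSON payload. The endpoint name and payload shape here are hypothetical (loosely inspired by SDMX-JSON-style responses), embedded as a literal so the sketch is self-contained rather than making a live network call.

```python
# Sketch of machine-to-machine access: instead of downloading a CSV, a
# client consumes a structured JSON payload from a dissemination API.
# The dataset name and payload structure are hypothetical.
import json

payload = json.loads("""
{
  "dataset": "unemployment_rate",
  "observations": [
    {"period": "2019-Q1", "value": 5.8},
    {"period": "2019-Q2", "value": 5.6}
  ]
}
""")

# A dashboard or analysis tool can bind directly to the structured
# response, with no manual download or reshaping step.
series = {obs["period"]: obs["value"] for obs in payload["observations"]}
print(series)  # {'2019-Q1': 5.8, '2019-Q2': 5.6}
```

Because the structure and semantics travel with the data, the digital-dashboard use case from the previous paragraph needs no manual intermediate database step.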
Most if not all web-based dissemination platforms make use of open-source or related search capabilities to enable user-specific retrieval of information and data. Currently most techniques use a combination of indexing and relevancy ranking to provide a (potentially long) list of results to users, who must then browse to find the desired result. Advanced techniques are available that provide enhanced metadata searching and domain-specific ontology (vocabulary and knowledge) techniques and NSO’s should investigate how they can take advantage of emerging capabilities (for example, see http://smartcity.linkeddata.es, https://spec.edmcouncil.org/static/ontology) for areas such as smart cities and financial and banking domains. “Knowledge layers” are embedded in common search tools such as Google and Bing, which may provide a source of inspiration and “knowledge technology as a service” capabilities.
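The basic “indexing plus relevancy ranking” approach can be illustrated with a toy scorer: documents are scored by how many query terms they contain and returned best-first. Real platforms use far richer scoring (TF-IDF, BM25, knowledge layers); the documents here are invented.

```python
# Toy illustration of indexing plus relevancy ranking: score each
# document by the number of query terms it contains, return best-first.

def rank(query, documents):
    """Return document ids ordered by descending match score."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        score = len(terms & set(text.lower().split()))
        if score:
            scored.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

docs = {
    "rel1": "quarterly unemployment rate release",
    "rel2": "annual population estimates",
    "rel3": "unemployment statistics by region",
}
print(rank("quarterly unemployment", docs))  # ['rel1', 'rel3']
```

The weakness is visible even in the toy: matching is purely lexical, which is why ontology- and knowledge-based enhancements matter for domain-specific retrieval.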
In the future we expect an evolution of dissemination technologies. Enhanced user experiences and a desire for self-serve reporting and analysis may see dissemination platforms incorporate rich open-source or commercial BI solutions (e.g. Tableau, PowerBI) that support flexible analysis and visualization hosted by the dissemination platforms. The importance of geospatial-based activities is reflected in platform product evolution by companies such as ESRI, and by incorporation of geospatial reference features in open-source publishing (e.g. Drupal and CKAN) and analysis platforms.
We will see a significant increase in the use of interoperable API’s based on open-standards, allowing external platforms to access information, data, and metadata via their own solutions without human intervention. This also provides an easy path to increase NSO relevance by ensuring that they are “built-in” to the broader set of public and private solutions. Users will range from open-source and open-data communities, co-publishing, through to demanding real-time trading information scrapers. A well stocked API store also supports the creation of myriad “apps” on popular platforms such as Android and iOS without NSO’s being responsible for the creation and support.
NSO’s should continue to explore opportunities to streamline and enhance tools and techniques to create powerful narratives based on output data - effective and efficient creation of stories, info-graphics, and other multi-media friendly outputs are essential to ensure relevance and engagement. These are often seen as separate and disjoint from mainstream statistical production, leading to gaps, friction, and ad hoc approaches.
An important technology-driven advancement occurring now is the adoption of “chatbots”, conversational voice assistants, and other manifestations of AI-driven features, which are an important part of the evolution of dissemination and external-facing user experiences. Google Assistant, Amazon Alexa, Microsoft Cortana, and Apple Siri are all examples of voice-driven capabilities that marry voice and natural language processing and connect with platform API’s to deliver enhanced, frictionless services.
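The chatbot pattern described above reduces to: match a natural-language request to an intent, then serve that intent via a platform API. The intents, keywords, and response text below are invented; production assistants use statistical NLP rather than keyword matching.

```python
# Toy sketch of the chatbot pattern: map an utterance to an intent,
# then answer the intent. Intents and keywords are invented.

INTENTS = {
    "population": {"population", "inhabitants", "people"},
    "unemployment": {"unemployment", "jobless", "jobs"},
}

def detect_intent(utterance):
    """Return the first intent whose keywords overlap the utterance."""
    words = set(utterance.lower().split())
    for intent, keywords in INTENTS.items():
        if words & keywords:
            return intent
    return None

def answer(utterance):
    intent = detect_intent(utterance)
    if intent is None:
        return "Sorry, I did not understand the question."
    # A real assistant would call the dissemination API here for the
    # latest figure in the matched statistical domain.
    return "Fetching the latest " + intent + " figures..."

print(answer("What is the unemployment situation?"))
```

The value for an NSO lies in the last step: once a rich API layer exists, voice assistants become one more low-friction client of it.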
It is important to address the current and emerging means by which NSO’s are providing access to “disaggregated data”, either public use microdata files (PUMF’s), or controlled access to “semi-sensitive” (de-identified) data. The latter typically makes use of physically controlled and operated spaces to which approved researchers go to perform analysis with select data sets. In many NSO’s this is seen as a separate function from dissemination functions and activities, with alternative physical and technology infrastructures. At a business level NSO’s should reflect on whether this is still a useful approach, given the move to self-serve and more interactive dissemination activities. An alternative view sees a “data access continuum”, with standard public-use published data exploration and visualization, advanced access to analysis and visualization of public use microdata, and finally a private engagement with strictly controlled access to data spaces and workbenches using virtual data lab techniques. The ability to execute on this vision (either as an integrated experience or through separate pathways) is closely tied to privacy enhancing techniques to address security, privacy, and confidentiality concerns and requirements. The UN Global Platform has released a handbook on Privacy Enhancing Techniques addressing input, computational, and output privacy via a range of techniques and emerging technologies (see https://marketplace.officialstatistics.org/privacy-preserving-techniques-handbook for further details).
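One small input-privacy technique relevant to the data access continuum is pseudonymization: replacing direct identifiers with keyed hashes so records can still be linked in analysis without exposing the raw identifier. The key and record below are invented, and this sketch illustrates only one narrow technique; the Privacy Enhancing Techniques handbook covers far stronger methods.

```python
# Minimal pseudonymization sketch: a keyed (HMAC) hash replaces a
# direct identifier. Stable across records (so linkage still works),
# but not reversible without the secret key. Key and data are invented.
import hashlib
import hmac

SECRET_KEY = b"held-by-the-nso-only"  # hypothetical key, never released

def pseudonymize(identifier):
    """Keyed SHA-256 hash of an identifier."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"person_id": "AB123456", "age_group": "30-39"}
safe = {**record, "person_id": pseudonymize(record["person_id"])}
print(safe["age_group"], safe["person_id"][:12])
```

Note that pseudonymization alone does not make data safe for release; it is one layer in a broader disclosure-control regime.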
To summarize, future directions in dissemination continue to lead to multi-media integration with powerful narrative and infographics capabilities, self-serve user experiences that allow users to create custom views and visualizations hosted by the platform, standards-based and API supported data acquisition and access by analysts for use on their own platforms, media-friendly near-production quality outputs, an increased role for geospatial-directed interactions, the use of API-driven integration with external data platforms and communities, and strong analytics capabilities addressing traffic and usage statistics leveraging powerful machine-learning techniques to act as a semi-automated “flywheel” to improve dissemination. Voice and natural language processing approaches with AI-driven reasoning and interaction are important emerging solutions that will augment the user experience.
UN Global Platform - new approaches to dissemination
The UN Global Platform is using a novel approach to establishing a frictionless, engaging starting point for user and stakeholder access. The UN Global Platform “marketplace” uses a powerful “shopping-experience” platform approach based on the open-source platform Magento (see http://marketplace.officialstatistics.org). This unique approach acts as a launchpad to access data, collaboratives, methods, services, learnings, and more. Its integrated analytics capability provides a means to understand and enhance marketplace offerings based on user activity and feedback. An example (alpha) of the marketplace is shown below.
Links to guidelines, best practices and examples:
Example of implementation in Uganda: http://uganda.countrystat.org/
An anti-pattern can be defined as a “common response to a recurring problem that is usually ineffective and risks being highly counterproductive”. The term was coined in 1995 by Andrew Koenig.
Organisations that fail to focus on the following topics will usually fail to deliver, or will deliver products and services that do not meet user needs.
Fails to focus on user needs
Has difficulty understanding who its users are and is unable to explain their needs.
Fails to use a common language
Uses multiple different ways of describing the same problem space e.g. box and wire diagrams, business process diagrams and stories. Often suffers from confusion and misalignment.
Fails to be transparent
Has difficulty in answering basic questions such as “How many data projects are we building?” Information tends to be guarded in silos.
Fails to challenge assumptions
Action is often taken based upon memes or Hippo (highest paid person’s opinion) or popular articles in the HBR (Harvard Business Review).
Fails to remove duplication and bias
The scale of duplication is excessive and exceeds what people expect. Any investigation will discover groups custom-building what already exists as a commodity in the outside world. Resistance to changing this is often justified by claims of uniqueness, despite the group's inability to explain the user needs involved.
Fails to use appropriate methods
Tends towards single size methods across the organisation e.g. “outsource all of IT” or “use Agile everywhere”.
Fails to think small
Tends toward large scale efforts (e.g. Deathstar projects) and big departments. This can include frequent major platform re-engineering efforts or major re-organisations.
Fails to design for constant evolution
Tends to bolt on new organisational structures as new technology fields are adopted, e.g. a cloud department, a digital department, a big data group etc., rather than embedding evolution within the existing organisation.
Fails to enable purpose, mastery and autonomy
There is often confusion within the organisation over its purpose combined with feelings of lacking control and inability to influence.
Fails to understand basic economic patterns
Often conducts efficiency or innovation programmes without realising the connection between the two. Assumes it has choice on change (e.g. cloud) where none exists. Fails to recognise and cope with its own inertia caused by past success.
Fails to understand context specific play
Has no existing language that enables it to understand context specific play. Often uses terms as memes e.g. open source, ecosystem, innovation but with no clear understanding of where they are appropriate.
Fails to understand the landscape
Tends to not fully grasp the components and complexity within its own organisation. Often cannot describe its own basic capabilities.
Fails to understand strategy
Tends to be dominated by statements that strategy is all about the why but cannot distinguish between the why of purpose and the why of movement. Has little discussion on position and movement combined with an inability to describe where it should attack or even the importance of understanding where before why. Often strategy is little more than a tyranny of action statements based upon meme copying and external advice.
Context Our purpose and the landscape
Environment The context and how it is changing
Situational awareness Our level of understanding of the environment
Actual The map in use
Domain Uncharted vs Transitional vs Industrialised
Stage Of evolution e.g. Genesis, Custom, Product, Commodity
Type Activity, Practice, Data or Knowledge
Component A single entity in a map
Anchor The user need
Position Position of a component relative to the anchor in a chain of needs
Need Something a higher level system requires
Capability High level needs you provide to others
Movement How evolved a component is
Interface Connection between components
Flow Transfer of money, risk & information between components
Climate Rules of the game, patterns that are applied across contexts
Doctrine Approaches which can be applied regardless of context
Strategy A context specific approach
Smart Data IoT-related data on which a decision can be made before it is sent downstream to be processed
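The Smart Data entry in the glossary above can be made concrete with a small sketch: an IoT reading is evaluated at the edge, and only readings that matter are forwarded downstream for processing. The threshold and readings are invented for illustration.

```python
# Sketch of the "Smart Data" idea: decide locally at the edge, and
# forward only the readings that need central processing.

THRESHOLD = 30.0  # hypothetical alert level, e.g. a temperature

def act_on_reading(reading):
    """Decide at the edge; forward only readings above the threshold."""
    if reading > THRESHOLD:
        return "send_downstream"
    return "handled_at_edge"

readings = [21.5, 34.2, 28.0]
print([act_on_reading(r) for r in readings])
# ['handled_at_edge', 'send_downstream', 'handled_at_edge']
```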
Co-evolution of Practice and Activity
This co-evolution of practice and activity can create significant inertia to change for consumers of that activity. In the case of infrastructure, if the consumers of large powerful servers had developed estates of applications based upon the practices of scale-up and N+1, then as the activity evolved to more utility services those consumers would incur significant costs of re-architecture of the “legacy estate” to the new world.
Our current way of operating often creates resistance (or inertia) to change due to the costs of changing practices (see figure above). In many cases we attempt to circumvent this by applying the “old” best practice to the new world, or we attempt to persuade the new world to act more like the past. Today, cloud computing is an example of this: it represents an evolution of parts of IT from product to utility services, and the “legacy” is often cited as a key issue for adoption or as justification for the creation of services which mimic past models.
Figure x - Stages of Evolution