Gregory Hill
School of Business Systems
Monash University
Information Systems practitioners must justify the costs associated with improving organisational information. In many organisations, IS proposals must be justified as investments: that is, with generic, measurable and value-focussed arguments.
As more resources are directed to supporting customer-facing organisational processes (CRM processes), the successful investment in information to support these processes will become increasingly important to organisations.
However, at present, there is no suitable method for valuing the contribution that information makes to CRM processes. Within IS academia, the Information Quality (IQ) sub-discipline is instead concentrating on models and measures of quality, with subjective descriptions of benefits, which are not directly usable for the valuation task.
The Argument (Conclusion)
This is a study of information quality improvement in CRM processes, and is concerned with the causal link between measures of information quality and value for the purposes of investment. The thesis is that classifier performance measures are useful for predicting, valuing and tracking the improvement of information quality for CRM processes.
A CRM process can be thought of as a classification problem whereby a set of customers is to be partitioned for some task. Each customer is assigned to one partition, where he or she receives the same action as other customers in that partition. Actions have the effect of changing an individual customer’s value to the organisation, depending on that customer’s response. For example, a direct mail process might require partitioning a customer list into those who are to receive the offer, and those excluded. In this case, two kinds of mistakes are possible (failing to make an offer to receptive customers and making an offer to customers who decline) out of four outcomes. Both mistakes will have the effect of attracting different costs to the organisation, both directly (cost of mail-out) and indirectly (opportunity cost, reputation).
Assume that the cost of each outcome is fixed and agreed by the business (this is called a “pay-off matrix” in the decision sciences) but there is uncertainty about how a customer will respond (Piatetsky-Shapiro and Masand 1999, Chauchat et al. 2001). We can model the classifier’s performance (i.e. its success) using generic techniques from statistics and machine learning. By weighting this model of classifier performance with the pay-offs (or rather, penalties) we can show how the value of the classifier changes as a function of its performance.
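The weighting step can be illustrated with a minimal sketch for the direct mail example. The function name, the pay-off figures and the outcome probabilities below are all hypothetical, chosen only to show the arithmetic; they are not drawn from the cited studies.

```python
def expected_value(outcome_probs, payoffs):
    """Weight the probability of each outcome by its agreed pay-off."""
    if abs(sum(outcome_probs.values()) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(prob * payoffs[outcome] for outcome, prob in outcome_probs.items())


# Hypothetical pay-off matrix, in dollars per customer (an assumption).
PAYOFFS = {
    "offer_accepted": 50.0,   # mailed, customer accepts
    "offer_declined": -2.0,   # mailed, customer declines (cost of mail-out)
    "missed_sale": -10.0,     # not mailed, would have accepted (opportunity cost)
    "correct_skip": 0.0,      # not mailed, would have declined
}

# Hypothetical outcome probabilities before and after an IQ improvement.
# The share of receptive customers (10%) is the same in both cases.
before = {"offer_accepted": 0.04, "offer_declined": 0.26,
          "missed_sale": 0.06, "correct_skip": 0.64}
after = {"offer_accepted": 0.07, "offer_declined": 0.18,
         "missed_sale": 0.03, "correct_skip": 0.72}

gain_per_customer = expected_value(after, PAYOFFS) - expected_value(before, PAYOFFS)
print(round(gain_per_customer, 2))
```

Under these assumed figures the classifier is worth $0.88 per customer before the improvement and $2.84 after it, so the improvement is worth $1.96 per customer for each use of the classifier.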
It is argued that any improvement to Information Quality – for the purposes of the task at hand – should be measured by its effect on the classifier performance. Similarly, the value of such an improvement can be determined by using the value-weighted model. Hence, competing proposals to improve particular attributes (such as postcode data vs date of birth) or particular quality goals (perhaps timeliness vs accuracy) can be compared as investments.
In order to value an IQ proposal as an investment, it must be described using financial measures such as Return On Investment (ROI) or Net Present Value (NPV). These can be calculated using a standard discounted cash flow approach, where the cash flow arises from each use of the classifier across multiple process instances. For example, improvement to a single attribute-set may result in improved classification for three direct mailings and a de-duplication project.
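As a sketch of the discounted cash flow calculation, the following computes NPV and a simple undiscounted ROI for one proposal. The discount rate, the up-front cost and the per-period benefits are hypothetical figures, and the function names are illustrative only.

```python
def npv(rate, cash_flows):
    """Net Present Value: cash_flows[0] occurs now, cash_flows[t] at the end of period t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def roi(cost, benefits):
    """Simple (undiscounted) Return On Investment."""
    return (sum(benefits) - cost) / cost


# Hypothetical proposal: $10,000 spent now on improving one attribute-set.
# Period 1 benefits from a mailing plus a de-duplication project ($7,000);
# periods 2 and 3 each benefit from one further mailing ($4,000).
cost = 10_000.0
benefits = [7_000.0, 4_000.0, 4_000.0]

proposal_npv = npv(0.10, [-cost] + benefits)
proposal_roi = roi(cost, benefits)
print(round(proposal_npv, 2), proposal_roi)
```

With an assumed discount rate of 10% per period, this proposal has an NPV of about $2,675 and an undiscounted ROI of 50%. Competing proposals can be ranked on the same basis.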
The “de-coupling” of classifier performance and the value of the outcomes has been advocated in the statistical and machine learning literature (Ming 2002). This allows comparison of different classifiers in the same task, and prediction of classifier performance in context in advance of its deployment (Piatetsky-Shapiro and Masand 1999). To that end, this discipline has formulated models and measures of performance that can be adapted for predicting and describing classifier performance in CRM processes.
Hence, these performance measures should form the basis of organisational investment decisions and contractual agreements for information improvement, as they are generic, measurable and value-focussed.
Definitions
Following Meltzer (2002), a CRM process is an organisational process for managing customers. He identifies six basic functions:
Cross-sell: selling a customer additional products/services.
Up-sell: selling a customer higher-value products/services.
Retain: keeping desirable customers (and divesting undesirable ones).
Acquire: attracting (only) desirable customers.
Re-activate: acquiring lapsed but desirable customers.
Experience: managing the customer experience at all contact points.
The first five can be seen as processes for targeting customers for differentiated action on the basis of customer information. The customer information systems plus decision-making processes have the effect of classifying customers. In order to estimate the true costs of the mistakes each classifier can make, a business determination will be required. This can be made by the judgement of a suitable business representative.
Information Quality is a research area that seeks to apply modern quality management theories and practices to organisational data. This involves building and applying conceptual frameworks and operational measures for understanding the causes and effects of Information Quality problems. A number of proposals have been made in this area. For example, Wand and Wang propose an ontologically-based framework consisting of four intrinsic dimensions: complete, unambiguous, meaningful and correct (Wand and Wang, 1996). More recently, an alternative has been proposed that adopts ideas from the field of semiotics, or semiology (Shanks and Darke, 1998). Under this framework, information quality goals are grouped into four abstract levels that build upon each other:
Syntactics: concerned with form.
Semantics: concerned with meaning.
Pragmatics: concerned with use.
These frameworks are very general, and are intended to apply to all types and uses of organisational data. They do not directly address concepts of rule quality (Dean and Famili 1996). In contrast, this proposal is concerned with developing a set of measures that relates the effects of initiatives to the classification of customers within organisational processes.
Motivation
The researcher’s personal experience of many CRM processes suggests that poor information quality is impacting upon customers in a negative way and lowering satisfaction levels. In addition, this researcher’s practice in Business Intelligence and Management Reporting suggests that the “investment paradigm” – though the dominant one in the corporate world – is an imperfect filter for information-related projects. That is, the inability to articulate and measure value is potentially hampering investment in information improvement projects, resulting in economically inefficient investment and lower levels of customer satisfaction.
The Evidence
There has been considerable academic interest in CRM strategies, applications and processes, with some 600 papers published in recent years (Romano 2001). While quality data (or information) about customers is identified as key to the success of CRM initiatives, it is not clear exactly how one should value it. Indeed, even the real costs of poor customer data are difficult to gauge due to the complexities of tracing causes through to effects. This is part of the much larger “data quality” problem. At the large scale, The Data Warehousing Institute estimated that – broadly defined – poor data quality costs the US economy over $US600 billion per annum (TDWI, 2002).
Most academic research in Information Quality involves the use of measures. The different approaches can be broken down into subjective and objective measures. Examples of the former include Likert scale measures gathered from different stakeholders and subjective value assessments. Objective measures may result from audits of database records or from process performance evaluation. However, most methods employ a combination of objective and subjective measures.
A number of general methods for objective IQ measures have been proposed (Ballou and Pazer 1985, Kahn et al 2002, Ballou et al 1998, Lee et al 2002). These approaches all use a group of metrics or measurements to assess the quality of information along a number of different dimensions. While these approaches are useful for understanding quality in existing systems, they do not explicitly address the organisational value of proposed IQ treatments. This is important because practitioners report that in many organisations IQ activities must compete for resources based upon an investment criterion (e.g. Return on Investment or Net Present Value).
The IQ literature has identified this organisational need. For example, Ballou and Tayi (1989) prescribed a method for periodic allocation of resources to a class of IQ proposals (maintenance of data assets). It assumes a budgetary approach (that is, a fixed budget for IQ to be shared among a set of proposals), rather than an investment approach (evaluation of proposals based upon expected value returned). It further assumes that the data managers have sought and won the largest budget they can justify to their organisation. Based upon statistical sampling, a parameter estimation heuristic and an iterative integer program, the method arrives at an optimal dispersal of resources across proposals.
The method requires data analysts to understand the appropriate level of data granularity (fields, attributes, records) for the analysis and the expected costs of errors in these data sets. In general, the problem of estimating the costs of IQ defects is extremely complex. Earlier work by Ballou and Pazer (1985) employs the differential calculus to estimate transformation functions that describe the impact of IQ defects on “down-stream” decision-making. This functional approach is combined with a Data Flow Diagram method in Ballou et al (1998). Gathering information on the parameters required for this method is likely to be an enormous undertaking.
Other objective approaches exist. Kaomea (1994) applied a decision-theoretic analysis involving probabilities and pay-offs to argue for a method of valuing data content in context. A methodology for developing IQ metrics known as InfoQual has been proposed (Dvir and Evans 1996), while the Data Quality Engineering Framework has a similar objective (Willshire and Meyen 1997). These efforts focus on measuring properties of data (possibly complementing subjective user ratings), rather than process outcomes (actual use). Also, the very general nature of the situations these proposals address means they offer little support for the valuing task.
When making objective measures, the issue of appropriate units arises. Whether subjective or objective, nearly all IQ assessments employ unitless measures (Likert scales and percentages respectively) of dimensions such as completeness and accuracy (Motro and Rakov 1996, Shanks and Darke 1998), currency (Ballou et al. 1998) and integrity (Motro and Rakov 1996). When one claims, say, 50% completeness for a database, it is not obvious whether this means that all customers have 50% of their attributes present or that half of the customers have complete records. Similarly, the figure does not indicate how much the missing 50% impacts the organisation. This may not be a problem if all attributes and entities carry equal importance; however, this is unlikely to be the case.
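The ambiguity of a bare completeness percentage can be shown with a small sketch. The two customer tables and both function names are hypothetical; missing values are represented by None.

```python
def cell_completeness(records):
    """Fraction of all attribute values that are present (not None)."""
    cells = [value for record in records for value in record.values()]
    return sum(value is not None for value in cells) / len(cells)


def record_completeness(records):
    """Fraction of records in which every attribute is present."""
    complete = sum(all(v is not None for v in r.values()) for r in records)
    return complete / len(records)


# Hypothetical table A: every customer is missing one of two attributes.
even_gaps = [
    {"postcode": "3800", "dob": None},
    {"postcode": None, "dob": "1970-01-01"},
    {"postcode": "3145", "dob": None},
    {"postcode": None, "dob": "1985-06-30"},
]

# Hypothetical table B: half the customers are complete, half are empty.
clustered_gaps = [
    {"postcode": "3800", "dob": "1970-01-01"},
    {"postcode": "3145", "dob": "1985-06-30"},
    {"postcode": None, "dob": None},
    {"postcode": None, "dob": None},
]

for table in (even_gaps, clustered_gaps):
    print(cell_completeness(table), record_completeness(table))
```

Both tables are “50% complete” at the level of individual values, yet a process that needs both attributes can act on none of the customers in the first table and on half of them in the second.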
While most IQ approaches make some use of objective measures of information quality dimensions (for example Shanks and Darke 1998; Ballou et al 1998; Motro and Rakov 1996), most do not directly address the economic aspects. For example, Wang and Strong (1996) and later Strong et al (1997) consider “Value-added” to be an attribute of the “Contextual” dimension, but provide little detail about exactly how this is defined and measured, or its role in organisational decision-making. Some authors frame time, cost and quality as different dimensions for design trade-offs (Ballou et al 1998).
The Information Quality academic discipline also develops conceptual frameworks with subjective measures, for example the AIMQ methodology developed with MIT’s TDQM program (Lee et al 2002). Here, the focus is “fitness for purpose”, either as product specifications or consumer expectations (Kahn et al 2002). This latter view involves measurement of the subjective experiences of information consumers, typically by a Likert Scale. Naumann and Rolker (2000) provide an overview of different assessment methods. These ratings undergo transformations using weighting, sums and differences to derive metrics that allow comparison of quality levels over time (Parssian et al 1999 and Pipino et al 2002). In addition to the incommensurability with competing uses of organisational resources, these approaches are limited for planning purposes by the difficulties in forecasting consumers’ expected satisfaction after implementing an IQ treatment.
Like most IQ proposals, this one uses a combination of objective and subjective measures. The objective measures are derived from information theory. These measures relate to entropy, or the reduction of uncertainty, first proposed by Shannon (1948). Kononenko and Bratko (1991) show how this theory can be used to develop theoretically-sound and robust measures of classifier performance: the “average information score” and “relative information score” measure how much uncertainty is reduced by a classifier. The subjective measures relate to the impact on the organisation of different outcomes. These subjective value assessments are elicited from appropriate business representatives and used to model the expected future impact of proposals.
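A minimal sketch of the two information scores follows. It uses the commonly stated form of the measure: a case scores positively when the classifier raises the probability of the true class above its prior, and negatively when it lowers it. The formulation should be checked against the original paper before use. The class priors, the test cases and all function names are hypothetical.

```python
import math


def information_score(prior, posterior):
    """Information (in bits) gained or lost on one case.

    prior and posterior are the probabilities of the case's TRUE class
    before and after classification; prior must lie strictly between 0 and 1.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if posterior >= prior:
        # Classifier moved towards the truth: useful information.
        return -math.log2(prior) + math.log2(posterior)
    # Classifier moved away from the truth: misleading information.
    return -(-math.log2(1.0 - prior) + math.log2(1.0 - posterior))


def average_information_score(cases, priors):
    """cases: list of (true_class, posterior probability given to the true class)."""
    scores = [information_score(priors[c], p) for c, p in cases]
    return sum(scores) / len(scores)


def prior_entropy(priors):
    """Entropy (in bits) of the prior class distribution."""
    return -sum(p * math.log2(p) for p in priors.values() if p > 0)


def relative_information_score(cases, priors):
    """Average information score as a fraction of the prior entropy."""
    return average_information_score(cases, priors) / prior_entropy(priors)


# Hypothetical direct mail task: 10% of customers respond.
PRIORS = {"respond": 0.1, "decline": 0.9}
perfect = [("respond", 1.0)] + [("decline", 1.0)] * 9
uninformed = [("respond", 0.1)] + [("decline", 0.9)] * 9

print(relative_information_score(perfect, PRIORS))
print(relative_information_score(uninformed, PRIORS))
```

In this hypothetical test set, a classifier that always assigns probability 1 to the true class removes all of the prior uncertainty (a relative score of 1), while one that merely repeats the priors removes none (a relative score of 0).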
A set of measures to describe the performance of the classifier in CRM processes must be sufficiently generic to characterise a wide range of CRM processes and the different initiatives under examination. During planning, classifier performance must be estimated in advance of implementation, while measurements based on observable outcomes are used for review. In both cases, the classifier performance measures drive a value-based model to derive the financial outcomes.
Unbiased, practicable and theoretically sound objective measures of IQ linked to value would help researchers and practitioners understand the impact of IQ defects on processes that use the information. This in turn would help manage IQ activities, including selection of proposals and benchmarking of implementations.
Method
The research method follows the System Development approach, as recommended by Burstein and Gregor (1999), as it spans naturally the theory building and theory testing aspects of research. Here, the term “system” is used in its broadest sense, and is intended to mean a systematic method used by analysts to support decision-making.
The research consists of two broad phases: method development and method validation. Firstly, a conceptual study has been undertaken, with the goal of constructing and validating a suitable set of measures. This was done through a review of academic and practitioner literature, a series of semi-structured interviews with practitioners and the development of a model of information quality for CRM. This model is presented in Hill (2004).
Interview subjects were drawn from practice in CRM, marketing, data quality, information management, data mining and corporate processes. Opinions on the use of measures for decision-making purposes, scope of applicability and practicability were sought, in addition to anecdotes, maxims and references. The outcome of this phase is an understanding of the current practice regarding IQ investment and how quality measures are used to support investment. A key finding is that IQ initiatives must be justified as investments using financial measures such as Net Present Value and Return on Investment.
The second phase consists of conducting a series of simulations with real data sets to provide some supporting evidence to practitioners for the suitability of the measures. The simulations test hypotheses from the theoretical model of IQ investment (Hill, 2004). The simulations will be carried out as an after-the-fact analysis of industry projects pertaining to information improvement (ie data quality and data mining initiatives). The objective is to characterise the performance of the information improvement using the proposed set of measures, and relate this to the value attributed to the initiative. The analysis will be quantitative, and include expected (planning) and perceived (review) estimates of the quantities under investigation (classification performance, pay-offs and value).
Finally, these data will be presented to experienced practitioners in support of the thesis. Through a focus group assessment, the suitability, practicability, strengths, limitations and potential of using the measures for this purpose will be determined. The outcome of this phase is some factual evidence and expert opinion to support (or not) the claims of the thesis. These data will be used to help evaluate the method against the following criteria proposed by Burstein and Gregor (1999):
Significance
Is there theoretical significance?
Is there practical significance?
Internal Validity
Do the methods work?
Have rival methods been considered?
Has sufficient evidence been collected in evaluating the methods?
External Validity
Are the findings congruent with existing theory?
Can the findings be applied elsewhere?
Objectivity/Confirmability
Are the study’s methods described in detail?
Are the researchers explicit about personal assumptions, values and biases?
Reliability/Dependability/Auditability
Are the research questions clear?
Are basic constructs clearly specified?
Implications
The outcome of the research is a set of measures suitable for valuing classification performance resulting from information quality, within CRM processes. If the set is accepted by practitioners it can form the basis of organisational decision-making. For example, the set of measures can be used to inform business cases (for investment), Key Performance Indicators (for internal workgroups and staff) and Service Level Agreements (for suppliers and partners).
The set will also provide a means of benchmarking the performance of CRM processes for comparison over time, within organisations or across industries. This should give managers a sense of where they are under- or over-performing, and what magnitude of impact improvements can be expected to make.
The likely beneficiaries of more efficient investment in information improvement are primarily customers, through increased levels of satisfaction. Secondarily, organisations will reduce the costs of mistakes (brand, attrition, rework) and uncertainty (hedging, risk premiums, morale), benefiting employees and owners. Finally, the productivity of the economy will be improved through better resource allocation, benefiting society as a whole.
References
Ballou, D. and Pazer, H. (1985) Modeling Data and Process Quality in Multi-Input, Multi-Output Information Systems, Management Science, 31:2, 150-162.
Ballou, D. and Tayi, G. (1989) Methodology for Allocating Resources for Data Quality Enhancement, Communications of the ACM, 32:3, 320-331.
Ballou, D., Wang, R., Pazer, H. and Tayi, G. (1998) Modeling Information Manufacturing Systems to Determine Information Product Quality, Management Science, 44:4, 462-484.
Burstein, F. and Gregor, S. (1999) "The Systems Development or Engineering Approach to Research in Information Systems: An Action Research Perspective", Proceedings of the Australasian Conference on Information Systems, Victoria University, Wellington, NZ.
Chauchat, J-H., Rakotomalala, R., Carloz, M. and Pelletier, C. (2001) Targeting Customer Groups using Gain and Cost Matrix: a Marketing Application, in Data Mining for Marketing Applications (Working Notes), PKDD'2001, pp. 1-13, September 2001.
Dean, P. and Famili, A. (1996) Comparative Performance of Rule Quality Measures in an Induction System, Applied Intelligence Journal, ftp://ai.iit.nrc.ca/pub/iit-papers/NRC-39188.pdf
Dvir, R. and Evans, S. (1996) "A TQM approach to the improvement of Information Quality", in the proceedings of the 1996 conference on Information Quality, MIT, http://web.mit.edu/tdqm/papers/other/evans.html
Hill, G. (2004) “An Information-Theoretic Model of Customer Information Quality”, forthcoming in the proceedings of DSS-2004, July 2004.
Kahn, B., Strong, D. and Wang, R. (2002) Information Quality Benchmarks: Product and Service Performance, Communications of the ACM 45:4, 184-192.
Kaomea, P. (1994), "Valuation of Data Quality: A Decision Analysis Approach," Massachusetts Institute of Technology (MIT) Sloan School of Management, Cambridge, MA, TDQM-94-09, http://web.mit.edu/tdqm/www/papers/94/94-08.html
Kononenko, I. and Bratko, I. (1991) “Information based evaluation criterion for classifier's performance”, Machine Learning Journal, Vol 6, pp 67-80.
Lee, Y., Strong, D., Kahn, B. and Wang, R. (2002) AIMQ: A Methodology for Information Quality Assessment, Information & Management, 40, 133-146, http://web.mit.edu/tdqm/www/tdqmpub/AIMQJun02.pdf
Ming, L. (2002) “Brief Report: ROC Analysis in Machine Learning”, University of Bristol, Dept of Computer Science, Technical Report 2002-3-13, http://www.cs.bris.ac.uk/~ml1513/doc/roc.pdf
Meltzer, M. (2002), “CURARE Drives CRM”, in DM Direct June 2002, http://www.dmreview.com/master.cfm?NavID=55&EdID=5316
Motro, A. and Rakov, I. (1996) Estimating the Quality of Data in Relational Databases. In Proceedings of the 1996 Conference on Information Quality, pp. 94-106, October 1996.
Naumann, F. and Rolker, C. (2000) Assessment Methods for Information Quality Criteria. In Proceedings of the 2000 Conference on Information Quality, Cambridge, MA 2000.
Parssian, A., Sarkar, S. and Jacob, V. (1999) Assessing Data Quality for Information Products, In Proceedings of the 20th International Conference on Information Systems (ICIS 99), Charlotte, North Carolina, Dec. 15, 1999.
Piatetsky-Shapiro, G. and Steingold, S (2000) Measuring Lift Quality in Database Marketing, ACM SIGKDD Explorations, December 2000, http://www.kdnuggets.com/gpspubs/sigkdd-explorations-2000-12-lift-quality.pdf
Piatetsky-Shapiro, G. and Masand, B. (1999) “Estimating campaign benefits and modeling lift”, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, http://www.kdnuggets.com/gpspubs/kdd99-est-ben-lift/
Pipino, L., Lee, Y., and Wang, R. (2002) Data Quality Assessment, Communications of the ACM, 45:4, 211-218.
Romano, N. C. (2001) Customer Relationship Management Research: An Assessment of Sub Field Development and Maturity, Proceedings of the 34th Hawaii International Conference on System Sciences
Shanks, G. and Darke, P (1998) Understanding Data Quality in Data Warehousing: a Semiotic Approach, Proc. MIT Conference on Information Quality, I.Chengilar-Smith, L. Pipino (eds), Boston (November), pp 247-264
Shannon, C. and Weaver, W. (1949). A Mathematical theory of communication. Univ. of Illinois Press, http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
Strong, D., Lee, Y. and Wang, R. (1997) Data Quality in Context, Communications of the ACM, 40:5, 103-110.
TDWI, (2002), “Data Quality and the Bottom Line: Achieving Business Success through a Commitment to High Quality Data”, The Data Warehousing Institute, http://www.dw-institute.com/research/display.asp?id=6064
Wand, Y. and Wang, R. (1996) Anchoring Data Quality Dimensions in Ontological Foundations, Communications of the ACM, 39:11, 86-95
Wang, R. and Strong, D. (1996) Beyond Accuracy: What Data Quality Means to Data Consumers, Journal of Management Information Systems, 12:4, 5-34.
Willshire, M. J. and Meyen, D. (1997) "A Process for Improving Data Quality", Data Quality 3(1): 8, http://www.dataquality.com/997meyen.htm