ACIS 2002 – Doctoral Consortium Proposal


Greg Hill

Department of Information Systems

University of Melbourne


“A Framework for Managing the Value of Customer Information”

The Problem


Information Systems practitioners must justify the costs associated with improving organisational information. In many organisations, IS proposals must be justified as investments: that is, with generic, measurable and value-focussed arguments.


As more resources are directed to supporting customer-facing organisational processes (CRM processes), this area of investing will become increasingly important to organisations. Fortunately, there is a comprehensive and widely accepted method for valuing these processes: customer lifetime value (CLV).


However, at present, there is no suitable method for valuing the contribution that information makes to CRM processes. Within IS academia, the Information Quality (IQ) sub-discipline instead concentrates on models and measures of quality, with subjective descriptions of benefits, which are not directly usable for the valuation task.



The Argument (Conclusion)


This is a study of information improvement in CRM processes, and is concerned with the causal link between measures of information quality and customer lifetime value for the purposes of investment. The thesis is that classifier performance measures are useful for predicting, valuing and tracking the improvement of information quality in CRM processes.


A CRM process can be thought of as a classification problem whereby a set of customers is to be partitioned for some task. Each customer is assigned to one partition, where he or she receives the same treatment as other customers in that partition. Treatments have the effect of changing an individual customer’s Lifetime Value (CLV), depending on that customer’s response. For example, a direct mail process might require partitioning a customer list into those who are to receive the offer, and those excluded. In this case, there are four possible outcomes from the Treatment dimension “Offer/Not Offer” and the Response dimension “Accept/Not Accept”.


Assume that the effect of each outcome is fixed and known in advance (this is called a “pay-off matrix” in the decision sciences) but there is uncertainty about how a customer will respond to a treatment (Piatetsky-Shapiro et al. 1999). We can model the classifier’s performance (ie success) using generic techniques from statistics and machine learning. By weighting this model of classifier performance with the CLV pay-offs, we can show how the value of the classifier changes as a function of its performance.
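As an illustration only, the weighting of classifier performance by CLV pay-offs can be sketched as an expected-value calculation over the four Treatment/Response outcomes. The pay-off amounts, the outcome probabilities and the function name `expected_value_per_customer` are all hypothetical and chosen for this sketch; they are not drawn from the cited literature.

```python
from typing import Dict, Tuple

# An outcome pairs the treatment a customer received with the response
# the customer gave (or would have given, if not offered).
Outcome = Tuple[str, str]

# Hypothetical pay-off matrix: change in CLV (dollars) for each outcome.
PAYOFFS: Dict[Outcome, float] = {
    ("offer", "accept"): 50.0,
    ("offer", "not accept"): -2.0,
    ("no offer", "accept"): 0.0,
    ("no offer", "not accept"): 0.0,
}


def expected_value_per_customer(
    outcome_probs: Dict[Outcome, float], payoffs: Dict[Outcome, float]
) -> float:
    """Weight the probability of each outcome by its CLV pay-off."""
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(prob * payoffs[outcome] for outcome, prob in outcome_probs.items())


# Hypothetical outcome distributions before and after an IQ improvement.
baseline = {
    ("offer", "accept"): 0.05,
    ("offer", "not accept"): 0.45,
    ("no offer", "accept"): 0.05,
    ("no offer", "not accept"): 0.45,
}
improved = {
    ("offer", "accept"): 0.08,
    ("offer", "not accept"): 0.22,
    ("no offer", "accept"): 0.02,
    ("no offer", "not accept"): 0.68,
}

gain = expected_value_per_customer(improved, PAYOFFS) - expected_value_per_customer(
    baseline, PAYOFFS
)
print(f"Value gain per customer: {gain:.2f}")
```

In this sketch the pay-off matrix stays fixed while only the outcome distribution changes, so the difference between the two expected values is attributable to the change in classifier performance.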


It is argued that any improvement to Information Quality – for the purposes of the task at hand – should be measured by its effect on the classifier performance. Similarly, the value of such an improvement can be determined by using the CLV weighted model.


In order to value an IQ proposal as an investment, it must be described using financial measures such as Return On Investment (ROI) and Net Present Value (NPV). These can be calculated using a standard discounted cash flow approach, where the cash flow arises from each use of the classifier across multiple process instances. For example, improvement to a single attribute-set may result in improved classification for three direct mailings and a de-duplication project.
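A minimal sketch of the discounted cash flow calculation follows. It assumes one cash flow per period, with the up-front cost of the improvement at period 0 and the value gains from later uses of the classifier in subsequent periods. The figures and the function names `npv` and `simple_roi` are hypothetical, and the ROI shown is the simple undiscounted ratio, which is only one of several definitions in use.

```python
from typing import Sequence


def npv(discount_rate: float, cash_flows: Sequence[float]) -> float:
    """Net Present Value, where cash_flows[t] occurs at the end of period t
    and cash_flows[0] is the (usually negative) up-front investment."""
    return sum(cf / (1.0 + discount_rate) ** t for t, cf in enumerate(cash_flows))


def simple_roi(cost: float, gains: Sequence[float]) -> float:
    """Undiscounted return on investment: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (sum(gains) - cost) / cost


# Hypothetical proposal: improve one attribute-set for 100,000, then reuse
# it in three direct mailings and one de-duplication project.
cost = 100_000.0
gains = [40_000.0, 40_000.0, 40_000.0, 30_000.0]

print(f"NPV at 10%: {npv(0.10, [-cost, *gains]):.2f}")
print(f"Simple ROI: {simple_roi(cost, gains):.1%}")
```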


Hence, these performance measures should form the basis of organisational investment decisions and contractual agreements for information improvement, as they are generic, measurable and value-focussed.



Definitions


Following Meltzer (2002), a CRM process is an organisational process for managing customers. He identifies six basic functions:



Customer Value is sometimes called Lifetime Value (LTV) or Customer Lifetime Value (CLV) or Future Customer Value. It is widely used as the basis for evaluating CRM and Database Marketing initiatives, and is now identified as a standard by the Database Marketing Institute (Hughes 2002). The idea is that the worth of a customer relationship to an organisation can be evaluated by adding up the revenues and costs associated with servicing that customer over the lifetime of the relationship, taking into account future behaviours (such as churn) and the time value of money (Berger et al. 1998). As such, it represents the Net Present Value of the customer relationship.
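To make the idea concrete, the following is a deliberately simplified sketch of a CLV calculation. It assumes a constant annual margin, a constant retention rate (the complement of churn) and a constant discount rate. These simplifications, and the function name `customer_lifetime_value`, are assumptions of this sketch and are not claimed to match the specific models in Berger et al. (1998).

```python
def customer_lifetime_value(
    annual_margin: float,
    retention_rate: float,
    discount_rate: float,
    years: int,
) -> float:
    """Sum the expected, discounted margin over the relationship.

    In year t the customer is still active with probability
    retention_rate ** t, and the margin is discounted by
    (1 + discount_rate) ** t. Year 0 is the current year.
    """
    if not 0.0 <= retention_rate <= 1.0:
        raise ValueError("retention_rate must lie in [0, 1]")
    return sum(
        annual_margin * retention_rate**t / (1.0 + discount_rate) ** t
        for t in range(years)
    )


# Hypothetical customer: 200 margin per year, 80% retention, 10% discount
# rate, five-year horizon.
print(f"CLV: {customer_lifetime_value(200.0, 0.8, 0.10, 5):.2f}")
```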


Information Quality is a research area that seeks to apply modern quality management theories and practices to organisational data. This involves building and applying conceptual frameworks and operational measures for understanding the causes and effects of Information Quality problems. A number of proposals have been made in this area, for example Wand and Wang have an ontologically-based framework consisting of four intrinsic dimensions: complete, unambiguous, meaningful and correct (Wand and Wang, 1996). Recently, an alternative has been proposed through adopting ideas from the field of semiotics, or semiology (Shanks and Darke, 1998). Under this framework, information quality goals are grouped into four abstract levels that build upon each other:



These frameworks are very general, and are intended to apply to all types and uses of organisational data. They do not directly address concepts of rule quality (Dean et al. 1996). In contrast, this proposal is concerned with developing a set of measures that relate the effects of initiatives to the classification of customers within organisational processes.


By “a set of measures” is meant a group of well-defined, theoretically-sound metrics which can be used within an organisation for decision-making purposes. A number of quality frameworks advocate the development and use of such measures, for example “The House of Quality” associated with Quality Function Deployment (Hauser et al., 1988). IQ researchers in the past have applied these frameworks to develop measures, for example Total Quality Management (Dvir et al. 1996).



Motivation


The researcher’s personal experience of many CRM processes suggests that poor information quality affects customers negatively and lowers satisfaction levels. In addition, the researcher’s practice in Business Intelligence and Management Reporting suggests that the “investment paradigm”, though the dominant one in the corporate world, is an imperfect filter for information-related projects. That is, the inability to articulate and measure value is potentially hampering investment in information improvement projects, resulting in economically inefficient investment and lower levels of customer satisfaction.



The Evidence


There has been considerable academic interest in CRM strategies, applications and processes, with some 600 papers published in the last five years (Romano 2001). While quality data (or information) about customers is identified as key to the success of CRM initiatives, it is not clear exactly how one should value it. Indeed, even the real costs of poor customer data are difficult to gauge due to the complexities of tracing causes through to effects. This is part of the much larger “data quality” problem. At the large scale, The Data Warehousing Institute estimated that poor data quality, broadly defined, costs the US economy over $US600 billion per annum (TDWI, 2002).


The Information Quality academic discipline places emphasis on conceptual frameworks and subjective measures, for example the AIMQ methodology developed with MIT’s TDQM program (Lee et al. 2002). However, at a Data Quality workshop hosted by the National Institute of Statistical Sciences in 2001, one of the key recommendations was that “Metrics for data quality are necessary that … represent the impact of data quality, in either economic or other terms” (NISS, 2000). This is very difficult owing to the very broad impacts of data quality within, and beyond, an organisation, and the large range of purposes for which particular data are used. A final confounding factor is the diffuse and intangible nature of many of these impacts.


Efforts to define objective measures of IQ have been made, though these are less widely employed. For example, Kaomea (1994) applied a decision-theoretic analysis involving probabilities and pay-offs to argue for a method of valuing data content in context. A methodology for developing IQ metrics known as InfoQual has been proposed (Dvir et al. 1996), while the Data Quality Engineering Framework has a similar objective (Willshire et al. 1997). These efforts focus on measuring properties of data (possibly complementing subjective user ratings), rather than process outcomes. Also, the very general nature of the situations these proposals address means they offer little support for the valuing task, as shown by the NISS call for economic measures of data quality.


By restricting ourselves to the context of CRM processes, it is posited that the Customer Lifetime Value measurement is the most suitable economic measure for describing the impact of information quality. This allows for a mixture of subjective and objective value, as deemed necessary by the decision-maker. Hence, it is not required to model all the financial implications of information quality, just enough to satisfy decision-makers for the purposes at hand.


Further, a set of measures to describe the performance of the classifier (allocation function) in CRM processes must be sufficiently generic to characterise a wide range of CRM processes in general, and the different initiatives under examination. During planning, estimating classifier performance in advance of implementation is required, while measurements based on observable outcomes are used for review. In both cases, the classifier performance measures drive a customer lifetime value-based model to derive the financial outcomes.


The “de-coupling” of classifier performance and the value of the outcomes has been advocated in the statistical and machine learning literature (Ming 2002). This is to allow comparison of different classifiers in the same task, and prediction of classifier performance in context in advance of its deployment (Piatetsky-Shapiro et al. 1999). To that end, this discipline has formulated models and measures of performance that can be adapted for predicting and describing classifier performance in CRM processes.


There are two broad categories of measures identified within the literature. The first examines ratios of “true positives” (eg “hits” in direct marketing) and “false positives” (eg “misses”). This idea is addressed generically by ROC analysis (Ming 2002). A marketing-specific treatment is found in the L-Quality metric proposed by Piatetsky-Shapiro et al. (2000), based on earlier work in direct marketing (Piatetsky-Shapiro et al. 1999).
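The first category of measures can be illustrated with a small sketch that derives the true positive rate, the false positive rate (together, one point in ROC space) and the lift of a single mailing from its outcome counts. The counts, the class `ConfusionCounts` and the function names are hypothetical; this is the standard textbook calculation rather than the L-Quality metric itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # offered and accepted ("hits")
    fp: int  # offered but did not accept
    fn: int  # not offered, but would have accepted
    tn: int  # not offered and would not have accepted


def true_positive_rate(c: ConfusionCounts) -> float:
    """Share of all would-be acceptors who were actually offered."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of all non-acceptors who were nevertheless offered."""
    return c.fp / (c.fp + c.tn)


def lift(c: ConfusionCounts) -> float:
    """Response rate among targeted customers relative to the base rate."""
    total = c.tp + c.fp + c.fn + c.tn
    targeted_rate = c.tp / (c.tp + c.fp)
    base_rate = (c.tp + c.fn) / total
    return targeted_rate / base_rate


# Hypothetical mailing: 1,000 customers, 300 mailed, 80 of those accepted,
# and 20 would-be acceptors were left off the list.
mailing = ConfusionCounts(tp=80, fp=220, fn=20, tn=680)
roc_point = (false_positive_rate(mailing), true_positive_rate(mailing))
print(f"ROC point (FPR, TPR): ({roc_point[0]:.3f}, {roc_point[1]:.3f})")
print(f"Lift: {lift(mailing):.2f}")
```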


The second category of measures comprises those derived from information theory. These measures relate to entropy, or the reduction of uncertainty, a concept first proposed by Shannon (1948). One approach widely used within the machine learning literature is that described by Kononenko et al. (1991). The “average information score” and “relative information score” measure how much uncertainty is reduced by a classifier, on average.
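The following sketch shows one common formulation of these information scores, written from the general description of the approach. For each customer it compares the prior probability of the customer's true class with the probability the classifier assigned to that class. The exact formulas should be checked against the cited paper before use, and the function names and example figures are hypothetical.

```python
import math
from typing import Sequence, Tuple


def information_score(prior: float, posterior: float) -> float:
    """Bits gained (positive) or lost (negative) for one customer.

    prior is the prior probability of the customer's true class and
    posterior is the probability the classifier gave to that class.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if posterior >= prior:
        return -math.log2(prior) + math.log2(posterior)
    return -(-math.log2(1.0 - prior) + math.log2(1.0 - posterior))


def average_information_score(pairs: Sequence[Tuple[float, float]]) -> float:
    """Mean information score over (prior, posterior) pairs."""
    return sum(information_score(p, q) for p, q in pairs) / len(pairs)


def prior_entropy(class_priors: Sequence[float]) -> float:
    """Entropy (in bits) of the prior class distribution."""
    return -sum(p * math.log2(p) for p in class_priors if p > 0.0)


def relative_information_score(
    pairs: Sequence[Tuple[float, float]], class_priors: Sequence[float]
) -> float:
    """Average information score as a fraction of the prior entropy."""
    return average_information_score(pairs) / prior_entropy(class_priors)


# Hypothetical two-class task with 10% acceptors and 90% non-acceptors.
# Each pair is (prior of true class, classifier probability of true class).
pairs = [(0.1, 0.6), (0.1, 0.05), (0.9, 0.95), (0.9, 0.99)]
print(f"Average information score: {average_information_score(pairs):.3f} bits")
print(f"Relative information score: {relative_information_score(pairs, [0.1, 0.9]):.1%}")
```

Because the score is measured against the prior, a classifier that merely repeats the base rate scores zero regardless of its raw accuracy, which is the property that makes this family of measures attractive for skewed customer populations.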


To support this thesis, such a set of measures must be shown to exist and validated for a particular situation. Next, the case must be made that these measures are a suitable basis for organisational investments in general. For wider adoption, experienced practitioners must be satisfied that these measures are useable and useful ways to relate information quality with customer value.



Method


The research method follows the System Development approach, as recommended by Burstein et al. (1999), as it naturally spans the theory building and theory testing aspects of research. Here, the term “system” is used in its broadest sense, and is intended to mean a systematic method used by analysts to support decision-making.

The research consists of two broad phases. Firstly, a conceptual study will be undertaken, with the goal of constructing and validating a suitable set of measures. This will be done through a review of academic and practitioner literature and a series of semi-structured interviews with practitioners.


Interview subjects will be drawn from practice in CRM, marketing, data quality, information management, data mining and corporate processes. Opinion on the suitability of the measures for decision-making purposes, scope of applicability and practicability will be sought, in addition to anecdotes, maxims and references. The outcome of this phase is a set of measures that can be used to describe a realistic situation.


The second phase consists of conducting a series of field trials to provide some supporting evidence to practitioners for the suitability of the measures. The field trials will be carried out as an after-the-fact analysis of industry projects pertaining to information improvement (ie data quality and data mining initiatives). The objective is to characterise the performance of the information improvement using the proposed set of measures, and relate this to the value attributed to the initiative. The analysis will be quantitative, and include expected (planning) and perceived (review) estimates of the quantities under investigation (classification quality, pay-offs and aggregate customer value).


Finally, these data will be presented to experienced practitioners in support of the thesis. Through a focus group assessment, the suitability, practicability, strengths, limitations and potential of using the measures for this purpose will be determined. The outcome of this phase is some factual evidence and expert opinion to support (or not) the claims of the thesis.



Implications


The outcome of the research is a set of measures suitable for valuing classification quality within CRM processes. If this set is accepted by practitioners, it can form the basis of organisational decision-making. For example, the set of measures can be used to inform business cases (for investment), Key Performance Indicators (for internal workgroups and staff) and Service Level Agreements (for suppliers and partners).


The set will also provide a means of benchmarking the performance of CRM processes for comparison over time, within organisations or across industries. This should give managers a sense of where they are under- or over-performing, and what magnitude of impact improvements can be expected to make.


The likely beneficiaries of more efficient investment in information improvement are primarily customers, through increased levels of satisfaction. Secondarily, organisations will reduce the costs of mistakes (brand, attrition, rework) and uncertainty (hedging, risk premiums, morale), benefiting employees and owners. Finally, the productivity of the economy will be improved through better resource allocation, benefiting society as a whole.




References



Berger, Paul D. and Nada I. Nasr (1998), “Customer Lifetime Value: Marketing Models and Applications” Journal of Interactive Marketing

Burstein, F. and Gregor, S. (1999) "The Systems Development or Engineering Approach to Research in Information Systems: An Action Research Perspective", Proceedings of the Australasian Conference on Information Systems, Victoria University, Wellington, NZ.  

Dean, P. and Famili, A. (1996) Comparative Performance of Rule Quality Measures in an Induction System, Applied Intelligence Journal, ftp://ai.iit.nrc.ca/pub/iit-papers/NRC-39188.pdf

Dvir R., Evans, S. (1996). "A TQM approach to the improvement of Information Quality", in the proceedings of the 1996 conference on Information Quality, MIT, http://web.mit.edu/tdqm/papers/other/evans.html

Hauser, J.R. and D. Clausing (1988), "The House of Quality," Harvard Business Review, May-June, pp. 63--73.

Hughes, A (2002) “How Lifetime Value is Used to Evaluate Customer Relationship Management”, Database Marketing Institute, http://www.dbmarketing.com/articles/Art194.htm

Kaomea, P. (1994), "Valuation of Data Quality: A Decision Analysis Approach," Massachusetts Institute of Technology (MIT) Sloan School of Management, Cambridge, MA, TDQM-94-09, http://web.mit.edu/tdqm/www/papers/94/94-08.html

Kononenko, I. and Bratko, I. (1991) “Information based evaluation criterion for classifier's performance”, Machine Learning Journal, Vol 6, pp67-80.

Ming, L. (2002) “Brief Report: ROC Analysis in Machine Learning”, University of Bristol, Dept of Computer Science, Technical Report 2002-3-13, www.cs.bris.ac.uk/~ml1513/doc/roc.pdf


Lee, Y., Strong, D., Kahn, B. and Wang, R. (2002) “AIMQ: A Methodology for Information Quality Assessment", Forthcoming in Information & Management, http://web.mit.edu/tdqm/www/tdqmpub/AIMQJun02.pdf


Meltzer, M. (2002), “CURARE Drives CRM”, in DM Direct June 2002, http://www.dmreview.com/master.cfm?NavID=55&EdID=5316


NISS, (2000) “Affiliates Workshop on Data Quality”, National Institute of Statistical Sciences, http://www.niss.org/affiliates/dqworkshop/report/dq-report.pdf


Piatetsky-Shapiro, G. and Steingold, S (2000) Measuring Lift Quality in Database Marketing, ACM SIGKDD Explorations, December 2000, http://www.kdnuggets.com/gpspubs/sigkdd-explorations-2000-12-lift-quality.pdf


Piatetsky-Shapiro, G. and Masand, B. (1999) “Estimating campaign benefits and modeling lift”, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, http://www.kdnuggets.com/gpspubs/kdd99-est-ben-lift/


Romano, N. C. (2001) Customer Relationship Management Research: An Assessment of Sub Field Development and Maturity, Proceedings of the 34th Hawaii International Conference on System Sciences


Shanks, G. and Darke, P (1998) Understanding Data Quality in Data Warehousing: a Semiotic Approach, Proc. MIT Conference on Information Quality, I.Chengilar-Smith, L. Pipino (eds), Boston (November), pp 247-264


Shannon, C. and Weaver, W. (1949). A Mathematical theory of communication. Univ. of Illinois Press, http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf


TDWI, (2002), “Data Quality and the Bottom Line: Achieving Business Success through a Commitment to High Quality Data”, The Data Warehousing Institute, http://www.dw-institute.com/research/display.asp?id=6064


Wand, Y. and Wang, R. (1996) Anchoring Data Quality Dimensions in Ontological Foundations, Communications of the ACM, 39:11, 86-95


Willshire, M. J. and Meyen, D. (1997). "A Process for Improving Data Quality." Data Quality 3(1): 8, http://www.dataquality.com/997meyen.htm

