Statistics in Healthcare – Friend or Foe? – Taking the Compliance Office’s Perspective
Growing Importance of Statistics in Compliance Enforcement
Data analytics, data mining, overpayment extrapolation using statistical concepts including sampling, and the overall use of statistics in healthcare have become prominent tools in the enforcement community’s war chest.
The specifically created Healthcare Fraud Prevention and Enforcement Action Team (HEAT); new CMS contractors such as the Medicare Recovery Audit Contractors (RACs), Zone Program Integrity Contractors (ZPICs), and Medicaid Integrity Contractors (MICs); and the soon-to-come Medicaid Recovery Audit Contractors, with goals and payment incentives similar to those of the RACs, all indicate that there is a war on fraud and abuse.
All of these contractors employ statistical techniques to sift through data, detect abnormal or suspect payment patterns, and often extrapolate conclusions from a limited set of observations to a much larger audit universe. Recovery demands from contractor audits and the resulting investigations can easily make statistics appear to be a foe in the eyes of the Compliance Officer.
However, the enforcement community is not the only one that has become more sophisticated in recent years in combating fraud, abuse and non-compliant behavior.
Over the last decade, Compliance Officers around the country have steadily improved the way they run their programs. The most advanced and proactive compliance programs have begun integrating statistical concepts into their compliance auditing and monitoring efforts. “Auditing and monitoring” is one of the seven essential elements of an effective compliance program as defined in the compliance program guidance of the Department of Health and Human Services (HHS) Office of Inspector General (OIG).
Through the Patient Protection and Affordable Care Act (PPACA) of 2010, effective compliance programs are set to become a condition of enrollment in Federal healthcare programs, and they are already mandated by state law for Medicaid providers in some cases, e.g., in New York. It therefore behooves compliance offices in the provider and supplier community to include statistical tools in their own arsenal to fight fraud and abuse, demonstrate program effectiveness, and keep the government contractors at bay. Statistics in healthcare can be a friend in this endeavor.
This article explores some of the basic statistical concepts and considerations that turn a potential foe into a Compliance Office’s friend. In the course of their professional careers, the authors have worked on both the government audit side and the provider/supplier side. In the authors’ experience, statistics is simply a tool.
It would be wise for any Compliance Office or Internal Audit group to gain an understanding of the basics so that it can use statistics effectively for auditing and monitoring purposes, regardless of the particular audit target (such as claims, arrangements, general ledger, financial transactions, reimbursement or productivity studies, etc.).
Many statistical terms and concepts may appear complicated or even intimidating. There are, however, some basic concepts that, when applied, become powerful tools for evaluation, auditing and monitoring, and that are worth getting familiar with.
These concepts are explored below and can help the Compliance Office or Internal Audit group decide how to use statistics and when to ask an expert for assistance in applying these concepts in the most productive and cost-effective manner.
Data Analysis versus Data Mining
1) Data Analysis
Data Analysis uses statistical tools to study the basic characteristics of collections of data. The most basic type of Data Analysis, Exploratory Data Analysis (EDA), simply explores the data without assuming any structure. EDA postpones the usual assumptions about what kind of model the data follow, allowing the data themselves to reveal their basic structure and model.
Typically, EDA aims to uncover underlying structure, extract important variables, detect outliers and anomalies, and test underlying assumptions.[i] An example is reviewing claims data without any preconceived notion or suspected problem in mind and letting the data reveal something new.
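As a minimal illustration of EDA, the Python sketch below summarizes a hypothetical list of average paid amounts per claim (one value per provider) and flags values outside Tukey's fences, i.e., more than 1.5 times the interquartile range beyond the quartiles. The data, the function names and the 1.5 multiplier are assumptions chosen for illustration, not a prescribed audit method; the point is that no model is assumed up front and the data are simply allowed to show what stands out.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using statistics.quantiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def tukey_outliers(values, k=1.5):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical average paid amount per claim ($), one value per provider
paid_per_claim = [112.0, 98.5, 105.0, 120.3, 101.7, 99.9, 310.0, 108.4, 95.2, 115.6]

print(five_number_summary(paid_per_claim))
print(tukey_outliers(paid_per_claim))
```

An outlier flagged this way is only a lead for further review, not a finding of non-compliance.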
In contrast, the goal of Classical Data Analysis (CDA) is to collect data and analyze them according to a pre-defined model, presumably established in previous rounds of EDA. For example, if the assumed model states that there is a linear association among the variables relevant to the study at hand (a linear regression model), the analysis, estimation, and testing that follow focus on the parameters of that model. A hypothesis might be, for instance, that Point of Service (POS) errors in E/M coding increase with the percentage of usage of a new electronic medical record (EMR) system.
CDA often employs statistical tools to study a subset (sample) of a large data collection (population) and generalize conclusions derived from the sample to the population. Probability Theory is the standard tool for dealing with the degree of risk associated with this process. One source of risk arises from the fact that the analyzed sample may not be representative of the population from which it was drawn (sampling risk), which makes it problematic to extend sample-specific conclusions to the population.
A customary way to hedge against this possibility is to report conclusions about the population (drawn from analysis of sample data) together with a minimum acceptable degree of statistical confidence (confidence level). Another source of risk arises from measurement error; for example, an auditor may fail to recognize misstatements included in documents that she examines (non-sampling risk). Non-sampling risk in an auditing context “can be reduced to a negligible level through such factors as adequate planning and supervision.”[ii]
CDA may be used, for example, to validate the assumption of a linear growth trend in time-referenced datasets. For a dataset of 26 weekly observations of the number of units of service of a pharmaceutical drug dispensed per beneficiary, the working assumption could be that units of service per beneficiary are linearly related to time.
The hypothesis of linear growth trend may be validated by fitting this simple linear regression model to the data to corroborate statistically one of three possible outcomes – upward linear trend, downward linear trend or no detectable linear trend. If the statistical validation upholds the linear trend hypothesis, the regression parameters (weights) can be used to predict the number of units per beneficiary for subsequent points in time.
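A minimal sketch of this validation in Python, assuming ordinary least squares, a two-sided t-test on the slope at the 5% level, and hypothetical weekly data generated for illustration; the helper names `fit_trend` and `classify_trend` are likewise hypothetical. With 26 observations the slope test has 24 degrees of freedom, for which the two-sided 5% critical value of Student's t is about 2.064.

```python
import math

T_CRIT_24DF = 2.064  # two-sided 5% critical value of Student's t, 24 df (n - 2)

def fit_trend(y):
    """OLS fit of y_t = b0 + b1 * t for t = 1..n; returns (b0, b1, t statistic of b1)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    residuals = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)   # residual variance
    return b0, b1, b1 / math.sqrt(s2 / sxx)

def classify_trend(t_stat, t_crit=T_CRIT_24DF):
    """Map the slope's t statistic to one of the three possible outcomes."""
    if t_stat > t_crit:
        return "upward linear trend"
    if t_stat < -t_crit:
        return "downward linear trend"
    return "no detectable linear trend"

# Hypothetical: 26 weekly values of units dispensed per beneficiary
units = [2.0 + 0.05 * w + 0.1 * ((w * 7) % 5 - 2) for w in range(1, 27)]
b0, b1, t_stat = fit_trend(units)
print(classify_trend(t_stat))
print("predicted units per beneficiary, week 27:", round(b0 + b1 * 27, 2))
```

In practice the residuals should also be inspected, since a significant slope alone does not prove that the relationship is linear.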
2) Data Mining
Data Mining (DM) is the process of discovering patterns in data. The search for patterns generally takes place in very large databases and yields outcomes with a broad range of payoffs for business applications, including in healthcare.
Although DM processes rely heavily on automated, computer-based operations, users play the critical role of selecting the methods that best suit their data mining goals and of assessing their economic advantages. This essential role of users is not emphasized enough in the literature of DM software vendors, which focuses instead on the ability of their products to deploy a sophisticated array of DM techniques.
Most of the techniques labeled as DM analytical methods come from three areas of knowledge – Statistics, Computer Science/Machine Learning, and Engineering. Years ago DM clearly did not have an identity as a field of knowledge.
A 1997 Stanford University working paper posed the question whether “from the perspective of statistical data analysis DM methodology is an intellectual discipline” and answered with an emphatic “not yet,” qualified by the prediction that “in the future the answer is almost surely, yes!”[iii] Fourteen years later it is safe to state that the prophecy has been fulfilled. There is no doubt that data mining has acquired its own identity, in spite of its recurrent borrowing of methods developed in other areas of knowledge.
The two main goals of DM users are to gain knowledge about data (using classification algorithms, for example cluster analysis) and uncover meaningful patterns that will potentially lead to non-trivial predictions about new data (using predictive models, for example logistic regression).
Some of the data mining techniques most frequently used in healthcare include:
- Logistic Regression (LR): Predictive model based on the relationship between predictor variables and a dichotomously coded response variable (coded as either 0 or 1). The influence of predictor variables on the response variable is often assessed on an odds-ratio basis – how likely it is that a particular set of values (say coded credit card applicant characteristics) will score either of the two alternatives (default, yes or no).
- Artificial Neural Network (ANN): Non-linear predictive models that learn through training and resemble biological neural networks in structure. In the example of assessing the likelihood of credit card default by applicants, both LR and ANN require a training dataset including the relevant attributes (predictor variables) of a number of individuals associated with the corresponding known outcomes (default, yes or no). The estimated weights, computed by fitting the models to the training dataset, will be used to score the probability that applicants will default given their attributes.
- Decision Trees (DT): Tree-shaped data structures that specify the sequences of decisions that need to be made and the resulting recommendation. The internal nodes denote tests on attributes, the branches represent outcomes of tests and the leaf nodes represent recommendations.
- Cluster Analysis (CA): Process of organizing properly represented data into meaningful groups, possibly ignoring non-essential information (noise), by interpreting which data points are in some sense connected.
- Social Network Analysis (SNA), alternatively labeled Link Analysis: The formal study of systems of people (or groups of people), with an emphasis on their relationships.
- Association Rules (AR), also labeled Market Basket Analysis: A search for interesting and useful patterns in transaction databases; originally employed to reveal regularities between products using large-scale data recorded by point-of-sale systems in supermarkets.
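Of the techniques listed above, Association Rules are the easiest to illustrate in a few lines. The sketch below assumes a hypothetical set of claims, each represented by the set of procedure codes billed together (labeled A to D), and scores a single rule X → Y with the three standard measures: support (share of claims containing both X and Y), confidence (share of claims with X that also contain Y) and lift (confidence divided by the overall share of claims containing Y). The function name `rule_metrics` is hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the association rule antecedent -> consequent."""
    n = len(transactions)
    x, y = set(antecedent), set(consequent)
    n_x = sum(1 for t in transactions if x <= t)          # claims containing X
    n_y = sum(1 for t in transactions if y <= t)          # claims containing Y
    n_xy = sum(1 for t in transactions if (x | y) <= t)   # claims containing both
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0
    return support, confidence, lift

# Hypothetical claims, each a set of procedure codes billed together
claims = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"},
    {"A", "B"}, {"C"}, {"A", "B", "D"}, {"D"},
]
print(rule_metrics(claims, {"A"}, {"B"}))
```

For these data the rule A → B has support 0.5, confidence 0.8 and lift 1.28; a lift above 1 means the codes appear together more often than independence would suggest. Dedicated DM software searches over all candidate rules, whereas this sketch only scores one given rule.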
3) Statistical Package or Data Mining Software?
Classical Data Analysis and Data Mining are now well established tools of modern business management. Standard statistical packages such as SAS/STAT have all the statistical tools to match typical business data analysis demands, including a random number generator to support data sampling. As DM is typically associated with very large databases, the decision to acquire Data Mining software such as SAS Enterprise Miner should be clearly based on the scale of business operations and in-house expertise.
SAS/STAT most likely meets the moderate data mining needs of small and medium-sized business operations: it includes a number of predictive linear models such as ordinary regression models, diverse types of generalized linear models including logistic regression, and tools to perform cluster analysis.[iv] In addition, code implementing the Decision Trees algorithm with SAS/STAT is publicly available on the World Wide Web as a SAS macro.[v] SAS is often used by government contractors that mine high volumes of transaction, payment, or claims data.
Besides Artificial Neural Networks, the DM analytical methods typically not available in statistical packages are those developed in the area of Computer Science/Machine Learning, such as Decision Trees, Social Network Analysis and Association Rules.
Businesses with a medium scale of operations, not yet large enough to justify acquiring full-blown and costly Data Mining software, can supplement their current statistical packages and mining capabilities with the resources of the R Project, “a language and environment for statistical computing and graphics” that is available as free software.[vi]
The Comprehensive R Archive Network (CRAN), mirrored at a number of URLs worldwide, hosts the base software as well as a large number of packages implementing Artificial Neural Networks, Association Rules, Social Network Analysis, Decision Trees, etc. Although the software is free, the learning curve of the programming language is a cost to weigh when deciding whether to use R as a complement to the currently used statistical package.
The Importance of Sampling
There are many different types and uses of sampling, but one particular distinction is noteworthy in the context of auditing and monitoring. When data analysis or data mining has revealed a suspicious pattern, raised the likelihood of an aberrant process, or flagged certain claims, providers, transactions, or similar items, additional manual reviews (chart review for medical necessity, inspection of a provider site, verification of the authenticity of service agreements, etc.) may still be necessary to reach a conclusive result.
In other words, hands-on work and detailed reviews are required. In those cases, sampling may be useful to further assess and verify findings or to test a hypothesis. Two types of sampling are noteworthy here: statistical and judgmental.
1) Statistical/Probability Sampling
Statistical sampling methods are used in auditing for a variety of applications, including audit testing; direct estimation of values such as dollar values of overpayments, transaction values, and inventory values; and transaction or cost accounting.
Audit testing typically involves assessing the adequacy of an organization’s internal control system against a stated audit objective, a common application of statistics in healthcare. Estimation deals with projecting from a relatively small sample of observations to a universe without having to analyze every observation or unit in that universe.
A Statistical Sample, also called a Probability Sample or statistically valid random sample (SVRS), provides a) a means for an advance estimate of the sample size required for a given audit objective and desired confidence level and b) the ability to appraise sample results, i.e., to make valid projections.
It is objective and defensible. The CMS Program Integrity Manual, for example, defines a Probability Sample as one generated by a procedure for which two conditions hold: (1) it must be possible, in principle, to enumerate a set of distinct samples that the procedure is capable of selecting if applied to the target universe, and (2) each sampling unit in each distinct possible sample must have a known probability of selection.
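To make point a) concrete, the sketch below computes an advance sample size for estimating an error rate, using the common normal-approximation formula n0 = z^2 p(1 - p) / E^2 followed by a finite population correction. The expected error rate p, the precision E and the confidence level are planning assumptions the auditor must supply; the function name is hypothetical, and tools such as RAT-STATS may use different (e.g., exact) methods and return somewhat different sizes.

```python
import math
from statistics import NormalDist

def attribute_sample_size(population_size, expected_rate, precision, confidence=0.90):
    """Advance sample size for estimating an error rate within +/- precision.

    Normal approximation n0 = z^2 * p * (1 - p) / E^2, then the finite
    population correction n = n0 / (1 + (n0 - 1) / N), rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    n0 = z ** 2 * expected_rate * (1 - expected_rate) / precision ** 2
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

# Hypothetical plan: 5,000 claims, 10% expected error rate, +/-5 points, 90% confidence
print(attribute_sample_size(5000, 0.10, 0.05, 0.90))
```

Under these assumptions the sketch returns 96 claims; halving the precision to ±2.5 points roughly quadruples the required sample before the finite population correction.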
In simple random sampling and stratified sampling, the designs most commonly used by CMS contractors, such a procedure typically involves a randomization tool, such as a random number generator (e.g., in MS Excel, RAT-STATS, SAS, SPSS, and other statistical packages) or random number tables. Systematic sampling, another type of probability sampling, requires numbering the sampling units with consecutive integers, picking a random start, and then selecting units at a fixed interval, e.g., every tenth item.
The key benefit of a valid statistical sample is that it is fully replicable, and hence verifiable, and therefore lends itself to defensible and objective projections. A statistical sample also makes it possible to quantify the uncertainty that remains when a value for the universe is estimated from sampled items.
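The sketch below illustrates both designs and why a recorded seed makes a draw replicable: rerunning the selection with the same frame size, sample size and seed reproduces the identical sample. It assumes Python's built-in `random` module rather than RAT-STATS, so the selected units will not match those RAT-STATS would produce; the frame size, interval, seed and function names are hypothetical.

```python
import random

def simple_random_sample(frame_size, n, seed):
    """Draw n distinct sampling-unit numbers from 1..frame_size without replacement."""
    rng = random.Random(seed)               # recorded seed makes the draw replicable
    return sorted(rng.sample(range(1, frame_size + 1), n))

def systematic_sample(frame_size, interval, seed):
    """Random start between 1 and interval, then every interval-th unit."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)
    return list(range(start, frame_size + 1, interval))

# Hypothetical frame of 1,200 claims numbered 1..1200
srs = simple_random_sample(frame_size=1200, n=30, seed=12345)
assert srs == simple_random_sample(frame_size=1200, n=30, seed=12345)  # same seed, same sample

sys_sample = systematic_sample(frame_size=1200, interval=40, seed=12345)
print(srs)
print(sys_sample)
```

Documenting the frame, the seed and the software version is what allows a third party to verify the selection later.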
This remaining uncertainty is typically expressed in terms of confidence and precision levels. As stated above, these levels are a customary way to hedge against the possibility that the sample is not representative of the population. It is important to note that both the HHS OIG and the CMS Medicare Program Integrity Manual require Probability Samples for overpayment assessments. Program Integrity Contractors, such as RACs and ZPICs, must use probability sampling to extrapolate overpayments from samples.[vii] Statistics thus allows auditors to be thorough and to uncover anomalies that may have been missed internally.
While probability sampling has many advantages, its use also carries some inherent risk. The Fraud Enforcement and Recovery Act (FERA) of 2009 extended and amended the False Claims Act, including its reverse false claims provisions, and PPACA requires providers to report and return identified overpayments within 60 days.[viii]
Overpayment assessments based on Probability Samples in retrospective claims audits that are larger than probe samples (size 30) may implicitly provide information about overpayments in the universe for the look-back period. Larger samples may establish not only that overpayments occurred but potentially even their amount for the universe, if the error pattern is clear enough and high confidence and precision levels are reached for the overpayment estimate. Such findings may in turn trigger the obligation to report and return the overpayments. Judgmental samples cannot identify overpayments in a similarly reliable manner and hence do not carry the same inherent business risk.
2) Judgmental Sampling/Non-statistical sampling
The term judgmental sampling is not confined to samples in which the auditor actually exercises his or her judgment in selecting items; it broadly includes all samples obtained by non-probability sampling methods.
Examples include pulling several charts from a pile of medical charts in a haphazard manner or picking records from a transaction list at will. “Judgmental samples, though not necessarily less accurate than probability samples do not have two important characteristics – estimation of the required sample size and of objective projection or evaluation of the sample results.”[ix]
Judgmental sampling often occurs on the basis of knowledge of a certain problem that is researched or audited. For example, based on experience, an auditor may know which types of claims are more prone to non-compliance or which types of transactions have had problems in the past.
Such samples can still probe into certain areas or spot-check the functioning of internal controls.[x] Another benefit is that they are typically less expensive to assemble and hence may be favored over probability samples in certain audit situations.
To conclude, both types of sampling have their role in auditing and monitoring. Compliance Officers should make sure they understand which type of sampling, if any, is applied in an audit, and which type they prefer or should request in requests for proposal when they outsource evaluations that involve sampling. Choosing the right type for each purpose leads to an effective use of statistics in healthcare compliance.
Interval Estimates, Hypothesis Tests and Pattern Recognition
Sampling a subset of a population to draw conclusions about it is one of the most frequent applications of classical data analysis. For example, a large batch of medical records may include inappropriate payments, e.g., overpayments; sampling a subset of these records rather than reviewing the whole population can yield a reliable estimate of the population overpayment value at a much lower cost. The overpayment values found in the sample can be used to compute a Point Estimate of the overpayment amount for the whole population.[xi]
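The Point Estimate can be sketched in Python as follows, assuming either a simple random sample (population size times the average sampled overpayment) or, for a stratified design, the rule given in note [xi]: the sum across strata of the stratum size times the average overpayment per sampled unit in that stratum. The overpayment values, stratum sizes and function names are hypothetical.

```python
def srs_point_estimate(sample_overpayments, population_size):
    """Mean-per-unit estimate: population size times the average sampled overpayment."""
    return population_size * sum(sample_overpayments) / len(sample_overpayments)

def stratified_point_estimate(strata):
    """Sum over strata of N_h times the average overpayment per sampled unit in stratum h.

    `strata` is a list of (N_h, [overpayments of the units sampled in stratum h])."""
    return sum(n_h * sum(s) / len(s) for n_h, s in strata)

# Hypothetical: 10 of 2,000 claims sampled; overpayment ($) found in each
sample = [0, 0, 120, 0, 45, 0, 0, 300, 0, 35]
print(srs_point_estimate(sample, 2000))

# Hypothetical stratified design: (claims in stratum, sampled overpayments)
strata = [(1500, [0, 20, 0, 40]), (400, [100, 0, 200]), (100, [500, 700])]
print(stratified_point_estimate(strata))
```

Here the simple estimate is $100,000 and the stratified estimate is $122,500. Each is only a point estimate and says nothing by itself about how far the true total might be from it.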
However, because different samples of the same population generate different point estimates, statements about the population based on sample outcomes carry a degree of uncertainty that needs to be qualified. Two standard ways of addressing the uncertainty introduced by sampling variability are: 1) constructing a confidence interval, defined by the confidence level and the desired precision level; and 2) performing hypothesis tests.
A confidence interval expands the point estimate into a range of values that defines an interval estimate. An interval estimate quantifies the uncertainty in the sample estimate by computing lower and upper limits with a procedure that, at a given confidence level (probability), yields an interval containing the true population parameter; the half-width of the interval, often expressed relative to the point estimate, is its precision.
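A minimal sketch of an interval estimate for a total overpayment follows, assuming a simple random sample, a normal approximation (z rather than Student's t) and a finite population correction. Contractors' procedures may differ in these choices, and heavily skewed overpayment data can make the normal approximation poor. Precision is reported here as the half-width divided by the point estimate; the data and the function name are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def total_confidence_interval(sample, population_size, confidence=0.90):
    """Two-sided interval for a population total from a simple random sample.

    Uses a normal approximation and a finite population correction.
    Returns (lower, point, upper, relative_precision)."""
    n, N = len(sample), population_size
    point = N * statistics.fmean(sample)                  # mean-per-unit point estimate
    se = N * statistics.stdev(sample) / math.sqrt(n) * math.sqrt(1 - n / N)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # about 1.645 for 90%
    half_width = z * se
    precision = half_width / point if point else float("inf")
    return point - half_width, point, point + half_width, precision

# Hypothetical overpayments ($) found in 10 sampled claims out of 2,000
overpayments = [0, 0, 120, 0, 45, 0, 0, 300, 0, 35]
lower, point, upper, precision = total_confidence_interval(overpayments, 2000)
print(f"90% CI: {lower:,.0f} to {upper:,.0f} around {point:,.0f} (precision {precision:.0%})")
```

With only ten sampled claims the interval is very wide (a precision close to 100% of the point estimate), which shows why sample size and stated precision goals matter.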
The uncertainty associated with drawing conclusions about the population based on sample estimated values can also be addressed by hypothesis tests. A hypothesis test basically attempts to refute a specific statement about a population parameter based on the sample data. Hypotheses are conjectures about the population parameter, not about the sample estimate. Considering the example above of a sample extracted to estimate the total overpayment misstatement of a large batch of medical records, typical hypothesis tests could be:
- Is the total overpayment less than $10,000?
- Is the total overpayment equal to $25,000?
A rejection of a hypothesis leads to the conclusion that it is false, but the non-rejection (“acceptance”) of a hypothesis does not mean that it is true: the only possible conclusion is that there is no evidence to believe otherwise. Hypothesis tests are usually stated in terms of both a condition that is doubted (null hypothesis) and a condition that is believed (alternative hypothesis).
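The first question above can be framed as a null hypothesis that the total overpayment is at least $10,000, tested against the alternative that it is less. The sketch below computes a one-sided z statistic for the estimated total, assuming a simple random sample, a normal approximation, a finite population correction and a 5% significance level; these are illustrative choices, and the data and function name are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def one_sided_total_test(sample, population_size, threshold, alpha=0.05):
    """One-sided test of H0: total >= threshold against H1: total < threshold.

    Uses the mean-per-unit estimate of the total, its standard error with a
    finite population correction, and a normal approximation."""
    n, N = len(sample), population_size
    point = N * statistics.fmean(sample)
    se = N * statistics.stdev(sample) / math.sqrt(n) * math.sqrt(1 - n / N)
    z = (point - threshold) / se
    p_value = NormalDist().cdf(z)   # chance of an estimate this low if the total equals the threshold
    return z, p_value, p_value < alpha

# Hypothetical: 40 claims sampled from 500; 10 overpaid by $40 each, 30 correct
sample = [40] * 10 + [0] * 30
print(one_sided_total_test(sample, 500, threshold=10_000))  # rejects H0
print(one_sided_total_test(sample, 500, threshold=5_500))   # does not reject H0
```

With these data the estimated total is $5,000. The test rejects the null hypothesis at $10,000 but not at $5,500; in the second case the conclusion is not that the total is at least $5,500, only that the sample gives no evidence otherwise.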
Pattern recognition is another tool that statistics offers to healthcare compliance. Although pattern recognition is usually associated with Computer Science/Machine Learning and, more recently, with Data Mining, Classical Data Analysis practitioners have long used the “learning” or “training” component typical of a number of DM applications without employing the related terminology. The logistic regression (LR) used as a predictive model to score the probability of default by credit card applicants (discussed in a previous section) qualifies as an example of training a structure, by fitting the LR to a sample representative of the population, to score previously unclassified data (pattern recognition).
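The two steps, training and scoring, can be sketched with a logistic regression fitted by plain batch gradient ascent on the log-likelihood. Everything here is an assumption for illustration: one hypothetical predictor (e.g., a provider risk indicator), a small hypothetical training set in which 1 marks a record later found in error, and the learning rate and iteration count. Statistical packages fit LR by maximum likelihood with more robust algorithms; this sketch only shows the idea.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Score: estimated probability that a record is in error (label 1)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_logistic(X, y, lr=0.1, epochs=20000):
    """Step 1: fit weights [intercept, slopes...] by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)            # residual on the probability scale
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical training sample: one risk indicator per record, 1 = found in error
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w = train_logistic(X, y)

# Step 2: score previously unclassified records
for new_record in ([1.2], [3.2]):
    print(new_record, round(predict(w, new_record), 3))
print("odds ratio per unit of the indicator:", round(math.exp(w[1]), 2))
```

The exponentiated slope is the odds ratio: the factor by which the odds of an error change for each unit increase in the predictor.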
Because DM encompasses areas of knowledge other than Statistics, however, its need for operational definitions of pattern recognition and machine learning is more critical. Here are some definitions quoted from a comprehensive text on Data Mining:[xii]
- Pattern recognition is a topic that is closely related to machine learning, and many of the same techniques apply.
- Machine learning provides the technical basis of data mining. It is used to extract information from the raw data in databases—information that is expressed in a comprehensible form and can be used for a variety of purposes
- Besides mining methods focused on prediction, we are equally interested (perhaps more) in applications in which the result of “learning” is an actual description of a structure that can be used to classify examples
- In our experience, insights gained by the applications’ users are of most interest in the majority of practical data mining applications; indeed, this is one of machine learning’s major advantages over classical modeling methods of statistics.
Still, it is worth emphasizing once more that in the Classical Data Analysis environment pattern recognition is often employed but not characterized as such. Another example, in the context of auditing, is the suggested deployment of a linear regression model to estimate ratios to be used in Analytical Procedures (APs).
This approach follows the two-step procedure described above: fit a model to data (train the model) and score/classify new data (recognize patterns).[xiii]
Improving Internal Controls Using Statistical Techniques
Monitoring versus Auditing
Statistics in healthcare supports “Auditing and monitoring,” one of the seven elements of HHS OIG corporate compliance programs, which are in turn based on the US Sentencing Commission’s Guidelines.[xiv] The terms are often used in tandem, but they are nevertheless distinct concepts. Marshall, Schwartz, and Kinman summarize monitoring in the context of corporate compliance programs:[xv]
“Once developed and implemented, the compliance program must be constantly monitored to ensure its effectiveness in preventing and detecting violations of law. Thus the dual objectives of the compliance monitoring function are to ensure that compliance programs are being implemented and to monitor the effectiveness of these programs. Monitoring of the compliance program is fundamental to the due diligence that organizations must exercise under the federal sentencing guidelines.”
Auditing and monitoring are crucial elements in establishing and maintaining an effective system of internal control. Auditing and monitoring can also be considered part of risk management activities designed to protect the organization and its workforce, customers and assets from risk or harm. The key characteristics distinguishing auditing and monitoring are independence, objectivity and frequency.[xvi]
Auditing and monitoring are both processes, where:
- Monitoring is an ongoing, typically routine activity that provides the primary mechanism for ensuring that effective compliance is achieved. It consists of evaluation activities performed on a routine or continuous basis by individuals who may not be independent of the process (quality assurance function).
- Auditing focuses on testing whether monitoring actually works as it should. Auditing involves testing, spot-checking, and inspections and is typically done at a point in time, either on a routine schedule or non-routinely whenever adverse events trigger an audit. A key feature of auditing is that it must be independent and objective.
The types of control[xvii] that monitoring employs can be categorized into:
- Steering controls. These work through the identification of events that prompt the organization to take interim actions contributing to the larger objectives. Their common characteristic is that they alert the organization to the need for some managerial action.
- Yes-no controls. These are designed to function more automatically to ensure desired results. Their common element is a pre-established control device or arrangement that, under normal conditions, will more or less assure the desired protective or improved action. An example is a routine that checks whether or not a coder meets a target accuracy rate.
- Post-action controls. This type of control overlaps with the two above but is distinctive in that the managerial action comes some time later and takes the form of doing the best possible under the given circumstances. The action taken may be, for example, repairing damage to reputation, mitigating the effects of a breach, reporting and refunding an overpayment to CMS after a control failure, or taking corrective action against an employee.
Mandatory compliance programs will have to be designed so that they can demonstrate that the internal controls, i.e., the procedures that define compliant operational processes through standards, metrics, and thresholds or ranges of acceptability, are functioning properly. Verification audits, audit testing to assess risks, and ongoing monitoring of whether compliance operations are working within expected parameters or metrics will be necessary.
Sampling can be used effectively to assess whether internal controls are meeting the organization’s stated goals and to manage risk according to risk tolerance levels. To test the internal controls of a process, e.g., claims submission, patient admission, or coding, auditors often rely on stated metrics. Metrics are also frequently used in quality assurance (QA), an internal monitoring function of providers and suppliers. A metric essentially provides a bound for a controlled process.
By using a sample of observations generated by the process, the auditor can assess whether the process is within the stated bounds. For example, a Health Information Management (HIM) department may have a coding accuracy target of 95% for its coders. A judgmental audit of a few charts selected haphazardly for an individual coder may quickly reveal some flaws in the coding process. Such a finding may trigger an audit or data analysis effort to investigate further whether there is actually a systemic problem or whether the observations were unusual outliers.
If the coder’s measured accuracy rate is to be representative, defensible and objective, however, a probability sample should be drawn and analyzed: only then can the auditor conclusively and objectively test the hypothesis that the coder meets, or deviates from, the target accuracy rate of 95%.
Furthermore, the sample size should be established according to pre-defined confidence levels for the test. As a rule of thumb, however, when sampling is used to test whether an observed metric deviates significantly from a target metric for a large universe of observations, the sample should contain at least 30 items.
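As a sketch of such a test, the code below applies a one-sided exact binomial test of the null hypothesis that the coder's accuracy is at least 95% against the alternative that it is lower. It assumes the sampled charts are a simple random sample from a large universe (so that the binomial model applies) and a 5% significance level; the chart counts and function names are hypothetical.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def accuracy_test(correct, n, target=0.95, alpha=0.05):
    """One-sided exact test of H0: accuracy >= target vs H1: accuracy < target."""
    p_value = binomial_cdf(correct, n, target)   # chance of this few correct charts if accuracy = target
    return p_value, p_value < alpha

print(accuracy_test(27, 30))    # 90% observed in 30 charts
print(accuracy_test(88, 100))   # 88% observed in 100 charts
```

With 27 of 30 charts coded correctly (90%), the p-value is about 0.19, so the shortfall is not statistically significant; with 88 of 100 correct, the test rejects the null hypothesis. Samples near the 30-item minimum can therefore detect only large deviations from a 95% target.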
In the context of compliance program effectiveness, many different processes exist, and various metrics may be used to define the organization’s compliant operational procedures and set internal control standards. Examples of compliance metrics are:
- Coding accuracy rate
- E/M Profile deviation from norm
- Compliance training participation rate
- Overpayment error rate
- Compliance rate for Notice of Privacy Practices provision to patients
- Conflict of Interest Form completion rate
- Occurrence rate for Point of Service (POS) errors in a claims data set
- Physician arrangements database completeness rate
- Claims denial rate
- Collection rate of claims billed
The Compliance Office may want to prepare a sampling strategy that relies on both judgmental and probability samples to assess if the metrics are generally met or need to be readjusted.
Things for the Compliance Office to Consider
While this article only touches on many of these concepts, it provides some background on what statistics, data analysis, and data mining are about in the context of healthcare auditing and monitoring. Some of the key insights when auditing or monitoring using samples are:
- Probability samples are powerful and are required for valid extrapolations, but they harbor some risk;
- While CMS contractors are required to use probability samples, these may not always be the first choice for an internal audit; judgmental samples may be the better choice in some cases;
- Data mining and data analysis are different techniques: data mining focuses on massive data sets and highly mathematical and sophisticated routines;
- Internal auditing and monitoring should rely on probability samples in order to arrive at definitive and objective conclusions about the population from which the items were sampled;
- Confidence and precision levels should be stated as part of the audit objective and are a way of hedging against the sample not being representative;
- Samples of fewer than 30 items don’t lend themselves to prediction/extrapolation; and
- Engagement partners and consultants should be specific as to what type of sampling they perform and if it can be used for valid extrapolations.
[i] “How Does Exploratory Data Analysis Differ from Classical Data Analysis?”, NIST/SEMATECH e-Handbook of Statistical Methods (Engineering Statistics Handbook)
[ii] Statement on Auditing Standards (SAS) No. 39, Audit Sampling, American Institute of Certified Public Accountants (AICPA)
[iii] Jerome H. Friedman, “Data Mining and Statistics: What’s the Connection?”, November 1997, Stanford University
[iv] For an analysis of data mining in SAS/STAT (statistical package) in contrast with SAS Enterprise Miner, see Patricia B. Cerrito, “Comparison of Enterprise Miner and SAS/STAT for Data Mining”, Paper DM100, University of Louisville, Louisville, KY
[v] Decision Tree Algorithm
[vi] The R Project for Statistical Computing
[vii] CMS Program Integrity Manual, Chapter 8, “Administrative Actions and Statistical Sampling for Overpayment Estimates,” see section 8.4.2
[ix] See Herbert Arkin, “Handbook of Sampling for Auditing and Accounting,” Third Edition, Prentice Hall (1983), p. 8
[xi] If a stratified random sampling design is used, the population point estimate is computed as the sum across all strata of the average overpayment value (per sampling unit) times the number of records in the stratum.
[xii] Ian H. Witten and Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques,” 2nd edition, Elsevier, 2005
[xiii] Neal B. Hitzig, “The Hidden Risks of Analytical Procedures: What WorldCom Revealed,” The Accounting Journal (Fraud Law Resources for Oregon and Washington)
[xv] Kenneth K. Marshall, R. Malcolm Schwartz and Brian J. Kinman (1998), Chapter 11: “Auditing and Monitoring Systems,” in: Jeffrey M. Kaplan, Joseph E. Murphy and Winthrop M. Swenson (1998), Compliance Programs and the Corporate Sentencing Guidelines, West Group
[xvi] Mark P. Ruppert, “Auditing and Monitoring – Defined”
[xvii] Victor Z. Brink and Herbert Witt (1982), “Modern Internal Auditing: Appraising Operations and Controls” (4th Edition), John Wiley and Sons, New York