The rapid growth in health care expenditures has prompted third-party payers, both governmental and private, to institute programs that try to control costs by restraining the use of health care services. These programs range from direct efforts to identify and discourage specific unnecessary services (e.g., prior review of proposed care) to financial incentives for providers and consumers to reduce services (e.g., capitated payments to health care providers and cost-sharing by patients). These steps, if successful, can not only control costs but also improve the quality of care by reducing exposure to iatrogenic illness and injury. However, these programs could also over-reach to discourage the provision or use of needed services.
Good criteria for assessing quality of care and for distinguishing appropriate from inappropriate care can operate at the intersection between cost and quality concerns in two major ways. On the one hand, they can strengthen the clinical basis for prior review activities aimed at detecting and avoiding unnecessary care. On the other hand, they can help to identify or to prevent the underuse of care that might be an undesirable side effect of review programs, financial incentives, and other methods of controlling costs. It was against this background that Congress mandated the Medicare quality assurance study and specified as one task the “development of prototype criteria and standards” for defining and measuring quality of care.
Developing quality-of-care criteria is not a simple task, and the results are not uniformly helpful. Criteria sets vary considerably in their method of development and their substance, depending on the objectives, focus, skills, and experience of their creators. Even when criteria sets have a basic approach and specific application in common (as described in the next sections of this chapter), their formulations may differ substantially in scope, explicitness, flexibility, and scientific support. Not surprisingly, criteria sets vary in their utility and acceptability.
A prerequisite for developing useful and acceptable quality-of-care criteria is a consensus on the characteristics of sound criteria sets and acceptable methods for constructing them. The immediate goal in this chapter is to propose a basis for such a consensus. The actual development and implementation of sound guidelines will require a commitment of considerable time, resources, and expertise over a period of years.
The Institute of Medicine (IOM) study committee believed that the best way for it to move toward a framework for developing sound criteria was to convene a panel of respected experts in guideline formulation from various organizations active in this field. The main purpose of the panel was to reach agreement on the desirable attributes of quality-of-care criteria. These attributes would be standards against which old or newly developed criteria could be compared and evaluated. The panel’s focus was thus on the formulation of “criteria for judging criteria” rather than on the endorsement of specific sets of criteria. Appendix A describes the composition and activities of the panel.
The remainder of this chapter discusses the conceptual issues presented to the panel. These included three types of quality-of-care criteria sets; the range of attributes and characteristics that might be considered desirable or necessary for such criteria sets; the uses to which such criteria sets can be put in a quality assurance context (such as education or quality review); and the key attributes for such criteria sets.
Producing criteria sets that meet the standards proposed by the panel calls for a complex and sophisticated development strategy, or perhaps several strategies depending on the type of criteria set in question. Later sections of this chapter briefly discuss methods for developing criteria with particular emphasis on stages in the development process, priority-setting, and affordability.
Different kinds of criteria sets have evolved to meet different needs. The expert panel identified three broad types: appropriateness guidelines, patient care evaluation and management criteria, and case-finding screens. Each of these is discussed below.
Appropriateness guidelines describe accepted indications for using particular medical interventions and technologies, ranging from surgical procedures to diagnostic studies. Some guidelines specify under what circumstances a particular service is appropriate (indicated). For instance, one indication for colonoscopy might be lower gastrointestinal tract bleeding. Guidelines may also describe when an intervention is not indicated. One example might be performing a carotid endarterectomy on an asymptomatic patient when the carotid angiography shows stenosis of less than 50 percent (Merrick et al., 1986). Finally, guidelines may identify equivocal indications or areas of uncertainty where consideration might be given to complex or hard-to-enumerate patient factors or where different clinicians simply disagree. For example, the indications for an exercise test to detect coronary artery disease may be “equivocal” for asymptomatic male patients over age 40 in special occupations involving public transportation or safety, including pilots, railroad engineers, and police officers (American College of Cardiology, 1986a).
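The three-way distinction among indicated, not indicated, and equivocal uses can be pictured as a simple lookup. The sketch below is only illustrative: the class and function names are hypothetical, the entries are limited to the three examples mentioned in the text, and a real guideline would encode far richer clinical detail.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    INDICATED = "indicated"
    NOT_INDICATED = "not indicated"
    EQUIVOCAL = "equivocal"
    NOT_COVERED = "not covered by this guideline"


@dataclass(frozen=True)
class Indication:
    procedure: str
    finding: str
    rating: Rating


# Hypothetical encoding of the three examples cited in the text.
GUIDELINE = [
    Indication("colonoscopy", "lower GI tract bleeding", Rating.INDICATED),
    Indication("carotid endarterectomy",
               "asymptomatic, stenosis under 50 percent",
               Rating.NOT_INDICATED),
    Indication("exercise test",
               "asymptomatic male over 40, public-safety occupation",
               Rating.EQUIVOCAL),
]


def rate(procedure: str, finding: str, guideline=GUIDELINE) -> Rating:
    """Return the guideline's rating, or NOT_COVERED when it is silent."""
    for entry in guideline:
        if entry.procedure == procedure and entry.finding == finding:
            return entry.rating
    return Rating.NOT_COVERED
```

Returning an explicit "not covered" value, rather than defaulting to "not indicated," keeps a guideline's silence from being read as a judgment of inappropriateness.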
Appropriateness is an integral part of quality health care (Brook, 1988b; Greenfield, 1988). In this context appropriateness generally means that the service in question has demonstrated clinical benefit for a particular indication and that the likelihood of benefits outweighs the likelihood of harm. Good quality care does not include surgery or other services that are technically flawless but not indicated or necessary. Some experts have begun to explore whether economic costs ought to be factored into definitions of appropriateness, but there is no clear agreement on this point (Paterson, 1988; see also the discussion in Chapter 1 of this report).
Many organizations formulate appropriateness guidelines. The National Institutes of Health (NIH) consensus conference represents one forum for guideline development (Kanouse et al., 1987; Kosecoff et al., 1987; PPRC, 1988a). Particularly active in recent years have been technology assessment committees of several medical specialty societies. The best known effort may be that of the Clinical Efficacy Assessment Project (CEAP) of the American College of Physicians (ACP) (Sox, 1987; Steinberg, 1988). In addition, endoscopy guidelines have been developed by the American Society for Gastrointestinal Endoscopy (ASGE, 1986) and guidelines for various cardiovascular procedures by the American College of Cardiology (ACC) and American Heart Association (AHA) Task Force (ACC, 1986a, 1986b).1
Third-party payers such as Blue Cross and Blue Shield Association (BCBSA) and Medicare also develop standards identifying appropriate indications for various medical procedures and technologies. Their primary purpose is to serve as bases for making coverage or utilization review decisions that are intended to control costs by reducing payments for inappropriate or unnecessary services. Examples of such criteria include the preadmission review criteria developed or adopted by the Medicare Peer Review Organizations (PROs) for specified procedures and the guidelines for appropriate use of selected medical technologies developed by the BCBSA Medical Necessity Program (IOM, 1988; Schaffarzick, 1988). The BCBSA has also supported the CEAP work and has worked with different specialty and research groups to develop guidelines that are disseminated to hospitals and physicians for educational and quality assurance purposes (not payment decisions).
Independent research organizations also have been involved in developing guidelines for the appropriate use of various medical technologies. One example is the RAND Corporation’s set of appropriateness criteria for coronary angiography, coronary artery bypass surgery, carotid endarterectomy, cholecystectomy, and diagnostic upper gastrointestinal endoscopy and colonoscopy2 (Chassin et al., 1986a, 1986b; Kahn et al., 1986a, 1986b; Merrick et al., 1986; Park et al., 1986; Solomon et al., 1986; Chassin et al., 1987; Chassin, 1988; Winslow et al., 1988a, 1988b).
A second type of criteria set has evolved to help assess or guide the management of particular outpatient or inpatient medical problems rather than use of a specific service or technology.3 These criteria sets often involve medical conditions that are characterized by ill-defined symptom complexes or that require multiple discrete clinical decisions over time. For example, they may define the range of appropriate services and care for problems such as hypertension, right lower quadrant pain, or post-operative fever, or they may specify various screening and preventive services.
A major challenge for evaluation and management criteria is variability in patients’ clinical status, sociodemographic characteristics, and treatment preferences. For example, the appropriate diagnostic work-up of right lower quadrant pain may differ for a young male with fever, a young woman with an intrauterine device, or a middle-aged woman with a history of irritable bowel syndrome. Similarly, the appropriate management strategy for an elderly person with Type II (adult onset) diabetes who has difficulty checking fingersticks (home tests for blood sugar levels) may differ from that for a more adept, medically sophisticated younger diabetic.
Traditionally, quality assurance criteria developed for evaluating patient care have dealt with this complexity and variability by identifying the minimum process elements for managing a particular condition.4 Beyond this minimum, the criteria allow for substantial clinical judgment about patient management activities. This strategy reflects, in part, a dearth of clinical research that would permit greater specificity and, in part, a lack of resources that would allow the developers of the criteria to be more precise.
Recently, computerized software systems have helped extend the use of patient management criteria by making it faster and simpler to match specific diagnostic or treatment steps to a variety of medical conditions. For example, the management of hypertension for a particular patient might be evaluated through on-line scoring of the patient’s medical record for compliance with criteria calling for documentation of a funduscopic examination of the eye, urinalysis, potassium measurement, dietary instruction, and medication for patients with diastolic blood pressure consistently over 100 mm Hg.
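A rough sketch of how such record scoring might work is shown below. It is not drawn from any actual system: the field names and the function are hypothetical, the medication criterion is assumed to apply only when the diastolic condition is met, and "consistently over 100 mm Hg" is interpreted, for illustration only, as every recorded diastolic reading exceeding 100.

```python
# Elements assumed to require documentation for every hypertensive patient.
REQUIRED_ELEMENTS = (
    "funduscopic_exam",
    "urinalysis",
    "potassium_measurement",
    "dietary_instruction",
)
DIASTOLIC_LIMIT_MM_HG = 100


def missing_elements(record: dict) -> list:
    """Return the criteria elements not documented in an abstracted record."""
    missing = [e for e in REQUIRED_ELEMENTS if not record.get(e, False)]

    # Assumption: "consistently" means all recorded readings exceed the limit.
    readings = record.get("diastolic_readings", [])
    consistently_high = bool(readings) and all(
        r > DIASTOLIC_LIMIT_MM_HG for r in readings
    )
    if consistently_high and not record.get("medication", False):
        missing.append("medication")
    return missing


# Example: everything documented except medication, with high readings.
example = {
    "funduscopic_exam": True,
    "urinalysis": True,
    "potassium_measurement": True,
    "dietary_instruction": True,
    "diastolic_readings": [104, 108, 102],
    "medication": False,
}
print(missing_elements(example))  # ['medication']
```

An empty result means the record meets these minimum process elements; it says nothing about the clinical judgment exercised beyond them.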
A related approach is represented by detailed algorithms, decision trees, or criteria maps that more comprehensively specify the steps for managing a problem (Greenfield et al., 1975, 1977, 1981; Stulbarg et al., 1985). Patient variability is addressed by constructing a “network,” diagram, or flow chart that helps the practitioner choose which of several alternate pathways provides the best fit between treatment options and patient characteristics. These algorithms represent optimal rather than minimal standards. Compared to the latter, they tend to be more difficult to develop and validate, and consensus may be harder to achieve. Complex algorithms may be difficult for practicing physicians to understand or accept. Even when physicians do understand the algorithms, they may not find them practical in normal clinical or (especially) crisis situations or for routine quality-of-care evaluations.
Case-finding screens identify potential quality-of-care problems that warrant further evaluation. These screens are objective, easily used, and often related to outcomes such as surgical complications. They trigger more in-depth analysis and peer review to confirm the presence of the problem and to detect remediable defects in processes of care at a particular institution or by a particular provider. Their relative ease of application makes them appealing for monitoring the effects of changes in provider organizational features, process of care, or payment methods.
One variety of case-finding screen is represented by hospital generic screens, sometimes called “occurrence screens.” These screens have traditionally focused on single, adverse, “sentinel” events, such as an unplanned return to an operating room (OTA, 1988). The PROs have used generic screens for several years (see Chapter 6). Their set includes occurrences or specific “flags” such as nosocomial infections, unexpected death, or a return to the intensive care unit (ICU) within 24 hours of discharge from the ICU. Hospital-wide process or outcome criteria are intended to be broadly applicable across clinical departments and specialties rather than specific to, say, a clinical department, the emergency room, or pediatric care. They have been adopted by the American Hospital Association as part of its “Integrated Quality Assurance” (IQA) program (Longo et al., 1989), which in turn is based on a complex IQA model developed by the Hospital Association of New York State.
Another variety of case-finding screen is represented by specialty-specific clinical indicators such as those being developed by the Joint Commission on Accreditation of Healthcare Organizations (Joint Commission) (Lehmann, 1989; Marder, 1989; Winchester, 1989). Like generic screens, these indicators can consist of sentinel events that trigger more in-depth review. Unlike generic screens, however, they are specific to a particular specialty, type of procedure, or clinical system for delivering care. One such sentinel event in obstetrics, for instance, is the delivery by planned cesarean section of an infant weighing less than 2500 grams or one with hyaline membrane disease. The indicator may be either an adverse outcome that is linked to a process under the practitioner’s or institution’s control or a process that has been clearly associated with an adverse outcome (Lehmann, 1989).
More complicated, less easily applied versions of screens also exist. With “threshold” criteria, the trigger is not a specific event, but a rate of events above or below a defined level; for example, more than a 10-percent rate of appendectomies where the appendix is normal. Other screens involve failure to follow up abnormal results of laboratory tests or diagnostic studies (for example, positive blood cultures, suspicious shadows on radiographic films, or abnormal Papanicolaou [Pap] smears). Hospital admissions for conditions that could indicate poor ambulatory care are a newer focus. The 13 sentinel conditions discussed in Chapter 6 (for example, diabetic complications and malignant neoplasm of the genitourinary organs) in the Third Scope of Work for PROs constitute another example.
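A threshold criterion of this kind reduces to comparing an observed rate with a defined level. The following minimal sketch assumes hypothetical names and uses the 10-percent normal-appendix example from the text; the counts in the usage lines are invented for illustration.

```python
def flags_review(events: int, cases: int, level: float,
                 direction: str = "above") -> bool:
    """Return True when the event rate crosses the defined level.

    direction="above" flags rates strictly greater than the level;
    direction="below" flags rates strictly less than the level.
    """
    if cases <= 0:
        raise ValueError("cases must be a positive count")
    if direction not in ("above", "below"):
        raise ValueError("direction must be 'above' or 'below'")
    rate = events / cases
    return rate > level if direction == "above" else rate < level


# Illustrative counts: 12 normal appendices among 100 appendectomies.
print(flags_review(12, 100, 0.10))  # True: the rate exceeds 10 percent
print(flags_review(10, 100, 0.10))  # False: exactly 10 percent is not "more than"
```

Because the trigger is a rate rather than a single event, such a screen is meaningful only when the number of cases is large enough for the rate to be stable; that judgment lies outside this sketch.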
Since the release of hospital mortality rates by the Health Care Financing Administration (HCFA) beginning in 1986, researchers and others have focused on using aggregate mortality rates or aggregate rates of other adverse occurrences to screen institutions or patient populations (with adjustments for severity of case mix) and flag possible institutional quality-of-care problems. This approach has generated a considerable literature in a comparatively short time (Dubois et al., 1987a, 1987b; Dubois, 1989; Daley et al., 1988; Jencks et al., 1988; Kahn et al., 1988; OTA, 1988; Chassin et al., 1989; Ente and Lloyd, 1989; Fink et al., 1989; Hannan et al., 1989) and is reviewed more thoroughly in Chapter 6, Volume II.
The above classifications do not imply that these groupings are mutually exclusive or in conflict. Criteria sets can be difficult to categorize, and it is probably not productive to draw distinctions too finely. The labels are less important than the purposes for which criteria sets are used, individually or together.
Case-finding screens, for instance, can be used in conjunction with either appropriateness guidelines or patient evaluation and management criteria. Screens are an initial, easily applied mechanism to locate cases for more detailed review. Items included in such screens could be selected from the more easily identified or discrete elements of a set of appropriateness or patient management guidelines. Cases failing the screen would then receive in-depth review against the more detailed guidelines, thus linking these different types of criteria sets in a review continuum.
For example, one element of an evaluation-management criteria set for hypertension, such as documentation in the medical record of a funduscopic examination of the eye, might serve as a case-finding screen that nonphysician reviewers could apply. Regardless of the element or elements used as screens, the in-depth review might draw on the complete set of evaluation criteria. Traditionally, in-depth assessment has consisted of subjective “implicit” review by peer physician reviewers, but evaluation guidelines might well serve as an objective aid to their efforts.
Traditional screens, whether based on sentinel adverse occurrences or elements of the process of care, have focused more on misuse of medical technology in the sense of poor technical quality than on problems of overuse or underuse of technology. Likewise, outcome data used to screen for statistical outliers are directed primarily at poor technical quality of services rather than at overuse or underuse of care. Screens adapted from appropriateness guidelines might complement existing screens by focusing on services performed for a clearly inappropriate indication.
All these criteria sets can be used in different contexts for different purposes. Three major purposes are to educate practitioners, to educate and empower consumers, and to establish minimum standards of care for use in quality-of-care review. Such reviews may be prospective, concurrent, or retrospective.
Third-party payers and others hope that certain types of criteria, especially appropriateness guidelines, can be used to reduce the costs of medical care. This study, however, differentiates quality of care from cost containment. To the extent that using such criteria reduces overuse of care, costs may be lower. The application of criteria also may identify underuse of services, and this could increase expenditures, at least in the short term. This chapter and this report focus on the quality-of-care applications of criteria sets, not their uses for cost control.
The desirable characteristics of a criteria set may vary somewhat according to its use. For example, guidelines used to educate health professionals almost surely need to be different from those used to review care. Complex, comprehensive algorithmic criteria useful for educational purposes might be difficult to apply in the emergency care of acutely ill patients (when speed and parsimony are important) or in retrospective review (where brevity is desirable).5 Criteria for retrospective review may need to be different in some respects from those used for concurrent or prospective review. The same may be true for criteria for internal versus external review.
Greenfield (1989) discusses the differences between prospective and retrospective algorithms. Prospective algorithms are directive because care has not yet been rendered. They must be logically complete, include rare diagnoses and unlikely events, have a narrower range of options, and be independent of medical records. Retrospective algorithms, by contrast, review care already delivered. They tend to be used as screens for further review and thus do not need to be logically complete. They have a more extensive range of options to allow for variation in clinical practice, and they depend on information documented in the medical record.
The features of a criteria set may also differ by level of review, such as whether the set is to be used for the initial screening by nonphysician reviewers or for in-depth physician review. If a criteria set is intended to support making judgments about individuals, individual cases, or individual episodes of care, then several attributes such as sensitivity, specificity, reliability, and validity are much more important than if the criteria are simply going to physicians or to patients for educational purposes.
The desirable attributes of a criteria set will also vary according to the type of criteria set. For example, whether they are manageable for nonphysician reviewers or whether they are easy to adapt for use by computer may be especially important for case-finding screens. By contrast, whether criteria sets have built-in flexibility or are demonstrably acceptable to professionals may be more important for technology-specific or patient management guidelines.
As a starting point for discussing desirable attributes of criteria sets, the IOM staff prepared an extensive list of possible attributes based on review of the limited literature on guideline development and technology assessment (Eddy, 1987, 1988, forthcoming; Brook, 1988a, 1988b; Greenfield, 1988; Lewin and Erickson, 1988; PPRC, 1988a, 1988b, 1989; Brook et al., 1989). The final list of general attributes as modified by the expert panel appears in Table 10.1. This section defines the basic concepts behind the short labels for attributes that are used to simplify discussion.
Attributes can usefully be divided into two basic categories: substantive (or structural) attributes and implementation (or process) attributes. Substantive attributes relate to inherent characteristics of a criteria set. Implementation attributes focus on the processes of developing and applying a criteria set.
In the category of substantive attributes are concepts such as sensitivity, specificity, and predictive value. Sensitivity refers technically to the likelihood that a case will be identified as deficient given that it really is deficient, where deficient care is measured by some outside “gold” standard that reviews all care provided. Specificity refers to the likelihood that truly good care will be identified as good. The term predictive value is defined as the proportion of cases identified by screens or other criteria as presenting quality problems that subsequently prove to be true quality problems. It takes into account the prevalence of the quality problem being investigated as well as the screen’s sensitivity and specificity.6 The traditional computational definitions are shown in Figure 10.1.
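In standard notation, these three measures can be written as follows. The symbols are labels introduced here for illustration and are not taken from the report: TP denotes cases of deficient care that the screen flags, FN deficient cases it misses, FP cases of adequate care that it flags, and TN adequate cases that it passes; p is the prevalence of deficient care among the cases screened.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, &
\text{Predictive value} &= \frac{TP}{TP + FP}.
\end{align*}

% Predictive value expressed through prevalence p,
% sensitivity Se, and specificity Sp (Bayes' rule):
\[
\text{Predictive value}
  = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}.
\]
```

The second form makes the dependence on prevalence explicit: for a fixed sensitivity and specificity, the predictive value falls as the quality problem becomes rarer.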
These terms have generally been used in the context of case-finding screens to measure how frequently the screen detects cases of deficient care for further review (sensitivity) while passing over cases of adequate care without triggering review (specificity). A screen or criterion has poor specificity if it flags a lot of cases for review when the care was satisfactory. This wastes time and money and leads to considerable frustration on the part of reviewers. Conversely, a screen or criterion has low sensitivity if it misses a lot of cases where care was poor. This means it is ineffective for its intended purpose. (Both these criticisms have been leveled at the case-finding generic screens used by the Medicare PROs; see Chapter 6.)
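The cost of poor specificity is easiest to see with numbers. The sketch below uses the standard Bayes relationship among sensitivity, specificity, prevalence, and predictive value; the function name and the figures are hypothetical and are not drawn from PRO data.

```python
def predictive_value(sensitivity: float, specificity: float,
                     prevalence: float) -> float:
    """Share of flagged cases that are true quality problems."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("screen flags no cases; predictive value undefined")
    return true_pos / (true_pos + false_pos)


# Hypothetical screen: 90 percent sensitive and 90 percent specific.
# When 5 percent of cases involve deficient care, only about a third
# of flagged cases turn out to be true problems.
print(round(predictive_value(0.90, 0.90, 0.05), 3))  # 0.321

# Raising specificity to 99 percent sharply reduces wasted review effort.
print(round(predictive_value(0.90, 0.99, 0.05), 3))  # 0.826
```

Under these assumed figures, roughly two of every three in-depth reviews triggered by the less specific screen would find satisfactory care, which is the kind of wasted effort described above.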
Sensitivity and specificity are also important attributes for technology-specific or evaluation-management guidelines. Indeed, with some modifications, these concepts can be applied to all three types of criteria sets. The sensitivity of technology-specific and patient management guidelines refers to their ability to detect and deal with all potential cases of inappropriate or deficient care. A highly sensitive guideline, when applied, should identify most cases of inappropriate or poor-quality care. For instance, in retrospective review of care of patients with chest pain, sensitivity refers to the likelihood that the quality measure correctly identifies deficient care if the physician does not follow an indicated step in the guidelines, such as admitting a patient in cardiac shock.
Reliability requires that a criteria set be appropriate for, and generate consistent results for, all user groups for which it is intended, and that it do so time and time again. Reliable criteria relating to, for instance, a cardiovascular procedure or problem must produce the same decisions or evaluation whether applied in the cardiology department of a university hospital or in a community setting. Similarly, case-finding criteria should identify the same kinds of cases regardless of who uses them.
TABLE 10.1 General Attributes of Criteria Sets: Final List
Attribute | Definition or Explanation |
Substantive and Structural Attributes | |
Sensitivity | High “true positive rate” in detecting deficient or inappropriate care |
Specificity | High “true negative rate” in passing over cases of adequate care |
Reliability | Known to produce same decisions or evaluations when applied by the user groups for which the criteria set is intended |
Validity | Based on outcome studies or other scientific evidence of effectiveness |
Documentation | A. Documents methods of development and cites literature (including estimates of outcomes) B. Documents how reliability was established |
Patient Responsiveness | Allows for eliciting or taking account of patient preferences |
Flexibility | Respects the role of clinical judgment, with “clinical judgment” explicable |
Clinical Adaptability | Allows for or takes into consideration clinically relevant differences among different classes of patients; population to which criteria apply is specified |
Inclusiveness | Covers all major foreseeable clinical situations and full range of clinical problems |
Concordance | Reflects consensus of professionals with extensive experience in field, with input from academic and nonacademic practitioners, generalists and specialists |
Acceptability | Acceptable to majority of professionals |
Clarity | Written in unambiguous language; terms, populations, data elements, and collection approach clearly defined |
Appropriateness | Specifies appropriate, inappropriate, and equivocal indications (procedure and technology appropriateness guidelines) |
Implementation and Process Attributes | |
Pretesting | Guidelines are tested before implementation |
Dynamism | Mechanism and commitment exist for reviewing and updating criteria sets to incorporate new information and cover new situations |
Evaluation | Mechanism exists to review and evaluate outcome or impact of guidelines |
Comprehendability | A. Format understood by nonphysician reviewers B. Format understood by practitioners C. Format easily understood by patients/consumers |
FIGURE 10.1 Computational Definitions of Sensitivity, Specificity, and Predictive Value
For the purposes of this discussion, the attribute validity relates to outcomes and scientific evidence of effectiveness (if not efficacy). That is, a criteria set that contradicts clinical research data on effectiveness of health services, ignores them, or misuses them is not valid (and can be useless or even harmful). A valid criteria set should be based on or related to studies of patient outcomes or other scientific evidence of effectiveness to the extent such evidence is available.
This definition can be expanded to make more explicit the type of evidence used (e.g., randomized clinical trials or expert consensus) or to grade the quality of the supporting evidence. The type of evidence that is appropriate or available may vary considerably depending on the particular technology or problem in question. Some medical interventions have been heavily investigated, for instance, aspirin in the prevention of stroke in patients with transient ischemic attacks (TIAs) and coronary artery bypass grafting in patients with angina; others have not, for example, hysterectomy. Where the criteria begin with an outcome (as in many case-finding screens), there should be an effort to link the outcome to an identified process.
Whatever form the supporting evidence might take, the nature of that evidence should be made very explicit because users need to be able to assess the product. Thus, an important aspect of a criteria set’s validity is clear documentation. Documentation should include (A) information about how the criteria were developed, including the literature on which they are based and what is known or not known about expected outcomes, and (B) how reliability of the criteria set was established.
Patient responsiveness refers to whether the guidelines have some mechanism for eliciting and taking into account patient preferences and values. In some chronic conditions, such as prostatic hypertrophy, it is not clinically obvious whether surgical intervention is the best course. For some patients, the probability of undesirable side effects of surgery might outweigh the symptoms. Patient values should be included, for instance, in decisions to forgo potentially disabling or disfiguring interventions such as life-prolonging chemotherapy that may cause blindness as a side effect.
To make certain that guidelines are responsive to patient values and preferences, one might incorporate what some call patient decision nodes. For example, for an individual with chronic obstructive pulmonary disease, the node might be whether the use of a respirator had been discussed with the patient. For others, the patient-responsiveness elements might be reflected in a footnote listing reasons for making an exception to the guidelines. For instance, although a given medicine might prolong the life of a patient with acquired immunodeficiency syndrome, it may cause blindness and be refused by the patient. Eliciting patient preferences is not routine at present, but it is a worthwhile goal that is consistent with the overall strategy for quality assurance advanced in this report.
Flexibility reflects the extent to which a criteria set identifies and specifies exceptions to criteria. Quality review can be seen as a three-step process: (1) application of an initial screen, (2) in-depth or peer review, and (3) appeal. Flexibility may not be an important feature of an initial screen applied as a case-finding mechanism; the criteria at this initial stage should be clear (although exceptions may be stated). Flexibility is, however, an important aspect of the secondary, in-depth review process triggered by the screen, and substantial allowance should be made for clinical judgment at this stage. The less specific the guidelines used to aid secondary review, the greater the role of clinical judgment.
Particularly in areas of uncertainty in clinical medicine, greater leeway in decision making should be afforded to the practitioner. In the case of criteria sets for clinical management, psychosocial and interpersonal concerns justify preserving considerable discretion for clinical judgment. Too much leeway for clinical judgment, however, can undermine the criteria-setting effort. For this reason, guidelines should anticipate and make provision for the more common clinical exceptions and variations. When clinical judgment is invoked, the reasoning should be accurately described by the practitioner or reviewer. Such a situation might occur when an idiosyncratic patient factor or an extenuating circumstance that had not been taken into account in the criteria interacts with the characteristics of a specific patient to make that case an exception.
Whereas flexibility deals with the more idiosyncratic cases, clinical adaptability means that the criteria set takes into account the predictable clinically relevant differences among classes of patients. The criteria set should then specify the classes to which it is intended to apply. These classes may be based on age, sex, diagnosis, surgical risk, problem severity, or other factors.
Clinical adaptability is distinct from inclusiveness, which means that the criteria set applies to a large proportion of the patients to which it is addressed. For example, inclusive carotid endarterectomy guidelines would not be limited to assessing the need for that procedure in patients with transient ischemic attacks, but rather would address the full range of potential indications, from asymptomatic carotid bruits to strokes in evolution. Inclusiveness further implies that criteria sets collectively should cover the full range of surgical and medical problems encountered in the hospital or other setting being reviewed.
Several of the foregoing characteristics (in particular, patient responsiveness, flexibility, and clinical adaptability) underscore the point that criteria cannot be unremittingly rigid. If exceptions cannot be specified in the criteria set itself, then there must be an opportunity for recourse or appeal in special cases. The panel characterized this concept of appealability as an implementation or process attribute, rather than an inherent characteristic of a criteria set per se, and it is therefore discussed later.
Concordance can be considered an attribute either of substance or of implementation. It embodies the important concept that guideline development should reflect a consensus of professionals with extensive and appropriate clinical experience. The body that formulates guidelines should not be limited to “experts,” but should have input from experienced generalists as well as specialists and from nonacademic as well as academic practitioners. This should be reflected in the documentation for the criteria set.
Acceptability refers to whether the guidelines are satisfactory and credible—that is, seem to have at least face validity—to the professionals who will be using them. It is differentiated from concordance in that it focuses on acceptance by the target user group as opposed to those formulating the guidelines.
Clarity requires that criteria sets can be easily understood and consistently interpreted and applied. This is the everyday version of reliability as discussed above. Clarity calls for unambiguous language and specific definitions of the terms, data elements, and the target population. Terms such as “persistent abdominal distress despite appropriate therapy” are too vague to be meaningful in a list of appropriate endoscopy guidelines. Similarly, vague language in some carotid endarterectomy guidelines includes “transient speech dysfunction,” “altered body sensation,” and “angiography confirming an atherosclerotic lesion in the appropriate carotid artery.” Table 10.2 includes more such examples of unclear language and some suggested alternatives that are less ambiguous.
Appropriateness refers to whether the criteria explicitly describe (1) what actions are clearly appropriate, (2) where there is divergence of or absence of evidence, and (3) what actions are clearly inappropriate. Guidelines should indicate whether these distinctions are based on scientific evidence or a preponderance of expert opinion. In addition, alternative approaches that may be appropriate for diagnosing or treating a problem should be listed. The terms indicated, equivocal, and not indicated are viewed by some practitioners as more neutral and therefore preferable terms.7
Before implementing criteria of all sorts, formulators should pretest them on a small scale to determine their effect on providers and patients. Pretesting also provides an opportunity to modify language and format and thus improve reliability. Pretesting can provide information about the extent to which a perceived problem even exists.
TABLE 10.2 Examples of Vague and Clear Language for Criteria Sets
Vague | Clear |
Unexplained weight loss | Weight loss of >15 percent of body weight during the preceding 4 months |
Persistence of bleeding (or any sign or symptom) | Bleeding that continues for four or more months following initiation of therapy with oral contraceptives |
Severe bleeding | Drop in hematocrit of >6 percent in less than eight hours; Bleeding that has required transfusions on two or more occasions in the past six months; Blood volume depletion greater than 2,000 ml; Bleeding that requires the acute replacement of blood volume with two or more units of whole blood |
Appropriate trial of therapy | At least one month of treatment with aspirin at a dose of 325 mg every other day or more; Quadriceps strengthening exercises performed for at least 20 minutes per day, five days per week for at least six weeks; Treatment for 10 to 14 days with penicillin or erythromycin at a dose of 250 mg four times a day; Wearing a splint at least 12 hours a day for two months or more |
Upper abdominal distress | Epigastric pain occurring one-half to two hours after eating; Epigastric pain of more than one hour duration; Discomfort consistently related to eating certain foods |
Deteriorating (or unstable) vital signs | Blood pressure less than 90 mm Hg systolic, with a drop of at least 15 percent from average level of past 24 hours; Temperature >101° F; Pulse >120 |
Significant organ involvement | A process that causes loss of >20 percent of the function of an organ system (e.g., blood urea nitrogen, vital capacity, or left ventricular ejection fraction) |
Evaluation should be efficient yet thorough | Evaluation should include history of exposure, physical examination of systems at risk, and laboratory testing capable of changing diagnostic logic |
Positive stress test | A 1 mm or more horizontal or downsloping ST segment depression during exercise of patient who has normal electrocardiogram at rest |
The attribute dynamism emphasizes a commitment to and a mechanism for ongoing review and updating of the criteria. Where controversy is high or change is rapid, reassessments and revisions will need to be more frequent. Dynamism means building feedback into the implementation system. That is, guidelines or interpretive material need to be modified to accommodate new scientific findings and lessons learned about practitioners’ use of the criteria once they have been disseminated.
Closely related to dynamism is the need for periodic evaluation. A mechanism for evaluating the impact of criteria should be built into the implementation plan. Pretesting, dynamism, and evaluation all refer to the need for an iterative process for building and refining criteria sets.
Although the medical logic underlying guidelines may be very complex, their structure and elements should be comprehensible. For instance, complex algorithmic guidelines should probably read from top to bottom and from left to right.8 If criteria cannot be understood by the intended users, whether they are trained reviewers, practitioners, or patients, they are likely to be improperly used or not used at all.
Along the same lines, criteria should be manageable for physician and nonphysician reviewers and for practitioners. Particularly with procedure and management guidelines, the practitioner should be able to internalize the practice standards rather than have to refer constantly to the written criteria.
The information-gathering process should be as nonintrusive as possible. Clinicians frequently complain that existing utilization and pre-procedure review programs require them to spend an inordinate amount of time interacting with reviewers. Although prior review necessarily entails some such interaction, well-designed and pretested criteria sets should keep interactions with treating physicians to the essential minimum.
Appealability is the extent to which exceptions to even the best criteria can be allowed. The point here is that each patient is unique, and thus even highly valid, sensitive, and specific criteria will not be appropriate for every case in every conceivable situation. A means by which this patient uniqueness can be taken into account is essential.
Another desirable attribute of criteria sets is the relative ease with which information for review can be obtained, that is, feasibility. There can be a delicate balance between avoiding unduly burdensome data collection and vitiating the pressure needed to raise the quality and availability of information in medical records and other sources. Only when the cost of acquiring the information is clearly out of balance with the value of the information should the criteria set be designed to exclude those elements.
Two last dimensions of guideline implementations are computerization and executability. Computerization refers to construction of criteria sets so that they can be translated into a computerized format for use by reviewers or practitioners as appropriate.9 Guidelines are executable if they include specific instructions for implementation. This may require that a data collection format (abstracting form) be included. If a scoring system is used, the criteria set should include clear instructions for scoring and quantifying results. For instance, if several process-of-care evaluation criteria are to be combined into a single “index” score, the method of aggregation, whether simple addition or more complex arithmetic steps, should be described. If some variables are to be weighted more heavily than others, that must be clearly stated. Even if each variable in the index score is to be given equal weight (i.e., has a weight of 1), that should be specified.
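To illustrate what an explicitly documented aggregation rule might look like, the following is a minimal sketch in Python. It assumes a weighted average as the method of aggregation, which is only one of many possible choices; the function name and the criterion names are hypothetical and do not come from any actual criteria set.

```python
from typing import Mapping, Optional


def index_score(item_scores: Mapping[str, float],
                weights: Optional[Mapping[str, float]] = None) -> float:
    """Combine per-criterion scores into a single weighted-average index.

    If no weights are supplied, every criterion is given an explicit
    weight of 1, so that equal weighting is stated rather than implied.
    """
    if not item_scores:
        raise ValueError("at least one criterion score is required")
    if weights is None:
        weights = {name: 1.0 for name in item_scores}
    missing = set(item_scores) - set(weights)
    if missing:
        raise ValueError(f"no weight specified for: {sorted(missing)}")
    total_weight = sum(weights[name] for name in item_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(item_scores[name] * weights[name] for name in item_scores)
    return weighted_sum / total_weight


# Hypothetical process-of-care criteria, each scored 1 (met) or 0 (not met).
scores = {"history_documented": 1, "exam_performed": 1, "followup_scheduled": 0}

equal_weight_index = index_score(scores)
weighted_index = index_score(
    scores,
    {"history_documented": 1, "exam_performed": 2, "followup_scheduled": 1},
)
```

In this sketch a missing weight raises an error instead of silently defaulting, which mirrors the point that the weighting scheme should always be specified.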
Although each attribute defined in Table 10.1 is important, no criteria set is likely to conform equally well to all these requirements. Which, then, are the more critical?
As the expert panel considered this question, they arrived at somewhat different rankings for the three types of criteria sets, as reported in Table 10.3A. Each column in the table shows the attributes rated 4.5 or higher on a scale that rated 5 as most important and 1 as least important. (The data on which this table is based are found in Table A.4 of Appendix A. Because of possible interest in the “second tier” of desirable attributes, we have also included those attributes with ratings from 4.0 through 4.4 in Table 10.3B). The order of the attributes in the table reflects the frequency of their appearance across criteria sets, rather than how panelists rated them for each specific set.
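The cutoff rule described above can be sketched in a few lines of Python. This is only an illustration of the sorting logic (4.5 or higher for key attributes, 4.0 up to but not including 4.5 for the second tier); the attribute names and mean ratings below are made-up placeholders, not values from Table A.4.

```python
from typing import Dict, List, Mapping


def tier(mean_rating: float) -> str:
    """Assign a tier from a mean panel rating on the 1-to-5 scale."""
    if mean_rating >= 4.5:
        return "key"
    if mean_rating >= 4.0:
        return "second tier"
    return "below cutoff"


def tabulate(ratings: Mapping[str, float]) -> Dict[str, List[str]]:
    """Group attribute names by tier, preserving their input order."""
    tiers: Dict[str, List[str]] = {"key": [], "second tier": [], "below cutoff": []}
    for attribute, mean_rating in ratings.items():
        tiers[tier(mean_rating)].append(attribute)
    return tiers


# Placeholder ratings for illustration only.
example_ratings = {
    "attribute_a": 4.7,
    "attribute_b": 4.5,
    "attribute_c": 4.4,
    "attribute_d": 4.0,
    "attribute_e": 3.6,
}
tiers = tabulate(example_ratings)
```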
Clarity was identified as a key attribute for all three types of criteria sets, and panelists repeatedly stressed the need for clarity during their discussions. The generality, vagueness, and convoluted nature of many existing criteria undoubtedly contribute to the emphasis, first, on clear and complete but also simple and parsimonious definitions of terms and, second, on straightforward descriptions of target populations, medical conditions, and patient variables.
Validity (scientific evidence of effectiveness) was identified as the most important attribute for technology- and procedure-specific appropriateness guidelines. This evidence, insofar as possible, should include assessments of patient outcomes and comparisons of principal alternatives, for example, watchful waiting versus procedural intervention. When the evidence changes, the guidelines should be evaluated for continuing validity. The panelists indicated that many current criteria sets are not adequately validated and that validity must be emphasized as the field of criteria development progresses. In a related point, the panelists stressed that the evidence on which the guidelines are based must be clearly documented along with the processes by which criteria sets were created.
TABLE 10.3A Key Attributes of Criteria Sets, by Type of Criteria Set and Type of Attribute
Attributes | Appropriateness Guidelines | Evaluation and Management | Case-Finding Screens |
Substantive | Clarity; Validity; Sensitivity; Documentation-A(a) | Clarity; Validity; Reliability; Flexibility; Clinical Adaptability; Concordance | Clarity; Sensitivity |
Implementation | Appealability | Dynamism; Executability | Appealability; Comprehendability(b); Evaluation |
(a) Documents methods of development and cites literature. (b) Comprehendability by nonphysician reviewers.
NOTE: See Table 10.1 for definitions of attributes. Criteria in this table are listed in order of frequency with which they appeared in the three tables of criteria sets (e.g., for all three criteria sets, for only two, or for only one). The decision cutoff for inclusion in this table was a mean score [on a scale of 1 (least important) to 5 (most important)] equal to or greater than 4.5 in the second round of ratings by the expert panel (see Appendix, Table A.4).
Another substantive attribute the panelists emphasized was sensitivity. This, to repeat, is the ability to identify cases appropriate for intervention and to avoid cases where the intervention would produce little or no net benefit or might harm the patient.
Among the process attributes, the panelists saw appealability as most important. Even if a set of guidelines includes detailed provisions for different classes of patients and clinical situations, there must be a means for “real time” responses to questions about how guidelines should apply to patients and situations that fall outside these classifications.
TABLE 10.3B Attributes of Criteria Sets, Rated Lower Than the Key Attributes, by Type of Criteria Set and Type of Attribute
| Type of Criteria Set | ||
Attributes | Appropriateness Guidelines | Evaluation and Management | Case-Finding Screens |
Substantive |
|
|
|
| Clinical Adaptability Concordance Flexibility Reliability |
| Clinical Adaptability Concordance Flexibility Reliability |
|
Appropriateness | Appropriateness |
|
|
Documentation-Ba Specificity |
|
|
|
| Sensitivity Documentation-Ad |
|
Implementation |
|
|
|
| Comprehendabilityb Pretesting Dynamism | Comprehendabilityc Pretesting | Comprehendabilityb Pretesting Dynamism |
| Comprehendabilityc Evaluation | Comprehendabilityb Evaluation |
|
|
|
| Manageability, MD Manageability, non-MD |
(a) Documents how reliability was established. (b) Comprehendability by physicians. (c) Comprehendability by nonphysician reviewers. (d) Documents methods of development and cites literature.
NOTE: See Table 10.1 for definitions of attributes. Criteria in this table are listed in order of frequency with which they appeared in the three types of criteria sets (e.g., for all three criteria sets, for only two, or for only one). The decision cutoff for inclusion in this table was a mean score [on a scale of 1 (least important) to 5 (most important)] equal to or greater than 4.0 but less than 4.5 in the second round of ratings by the expert panel (see Appendix, Table A.4).
Clarity, flexibility, and clinical adaptability were rated as critical attributes for evaluation and management criteria, with reliability, validity, and concordance also rated as very important. For evaluation and management criteria sets, flexibility and clinical adaptability are central. The first attribute underscores the role of clinical judgment and expertise in making case-by-case evaluations of the care that individual practitioners render to individual patients. The second concept focuses on the need for the criteria sets to distinguish, when possible, the differences in classes of patients or clinical situations that warrant differences in the application of the criteria. Both attributes emphasize the continuing significance of peer review for the process of quality assurance.
The more advanced kinds of criteria sets can build in a considerable amount of flexibility and adaptability and thereby reduce the workload for peer reviewers. For instance, a simple and mechanically applied criteria set might flag the failure to do a funduscopic examination on a blind patient as a quality problem, and the peer review process would have to be activated to determine that no deficiency existed in fact. A more sophisticated criteria set would specify such foreseeable exceptions to basic evaluation criteria and allow reviewer judgment to be reserved for less straightforward cases.
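The difference between a mechanically applied criterion and one that specifies foreseeable exceptions can be sketched in Python as follows. This is a hypothetical illustration built on the funduscopic examination example; the `Criterion` class and the patient field names (`funduscopic_exam_done`, `blind`) are assumptions made for the sketch and are not drawn from any real review system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Patient = Dict[str, bool]


@dataclass
class Criterion:
    """An evaluation criterion with optional, explicitly listed exceptions."""
    name: str
    met: Callable[[Patient], bool]
    exceptions: List[Callable[[Patient], bool]] = field(default_factory=list)

    def flags(self, patient: Patient) -> bool:
        """Return True if the case should be referred for peer review."""
        # A stated exception means the criterion does not apply to this case.
        if any(applies(patient) for applies in self.exceptions):
            return False
        return not self.met(patient)


# Simple criterion: flag every case without a funduscopic examination.
simple = Criterion(
    name="funduscopic examination",
    met=lambda p: p["funduscopic_exam_done"],
)

# Refined criterion: same rule, with blindness listed as a foreseeable exception.
refined = Criterion(
    name="funduscopic examination",
    met=lambda p: p["funduscopic_exam_done"],
    exceptions=[lambda p: p["blind"]],
)

blind_patient = {"funduscopic_exam_done": False, "blind": True}
simple_flag = simple.flags(blind_patient)    # referred to peer review
refined_flag = refined.flags(blind_patient)  # exception applies, not referred
```

Under these assumptions, the refined criterion keeps the blind patient's record out of the peer review queue, leaving reviewer judgment for cases the criteria set could not anticipate.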
The emphasis placed on flexibility and adaptability may vary depending on whether one is developing and applying prospective education protocols or algorithms or working with retrospective assessments of actual care, where the need for flexibility in peer review may be especially significant. Although the panelists did not mention it explicitly, responsiveness to patient preferences and sensitivity to the physician-patient relationship are also aspects of good clinical judgment and peer review.
More generally, criteria sets must recognize valid alternative approaches to patient evaluation and treatment. By analogy to the technology- and procedure-specific guidelines, developers of criteria sets could categorize different approaches to patient evaluation and management as being appropriate, equivocal, or inappropriate. A more-or-less equivalent tactic would be to assign grades of superior, acceptable, or unacceptable to different combinations of patient evaluation and management steps. Some patient management problems might be assessed as a series of discrete decisions, rather than as an overall process. Thus, an assessment would focus not on “how to treat a patient with a headache” but rather on “appropriate indications for use of Fiorinal” and other specific services that might be combined to treat a headache. In all these situations, however, the criteria must reflect concordance among practitioners with clinical experience.
Among implementation attributes, dynamism and executability stand out as important for patient evaluation and management criteria. In many areas of care for the elderly, standards of care of a decade ago might serve quite well today. Nevertheless, the advance of knowledge in medicine is swift, and process-of-care criteria risk obsolescence if they are not systematically reviewed and updated as appropriate. Outdated criteria sets may yield no better or even worse decisions about the adequacy of patient care than implicit peer review alone.
Finally, the developers of evaluation and management criteria must provide clear guidance about how the criteria sets are to be used, especially how elements in the criteria sets are to be quantified, scored, weighted, and aggregated. This is particularly true when criteria sets will be used for formal quality-of-care evaluation rather than simply education. Review decisions that label care as exemplary, acceptable, or substandard can have a considerable impact on practitioners’ and institutions’ reputations and financial well-being.
Many types of case-finding mechanisms exist, including the kinds of generic screens used by PROs and the specialty-specific clinical (outcome) indicators developed by the Joint Commission. In this study, the focus is on the former. Other than clarity, the only critical attribute the expert panel identified for generic screens was sensitivity, intended here to mean “a high true positive rate in detecting deficient care.”
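As a worked illustration of sensitivity in this sense, the short Python sketch below computes the true positive rate of a screen, together with its companion measure, specificity (the true negative rate), from case-level results. It assumes that peer review has already established which cases involved deficient care; the function name and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(flagged: Sequence[bool],
                            deficient: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a screen.

    flagged[i]   -- True if the screen flagged case i
    deficient[i] -- True if in-depth review found care in case i deficient
    """
    if len(flagged) != len(deficient):
        raise ValueError("inputs must describe the same cases")
    pairs = list(zip(flagged, deficient))
    true_pos = sum(1 for f, d in pairs if f and d)
    false_neg = sum(1 for f, d in pairs if not f and d)
    true_neg = sum(1 for f, d in pairs if not f and not d)
    false_pos = sum(1 for f, d in pairs if f and not d)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one deficient and one adequate case")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Made-up sample: 4 deficient cases (3 flagged), 6 adequate cases (2 flagged).
deficient = [True] * 4 + [False] * 6
flagged = [True, True, True, False] + [True, True, False, False, False, False]
sens, spec = sensitivity_specificity(flagged, deficient)
```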
Because generic screens are meant to flag cases for in-depth review based on more rigorous criteria, it is not surprising that the panel emphasized the implementation aspects of case-finding screens more than their substance. Among the implementation attributes, appealability ranked high, reflecting the view that case-finding screens must be backed by a well-developed system of peer review that includes adequate avenues for appealing initial negative decisions. In addition, the case-finding screens must be comprehensible to nonphysician reviewers, and their usefulness must be periodically evaluated. The disappointing history of the generic screens in the Medicare program, as described in Chapters 6 and 9, probably accounts for the importance accorded these attributes.
Criteria development should follow several basic stages, which have been described in greater detail by Eddy (1988) and others (Chassin, 1988; PPRC, 1989). First, a group of experienced clinicians should be convened (in person or by telecommunication) to review the relevant literature and the existing criteria sets in the area under consideration. As previously indicated, this group should include generalists as well as specialists and nonacademic as well as academic practitioners. The best of the existing criteria sets can be used as a starting point, with modifications and refinements based on the literature and the attributes discussed above.
Because evidence on which the experts’ judgments are based should be clearly documented, a second requirement is a thorough literature review. Procedurally, this can and perhaps should precede the convocation of any expert group. The literature review must specify both where data exist and where they are lacking.
Whether a specific approach to the literature review should be used is not a settled matter. Opinions differ about the value of meta-analysis (a technique for quantitative synthesis of multiple studies with different methods and findings) over traditional literature reviews, and the merits of including unpublished information in the analysis are also debated. Selection of particular approaches will depend on the issue under consideration and the quality and extent of the available literature. It may not be productive to attempt to specify in advance rules as to the specific form and content of the literature review. Some experts suggest that the criteria produced through such a literature review and deliberation by a group of experienced practitioners should be circulated to a second set of experts for further review before any attempt is made to implement them; this suggestion, too, occasions debate.
Third, the criteria should be pilot-tested before they are put into general use. Fourth, a mechanism should be established to evaluate the impact of the criteria on patient care after general implementation. This might involve a small number of systematic evaluation projects (which conceivably could use a randomized controlled trial approach) as well as ongoing monitoring.
Whether provision should be made for some form of consumer input into the process of criteria development remains an open question (as does provision for payer input). If consumer representation is decided on, it might be most appropriate at the level of a steering or oversight committee that determined general strategy rather than at the technical level. Consumer representatives must be made familiar with the issues. As an alternative to direct consumer representation, provision might be made for seeking the assistance of public opinion specialists.
With respect to strategies for criteria development, in general, priority should be given to high-risk, high-cost, high-volume, or problem-prone services. Examples include carotid endarterectomy (high risk), liver transplant (high cost), Pap smears (high volume), or nursing home care (problem prone). Emphasis might be placed on guidelines for technologies or management problems that may result in serious outcomes such as death or severe disability, although these may be areas where a clear consensus on standards does not exist. The absence of broad agreement may require greater flexibility in criteria, but it does not argue against the adoption of criteria. In fact, guidelines may be most useful where practices are most divergent, partly because they will make the divergence more obvious and partly because they will highlight where practice and data conflict.
Nonetheless, a headlong rush to formulate guidelines for the sake of having guidelines is not desirable. Criteria development is a vital component of quality assurance, but the necessary time and resources must be committed to this process if criteria that truly enhance quality of care are to emerge.
The expert panel mentioned the need for cost-consciousness in developing and promulgating criteria. Resources to provide care, to do research, and to support quality assurance are scarce. The trade-offs can be difficult, and criteria development may have a lower priority. Thus, efficient mechanisms for building on existing criteria, disseminating and sharing criteria across organizations, and minimizing duplicate efforts are necessary. As with any endeavor in health care delivery, the value of the product—that is, the quality of the work in the face of its cost—needs to be optimal.
One must avoid, however, too narrow a view of the utility of good criteria and the contributions they can make to good patient care, research, and quality assurance. For instance, appropriateness guidelines can help curtail use of unnecessary services, potentially freeing resources for provision of additional care where it is needed. Patient management criteria can guide patient care more appropriately. Process and outcome criteria are the central issues of quality assurance. Finally, the process of developing criteria itself can highlight areas of medical practice in need of further research. Again, affordability is a legitimate concern, but excessive frugality in the resources devoted to criteria development may prove to be a false economy.
By the end of the expert panel discussion, it had become clear that the group was not entirely happy with the large number of atomistic attributes it had isolated and defined. Consequently, there was some sentiment for identifying larger groupings of attributes that would convey the basic ideas of scientific basis, ease of use, and so forth. One such grouping is proposed in Table 10.4.
TABLE 10.4 Possible “Larger Clusters” of Attributes for Quality-of-Care Criteria
Substantive Attributes | Implementation Attributes |
Scientific Grounding: Reliability; Validity; Documentation | Implementation: Feasibility; Computerization; Executability |
Latitude (clinical and patient boundaries and judgment): Flexibility(a); Appealability(a); Patient Responsiveness; Clinical Adaptability; Inclusiveness | Ease of Use: Comprehendability; Manageability; Nonintrusiveness |
Design: Clarity; Concordance; Acceptability; Appropriateness | Appealability: Flexibility(a); Appealability(a) |
Efficiency: Sensitivity; Specificity | Dynamism: Pretesting; Dynamism; Evaluation |
(a) Both flexibility and appealability might be built into the criteria or their application and, depending on the type of criteria set, at various stages of review.
The table groups the several substantive attributes into four major categories related to their scientific grounding, latitude, design, and efficiency. The implementation attributes fell into four categories as well: implementation, ease of use, appealability, and dynamism. The two attributes of flexibility and appealability were seen as important to both structure and process, and thus appear in both sets.
The panelists raised several issues that warrant considerable attention in the future but that they did not have time to pursue. First was the need for parsimony in articulating attributes, that is, for grouping attributes so that they are more easily understood and applied. The clusters in Table 10.4 offered earlier are one approach for doing this.
Second, the attributes for good criteria sets should be perceived as goals, not absolutes. Although most, if not all, criteria sets today would fail if evaluated against these attributes, they should be revised, not discarded. For instance, if the principal problem with guidelines today is that they are too vague or poorly written, then they can be more precisely and clearly stated. If the criticism is poor documentation, this fault, too, can be corrected.
The third point was that not all these attributes can be maximized simultaneously in any individual criteria set. The importance of an individual attribute may depend on the type of criteria set in question and its purpose. Furthermore, whether such criteria are used chiefly for cost containment or for quality assurance may change the weights given to different attributes. We tried to highlight the key attributes for criteria sets used in quality assurance programs (Table 10.3), but a good deal more work needs to be done in this area as the practice guidelines, effectiveness, and outcomes research efforts of the federal government and other parties expand.
Fourth, it is critical to ask about the impact of guidelines. Have they been shown to improve the quality of patient care? Do they misclassify cases of good or poor care? Have they prompted malpractice lawsuits? Do patients or clinicians get around them somehow? Do they save money? Again, with respect to the federal government’s initiatives in this area, evaluation is essential.
Finally, some types of criteria, especially patient management algorithms, serve educational and decision-making needs of both practitioners and patients. Criteria sets can help practitioners educate themselves and can help patients and their families make better choices about when to seek care and how to assess diagnostic or therapeutic options. The concern for patient preferences is especially pertinent, given the emphasis on desired health outcomes in the committee’s definition of quality of care.
Development and use of guidelines for a geriatric population call for great sensitivity to that population’s special characteristics, such as multiple chronic problems and frailty. Moreover, in quality assurance, moving from single discrete procedures to episodes and longitudinal care is difficult. However, this broad approach is especially important for an elderly population.
The theme of education permeates the enterprise of quality assurance. The challenge is to define appropriate and effective ways to use medical care and to apply this knowledge whether one is teaching medical students, house staff, established clinicians, or oneself. This study accords considerable importance to fostering internal quality improvement programs for providers and practitioners, so that the external Medicare quality assurance program can be targeted more effectively and efficiently. Making good quality-of-care criteria a paramount concern for the federal quality assurance effort will strengthen both these processes.
One congressional charge to the study committee was to develop prototype criteria and standards. Criteria sets vary considerably in their usefulness, depending on their objectives, their method of development, and the skills and experience of their creators. Three main kinds of criteria sets have evolved to meet different needs. Appropriateness guidelines describe indications for specific medical interventions and technologies, ranging from surgical procedures to diagnostic studies. Patient care evaluation and management criteria are intended to assess or guide the management of outpatient or inpatient medical problems. Case-finding screens identify potential quality-of-care problems that warrant further evaluation.
Because of the current interest in these types of criteria sets (particularly practice guidelines), the IOM study committee believed its best contribution was to provide preliminary guidance for their development, that is, to clarify the key “standards for standards.” To this end, it convened a panel of experts in guideline formulation to elucidate the key attributes or characteristics of sound criteria sets and acceptable methods for constructing them. The expert panel reviewed the literature, completed ratings of proposed attributes of criteria sets, and attended a two-day meeting at which they examined examples of existing criteria sets and revised and re-rated attributes.
Attributes of good criteria sets can be divided into two categories: substantive (or structural) attributes related to the inherent characteristics of a criteria set and implementation attributes focused on the process of developing and applying a criteria set. Substantive attributes might be further categorized as those related to scientific grounding, latitude, design, and efficiency. Implementation attributes can be categorized as those concerning implementation, ease of use, appealability, and dynamism.
For each of the three types of criteria sets considered, the panel discussed the range of attributes and characteristics that might be considered desirable or necessary; ultimately, they arrived at somewhat different ratings of key attributes for each set. Among the substantive attributes, clarity (e.g., unambiguous language and clear definitions of terms) was a key attribute for all three types of criteria. Validity (criteria based on outcome studies or other scientific evidence of effectiveness) was considered very important for both appropriateness guidelines and patient evaluation and management criteria. Sensitivity (a high true positive rate in detecting deficient or inappropriate care) was rated very high for appropriateness guidelines and case-finding screens. Documentation (of methods of development and literature used) was voted very important for appropriateness guidelines. Several other characteristics were considered highly important for the patient evaluation and management criteria sets as well, including reliability (known to produce the same decisions when applied by user groups for the purposes for which the criteria set was intended), flexibility (respects the role of clinical judgment), clinical adaptability (allows for or takes into account clinically relevant differences among different classes of patients), and concordance (reflects consensus of professionals with extensive experience in the field). Among the implementation attributes, appealability (allows for appeals process by professionals and patients) was rated highly important for both appropriateness guidelines and case-finding screens. Two additional attributes were considered critical for patient evaluation and management criteria sets: dynamism (mechanism exists to review and update criteria sets to incorporate new information and to cover new situations), and executability (includes instructions for scoring, quantification, and implementation). Finally, two other additional characteristics were voted as highly important for case-finding screens: comprehendability (format understood by nonphysician reviewers) and evaluation (mechanism exists to review and evaluate the outcome or impact of the screens).
The panel emphasized that these attributes of criteria sets should be understood as goals, not absolutes. Most, if not all, criteria sets existing today would fail if evaluated against these attributes, and not all attributes can be maximized simultaneously in any individual criteria set. The panel also called for further work in this area, highlighting the need for better information about the impact of these types of criteria sets on patient care and about the best way to use criteria sets to help practitioners, patients, and family members make better decisions about health care.
mulation is a collective process in which many clinicians participate. The process may be more valuable ultimately for those involved in developing these instruments than for those receiving them; it establishes a process of ongoing internal review under detailed, generally accepted guidelines. The more complex protocols, however, may amount to little more than “restatements of medical texts” and prove unwieldy to use; carried to the extreme they may limit the clinician to practicing “cookbook medicine.”
6. Predictive value was suggested by some panel members as a separate attribute, distinct from sensitivity. The expert panel’s discussion of this area indicated that correctly identifying problems of defective care (i.e., predictive value positive, as defined in Figure 10.1) is an integral part of sensitivity, and members of the panel in fact used the term sensitivity in this broader sense. Thus, defining sensitivity as high true positive rate in the list of attributes shown in Table 10.1 and omitting predictive value as a separate attribute from the final list of attributes reflect this approach.
7. Not indicated should be distinguished from contraindicated. Not indicated means that the acceptable (or even equivocal) indications are absent. Contraindicated means that there is some supervening medical condition or reason for not doing something that would otherwise be indicated (such as use of aspirin for persons suffering from blood-clotting disorders, use of virtually any drugs during pregnancy, or certain tranquilizers for persons on antihypertensive drugs).
8. One panelist shared the following “criteria for critiquing a clinical algorithm.” Graphics should (1) be uncluttered, (2) read from top to bottom and from left to right, (3) use clear and consistent symbols, (4) provide clear and consistent referencing and numbering, and (5) include minimal redundant steps or boxes. For accuracy and logic, algorithms need (1) accurate definitions, (2) accurate and up-to-date information (e.g., on dosages, laboratory values, and diagnostic signs), (3) enumeration of all important steps in the medical care process, (4) annotation of all “essential” and “critical” actions or decision points, and (5) consistent and parsimonious logic (HCHP, 1986).
9. This is only recently being done in the quality and utilization review fields, especially for utilization management programs. However, the translation of equivalent documents, such as survey questionnaires, into computerized forms (e.g., computer-assisted personal interview or CAPI techniques) is much more highly developed. Clearly, this approach offers much for quality review methods.
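The distinction drawn in note 6 between sensitivity and predictive value positive can be written out in the usual two-by-two notation. The symbols TP, FN, and FP (true positives, false negatives, and false positives) are conventional shorthand assumed here for illustration; they are not taken from Figure 10.1.

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{predictive value positive} = \frac{TP}{TP + FP}
\]
```

Under these definitions, sensitivity is the share of truly deficient cases that a criteria set flags, whereas predictive value positive is the share of flagged cases that are truly deficient. The panel's broader use of the term sensitivity covered both ideas.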
ACC (American College of Cardiology). Guidelines for Exercise Testing. A Report of the American College of Cardiology and American Heart Association Task Force on Assessment of Cardiovascular Procedures (Subcommittee on Exercise Testing). Journal of the American College of Cardiology 8:725–738, 1986a.
ACC. Guidelines for Clinical Use of Cardiac Radionuclide Imaging. A Report of the American College of Cardiology and American Heart Association Task Force on Assessment of Cardiovascular Procedures (Subcommittee on Nuclear Imaging). Journal of the American College of Cardiology 8:1471–1483, 1986b.
ASGE (American Society for Gastrointestinal Endoscopy). Appropriate Use of Gastrointestinal Endoscopy, Manchester, Mass.: American Society for Gastrointestinal Endoscopy, 1986.
Borgiel, A.E. Assessing the Quality of Care in Family Physicians’ Practices by the College of Family Physicians of Canada. Pp. 63–72 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
Brook, R.H. Practice Guidelines and Practicing Medicine: Are They Compatible? Paper prepared for the Physician Payment Review Commission. Santa Monica, Calif.: The RAND Corporation, 1988a.
Brook, R.H. Quality Assessment and Technology Assessment: Critical Linkages. Pp. 21–28 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988b.
Brook, R.H., Kamberg, C.J., Mayer-Oakes, A., et al. Appropriateness of Acute Medical Care for the Elderly: Analysis of the Literature. R-3717-AARP/RWJ/RC. Santa Monica, Calif.: The RAND Corporation, 1989.
Chassin, M.R. Standards of Care in Medicine. Inquiry 25:437–453, Winter 1988.
Chassin, M.R., Kosecoff, J., Park, R.E., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Coronary Angiography. R-3204/1-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986a.
Chassin, M.R., Kosecoff, J., Solomon, D.H., et al. How Coronary Angiography Is Used. Journal of the American Medical Association 258:2543–2547, 1987.
Chassin, M.R., Park, R.E., Fink, A., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Coronary Artery Bypass Surgery. R-3204/2-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986b.
Chassin, M.R., Park, R.E., Lohr, K.N., et al. Differences among Hospitals in Medicare Patient Mortality. Health Services Research 24(1): 1–31, 1989.
Daley, J., Jencks, S., Draper, D., et al. Predicting Hospital-Associated Mortality for Medicare Patients. Journal of the American Medical Association 260:3617–3624, 1988.
Dubois, R.W. Hospital Mortality as an Indicator of Quality. Pp. 107–131 in Providing Quality Care: The Challenge to Clinicians. Goldfield, N. and Nash, D.B., eds. Philadelphia, Pa.: American College of Physicians, 1989.
Dubois, R.W., Brook, R.H., and Rogers, W.H. Adjusted Hospital Death Rates: A Potential Screen for Quality of Medical Care. American Journal of Public Health 77:1162–1167, 1987a.
Dubois, R.W., Rogers, W.H., Moxley, J.H., et al. Hospital Inpatient Mortality. Is It a Predictor of Quality? New England Journal of Medicine 317:1674–1680, 1987b.
Eddy, D.H. Clinical Policies. Pp. 47–54 in Proceedings. Standards of Quality in Patient Care: The Importance and Risks of Standard Setting. Invitational Conference, Council of Medical Specialty Societies, Washington, D.C., September 1987.
Eddy, D.H. Methods for Designing Guidelines. Paper prepared for the Physician Payment Review Commission. Durham, N.C.: Duke University, 1988.
Eddy, D.H. A Manual for Assessing Health Practices and Designing Practice Policies. (In collaboration with the Council of Medical Specialty Societies Task Force on Practice Policies.) Forthcoming.
Ente, B.H. and Lloyd, J.S. Taking Stock of Mortality Data. An Agenda for the Future. Proceedings of a 1988 Conference. Chicago, Ill.: Joint Commission on Accreditation of Healthcare Organizations, 1989.
Fink, A., Yano, E.M., and Brook, R.H. The Condition of the Literature on Differences in Hospital Mortality. Medical Care 27:315–336, 1989.
Greenfield, S. The Challenges and Opportunities that Quality Assurance Raises for Technology Assessment. Pp. 134–141 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
Greenfield, S. Measuring the Quality of Office Practice. Pp. 183–200 in Providing Quality Care: The Challenge to Clinicians. Goldfield, N. and Nash, D.B., eds. Philadelphia, Pa.: American College of Physicians, 1989.
Greenfield, S., Cretin, S., Worthman, L.G., et al. Comparison of a Criteria Map to a Criteria List in Quality-of-Care Assessment for Patients with Chest Pain: The Relation of Each to Outcome. Medical Care 19:255–272, 1981.
Greenfield, S., Lewis, C.E., Kaplan, S., et al. Peer Review by Criteria Mapping: Criteria for Diabetes Mellitus. Annals of Internal Medicine 83:761–770, 1975.
Greenfield, S., Nadler, M.A., Morgan, M.T., et al. The Clinical Investigation and Management of Chest Pain in an Emergency Department: Quality Assessment by Criteria Mapping. Medical Care 12:807–904, 1977.
Hannan, E.L., Bernard, H.R., O’Donnel, J.F., et al. A Methodology for Targeting Hospital Cases for Quality of Care Record Reviews. American Journal of Public Health 79:430–436, 1989.
HCHP (Harvard Community Health Plan). Criteria for Critiquing a Clinical Algorithm. Unpublished mimeo. Boston, Mass.: HCHP, 1986.
IOM (Institute of Medicine). Medical Technology Assessment Directory. Washington, D.C.: National Academy Press, 1988.
Jencks, S.F., Daley, J., Draper, D., et al. Interpreting Hospital Mortality Data. The Role of Clinical Risk Adjustment. Journal of the American Medical Association 260:3611–3616, 1988.
Kahn, K.L., Brook, R.H., and Draper, D. Interpreting Hospital Mortality Data. How Can We Proceed? Journal of the American Medical Association 260:3625–3628, 1988.
Kahn, K.L., Roth, C.P., Kosecoff, J., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Diagnostic Upper Gastrointestinal Endoscopy. R-3204/4-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986a.
Kahn, K.L., Roth, C.P., Fink, A., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Colonoscopy. R-3204/5-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986b.
Kanouse, D.E., Brook, R.H., Winkler, J.D., et al. Changing Medical Practice Through Technology Assessment: An Evaluation of the NIH Consensus Development Program. R-3452-NIH. Santa Monica, Calif.: The RAND Corporation, 1987.
Kosecoff, J., Kanouse, D.E., Rogers, W.H., et al. Effects of the National Institutes of Health Consensus Development Program on Physician Practice. Journal of the American Medical Association 258:2708–2713, 1987.
Lehmann, R. Joint Commission Forum: Forum on Clinical Indicator Development: A Discussion of the Use and Development of Indicators. Quality Review Bulletin 15:223–227, 1989.
Lewin, L.S. and Erickson, J.E. Leadership in the Development of Practice Guidelines: The Role of the Federal Government and Others. Paper prepared for the Physician Payment Review Commission. Washington, D.C.: LEWIN/ICF, October 1988.
Lohr, K.N. Quality of Care for Respiratory Illness in Disadvantaged Populations. P-6570. Santa Monica, Calif.: The RAND Corporation, 1980a.
Lohr, K.N. Quality of Care in the New Mexico Medicaid Program (1971–1975). Medical Care 18:1–129 (January Supplement), 1980b.
Longo, D.R., Ciccone, K.R., and Lord, J.T. Integrated Quality Assessment. A Model for Concurrent Review. Chicago, Ill.: American Hospital Association, 1989.
Marder, R.J. Joint Commission Plans for Clinical Indicator Development for Oncology. Cancer 64:310–313, 1989 (Supplement).
Merrick, N.J., Fink, A., Brook, R.H., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Carotid Endarterectomy. R-3204/6-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986.
OTA (Office of Technology Assessment). The Quality of Medical Care: Information for Consumers. Chapter 5: Adverse Events. Washington, D.C.: U.S. Government Printing Office, 1988.
Palmer, R.H., Louis, T.A., Thompson, M.A., et al. Final Report of the Ambulatory Care Medical Audit Demonstration Project (ACMAD). Boston, Mass.: Harvard Community Health Plan and Harvard University, March 1984.
Palmer, R.H. The Challenges and Prospects for Quality Assessment and Assurance in Ambulatory Care. Inquiry 25:119–131, 1988.
Park, R.E., Fink, A., Brook, R.H., et al. Physician Ratings of Appropriate Indications for Six Medical and Surgical Procedures. R-3280-CWF/HF/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986.
Paterson, M.L. The Challenge to Technology Assessment: An Industry Viewpoint. Pp. 106–125 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
PPRC (Physician Payment Review Commission). Improving the Quality of Care: Clinical Research and Practice Guidelines. Draft Background Paper for Conference. Washington, D.C.: Physician Payment Review Commission, September 28, 1988a.
PPRC. Chapter 13. Increasing Appropriate Use of Services: Practice Guidelines and Feedback of Practice Patterns. Annual Report to Congress. Washington, D.C.: Physician Payment Review Commission, March 1988b.
PPRC. Chapter 12. Effectiveness Research and Practice Guidelines. Annual Report to Congress. Washington, D.C.: Physician Payment Review Commission, April 1989.
RTI (Research Triangle Institute). Nationwide Evaluation of Medicaid Competition Demonstrations. Final Report. Research Triangle Park, N.C.: RTI, 1988.
Schaffarzick, R.W. Technology Assessment: Perspective of a Third-Party Payer. Pp. 98–105 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
Solomon, D.H., Brook, R.H., Fink, A., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Cholecystectomy. R-3204/3-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986.
Sox, H.C., Jr., ed. Common Diagnostic Tests. Use and Interpretation. Philadelphia, Pa.: American College of Physicians, 1987.
Steinberg, E.P. Technology Assessment: A Physician Perspective. Pp. 79–88 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
Stulbarg, M.S., Gerbert, B., Kemeny, M.E., et al. Outpatient Treatment of Chronic Obstructive Pulmonary Disease—A Practitioner’s Guide. Western Journal of Medicine 142:842–846, 1985.
Winchester, D.P. Assuring Quality Cancer Care in an Evolving Health Care Delivery System. CA—A Cancer Journal for Clinicians 39:201–205, 1989.
Winslow, C.M., Kosecoff, J.B., Chassin, M., et al. The Appropriateness of Performing Coronary Artery Bypass Surgery. Journal of the American Medical Association 260:505–509, 1988a.
Winslow, C.M., Solomon, D.H., Chassin, M.R., et al. The Appropriateness of Carotid Endarterectomy. New England Journal of Medicine 318:721–727, 1988b.
The Institute of Medicine (IOM) study committee and staff determined that conducting an expert panel activity could be the best way to discharge the study’s congressional request to develop prototype criteria and standards. The panel members are listed in Table A.1. The remainder of this Appendix describes this activity, which included a literature review, a homework exercise for the panelists, a two-day meeting in June 1989, and staff analysis of all products of these steps; it also provides more details about the results of the homework task and meeting discussions.
TABLE A.1 Criteria-Setting Expert Panel for Study to Design a Strategy for Quality Review and Assurance in Medicare
Panel member | Affiliation | Location | Representing |
William A. Causey, M.D., F.A.C.P. | Jackson Medical Association | Jackson, Mississippi | American College of Physicians |
Mark R. Chassin, M.D. | Value Health Sciences, Inc. | Santa Monica, California | |
Arthur J. Donovan, M.D., F.A.C.S. | University of Southern California | Los Angeles, California | American College of Surgeons |
Leonard S. Dreifus, M.D., F.A.C.C. | Lankenau Hospital | Philadelphia, Pennsylvania | American College of Cardiology |
David M. Eddy, M.D., Ph.D.^a | Duke University | Durham, North Carolina and Jackson, Wyoming | |
Lesley Fishelman, M.D. | Harvard Community Health Plan | Boston, Massachusetts | |
Sheldon Greenfield, M.D. | New England Medical Center and Tufts University School of Medicine | Boston, Massachusetts | |
Robert J. Marder, M.D. | Joint Commission on Accreditation of Healthcare Organizations | Chicago, Illinois | |
Jane L. Neumann, M.D. | Wisconsin Peer Review Organization and Waukesha Hospital | Waukesha, Wisconsin | |
Bruce Perry, M.D., M.P.H. | Group Health Cooperative of Puget Sound | Seattle, Washington | |
Ralph W. Schaffarzick, M.D. | Center for Quality Health Care of the Blue Cross and Blue Shield Association | Auburn, California | |
^a Was unable to attend meeting.
As a starting point for discussion of desirable attributes of different types of criteria sets, the IOM staff reviewed the existing literature on guideline development (see Chapter 10 reference list); on the basis of that review, the staff prepared an extensive list of possible general attributes. Three categories of criteria were identified for special attention: (1) procedure- and technology-specific appropriateness guidelines, (2) criteria for evaluation of patient care and patient management, and (3) case-finding screens.
This list was incorporated into a homework exercise questionnaire, Possible General Attributes of Criteria Sets, which was mailed to panel members for response before the panel meeting. Panel members were requested to rate the listed attributes on a scale of 1 to 5, with 1 signifying “not important” and 5 signifying “very important.” Space was provided at the end of the questionnaire for respondents to suggest additional attributes or modifications of listed attributes and to make any other comments. To help determine in what ways the attributes and their ratings might differ for the three types of criteria sets, the staff provided the questionnaire in triplicate, one copy for each type.
The results of the first round of the homework exercise (done at home) are given in Table A.2. As reflected in the large number of 4 or 5 ratings and absence of very low ratings, the panel considered all of the listed attributes important in varying degrees. To obtain more spread in subsequent ratings, we revised the 1 to 5 scale for the second round of balloting (at the meeting) to read least important (1) and most important (5).
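The summary columns in Table A.2 can be illustrated with a short sketch. The individual ratings below are hypothetical, since only n, mean, and SD are reported; they were chosen so that they reproduce the Sensitivity row for appropriateness guidelines (n = 9, mean 4.9, SD .33). The sketch also assumes that SD means the sample standard deviation (n − 1 denominator), which the source does not state explicitly.

```python
from statistics import mean, stdev

# Hypothetical ratings on the 1-to-5 scale: eight panelists give a 5, one gives a 4.
# These are NOT the panel's actual ballots, only an illustration.
hypothetical_ratings = [5, 5, 5, 5, 5, 5, 5, 5, 4]


def summarize(scores):
    """Return (n, mean, SD) rounded the way the table reports them.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    return len(scores), round(mean(scores), 1), round(stdev(scores), 2)


print(summarize(hypothetical_ratings))  # (9, 4.9, 0.33)
```

With so few raters and so compressed a range of scores, a single panelist moving from 5 to 4 shifts the mean by about a tenth of a point, which is one reason the staff changed the scale anchors to obtain more spread in the second round.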
Several attributes were rated of less importance for all types of criteria sets; these included simplicity from the patient standpoint and generalizability-compatibility with existing quality assurance approaches (with suggestions that the latter attribute be deleted from the list). Several attributes were rated as more important for some types of criteria sets than others. For example, ease of computerization, feasibility (ease of obtaining data), and reviewer manageability were rated as more important for case-finding screens than for appropriateness guidelines and evaluation-management criteria.
The homework exercise suggested that various modifications in the list of proposed attributes and their definitions warranted further consideration at the expert panel meeting. It also identified several important underlying issues, such as the impact of differences in use on the definition and ratings of attributes of criteria sets. These issues were introduced at the meeting (and were discussed in Chapter 10).
The expert panel meeting opened with a general discussion of attributes of criteria sets (Table A.3). The panel first discussed some fundamental issues raised by the homework exercise regarding standards for judging criteria sets, in particular the impact of use or purpose when considering desirable attributes of criteria sets. It then returned to the original list and proposed many modifications and clarifications for the items on it. These are embodied in the final list of general attributes of criteria sets in Table 10.1 of Chapter 10.
In subsequent sessions, the panel considered the proposed general attributes in the context of each of the three major types of criteria sets. Specific examples of each type of criteria set were used as a mechanism for examining the proposed attributes more closely; the advantages and limitations identified for these illustrative criteria sets helped the process of defining important attributes. In each session, the attributes for the particular type of criteria set under consideration were reformulated and re-rated (Table A.4). The highest-rated attributes (separately, for substantive and for implementation attributes) for each type of criteria set were extracted from these data and summarized in Table 10.4A of Chapter 10; the selection criterion was a mean score on the second round of voting of 4.5 or greater. Table 10.4B shows those attributes that were given an average rating of at least 4.0 but less than 4.5. In the final session, methods and strategies for guideline formulation were discussed.
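The selection rule described above can be sketched in a few lines. The means are copied from the appropriateness-guideline column of Table A.4 for a handful of attributes; the function names and tier labels are hypothetical, introduced only for this illustration. Because the published means are rounded to one decimal place, the sketch may not reproduce the staff's grouping exactly for borderline values.

```python
# Second-round mean ratings for appropriateness guidelines (subset of Table A.4).
APPROPRIATENESS_MEANS = {
    "Sensitivity": 4.5,
    "Validity": 4.6,
    "Clarity": 4.5,
    "Appealability": 4.9,
    "Dynamism": 4.4,
    "Reliability": 4.2,
    "Specificity": 4.0,
    "Affordability": 2.8,
}


def tier(mean_score, top_cutoff=4.5, second_cutoff=4.0):
    """Classify one mean rating using the two cutoffs stated in the text."""
    if mean_score >= top_cutoff:
        return "highest"  # mean of 4.5 or greater
    if mean_score >= second_cutoff:
        return "next"  # at least 4.0 but less than 4.5
    return "below"  # under 4.0; in neither group


def group_by_tier(means):
    """Group attribute names by tier, preserving input order."""
    groups = {"highest": [], "next": [], "below": []}
    for attribute, score in means.items():
        groups[tier(score)].append(attribute)
    return groups


print(group_by_tier(APPROPRIATENESS_MEANS))
```

For this subset, sensitivity, validity, clarity, and appealability fall in the highest tier, which agrees with the attributes singled out for appropriateness guidelines in the chapter summary.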
TABLE A.2 Expert Panel Homework Exercise: First Round Ratings of Attributes
Attributes | Appropriateness: n^a | Mean^b | SD^c | Evaluation/Management: n | Mean | SD | Case-Finding: n | Mean | SD |
Sensitivity | 9 | 4.9 | .33 | 8 | 4.5 | .53 | 10 | 4.7 | .67 |
Specificity | 9 | 4.7 | .71 | 8 | 4.6 | .52 | 10 | 3.9 | 1.10 |
Reliability | 10 | 4.4 | .84 | 10 | 4.5 | .71 | 10 | 4.6 | .52 |
Validity | 10 | 4.7 | .48 | 10 | 4.8 | .42 | 10 | 4.4 | .70 |
Dynamism | 10 | 4.5 | .53 | 10 | 4.3 | .48 | 10 | 4.0 | .67 |
Flexibility | 10 | 4.4 | .70 | 10 | 4.3 | .82 | 10 | 3.8 | 1.14 |
Clinical Adaptability | 10 | 4.4 | .70 | 10 | 4.5 | .71 | 10 | 4.0 | 1.05 |
Responsiveness | 10 | 3.4 | 1.17 | 10 | 3.6 | 1.17 | 9 | 2.9 | 1.27 |
Inclusiveness | 9 | 3.3 | 1.10 | 9 | 2.9 | 1.17 | 10 | 2.7 | .95 |
Concordance | 10 | 4.0 | .94 | 10 | 4.1 | 1.10 | 10 | 3.9 | .88 |
Acceptability | 10 | 4.0 | 1.05 | 10 | 4.1 | .99 | 10 | 4.0 | 1.05 |
Clarity | 10 | 4.6 | .52 | 10 | 4.6 | .52 | 10 | 4.5 | .71 |
Simplicity (non-MDs) | 10 | 4.4 | .70 | 10 | 4.1 | .99 | 10 | 4.3 | .82 |
Simplicity (MDs) | 10 | 4.5 | .71 | 10 | 4.6 | .52 | 10 | 3.9 | .99 |
Simplicity (patients, consumers) | 10 | 3.5 | 1.27 | 10 | 2.9 | .88 | 10 | 2.7 | 1.25 |
TABLE A.3 Activities and Discussion Topics for Criteria-Setting Expert Panel
General Discussion of Proposed Attributes for Criteria Sets
  Presentation: Results of homework exercise
  Discussion:
    Extent to which attributes of criteria sets might differ according to their use or purpose
    Definitions of listed attributes and of any additional attributes
Application of Attributes to Three Types of Criteria Sets
  Discussion:
    Proposed attributes for each type of criteria set
  Examine illustrative criteria sets:
    Technology- or procedure-specific appropriateness guidelines
      (a) American Society of Gastroenterology’s upper endoscopy guidelines
      (b) Pre-procedure criteria for carotid endarterectomy from Delmarva PRO^a, New York PRO, and St. Luke’s Hospital, Houston
    Criteria for evaluation and management of problems and conditions
      (a) UCLA/McCoy versus Medicaid hypertension management review criteria
      (b) Stulbarg chronic obstructive pulmonary disease management algorithm
    Case-finding screens
      (a) Ear, nose and throat screening criteria from Medical Management Analysis
      (b) Hospital Association of New York State hospital-wide indicators
  Reformulate attributes for each type of criteria set
General attributes: revisited
Discussion of Methods for Criteria Development
  Literature review, expert panel, and other approaches
  Stages of guideline development process and the appropriate forum for each stage
  Differences in methodology for formulation according to type and purpose of criteria set
^a Utilization and Quality Control Peer Review Organization (PRO).
TABLE A.4 Expert Panel Homework Exercise: Second Round Ratings of Attributes
Attributes | Appropriateness: n^a | Mean^b | SD^c | Evaluation/Management: n | Mean | SD | Case-Finding: n | Mean | SD |
Sensitivity | 10 | 4.5 | .53 | 9 | 4.4 | .73 | 10 | 4.9 | .32 |
Specificity | 10 | 4.0 | .67 | 9 | 3.7 | .83 | 10 | 3.0 | .94 |
Predictive Value | 10 | 3.7 | .82 | 9 | 3.8 | 1.17 | 10 | 3.9 | .88 |
Reliability | 10 | 4.2 | .79 | 10 | 4.5 | .71 | 9 | 4.1 | .78 |
Validity | 10 | 4.6 | .84 | 10 | 4.5 | .71 | 9 | 4.0 | .87 |
Documentation-A^d | 10 | 4.6 | .70 | 10 | 4.3 | .67 | 8 | 3.9 | .83 |
Documentation-B^e | 10 | 4.2 | .92 | 10 | 3.9 | .99 | 9 | 3.9 | .78 |
Flexibility | 9 | 4.3 | 1.12 | 10 | 4.8 | .42 | 7 | 4.1 | .90 |
Clinical Adaptability | 10 | 4.4 | .84 | 10 | 4.8 | .42 | 7 | 4.3 | .95 |
Responsiveness | 10 | 2.7 | .95 | 10 | 2.9 | .88 | 6 | 2.7 | 1.03 |
Inclusiveness | 10 | 2.8 | 1.55 | 10 | 3.4 | 1.17 | 9 | 4.2 | .97 |
Acceptability | 10 | 2.8 | 1.40 | 10 | 3.6 | .97 | 8 | 2.9 | .99 |
Clarity | 10 | 4.5 | .71 | 10 | 4.8 | .42 | 9 | 4.8 | .44 |
Appropriateness | 10 | 4.2 | .63 | 9 | 4.0 | 1.12 | 4 | 2.6 | 1.71 |
Pretesting | 10 | 4.2 | .92 | 10 | 4.0 | 1.15 | 9 | 4.2 | 1.09 |
Dynamism | 10 | 4.4 | .70 | 10 | 4.6 | .70 | 10 | 4.4 | .70 |
Evaluation | 10 | 4.1 | .74 | 9 | 4.3 | .50 | 10 | 4.5 | .53 |
Comprehendability (non-MD) | 10 | 4.3 | 1.25 | 10 | 4.2 | .63 | 10 | 4.6 | .52 |
Comprehendability (MD) | 10 | 4.3 | .82 | 10 | 4.3 | .48 | 10 | 4.2 | .63 |
Comprehendability-Patient | 10 | 3.2 | 1.48 | 10 | 2.9 | 1.20 | 9 | 3.0 | 1.41 |
Manageability (non-MD) | 10 | 3.7 | 1.06 | 10 | 3.6 | .84 | 10 | 4.0 | .67 |
Manageability (MD) | 10 | 3.8 | 1.03 | 10 | 3.3 | .95 | 10 | 4.2 | .79 |
Manageability (Professional) | 10 | 3.7 | .82 | 10 | 3.6 | .84 | 10 | 3.8 | 1.23 |
Nonintrusive | 10 | 3.3 | 1.16 | 10 | 3.7 | .82 | 10 | 3.8 | 1.03 |
Appealability | 10 | 4.9 | .32 | 10 | 4.4 | .84 | 9 | 4.8 | .44 |
Feasibility | 10 | 3.4 | 1.07 | 10 | 4.0 | 1.05 | 10 | 4.1 | .99 |
Computerization | 10 | 3.3 | 1.16 | 10 | 3.8 | .92 | 10 | 3.5 | .85 |
Executability | 10 | 3.8 | .79 | 10 | 4.5 | .53 | 10 | 4.0 | .94 |
Concordance | 10 | 4.3 | 1.06 | 10 | 4.5 | .71 | 8 | 4.1 | .83 |
Prioritization (high) | 10 | 3.4 | 1.17 | 10 | 4.0 | .67 | 9 | 4.1 | 1.05 |
Prioritization (consensus) | 10 | 3.4 | 1.26 | 10 | 3.6 | 1.26 | 8 | 3.8 | 1.04 |
Affordability | 10 | 2.8 | 1.14 | 10 | 2.8 | .79 | 9 | 3.0 | 1.00 |
^a n is the number of respondents. There were 10 respondents to the second round of rating attributes. Where n<10, the attribute was rated not applicable by one or more respondents.
^b Mean is the mean rating or score among those rating the attributes.
^c SD is the standard deviation of the mean.
^d Documents methods of development and cites literature.
^e Documents how reliability was established.