Institute for Laboratory Animal Research
Division on Earth and Life Studies
500 Fifth Street, NW
Washington, DC 20001
Phone: (202) 334-2347
October 1, 2021
Joan T. Richerson, M.S., D.V.M., DACLAM
Assistant Chief Veterinary Medical Officer
Tennessee Valley Healthcare System
U.S. Department of Veterans Affairs
Nashville, TN 37212
Dear Dr. Richerson,
This letter report describes the work of the Committee on the Review of Department of Veterans Affairs Monograph on the Economic Impact and Cost Effectiveness of Service Dogs on Veterans with Post Traumatic Stress Disorder (PTSD).
The committee received the following documents on August 16, 2021, for review: “Monograph 2 revised 2021 08 16” and “Responses to NASEM Comments 2021 08 16.”
The committee thanks the study team at the Department of Veterans Affairs (VA) and the Institute for Clinical and Economic Review (ICER) for providing the additional documents to assist in the committee’s review of the revised monograph and for their careful attention to the suggestions and concerns noted in the committee’s report, Review of Department of Veterans Affairs Monograph on the Economic Impact and Cost Effectiveness of Service Dogs for Veterans with Post Traumatic Stress Disorder. The committee found the “Responses to NASEM Comments 2021 08 16” document (herein referred to as the VA Response document) to be particularly useful in guiding its review of the revised monograph.
In accordance with the Statement of Task, the committee performed its review of the revised monograph by considering how well the revised monograph responded to the committee’s first report and whether the revised monograph is consistent “with accepted scientific principles,” including the appropriateness of the techniques selected, the rigor of its analysis, the objectivity of the presentation, and other themes described in the Statement of Task (see Box 1-1).
The VA study team is commended for making several key changes suggested by the committee in its first report; these improve the monograph. For example, the committee was pleased to see that the pre-post analysis in Chapter 1 has been moved from the main report to a supplement and included as a separate observational analysis (see Supplement 3). As discussed below, however, this analysis should not be mentioned in the Abstract.
___________________
1 For the purposes of this report, the document titled, “Monograph 2 revised 2021 08 16” will be referred to as the revised monograph.
The addition of Figures 1.1 and 1.2 to the revised monograph, illustrating the basic study flow and assessment schedule, respectively, made the manner in which the clinical trial was carried out more understandable. The revised monograph also includes a more thorough description than the first monograph of the key differences between emotional support dogs used in practice and those used in the VA study. It also discusses these differences as limitations of the study and how they may affect the generalizability of the findings.
Another improvement to the monograph is the addition of a cost-effectiveness acceptability curve to characterize the probability that service dogs are cost effective when compared to emotional support dogs at a range of cost-effectiveness thresholds.
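A cost-effectiveness acceptability curve of this kind can be sketched in a few lines; the distributions, means, and standard deviations below are invented placeholders for illustration, not the trial’s estimates.

```python
import numpy as np

# Synthetic sketch of a cost-effectiveness acceptability curve (CEAC).
# All distributions below are hypothetical. For each willingness-to-pay
# threshold lam, the CEAC reports the share of simulated (incremental
# cost, incremental effect) pairs with positive net monetary benefit:
# lam * delta_effect - delta_cost > 0.
rng = np.random.default_rng(42)
n_sim = 10_000
delta_effect = rng.normal(0.039, 0.02, size=n_sim)   # incremental QALYs (hypothetical)
delta_cost = rng.normal(5_000, 2_000, size=n_sim)    # incremental cost, $ (hypothetical)

thresholds = [50_000, 100_000, 150_000, 200_000]
probs = [float((lam * delta_effect - delta_cost > 0).mean()) for lam in thresholds]
for lam, p in zip(thresholds, probs):
    print(f"${lam:,}/QALY: P(cost effective) = {p:.2f}")
```

Plotting the probability against the threshold produces the acceptability curve; under these assumed distributions the probability rises with the willingness-to-pay threshold.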
These changes and additions have strengthened the monograph. However, as discussed below, several issues raised in the committee’s review of the first iteration of the monograph were not completely addressed. In addition, some of the revisions appear to give unbalanced attention to some results and to obscure the uncertainty of key outcomes. In this letter report, the committee describes three major concerns: Intent-to-Treat Analysis; Societal Perspective; and Bias in the Choice of Methods and the Reporting of Results in the Cost-Effectiveness Analysis. The letter report then discusses several minor concerns whose resolution could be helpful to the VA study team.
The committee acknowledges the challenges faced by the VA and ICER research teams in analyzing health economic information from this clinical trial. To prevent selection bias, the usual practice is to ask all trial participants before enrollment to consent to follow-up assessment regardless of their intervention assignment and intervention uptake. In this trial, such consent was not obtained. The research team later asked the VA’s Institutional Review Board (IRB) to approve the use of additional data from Veterans who dropped out of the trial after randomization and prior to pairing with a dog, but the IRB denied the request. The IRB’s decision prevented the research team from using the most appropriate comparison data set—the true intent-to-treat (ITT) population—in drawing conclusions about the relative health costs and cost effectiveness of providing a service dog versus an emotional support dog to Veterans with PTSD.
Instead of an ITT analysis, the researchers used a per protocol analysis (called the “modified intent to treat” analysis in the first monograph). The per protocol analysis is inappropriate because participant dropout after randomization might introduce bias, and there is no guarantee that the treatment arms in the per protocol analysis remain balanced as randomized. Although the committee appreciates that the authors caveat the interpretation of the per protocol results in many places, it encourages the authors to state the limitations of the per protocol approach in the extended Abstract and in the Preface so that the casual reader will understand this important limitation.
Without the data for all randomized participants, an appropriate statistical analysis is needed to control for confounding and bias. The committee recommended that the investigators use statistical methods such as imputation to deal with missing data, but the research team asserted that the imputation method “generates a more biased [estimate] than a complete case analysis when data are not missing completely at random” (p. 22). The committee disagrees with this interpretation of the literature. If missingness is completely at random, then both complete case analysis and imputation should give unbiased estimates. If missingness is not completely at random, then imputation can mitigate bias and provide better estimates of uncertainty than per protocol analysis.
In addition, imputation is a common method for dealing with missing data, but it is not the only one. An alternative strategy is inverse probability weighting. The point is that the missingness could be informative in nature; although more involved methods would be required to model missingness not at random (e.g., some form of pattern-mixture model), the researchers could at least allay some concerns by presenting the observed trends in outcomes from pairing among those who completed the trial versus those who did not.
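The inverse probability weighting strategy mentioned above can be illustrated with a small synthetic example; the data-generating process, covariate, and effect sizes below are invented for illustration and are not from the trial.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic illustration (all numbers invented): participants with a
# higher baseline covariate x are more likely to drop out before
# follow-up, so a complete-case ("per protocol") mean is biased.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                          # baseline covariate
p_complete = 1 / (1 + np.exp(-(0.5 - x)))       # completion depends on x
complete = rng.random(n) < p_complete
y = 2.0 + x + rng.normal(size=n)                # outcome; true mean = 2.0

# Step 1: model each participant's probability of completing follow-up
# from baseline covariates.
model = LogisticRegression().fit(x.reshape(-1, 1), complete)
p_hat = model.predict_proba(x.reshape(-1, 1))[:, 1]

# Step 2: weight completers by the inverse of that probability so the
# completer sample again resembles the full randomized sample.
w = 1.0 / p_hat[complete]
naive = y[complete].mean()                      # biased complete-case mean
ipw = np.average(y[complete], weights=w)        # weighted mean, closer to 2.0
print(round(naive, 2), round(ipw, 2))
```

Because completion here depends only on an observed covariate, the weighted mean recovers the full-sample target that the unweighted complete-case mean misses; with missingness not at random, more elaborate models would be required, as noted above.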
The committee’s concern is supported by Table K on page 63 of Monograph 1² (describing the health outcomes of the trial), which shows that participants who were paired (N = 181) differed from those who were not (N = 46) across several dimensions of baseline characteristics. Thus, it cannot be said that using the pairing date would provide “unbiased information for the analysis,” as claimed on page 22 of the revised monograph. The committee strongly recommends the inclusion of a table that reports all of the baseline characteristics reported in Table K of Monograph 1 by treatment for the randomized group, and separately for the paired group. This will help readers understand to what extent any new
___________________
2 A Randomized Trial of Differential Effectiveness of Service Dog Pairing Versus Emotional Support Dog Pairing to Improve Quality of Life for Veterans with PTSD (Monograph 1), see https://www.research.va.gov/REPORT-Study-of-Costs-and-Benefits-Associated-with-the-Use-of-Service-Dogs-Monograph1.pdf.
imbalance in baseline characteristics was generated in the per protocol analysis. Table 1.2 of the revised monograph does not appear to include all of the characteristics from Table K.
The committee also strongly recommends that the researchers conduct sensitivity analyses to assess possible mechanisms and effects of missing data (e.g., bias) and to explore whether the results of the per protocol analysis would change under different assumptions about the missing data. For example, do the data suggest the results are biased toward finding no difference when in fact one group improves outcomes relative to the other? Putting the findings in perspective, beyond saying the estimates may be biased, would be helpful. Any observed differences, together with the lack of appropriate statistical controls, raise concerns about the validity of the results and also about the generalizability of the per protocol analysis, which was not discussed.
Finally, the committee was surprised to see the sentence “The average time to pairing for those randomized to EMOT was 158 days whereas the average time to pairing for those randomized to SERV was also 158 days” (p. 68). This seems to contradict the sentence on page 19 that “Time between unblinding and pairing was approximately 2 weeks for the emotional support dog group and 6 weeks for the service dog group.” Although the 3-month training period before pairing partly mitigates the concern about using a per protocol rather than an ITT analysis, the authors should clarify the time from unblinding of the randomization to pairing for each arm of the study. This is important because differences in time to pairing after unblinding further heighten the concerns about differential dropout before pairing.
The committee notes several concerns with the cost-effectiveness analysis (CEA) from the societal perspective (see Section 2.4.5). The authors changed the name of this analysis section from “modified societal perspective” to “societal perspective” in response to the committee’s recommendation for an analysis reflecting a full societal perspective. The authors ultimately did not provide a full societal perspective, citing missing evidence from several domains such as patient time costs, unpaid caregiver time costs, and transportation costs. The committee disagrees with the decision not to include a full societal analysis and offers several additional suggestions.
First, on page 71 of the revised monograph, the authors say that the societal perspective inputs are identical to those of a government payer perspective, which the committee does not believe is accurate. A government payer perspective would exclude costs to study participants (e.g., for time receiving services and travel time). These costs would be included in a societal perspective.
Second, although domains related to patient time costs and out-of-pocket costs were not solicited from study participants, it is possible to take a back-of-the-envelope approach, using external sources of information to incorporate these costs into the analysis. For example, based on the participants’ age, education, and region, an approximate hourly wage can be estimated from census data and then multiplied by the number of services received and the approximate time per service. Caregiver time costs are potentially more problematic to impute and could differ substantially between the two modalities if there are any positive effects on the management of PTSD. Information could be derived from the literature on the percentage of individuals with PTSD who rely on a caregiver, for example.
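The back-of-the-envelope calculation described above amounts to a simple product; the sketch below uses entirely hypothetical wage, count, and time values to make the construction concrete.

```python
# Illustrative only: every number below is a hypothetical placeholder,
# not a figure from the monograph. The construction is
# wage x number of services x hours per service.
hourly_wage = 22.50          # imputed from census data by age/education/region (hypothetical)
services_received = 24       # services recorded for a participant (hypothetical)
hours_per_service = 1.5      # approximate time per service, incl. travel (hypothetical)

patient_time_cost = hourly_wage * services_received * hours_per_service
print(patient_time_cost)     # 810.0
```

Summed across participants and arms, such imputed time costs could then be added to the societal-perspective totals, with sensitivity analysis on the assumed wage and time inputs.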
The revised monograph also assumes that domains such as patient out-of-pocket costs and patient time costs for training are likely comparable for those with a service dog versus those with an emotional support dog, but provides no external evidence to support this claim. Patients with a service dog had to fly to the vendor and spend 1-2 weeks in training. Given the relatively small difference in total costs between the two arms, the differential patient training time costs should be included in the analysis even if they have to be estimated.
Additional trainer costs to work with the patients post pairing with a service dog should also be included. Page 53 of the revised monograph indicates that trainers spent 1-2 days for an emotional support dog and 1 week for a service dog. It is not clear whether these differential trainer time costs were included in the bundled payment. It would be helpful if the components of this payment were briefly described.
Finally, the committee still thinks that a more elaborate formal impact inventory table that includes information on these components would be useful to convey what dimensions were considered, why relevant components were not included, and whether their absence may influence the results substantively. The goal would not be to invalidate the current analysis but to inform a discussion about how these omissions may change the incremental cost-effectiveness ratio.
The committee notes, and the revised monograph acknowledges, that the design of the cost-effectiveness study was post hoc to the clinical study. It is incumbent on the researchers to be conscious of potential bias in selecting methods of analysis and assumptions and in characterizing their results. In several instances, as noted below, the committee found a lack of balance in the presentation of the results and in the representation of the level of uncertainty of the measures.
The committee recommends estimating and reporting the uncertainty around the key results of the threshold price analysis (see Section 2.4.6), including the estimated incremental benefit of service dogs relative to emotional support dogs (approximately 14 days [~0.039 quality-adjusted life years (QALYs)]) and the percent cost reduction in the service dog intervention (14%) that would meet a threshold of $100,000 per QALY gained. The current lack of details surrounding the uncertainty in these key results may inadvertently lead readers to be overconfident in the accuracy of the findings. The uncertainty (e.g., 95% confidence interval) should be reported in both the Abstract and the text. In addition, the committee notes that the reported 14% reduction in the service dog intervention cannot be attained by reducing insurance costs (i.e., the trial was run with equivalent insurance costs incurred by both service dogs and emotional support dogs), which should be acknowledged in the monograph.
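The arithmetic linking these figures can be checked directly; the sketch below uses only the numbers quoted above (0.039 QALYs, $100,000 per QALY), and the implied maximum incremental cost is a back-calculation for illustration, not a figure from the monograph.

```python
# Check: 0.039 QALYs corresponds to roughly 14 days of perfect health,
# and implies the incremental cost consistent with the stated threshold.
DAYS_PER_YEAR = 365.25
incremental_qalys = 0.039                 # reported incremental benefit
days_of_perfect_health = incremental_qalys * DAYS_PER_YEAR
print(round(days_of_perfect_health))      # 14

threshold = 100_000                       # $ per QALY gained
max_incremental_cost = incremental_qalys * threshold
print(round(max_incremental_cost))        # 3900
```

Reporting a confidence interval around the 0.039 QALY estimate would propagate directly into an interval for this implied cost, which is what the committee recommends the authors present.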
The committee agrees with the VA threshold health improvement conclusion (see Section 2.4.8) that Veterans would need a 15.8-point improvement in the self-reported PTSD (PCL-5) total score to meet the $100,000 per QALY threshold. The next sentence of the conclusion (“Unadjusted pre-post trial analyses yielded –15.4 points on the PCL-5 total score.”) should be removed from the Abstract. The committee also recommends either deleting it from the Conclusion or adding the caveat that the research design used to estimate 11.7 of the 15.4 point improvement in the “unadjusted pre-post trial analyses” does not permit a causal interpretation of results; said differently, the observed improvements over time could happen, at least in part, due to regression to the mean or to the continued receipt of usual care for PTSD symptoms by all of the study participants.
The committee disagrees with the authors’ decision, contrary to the committee’s recommendation, to exclude individual-level data on the grounds that “differences were not anticipated in the deterministic or the uncertainty analyses using the adjusted summary statistics versus the regression-based individual-level data” (VA Response document). To begin, relying exclusively on adjusted summary statistics loses information on potential correlations across variables that, at a minimum, will affect the precision of model estimates. Moreover, using individual-level data to model economic domains (e.g., health care costs, productivity) in a CEA, even when no significant difference exists between the groups on such domains, is considered best practice. Excluding these domains artificially decreases the uncertainty in the results.
The committee commends the authors for explaining the differences in emotional support dogs used in practice versus those used in the trial (pp. 17-18) and agrees that these differences likely bias patient outcomes in favor of emotional support dogs. The committee recommends including a counterpart statement in the cost-effectiveness chapter (Chapter 2) noting that these same differences would likely bias costs in favor of service dogs (because emotional support dogs in practice are likely to be much less expensive than those used in the trial), and so the net effect on cost effectiveness is unclear. Said differently, emotional support dogs in practice would be expected to result in both lower scores and lower costs than those used in the trial, and so the net effect on the incremental cost-effectiveness ratio is indeterminate.
The committee commends the authors for including the Veterans RAND 12 Item Health Survey (VR-12) as an outcome in the CEA. However, the committee recommends the cost-effectiveness section acknowledge (1) there were no significant differences between the groups on the Clinician Administered PTSD Scale for DSM-5 (CAPS-5), (2) CAPS-5 focuses exclusively on PTSD symptoms (like the PCL-5), (3) CAPS-5 was excluded from the formal CEA because no mapping exists from CAPS-5 scores to utility scores, and (4) if such a mapping existed, the incremental cost-effectiveness ratio of service dogs relative to emotional support dogs would increase (i.e., the incremental cost-effectiveness ratio would be less favorable for service dogs).
The committee commends the authors for moving the pre-post analysis from the main report to a separate supplement. Because it is discussed in the report, the committee strongly recommends some additional clarifications and analysis. First, the committee strongly recommends against discussing the pre-post analysis in the Abstract, because it is likely to be taken out of context by readers. The caveat that “These trends should be viewed with caution given they may be confounded with temporal changes” does not adequately encapsulate the problems with drawing causal inferences from the pre-post analysis. On p. 14 of the revised monograph, the authors note:
Because the control group in this study received an emotional support dog, it cannot identify the relative effect of receiving a service dog compared to no dog control group. To inform this question, we conducted a separate observational analysis comparing all participants in a pre-post analysis and found that both treatment groups reduced their outpatient mental health and outpatient total utilization and costs after receiving a dog. These trends should be viewed with caution given they may be confounded with temporal changes.
This paragraph should also clearly state that this trial was not designed to answer the question concerning service dog versus no intervention. Additionally, the last sentence of the paragraph should be modified to “confounded with temporal changes, including changes that occur from receiving ‘usual care’ for PTSD.” Without this, readers can easily miss the fact that all participants are receiving usual care, which by itself can improve outcomes over time.
The authors may have also misinterpreted the committee’s suggestion about leveraging pre-pairing data. Specifically, the committee did not recommend that the 18 months of pre-pairing data be
used to compare service dog and emotional support dog recipients as such, but rather to track whether improvements in outcomes were occurring for all participants in the pre-pairing period. The committee recommended doing this after pooling data on (future) emotional support dog and service dog recipients, hence endogeneity in timing should be less of an issue. If there is a discernible declining trend in costs in the pre-pairing period itself, then that would strongly suggest that improvements are temporal/due to usual care. Either way, this would be good information for readers to have.
Along somewhat related lines, the authors stated in their response that they could not follow the committee’s recommendation of using fixed effects because pairing with emotional support dogs and service dogs is invariant for each recipient. The committee did not suggest using fixed effects models to estimate the effect of receiving an emotional support dog versus a service dog per se. Rather, because the outcomes of interest are being recorded at different time points after pairing, the committee recommended that the authors explore using fixed effects models in conjunction with this variation in the time of pairing (period t – 1 pre-pairing, period t + 1 post-pairing, period t + 2 post-pairing, and so forth) when estimating the models.
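This suggestion can be sketched with synthetic data; the model specification, variable names, and effect sizes below are illustrative assumptions, not the trial’s data or the authors’ model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic panel: each participant is observed at several periods
# relative to pairing (t-1 pre-pairing, t+1 and t+2 post-pairing).
# All numbers are invented for illustration.
rng = np.random.default_rng(1)
n = 200
alpha = rng.normal(size=n)                       # participant fixed effects
rows = []
for i in range(n):
    for t, label in [(-1, "pre"), (1, "post1"), (2, "post2")]:
        effect = {-1: 0.0, 1: -0.5, 2: -0.8}[t]  # assumed drop after pairing
        rows.append({"pid": i, "period": label,
                     "y": alpha[i] + effect + rng.normal(scale=0.5)})
df = pd.DataFrame(rows)

# Participant fixed effects (C(pid)) absorb time-invariant traits, so the
# period dummies trace the outcome trajectory relative to pre-pairing.
fit = smf.ols("y ~ C(period, Treatment('pre')) + C(pid)", data=df).fit()
print(round(float(fit.params["C(period, Treatment('pre'))[T.post1]"]), 2))
```

With both arms pooled, the post-pairing period coefficients describe within-person changes over time, which is the variation the committee suggests exploiting.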
Finally, the committee strongly recommends the authors guard against causal language when discussing the results from this supplement. Causal conclusions from exploratory analyses (e.g., “indicating that receipt of a dog may lead to improved medication adherence for antidepressants”) go beyond what the study design, data, and findings permit.
The authors have not addressed the committee’s concern that results for different costs are presented without considering whether the study was powered to detect them. First, their assessment of the value of information is misplaced: the expected value of information calculation would quantify the expected costs of decision making based on the answer the researchers have provided to the current question. Thus, it does not matter what different questions may be of interest in the future; this is a relevant measure for the present study and would help guide decisions made based on current results. Second, the authors do not acknowledge that a consideration of power would be valuable. At the least, it would be useful if the report acknowledged that addressing power is important and that the authors chose neither to do so before the economic analysis (or in an ex-ante simulated way) nor to calculate the expected value of information.
The committee appreciates the authors’ explanation of how the medication adherence measure was constructed. However, several issues remain to be clarified. It is still unclear how values of the Proportion of Days Covered (PDC) that exceed 100% are obtained, because this measure (unlike the Medication Possession Ratio) is defined as a proportion. Specifically, the Canfield et al. (2019) paper that the authors cite in their response memo includes this statement: “Each day in the denominator has a maximum of 1 days’ supply, which prevents PDC from exceeding 100%.” Thus, constructing a PDC involves checking each day during the study period to see if dispensed medication was available. Similarly, the report states that “PDC is the number of days supplied for a prescription during the observation period divided by the number of days in the observation period” (p. 23). This sounds correct as long as the authors did not sum up the days’ supply variable across overlapping prescriptions (e.g., three drugs dispensed on February 1 with 14 days’ supply should lead to a PDC of only 0.5 because only half of the month has dispensed medication). If the authors really did limit the days supplied to those within the observation period, it is unclear how they could have ended up with a ratio greater than 1. The measure that the authors cite from the Pharmacy Quality Alliance does indicate that the start day can be adjusted if there are overlaps, but there should never be a need for rounding if the PDC is computed correctly. Thus, it appears that the measure they used is not correctly described as a PDC.
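For concreteness, the day-level construction described above can be sketched as follows; the numbers reproduce the February example in the text, and the code is a sketch of the standard definition, not the authors’ implementation.

```python
from datetime import date, timedelta

def pdc(fills, obs_start, obs_end):
    """Proportion of days covered: the share of days in the observation
    window on which dispensed medication is available. Counting coverage
    at the day level (a set union) means the ratio can never exceed 1."""
    covered = set()
    for fill_date, days_supply in fills:
        for k in range(days_supply):
            d = fill_date + timedelta(days=k)
            if obs_start <= d <= obs_end:
                covered.add(d)
    n_days = (obs_end - obs_start).days + 1
    return len(covered) / n_days

# The example from the text: three drugs dispensed on February 1, each
# with a 14 days' supply, over a 28-day observation window.
fills = [(date(2021, 2, 1), 14)] * 3
print(pdc(fills, date(2021, 2, 1), date(2021, 2, 28)))   # 0.5

# Naively summing the days' supply instead yields a ratio above 1,
# which is exactly what a correctly constructed PDC rules out.
naive = sum(days for _, days in fills) / 28
print(naive)                                              # 1.5
```

A measure built this way needs no capping or rounding, which is why values above 100% suggest the authors’ measure is not a PDC as usually defined.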
The paper cited by the authors in relation to capping the PDC (Raebel et al., 2013) also has this wrong. The original citation for the PDC is Benner et al. (2002).
In addition, it is still not clear why the study requires one refill in order to calculate the PDC (p. 23). This requirement will overstate the PDC because it removes many zero values, and it would require the authors to interpret the resulting measure as “adherence among those with a prescription.” The committee recognizes that this approach has occasionally been used in other studies, as in the citations the authors provide, but it would be helpful to note in a few more places that this measures one specific type of adherence.
Finally, in Table 1.7 on page 37, it would be helpful to add a row reporting the regression coefficient and standard error for lagged PDC, because the title of the table indicates that this variable was included in the PDC model. If this is not correct, then the title of the table should be adjusted.
In the alternative PCL-5 utility mapping (p. 77), the authors assigned a 0.0038 unit increase in utilities as observed in the overall VA population (not only the PTSD subpopulation), resulting in an incremental cost-effectiveness ratio of $131,000/QALY. Inasmuch as the alternative PCL-5 utility mapping is based on a sample comprising many Veterans who did not have PTSD, the committee recommends dropping this analysis from the report on the grounds that this sample is not representative of the study sample (despite the authors’ claims of “robustness”).
The committee was tasked to review the monograph for how well it responded to the committee’s first report and for consistency with accepted scientific principles. This report transmits the committee’s findings that the monograph has been improved, while noting the above outstanding concerns. In conclusion, the committee would like to once again thank the VA study team for revising the monograph and providing the opportunity to offer additional constructive advice.
Sincerely,
Susan Busch, Co-Chair
Bisakha (Pia) Sen, Co-Chair
Committee on the Review of Department of Veterans Affairs Monograph on the Economic Impact and Cost Effectiveness of Service Dogs on Veterans with Post Traumatic Stress Disorder
Attachments:
A Committee Membership and Biographies
B References
C Reviewers