Low Inter-Rater Reliability of Calculating the 4T’s Score for Heparin Induced Thrombocytopenia

Machhi, Rushad

Introduction

Heparin-induced thrombocytopenia (HIT) is an immune-mediated drug reaction that can cause thromboembolism in the setting of thrombocytopenia following heparin exposure. The “4T’s score” has been validated to determine the pre-test probability of HIT and to guide decisions about ordering HIT testing. The 4T’s scoring system requires an individual clinician to determine each component of the score. The objective of our study was to investigate the inter-rater reliability of 4T’s score calculation among clinicians.

Methods

Through a retrospective query of the Northwestern Enterprise Data Warehouse, we identified patients who had a HIT antibody (Ab) test ordered between 10/2019 and 10/2022, after implementation of a clinical decision support (CDS) tool that asked clinicians to calculate a 4T’s score as part of HIT PF-4 Ab orders. From this cohort, an independent clinician randomly selected 15 patients. Four raters (an attending hematologist, a hospital medicine attending, a hematology/oncology fellow, and an internal medicine resident) performed manual chart review of the selected patients to calculate a 4T’s score. Data collected for each case included the individual components of the 4T’s score and the overall 4T’s score from each rater. In addition, we compared the raters’ scores to the 4T’s score entered by the ordering clinician for the patient. We then categorized the numerical scores into validated pre-test probability categories: 0-3 as low, 4-5 as intermediate, and ≥6 as high. Inter-rater reliability for the categories was calculated using the Fleiss kappa statistic. Our study was approved by the Northwestern University Institutional Review Board.
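The categorization and agreement calculation described above can be sketched as follows. This is a minimal illustration in Python, not the analysis code used in the study: the function names are hypothetical, the example scores are made up purely to show the arithmetic, and the 0-8 score range is assumed from the standard 4T's scoring system (four components scored 0-2 each).

```python
from collections import Counter


def categorize_4ts(score):
    """Map a numerical 4T's score to a pre-test probability category.

    Cutoffs follow the text: 0-3 low, 4-5 intermediate, >=6 high.
    The 0-8 range check is an assumption (four components, 0-2 each).
    """
    if not 0 <= score <= 8:
        raise ValueError("4T's score must be between 0 and 8")
    if score <= 3:
        return "low"
    if score <= 5:
        return "intermediate"
    return "high"


def fleiss_kappa(ratings):
    """Return (mean observed agreement, Fleiss kappa).

    ratings: one list of category labels per case; every case must be
    rated by the same number of raters (at least two).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for case in ratings:
        counts = Counter(case)
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this case.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    p_observed = agreement_sum / n_cases
    total = n_cases * n_raters
    # Agreement expected by chance from the overall category proportions.
    p_expected = sum((t / total) ** 2 for t in category_totals.values())
    if p_expected == 1.0:
        return p_observed, float("nan")  # undefined: only one category used
    return p_observed, (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores for 3 cases x 4 raters (NOT study data).
example_scores = [[2, 3, 1, 0], [4, 5, 6, 4], [7, 6, 6, 8]]
example_categories = [[categorize_4ts(s) for s in case] for case in example_scores]
agreement, kappa = fleiss_kappa(example_categories)
print(f"overall agreement = {agreement:.3f}, Fleiss kappa = {kappa:.3f}")
```

In this sketch the "overall agreement" is the mean proportion of agreeing rater pairs per case, which is the observed-agreement term in the Fleiss kappa formula; whether the study reported agreement in exactly this way is an assumption.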

Results

Of the 15 cases selected, 5 each were scored as low, intermediate, and high probability by the ordering clinician (the 4T’s score associated with the HIT Ab order). The overall agreement between the score categories for the 5 clinicians (4 raters and the ordering clinician) was 50.7%, with a Fleiss kappa statistic of 0.26 (95% CI [0.07-0.45]), indicating poor inter-rater reliability (Figure 1). Excluding the ordering clinician, the overall agreement in score category between the 4 raters was 58.9%, with a Fleiss kappa statistic of 0.38 (95% CI [0.14-0.62]). All 4 raters calculated the same pre-test probability category in 5 cases, and the same numerical 4T’s score in only one case. There was one case in which all 5 clinicians calculated the same 4T’s score probability category, but none in which all 5 calculated the same numerical 4T’s score. The hematology/oncology fellow had the highest inter-rater agreement with the original ordering clinician (kappa 0.30 [-0.09-0.69]), whereas the internal medicine attending and resident had the lowest (both with kappa 0.00 [-0.37-0.37]). Compared with the scores entered by the original ordering clinicians, the hematology attending calculated a lower score in 12/15 cases, the internal medicine attending in 5/15, the hematology/oncology fellow in 12/15, and the internal medicine resident in 11/15.

Conclusion

Our study demonstrates poor inter-rater reliability of HIT 4T’s score calculation across levels of training and specialty. Importantly, poor inter-rater reliability was seen across 4T’s categories, which has implications for the clinical management of patients undergoing evaluation for HIT. This suggests that different strategies are necessary to help clinicians better use the 4T’s score.
