Category: Original Article

Measures of diagnostic accuracy and clinical usefulness. Epidemiological methodologies applied to the use of lung ultrasound among heart failure patients

Bovaro F.*1, Manasievska M.*2,3, Merletti F.3, Ciliberto E.3, Maule M.M.3, Lupia E.4,5, Pivetta E.2,3,4

       *these authors contributed equally to this work

  1. Residency Program in Emergency Medicine, University of Turin
  2. PhD Program in Experimental Medicine and Therapy, Department of Medical Sciences, University of Turin
  3. Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin
  4. Emergency Medicine Division, AOU Città della Salute e della Scienza di Torino
  5. Department of Medical Sciences, University of Turin


It is still not clear which methods are best for evaluating the accuracy and clinical usefulness of new diagnostic tools.
To evaluate, using several methodologies, the performance of an integrated diagnostic approach with lung ultrasound (LUS) in diagnosing acute heart failure.
Materials and methods
We calculated the area under the ROC curve (AUC), Brier score, Youden index, net reclassification index (NRI) and net benefit (NB) for the clinical and the LUS integrated approaches in a subcohort of patients enrolled at Molinette Hospital in a previous multicenter study.
NRI and NB seemed to be more informative for understanding the usefulness of a diagnostic tool.


Heart failure (HF) is one of the most relevant health problems in developed countries, and its incidence increases progressively with age (1). HF is defined as a clinical syndrome with symptoms and signs that can result from any structural or functional cardiac disorder that impairs the ability of the ventricle to eject blood (2).
Acute heart failure is a complex and heterogeneous clinical syndrome defined as the rapid onset or change in symptoms and signs of heart failure requiring immediate medical attention and urgent therapy. It is a leading indication for hospitalization, associated with high short-term (intra-hospital) and long-term (6 to 12 month) mortality (2).
A typical HF symptom is shortness of breath (i.e. dyspnea), which is one of the most common complaints in the Emergency Department (ED), causing over 3 million evaluations/year in the United States (3)(4). It is defined as a subjective experience of breathing difficulty. Dyspnea can have two main etiologies, cardiogenic and non-cardiogenic. The diagnosis of HF based on the combination of the patient’s history, physical examination and the traditional diagnostic approach (i.e. chest radiography, electrocardiogram, and measurement of natriuretic peptides) is often difficult, and a large number of the initial etiological diagnoses made by emergency physicians are modified after further examinations, leading to dangerous diagnostic delays.
Lung ultrasound (LUS) is a basic application of point-of-care ultrasound (5). It can be quickly performed at the bedside and it leads to rapid therapeutic decisions (6).
Multiple vertical artifacts (i.e. B lines) at LUS evaluation have been proposed as a sonographic sign of pulmonary congestion (7). They are a good indicator of alveolar interstitial syndrome, but are not specific for acute HF (AHF) (8). Combining sonographic and clinical findings might improve the diagnostic accuracy of the etiological assessment of acute dyspnea (6).
The recent guidelines from the European Society of Cardiology (ESC), published in June 2016 (2), do not modify the general approach to patients with suspected AHF. The guidelines propose an integrated approach for the diagnosis of HF that should be based on detailed symptoms history, physical examination and further diagnosis confirmation using additional investigations such as electrocardiogram, chest radiograph, echocardiography and biomarkers such as natriuretic peptides (2). Therefore, the only relevant difference compared to the 2012 ESC guidelines is the recommendation to use natriuretic peptides.
The guidelines mention LUS without indicating its level of efficacy, but suggest the use of bedside LUS for the evaluation of signs of interstitial edema and pleural effusion where expertise is available (2).
Several epidemiological methods have been suggested to evaluate the accuracy and clinical usefulness of different diagnostic tools, but none of them has been demonstrated to outperform the traditional receiver operating characteristic (ROC) curve, mainly in terms of frequency of use.


With this study we aimed to evaluate, using different measures, the performance of an integrated diagnostic approach that combines clinical assessment with bedside LUS in differentiating AHF from non-cardiogenic causes of acute dyspnea in the ED.

Materials and methods

We used data of patients enrolled at the "Città della Salute e della Scienza di Torino" University Hospital, which is one of the seven Piedmont hospitals that took part in an observational cohort study [ref]. In this cohort the diagnostic accuracy of an integrated approach with LUS was evaluated. After the initial clinical work-up (history, physical examination, electrocardiogram, arterial blood gas analysis), the emergency physician (EP) in charge was requested to indicate the most likely etiology of the patient’s dyspnea, expressed as a dichotomous variable (cardiogenic or non-cardiogenic). Chest radiography (CXR) was performed in all patients. After LUS was performed, the same EP was asked to reformulate the most likely diagnosis. As a reference test, two emergency physicians, blinded to LUS results, independently reviewed the entire medical records and indicated the final cause of dyspnea, which was used for the calculation of diagnostic accuracy (in case of disagreement, they reviewed all data together and assigned the most likely final diagnosis).
The ability of a diagnostic procedure to distinguish sick from healthy patients determines its accuracy and diagnostic value.
The accuracy of the diagnostic approaches was expressed as the area under the ROC curve (AUC), the Brier score and the Youden index.
The ROC curve, a graphical technique for describing and comparing the accuracy of a diagnostic test, is obtained by plotting the sensitivity of a test on the y axis and 1-specificity on the x axis for the complete range of decision thresholds (9). The Youden Index is used as a summary measure of the ROC curve because it measures the maximum effectiveness of a diagnostic procedure and enables the selection of an optimal threshold value (cutoff point) at the same time (10).
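The construction described above can be illustrated with a minimal sketch in plain Python. It is not the code used in the study: the function names and the toy scores and labels are hypothetical, and the AUC is approximated with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for every distinct threshold.

    A subject is classified as positive when its score is >= threshold.
    labels: 1 = disease present, 0 = disease absent (hypothetical toy data).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, tp / n_pos, 1 - fp / n_neg))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve (x = 1 - specificity, y = sensitivity)."""
    xy = [(0.0, 0.0)] + [(1 - spec, sens) for _, sens, spec in points] + [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def youden_optimal(points):
    """Return (J, threshold) maximizing J = sensitivity + specificity - 1.

    Ties in J are resolved in favor of the highest threshold.
    """
    return max((sens + spec - 1, thr) for thr, sens, spec in points)


# Hypothetical scores (higher = more suspicious) and true disease status.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
auc = auc_trapezoid(points)
best_j, best_thr = youden_optimal(points)
print(f"AUC = {auc:.3f}, Youden J = {best_j:.2f} at threshold {best_thr}")
```

For a test with a single dichotomous result there is only one threshold, so the ROC curve reduces to one point and the trapezoidal AUC equals (sensitivity + specificity) / 2, that is (J + 1) / 2.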
The Brier Score is a measure used for verifying the accuracy of a probability forecast, which refers to a specific event with binary outcomes. It is the average gap (mean squared difference) between forecast probabilities and the actual outcomes (11).
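As a minimal sketch of this definition (the function name and the example values are hypothetical, not study data), the Brier score can be computed as the mean squared difference between forecast probabilities and observed 0/1 outcomes:

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    0 is a perfect forecast; higher values indicate worse accuracy.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical forecast probabilities of disease and the observed outcomes.
probabilistic = brier_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0])

# With dichotomous (0/1) forecasts the score equals the misclassification rate:
# here one of four forecasts is wrong.
dichotomous = brier_score([1, 0, 1, 1], [1, 0, 0, 1])

print(f"probabilistic: {probabilistic:.4f}, dichotomous: {dichotomous:.2f}")
```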
In order to evaluate the beneficial effects of the diagnostic tests we referred to their clinical usefulness. This concept has been defined ambiguously in the evaluation of healthcare. The usefulness of a diagnostic procedure is defined as the degree to which actual use of the corresponding procedure in healthcare is associated with changing health outcomes, such as preventing death and restoring or maintaining health (12). Two very popular measures used to assess the clinical usefulness of a diagnostic test are the net reclassification index (NRI) and the net benefit (NB).
The NRI evaluates the improvement in prediction performance gained by adding a new predictor to a set of baseline predictors. It is an index that attempts to quantify how well a new test reclassifies subjects in comparison to the old model (13).
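A minimal sketch of how the NRI can be computed for two dichotomous classifications is given below, assuming the common two-component formulation (net proportion of correct reclassifications among subjects with and without the disease). The function name and the toy data are hypothetical and do not reproduce the study analysis.

```python
def nri_components(old, new, truth):
    """Return (NRI among events, NRI among non-events) for 0/1 classifications.

    Among events (truth == 1) a move from 0 to 1 is a correct reclassification;
    among non-events (truth == 0) a move from 1 to 0 is correct.
    """
    events = [(o, n) for o, n, t in zip(old, new, truth) if t == 1]
    nonevents = [(o, n) for o, n, t in zip(old, new, truth) if t == 0]
    if not events or not nonevents:
        raise ValueError("both events and non-events are required")

    up_events = sum(1 for o, n in events if n > o)
    down_events = sum(1 for o, n in events if n < o)
    up_nonevents = sum(1 for o, n in nonevents if n > o)
    down_nonevents = sum(1 for o, n in nonevents if n < o)

    nri_events = (up_events - down_events) / len(events)
    nri_nonevents = (down_nonevents - up_nonevents) / len(nonevents)
    return nri_events, nri_nonevents


# Hypothetical data: 1 = cardiogenic, 0 = non-cardiogenic.
truth = [1, 1, 1, 1, 0, 0, 0, 0]   # reference diagnosis
old = [1, 0, 0, 1, 0, 1, 1, 0]     # baseline classification
new = [1, 1, 1, 0, 0, 0, 1, 0]     # classification after the new test

nri_events, nri_nonevents = nri_components(old, new, truth)
print(f"NRI events = {nri_events:.2f}, NRI non-events = {nri_nonevents:.2f}")
```

The two components are often reported separately, as well as summed into an overall NRI, because a gain among events and a gain among non-events have different clinical consequences.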
The NB is a decision-analytic measure that explicitly incorporates weights for detecting disease (i.e. true positive, TP) and overdiagnosing non-disease (i.e. false positive, FP) (13)(14). It can be interpreted as the fraction of TP classifications penalized for FP classifications. At a single threshold (e.g. the prevalence of disease), the NB of a diagnostic test, marker or procedure is the net fraction of TP gained by making decisions based on its predictions compared with making decisions without it.
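The weighting of TP against FP can be sketched with the usual decision-curve formula, NB = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. This formula is assumed here as the standard formulation; the function name and the counts are hypothetical and are not the study data.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit at a given threshold probability.

    tp, fp: numbers of true and false positive classifications
    n: total number of patients
    threshold: threshold probability pt, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    weight = threshold / (1 - threshold)  # harm of one FP relative to one TP
    return tp / n - (fp / n) * weight


# Hypothetical counts for two diagnostic approaches in 100 patients,
# evaluated at a threshold probability of 0.5 (FP and TP weighted equally).
nb_baseline = net_benefit(tp=40, fp=10, n=100, threshold=0.5)
nb_new = net_benefit(tp=46, fp=4, n=100, threshold=0.5)
nb_gain = nb_new - nb_baseline

print(f"baseline: {nb_baseline:.2f}, new: {nb_new:.2f}, gain: {nb_gain:.2f}")
```

A positive difference between the two approaches can be read as the net number of additional true positives per patient, after penalizing false positives, obtained by using the new approach.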


The sub-cohort analyzed in this study consists of 310 patients who presented to the ED of the Città della Salute e della Scienza di Torino for acute dyspnea, of whom 152 (49%) received a final diagnosis of heart failure. The area under the ROC curve (AUC) of the clinical evaluation, the integrated approach and CXR was 0.874, 0.974 and 0.774, respectively.
The NRI of the approach integrated with LUS for cardiogenic and non-cardiogenic dyspnea was 12.5 (95% CI: 6.9-18.1) and 7.6 (95% CI: 3.9-12.1), respectively. The NB of the clinical and the integrated evaluations varied from 13.1 to 10, respectively, with a prevalence of heart failure ranging from 40 to 50%.
The Brier scores for the clinical and integrated evaluations were 0.11 and 0.03, respectively.
The Youden index for the clinical diagnosis and the integrated approach was 0.747 and 0.948, respectively.


The diagnostic accuracy and clinical usefulness of a diagnostic tool can be expressed in several different ways. Although several methods have been proposed, AUC is the most frequently reported measure of accuracy. Despite the widespread use of AUC, NRI and NB might be more informative, in particular for understanding the usefulness of a diagnostic tool.
  1. Redfield MM. Heart Failure — An Epidemic of Uncertain Proportions. N Engl J Med. 2002 Oct 31;347(18):1442–4.
  2. Ponikowski P, Voors AA, Anker SD, Bueno H, Cleland JGF, Coats AJS, et al. 2016 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. 2016;37(27):2129–2200m.
  3. Parshall MB, Schwartzstein RM, Adams L, Banzett RB, Manning HL, Bourbeau J, et al. An official American Thoracic Society statement: Update on the mechanisms, assessment, and management of dyspnea. Am J Respir Crit Care Med. 2012;185(4):435–52.
  4. Ambrosino N, Serradori M. Determining the cause of dyspnoea: linguistic and biological descriptors. Chron Respir Dis. 2006;3(3):117–22.
  5. Moore CL, Copel JA. Point-of-care ultrasonography. N Engl J Med. 2011;364(8):749–57.
  6. Pivetta E, Goffi A, Lupia E, Tizzani M, Porrino G, Ferreri E, et al. Lung ultrasound-implemented diagnosis of acute decompensated heart failure in the ED: A SIMEU multicenter study. Chest. 2015;148(1):202–10.
  7. Martindale JL. Resolution of sonographic B-lines as a measure of pulmonary decongestion in acute heart failure. Am J Emerg Med. 2016;34(6):1129–32.
  8. Volpicelli G, Elbarbary M, Blaivas M, Lichtenstein DA, Mathis G, Kirkpatrick AW, et al. International evidence-based recommendations for point-of-care lung ultrasound. Intensive Care Med. 2012;38(4):577–91.
  9. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press; 2004. 302 p.
  10. Ruopp MD, Perkins NJ, Whitcomb BW, Schisterman EF. Youden Index and optimal cut-point estimated from observations affected by a lower limit of detection. Biometrical J. 2008;50(3):419–30.
  11. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev. 1950;78(1):1–3.
  12. Vickers AJ. Decision analysis for the evaluation of diagnostic tests, prediction models and molecular markers. Am Stat. 2008;62(4):314–20.
  13. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer; 2009. 500 p.
  14. Steyerberg EW, Pencina MJ, Lingsma HF, Kattan MW, Vickers AJ, van Calster B. Assessing the incremental value of diagnostic and prognostic markers: A review and illustration. Eur J Clin Invest. 2012;42(2):216–28.


Itjem is the official Italian scientific review for emergency medicine.

Publisher: Simeu, Società italiana della medicina di emergenza-urgenza, via Valprato, 68 Torino

Editorial coordination: Silvia Alparone.

Scientific Manager : Giuliano Bertazzoni; Operating Editorial Board: Paolo Balzaretti, Guido Borasi, Rodolfo Ferrari, Mauro Giordano, Paolo Groff, Emanuele Pivetta.

Advisory Board: Michele Gulizia, Riccardo Lubrano, Marco Ranieri, Maria Pia Ruggieri, Roberta Petrino, Francesco Violi, Ugo Loaisa, Lexie Asrow.

Editorial Board and Reviewers: Giancarlo Agnelli, Giancarlo Avanzi, Marco Baroni, Stefania Basili, Alessio Bertini, Francesco Buccelletti, Gian A. Cibinel, Roberto Cosentini, Fabio De Iaco, Andrea Fabbri, Paola Noto, Giovanni Ricevuti, Fernando Schiraldi, Danilo Toni.
