Wednesday, 22 January 2020

Data clustering to select clinically-relevant test cases for algorithm benchmarking and characterization.

Phys Med Biol. 2020 Jan 21.

Authors: Weppler S, Schinkel C, Kirkby C, Smith WL

Abstract

PURPOSE: Algorithm benchmarking and characterization are an important part of algorithm development and validation prior to clinical implementation. However, benchmarking may be limited to a small collection of test cases due to the resource-intensive nature of establishing "ground-truth" references. This study proposes a framework for selecting test cases to assess algorithm and workflow equivalence. Effective test case selection may minimize the number of ground-truth comparisons required to establish robust and clinically relevant benchmarking and characterization results.

METHODS: To demonstrate the proposed framework, we clustered differences between two independent workflows estimating during-treatment dose objective violations for 15 head and neck cancer patients (15 planning CTs, 105 on-unit CBCTs). Each workflow used a different deformable image registration algorithm to estimate inter-fractional anatomy and contour changes. The Hopkins statistic tested whether workflow output was inherently clustered, and k-medoid clustering formalized cluster assignment. Further statistical analyses verified the relevance of clusters to algorithm output. Data at cluster centers ("medoids") were considered candidate test cases representative of workflow-relevant algorithm differences.
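The clustering-tendency check described above can be illustrated with a minimal sketch of the Hopkins statistic. The abstract does not state which formulation or software the authors used, so the details below are assumptions: the convention in which values near 0.5 suggest no structure and values approaching 1 suggest clustering, nearest-neighbour distances raised to the power of the data dimension, and uniform reference points drawn from the bounding box of the data. The function names and the synthetic two-group data are hypothetical and are not the study's patient data.

```python
import math
import random


def nearest_distance(point, data, skip_index=None):
    """Euclidean distance from `point` to its nearest neighbour in `data`."""
    best = math.inf
    for i, other in enumerate(data):
        if i == skip_index:
            continue  # do not compare a sampled data point with itself
        best = min(best, math.dist(point, other))
    return best


def hopkins_statistic(data, sample_size, seed=0):
    """Hopkins statistic under the assumed convention H = U / (U + W)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lows = [min(row[j] for row in data) for j in range(dim)]
    highs = [max(row[j] for row in data) for j in range(dim)]

    # W: nearest-neighbour distances for a random sample of real data points.
    sampled = rng.sample(range(len(data)), sample_size)
    w = [nearest_distance(data[i], data, skip_index=i) for i in sampled]

    # U: nearest-neighbour distances from uniform random points to the data.
    u = []
    for _ in range(sample_size):
        p = [rng.uniform(lows[j], highs[j]) for j in range(dim)]
        u.append(nearest_distance(p, data))

    u_sum = sum(x ** dim for x in u)
    w_sum = sum(x ** dim for x in w)
    return u_sum / (u_sum + w_sum)


# Synthetic illustration: two tight, well-separated groups of 20 points each.
rng = random.Random(1)
demo = [
    (rng.gauss(cx, 0.1), rng.gauss(cy, 0.1))
    for cx, cy in [(0, 0), (10, 10)]
    for _ in range(20)
]
h = hopkins_statistic(demo, sample_size=10)
```

For strongly grouped data such as this synthetic example, `h` should lie close to 1. How a value such as the reported 0.75 maps to a confidence level depends on the reference distribution used, which the abstract does not specify.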

RESULTS: The framework indicated that differences in estimated dose objective violations were naturally grouped (Hopkins statistic = 0.75, providing 90% confidence). K-medoid clustering identified five clusters that stratified workflow differences (MANOVA: p < 0.001) in estimated parotid gland D50%, spinal cord/brainstem Dmax, and high-dose CTV coverage dose violations (Kendall's tau: p < 0.05). Systematic algorithm differences resulting in workflow discrepancies were: parotid gland volumes (ANOVA: p < 0.001), external contour deformations (t-test: p = 0.022), and CTV-to-PTV margins (t-test: p = 0.009), respectively. Five candidate test cases were verified as representative of the five clusters.
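The idea of taking cluster medoids as candidate test cases can be sketched as follows. This is an illustrative alternating k-medoids routine, not necessarily the variant the authors used (the abstract does not say whether PAM or another algorithm was applied). The deterministic farthest-first initialization, the Euclidean distance, the function name `k_medoids`, and the small synthetic data set are all assumptions made for the example. Because a medoid is always an actual data point, each returned index identifies a real case that could serve as a benchmark test case.

```python
import math


def k_medoids(data, k, max_iter=100):
    """Alternating k-medoids; returns (medoid indices, cluster labels).

    Assumes k does not exceed the number of distinct points in `data`.
    """
    n = len(data)
    dist = [[math.dist(a, b) for b in data] for a in data]

    # Assumed initialization: start from the most central point, then
    # repeatedly add the point farthest from its nearest chosen medoid.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        )

    def assign(current):
        # Label each point with the position of its nearest medoid.
        return [
            min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)
        ]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])  # keep the old medoid if empty
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            updated.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if updated == medoids:
            break
        medoids = updated

    return medoids, assign(medoids)


# Synthetic illustration: three groups of three points in two features.
data = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8),
    (9.0, 0.0), (9.3, 0.2), (9.1, -0.2),
]
medoids, labels = k_medoids(data, k=3)
```

In this toy example each group of three points ends up in its own cluster with one of its own members as the medoid. In a setting like the study's, the rows of `data` would instead hold per-case differences between the two workflows' outputs.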

CONCLUSIONS: The framework successfully clustered workflow outputs and identified five test cases representative of clinically relevant algorithm discrepancies. This approach may improve the allocation of resources during the benchmarking and characterization process and the applicability of results to clinical data.

PMID: 31962297 [PubMed - as supplied by publisher]
