Challenge CURVAS:

Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation

Introduction

In medical imaging, deep learning (DL) models are often tasked with delineating structures or abnormalities, such as tumors, blood vessels, or organs, within complex anatomy. Uncertainty arises from the inherent complexity and variability of these structures, which makes their boundaries hard to define precisely. It is compounded by interrater variability: different medical experts may disagree on where the true boundaries lie. DL models must grapple with these discrepancies, which lead to segmentation results that are inconsistent across annotators and can affect diagnosis and treatment decisions. Interrater variability therefore poses a significant challenge for DL-based medical image segmentation. Addressing it requires robust algorithms capable of capturing and quantifying uncertainty, together with standardized annotation practices and collaboration among medical experts to reduce variability and improve the reliability of DL-based medical image analysis.

Furthermore, achieving model calibration, a fundamental aspect of reliable predictions, becomes notably challenging when dealing with multiple classes and raters. Calibration ensures that predicted probabilities align with the true likelihood of events, which makes the model's outputs more reliable. Although it is not always obvious, having multiple classes introduces additional uncertainty arising from the interactions between them. Moreover, incorporating annotations from multiple raters adds another layer of complexity, as differing expert opinions contribute to a broader spectrum of variability and to higher computational cost.
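To make the notion of calibration concrete, the sketch below computes the Expected Calibration Error (ECE), one widely used calibration measure. Which calibration metric the challenge actually uses is not stated here, so this is only an illustrative assumption; the function name, the 10-bin default and the toy inputs are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probabilities in [0, 1], one per prediction.
    correct:     truthy/falsy flags, one per prediction.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Group predictions by confidence bin; a confidence of 1.0 goes to the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(ok)))

    # Weighted average of |mean confidence - accuracy| over non-empty bins.
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Overconfident toy model: 90% confidence, 75% correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
    # Calibrated toy model: 75% confidence, 75% correct.
    print(expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0]))
```

A model that reports 90% confidence but is right only 75% of the time has a gap of about 0.15, while a model whose confidence matches its accuracy has a gap of zero.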

Consequently, the development of robust algorithms capable of effectively capturing and quantifying variability and uncertainty, while also accommodating the nuances of multi-class and multi-rater scenarios, becomes imperative. Striking a balance between model calibration, accurate segmentation and handling variability in medical annotations is crucial for the success and reliability of DL-based medical image analysis.

For these reasons, we have created a challenge that addresses all of the above. In this challenge, we will work with abdominal CT scans. Each scan has three annotations, each made by a different expert, and each annotation contains three classes: pancreas, kidney and liver.

The main idea is to evaluate results using the multi-rater information. The evaluation has two parts. The first is a classical Dice score evaluation together with a volume assessment, which adds clinically relevant information. The second studies whether the model is calibrated. All of these evaluations will take all three annotations into account.
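The first part of the evaluation can be sketched as follows: a per-class Dice score computed against each rater's annotation, plus an organ volume derived from the voxel count. This is a minimal sketch under stated assumptions, not the official evaluation code. Masks are taken to be flat sequences of integer labels (for example a flattened volume), the label values 1, 2 and 3 for pancreas, kidney and liver are hypothetical, and averaging Dice over raters is only one possible way to combine the three annotations.

```python
def dice_score(pred, ref, label):
    """Dice coefficient for one class between two flat label sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    pred_n = sum(1 for p in pred if p == label)
    ref_n = sum(1 for r in ref if r == label)
    overlap = sum(1 for p, r in zip(pred, ref) if p == label and r == label)
    if pred_n + ref_n == 0:
        return 1.0  # assumption: class absent in both masks counts as agreement
    return 2.0 * overlap / (pred_n + ref_n)


def mean_dice_over_raters(pred, rater_masks, label):
    """Average the Dice score over all rater annotations (assumed aggregation)."""
    scores = [dice_score(pred, ref, label) for ref in rater_masks]
    return sum(scores) / len(scores)


def volume_ml(mask, label, spacing_mm):
    """Volume of one class in millilitres, given voxel spacing in mm."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    n_voxels = sum(1 for v in mask if v == label)
    return n_voxels * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


if __name__ == "__main__":
    PANCREAS, KIDNEY, LIVER = 1, 2, 3  # hypothetical label values
    prediction = [0, 1, 1, 2, 2, 3, 3, 3]
    raters = [
        [0, 1, 1, 2, 2, 3, 3, 3],
        [1, 1, 1, 2, 0, 3, 3, 3],
        [0, 0, 1, 2, 2, 3, 3, 0],
    ]
    for name, label in (("pancreas", PANCREAS), ("kidney", KIDNEY), ("liver", LIVER)):
        print(name, mean_dice_over_raters(prediction, raters, label))
    print("liver volume (mL):", volume_ml(prediction, LIVER, (1.0, 1.0, 1.0)))
```

Comparing the predicted volume with the volumes obtained from each rater's mask gives a sense of whether a prediction falls inside the range of expert disagreement.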

For the training phase, 20 CT scans belonging to group A will be provided with their respective annotations. Participants are encouraged to leverage publicly available external data annotated by multiple raters. The QUBIQ21 organizers have already been contacted and have consented (with proper attribution) to the use of their multi-annotator data. Providing a small training set and allowing a public dataset for training makes the challenge more inclusive, since a method can be developed with data that anyone can obtain. Furthermore, training on one dataset and evaluating on another encourages methods that are robust to shifts and other sources of variability between datasets.

Then, for the validation phase, 10 CT scans will be given, 5 belonging to group A and 5 belonging to group B (to which group each CT scan belongs will not be revealed until after the challenge).

Finally, for the test phase, 60 CT scans will be used for evaluation: 20 CTs belonging to group A, 17 CTs belonging to group B and 23 CTs belonging to group C. These CT scans will not be published until the end of the challenge either.
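As a quick consistency check, the per-phase counts above can be tallied by phase and by group; the group totals should match the group sizes given in the Dataset section (45, 22 and 23 CTs, 90 overall). The dictionary layout and function names below are illustrative only.

```python
# Per-phase CT counts by group, as stated in the text above.
PHASES = {
    "training": {"A": 20},
    "validation": {"A": 5, "B": 5},
    "test": {"A": 20, "B": 17, "C": 23},
}


def phase_totals(phases):
    """Number of CT scans in each phase."""
    return {phase: sum(groups.values()) for phase, groups in phases.items()}


def group_totals(phases):
    """Number of CT scans in each group, summed over all phases."""
    totals = {}
    for groups in phases.values():
        for group, count in groups.items():
            totals[group] = totals.get(group, 0) + count
    return totals


if __name__ == "__main__":
    print(phase_totals(PHASES))
    print(group_totals(PHASES))
    print(sum(phase_totals(PHASES).values()))
```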

Timeline

Dataset

The challenge cohort consists of 90 CT images prospectively gathered at the University Hospital Erlangen between August 2023 and October 2023.

One of the technical prerequisites was the use of contrast-enhanced CT scans in the portal venous phase, acquired with thin slices of 0.6 to 1 mm. We used thoracic-abdominal CT images taken during the patients' hospital stay for various medical reasons. Given the focus on abdominal organs, we used the Br40 soft kernel. CT examinations were conducted on SIEMENS CT scanners at the University Hospital Erlangen, with rotation times of 0.25 or 0.5 s. Detector collimation varied from 128x0.6 mm (single source) to 98x0.6x2 and 144x0.4x2 (dual source) configurations. Spiral pitch factors ranged from 0.3 to 1.3. The mean reference tube current was set at 200 mAs, adjustable to 120 mAs. Automated tube voltage adaptation and tube current modulation were used in all cases. Contrast agent administration was standard practice, with an injection rate of 3-4 mL/s and a body-weight-adjusted dosage of 400 mg(iodine)/kg (equivalent to 1.14 mL/kg of Iomeprol 350 mg/mL). All images were reconstructed using soft convolution kernels and iterative techniques.

Inclusion criteria were a maximum of 10 cysts, each with a diameter of less than 2.0 cm. Furthermore, CT scans with major artifacts (e.g. breathing artifacts) or incomplete registrations were excluded.

Participants were required to be over 18 years old and provide both verbal and written consent for the use of their CT images in the Challenge. Both study-specific and broad consent were obtained. Among the 90 patients, there were 51 males and 39 females, aged between 37 and 94 years, with an average age of 65.7 years. All patients received treatment at the University Hospital Erlangen in Bavaria, Germany. No additional selection criteria were set to ensure a representative sample of a typical patient cohort.

Our overall data consists of 90 CTs split into three groups:
- Group A: cases with 2 cysts or less with no contour altering pathologies (45 CTs)
- Group B: cases with 3-5 cysts with no contour altering pathologies (22 CTs)
- Group C: cases with 6-10 cysts with some pathologies included (liver metastases, hydronephrosis, adrenal gland metastases, missing kidney) (23 CTs)

However, participants will not be told which case belongs to which group. This information will be released after the challenge, together with the whole dataset.
The data collection for the datasets involved in this challenge has been approved by an ethics committee at the Universitätsklinikum Erlangen (approval number 23-243-B).

The data to be used during and after the challenge is pseudonymized and coded by the hospital so that re-identification of a data sample is not possible. Moreover, patient information is known only to the hospital's principal investigator (PI), so the challenge collaborators have no means of identifying patient data at any point.

The data usage agreement for this challenge is CC BY-NC (Attribution-NonCommercial).

Evaluation and Metrics

The evaluation will be based on several aspects.

Ranking and Prizes

The top five performing methods will be announced publicly. Winners will be invited to present their methods and results at the challenge event hosted at MICCAI 2024.

Two members of each participating team may qualify as authors (one must be the person who submits the results). Participating teams may publish their own results separately only after the organizers have published a challenge paper, and must always cite that challenge paper.

Timeline milestones (as labeled in the timeline figure):
- Training Set Release
- Open Development Phase
- Closed Testing Phase
- Preliminary Results
- MICCAI 2024
- Validation submission open
- Validation Set Release
- Final Algorithms Submission
- Release of the results
- Replication of results
- Challenge winners announced
- Contact winners to invite them to MICCAI 2024
- Writing

Organizers

Sycai Technologies SL (Sycai Medical)

Meritxell Riera i Marín, PhD Candidate

Javier García López, PhD

Júlia Rodríguez Comas, PhD

Universitätsklinikum Erlangen

Joy Kleiss, PhD Candidate

Matthias May, MD

Maximilian Schmidt, MD

Christopher Hessman, MD

Universitat Pompeu Fabra

Shika O K, PhD

Adrian Galdrán, PhD

Miguel Angel González-Ballester, PhD
