Details

Tasks

The tasks of the challenge are two-fold:

  • To segment vertebrae from the given spine images that include fractured and non-fractured cases, and provide vertebra segmentation results in the form of corresponding masks.
  • To classify vertebrae from the given spine images into fractured and non-fractured cases along with specific morphological grades and cases of vertebral fractures, and provide fracture classification results in the form of corresponding fracture scores.

Participants are invited to develop automated or semi-automated computer-assisted algorithms to solve one or both tasks, and to submit their results through this website. The results for the vertebra segmentation task and the fracture classification task will be evaluated and ranked separately, so the tasks can be approached individually or jointly; a high-quality vertebra segmentation does not necessarily imply a high-quality fracture classification, and vice versa.

Rules

This is an open online challenge, meaning that anyone can participate by entering the challenge (open), and that results are regularly updated and posted through this website (online). If a sufficient number of participants enter the challenge, the organizers may decide to proceed to a live challenge with corresponding paper submissions at a major medical imaging conference (e.g. MICCAI, ISBI, SPIE MI) and/or prepare a joint journal paper summarizing challenge outcomes for a high-impact journal in the corresponding field (e.g. IEEE TMI, MedIA). The following rules apply:

  • Anyone can download the database without registering. However, we kindly ask users to register with their contact information before downloading the database, so that we can keep track of its usage. Moreover, each registered user will be informed via e-mail about future developments and news related to the challenge.
  • Contributions to the challenge are not limited to new and unpublished methods, which means that application of existing methods is allowed. Participants agree that they will specifically describe any manual operations in their contributions, which may influence their final ranking.
  • Registering with contact information is mandatory at result submission; however, participation in the challenge is anonymous to the extent that the identities of participants are known to the organizers only. When submitting results, each participant will be given a unique ID, the so-called participant ID (PID), by which the participant will be identified in the challenge results posted online.
  • Online challenge results will be reported by PID in the form of ranks only. Ranking will be performed separately for the vertebra segmentation task and fracture classification task. Along with PID, each participant will also receive a password for accessing a more detailed report of the corresponding results.
  • Participants that originate from the same group can submit at most two (2) substantially different contributions to the challenge. Participants are allowed to resubmit twice (2×) an improved version of their existing contributions, meaning that a maximum of three (3) versions per contribution are allowed, and contribution versions resulting in the highest rank will be assigned to corresponding participants.
  • If the challenge organizers decide to proceed to a live challenge and/or prepare a joint journal paper, they reserve the right to decline selected participants (e.g. participants with trivial contributions producing very poor results). Accepted participants must agree to disclose their identity and provide a short description of the contributed method. For the joint journal paper, a maximum of two (2) co-authors per contribution will be allowed.

Evaluation Metrics

The submitted results will be evaluated separately for the vertebra segmentation and fracture classification tasks.

Vertebra Segmentation Evaluation Metrics

For vertebra segmentation, the following metrics will be considered to evaluate the quality of volume masks \(M_{seg}\) of segmented vertebrae against corresponding reference volume masks \(M_{ref}\):

DSC
The Dice similarity coefficient (DSC) is defined as: $$DSC = \frac{2N(M_{seg} \cap M_{ref})}{N(M_{seg})+N(M_{ref})}$$
where \(N(M_{seg} \cap M_{ref})\) is the number of voxels in the overlap between volume masks \(M_{seg}\) and \(M_{ref}\), \(N(M_{seg})\) is the number of voxels in the segmented volume mask \(M_{seg}\), and \(N(M_{ref})\) is the number of voxels in the reference volume mask \(M_{ref}\).
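
As a minimal sketch of the DSC formula above, the following Python snippet represents each volume mask as a set of voxel coordinates. The function name `dice_similarity`, the set-based representation, and the value returned for two empty masks are assumptions made for illustration only, not part of the challenge's evaluation code.

```python
def dice_similarity(seg, ref):
    """DSC between two masks given as collections of (x, y, z) voxel coordinates."""
    seg, ref = set(seg), set(ref)
    total = len(seg) + len(ref)
    if total == 0:
        # The formula is undefined for two empty masks; returning 1.0
        # (perfect agreement) is an assumption of this sketch.
        return 1.0
    return 2 * len(seg & ref) / total


# Two 3-voxel masks that overlap in 2 voxels.
seg = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
ref = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
print(dice_similarity(seg, ref))  # 2*2 / (3+3), about 0.667
```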

MSSD
The mean symmetric surface distance (MSSD) is defined as: $$MSSD = \frac{1}{N(M_{seg}) + N(M_{ref})}\left(\sum_{i \in M_{seg}}\min_{j \in M_{ref}}d(i,j) + \sum_{j \in M_{ref}}\min_{i \in M_{seg}}d(j,i)\right)$$ where \(\min_{j \in M_{ref}}d(i,j)\) is the Euclidean distance from surface voxel \(i\) in the segmented volume mask \(M_{seg}\) to the closest surface voxel \(j\) in the reference volume mask \(M_{ref}\), and \(\min_{i \in M_{seg}}d(j,i)\) is the Euclidean distance from surface voxel \(j\) in the reference volume mask \(M_{ref}\) to the closest surface voxel \(i\) in the segmented volume mask \(M_{seg}\).
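
The MSSD formula can be illustrated with a brute-force sketch in plain Python. Several choices here are assumptions rather than the organizers' implementation: masks are sets of voxel coordinates, a surface voxel is one with at least one 6-connected neighbour outside the mask, the sums and the normalizing counts run over surface voxels only, and the hypothetical `spacing` parameter converts voxel indices to physical units.

```python
import math

# 6-connected neighbourhood offsets (an assumption of this sketch).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surface_voxels(mask):
    """Voxels of `mask` that have at least one 6-neighbour outside the mask."""
    mask = set(mask)
    return {
        (x, y, z)
        for (x, y, z) in mask
        if any((x + dx, y + dy, z + dz) not in mask for dx, dy, dz in NEIGHBOURS)
    }


def mssd(seg, ref, spacing=(1.0, 1.0, 1.0)):
    """Mean symmetric surface distance between two voxel masks (brute force)."""
    s_seg, s_ref = surface_voxels(seg), surface_voxels(ref)
    if not s_seg or not s_ref:
        raise ValueError("both masks must be non-empty")

    def scale(v):
        # Convert voxel indices to physical coordinates.
        return tuple(c * s for c, s in zip(v, spacing))

    p_seg = [scale(v) for v in s_seg]
    p_ref = [scale(v) for v in s_ref]
    # Sum of distances from each surface voxel to the closest one on the other surface.
    d_seg = sum(min(math.dist(i, j) for j in p_ref) for i in p_seg)
    d_ref = sum(min(math.dist(j, i) for i in p_seg) for j in p_ref)
    return (d_seg + d_ref) / (len(p_seg) + len(p_ref))


# Two single-voxel masks that are 3 voxels apart.
print(mssd({(0, 0, 0)}, {(3, 0, 0)}))  # 3.0
```

The brute-force search is quadratic in the number of surface voxels; for full-resolution volumes a distance transform or a spatial index would be the more practical choice.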



Fracture Classification Evaluation Metrics

For fracture classification, the following metrics will be considered to evaluate the correctness of detected fracture scores \(S_{det} = (g_{det},c_{det})\) against corresponding reference fracture scores \(S_{ref} = (g_{ref},c_{ref})\), where \(g_{det}\) and \(g_{ref}\) are morphological grades, and \(c_{det}\) and \(c_{ref}\) are morphological cases of vertebral fractures (if either \(g=0\) or \(c=0\), the only possible score is \(S=(0,0)\), representing a non-fractured vertebra):

MSPP
The shortest path penalty (SPP) is defined as the sum of individual penalties accumulated along the shortest path from the detected score \(S_{det}\) to the corresponding reference score \(S_{ref}\). In this case, each individual penalty equals 1, representing each change in morphological grade and/or case starting from the detected score and reaching the corresponding reference score. For all vertebrae, the mean shortest path penalty (MSPP) is therefore computed as: $$MSPP = \frac{\sum\left(|g_{det}-g_{ref}| + [c_{det} \neq c_{ref}]\right)}{\sum[c_{ref} \geq 0]}$$ where \([x]=1\) if condition \(x\) is satisfied, and \([x]=0\) otherwise. For example, for a vertebra with the detected score \(S_{det}=(1,3)\) (mild crush fracture) and the corresponding reference score \(S_{ref}=(2,1)\) (moderate wedge fracture), SPP results in \(SPP=2\).
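
The MSPP computation can be sketched as follows. The function name `mspp` and the list-of-pairs input are assumptions for illustration; the denominator \(\sum[c_{ref} \geq 0]\) is taken to be the number of vertebrae with a reference score, assuming that cases are never negative.

```python
def mspp(detected, reference):
    """Mean shortest path penalty over paired (grade, case) fracture scores."""
    if len(detected) != len(reference):
        raise ValueError("detected and reference scores must be paired")
    if not reference:
        raise ValueError("at least one vertebra is required")
    total = sum(
        abs(g_det - g_ref) + (c_det != c_ref)  # grade steps plus one step if the case differs
        for (g_det, c_det), (g_ref, c_ref) in zip(detected, reference)
    )
    return total / len(reference)


# First vertebra reproduces the example from the text: (1, 3) vs (2, 1) gives SPP = 2.
detected = [(1, 3), (0, 0), (2, 1)]
reference = [(2, 1), (0, 0), (2, 1)]
print(mspp(detected, reference))  # (2 + 0 + 0) / 3
```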



F-score
The F-score quantitatively describes the quality of a binary classification and is defined as: $$F = \frac{2 \cdot PPV \cdot TPR}{PPV + TPR}$$ where the positive predictive value (PPV), also known as precision, is defined as the ratio of vertebrae detected as fractured that are actually fractured, and the true positive rate (TPR), also known as sensitivity or recall, is defined as the ratio of fractured vertebrae that are correctly detected as such: $$\begin{split} PPV &= \frac{TP}{TP + FP} \\[0.5em] TPR &= \frac{TP}{TP + FN} \end{split}$$ where true positives \(TP = \sum\left([c_{det} > 0]\cdot[c_{ref} > 0]\right)\) represent the number of fractured vertebrae that are correctly detected as fractured, false negatives \(FN = \sum\left([c_{det} = 0]\cdot[c_{ref} > 0]\right)\) represent the number of fractured vertebrae that are incorrectly detected as non-fractured, true negatives \(TN = \sum\left([c_{det} = 0]\cdot[c_{ref} = 0]\right)\) represent the number of non-fractured vertebrae that are correctly detected as non-fractured, and false positives \(FP = \sum\left([c_{det} > 0]\cdot[c_{ref} = 0]\right)\) represent the number of non-fractured vertebrae that are incorrectly detected as fractured. Positives \(P=TP+FN=\sum[c_{ref}>0]\) represent the number of all fractured vertebrae, and negatives \(N=FP+TN=\sum[c_{ref}=0]\) represent the number of all non-fractured vertebrae, where \([x]=1\) if condition \(x\) is satisfied, and \([x]=0\) otherwise.
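
A minimal sketch of the F-score computation from detected and reference morphological cases is given below. The function name `f_score` and the choice to return 0.0 when a denominator vanishes are assumptions of this example; the text above does not specify how such degenerate cases are handled.

```python
def f_score(c_det, c_ref):
    """F-score from paired detected/reference cases (0 means non-fractured)."""
    pairs = list(zip(c_det, c_ref))
    tp = sum(1 for d, r in pairs if d > 0 and r > 0)   # fractured, detected as fractured
    fp = sum(1 for d, r in pairs if d > 0 and r == 0)  # non-fractured, detected as fractured
    fn = sum(1 for d, r in pairs if d == 0 and r > 0)  # fractured, detected as non-fractured
    # Returning 0.0 for empty denominators is an assumption of this sketch.
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    return 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0


# TP = 2, FN = 1, FP = 1, TN = 1, so PPV = TPR = 2/3.
c_det = [1, 0, 2, 3, 0]
c_ref = [1, 1, 0, 3, 0]
print(f_score(c_det, c_ref))
```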



Ranking

Ranking of the results will be performed on the basis of the evaluation metrics separately for vertebra segmentation and fracture classification.

Vertebra Segmentation Ranking

Ranking of vertebra segmentation results will be performed in the following order:

  • for each segmented vertebra in each spine image, the DSC and MSSD values will be computed for each participant,
  • then, for each segmented vertebra in each spine image, ranks for the DSC and MSSD values will be computed across all participants (if DSC is zero, the participant will be attributed the lowest rank for both DSC and MSSD),
  • then, for each segmented vertebra in each spine image, the mean of the DSC rank and the MSSD rank will be computed for each participant, resulting in the participant's rank for that vertebra,
  • then, the mean of the participant's ranks across all vertebrae will be computed, resulting in the participant's final vertebra segmentation rank.

Ranks for the DSC and MSSD values are natural numbers, while all remaining ranks, including the participant's final vertebra segmentation rank, are rational numbers.
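
The ranking steps above can be sketched as follows. The names `competition_ranks` and `segmentation_ranks`, the input layout, and the tie handling (tied values share the better rank) are assumptions, since the text does not state how ties are resolved. An undefined MSSD for a missed vertebra is represented here by `float("inf")`, and the "lowest rank" is taken to equal the number of participants.

```python
def competition_ranks(values, higher_is_better):
    """1-based ranks; tied values share the better rank (an assumption)."""
    return [
        1 + sum((other > v) if higher_is_better else (other < v) for other in values)
        for v in values
    ]


def segmentation_ranks(results):
    """results: {pid: [(dsc, mssd), ...]}, vertebrae in the same order for every pid."""
    pids = list(results)
    n_vertebrae = len(results[pids[0]])
    per_vertebra = {pid: [] for pid in pids}
    for k in range(n_vertebrae):
        dsc = [results[p][k][0] for p in pids]
        mssd = [results[p][k][1] for p in pids]
        dsc_rank = competition_ranks(dsc, higher_is_better=True)
        mssd_rank = competition_ranks(mssd, higher_is_better=False)
        for idx, pid in enumerate(pids):
            if dsc[idx] == 0:
                # Zero overlap: lowest rank for both metrics.
                dsc_rank[idx] = mssd_rank[idx] = len(pids)
            per_vertebra[pid].append((dsc_rank[idx] + mssd_rank[idx]) / 2)
    # Final rank: mean of the per-vertebra ranks.
    return {pid: sum(r) / len(r) for pid, r in per_vertebra.items()}


results = {
    "A": [(0.9, 1.0), (0.8, 2.0)],
    "B": [(0.8, 0.5), (0.0, float("inf"))],  # second vertebra missed entirely
    "C": [(0.7, 1.5), (0.9, 1.0)],
}
print(segmentation_ranks(results))  # {'A': 1.75, 'B': 2.25, 'C': 2.0}
```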





Fracture Classification Ranking

Ranking of fracture classification results will be performed in the following order:

  • for all classified vertebrae, the MSPP and F-score values will be computed for each participant,
  • then, ranks for the MSPP and F-score values will be computed across all participants,
  • then, the mean of the MSPP and F-score ranks will be computed for each participant, resulting in the participant's final fracture classification rank.

Ranks for the MSPP and F-score values are natural numbers, while the participant's final fracture classification rank is a rational number.
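
A corresponding sketch for the fracture classification ranking is shown below, where a lower MSPP and a higher F-score are treated as better. As before, the function names, the input layout, and the tie handling (tied values share the better rank) are assumptions made for illustration.

```python
def competition_ranks(values, higher_is_better):
    """1-based ranks; tied values share the better rank (an assumption)."""
    return [
        1 + sum((other > v) if higher_is_better else (other < v) for other in values)
        for v in values
    ]


def classification_ranks(results):
    """results: {pid: (mspp, f_score)} computed over all classified vertebrae."""
    pids = list(results)
    mspp_rank = competition_ranks([results[p][0] for p in pids], higher_is_better=False)
    f_rank = competition_ranks([results[p][1] for p in pids], higher_is_better=True)
    # Final rank: mean of the MSPP rank and the F-score rank.
    return {pid: (m + f) / 2 for pid, m, f in zip(pids, mspp_rank, f_rank)}


results = {"A": (0.5, 0.9), "B": (0.3, 0.8), "C": (0.7, 0.7)}
print(classification_ranks(results))  # {'A': 1.5, 'B': 1.5, 'C': 3.0}
```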