Semantic segmentation is performed on the whole image over five different classes. Because background pixels dominate the images, we do not use overall pixel accuracy as the metric; instead we use mean intersection over union (mean IoU).
Infants and infant seats, as well as children and child seats, are treated as separate classes, i.e. the model should learn to separate the child from the child seat. Adults, children and infants, on the other hand, should all be classified as a person, i.e. one label for all of them.
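The following is a minimal sketch of why mean IoU is more informative than pixel accuracy when the background dominates. It is not the benchmark's evaluation code: the function names, the class index order and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed index order for the five classes (hypothetical, for illustration only).
CLASSES = ["BG", "IS", "CS", "Person", "Object"]


def per_class_iou(pred: Sequence[int], target: Sequence[int],
                  num_classes: int = len(CLASSES)) -> List[float]:
    """IoU per class from flattened predicted and ground-truth label maps."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # A class absent from both maps has no defined IoU; mark it as NaN.
    return [i / u if u > 0 else float("nan")
            for i, u in zip(intersection, union)]


def mean_iou(ious: Sequence[float]) -> float:
    """Unweighted mean over the classes with a defined IoU (assumption)."""
    valid = [v for v in ious if v == v]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid)


# Toy image: 8 background pixels, 2 person pixels; the model predicts only background.
target = [0] * 8 + [3] * 2
pred = [0] * 10

pixel_accuracy = sum(p == t for p, t in zip(pred, target)) / len(target)
ious = per_class_iou(pred, target)
print(pixel_accuracy)   # 0.8, although the person is missed entirely
print(mean_iou(ious))   # 0.4, because the Person IoU of 0 counts as much as BG
```

In this toy case, a model that never detects the person still reaches 80% pixel accuracy, while the mean IoU drops to 0.4.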
Below is the public leaderboard for semantic segmentation for different training data and vehicles. We use the following abbreviations for the classes:
- BG = background
- IS = infant seat
- CS = child seat
- Person = adult passenger, child or infant
- Object = everyday object
A Train Car value of "all" means that one model was trained on each vehicle. The general performance of the method is evaluated on the test set of each vehicle. The overall performance of the method is then the mean, across all vehicles, of the per-vehicle mean performances.
If a single car is listed as the training car, then a single model was trained only on that car, and its performance is evaluated on the test images of all unseen vehicles. In this case, the overall performance is the mean of the per-vehicle means across all vehicles, excluding the test performance on the vehicle the model was trained on.
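The two averaging rules above can be sketched as follows. The function name `overall_score`, the vehicle names `car_a` and `car_b`, and all numbers are hypothetical placeholders, not benchmark results; only the averaging logic follows the description.

```python
from typing import Dict


def overall_score(per_vehicle_mean_iou: Dict[str, float],
                  train_car: str = "all") -> float:
    """Mean of the per-vehicle mean IoUs.

    If a single training car is given, its own test score is left out,
    so that only unseen vehicles contribute to the result.
    """
    if train_car == "all":
        scores = list(per_vehicle_mean_iou.values())
    else:
        scores = [score for car, score in per_vehicle_mean_iou.items()
                  if car != train_car]
    return sum(scores) / len(scores)


# Placeholder per-vehicle mean IoUs (made-up values for illustration).
example = {"X5": 90.0, "car_a": 40.0, "car_b": 50.0}

print(overall_score(example, train_car="X5"))   # 45.0, X5 itself is excluded
print(overall_score(example, train_car="all"))  # 60.0, every vehicle counts
```

Leaving out the training vehicle keeps a single-car entry from being rewarded for performance on the car it has already seen.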
Name | Train Car | mean IoU | IoU (per class) | Paper | Code | RGB | Gray | Depth | Additional | Team | Title | Conference | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SVIRO-Team | X5 | 43.71 | BG: 85.70, IS: 17.91, CS: 38.61, Person: 67.69, Object: 8.63 | No | Yes | No | Yes |
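As a quick sanity check on the row above, the reported mean IoU agrees with the unweighted mean of the five per-class values. This snippet only redoes the arithmetic on the numbers in the table; it makes no claim about how the per-class values themselves were aggregated across vehicles.

```python
# Values copied from the SVIRO-Team row of the leaderboard.
reported_mean_iou = 43.71
per_class = {"BG": 85.70, "IS": 17.91, "CS": 38.61, "Person": 67.69, "Object": 8.63}

# Each class counts equally, regardless of how many pixels it covers.
unweighted_mean = sum(per_class.values()) / len(per_class)
print(round(unweighted_mean, 2))  # 43.71
```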