Benchmarking robustness of automated CT pancreas segmentation: achieving human-level reliability through human-in-the-loop optimization. — SciRadar