I hold a master's degree in computer science and am a PhD student at the Pattern Recognition and Image Analysis lab as well as a fourth-year medical student at the University of Münster, Germany. At the beginning of my PhD, I redeveloped the MDM Portal, a research infrastructure for sharing clinical metadata, and was involved in projects on the interoperability and reuse of metadata. My main interest lies in the meaningful integration of machine learning into clinical routine. For my PhD, I am working on an interpretable machine learning system that keeps doctors in the loop during model development. My long-term research vision is a self-learning healthcare system in which trial-and-error methods and ad-hoc reports are replaced by a data-driven approach that informs the medical decisions of healthcare professionals.

Before moving to Münster for my PhD and medical studies, I obtained my master's degree in computer science from RWTH Aachen University, with semesters abroad at the University of Gothenburg and UC Berkeley. During my bachelor's studies, I was interested in theoretical computer science and wrote my thesis on mathematical logic. During my master's studies, I shifted my focus to artificial intelligence and machine learning, and my master's thesis dealt with automatic speech recognition.

Selected publications

For a full list of my papers, see my Google Scholar profile.

An Evaluation of the Doctor-Interpretability of Generalized Additive Models with Interactions
Stefan Hegselmann, Thomas Volkert, Hendrik Ohlenburg, Antje Gottschalk, Martin Dugas, Christian Ertmer.
Conference: Machine Learning for Healthcare (MLHC) 2020
Video: Spotlight presentation
Code: Zenodo repository
Explanation: Generalized Additive Models with Interactions are transparent models consisting of one- and two-dimensional risk functions that can be visualized and assessed by human practitioners. This allows practitioners to validate the model, which is valuable in sensitive domains such as healthcare. In this study, we evaluated their interpretability with doctors to determine their clinical usefulness. The results suggested that doctors can correctly interpret risk functions and also feel confident doing so. However, the evaluation also identified several interpretability issues and showed that interpretability depends on the complexity of the risk functions.
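For reference, the general form of a generalized additive model with pairwise interactions can be written as below. The notation (link function g, intercept β0, one-dimensional risk functions f_i, two-dimensional risk functions f_ij, and the set of selected feature pairs I) is our own shorthand and is not taken from the paper.

```latex
g\big(\mathbb{E}[y \mid x]\big) = \beta_0 + \sum_{i=1}^{d} f_i(x_i) + \sum_{(i,j) \in \mathcal{I}} f_{ij}(x_i, x_j)
```

Because each term depends on at most two features, every f_i can be plotted as a curve and every f_ij as a heat map, which is what makes the individual risk functions open to inspection. For a binary outcome, g is typically the logit.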

Pragmatic MDR: A Metadata Repository with Bottom-Up Standardization of Medical Metadata through Reuse
Stefan Hegselmann, Michael Storck, Sophia Gessner, Philipp Neuhaus, Julian Varghese, Philipp Bruland, Alexandra Meidt, Cornelia Mertens, Sarah Riepenhausen, Sonja Baier, Benedikt Stöcker, Jörg Henke, Carsten Oliver Schmidt, Martin Dugas.
Journal: BMC Medical Informatics and Decision Making
Explanation: Many institutions use different documentation (metadata) to collect medical data. For example, one hospital collects blood pressure as an integer in mmHg and another as free text, which allows much more variability. This leads to incompatible data elements that cannot easily be combined. One approach to standardizing and reusing data elements is the use of so-called metadata repositories (MDRs). Usually, an expert committee determines the content of an MDR in a top-down fashion, which increases the workload and introduces potential bias. In this work, we explored an alternative, fully automatic approach called pragmatic MDR, which is based on a large amount of real-world documentation and ranks definitions according to their frequency (bottom-up standardization). We implemented a prototype and performed experiments with data from the MDM Portal. The results showed that bottom-up standards only emerged for less complex medical concepts (lab values and vital signs), and two medical data managers rated the overall quality of the proposed data elements as good.
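The frequency-based ranking behind bottom-up standardization can be illustrated with a minimal Python sketch. It assumes a hypothetical, heavily simplified representation in which each data element is reduced to a concept plus a (datatype, unit) definition; the names DataElement and rank_definitions and the toy data are illustrative only and are not part of the pragmatic MDR prototype or the MDM Portal.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class DataElement(NamedTuple):
    """Hypothetical, simplified data element: a concept plus its definition."""
    concept: str   # e.g. a normalized label or semantic code
    datatype: str  # e.g. "integer", "float", "text"
    unit: str      # e.g. "mmHg"; empty string if there is no unit


def rank_definitions(elements: Iterable[DataElement]) -> dict:
    """Group elements by concept and rank their definitions by frequency.

    The most frequent (datatype, unit) pair for a concept serves as the
    bottom-up candidate standard for that concept.
    """
    counts = defaultdict(Counter)
    for el in elements:
        counts[el.concept][(el.datatype, el.unit)] += 1
    # most_common() sorts by descending count; ties keep first-seen order.
    return {concept: c.most_common() for concept, c in counts.items()}


# Toy data mirroring the blood pressure example above.
forms = [
    DataElement("systolic blood pressure", "integer", "mmHg"),
    DataElement("systolic blood pressure", "integer", "mmHg"),
    DataElement("systolic blood pressure", "text", ""),
    DataElement("body weight", "float", "kg"),
]
ranking = rank_definitions(forms)
print(ranking["systolic blood pressure"][0])  # (('integer', 'mmHg'), 2)
```

Real documentation would first require mapping heterogeneous items to a shared concept, which this sketch simply takes as given.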

A Web Service to Suggest Semantic Codes Based on the MDM-Portal
Stefan Hegselmann, Michael Storck, Sophia Geßner, Philipp Neuhaus, Julian Varghese, Martin Dugas.
Conference: 63rd Meeting of the German Association for Medical Informatics, Biometry and Epidemiology 2018
Explanation: The MDM Portal, developed and hosted by the Institute of Medical Informatics at the University of Münster, contains thousands of medical forms annotated with semantic codes to foster interoperability and enable semantic analysis. However, annotating metadata with semantic codes is an ambiguous and laborious process. To address these issues, we developed a standardized REST web interface that suggests semantic codes drawn from more than 330,000 codes in the portal, and we integrated it into two metadata editors.

Reproducible Survival Prediction with SEER Cancer Data
Stefan Hegselmann, Leonard Gruelich, Julian Varghese, Martin Dugas.
Conference: Machine Learning for Healthcare (MLHC) 2018
Code: Github repository
Explanation: The Surveillance, Epidemiology, and End Results (SEER) program is a major epidemiological cancer registry. In this paper, we reviewed 34 studies that applied machine learning to this data and concluded that none of them reported results that could be reproduced in a straightforward way. Additionally, we demonstrated that reproducible cohort selection and survival prediction with SEER cancer data are feasible. We published the code of our experiments so that future work can build on our results.

Automatic conversion of metadata from the study of health in Pomerania to ODM
Stefan Hegselmann, Sophia Gessner, Philipp Neuhaus, Jörg Henke, Carsten Oliver Schmidt, Martin Dugas.
Conference: Health Informatics Meets eHealth 2017
Explanation: The MDM Portal, developed and hosted by the Institute of Medical Informatics at the University of Münster, contains medical forms in the system-independent CDISC Operational Data Model. In this work, we analyzed the metadata of a major epidemiological study in Germany (Study of Health in Pomerania) and developed a mapping to the Operational Data Model, which allowed us to make hundreds of medical forms with more than 15,000 items publicly accessible and reusable.

Inverted HMM - a Proof of Concept
Patrick Doetsch, Stefan Hegselmann, Ralf Schlüter, Hermann Ney.
Conference: 30th Conference on Neural Information Processing Systems (NeurIPS) 2016
Explanation: While a standard hidden Markov model for automatic speech and handwriting recognition assigns frames to state labels, we proposed an inverted architecture that assigns each state to a single frame. For my master's thesis, I developed the code and conducted the experiments that served as the basis for this paper, which I wrote together with my advisors.

Counting in Team Semantics
Erich Grädel, Stefan Hegselmann.
Conference: 25th EACSL Annual Conference on Computer Science Logic (CSL) 2016
Explanation: For my bachelor's thesis, I explored several counting extensions for so-called logics with team semantics, which, unlike standard Tarski semantics, are evaluated on sets of variable assignments instead of a single assignment. This paper, written with my advisor, introduces two of these extensions and proves that a logic extended by one of our counting constructs has the same expressive power as fixed-point logic with counting, a fundamental logic in descriptive complexity theory.