Identifying Interpretable Subgroups with Exceptional Structural Relations in Big Datasets

  • Funding agency: RWTH PrepFund Programm
  • Project title: Identifying Interpretable Subgroups with Exceptional Structural Relations in Big Datasets
  • Applicants: Prof. Dr. Axel Mayer & Dr. Florian Lemmerich
  • Staff members: Benedikt Langenberg, M. Sc., Christoph Kiefer, M. Sc. & Jeffry Cacho, B. Sc.
  • Duration: 14 months, 2019-2020


In this project, we aim to bring together data mining techniques from computer science and social science methodology. In the data mining literature, efficient algorithms for subgroup discovery (sometimes also called pattern mining or pattern recognition) have been developed and are widely used to identify unobserved subgroups in large datasets. In the social, behavioral, and life sciences, on the other hand, there are decades of research on how to model structural relations between random variables. One of the most popular and flexible methods in this field is structural equation modeling (SEM).

We propose to develop a new method that combines the latest pattern mining algorithms, which find subgroups of interest in large datasets more efficiently, with complex structural equation models, which allow for discovering unobserved relationships within and between persons. The new approach, termed SubgroupSEM, allows researchers to find subgroups of persons with specific functional relationships. For example, it would allow researchers to find a subgroup of patients in which a treatment has effects on motivation, attachment, and physical activity, which in turn reduce depressive symptoms.

To the best of our knowledge, there is little to no overlap between the SEM and pattern mining fields. We believe that SEM could benefit from the algorithmic knowledge developed in other fields, while the pattern mining community could profit from incorporating the more refined models used in the social sciences. We plan to conduct several feasibility studies and to identify limitations and potential applications of the intended SubgroupSEM approach, so that we are in a good position to submit a full proposal to an external funding agency by the end of the PrepFund project.