Most projects in this CRC need to handle considerable amounts of data, perform experiments with that data, build models and software artifacts, and eventually produce experimental results. Handling these diverse and potentially large datasets and models is a challenge in itself.
Since the start of the CRC 1223 in 2016, major advances have been made in scalable platforms for data analysis and experimental research, including, for example, the Spark system and the TensorFlow library. These and various other open-source software packages have matured in recent years to a level where they can handle very large datasets and run on many kinds of parallel architectures, including GPU clusters. Because of these advances, which were not foreseeable to this extent, the need to design a dedicated system platform for the CRC became largely obsolete. This development shifted the focus of the research infrastructure and data management within the CRC: instead of designing new system architectures, priority was given to collecting and curating datasets as reusable assets for reproducible experiments.