Validation & Validity in Data Science

In the current era of big data, we can acquire and analyze more data than ever, but this data is often unstructured and messy, and measurement procedures may not have been optimal. Moreover, in many human-focused use cases, we may not be able to fully articulate what and where to measure, even though we have a good sense of what constitutes an intended or unintended outcome.

In music, we frequently encounter such measurement challenges. Music information can be described digitally in many ways and across many modalities, but the success of a song is typically determined by implicit human responses. As a computer scientist, I am interested in developing validation techniques that give us more confidence in our measurement procedures, even when they take place ‘in the wild’, outside of fully controlled lab settings.

In this, I am inspired both by notions of psychometric validity from the social sciences and by techniques for (automated) testing in software engineering.