Big Data–Big Theory: Predictive modelling and communities of cultural practice



1, 22 & 23 March 2016

This series of workshops aims to investigate the potential for Statistical Learning, Information Theory and related approaches to account for meaning in cultural products, going beyond current data-intensive computational research on music, art and literature which concentrates on classification or prediction of consumer behaviour. They will also explore how thinking in the arts can inform developments in Data Science. In particular we plan to map and bridge a number of apparent and related gulfs including:

  • between analytical practices and approaches which rely on the interaction of superhumanly many pieces of data, and those which rely on the application of general principles not easily expressible in computational terms,
  • between the language on the one hand of feature sets, probability and entropy, and on the other of characteristics, communities of practice, and originality,
  • between one culture which measures success by a single aggregate outcome (e.g., the accuracy of a classifier), and another culture which measures success by particular cases.

In part, these are Big-Data twists on Snow’s ‘Two Cultures’, but not only so. Radically different analytical practices exist within single domains (e.g., ‘semantic analysis’ of musical audio, where both data-driven and theory-driven approaches are common) which are hard to accommodate in the same scholarly space, let alone to unite in a single analytical method. Furthermore, Big Data and its associated methodologies hold promise of bringing cultures together, e.g., through common concerns with prior experience, learning, and emergence.

Towards accounts of meaning in the arts

Information, according to Shannon, is reduction in uncertainty. Artists play with the uncertainty, surprise and expectation of their viewers and audience in many ways. Access to large quantities of data and modern methods allow these things to be measured and modelled.

In its first few decades, computational research aiming to analyse artworks concentrated on collation of materials and measurement and analysis of specific features. In the case of music, this has mushroomed in the past couple of decades into the flourishing field of ‘Music Information Retrieval’ with its own international society and conference. Nevertheless, the impact on musicology is small, a situation mirrored in other disciplines. In part this is because computational research has not addressed the issues of central interest, and in particular has eschewed dealing with ‘meaning’. The moment for these workshops arises not because hectoring is required to encourage arts researchers to embrace the possibilities of computational research nor to encourage data scientists to address the important issues. Instead it arises because computational systems which ‘know’ a lot about the context of an artwork are no longer a pipe dream, and because of current interest in processes of production and reception in the arts.

Computational models which learn patterns and probabilities from large quantities of actual data can lead to systems which, for example, not only label chords in a piece of music but demonstrate where a particular chord progression is rare or common and why. Such observations often form the substance of analyses of pieces of music, and the basis of claims of meaning. Computational research of this kind therefore has the potential to at least highlight the points in a work which are meaningful, even if not to construct an entire account of meaning.

Similar potential exists for the other arts also. Eye-tracking studies of art have the potential to lead to predictive models of visual composition. Studies employing the methods of corpus linguistics could allow us to understand something of what makes a poem effective. Above all, since ‘expectation’ plays such an important role in both predictive computational models and in accounts of artworks, can new data-intensive methods give novel ways of approaching the question of meaning, especially since in an artwork meaning as much concerns ‘how’ as it does ‘what’?

The workshops

A series of three mini-sandpits will be held at Lancaster University on Tuesday 1 March, Tuesday 22 March and Wednesday 23 March 2016. The overall objective is to identify themes of common concern as guides for future research, to develop ways of communicating and working across divides of intellectual culture and language, and to sketch potential interdisciplinary research projects.

  1. Information theory and the arts (1 March)
  • Fundamental concepts (information, uncertainty, prediction, anticipation, probability, familiarity, resemblance, symmetry, order, structure, signification, meaning, etc.).
  • History of information-theoretic study in the arts (Birkoff, Moles, Berlyne, Meyer, etc.).
  • Recent developments in theory, technology and data (compression and complexity, probabilistic programming, deep learning, diverse sources of data, etc.).
  1. Communication and cross-fertilisation (22 March)
  • Scientific and artistic language: shared concepts or not? (‘Information’, ‘features’, ‘originality’, ‘entropy’, etc.)
  • Research practices, different concepts of ‘data’ and different roles for subjectivity.
  • Criteria of significance and validity.
  1. Research directions (23 March)
  • What are the achievable medium-term objectives?
  • How can we ensure good science-arts collaboration?
  • What materials and technologies are promising?
  • Projects to pursue.

The principal intended outcomes are

  • plans for collaborative research projects ready to start the process of bid-writing,
  • a manifesto for big-data research in the arts.

Participants will be expected to actively engage in discussions, to make proposals, to bring ideas and, when invited, to give short presentations about their research. Participation will be by invitation only. Attendance at all three workshops is desirable but not essential. Lunch and refreshments will be provided. We intend to meet the travel and accommodation expenses of participants, using funds made available by the EPSRC, but in some cases, depending on numbers, it might be possible only to make a contribution.

Any researchers based in the UK wishing further information or who would like to be invited should contact Alan Marsden at LICA, Lancaster University,