David Schlangen
Week 1, August 3-7, 2020
Abstract:
In this introductory course we will look at what we can learn about the semantics of natural language from corpora that pair natural language expressions with images. Thanks to active interest in tasks like image captioning and image retrieval via written descriptions, language+vision resources are now available that approach purely textual corpora in size (for comparison: the BNC contains 100 million tokens; the corpora discussed here total roughly 50 million tokens).
We will look at the available data and explore some of the questions that can be asked with it: how expressions are to be interpreted in concrete, visually specified situations, and how speakers make linguistic choices in such situations. We will also discuss machine learning models for these tasks. The course will consist of lectures and hands-on parts, for which Jupyter notebooks and a remote computing environment will be provided, so that participants will only need access to a web browser.