The immense and constantly growing volume of video resources calls for efficient automated processing, which remains a real challenge due to the Semantic Gap: the discrepancy between the low-level features, statistics, and aggregates that can be computed automatically from audio and video signals, and what humans comprehend based on acquired knowledge and years of experience. Automatically extractable descriptors, such as dominant colour and motion trajectory, are useful for machine learning-based classification, but are of limited value for scene understanding, because they do not directly correspond to human-interpretable concepts, such as depicted objects and video events.
One of the main approaches to narrowing the Semantic Gap is to complement feature extraction and analysis with machine-interpretable background knowledge, formalized using general, spatial, temporal, and fuzzy description logics together with rule-based mechanisms, and implemented in ontology languages such as the Web Ontology Language (OWL) and rule languages such as the Semantic Web Rule Language (SWRL). The resulting structured annotations enable a variety of inference tasks suitable for the automated interpretation of images, 3D models, audio content, and video scenes, and can be efficiently queried, both manually and programmatically, using the query language SPARQL.
The formal representation of, and reasoning over, multimedia content descriptions, together with the semantic enrichment of multimedia resources with Linked Data, can underpin intelligent applications such as video understanding, content-based video indexing and retrieval, automated subtitle generation, video surveillance, clinical decision support, and automated music and movie recommendation engines.
This research program focuses on the standardisation of structured spatiotemporal annotations, the development of Linked Data-powered hypervideo applications, and ontology-based video retrieval via spatiotemporal reasoning and information fusion.
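To give a concrete flavour of the temporal side of spatiotemporal reasoning, the sketch below classifies the relation between two annotated scene intervals using a small subset of Allen's interval relations, which are commonly used for temporal reasoning over video. The function name and the interval values are illustrative assumptions, not an implementation from this research program.

```python
def allen_relation(a_start, a_end, b_start, b_end):
    """Classify the temporal relation of interval A to interval B
    using a subset of Allen's interval algebra (times in seconds)."""
    if a_end < b_start:
        return "before"        # A ends strictly before B starts
    if a_end == b_start:
        return "meets"         # A ends exactly where B starts
    if a_start < b_start < a_end < b_end:
        return "overlaps"      # A and B partially overlap
    if a_start == b_start and a_end == b_end:
        return "equals"        # identical intervals
    if b_start <= a_start and a_end <= b_end:
        return "during"        # A is contained in B
    return "other"             # remaining Allen relations, omitted here

# E.g., a cyclist scene (72-98 s) vs. a pedestrian scene (95-120 s):
print(allen_relation(72, 98, 95, 120))  # overlaps
print(allen_relation(10, 20, 20, 30))   # meets
```

In an ontology-based retrieval setting, relations like these would typically be inferred by a reasoner from interval-annotated scenes rather than computed ad hoc, but the underlying logic is the same.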