What Can We Learn from Harry Potter?
An Exploratory Study of Visual Representation Learning from
Atypical Videos

Qiyue Sun1,2, Qiming Huang1, Yang Yang2, Hongjun Wang2, Jianbo Jiao1

1MIx Group, University of Birmingham, 2Shandong University

In the Thirty-Sixth British Machine Vision Conference (BMVC), 2025

Code Paper Dataset

Abstract

Humans usually show exceptional generalisation and discovery ability in the open world when exposed to uncommon new concepts. Whereas most existing studies in the literature focus on common, typical data from closed sets, open-world novel discovery is under-explored in videos. In this paper, we are interested in asking: what if atypical, unusual videos are exposed in the learning process? To this end, we collect a new video dataset consisting of various types of unusual, atypical data (e.g. sci-fi, animation, etc.). To study how such atypical data may benefit open-world learning, we feed them into the model training process for representation learning. Focusing on three key tasks in open-world learning: out-of-distribution (OOD) detection, novel category discovery (NCD), and zero-shot action recognition (ZSAR), we found that even straightforward learning approaches with atypical data consistently improve performance across various settings. Furthermore, we found that increasing the categorical diversity of the atypical samples further boosts OOD detection performance. Additionally, in the NCD task, using a smaller yet more semantically diverse set of atypical samples leads to better performance compared to using a larger but more typical dataset. In the ZSAR setting, the semantic diversity of atypical videos helps the model generalise better to unseen action classes. These observations in our extensive experimental evaluations reveal the benefits of atypical videos for visual representation learning in the open world and, together with the newly proposed dataset, encourage further studies in this direction.

Illustration of open-world data and tasks


(a) Comparison of open-world and closed-world data distributions. Commonly used datasets such as Kinetics-400, UCF101, HMDB51, and MiT-v2 represent more concentrated distributions and are considered closed-world. In contrast, our proposed atypical dataset captures a broader, more diverse feature space, better reflecting the open-world setting. (b) Overview of open-world tasks: OOD detection identifies out-of-distribution (unknown) data from known categories; novel category discovery (NCD) clusters the unknown data to reveal new classes; and zero-shot action recognition (ZSAR) further classifies these new categories using semantic information. These tasks form a natural progression, with increasing difficulty and reliance on model generalisation.


Dataset Statistics

Existing publicly available datasets primarily focus on common human actions and activities. In contrast, our dataset introduces a broader spectrum of complex and diverse scenarios.

Our atypical dataset encompasses a broad range of scenes, subjects, actions, and other visual elements that are either rare or exhibit substantial semantic or visual deviation from those found in conventional, systematically curated datasets.


Experiments

OOD detection performance


For the OOD detection task, we explore the effect of introducing different auxiliary datasets during the training stage. Evaluation is conducted on both standard OOD benchmarks (HMDB51, MiT-v2) and more challenging atypical distributions (atypical-surreal, atypical-theatre).

Auxiliary datasets with limited semantic content, such as Gaussian noise and Diving48, yield minimal improvements. In contrast, both Kinetics-400 and our proposed atypical dataset lead to notable gains, with atypical achieving the best overall results across all metrics (FPR95, AUROC, AUPR).
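These metrics can be reproduced from per-video confidence scores. The sketch below is a minimal NumPy implementation (not the paper's evaluation code) that computes AUROC and FPR95 by sweeping a threshold over the scores; it assumes higher scores indicate in-distribution data.

```python
import numpy as np

def ood_metrics(scores_id, scores_ood):
    """Compute AUROC and FPR@95%TPR from per-video OOD scores.

    Higher scores are assumed to indicate in-distribution (known) data.
    """
    scores = np.concatenate([scores_id, scores_ood])
    labels = np.concatenate([np.ones_like(scores_id), np.zeros_like(scores_ood)])

    # Sort by score descending and sweep the threshold to trace the ROC curve.
    order = np.argsort(-scores)
    labels = labels[order]
    tps = np.cumsum(labels)        # true positives accepted at each threshold
    fps = np.cumsum(1 - labels)    # false positives accepted at each threshold
    tpr = tps / tps[-1]
    fpr = fps / fps[-1]

    auroc = np.trapz(tpr, fpr)                # area under the ROC curve
    fpr95 = fpr[np.searchsorted(tpr, 0.95)]   # FPR at the first point with TPR >= 95%
    return auroc, fpr95
```

With perfectly separated scores this returns an AUROC of 1.0 and an FPR95 of 0.0, as expected.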

Novel category discovery


For the NCD task, we investigate the impact of different auxiliary datasets during the self-supervised pre-training stage. The baseline model is trained without auxiliary data. We compare Kinetics-400, the original dataset itself (UCF101), our proposed atypical dataset, and their combinations.

When combined with UCF101, our atypical data achieves the best overall results, surpassing configurations that include Kinetics-400. These findings highlight the effectiveness of atypical data in enhancing novel category discovery.
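Incorporating an auxiliary dataset into self-supervised pre-training amounts to sampling batches from the union of the target and auxiliary data. A minimal PyTorch sketch, assuming clips are already decoded into tensors (the `VideoClips` wrapper and the random tensors are hypothetical stand-ins, not the paper's pipeline):

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class VideoClips(Dataset):
    """Placeholder clip dataset (standing in for, e.g., UCF101 or the atypical set)."""
    def __init__(self, clips):
        self.clips = clips
    def __len__(self):
        return len(self.clips)
    def __getitem__(self, idx):
        return self.clips[idx]

# Hypothetical tensors standing in for decoded video clips (C, T, H, W).
ucf_clips = [torch.randn(3, 8, 32, 32) for _ in range(4)]
atypical_clips = [torch.randn(3, 8, 32, 32) for _ in range(2)]

# Self-supervised pre-training simply draws shuffled batches from the
# union of the target dataset and the auxiliary atypical data; no labels
# are required.
combined = ConcatDataset([VideoClips(ucf_clips), VideoClips(atypical_clips)])
loader = DataLoader(combined, batch_size=3, shuffle=True)
```

The same construction applies when mixing in Kinetics-400 or any other auxiliary source; only the second dataset changes.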

Zero-shot action recognition


Similar to the NCD task, we evaluate the impact of different auxiliary datasets on the performance of ZSAR.
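In a typical ZSAR setup, unseen classes are recognised by matching a video embedding against text embeddings of the class names. A minimal sketch of this matching step, assuming the embeddings are already extracted (the function and its inputs are illustrative, not the paper's model):

```python
import numpy as np

def zero_shot_predict(video_emb, class_embs):
    """Assign a video to the unseen class whose text embedding has the
    highest cosine similarity with the video embedding."""
    v = video_emb / np.linalg.norm(video_emb)
    c = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    return int(np.argmax(c @ v))
```

Because prediction reduces to nearest-neighbour search in the shared embedding space, a more semantically diverse pre-training set plausibly yields embeddings that transfer better to unseen action classes.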

BibTeX

        @inproceedings{Sun2025atypical,
        title={What Can We Learn from Harry Potter? An Exploratory Study of Visual Representation Learning from Atypical Videos},
        author={Sun, Qiyue and Huang, Qiming and Yang, Yang and Wang, Hongjun and Jiao, Jianbo},
        booktitle={British Machine Vision Conference (BMVC)},
        year={2025}
        }