Video Temporal Segmentation

人工智能炼丹师
2021-06-09

Understanding video along the temporal dimension (shots and scenes). Related public datasets and benchmarks: SoccerNet-v2, Kinetics-GEBD, MovieNet, ViTT (AACL 2020)

1. Shot Segmentation (Shot Boundary Segmentation)

Shot segmentation benchmarks: ClipShots, TRECVID, SoccerNet-v2

1.1 TransNet

1.2 TransNet V2

1.3 DSBD


2. Scene Segmentation (Scene Boundary Segmentation)

2.1 SceneSeg

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation [CVPR 2020]

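  • Paper summary: introduces MovieNet, a scene segmentation dataset (380 movies), and proposes a local-to-global scene segmentation algorithm. GitHub code: SceneSeg
  • Overall pipeline:
  1. Shot segmentation. The released source code uses a traditional method for this step; a deep learning method such as TransNet could be substituted as an improvement.
  2. Extract multi-modal features for each shot (action, place, audio, and so on).
  3. Aggregate features from local to global, using BNet (Boundary Network) for the local feature fusion.
    a. Clip-level: BNet has two branches. One models the differences between shots (over a window of 4 shots) with an inner product; the other models the relations between shots with temporal conv + max pooling. The two outputs are concatenated.
    b. Segment-level: a bi-LSTM does sequence-to-sequence prediction, with a sequence length of 10 (far smaller than the number of shots, to reduce memory consumption).
    c. Global optimal grouping: dynamic programming is used as a post-processing optimization. Pros: it considers the features of all shots and long-range context dependencies. Cons: it is not optimized end to end and is independent of the preceding models. See StoryGraph for details.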
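
The clip-level difference idea (step a) and the dynamic-programming grouping (step c) can be illustrated with a minimal plain-Python sketch. This is not the SceneSeg implementation. It assumes each shot is already represented by one feature vector, uses cosine similarity in place of the learned inner-product branch, and takes the number of scenes as given. The function names `boundary_scores` and `group_scenes` are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def boundary_scores(shots, half_window=2):
    """Difference score for every gap between shot i and shot i + 1.

    Assumption: compares the mean of up to 2 shots before the gap with the
    mean of up to 2 shots after it (a 4-shot window), using 1 - cosine.
    """
    scores = []
    for i in range(len(shots) - 1):
        before = shots[max(0, i - half_window + 1): i + 1]
        after = shots[i + 1: i + 1 + half_window]
        scores.append(1.0 - cosine(mean_vec(before), mean_vec(after)))
    return scores


def group_scenes(shots, num_scenes):
    """Split shots into num_scenes contiguous groups by dynamic programming.

    Assumption: a group's score is the summed cosine similarity of its shots
    to the group mean; the split with the highest total score is returned as
    a sorted list of boundary indices (index of the first shot of each new scene).
    """
    n = len(shots)
    if not 1 <= num_scenes <= n:
        raise ValueError("num_scenes must be between 1 and len(shots)")

    def cohesion(i, j):
        center = mean_vec(shots[i:j])
        return sum(cosine(s, center) for s in shots[i:j])

    neg_inf = float("-inf")
    # best[k][j]: best score for splitting the first j shots into k scenes
    best = [[neg_inf] * (n + 1) for _ in range(num_scenes + 1)]
    back = [[0] * (n + 1) for _ in range(num_scenes + 1)]
    best[0][0] = 0.0
    for k in range(1, num_scenes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if best[k - 1][i] == neg_inf:
                    continue
                candidate = best[k - 1][i] + cohesion(i, j)
                if candidate > best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the last scene to the first.
    boundaries = []
    j = n
    for k in range(num_scenes, 0, -1):
        i = back[k][j]
        if i > 0:
            boundaries.append(i)
        j = i
    return sorted(boundaries)


# Toy example: three shots of one "scene" followed by three of another.
shots = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
scores = boundary_scores(shots)
print(scores.index(max(scores)))  # 2, the gap between shot 2 and shot 3
print(group_scenes(shots, 2))     # [3]
```

The grouping step sees every shot at once, which is the long-range context advantage noted above. It also runs separately from any learned model, which is the lack of end-to-end optimization noted as the drawback.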

2.2 Shot Type Classification

A Unified Framework for Shot Type Classification Based on Subject Centric Lens [ECCV 2020]

Recognizes the shooting style (shot type) of each shot

Deep Relationship Analysis in Video with Multimodal Feature Fusion [ACM MM 2020]

Multi-modal scene understanding

2.3 Self-supervised Pre-training

Shot Contrastive Self-Supervised Learning for Scene Boundary Detection [CVPR 2021] (Amazon)

BaSSL: Boundary-aware Self-supervised Learning for Video Scene Segmentation

UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection

Scene Consistency Representation Learning for Video Scene Segmentation


3. Event Segmentation

Generic Event Boundary Detection: A Benchmark for Event Segmentation

  • Proposes a new definition of boundaries, covering changes in environment, object, and shot.

A Benchmark for Multi-shot Temporal Event Localization

Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection

Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
