Awesome Segmentation

人工智能炼丹师
2020-08-16

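This post summarizes segmentation papers from the major conferences up to 2020, covering semantic segmentation, instance segmentation, panoptic segmentation, and video segmentation. It is updated from time to time.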

Awesome Semantic Segmentation

CVPR 2020

StripPooling

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing [Paper] [Code]

ECCV 2020

Error-Correcting Supervision

Semi-Supervised Segmentation based on Error-Correcting Supervision [Paper]

Segmentation Failures Detection

Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation [Paper]

OCRNet

Object-Contextual Representations for Semantic Segmentation [Paper] [Code]

  • Coarse-to-fine attention

IFVD

Intra-class Feature Variation Distillation for Semantic Segmentation [Paper] [Code]

  • Model distillation

CaC-Net

Learning to Predict Context-adaptive Convolution for Semantic Segmentation [Paper] [Code]

  • Spatial attention by predicting convolution kernels

TGM

Tensor Low-Rank Reconstruction for Semantic Segmentation [Paper] [Code]

  • An improvement over non-local methods

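SegFix

SegFix: Model-Agnostic Boundary Refinement for Segmentation [Paper]

  • Motivation: a pixel on a boundary usually has the same class as a nearby "interior" pixel, so the network learns a shift (offset) from each boundary pixel to an interior pixel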
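
The shift idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the SegFix implementation: the boundary mask and the per-pixel offsets are taken as given here (the paper predicts them with a network), and `refine_with_offsets` is a hypothetical helper name.

```python
import numpy as np

def refine_with_offsets(labels, boundary, offsets):
    """Relabel boundary pixels with the label found at (y + dy, x + dx).

    labels:   (H, W) int array, coarse segmentation labels
    boundary: (H, W) bool array, True where a pixel lies on a boundary
    offsets:  (H, W, 2) int array of (dy, dx) shifts toward interior pixels
    All three inputs are assumed to be given for this sketch.
    """
    h, w = labels.shape
    ys, xs = np.nonzero(boundary)
    # Clip so that the shifted coordinates stay inside the image.
    src_y = np.clip(ys + offsets[ys, xs, 0], 0, h - 1)
    src_x = np.clip(xs + offsets[ys, xs, 1], 0, w - 1)
    refined = labels.copy()
    refined[ys, xs] = labels[src_y, src_x]
    return refined

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 0, 1],   # pixel (1, 2) is mislabeled as 0
                   [0, 0, 1, 1]])
boundary = np.zeros(labels.shape, dtype=bool)
boundary[1, 2] = True
offsets = np.zeros(labels.shape + (2,), dtype=int)
offsets[1, 2] = (0, 1)             # point one pixel to the right (interior)
refined = refine_with_offsets(labels, boundary, offsets)
```

Because the refinement only reads a label map, it can be applied on top of any segmentation model's output, which is what "model-agnostic" refers to in the title.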

DecoupleSegNets

Improving Semantic Segmentation via Decoupled Body and Edge Supervision [Paper] [Code]

  • Decouples body and edge features and trains them with multi-task learning

EfficientFCN

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation [Paper]
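
  • Motivation: how to enlarge the receptive field of the features efficiently
  • Method: reducing the stride and adding dilated convolutions enlarges the receptive field, but the higher feature resolution makes the computation explode. The paper instead generates a "codebook" from the stride=32 features, which can be understood as a set of features for different patterns, and uses the stride=8 features to generate combination coefficients over that set, which achieves the "upsampling"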
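
The codebook idea can be sketched as below. This is a minimal NumPy illustration, not the paper's network: the two random matrices `w_code` and `w_coef` are hypothetical stand-ins for learned layers, the softmax normalizations are an assumption, and the feature maps are random.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
C, n = 16, 4                           # channels, number of codewords (assumed)
feat32 = rng.normal(size=(2, 2, C))    # stride-32 feature map (H/32, W/32, C)
feat8 = rng.normal(size=(8, 8, C))     # stride-8 feature map  (H/8,  W/8,  C)

# Hypothetical linear maps standing in for the learned layers.
w_code = rng.normal(size=(C, n))
w_coef = rng.normal(size=(C, n))

# Codebook: each codeword is a weighted average of the stride-32 features.
flat32 = feat32.reshape(-1, C)                 # (4, C)
weights = softmax(flat32 @ w_code, axis=0)     # (4, n), each column sums to 1
codebook = weights.T @ flat32                  # (n, C)

# Coefficients: one n-vector per stride-8 location.
coef = softmax(feat8 @ w_coef, axis=-1)        # (8, 8, n)

# "Upsampling": every stride-8 location becomes a combination of codewords.
upsampled = coef @ codebook                    # (8, 8, C)
```

The result has stride-8 resolution, yet the semantic content comes from the small codebook, so no dilated convolution runs on a high-resolution map.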

GCSeg

Class-wise Dynamic Graph Convolution for Semantic Segmentation [Paper]

  • Graph convolution for global feature extraction

CVPR 2019

  • Fast Interactive Object Annotation with Curve-GCN. [Paper] [Code(pytorch)]
    • Uses a Graph Convolutional Network (GCN) to predict the vertices of a polygon for segmentation annotation
  • Large-scale interactive object segmentation with human annotators. [Paper]
    • Interactive segmentation
  • Knowledge Adaptation for Efficient Semantic Segmentation. [Paper]
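    • Uses knowledge distillation for efficient segmentation under heavy downsampling (resolution reduced by 16×)
      • An autoencoder compresses and denoises the Teacher network's features; an L2 loss compares the encoded T features with the encoded S features
      • Differences in the similarity between pairs of pixels (pair-wise distillation)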
  • Structured Knowledge Distillation for Semantic Segmentation. [Paper]
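    • Efficient segmentation through knowledge distillation, with several constraint terms
      • Per-pixel loss (pixel-wise loss between Teacher and Student, and between Student and GT)
      • Differences between the pairwise pixel similarities of the Teacher and Student networks (pair-wise distillation)
      • A discriminator network constrains the similarity of the embeddings (holistic distillation)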
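
The pair-wise term can be sketched as follows. This is a minimal NumPy version under assumptions: cosine similarity and a mean squared difference are used here as one plausible choice, and the function names are hypothetical.

```python
import numpy as np

def pairwise_similarity(feat):
    """Cosine similarity between every pair of pixels. feat: (N, C)."""
    normed = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    return normed @ normed.T                      # (N, N)

def pairwise_distillation_loss(feat_t, feat_s):
    """Mean squared difference between teacher and student similarity maps."""
    diff = pairwise_similarity(feat_t) - pairwise_similarity(feat_s)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(6, 8))   # 6 pixels, 8 channels
student = rng.normal(size=(6, 4))   # the channel counts may differ
loss_same = pairwise_distillation_loss(teacher, teacher)
loss_diff = pairwise_distillation_loss(teacher, student)
```

Since only the N×N similarity maps are compared, the Teacher and Student do not need the same number of channels.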
  • FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference. [Paper]
    • Image-level class labels (weakly supervised); image-level class labels plus partial pixel-level labels (semi-supervised)
  • Dual Attention Network for Scene Segmentation. [Paper] [Code(pytorch)]
    • Adds attention over space (second-order relations, borrowed from Non-Local) and over channels
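
The two kinds of attention can be sketched as below. This is a simplified NumPy illustration, not the DANet modules: the learned query/key/value projections and any learnable scaling are omitted (an assumption made to keep the sketch short), and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat):
    """feat: (N, C) with N = H*W pixels. Each pixel attends to all pixels."""
    attn = softmax(feat @ feat.T, axis=-1)        # (N, N) pixel-to-pixel
    return attn @ feat + feat                     # residual connection

def channel_attention(feat):
    """feat: (N, C). Each channel attends to all channels."""
    attn = softmax(feat.T @ feat, axis=-1)        # (C, C) channel-to-channel
    return feat @ attn.T + feat                   # residual connection

rng = np.random.default_rng(0)
feat = rng.normal(size=(12, 5))                   # e.g. a 3x4 map, 5 channels
out = position_attention(feat) + channel_attention(feat)
```

The N×N matrix is the second-order (pixel-pair) relation mentioned above; the C×C matrix plays the same role across channels.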
  • [DUpsampling]: Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation. [Paper]
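    • Encoder-Decoder algorithms usually keep the Encoder's total_stride as small as possible (mostly 8) so that the spatial resolution of the Encoder's last convolutional layer is not too small, which costs memory and a large amount of computation
    • The proposed DUpsampling exploits the spatial redundancy of segmentation labels (it compresses the label probabilities label_prob, and reconstructs the high-resolution label_prob from the low-resolution network output pred_prob) to obtain a data-dependent upsampling method that has fewer parameters than transposed-convolution upsampling and performs better than bilinear interpolation.
    • Thanks to DUpsampling, the feature resolution can be reduced far enough, and the low-level features can be downsampled and then fused with the low-resolution high-level features, which reduces computation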
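
The reconstruction step can be sketched as a per-pixel linear projection followed by a rearrangement into a spatial block. This is a minimal NumPy illustration under assumptions: the projection matrix is random here (the paper learns it), and `dupsample` is a hypothetical name.

```python
import numpy as np

def dupsample(feat, proj, ratio, num_classes):
    """Expand each low-resolution pixel into a ratio x ratio block.

    feat: (h, w, C) low-resolution features
    proj: (C, ratio * ratio * num_classes) projection; learned in the
          paper, supplied by the caller in this sketch
    Returns an array of shape (h * ratio, w * ratio, num_classes).
    """
    h, w, _ = feat.shape
    blocks = feat @ proj                                   # (h, w, r*r*N)
    blocks = blocks.reshape(h, w, ratio, ratio, num_classes)
    # Interleave the block rows and columns with the coarse grid.
    out = blocks.transpose(0, 2, 1, 3, 4)                  # (h, r, w, r, N)
    return out.reshape(h * ratio, w * ratio, num_classes)

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 4, 32))            # 1/16-resolution features
proj = rng.normal(size=(32, 16 * 16 * 3))     # ratio 16, 3 classes (assumed)
pred = dupsample(feat, proj, ratio=16, num_classes=3)   # (64, 64, 3)
```

Unlike bilinear interpolation, each output block depends on the whole C-dimensional feature vector of its source pixel, which is why a very low-resolution feature map can still recover fine label structure.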
  • In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images. [Paper] [Code(Pytorch)]
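    • Goal: real-time semantic segmentation
      • Lightweight backbone: compact encoders (ResNet18 or MobileNet V2)
      • Lightweight decoder with lateral skip-connections (UNet-like structure)
      • Larger receptive field: SPP (PSPNet), or an image pyramid combined with lateral skip-connections, which helps recognize large objects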

ECCV 2018

  • [ICNet]: ICNet for Real-Time Semantic Segmentation on High-Resolution Images. [Paper] [Code(Tensorflow)]
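    • A sped-up version of PSPNet (~1 FPS) that reaches real time at 30 FPS; Image Cascade Network (ICNet)
    • Open question: why not downsample to 1/16 and 1/32 directly at the last resolution, fuse the multi-scale feature maps (UNet structure), and add supervision at several scales, i.e. a simplified version of DeepLabV3+?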

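  • [ExFuse]: Enhancing Feature Fusion for Semantic Segmentation (Face++). [Paper]
    • semantic supervision (SS): during backbone pre-training, several classification losses are added at intermediate layers so that these layers carry more semantic information
    • layer rearrangement (LR): redistributes the channel counts across the backbone's blocks so that deep and shallow layers have similar channel counts, i.e. enriches the low-level features, which helps the later fusion of deep and shallow features
    • explicit channel resolution embedding (ECRE): borrows the upsampling used in super-resolution (sub-pixel upsample)
    • semantic embedding branch (SEB): upsamples the different deep features and fuses them with the shallow features by multiplication
    • densely adjacent prediction (DAP): can be understood as a group conv with a $k \times k$ kernel whose parameters are fixed to $\frac{1}{k \times k}$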
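
The multiplicative fusion in SEB can be sketched as below. This is a minimal NumPy illustration under assumptions: any learned convolution on the deep features is omitted, nearest-neighbour upsampling is chosen for simplicity, and the function names are hypothetical.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (H, W, C) map by an integer factor."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

def semantic_embedding(shallow, deeper_feats):
    """Multiply shallow features by each upsampled deeper feature map.

    shallow:      (H, W, C)
    deeper_feats: list of (H / f, W / f, C) maps, f an integer factor
    """
    out = shallow.copy()
    for deep in deeper_feats:
        factor = shallow.shape[0] // deep.shape[0]
        out = out * upsample_nearest(deep, factor)
    return out

shallow = np.full((4, 4, 2), 3.0)
deep_a = np.full((2, 2, 2), 2.0)     # half resolution
deep_b = np.full((1, 1, 2), 0.5)     # quarter resolution
fused = semantic_embedding(shallow, [deep_a, deep_b])
```

The multiplication lets the semantically stronger deep features gate the shallow ones at every location.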

  • [DeepLabv3+]: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.

  • Adaptive Affinity Fields for Semantic Segmentation

  • [PSANet]: Point-wise Spatial Attention Network for Scene Parsing

  • [ESPNet]: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

  • [BiSeNet]: Bilateral Segmentation Network for Real-time Semantic Segmentation

CVPR 2018

  • [DFN]: Learning a Discriminative Feature Network for Semantic Segmentation(Face++). [Paper] [Code(tensorflow)]
  • The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. [Paper] [Code]
  • [EncNet]: Context Encoding for Semantic Segmentation
  • Context Contrasted Feature and Gated Multi-Scale Aggregation for Scene Segmentation
  • DenseASPP for Semantic Segmentation in Street Scenes
  • Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation

Awesome Instance Segmentation

Latest

  • YOLACT: Real-time Instance Segmentation. [Paper]

CVPR 2020

CVPR 2019

  • Hybrid Task Cascade for Instance Segmentation. [Paper] [Code(pytorch)]

  • Mask Scoring R-CNN. [Paper]

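    • Overview: Mask Scoring R-CNN improves on Mask R-CNN. Mask R-CNN uses the classification score as the score for how well the detection and segmentation results overlap with the GT, but in practice the classification score is often high while the detection and segmentation results are poor. To evaluate segmentation quality more accurately, the paper adds a MaskIoU branch on top of Mask R-CNN. The branch takes the ROI's segmentation mask and the RoIAlign features as input and predicts the IoU score between the ROI's predicted mask and the GT mask. The IoU score and the classification score are then combined to judge how accurate the ROI's output mask is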
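
The scoring idea can be sketched as follows. This is a minimal NumPy illustration under assumptions: the true mask IoU stands in for the MaskIoU branch's prediction (it is the quantity the branch is trained to regress), combining the two scores by multiplication is assumed here as one simple choice, and the function names are hypothetical.

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two boolean masks of the same shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

def mask_score(cls_score, predicted_iou):
    """Combine classification confidence with the estimated mask quality."""
    return cls_score * predicted_iou

pred = np.zeros((4, 4), dtype=bool)
pred[0:2, :] = True                  # top two rows, 8 pixels
gt = np.zeros((4, 4), dtype=bool)
gt[:, 0:2] = True                    # left two columns, 8 pixels

iou = mask_iou(pred, gt)             # overlap 4 pixels, union 12 pixels
score = mask_score(0.9, iou)
```

In this example a confident classification (0.9) paired with a poor mask ends up with a low final score, which is the behaviour the paper is after.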

CVPR 2018

  • Path Aggregation Network for Instance Segmentation. [Paper] [Code(pytorch)] COCO2017 Winner
  • MaskLab: Instance segmentation by refining object detection with semantic and direction features

ICCV 2017

CVPR 2017

  • End-to-End Instance Segmentation with Recurrent Attention. [Paper]

ECCV 2016

  • Instance-sensitive fully convolutional networks

Awesome Panoptic Segmentation

CVPR 2019

  • Panoptic Segmentation. [Paper]
  • Learning to Fuse Things and Stuff. [Paper]
  • Attention-guided Unified Network for Panoptic Segmentation.
  • Panoptic Feature Pyramid Networks.
  • UPSNet: A Unified Panoptic Segmentation Network
  • DeeperLab: Single-Shot Image Parser
  • An End-to-End Network for Panoptic Segmentation
  • PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things

Awesome Video Object Segmentation

Video segmentation vs. semantic image segmentation: adjacent frames should yield similar results (temporal redundancy and visual flicker)

VOS Performance(mean region similarity)

Algorithm          DAVIS(16val/17)  YouTube-VOS  Youtube-Obj(mIOU)  Speed(FPS)
RVOS(CVPR19)       -/48.0           -            -                  22.7
STCNN(CVPR19)      83.8/58.7        -            79.6               0.256
FEELVOS(CVPR19)    81.1/-           -            -                  1.96
SiamMask(CVPR19)   -                -            -                  35
FAVOS(CVPR18)      -/54.6           -            -                  -
OSVOS(CVPR17)      79.8/56.6        -            -                  0.1~5
MaskTrack(CVPR17)  80.3/-           -            71.7               <1.0
OnAVOS(BMVC17)     86.1/-           -            -                  -

CVPR 2019

  • RVOS: End-to-End Recurrent Network for Video Object Segmentation. [Paper] [Code(pytorch)]

    • Features: multi-object video segmentation; one-shot and zero-shot VOS
    • Spatial (instance) and temporal (video) recurrent network
  • STCNN: Spatiotemporal CNN for Video Object Segmentation. [Paper] [Code(pytorch)]

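    • Consists of two branches. The Temporal Coherence Branch is pre-trained without supervision using a GAN (input: the previous 4 frames; output: a prediction of the current frame; the generator's objective is to minimize the MSE between the generated image and the current frame and to maximize the discriminator's loss), and its purpose is to learn temporal consistency. The Spatial Segmentation Branch fuses multi-scale features of the current frame and the historical frames to produce the prediction for the current frame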
  • FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation. [Google] [Paper] [Code(tensorflow)]

  • SiamMask: Fast Online Object Tracking and Segmentation: A Unifying Approach. [Paper] [Code(Pytorch)]

  • MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation. [Paper]

    • Handles targets that are occluded or disappear
  • Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video. [Paper]

  • A Generative Appearance Model for End-To-End Video Object Segmentation. [Paper] [Code(Pytorch)]

ECCV 2018

  • YouTube-VOS: Sequence-to-Sequence Video Object Segmentation. [Paper] [DatasetURL]
  • Video object segmentation with joint re-identification and attention-aware mask propagation

CVPR 2018

  • Motion-guided cascaded refinement network for video object segmentation.
  • FAVOS: Fast and accurate online video object segmentation via tracking parts.
  • Efficient video object segmentation via network modulation.

CVPR 2017

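OSVOS can be seen as the most direct way of applying a semantic segmentation method to video object segmentation: it consists of offline training of a binary classification network (object segmentation) plus online fine-tuning. FusionSeg and MaskTrack use optical flow to complement the RGB input image, by feeding flow computed with a traditional method into the network. FusionSeg retrains its optical-flow branch, whereas MaskTrack reuses the RGB branch's model directly. The former fuses the flow branch's result through a learnable 1×1 convolution, while the latter simply averages the flow branch's result with the RGB result.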
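
The two fusion strategies can be contrasted in a short sketch. This is a minimal NumPy illustration under assumptions: each branch is reduced to a single foreground-probability map, a 1×1 convolution over the two stacked maps is written as a per-pixel weighted sum, and the weights shown are hand-picked rather than learned.

```python
import numpy as np

def fuse_average(p_rgb, p_flow):
    """MaskTrack-style fusion: plain average of the two outputs."""
    return 0.5 * (p_rgb + p_flow)

def fuse_conv1x1(p_rgb, p_flow, weight, bias=0.0):
    """FusionSeg-style fusion: a 1x1 convolution over the stacked maps,
    i.e. the same weighted sum applied at every pixel.
    weight: (2,) array, learned in practice, supplied by hand here."""
    stacked = np.stack([p_rgb, p_flow], axis=-1)   # (H, W, 2)
    return stacked @ weight + bias

p_rgb = np.array([[0.2, 0.8]])
p_flow = np.array([[0.6, 0.4]])
averaged = fuse_average(p_rgb, p_flow)
learned_like = fuse_conv1x1(p_rgb, p_flow, np.array([0.8, 0.2]))
```

Averaging is the special case `weight = [0.5, 0.5]` with zero bias; a learned 1×1 layer can instead trust one branch more than the other.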

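  • OSVOS: One-Shot Video Object Segmentation. [Paper] [Code(pytorch)] [Code(TensorFlow)]

    • Pipeline: ImageNet pre-training + binary-segmentation training on the DAVIS video segmentation dataset + online fine-tuning at test time

    • Features: processes each frame independently, so no error accumulates; fine-tuning plus an object-contour loss trades time for accuracy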

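  • FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. [Paper] [Code(caffe)]

    • Builds a two-stream structure from appearance information and motion information for video object segmentation

      • Takes the optical flow and the current frame image as input, forming a two-stream structure in which the two kinds of information complement each other. Each stream is a ResNet101 structure; dilated convolutions of different sizes at the end provide multiple scales, and the multi-branch results are fused by an element-wise maximum. Training first trains the branches separately and then trains the final fusion layer (1×1)
      • To address the shortage of video object segmentation data, the paper filters candidates using a pre-trained segmentation model (VOC2012) together with the bounding-box annotations of a video object detection dataset (ImageNet VID), then post-processes them to obtain training data
    • Drawback: the optical flow is estimated with a traditional method; the noisy flow input may make training unstable and affects the final output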

  • MaskTrack: Learning video object segmentation from static images. [Paper] [Code]
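
    • Takes the mask predicted for the previous frame and the current RGB image as input; mask(t-1) indicates the target's position, shape, and size. Training pairs (RGBImg, mask) are generated from single images by translation and deformation. Offline training (data generated by shifting and deforming static images works better than video datasets; the paper uses saliency detection datasets) + online Finetune. Optical flow can also be added as a complementary cue to improve performance: the RGB image is replaced by the optical-flow image and passed through the same convolutional network, and the resulting output probabilities are averaged with MaskTrack's output probability scores (Section 3.3 of the paper)
    • Features: slow (Finetune and optical-flow computation are time-consuming); the previous-frame input mask can be coarse, so the method can be combined with an object detection algorithm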
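
Generating a training pair from a static image can be sketched as below. This is a minimal NumPy illustration under assumptions: only translation is implemented (the deformation step is left out), feeding the mask as a fourth input channel is assumed, and the function names are hypothetical.

```python
import numpy as np

def shift_mask(mask, dy, dx):
    """Translate a 2-D mask by (dy, dx) pixels, filling with zeros.
    Assumes |dy| < H and |dx| < W."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    src = mask[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    y0, x0 = max(0, dy), max(0, dx)
    out[y0:y0 + src.shape[0], x0:x0 + src.shape[1]] = src
    return out

def make_training_input(image, gt_mask, dy, dx):
    """Stack the RGB image with a jittered copy of its own mask, which
    plays the role of mask(t-1). The training target stays gt_mask."""
    prev_mask = shift_mask(gt_mask, dy, dx)
    extra = prev_mask[..., None].astype(image.dtype)
    return np.concatenate([image, extra], axis=-1)   # (H, W, 4)

image = np.zeros((5, 5, 3), dtype=np.float32)
gt_mask = np.zeros((5, 5), dtype=np.uint8)
gt_mask[1:3, 1:3] = 1
net_input = make_training_input(image, gt_mask, dy=1, dx=-1)
```

Because the jittered mask only has to be roughly right, the same input slot can be filled at test time with a coarse mask, for example one derived from a detector's box.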

Others

  • OnAVOS: Online adaptation of convolutional neural networks for video object segmentation. [BMVC17]

Awesome Video Instance Segmentation
