Jefxiong

Algorithm Basics / Segmentation

Awesome Segmentation

人工智能炼丹师

2020-08-16

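This post summarizes segmentation papers from the major conferences up to 2020, covering semantic segmentation, instance segmentation, panoptic segmentation, video segmentation, and related areas. It is updated from time to time.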

Awesome Semantic Segmentation

CVPR 2020

StripPooling

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing [Paper] [Code]

ECCV 2020

Error-Correcting Supervision

Semi-Supervised Segmentation based on Error-Correcting Supervision [Paper]

Segmentation Failures Detection

Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation [Paper]

OCRNet

Object-Contextual Representations for Semantic Segmentation [Paper] [Code]

Coarse-to-fine, attention

IFVD

Intra-class Feature Variation Distillation for Semantic Segmentation [Paper] [Code]

Model distillation

CaC-Net

Learning to Predict Context-adaptive Convolution for Semantic Segmentation [Paper] [Code]

Spatial attention via predicted convolution kernels

TGM

Tensor Low-Rank Reconstruction for Semantic Segmentation [Paper] [Code]

An improvement on non-local methods

SegFix

SegFix: Model-Agnostic Boundary Refinement for Segmentation [Paper]

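Motivation: a pixel on the boundary tends to share its class with a nearby "interior" pixel, so a network learns the shift from boundary pixels to interior pixels.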
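
The shift idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `refine_with_offsets`, the per-pixel `(dy, dx)` offset format, and the precomputed boundary mask are all hypothetical, and in SegFix itself the offsets come from a trained network rather than being given by hand.

```python
def refine_with_offsets(labels, is_boundary, offsets):
    """Relabel each boundary pixel with the label found at its shifted position.

    labels:      H x W nested list of class ids (coarse prediction)
    is_boundary: H x W nested list of booleans (assumed given)
    offsets:     H x W nested list of (dy, dx) shifts (assumed given)
    """
    height, width = len(labels), len(labels[0])
    refined = [row[:] for row in labels]
    for y in range(height):
        for x in range(width):
            if not is_boundary[y][x]:
                continue
            dy, dx = offsets[y][x]
            # Clamp the target position so it stays inside the image.
            ty = min(max(y + dy, 0), height - 1)
            tx = min(max(x + dx, 0), width - 1)
            # Always read from the original prediction, not the refined copy.
            refined[y][x] = labels[ty][tx]
    return refined


# Hypothetical 1 x 4 example: the pixel at x=1 lies on the boundary and
# its offset points one step left, towards the interior of class 0.
coarse = [[0, 1, 1, 1]]
boundary = [[False, True, False, False]]
shift = [[(0, 0), (0, -1), (0, 0), (0, 0)]]
refined_labels = refine_with_offsets(coarse, boundary, shift)
```

Because the refinement only reads a label map and an offset map, it does not depend on which segmentation model produced the labels.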

DecoupleSegNets

Improving Semantic Segmentation via Decoupled Body and Edge Supervision [Paper] [Code]

Decouples body features from edge features; trained with multi-task learning

EfficientFCN

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation [Paper]

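Motivation: how to enlarge the receptive field of the features efficiently.
How it works: the usual approach of reducing the stride and using dilated convolutions raises the feature resolution, so the computational cost grows sharply. The paper instead builds a "codebook" from the stride=32 features, which can be understood as a set of features for different patterns, and uses the stride=8 features to produce the combination coefficients over that set, which achieves the "upsampling".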
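
A minimal sketch of the codebook combination described above, assuming the codebook and the coefficients have already been computed. The name `holistic_upsample` and the nested-list layout are hypothetical; the real model produces both inputs with convolutional layers.

```python
def holistic_upsample(codebook, coefficients):
    """Combine codewords into a high-resolution feature map.

    codebook:     n codewords, each a list of C channel values
                  (assumed to come from the stride=32 features)
    coefficients: H x W x n nested list of per-pixel weights
                  (assumed to come from the stride=8 features)
    returns:      H x W x C nested list
    """
    channels = len(codebook[0])
    return [
        [
            [
                sum(weight * word[ch] for weight, word in zip(pixel, codebook))
                for ch in range(channels)
            ]
            for pixel in row
        ]
        for row in coefficients
    ]


# Two codewords with two channels each, and a 1 x 2 coefficient map.
codebook = [[1.0, 0.0], [0.0, 2.0]]
coefficients = [[[1.0, 0.0], [0.5, 0.5]]]
features = holistic_upsample(codebook, coefficients)
```

The output resolution follows the coefficient map, while the codebook itself has no spatial dimensions, which is why the expensive high-resolution, high-level feature map never has to be computed.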

GCSeg

Class-wise Dynamic Graph Convolution for Semantic Segmentation [Paper]

Global feature extraction with graph convolution

CVPR 2019

Fast Interactive Object Annotation with Curve-GCN. [Paper] [Code(pytorch)]
- Uses a Graph Convolutional Network (GCN) to predict the vertices of a polygon for segmentation annotation
Large-scale interactive object segmentation with human annotators. [Paper]
- Interactive segmentation
Knowledge Adaptation for Efficient Semantic Segmentation. [Paper]
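- Efficient segmentation at a large downsampling rate (resolution reduced 16×) via knowledge distillation
  - An autoencoder compresses and denoises the Teacher network's features; an L2 loss compares the encoded T features with the encoded S features
  - Differences in the similarity between pairs of pixels (pair-wise distillation)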
Structured Knowledge Distillation for Semantic Segmentation. [Paper]
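- Efficient segmentation via knowledge distillation, with several constraint terms
  - Per-pixel losses (pixel-wise loss between Teacher and Student, and pixel-wise loss between Student and GT)
  - Differences in the pairwise pixel similarities of the Teacher and Student networks (pair-wise distillation)
  - A discriminator network that constrains the similarity of the embeddings (holistic distillation)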
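
The pair-wise term listed above can be sketched as follows. This assumes cosine similarity between per-pixel feature vectors and a mean squared difference over all pixel pairs; the function names are hypothetical and the sketch works on flat lists of pixel features rather than real feature maps.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarity(pixel_features):
    """Similarity between every pair of pixels (N x N matrix)."""
    return [[cosine(fi, fj) for fj in pixel_features] for fi in pixel_features]


def pairwise_distillation_loss(student_features, teacher_features):
    """Mean squared difference between student and teacher similarity matrices."""
    s = pairwise_similarity(student_features)
    t = pairwise_similarity(teacher_features)
    n = len(student_features)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


teacher = [[1.0, 0.0], [0.0, 1.0]]          # two pixels, orthogonal features
student_good = [[2.0, 0.0], [0.0, 3.0]]     # same structure, different scale
student_bad = [[1.0, 0.0], [1.0, 0.0]]      # both pixels look identical
loss_good = pairwise_distillation_loss(student_good, teacher)
loss_bad = pairwise_distillation_loss(student_bad, teacher)
```

Since the loss compares similarity matrices rather than raw features, the Student and Teacher may use different channel counts, as long as both cover the same pixels.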
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference. [Paper]
- Image-level class labels (weakly supervised); image-level class labels plus partial pixel-wise annotation (semi-supervised)
Dual Attention Network for Scene Segmentation. [Paper] [Code(pytorch)]
- Adds attention over spatial positions (second-order relations, borrowed from Non-Local) and over channels
[DUpsampling]: Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation. [Paper]
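- Encoder-Decoder based methods usually keep the encoder's total_stride as small as possible (8 in most cases) so that the spatial resolution of the encoder's last convolutional layer does not get too small, which costs memory and a large amount of computation
- The paper proposes DUpsampling, a data-dependent upsampling method that exploits the spatial redundancy of segmentation labels (compress the label probabilities label_prob, then reconstruct the high-resolution label_prob from the low-resolution network output pred_prob). It has fewer parameters than transposed-convolution upsampling and performs better than bilinear interpolation.
- Thanks to DUpsampling, the feature resolution can be reduced far enough, and the low-level features can be downsampled and then fused with the low-resolution high-level features, which reduces computation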
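
The reconstruction step of DUpsampling can be sketched as a per-pixel linear projection followed by a reshape. This is a sketch under assumptions: the name `dupsample`, the nested-list layout, and the ordering of the `ratio x ratio x num_classes` block are all chosen for illustration, and the projection matrix is written by hand here, whereas the paper learns it from the label maps.

```python
def dupsample(low_res, weight, ratio, num_classes):
    """Expand each low-resolution pixel into a ratio x ratio block of class scores.

    low_res: h x w x C_low nested list (low-resolution network output)
    weight:  (ratio * ratio * num_classes) x C_low projection matrix
             (assumed to have been learned beforehand)
    returns: (h * ratio) x (w * ratio) x num_classes nested list
    """
    h, w = len(low_res), len(low_res[0])
    out = [[None] * (w * ratio) for _ in range(h * ratio)]
    for y in range(h):
        for x in range(w):
            feat = low_res[y][x]
            # Linear projection of one low-resolution feature vector.
            block = [sum(wk * fk for wk, fk in zip(row, feat)) for row in weight]
            # Reshape the projected vector into a ratio x ratio x num_classes block.
            for i in range(ratio):
                for j in range(ratio):
                    start = (i * ratio + j) * num_classes
                    out[y * ratio + i][x * ratio + j] = block[start:start + num_classes]
    return out


# One low-resolution pixel with two channels, upsampled 2x for a single class.
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
upsampled = dupsample([[[2.0, 3.0]]], weight, ratio=2, num_classes=1)
```

Compared with bilinear interpolation, the block for each output region is a learned function of the low-resolution feature instead of a fixed blend of its neighbours.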
In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images. [Paper] [Code(Pytorch)]
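- Goal: real-time semantic segmentation
  - Lightweight backbone: compact encoders (ResNet18 or MobileNet V2)
  - Lightweight decoder with lateral skip-connections (UNet-like structure)
  - Larger receptive field: SPP (PSPNet), or an image pyramid combined with lateral skip-connections, which helps with recognizing large objects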

ECCV 2018

[ICNet]: ICNet for Real-Time Semantic Segmentation on High-Resolution Images. [Paper] [Code(Tensorflow)]
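- An accelerated version of PSPNet (~1 FPS) that reaches real time at 30 FPS; Image Cascade Network (ICNet)
- Open question: why not downsample to 1/16 and 1/32 directly at the final resolution, fuse the multi-scale feature maps (UNet structure), and add supervision at several scales, i.e. a simplified version of DeepLabV3+?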
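[ExFuse]: Enhancing Feature Fusion for Semantic Segmentation (Face++). [Paper]
- semantic supervision (SS): during backbone pre-training, several classification losses are added at intermediate layers so that those layers carry more semantic information
- layer rearrangement (LR): adjusts how channels are distributed across the backbone's blocks so that deep and shallow layers have similar channel counts, i.e. enriches the low-level features, which helps the later fusion of deep and shallow layers
- explicit channel resolution embedding (ECRE): borrows the upsampling used in super-resolution (sub-pixel upsample)
- semantic embedding branch (SEB): upsamples the deeper features and fuses them with the shallow features by multiplication
- densely adjacent prediction (DAP): can be understood as a group conv with a $k \times k$ kernel whose weights are fixed to $\frac{1}{k \times k}$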
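
The sub-pixel upsample mentioned under ECRE trades channels for resolution. The sketch below assumes the common channel ordering in which output channel `c` reads input channels `c * ratio * ratio + i * ratio + j`; the function name `pixel_shuffle` and the nested-list layout are illustrative, not taken from the paper.

```python
def pixel_shuffle(feature, ratio):
    """Rearrange (C * ratio * ratio) x H x W into C x (H * ratio) x (W * ratio)."""
    channels = len(feature) // (ratio * ratio)
    h, w = len(feature[0]), len(feature[0][0])
    out = [[[0] * (w * ratio) for _ in range(h * ratio)] for _ in range(channels)]
    for c in range(channels):
        for i in range(ratio):
            for j in range(ratio):
                # Each input channel fills one position of every ratio x ratio block.
                src = feature[c * ratio * ratio + i * ratio + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * ratio + i][x * ratio + j] = src[y][x]
    return out


# Four channels of size 1 x 1 become one channel of size 2 x 2.
shuffled = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], ratio=2)
```

No values are interpolated: every output pixel is a copy of one input value, so the upsampling is fully determined by the preceding convolution that produced the extra channels.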
[DeepLabv3+]: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.
Adaptive Affinity Fields for Semantic Segmentation
[PSANet]: Point-wise Spatial Attention Network for Scene Parsing
[ESPNet]: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
[BiSeNet]: Bilateral Segmentation Network for Real-time Semantic Segmentation

CVPR 2018

[DFN]: Learning a Discriminative Feature Network for Semantic Segmentation(Face++). [Paper] [Code(tensorflow)]
The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. [Paper] [Code]
[EncNet]: Context Encoding for Semantic Segmentation
Context Contrasted Feature and Gated Multi-Scale Aggregation for Scene Segmentation
DenseASPP for Semantic Segmentation in Street Scenes
Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation

Awesome Instance Segmentation

Latest

YOLACT: Real-time Instance Segmentation. [Paper]

CVPR 2020

CVPR 2019

Hybrid Task Cascade for Instance Segmentation. [Paper] [Code(pytorch)]
Mask Scoring R-CNN. [Paper]
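- Overview: Mask Scoring R-CNN improves on Mask R-CNN. Its starting point is that Mask R-CNN uses the classification score as the measure of how well the detection and segmentation results overlap with the GT, yet in practice the classification score is often high while the detection and segmentation results are poor. To assess segmentation quality more accurately, the paper adds a MaskIoU branch on top of Mask R-CNN. The branch takes the ROI's segmentation mask and the ROIAlign features as input and predicts the IoU score between that ROI's predicted mask and the GT mask. The IoU score and the classification score are then combined to judge how accurate the ROI's output mask is
- Takeaways: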
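
To make the Mask Scoring R-CNN overview concrete, the sketch below computes the mask IoU that the MaskIoU branch is trained to predict, and combines a predicted IoU with the classification score by multiplication. The multiplication is stated here as an assumption about how the two scores are combined, and both function names are hypothetical.

```python
def mask_iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks given as nested lists."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter / union if union else 0.0


def mask_score(cls_score, predicted_iou):
    """Rescale the classification score by the predicted mask quality (assumed product)."""
    return cls_score * predicted_iou


pred = [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
iou = mask_iou(pred, gt)          # regression target for the MaskIoU branch
score = mask_score(0.9, iou)      # confident class, mediocre mask
```

With this combination, a detection with a confident class but a poorly overlapping mask ranks below one whose class score and mask quality are both high.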

CVPR 2018

Path Aggregation Network for Instance Segmentation. [Paper] [Code(pytorch)] COCO2017 Winner :fire:
Masklab: Instance segmentation by refining object detection with semantic and direction features

ICCV 2017

Mask R-CNN. [Paper]

CVPR 2017

End-to-End Instance Segmentation with Recurrent Attention. [Paper]

ECCV 2016

Instance-sensitive fully convolutional networks

Awesome Panoptic Segmentation

CVPR 2019

Panoptic Segmentation. [Paper]
Learning to Fuse Things and Stuff. [Paper]
Attention-guided Unified Network for Panoptic Segmentation.
Panoptic Feature Pyramid Networks.
UPSNet: A Unified Panoptic Segmentation Network
DeeperLab: Single-Shot Image Parser
An End-to-End Network for Panoptic Segmentation
PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things

Awesome Video Object Segmentation

Video segmentation vs. semantic segmentation of images: adjacent frames yield similar results (temporal redundancy and visual jitter)

VOS Performance(mean region similarity)

| Algorithm | DAVIS (16 val / 17) | YouTube-VOS | Youtube-Obj (mIoU) | Speed (FPS) |
|---|---|---|---|---|
| RVOS (CVPR19) | -/48.0 | - | - | 22.7 |
| STCNN (CVPR19) | 83.8/58.7 | - | 79.6 | 0.256 |
| FEELVOS (CVPR19) | 81.1/- | | | 1.96 |
| SiamMask (CVPR19) | | | | 35 |
| FAVOS (CVPR18) | -/54.6 | - | - | - |
| OSVOS (CVPR17) | 79.8/56.6 | - | - | 0.1~5 |
| MaskTrack (CVPR17) | 80.3/- | - | 71.7 | <1.0 |
| OnAVOS (BMVC17) | 86.1/- | | | |

CVPR 2019

RVOS: End-to-End Recurrent Network for Video Object Segmentation. [Paper] [Code(pytorch)]
- Characteristics: multi-object video segmentation; one-shot and zero-shot VOS
- Spatial (instance) and temporal (video) recurrent network
STCNN: Spatiotemporal CNN for Video Object Segmentation. [Paper] [Code(pytorch)]
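- Consists of two main branches. The Temporal Coherence Branch is pre-trained without supervision using a GAN (input: the previous 4 frames; output: a prediction of the current frame; the generator's objective is to minimize the MSE between the generated image and the current frame and to maximize the discriminator's loss), with the aim of learning temporal consistency. The other branch, the Spatial Segmentation Branch, fuses multi-scale features from the current frame and past frames to produce the prediction for the current frame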
FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation. [Google] [Paper] [Code(tensorflow)]
SiamMask: Fast Online Object Tracking and Segmentation: A Unifying Approach. [Paper] [Code(Pytorch)]
MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation. [Paper]
- Handles targets that are occluded or disappear
Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video. [Paper]
A Generative Appearance Model for End-To-End Video Object Segmentation. [Paper] [Code(Pytorch)]

ECCV 2018

YouTube-VOS: Sequence-to-Sequence Video Object Segmentation. [Paper] [DatasetURL]
Video object segmentation with joint re-identification and attention-aware mask propagation

CVPR 2018

Motion-guided cascaded refinement network for video object segmentation.
FAVOS: Fast and accurate online video object segmentation via tracking parts.
Efficient video object segmentation via network modulation.

CVPR 2017

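OSVOS can be seen as the most direct way to adapt a semantic segmentation method to video object segmentation: it consists of an offline-trained binary classification network (object segmentation) plus online fine-tuning. FusionSeg and MaskTrack use optical flow to complement the RGB input image, feeding optical flow computed by a traditional method into the network. FusionSeg retrains its optical-flow branch, whereas MaskTrack reuses the RGB branch's model directly. The former fuses the flow-branch result through a learnable 1×1 convolution, while the latter simply averages the flow-branch result with the RGB result.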
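
The two fusion strategies contrasted above differ only in whether the mixing weights are fixed or learned. The sketch below treats each branch output as a single-channel H x W map, in which case a 1×1 convolution over the two stacked maps reduces to a per-pixel weighted sum. The function names and the scalar weights are hypothetical; in a trained model the weights would be learned parameters.

```python
def fuse_by_average(rgb_prob, flow_prob):
    """MaskTrack-style fusion as described above: plain average of the two maps."""
    return [
        [(a + b) / 2 for a, b in zip(rgb_row, flow_row)]
        for rgb_row, flow_row in zip(rgb_prob, flow_prob)
    ]


def fuse_by_1x1_conv(rgb_map, flow_map, w_rgb, w_flow, bias=0.0):
    """1x1 convolution over two stacked single-channel maps: a per-pixel weighted sum."""
    return [
        [w_rgb * a + w_flow * b + bias for a, b in zip(rgb_row, flow_row)]
        for rgb_row, flow_row in zip(rgb_map, flow_map)
    ]


rgb = [[0.25, 1.0]]
flow = [[0.75, 0.5]]
averaged = fuse_by_average(rgb, flow)
# With both weights fixed to 0.5, the 1x1 convolution reproduces the average.
weighted = fuse_by_1x1_conv(rgb, flow, w_rgb=0.5, w_flow=0.5)
```

Averaging is the special case of the 1×1 convolution with both weights at 0.5 and no bias, so the learnable version can down-weight a noisy flow branch where plain averaging cannot.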

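OSVOS: One-Shot Video Object Segmentation. [Paper] [Code(pytorch)] [Code(TensorFlow)]
- Pipeline: ImageNet pre-training + binary-classification training on the video segmentation dataset DAVIS + fine-tuning online at test time
- Characteristics: processes each frame independently, so there is no accumulated error; fine-tuning plus an object-contour loss constraint trades time for accuracy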
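FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. [Paper] [Code(caffe)]
- Combines appearance and motion information in a two-stream structure for video object segmentation
  - Takes optical flow and the current frame's image as inputs to form a two-stream structure in which the two sources complement each other: a ResNet101 architecture; dilated convolutions of different rates at the end provide multiple scales, and the multi-branch results are fused by an element-wise maximum. For training, the branches are first trained separately, and then the final fusion layer (1×1) is trained
  - To address the shortage of video object segmentation data, the paper proposes generating training data by filtering the output of a pre-trained segmentation model (VOC2012) with the bounding-box annotations of a video object detection dataset (ImageNet VID), followed by post-processing; the process is shown in the figure below:
- Drawback: the optical flow is estimated by a traditional method, and the resulting noisy flow images may make training unstable and affect the final output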
MaskTrack: Learning video object segmentation from static images. [Paper] [Code]
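- Takes the mask predicted for the previous frame and the current RGB image as input; mask(t-1) indicates the target's position, shape, and size. Training pairs (RGBImg, mask) are generated by translating and deforming single images. Offline training (data generated by translating and deforming static images works better than video datasets; the paper uses saliency detection datasets) is followed by online fine-tuning. Optical flow can also be added as a complement to improve performance: the RGB image is replaced by the optical-flow image and passed through the same convolutional network, and the resulting output probabilities are averaged with MaskTrack's output probability scores (Section 3.3 of the paper)
- Characteristics: slow (fine-tuning and optical-flow computation are time-consuming); the previous-frame input can be coarse, so the method can be combined with an object detection algorithm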
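
The static-image training data described for MaskTrack can be illustrated with the simplest of the transformations, a translation. The sketch below shifts a ground-truth mask by a few pixels to stand in for the mask of the previous frame. The name `shift_mask` is hypothetical, and the deformation part is left out.

```python
def shift_mask(mask, dy, dx):
    """Translate a binary mask by (dy, dx); pixels shifted in from outside are 0."""
    h, w = len(mask), len(mask[0])
    shifted = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                shifted[y][x] = mask[src_y][src_x]
    return shifted


# Ground-truth mask of a static image; the shifted copy plays the role of mask(t-1).
gt_mask = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
simulated_previous_mask = shift_mask(gt_mask, dy=1, dx=1)
```

Each training sample then pairs the image and the simulated previous mask as input with the unshifted mask as the target, so no video annotation is required.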

Others

OnAVOS: Online adaptation of convolutional neural networks for video object segmentation. [BMVC17]

Awesome Video Instance Segmentation

Reference

Copyright: 人工智能炼丹师

Permalink: https://jefxiong.cn/index.php/archives/23.html

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
