Paper Reading: Medical Image Segmentation with U-Net

"U-Net: Convolutional Networks for Biomedical Image Segmentation" is a paper by Olaf Ronneberger, Philipp Fischer, and Thomas Brox, presented at MICCAI 2015 (Medical Image Computing and Computer-Assisted Intervention). It introduces U-Net, a new convolutional neural network architecture designed specifically for medical image segmentation.

Background and Challenges

In medical image analysis, segmentation is a fundamental and important task: an image is partitioned into different regions or objects, for example to distinguish normal tissue from tumor tissue. Traditional segmentation methods rely on hand-crafted feature extraction and complex models, whereas deep learning methods, in particular convolutional neural networks (CNNs), provide an end-to-end approach with automatic feature learning.

U-Net Architecture

U-Net's design is inspired by the fully convolutional network (FCN), with significant modifications to better suit medical image segmentation. The architecture is shaped like the letter "U" and consists of two parts:

Contracting Path:

Also called the encoder, it consists of multiple convolutional and pooling layers that extract image features. As the network gets deeper, the spatial resolution gradually decreases while the number of feature channels increases, allowing the network to learn more complex image representations.

擴(kuò)展路徑(Expansive Path):

Also called the decoder, it consists of multiple upsampling operations and convolutional layers. Its purpose is to restore the low-resolution feature maps to high resolution, enabling precise localization.

網(wǎng)絡(luò)特點(diǎn):

Skip Connections:

Skip connections concatenate feature maps from the encoder with the corresponding feature maps in the decoder, which helps the network recover precise localization information during upsampling. Through the skip connections, the network can exploit contextual information for more accurate segmentation.

Data Augmentation:

The paper particularly emphasizes the importance of data augmentation during training, since medical image data is usually scarce. Random rotations, scaling, and elastic deformations are used to enlarge the training set and improve the model's generalization.

Results and Impact

U-Net achieved breakthrough results in the 2015 ISBI challenges, and thanks to its excellent performance and flexibility it quickly became a milestone in medical image segmentation. Its architecture and ideas have been widely applied to all kinds of medical image segmentation tasks and have inspired much follow-up research and many improvements.

結(jié)論 U-Net提供了一種有效的醫(yī)學(xué)圖像分割方案,通過(guò)其獨(dú)特的結(jié)構(gòu)設(shè)計(jì),它在處理小量數(shù)據(jù)集時(shí)仍然能夠?qū)崿F(xiàn)很高的精度。它解決了傳統(tǒng)分割方法難以捕捉復(fù)雜特征和形狀的問(wèn)題,并為醫(yī)學(xué)圖像分割領(lǐng)域的發(fā)展開(kāi)辟了新的方向。

------------------------------------------------------------ The original paper follows ------------------------------------------------------------

Abstract.

There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .

廣泛認(rèn)為,成功訓(xùn)練深度網(wǎng)絡(luò)需要數(shù)千個(gè)帶有注釋的訓(xùn)練樣本。在本文中,我們提出了一種網(wǎng)絡(luò)和訓(xùn)練策略,通過(guò)強(qiáng)烈使用數(shù)據(jù)增強(qiáng)技術(shù),更有效地利用可用的標(biāo)注樣本。**該架構(gòu)包括一個(gè)收縮路徑來(lái)捕捉上下文信息和一個(gè)對(duì)稱擴(kuò)展路徑來(lái)實(shí)現(xiàn)精確定位。**這樣的網(wǎng)絡(luò)可以從非常少量的圖像進(jìn)行端到端訓(xùn)練,并且在ISBI挑戰(zhàn)中對(duì)EM stacks(EM堆棧)(electron microscopic stacks)中神經(jīng)結(jié)構(gòu)分割的先前最佳方法(滑動(dòng)窗口卷積網(wǎng)絡(luò))取得了更好的效果。使用同一網(wǎng)絡(luò)在透射光顯微鏡圖像(相差顯微鏡和差顯微鏡)上進(jìn)行訓(xùn)練,我們?cè)贗SBI細(xì)胞追蹤挑戰(zhàn)2015中以較大的優(yōu)勢(shì)贏得了這些類別。此外,該網(wǎng)絡(luò)速度快。對(duì)于一個(gè)512x512的圖像,分割只需不到一秒鐘的時(shí)間在最新的GPU上完成。完整的實(shí)現(xiàn)(基于Caffe)和訓(xùn)練過(guò)的網(wǎng)絡(luò)可在http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net找到。

Introduction

In the last two years, deep convolutional networks have outperformed the state of the art in many visual recognition tasks, e.g. [7, 3]. While convolutional networks have already existed for a long time [8], their success was limited due to the size of the available training sets and the size of the considered networks. The breakthrough by Krizhevsky et al. [7] was due to supervised training of a large network with 8 layers and millions of parameters on the ImageNet dataset with 1 million training images. Since then, even larger and deeper networks have been trained [12].

The typical use of convolutional networks is on classification tasks, where the output to an image is a single class label. However, in many visual tasks, especially in biomedical image processing, the desired output should include localization, i.e., a class label is supposed to be assigned to each pixel. Moreover, thousands of training images are usually beyond reach in biomedical tasks. Hence, Ciresan et al. [1] trained a network in a sliding-window setup to predict the class label of each pixel by providing a local region (patch) around that pixel as input. First, this network can localize. Secondly, the training data in terms of patches is much larger than the number of training images. The resulting network won the EM segmentation challenge at ISBI 2012 by a large margin.

Fig. 1. U-net architecture (example for 32x32 pixels in the lowest resolution). Each blue box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The x-y-size is provided at the lower left edge of the box. White boxes represent copied feature maps. The arrows denote the different operations.

Obviously, the strategy in Ciresan et al. [1] has two drawbacks.

First, it is quite slow because the network must be run separately for each patch, and there is a lot of redundancy due to overlapping patches. Secondly, there is a trade-off between localization accuracy and the use of context. Larger patches require more max-pooling layers that reduce the localization accuracy, while small patches allow the network to see only little context.

More recent approaches [11,4] proposed a classifier output that takes into account the features from multiple layers. Good localization and the use of context are possible at the same time.

In this paper, we build upon a more elegant architecture, the so-called "fully convolutional network" [9]. We modify and extend this architecture such that it works with very few training images and yields more precise segmentations; see Figure 1. The main idea in [9] is to supplement a usual contracting network by successive layers, where pooling operators are replaced by upsampling operators. Hence, these layers increase the resolution of the output. In order to localize, high resolution features from the contracting path are combined with the upsampled output. A successive convolution layer can then learn to assemble a more precise output based on this information.

Fig. 2. Overlap-tile strategy for seamless segmentation of arbitrary large images (here segmentation of neuronal structures in EM stacks). Prediction of the segmentation in the yellow area requires image data within the blue area as input. Missing input data is extrapolated by mirroring.

One important modification in our architecture is that in the upsampling part we have also a large number of feature channels, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting path, and yields a u-shaped architecture. The network does not have any fully connected layers and only uses the valid part of each convolution, i.e., the segmentation map only contains the pixels, for which the full context is available in the input image. This strategy allows the seamless segmentation of arbitrarily large images by an overlap-tile strategy (see Figure 2). To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. This tiling strategy is important to apply the network to large images, since otherwise the resolution would be limited by the GPU memory.

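To make the overlap-tile idea concrete, here is a minimal NumPy sketch (ours, not the paper's code): the border is extrapolated by mirror padding, and overlapping input tiles are cut so that their valid output regions tile the image seamlessly. The tile and margin sizes are illustrative, chosen to match the 572-to-388 network of Figure 1.

```python
import numpy as np

def overlap_tiles(image, tile_out=388, margin=92):
    """Yield ((y, x), tile) pairs: each input tile of size tile_out + 2*margin
    predicts the tile_out x tile_out output region anchored at (y, x).
    For brevity, assumes image height/width are multiples of tile_out."""
    h, w = image.shape
    padded = np.pad(image, margin, mode="reflect")  # mirror the missing context
    for y in range(0, h, tile_out):
        for x in range(0, w, tile_out):
            yield (y, x), padded[y:y + tile_out + 2 * margin,
                                 x:x + tile_out + 2 * margin]
```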

As for our tasks there is very little training data available, we use excessive data augmentation by applying elastic deformations to the available training images. This allows the network to learn invariance to such deformations, without the need to see these transformations in the annotated image corpus. This is particularly important in biomedical segmentation, since deformation used to be the most common variation in tissue and realistic deformations can be simulated efficiently. The value of data augmentation for learning invariance has been shown in Dosovitskiy et al. [2] in the scope of unsupervised feature learning.

鑒于我們的任務(wù)可用的訓(xùn)練數(shù)據(jù)非常有限,我們通過(guò)對(duì)現(xiàn)有訓(xùn)練圖像應(yīng)用彈性變形(elastic deformations)來(lái)進(jìn)行過(guò)度的數(shù)據(jù)增強(qiáng)。這使得網(wǎng)絡(luò)能夠?qū)W習(xí)對(duì)這些變形的不變性,而不需要在標(biāo)注的圖像語(yǔ)料庫(kù)中看到這些變換。這在生物醫(yī)學(xué)分割中尤其重要,因?yàn)樽冃纬3J墙M織中最常見(jiàn)的變化,而且可以有效地模擬真實(shí)的變形。Dosovitskiy等人[2]在無(wú)監(jiān)督特征學(xué)習(xí)的范疇內(nèi),已經(jīng)展示了數(shù)據(jù)增強(qiáng)對(duì)學(xué)習(xí)不變性的價(jià)值。

Another challenge in many cell segmentation tasks is the separation of touching objects of the same class; see Figure 3. To this end, we propose the use of a weighted loss, where the separating background labels between touching cells obtain a large weight in the loss function.


The resulting network is applicable to various biomedical segmentation problems. In this paper, we show results on the segmentation of neuronal structures in EM stacks (an ongoing competition started at ISBI 2012), where we outperformed the network of Ciresan et al. [1]. Furthermore, we show results for cell segmentation in light microscopy images from the ISBI cell tracking challenge 2015. Here we won with a large margin on the two most challenging 2D transmitted light datasets.

生成的網(wǎng)絡(luò)適用于各種生物醫(yī)學(xué)分割問(wèn)題。在本文中,我們展示了在EM stacks(EM堆棧)中神經(jīng)結(jié)構(gòu)分割的結(jié)果(這是一個(gè)始于2012年國(guó)際生物成像學(xué)會(huì)(ISBI)的持續(xù)競(jìng)賽),我們的性能超越了Ciresan等人[1]的網(wǎng)絡(luò)。此外,我們還展示了來(lái)自ISBI細(xì)胞跟蹤挑戰(zhàn)賽2015的光鏡圖像中的細(xì)胞分割結(jié)果。在這兩個(gè)最具挑戰(zhàn)性的2D透射光數(shù)據(jù)集上,我們以很大的優(yōu)勢(shì)獲勝。

Network Architecture

The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling. At each downsampling step we double the number of feature channels. Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution ("up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.

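To make the wiring concrete, here is a minimal PyTorch sketch of the architecture as described (ours, not the authors' Caffe code; class and helper names are illustrative): unpadded 3x3 convolutions with ReLUs, 2x2 max pooling, 2x2 up-convolutions that halve the channels, center-cropped skip concatenations, and a final 1x1 convolution.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # two unpadded 3x3 convolutions, each followed by a ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3), nn.ReLU(inplace=True),
    )

def center_crop(feat, target_hw):
    # crop the contracting-path feature map to the decoder's spatial size
    _, _, h, w = feat.shape
    th, tw = target_hw
    y0, x0 = (h - th) // 2, (w - tw) // 2
    return feat[:, :, y0:y0 + th, x0:x0 + tw]

class UNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        chs = [64, 128, 256, 512, 1024]   # channels double at every level
        self.downs = nn.ModuleList()
        prev = in_ch
        for c in chs:
            self.downs.append(double_conv(prev, c))
            prev = c
        self.pool = nn.MaxPool2d(2)
        self.ups, self.dec = nn.ModuleList(), nn.ModuleList()
        for c in reversed(chs[:-1]):
            self.ups.append(nn.ConvTranspose2d(prev, c, 2, stride=2))  # "up-convolution"
            self.dec.append(double_conv(2 * c, c))  # after skip concatenation
            prev = c
        self.head = nn.Conv2d(prev, n_classes, 1)   # 1x1 conv to class scores

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.downs):
            x = block(x)
            if i < len(self.downs) - 1:
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.ups, self.dec, reversed(skips)):
            x = up(x)
            x = torch.cat([center_crop(skip, x.shape[-2:]), x], dim=1)
            x = dec(x)
        return self.head(x)

net = UNet()
out = net(torch.randn(1, 1, 572, 572))  # -> (1, 2, 388, 388), as in Fig. 1
```

Counting the convolutions gives 10 in the contracting path, 8 in the expansive path, 4 up-convolutions, and the final 1x1 convolution: 23 in total, matching the paper.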

To allow a seamless tiling of the output segmentation map (see Figure 2), it is important to select the input tile size such that all 2x2 max-pooling operations are applied to a layer with an even x- and y-size.

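A quick way to check this constraint is to trace the sizes through the network. The helper below (ours, assuming the 4-level layout above: two unpadded 3x3 convolutions per level, 2x2 pooling, 2x2 up-convolutions) returns the output size for a valid input tile size:

```python
def output_size(tile, depth=4):
    """Return the output map size for an input tile, or None if any
    2x2 max pooling would be applied to an odd-sized layer."""
    s = tile
    for _ in range(depth):
        s -= 4              # two unpadded 3x3 convolutions
        if s % 2 != 0:      # pooling needs an even x-y size
            return None
        s //= 2
    s -= 4                  # bottleneck convolutions
    for _ in range(depth):
        s = 2 * s - 4       # up-convolution doubles, then two more convolutions
    return s

print(output_size(572))  # 388, the configuration of Fig. 1
print(output_size(570))  # None: an odd-sized layer would hit a pooling step
```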

Training

The input images and their corresponding segmentation maps are used to train the network with the stochastic gradient descent implementation of Caffe [6]. Due to the unpadded convolutions, the output image is smaller than the input by a constant border width. To minimize the overhead and make maximum use of the GPU memory, we favor large input tiles over a large batch size and hence reduce the batch to a single image. Accordingly we use a high momentum (0.99) such that a large number of the previously seen training samples determine the update in the current optimization step.

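In a modern framework the same choice would look like the following (a hypothetical PyTorch rendering, reusing `net` from the architecture sketch above; the learning rate is an assumed placeholder, as the paper does not state it):

```python
import torch

# batch size 1 (one large input tile per step) with high momentum 0.99,
# so many previously seen tiles influence the current update
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2, momentum=0.99)
```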

The energy function is computed by a pixel-wise soft-max over the final feature map combined with the cross entropy loss function. The soft-max is defined as

p_k(x) = \exp(a_k(x)) \,/\, \sum_{k'=1}^{K} \exp(a_{k'}(x))

where a_k(x) denotes the activation in feature channel k at the pixel position x ∈ Ω, with Ω ⊂ Z². K is the number of classes, and p_k(x) is the approximated maximum-function: p_k(x) ≈ 1 for the k that has the maximum activation a_k(x), and p_k(x) ≈ 0 for all other k. The cross entropy then penalizes, at each position, the deviation of p_{l(x)}(x) from 1.

The energy function is then

E = \sum_{x \in \Omega} w(x) \log(p_{l(x)}(x))

where l : Ω → {1, . . . , K} is the true label of each pixel and w : Ω → R is a weight map that we introduced to give some pixels more importance in the training.

Fig. 3. HeLa cells on glass recorded with DIC (differential interference contrast) microscopy. (a) raw image. (b) overlay with ground truth segmentation. Different colors indicate different instances of the HeLa cells. (c) generated segmentation mask (white: foreground, black: background). (d) map with a pixel-wise loss weight to force the network to learn the border pixels.
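In a modern framework, a per-pixel weighted cross entropy of this form can be sketched as follows (our hypothetical helper, not the paper's Caffe implementation; note that E as printed omits the conventional minus sign, while the code returns the negated form that is minimized):

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, labels, weight_map):
    """logits: (B, K, H, W) activations a_k(x); labels: (B, H, W) true labels l(x);
    weight_map: (B, H, W) precomputed w(x)."""
    log_p = F.log_softmax(logits, dim=1)               # log p_k(x), pixel-wise soft-max
    nll = F.nll_loss(log_p, labels, reduction="none")  # -log p_{l(x)}(x) per pixel
    return (weight_map * nll).sum()                    # weighted sum over x in Omega
```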

We pre-compute the weight map for each ground truth segmentation to compensate the different frequency of pixels from a certain class in the training data set, and to force the network to learn the small separation borders that we introduce between touching cells (see Figure 3c and d).

The separation border is computed using morphological operations. The weight map is then computed as

w(x) = w_c(x) + w_0 \cdot \exp\left( -\frac{(d_1(x) + d_2(x))^2}{2\sigma^2} \right)

where w_c : Ω → R is the weight map to balance the class frequencies, d_1 : Ω → R denotes the distance to the border of the nearest cell and d_2 : Ω → R the distance to the border of the second nearest cell. In our experiments we set w_0 = 10 and σ ≈ 5 pixels.

In deep networks with many convolutional layers and different paths through the network, a good initialization of the weights is extremely important. Otherwise, parts of the network might give excessive activations, while other parts never contribute. Ideally the initial weights should be adapted such that each feature map in the network has approximately unit variance. For a network with our architecture (alternating convolution and ReLU layers) this can be achieved by drawing the initial weights from a Gaussian distribution with a standard deviation of \sqrt{2/N}, where N denotes the number of incoming nodes of one neuron [5]. E.g. for a 3x3 convolution and 64 feature channels in the previous layer, N = 9 · 64 = 576.

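In code, drawing initial weights from a Gaussian with standard deviation \sqrt{2/N} (He initialization [5]) might look like this for the PyTorch sketch above:

```python
import math
import torch.nn as nn

def init_weights(m):
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        # N = incoming nodes of one neuron = kernel area * input channels
        n = m.kernel_size[0] * m.kernel_size[1] * m.in_channels
        nn.init.normal_(m.weight, mean=0.0, std=math.sqrt(2.0 / n))
        if m.bias is not None:
            nn.init.zeros_(m.bias)

net.apply(init_weights)  # e.g. a 3x3 conv over 64 channels gets N = 9 * 64 = 576
```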

Data Augmentation

Data augmentation is essential to teach the network the desired invariance and robustness properties, when only few training samples are available. In case of microscopical images we primarily need shift and rotation invariance as well as robustness to deformations and gray value variations. Especially random elastic deformations of the training samples seem to be the key concept to train a segmentation network with very few annotated images. We generate smooth deformations using random displacement vectors on a coarse 3 by 3 grid. The displacements are sampled from a Gaussian distribution with 10 pixels standard deviation. Per-pixel displacements are then computed using bicubic interpolation. Drop-out layers at the end of the contracting path perform further implicit data augmentation.

當(dāng)只有少量訓(xùn)練樣本可用時(shí),數(shù)據(jù)增強(qiáng)對(duì)于教導(dǎo)網(wǎng)絡(luò)所需的不變性和魯棒性是必不可少的。在顯微鏡圖像的情況下,我們主要需要平移和旋轉(zhuǎn)不變性以及對(duì)變形和灰度變化的魯棒性。特別是對(duì)于只有很少標(biāo)注圖像的分割網(wǎng)絡(luò),隨機(jī)彈性變形訓(xùn)練樣本似乎是訓(xùn)練的關(guān)鍵概念。我們使用在粗糙的3x3網(wǎng)格上的隨機(jī)位移向量來(lái)生成平滑變形。位移是從標(biāo)準(zhǔn)差為10個(gè)像素的高斯分布中采樣得到的。然后使用雙三次插值計(jì)算每個(gè)像素的位移。在收縮路徑結(jié)束時(shí)的Drop-out層執(zhí)行進(jìn)一步的隱式數(shù)據(jù)增強(qiáng)。

Experiments

We demonstrate the application of the u-net to three different segmentation tasks. The first task is the segmentation of neuronal structures in electron microscopic recordings. An example of the data set and our obtained segmentation is displayed in Figure 2. We provide the full result as Supplementary Material. The data set is provided by the EM segmentation challenge [14] that was started at ISBI 2012 and is still open for new contributions. The training data is a set of 30 images (512x512 pixels) from serial section transmission electron microscopy of the Drosophila first instar larva ventral nerve cord (VNC). Each image comes with a corresponding fully annotated ground truth segmentation map for cells (white) and membranes (black). The test set is publicly available, but its segmentation maps are kept secret. An evaluation can be obtained by sending the predicted membrane probability map to the organizers. The evaluation is done by thresholding the map at 10 different levels and computation of the “warping error”, the “Rand error” and the “pixel error” [14].


The u-net (averaged over 7 rotated versions of the input data) achieves without any further pre- or postprocessing a warping error of 0.0003529 (the new best score, see Table 1) and a rand-error of 0.0382.

This is significantly better than the sliding-window convolutional network result by Ciresan et al. [1], whose best submission had a warping error of 0.000420 and a rand error of 0.0504. In terms of rand error the only better performing algorithms on this data set use highly data set specific post-processing methods applied to the probability map of Ciresan et al. [1].

Fig. 4. Result on the ISBI cell tracking challenge. (a) part of an input image of the "PhC-U373" data set. (b) Segmentation result (cyan mask) with manual ground truth (yellow border). (c) input image of the "DIC-HeLa" data set. (d) Segmentation result (random colored masks) with manual ground truth (yellow border).

Table 2. Segmentation results (IOU) on the ISBI cell tracking challenge 2015.

We also applied the u-net to a cell segmentation task in light microscopic images. This segmentation task is part of the ISBI cell tracking challenge 2014 and 2015 [10,13]. The first data set "PhC-U373" contains Glioblastoma-astrocytoma U373 cells on a polyacrylimide substrate recorded by phase contrast microscopy (see Figure 4a,b and Supp. Material). It contains 35 partially annotated training images. Here we achieve an average IOU ("intersection over union") of 92%, which is significantly better than the second best algorithm with 83% (see Table 2). The second data set "DIC-HeLa" are HeLa cells on a flat glass recorded by differential interference contrast (DIC) microscopy (see Figure 3, Figure 4c,d and Supp. Material). It contains 20 partially annotated training images. Here we achieve an average IOU of 77.5%, which is significantly better than the second best algorithm with 46%.


Conclusion

The u-net architecture achieves very good performance on very different biomedical segmentation applications. Thanks to data augmentation with elastic deformations, it only needs very few annotated images and has a very reasonable training time of only 10 hours on a NVidia Titan GPU (6 GB). We provide the full Caffe[6]-based implementation and the trained networks. We are sure that the u-net architecture can be applied easily to many more tasks.

