What does tf.nn.conv2d do in TensorFlow?
I was looking at the TensorFlow docs for tf.nn.conv2d here. But I can't understand what it does or what it is trying to achieve. The docs say:
#1: Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
Now what does that do? Is that element-wise multiplication or plain matrix multiplication? I also could not understand the other two points mentioned in the docs. I have written them below:
#2: Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
#3: For each patch, right-multiplies the filter matrix and the image patch vector.
It would be very helpful if anyone could give an example.
I tried coding a small part of it and printing out the shape of the operation. Still, I can't understand.
I tried something like this:
op = tf.shape(tf.nn.conv2d(tf.random_normal([1,10,10,10]),
              tf.random_normal([2,10,10,10]),
              strides=[1, 2, 2, 1], padding='SAME'))
with tf.Session() as sess:
    result = sess.run(op)
    print(result)
I understand bits and pieces of convolutional neural networks. I studied them here. But the implementation in TensorFlow is not what I expected, so it raised this question.
Edit: So I implemented a much simpler piece of code. But I can't figure out what is going on. I mean, how the result ends up like this. It would be extremely helpful if anyone could tell me what process yields this output.
input = tf.Variable(tf.random_normal([1,2,2,1]))
filter = tf.Variable(tf.random_normal([1,1,1,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    print("input")
    print(input.eval())
    print("filter")
    print(filter.eval())
    print("result")
    result = sess.run(op)
    print(result)
Output:
input
[[[[ 1.60314465]
[-0.55022103]]
[[ 0.00595062]
[-0.69889867]]]]
filter
[[[[-0.59594476]]]]
result
[[[[-0.95538563]
[ 0.32790133]]
[[-0.00354624]
[ 0.41650501]]]]
2D convolution is computed in a similar way to 1D convolution: you slide your kernel over the input, calculate the element-wise multiplications and sum them up. But instead of your kernel and input being arrays, here they are matrices.
In the most basic example there is no padding and stride=1. Let's assume your input is the 4x4 matrix

4 3 1 0
2 1 0 1
1 2 4 1
3 1 0 2

and your kernel is the 3x3 matrix

1 0 1
2 1 0
0 0 1

When you use your kernel you will receive the following 2x2 output

14  6
 6 12

which is calculated in the following way:
- 14 = 4 * 1 + 3 * 0 + 1 * 1 + 2 * 2 + 1 * 1 + 0 * 0 + 1 * 0 + 2 * 0 + 4 * 1
- 6 = 3 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 0 * 1 + 1 * 0 + 2 * 0 + 4 * 0 + 1 * 1
- 6 = 2 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 2 * 1 + 4 * 0 + 3 * 0 + 1 * 0 + 0 * 1
- 12 = 1 * 1 + 0 * 0 + 1 * 1 + 2 * 2 + 4 * 1 + 1 * 0 + 1 * 0 + 0 * 0 + 2 * 1
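Here is a minimal NumPy sketch of that slide-multiply-sum procedure (assuming NumPy is available; the input, kernel, and output are the same as above):

import numpy as np

i = np.array([[4, 3, 1, 0],
              [2, 1, 0, 1],
              [1, 2, 4, 1],
              [3, 1, 0, 2]], dtype=np.float32)
k = np.array([[1, 0, 1],
              [2, 1, 0],
              [0, 0, 1]], dtype=np.float32)

# No padding, stride 1: output is (4 - 3 + 1) x (4 - 3 + 1) = 2 x 2.
out = np.zeros((2, 2), dtype=np.float32)
for r in range(2):
    for c in range(2):
        # element-wise multiply the 3x3 window by the kernel, then sum
        out[r, c] = np.sum(i[r:r+3, c:c+3] * k)

print(out)  # [[14.  6.]
            #  [ 6. 12.]]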
TF's conv2d function calculates convolutions in batches and uses a slightly different format. For the input it is [batch, in_height, in_width, in_channels], and for the kernel it is [filter_height, filter_width, in_channels, out_channels]. So we need to provide the data in the correct format:
import tensorflow as tf

k = tf.constant([
    [1, 0, 1],
    [2, 1, 0],
    [0, 0, 1]
], dtype=tf.float32, name='k')
i = tf.constant([
    [4, 3, 1, 0],
    [2, 1, 0, 1],
    [1, 2, 4, 1],
    [3, 1, 0, 2]
], dtype=tf.float32, name='i')
kernel = tf.reshape(k, [3, 3, 1, 1], name='kernel')
image = tf.reshape(i, [1, 4, 4, 1], name='image')
Afterwards the convolution is computed with:
res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID"))
# VALID means no padding
with tf.Session() as sess:
    print(sess.run(res))
And the result will be equivalent to the one we calculated by hand.
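Running this prints [[14. 6.] [6. 12.]], matching the four sums above.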
For examples with padding/strides, take a look here.
Ok I think this is about the simplest way to explain it all.
Your example is 1 image, size 2x2, with 1 channel. You have 1 filter, with size 1x1, and 1 channel (size is height x width x channels x number of filters).
For this simple case the resulting 2x2, 1-channel image (size 1x2x2x1 = number of images x height x width x channels) is the result of multiplying the filter value by each pixel of the image.
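You can verify this against the numbers in the question: the first output element is 1.60314465 * (-0.59594476) ≈ -0.95538563, which is exactly the filter value times the first input pixel.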
Now let's try more channels:
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([1,1,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
Here the 3x3 image and the 1x1 filter each have 5 channels. The resulting image will be 3x3 with 1 channel (size 1x3x3x1), where the value of each pixel is the dot product across channels of the filter with the corresponding pixel in the input image.
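To convince yourself of this, here is a minimal sketch (assuming NumPy is available alongside the TF1-style API used above) that checks conv2d against an explicit per-pixel dot product:

import numpy as np
import tensorflow as tf

# Random 3x3 image and 1x1 filter, both with 5 channels.
inp = np.random.randn(1, 3, 3, 5).astype(np.float32)
filt = np.random.randn(1, 1, 5, 1).astype(np.float32)

conv = tf.nn.conv2d(tf.constant(inp), tf.constant(filt),
                    strides=[1, 1, 1, 1], padding='VALID')

# Manual version: at every pixel, the dot product of the 5-channel
# pixel vector with the 5-channel filter vector.
manual = np.einsum('bhwc,c->bhw', inp, filt[0, 0, :, 0])[..., np.newaxis]

with tf.Session() as sess:
    print(np.allclose(sess.run(conv), manual, atol=1e-5))  # True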
Now with a 3x3 filter:
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
Here we get a 1x1 image, with 1 channel (size 1x1x1x1). The value is the sum of the 9, 5-element dot products. But you could just call this a 45-element dot product.
Now with a bigger image:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
The output is a 3x3 1-channel image (size 1x3x3x1). Each of these values is a sum of 9, 5-element dot products.
Each output is made by centering the filter on one of the 9 center pixels of the input image, so that none of the filter sticks out. The x's below represent the filter centers for each output pixel.
.....
.xxx.
.xxx.
.xxx.
.....
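(In general, with VALID padding and stride 1 the output size is in_size - filter_size + 1 in each dimension; here 5 - 3 + 1 = 3.)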
Now with "SAME" padding:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
This gives a 5x5 output image (size 1x5x5x1). This is done by centering the filter at each position on the image.
Any of the 5-element dot products where the filter sticks out past the edge of the image get a value of zero.
So the corners are only sums of 4, 5-element dot products.
Now with multiple filters.
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
This still gives a 5x5 output image, but with 7 channels (size 1x5x5x7), where each channel is produced by one of the filters in the set.
Now with strides 2,2:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
Now the result still has 7 channels, but is only 3x3 (size 1x3x3x7).
This is because instead of centering the filters at every point on the image, the filters are centered at every other point on the image, taking steps (strides) of width 2. The x's below represent the filter center for each output pixel, on the input image.
x.x.x
.....
x.x.x
.....
x.x.x
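(With SAME padding, TF computes the output size as ceil(in_size / stride); here ceil(5 / 2) = 3.)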
And of course the first dimension of the input is the number of images, so you can apply it over a batch of 10 images, for example:
input = tf.Variable(tf.random_normal([10,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
This performs the same operation for each image independently, giving a stack of 10 images as the result (size 10x3x3x7).
Just to add to the other answers, you should think of the parameters in
filter = tf.Variable(tf.random_normal([3,3,5,7]))
with the '5' corresponding to the number of channels in each filter. Each filter is a 3D cube with a depth of 5. Your filter depth must match your input image's depth. The last parameter, 7, should be thought of as the number of filters in the batch. Just forget about this being 4D, and instead imagine that you have a set, or a batch, of 7 filters. What you do is create 7 filter cubes with dimensions (3,3,5).
It is a lot easier to visualize in the Fourier domain since convolution becomes point-wise multiplication. For an input image of dimensions (100,100,3) you can rewrite the filter dimensions as
filter = tf.Variable(tf.random_normal([100,100,3,7]))
In order to obtain one of the 7 output feature maps, we simply perform the point-wise multiplication of the filter cube with the image cube, then we sum the results across the channels/depth dimension (here it's 3), collapsing to a 2d (100,100) feature map. Do this with each filter cube, and you get 7 2D feature maps.
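Here is a minimal NumPy sketch of that multiply-and-sum step (shapes chosen to match the example above, with random data just for illustration):

import numpy as np

image = np.random.randn(100, 100, 3)       # input cube
filters = np.random.randn(100, 100, 3, 7)  # 7 filter cubes

# One feature map per filter: point-wise multiply, then sum over depth.
feature_maps = np.stack(
    [(image * filters[:, :, :, f]).sum(axis=2) for f in range(7)],
    axis=-1)
print(feature_maps.shape)  # (100, 100, 7)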
I tried to implement conv2d myself (for my studies). Well, this is what I wrote:
def conv(ix, w):
    # filter shape: [filter_height, filter_width, in_channels, out_channels]
    filter_height = int(w.shape[0])
    filter_width = int(w.shape[1])
    in_channels = int(w.shape[2])
    out_channels = int(w.shape[3])
    ix_height = int(ix.shape[1])
    ix_width = int(ix.shape[2])
    ix_channels = int(ix.shape[3])
    # flatten the filters into a 2-D matrix of shape
    # [filter_height * filter_width * in_channels, out_channels]
    flat_w = tf.reshape(w, [filter_height * filter_width * in_channels, out_channels])
    # extract one flattened patch per output position
    patches = tf.extract_image_patches(
        ix,
        ksizes=[1, filter_height, filter_width, 1],
        strides=[1, 1, 1, 1],
        rates=[1, 1, 1, 1],
        padding='SAME'
    )
    patches_reshaped = tf.reshape(patches, [-1, ix_height, ix_width, filter_height * filter_width * ix_channels])
    # one feature map per output channel: dot product of each patch
    # with the corresponding column of the flattened filter matrix
    feature_maps = []
    for i in range(out_channels):
        feature_map = tf.reduce_sum(tf.multiply(flat_w[:, i], patches_reshaped), axis=3, keep_dims=True)
        feature_maps.append(feature_map)
    features = tf.concat(feature_maps, axis=3)
    return features
I hope I did it properly. I checked it on MNIST and got very close results (though this implementation is slower). I hope this helps you.
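Here is a sketch of a quick sanity check you could run, comparing conv above against tf.nn.conv2d on random data:

ix = tf.random_normal([1, 8, 8, 3])
w = tf.random_normal([3, 3, 3, 4])
# both ops read the same random tensors within a single session.run
diff = tf.reduce_max(tf.abs(
    conv(ix, w) - tf.nn.conv2d(ix, w, strides=[1, 1, 1, 1], padding='SAME')))
with tf.Session() as sess:
    print(sess.run(diff))  # should be near 0, up to float tolerance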
In addition to the other answers: the conv2d op is implemented in C++ (for CPU) or CUDA (for GPU). It flattens and reshapes the data in a certain way and then uses gemm BLAS (CPU) or cuBLAS (CUDA) matrix multiplication.
Reference URL: https://stackoverflow.com/questions/34619177/what-does-tf-nn-conv2d-do-in-tensorflow