Question
I'm zero-padding my image and my convolution kernel, transforming both to the Fourier domain, multiplying them, and inverting the result to get the convolved image (see the code below). The result, however, is wrong. I was expecting a blurred image, but the output is four shifted quarters of the image. Why is the output wrong, and how can I fix the code?
Input image:
Convolution result:
from PIL import Image,ImageDraw,ImageOps,ImageFilter
import numpy as np
from scipy import fftpack
from copy import deepcopy
import imageio
## STEP 1 ##
im1=Image.open("pika.jpeg")
im1=ImageOps.grayscale(im1)
im1.show()
print("s",im1.size)
## working on this image array
im_W=np.array(im1).T
print("before",im_W.shape)
if(im_W.shape[0]%2==0):
    im_W=np.pad(im_W, ((1,0),(0,0)), 'constant')
if(im_W.shape[1]%2==0):
    im_W=np.pad(im_W, ((0,0),(1,0)), 'constant')
print("after",im_W.shape)
Boxblur=np.array([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
dim=Boxblur.shape[0]
##padding before frequency domain multipication
pad_size=(Boxblur.shape[0]-1)/2
pad_size=int(pad_size)
##padded the image(starts here)
p_im=np.pad(im_W, ((pad_size,pad_size),(pad_size,pad_size)), 'constant')
t_b=(p_im.shape[0]-dim)/2
l_r=(p_im.shape[1]-dim)/2
t_b=int(t_b)
l_r=int(l_r)
##padded the image(ends here)
## padded the kernel(starts here)
k_im=np.pad(Boxblur, ((t_b,t_b),(l_r,l_r)), 'constant')
print("hjhj",k_im)
print("kernel",k_im.shape)
##fourier transforms image and kernel
fft_im = fftpack.fftshift(fftpack.fft2(p_im))
fft_k = fftpack.fftshift(fftpack.fft2(k_im))
con_in_f=fft_im*fft_k
ifft2 = abs(fftpack.ifft2(fftpack.ifftshift(con_in_f)))
convolved=(np.log(abs(ifft2))* 255 / np.amax(np.log(abs(ifft2)))).astype(np.uint8)
final=Image.fromarray(convolved.T)
final.show()
u=im1.filter(ImageFilter.Kernel((3,3), [1/9,1/9,1/9,1/9,1/9,1/9,1/9,1/9,1/9], scale=None, offset=0))
u.show()
Recommended answer
The discrete Fourier transform (DFT) and, by extension, the FFT (which computes the DFT) have their origin in the first element (for an image, the top-left pixel), for both the input and the output. This is the reason we often use the fftshift function on the output, so as to shift the origin to a location more familiar to us (the middle of the image).
This means that we need to transform a 3x3 uniformly weighted blurring kernel to look like this before passing it to the FFT function:
1/9 1/9 0 0 ... 0 1/9
1/9 1/9 0 0 ... 0 1/9
0 0 0 0 ... 0 0
... ... ...
0 0 0 0 ... 0 0
1/9 1/9 0 0 ... 0 1/9
That is, the middle of the kernel is at the top-left corner of the image, with the pixels above and to the left of the middle wrapping around and appearing at the right and bottom ends of the image.
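To make the wrapped layout concrete, here is a minimal sketch that builds it by hand on a hypothetical 6x6 grid. It places the kernel in the top-left corner and rolls it by one pixel with np.roll; this is just one way to illustrate the layout, not the padding method used in the rest of this answer.

```python
import numpy as np

kernel = np.ones((3, 3)) / 9
shape = (6, 6)  # hypothetical small image size, chosen for readability

# Put the kernel in the top-left corner of an otherwise empty array.
wrapped = np.zeros(shape)
wrapped[:3, :3] = kernel

# Roll up and left by one pixel (the kernel radius) so that the kernel
# centre lands on index (0, 0); the first row and column wrap around
# to the bottom and right edges.
wrapped = np.roll(wrapped, shift=(-1, -1), axis=(0, 1))

# Show the pattern with 1 standing for 1/9.
print(np.round(wrapped * 9).astype(int))
```

The printed pattern has its nonzero entries in rows and columns 0, 1 and 5, matching the layout sketched above.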
We can do this using the ifftshift function, applied to the kernel after padding. When padding the kernel, we need to take care that the origin (the middle of the kernel) ends up at location k_im.shape // 2 (integer division) within the padded kernel image k_im. Initially the origin is at [3,3]//2 == [1,1]. Usually, the image whose size we're matching is even in size, for example [256,256]. The origin there will be at [256,256]//2 == [128,128]. Because the total padding (256 - 3 = 253) is then odd, we need to pad a different amount to the left and to the right (and to the top and bottom). We need to be careful computing this padding:
sz = img.shape # the sizes we're matching
kernel = np.ones((3,3)) / 9
sz = (sz[0] - kernel.shape[0], sz[1] - kernel.shape[1]) # total amount of padding
kernel = np.pad(kernel, (((sz[0]+1)//2, sz[0]//2), ((sz[1]+1)//2, sz[1]//2)), 'constant')
kernel = fftpack.ifftshift(kernel)
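As a sanity check on the padding arithmetic, the sketch below wraps the same computation in a helper and tracks where the kernel centre ends up for even, odd and mixed image sizes. The name pad_kernel_for_fft is hypothetical, and numpy.fft.ifftshift is used in place of fftpack.ifftshift as an assumption for portability.

```python
import numpy as np

def pad_kernel_for_fft(kernel, shape):
    """Zero-pad `kernel` to `shape` and move its centre to index (0, 0)."""
    # Total padding per axis, with the extra pixel (if any) going in front,
    # so the kernel centre lands at shape // 2 before the shift.
    total = [s - k for s, k in zip(shape, kernel.shape)]
    pad_width = [((t + 1) // 2, t // 2) for t in total]
    padded = np.pad(kernel, pad_width, "constant")
    return np.fft.ifftshift(padded)

# Mark only the kernel centre so that it can be tracked after padding.
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0

centres = {}
for shape in [(256, 256), (255, 255), (6, 9)]:
    wrapped = pad_kernel_for_fft(kernel, shape)
    centres[shape] = np.argwhere(wrapped == 1.0).tolist()
    print(shape, wrapped.shape, centres[shape])
```

For every size the single nonzero sample should be reported at [0, 0], which is where the DFT expects the kernel origin.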
Note that the input image, img, does not need to be padded (though you can do this if you want to enforce a size for which the FFT is cheaper). There is also no need to apply fftshift to the result of the FFT before the multiplication and then reverse that shift right after; the two shifts cancel out and are redundant. You should use fftshift only if you want to display the Fourier-domain image. Finally, applying logarithmic scaling to the filtered image is wrong: that scaling is useful for displaying Fourier spectra, not the spatial-domain result.
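A quick way to convince yourself that the shifts around the multiplication are redundant is to compute the product both ways. This is a sketch on random arrays of an arbitrary size (7x8, chosen so that one axis is odd), again using numpy.fft as a stand-in for fftpack.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.random((7, 8))
b = rng.random((7, 8))

A = np.fft.fft2(a)
B = np.fft.fft2(b)

# Multiply directly in the Fourier domain.
direct = np.fft.ifft2(A * B)

# Shift both spectra, multiply, then undo the shift before inverting.
shifted = np.fft.ifft2(np.fft.ifftshift(np.fft.fftshift(A) * np.fft.fftshift(B)))

print(np.allclose(direct, shifted))
```

Since fftshift only reorders the samples and both spectra are reordered identically, the element-wise product is unchanged once the shift is undone, so this should print True.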
The resulting code is (I'm using pyplot for display, not using PIL at all):
import numpy as np
from scipy import fftpack
import matplotlib.pyplot as plt
try:
    from scipy.datasets import face  # SciPy >= 1.10 (needs the pooch package)
except ImportError:
    from scipy.misc import face  # older SciPy releases
img = face()[:,:,0]  # first colour channel of the sample image
kernel = np.ones((3,3)) / 9
sz = (img.shape[0] - kernel.shape[0], img.shape[1] - kernel.shape[1]) # total amount of padding
kernel = np.pad(kernel, (((sz[0]+1)//2, sz[0]//2), ((sz[1]+1)//2, sz[1]//2)), 'constant')
kernel = fftpack.ifftshift(kernel)  # move the kernel centre to the top-left pixel
filtered = np.real(fftpack.ifft2(fftpack.fft2(img) * fftpack.fft2(kernel)))
plt.imshow(filtered, vmin=0, vmax=255)
plt.show()
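If you want to check the result numerically rather than by eye, one option is to compare it against a 3x3 mean computed directly in the spatial domain. The sketch below does that on a small synthetic array (an assumption, to avoid downloading a sample image) and uses numpy.fft in place of fftpack. Note that multiplying DFTs gives a circular convolution, so the reference also wraps around at the image edges.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 10))  # synthetic stand-in for a grayscale image

kernel = np.ones((3, 3)) / 9
sz = (img.shape[0] - kernel.shape[0], img.shape[1] - kernel.shape[1])
padded = np.pad(
    kernel,
    (((sz[0] + 1) // 2, sz[0] // 2), ((sz[1] + 1) // 2, sz[1] // 2)),
    "constant",
)
wrapped = np.fft.ifftshift(padded)

filtered = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(wrapped)))

# Reference: periodic 3x3 mean, computed by averaging the nine shifted copies.
reference = sum(
    np.roll(img, (dy, dx), axis=(0, 1))
    for dy in (-1, 0, 1)
    for dx in (-1, 0, 1)
) / 9

print(np.allclose(filtered, reference))
```

This should print True. If the kernel were left centred in the middle of the padded array (no ifftshift), the comparison would fail, because the output would be shifted by half the image size.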
Note that I am taking the real part of the inverse FFT. The imaginary part should contain only values very close to zero, which are the result of rounding errors in the computations. Taking the absolute value, though common, is incorrect. For example, you might want to apply a filter to an image that contains negative values, or apply a filter that produces negative values. Taking the absolute value here would create artefacts. If the output of the inverse FFT contains imaginary values significantly different from zero, then there is an error in the way the filtering kernel was padded.
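The difference between the real part and the absolute value shows up as soon as the filter can produce negative values. The sketch below uses a 3x3 Laplacian kernel on a random array; both are illustrative assumptions and not part of the code above, and numpy.fft is used in place of fftpack.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((16, 16))  # synthetic test image

# A 3x3 Laplacian kernel: its weights sum to zero, so the filtered
# image contains both positive and negative values.
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
sz = (img.shape[0] - 3, img.shape[1] - 3)
padded = np.pad(
    laplacian,
    (((sz[0] + 1) // 2, sz[0] // 2), ((sz[1] + 1) // 2, sz[1] // 2)),
    "constant",
)
wrapped = np.fft.ifftshift(padded)

result = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(wrapped))

# The imaginary part holds rounding error only.
print(np.abs(result.imag).max())
# The real part has negative values, which abs() would flip to positive.
print((result.real < 0).any())
print(np.allclose(np.abs(result), result.real))
```

The first line should print a number many orders of magnitude smaller than the pixel values, the second True, and the third False, which is the artefact described above.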
Also note that the kernel here is tiny, and consequently the blurring effect is tiny too. To better see the effect of the blurring, make a larger kernel, for example np.ones((7,7)) / 49.