Computer vision is a vast field dedicated to analyzing images and videos. While many people think mostly of machine learning models when they hear computer vision, in reality there are many other algorithms that, in some cases, perform better than AI!
In computer vision, the area of feature detection involves identifying distinct regions of interest in an image. These results can then be used to create feature descriptors — numerical vectors representing local image regions. After that, the feature descriptors of multiple photos from the same scene can be combined to perform image matching or even reconstruct a scene.
In this article, we will draw an analogy from calculus to introduce image derivatives and gradients. This will help us understand the logic behind convolutional kernels and, in particular, the Sobel operator — a computer vision filter used to detect edges in an image.
Image intensity
Intensity is one of the main characteristics of an image. Every pixel of the image has three components: R (red), G (green), and B (blue), each taking values between 0 and 255. The higher the value, the brighter the pixel. The intensity of a pixel is a weighted average of its R, G, and B components.
In fact, several standards exist that define different weights. Since we are going to focus on OpenCV, we will use its formula, shown in the code below:
import cv2
import numpy as np
image = cv2.imread('image.png')
# OpenCV loads images with channels in BGR order
B, G, R = cv2.split(image)
# Weighted average of the three channels (the weights used by OpenCV)
grayscale_image = 0.299 * R + 0.587 * G + 0.114 * B
grayscale_image = np.clip(grayscale_image, 0, 255).astype('uint8')
# Mean intensity over all pixels
intensity = grayscale_image.mean()
print(f"Image intensity: {intensity:.2f}")
Grayscale images
Images can be represented using different color channels. If RGB channels represent an original image, applying the intensity formula above will transform it into grayscale format, consisting of only one channel.
Since the sum of weights in the formula is equal to 1, the grayscale image will contain intensity values between 0 and 255, just like the RGB channels.
In OpenCV, RGB channels can be converted to grayscale format using the cv2.cvtColor() function, which is easier than the manual method we just saw above.
import cv2
image = cv2.imread('image.png')
# Convert the BGR image to a single-channel grayscale image
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
intensity = grayscale_image.mean()
print(f"Image intensity: {intensity:.2f}")
Instead of the standard RGB palette, OpenCV uses the BGR palette. The two are identical except that the R and B channels are swapped. For simplicity, in this and the following articles of this series, we are going to use the terms RGB and BGR interchangeably.
If we calculate the image intensity using both methods in OpenCV, we can get slightly different results. That is entirely normal: the cv2.cvtColor function rounds each transformed pixel to the nearest integer, so the resulting mean values differ slightly.
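To see this in practice, here is a minimal sketch comparing the two approaches (the file name 'image.png' is a placeholder for any test image):

import cv2
image = cv2.imread('image.png')
B, G, R = cv2.split(image)
# Manual conversion: float arithmetic, no per-pixel rounding
manual_gray = 0.299 * R + 0.587 * G + 0.114 * B
# Built-in conversion: rounds every pixel to the nearest integer
opencv_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# The two means differ slightly because of the rounding
print(f"Manual intensity: {manual_gray.mean():.4f}")
print(f"OpenCV intensity: {opencv_gray.mean():.4f}")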
Image derivative
Image derivatives are used to measure how fast the pixel intensity changes across the image. Images can be thought of as a function of two arguments, I(x, y), where x and y specify the pixel position and I represents the intensity of that pixel.
We could write formally:

$$\frac{\partial I}{\partial x} = \lim_{h \to 0} \frac{I(x + h, y) - I(x, y)}{h}, \qquad \frac{\partial I}{\partial y} = \lim_{h \to 0} \frac{I(x, y + h) - I(x, y)}{h}$$
But since images live in a discrete space, their derivatives are usually approximated through convolutional kernels:
- For the horizontal X-axis: [-1, 0, 1]
- For the vertical Y-axis: [-1, 0, 1]ᵀ
In other words, we can rewrite the equations above in the following form:

$$\frac{\partial I}{\partial x} \approx I(x + 1, y) - I(x - 1, y), \qquad \frac{\partial I}{\partial y} \approx I(x, y + 1) - I(x, y - 1)$$

Here, x indexes the horizontal direction and y the vertical one.
To better understand the logic behind the kernels, let us refer to the example below.
Example
Suppose we have a 5×5 matrix representing a grayscale image patch. The elements of this matrix show pixel intensities.
To calculate the image derivative, we can use convolutional kernels. The idea is simple: we take a pixel and several pixels in its neighborhood and compute the sum of their element-wise product with a given kernel — a fixed matrix (or vector) of coefficients.
In our case, we will use a three-element vector [-1, 0, 1]. From the example above, let us take a pixel at position (1, 1) whose value is -3, for instance.
Since the kernel size (in yellow) is 1×3, we need the left and right neighbors of -3 to match that size, so we take the vector [4, -3, 2]. Then, by computing the sum of the element-wise product, we get the value of -2:

$$(-1) \cdot 4 + 0 \cdot (-3) + 1 \cdot 2 = -2$$
The value of -2 is the derivative for the initial pixel. Looking closely, we can notice that the derivative at pixel -3 is just the difference between its right neighbor (2) and its left neighbor (4).
Why use complex formulas when we can take the difference between two elements? Indeed, in this example, we could have just calculated the intensity difference between elements I(x + 1, y) and I(x - 1, y). But in reality, we can handle more complex scenarios when we need to detect more sophisticated and less obvious features. For that reason, it is convenient to use the generalization of kernels whose matrices are already known for detecting predefined types of features.
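To make this concrete, here is a tiny NumPy sketch of the same computation, using the neighborhood [4, -3, 2] from the example above:

import numpy as np
kernel = np.array([-1, 0, 1])  # horizontal derivative kernel
neighborhood = np.array([4, -3, 2])  # pixel -3 with its left and right neighbors
# Sum of the element-wise product: (-1)*4 + 0*(-3) + 1*2 = -2
derivative = np.sum(kernel * neighborhood)
print(derivative)  # -2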
Based on the derivative value, we can make some observations:
- If the derivative value is significant in a given image region, it means that the intensity changes drastically there. Otherwise, there are no noticeable changes in terms of brightness.
- If the value of the derivative is positive, it means that from left to right, the image region becomes brighter; if it is negative, the image region becomes darker in the direction from left to right.
Drawing an analogy to linear algebra, kernels can be thought of as linear operators on images that transform local image regions.
Analogously, we can calculate the convolution with the vertical kernel. The procedure will remain the same, except that we now move our window (kernel) vertically across the image matrix.
You may notice that after applying the convolution filter, the original 5×5 image became 3×3. That is expected, because we cannot apply the convolution in the same way to edge pixels (otherwise, we would go out of bounds).
To preserve the image dimensionality, a technique called padding is usually used: the image borders are temporarily extended (interpolated or filled with zeros) so that the convolution can be calculated for edge pixels as well.
By default, libraries like OpenCV automatically pad the borders to guarantee the same dimensionality for input and output images.
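As a quick illustration, the sketch below pads an arbitrary 5×5 patch with cv2.copyMakeBorder (the values are made up for demonstration):

import cv2
import numpy as np
patch = np.arange(25, dtype=np.uint8).reshape(5, 5)  # arbitrary 5x5 patch
# A 1-pixel border lets a 3x3 kernel be applied at the original edge pixels
padded_zeros = cv2.copyMakeBorder(patch, 1, 1, 1, 1, cv2.BORDER_CONSTANT, value=0)
padded_reflect = cv2.copyMakeBorder(patch, 1, 1, 1, 1, cv2.BORDER_REFLECT_101)
print(padded_zeros.shape)  # (7, 7)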
Image gradient
An image gradient shows how fast the intensity (brightness) changes at a given pixel in both directions (X and Y).
Formally, the image gradient can be written as a vector of image derivatives with respect to the X- and Y-axes:

$$\nabla I = \begin{pmatrix} G_x \\ G_y \end{pmatrix} = \begin{pmatrix} \partial I / \partial x \\ \partial I / \partial y \end{pmatrix}$$
Gradient magnitude
The gradient magnitude is the norm of the gradient vector and can be found using the formula below:

$$|\nabla I| = \sqrt{G_x^2 + G_y^2}$$
Gradient orientation
Using the found Gₓ and Gᵧ, it is also possible to calculate the angle of the gradient vector:

$$\theta = \operatorname{atan2}(G_y, G_x)$$
Example
Let us look at how we can manually calculate gradients based on the example above. For that, we will need the computed 3×3 matrices after the convolution kernel was applied.
If we take the top-left pixel, it has the values Gₓ = -2 and Gᵧ = 11. We can easily calculate the gradient magnitude and orientation:

$$|\nabla I| = \sqrt{(-2)^2 + 11^2} = \sqrt{125} \approx 11.18, \qquad \theta = \operatorname{atan2}(11, -2) \approx 100.3°$$
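These values are easy to verify with NumPy:

import numpy as np
gx, gy = -2, 11
magnitude = np.hypot(gx, gy)  # sqrt(gx**2 + gy**2)
orientation = np.degrees(np.arctan2(gy, gx))  # angle of the gradient vector
print(f"Magnitude: {magnitude:.2f}")  # 11.18
print(f"Orientation: {orientation:.1f} degrees")  # 100.3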
For the whole 3×3 matrix, we get the following visualization of gradients:
In practice, it is recommended to normalize kernels before applying them to matrices. We didn’t do it for the sake of simplicity of the example.
Sobel operator
Having learned the fundamentals of image derivatives and gradients, it is now time to take on the Sobel operator, which is used to approximate them. In comparison to the previous 1×3 and 3×1 kernels, the Sobel operator is defined by a pair of 3×3 kernels (one for each axis):

$$G_x = \begin{pmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{pmatrix}, \qquad G_y = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{pmatrix}$$
This gives the Sobel operator an advantage: the previous kernels measured only 1D changes, ignoring neighboring rows and columns, while the Sobel operator takes more information about the local region into account.
Another advantage is that the Sobel operator is more robust to noise. Let us look at the image patch below. If we calculate the derivative around the red element in the center, which lies on the border between dark (2) and bright (7) pixels, we should get 5. The problem is that there is a noisy pixel with the value of 10.
If we apply the horizontal 1D kernel near the red element, it will give significant importance to the pixel value 10, which is a clear outlier. At the same time, the Sobel operator is more robust: it will take 10 into account, as well as the pixels with a value of 7 around it. In some sense, the Sobel operator applies smoothing.
When comparing several kernels, it is recommended to normalize them to ensure they are all on the same scale. One of the most common applications of such operators in image analysis is feature detection.
In the case of the Sobel and Scharr operators, they are commonly used to detect edges — zones where pixel intensity (and its gradient) drastically changes.
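Before moving on to the built-in function, here is a sketch of how the Sobel kernels could be applied manually with cv2.filter2D (the file name is a placeholder; the next section shows the more convenient cv2.Sobel):

import cv2
import numpy as np
image = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)
# Sobel kernels for the horizontal and vertical derivatives
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)
# cv2.filter2D correlates the image with the given kernel
derivative_x = cv2.filter2D(image, cv2.CV_64F, sobel_x)
derivative_y = cv2.filter2D(image, cv2.CV_64F, sobel_y)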
OpenCV
To apply Sobel operators, it is sufficient to use the OpenCV function cv2.Sobel. Let us look at its parameters:
derivative_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
derivative_y = cv2.Sobel(image, cv2.CV_64F, 0, 1)
- The first parameter is an input NumPy image.
- The second parameter (cv2.CV_64F) is the depth of the output image. In general, operators can produce output values outside the interval 0–255, which is why we need to specify the pixel type we want the output image to have.
- The third and fourth parameters represent the order of the derivative in the x and y directions, respectively. In our case, we only want the first derivative in each direction, so we pass the values (1, 0) and (0, 1).
Let us look at the following example, where we are given a Sudoku input image:
Let us apply the Sobel filter:
import cv2
import matplotlib.pyplot as plt
image = cv2.imread("data/input/sudoku.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# First-order derivatives in the x and y directions
derivative_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
derivative_y = cv2.Sobel(image, cv2.CV_64F, 0, 1)
# Average the two derivative maps into a single image
derivative_combined = cv2.addWeighted(derivative_x, 0.5, derivative_y, 0.5, 0)
min_value = min(derivative_x.min(), derivative_y.min(), derivative_combined.min())
max_value = max(derivative_x.max(), derivative_y.max(), derivative_combined.max())
print(f"Value range: ({min_value:.2f}, {max_value:.2f})")
fig, axes = plt.subplots(1, 3, figsize=(16, 6), constrained_layout=True)
axes[0].imshow(derivative_x, cmap='gray', vmin=min_value, vmax=max_value)
axes[0].set_title("Horizontal derivative")
axes[0].axis('off')
axes[1].imshow(derivative_y, cmap='gray', vmin=min_value, vmax=max_value)
axes[1].set_title("Vertical derivative")
axes[1].axis('off')
image_2 = axes[2].imshow(derivative_combined, cmap='gray', vmin=min_value, vmax=max_value)
axes[2].set_title("Combined derivative")
axes[2].axis('off')
fig.colorbar(image_2, ax=axes.ravel().tolist(), orientation='vertical', fraction=0.025, pad=0.04)
plt.savefig("data/output/sudoku.png")
plt.show()
As a result, we can see that horizontal and vertical derivatives detect the lines very well! Additionally, the combination of those lines allows us to detect both types of features:
Scharr operator
Another popular alternative to the Sobel kernel is the Scharr operator:

$$G_x = \begin{pmatrix} -3 & 0 & +3 \\ -10 & 0 & +10 \\ -3 & 0 & +3 \end{pmatrix}, \qquad G_y = \begin{pmatrix} -3 & -10 & -3 \\ 0 & 0 & 0 \\ +3 & +10 & +3 \end{pmatrix}$$
Despite its structural similarity to the Sobel operator, the Scharr kernel achieves higher accuracy in edge detection tasks. It has several important mathematical properties that we will not consider in this article.
OpenCV
The use of the Scharr filter in OpenCV is very similar to what we saw above with the Sobel filter. The only difference is another method name (other parameters are the same):
derivative_x = cv2.Scharr(image, cv2.CV_64F, 1, 0)
derivative_y = cv2.Scharr(image, cv2.CV_64F, 0, 1)
Here is the result we get with the Scharr filter:
In this case, it is challenging to notice the differences in results for both operators. However, by looking at the color map, we can see that the range of possible values produced by the Scharr operator is much larger (-800, +800) than it was for Sobel (-200, +200). That is normal since the Scharr kernel has larger constants.
It is also a good example of why we need to use a special type cv2.CV_64F. Otherwise, the values would have been clipped to the standard range between 0 and 255, and we would have lost valuable information about the gradients.
Note. Applying save methods directly to cv2.CV_64F images will cause an error. To save such images to disk, they must first be converted to another format that contains only values between 0 and 255.
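For instance, one possible approach (a sketch: min-max normalization is just one reasonable strategy, and the file paths are placeholders) is to rescale the values to 0–255 and convert the image to 8-bit:

import cv2
image = cv2.imread("data/input/sudoku.png", cv2.IMREAD_GRAYSCALE)
derivative_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
# Rescale the float64 values into the 0-255 range, then convert to uint8
normalized = cv2.normalize(derivative_x, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("data/output/derivative_x.png", normalized.astype('uint8'))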
Conclusion
By applying calculus fundamentals to computer vision, we have studied essential image properties that allow us to detect intensity peaks in images. This knowledge is helpful since feature detection is a common task in image analysis, especially when there are constraints on image processing or when machine learning algorithms are not used.
We have also looked at an example using OpenCV to see how edge detection works with Sobel and Scharr operators. In the following articles, we will study more advanced algorithms for feature detection and examine OpenCV examples.
All images unless otherwise noted are by the author.