
Section A: Long Answer Questions

Attempt any TWO questions.

3 questions · 10 marks each

Question 1 (long answer, 10 marks)

Explain digital image processing fundamentals. Discuss image sampling, quantization, and the basic relationships between pixels.

Digital Image Processing Fundamentals

A digital image is a two-dimensional function $f(x,y)$, where $x$ and $y$ are spatial coordinates and the value of $f$ at any point is the intensity (gray level). When $x$, $y$, and the amplitude are all finite and discrete, the image is digital. Each discrete element is a pixel (picture element).

Image Acquisition Pipeline

A continuous scene is converted to a digital image in two steps:

  1. Sampling – digitizing the spatial coordinates $(x,y)$.
  2. Quantization – digitizing the amplitude (intensity) values.

1. Image Sampling

Sampling divides the continuous image into a grid of $M \times N$ points, producing the pixel array:

$$f(x,y)=\begin{bmatrix} f(0,0) & \cdots & f(0,N-1)\\ \vdots & \ddots & \vdots \\ f(M-1,0) & \cdots & f(M-1,N-1) \end{bmatrix}$$

  • More samples → higher spatial resolution → finer detail.
  • Insufficient sampling causes aliasing. By the Nyquist criterion, the sampling rate must exceed twice the highest spatial frequency present in the image.
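
Aliasing can be illustrated in one dimension. The following is a minimal Python sketch, assuming a cosine pattern of 3 cycles per unit length sampled at only 4 samples per unit (below the Nyquist rate of 6). The helper name `sample_cosine` and the chosen frequencies are illustrative assumptions, not part of the original answer.

```python
import math

def sample_cosine(freq, rate, count):
    """Sample cos(2*pi*freq*t) at `rate` samples per unit length."""
    return [math.cos(2 * math.pi * freq * n / rate) for n in range(count)]

# Illustrative values (assumptions): a 3-cycle pattern sampled at rate 4,
# which is below the Nyquist rate of 2 * 3 = 6 samples per unit.
undersampled = sample_cosine(freq=3, rate=4, count=8)
alias = sample_cosine(freq=1, rate=4, count=8)

# The two sample sequences agree, so after sampling the 3-cycle pattern
# cannot be told apart from a 1-cycle pattern (its alias).
same = all(abs(a - b) < 1e-9 for a, b in zip(undersampled, alias))
print(same)
```

Raising `rate` above 6 makes the two sequences differ, which is the situation the Nyquist criterion guarantees.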

2. Quantization

Each sampled value is mapped to one of $L = 2^{k}$ discrete intensity levels, where $k$ is the number of bits per pixel. For an 8-bit image, $L = 256$ levels (0 to 255). Too few levels produce false contouring. Storing an $M \times N$ image requires $M \times N \times k$ bits.
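
As a rough illustration of uniform quantization and the storage formula, here is a small Python sketch. The function names `quantize` and `storage_bits`, the uniform binning rule, and the sample values are assumptions made for this example.

```python
def quantize(value, k, max_in=255):
    """Map an intensity in [0, max_in] to one of L = 2**k levels (0 .. L-1)."""
    levels = 2 ** k
    # Uniform quantization: split the input range into L equal bins
    # and return the index of the bin that contains `value`.
    return min(levels - 1, value * levels // (max_in + 1))

def storage_bits(m, n, k):
    """Bits needed to store an M x N image with k bits per pixel."""
    return m * n * k

row = [0, 37, 128, 200, 255]            # sample 8-bit intensities (assumed)
print([quantize(v, 3) for v in row])    # -> [0, 1, 4, 6, 7]
print(storage_bits(1024, 1024, 8))      # -> 8388608
```

With only $k=3$ bits, nearby intensities collapse onto the same level, which is the origin of false contouring.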

3. Basic Relationships Between Pixels

For a pixel $p$ at $(x,y)$:

  • Neighbors:
    • 4-neighbors $N_4(p)$: $(x\pm1,y)$, $(x,y\pm1)$.
    • Diagonal neighbors $N_D(p)$: $(x\pm1,y\pm1)$.
    • 8-neighbors $N_8(p) = N_4(p) \cup N_D(p)$.
  • Adjacency: 4-, 8-, and m-adjacency (mixed adjacency, used to remove path ambiguity).
  • Connectivity / Region / Boundary: two pixels are adjacent if they are neighbors and both values belong to a similarity set $V$; they are connected if a path of adjacent pixels joins them. A maximal connected set is a connected component, a connected set of pixels forms a region, and the boundary of a region is the set of its pixels that have at least one neighbor outside the region.
  • Distance measures between $p(x,y)$ and $q(s,t)$:
    • Euclidean: $D_e = \sqrt{(x-s)^2+(y-t)^2}$
    • City-block ($D_4$): $|x-s|+|y-t|$
    • Chessboard ($D_8$): $\max(|x-s|,|y-t|)$
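
The neighborhoods and distance measures above can be sketched in Python as follows. The function names are illustrative assumptions, and the neighborhood helpers do not clip coordinates that fall outside the image border.

```python
import math

def n4(x, y):
    """4-neighbors: horizontal and vertical neighbors of (x, y)."""
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}

def nd(x, y):
    """Diagonal neighbors of (x, y)."""
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}

def n8(x, y):
    """8-neighbors: union of the 4-neighbors and the diagonal neighbors."""
    return n4(x, y) | nd(x, y)

def d_euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d_city_block(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_chessboard(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (2, 3), (5, 7)     # sample pixel coordinates (assumed)
print(d_euclidean(p, q))  # -> 5.0
print(d_city_block(p, q)) # -> 7
print(d_chessboard(p, q)) # -> 4
```

For any pair of pixels, $D_8 \le D_e \le D_4$, which the sample values reflect.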

Conclusion

Sampling and quantization determine an image's spatial resolution and intensity resolution respectively, while pixel relationships (neighborhood, adjacency, connectivity, distance) form the basis of region-level operations such as segmentation and morphology.

Topic: fundamentals

Question 2 (long answer, 10 marks)

What is histogram processing? Explain histogram equalization and histogram specification with mathematical formulation and an example.

Histogram Processing

The histogram of a digital image with intensity levels in $[0, L-1]$ is the discrete function

$$h(r_k) = n_k$$

where $r_k$ is the $k$-th intensity and $n_k$ is the number of pixels with that intensity. The normalized histogram is $p(r_k)=n_k/MN$, which approximates the probability of occurrence of level $r_k$. Histogram processing modifies this distribution to improve contrast.

Histogram Equalization

Goal: spread intensities to produce an approximately uniform histogram, maximizing global contrast. The transformation is the cumulative distribution function (CDF):

$$s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j) = \frac{L-1}{MN}\sum_{j=0}^{k} n_j$$

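Example. A 3-bit ($L=8$) $64\times64$ image ($MN=4096$):

| $r_k$ | $n_k$ | $p_r(r_k)$ | CDF | $s_k=(L-1)\,\text{CDF}$ | round |
|---|---|---|---|---|---|
| 0 | 790 | 0.19 | 0.19 | 1.33 | 1 |
| 1 | 1023 | 0.25 | 0.44 | 3.08 | 3 |
| 2 | 850 | 0.21 | 0.65 | 4.55 | 5 |
| 3 | 656 | 0.16 | 0.81 | 5.67 | 6 |
| 4 | 329 | 0.08 | 0.89 | 6.23 | 6 |
| 5 | 245 | 0.06 | 0.95 | 6.65 | 7 |
| 6 | 122 | 0.03 | 0.98 | 6.86 | 7 |
| 7 | 81 | 0.02 | 1.00 | 7.00 | 7 |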

Pixels are remapped using the round column, stretching the originally dark image across the full range.
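
The worked example can be reproduced with a few lines of Python. This is a sketch under the assumption that exact pixel counts are used rather than the two-decimal probabilities in the table, so an intermediate value may differ slightly (for instance 1.35 instead of 1.33 for $r_0$) while the rounded output levels agree. The function name `equalize_levels` is hypothetical.

```python
def equalize_levels(counts):
    """Return the equalized output level s_k for every input level r_k."""
    levels = len(counts)          # L
    total = sum(counts)           # MN
    mapping, running = [], 0
    for n_k in counts:
        running += n_k                           # cumulative pixel count
        s_k = (levels - 1) * running / total     # (L-1) * CDF
        mapping.append(round(s_k))
    return mapping

counts = [790, 1023, 850, 656, 329, 245, 122, 81]   # n_k from the example
mapping = equalize_levels(counts)
print(mapping)   # -> [1, 3, 5, 6, 6, 7, 7, 7]
```

Each pixel with input level `r` is then replaced by `mapping[r]`.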

Histogram Specification (Matching)

Instead of a uniform output, we force the histogram to match a specified target shape $p_z(z)$. Steps:

  1. Equalize the input: $s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j)$.
  2. Equalize the target: $G(z_q) = (L-1)\sum_{i=0}^{q} p_z(z_i)$.
  3. For each $s_k$, find $z_q = G^{-1}(s_k)$, i.e. the level $z_q$ whose equalized value $G(z_q)$ is closest to $s_k$.

This gives the mapping $z = G^{-1}(T(r))$.
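
The three steps can be sketched in Python as below. The input probabilities are those of the equalization example; the target histogram `p_z` is an illustrative assumption chosen for this sketch, as are the function names and the tie-breaking rule (the smallest matching level is taken).

```python
def cdf_levels(probs):
    """Rounded equalization transform (L-1) * CDF for each level."""
    levels = len(probs)
    out, running = [], 0.0
    for p in probs:
        running += p
        out.append(round((levels - 1) * running))
    return out

def match_histogram(p_r, p_z):
    """Return z = G^{-1}(T(r)) for every input level r."""
    s = cdf_levels(p_r)   # step 1: equalize the input
    g = cdf_levels(p_z)   # step 2: equalize the target
    mapping = []
    for s_k in s:
        # Step 3: pick the level z whose G(z) is closest to s_k;
        # ties are broken in favour of the smallest z (an assumption).
        z = min(range(len(g)), key=lambda i: (abs(g[i] - s_k), i))
        mapping.append(z)
    return mapping

p_r = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]   # input histogram
p_z = [0.00, 0.00, 0.00, 0.15, 0.20, 0.30, 0.20, 0.15]   # assumed target
mapping = match_histogram(p_r, p_z)
print(mapping)   # -> [3, 4, 5, 6, 6, 7, 7, 7]
```

Because the assumed target has no pixels in levels 0 to 2, the darkest input level is pushed up to level 3.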

Equalization vs Specification

| Aspect | Equalization | Specification |
|---|---|---|
| Output histogram | (Approx.) uniform | User-defined shape |
| Control | Automatic | Targeted |
| Use | General contrast enhancement | When a particular tonal distribution is desired |

Topics: histogram, enhancement

Question 3 (long answer, 10 marks)

Explain the Fourier transform in image processing. Discuss frequency-domain filtering using ideal and Butterworth filters with examples.

Fourier Transform in Image Processing

The Fourier transform (FT) decomposes an image into its sinusoidal frequency components, moving from the spatial domain $f(x,y)$ to the frequency domain $F(u,v)$. Slowly varying regions map to low frequencies; edges and fine detail map to high frequencies.

The 2-D Discrete Fourier Transform (DFT) of an $M\times N$ image:

$$F(u,v)=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi\left(\frac{ux}{M}+\frac{vy}{N}\right)}$$

and its inverse:

$$f(x,y)=\frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)\, e^{\,j2\pi\left(\frac{ux}{M}+\frac{vy}{N}\right)}$$
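
The transform pair can be checked numerically with a direct (non-fast) implementation. This is a minimal Python sketch for tiny images only; the function names `dft2` and `idft2` and the 2×2 test image are assumptions for illustration, and a practical system would use an FFT instead.

```python
import cmath

def dft2(f):
    """Direct 2-D DFT of a list-of-lists image f (M rows, N columns)."""
    m, n = len(f), len(f[0])
    return [[sum(f[x][y] * cmath.exp(-2j * cmath.pi * (u * x / m + v * y / n))
                 for x in range(m) for y in range(n))
             for v in range(n)] for u in range(m)]

def idft2(F):
    """Direct inverse 2-D DFT, including the 1/(MN) factor."""
    m, n = len(F), len(F[0])
    return [[sum(F[u][v] * cmath.exp(2j * cmath.pi * (u * x / m + v * y / n))
                 for u in range(m) for v in range(n)) / (m * n)
             for y in range(n)] for x in range(m)]

f = [[1, 2], [3, 4]]          # tiny sample image (assumed)
F = dft2(f)
g = idft2(F)

print(round(F[0][0].real))    # F(0,0) is the sum of all pixels: 10
print([[round(v.real) for v in row] for row in g])   # recovers f
```

The term $F(0,0)$ equals the sum of all pixel values, i.e. $MN$ times the mean intensity.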

Frequency-Domain Filtering

Filtering is performed by point-wise multiplication with a transfer function $H(u,v)$:

$$G(u,v)=H(u,v)\,F(u,v), \qquad g(x,y)=\mathcal{F}^{-1}\{G(u,v)\}$$

General steps: (1) shift origin by multiplying $f(x,y)$ by $(-1)^{x+y}$, (2) compute the DFT, (3) multiply by $H(u,v)$, (4) inverse DFT, (5) take real part and undo the shift. Let $D(u,v)$ be the distance of $(u,v)$ from the centred origin and $D_0$ the cutoff.

1. Ideal Low-Pass Filter (ILPF)

$$H(u,v)=\begin{cases} 1 & D(u,v)\le D_0 \\ 0 & D(u,v)>D_0 \end{cases}$$

It passes all frequencies inside a circle of radius $D_0$ and blocks the rest. Although it gives the sharpest possible cut, its abrupt transition causes ringing artifacts (concentric ripples) because its spatial-domain response is sinc-like, with oscillating side lobes.

2. Butterworth Low-Pass Filter (BLPF)

$$H(u,v)=\frac{1}{1+\left[\dfrac{D(u,v)}{D_0}\right]^{2n}}$$

where $n$ is the filter order. The transition is smooth and controllable:

  • Low order $n$ → gentle roll-off, little ringing.
  • High order $n$ → approaches the ideal filter (more ringing).

Examples / Effect

  • Low-pass (ILPF/BLPF) → blurring and noise smoothing (removes high frequencies).
  • High-pass (obtained as $H_{HP}=1-H_{LP}$) → edge sharpening (removes low frequencies). The Butterworth high-pass is $H(u,v)=\dfrac{1}{1+[D_0/D(u,v)]^{2n}}$.
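
The transfer functions above can be compared numerically. The sketch below is a hedged illustration in Python: the function names, the cutoff $D_0 = 30$, and the sample distances are assumptions, and `distance` assumes the spectrum has been centred at $(M/2, N/2)$.

```python
import math

def distance(u, v, m, n):
    """Distance D(u, v) from the centre of an M x N centred spectrum."""
    return math.hypot(u - m / 2, v - n / 2)

def ideal_lowpass(d, d0):
    return 1.0 if d <= d0 else 0.0

def butterworth_lowpass(d, d0, order):
    return 1.0 / (1.0 + (d / d0) ** (2 * order))

def butterworth_highpass(d, d0, order):
    if d == 0:
        return 0.0   # the DC term is fully blocked
    return 1.0 / (1.0 + (d0 / d) ** (2 * order))

d0 = 30.0   # assumed cutoff
for d in (15.0, 30.0, 60.0):
    print(d,
          ideal_lowpass(d, d0),
          round(butterworth_lowpass(d, d0, 1), 4),
          round(butterworth_lowpass(d, d0, 4), 4))
```

At $D = D_0$ the Butterworth low-pass equals 0.5 for every order, while a higher order pushes the values on either side of the cutoff toward the 1 and 0 of the ideal filter.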

Comparison

| Filter | Transition | Ringing | Control |
|---|---|---|---|
| Ideal | Abrupt (brick-wall) | Severe | None |
| Butterworth | Smooth | Negligible–moderate (order-dependent) | Via order $n$ |

Thus the Butterworth filter is generally preferred in practice because it trades a small amount of sharpness for greatly reduced ringing.

Topics: fourier, filtering

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions · 5 marks each

Question 4 (short answer, 5 marks)

Define resolution and aspect ratio of an image.

Resolution is the amount of detail an image holds, given by its number of pixels, written as width × height (e.g. $1920\times1080$). Spatial resolution refers to the smallest discernible detail (pixel density, often dpi), while intensity (gray-level) resolution refers to the number of distinct intensity levels $L=2^k$. Higher resolution means finer detail.

Aspect ratio is the proportional relationship between an image's width and height, expressed as $W:H$ (e.g. 4:3, 16:9). It must be preserved when resizing to avoid distortion (stretching/squashing).
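
A short Python sketch of both ideas follows: reducing a pixel resolution to its aspect ratio, and choosing a new height that preserves that ratio. The function names are hypothetical and the sizes are sample values.

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce width:height to lowest terms."""
    g = gcd(width, height)
    return width // g, height // g

def height_for_width(width, height, new_width):
    """Height that keeps the aspect ratio when the width becomes new_width."""
    return round(height * new_width / width)

print(aspect_ratio(1920, 1080))            # -> (16, 9)
print(aspect_ratio(1024, 768))             # -> (4, 3)
print(height_for_width(1920, 1080, 1280))  # -> 720
```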

Topic: fundamentals

Question 5 (short answer, 5 marks)

Explain log transformation and its use.

Log Transformation

The log transformation is a point (intensity) transformation given by

$$s = c\,\log(1 + r)$$

where $r$ is the input intensity ($r \ge 0$), $s$ is the output, and $c$ is a scaling constant (often $c = \frac{L-1}{\log(1+R_{max})}$ to fit the output to $[0,L-1]$).

Characteristics & Use

  • It maps a narrow range of low (dark) input values to a wider range of output values, and compresses high (bright) values.
  • Therefore it expands the dark regions and brightens the image while compressing the dynamic range.
  • Main use: displaying images with a very large dynamic range, such as the Fourier spectrum, where a few very large values would otherwise dominate and hide low-magnitude detail. Applying the log lets all components become visible on a normal display.
  • The inverse-log (exponential) transformation does the opposite, enhancing bright regions.
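
The behaviour described in the bullets can be seen with a minimal Python sketch. It assumes an 8-bit output range ($L = 256$), the natural logarithm, and the scaling constant $c$ given above; the function name `log_transform` and the sample intensities are illustrative.

```python
import math

def log_transform(r, r_max, levels=256):
    """s = c * log(1 + r), with c chosen so that r_max maps to L - 1."""
    c = (levels - 1) / math.log(1 + r_max)
    return round(c * math.log(1 + r))

samples = [0, 10, 50, 128, 255]   # assumed input intensities
outputs = [log_transform(r, r_max=255) for r in samples]
print(list(zip(samples, outputs)))
```

Dark inputs such as 10 are mapped far above their original value, while the difference between 128 and 255 shrinks, which is the expansion of dark values and compression of bright values noted above.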

Topic: enhancement

Question 6 (short answer, 5 marks)

What is convolution in spatial filtering?

Convolution in Spatial Filtering

Spatial filtering processes a pixel using a small mask (kernel) $w$ of size $m \times n$ slid over the image. Convolution is the core operation that computes each output pixel as a weighted sum of the input pixel and its neighbours.

For a kernel of size $(2a+1)\times(2b+1)$, the 2-D convolution is

$$g(x,y) = \sum_{s=-a}^{a}\sum_{t=-b}^{b} w(s,t)\, f(x-s,\,y-t)$$

Key Points

  • The kernel is rotated by 180° before sliding (this distinguishes convolution from correlation, which uses the kernel as-is).
  • For each position, multiply overlapping kernel and image values, sum them, and place the result at the centre pixel.
  • Borders are handled by padding (zero, replicate, or mirror).

Example: convolving with $\frac{1}{9}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}$ averages each $3\times3$ neighbourhood, producing a smoothing (blur) effect. Different kernels produce smoothing, sharpening, or edge detection. Convolution is linear and shift-invariant, which is why the same kernel applies uniformly across the image.
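
A direct implementation of the convolution sum is sketched below in Python, assuming zero padding at the borders and an odd-sized kernel. The function name `convolve2d` and the test images are assumptions for illustration.

```python
def convolve2d(image, kernel):
    """2-D convolution with zero padding; output has the input's size."""
    rows, cols = len(image), len(image[0])
    a, b = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            acc = 0.0
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    xs, yt = x - s, y - t     # f(x - s, y - t): the kernel flip
                    if 0 <= xs < rows and 0 <= yt < cols:
                        acc += kernel[s + a][t + b] * image[xs][yt]
            out[x][y] = acc
    return out

impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
box = [[1 / 9] * 3 for _ in range(3)]

# Convolving a unit impulse reproduces the kernel itself (no flip visible),
# whereas correlation would return the kernel rotated by 180 degrees.
print(convolve2d(impulse, kernel))
# The box kernel spreads a single bright pixel of value 9 into a 3x3 block of 1s.
print(convolve2d([[0, 0, 0], [0, 9, 0], [0, 0, 0]], box))
```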

Topic: filtering

Question 7 (short answer, 5 marks)

Differentiate between the DFT and the DCT.

DFT vs DCT

| Feature | DFT (Discrete Fourier Transform) | DCT (Discrete Cosine Transform) |
|---|---|---|
| Basis functions | Complex exponentials $e^{-j2\pi ux/N}$ (sine + cosine) | Real cosine functions only |
| Output | Complex (magnitude + phase) | Real-valued |
| Symmetry assumed | Periodic extension of the signal | Even (mirror-symmetric) extension |
| Energy compaction | Lower | High – energy concentrated in few low-frequency coefficients |
| Boundary artifacts | Can show discontinuities/ringing at edges | Reduced, due to symmetric extension |
| Computation | More (complex arithmetic) | Less (real arithmetic) |
| Typical use | Frequency-domain filtering, spectral analysis, convolution | Compression (JPEG, MPEG) |

1-D DCT: $C(u)=\alpha(u)\displaystyle\sum_{x=0}^{N-1} f(x)\cos\!\left[\frac{(2x+1)u\pi}{2N}\right]$.
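
The 1-D DCT formula can be tried out with the Python sketch below. The text does not define $\alpha(u)$, so this example assumes the common orthonormal choice $\alpha(0)=\sqrt{1/N}$ and $\alpha(u)=\sqrt{2/N}$ for $u>0$; the function name `dct1d` and the sample signals are also assumptions.

```python
import math

def dct1d(f):
    """Orthonormal 1-D DCT (type II) of a sequence f."""
    n = len(f)
    out = []
    for u in range(n):
        # Assumed normalization: sqrt(1/N) for u = 0, sqrt(2/N) otherwise.
        alpha = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        total = sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    for x in range(n))
        out.append(alpha * total)
    return out

flat = dct1d([5] * 8)              # constant signal
ramp = dct1d([1, 2, 3, 4])         # slowly varying signal
print([round(c, 3) for c in flat])
print([round(c, 3) for c in ramp])
```

All coefficients are real, and for the constant signal only $C(0)$ is non-zero, a simple instance of the energy compaction listed in the table.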

Summary: The DFT is general-purpose (complex, used for filtering/analysis), whereas the DCT is real-valued with superior energy compaction, making it the transform of choice for image and video compression.

Topic: transforms

Question 8 (short answer, 5 marks)

Explain watershed segmentation.

Watershed Segmentation

The watershed transform is a region-based segmentation method that treats a grayscale image as a topographic surface, where intensity is interpreted as elevation. Bright pixels are hills/ridges and dark pixels are valleys (catchment basins).

Flooding Analogy

Imagine piercing a hole at each local minimum and slowly flooding the surface from below:

  1. Water rises uniformly and fills each catchment basin.
  2. Where waters from two different basins are about to merge, a dam (watershed line) is built.
  3. When flooding completes, the dams form closed contours that segment the image into regions.

It is usually applied to a gradient image, so basins correspond to homogeneous regions and watershed lines lie along strong edges.

Over-segmentation and the Fix

Noise and small intensity fluctuations create many spurious minima, causing over-segmentation (too many tiny regions). This is controlled by marker-controlled watershed: internal and external markers are defined first, and flooding starts only from those markers, yielding meaningful regions.

Advantages: produces continuous, closed boundaries; intuitive. Disadvantage: sensitive to noise → over-segmentation without markers.

Topic: segmentation

Question 9 (short answer, 5 marks)

What is the Canny edge detector?

Canny Edge Detector

The Canny edge detector (J. Canny, 1986) is an optimal multi-stage algorithm designed to satisfy three criteria: good detection (low error rate), good localization (edges close to true edges), and single response per edge. Its steps are:

  1. Smoothing – convolve the image with a Gaussian filter to reduce noise.
  2. Gradient computation – find intensity gradient magnitude and direction (e.g. using Sobel):
     $$M(x,y)=\sqrt{G_x^2+G_y^2}, \qquad \theta=\tan^{-1}\!\left(\frac{G_y}{G_x}\right)$$
  3. Non-maximum suppression – thin the edges by keeping only pixels that are local maxima of $M$ along the gradient direction, suppressing all others.
  4. Double thresholding – classify pixels using two thresholds $T_{high}$ and $T_{low}$ into strong ($>T_{high}$), weak (between $T_{low}$ and $T_{high}$), and non-edge ($<T_{low}$).
  5. Edge tracking by hysteresis – keep weak edges only if they are connected to a strong edge; discard isolated weak edges.
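
The last two stages (double thresholding and hysteresis) can be sketched in Python as follows. This is an illustration only: it assumes the gradient magnitude has already been computed and thinned, uses 8-connectivity to decide which weak pixels touch a strong one, and the function name `hysteresis`, the thresholds, and the sample magnitudes are all assumptions.

```python
from collections import deque

def hysteresis(mag, t_low, t_high):
    """Keep strong pixels, plus weak pixels 8-connected to a strong pixel."""
    rows, cols = len(mag), len(mag[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every strong pixel (magnitude > t_high).
    for x in range(rows):
        for y in range(cols):
            if mag[x][y] > t_high:
                edges[x][y] = True
                queue.append((x, y))
    # Grow outward through weak pixels (t_low <= magnitude <= t_high).
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < rows and 0 <= ny < cols
                        and not edges[nx][ny] and mag[nx][ny] >= t_low):
                    edges[nx][ny] = True
                    queue.append((nx, ny))
    return edges

mag = [[90, 40, 10, 0],
       [0,  0,  0,  0],
       [0,  0,  0, 40]]
edges = hysteresis(mag, t_low=30, t_high=80)
print(edges[0])   # -> [True, True, False, False]
print(edges[2])   # -> [False, False, False, False]
```

The weak pixel next to the strong one survives, while the isolated weak pixel in the bottom-right corner is discarded.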

Significance

Canny produces thin, continuous, well-localized edges and is robust to noise, making it one of the most widely used edge detectors compared with simpler operators (Sobel, Prewitt) that lack thinning and hysteresis.

Topic: edge-detection

Question 10 (short answer, 5 marks)

Explain dilation and erosion with examples.

Dilation and Erosion (Morphological Operations)

Morphological operations process a binary (or grayscale) image $A$ with a small structuring element (SE) $B$, based on set theory.

Dilation $A \oplus B$

$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \varnothing \,\}$$

The SE is slid over the image; if it touches (overlaps) any foreground pixel, the centre is set to foreground. Effect: grows / thickens objects, fills small holes and gaps, and connects nearby components.

Example: with a $3\times3$ SE, an isolated foreground pixel expands into a $3\times3$ block; a one-pixel gap in a line is bridged.

Erosion $A \ominus B$

$$A \ominus B = \{\, z \mid (B)_z \subseteq A \,\}$$

The centre stays foreground only if the entire SE fits inside the object. Effect: shrinks / thins objects, removes small isolated noise pixels and thin protrusions.

Example: with a $3\times3$ SE, an object's boundary layer is stripped off by one pixel; an isolated single foreground pixel is removed entirely.

Duality and Combinations

They are duals: $(A \ominus B)^c = A^c \oplus \hat{B}$.

  • Opening = erosion then dilation ($A\circ B$) → removes small objects/noise, smooths contours.
  • Closing = dilation then erosion ($A\bullet B$) → fills small holes and gaps.
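
The set definitions translate almost directly into Python sets. The sketch below is a simplified illustration: foreground pixels are stored as a set of `(row, col)` tuples on an unbounded grid (no image border), the structuring element is a symmetric $3\times3$ square so reflection has no effect, and all names are assumptions.

```python
def dilate(a, b):
    """A dilated by B: every foreground pixel stamped with the SE offsets."""
    return {(x + dx, y + dy) for (x, y) in a for (dx, dy) in b}

def erode(a, b):
    """A eroded by B: positions where the whole translated SE lies inside A."""
    candidates = {(x - dx, y - dy) for (x, y) in a for (dx, dy) in b}
    return {(x, y) for (x, y) in candidates
            if all((x + dx, y + dy) in a for (dx, dy) in b)}

def opening(a, b):
    return dilate(erode(a, b), b)

def closing(a, b):
    return erode(dilate(a, b), b)

SQUARE3 = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

dot = {(5, 5)}                                    # isolated pixel
block = {(x, y) for x in range(5) for y in range(5)}
broken_line = {(0, 0), (0, 1), (0, 3), (0, 4)}    # one-pixel gap at (0, 2)

print(len(dilate(dot, SQUARE3)))            # -> 9 (a 3x3 block)
print(erode(dot, SQUARE3))                  # -> set()
print(len(erode(block, SQUARE3)))           # -> 9 (inner 3x3 of the 5x5 block)
print(sorted(closing(broken_line, SQUARE3)))
```

The closing of the broken line returns all five pixels of row 0, so the one-pixel gap is filled, while opening the isolated pixel removes it.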

Topic: morphology

Question 11 (short answer, 5 marks)

What is the JPEG compression standard?

JPEG Compression Standard

JPEG (Joint Photographic Experts Group) is the most widely used lossy compression standard for continuous-tone still images. It exploits the human visual system's reduced sensitivity to high-frequency detail and to colour (chrominance).

Baseline Encoding Steps

  1. Colour transform & subsampling – convert RGB to $YC_bC_r$; subsample the chroma channels (e.g. 4:2:0) since the eye is less sensitive to colour detail.
  2. Block splitting – divide each channel into $8\times8$ pixel blocks; level-shift by subtracting 128.
  3. Forward DCT – apply the 2-D Discrete Cosine Transform to each block, concentrating energy in the top-left (low-frequency) coefficients.
  4. Quantization – divide each DCT coefficient by a value from a quantization table and round. This is the lossy step; higher-frequency coefficients (less visible) are quantized more coarsely. A quality factor scales the table.
  5. Entropy coding – reorder coefficients in a zig-zag scan (groups zeros together), apply run-length encoding plus DPCM on the DC term, then Huffman (or arithmetic) coding.

Decoding

The inverse steps (entropy decode → dequantize → inverse DCT → colour reconstruct) recover the image. Because quantization discards data, the result is an approximation; lower quality → higher compression and more blocking artifacts.
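
The quantization, zig-zag scan, and dequantization steps described above can be sketched in Python. This is a simplified illustration, not a JPEG codec: the quantization table `8 + 4 * (r + c)` is a made-up table that merely grows toward high frequencies (it is not the standard JPEG luminance table), the DCT coefficients are sample values, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in zig-zag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals (constant r + c); odd diagonals run with the
    # row increasing, even diagonals with the row decreasing.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def quantize_block(block, table):
    """Lossy step: divide each coefficient by its table entry and round."""
    return [[round(c / q) for c, q in zip(brow, trow)]
            for brow, trow in zip(block, table)]

def dequantize_block(levels, table):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[l * q for l, q in zip(lrow, trow)]
            for lrow, trow in zip(levels, table)]

N = 8
table = [[8 + 4 * (r + c) for c in range(N)] for r in range(N)]  # assumed table
block = [[0.0] * N for _ in range(N)]                             # sample DCT block
block[0][0], block[0][1], block[1][0], block[7][7] = -415.0, -31.0, 5.0, 3.0

levels = quantize_block(block, table)
scanned = [levels[r][c] for r, c in zigzag_order(N)]
restored = dequantize_block(levels, table)

print(zigzag_order(N)[:6])   # -> [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(scanned[:4])           # -> [-52, -3, 0, 0]
print(restored[0][0], restored[0][1])   # -> -416 -36
```

After the scan, the non-zero values sit at the front and a long run of zeros follows, which is what makes the run-length and Huffman stages effective. The restored values (-416, -36) differ from the originals (-415, -31), showing where the loss occurs.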

Note: JPEG also defines a lossless mode, while JPEG 2000 uses the wavelet transform instead of the DCT.

Topic: compression

Question 12 (short answer, 5 marks)

Write short notes on colour image processing models.

Colour Image Processing Models

A colour model is a specification of a coordinate system in which each colour is represented as a point. The main models used in image processing are:

1. RGB (Red, Green, Blue)

An additive model where colours are formed by combining the three primaries. Represented as a unit cube; suited to hardware such as monitors, cameras, and scanners. A pixel is a triplet $(R,G,B)$; 24-bit colour gives $\approx 16.7$ million colours.

2. CMY / CMYK (Cyan, Magenta, Yellow, blacK)

A subtractive model used for printing. Related to RGB (with values normalized to $[0,1]$) by $\begin{bmatrix}C\\M\\Y\end{bmatrix}=\begin{bmatrix}1\\1\\1\end{bmatrix}-\begin{bmatrix}R\\G\\B\end{bmatrix}$. Black (K) is added for true black and ink economy.

3. HSI (Hue, Saturation, Intensity)

Separates colour information (hue, saturation) from intensity. Because it decouples intensity from chromaticity, it matches human colour perception and is ideal for image-processing algorithms (e.g. you can enhance brightness or segment by hue independently). Related variants: HSV/HSB and HSL.

4. $YC_bC_r$ / YUV

$Y$ = luminance, $C_b, C_r$ = chrominance. Used in video and JPEG/MPEG compression because chroma can be subsampled with little perceived loss.
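
Conversions between these models can be sketched in Python. The CMY relation is the one given above; the HSI formulas are the commonly taught arccos-based ones and the $YC_bC_r$ coefficients are the full-range BT.601 values used by JFIF, both of which are assumptions here since the text does not list them. Function names are illustrative.

```python
import math

def rgb_to_cmy(r, g, b):
    """RGB in [0, 1] -> CMY in [0, 1]."""
    return 1 - r, 1 - g, 1 - b

def rgb_to_hsi(r, g, b):
    """RGB in [0, 1] -> (hue in degrees, saturation, intensity)."""
    total = r + g + b
    intensity = total / 3
    saturation = 0.0 if total == 0 else 1 - 3 * min(r, g, b) / total
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, saturation, intensity   # hue undefined for grays; 0 used here
    cos_theta = max(-1.0, min(1.0, 0.5 * ((r - g) + (r - b)) / den))
    theta = math.degrees(math.acos(cos_theta))
    hue = theta if b <= g else 360 - theta
    return hue, saturation, intensity

def rgb_to_ycbcr(r, g, b):
    """8-bit RGB -> 8-bit YCbCr (assumed full-range BT.601 / JFIF equations)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    clamp = lambda v: max(0, min(255, round(v)))
    return clamp(y), clamp(cb), clamp(cr)

print(rgb_to_cmy(1, 0, 0))            # red -> (0, 1, 1)
print(rgb_to_hsi(0, 1, 0))            # green: hue near 120 degrees
print(rgb_to_ycbcr(255, 255, 255))    # white -> (255, 128, 128)
```

For any gray input the chroma channels stay at 128 and only $Y$ changes, which illustrates the separation of luminance from chrominance.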

Summary

| Model | Type | Main use |
|---|---|---|
| RGB | Additive | Display hardware |
| CMYK | Subtractive | Printing |
| HSI/HSV | Perceptual | Image-processing algorithms |
| $YC_bC_r$ | Luma/chroma | Compression, video |

Topic: color

Frequently asked questions

Where can I find the BSc CSIT (TU) Image Processing (BSc CSIT, CSC413) question paper 2081?
The full BSc CSIT (TU) Image Processing (BSc CSIT, CSC413) 2081 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
How many marks is the BSc CSIT (TU) Image Processing (BSc CSIT, CSC413) 2081 paper?
The BSc CSIT (TU) Image Processing (BSc CSIT, CSC413) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.