BSc CSIT (TU) Science Image Processing (BSc CSIT, CSC413) Question Paper 2081 Nepal

Q: Where can I find the BSc CSIT (TU) Image Processing (BSc CSIT, CSC413) question paper 2081?

The full BSc CSIT (TU) Image Processing (BSc CSIT, CSC413) 2081 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Image Processing (BSc CSIT, CSC413) 2081 paper come with solutions?

Yes. Every question on this Image Processing (BSc CSIT, CSC413) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Image Processing (BSc CSIT, CSC413) 2081 paper?

The BSc CSIT (TU) Image Processing (BSc CSIT, CSC413) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Image Processing (BSc CSIT, CSC413) past paper free?

Yes — reading and attempting this Image Processing (BSc CSIT, CSC413) past paper on Kekkei is completely free.

Question

1. Long answer (10 marks)

Explain digital image processing fundamentals. Discuss image sampling, quantization, and the basic relationships between pixels.


Answer 1

Digital Image Processing Fundamentals

A digital image is a two-dimensional function $f(x,y)$ , where $x,y$ are spatial coordinates and the value of $f$ at any point is the intensity (gray level). When $x$ , $y$ , and the amplitude are all finite and discrete, the image is digital. Each discrete element is a pixel (picture element).

Image Acquisition Pipeline

A continuous scene is converted to a digital image in two steps:

- Sampling – digitizing the spatial coordinates $(x,y)$.
- Quantization – digitizing the amplitude (intensity) values.

1. Image Sampling

Sampling divides the continuous image into a grid of $M \times N$ points, producing the pixel array:

f(x,y)=\begin{bmatrix} f(0,0) & \cdots & f(0,N-1)\\ \vdots & \ddots & \vdots \\ f(M-1,0) & \cdots & f(M-1,N-1) \end{bmatrix}

- More samples → higher spatial resolution → finer detail.
- Insufficient sampling causes aliasing. By the Nyquist criterion, the sampling rate must be at least twice the highest spatial frequency present in the image to avoid it.

2. Quantization

Each sampled value is mapped to one of $L = 2^{k}$ discrete intensity levels, where $k$ is the number of bits per pixel. For an 8-bit image, $L = 256$ levels ( $0$ – $255$ ). Too few levels produce false contouring. Storage of an image is $M \times N \times k$ bits.
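
As a small illustration of the level count and storage formulas above, the following sketch quantizes normalized intensities to $L = 2^k$ levels. The function names `quantize` and `storage_bits` and the assumption that inputs lie in $[0,1]$ are choices made for this example only.

```python
def quantize(value, k):
    """Map a normalized intensity in [0.0, 1.0] to one of L = 2**k integer levels."""
    levels = 2 ** k
    # Scale to [0, L-1] and round to the nearest level; min() clamps the top value.
    return min(levels - 1, int(value * (levels - 1) + 0.5))


def storage_bits(M, N, k):
    """Storage needed for an M x N image with k bits per pixel."""
    return M * N * k


samples = [0.0, 0.2, 0.5, 0.8, 1.0]
print([quantize(v, 8) for v in samples])   # 8-bit: 256 levels
print([quantize(v, 1) for v in samples])   # 1-bit: only 2 levels (severe false contouring)
print(storage_bits(1024, 1024, 8))         # bits for a 1024 x 1024 8-bit image
```

With $k=1$ every sample collapses to 0 or 1, which is the extreme case of the false contouring mentioned above.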

3. Basic Relationships Between Pixels

For a pixel $p$ at $(x,y)$ :

Neighbors:
- 4-neighbors $N_4(p)$ : $(x\pm1,y),(x,y\pm1)$ .
- Diagonal neighbors $N_D(p)$ : $(x\pm1,y\pm1)$ .
- 8-neighbors $N_8(p) = N_4(p) \cup N_D(p)$ .
Adjacency: 4-, 8-, and m-adjacency (mixed adjacency, used to remove path ambiguity).
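Connectivity / Region / Boundary: two pixels are connected if a path of adjacent pixels, all with intensities in the similarity set $V$, joins them; a maximal connected set forms a region, and the boundary of a region is the set of its pixels that have at least one neighbour outside the region.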
Distance measures between $p(x,y)$ and $q(s,t)$:
- Euclidean: $D_e = \sqrt{(x-s)^2+(y-t)^2}$
- City-block ( $D_4$ ): $|x-s|+|y-t|$
- Chessboard ( $D_8$ ): $\max(|x-s|,|y-t|)$
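
The neighbourhoods and the three distance measures can be checked numerically with a short sketch. The helper names below are illustrative, and pixels are assumed to be plain `(x, y)` tuples.

```python
import math


def n4(p):
    """4-neighbours of pixel p = (x, y)."""
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def nd(p):
    """Diagonal neighbours of pixel p."""
    x, y = p
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]


def n8(p):
    """8-neighbours: union of the 4-neighbours and the diagonal neighbours."""
    return n4(p) + nd(p)


def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def city_block(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chessboard(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


p, q = (0, 0), (3, 4)
print(euclidean(p, q), city_block(p, q), chessboard(p, q))  # 5.0 7 4
```

For the same pair of pixels the ordering $D_8 \le D_e \le D_4$ always holds, which the example values 4, 5 and 7 illustrate.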

Conclusion

Sampling and quantization determine an image's spatial resolution and intensity resolution respectively, while pixel relationships (neighborhood, adjacency, connectivity, distance) form the basis of region-level operations such as segmentation and morphology.

Answer 2

Histogram Processing

The histogram of a digital image with intensity levels in $[0, L-1]$ is the discrete function

h(r_k) = n_k

where $r_k$ is the $k$ -th intensity and $n_k$ is the number of pixels with that intensity. The normalized histogram is $p(r_k)=n_k/MN$ , which approximates the probability of occurrence of level $r_k$ . Histogram processing modifies this distribution to improve contrast.

Histogram Equalization

Goal: spread intensities to produce an approximately uniform histogram, maximizing global contrast. The transformation is the cumulative distribution function (CDF):

s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j) = \frac{L-1}{MN}\sum_{j=0}^{k} n_j

Example. A 3-bit ( $L=8$ ) $64\times64$ image ( $MN=4096$ ):

$r_k$	$n_k$	$p_r(r_k)$	CDF	$s_k=(L-1)\,\text{CDF}$	round
0	790	0.19	0.19	1.33	1
1	1023	0.25	0.44	3.08	3
2	850	0.21	0.65	4.55	5
3	656	0.16	0.81	5.67	6
4	329	0.08	0.89	6.23	6
5	245	0.06	0.95	6.65	7
6	122	0.03	0.98	6.86	7
7	81	0.02	1.00	7.00	7

Pixels are remapped using the round column, stretching the originally dark image across the full range.
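
The round column of the table can be reproduced with a few lines of Python. This is a minimal sketch working directly from the pixel counts $n_k$; the function name `equalization_map` is chosen for this example.

```python
def equalization_map(counts, L):
    """Return the equalized level s_k for every input level r_k."""
    total = sum(counts)                    # MN, the number of pixels
    mapping, running = [], 0
    for n_k in counts:
        running += n_k                     # cumulative count up to level k
        s_k = (L - 1) * running / total    # (L-1) * CDF
        mapping.append(int(s_k + 0.5))     # round to the nearest integer level
    return mapping


counts = [790, 1023, 850, 656, 329, 245, 122, 81]
print(equalization_map(counts, 8))  # [1, 3, 5, 6, 6, 7, 7, 7]
```

Because levels 3 and 4 both map to 6, and levels 5 to 7 all map to 7, the output uses only five distinct levels, which is why the equalized histogram is only approximately uniform.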

Histogram Specification (Matching)

Instead of a uniform output, we force the histogram to match a specified target shape $p_z(z)$ . Steps:

Equalize the input: $s = T(r) = (L-1)\sum_{j=0}^{r} p_r(r_j)$ .
Equalize the target: $G(z) = (L-1)\sum_{i=0}^{z} p_z(z_i)$ .
For each $s$ , find $z = G^{-1}(s)$ , i.e. the level whose equalized value is closest to $s$ .

This gives the mapping $z = G^{-1}(T(r))$ .
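
The three steps can be sketched as follows. The input counts are those of the equalization example above; the target histogram `target_counts` is a hypothetical choice made only for illustration, and ties are resolved towards the smallest $z$, which is one common convention rather than a requirement.

```python
def scaled_cdf(counts, L):
    """Rounded (L-1) * CDF for a histogram given as pixel counts."""
    total, running, out = sum(counts), 0, []
    for n in counts:
        running += n
        out.append(int((L - 1) * running / total + 0.5))
    return out


def specification_map(input_counts, target_counts, L):
    """Map each input level r to the output level z = G^{-1}(T(r))."""
    T = scaled_cdf(input_counts, L)    # step 1: equalize the input
    G = scaled_cdf(target_counts, L)   # step 2: equalize the target
    mapping = []
    for s in T:
        # step 3: pick the z whose G(z) is closest to s (smallest z on ties)
        z = min(range(L), key=lambda level: (abs(G[level] - s), level))
        mapping.append(z)
    return mapping


input_counts = [790, 1023, 850, 656, 329, 245, 122, 81]
target_counts = [0, 0, 0, 15, 20, 30, 20, 15]   # hypothetical target shape
print(specification_map(input_counts, target_counts, 8))  # [3, 4, 5, 6, 6, 7, 7, 7]
```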

Equalization vs Specification

Aspect	Equalization	Specification
Output histogram	(Approx.) uniform	User-defined shape
Control	Automatic	Targeted
Use	General contrast enhancement	When a particular tonal distribution is desired

Answer 3

Fourier Transform in Image Processing

The Fourier transform (FT) decomposes an image into its sinusoidal frequency components, moving from the spatial domain $f(x,y)$ to the frequency domain $F(u,v)$ . Slowly varying regions map to low frequencies; edges and fine detail map to high frequencies.

The 2-D Discrete Fourier Transform (DFT) of an $M\times N$ image:

F(u,v)=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi\left(\frac{ux}{M}+\frac{vy}{N}\right)}

and its inverse:

f(x,y)=\frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)\, e^{\,j2\pi\left(\frac{ux}{M}+\frac{vy}{N}\right)}

Frequency-Domain Filtering

Filtering is performed by point-wise multiplication with a transfer function $H(u,v)$ :

G(u,v)=H(u,v)\,F(u,v), \qquad g(x,y)=\mathcal{F}^{-1}\{G(u,v)\}

General steps: (1) shift the origin to the centre of the spectrum by multiplying $f(x,y)$ by $(-1)^{x+y}$, (2) compute the DFT, (3) multiply by $H(u,v)$, (4) compute the inverse DFT, (5) take the real part and undo the shift by multiplying by $(-1)^{x+y}$ again. Let $D(u,v)=\sqrt{(u-M/2)^2+(v-N/2)^2}$ be the distance of $(u,v)$ from the centred origin and $D_0$ the cutoff frequency.
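
The five steps can be traced on a tiny image with a direct (slow) implementation of the DFT formulas above. This is a teaching sketch only: `dft2` and `filter_frequency` are names invented here, the transfer function is passed in as a ready-made $M\times N$ table, and a real program would use an FFT library instead of the quadruple loop.

```python
import cmath


def dft2(f, inverse=False):
    """Direct 2-D DFT (or inverse DFT) of a small M x N array."""
    M, N = len(f), len(f[0])
    sign = 1 if inverse else -1
    out = [[0j] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            total = 0j
            for x in range(M):
                for y in range(N):
                    angle = 2 * cmath.pi * (u * x / M + v * y / N)
                    total += f[x][y] * cmath.exp(sign * 1j * angle)
            out[u][v] = total / (M * N) if inverse else total
    return out


def filter_frequency(f, H):
    """Apply a centred transfer function H(u, v) to image f."""
    M, N = len(f), len(f[0])
    # (1) centre the spectrum
    shifted = [[f[x][y] * (-1) ** (x + y) for y in range(N)] for x in range(M)]
    F = dft2(shifted)                                                       # (2)
    G = [[H[u][v] * F[u][v] for v in range(N)] for u in range(M)]          # (3)
    g = dft2(G, inverse=True)                                               # (4)
    # (5) keep the real part and undo the shift
    return [[g[x][y].real * (-1) ** (x + y) for y in range(N)] for x in range(M)]


# A 4 x 4 low-pass table that keeps only the centred DC term at (2, 2).
H = [[1.0 if (u, v) == (2, 2) else 0.0 for v in range(4)] for u in range(4)]
flat = [[5.0] * 4 for _ in range(4)]
print(filter_frequency(flat, H))   # a constant image passes through (approximately 5.0 everywhere)
```

A constant image contains only the DC component, so this low-pass table leaves it unchanged, while a checkerboard (the highest representable frequency) would be removed entirely.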

1. Ideal Low-Pass Filter (ILPF)

H(u,v)=\begin{cases} 1 & D(u,v)\le D_0 \\ 0 & D(u,v)>D_0 \end{cases}

It passes all frequencies inside a circle of radius $D_0$ and blocks the rest. Although it gives the sharpest possible cutoff, its abrupt transition causes ringing artifacts (concentric ripples), because the corresponding spatial-domain response is a sinc-like function with oscillating side lobes.

2. Butterworth Low-Pass Filter (BLPF)

H(u,v)=\frac{1}{1+\left[\dfrac{D(u,v)}{D_0}\right]^{2n}}

where $n$ is the filter order. The transition is smooth and controllable:

- Low order $n$ → gentle roll-off, little ringing.
- High order $n$ → approaches the ideal filter (more ringing).

Examples / Effect

- Low-pass (ILPF/BLPF) → blurring and noise smoothing (removes high frequencies).
- High-pass (obtained as $H_{HP}=1-H_{LP}$) → edge sharpening (removes low frequencies). The Butterworth high-pass is $H(u,v)=\dfrac{1}{1+[D_0/D(u,v)]^{2n}}$.
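
The transfer functions themselves are easy to evaluate point by point. The sketch below assumes the spectrum is centred at $(M/2, N/2)$; the function names and the example size and cutoff are illustrative.

```python
import math


def distance(u, v, M, N):
    """Distance of (u, v) from the centre of an M x N centred spectrum."""
    return math.hypot(u - M / 2, v - N / 2)


def ideal_lowpass(u, v, M, N, d0):
    return 1.0 if distance(u, v, M, N) <= d0 else 0.0


def butterworth_lowpass(u, v, M, N, d0, n):
    return 1.0 / (1.0 + (distance(u, v, M, N) / d0) ** (2 * n))


def butterworth_highpass(u, v, M, N, d0, n):
    # H_HP = 1 - H_LP; written this way to avoid dividing by D(u, v) = 0 at the centre.
    return 1.0 - butterworth_lowpass(u, v, M, N, d0, n)


M = N = 64
for d in (0, 5, 10, 15, 20):
    u = M // 2 + d
    print(d, ideal_lowpass(u, N // 2, M, N, 10), round(butterworth_lowpass(u, N // 2, M, N, 10, 2), 3))
```

At $D(u,v)=D_0$ the Butterworth response is exactly 0.5 for any order $n$, whereas the ideal filter jumps from 1 to 0 immediately beyond the cutoff.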

Comparison

Filter	Transition	Ringing	Control
Ideal	Abrupt (brick-wall)	Severe	None
Butterworth	Smooth	Negligible–moderate (order-dependent)	Via order $n$

Thus the Butterworth filter is generally preferred in practice because it trades a small amount of sharpness for greatly reduced ringing.

Answer 4

Resolution is the amount of detail an image holds, given by its number of pixels, written as $\text{width} \times \text{height}$ (e.g. $1920\times1080$ ). Spatial resolution refers to the smallest discernible detail (pixel density, often dpi), while intensity (gray-level) resolution refers to the number of distinct intensity levels $L=2^k$ . Higher resolution means finer detail.

Aspect ratio is the proportional relationship between an image's width and height, expressed as $W{:}H$ (e.g. $4{:}3$ , $16{:}9$ ). It must be preserved when resizing to avoid distortion (stretching/squashing).
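
A short sketch of how the aspect ratio is reduced to lowest terms and preserved during a resize. The helper names `aspect_ratio` and `fit_width` are assumptions of this example.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms."""
    g = gcd(width, height)
    return width // g, height // g


def fit_width(width, height, new_width):
    """New size with the requested width and the original aspect ratio."""
    return new_width, round(new_width * height / width)


print(aspect_ratio(1920, 1080))      # (16, 9)
print(fit_width(1920, 1080, 1280))   # (1280, 720)
```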

Answer 5

Log Transformation

The log transformation is a point (intensity) transformation given by

s = c\,\log(1 + r)

where $r$ is the input intensity ($r \ge 0$), $s$ is the output, and $c$ is a scaling constant (often $c = \frac{L-1}{\log(1+R_{max})}$, with $R_{max}$ the maximum input intensity, to fit the output to $[0,L-1]$).

Characteristics & Use

- It maps a narrow range of low (dark) input values to a wider range of output values, and compresses high (bright) values.
- Therefore it expands the dark regions and brightens the image while compressing the dynamic range.
- Main use: displaying images with a very large dynamic range, such as the Fourier spectrum, where a few very large values would otherwise dominate and hide low-magnitude detail. Applying the log lets all components become visible on a normal display.
- The inverse-log (exponential) transformation does the opposite, enhancing bright regions.
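
The effect on a large dynamic range can be seen numerically. In this sketch the maximum input value of $10^6$ is an arbitrary stand-in for a Fourier-spectrum magnitude, and `log_transform` is a name chosen for the example.

```python
import math


def log_transform(r, r_max, L=256):
    """s = c * log(1 + r), with c chosen so that r_max maps to L - 1."""
    c = (L - 1) / math.log(1 + r_max)
    return c * math.log(1 + r)


r_max = 1e6
for r in (0, 10, 1000, 1e6):
    linear = (256 - 1) * r / r_max            # plain linear scaling for comparison
    print(r, round(linear, 3), round(log_transform(r, r_max), 1))
```

Under linear scaling an input of 10 would be displayed as essentially black, while the log transformation lifts it well into the visible range.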

Answer 6

Convolution in Spatial Filtering

Spatial filtering processes a pixel using a small mask (kernel) $w$ of size $m \times n$ slid over the image. Convolution is the core operation that computes each output pixel as a weighted sum of the input pixel and its neighbours.

For a kernel of size $(2a+1)\times(2b+1)$ , the 2-D convolution is

g(x,y) = \sum_{s=-a}^{a}\sum_{t=-b}^{b} w(s,t)\, f(x-s,\,y-t)

Key Points

- The kernel is rotated by 180° before sliding (this distinguishes convolution from correlation, which uses the kernel as-is).
- For each position, multiply overlapping kernel and image values, sum them, and place the result at the centre pixel.
- Borders are handled by padding (zero, replicate, or mirror).

Example: convolving with $\frac{1}{9}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}$ averages each $3\times3$ neighbourhood, producing a smoothing (blur) effect. Different kernels produce smoothing, sharpening, or edge detection. Convolution is linear and shift-invariant, which is why the same kernel applies uniformly across the image.
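
The convolution sum can be written out directly. The sketch below uses zero padding at the borders (one of the options listed above) and plain nested lists; `convolve2d` is a name chosen here, not a library function.

```python
def convolve2d(image, kernel):
    """g(x,y) = sum_s sum_t w(s,t) f(x-s, y-t), with zero padding outside the image."""
    rows, cols = len(image), len(image[0])
    a, b = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            total = 0.0
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    i, j = x - s, y - t          # the minus signs give the 180-degree rotation
                    if 0 <= i < rows and 0 <= j < cols:
                        total += kernel[s + a][t + b] * image[i][j]
            out[x][y] = total
    return out


impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(convolve2d(impulse, kernel))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
```

Convolving a unit impulse reproduces the kernel unchanged; correlation with the same kernel would instead return it rotated by 180°, which is a quick way to tell the two operations apart.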

Answer 7

DFT vs DCT

Feature	DFT (Discrete Fourier Transform)	DCT (Discrete Cosine Transform)
Basis functions	Complex exponentials $e^{-j2\pi ux/N}$ (sine + cosine)	Real cosine functions only
Output	Complex (magnitude + phase)	Real-valued
Symmetry assumed	Periodic extension of the signal	Even (mirror-symmetric) extension
Energy compaction	Lower	High – energy concentrated in few low-frequency coefficients
Boundary artifacts	Can show discontinuities/ringing at edges	Reduced, due to symmetric extension
Computation	More (complex arithmetic)	Less (real arithmetic)
Typical use	Frequency-domain filtering, spectral analysis, convolution	Compression (JPEG, MPEG)

1-D DCT: $\;C(u)=\alpha(u)\displaystyle\sum_{x=0}^{N-1} f(x)\cos\!\left[\frac{(2x+1)u\pi}{2N}\right]$, where $\alpha(0)=\sqrt{1/N}$ and $\alpha(u)=\sqrt{2/N}$ for $u=1,\dots,N-1$.

Summary: The DFT is general-purpose (complex, used for filtering/analysis), whereas the DCT is real-valued with superior energy compaction, making it the transform of choice for image and video compression.
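
Energy compaction can be observed with a direct implementation of the 1-D DCT formula. The sketch assumes the orthonormal scaling $\alpha(0)=\sqrt{1/N}$, $\alpha(u)=\sqrt{2/N}$ for $u\ge1$; the function name `dct_1d` and the sample signals are illustrative.

```python
import math


def dct_1d(f):
    """Direct 1-D DCT with orthonormal scaling (assumed here)."""
    N = len(f)
    coeffs = []
    for u in range(N):
        alpha = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
        total = sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N))
        coeffs.append(alpha * total)
    return coeffs


smooth = [10, 11, 12, 13, 14, 15, 16, 17]      # slowly varying signal
coeffs = dct_1d(smooth)
energy = sum(c * c for c in coeffs)
print([round(c, 2) for c in coeffs])
print(round((coeffs[0] ** 2 + coeffs[1] ** 2) / energy, 4))  # share of energy in the first two terms
```

For a slowly varying signal almost all of the energy ends up in the first one or two coefficients, which is the property that compression schemes exploit.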

Answer 8

Watershed Segmentation

The watershed transform is a region-based segmentation method that treats a grayscale image as a topographic surface, where intensity is interpreted as elevation. Bright pixels are hills/ridges and dark pixels are valleys (catchment basins).

Flooding Analogy

Imagine piercing a hole at each local minimum and slowly flooding the surface from below:

- Water rises uniformly and fills each catchment basin.
- Where waters from two different basins are about to merge, a dam (watershed line) is built.
- When flooding completes, the dams form closed contours that segment the image into regions.

It is usually applied to a gradient image, so basins correspond to homogeneous regions and watershed lines lie along strong edges.

Over-segmentation and the Fix

Noise and small intensity fluctuations create many spurious minima, causing over-segmentation (too many tiny regions). This is controlled by marker-controlled watershed: internal and external markers are defined first, and flooding starts only from those markers, yielding meaningful regions.

Advantages: produces continuous, closed boundaries; intuitive. Disadvantage: sensitive to noise → over-segmentation without markers.

Answer 9

Canny Edge Detector

The Canny edge detector (J. Canny, 1986) is an optimal multi-stage algorithm designed to satisfy three criteria: good detection (low error rate), good localization (edges close to true edges), and single response per edge. Its steps are:

1. Smoothing – convolve the image with a Gaussian filter to reduce noise.
2. Gradient computation – find the intensity gradient magnitude and direction (e.g. using Sobel):

M(x,y)=\sqrt{G_x^2+G_y^2}, \qquad \theta=\tan^{-1}\!\left(\frac{G_y}{G_x}\right)

3. Non-maximum suppression – thin the edges by keeping only pixels that are local maxima of $M$ along the gradient direction, suppressing all others.
4. Double thresholding – classify pixels using two thresholds $T_{high}$ and $T_{low}$ into strong ($>T_{high}$), weak ($T_{low}$ – $T_{high}$), and non-edge ($<T_{low}$).
5. Edge tracking by hysteresis – keep weak edges only if they are connected to a strong edge; discard isolated weak edges.
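
The last two stages (double thresholding and hysteresis) can be sketched on a small gradient-magnitude array. This example assumes the magnitudes have already been thinned by non-maximum suppression, uses 8-connectivity, and the function name `hysteresis` and the threshold values are illustrative.

```python
from collections import deque


def hysteresis(magnitude, t_low, t_high):
    """Keep strong pixels, plus weak pixels 8-connected to a strong one."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Strong pixels are edges immediately and seed the tracking.
    for x in range(rows):
        for y in range(cols):
            if magnitude[x][y] > t_high:
                edges[x][y] = True
                queue.append((x, y))
    # Grow outwards from strong pixels through weak ones.
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                i, j = x + dx, y + dy
                if 0 <= i < rows and 0 <= j < cols and not edges[i][j] and magnitude[i][j] >= t_low:
                    edges[i][j] = True
                    queue.append((i, j))
    return edges


magnitude = [
    [0, 60, 0, 0, 0],
    [0, 100, 0, 0, 60],
    [0, 60, 0, 0, 0],
]
for row in hysteresis(magnitude, t_low=50, t_high=90):
    print([int(v) for v in row])
```

The weak pixels above and below the strong pixel are kept because they touch it, while the isolated weak pixel in the last column is discarded.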

Significance

Canny produces thin, continuous, well-localized edges and is robust to noise, making it one of the most widely used edge detectors compared with simpler operators (Sobel, Prewitt) that lack thinning and hysteresis.

Answer 10

Dilation and Erosion (Morphological Operations)

Morphological operations process a binary (or grayscale) image $A$ with a small structuring element (SE) $B$ , based on set theory.

Dilation $A \oplus B$

A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \varnothing \,\}

The SE is slid over the image; if it touches (overlaps) any foreground pixel, the centre is set to foreground. Effect: grows / thickens objects, fills small holes and gaps, and connects nearby components.

Example: with a $3\times3$ SE, an isolated foreground pixel expands into a $3\times3$ block; a one-pixel gap in a line is bridged.

Erosion $A \ominus B$

A \ominus B = \{\, z \mid (B)_z \subseteq A \,\}

The centre stays foreground only if the entire SE fits inside the object. Effect: shrinks / thins objects, removes small isolated noise pixels and thin protrusions.

Example: with a $3\times3$ SE, an object's boundary layer is stripped off by one pixel; an isolated single foreground pixel is removed entirely.

Duality and Combinations

They are duals: $(A \ominus B)^c = A^c \oplus \hat{B}$ .

- Opening = erosion then dilation ($A\circ B$) → removes small objects/noise, smooths contours.
- Closing = dilation then erosion ($A\bullet B$) → fills small holes and gaps.
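
The two set definitions and their combinations can be tried on small binary images. This sketch assumes a $3\times3$ square structuring element given as offsets from its centre and treats everything outside the image as background; all names are illustrative.

```python
SQUARE_3X3 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def pixel(image, x, y):
    """Pixel value, with positions outside the image treated as background (0)."""
    inside = 0 <= x < len(image) and 0 <= y < len(image[0])
    return image[x][y] if inside else 0


def dilate(image, se=SQUARE_3X3):
    # z is foreground if the reflected SE placed at z overlaps the object.
    return [[int(any(pixel(image, x - dx, y - dy) for dx, dy in se))
             for y in range(len(image[0]))] for x in range(len(image))]


def erode(image, se=SQUARE_3X3):
    # z stays foreground only if the whole SE placed at z fits inside the object.
    return [[int(all(pixel(image, x + dx, y + dy) for dx, dy in se))
             for y in range(len(image[0]))] for x in range(len(image))]


def opening(image, se=SQUARE_3X3):
    return dilate(erode(image, se), se)


def closing(image, se=SQUARE_3X3):
    return erode(dilate(image, se), se)


dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1
print(sum(map(sum, dilate(dot))))   # 9: the single pixel grows into a 3 x 3 block
print(sum(map(sum, erode(dot))))    # 0: the single pixel is removed
```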

Answer 11

JPEG Compression Standard

JPEG (Joint Photographic Experts Group) is the most widely used lossy compression standard for continuous-tone still images. It exploits the human visual system's reduced sensitivity to high-frequency detail and to colour (chrominance).

Baseline Encoding Steps

1. Colour transform & subsampling – convert RGB to $YC_bC_r$; subsample the chroma channels (e.g. 4:2:0) since the eye is less sensitive to colour detail.
2. Block splitting – divide each channel into $8\times8$ pixel blocks; level-shift by subtracting 128.
3. Forward DCT – apply the 2-D Discrete Cosine Transform to each block, concentrating energy in the top-left (low-frequency) coefficients.
4. Quantization – divide each DCT coefficient by a value from a quantization table and round. This is the lossy step; higher-frequency coefficients (less visible) are quantized more coarsely. A quality factor scales the table.
5. Entropy coding – reorder the coefficients of each block in a zig-zag scan (which groups the trailing zeros together), code the DC term differentially (DPCM) against the previous block's DC term, run-length encode the AC terms, then apply Huffman (or arithmetic) coding.
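
The quantization step and the zig-zag scan can be sketched on a small block. The $2\times2$ coefficients and table below are made-up numbers for illustration only (a real JPEG block is $8\times8$ and uses the tables stored in the file), and Python's `round` stands in for whatever rounding rule a given encoder uses.

```python
def quantize_block(block, table):
    """Divide each DCT coefficient by its table entry and round (the lossy step)."""
    return [[round(c / q) for c, q in zip(brow, qrow)] for brow, qrow in zip(block, table)]


def dequantize_block(levels, table):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[v * q for v, q in zip(lrow, qrow)] for lrow, qrow in zip(levels, table)]


def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Walk the anti-diagonals i + j; odd diagonals run downwards, even ones upwards.
    return sorted(cells, key=lambda c: (c[0] + c[1], c[0] if (c[0] + c[1]) % 2 else -c[0]))


block = [[-415, 30], [-5, 2]]      # hypothetical DCT coefficients
table = [[16, 11], [12, 12]]       # hypothetical quantization table
levels = quantize_block(block, table)
print(levels)                          # [[-26, 3], [0, 0]]
print(dequantize_block(levels, table)) # [[-416, 33], [0, 0]]
print(zigzag_order(3))
```

The small high-frequency coefficients become zero after quantization, and the zig-zag order places them at the end of the scan, where run-length coding represents them cheaply.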

Decoding

The inverse steps (entropy decode → dequantize → inverse DCT → colour reconstruct) recover the image. Because quantization discards data, the result is an approximation; lower quality → higher compression and more blocking artifacts.

Note: JPEG also defines a lossless mode, while JPEG 2000 uses the wavelet transform instead of the DCT.

Answer 12

Colour Image Processing Models

A colour model is a specification of a coordinate system in which each colour is represented as a point. The main models used in image processing are:

1. RGB (Red, Green, Blue)

An additive model where colours are formed by combining the three primaries. Represented as a unit cube; suited to hardware such as monitors, cameras, and scanners. A pixel is a triplet $(R,G,B)$ ; 24-bit colour gives $\approx16.7$ M colours.

2. CMY / CMYK (Cyan, Magenta, Yellow, blacK)

A subtractive model used for printing. With all values normalized to $[0,1]$, it is related to RGB by $\begin{bmatrix}C\\M\\Y\end{bmatrix}=\begin{bmatrix}1\\1\\1\end{bmatrix}-\begin{bmatrix}R\\G\\B\end{bmatrix}$. Black (K) is added for true black and ink economy.

3. HSI (Hue, Saturation, Intensity)

Separates colour information (hue, saturation) from intensity. Because it decouples intensity from chromaticity, it matches human colour perception and is ideal for image-processing algorithms (e.g. you can enhance brightness or segment by hue independently). Related variants: HSV/HSB and HSL.
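
The decoupling of hue, saturation and intensity can be seen in a conversion sketch. The formulas used below are the standard geometric RGB-to-HSI conversion, which the text above does not spell out, so they are an addition for illustration; inputs are assumed normalized to $[0,1]$, hue is returned in degrees, and hue is set to 0 for grays where it is undefined.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert normalized RGB to (hue in degrees, saturation, intensity)."""
    total = r + g + b
    intensity = total / 3
    saturation = 0.0 if total == 0 else 1 - 3 * min(r, g, b) / total
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        hue = 0.0                                   # gray: hue is undefined, use 0 by convention
    else:
        ratio = max(-1.0, min(1.0, numerator / denominator))   # guard against rounding
        theta = math.degrees(math.acos(ratio))
        hue = 360 - theta if b > g else theta
    return hue, saturation, intensity


for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1)), ("gray", (0.5, 0.5, 0.5))]:
    h, s, i = rgb_to_hsi(*rgb)
    print(name, round(h, 1), round(s, 2), round(i, 2))
```

The three primaries share the same saturation and intensity and differ only in hue (0°, 120°, 240°), which is why segmentation by hue can ignore brightness.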

4. $YC_bC_r$ / YUV

$Y$ = luminance, $C_b,C_r$ = chrominance. Used in video and JPEG/MPEG compression because chroma can be subsampled with little perceived loss.

Summary

Model	Type	Main use
RGB	Additive	Display hardware
CMYK	Subtractive	Printing
HSI/HSV	Perceptual	Image-processing algorithms
$YC_bC_r$	Luma/chroma	Compression, video

Level	BSc CSIT (TU)
Stream	Science
Subject	Image Processing (BSc CSIT, CSC413)
Year	2081 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

Section A: Long Answer Questions