BSc CSIT (TU) Science Multimedia Computing (BSc CSIT, CSC467) Question Paper 2075 Nepal

Q: Where can I find the BSc CSIT (TU) Multimedia Computing (BSc CSIT, CSC467) question paper 2075?

The full BSc CSIT (TU) Multimedia Computing (BSc CSIT, CSC467) 2075 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Multimedia Computing (BSc CSIT, CSC467) 2075 paper come with solutions?

Yes. Every question on this Multimedia Computing (BSc CSIT, CSC467) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Multimedia Computing (BSc CSIT, CSC467) 2075 paper?

The BSc CSIT (TU) Multimedia Computing (BSc CSIT, CSC467) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Multimedia Computing (BSc CSIT, CSC467) past paper free?

Yes — reading and attempting this Multimedia Computing (BSc CSIT, CSC467) past paper on Kekkei is completely free.

Question

1. Long answer (10 marks)

Explain digital audio representation. Discuss sampling, quantization, PCM, and audio compression techniques including MPEG audio (MP3) with the role of psychoacoustic models.


Answer 1

Digital Audio Representation

Sound is a continuous (analog) pressure wave. To process it on a computer it must be converted into a discrete digital form through Analog-to-Digital Conversion (ADC), which involves two key steps: sampling and quantization.

1. Sampling

Sampling captures the amplitude of the continuous signal at regular time intervals. The number of samples taken per second is the sampling rate $f_s$.

By the Nyquist–Shannon sampling theorem, to reconstruct a signal containing frequencies up to $f_{max}$ without aliasing, we must sample at:

$$ f_s \ge 2 f_{max} $$

Human hearing spans roughly 20 Hz–20 kHz, so CD audio uses $f_s = 44.1\,\text{kHz}$.
Sampling below the Nyquist rate causes aliasing, where high frequencies masquerade as lower ones.
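As a small illustration of aliasing, the sketch below computes the apparent frequency of a pure tone after sampling. The function name `alias_frequency` is a hypothetical helper chosen for this example, and it assumes an ideal sampler with no anti-aliasing filter.

```python
def alias_frequency(f_signal, f_s):
    """Apparent frequency (Hz) of a pure tone f_signal sampled at rate f_s."""
    folded = f_signal % f_s            # the sampled spectrum repeats every f_s
    return min(folded, f_s - folded)   # fold into the range 0 .. f_s / 2


# A 10 kHz tone is below the Nyquist frequency (22.05 kHz) and is preserved.
print(alias_frequency(10_000, 44_100))
# A 30 kHz tone is above it and shows up as a lower frequency.
print(alias_frequency(30_000, 44_100))
```

A tone above $f_s/2$ comes back as a lower frequency, which is why an anti-aliasing low-pass filter is normally placed before the sampler.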

2. Quantization

Each sampled amplitude is rounded to the nearest level from a finite set determined by the bit depth $n$ (number of bits per sample), giving $2^n$ levels (e.g., 16-bit CD audio = 65,536 levels). The rounding error introduces quantization noise; the signal-to-quantization-noise ratio improves by about $6\,\text{dB}$ per added bit (for a full-scale sine wave, SQNR $\approx 6.02n + 1.76\,\text{dB}$).
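The following is a minimal sketch of a uniform quantizer together with the SQNR rule of thumb. The helper names `quantize` and `sqnr_db` are illustrative, and the input is assumed to be normalized to the range $[-1, 1)$.

```python
import math


def quantize(x, n_bits):
    """Round x (assumed in [-1, 1)) to the nearest of 2**n_bits uniform levels."""
    levels = 2 ** n_bits
    step = 2.0 / levels                      # width of one quantization step
    index = math.floor(x / step + 0.5)       # nearest level index
    index = max(-levels // 2, min(levels // 2 - 1, index))  # clip to valid range
    return index * step


def sqnr_db(n_bits):
    """Rule-of-thumb SQNR in dB for a full-scale sine wave."""
    return 6.02 * n_bits + 1.76


print(quantize(0.3, 3))   # 3-bit quantizer: step 0.25
print(sqnr_db(16))        # 16-bit CD audio
```

For inputs inside the range, the rounding error is at most half a step, so each extra bit halves the maximum error.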

3. PCM (Pulse Code Modulation)

PCM is the standard uncompressed digital audio format: it stores the stream of quantized sample values directly as binary numbers. The raw bit rate is:

$$ \text{Bit rate} = f_s \times n \times c $$

where $c$ is the number of channels. For stereo CD audio: $44100 \times 16 \times 2 = 1{,}411{,}200\,\text{bps} \approx 1.411\,\text{Mbps}$. PCM is lossless but storage-heavy, motivating compression.
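The bit-rate formula can be checked with a few lines of code. The function names here are illustrative only.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bit rate in bits per second: f_s * n * c."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Storage needed for uncompressed PCM of the given duration."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) * seconds / 8


cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)                              # bits per second for stereo CD audio
print(pcm_size_bytes(44_100, 16, 2, 60))    # bytes for one minute
```

One minute of stereo CD audio works out to roughly 10.6 MB, which is the motivation for the compression schemes that follow.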

4. Audio Compression

Lossless (e.g., FLAC, ALS): removes statistical redundancy; the original is recoverable bit-for-bit (≈2:1).
Lossy (e.g., MP3, AAC): discards perceptually irrelevant information for much higher ratios (10:1 or more).

5. MPEG Audio (MP3) and the Psychoacoustic Model

MP3 = MPEG-1/2 Audio Layer III. Its encoding pipeline:

The PCM stream is split by a polyphase filter bank into 32 sub-bands; in Layer III an MDCT further subdivides each sub-band into 18 frequency lines (576 lines in total).
A psychoacoustic model analyses each frame in parallel to decide how many bits each band truly needs.
Quantization is applied per band, allocating fewer bits where the ear cannot detect the resulting noise.
Huffman (entropy) coding losslessly packs the quantized coefficients into the bitstream.

Role of the psychoacoustic model — it exploits limitations of human hearing:

Absolute threshold of hearing: very quiet sounds below the audible threshold are dropped.
Frequency (simultaneous) masking: a loud tone hides nearby quieter tones, so the masked sounds are coded coarsely or removed.
Temporal masking: sounds just before/after a loud sound are masked.

By keeping the quantization noise below the computed masking threshold, MP3 removes data the listener cannot perceive, achieving large size reductions with little audible loss.
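To make the link between the masking threshold and bit allocation concrete, here is a deliberately simplified sketch. It assumes each bit of resolution lowers quantization noise by about 6.02 dB, so a band needs roughly SMR / 6.02 bits, where SMR is the signal-to-mask ratio. The function name and the dB figures are made up for illustration; a real MP3 encoder uses iterative rate and distortion loops rather than this closed form.

```python
import math


def bits_for_band(signal_db, mask_db):
    """Bits needed so quantization noise stays below the masking threshold.

    Simplified model: each bit buys about 6.02 dB of signal-to-noise ratio.
    """
    smr = signal_db - mask_db          # signal-to-mask ratio in dB
    if smr <= 0:
        return 0                       # band is fully masked: spend no bits
    return math.ceil(smr / 6.02)


# (signal level, masking threshold) per band in dB; values are illustrative.
bands = [(60, 30), (40, 45), (70, 10)]
print([bits_for_band(s, m) for s, m in bands])
```

Bands whose signal lies under the masking threshold receive no bits at all, which is where most of the saving comes from.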

Answer 2

Multimedia System

A multimedia system is a computer-based system capable of capturing, storing, processing, integrating, transmitting and presenting more than one type of media — combining at least one continuous (time-dependent) medium (audio, video, animation) with discrete (time-independent) media (text, graphics, images) in an interactive, integrated manner.

Characteristics of Multimedia Data

Large volume: images, audio and especially video require very large storage/bandwidth.
Time-dependence (continuous media): audio/video must be delivered and played at a steady, real-time rate; late data is useless.
Need for synchronization: related streams (e.g., lip-sync of audio and video) must be temporally coordinated.
High bandwidth and real-time constraints: demands sustained throughput and bounded delay/jitter.
Heterogeneous: different media have very different data structures and processing needs.
Compression-dependent: raw media is impractical, so it is almost always compressed.

Storage and Coding Requirements

Raw multimedia is huge: one second of uncompressed CD-quality stereo audio is about 1.41 Mbit (roughly 176 KB), and uncompressed SD video can exceed 100 Mbps. Systems therefore require:

Large, fast secondary storage (high disk capacity and transfer rate) and large buffers/main memory.
Compression coding (JPEG for images, MP3/AAC for audio, MPEG/H.26x for video) to cut storage and bandwidth.
Real-time I/O and guaranteed transfer rates so continuous media play without starvation.
Appropriate file formats and metadata to store and index mixed media.
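The storage requirement for raw video can be estimated in the same way as for audio. The sketch below assumes a 720×576 frame at 25 frames/second with 24 bits per pixel; these PAL-style SD parameters are an assumption for the example and are not taken from the question paper.

```python
def video_bit_rate(width, height, bits_per_pixel, fps):
    """Raw (uncompressed) video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


def storage_bytes(bit_rate, seconds):
    """Bytes needed to store a stream of the given bit rate and duration."""
    return bit_rate * seconds / 8


sd_rate = video_bit_rate(720, 576, 24, 25)
print(sd_rate / 1e6)                    # Mbps
print(storage_bytes(sd_rate, 60) / 1e9) # GB for one minute
```

Under these assumptions one minute of raw SD video needs close to 2 GB, which is why compression coding is listed as a core requirement.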

Components of a Multimedia Computing System

Capture / input devices: microphone, camera, scanner, MIDI, digitizer (ADC).
Processing hardware: CPU plus specialized processors — GPU, DSP, sound and video cards/codecs.
Storage: high-capacity disks/SSDs, optical media, multimedia databases.
System & application software: OS with real-time/continuous-media support, codecs, authoring tools, media players, multimedia DBMS.
Communication / network: high-bandwidth network with QoS for transmission and streaming.
Output / presentation devices: display/monitor, speakers, projectors (DAC for playback).

Answer 3

Multimedia Synchronization

Multimedia synchronization is the task of maintaining the correct temporal (and sometimes spatial) relationships among media objects during presentation, so that the user perceives a coherent, consistent presentation. It covers timing within a single stream and between multiple streams.

Intra-media (Intra-stream) Synchronization

Concerns the timing relationship within a single continuous medium.
Ensures the individual units of one stream (e.g., video frames, audio samples) are presented at the correct, even rate.
Example: playing video at a constant 25 frames/second; controlling jitter (variation in inter-frame delay) so playback is smooth.

Inter-media (Inter-stream) Synchronization

Concerns timing relationships between two or more media streams.
Ensures related streams stay aligned with one another.
Classic example: lip-sync — keeping audio aligned with the corresponding video (skew should stay within ≈ ±80 ms). Other examples: slides synchronized with a narration, subtitles with audio.

Reference Model for Multimedia Synchronization (Four-Layer Model)

Synchronization is commonly described by a layered reference model, each layer offering services and an interface to the layer above:

Media Layer — operates on a single continuous media stream as a sequence of Logical Data Units (LDUs) (e.g., frames/samples); handles intra-stream timing of one stream.
Stream Layer — operates on whole streams and groups of streams; provides inter-stream synchronization and guarantees about delay/jitter (QoS) for continuous media.
Object Layer — hides the difference between continuous and discrete media; works on a complete presentation, computing and enforcing a synchronization schedule for all media objects.
Specification Layer — the topmost, open layer where authors specify the synchronization (e.g., via timeline, hierarchical, reference-point or event-based specifications); maps the author's intent down to the object layer.

This model separates what synchronization is required (specification) from how it is achieved (lower layers), making synchronized multimedia presentations easier to author and implement.

Answer 4

Entropy in Coding

Entropy is a measure of the average information content (uncertainty) per symbol of an information source. For a source emitting symbols $s_i$ with probabilities $p_i$, Shannon entropy is:

$$ H = -\sum_{i} p_i \log_2 p_i \quad \text{(bits/symbol)} $$

Relation to compression: Entropy gives the theoretical lower bound on the average number of bits needed per symbol for lossless coding: no lossless code can use fewer than $H$ bits/symbol on average (Shannon's source coding theorem). Entropy (variable-length) coders such as Huffman and arithmetic coding assign shorter codewords to frequent symbols and longer ones to rare symbols, driving the average code length toward $H$. The closer the code's redundancy (the difference between its average code length and $H$) is to zero, the more efficient the compression. A highly predictable source has low entropy and compresses well; a source whose symbols are independent and equally likely has maximum entropy and is essentially incompressible.
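A short sketch of the entropy formula, comparing a uniform source with a skewed one. The function name `entropy` and the example distributions are chosen for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits/symbol; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 symbols: maximum entropy
print(entropy([0.9, 0.1]))                # skewed source: well under 1 bit/symbol
```

The uniform four-symbol source needs the full 2 bits per symbol, while the skewed binary source could in principle be coded in under half a bit per symbol.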

Answer 5

Arithmetic Coding

Arithmetic coding is an entropy-coding technique that encodes an entire message as a single number — a fraction in the interval $[0, 1)$ — rather than assigning a separate codeword to each symbol.

How it works:

Start with the interval $[0, 1)$.
For each input symbol, subdivide the current interval into sub-intervals whose widths are proportional to the symbols' probabilities.
Select the sub-interval corresponding to the actual symbol and make it the new current interval.
After the last symbol, output any number lying inside the final interval; its binary fraction is the compressed code. Decoding reverses the process.
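The interval-narrowing steps above can be sketched with exact fractions. This is a teaching sketch under assumed symbol probabilities (`PROBS` is made up), not a practical coder: real implementations use fixed-precision integers with renormalization and signal the message length or an end-of-message symbol.

```python
from fractions import Fraction

# Assumed model for the example: symbol -> probability.
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}


def build_ranges(probs):
    """Give each symbol its own [low, high) slice of [0, 1)."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges


def encode(message, probs):
    """Return the final interval [low, high) for the whole message."""
    ranges = build_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high


def decode(value, length, probs):
    """Recover `length` symbols from a number inside the final interval."""
    ranges = build_ranges(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0, 1)
                break
    return "".join(out)


low, high = encode("abc", PROBS)
print(low, high)               # final interval for "abc"
print(decode(low, 3, PROBS))   # any value in [low, high) decodes to "abc"
```

The width of the final interval equals the product of the symbol probabilities (here 1/32), so the message needs about 5 bits in total.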

Differences from Huffman Coding

Aspect	Huffman Coding	Arithmetic Coding
Codeword unit	One integer-length codeword per symbol	One fractional number for the whole message
Bits per symbol	Must be a whole number of bits ( $\ge 1$ )	Can be a fractional number of bits
Optimality	Reaches the entropy bound only when all probabilities are powers of $\tfrac{1}{2}$	Approaches the entropy bound very closely for any probabilities
Adaptivity	Adaptive Huffman possible but rebuilds tree	Naturally supports adaptive/context models
Complexity	Simpler, faster	More computation (arithmetic precision/renormalization)

Key advantage: because Huffman wastes up to nearly 1 bit per symbol (rounding to whole bits), arithmetic coding achieves compression closer to the true entropy, especially when symbol probabilities are highly skewed (e.g., a symbol with probability > 0.5).
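The gap between Huffman coding and the entropy bound for a skewed source can be seen numerically. The sketch below builds Huffman codeword lengths with a heap; the three-symbol distribution `SKEWED` is an assumed example.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:            # merging adds one bit to every member
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


SKEWED = {"a": 0.9, "b": 0.05, "c": 0.05}
lengths = huffman_lengths(SKEWED)
average = sum(SKEWED[s] * lengths[s] for s in SKEWED)
bound = -sum(p * math.log2(p) for p in SKEWED.values())
print(lengths, average, bound)
```

Here Huffman spends about 1.1 bits/symbol against an entropy of roughly 0.57 bits/symbol, because the dominant symbol still costs a whole bit. Arithmetic coding is not bound by that one-bit floor.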

Answer 6

Motion Estimation and Motion Compensation

These are the core techniques used to remove temporal redundancy between successive frames in inter-frame video compression (MPEG, H.26x). Consecutive frames are usually very similar, so it is efficient to predict a frame from a previously coded reference frame.

Motion Estimation (ME)

The current frame is divided into macroblocks (e.g., 16×16 pixels).
For each block, the encoder searches a region of the reference frame to find the best-matching block.
The displacement between the current block and its best match is recorded as a motion vector (MV) $(dx, dy)$.
The match is found by minimizing a cost such as SAD (Sum of Absolute Differences) or MSE over a search window, using strategies like full search, three-step search or logarithmic search.

Motion Compensation (MC)

Using the motion vectors, the encoder predicts the current frame by fetching the matched blocks from the reference frame.
It then computes the residual (prediction error) = current block − predicted block.
Only the motion vectors plus the (small) residual are transmitted, instead of the full block. The residual is further compressed with DCT, quantization and entropy coding.

Result: Because the residual and motion vectors are much smaller than the raw frame, ME/MC give large compression gains. This is how MPEG produces P-frames (predicted from a past frame) and B-frames (predicted bidirectionally from past and future frames).
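The sketch below shows full-search block matching with SAD, followed by the residual computation. Frames are plain lists of rows, the 8×8 frame size and 2×2 block are toy values, and the motion vector is taken as (reference position minus current position); all names are illustrative.

```python
def make_frame(width, height, patch, px, py):
    """Black frame with a small patch pasted at column px, row py."""
    frame = [[0] * width for _ in range(height)]
    for j, row in enumerate(patch):
        for i, value in enumerate(row):
            frame[py + j][px + i] = value
    return frame


def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def full_search(cur, ref, cx, cy, size, search):
    """Try every displacement within +/- search; return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


def residual(cur, ref, cx, cy, dx, dy, size):
    """Prediction error: current block minus motion-compensated block."""
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(size)] for j in range(size)]


PATCH = [[10, 20], [30, 40]]
reference = make_frame(8, 8, PATCH, px=2, py=3)   # object in the earlier frame
current = make_frame(8, 8, PATCH, px=4, py=4)     # object has moved
cost, dx, dy = full_search(current, reference, cx=4, cy=4, size=2, search=2)
print(cost, (dx, dy))
print(residual(current, reference, 4, 4, dx, dy, 2))
```

Because the object only translated, the best match has zero cost and the residual block is all zeros, so only the motion vector needs to be sent for this block.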

Answer 7

MP3 Audio Compression and Psychoacoustic Models

MP3 (MPEG-1/2 Audio Layer III) is a popular lossy audio compression standard that reduces PCM audio to roughly one-tenth its size (e.g., 1.4 Mbps CD audio → 128 kbps) with little perceptible quality loss.

MP3 Encoding (brief)

The PCM signal is split into frames and passed through a filter bank / MDCT into frequency sub-bands.
A psychoacoustic model runs in parallel to decide the bit allocation per band.
Each band is quantized, using more bits for perceptually important bands and fewer (coarser) for masked bands.
The quantized values are losslessly packed with Huffman (entropy) coding into the bitstream.

Psychoacoustic Model

The psychoacoustic model captures the limits of human hearing so that inaudible information can be discarded:

Absolute threshold of hearing: sounds quieter than the ear can detect are removed.
Frequency (simultaneous) masking: a loud tone raises the hearing threshold around its frequency, hiding nearby softer tones, which can then be coded coarsely or dropped.
Temporal masking: sounds just before and after a strong sound are masked.

The encoder computes a masking threshold per band and keeps the quantization noise below it, so the discarded data and added noise remain inaudible — yielding high compression with good perceived quality.

Answer 8

Computer Animation

Computer animation is the technique of creating the illusion of motion by rapidly displaying a sequence of computer-generated still images (frames). When frames showing slightly different states of objects are shown fast enough (typically ≥ 24 frames/second), the human eye perceives continuous movement (persistence of vision).

Frame-based vs Key-frame Animation

Aspect	Frame-based animation	Key-frame animation
Definition	The animator/artist creates every individual frame of the sequence.	The animator defines only the important key frames (start/end of a motion); the system generates the frames in between by interpolation (tweening).
Effort	High — labour-intensive, every frame drawn.	Lower — only key frames specified; computer fills the rest.
Control	Total control over each frame.	Less manual control of intermediate frames (depends on interpolation).
Examples	Traditional flip-book / cel animation, GIF frame sequences.	Modern 3D/2D tools (Flash, Maya, Blender) using keyframes and tweening.

Summary: Frame-based animation specifies the content of every frame explicitly, whereas key-frame animation specifies only selected key states and lets the computer interpolate the in-between (tween) frames, saving effort and producing smooth motion.
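Tweening can be sketched as linear interpolation between two key frames. This example assumes a key frame is just an (x, y) position; real tools also interpolate rotation, scale and colour and often use easing curves instead of straight lines.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def tween(key_a, key_b, n_inbetween):
    """Return both key frames plus n_inbetween interpolated frames between them."""
    steps = n_inbetween + 1
    return [tuple(lerp(a, b, k / steps) for a, b in zip(key_a, key_b))
            for k in range(steps + 1)]


# Two key positions and three generated in-between frames.
for frame in tween((0, 0), (100, 50), 3):
    print(frame)
```

The animator supplies only the first and last positions; the three intermediate frames are computed.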

Answer 9

Hypermedia

Hypermedia is an extension of hypertext in which the linked nodes of information may contain any type of media — text, graphics, images, audio, video and animation — connected by hyperlinks that the user can navigate non-linearly. The World Wide Web is the most familiar example of a hypermedia system.

Hypertext vs Hypermedia

Aspect	Hypertext	Hypermedia
Content of nodes	Primarily text linked to other text.	Any media — text, image, audio, video, animation.
Scope	A subset / special case.	A superset that includes hypertext.
Links	Text-to-text links.	Links can originate from or point to multimedia objects.
Example	A purely text document with cross-reference links.	A web page mixing text, pictures, video and audio with clickable links.

Summary: Hypertext links text in a non-linear way; hypermedia generalizes this idea to link multiple media types. Thus all hypertext is hypermedia, but not all hypermedia is hypertext.

Answer 10

Multimedia Streaming

Multimedia streaming is the technique of transmitting audio/video over a network and playing it back continuously as it arrives, so the user does not have to download the whole file first. The client buffers a small amount of data and begins playback while the rest keeps downloading.

Types: On-demand streaming (stored media, e.g., YouTube/VOD) and live streaming (real-time events). Protocols include RTP/RTSP and modern HTTP adaptive streaming (HLS, MPEG-DASH).

Issues Involved

Bandwidth limitations: the network may not sustain the media's required bit rate, causing stalls; addressed by compression and adaptive bitrate streaming.
Delay (latency): end-to-end delay must be low, especially for live/interactive media.
Jitter: variation in packet arrival times disrupts smooth playback; handled with playback buffers / jitter buffers.
Packet loss / errors: over best-effort transports such as UDP, lost packets degrade quality; this calls for error concealment, FEC or retransmission.
Quality of Service (QoS): the Internet is best-effort, so guaranteeing bandwidth, bounded delay and loss is difficult.
Synchronization: keeping audio and video in step (lip-sync).
Scalability: serving many simultaneous clients (multicast, CDNs).
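The trade-off between jitter and playout delay can be illustrated with a toy playout buffer. The sketch assumes `arrival_ms[k]` is the arrival time of packet number k, that packets are generated every `interval_ms`, and that a packet arriving after its scheduled playout time is useless. The arrival times are invented for the example.

```python
def count_late_packets(arrival_ms, interval_ms, playout_delay_ms):
    """Count packets that miss their playout deadline.

    Packet k is played at: first arrival + playout delay + k * interval.
    """
    start = arrival_ms[0] + playout_delay_ms
    return sum(1 for k, t in enumerate(arrival_ms)
               if t > start + k * interval_ms)


# Packets sent every 20 ms; network jitter delays packet 3 noticeably.
arrivals = [0, 22, 45, 95, 83, 101]
print(count_late_packets(arrivals, 20, playout_delay_ms=20))
print(count_late_packets(arrivals, 20, playout_delay_ms=40))
```

A larger buffer absorbs the delayed packet at the cost of extra start-up latency, which is why live and interactive streams keep buffers small and stored-media streams can afford larger ones.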

Answer 11

Discrete Cosine Transform (DCT)

The DCT is a mathematical transform that converts a block of spatial-domain pixel values into a set of frequency-domain coefficients using cosine basis functions. It is the heart of JPEG and MPEG image/video compression, where it is applied to 8×8 blocks.

The 2-D forward DCT of an $N\times N$ block $f(x,y)$ is:

$$ F(u,v) = \frac{2}{N} C(u)C(v) \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\cos\!\Big[\frac{(2x+1)u\pi}{2N}\Big]\cos\!\Big[\frac{(2y+1)v\pi}{2N}\Big] $$

where $C(k)=\tfrac{1}{\sqrt{2}}$ for $k=0$ and $C(k)=1$ otherwise. The coefficient $F(0,0)$ is the DC (average) term; the rest are AC terms representing increasing spatial frequencies.
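A direct, unoptimized transcription of the forward DCT formula is shown below. It is meant only to mirror the equation term by term; practical codecs use fast factorized algorithms. The flat test block is an assumed example.

```python
import math


def dct_2d(block):
    """Forward 2-D DCT of an N x N block, following the formula literally."""
    n = len(block)

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = (2 / n) * c(u) * c(v) * total
    return out


# A perfectly flat 8x8 block: all energy should land in the DC coefficient.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))   # DC term
print(round(coeffs[0][1], 6))   # an AC term, expected to be ~0
```

For a constant block of value 100 the DC coefficient is $N \times 100 = 800$ and every AC coefficient vanishes (up to floating-point error).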

Importance in Image Compression

Energy compaction: the DCT concentrates most of a block's signal energy into a few low-frequency coefficients (top-left), leaving most high-frequency coefficients near zero.
Exploits human vision: the eye is less sensitive to high spatial frequencies, so high-frequency coefficients can be quantized coarsely or discarded with little perceived loss.
Enables high compression: after quantization, many coefficients become zero; zig-zag scanning + run-length + entropy (Huffman) coding then pack them very compactly.
It is reversible (an inverse DCT reconstructs the block), and the loss is introduced only at the quantization step, allowing a controllable quality/size trade-off.
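The quantization and zig-zag steps can be sketched as follows. To keep the example short it uses a 4×4 block and a single uniform step size of 10; both are assumptions for illustration, whereas JPEG works on 8×8 blocks with a per-coefficient quantization table.

```python
def zigzag_indices(n):
    """(row, col) visiting order of the zig-zag scan for an n x n block."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col of a diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals are walked bottom-left to top-right, odd ones reversed.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order


def quantize_block(coeffs, step):
    """Uniform quantization: divide by the step size and round."""
    return [[round(c / step) for c in row] for row in coeffs]


# Illustrative DCT coefficients: large values cluster in the top-left corner.
coeffs = [[80, 31, 4, 1],
          [-22, 9, 2, 0],
          [6, 3, 1, 0],
          [2, 1, 0, 0]]

quantized = quantize_block(coeffs, 10)
scanned = [quantized[r][c] for r, c in zigzag_indices(4)]
print(scanned)
```

After the scan the non-zero values sit at the front and the rest is one long run of zeros, which run-length and entropy coding can represent very compactly.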

Answer 12

Common Multimedia File Formats

JPEG (.jpg / .jpeg)

Joint Photographic Experts Group — a lossy image format.
Uses DCT-based compression with quantization and entropy coding; quality/size is adjustable.
Best for photographs / continuous-tone images with many colours; supports ~16.7 M colours but no transparency and is poor for sharp text/line art (blocking artifacts).

GIF (.gif)

Graphics Interchange Format — lossless but limited to a 256-colour (8-bit) palette; uses LZW compression.
Supports 1-bit transparency and simple animation (multiple frames).
Best for logos, icons, simple graphics and short animations; unsuitable for full-colour photos.

PNG (.png)

Portable Network Graphics — lossless raster format using DEFLATE compression.
Supports true colour (24-bit) and full 8-bit alpha transparency; no native animation (single image).
Best for graphics, screenshots, line art and images needing transparency; files larger than equivalent JPEG for photos.

MPEG (.mpg / .mp4 …)

Moving Picture Experts Group — a family of lossy audio/video compression standards (MPEG-1, -2, -4, etc.).
Uses DCT (spatial) plus motion estimation/compensation (temporal) with I-, P- and B-frames, and includes audio coding (e.g., MP3 = MPEG-1/2 Audio Layer III, AAC).
Used for video (DVD, digital TV, streaming, MP4 files); achieves high compression for moving pictures.

Summary: JPEG = lossy photos; GIF = lossless 256-colour + simple animation; PNG = lossless true-colour with transparency; MPEG = lossy video/audio using spatial + temporal compression.

Level	BSc CSIT (TU)
Stream	Science
Subject	Multimedia Computing (BSc CSIT, CSC467)
Year	2075 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions