BSc CSIT (TU) Science Artificial Intelligence (BSc CSIT, CSC261) Question Paper 2079 Nepal

Q: Where can I find the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) question paper 2079?

The full BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Artificial Intelligence (BSc CSIT, CSC261) 2079 paper come with solutions?

Yes. Every question on this Artificial Intelligence (BSc CSIT, CSC261) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 paper?

The BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Artificial Intelligence (BSc CSIT, CSC261) past paper free?

Yes — reading and attempting this Artificial Intelligence (BSc CSIT, CSC261) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What is machine learning? Differentiate between supervised, unsupervised, and reinforcement learning with examples and discuss the perceptron learning rule.

machine-learningperceptron

Answer 1

Machine Learning

Machine Learning (ML) is a subfield of Artificial Intelligence that gives computers the ability to learn patterns from data and improve their performance on a task with experience, without being explicitly programmed for every case. Formally (Tom Mitchell): a program learns from experience E with respect to task T and performance measure P, if its performance on T, as measured by P, improves with E.

Types of Machine Learning

Type	Training Data	Goal	Example
Supervised	Labeled (input + correct output)	Learn a mapping $x \to y$ to predict labels	Spam detection, house-price prediction (regression/classification)
Unsupervised	Unlabeled (input only)	Discover hidden structure/patterns	Customer segmentation by k-means clustering, PCA
Reinforcement	No fixed dataset; rewards from environment	Learn a policy that maximizes cumulative reward	Game playing (AlphaGo), robot navigation

Supervised learning learns from labeled examples. Each training instance is a pair $(x, y)$ and the model is trained to minimize the error between predicted $\hat{y}$ and true $y$ . Subtypes: classification (discrete labels) and regression (continuous output).
Unsupervised learning has no labels; the algorithm finds structure such as clusters or lower-dimensional representations.
Reinforcement learning has an agent that takes actions in an environment, receives rewards/penalties, and learns an optimal policy by trial and error (exploration vs. exploitation).

Perceptron Learning Rule

A perceptron is the simplest single-layer neural unit. It computes a weighted sum of inputs plus a bias and applies a step activation:

\hat{y} = f\!\left(\sum_{i=1}^{n} w_i x_i + b\right), \quad f(z)=\begin{cases}1 & z \ge 0\\ 0 & z < 0\end{cases}

The perceptron learning rule updates weights only when the prediction is wrong, moving the decision boundary toward correct classification:

w_i \leftarrow w_i + \eta\,(y - \hat{y})\,x_i, \qquad b \leftarrow b + \eta\,(y - \hat{y})

where $\eta$ is the learning rate, $y$ the target, and $\hat{y}$ the output.

Algorithm:

1. Initialize weights w and bias b (small/zero values).
2. For each training example (x, y):
     a. Compute output yhat = f(w·x + b).
     b. Update: w = w + η(y - yhat)x ;  b = b + η(y - yhat).
3. Repeat over epochs until no errors (or max epochs reached).

Perceptron Convergence Theorem: if the data is linearly separable, the rule is guaranteed to converge to a separating hyperplane in finite steps. Its key limitation is that a single perceptron cannot solve non-linearly-separable problems (e.g., XOR); multilayer networks are needed for those.

Answer 2

Reasoning Under Uncertainty

In the real world an agent rarely has complete, certain knowledge. Reasoning under uncertainty is the process of drawing rational conclusions when information is incomplete, noisy, or probabilistic. Classical (Boolean) logic fails here because it cannot express "probably" or "likely." Instead, AI uses probability theory to represent degrees of belief in propositions, allowing an agent to make the best decision given the available evidence. Sources of uncertainty include incomplete sensors, noisy data, and a non-deterministic environment.

Bayes' Theorem

Bayes' theorem relates a conditional probability to its inverse, letting us update beliefs when new evidence arrives:

P(H\mid E) = \frac{P(E\mid H)\,P(H)}{P(E)}

where:

$P(H)$ = prior probability of hypothesis $H$ ,
$P(E\mid H)$ = likelihood of evidence $E$ given $H$ ,
$P(E)$ = probability of the evidence (normalizing constant),
$P(H\mid E)$ = posterior probability of $H$ after observing $E$ .

Example: A disease affects 1% of people ( $P(D)=0.01$ ). A test is 99% accurate: $P(+\mid D)=0.99$ and $P(+\mid \neg D)=0.01$ . If a person tests positive:

P(D\mid +) = \frac{0.99\times0.01}{0.99\times0.01 + 0.01\times0.99} = \frac{0.0099}{0.0198} = 0.5

So even with a positive test, the probability of disease is only 50% — illustrating the importance of the prior.

Bayesian Belief Network (BBN)

A Bayesian Belief Network is a directed acyclic graph (DAG) that compactly represents the joint probability distribution over a set of random variables.

Structure:

Nodes = random variables.
Directed edges = direct probabilistic (causal) dependencies; an edge $A \to B$ means $A$ influences $B$ .
Each node stores a Conditional Probability Table (CPT) giving $P(\text{node}\mid \text{parents})$ .
A node is conditionally independent of its non-descendants given its parents.

The full joint distribution factorizes as:

P(X_1,\dots,X_n) = \prod_{i=1}^{n} P\big(X_i \mid \text{Parents}(X_i)\big)

Example (classic Burglary network): Variables Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M). Structure: $B \to A$ , $E \to A$ , $A \to J$ , $A \to M$ . The alarm depends on burglary and earthquake; whether John and Mary call depends on the alarm. Then

P(B,E,A,J,M) = P(B)\,P(E)\,P(A\mid B,E)\,P(J\mid A)\,P(M\mid A).

Use: Given evidence (e.g., John and Mary both call), the network performs probabilistic inference to compute the posterior, such as $P(\text{Burglary}\mid J=\text{true}, M=\text{true})$ . BBNs are used for diagnosis (medical), risk analysis, and decision support because they make inference tractable by exploiting conditional independence.

Answer 3

Natural Language Processing (NLP)

Natural Language Processing is a branch of AI concerned with enabling computers to understand, interpret, generate, and respond to human (natural) language — text or speech. It bridges human communication and machine understanding, combining linguistics with computational methods. Applications include machine translation, chatbots, sentiment analysis, speech recognition, and text summarization. NLP is usually divided into Natural Language Understanding (NLU) — extracting meaning from input — and Natural Language Generation (NLG) — producing language as output.

Phases of NLP

NLP processing typically proceeds through the following phases:

Lexical / Morphological Analysis — Breaks text into tokens (words) and analyzes word structure (roots, prefixes, suffixes). Example: unhappiness → un + happy + ness.
Syntactic Analysis (Parsing) — Checks grammatical structure and arranges words into a parse tree according to grammar rules. Example: rejects "Boy the go to" as ungrammatical.
Semantic Analysis — Derives the literal meaning of words and sentences, mapping syntactic structures to meaning and rejecting meaningless ones (e.g., "hot ice-cream").
Discourse Integration — Interprets a sentence in the context of preceding and following sentences (e.g., resolving the referent of "it" or "he").
Pragmatic Analysis — Interprets intended meaning based on real-world knowledge and context (e.g., "Can you pass the salt?" is a request, not a yes/no question).

Challenges in Natural Language Understanding

Ambiguity — the central problem:
- Lexical ambiguity: a word with multiple meanings (bank = riverbank / financial).
- Syntactic ambiguity: multiple parse trees ("I saw the man with the telescope").
- Semantic/referential ambiguity: unclear meaning or pronoun reference.
Context dependence — meaning changes with discourse and situation.
Idioms, metaphors, sarcasm and irony — non-literal language.
Synonymy and polysemy — many words for one concept and one word for many concepts.
Spelling errors, slang, code-mixing and informal grammar.
World knowledge requirement — understanding often needs common-sense knowledge not present in the text.

These challenges make robust language understanding one of the hardest problems in AI, motivating modern statistical and deep-learning approaches.

Answer 4

Rational agent: A rational agent is one that, for each possible percept sequence, selects the action that is expected to maximize its performance measure, given its built-in knowledge and the evidence provided by its percepts. In short, it "does the right thing" to achieve the best expected outcome.

Properties (dimensions) of a task environment — captured by the PEAS description and these classifications:

Fully observable vs. Partially observable — whether sensors give complete state information.
Deterministic vs. Stochastic — whether the next state is fully determined by the current state and action.
Episodic vs. Sequential — whether each decision is independent or affects future ones.
Static vs. Dynamic — whether the environment can change while the agent deliberates.
Discrete vs. Continuous — nature of states, time, percepts, and actions.
Single-agent vs. Multi-agent — whether other agents are involved (cooperative/competitive).
Known vs. Unknown — whether the agent knows the rules/outcomes of the environment.

Answer 5

Informed vs. Uninformed Search

Aspect	Uninformed (Blind) Search	Informed (Heuristic) Search
Domain knowledge	Uses no information beyond the problem definition	Uses problem-specific heuristic knowledge to guide search
Guidance	Explores blindly in a fixed order	Estimates how close a state is to the goal using $h(n)$
Efficiency	Generally less efficient; explores more nodes	More efficient; finds solutions faster with fewer expansions
Cost function	Considers only path cost so far ( $g(n)$ ), if any	Uses $h(n)$ (and often $f(n)=g(n)+h(n)$ )
Examples	BFS, DFS, Uniform-Cost Search, Depth-Limited, Iterative Deepening	Greedy Best-First Search, A* Search, Hill Climbing
Completeness/Optimality	Can be complete/optimal but expensive	Optimal/efficient when heuristic is admissible (e.g., A*)

Summary: Uninformed search has no clue about the goal's location and searches systematically but blindly, whereas informed search uses a heuristic estimate to move toward the goal more intelligently, reducing the search space.

Answer 6

Hill-Climbing Search

Hill-climbing is a local search and optimization algorithm that continually moves in the direction of increasing value (better heuristic) to reach a peak (goal/optimum). It is a greedy approach: at each step it examines the neighbors of the current state and moves to the neighbor with the best heuristic value, keeping no search tree.

Algorithm:

1. Start with an initial state (current).
2. Loop:
     a. Let neighbor = highest-valued successor of current.
     b. If value(neighbor) <= value(current): return current  (a peak).
     c. current = neighbor.

It is also called "greedy local search" because it picks the best immediate neighbor without looking ahead.

Problems (Drawbacks)

Local maximum — a state better than all its neighbors but lower than the global maximum; the algorithm stops there and never reaches the true goal.
Plateau (flat region) — a flat area where neighbors have equal values, giving no direction to proceed; the search wanders or halts.
Ridge — a sequence of local maxima where movement requires going through lower states; single-step moves cannot climb it, so progress stalls.

Remedies: random-restart hill climbing, simulated annealing, and stochastic/first-choice hill climbing help escape these traps.

Answer 7

Heuristic Function

A heuristic function, denoted $h(n)$ , is a function that estimates the cost of the cheapest path from a given node $n$ to the goal state. It provides problem-specific guidance to an informed search algorithm so it can prefer the most promising nodes and reach the goal faster, without exploring the entire search space.

$h(n) \ge 0$ , and $h(\text{goal}) = 0$ .
A heuristic is admissible if it never overestimates the true cost to the goal (this makes A* optimal).

Example — 8-puzzle: Two common heuristics estimate how far a state is from the solved configuration:

Misplaced tiles ( $h_1$ ): the number of tiles not in their goal position.
Manhattan distance ( $h_2$ ): the sum, over all tiles, of the horizontal + vertical distance of each tile from its goal position.

Example — route finding: The straight-line (Euclidean) distance from a city to the destination is an admissible heuristic for road navigation, since the actual road path can only be longer:

h(n) = \sqrt{(x_n - x_{goal})^2 + (y_n - y_{goal})^2}.

Answer 8

Forward and Backward Chaining

Both are inference methods used in rule-based / expert systems to derive conclusions from a knowledge base of facts and IF-THEN rules.

Forward Chaining (Data-Driven)

Starts from the known facts and applies inference rules to derive new facts, repeatedly, until the goal is reached or no new facts can be added. It works bottom-up, from data to conclusion.

Used when many possible conclusions follow from a set of facts.
Example: monitoring/control systems, expert systems that react to incoming data.

Backward Chaining (Goal-Driven)

Starts from the goal (hypothesis) and works backward, looking for rules whose conclusion is the goal, then trying to prove their premises as sub-goals, recursively, until they match known facts.

Used when there is a specific goal/query to verify.
Example: diagnostic systems, Prolog query resolution.

Comparison

Forward Chaining	Backward Chaining
Data-driven (facts → goal)	Goal-driven (goal → facts)
Bottom-up reasoning	Top-down reasoning
Derives all possible conclusions	Proves only the required goal
Can be slower if many irrelevant facts	More focused/efficient for a single query

Illustration: Rules: R1: A∧B → C, R2: C → D. Facts: A, B.

Forward: A,B ⇒ (R1) C ⇒ (R2) D.
Backward (prove D): D needs C (R2) → C needs A∧B (R1) → A,B are facts → D proved.

Answer 9

Propositional Logic vs. Predicate Logic

Aspect	Propositional Logic	Predicate (First-Order) Logic
Basic unit	Whole propositions (statements that are true/false)	Objects, predicates (properties/relations), and functions
Quantifiers	None	Uses $\forall$ (for all) and $\exists$ (there exists)
Expressive power	Limited; cannot express relations between objects	More powerful; expresses relations and generalizations
Variables	No variables	Uses variables and constants for objects
Structure	Atomic symbols (P, Q, R) joined by $\land,\lor,\neg,\to$	Predicates with arguments, e.g., $Likes(x,y)$
Example	$P$ : "Socrates is a man" (a single symbol)	$\forall x\,[Man(x) \to Mortal(x)]$

Explanation: Propositional logic treats each statement as an indivisible true/false symbol and cannot break a statement into its objects and relationships. For instance, "All men are mortal" must be a single symbol $P$ — we cannot reason about individual men.

Predicate logic extends propositional logic by introducing predicates, terms (objects/variables/functions), and quantifiers. This lets us express statements like $\forall x\,(Man(x) \to Mortal(x))$ and infer $Mortal(\text{Socrates})$ from $Man(\text{Socrates})$ . Thus predicate logic is strictly more expressive and is the basis for richer knowledge representation in AI.

Answer 10

Semantic Network

A semantic network is a graphical knowledge-representation scheme that represents knowledge as a network of nodes and labeled edges (arcs/links):

Nodes represent objects, concepts, or events.
Edges (links) represent the relationships between them.

It expresses knowledge in the form of interconnected concepts, making relationships explicit and supporting inheritance of properties.

Common link types

IS-A (subclass/instance): shows class hierarchy and supports property inheritance. e.g., Dog IS-A Mammal.
HAS-A / part-of: shows possession or composition. e.g., Dog HAS-A Tail.
Property links: attach attributes. e.g., Bird can Fly.

Example

Consider representing: "Tweety is a bird; a bird is an animal; a bird can fly."

   Animal
     ^ IS-A
   Bird ---- can ----> Fly
     ^ IS-A
  Tweety

Here Tweety → IS-A → Bird → IS-A → Animal, and Bird → can → Fly. By inheritance, since Tweety is a Bird, we infer Tweety is an Animal and can fly.

Advantage: intuitive, visual, supports inheritance. Limitation: lacks standard semantics for quantifiers and can become ambiguous for complex logic.

Answer 11

Frame-Based Knowledge Representation

A frame is a data structure (proposed by Marvin Minsky) for representing a stereotyped object, situation, or concept as a collection of attributes and their values. Frame-based representation organizes knowledge into frames that resemble records/objects, making it a structured and object-oriented scheme.

Structure of a frame

A frame consists of:

Frame name — the concept/object it represents.
Slots — the attributes/properties of the object.
Fillers (slot values) — the values of those slots; a value can be data, a default, a range, a pointer to another frame, or an attached procedure (procedural attachment/demon).

Frame: Car
  IS-A:        Vehicle        (inheritance from parent frame)
  Wheels:      4              (default value)
  Fuel:        Petrol
  Engine:      <pointer to Engine frame>
  Color:       Red
  Start:       <procedure to start the car>

Key features

Inheritance: a frame can inherit slots/values from a parent (super) frame via IS-A links, reducing redundancy (e.g., Car inherits properties of Vehicle).
Default values: slots can have defaults used when specific information is missing.
Procedural attachment: slots may hold procedures (if-needed / if-added demons) triggered when values are accessed or changed.
Slot–filler organization makes knowledge modular and easy to update.

Advantages: organized, supports inheritance and defaults, close to human concept structuring. Limitation: lacks formal logical inference and can be inconsistent without careful design.

Answer 12

Activation Function

An activation function is a mathematical function applied to the weighted sum of inputs of a neuron to produce its output. Its main role is to introduce non-linearity into a neural network, enabling it to learn and represent complex, non-linear relationships (without it, even a deep network would behave like a single linear model). It also bounds/transforms the neuron's output into a usable range.

For a neuron, output $= f\!\left(\sum_i w_i x_i + b\right)$ , where $f$ is the activation function.

Two common activation functions

Sigmoid:

\sigma(z) = \frac{1}{1 + e^{-z}}

Outputs values in the range $(0, 1)$ ; useful for probabilities/binary outputs.

ReLU (Rectified Linear Unit):

f(z) = \max(0, z)

Outputs $z$ if positive, else 0; widely used in deep networks for fast, sparse activation.

(Other examples: Step/Threshold, Tanh, Softmax.)

Level	BSc CSIT (TU)
Stream	Science
Subject	Artificial Intelligence (BSc CSIT, CSC261)
Year	2079 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions