Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

Explain knowledge representation using predicate logic. Convert given English sentences into First-Order Predicate Logic and explain the resolution method with an example.

Knowledge Representation using Predicate Logic

Knowledge representation (KR) is the way facts about the world are encoded in a form a computer can use to reason and derive new facts. Predicate (First-Order) Logic is a powerful KR scheme that extends propositional logic with predicates, variables, functions and quantifiers, letting us express relationships between objects and general statements.

Its building blocks are:

  • Constants (objects): John, Nepal
  • Variables: x, y
  • Predicates (relations): Likes(x, y), Man(x)
  • Functions: father(x)
  • Connectives: ,,¬,\land, \lor, \lnot, \rightarrow
  • Quantifiers: universal \forall and existential \exists

Converting English to First-Order Predicate Logic

English sentenceFOPL
All men are mortal.x[Man(x)Mortal(x)]\forall x\,[\,Man(x) \rightarrow Mortal(x)\,]
Socrates is a man.Man(Socrates)Man(Socrates)
Some students are intelligent.x[Student(x)Intelligent(x)]\exists x\,[\,Student(x) \land Intelligent(x)\,]
Everyone who loves all animals is loved by someone.x[(y(Animal(y)Loves(x,y)))zLoves(z,x)]\forall x\,[\,(\forall y\,(Animal(y)\rightarrow Loves(x,y))) \rightarrow \exists z\,Loves(z,x)\,]

Resolution Method

Resolution is a sound and complete inference rule used to prove a goal by refutation: assume the negation of the goal, add it to the knowledge base, and derive a contradiction (the empty clause \square).

Steps:

  1. Convert all sentences to Conjunctive Normal Form (CNF) — eliminate \rightarrow, move ¬\lnot inward, standardize variables, Skolemize existentials, drop \forall, distribute \lor over \land into clauses.
  2. Negate the goal and add it to the clause set.
  3. Repeatedly apply the resolution rule with unification: from (AP)(A \lor P) and (¬PB)(\lnot P' \lor B) where PP unifies with PP' under MGU θ\theta, derive the resolvent (AB)θ(A \lor B)\theta.
  4. If the empty clause is derived, the original goal is proven.

Example — Prove Mortal(Socrates):

Knowledge base in CNF:

  1. ¬Man(x)Mortal(x)\lnot Man(x) \lor Mortal(x) (from "All men are mortal")
  2. Man(Socrates)Man(Socrates)

Negated goal: 3. ¬Mortal(Socrates)\lnot Mortal(Socrates)

Resolution:

  • Resolve (1) and (3) with θ={x/Socrates}\theta = \{x/Socrates\} \Rightarrow ¬Man(Socrates)\lnot Man(Socrates) (clause 4)
  • Resolve (2) and (4) \Rightarrow \square (empty clause)

A contradiction is reached, so Mortal(Socrates)Mortal(Socrates) is proved.

knowledge-representationpredicate-logicresolution
2long10 marks

How do you relate the biological neuron (synapse, dendrite, axon) to an artificial neural network? Construct a multi-layer ANN and illustrate the back-propagation learning algorithm.

Biological Neuron vs Artificial Neuron

An artificial neural network (ANN) is loosely modelled on the biological neuron. The correspondence is:

Biological neuronArtificial neuron (ANN)
Dendrites receive signals from other neuronsInputs x1,x2,,xnx_1, x_2, \dots, x_n
Synapse strength controls signal influenceWeights w1,w2,,wnw_1, w_2, \dots, w_n
Cell body (soma) sums incoming signalsSummation net=iwixi+bnet = \sum_i w_i x_i + b
Axon transmits the output if the neuron firesActivation function output y=f(net)y = f(net)
Firing thresholdBias / threshold bb

Thus each artificial neuron computes y=f ⁣(iwixi+b)y = f\!\left(\sum_i w_i x_i + b\right), where ff is an activation function such as the sigmoid f(net)=11+enetf(net)=\dfrac{1}{1+e^{-net}}.

Multi-Layer ANN

A multi-layer perceptron (MLP) has:

  • an input layer (one node per feature),
  • one or more hidden layers of neurons,
  • an output layer.

Layers are fully connected feed-forward; signals flow input \rightarrow hidden \rightarrow output. (Diagram in words: inputs x1,x2x_1,x_2 feed every hidden neuron h1,h2h_1,h_2; the hidden neurons feed the output neuron o1o_1, each connection carrying a weight.)

Back-Propagation Learning Algorithm

Back-propagation trains the network by gradient descent, minimising the error E=12(tkok)2E = \frac{1}{2}\sum (t_k - o_k)^2 where tkt_k is the target and oko_k the actual output.

Steps:

  1. Initialize all weights to small random values.
  2. Forward pass: present an input, compute outputs layer by layer using y=f(net)y=f(net).
  3. Compute output error: for each output neuron kk,
δk=(tkok)f(netk)=(tkok)ok(1ok)\delta_k = (t_k - o_k)\,f'(net_k) = (t_k - o_k)\,o_k(1-o_k)
  1. Back-propagate error to hidden neurons: for each hidden neuron jj,
δj=f(netj)kδkwjk=oj(1oj)kδkwjk\delta_j = f'(net_j)\sum_k \delta_k w_{jk} = o_j(1-o_j)\sum_k \delta_k w_{jk}
  1. Update weights using learning rate η\eta:
wijwij+ηδjxiw_{ij} \leftarrow w_{ij} + \eta\,\delta_j\,x_i
  1. Repeat for all training examples over many epochs until the error is acceptably small.

This lets the error signal propagate backward from output to input, adjusting every weight in the direction that reduces total error.

neural-networkbackpropagation
3long10 marks

Explain adversarial search. Describe the Minimax algorithm and Alpha-Beta pruning with an example game tree.

Adversarial Search

Adversarial search is search in a competitive environment where two or more agents have opposing goals — typically two-player, zero-sum, perfect-information games (e.g. chess, tic-tac-toe). One player (MAX) tries to maximise the score while the opponent (MIN) tries to minimise it. The problem is modelled as a game tree: nodes are states, edges are moves, and terminal nodes have a utility value.

Minimax Algorithm

Minimax computes the optimal move assuming the opponent also plays optimally.

  • At MAX nodes, take the maximum of the children's values.
  • At MIN nodes, take the minimum of the children's values.
  • At terminal nodes, use the utility value.
function MINIMAX(node, isMax):
    if node is terminal: return utility(node)
    if isMax:
        best = -inf
        for child in node: best = max(best, MINIMAX(child, false))
        return best
    else:
        best = +inf
        for child in node: best = min(best, MINIMAX(child, true))
        return best

It performs a complete depth-first exploration; time complexity O(bm)O(b^m), space O(bm)O(bm), for branching factor bb and depth mm.

Alpha-Beta Pruning

Alpha-beta pruning returns the same result as minimax but prunes branches that cannot affect the final decision, using two bounds:

  • α\alpha = best (highest) value found so far for MAX,
  • β\beta = best (lowest) value found so far for MIN.

Prune (stop exploring) whenever αβ\alpha \ge \beta. With good move ordering it reduces complexity to about O(bm/2)O(b^{m/2}), roughly doubling the searchable depth.

Example Game Tree

MAX root with two MIN children. Leaf values left to right:

  • Left MIN node children: 3,12,83, 12, 8 \Rightarrow MIN value =3= 3.
  • Right MIN node children: 2,4,62, 4, 6.

At the root, α=3\alpha = 3 after the left subtree. Exploring the right MIN node, its first child is 22, so its value 2<α=3\le 2 < \alpha = 3. Since the root (MAX) already has 33, the remaining children 4,64, 6 of the right node need not be examined — they are pruned. Root value =3= 3, and MAX chooses the left branch.

adversarial-searchminimaxalpha-beta
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Explain the hill-climbing search algorithm and its problems.

Hill-Climbing Search

Hill climbing is a local search / optimization algorithm that starts from an arbitrary solution and iteratively moves to a neighbouring state with a higher (better) heuristic value, stopping when no neighbour is better. It is a greedy, memory-efficient method that keeps only the current state.

current = initial_state
loop:
    neighbor = highest-valued successor of current
    if value(neighbor) <= value(current): return current
    current = neighbor

Problems of Hill Climbing

  1. Local maximum: a peak that is higher than its neighbours but lower than the global maximum; the algorithm stops there.
  2. Plateau: a flat region where all neighbours have the same value, giving no direction to move.
  3. Ridge: a sequence of high points where each single move leads downhill, even though progress is possible along the ridge.

These can be reduced with random-restart hill climbing, simulated annealing, or stochastic variants.

searchhill-climbing
5short5 marks

What is a heuristic function? Give an example.

Heuristic Function

A heuristic function h(n)h(n) is a function that estimates the cost of the cheapest path from a node nn to the goal. It uses problem-specific knowledge to guide search toward the goal faster, without guaranteeing the estimate is exact. A heuristic is admissible if it never overestimates the true cost, which makes algorithms like A* optimal.

In A*, the evaluation function is:

f(n)=g(n)+h(n)f(n) = g(n) + h(n)

where g(n)g(n) is the actual cost from start to nn and h(n)h(n) is the heuristic estimate to the goal.

Example

In a route-finding problem (e.g. finding a road path between two cities), the straight-line (Euclidean) distance from the current city to the destination is an admissible heuristic, since the real road distance can never be shorter than the straight line. For the 8-puzzle, the number of misplaced tiles or the Manhattan distance of tiles from their goal positions are common heuristics.

heuristicsearch
6short5 marks

Explain forward and backward chaining in inference.

Forward and Backward Chaining

Both are inference techniques used in rule-based (production) systems with rules of the form IF condition THEN conclusion.

Forward Chaining (data-driven)

Starts from the known facts and repeatedly applies rules whose conditions are satisfied, adding new facts, until the goal is derived or no more rules fire. It works from data toward conclusions.

  • Example: Facts: Rainy. Rules: IF Rainy THEN Cloudy, IF Cloudy THEN CarryUmbrella. Forward chaining derives Cloudy, then CarryUmbrella.
  • Use: monitoring, expert systems that react to incoming data.

Backward Chaining (goal-driven)

Starts from the goal (hypothesis) and works backward, looking for rules whose conclusion is the goal, then trying to prove each of their conditions as sub-goals, recursively, until they reduce to known facts.

  • Example: Goal: CarryUmbrella? \rightarrow needs Cloudy \rightarrow needs Rainy \rightarrow Rainy is a known fact, so the goal succeeds.
  • Use: diagnostic and query systems (e.g. Prolog).
AspectForward chainingBackward chaining
DirectionFacts \rightarrow goalGoal \rightarrow facts
ApproachData-drivenGoal-driven
Best whenFew facts, many goalsSpecific goal to prove
inferencereasoning
7short5 marks

Differentiate between propositional logic and predicate logic.

Propositional Logic vs Predicate Logic

AspectPropositional LogicPredicate (First-Order) Logic
Basic unitA whole statement (proposition) that is true or false, e.g. PP = "It is raining".Objects, predicates and relations among them, e.g. Raining(today)Raining(today).
QuantifiersNone.Has \forall (universal) and \exists (existential).
Variables / objectsCannot represent individual objects or variables.Can represent objects, variables, functions.
Expressive powerLimited; cannot express general statements about all/some objects.Much richer; can express "All men are mortal".
ExamplePQP \rightarrow Qx[Man(x)Mortal(x)]\forall x\,[Man(x) \rightarrow Mortal(x)]
Structure inside statementTreats statements as indivisible atoms.Breaks statements into predicates + arguments.

Summary: Propositional logic deals only with whole true/false facts and connectives (,,¬,\land,\lor,\lnot,\rightarrow), whereas predicate logic adds quantifiers, predicates, variables and functions, making it far more expressive for representing general knowledge.

logicknowledge-representation
8short5 marks

What is a semantic network? Explain with an example.

Semantic Network

A semantic network is a graphical (network) knowledge-representation scheme in which knowledge is represented as a graph of nodes and labelled edges (links).

  • Nodes represent objects, concepts or events.
  • Links (arcs) represent relationships between them.

The two most important relationships are:

  • IS-A (subclass / class membership) — supports inheritance of properties.
  • HAS-A / part-of and other property links.

Properties defined for a general class are inherited by its members, which makes storage compact.

Example

Consider the facts: A Sparrow is a Bird; a Bird is an Animal; a Bird can Fly; a Sparrow has colour Brown.

In words, the network is:

Animal
  ^ IS-A
Bird ---- can ----> Fly
  ^ IS-A
Sparrow ---- colour ----> Brown

Here Sparrow --IS-A--> Bird --IS-A--> Animal. Because Bird can Fly, the node Sparrow inherits the property that it can fly without storing it explicitly. This inheritance and the clear visual structure are the main advantages of semantic networks.

knowledge-representationsemantic-network
9short5 marks

Explain the frame-based knowledge representation scheme.

Frame-Based Knowledge Representation

A frame is a data structure that represents a stereotyped object, situation or concept by grouping together all knowledge about it. It is an object-oriented style of KR proposed by Marvin Minsky.

A frame consists of:

  • A frame name (the concept it describes).
  • A set of slots (attributes / properties).
  • Slot values (fillers), which may be specific values, default values, or even procedures (demons) such as if-needed and if-added that compute values when required.

Frames are organised into a hierarchy linked by IS-A/instance relations, so a frame inherits slots and default values from its parent frames — reducing redundancy.

Example

Frame: Car
  IS-A:        Vehicle
  Wheels:      4            (default)
  Fuel:        Petrol       (default)
  Engine:      <required>

Frame: MyCar
  INSTANCE-OF: Car
  Owner:       Ram
  Colour:      Red
  Wheels:      (inherited = 4)

MyCar inherits Wheels = 4 and Fuel = Petrol from the Car frame. Frames thus combine declarative knowledge (slots) with procedural attachments and inheritance, making them suitable for representing structured, real-world objects.

knowledge-representationframes
10short5 marks

What is an activation function? List any two activation functions.

Activation Function

An activation function is a function applied to the weighted sum of a neuron's inputs (net=iwixi+bnet = \sum_i w_i x_i + b) to produce the neuron's output. It introduces non-linearity, allowing the neural network to learn complex, non-linear mappings; it also typically squashes the output into a bounded range and decides whether/how strongly the neuron "fires".

Two Common Activation Functions

  1. Sigmoid (logistic):   f(x)=11+ex  \;f(x) = \dfrac{1}{1 + e^{-x}}\; — outputs in (0,1)(0,1), smooth and differentiable.
  2. ReLU (Rectified Linear Unit):   f(x)=max(0,x)  \;f(x) = \max(0, x)\; — outputs 00 for negatives, xx otherwise; fast and avoids vanishing gradients.

(Other examples: tanh, step/threshold, softmax.)

neural-network
11short5 marks

Differentiate between supervised and unsupervised learning.

Supervised vs Unsupervised Learning

AspectSupervised LearningUnsupervised Learning
Training dataLabelled — each input has a known output/target.Unlabelled — only inputs, no target outputs.
GoalLearn a mapping input \rightarrow output to predict labels for new data.Discover hidden structure, patterns or groupings in data.
TasksClassification, regression.Clustering, association, dimensionality reduction.
ExamplesLinear/logistic regression, decision trees, SVM, k-NN, neural networks.k-means, hierarchical clustering, PCA, Apriori.
FeedbackGuided by the correct answers (error between prediction and label).No external feedback; relies on data similarity.
Use caseSpam detection, price prediction.Customer segmentation, anomaly detection.

Summary: Supervised learning trains on labelled examples to predict outputs, while unsupervised learning finds patterns in unlabelled data without predefined targets.

machine-learning
12short5 marks

Explain the concept of overfitting in machine learning.

Overfitting in Machine Learning

Overfitting occurs when a model learns the training data too well — capturing not only the underlying pattern but also the noise and random fluctuations — so it performs very well on training data but poorly on unseen (test) data. The model has low bias but high variance and fails to generalize.

Signs: training error is very low while validation/test error is high (a large gap between them).

Common causes: an overly complex model (too many parameters/features), too little training data, or training for too long.

Ways to reduce overfitting:

  • Use more training data.
  • Cross-validation to tune and validate.
  • Regularization (L1/L2) to penalize large weights.
  • Pruning (decision trees) or dropout / early stopping (neural networks).
  • Feature selection to reduce model complexity.

(The opposite problem, where a too-simple model cannot capture the pattern, is underfitting.)

machine-learning

Frequently asked questions

Where can I find the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) question paper 2075?
The full BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2075 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Artificial Intelligence (BSc CSIT, CSC261) 2075 paper come with solutions?
Yes. Every question on this Artificial Intelligence (BSc CSIT, CSC261) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2075 paper?
The BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Artificial Intelligence (BSc CSIT, CSC261) past paper free?
Yes — reading and attempting this Artificial Intelligence (BSc CSIT, CSC261) past paper on Kekkei is completely free.