BSc CSIT (TU) Science Artificial Intelligence (BSc CSIT, CSC261) Question Paper 2075 Nepal

Q: Where can I find the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) question paper 2075?

The full BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2075 question paper (regular annual session) is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Artificial Intelligence (BSc CSIT, CSC261) 2075 paper come with solutions?

Yes. Every question on this Artificial Intelligence (BSc CSIT, CSC261) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2075 paper?

The BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Artificial Intelligence (BSc CSIT, CSC261) past paper free?

Yes — reading and attempting this Artificial Intelligence (BSc CSIT, CSC261) past paper on Kekkei is completely free.

Question

1. Long answer, 10 marks

Explain knowledge representation using predicate logic. Convert given English sentences into First-Order Predicate Logic and explain the resolution method with an example.

knowledge-representation, predicate-logic, resolution

Answer 1

Knowledge Representation using Predicate Logic

Knowledge representation (KR) is the way facts about the world are encoded in a form a computer can use to reason and derive new facts. Predicate (First-Order) Logic is a powerful KR scheme that extends propositional logic with predicates, variables, functions and quantifiers, letting us express relationships between objects and general statements.

Its building blocks are:

Constants (objects): John, Nepal
Variables: x, y
Predicates (relations): Likes(x, y), Man(x)
Functions: father(x)
Connectives: $\land, \lor, \lnot, \rightarrow$
Quantifiers: universal $\forall$ and existential $\exists$

Converting English to First-Order Predicate Logic

English sentence	FOPL
All men are mortal.	$\forall x\,[\,Man(x) \rightarrow Mortal(x)\,]$
Socrates is a man.	$Man(Socrates)$
Some students are intelligent.	$\exists x\,[\,Student(x) \land Intelligent(x)\,]$
Everyone who loves all animals is loved by someone.	$\forall x\,[\,(\forall y\,(Animal(y)\rightarrow Loves(x,y))) \rightarrow \exists z\,Loves(z,x)\,]$

Resolution Method

Resolution is a sound and refutation-complete inference rule. It proves a goal by refutation: assume the negation of the goal, add it to the knowledge base, and derive a contradiction (the empty clause $\square$).

Steps:

1. Convert all sentences to Conjunctive Normal Form (CNF): eliminate $\rightarrow$, move $\lnot$ inward, standardize variables, Skolemize existentials, drop $\forall$, and distribute $\lor$ over $\land$ to obtain clauses.
2. Negate the goal and add it to the clause set.
3. Repeatedly apply the resolution rule with unification: from $(A \lor P)$ and $(\lnot P' \lor B)$, where $P$ unifies with $P'$ under the most general unifier (MGU) $\theta$, derive the resolvent $(A \lor B)\theta$.
4. If the empty clause is derived, the original goal is proven.

Example — Prove Mortal(Socrates):

Knowledge base in CNF:

1. $\lnot Man(x) \lor Mortal(x)$ (from "All men are mortal")
2. $Man(Socrates)$

Negated goal:

3. $\lnot Mortal(Socrates)$

Resolution:

Resolve (1) and (3) with $\theta = \{x/Socrates\}$ $\Rightarrow$ $\lnot Man(Socrates)$ (clause 4)
Resolve (2) and (4) $\Rightarrow$ $\square$ (empty clause)

A contradiction is reached, so $Mortal(Socrates)$ is proved.
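The refutation above can be checked mechanically. The following is a minimal sketch, not a general theorem prover: instead of unification it grounds the variable `x` with the known constants and then saturates the ground clauses under resolution. The literal encoding `(sign, predicate, argument)` and the helper names `ground`, `resolve` and `refutes` are assumptions made for this illustration.

```python
from itertools import combinations

# A literal is (is_positive, predicate, argument); a clause is a set of literals.

def ground(clause, constants):
    """Replace the variable 'x' by every constant (grounding, not unification)."""
    if not any(arg == "x" for _, _, arg in clause):
        return {frozenset(clause)}
    return {frozenset((s, p, c if a == "x" else a) for s, p, a in clause)
            for c in constants}

def resolve(c1, c2):
    """Return every resolvent of two ground clauses."""
    out = set()
    for (s, p, a) in c1:
        if (not s, p, a) in c2:
            out.add(frozenset((c1 - {(s, p, a)}) | (c2 - {(not s, p, a)})))
    return out

def refutes(clauses):
    """Saturate under resolution; True if the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:              # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= clauses:             # nothing new: no refutation exists
            return False
        clauses |= new

kb = [
    {(False, "Man", "x"), (True, "Mortal", "x")},   # clause 1
    {(True, "Man", "Socrates")},                    # clause 2
    {(False, "Mortal", "Socrates")},                # clause 3 (negated goal)
]
ground_clauses = set().union(*(ground(c, ["Socrates"]) for c in kb))
proved = refutes(ground_clauses)
print(proved)
```

Dropping clause 3 from the set makes `refutes` return `False`, which matches the idea that the contradiction comes only from assuming the negated goal.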

Answer 2

Biological Neuron vs Artificial Neuron

An artificial neural network (ANN) is loosely modelled on the biological neuron. The correspondence is:

Biological neuron	Artificial neuron (ANN)
Dendrites receive signals from other neurons	Inputs $x_1, x_2, \dots, x_n$
Synapse strength controls signal influence	Weights $w_1, w_2, \dots, w_n$
Cell body (soma) sums incoming signals	Summation $net = \sum_i w_i x_i + b$
Axon transmits the output if the neuron fires	Activation function output $y = f(net)$
Firing threshold	Bias / threshold $b$

Thus each artificial neuron computes $y = f\!\left(\sum_i w_i x_i + b\right)$ , where $f$ is an activation function such as the sigmoid $f(net)=\dfrac{1}{1+e^{-net}}$ .
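As a small illustration of this formula, the sketch below computes the output of one sigmoid neuron. The input, weight and bias values are arbitrary numbers chosen for the example, and the function names are hypothetical.

```python
import math

def sigmoid(net):
    """Logistic activation f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)

# Example values (assumed): net = 0.5*1.0 + (-0.3)*0.0 - 0.5 = 0
y = neuron_output([1.0, 0.0], [0.5, -0.3], -0.5)
print(y)  # 0.5, since sigmoid(0) = 0.5
```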

Multi-Layer ANN

A multi-layer perceptron (MLP) has:

an input layer (one node per feature),
one or more hidden layers of neurons,
an output layer.

Layers are fully connected feed-forward; signals flow input $\rightarrow$ hidden $\rightarrow$ output. (Diagram in words: inputs $x_1,x_2$ feed every hidden neuron $h_1,h_2$ ; the hidden neurons feed the output neuron $o_1$ , each connection carrying a weight.)

Back-Propagation Learning Algorithm

Back-propagation trains the network by gradient descent, minimising the error $E = \frac{1}{2}\sum (t_k - o_k)^2$ where $t_k$ is the target and $o_k$ the actual output.

Steps:

1. Initialize all weights to small random values.
2. Forward pass: present an input and compute the outputs layer by layer using $y=f(net)$.
3. Compute the output error: for each output neuron $k$ (the second form holds for the sigmoid, whose derivative is $f'(net) = f(net)\,(1-f(net))$),

$$\delta_k = (t_k - o_k)\,f'(net_k) = (t_k - o_k)\,o_k(1-o_k)$$

4. Back-propagate the error to the hidden neurons: for each hidden neuron $j$ with output $o_j$,

$$\delta_j = f'(net_j)\sum_k \delta_k w_{jk} = o_j(1-o_j)\sum_k \delta_k w_{jk}$$

5. Update each weight using the learning rate $\eta$, where $x_i$ is the input carried by the connection from unit $i$ to unit $j$:

$$w_{ij} \leftarrow w_{ij} + \eta\,\delta_j\,x_i$$

6. Repeat for all training examples over many epochs until the error is acceptably small.

This lets the error signal propagate backward from output to input, adjusting every weight in the direction that reduces total error.
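The update rules can be tried on a tiny network. The sketch below assumes a 2-2-1 sigmoid network, a single training example, $\eta = 0.5$ and a fixed random seed; the bias is updated as if it were a weight on a constant input of 1, which is a common convention but is not stated in the steps above. All function and variable names are illustrative.

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, params):
    """Forward pass: one hidden layer, one output neuron."""
    w_hid, b_hid, w_out, b_out = params
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w_hid, b_hid)]
    o = sigmoid(sum(w * hj for w, hj in zip(w_out, h)) + b_out)
    return h, o

def train_step(x, t, params, eta=0.5):
    """One back-propagation update; returns new params and the error before it."""
    w_hid, b_hid, w_out, b_out = params
    h, o = forward(x, params)
    delta_o = (t - o) * o * (1 - o)                    # output delta
    delta_h = [hj * (1 - hj) * delta_o * w_out[j]      # hidden deltas use old w_out
               for j, hj in enumerate(h)]
    new_w_out = [w + eta * delta_o * hj for w, hj in zip(w_out, h)]
    new_b_out = b_out + eta * delta_o
    new_w_hid = [[w + eta * dj * xi for w, xi in zip(ws, x)]
                 for ws, dj in zip(w_hid, delta_h)]
    new_b_hid = [b + eta * dj for b, dj in zip(b_hid, delta_h)]
    return (new_w_hid, new_b_hid, new_w_out, new_b_out), 0.5 * (t - o) ** 2

random.seed(0)
params = ([[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)],
          [0.0, 0.0],
          [random.uniform(-0.5, 0.5) for _ in range(2)],
          0.0)
errors = []
for _ in range(200):
    params, err = train_step([1.0, 0.0], 1.0, params)
    errors.append(err)
print(errors[0], errors[-1])  # the error shrinks as training proceeds
```

Training on one example only shows the mechanics of the update; a realistic run would loop over a full training set for many epochs.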

Answer 3

Adversarial Search

Adversarial search is search in a competitive environment where two or more agents have opposing goals — typically two-player, zero-sum, perfect-information games (e.g. chess, tic-tac-toe). One player (MAX) tries to maximise the score while the opponent (MIN) tries to minimise it. The problem is modelled as a game tree: nodes are states, edges are moves, and terminal nodes have a utility value.

Minimax Algorithm

Minimax computes the optimal move assuming the opponent also plays optimally.

At MAX nodes, take the maximum of the children's values.
At MIN nodes, take the minimum of the children's values.
At terminal nodes, use the utility value.

function MINIMAX(node, isMax):
    if node is terminal: return utility(node)
    if isMax:
        best = -inf
        for child in node: best = max(best, MINIMAX(child, false))
        return best
    else:
        best = +inf
        for child in node: best = min(best, MINIMAX(child, true))
        return best

It performs a complete depth-first exploration; time complexity $O(b^m)$ , space $O(bm)$ , for branching factor $b$ and depth $m$ .
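A runnable version of the pseudocode can be sketched in Python. The tree encoding is an assumption made for this example: an inner node is a list of children and a leaf is a number holding its utility. The tree used is the two-level one from the Example Game Tree part of this answer.

```python
def minimax(node, is_max):
    """Return the minimax value of a node in a nested-list game tree."""
    if isinstance(node, (int, float)):      # terminal node: utility value
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# MAX root with two MIN children (leaf values from the example tree).
tree = [[3, 12, 8], [2, 4, 6]]
root_value = minimax(tree, True)
print(root_value)  # 3: the left MIN node yields 3, the right one yields 2
```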

Alpha-Beta Pruning

Alpha-beta pruning returns the same result as minimax but prunes branches that cannot affect the final decision, using two bounds:

$\alpha$ = best (highest) value found so far for MAX,
$\beta$ = best (lowest) value found so far for MIN.

Prune (stop exploring the remaining children of a node) whenever $\alpha \ge \beta$. With good move ordering, pruning reduces the time complexity to about $O(b^{m/2})$, which roughly doubles the depth that can be searched in the same time.

Example Game Tree

MAX root with two MIN children. Leaf values left to right:

Left MIN node children: $3, 12, 8$ $\Rightarrow$ MIN value $= 3$ .
Right MIN node children: $2, 4, 6$ .

At the root, $\alpha = 3$ after the left subtree. Exploring the right MIN node, its first child is $2$ , so its value $\le 2 < \alpha = 3$ . Since the root (MAX) already has $3$ , the remaining children $4, 6$ of the right node need not be examined — they are pruned. Root value $= 3$ , and MAX chooses the left branch.
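The pruning in this example can be reproduced with a short sketch. It assumes the same nested-list tree encoding (lists are inner nodes, numbers are leaves) and records every leaf it actually evaluates in a `visited` list, which is an addition made only to show which leaves are skipped.

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value with alpha-beta pruning; evaluated leaves go into visited."""
    if visited is None:
        visited = []
    if isinstance(node, (int, float)):
        visited.append(node)
        return node
    if is_max:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:           # MIN above will never allow this branch
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:               # MAX above already has something better
            break
    return best

visited = []
value = alphabeta([[3, 12, 8], [2, 4, 6]], True, visited=visited)
print(value, visited)  # 3 [3, 12, 8, 2]: leaves 4 and 6 are never evaluated
```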

Answer 4

Hill-Climbing Search

Hill climbing is a local search / optimization algorithm that starts from an arbitrary solution and iteratively moves to a neighbouring state with a higher (better) heuristic value, stopping when no neighbour is better. It is a greedy, memory-efficient method that keeps only the current state.

current = initial_state
loop:
    neighbor = highest-valued successor of current
    if value(neighbor) <= value(current): return current
    current = neighbor

Problems of Hill Climbing

Local maximum: a peak that is higher than its neighbours but lower than the global maximum; the algorithm stops there.
Plateau: a flat region where all neighbours have the same value, giving no direction to move.
Ridge: a sequence of high points where each single move leads downhill, even though progress is possible along the ridge.

These can be reduced with random-restart hill climbing, simulated annealing, or stochastic variants.
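The local-maximum problem is easy to see on a one-dimensional landscape. The sketch below is a steepest-ascent version of the loop above; the `landscape` list and the index-based neighbour function are made up for the illustration.

```python
def hill_climb(start, value, neighbours):
    """Move to the best neighbour until no neighbour is strictly better."""
    current = start
    while True:
        best = max(neighbours(current), key=value)
        if value(best) <= value(current):
            return current
        current = best

# Hypothetical landscape: a local peak (5 at index 2) and the global peak (9 at index 6).
landscape = [1, 3, 5, 4, 2, 6, 9, 7]

def value(i):
    return landscape[i]

def neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

stuck = hill_climb(0, value, neighbours)     # climbs 1 -> 3 -> 5 and stops
lucky = hill_climb(4, value, neighbours)     # climbs 2 -> 6 -> 9
print(stuck, landscape[stuck], lucky, landscape[lucky])  # 2 5 6 9
```

Starting from index 0 the search halts on the local peak, while a restart from index 4 reaches the global one, which is the intuition behind random-restart hill climbing.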

Answer 5

Heuristic Function

A heuristic function $h(n)$ estimates the cost of the cheapest path from a node $n$ to the goal. It uses problem-specific knowledge to guide the search toward the goal faster, without guaranteeing that the estimate is exact. A heuristic is admissible if it never overestimates the true cost; with an admissible heuristic, A* tree search is guaranteed to return an optimal solution.

In A*, the evaluation function is:

$$f(n) = g(n) + h(n)$$

where $g(n)$ is the actual cost from start to $n$ and $h(n)$ is the heuristic estimate to the goal.

Example

In a route-finding problem (e.g. finding a road path between two cities), the straight-line (Euclidean) distance from the current city to the destination is an admissible heuristic, since the real road distance can never be shorter than the straight line. For the 8-puzzle, the number of misplaced tiles or the Manhattan distance of tiles from their goal positions are common heuristics.
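The two 8-puzzle heuristics mentioned above can be computed as follows. This is a sketch under assumptions: a state is a 9-tuple read row by row, `0` stands for the blank, and the goal layout `GOAL` is one common convention rather than something fixed by the text.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal layout, 0 = blank

def misplaced_tiles(state, goal=GOAL):
    """Number of tiles (blank excluded) that are not in their goal position."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal=GOAL):
    """Sum of row and column distances of each tile from its goal position."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        gidx = goal.index(tile)
        total += abs(idx // 3 - gidx // 3) + abs(idx % 3 - gidx % 3)
    return total

# Two moves from the goal: slide 7 left, then 8 left.
state = (1, 2, 3,
         4, 5, 6,
         0, 7, 8)
print(misplaced_tiles(state), manhattan(state))  # 2 2
```

Both values equal the true remaining cost of 2 here, so neither overestimates it, which is consistent with admissibility. With $g(n) = 0$ at the start, A* would score this state as $f(n) = 0 + 2 = 2$.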

Answer 6

Forward and Backward Chaining

Both are inference techniques used in rule-based (production) systems with rules of the form IF condition THEN conclusion.

Forward Chaining (data-driven)

Starts from the known facts and repeatedly applies rules whose conditions are satisfied, adding new facts, until the goal is derived or no more rules fire. It works from data toward conclusions.

Example: Facts: Rainy. Rules: IF Rainy THEN Cloudy, IF Cloudy THEN CarryUmbrella. Forward chaining derives Cloudy, then CarryUmbrella.
Use: monitoring, expert systems that react to incoming data.

Backward Chaining (goal-driven)

Starts from the goal (hypothesis) and works backward, looking for rules whose conclusion is the goal, then trying to prove each of their conditions as sub-goals, recursively, until they reduce to known facts.

Example: Goal: CarryUmbrella? $\rightarrow$ needs Cloudy $\rightarrow$ needs Rainy $\rightarrow$ Rainy is a known fact, so the goal succeeds.
Use: diagnostic and query systems (e.g. Prolog).

Aspect	Forward chaining	Backward chaining
Direction	Facts $\rightarrow$ goal	Goal $\rightarrow$ facts
Approach	Data-driven	Goal-driven
Best when	Few facts, many goals	Specific goal to prove
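Both strategies can be sketched on the umbrella rules from the examples. The rule encoding, a pair of a premise set and a conclusion, and the function names are assumptions for this illustration.

```python
RULES = [
    ({"Rainy"}, "Cloudy"),            # IF Rainy THEN Cloudy
    ({"Cloudy"}, "CarryUmbrella"),    # IF Cloudy THEN CarryUmbrella
]

def forward_chain(facts, rules):
    """Data-driven: fire rules until no new fact can be added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal, facts, rules, seen=frozenset()):
    """Goal-driven: prove the goal by proving the premises of a matching rule."""
    if goal in facts:
        return True
    if goal in seen:                  # avoid looping on circular rules
        return False
    return any(
        all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

derived = forward_chain({"Rainy"}, RULES)
proved = backward_chain("CarryUmbrella", {"Rainy"}, RULES)
print(sorted(derived), proved)
```

Forward chaining derives every reachable fact, whereas backward chaining only touches the rules relevant to `CarryUmbrella`.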

Answer 7

Propositional Logic vs Predicate Logic

Aspect	Propositional Logic	Predicate (First-Order) Logic
Basic unit	A whole statement (proposition) that is true or false, e.g. $P$ = "It is raining".	Objects, predicates and relations among them, e.g. $Raining(today)$ .
Quantifiers	None.	Has $\forall$ (universal) and $\exists$ (existential).
Variables / objects	Cannot represent individual objects or variables.	Can represent objects, variables, functions.
Expressive power	Limited; cannot express general statements about all/some objects.	Much richer; can express "All men are mortal".
Example	$P \rightarrow Q$	$\forall x\,[Man(x) \rightarrow Mortal(x)]$
Structure inside statement	Treats statements as indivisible atoms.	Breaks statements into predicates + arguments.

Summary: Propositional logic deals only with whole true/false facts and connectives ( $\land,\lor,\lnot,\rightarrow$ ), whereas predicate logic adds quantifiers, predicates, variables and functions, making it far more expressive for representing general knowledge.

Answer 8

Semantic Network

A semantic network is a graphical (network) knowledge-representation scheme in which knowledge is represented as a graph of nodes and labelled edges (links).

Nodes represent objects, concepts or events.
Links (arcs) represent relationships between them.

The two most important relationships are:

IS-A (subclass / class membership) — supports inheritance of properties.
HAS-A / part-of and other property links.

Properties defined for a general class are inherited by its members, which makes storage compact.

Example

Consider the facts: A Sparrow is a Bird; a Bird is an Animal; a Bird can Fly; a Sparrow has colour Brown.

In words, the network is:

Animal
  ^ IS-A
Bird ---- can ----> Fly
  ^ IS-A
Sparrow ---- colour ----> Brown

Here Sparrow --IS-A--> Bird --IS-A--> Animal. Because Bird can Fly, the node Sparrow inherits the property that it can fly without storing it explicitly. This inheritance and the clear visual structure are the main advantages of semantic networks.
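Inheritance along IS-A links can be sketched with a dictionary of labelled edges. The encoding, one target per (node, relation) pair, and the `lookup` helper are simplifying assumptions made for this example.

```python
# (node, relation) -> target node, taken from the facts in the example.
EDGES = {
    ("Sparrow", "IS-A"): "Bird",
    ("Bird", "IS-A"): "Animal",
    ("Bird", "can"): "Fly",
    ("Sparrow", "colour"): "Brown",
}

def lookup(node, relation, edges=EDGES):
    """Find a property on the node itself or inherit it via IS-A links."""
    while node is not None:
        if (node, relation) in edges:
            return edges[(node, relation)]
        node = edges.get((node, "IS-A"))   # climb to the parent class, if any
    return None

print(lookup("Sparrow", "can"))      # Fly (inherited from Bird)
print(lookup("Sparrow", "colour"))   # Brown (stored on Sparrow itself)
print(lookup("Animal", "can"))       # None (inheritance does not flow downward)
```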

Answer 9

Frame-Based Knowledge Representation

A frame is a data structure that represents a stereotyped object, situation or concept by grouping together all knowledge about it. It is an object-oriented style of KR proposed by Marvin Minsky.

A frame consists of:

A frame name (the concept it describes).
A set of slots (attributes / properties).
Slot values (fillers), which may be specific values, default values, or even procedures (demons) such as if-needed and if-added that compute values when required.

Frames are organised into a hierarchy linked by IS-A/instance relations, so a frame inherits slots and default values from its parent frames — reducing redundancy.

Example

Frame: Car
  IS-A:        Vehicle
  Wheels:      4            (default)
  Fuel:        Petrol       (default)
  Engine:      <required>

Frame: MyCar
  INSTANCE-OF: Car
  Owner:       Ram
  Colour:      Red
  Wheels:      (inherited = 4)

MyCar inherits Wheels = 4 and Fuel = Petrol from the Car frame. Frames thus combine declarative knowledge (slots) with procedural attachments and inheritance, making them suitable for representing structured, real-world objects.
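Slot inheritance for these frames can be sketched with nested dictionaries. The empty `Vehicle` frame and the `get_slot` helper are assumptions added for the illustration; procedural attachments (demons) are left out to keep the sketch short.

```python
FRAMES = {
    "Vehicle": {},                                        # parent frame, contents assumed
    "Car": {"IS-A": "Vehicle", "Wheels": 4, "Fuel": "Petrol"},
    "MyCar": {"INSTANCE-OF": "Car", "Owner": "Ram", "Colour": "Red"},
}

def get_slot(frame_name, slot, frames=FRAMES):
    """Return a slot value, falling back to parent frames for defaults."""
    while frame_name is not None:
        frame = frames.get(frame_name, {})
        if slot in frame:
            return frame[slot]
        # Follow the instance link first, then the class link.
        frame_name = frame.get("INSTANCE-OF") or frame.get("IS-A")
    return None

print(get_slot("MyCar", "Owner"))    # Ram (own slot)
print(get_slot("MyCar", "Wheels"))   # 4 (default inherited from Car)
print(get_slot("MyCar", "Fuel"))     # Petrol (default inherited from Car)
```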

Answer 10

Activation Function

An activation function is a function applied to the weighted sum of a neuron's inputs ($net = \sum_i w_i x_i + b$) to produce the neuron's output. It introduces non-linearity, allowing the neural network to learn complex, non-linear mappings. Some activation functions (such as sigmoid and tanh) also squash the output into a bounded range, and the function determines whether and how strongly the neuron "fires".

Two Common Activation Functions

Sigmoid (logistic): $\;f(x) = \dfrac{1}{1 + e^{-x}}\;$, which outputs values in $(0,1)$ and is smooth and differentiable.
ReLU (Rectified Linear Unit): $\;f(x) = \max(0, x)\;$, which outputs $0$ for negative inputs and $x$ otherwise; it is cheap to compute and mitigates the vanishing-gradient problem because its gradient is $1$ for positive inputs.

(Other examples: tanh, step/threshold, softmax.)
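For reference, the two functions can be written in a few lines of Python. The derivative helper is an addition for illustration; it shows why the sigmoid gradient is at most 0.25.

```python
import math

def sigmoid(x):
    """Logistic function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """f'(x) = f(x) * (1 - f(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)

print(sigmoid(0.0), sigmoid_derivative(0.0))  # 0.5 0.25
print(relu(-2.0), relu(3.0))                  # 0.0 3.0
```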

Answer 11

Supervised vs Unsupervised Learning

Aspect	Supervised Learning	Unsupervised Learning
Training data	Labelled — each input has a known output/target.	Unlabelled — only inputs, no target outputs.
Goal	Learn a mapping input $\rightarrow$ output to predict labels for new data.	Discover hidden structure, patterns or groupings in data.
Tasks	Classification, regression.	Clustering, association, dimensionality reduction.
Examples	Linear/logistic regression, decision trees, SVM, k-NN, neural networks.	k-means, hierarchical clustering, PCA, Apriori.
Feedback	Guided by the correct answers (error between prediction and label).	No external feedback; relies on data similarity.
Use case	Spam detection, price prediction.	Customer segmentation, anomaly detection.

Summary: Supervised learning trains on labelled examples to predict outputs, while unsupervised learning finds patterns in unlabelled data without predefined targets.

Answer 12

Overfitting in Machine Learning

Overfitting occurs when a model learns the training data too well — capturing not only the underlying pattern but also the noise and random fluctuations — so it performs very well on training data but poorly on unseen (test) data. The model has low bias but high variance and fails to generalize.

Signs: training error is very low while validation/test error is high (a large gap between them).

Common causes: an overly complex model (too many parameters/features), too little training data, or training for too long.

Ways to reduce overfitting:

Use more training data.
Cross-validation to tune and validate.
Regularization (L1/L2) to penalize large weights.
Pruning (decision trees) or dropout / early stopping (neural networks).
Feature selection to reduce model complexity.

(The opposite problem, where a too-simple model cannot capture the pattern, is underfitting.)

Level	BSc CSIT (TU)
Stream	Science
Subject	Artificial Intelligence (BSc CSIT, CSC261)
Year	2075 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

