Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

What is machine learning? Differentiate between supervised, unsupervised, and reinforcement learning with examples and discuss the perceptron learning rule.

Machine Learning

Machine Learning (ML) is a subfield of Artificial Intelligence that gives computers the ability to learn patterns from data and improve their performance on a task with experience, without being explicitly programmed for every case. Formally (Tom Mitchell): a program learns from experience E with respect to task T and performance measure P, if its performance on T, as measured by P, improves with E.

Types of Machine Learning

TypeTraining DataGoalExample
SupervisedLabeled (input + correct output)Learn a mapping xyx \to y to predict labelsSpam detection, house-price prediction (regression/classification)
UnsupervisedUnlabeled (input only)Discover hidden structure/patternsCustomer segmentation by k-means clustering, PCA
ReinforcementNo fixed dataset; rewards from environmentLearn a policy that maximizes cumulative rewardGame playing (AlphaGo), robot navigation
  • Supervised learning learns from labeled examples. Each training instance is a pair (x,y)(x, y) and the model is trained to minimize the error between predicted y^\hat{y} and true yy. Subtypes: classification (discrete labels) and regression (continuous output).
  • Unsupervised learning has no labels; the algorithm finds structure such as clusters or lower-dimensional representations.
  • Reinforcement learning has an agent that takes actions in an environment, receives rewards/penalties, and learns an optimal policy by trial and error (exploration vs. exploitation).

Perceptron Learning Rule

A perceptron is the simplest single-layer neural unit. It computes a weighted sum of inputs plus a bias and applies a step activation:

y^=f ⁣(i=1nwixi+b),f(z)={1z00z<0\hat{y} = f\!\left(\sum_{i=1}^{n} w_i x_i + b\right), \quad f(z)=\begin{cases}1 & z \ge 0\\ 0 & z < 0\end{cases}

The perceptron learning rule updates weights only when the prediction is wrong, moving the decision boundary toward correct classification:

wiwi+η(yy^)xi,bb+η(yy^)w_i \leftarrow w_i + \eta\,(y - \hat{y})\,x_i, \qquad b \leftarrow b + \eta\,(y - \hat{y})

where η\eta is the learning rate, yy the target, and y^\hat{y} the output.

Algorithm:

1. Initialize weights w and bias b (small/zero values).
2. For each training example (x, y):
     a. Compute output yhat = f(w·x + b).
     b. Update: w = w + η(y - yhat)x ;  b = b + η(y - yhat).
3. Repeat over epochs until no errors (or max epochs reached).

Perceptron Convergence Theorem: if the data is linearly separable, the rule is guaranteed to converge to a separating hyperplane in finite steps. Its key limitation is that a single perceptron cannot solve non-linearly-separable problems (e.g., XOR); multilayer networks are needed for those.

machine-learningperceptron
2long10 marks

Explain reasoning under uncertainty. Describe Bayes' theorem and the structure and use of a Bayesian Belief Network with an example.

Reasoning Under Uncertainty

In the real world an agent rarely has complete, certain knowledge. Reasoning under uncertainty is the process of drawing rational conclusions when information is incomplete, noisy, or probabilistic. Classical (Boolean) logic fails here because it cannot express "probably" or "likely." Instead, AI uses probability theory to represent degrees of belief in propositions, allowing an agent to make the best decision given the available evidence. Sources of uncertainty include incomplete sensors, noisy data, and a non-deterministic environment.

Bayes' Theorem

Bayes' theorem relates a conditional probability to its inverse, letting us update beliefs when new evidence arrives:

P(HE)=P(EH)P(H)P(E)P(H\mid E) = \frac{P(E\mid H)\,P(H)}{P(E)}

where:

  • P(H)P(H) = prior probability of hypothesis HH,
  • P(EH)P(E\mid H) = likelihood of evidence EE given HH,
  • P(E)P(E) = probability of the evidence (normalizing constant),
  • P(HE)P(H\mid E) = posterior probability of HH after observing EE.

Example: A disease affects 1% of people (P(D)=0.01P(D)=0.01). A test is 99% accurate: P(+D)=0.99P(+\mid D)=0.99 and P(+¬D)=0.01P(+\mid \neg D)=0.01. If a person tests positive:

P(D+)=0.99×0.010.99×0.01+0.01×0.99=0.00990.0198=0.5P(D\mid +) = \frac{0.99\times0.01}{0.99\times0.01 + 0.01\times0.99} = \frac{0.0099}{0.0198} = 0.5

So even with a positive test, the probability of disease is only 50% — illustrating the importance of the prior.

Bayesian Belief Network (BBN)

A Bayesian Belief Network is a directed acyclic graph (DAG) that compactly represents the joint probability distribution over a set of random variables.

Structure:

  • Nodes = random variables.
  • Directed edges = direct probabilistic (causal) dependencies; an edge ABA \to B means AA influences BB.
  • Each node stores a Conditional Probability Table (CPT) giving P(nodeparents)P(\text{node}\mid \text{parents}).
  • A node is conditionally independent of its non-descendants given its parents.

The full joint distribution factorizes as:

P(X1,,Xn)=i=1nP(XiParents(Xi))P(X_1,\dots,X_n) = \prod_{i=1}^{n} P\big(X_i \mid \text{Parents}(X_i)\big)

Example (classic Burglary network): Variables Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M). Structure: BAB \to A, EAE \to A, AJA \to J, AMA \to M. The alarm depends on burglary and earthquake; whether John and Mary call depends on the alarm. Then

P(B,E,A,J,M)=P(B)P(E)P(AB,E)P(JA)P(MA).P(B,E,A,J,M) = P(B)\,P(E)\,P(A\mid B,E)\,P(J\mid A)\,P(M\mid A).

Use: Given evidence (e.g., John and Mary both call), the network performs probabilistic inference to compute the posterior, such as P(BurglaryJ=true,M=true)P(\text{Burglary}\mid J=\text{true}, M=\text{true}). BBNs are used for diagnosis (medical), risk analysis, and decision support because they make inference tractable by exploiting conditional independence.

uncertaintybayesian-network
3long10 marks

Explain Natural Language Processing (NLP). Discuss the various phases of NLP and the challenges involved in natural language understanding.

Natural Language Processing (NLP)

Natural Language Processing is a branch of AI concerned with enabling computers to understand, interpret, generate, and respond to human (natural) language — text or speech. It bridges human communication and machine understanding, combining linguistics with computational methods. Applications include machine translation, chatbots, sentiment analysis, speech recognition, and text summarization. NLP is usually divided into Natural Language Understanding (NLU) — extracting meaning from input — and Natural Language Generation (NLG) — producing language as output.

Phases of NLP

NLP processing typically proceeds through the following phases:

  1. Lexical / Morphological Analysis — Breaks text into tokens (words) and analyzes word structure (roots, prefixes, suffixes). Example: unhappinessun + happy + ness.
  2. Syntactic Analysis (Parsing) — Checks grammatical structure and arranges words into a parse tree according to grammar rules. Example: rejects "Boy the go to" as ungrammatical.
  3. Semantic Analysis — Derives the literal meaning of words and sentences, mapping syntactic structures to meaning and rejecting meaningless ones (e.g., "hot ice-cream").
  4. Discourse Integration — Interprets a sentence in the context of preceding and following sentences (e.g., resolving the referent of "it" or "he").
  5. Pragmatic Analysis — Interprets intended meaning based on real-world knowledge and context (e.g., "Can you pass the salt?" is a request, not a yes/no question).

Challenges in Natural Language Understanding

  • Ambiguity — the central problem:
    • Lexical ambiguity: a word with multiple meanings (bank = riverbank / financial).
    • Syntactic ambiguity: multiple parse trees ("I saw the man with the telescope").
    • Semantic/referential ambiguity: unclear meaning or pronoun reference.
  • Context dependence — meaning changes with discourse and situation.
  • Idioms, metaphors, sarcasm and irony — non-literal language.
  • Synonymy and polysemy — many words for one concept and one word for many concepts.
  • Spelling errors, slang, code-mixing and informal grammar.
  • World knowledge requirement — understanding often needs common-sense knowledge not present in the text.

These challenges make robust language understanding one of the hardest problems in AI, motivating modern statistical and deep-learning approaches.

nlplanguage-processing
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Define a rational agent. List the properties of a task environment.

Rational agent: A rational agent is one that, for each possible percept sequence, selects the action that is expected to maximize its performance measure, given its built-in knowledge and the evidence provided by its percepts. In short, it "does the right thing" to achieve the best expected outcome.

Properties (dimensions) of a task environment — captured by the PEAS description and these classifications:

  • Fully observable vs. Partially observable — whether sensors give complete state information.
  • Deterministic vs. Stochastic — whether the next state is fully determined by the current state and action.
  • Episodic vs. Sequential — whether each decision is independent or affects future ones.
  • Static vs. Dynamic — whether the environment can change while the agent deliberates.
  • Discrete vs. Continuous — nature of states, time, percepts, and actions.
  • Single-agent vs. Multi-agent — whether other agents are involved (cooperative/competitive).
  • Known vs. Unknown — whether the agent knows the rules/outcomes of the environment.
intelligent-agents
5short5 marks

Differentiate between informed and uninformed search.

Informed vs. Uninformed Search

AspectUninformed (Blind) SearchInformed (Heuristic) Search
Domain knowledgeUses no information beyond the problem definitionUses problem-specific heuristic knowledge to guide search
GuidanceExplores blindly in a fixed orderEstimates how close a state is to the goal using h(n)h(n)
EfficiencyGenerally less efficient; explores more nodesMore efficient; finds solutions faster with fewer expansions
Cost functionConsiders only path cost so far (g(n)g(n)), if anyUses h(n)h(n) (and often f(n)=g(n)+h(n)f(n)=g(n)+h(n))
ExamplesBFS, DFS, Uniform-Cost Search, Depth-Limited, Iterative DeepeningGreedy Best-First Search, A* Search, Hill Climbing
Completeness/OptimalityCan be complete/optimal but expensiveOptimal/efficient when heuristic is admissible (e.g., A*)

Summary: Uninformed search has no clue about the goal's location and searches systematically but blindly, whereas informed search uses a heuristic estimate to move toward the goal more intelligently, reducing the search space.

search
6short5 marks

Explain the hill-climbing search algorithm and its problems.

Hill-Climbing Search

Hill-climbing is a local search and optimization algorithm that continually moves in the direction of increasing value (better heuristic) to reach a peak (goal/optimum). It is a greedy approach: at each step it examines the neighbors of the current state and moves to the neighbor with the best heuristic value, keeping no search tree.

Algorithm:

1. Start with an initial state (current).
2. Loop:
     a. Let neighbor = highest-valued successor of current.
     b. If value(neighbor) <= value(current): return current  (a peak).
     c. current = neighbor.

It is also called "greedy local search" because it picks the best immediate neighbor without looking ahead.

Problems (Drawbacks)

  1. Local maximum — a state better than all its neighbors but lower than the global maximum; the algorithm stops there and never reaches the true goal.
  2. Plateau (flat region) — a flat area where neighbors have equal values, giving no direction to proceed; the search wanders or halts.
  3. Ridge — a sequence of local maxima where movement requires going through lower states; single-step moves cannot climb it, so progress stalls.

Remedies: random-restart hill climbing, simulated annealing, and stochastic/first-choice hill climbing help escape these traps.

searchhill-climbing
7short5 marks

What is a heuristic function? Give an example.

Heuristic Function

A heuristic function, denoted h(n)h(n), is a function that estimates the cost of the cheapest path from a given node nn to the goal state. It provides problem-specific guidance to an informed search algorithm so it can prefer the most promising nodes and reach the goal faster, without exploring the entire search space.

  • h(n)0h(n) \ge 0, and h(goal)=0h(\text{goal}) = 0.
  • A heuristic is admissible if it never overestimates the true cost to the goal (this makes A* optimal).

Example — 8-puzzle: Two common heuristics estimate how far a state is from the solved configuration:

  • Misplaced tiles (h1h_1): the number of tiles not in their goal position.
  • Manhattan distance (h2h_2): the sum, over all tiles, of the horizontal + vertical distance of each tile from its goal position.

Example — route finding: The straight-line (Euclidean) distance from a city to the destination is an admissible heuristic for road navigation, since the actual road path can only be longer:

h(n)=(xnxgoal)2+(ynygoal)2.h(n) = \sqrt{(x_n - x_{goal})^2 + (y_n - y_{goal})^2}.
heuristicsearch
8short5 marks

Explain forward and backward chaining in inference.

Forward and Backward Chaining

Both are inference methods used in rule-based / expert systems to derive conclusions from a knowledge base of facts and IF-THEN rules.

Forward Chaining (Data-Driven)

Starts from the known facts and applies inference rules to derive new facts, repeatedly, until the goal is reached or no new facts can be added. It works bottom-up, from data to conclusion.

  • Used when many possible conclusions follow from a set of facts.
  • Example: monitoring/control systems, expert systems that react to incoming data.

Backward Chaining (Goal-Driven)

Starts from the goal (hypothesis) and works backward, looking for rules whose conclusion is the goal, then trying to prove their premises as sub-goals, recursively, until they match known facts.

  • Used when there is a specific goal/query to verify.
  • Example: diagnostic systems, Prolog query resolution.

Comparison

Forward ChainingBackward Chaining
Data-driven (facts → goal)Goal-driven (goal → facts)
Bottom-up reasoningTop-down reasoning
Derives all possible conclusionsProves only the required goal
Can be slower if many irrelevant factsMore focused/efficient for a single query

Illustration: Rules: R1: A∧B → C, R2: C → D. Facts: A, B.

  • Forward: A,B ⇒ (R1) C ⇒ (R2) D.
  • Backward (prove D): D needs C (R2) → C needs A∧B (R1) → A,B are facts → D proved.
inferencereasoning
9short5 marks

Differentiate between propositional logic and predicate logic.

Propositional Logic vs. Predicate Logic

AspectPropositional LogicPredicate (First-Order) Logic
Basic unitWhole propositions (statements that are true/false)Objects, predicates (properties/relations), and functions
QuantifiersNoneUses \forall (for all) and \exists (there exists)
Expressive powerLimited; cannot express relations between objectsMore powerful; expresses relations and generalizations
VariablesNo variablesUses variables and constants for objects
StructureAtomic symbols (P, Q, R) joined by ,,¬,\land,\lor,\neg,\toPredicates with arguments, e.g., Likes(x,y)Likes(x,y)
ExamplePP: "Socrates is a man" (a single symbol)x[Man(x)Mortal(x)]\forall x\,[Man(x) \to Mortal(x)]

Explanation: Propositional logic treats each statement as an indivisible true/false symbol and cannot break a statement into its objects and relationships. For instance, "All men are mortal" must be a single symbol PP — we cannot reason about individual men.

Predicate logic extends propositional logic by introducing predicates, terms (objects/variables/functions), and quantifiers. This lets us express statements like x(Man(x)Mortal(x))\forall x\,(Man(x) \to Mortal(x)) and infer Mortal(Socrates)Mortal(\text{Socrates}) from Man(Socrates)Man(\text{Socrates}). Thus predicate logic is strictly more expressive and is the basis for richer knowledge representation in AI.

logicknowledge-representation
10short5 marks

What is a semantic network? Explain with an example.

Semantic Network

A semantic network is a graphical knowledge-representation scheme that represents knowledge as a network of nodes and labeled edges (arcs/links):

  • Nodes represent objects, concepts, or events.
  • Edges (links) represent the relationships between them.

It expresses knowledge in the form of interconnected concepts, making relationships explicit and supporting inheritance of properties.

Common link types

  • IS-A (subclass/instance): shows class hierarchy and supports property inheritance. e.g., Dog IS-A Mammal.
  • HAS-A / part-of: shows possession or composition. e.g., Dog HAS-A Tail.
  • Property links: attach attributes. e.g., Bird can Fly.

Example

Consider representing: "Tweety is a bird; a bird is an animal; a bird can fly."

   Animal
     ^ IS-A
   Bird ---- can ----> Fly
     ^ IS-A
  Tweety

Here Tweety → IS-A → Bird → IS-A → Animal, and Bird → can → Fly. By inheritance, since Tweety is a Bird, we infer Tweety is an Animal and can fly.

Advantage: intuitive, visual, supports inheritance. Limitation: lacks standard semantics for quantifiers and can become ambiguous for complex logic.

knowledge-representationsemantic-network
11short5 marks

Explain the frame-based knowledge representation scheme.

Frame-Based Knowledge Representation

A frame is a data structure (proposed by Marvin Minsky) for representing a stereotyped object, situation, or concept as a collection of attributes and their values. Frame-based representation organizes knowledge into frames that resemble records/objects, making it a structured and object-oriented scheme.

Structure of a frame

A frame consists of:

  • Frame name — the concept/object it represents.
  • Slots — the attributes/properties of the object.
  • Fillers (slot values) — the values of those slots; a value can be data, a default, a range, a pointer to another frame, or an attached procedure (procedural attachment/demon).
Frame: Car
  IS-A:        Vehicle        (inheritance from parent frame)
  Wheels:      4              (default value)
  Fuel:        Petrol
  Engine:      <pointer to Engine frame>
  Color:       Red
  Start:       <procedure to start the car>

Key features

  • Inheritance: a frame can inherit slots/values from a parent (super) frame via IS-A links, reducing redundancy (e.g., Car inherits properties of Vehicle).
  • Default values: slots can have defaults used when specific information is missing.
  • Procedural attachment: slots may hold procedures (if-needed / if-added demons) triggered when values are accessed or changed.
  • Slot–filler organization makes knowledge modular and easy to update.

Advantages: organized, supports inheritance and defaults, close to human concept structuring. Limitation: lacks formal logical inference and can be inconsistent without careful design.

knowledge-representationframes
12short5 marks

What is an activation function? List any two activation functions.

Activation Function

An activation function is a mathematical function applied to the weighted sum of inputs of a neuron to produce its output. Its main role is to introduce non-linearity into a neural network, enabling it to learn and represent complex, non-linear relationships (without it, even a deep network would behave like a single linear model). It also bounds/transforms the neuron's output into a usable range.

For a neuron, output =f ⁣(iwixi+b)= f\!\left(\sum_i w_i x_i + b\right), where ff is the activation function.

Two common activation functions

  1. Sigmoid:
σ(z)=11+ez\sigma(z) = \frac{1}{1 + e^{-z}}

Outputs values in the range (0,1)(0, 1); useful for probabilities/binary outputs.

  1. ReLU (Rectified Linear Unit):
f(z)=max(0,z)f(z) = \max(0, z)

Outputs zz if positive, else 0; widely used in deep networks for fast, sparse activation.

(Other examples: Step/Threshold, Tanh, Softmax.)

neural-network

Frequently asked questions

Where can I find the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) question paper 2079?
The full BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Artificial Intelligence (BSc CSIT, CSC261) 2079 paper come with solutions?
Yes. Every question on this Artificial Intelligence (BSc CSIT, CSC261) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 paper?
The BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Artificial Intelligence (BSc CSIT, CSC261) past paper free?
Yes — reading and attempting this Artificial Intelligence (BSc CSIT, CSC261) past paper on Kekkei is completely free.