BSc CSIT (TU) Science Artificial Intelligence (BSc CSIT, CSC261) Question Paper 2079 Nepal
This is the official BSc CSIT (TU) (Science stream) Artificial Intelligence (BSc CSIT, CSC261) question paper for 2079, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Artificial Intelligence (BSc CSIT, CSC261) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) exam or solving previous years' question papers, this 2079 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
What is machine learning? Differentiate between supervised, unsupervised, and reinforcement learning with examples and discuss the perceptron learning rule.
Machine Learning
Machine Learning (ML) is a subfield of Artificial Intelligence that gives computers the ability to learn patterns from data and improve their performance on a task with experience, without being explicitly programmed for every case. Formally (Tom Mitchell): a program learns from experience E with respect to task T and performance measure P, if its performance on T, as measured by P, improves with E.
Types of Machine Learning
| Type | Training Data | Goal | Example |
|---|---|---|---|
| Supervised | Labeled (input + correct output) | Learn a mapping to predict labels | Spam detection, house-price prediction (regression/classification) |
| Unsupervised | Unlabeled (input only) | Discover hidden structure/patterns | Customer segmentation by k-means clustering, PCA |
| Reinforcement | No fixed dataset; rewards from environment | Learn a policy that maximizes cumulative reward | Game playing (AlphaGo), robot navigation |
- Supervised learning learns from labeled examples. Each training instance is a pair and the model is trained to minimize the error between predicted and true . Subtypes: classification (discrete labels) and regression (continuous output).
- Unsupervised learning has no labels; the algorithm finds structure such as clusters or lower-dimensional representations.
- Reinforcement learning has an agent that takes actions in an environment, receives rewards/penalties, and learns an optimal policy by trial and error (exploration vs. exploitation).
Perceptron Learning Rule
A perceptron is the simplest single-layer neural unit. It computes a weighted sum of inputs plus a bias and applies a step activation:
The perceptron learning rule updates weights only when the prediction is wrong, moving the decision boundary toward correct classification:
where is the learning rate, the target, and the output.
Algorithm:
1. Initialize weights w and bias b (small/zero values).
2. For each training example (x, y):
a. Compute output yhat = f(w·x + b).
b. Update: w = w + η(y - yhat)x ; b = b + η(y - yhat).
3. Repeat over epochs until no errors (or max epochs reached).
Perceptron Convergence Theorem: if the data is linearly separable, the rule is guaranteed to converge to a separating hyperplane in finite steps. Its key limitation is that a single perceptron cannot solve non-linearly-separable problems (e.g., XOR); multilayer networks are needed for those.
Explain reasoning under uncertainty. Describe Bayes' theorem and the structure and use of a Bayesian Belief Network with an example.
Reasoning Under Uncertainty
In the real world an agent rarely has complete, certain knowledge. Reasoning under uncertainty is the process of drawing rational conclusions when information is incomplete, noisy, or probabilistic. Classical (Boolean) logic fails here because it cannot express "probably" or "likely." Instead, AI uses probability theory to represent degrees of belief in propositions, allowing an agent to make the best decision given the available evidence. Sources of uncertainty include incomplete sensors, noisy data, and a non-deterministic environment.
Bayes' Theorem
Bayes' theorem relates a conditional probability to its inverse, letting us update beliefs when new evidence arrives:
where:
- = prior probability of hypothesis ,
- = likelihood of evidence given ,
- = probability of the evidence (normalizing constant),
- = posterior probability of after observing .
Example: A disease affects 1% of people (). A test is 99% accurate: and . If a person tests positive:
So even with a positive test, the probability of disease is only 50% — illustrating the importance of the prior.
Bayesian Belief Network (BBN)
A Bayesian Belief Network is a directed acyclic graph (DAG) that compactly represents the joint probability distribution over a set of random variables.
Structure:
- Nodes = random variables.
- Directed edges = direct probabilistic (causal) dependencies; an edge means influences .
- Each node stores a Conditional Probability Table (CPT) giving .
- A node is conditionally independent of its non-descendants given its parents.
The full joint distribution factorizes as:
Example (classic Burglary network): Variables Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M). Structure: , , , . The alarm depends on burglary and earthquake; whether John and Mary call depends on the alarm. Then
Use: Given evidence (e.g., John and Mary both call), the network performs probabilistic inference to compute the posterior, such as . BBNs are used for diagnosis (medical), risk analysis, and decision support because they make inference tractable by exploiting conditional independence.
Explain Natural Language Processing (NLP). Discuss the various phases of NLP and the challenges involved in natural language understanding.
Natural Language Processing (NLP)
Natural Language Processing is a branch of AI concerned with enabling computers to understand, interpret, generate, and respond to human (natural) language — text or speech. It bridges human communication and machine understanding, combining linguistics with computational methods. Applications include machine translation, chatbots, sentiment analysis, speech recognition, and text summarization. NLP is usually divided into Natural Language Understanding (NLU) — extracting meaning from input — and Natural Language Generation (NLG) — producing language as output.
Phases of NLP
NLP processing typically proceeds through the following phases:
- Lexical / Morphological Analysis — Breaks text into tokens (words) and analyzes word structure (roots, prefixes, suffixes). Example: unhappiness → un + happy + ness.
- Syntactic Analysis (Parsing) — Checks grammatical structure and arranges words into a parse tree according to grammar rules. Example: rejects "Boy the go to" as ungrammatical.
- Semantic Analysis — Derives the literal meaning of words and sentences, mapping syntactic structures to meaning and rejecting meaningless ones (e.g., "hot ice-cream").
- Discourse Integration — Interprets a sentence in the context of preceding and following sentences (e.g., resolving the referent of "it" or "he").
- Pragmatic Analysis — Interprets intended meaning based on real-world knowledge and context (e.g., "Can you pass the salt?" is a request, not a yes/no question).
Challenges in Natural Language Understanding
- Ambiguity — the central problem:
- Lexical ambiguity: a word with multiple meanings (bank = riverbank / financial).
- Syntactic ambiguity: multiple parse trees ("I saw the man with the telescope").
- Semantic/referential ambiguity: unclear meaning or pronoun reference.
- Context dependence — meaning changes with discourse and situation.
- Idioms, metaphors, sarcasm and irony — non-literal language.
- Synonymy and polysemy — many words for one concept and one word for many concepts.
- Spelling errors, slang, code-mixing and informal grammar.
- World knowledge requirement — understanding often needs common-sense knowledge not present in the text.
These challenges make robust language understanding one of the hardest problems in AI, motivating modern statistical and deep-learning approaches.
Section B: Short Answer Questions
Attempt any EIGHT questions.
Define a rational agent. List the properties of a task environment.
Rational agent: A rational agent is one that, for each possible percept sequence, selects the action that is expected to maximize its performance measure, given its built-in knowledge and the evidence provided by its percepts. In short, it "does the right thing" to achieve the best expected outcome.
Properties (dimensions) of a task environment — captured by the PEAS description and these classifications:
- Fully observable vs. Partially observable — whether sensors give complete state information.
- Deterministic vs. Stochastic — whether the next state is fully determined by the current state and action.
- Episodic vs. Sequential — whether each decision is independent or affects future ones.
- Static vs. Dynamic — whether the environment can change while the agent deliberates.
- Discrete vs. Continuous — nature of states, time, percepts, and actions.
- Single-agent vs. Multi-agent — whether other agents are involved (cooperative/competitive).
- Known vs. Unknown — whether the agent knows the rules/outcomes of the environment.
Differentiate between informed and uninformed search.
Informed vs. Uninformed Search
| Aspect | Uninformed (Blind) Search | Informed (Heuristic) Search |
|---|---|---|
| Domain knowledge | Uses no information beyond the problem definition | Uses problem-specific heuristic knowledge to guide search |
| Guidance | Explores blindly in a fixed order | Estimates how close a state is to the goal using |
| Efficiency | Generally less efficient; explores more nodes | More efficient; finds solutions faster with fewer expansions |
| Cost function | Considers only path cost so far (), if any | Uses (and often ) |
| Examples | BFS, DFS, Uniform-Cost Search, Depth-Limited, Iterative Deepening | Greedy Best-First Search, A* Search, Hill Climbing |
| Completeness/Optimality | Can be complete/optimal but expensive | Optimal/efficient when heuristic is admissible (e.g., A*) |
Summary: Uninformed search has no clue about the goal's location and searches systematically but blindly, whereas informed search uses a heuristic estimate to move toward the goal more intelligently, reducing the search space.
Explain the hill-climbing search algorithm and its problems.
Hill-Climbing Search
Hill-climbing is a local search and optimization algorithm that continually moves in the direction of increasing value (better heuristic) to reach a peak (goal/optimum). It is a greedy approach: at each step it examines the neighbors of the current state and moves to the neighbor with the best heuristic value, keeping no search tree.
Algorithm:
1. Start with an initial state (current).
2. Loop:
a. Let neighbor = highest-valued successor of current.
b. If value(neighbor) <= value(current): return current (a peak).
c. current = neighbor.
It is also called "greedy local search" because it picks the best immediate neighbor without looking ahead.
Problems (Drawbacks)
- Local maximum — a state better than all its neighbors but lower than the global maximum; the algorithm stops there and never reaches the true goal.
- Plateau (flat region) — a flat area where neighbors have equal values, giving no direction to proceed; the search wanders or halts.
- Ridge — a sequence of local maxima where movement requires going through lower states; single-step moves cannot climb it, so progress stalls.
Remedies: random-restart hill climbing, simulated annealing, and stochastic/first-choice hill climbing help escape these traps.
What is a heuristic function? Give an example.
Heuristic Function
A heuristic function, denoted , is a function that estimates the cost of the cheapest path from a given node to the goal state. It provides problem-specific guidance to an informed search algorithm so it can prefer the most promising nodes and reach the goal faster, without exploring the entire search space.
- , and .
- A heuristic is admissible if it never overestimates the true cost to the goal (this makes A* optimal).
Example — 8-puzzle: Two common heuristics estimate how far a state is from the solved configuration:
- Misplaced tiles (): the number of tiles not in their goal position.
- Manhattan distance (): the sum, over all tiles, of the horizontal + vertical distance of each tile from its goal position.
Example — route finding: The straight-line (Euclidean) distance from a city to the destination is an admissible heuristic for road navigation, since the actual road path can only be longer:
Explain forward and backward chaining in inference.
Forward and Backward Chaining
Both are inference methods used in rule-based / expert systems to derive conclusions from a knowledge base of facts and IF-THEN rules.
Forward Chaining (Data-Driven)
Starts from the known facts and applies inference rules to derive new facts, repeatedly, until the goal is reached or no new facts can be added. It works bottom-up, from data to conclusion.
- Used when many possible conclusions follow from a set of facts.
- Example: monitoring/control systems, expert systems that react to incoming data.
Backward Chaining (Goal-Driven)
Starts from the goal (hypothesis) and works backward, looking for rules whose conclusion is the goal, then trying to prove their premises as sub-goals, recursively, until they match known facts.
- Used when there is a specific goal/query to verify.
- Example: diagnostic systems, Prolog query resolution.
Comparison
| Forward Chaining | Backward Chaining |
|---|---|
| Data-driven (facts → goal) | Goal-driven (goal → facts) |
| Bottom-up reasoning | Top-down reasoning |
| Derives all possible conclusions | Proves only the required goal |
| Can be slower if many irrelevant facts | More focused/efficient for a single query |
Illustration: Rules: R1: A∧B → C, R2: C → D. Facts: A, B.
- Forward: A,B ⇒ (R1) C ⇒ (R2) D.
- Backward (prove D): D needs C (R2) → C needs A∧B (R1) → A,B are facts → D proved.
Differentiate between propositional logic and predicate logic.
Propositional Logic vs. Predicate Logic
| Aspect | Propositional Logic | Predicate (First-Order) Logic |
|---|---|---|
| Basic unit | Whole propositions (statements that are true/false) | Objects, predicates (properties/relations), and functions |
| Quantifiers | None | Uses (for all) and (there exists) |
| Expressive power | Limited; cannot express relations between objects | More powerful; expresses relations and generalizations |
| Variables | No variables | Uses variables and constants for objects |
| Structure | Atomic symbols (P, Q, R) joined by | Predicates with arguments, e.g., |
| Example | : "Socrates is a man" (a single symbol) |
Explanation: Propositional logic treats each statement as an indivisible true/false symbol and cannot break a statement into its objects and relationships. For instance, "All men are mortal" must be a single symbol — we cannot reason about individual men.
Predicate logic extends propositional logic by introducing predicates, terms (objects/variables/functions), and quantifiers. This lets us express statements like and infer from . Thus predicate logic is strictly more expressive and is the basis for richer knowledge representation in AI.
What is a semantic network? Explain with an example.
Semantic Network
A semantic network is a graphical knowledge-representation scheme that represents knowledge as a network of nodes and labeled edges (arcs/links):
- Nodes represent objects, concepts, or events.
- Edges (links) represent the relationships between them.
It expresses knowledge in the form of interconnected concepts, making relationships explicit and supporting inheritance of properties.
Common link types
- IS-A (subclass/instance): shows class hierarchy and supports property inheritance. e.g., Dog IS-A Mammal.
- HAS-A / part-of: shows possession or composition. e.g., Dog HAS-A Tail.
- Property links: attach attributes. e.g., Bird can Fly.
Example
Consider representing: "Tweety is a bird; a bird is an animal; a bird can fly."
Animal
^ IS-A
Bird ---- can ----> Fly
^ IS-A
Tweety
Here Tweety → IS-A → Bird → IS-A → Animal, and Bird → can → Fly. By inheritance, since Tweety is a Bird, we infer Tweety is an Animal and can fly.
Advantage: intuitive, visual, supports inheritance. Limitation: lacks standard semantics for quantifiers and can become ambiguous for complex logic.
Explain the frame-based knowledge representation scheme.
Frame-Based Knowledge Representation
A frame is a data structure (proposed by Marvin Minsky) for representing a stereotyped object, situation, or concept as a collection of attributes and their values. Frame-based representation organizes knowledge into frames that resemble records/objects, making it a structured and object-oriented scheme.
Structure of a frame
A frame consists of:
- Frame name — the concept/object it represents.
- Slots — the attributes/properties of the object.
- Fillers (slot values) — the values of those slots; a value can be data, a default, a range, a pointer to another frame, or an attached procedure (procedural attachment/demon).
Frame: Car
IS-A: Vehicle (inheritance from parent frame)
Wheels: 4 (default value)
Fuel: Petrol
Engine: <pointer to Engine frame>
Color: Red
Start: <procedure to start the car>
Key features
- Inheritance: a frame can inherit slots/values from a parent (super) frame via IS-A links, reducing redundancy (e.g., Car inherits properties of Vehicle).
- Default values: slots can have defaults used when specific information is missing.
- Procedural attachment: slots may hold procedures (if-needed / if-added demons) triggered when values are accessed or changed.
- Slot–filler organization makes knowledge modular and easy to update.
Advantages: organized, supports inheritance and defaults, close to human concept structuring. Limitation: lacks formal logical inference and can be inconsistent without careful design.
What is an activation function? List any two activation functions.
Activation Function
An activation function is a mathematical function applied to the weighted sum of inputs of a neuron to produce its output. Its main role is to introduce non-linearity into a neural network, enabling it to learn and represent complex, non-linear relationships (without it, even a deep network would behave like a single linear model). It also bounds/transforms the neuron's output into a usable range.
For a neuron, output , where is the activation function.
Two common activation functions
- Sigmoid:
Outputs values in the range ; useful for probabilities/binary outputs.
- ReLU (Rectified Linear Unit):
Outputs if positive, else 0; widely used in deep networks for fast, sparse activation.
(Other examples: Step/Threshold, Tanh, Softmax.)
Frequently asked questions
- Where can I find the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) question paper 2079?
- The full BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Artificial Intelligence (BSc CSIT, CSC261) 2079 paper come with solutions?
- Yes. Every question on this Artificial Intelligence (BSc CSIT, CSC261) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 paper?
- The BSc CSIT (TU) Artificial Intelligence (BSc CSIT, CSC261) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Artificial Intelligence (BSc CSIT, CSC261) past paper free?
- Yes — reading and attempting this Artificial Intelligence (BSc CSIT, CSC261) past paper on Kekkei is completely free.