Data Structure Interview Questions and Answers

Find 100+ Data Structure interview questions and answers to assess candidates' skills in arrays, linked lists, stacks, queues, trees, graphs, hashing, and algorithmic problem-solving.
By
WeCP Team

As companies build scalable, high-performance software systems, recruiters must identify Data Structure experts who understand how to organize, store, and manipulate data efficiently. Strong data structure knowledge is critical for backend development, system design, algorithms, and performance optimization.

This resource, "100+ Data Structure Interview Questions and Answers," is tailored for recruiters to simplify the evaluation process. It covers a wide range of topics from fundamental data structures to advanced implementations, including trees, graphs, heaps, and hashing techniques.

Whether you're hiring Software Engineers, Backend Developers, Algorithm Engineers, or Computer Science Graduates, this guide enables you to assess a candidate’s:

  • Core Data Structure Knowledge: Arrays, linked lists, stacks, queues, hash tables, and dictionaries.
  • Advanced Skills: Binary trees, BSTs, heaps, tries, graphs, adjacency lists, dynamic arrays, and balanced trees (AVL, Red-Black).
  • Real-World Proficiency: Choosing optimal data structures for specific problems, analyzing time/space complexity, and implementing efficient algorithms under constraints.

For a streamlined assessment process, consider platforms like WeCP, which allow you to:

  • Create customized data structure assessments aligned to specific roles or difficulty levels.
  • Include hands-on coding tasks such as implementing trees, traversing graphs, or optimizing search operations.
  • Proctor exams remotely while ensuring integrity.
  • Evaluate results with AI-driven analysis for faster, more accurate decision-making.

Save time, enhance your hiring process, and confidently hire professionals with strong data structure skills who can design efficient, scalable, and high-performance applications from day one.

Data Structure Interview Questions

Data structure – Beginner (1–40)

  1. What is a data structure?
  2. What are the types of data structures?
  3. What is the difference between array and linked list?
  4. What is an array?
  5. What are the advantages of arrays?
  6. What are the disadvantages of arrays?
  7. What is a linked list?
  8. What are the types of linked lists?
  9. What is a node in a linked list?
  10. How do you traverse a linked list?
  11. What is a stack?
  12. What are real-world examples of a stack?
  13. What is LIFO?
  14. What operations can be performed on a stack?
  15. What is a queue?
  16. What is FIFO?
  17. What operations can be performed on a queue?
  18. What is a circular queue?
  19. What is a priority queue?
  20. What is a deque?
  21. What is a tree?
  22. What is a binary tree?
  23. What is a binary search tree (BST)?
  24. What is tree traversal?
  25. What is inorder traversal?
  26. What is preorder traversal?
  27. What is postorder traversal?
  28. What is a graph?
  29. What is the difference between a tree and a graph?
  30. What is a hash table?
  31. What is hashing?
  32. What is a collision in hashing?
  33. What are different collision resolution techniques?
  34. What is a dynamic array?
  35. What is time complexity?
  36. What is space complexity?
  37. What is Big O notation?
  38. What is the time complexity of searching in an array?
  39. What is the best case for linear search?
  40. What is the worst case for binary search?

Data structure – Intermediate (1–40)

  1. Explain the difference between singly linked list and doubly linked list.
  2. What is the advantage of a doubly linked list?
  3. What is a skip list?
  4. What is the difference between stack and queue?
  5. How is a stack implemented using an array?
  6. How is a stack implemented using a linked list?
  7. What are applications of queues in real life?
  8. What is a heap?
  9. What are the types of heaps?
  10. What is a min-heap?
  11. What is a max-heap?
  12. How does heapify work?
  13. What is a trie?
  14. What is a balanced binary tree?
  15. What is AVL tree?
  16. What are rotations in AVL trees?
  17. What is a Red-Black Tree?
  18. Compare AVL and Red-Black trees.
  19. What is a B-Tree?
  20. What is a B+ Tree?
  21. What is a graph traversal?
  22. What is BFS?
  23. What is DFS?
  24. What is the difference between BFS and DFS?
  25. What is a weighted graph?
  26. What is a DAG (Directed Acyclic Graph)?
  27. What is topological sorting?
  28. What is Dijkstra’s algorithm?
  29. What is a hash function?
  30. What is open addressing?
  31. What is separate chaining?
  32. What is dynamic programming?
  33. What is memoization?
  34. What is a segment tree?
  35. What is a Fenwick tree (Binary Indexed Tree)?
  36. What is amortized analysis?
  37. What is a bloom filter?
  38. What is the difference between greedy and dynamic programming?
  39. How does reference counting work in garbage collection?
  40. What is a sparse matrix?

Data structure – Experienced (1–40)

  1. Explain internal implementation of dynamic arrays (like ArrayList in Java).
  2. Explain how HashMap works internally.
  3. What is consistent hashing?
  4. What are lock-free data structures?
  5. How are concurrent queues implemented?
  6. Explain the ABA problem in lock-free data structures.
  7. What is a memory pool allocator?
  8. How does garbage collection affect data structure performance?
  9. Explain cache-friendly data structures.
  10. How does a skip list maintain balance?
  11. Explain the internals of a Red-Black Tree insertion.
  12. Explain time complexity proof for AVL balancing.
  13. How are tries optimized for memory?
  14. Explain compressed tries / radix trees.
  15. Explain rope data structure.
  16. Explain splay trees and their amortized performance.
  17. What is a treap?
  18. Explain union-find with path compression.
  19. What is Kruskal’s algorithm?
  20. What is Prim’s algorithm?
  21. What is Bellman-Ford algorithm?
  22. Compare Dijkstra vs Bellman-Ford.
  23. Explain Floyd–Warshall algorithm.
  24. How do LSM trees work in databases?
  25. What is a log-structured merge tree used for?
  26. Explain data structures used in Redis.
  27. How do bloom filters avoid false negatives?
  28. Explain HyperLogLog.
  29. What is CRDT (Conflict-Free Replicated Data Types)?
  30. What is a K-D tree?
  31. What is an R-tree?
  32. Explain quad trees and oct trees.
  33. How does a Z-order curve work?
  34. Explain B+ tree internal node and leaf node structure.
  35. Explain write amplification in LSM trees.
  36. Explain the design of a distributed hash table (DHT).
  37. What is the difference between consistent hashing and rendezvous hashing?
  38. How do you design a real-time event streaming buffer?
  39. What is a cache eviction algorithm (LRU, LFU, ARC, etc.)?
  40. Explain the difference between OLAP and OLTP data structures.

Data Structure Interview Questions and Answers

Beginner (Q&A)

1. What is a data structure?

A data structure is a specialized way of organizing, storing, and managing data so that it can be accessed and modified efficiently. It defines how data is arranged in memory, how elements relate to each other, and what operations (like insertion, deletion, searching, updating) can be performed efficiently.

Data structures are fundamental to computer science because they directly influence the performance, scalability, and efficiency of software systems. For example, choosing a hash table instead of a list can reduce search operations from O(n) to O(1).

In simple terms, a data structure helps answer the question:
How should data be stored so the program can use it quickly and efficiently?

2. What are the types of data structures?

Data structures are broadly classified into two main categories:

1. Primitive Data Structures

These are basic data types provided by programming languages:

  • Integer
  • Float
  • Boolean
  • Character
  • Pointer

They form the building blocks for more complex structures.

2. Non-Primitive Data Structures

These are more advanced structures used for complex data representation.

They are further divided into:

a. Linear Data Structures

Data elements are arranged sequentially.

  • Arrays
  • Linked Lists
  • Stacks
  • Queues

They follow a specific order (one element follows another).

b. Non-Linear Data Structures

Data elements are arranged hierarchically or in networks.

  • Trees
  • Graphs
  • Tries

These structures represent relationships like parent-child or interconnected nodes.

c. Hash-Based Data Structures

Special structures using hash functions for fast access:

  • Hash Tables
  • Hash Maps

Each category is suited to different types of problems, which is why selecting the right structure is highly important in software development.

3. What is the difference between array and linked list?

Arrays and linked lists are both linear data structures but differ fundamentally in how they store and manage data.

| Feature | Array | Linked List |
| --- | --- | --- |
| Storage | Contiguous memory | Non-contiguous memory |
| Size | Fixed size | Dynamic size |
| Insertion/Deletion | Costly (O(n)) because shifting is required | Efficient (O(1) at the head, or once the node's pointer is known) |
| Access Time | Fast random access, O(1) | Slow sequential access, O(n) |
| Memory Utilization | May waste memory due to fixed size | Efficient: grows/shrinks dynamically |
| Implementation Complexity | Simple | More complex due to pointers |

In short:

  • Use arrays when fast indexing and fixed-size storage is needed.
  • Use linked lists when frequent insertions/deletions occur and memory should be used dynamically.

4. What is an array?

An array is a linear data structure that stores a collection of elements in contiguous memory locations. All elements in an array are of the same data type, and each element is identified using an index.

For example, in a 1D integer array, values are stored side-by-side in memory, and you can access any value instantly using its index number.

Key characteristics:

  • Fixed size
  • Fast random access (O(1))
  • Simple and easy to use
  • Memory-friendly for storing large blocks of uniform data

Arrays are used heavily in operating systems, compilers, databases, and almost all programming situations.
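The fixed element type and O(1) indexing described above can be illustrated with Python's built-in array module (a minimal sketch; the variable names here are made up for illustration):

```python
from array import array

# A typed array: every element shares one type code ('i' = signed int),
# so values are stored side by side in contiguous memory.
nums = array('i', [10, 20, 30, 40])

# O(1) random access by index: the address is computed directly
# from the start of the block plus index * itemsize.
third = nums[2]

# Each element occupies exactly itemsize bytes.
bytes_per_element = nums.itemsize
```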

5. What are the advantages of arrays?

Arrays offer several important advantages:

1. Fast Random Access

You can access any element instantly using an index, with O(1) time complexity.
This makes arrays ideal for read-heavy operations.

2. Memory Efficiency

Data is stored contiguously, reducing memory overhead and improving locality of reference.
This means better CPU cache performance.

3. Easy Iteration

Since elements are stored sequentially, loops can iterate efficiently.

4. Simple Implementation

Arrays are simple to declare, use, and understand, making them ideal for basic storage tasks.

5. Useful for Mathematical and Scientific Computations

Matrix operations, vector calculations, sorting algorithms (like quicksort or mergesort), and many data-intensive problems use arrays internally.

6. What are the disadvantages of arrays?

Despite their usefulness, arrays have several limitations:

1. Fixed Size

Once declared, the size cannot be changed.
This can lead to:

  • overflow (not enough space)
  • under-utilization (wasted memory)

2. Costly Insertions and Deletions

To insert or remove an element in the middle, other elements need to be shifted.
This takes O(n) time.

3. Wasted Memory

If you reserve large space but use only some of it, memory is wasted.

4. Cannot Grow Dynamically

Unlike linked lists, arrays cannot expand at runtime.

5. Requires Contiguous Memory

Large arrays may not be possible if memory is fragmented.

These limitations make arrays less suitable for dynamic data operations.

7. What is a linked list?

A linked list is a dynamic linear data structure where elements (called nodes) are stored in non-contiguous memory locations. Each node contains:

  • Data
  • Pointer (reference) to the next node (and sometimes to the previous node)

Linked lists allow efficient insertion and deletion of elements because only the pointers need adjustment.

Key features:

  • Dynamic size
  • Efficient insert/delete
  • No memory waste (grows as needed)
  • Slower access time since nodes must be traversed sequentially

Linked lists are widely used in memory allocation, undo mechanisms, and implementing stacks or queues.
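The node-plus-pointer layout can be sketched in Python as follows (a minimal singly linked list; class and function names are made up here):

```python
class Node:
    """A singly linked list node: data plus a pointer to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None

def push_front(head, data):
    """Insert at the head in O(1): only one pointer changes."""
    node = Node(data)
    node.next = head
    return node          # the new node becomes the head

def to_list(head):
    """Traverse sequentially (O(n)) and collect the values."""
    out = []
    current = head
    while current is not None:
        out.append(current.data)
        current = current.next
    return out

head = None
for value in [3, 2, 1]:
    head = push_front(head, value)
# head now points at 1 -> 2 -> 3
```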

8. What are the types of linked lists?

Linked lists come in multiple variations, each suited for different use cases:

1. Singly Linked List

Each node has:

  • Data
  • Pointer to next node

Traversal is one-directional.

2. Doubly Linked List

Each node has:

  • Data
  • Pointer to next node
  • Pointer to previous node

Supports bidirectional traversal.

3. Circular Linked List

Last node points back to the first node, forming a loop.
Can be:

  • Circular singly
  • Circular doubly

Used in round-robin scheduling, playlist looping, etc.

4. Multi-Linked Lists

Nodes contain multiple pointers for graphs, adjacency lists, or complex structures.

5. Skip Lists

Special lists with multiple levels for faster searching (O(log n)).

9. What is a node in a linked list?

A node is the fundamental building block of a linked list.
Each node contains two main parts:

  1. Data Field
    Stores the actual value (integer, string, object, etc.)
  2. Pointer/Reference Field
    Stores the address of the next node (and sometimes the previous node).

A node acts like a container that links to another node, creating a flexible and dynamic structure.
The first node is called the head, and the last node typically stores a null pointer (or, in a circular list, points back to the head).

10. How do you traverse a linked list?

Traversing a linked list means visiting each node one by one to read or process its data.

Steps to traverse a singly linked list:

  1. Start at the head node.
  2. While the current node is not NULL:
    • Process the data
    • Move to the next node using:
      current = current.next
  3. Stop when the next pointer is NULL (end of list).

Example (pseudo-code):

Node current = head;
while (current != null) {
    print(current.data);
    current = current.next;
}

Traversal in doubly linked list:

Similar process, but you can go forward using next or backward using previous pointers.

Traversal in circular linked list:

Stop when you reach the head again instead of NULL.

Traversal is slower compared to arrays because it requires pointer dereferencing and sequential access, not direct indexing.

11. What is a stack?

A stack is a linear data structure that follows the LIFO (Last In, First Out) principle. This means the most recently added element is the first one to be removed. A stack can be visualized as a vertical pile where insertion and deletion happen at only one end, called the top.

Stacks can be implemented using:

  • Arrays
  • Linked Lists
  • Dynamic data structures

Stacks are widely used in programming because they provide efficient push/pop operations with O(1) time complexity. They are essential for problems requiring reverse ordering or backtracking, such as undo mechanisms, expression evaluation, and recursion handling.

12. What are real-world examples of a stack?

Stacks are used in many real-world applications and software systems. Examples include:

1. Browser History

When you visit pages, they are pushed onto a stack. Pressing back pops the last visited page.

2. Undo/Redo Operations

Text editors like MS Word or Photoshop store actions in a stack.
Undo removes the last action (pop), redo adds it back (push).

3. Call Stack in Programming

When functions are called, they are pushed onto the call stack.
When a function returns, it is popped.
This manages recursion and nested function calls.

4. Expression Evaluation

Stacks help evaluate:

  • Infix
  • Prefix
  • Postfix expressions

5. Backtracking Algorithms

Used in:

  • Solving mazes
  • DFS (Depth First Search)
  • Solving puzzles like Sudoku

6. Plate Stacks in Real Life

Plates in a cafeteria are stacked—last plate placed is the first to be used.

Stacks are both conceptually simple and extremely powerful in system-level operations.

13. What is LIFO?

LIFO stands for Last In, First Out, which describes the order in which operations occur in a stack. The item that is added last is the first item to be removed.

How LIFO works:

  • Insert operation → Push to the top
  • Remove operation → Pop from the top

Example:

If items are pushed in the order:
A → B → C,
then the pop order will be:
C → B → A.

This behavior is useful in situations where the most recent activity needs to be handled first, such as nested function calls or undo commands.

14. What operations can be performed on a stack?

A stack supports several fundamental operations:

1. Push(x)

Adds an element x to the top of the stack.
Time complexity: O(1)

2. Pop()

Removes and returns the top element of the stack.
Time complexity: O(1)

3. Peek() / Top()

Returns the top element without removing it.
Time complexity: O(1)

4. isEmpty()

Checks whether the stack contains no elements.
Useful to prevent underflow.

5. isFull()

Checks if the stack has reached its capacity (mainly in array implementation).

6. Size()

Returns the number of elements in the stack.

These operations make stacks highly efficient and predictable for last-in-first-out tasks.
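The operations above can be sketched with an array-backed stack in Python (a minimal illustration; the class name is made up here):

```python
class Stack:
    """Minimal array-backed stack; push, pop, and peek are O(1)."""
    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)       # add to the top

    def pop(self):
        if self.is_empty():
            raise IndexError("stack underflow")
        return self._items.pop()    # remove from the top (LIFO)

    def peek(self):
        return self._items[-1]      # read the top without removing it

    def is_empty(self):
        return len(self._items) == 0

    def size(self):
        return len(self._items)

s = Stack()
for ch in "ABC":
    s.push(ch)
# items come back in reverse order of insertion: C, then B, then A
```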

15. What is a queue?

A queue is a linear data structure that follows the FIFO (First In, First Out) principle. This means the first inserted element is the first one to be removed.

A queue has two ends:

  • Front (deletion end)
  • Rear (insertion end)

Queues are used in systems where order must be preserved, such as task scheduling, buffering, and synchronization.

Queues can be implemented using:

  • Arrays
  • Linked Lists
  • Circular arrays

They provide predictable and fair processing of elements.

16. What is FIFO?

FIFO stands for First In, First Out, meaning the element added earliest is the first one to be removed.

Example:

If elements enter a queue in this order:
A → B → C,
they will be removed in the same order:
A → B → C.

Real-world analogy:

A queue at a ticket counter — the person who arrives first is served first.

FIFO is essential in:

  • Data buffers
  • CPU scheduling
  • Print queue management
  • Network packet queues

It ensures fairness by processing tasks in the order they arrive.

17. What operations can be performed on a queue?

A queue supports several key operations:

1. Enqueue(x)

Adds an element x to the rear of the queue.
Time complexity: O(1)

2. Dequeue()

Removes and returns the element from the front of the queue.
Time complexity: O(1)

3. Peek() / Front()

Returns the front element without removing it.

4. isEmpty()

Checks whether the queue has zero elements.

5. isFull()

Checks if the queue has reached its maximum capacity (for array-based queues).

6. Size()

Returns the total number of elements currently in the queue.

Queues are essential for orderly processing of data where fairness is important.
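These operations map directly onto Python's collections.deque, which gives O(1) appends and pops at both ends (a minimal sketch; variable names are made up here):

```python
from collections import deque

queue = deque()

queue.append("A")   # enqueue at the rear
queue.append("B")
queue.append("C")

first = queue.popleft()   # dequeue from the front (FIFO order)
front = queue[0]          # peek at the new front without removing it
```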

18. What is a circular queue?

A circular queue is an improved version of the traditional queue where the last position is connected back to the first position, forming a circle. This eliminates the problem of unused space in a normal queue after several enqueue-dequeue operations.

Key features:

  • Uses array in circular fashion
  • Front and rear pointers wrap using modulo operation
  • Efficiently uses memory
  • No shifting of elements required

Advantages:

  • Prevents overflow when space is available at the front
  • Efficient for fixed-size buffer applications

Applications:

  • Circular buffers
  • CPU round-robin scheduling
  • Streaming data buffers
  • Network packet management

Circular queues ensure efficient use of space and smooth continuous data flow.
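The wrap-around behavior can be sketched as a fixed-capacity buffer whose indices wrap with the modulo operation (a minimal illustration; the class name is made up here):

```python
class CircularQueue:
    """Fixed-capacity circular queue; indices wrap via modulo arithmetic."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.front = 0   # index of the next element to dequeue
        self.count = 0   # number of stored elements

    def is_full(self):
        return self.count == self.capacity

    def is_empty(self):
        return self.count == 0

    def enqueue(self, x):
        if self.is_full():
            raise OverflowError("queue is full")
        rear = (self.front + self.count) % self.capacity  # wrap around
        self.buf[rear] = x
        self.count += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        x = self.buf[self.front]
        self.front = (self.front + 1) % self.capacity     # wrap around
        self.count -= 1
        return x

q = CircularQueue(3)
for v in (1, 2, 3):
    q.enqueue(v)
q.dequeue()     # frees the slot at the front
q.enqueue(4)    # the rear wraps around to reuse that slot
```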

19. What is a priority queue?

A priority queue is an abstract data type where each element has a priority, and elements are removed based on highest (or lowest) priority, not based on insertion order.

Key points:

  • Highest priority element is dequeued first
  • Implemented using heaps (binary heap, Fibonacci heap, etc.)
  • Supports custom priority rules

Real-world applications:

  • CPU scheduling (highest priority job first)
  • Dijkstra’s shortest path algorithm
  • Event-driven simulations
  • Task scheduling

Unlike normal queues, ordering depends on priority rather than FIFO policy.
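Python's heapq module, which maintains a binary min-heap on a plain list, can sketch this behavior: the pair with the smallest priority number is always dequeued first, regardless of insertion order (the task names are made up here):

```python
import heapq

tasks = []
heapq.heappush(tasks, (3, "low-priority cleanup"))
heapq.heappush(tasks, (1, "urgent bug fix"))
heapq.heappush(tasks, (2, "code review"))

# The lowest priority number comes out first, not the first item inserted.
priority, task = heapq.heappop(tasks)
```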

20. What is a deque?

A deque (Double-Ended Queue) is a linear data structure that allows insertion and deletion from both ends — front and rear.

Key characteristics:

  • Supports both stack and queue operations
  • More flexible than standard queues
  • Can be implemented using arrays or doubly linked lists

Operations:

  • insertFront()
  • insertRear()
  • deleteFront()
  • deleteRear()
  • getFront()
  • getRear()

Types of Deques:

  1. Input restricted deque – insertion at one end only
  2. Output restricted deque – deletion at one end only

Applications:

  • Implementing sliding window problems
  • Palindrome checking
  • Browser history (forward/backward navigation)
  • Job scheduling

Deque is considered one of the most powerful linear data structures due to its two-way insertion/deletion capability.

21. What is a tree?

A tree is a non-linear hierarchical data structure used to represent relationships between elements in a parent-child form. It consists of nodes connected by edges, forming a structure similar to a real-life family tree or organizational chart.

A tree has the following characteristics:

  • Root Node: The topmost node of the tree.
  • Child Nodes: Nodes directly connected below a parent node.
  • Leaf Nodes: Nodes with no children.
  • Height/Depth: Height is the longest path from root to leaf; depth is from the root to a specific node.
  • Subtrees: Smaller trees that branch off from a parent node.
  • No Cycles: Trees do not contain loops or cycles.

Trees are widely used in:

  • File systems
  • Databases (B-trees, B+ trees)
  • Compilers (syntax trees)
  • AI algorithms (decision trees)
  • Hierarchical data representation

Trees provide efficient search, insertion, and traversal operations, making them essential in almost every computing system.

22. What is a binary tree?

A binary tree is a special type of tree in which each node can have at most two children, known as:

  • Left child
  • Right child

Key properties include:

  • Every node has 0, 1, or 2 children.
  • Can be used to represent hierarchical relationships.
  • May or may not follow ordering rules (unlike BST).
  • Height-balanced variations exist (like AVL or Red-Black trees).

Types of binary trees:

  • Full Binary Tree: Every node has 0 or 2 children.
  • Complete Binary Tree: All levels are filled except possibly the last, and nodes are left-aligned.
  • Perfect Binary Tree: All internal nodes have two children, and all leaves are at the same level.
  • Skewed Binary Tree: All nodes are either left or right children, forming a chain-like structure.

Binary trees are used in:

  • Huffman coding
  • Expression trees
  • Parsing
  • Searching
  • Memory management

23. What is a binary search tree (BST)?

A Binary Search Tree (BST) is a type of binary tree that maintains a sorted structure based on the following rule:

For any node:

  • Left subtree contains values less than the node's value.
  • Right subtree contains values greater than the node's value.

This ordering allows efficient operations:

  • Search: O(log n) average
  • Insert: O(log n) average
  • Delete: O(log n) average

However, in the worst case (skewed tree), operations become O(n).

BSTs are used in:

  • Searching and lookup tables
  • Databases
  • Sets and maps implementations
  • Sorting (tree sort)

Important advantage: BST maintains sorted order while allowing dynamic updates.
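The BST ordering rule can be sketched as follows (a minimal unbalanced BST; duplicates are simply ignored, and the function names are made up here):

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Walk down, going left for smaller keys and right for larger ones."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Each comparison discards one subtree: O(log n) on a balanced tree."""
    if root is None:
        return False
    if key == root.key:
        return True
    return search(root.left, key) if key < root.key else search(root.right, key)

root = None
for k in [50, 30, 70, 20, 40]:
    root = insert(root, k)
```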

24. What is tree traversal?

Tree traversal refers to the process of visiting every node in a tree in a specific sequence. Since a tree is a non-linear structure, traversal is essential to access all nodes systematically.

Tree traversal is categorized into two major types:

1. Depth-First Search (DFS) Traversals

  • Inorder
  • Preorder
  • Postorder

DFS explores a branch completely before moving to the next.

2. Breadth-First Search (BFS) Traversal

  • Level-order traversal

BFS visits nodes level by level.

Traversal is important for:

  • Expression evaluation
  • Searching
  • Serialization/deserialization of trees
  • Debugging structures

25. What is inorder traversal?

Inorder traversal is a depth-first traversal method used mainly for binary trees, especially binary search trees (BSTs).

The traversal order is:

Left subtree → Root → Right subtree

Example:

For a BST, inorder traversal gives the nodes in sorted ascending order.

Inorder Pseudocode:

inorder(node):
    if node == null:
        return
    inorder(node.left)
    visit(node)
    inorder(node.right)

Use Cases:

  • Extracting sorted values from a BST
  • Validating BST property
  • Expression parsing from syntax trees

Inorder traversal is one of the most important and frequently used tree algorithms.

26. What is preorder traversal?

Preorder traversal is a depth-first traversal technique where the root node is visited first.

The order is:

Root → Left subtree → Right subtree

Preorder Pseudocode:

preorder(node):
    if node == null:
        return
    visit(node)
    preorder(node.left)
    preorder(node.right)

Use Cases:

  • Creating a copy of a tree
  • Prefix notation of expressions
  • Reconstructing a tree from its preorder and inorder sequences
  • Serializing/deserializing trees

Preorder is especially helpful when you need to process the root before child nodes.

27. What is postorder traversal?

Postorder traversal is a depth-first traversal method where the root is visited last.

The order is:

Left subtree → Right subtree → Root

Postorder Pseudocode:

postorder(node):
    if node == null:
        return
    postorder(node.left)
    postorder(node.right)
    visit(node)

Use Cases:

  • Deleting or freeing nodes from memory
  • Obtaining postfix expressions
  • Evaluating expression trees
  • Calculating sizes of subtrees

Postorder is helpful when operations depend on processing children before the parent.

28. What is a graph?

A graph is a non-linear data structure consisting of nodes (called vertices) and connections between them (called edges). Graphs model relationships between pairs of objects.

Graphs can be:

1. Directed (Digraph)

Edges point from one vertex to another.

2. Undirected

Edges do not have direction.

3. Weighted

Edges have weights/costs.

4. Unweighted

Edges represent simple connections.

Graphs can represent real-world networks such as:

  • Social networks (friendships)
  • Roads and route maps
  • Communication networks
  • Dependency graphs
  • Recommendation systems

Graphs are extremely powerful due to their flexibility in modeling complex relationships.

29. What is the difference between a tree and a graph?

While both trees and graphs are non-linear structures, they differ in many important ways:

| Feature | Tree | Graph |
| --- | --- | --- |
| Structure | Hierarchical | Network-like |
| Cycles | No cycles allowed | Cycles allowed |
| Edges | Exactly n–1 edges for n nodes | Any number of edges |
| Connectivity | Always connected | May or may not be connected |
| Parent-Child Relationship | Yes | No required relationship |
| Traversal | Simpler (inorder, preorder, postorder) | More complex (BFS, DFS) |
| Root | Has one root | No root required |
| Direction | Implicit parent-child direction | Can be directed or undirected |

Summary:

  • A tree is a special case of a graph with strict rules.
  • A graph is more general and flexible, suitable for complex models.

30. What is a hash table?

A hash table is a data structure that stores key-value pairs and provides very fast average-case lookup, insertion, and deletion operations — typically O(1).

It uses a hash function to compute an index (called a hash or bucket) where the value will be stored.

Key Components:

  1. Hash Function
    Converts a key into a bucket index.
    A good hash function:
    • Distributes keys uniformly
    • Minimizes collisions
  2. Buckets/Slots
    Array positions where data is stored.
  3. Collision Handling
    When two keys map to the same bucket:
    • Chaining: Store as a linked list
    • Open Addressing: Find next available slot via probing

Advantages:

  • Constant-time operations
  • Efficient for large datasets
  • Widely used in databases, compilers, caches

Applications:

  • Implementing dictionaries
  • Caches (like CPU cache or browser cache)
  • Symbol tables in compilers
  • Sets and maps (HashMap, HashSet)

Hash tables are one of the fastest and most powerful data structures for lookup-based applications.
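The three components above can be sketched with a small chaining-based table in Python (a minimal illustration using Python's built-in hash(); the class name is made up here):

```python
class ChainedHashTable:
    """Hash table with separate chaining: each bucket is a list of (key, value) pairs."""
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Hash function: map the key to one of the fixed bucket indices.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (or collision): extend the chain

    def get(self, key):
        for k, v in self._bucket(key):   # scan only this bucket's chain
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable()
t.put("apple", 3)
t.put("banana", 5)
t.put("apple", 4)   # overwrites the earlier value
```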

31. What is hashing?

Hashing is a technique used to map data of any size into a fixed-size value using a mathematical function called a hash function. The output of this function is known as a hash, hash code, or index, which is used to store data in a hash table efficiently.

Hashing enables extremely fast insertion, deletion, and lookup operations, typically in O(1) average time.

Key characteristics of hashing:

  • Converts a key into an integer index.
  • Index is used to store and retrieve values in a hash table.
  • Minimizes comparisons by directly computing the storage location.
  • Essential for implementing dictionary-like structures (maps, sets).

Where hashing is used:

  • HashMap, HashSet, unordered_map
  • Password storage (cryptographic hashing)
  • File systems
  • Caches, database indexing
  • Compilers (symbol tables)

Hashing is fundamental for fast access-based systems.

32. What is a collision in hashing?

A collision occurs in hashing when two different keys produce the same hash value or map to the same index in the hash table.

Since hash functions cannot always guarantee unique outputs due to fixed table size, collisions are common.

Example:

If a hash table has size 10, and both key 25 and key 35 produce index 5:

25 → index 5  
35 → index 5  

This is a collision.

Collisions are unavoidable, so handling them properly is crucial to maintaining performance.

33. What are different collision resolution techniques?

Collision resolution techniques are methods to handle cases where two or more keys hash to the same index.

The two major categories are:

1. Open Addressing Techniques

In open addressing, if a position is occupied, the algorithm searches for another empty slot.

a. Linear Probing

Move sequentially to find the next empty slot.

Example:

index = (hash + i) % table_size

b. Quadratic Probing

Use quadratic increments to reduce clustering.

Example:

index = (hash + i²) % table_size

c. Double Hashing

Use a second hash function to compute a new step size.

Example:

index = (hash1 + i * hash2) % table_size

2. Separate Chaining (Closed Addressing)

Each index of the hash table stores a linked list (or another structure) of entries that map to the same index.

  • Easy to implement
  • No need to rehash
  • Efficient even with collisions

Modern languages like Java’s HashMap use separate chaining with self-balancing trees for large buckets.

3. Others

  • Cuckoo hashing
  • Hopscotch hashing
  • Robin Hood hashing

These advanced techniques optimize performance further.
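Linear probing from the list above can be sketched as follows, reusing the 25/35 collision example from the previous question (the toy hash function, key % table_size, is an illustration only):

```python
def insert_linear_probing(table, key):
    """Place key in the first free slot at or after key % len(table)."""
    size = len(table)
    h = key % size                     # toy hash function: identity mod table size
    for i in range(size):
        index = (h + i) % size         # probe sequence wraps around the table
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("hash table is full")

table = [None] * 10
insert_linear_probing(table, 25)   # lands in slot 5
insert_linear_probing(table, 35)   # collides at slot 5, probes on to slot 6
```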

34. What is a dynamic array?

A dynamic array is an array that can grow or shrink in size at runtime. Unlike a traditional fixed-size array, a dynamic array automatically allocates more memory when needed.

How it works:

  1. Starts with an initial capacity.
  2. When it becomes full, a new array is created (usually double the size).
  3. All existing elements are copied to the new array.
  4. Further insertions continue in the expanded space.

Common examples:

  • ArrayList in Java
  • vector in C++
  • list in Python (a dynamic array under the hood)

Advantages:

  • Flexible resizing
  • Fast random access (O(1))
  • Efficient amortized insertion (O(1) average)

Disadvantages:

  • Occasional expensive resizing operations
  • Wasted memory due to extra capacity
  • Insertion in middle still costs O(n)
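The grow-by-doubling behavior described above can be sketched in Python (the `DynamicArray` class is illustrative; Python's built-in list already does this internally):

```python
# Sketch of a dynamic array that doubles its capacity when full.
class DynamicArray:
    def __init__(self):
        self.capacity = 2
        self.length = 0
        self.data = [None] * self.capacity

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity
        for i in range(self.length):      # copy existing elements to new array
            new_data[i] = self.data[i]
        self.data = new_data
        self.capacity = new_capacity

    def append(self, value):
        if self.length == self.capacity:  # full: allocate double the space
            self._resize(2 * self.capacity)
        self.data[self.length] = value
        self.length += 1

    def __getitem__(self, i):
        return self.data[i]

arr = DynamicArray()
for x in range(5):
    arr.append(x)
print(arr.capacity, arr.length)  # 8 5  (capacity doubled 2 -> 4 -> 8)
```

The occasional O(n) copy during resize is what makes insertion O(1) only in the amortized sense.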

35. What is time complexity?

Time complexity is a theoretical measure that describes how the runtime of an algorithm increases with the size of the input (denoted as n). It helps compare algorithms based on efficiency rather than actual execution time, which may vary due to hardware or implementation.

Time complexity allows us to classify algorithms using Big O notation, for example:

  • O(1): Constant time
  • O(log n): Logarithmic
  • O(n): Linear
  • O(n log n): Linearithmic
  • O(n²): Quadratic
  • O(2ⁿ): Exponential

Why it's important:

  • Helps choose the optimal algorithm for large inputs
  • Predicts performance limits
  • Avoids slow or inefficient logic in production systems

Understanding time complexity is foundational in data structures and algorithm analysis.

36. What is space complexity?

Space complexity measures the total amount of memory an algorithm uses relative to the input size. It includes:

  1. Input space
  2. Auxiliary space (extra memory used by the algorithm)

It is expressed using Big O notation, such as:

  • O(1): Constant space
  • O(n): Linear space
  • O(n²): Quadratic space

Why it matters:

  • Helps ensure programs don’t run out of memory
  • Useful for embedded systems and environments with limited RAM
  • Enables optimizing memory usage in large data-processing applications

Example:

  • Simple variable → O(1)
  • Array of size n → O(n)
  • 2D matrix → O(n²)

Space complexity is essential for designing memory-efficient algorithms.

37. What is Big O notation?

Big O notation is a mathematical notation used to describe the upper bound on the time or space an algorithm uses as input size grows. It tells how the algorithm scales, not the exact time taken.

Big O focuses on the dominant term of growth and ignores constants.

Examples:

  • O(1): constant time
  • O(n): linear time
  • O(n²): quadratic time
  • O(log n): logarithmic
  • O(n log n): linearithmic

Why Big O matters:

  • Predicts scalability
  • Helps compare algorithms
  • Highlights performance bottlenecks
  • Essential for system optimization

Big O is the standard language for evaluating algorithmic performance.

38. What is the time complexity of searching in an array?

The time complexity of searching in an array depends on the type of search:

1. Linear Search (Unsorted Array)

  • Best case: O(1) → found at index 0
  • Average case: O(n)
  • Worst case: O(n) → element at end or not present

Used when array is unsorted.

2. Binary Search (Sorted Array)

  • Requires the array to be sorted
  • Best case: O(1) → found in middle
  • Average case: O(log n)
  • Worst case: O(log n)

Binary search divides the array in half each time, giving logarithmic performance.
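The halving strategy can be sketched as a standard iterative binary search in Python:

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent. O(log n)."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1    # target is in the right half; discard the left
        else:
            hi = mid - 1    # target is in the left half; discard the right
    return -1

print(binary_search([2, 4, 7, 9, 12], 9))   # 3
print(binary_search([2, 4, 7, 9, 12], 5))   # -1
```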

39. What is the best case for linear search?

The best case for linear search occurs when the target element is found at the first position of the array.

Example:

Array: [5, 7, 9, 2, 4]
Search for: 5
First element matches → stop immediately.

Time complexity (best case):

  • O(1)
    Only one comparison is required.

This represents the most favorable scenario where the search finishes instantly.

40. What is the worst case for binary search?

The worst case for binary search occurs when:

  1. The element is not present in the array, or
  2. The element is present but requires maximum subdivisions to locate (at the last possible depth of the recursion/tree).

Binary search works by repeatedly dividing the array in half.

Worst-case time complexity:

  • O(log n)

Even in the worst case, binary search performs extremely well because the number of steps grows logarithmically with input size.

Example:

Searching for an element in an array of size 1024 takes, in the worst case:

log2(1024) = 10 steps

Thus, binary search is highly efficient compared to linear search.

Intermediate (Q&A)

1. Explain the difference between singly linked list and doubly linked list.

A singly linked list and a doubly linked list are both dynamic data structures made of nodes, but they differ in how nodes are connected and how navigation works.

Singly Linked List

  • Each node contains:
    • Data
    • Pointer to the next node
  • Links flow in one direction only.
  • You can traverse from the head to the last node, but not backward.
  • Insertion/deletion at the beginning is fast (O(1)).
  • Deleting a node from the end requires traversal (O(n)).
  • Less memory overhead because only one pointer per node.

Doubly Linked List

  • Each node contains:
    • Data
    • Pointer to next node
    • Pointer to previous node
  • Supports bidirectional traversal.
  • Easier to delete a specific node because you can move backward and forward.
  • More memory overhead (two pointers).
  • Suitable for applications requiring back-and-forth movement (browsers, editors).

Key Differences Table

| Feature | Singly Linked List | Doubly Linked List |
|---|---|---|
| Pointers | One (next) | Two (next + prev) |
| Traversal | Only forward | Forward + backward |
| Memory usage | Less | More |
| Deletion complexity | Harder (need previous node) | Easier (prev pointer available) |
| Reversal | Difficult | Easier |

2. What is the advantage of a doubly linked list?

A doubly linked list provides several important advantages over a singly linked list:

1. Bidirectional Traversal

You can move both forward and backward, making navigation more flexible.

2. Faster Deletion

Since each node has a prev pointer, deleting any node is easier and does not require searching for the previous node.

3. Better for Implementing Complex Data Structures

Doubly linked lists are ideal for:

  • Deques
  • Navigational applications
  • Undo/redo operations
  • LRU Cache (Least Recently Used)
  • Priority queues
  • Playlist controls

4. Efficient Insertions

Insertion at both ends (head/tail) is efficient: O(1).

5. Reversal is Simple

You can reverse a doubly linked list by simply swapping next and prev pointers without extra variables.

While they require more memory, the flexibility and operational efficiency make doubly linked lists extremely valuable.

3. What is a skip list?

A skip list is a probabilistic data structure that allows fast search, insertion, and deletion operations—similar to a balanced tree—but with a simpler, layered linked list design.

Structure:

  • Consists of multiple levels of linked lists.
  • The lowest level is a regular linked list.
  • Higher levels act as "express lanes" that skip over many nodes.
  • Each node appears in higher levels based on a random probability (commonly 50%).

Operations Complexity:

  • Search: O(log n)
  • Insert: O(log n)
  • Delete: O(log n)

The skip list balances itself probabilistically, avoiding the complexity of tree rotations.

Use Cases:

  • Databases (Redis uses skip lists)
  • Distributed systems
  • Ordered maps
  • Priority queues

A skip list provides near-balanced-tree performance with a simple and elegant design.

4. What is the difference between stack and queue?

Stacks and queues are both linear data structures but follow different ordering principles.

Stack

  • Follows LIFO (Last In, First Out).
  • Insertion and deletion only happen at the top.
  • Example: Stack of plates.

Queue

  • Follows FIFO (First In, First Out).
  • Insertion happens at the rear, deletion at the front.
  • Example: People standing in a line.

Differences Table:

| Feature | Stack | Queue |
|---|---|---|
| Order | LIFO | FIFO |
| Insertion | Push at top | Enqueue at rear |
| Deletion | Pop from top | Dequeue from front |
| Access | Top element only | Front element only |
| Examples | Function calls | CPU scheduling |

Both are essential in algorithms and system design but serve opposite purposes.

5. How is a stack implemented using an array?

A stack implemented using an array is one of the simplest stack implementations.

Steps:

  1. Create an array of fixed size.
  2. Maintain a variable top, initialized to -1 (empty state).
  3. Push operation:
    • Increment top → top++
    • Assign value → arr[top] = value
  4. Pop operation:
    • Return arr[top]
    • Decrement top → top--

Key Operations Complexity:

  • Push: O(1)
  • Pop: O(1)
  • Peek: O(1)

Advantages:

  • Simple and fast
  • Efficient memory locality
  • No overhead of dynamic memory allocation

Disadvantages:

  • Fixed size (no dynamic growth)
  • Stack overflow possible if array fills up

Array-based stacks are commonly used where maximum size is known beforehand.
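The steps above can be sketched in Python (the `ArrayStack` class is illustrative; a preallocated list stands in for the fixed-size array):

```python
class ArrayStack:
    """Fixed-size stack backed by a preallocated array."""
    def __init__(self, capacity):
        self.arr = [None] * capacity
        self.top = -1                       # -1 means the stack is empty

    def push(self, value):
        if self.top == len(self.arr) - 1:   # array full
            raise OverflowError("stack overflow")
        self.top += 1
        self.arr[self.top] = value

    def pop(self):
        if self.top == -1:
            raise IndexError("stack underflow")
        value = self.arr[self.top]
        self.top -= 1
        return value

    def peek(self):
        return self.arr[self.top] if self.top >= 0 else None

s = ArrayStack(3)
s.push(1); s.push(2); s.push(3)
print(s.pop(), s.peek())  # 3 2
```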

6. How is a stack implemented using a linked list?

A stack implemented using a linked list provides a dynamic and flexible alternative to array-based stacks.

How it works:

  • The head of the linked list is used as the top of the stack.
  • No need for shifting elements.
  • Dynamic memory allocation lets the stack grow until system memory is exhausted.

Push operation:

  1. Create a new node.
  2. Set its next pointer to the current head.
  3. Update head to new node.

Complexity: O(1)

Pop operation:

  1. Return data from head.
  2. Move head to head.next.
  3. Delete the removed node.

Complexity: O(1)

Advantages:

  • No size limitation
  • Efficient memory usage
  • No overflow unless system runs out of memory

Disadvantages:

  • Extra memory for pointers
  • Less cache-friendly
  • Slightly slower due to dynamic allocation

Linked-list-based stacks are used when stack size is unpredictable.
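The push and pop steps above can be sketched in Python (the `Node` and `LinkedStack` class names are illustrative):

```python
class Node:
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

class LinkedStack:
    """Stack whose top is the head of a singly linked list."""
    def __init__(self):
        self.head = None

    def push(self, value):          # O(1)
        self.head = Node(value, self.head)   # new node points at the old top

    def pop(self):                  # O(1)
        if self.head is None:
            raise IndexError("pop from empty stack")
        value = self.head.data
        self.head = self.head.next           # unlink the old top
        return value

st = LinkedStack()
st.push("a"); st.push("b")
print(st.pop(), st.pop())  # b a
```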

7. What are applications of queues in real life?

Queues are used in many real-world and computing scenarios where order and fairness matter.

Real-world Applications:

  • Waiting lines (tickets, banks, supermarkets)
  • Call center customer queues
  • Printing tasks sent to a printer
  • Traffic queues

Computer Science Applications:

  • CPU scheduling (Round Robin)
  • Disk scheduling
  • Breadth-First Search (BFS) in graphs
  • Asynchronous data transfer (IO Buffers)
  • Handling requests in web servers
  • Network packet routing
  • OS task scheduling
  • Message queues (RabbitMQ, Kafka)

Queues ensure order and fairness in processing operations.

8. What is a heap?

A heap is a specialized tree-based data structure that satisfies the heap property:

  • In a max-heap, the parent is always greater than or equal to its children.
  • In a min-heap, the parent is always less than or equal to its children.

Heaps are typically implemented as complete binary trees, meaning:

  • All levels are fully filled except possibly the last.
  • Nodes fill from left to right.

Heaps are commonly implemented using arrays, not linked structures, because:

  • Parent-child relationships can be computed mathematically (no pointers needed).

Operations:

  • Insert: O(log n)
  • Delete (root): O(log n)
  • Peek: O(1)
  • Heapify: O(n)

Applications:

  • Priority queues
  • Scheduling algorithms
  • Dijkstra’s shortest path
  • Heap sort
  • Event-driven simulation

9. What are the types of heaps?

The major types of heaps include:

1. Min-Heap

  • Root contains the minimum element.
  • Used in priority queues where smallest element is needed first.

2. Max-Heap

  • Root contains the maximum element.
  • Useful for extracting maximum values.

3. Binomial Heap

  • Collection of binomial trees.
  • Supports fast merging operations.

4. Fibonacci Heap

  • Specialized heap used in advanced algorithms.
  • Very fast decrease-key operation.
  • Used in Dijkstra and Prim algorithms for best performance.

5. Binary Heap

  • Most common heap implementation.
  • Complete binary tree.

Heaps vary in performance characteristics based on required operations.

10. What is a min-heap?

A min-heap is a type of binary heap where the smallest element is always at the root. Every parent node is less than or equal to its child nodes.

Properties of Min-Heap:

  1. Heap Property:
    A[parent] ≤ A[leftChild]
    A[parent] ≤ A[rightChild]
  2. Complete Binary Tree:
    All levels filled except possibly last, filled left to right.
  3. Array Representation:
    • Root at index 0
    • Left child = 2*i + 1
    • Right child = 2*i + 2

Operations:

  • Insert: Add at end → bubble up → O(log n)
  • Extract-min: Remove root → swap with last → heapify → O(log n)
  • Peek: O(1)

Applications:

  • Priority queues
  • Shortest path algorithms (Dijkstra)
  • Event scheduling
  • Merging K sorted lists

Min-heaps are essential for efficient retrieval of the smallest prioritized element.
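Python's standard heapq module implements exactly this structure, a binary min-heap stored in a list, so the operations above can be demonstrated directly:

```python
import heapq

heap = []
for value in [7, 2, 9, 4]:
    heapq.heappush(heap, value)   # insert: O(log n), bubbles up

print(heap[0])                    # 2  (peek at minimum: O(1))
print(heapq.heappop(heap))        # 2  (extract-min: O(log n))
print(heapq.heappop(heap))        # 4
```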

11. What is a max-heap?

A max-heap is a specialized binary tree-based data structure where the largest value is always stored at the root node. It is a complete binary tree, meaning all levels are fully filled except possibly the last one, which is filled from left to right.

Properties of Max-Heap:

  1. Heap Property (Max Order Law):
    Every parent node’s value is greater than or equal to the values of its children.
    A[parent] ≥ A[leftChild]
    A[parent] ≥ A[rightChild]
  2. Shape Property (Complete Tree):
    All levels are filled except possibly the last.
  3. Array Representation:
    Max-heaps are usually implemented using arrays, where:
    • Root is at index 0.
    • Left child at 2*i + 1
    • Right child at 2*i + 2
    • Parent at (i - 1) / 2

Operations:

  • Insert: O(log n)
  • Extract-max: Remove root & heapify → O(log n)
  • Peek (get max): O(1)

Applications:

  • Implementing max-priority queues
  • Selection algorithms
  • Sorting using heap sort
  • Bandwidth management
  • Job scheduling

A max-heap guarantees that the highest-priority element is always accessible instantly.

12. How does heapify work?

Heapify is a process used to ensure that a binary tree satisfies the heap property (min-heap or max-heap). It is essential for maintaining heap structure after insertion, deletion, or building the heap.

Heapify can work in two ways:

1. Max-Heapify

Ensures the subtree rooted at a given index i obeys max-heap property.

Steps:

  1. Compare the parent at index i with its left and right children.
  2. Identify the largest value among the three.
  3. If the parent is not the largest:
    • Swap parent with the largest child.
  4. Recursively call heapify on the affected child index.

Time Complexity:

  • Height of heap = log n → O(log n)

2. Min-Heapify

Same logic but ensures the parent is the smallest.

Building a heap using heapify:

  • Start from the last non-leaf node and heapify each node upward to the root.
  • Complexity: O(n) (surprisingly efficient)

Use Cases:

  • Restoring heap after deletion
  • Constructing heap from unsorted array
  • Part of heap sort algorithm

Heapify is the core operation that makes heaps efficient and powerful.
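The max-heapify steps and the bottom-up build described above can be sketched in Python (function names are illustrative):

```python
def max_heapify(arr, n, i):
    """Sink arr[i] until the subtree rooted at i satisfies the max-heap property."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]  # swap with the larger child
        max_heapify(arr, n, largest)                 # continue down that subtree

def build_max_heap(arr):
    # Heapify every non-leaf node, bottom-up: O(n) overall.
    for i in range(len(arr) // 2 - 1, -1, -1):
        max_heapify(arr, len(arr), i)

data = [3, 9, 2, 1, 4, 5]
build_max_heap(data)
print(data[0])  # 9  (maximum now sits at the root)
```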

13. What is a trie?

A trie, also known as a prefix tree or digital tree, is a specialized tree-based data structure used for storing and searching strings efficiently.

Key Characteristics:

  • Each node represents a character of a string.
  • Paths from root to leaf represent complete words.
  • Shared prefixes use shared nodes, making it memory-efficient for large word sets.
  • Each node typically stores:
    • Children pointers (often 26 for lowercase letters)
    • Flag to indicate end of word

Performance:

  • Search: O(m)
  • Insert: O(m)
  • Delete: O(m)
    Where m is the length of the word.

Advantages of Trie:

  • Faster lookup than hash tables in prefix-based searches.
  • Avoids collisions (unlike hashing).
  • Supports auto-complete and spell-check easily.

Applications:

  • Autocomplete (Google search suggestions)
  • Spell-checkers
  • IP routing tables
  • Dictionary implementations
  • Prefix matching problems

A trie is ideal wherever prefix-based operations are required.
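A minimal trie supporting insert, exact search, and prefix lookup might be sketched in Python as follows (class and method names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_end = False       # marks the end of a complete word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):       # O(m), m = len(word)
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):       # exact-word lookup
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):  # prefix lookup (basis of autocomplete)
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

t = Trie()
t.insert("car"); t.insert("cart")
print(t.search("car"), t.search("ca"), t.starts_with("ca"))  # True False True
```

Note how "car" and "cart" share the nodes for the common prefix "car".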

14. What is a balanced binary tree?

A balanced binary tree is a binary tree in which the height difference between the left and right subtrees of any node is bounded, usually by 1.
The goal is to keep operations (search, insert, delete) efficient—typically O(log n) time.

Characteristics:

  • Ensures tree height stays minimal.
  • Avoids worst-case scenarios of skewed trees (O(n)).
  • Maintains structure through rotations or balancing.

Examples of Balanced Trees:

  • AVL Tree
  • Red-Black Tree
  • Splay Tree
  • Treaps
  • B-Trees

Balanced trees guarantee better performance in dynamic datasets compared to unbalanced trees.

15. What is AVL tree?

An AVL Tree is a self-balancing binary search tree where the height difference (balance factor) between left and right subtrees of any node is at most 1.

Balance Factor:

Balance Factor = Height(left subtree) – Height(right subtree)

Valid BF values:

  • -1
  • 0
  • +1

If the balance factor becomes less than -1 or greater than 1, the tree performs a rotation to restore balance.

Operations:

  • Insert: O(log n)
  • Delete: O(log n)
  • Search: O(log n)

Use Cases:

  • Databases
  • Memory indexing
  • Applications requiring frequent lookups

AVL trees enforce stricter balancing than most other self-balancing trees, which gives them faster lookups.

16. What are rotations in AVL trees?

Rotations are operations performed on AVL trees to restore balance when insertion or deletion causes imbalance.

There are four types of rotations:

1. Right Rotation (LL Rotation)

Used when a node becomes unbalanced because:

  • Insertion happened in the left subtree of its left child.

Fix: Rotate right around the unbalanced node.

2. Left Rotation (RR Rotation)

Used when insertion happens in:

  • The right subtree of the right child.

Fix: Rotate left.

3. Left-Right Rotation (LR Rotation)

Used when insertion happens in:

  • The right subtree of the left child.

Fix:

  1. Rotate left
  2. Rotate right

4. Right-Left Rotation (RL Rotation)

Used when insertion happens in:

  • The left subtree of the right child.

Fix:

  1. Rotate right
  2. Rotate left

Why rotations are important:

  • Restore AVL balance factor
  • Maintain O(log n) height
  • Keep search, insert, delete efficient

Rotations are the backbone of AVL tree balancing.
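As an illustration, the right rotation (LL case) can be sketched in Python; the `Node` class and function name are illustrative, and bookkeeping such as height updates is omitted:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(y):
    """Right rotation around y: y's left child x becomes the new subtree root."""
    x = y.left
    y.left = x.right   # x's right subtree moves under y
    x.right = y        # y becomes x's right child
    return x           # caller re-links x as the new subtree root

# Left-skewed chain 3 <- 2 <- 1 becomes balanced after one right rotation.
root = Node(3); root.left = Node(2); root.left.left = Node(1)
root = rotate_right(root)
print(root.key, root.left.key, root.right.key)  # 2 1 3
```

A left rotation is the mirror image, and the LR/RL cases chain the two.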

17. What is a Red-Black Tree?

A Red-Black Tree is a self-balancing binary search tree that maintains balance using color properties instead of strict height balancing like AVL.

Properties:

  1. Each node is either red or black.
  2. Root is always black.
  3. Red nodes cannot have red children (no double-red rule).
  4. Every path from a node to a leaf contains the same number of black nodes (black-height).
  5. Null leaves are considered black.

Performance:

  • Search: O(log n)
  • Insert: O(log n)
  • Delete: O(log n)

Red-black trees provide near-balanced height, making operations efficient while requiring fewer rotations than AVL.

Use Cases:

  • C++ STL map and set
  • Java TreeMap
  • Linux kernel
  • File systems

Red-black trees offer excellent performance for dynamic datasets.

18. Compare AVL and Red-Black trees.

| Feature | AVL Tree | Red-Black Tree |
|---|---|---|
| Balancing Type | Strict height-balanced | Loosely balanced |
| Balance Factor | Must be -1, 0, +1 | Based on coloring rules |
| Rotations | More frequent | Fewer rotations |
| Search Speed | Faster (more balanced) | Slightly slower |
| Insertion Speed | Slower (may require multiple rotations) | Faster |
| Deletion Speed | Slower | Faster |
| Height | Always minimal | Slightly higher |
| Use Cases | Search-intensive apps | Insert/delete-heavy apps |

Summary:

  • AVL is better for read-heavy operations (fast searching).
  • Red-black is better for write-heavy operations (fewer rotations).

19. What is a B-Tree?

A B-Tree is a self-balancing multi-way search tree commonly used in databases and file systems. Unlike binary trees, B-trees allow multiple keys per node and can have many children.

Characteristics:

  • Each node has multiple keys (not just one).
  • Children count ranges from ceil(m/2) to m for order m.
  • All leaves are at the same level.
  • Designed for efficient disk reads/writes.

Performance:

  • Search: O(log n)
  • Insert: O(log n)
  • Delete: O(log n)

Applications:

  • Databases (MySQL, PostgreSQL)
  • File systems (NTFS, HFS+)
  • Indexing very large datasets

B-Trees reduce disk access time, making them the backbone of modern storage systems.

20. What is a B+ Tree?

A B+ Tree is an enhanced form of B-Tree where all values/data are stored only in the leaf nodes, and internal nodes store only keys to guide the search.

Characteristics:

  • Leaf nodes form a linked list, enabling fast sequential access.
  • All actual records exist at the leaf level.
  • Internal nodes act as routing indexes.
  • Supports efficient range queries.

Differences from B-Tree:

| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data Storage | Internal + leaf nodes | Leaf nodes only |
| Traversal | Medium speed | Very fast using leaf links |
| Range Queries | Harder | Highly efficient |
| Node Size | Smaller | Larger index nodes |

Applications:

  • Databases
  • File indexing
  • Range searches
  • Disk-based data structures

Because of its high I/O efficiency and fast range queries, B+ trees are the most widely used indexing structure in modern databases.

21. What is a graph traversal?

Graph traversal is the process of visiting every vertex and edge in a graph in a systematic manner. Like tree traversal, graph traversal ensures that all reachable nodes from a starting point are explored.

Because graphs may contain cycles and may not have hierarchical connections like trees, traversal requires careful handling to avoid infinite loops (e.g., using a visited set).

Goals of Graph Traversal

  • Explore relationships between nodes
  • Identify connected components
  • Find shortest paths
  • Detect cycles
  • Solve network-related problems

Major Graph Traversal Techniques

  1. Breadth-First Search (BFS)
  2. Depth-First Search (DFS)

These traversals work for both directed and undirected graphs.

Graph traversal forms the backbone of many algorithms, including search engines, routing, pathfinding, and recommendation systems.

22. What is BFS?

BFS (Breadth-First Search) is a graph traversal method that explores nodes level by level, starting from a given source node.

How BFS Works

  1. Start from a source node.
  2. Visit all neighbors first (1st level).
  3. Then visit neighbors of neighbors (2nd level).
  4. Continue until all reachable nodes are visited.

Data Structure Used

  • A queue is used to manage the order of exploration.

Time Complexity

  • O(V + E)
    Where:
  • V = number of vertices
  • E = number of edges

Applications of BFS

  • Finding the shortest path in unweighted graphs
  • Finding connected components
  • Solving puzzles like 8-puzzle or Rubik’s cube
  • Web crawlers
  • Social network friend suggestions
  • Network broadcasting

BFS is ideal when level-order or shortest-path-based exploration is needed.
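The level-by-level exploration with a queue can be sketched in Python (the graph is represented as a hypothetical adjacency-list dictionary):

```python
from collections import deque

def bfs(graph, source):
    """Return vertices in BFS (level) order from source. O(V + E)."""
    visited = {source}
    queue = deque([source])
    order = []
    while queue:
        node = queue.popleft()            # FIFO: process the oldest discovered node
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)     # mark on enqueue to avoid duplicates
                queue.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']
```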

23. What is DFS?

DFS (Depth-First Search) is a traversal method that explores as far as possible along a branch before backtracking.

How DFS Works

  1. Start at a node.
  2. Go deep into one branch (child → grandchild → deeper).
  3. Backtrack when no more nodes exist.
  4. Explore next branch.

Data Structures Used

  • Stack (explicit)
    or
  • Recursion (implicit stack)

Time Complexity

  • O(V + E)

Applications of DFS

  • Topological sorting
  • Detecting cycles
  • Finding strongly connected components (Kosaraju’s algorithm)
  • Solving mazes and puzzles
  • Pathfinding in AI
  • Generating spanning trees

DFS excels in exploring deep structures and detecting patterns or cycles within graphs.
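A recursive DFS sketch in Python, where the call stack serves as the implicit stack (the adjacency-list graph is a hypothetical example):

```python
def dfs(graph, node, visited=None, order=None):
    """Depth-first traversal; recursion acts as the implicit stack. O(V + E)."""
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    order.append(node)
    for neighbor in graph.get(node, []):
        if neighbor not in visited:
            dfs(graph, neighbor, visited, order)  # go deep before siblings
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C']
```

Note the contrast with BFS on the same graph: DFS reaches D through B before ever visiting C.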

24. What is the difference between BFS and DFS?

| Feature | BFS | DFS |
|---|---|---|
| Strategy | Level-by-level | Deep, then backtrack |
| Data Structure | Queue | Stack / Recursion |
| Path Found | Shortest path in unweighted graph | Not guaranteed shortest |
| Memory Use | Higher (stores whole level) | Lower (depends on recursion depth) |
| Ideal For | Shortest path, level order | Cycle detection, deep exploration |
| Traversal Style | Layered | Depth-first |

Example Use Cases

  • BFS: Social networks, shortest path
  • DFS: Maze solving, topological sort, cycle detection

Both BFS and DFS are essential, but used in different types of graph-based problems.

25. What is a weighted graph?

A weighted graph is a graph in which each edge carries a numerical value called a weight or cost.

Properties

  • Can be directed or undirected
  • Edge weights may represent:
    • Distance
    • Time
    • Cost
    • Capacity
    • Probability

Examples

  • Road networks (distance between cities)
  • Airline routes (cost or duration)
  • Network bandwidth capacities
  • Risk modeling (probability weights)

Used In

  • Shortest path algorithms (Dijkstra, Bellman-Ford)
  • Minimum spanning tree (Kruskal, Prim)
  • Optimization problems

Weighted graphs enable solving real-life problems that involve measurable values.

26. What is a DAG (Directed Acyclic Graph)?

A Directed Acyclic Graph (DAG) is a directed graph with no cycles — meaning you cannot start at a node and return to it following directed edges.

Properties

  • Edges have direction
  • No cycles
  • Has a partial order
  • Often used to represent dependencies

Use Cases

  • Task scheduling (with dependencies)
  • Build systems (e.g., make)
  • Version control systems (Git commit graph)
  • Data pipelines (Apache Airflow, DAGs)
  • Expression evaluation

DAGs help represent systems where certain tasks must occur before others.

27. What is topological sorting?

Topological sorting is an ordering of the vertices of a DAG such that for every directed edge (u → v), u appears before v in the ordering.

Important Facts

  • Applicable only to DAGs
  • A graph with a cycle cannot have a topological order

Algorithms

  1. Kahn's Algorithm (BFS-based)
  2. DFS-based algorithm

Applications

  • Task scheduling (course prerequisites)
  • Resolving symbol dependencies in compilers
  • Build ordering in large software systems
  • Job scheduling with constraints

Topological sort ensures tasks are arranged in an order that respects dependency rules.
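Kahn's algorithm can be sketched in Python; vertices with in-degree 0 (no unmet prerequisites) are removed repeatedly (the course-prerequisite DAG is a hypothetical example):

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm: repeatedly remove vertices with in-degree 0."""
    in_degree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] += 1

    queue = deque(u for u in graph if in_degree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:     # all of v's prerequisites are done
                queue.append(v)

    if len(order) != len(graph):      # leftover vertices mean a cycle
        raise ValueError("graph has a cycle; no topological order exists")
    return order

# Course prerequisites: A before B and C, both before D.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(topological_sort(dag))  # ['A', 'B', 'C', 'D']
```

The final length check doubles as cycle detection, matching the fact above that cyclic graphs have no topological order.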

28. What is Dijkstra’s algorithm?

Dijkstra’s algorithm is a famous algorithm used to find the shortest path from a source node to all other nodes in a graph with non-negative edge weights.

How it Works

  1. Start with distance = 0 for the source, infinity for others.
  2. Use a priority queue (min-heap) to pick the node with the smallest distance.
  3. Update distances of its neighbors.
  4. Repeat until all nodes are processed.

Time Complexity

  • Using a binary heap: O((V + E) log V)

Applications

  • GPS navigation
  • Network routing
  • AI pathfinding (games, robotics)
  • Internet routing algorithms
  • Telecom network planning

Dijkstra’s algorithm is one of the most widely used algorithms in computer science for shortest path problems.
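The four steps above can be sketched in Python using heapq as the priority queue (the weighted graph is a hypothetical example, stored as node -> list of (neighbor, weight) pairs):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source in a graph with non-negative weights."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    pq = [(0, source)]                      # min-heap keyed by distance
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:                     # stale entry: already improved
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:             # relax edge u -> v
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```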

29. What is a hash function?

A hash function is a function that takes an input (key) and produces a fixed-size integer output called a hash or hash code.

Characteristics of a Good Hash Function

  • Uniform distribution
  • Fast computation
  • Low collision probability
  • Deterministic (same key → same hash)

Used In

  • Hash tables
  • Cryptographic hashing (SHA, MD5)
  • Digital signatures
  • Password storage
  • File comparisons
  • Load balancing (consistent hashing)
  • Caches

A hash function lies at the core of all hashing-based data structures and algorithms.

30. What is open addressing?

Open addressing is a collision resolution technique used in hash tables when two keys hash to the same index.

Instead of storing multiple items in a bucket (like chaining), open addressing tries to find another empty slot within the table.

Common Strategies

1. Linear Probing

index = (hash + i) % table_size

  • Simple, but causes primary clustering.

2. Quadratic Probing

index = (hash + i²) % table_size

  • Reduces clustering.

3. Double Hashing

Uses a second hash function:

index = (hash1 + i * hash2) % table_size

  • Most effective for minimizing collisions.

Advantages

  • No extra memory for lists
  • Better cache performance

Disadvantages

  • Table can fill up
  • More sensitive to load factor
  • Requires careful probing strategy

Open addressing keeps all keys directly inside the hash table, making it efficient and cache-friendly.
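A minimal linear-probing table might be sketched in Python as follows (the `LinearProbingTable` class is illustrative; deletion and resizing are omitted for brevity):

```python
# Open addressing with linear probing: colliding keys move to the next free slot.
class LinearProbingTable:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size   # each slot holds one (key, value) pair

    def put(self, key, value):
        for i in range(self.size):
            idx = (hash(key) + i) % self.size   # linear probe sequence
            if self.slots[idx] is None or self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key):
        for i in range(self.size):
            idx = (hash(key) + i) % self.size
            if self.slots[idx] is None:
                return None                      # empty slot ends the probe
            if self.slots[idx][0] == key:
                return self.slots[idx][1]
        return None

t = LinearProbingTable(size=8)
t.put(1, "x"); t.put(9, "y")   # 1 and 9 both hash to index 1; 9 probes to index 2
print(t.get(1), t.get(9))      # x y
```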

31. What is an R-tree?

An R-tree is a height-balanced tree data structure used for indexing multidimensional spatial data such as rectangles, polygons, geographic coordinates, and 2D/3D objects.

It is widely used in GIS systems, spatial databases, and collision detection.

Key Characteristics of R-Tree:

  • Each node represents a bounding rectangle that spatially contains all rectangles of its children.
  • Internal nodes store Minimum Bounding Rectangles (MBRs).
  • Leaf nodes store actual geometric objects (rectangles, polygons, etc.).
  • Height-balanced like B-Trees.

Performance:

  • Search: O(log n) average
  • Insert: O(log n) average
  • Delete: O(log n) average

Use Cases:

  • Google Maps spatial index
  • Spatial queries (intersection, nearest-neighbor)
  • CAD systems
  • GIS applications
  • Gaming and physics engines

R-trees excel at handling range queries, overlapping regions, and geospatial indexing.

32. Explain quad trees and oct trees.

Quad trees and oct trees are spatial data structures used for partitioning 2D and 3D spaces.

Quad Trees (2D Space Partitioning)

A quad tree recursively divides a 2D plane into four quadrants:

  • NE (North-East)
  • NW (North-West)
  • SE (South-East)
  • SW (South-West)

Characteristics:

  • Each node has exactly 4 children.
  • Used for:
    • Image processing
    • GIS
    • Terrain modeling
    • Region queries
    • Collision detection

Applications:

  • Storing sparse matrices
  • Video compression (dividing image regions)
  • Game engine world partitioning

Oct Trees (3D Space Partitioning)

An oct tree divides 3D space into 8 octants at each recursive level.

Uses:

  • 3D graphics
  • Physics engines
  • 3D collision detection
  • Volume rendering
  • Spatial indexing for 3D worlds

Quad trees apply the idea of space partitioning to 2D, while oct trees extend it to 3D environments.

33. How does a Z-order curve work?

A Z-order curve (Morton order) is a space-filling curve that maps multidimensional data (2D, 3D) into one-dimensional space while preserving locality.

How It Works:

  • Bit-interleave the x and y coordinates to form a single value.
  • This ensures points that are close in 2D tend to remain close when linearized.

Example (interleaving bits):

x = 1011
y = 1100
Z-order = 1 1 0 1 1 0 1 0
          x y x y x y x y

Benefits:

  • Preserves locality better than simple row-major order.
  • Used in:
    • Databases
    • Spatial indexing
    • Caches
    • GPU memory layouts

Applications:

  • R-tree optimization
  • Geohash encoding (used by Redis, MongoDB)
  • Efficient range queries

Z-order curves offer an efficient way to map multidimensional data into a 1D index for faster storage and querying.
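The bit-interleaving step can be sketched in Python (the `interleave_bits` function name is illustrative; x bits take the higher position in each pair, matching the x y x y layout in the example above):

```python
def interleave_bits(x, y, bits=4):
    """Morton (Z-order) code: alternate the bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)   # bit i of x -> odd position
        z |= ((y >> i) & 1) << (2 * i)       # bit i of y -> even position
    return z

# x = 1011, y = 1100 interleave to 11011010 (x y x y x y x y).
print(bin(interleave_bits(0b1011, 0b1100)))  # 0b11011010
```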

34. Explain B+ tree internal node and leaf node structure.

A B+ tree is a multi-level index tree where all data is stored in leaf nodes, and internal nodes store only keys for navigation.

Internal Nodes Structure

  • Contain only keys, not data.
  • Keys act as separators to guide search.
  • Pointers point to:
    • Other internal nodes
    • Or leaf nodes (at the last level)
  • Allow a high fan-out (large number of children), which reduces tree height.
  • Support efficient disk-based indexing.

Leaf Nodes Structure

  • Store actual data values (records or pointers).
  • Contain key-value pairs (or key-pointer pairs).
  • All leaf nodes are linked using a doubly linked list:
    • Enables efficient range queries
    • Allows sequential scans (ORDER BY operations)

Benefits:

  • Excellent for disk reads/writes
  • Fast range queries
  • Balanced tree ensuring O(log n) access
  • Used in databases and file systems

A B+ tree’s separation of index nodes and data nodes makes it ideal for large-scale persistent storage indexing.

    35. Explain write amplification in LSM trees.

Write amplification refers to the phenomenon where the system ends up writing more data to disk than the application originally wrote.

This is a significant concern in LSM (Log-Structured Merge) trees, used in write-heavy databases such as LevelDB, RocksDB, and Cassandra.

    How Write Amplification Occurs in LSM Trees:

    1. Data is first written to in-memory structure (MemTable).
    2. MemTable is flushed to disk as an SSTable.
    3. Multiple SSTables accumulate over time.
    4. Background compaction merges SSTables into larger ones.
    5. Data gets rewritten multiple times during compaction.

    Thus, a single user write may result in multiple disk writes, increasing write I/O.
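As a rough back-of-envelope illustration (an assumed model, not the formula of any specific engine): under leveled compaction, a byte that eventually settles in the bottom level is written once on flush and rewritten roughly once per level, multiplied by the level size ratio (fanout):

```python
def estimate_write_amplification(levels: int, fanout: int) -> int:
    """Back-of-envelope write amplification for leveled compaction.

    Assumes each byte is written once on flush and rewritten about
    `fanout` times as it migrates through each of `levels` levels.
    """
    return 1 + levels * fanout

# e.g. 4 on-disk levels with a size ratio of 10:
print(estimate_write_amplification(levels=4, fanout=10))  # 41
```

This is why a single logical write can translate into tens of physical writes over its lifetime.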

    Impacts:

    • Higher SSD wear
    • Increased latency
    • More CPU usage during compaction

    Mitigation Strategies:

    • Tiered compaction
    • Reducing compaction frequency
    • Configurable compaction policies
    • Using larger SSTable sizes
    • Bloom filters to reduce overlapping lookups

    Write amplification is a trade-off for LSM trees’ fast sequential writes and high throughput.

    36. Explain the design of a distributed hash table (DHT).

    A Distributed Hash Table (DHT) is a decentralized distributed system that provides hash table functionality across many machines, without a central coordinator.

    Core Design Concepts:

    1. Hashing Keys
      • A hash function assigns both nodes and data keys a position in a circular address space.
    2. Consistent Hashing
      • Ensures minimal data movement when nodes join or leave.
    3. Ring Structure
      • Nodes form a logical ring.
  • Each node is responsible for a specific key range.
    4. Routing Tables
      • Used to find the node responsible for a key.
      • Supports O(log n) routing.
    5. Redundancy & Replication
      • Data replicated across multiple nodes for fault tolerance.
    6. Self-organizing
      • Nodes can join or leave dynamically.
      • DHT automatically redistributes data.
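Concepts 1–3 above (hashing keys, consistent hashing, the ring) can be sketched with a toy hash ring. This is illustrative only; the class name, `vnodes` parameter, and the choice of MD5 are assumptions for the example, not any particular DHT's design:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash into a large integer space (MD5 used purely for illustration).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring; `vnodes` virtual points per physical node."""

    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual point at or after the key's hash.
        i = bisect.bisect(self.points, _hash(key)) % len(self.points)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # deterministically one of the three nodes
```

Adding or removing a physical node only reassigns the key ranges adjacent to its virtual points, which is the "minimal data movement" property.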

    Popular DHT Implementations:

    • Chord
    • Kademlia (used in BitTorrent, Ethereum)
    • Pastry
    • CAN
    • Tapestry

    Applications:

    • P2P networks
    • Blockchain peer discovery
    • Distributed file systems
    • Cloud-scale storage

    A DHT provides scalable, fault-tolerant hashing across thousands of nodes.

    37. What is the difference between consistent hashing and rendezvous hashing?

    Consistent Hashing

    • Maps nodes and keys onto a circular ring.
    • Minimizes data movement when nodes join or leave.
    • Used in:
      • Cassandra
      • Riak
      • Memcached clusters

    Key Characteristics:

    • Requires virtual nodes for balanced distribution.
    • Data placement depends on ring position.

    Rendezvous Hashing (HRW Hashing)

    • Each node is assigned a hash score for a key.
    • The node with the highest score wins and stores the key.
    • No ring needed.

    Key Characteristics:

    • Simpler implementation
    • Near-uniform load balance without virtual nodes
    • Very good for dynamic node changes
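The highest-score rule above fits in a few lines of Python (a sketch; the function name and the use of SHA-256 are illustrative choices):

```python
import hashlib

def hrw_node(key: str, nodes) -> str:
    """Rendezvous (highest-random-weight) hashing: the highest score wins."""
    def score(node: str) -> int:
        return int(hashlib.sha256(f"{node}:{key}".encode()).hexdigest(), 16)
    return max(nodes, key=score)

print(hrw_node("user:42", ["node-a", "node-b", "node-c"]))
```

Because each key's owner depends only on per-node scores, removing a node reassigns only the keys that node owned; every other key keeps its highest-scoring node.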

    Used in:

    • Ceph storage
    • Distributed load balancers

    Differences Table

| Aspect | Consistent Hashing | Rendezvous Hashing |
|---|---|---|
| Structure | Ring | No structure |
| Load balancing | Requires virtual nodes | Automatic |
| Complexity | Higher | Lower |
| When nodes change | Moves small % of keys | Moves minimal keys |
| Use cases | Caches, key-value stores | Load balancing, storage |

Rendezvous hashing is simpler and often more evenly balanced, but consistent hashing scales better to very large clusters, since a lookup need not score every node.

    38. How do you design a real-time event streaming buffer?

    A real-time event streaming buffer is designed to ingest, store, forward, and process continuous streams of data with low latency.

    Key Design Components:

    1. Circular Buffer or Ring Buffer

    • Stores events in FIFO order.
    • Old data overwritten when buffer is full.
    • Used for real-time streaming with fixed memory.
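The circular-buffer behavior described above (FIFO with overwrite-on-full) can be sketched as follows; the class and method names are illustrative:

```python
class RingBuffer:
    """Fixed-capacity FIFO that overwrites the oldest event when full."""

    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.head = 0   # index of the oldest event
        self.size = 0

    def push(self, event):
        tail = (self.head + self.size) % len(self.buf)
        self.buf[tail] = event
        if self.size < len(self.buf):
            self.size += 1
        else:
            # Full: the oldest event was just overwritten, advance head.
            self.head = (self.head + 1) % len(self.buf)

    def pop(self):
        if self.size == 0:
            raise IndexError("empty buffer")
        event = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return event
```

Because memory is fixed and indices only wrap, both operations are O(1) with no allocation, which is what makes this structure attractive for low-latency ingest.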

    2. Message Queue / Log Structure

    • Append-only log (Kafka-style).
    • Supports partitions for high throughput.
    • Persistent and fault-tolerant.

    3. Backpressure Handling

    • Ensures slow consumers cannot block producers.
    • Techniques:
      • Drop oldest events
      • Drop newest events
      • Pause producers
      • Scale consumers

    4. Partitioning & Sharding

    • Events partitioned by key (user ID, device ID).
    • Ensures ordering guarantees per partition.

    5. Batching & Compression

    • Improves throughput.
    • Reduces I/O overhead.

    6. Consumer Offsets

    • Track read position for each consumer.
    • Allows fault tolerance and replay.

    Technologies Used:

    • Kafka
    • Pulsar
    • Redis Streams
    • Kinesis
    • RabbitMQ

    A good event streaming buffer ensures low latency, durability, scalability, and high throughput.

    39. What is a cache eviction algorithm (LRU, LFU, ARC, etc.)?

    A cache eviction algorithm determines which item to remove from the cache when space is full.

    1. LRU (Least Recently Used)

    • Removes the item that has not been used for the longest time.
    • Good when recent access predicts future access.

    2. LFU (Least Frequently Used)

    • Removes item with the least number of accesses.
    • Good for repeated-use workloads.

    3. FIFO (First In, First Out)

    • Removes oldest inserted item.
    • Useful for simple queue-based caches.

    4. ARC (Adaptive Replacement Cache)

    • More advanced.
    • Balances between LRU and LFU dynamically.
    • Used in:
      • ZFS file system
      • High-performance storage systems

    5. MRU (Most Recently Used)

    • Removes most recently used item.
    • Useful in certain stack-like access patterns.
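As an illustration of the most common policy, LRU can be sketched with Python's `OrderedDict`, giving O(1) `get` and `put` (a minimal sketch, not thread-safe):

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction in O(1) per operation via an ordered hash map."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

LFU would additionally track an access count per key, and ARC maintains separate recency and frequency lists and adapts the split between them at runtime.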

    Applications of Eviction Policies

    • Databases
    • Operating system page caches
    • CDN edge caches
    • CPU caches
    • Browser caching

    These algorithms ensure the cache remains fresh, relevant, and within capacity.

    WeCP Team
    Team @WeCP
    WeCP is a leading talent assessment platform that helps companies streamline their recruitment and L&D process by evaluating candidates' skills through tailored assessments