As enterprises manage massive-scale analytical workloads and mission-critical data warehouses, recruiters must identify Teradata professionals who can design, optimize, and maintain high-performance analytical systems. Teradata is widely used in banking, telecom, retail, and large enterprises for complex analytics and large-volume data processing.
This resource, "100+ Teradata Interview Questions and Answers," is tailored for recruiters to simplify the evaluation process. It covers a wide range of topics—from Teradata fundamentals to advanced data warehousing and performance optimization, including Teradata SQL, indexing, and workload management.
Whether you're hiring Teradata Developers, Data Warehouse Engineers, BI Engineers, or Data Analysts, this guide enables you to assess a candidate’s:
- Core Teradata Knowledge: Teradata architecture, nodes, AMPs, parsing engine, data distribution, primary indexes, and basic Teradata SQL.
- Advanced Skills: Join strategies, secondary and join indexes, partitioned primary indexes (PPI), statistics collection, query optimization, and workload management (TASM).
- Real-World Proficiency: Designing scalable data models, optimizing complex analytical queries, managing large tables, and ensuring high performance in enterprise data warehouse environments.
For a streamlined assessment process, consider platforms like WeCP, which allow you to:
- Create customized Teradata assessments tailored to enterprise data warehousing and analytics roles.
- Include hands-on tasks such as writing Teradata SQL queries, optimizing execution plans, or designing data distribution strategies.
- Proctor exams remotely while ensuring integrity.
- Evaluate results with AI-driven analysis for faster, more accurate decision-making.
Save time, enhance your hiring process, and confidently hire Teradata professionals who can build and maintain scalable, high-performance enterprise analytics platforms from day one.
Teradata Interview Questions
Teradata – Beginner (1–40)
- What is Teradata and why is it used for large-scale data warehousing?
- Explain the Teradata architecture.
- What is a Node in Teradata?
- What is a Parsing Engine (PE)?
- What is an AMP (Access Module Processor)?
- Explain BYNET and its role.
- What is a Vproc in Teradata?
- What is a Clique in Teradata?
- What is a Hot Standby Node?
- Explain the concept of Data Distribution in Teradata.
- What is a Primary Index?
- Difference between Primary Key and Primary Index.
- Explain Unique Primary Index vs Non-Unique Primary Index.
- What is a Secondary Index?
- What is a Join Index?
- What is a Partitioned Primary Index?
- What is Skew and how does it impact performance?
- What is Teradata SQL Assistant?
- Explain Teradata View.
- What is a Teradata Macro?
- What is a Stored Procedure in Teradata?
- Explain FastLoad.
- Explain MultiLoad.
- What is the difference between FastLoad and MultiLoad?
- What is BTEQ?
- What is TPump?
- Explain EXPORT and IMPORT utility.
- What is TPT (Teradata Parallel Transporter)?
- What is fallback protection in Teradata?
- What is a Cylinder?
- What is a Data Block?
- What is a Hash Map?
- What is a Spool Space?
- What is a Permanent Space?
- What is a Temporary Space?
- Explain Teradata’s Parallel Processing.
- What is an Explain Plan?
- What is Collect Statistics?
- What is Hash Join?
- What is a Teradata Database User?
Teradata – Intermediate (1–40)
- Explain the PE steps when a query is submitted.
- What is the importance of Row Hash and Hash Bucket?
- How does Teradata distribute data during INSERT?
- How does Teradata handle Duplicate Row checking?
- What is the difference between SET and MULTISET tables?
- What are Derived Tables in Teradata?
- What are Volatile Tables?
- What are Global Temporary Tables?
- Difference between Volatile Table and Global Temporary Table.
- What are Error Tables in Teradata?
- What is a Normalization Index?
- What is Referential Integrity in Teradata?
- Explain the different types of Locks.
- What is Lock Escalation?
- Explain block-level vs row-level locking.
- What is a Deadlock in Teradata?
- What are Value Ordered NUSI?
- Difference between HASH JOIN and MERGE JOIN.
- What is Confidence Level in Explain Plan?
- How does Teradata handle NULL values?
- What are the phases of FastLoad?
- Why does FastLoad not support secondary indexes?
- How to improve MultiLoad performance?
- What is Checkpoint in MultiLoad?
- Explain TPT operators.
- What is SPOOL space overflow?
- What is the role of DBS Control parameters?
- Difference between Primary Index and Primary Key in Teradata internally.
- What are hashing collisions?
- How does Teradata ensure even distribution of data?
- What is a Skew Factor?
- How to reduce skew in Teradata?
- Explain Teradata’s Query Optimizer.
- What is Cost-Based Optimization?
- What are Tactical vs Decision Support queries?
- Explain Priority Scheduler.
- What is Workload Management?
- How does Teradata handle Transaction Recovery?
- What is Time Travel in Teradata?
- What is Teradata Columnar and how does it work?
Teradata – Experienced (1–40)
- Explain Teradata architecture end-to-end with PE, AMP, BYNET internals.
- Explain in detail how the Parsing Engine parses, optimizes, and dispatches a query.
- Describe AMP operations during a SELECT query.
- What happens inside Teradata when data is inserted?
- Explain hashing algorithm internals (32-bit hashing).
- How does Teradata ensure concurrent processing without data conflicts?
- Explain Fallback vs Journaling vs RAID at an architectural level.
- What is Cylinder Read and how does AMP optimize it?
- Internal working of Teradata Join Processing (AMP to AMP communication).
- How does Teradata optimizer choose a join strategy?
- Explain how Teradata handles redistribution vs duplication in join planning.
- Explain the AMP Steps for a Multi-AMP query.
- How does Teradata manage spool files internally?
- Explain in detail how statistics impact query plans.
- What does Optimizer do when stats are stale?
- How to diagnose and fix skewed queries?
- How does Teradata work with fewer AMPs?
- Explain multi-level partitioning internals.
- What is Columnar Storage internal architecture?
- How does Teradata store columnar data in hybrid mode?
- Explain parsing engine memory management.
- How does Teradata handle deadlocks internally?
- Explain Checkpointing in Transaction Recovery.
- Explain dual-system and cluster failover architecture.
- What is PDE (Parallel Database Extensions) and why is it needed?
- Explain the internal working of TPT Load Operator.
- Describe batch vs real-time ingestion strategies.
- How do you debug performance issues using DBQL?
- Explain ResUsage tables and how to analyze resource bottlenecks.
- How does Teradata use Histograms for stats?
- Explain query plan caching and reuse.
- How does session management work in Teradata?
- How does workload management prioritize jobs?
- Explain Service Level Goals and throttles.
- How do you tune a Teradata system for heavy BI workloads?
- Explain how zero-downtime upgrades work.
- Explain ARC/NetVault/Hadoop integration for backup and restore.
- Explain the internal processing of geospatial functions.
- What is Intelligent Memory in Teradata?
- Explain end-to-end debugging of a highly skewed, slow query.
Teradata Interview Questions and Answers
Beginner (Q&A)
1. What is Teradata and why is it used for large-scale data warehousing?
Teradata is a massively parallel processing (MPP)-based relational database management system engineered specifically for large-scale data warehousing, analytics, and decision-support workloads. It is designed to handle extremely large volumes of structured data by automatically distributing data and workload across multiple processors.
Key reasons it is used for large-scale data warehousing:
- Massively Parallel Processing (MPP)
Teradata divides data and workload across many AMPs (Access Module Processors), allowing SQL queries to run in parallel. This parallelism enables Teradata to process terabytes and petabytes of data efficiently.
- Shared-Nothing Architecture
Every node operates independently with its own CPU, memory, and disk. No resource contention → linear scalability.
- Automatic Data Distribution
Using hashing, Teradata evenly distributes rows across AMPs, avoiding data hotspots and ensuring a balanced workload.
- High Throughput & Low Latency
Teradata is optimized for heavy analytical workloads involving large table scans, joins, aggregations, and sorting.
- Fault Tolerance & High Availability
Features like fallback protection, RAID, cliques, and automatic recovery make Teradata highly reliable.
- Advanced Query Optimizer
Teradata’s optimizer is cost-based and heavily statistics-driven, well suited to enterprise analytics.
- Enterprise Workload Management
Supports complex scheduling, SLGs, throttles, filters, and prioritization for mixed workloads.
In summary, Teradata is chosen for large-scale data warehousing because it can handle extremely large datasets with high performance, reliability, and scalability, using an efficient parallel architecture.
2. Explain the Teradata architecture.
Teradata follows a Shared-Nothing, Massively Parallel Processing (MPP) architecture consisting of multiple components working together to process workloads efficiently.
Teradata Architecture Components:
- Nodes
A node is a high-performance server containing CPU, memory, and storage connections. Each node runs the Teradata software and hosts multiple virtual processors.
- Virtual Processors (Vprocs)
Each node hosts two main types of Vprocs:
  - Parsing Engine (PE) – handles SQL parsing, optimization, and dispatching
  - AMP (Access Module Processor) – handles data storage, retrieval, and manipulation
- BYNET
BYNET is the communication backbone that connects all nodes. It allows PEs to send work to AMPs and AMPs to communicate during joins or data redistribution.
- Data Storage Layer
Each AMP manages its own disk space independently. Data is spread across AMPs using a hashing algorithm on the primary index.
Working Process:
- A user sends an SQL query → handled by the PE.
- PE parses, optimizes, and generates execution steps.
- Steps are sent via BYNET to AMPs.
- Each AMP works in parallel, completing its part of the workload.
- Results are gathered back and returned to the user.
Key Benefits of Teradata Architecture:
- Linear scalability
- High availability
- Parallel execution
- Automatic tuning and workload distribution
3. What is a Node in Teradata?
A Node in Teradata is a single physical server (or virtualized server) that runs a portion of the Teradata system.
Each node contains:
- CPU
- Memory
- Operating system (Linux-based)
- Disk subsystem connections
- Multiple Virtual Processors (PEs and AMPs)
Key characteristics:
- Independent Operation
Each node works autonomously as part of the larger MPP system.
- Hosts Vprocs
Nodes host Parsing Engines and AMPs — the key working units of Teradata.
- Participates in Parallel Workload
When workload grows, more nodes are added to increase processing power.
- Fault Isolation
If one node fails, others keep functioning, ensuring continuous availability.
Nodes form the building blocks of Teradata’s parallel architecture.
4. What is a Parsing Engine (PE)?
A Parsing Engine (PE) is a core Teradata component responsible for handling SQL queries. It coordinates between client sessions and the AMPs.
Functions of the Parsing Engine:
- Session Management
Handles login, authentication, and session tracking.
- Syntax Parsing
Validates SQL syntax and checks query structure.
- Semantic Analysis
Ensures table names, columns, privileges, and data types are valid.
- Optimization
The PE optimizes SQL queries using cost-based optimization, join strategies, and statistics.
- Planning Execution Steps
Converts SQL into a series of internal steps (execution plan) for AMPs.
- Dispatching to AMPs
Sends optimized steps through BYNET to the appropriate AMPs.
- Final Result Assembly
Collects results from AMPs and sends them back to the client.
In simple terms:
PE = Query Manager + Optimizer + Dispatcher.
5. What is an AMP (Access Module Processor)?
An AMP is the heart of Teradata’s processing power. It is a virtual processor responsible for managing data.
Functions of an AMP:
- Data Storage
Each AMP owns a portion of disk space. Rows are assigned to AMPs using a hashing algorithm.
- Data Retrieval
During a query, each AMP reads its own portion of the data in parallel.
- Executing SQL Steps
AMPs perform joins, aggregations, sorts, updates, and deletions.
- Parallel Processing
Every AMP executes identical steps on its share of data → linear scalability.
- Data Protection (Fallback, Journaling)
AMPs store fallback copies of data for high availability.
- Spool Space Management
Temporary query results are stored and managed by AMPs.
In summary, AMPs perform all database operations and are responsible for the massively parallel execution that Teradata is famous for.
6. Explain BYNET and its role.
BYNET is Teradata’s high-speed communication network that connects Parsing Engines and AMPs across all nodes.
Roles and Functions of BYNET:
- Message Routing
Sends communication between PEs and AMPs efficiently.
- Parallel Dispatching
Distributes steps to all AMPs simultaneously.
- Inter-AMP Communication
Allows AMPs to exchange data during joins, merges, and hash redistribution.
- Broadcast, Multicast, Unicast Communication
  - Unicast: one-to-one
  - Multicast: one-to-many
  - Broadcast: one-to-all
- Load Balancing
Dynamically balances traffic between nodes.
- Fault Tolerance
Multiple BYNET paths ensure high availability; if one connection fails, traffic routes through another.
BYNET is essentially the nervous system of Teradata’s parallel architecture.
7. What is a Vproc in Teradata?
A Vproc (Virtual Processor) is a software abstraction that acts like a logical processor within Teradata.
Two types of Vprocs:
- Parsing Engine (PE) Vproc
Manages SQL parsing, optimization, and session control.
- AMP Vproc
Manages data storage and execution of SQL operations.
Characteristics:
- Vprocs are independent units that run on top of Teradata’s OS layer.
- They enable Teradata to scale and utilize hardware efficiently.
- Even if hardware changes, Vprocs provide consistency in functionality.
Vprocs make Teradata highly flexible and scalable by decoupling software processing from hardware.
8. What is a Clique in Teradata?
A Clique is a group of nodes that share access to the same disk subsystem.
Purpose of a Clique:
- High Availability
If one node fails, another node in the clique automatically takes over its work.
- Automatic Failover
AMPs from the failed node restart on another node in the clique.
- Shared Disk Access
All nodes in a clique have access to the same disk arrays, enabling smooth transitions.
- Load Distribution
Workload shifts to surviving nodes temporarily.
In essence, a clique provides fault tolerance and ensures Teradata continues operating even if hardware failures occur.
9. What is a Hot Standby Node?
A Hot Standby Node is a dedicated node kept ready to take over the workload of a failed node instantly.
Key Features:
- Always On & Synchronized
The hot standby node is always running and has the software needed to host AMP Vprocs from any failed node.
- Instant Failover
When a node fails:
  - AMP Vprocs move to the hot standby node
  - Queries resume without major interruption
- Reduces Downtime
System availability is maintained even during hardware failure.
- Automatic Recovery
Teradata rebalances the workload automatically once the original node is restored.
Hot Standby Nodes are critical for mission-critical systems requiring 24×7 uptime.
10. Explain the concept of Data Distribution in Teradata.
Teradata uses hash-based data distribution to evenly spread table rows across all AMPs.
How Data Distribution Works:
- Hashing Algorithm
Teradata applies a 32-bit hashing algorithm to the Primary Index (PI) value of each row.
- Hash Bucket Assignment
The hash value maps to a hash bucket.
- Bucket Maps to an AMP
Each bucket is associated with one AMP; that AMP stores the row.
Benefits of Hash-Based Distribution:
- Even Distribution
Prevents skew by ensuring each AMP gets a similar number of rows.
- Parallel Processing
Since data is evenly distributed, each AMP can work simultaneously on its share.
- Faster Query Execution
Scans, joins, and aggregations complete faster due to the balanced workload.
- Predictable Performance
Hashing ensures consistent and stable data distribution.
The core idea:
Teradata distributes data based on hashing → enables true parallelism → delivers high performance.
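The three-step pipeline above can be observed directly with Teradata's built-in hash functions, which expose the row hash, the hash bucket, and the owning AMP for a given value. A minimal sketch (the database, table, and column names are illustrative):

```sql
-- For each sampled row: its 32-bit row hash, the bucket that
-- hash falls into, and the AMP that bucket is assigned to.
SELECT HASHROW(cust_id)                      AS row_hash,
       HASHBUCKET(HASHROW(cust_id))          AS hash_bucket,
       HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS amp_number
FROM mydb.customer
SAMPLE 5;
```

Running this against any column quickly shows whether candidate Primary Index values would spread across many AMPs or pile onto a few.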
11. What is a Primary Index?
A Primary Index (PI) in Teradata is the most important physical data access method used to determine how rows are distributed across AMPs and how data is accessed during queries.
Unlike traditional databases where a primary key identifies uniqueness, in Teradata the Primary Index primarily controls data distribution.
Key purposes of a Primary Index:
- Controls Row Distribution
Teradata uses a hashing algorithm on the PI column(s). The resulting hash value determines which AMP will store the row. Good PI selection ensures even data distribution and optimal performance.
- Fast Data Retrieval
PI-based access is the fastest access path. Since Teradata knows exactly which AMP holds the row, retrieval is direct and efficient.
- Data Locality
The row-to-AMP mapping stays consistent for a given PI value, enabling single-AMP access and efficient parallel processing.
Types of Primary Index:
- UPI (Unique PI) – ensures unique hash values
- NUPI (Non-Unique PI) – allows duplicate hash values
Important Note:
Primary Index does not imply uniqueness.
Its function is placement of rows, not uniqueness enforcement.
12. Difference between Primary Key and Primary Index
Although their names sound similar, Primary Key (PK) and Primary Index (PI) serve entirely different purposes.
| Feature | Primary Key | Primary Index |
| --- | --- | --- |
| Purpose | Ensure uniqueness of rows | Control data distribution & access path |
| Constraint? | Yes, PK enforces a uniqueness constraint | No, PI does not enforce uniqueness |
| Duplicates allowed? | No | Yes (if NUPI) |
| Effect on data distribution | None | Major impact (controls hashing) |
| Effect on performance | Mostly logical | Directly affects performance and skew |
| Physical storage | Does not determine storage | Determines which AMP stores rows |
| Definition | Logical RDBMS concept | Teradata-specific physical concept |
In summary:
Primary Key = Uniqueness.
Primary Index = Data distribution + fast access.
13. Explain Unique Primary Index vs Non-Unique Primary Index
Teradata supports two types of primary indexes: UPI and NUPI.
Unique Primary Index (UPI)
A UPI requires every row to have a unique value in the PI column(s).
No two rows in the table can share the same UPI value.
Characteristics:
- Even Distribution
Unique values spread rows across AMPs, so each AMP receives a nearly equal number of rows → no skew.
- Uniqueness Enforcement
Teradata rejects duplicate UPI values at insert time.
- Fastest Access Path
The system can locate a row with a single-AMP lookup.
- Ideal for OLTP-like Operations
Fast row-level operations.
Non-Unique Primary Index (NUPI)
A NUPI allows duplicate values in PI column(s).
Rows with the same PI value hash to the same AMP.
Characteristics:
- Possibility of Skew
If too many rows share the same PI value, that AMP gets overloaded.
- Better for Common Access Patterns
Useful when queries frequently filter or join on non-unique columns.
- Distribution Might Not Be Balanced
With a good PI choice, distribution can still be reasonably even.
Which is better?
- UPI ensures perfect distribution but cannot be used when values are not unique.
- NUPI is more flexible but must be chosen carefully to avoid skew.
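The choice between the two is made in the table DDL. A minimal sketch (database, table, and column names are illustrative):

```sql
-- UPI: unique cust_id values spread rows evenly across AMPs
CREATE TABLE mydb.customer (
   cust_id   INTEGER NOT NULL,
   region    VARCHAR(20)
) UNIQUE PRIMARY INDEX (cust_id);

-- NUPI: duplicate cust_id values hash to the same AMP, which
-- co-locates a customer's sales rows for joins but risks skew
CREATE TABLE mydb.sales (
   sale_id   INTEGER,
   cust_id   INTEGER,
   amount    DECIMAL(12,2)
) PRIMARY INDEX (cust_id);
```

A common design trade-off: the NUPI on `cust_id` makes customer-to-sales joins AMP-local, at the cost of uneven distribution if a few customers dominate the data.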
14. What is a Secondary Index?
A Secondary Index (SI) is an additional access path used to improve query performance when the Primary Index is not ideal for certain queries.
Why SI exists:
Sometimes queries frequently filter on columns that are not part of the Primary Index. A Secondary Index allows faster retrieval without full table scanning.
Types of Secondary Index:
- Unique Secondary Index (USI)
  - Enforces uniqueness
  - Uses two-AMP access: AMP1 (index subtable row) → AMP2 (data row)
  - Fast and efficient
- Non-Unique Secondary Index (NUSI)
  - Does not enforce uniqueness
  - Uses all-AMP access: each AMP scans its local index subtable
  - Useful for selective queries
  - Acts as a local index (stored on each AMP)
Characteristics of Secondary Indexes:
- Stored separately in sub-tables
- Cause additional overhead during inserts/updates
- Helpful for accelerating frequently used filters
Note:
FastLoad cannot load tables with secondary indexes.
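Both kinds are created with `CREATE INDEX`; uniqueness is what distinguishes a USI from a NUSI. A minimal sketch (table and column names are illustrative):

```sql
-- USI: enforces uniqueness, two-AMP access path
CREATE UNIQUE INDEX (email) ON mydb.customer;

-- NUSI: no uniqueness, AMP-local subtable, all-AMP access
CREATE INDEX idx_region (region) ON mydb.customer;
```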
15. What is a Join Index?
A Join Index is a pre-computed index that stores the result of a join between tables, designed to accelerate complex queries.
Purpose of Join Index:
- Reduce expensive joins
When queries frequently join the same tables/columns, a join index eliminates repeated join operations.
- Store join results in advance
The result is stored physically as a separate table maintained by Teradata.
- Improve performance of star-schema queries
Example: a fact table joined with multiple dimensions.
Types of Join Index:
- Single-Table Join Index
A copy of a single table, typically with a different primary index, used to avoid row redistribution.
- Multi-Table Join Index (MTJI)
Pre-joins two or more tables.
- Sparse Join Index
Stores only rows that meet a certain condition.
Benefits:
- Reduces data redistribution
- Minimizes spool usage
- Improves tactical & complex query performance
Cost:
- Slower INSERT/UPDATE operations because index needs maintenance
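A multi-table (aggregate) join index for the star-schema case above might be sketched as follows; the names and the aggregation are illustrative, not a definitive implementation:

```sql
-- Pre-joins sales to customer and pre-aggregates by region and date,
-- so matching reporting queries can be answered from the index alone.
CREATE JOIN INDEX mydb.sales_cust_ji AS
SELECT c.region, s.sale_date, SUM(s.amount) AS total_amount
FROM mydb.sales s
JOIN mydb.customer c ON s.cust_id = c.cust_id
GROUP BY c.region, s.sale_date
PRIMARY INDEX (region);
```

The optimizer can then rewrite qualifying queries against `sales_cust_ji` automatically; users never reference the join index directly.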
16. What is a Partitioned Primary Index?
A Partitioned Primary Index (PPI) allows a table’s rows to be partitioned based on the values of columns, helping optimize range and selective queries.
How PPI Works:
- PI still controls which AMP stores the row.
- PPI further organizes rows within the AMP into multiple partitions.
- Query pruning reduces the number of partitions scanned.
Benefits of PPI:
- Partition Elimination
Teradata scans only the necessary partitions — a huge performance gain.
- Better Manageability
Partition-level backup and maintenance.
- Faster Range Queries
Excellent for date-based searches:
WHERE transaction_date BETWEEN '2023-01-01' AND '2023-01-31'
- Reduces Full Table Scans
Unnecessary partitions are skipped.
Partition Types:
- Range partitions
- Case partitions
- Multi-level partitions
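A range-partitioned table for the date query shown above could be defined roughly like this (database and table names are illustrative):

```sql
-- Rows are still hashed to AMPs by txn_id; within each AMP they
-- are grouped into monthly partitions on transaction_date, so the
-- January query above touches only one partition per AMP.
CREATE TABLE mydb.transactions (
   txn_id           INTEGER,
   transaction_date DATE,
   amount           DECIMAL(12,2)
)
PRIMARY INDEX (txn_id)
PARTITION BY RANGE_N (
   transaction_date BETWEEN DATE '2023-01-01'
                        AND DATE '2023-12-31'
   EACH INTERVAL '1' MONTH
);
```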
17. What is Skew and how does it impact performance?
Skew occurs when data is unevenly distributed across AMPs.
Some AMPs get more rows than others, causing imbalance.
How Skew Happens:
- Poorly chosen NUPI
- Many duplicates on PI column
- Using bad partition keys
- Data bias (e.g., 90% customers from same region)
Skew Impact on Performance:
- Performance Bottleneck
The slowest AMP determines query completion → the entire query is delayed.
- Uneven Workload
Some AMPs are overloaded while others sit idle.
- Spool Space Errors
Overloaded AMPs may run out of spool space → query failure.
- Longer Job Execution
Joins, aggregations, and table scans take much longer.
Good PI selection is the primary solution to avoid skew.
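A standard skew check counts rows per AMP using the hash functions on the PI column. A sketch with illustrative names:

```sql
-- Row counts per AMP for the sales table; large differences
-- between the top and bottom counts indicate skew.
SELECT HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS amp_number,
       COUNT(*)                              AS row_count
FROM mydb.sales
GROUP BY 1
ORDER BY row_count DESC;
```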
18. What is Teradata SQL Assistant?
Teradata SQL Assistant is a client-side GUI tool used to connect to Teradata databases and run SQL queries.
Key capabilities:
- Execute SQL Queries
Submit single or multiple SQL statements.
- View Data and Explain Plans
View result sets and performance plans easily.
- Export and Import Data
Export results to CSV, Excel, or text.
- History and Query Repository
Saves query history for later reuse.
- Database Browsing
View tables, schemas, indexes, stored procedures, and macros.
- Session Monitoring
Evaluate current activity levels and usage.
Why it is commonly used:
- Simple
- Lightweight
- Ideal for developers, testers, analysts
19. Explain Teradata View.
A View in Teradata is a virtual table defined by a SQL SELECT statement.
It does not store data physically; instead, it stores metadata that references the underlying base table.
Purpose of a View:
- Security
Restrict user access to specific columns or rows.
- Simplify Complex Queries
Abstract complex joins or calculations.
- Logical Layer
Provide a semantic layer over tables.
- Data Masking
Hide sensitive data fields (e.g., salary, SSN).
- Reuse and Standardization
Common business logic can be centrally defined.
Characteristics:
- Views always reflect the latest data in underlying tables.
- Views can be nested.
- Performance depends on base tables, not the view definition.
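A view combining the security and row-restriction purposes above might look like this (table, column, and view names are illustrative):

```sql
-- Column-level security: salary is excluded from the SELECT list.
-- Row-level security: only department 100 is visible.
REPLACE VIEW mydb.v_employee AS
SELECT emp_id,
       emp_name,
       dept_id
FROM mydb.employee
WHERE dept_id = 100;
```

Granting SELECT on the view, but not on `mydb.employee`, then restricts users to exactly this slice of the data.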
20. What is a Teradata Macro?
A Teradata Macro is a stored SQL script used to execute a series of SQL statements with a single call.
Purpose of a Macro:
- Automation
Automates repetitive SQL operations.
- Consistent Execution
Ensures the same steps are performed every time.
- Parameter Support
Macros can accept parameters using (:paramName) syntax.
- Faster Execution
Because macro code is pre-parsed and stored, only execution is needed.
Typical Uses of Macros:
- Daily reporting queries
- ETL routines
- Batch jobs
- Security/enforcement logic
Example:
CREATE MACRO sales_summary (p_date DATE) AS (
SELECT region, SUM(amount)
FROM sales
WHERE sale_date = :p_date
GROUP BY region;
);
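Assuming the `sales_summary` macro above, invocation is a single call (the date literal is illustrative):

```sql
EXEC sales_summary (DATE '2023-01-15');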
Key Benefits:
- Pre-parsed → faster
- Simplifies repetitive tasks
- Reduces user errors
- Easier to maintain business rules
21. What is a Stored Procedure in Teradata?
A Stored Procedure in Teradata is a compiled database object that contains procedural logic (loops, conditionals, variables) combined with SQL statements. Unlike simple SQL queries, stored procedures allow developers to create complex workflows inside the database.
Key Features of Teradata Stored Procedures:
- Procedural Logic Support
Enables:
  - IF-ELSE blocks
  - FOR/WHILE loops
  - CASE statements
  - Variable declarations
  - Cursor processing
- Better Performance
Stored procedures run inside the Teradata engine, removing the need for multiple client round-trips. The query plan is cached → faster execution.
- Business Logic Centralization
Rules and operations are centralized inside the DB rather than scattered across client apps.
- Parameterization
Support for IN, OUT, and INOUT parameters for flexible execution.
- Security & Access Control
Users may execute procedures without direct access to underlying tables.
- Error Handling
Built-in exception-handling mechanisms for robust ETL processes.
Common use cases:
- Batch ETL workflows
- Data cleansing
- Complex transformations
- Audits and logging
- Conditional merging and updates
Teradata stored procedures are powerful tools for operationalizing complex logic that would otherwise require multiple client-side scripts.
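A minimal data-cleansing procedure, sketched under the assumption of a hypothetical `sales_stage` table (all names are illustrative):

```sql
-- Cleanses one day's worth of staged sales rows.
REPLACE PROCEDURE mydb.cleanse_sales (IN p_date DATE)
BEGIN
   -- zero out negative amounts for the given day
   UPDATE mydb.sales_stage
   SET amount = 0
   WHERE sale_date = :p_date
     AND amount < 0;

   -- remove rows that could not be keyed
   DELETE FROM mydb.sales_stage
   WHERE sale_date = :p_date
     AND sale_id IS NULL;
END;
```

It would be invoked with `CALL mydb.cleanse_sales(DATE '2023-01-15');`, allowing an ETL user to run the cleansing logic without direct UPDATE/DELETE rights on the table.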
22. Explain FastLoad.
FastLoad is a high-speed Teradata utility used to load extremely large volumes of data (millions to billions of rows) into empty tables in parallel.
Key Characteristics of FastLoad:
- High-Speed Bulk Loading
Achieves maximum performance by using multiple parallel sessions.
- Two-Phase Operation
  - Phase 1: Acquisition Phase
  Data is read from the client and distributed to AMPs based on hashing.
  - Phase 2: Application Phase
  AMPs write the data blocks to disk in parallel.
- Only Loads into Empty Tables
FastLoad cannot load into populated tables. The target table must have:
  - No secondary indexes
  - No join indexes
  - No triggers
  - No referential integrity constraints
- Error Table Creation
FastLoad creates two error tables:
  - ET – error table for conversion and constraint errors
  - UV – error table for rows with uniqueness violations
- Checkpoint Support
Automatically saves progress for restart capability.
Uses:
Ideal for initial loading of large fact tables in data warehousing.
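A minimal FastLoad script sketch, assuming a hypothetical delimited input file and staging table (all names, paths, and credentials are illustrative):

```sql
LOGON tdpid/etl_user,etl_password;

BEGIN LOADING mydb.sales_stage
   ERRORFILES mydb.sales_et, mydb.sales_uv
   CHECKPOINT 1000000;

DEFINE sale_id (VARCHAR(10)),
       cust_id (VARCHAR(10)),
       amount  (VARCHAR(14))
   FILE = /data/sales.txt;

INSERT INTO mydb.sales_stage
VALUES (:sale_id, :cust_id, :amount);

END LOADING;
LOGOFF;
```

Note the two error tables named in `ERRORFILES` match the ET/UV pair described above, and `CHECKPOINT` enables a restart from the last saved position.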
23. Explain MultiLoad.
MultiLoad (MLOAD) is a Teradata utility used for high-volume batch maintenance of populated tables.
It supports multiple types of DML operations.
Key Features:
- Supports Multiple DML Operations
INSERT, UPDATE, DELETE, and UPSERT.
- Works on Populated Tables
Unlike FastLoad, MultiLoad can process tables that already contain data.
- Parallel Batch Processing
Works on up to 5 target tables with multiple DML operations in a single job.
- Operates in Phases:
  - Preliminary Phase – validates the script and sets up sessions
  - DML Transaction Phase – sends the DML steps to the AMPs
  - Acquisition Phase – acquires data and distributes it to the AMPs
  - Application Phase – applies the DML operations while the tables are locked
  - Cleanup Phase – releases locks and finalizes the job
- Does Not Support:
  - Unique secondary indexes on target tables
  - Join indexes
  - Identity columns
  - Referential integrity constraints
- Error Handling:
Error tables (ET, UV) are created automatically to capture DML and data issues.
Use Cases:
- Daily incremental loads
- Batch updates to dimension/fact tables
- Large-scale delete or merge operations
24. What is the difference between FastLoad and MultiLoad?
| Feature | FastLoad | MultiLoad |
| --- | --- | --- |
| Table type | Only empty tables | Existing/populated tables |
| Supported operations | INSERT only | INSERT, UPDATE, DELETE, UPSERT |
| Indexes allowed? | No secondary indexes | Allowed (but locked during operations) |
| Loading pattern | Bulk loading of large volumes | Batch maintenance with multiple DMLs |
| Parallel sessions | Many sessions for speed | Limited sessions |
| Phases | 2 phases | Multiple phases |
| Restart capability | Yes | Yes |
| Ideal use case | Initial load of huge fact tables | Ongoing delta loads & updates |
In summary:
FastLoad = Fast initial load into empty tables.
MultiLoad = Batch DML on existing tables.
25. What is BTEQ?
BTEQ (Basic Teradata Query) is a command-line utility used to interact with the Teradata database. It supports submitting SQL queries, importing/exporting data, and running batch jobs.
Key Features of BTEQ:
- Interactive & Batch Mode
Can run SQL interactively or execute scripts in batch mode.
- Supports Full SQL
  - SELECT
  - INSERT
  - UPDATE
  - DELETE
  - DDL statements
  - Stored procedures
  - Macros
- Report Formatting
Ideal for generating formatted reports with headings, spacing, and column alignment.
- Exporting Capabilities
Export results to files such as CSV, text, or flat files.
- Flow Control Commands
Supports .GOTO, .IF, .LABEL for conditional execution.
- Error Handling
Provides return codes to control job processing.
Common Uses:
- Running daily batch SQL scripts
- Loading small to medium-sized data
- Admin tasks
- Generating formatted reports
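A minimal BTEQ batch script sketch showing export and flow control (the TDPID, credentials, file path, and table are illustrative):

```sql
.LOGON tdpid/report_user,report_password;

.EXPORT FILE = /tmp/daily_sales.txt;

SELECT region, SUM(amount) AS total_sales
FROM mydb.sales
WHERE sale_date = CURRENT_DATE - 1
GROUP BY region;

.IF ERRORCODE <> 0 THEN .QUIT 8;

.EXPORT RESET;
.LOGOFF;
.QUIT 0;
```

The non-zero `.QUIT` code on failure is what lets a scheduler detect and rerun a broken batch job.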
26. What is TPump?
TPump (Teradata Parallel Data Pump) is a Teradata utility used for near-real-time, continuous, low-volume data loading.
Key Features:
- Row-by-Row Streaming
Inserts/updates rows continuously as they arrive.
- Supports Multiple DML Operations
INSERT, UPDATE, DELETE, UPSERT.
- No Table Locking
Unlike MultiLoad, TPump applies row-level locks → the table remains available.
- Low Impact Loading
Ideal for trickle-feed or mini-batch loading.
- Uses a Pack Rate
Controls how many statements are batched per request.
- Real-Time ETL Use Cases
Often used for loading:
  - Clickstreams
  - POS (point-of-sale) data
  - Sensor data
  - Financial transactions
TPump is best for continuous incremental loading with minimal disturbance to users.
27. Explain EXPORT and IMPORT utility.
The EXPORT and IMPORT utilities are older Teradata tools used to move data between Teradata systems or between Teradata and file systems.
EXPORT Utility
Used to extract data from Teradata to a client-side file.
Features:
- Exports data in Teradata format or user-specified format
- Can export large result sets
- Often used for backups or migrations
IMPORT Utility
Used to import data into Teradata from files created by EXPORT.
Features:
- Works with Teradata-format data
- Faster when used with MultiLoad and FastLoad scripts
- Imports data back into tables after migration
EXPORT + IMPORT Use Cases:
- Database migrations (from one Teradata cluster to another)
- Cross-system data replication
- Archival processes
- Reloading tables after redesign
Though older, they are still used in certain maintenance workflows.
28. What is TPT (Teradata Parallel Transporter)?
Teradata Parallel Transporter (TPT) is a modern, unified framework for high-performance data loading and exporting.
It replaces older tools like FastLoad, MultiLoad, TPump, and FastExport with a single flexible utility.
Key Features:
- Unified Architecture
TPT combines the capabilities of:
  - FastLoad
  - MultiLoad
  - TPump
  - FastExport
into a single scripting platform.
- Highly Parallel
Uses parallel operators to maximize throughput.
- Types of TPT Operators:
  - Load Operator – similar to FastLoad
  - Update Operator – similar to MultiLoad
  - Stream Operator – similar to TPump
  - Export Operator – similar to FastExport
  - SQL Selector/Inserter Operators
- Flexible Scripting Language
Supports workflow-based data transformation and control.
- Better Performance & Scalability
Designed for modern ETL pipelines.
- Error Handling & Restart
Robust built-in restart mechanisms.
Use Cases:
- High-volume ETL
- Complex transformations
- Multi-table loading
- Near-real-time integration
29. What is fallback protection in Teradata?
Fallback is Teradata’s built-in data protection mechanism that maintains a duplicate copy of each row on a different AMP.
How Fallback Works:
- When a row is stored on its primary AMP, Teradata simultaneously stores a fallback copy of that row on another AMP (in a different clique if possible).
- If AMP or node fails:
- Queries automatically read fallback copies
- System continues operating without data loss
- Zero or minimal downtime
Key Benefits:
- High Availability: data remains available even when hardware fails.
- Automatic Recovery: the system recovers the broken AMP and redistributes fallback copies.
- No Loss of Data: ensures business continuity.
Limitations:
- Uses extra disk space
- Slight performance overhead during operations
Fallback is optional; some staging tables do not require it.
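Fallback is declared per table in the DDL (and can be toggled later with ALTER TABLE). A sketch with hypothetical table names:

```sql
-- Critical table: keep a fallback copy of every row on another AMP
CREATE TABLE dw.customer, FALLBACK
(
  cust_id   INTEGER,
  cust_name VARCHAR(100)
)
PRIMARY INDEX (cust_id);

-- Staging table: skip fallback to save space and load time
CREATE TABLE stg.customer_load, NO FALLBACK
(
  cust_id   INTEGER,
  cust_name VARCHAR(100)
)
PRIMARY INDEX (cust_id);

-- Fallback can be enabled afterwards if needed
ALTER TABLE stg.customer_load, FALLBACK;
```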
30. What is a Cylinder?
A Cylinder is a fundamental unit of data storage within Teradata’s file system.
How it fits Teradata architecture:
- Base Storage Structure: each AMP manages disk space divided into cylinders.
- Contains Multiple Data Blocks: a cylinder contains multiple blocks, and each block stores rows.
- Efficient I/O: AMPs read and write data one cylinder at a time for performance.
- Role in Table Layout: tables are stored across cylinders depending on data distribution.
- Impact on Performance: since AMPs read cylinders in bulk, good data distribution ensures balanced I/O.
Hierarchy of storage:
- Table
→ Rows
→ Data Blocks
→ Cylinders
→ Disks
Cylinders form the low-level physical storage architecture that supports Teradata’s parallel design.
31. What is a Data Block?
A Data Block in Teradata is the smallest unit of storage used by an AMP to store rows of a table. It acts as a physical container of multiple rows, grouped together for efficient I/O operations.
Key Characteristics of Data Blocks:
- Storage Unit: each AMP manages thousands of data blocks. A block typically ranges from 512 bytes to 64 KB.
- Multiple Rows Per Block: blocks contain rows stored in sorted row-hash order, which allows efficient scanning and retrieval.
- Optimized for Sequential Access: Teradata reads an entire block at a time, not individual rows, improving analytical query performance.
- Dynamic Block Size: block size increases automatically as the table grows.
- Block-Level Locking: during updates, Teradata can lock specific data blocks instead of the whole table, improving concurrency.
- Fallback Copies at Block Level: fallback protection also stores duplicate blocks on a different AMP.
Importance in Teradata Architecture:
Data blocks ensure efficient I/O and make Teradata’s parallel operations fast and scalable.
32. What is a Hash Map?
A Hash Map in Teradata is a system-generated mapping table that maps hash buckets to AMPs. It acts as a lookup structure to determine which AMP is responsible for storing a specific row.
How it works:
- Teradata generates a 32-bit row hash from the Primary Index.
- Hash value → Hash Bucket.
- Hash Bucket → AMP (as defined by the Hash Map).
Key Features of Hash Map:
- Distribution Control: ensures even distribution of rows across AMPs.
- Scalability: when new AMPs are added, the hash map redistributes bucket-to-AMP assignments.
- Performance Stability: prevents hotspots by ensuring uniform bucket assignment.
- Hash Map Versions: Teradata supports:
  - Legacy hash maps (one AMP per bucket)
  - “Sparse” hash maps (multi-AMP groupings)
In summary:
The hash map is the internal system table that tells Teradata exactly where each row should reside in the MPP architecture.
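Teradata exposes this hashing chain through built-in functions, which is useful for seeing exactly where a given value would land (the table and column names below are illustrative):

```sql
-- Row hash → hash bucket → owning AMP for each Primary Index value
SELECT cust_id,
       HASHROW(cust_id)                      AS row_hash,
       HASHBUCKET(HASHROW(cust_id))          AS hash_bucket,
       HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS amp_no
FROM dw.customer;
```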
33. What is a Spool Space?
Spool Space is temporary storage allocated to each user session to hold intermediate query results.
What Spool Space is used for:
- Intermediate Result Sets: during JOIN, SORT, and AGGREGATE operations, Teradata stores temporary results in spool.
- Final Query Results: before sending results back to the client, Teradata may store them temporarily in spool.
- Sub-Query Results: FROM-clause subqueries and derived tables are stored in spool.
- Intermediate Tables in OLAP Functions: ranking, window functions, and roll-ups use spool.
Characteristics:
- Deallocated automatically when query completes.
- Defined per user and per AMP.
- If spool space is exceeded → the query aborts with a “no more spool space” error (error 2646).
Tuning Tips:
- Collect stats
- Use proper PI
- Avoid unnecessary joins
- Filter early in SQL
Spool space is critical for efficient query execution and is used heavily in analytical workloads.
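Spool consumption can be monitored through the data dictionary. A sketch using the DBC.DiskSpaceV view (the user name is a placeholder):

```sql
-- Current and peak spool for one user, summed across all AMPs
SELECT DatabaseName,
       SUM(CurrentSpool) AS current_spool,
       SUM(PeakSpool)    AS peak_spool,
       SUM(MaxSpool)     AS max_spool
FROM DBC.DiskSpaceV
WHERE DatabaseName = 'etl_user'
GROUP BY DatabaseName;
```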
34. What is a Permanent Space?
Permanent Space refers to the disk space allocated to a user for storing permanent objects such as:
- Tables
- Indexes
- Join indexes
- Partitioned tables
- Stored procedures
Key Characteristics:
- Allocated from Database/User Profiles: admins specify permanent space quotas.
- Persistent Storage: data remains even after the session ends.
- Shared Across Tables: all permanent tables created by a user share the same allocated permanent space.
- Used for Fallback Copies: fallback consumes additional permanent space.
- Quota Enforcement: when permanent space is exhausted, the user cannot insert more data.
Permanent space determines how much actual table data a user or database can store.
35. What is a Temporary Space?
Temporary Space (or Temp Space) is used for session-specific temporary tables such as:
- Volatile Tables: exist only during the session.
- Global Temporary Tables (GTT): definitions persist, data is session-specific.
- Intermediate Workspace: for operations that require caching additional temporary data.
Characteristics:
- Released when user logs off.
- Shared across all temp structures in a session.
- Quotas defined in user profile.
- Not used for spool or permanent data.
Temporary space enables processing of temporary datasets without consuming permanent storage.
36. Explain Teradata’s Parallel Processing.
Teradata is built on Massively Parallel Processing (MPP) architecture, enabling it to process huge volumes of data extremely quickly.
How Parallel Processing Works in Teradata:
- Data Distribution Across AMPs: Teradata distributes data evenly across AMPs using hashing.
- Independent Processing by Each AMP: each AMP works in parallel on its own data slice.
- Shared-Nothing Architecture: every AMP has its own CPU, memory, and disk → no resource conflicts.
- Parallel Query Execution: for any SQL:
  - The Parsing Engine generates execution steps
  - Steps are executed simultaneously by all AMPs
  - Results are combined and returned
- Parallel Join, Sort, Aggregate: each AMP independently processes its data blocks.
- Linear Scalability: adding more AMPs → almost linear performance improvement.
- Parallel Loading: FastLoad, MultiLoad, and TPT all use parallel loading techniques.
Benefits:
- High performance
- Fast bulk loads
- Efficient for analytical workloads
- Predictable response time
Parallelism is the core reason Teradata handles terabyte-to-petabyte scale warehouses effortlessly.
37. What is an Explain Plan?
The Explain Plan in Teradata reveals how the optimizer will execute a SQL query.
Purpose:
- Shows join strategies
- Shows redistribution or duplication
- Estimates cost and time
- Helps diagnose performance issues
What Explain Plan Shows:
- Access Paths: whether the query uses a PI, a NUSI, or a full table scan.
- AMP Participation: single-AMP, few-AMP, or all-AMP operations.
- Join Methods:
  - Hash Join
  - Merge Join
  - Nested Loop Join
- Redistribution Steps: indicates if rows are moved across BYNET.
- Cost Estimates:
  - CPU cost
  - I/O cost
  - Row counts
  - Confidence levels
- Stats Usage: shows if statistics were used or missing.
Developers use Explain Plans to:
- Fix skew
- Rewrite SQL
- Tune Primary Indexes
- Improve join paths
Explain Plan is one of the most powerful tuning tools in Teradata.
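Generating a plan is simply a matter of prefixing any query with EXPLAIN (the tables here are illustrative):

```sql
-- Returns the optimizer's step-by-step plan instead of executing the query
EXPLAIN
SELECT d.dept_name, SUM(e.salary)
FROM employee e
JOIN department d
  ON e.dept_id = d.dept_id
GROUP BY d.dept_name;
```

The output describes retrieve steps, any row redistribution over BYNET, the chosen join methods, and the confidence level of each estimate.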
38. What is Collect Statistics?
Collect Statistics is a Teradata command used to collect detailed statistics about data distribution, row counts, values, and histograms for tables and indexes.
Importance of Statistics:
- Crucial for Optimizer: Teradata’s optimizer uses stats to choose the best execution plan.
- Better Join Strategies: with accurate stats, the optimizer selects the fastest join method.
- Prevents Misestimation: missing or stale stats lead to poor performance, skew, and redistributions.
- Stored in Data Dictionary: stats are persisted and survive restarts.
Types of Stats:
- Column stats
- Index stats
- Partition stats
- Multi-column stats
- Expression stats
When to Collect Stats:
- After large loads
- After major updates
- On heavily queried tables
Example:
COLLECT STATISTICS ON sales COLUMN(transaction_date);
Collecting stats is mandatory for performance tuning.
39. What is Hash Join?
A Hash Join is a join strategy used by Teradata where rows from tables are hashed based on join columns and matched efficiently through hash buckets.
How Hash Join Works:
- Build Phase: one table (usually the smaller) is scanned and hashed into memory-based hash buckets.
- Probe Phase: the second table is scanned, hashed on the join columns, and matched against the build table’s hash buckets.
- Parallel Execution: every AMP performs the hash join independently on its own data.
- Data Redistribution: if join columns are not aligned, rows are redistributed across AMPs before the join.
Advantages:
- Fast for large tables
- Highly parallel
- Works well even without indexes
- Ideal for equi-joins
Disadvantages:
- Requires spool space
- Redistribution can be expensive
Hash Join is the most commonly used join strategy in Teradata for large workloads.
40. What is a Teradata Database User?
A Teradata Database User is an account within Teradata that allows a person, application, or process to connect to and use the database.
Key Characteristics:
- Authentication Identity: users authenticate with a username and password via client tools like BTEQ, SQL Assistant, or JDBC/ODBC.
- Has Assigned Space:
  - Permanent Space
  - Spool Space
  - Temporary Space
- Has Permissions/Roles: the admin assigns privileges such as:
  - SELECT
  - INSERT
  - UPDATE
  - CREATE TABLE
  - EXECUTE PROCEDURE
- Can Own Objects: users can own:
  - Tables
  - Views
  - Macros
  - Stored Procedures
  - Databases
- Resource Management: workload management rules can limit a user’s resource consumption.
- Profile Association: users can be linked to profiles to control password policies, spool space, and priorities.
Types of Teradata Users:
- Application users
- ETL service accounts
- Admin users
- Developer/analyst users
A Teradata user is essentially a logical container that manages identity, permissions, and storage allocation.
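Creating a user ties identity, space quotas, and privileges together. A hedged sketch (the user name, password, sizes, and granted database are placeholders):

```sql
-- User with explicit space quotas (values in bytes)
CREATE USER etl_user FROM dbc AS
  PASSWORD = etl_pwd123,
  PERM = 10e9,        -- permanent space
  SPOOL = 50e9,       -- spool space
  TEMPORARY = 20e9;   -- temporary space

-- Assign privileges on a database
GRANT SELECT, INSERT, UPDATE ON sales_db TO etl_user;
```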
Intermediate (Q&A)
1. Explain the PE steps when a query is submitted.
When a user submits an SQL query in Teradata, the Parsing Engine (PE) performs a series of internal steps to validate, optimize, and dispatch the query for execution. These steps ensure correctness, performance optimization, and efficient parallel execution.
Detailed PE Steps:
1. Session Handling
- The SQL request arrives from a client (SQL Assistant, BTEQ, JDBC, etc.).
- PE authenticates the user and validates session-level settings.
2. Syntax Parsing
- The SQL statement is checked for syntax errors.
- The request is translated into an internal parse tree.
- Invalid SQL structure results in an immediate error.
3. Semantic Analysis
- Verifies table names, columns, data types, and privileges.
- Validates correctness:
- Tables exist?
- User has SELECT/INSERT/UPDATE privileges?
- Column types match?
- Also checks view definitions and macro expansions.
4. Query Rewrite / Normalization
- SQL is rewritten for optimization:
- Simplifies nested queries.
- Converts subqueries to joins (when possible).
- Removes redundant conditions.
5. Optimizer Phase
This is the most important step.
The Teradata Optimizer:
- Reads statistics on tables and columns.
- Estimates:
- Row counts
- Data distribution
- Selectivity
- Cost of different join strategies
- Chooses the best plan:
- Join method (hash join, merge join, product join)
- Spool management
- Partition elimination
- Redistribution or duplication strategy
Optimizer generates a detailed plan using cost-based optimization.
6. Step Generation
- SQL is broken into multiple internal “steps.”
- Each step is an instruction for AMPs (e.g., scan table, hash-redistribute rows, join, sort).
7. Dispatching to AMPs via BYNET
- PE sends steps to all AMPs using BYNET.
- AMPs execute steps in parallel on their own data slices.
8. Result Aggregation
- AMPs return their partial results to PE.
- PE assembles the final answer set.
9. Return Result to Client
- Results are sent back to the user’s session.
Summary:
A PE manages the lifecycle of an SQL request from validation → optimization → execution → result delivery, ensuring Teradata’s MPP engine performs efficiently.
2. What is the importance of Row Hash and Hash Bucket?
Row Hash and Hash Bucket are fundamental components of Teradata’s data distribution and access architecture.
Row Hash
- Generated by applying Teradata’s 32-bit hashing algorithm on the Primary Index (PI) column(s).
- Determines which AMP will store the row.
- Ensures parallelism by evenly distributing data among AMPs.
Importance:
- Enables Teradata to locate rows in constant time O(1).
- Ensures minimal skew through even distribution.
- Drives join redistribution strategies.
Hash Bucket
- A logical storage area in the hash map.
- Each AMP owns a set of hash buckets.
- Row hash → bucket → AMP mapping determines row placement.
Importance:
- Prevents excessive collision of hashes.
- Ensures balanced load across AMPs.
- Helps Teradata identify duplicate rows quickly.
Combined Importance:
- Fast row lookup
- Balanced data distribution
- Efficient join processing
- Avoidance of AMP skew
Teradata’s performance depends heavily on the accuracy and uniformity of hash distribution.
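Because row placement follows this hash chain, skew for a candidate Primary Index can be measured directly with the built-in hash functions (table and column names are illustrative):

```sql
-- Row count per AMP for a candidate PI column; large variation = skew
SELECT HASHAMP(HASHBUCKET(HASHROW(order_id))) AS amp_no,
       COUNT(*) AS row_cnt
FROM sales.orders
GROUP BY 1
ORDER BY row_cnt DESC;
```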
3. How does Teradata distribute data during INSERT?
When an INSERT occurs in Teradata, data distribution is governed by the Primary Index (PI) and the hashing algorithm.
Steps of Data Distribution:
1. Row hash calculation
- Teradata computes the row hash based on the PI value(s).
2. Hash bucket determination
- The hash value selects a hash bucket.
3. AMP assignment
- The hash map maps the hash bucket to a specific AMP.
4. Row storage
- The designated AMP stores:
- The primary row
- (Optionally) Fallback copy on a different AMP
5. Index updates
- If SI, USI, or NUSI exist, corresponding sub-tables are updated.
6. Journaling (if enabled)
- Before and after images are stored for recovery.
Outcome:
Teradata achieves even and predictable distribution across all AMPs, enabling massive parallelism.
4. How does Teradata handle Duplicate Row checking?
Duplicate row checking depends on table type:
For SET Tables
- SET tables do not allow duplicate rows.
- During INSERT:
- AMP hashes the PI.
- Locates target block.
- Performs row-by-row comparison with existing rows.
- If an exact match exists → row rejected.
- Duplicate rows go to UV (Unique Violation) table.
SET table enforcement is done locally within the AMP → highly efficient.
For MULTISET Tables
- MULTISET tables allow duplicate rows.
- No comparison is done.
- AMP directly inserts the row.
For Unique Primary Index (UPI)
- Duplicate PI values are rejected even in MULTISET tables.
For USI
- Unique Secondary Index ensures uniqueness across AMPs using a two-step lookup.
Key Mechanisms Involved:
- Row hash lookup
- Binary search on data blocks
- Comparison of values
- UV error tables
Duplicate checking ensures data integrity per table definition.
5. What is the difference between SET and MULTISET tables?
| Feature | SET Table | MULTISET Table |
|---|---|---|
| Duplicates | Not allowed | Allowed |
| Duplicate Check Cost | Higher (comparison needed) | None (faster insert) |
| Insert Speed | Slower | Faster |
| Usage | Data quality enforcement | High-speed ETL, staging |
| Behavior | Rejects identical rows | Accepts identical rows |
Summary
- Use SET when data integrity and uniqueness matter.
- Use MULTISET for performance and large ETL loads.
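The behavioral difference comes straight from one DDL keyword (table names here are illustrative):

```sql
-- SET table: fully identical rows are rejected
-- (silently discarded on INSERT-SELECT, duplicate-row error on single-row INSERT)
CREATE SET TABLE dw.customer_set
( cust_id INTEGER, cust_name VARCHAR(100) )
PRIMARY INDEX (cust_id);

-- MULTISET table: identical rows are accepted without any comparison
CREATE MULTISET TABLE stg.customer_ms
( cust_id INTEGER, cust_name VARCHAR(100) )
PRIMARY INDEX (cust_id);
```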
6. What are Derived Tables in Teradata?
A Derived Table (DT) is a temporary, inline result set created by a subquery in the FROM clause of an SQL statement.
Characteristics:
- Created in spool space.
- Exists only for the duration of the query.
- Used to simplify complex queries.
- Supports aggregation, filtering, joins, ordering.
Example:
SELECT a.*
FROM (
SELECT dept_id, SUM(salary) AS total
FROM employee
GROUP BY dept_id
) AS a
WHERE total > 50000;
Benefits:
- Breaks complex logic into simpler steps
- Optimizer can push down predicates
- No need to create physical tables
- For repeated or recursive references, a common table expression (WITH / WITH RECURSIVE) can be used instead
Derived tables improve readability and enable more efficient execution plans.
7. What are Volatile Tables?
A Volatile Table is a session-specific temporary table created and used only within the current session.
Characteristics:
- Exists only for the session; dropped automatically at logout.
- Data stored in spool space, not permanent space.
- No DBC metadata entry — faster and lightweight.
- No fallback protection.
- Can be defined as SET or MULTISET.
- Often used in ETL to store intermediate results.
Example:
CREATE VOLATILE TABLE v_emp AS
(
SELECT * FROM employee
)
WITH DATA
PRIMARY INDEX(emp_id)
ON COMMIT PRESERVE ROWS;
Use Cases:
- Temporary staging
- Intermediate results
- Session-level transformations
- Complex calculations
Volatile tables offer flexibility without overhead.
8. What are Global Temporary Tables?
A Global Temporary Table (GTT) is a permanent table definition stored in DBC metadata whose data is created only during sessions.
Characteristics:
- Table definition stored permanently, but data is session-specific.
- Data stored in temporary space, not permanent space.
- Supports FALLBACK.
- Data is deleted:
- ON COMMIT DELETE ROWS
- ON COMMIT PRESERVE ROWS
- More secure and structured than volatile tables.
Example:
CREATE GLOBAL TEMPORARY TABLE gtt_sales
(
region VARCHAR(50),
amount DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;
Use Cases:
- Reusable ETL templates
- Complex procedures
- Intermediate aggregation tables
GTTs combine permanence of structure with temporary data.
9. Difference between Volatile Table and Global Temporary Table.
| Feature | Volatile Table | Global Temporary Table |
|---|---|---|
| Definition stored? | No (not in DBC) | Yes (in DBC) |
| Exists after logout? | No | Definition persists |
| Fallback available? | No | Yes |
| Space used | Spool space | Temporary space |
| Performance | Faster (lightweight) | Slight overhead |
| Use Case | Ad-hoc temporary storage | Reusable temp structures |
| DDL required each session? | Yes | No |
Summary:
Volatile = Temporary definition + temporary data.
GTT = Permanent definition + temporary data.
10. What are Error Tables in Teradata?
Teradata automatically creates Error Tables for load utilities like FastLoad, MultiLoad, and TPT to capture rows that fail during load or DML operations.
Types of Error Tables:
1. ET Table (Error Table 1)
Captures:
- Bad data format
- Data type mismatch
- Constraint violations
- Null violations
2. UV Table (Unique Violation Table)
Captures:
- Duplicate row violations
- Unique PI/USI conflicts
Error Tables Are Used By:
- FastLoad
- MultiLoad
- TPT Load Operator
Benefits:
- Prevents job failure due to bad rows.
- Captures exact error type and problematic row.
- Supports restart/recovery without reloading entire file.
- Helps data cleansing and debugging.
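After a load, error tables can be queried like ordinary tables. A sketch assuming a FastLoad job declared ERRORFILES stg.err1, stg.err2 (the ErrorCode/ErrorFieldName columns follow the standard ET-table layout; names here are placeholders):

```sql
-- ET table: conversion/constraint failures, grouped by error and field
SELECT ErrorCode, ErrorFieldName, COUNT(*) AS err_cnt
FROM stg.err1
GROUP BY ErrorCode, ErrorFieldName;

-- UV table: rows rejected for uniqueness violations
SELECT COUNT(*) AS uv_rows
FROM stg.err2;
```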
11. What is a Normalization Index?
A Normalization Index (NI) in Teradata is a special type of index that helps optimize queries by mapping frequently occurring values to a compact surrogate representation. Although rarely discussed in beginner-level documentation, it is used internally by the optimizer to improve selectivity estimation and reduce data scanning.
Key Concepts of Normalization Index:
- Reduces Data Redundancy: when a column contains many repeated values (e.g., gender, state, category), a normalization index stores unique values in a separate structure with references.
- Improves Optimizer Accuracy: it helps the optimizer better estimate row counts for conditions like:
  WHERE state = 'CA'
- Used for Selectivity Estimation: NI helps the optimizer determine how selective a predicate is without scanning the base table.
- Internal Structure:
  - Lightweight
  - Automatically maintained
  - Not directly manipulated by users
- Performance Benefits:
  - Better execution plan estimation
  - Less unnecessary full table scanning
  - Optimized join strategy selection
Normalization Indexes are rarely created manually—they are more of an internal tool used by the Teradata optimizer to deliver better plans.
12. What is Referential Integrity in Teradata?
Referential Integrity (RI) ensures that relationships between tables (parent-child) remain valid at all times. Teradata supports soft RI and hard RI, each having different operational behaviors.
Types of RI in Teradata
- Soft Referential Integrity
- Also called non-enforced RI
- The database trusts the developer or the ETL process to maintain correct relationships
- No real-time constraint enforcement
- Zero overhead
- Used frequently in data warehouses
- Hard Referential Integrity
- Enforced by the database
- Ensures child table rows must reference valid parent table rows
- Checked during INSERT/UPDATE/DELETE operations
- Adds overhead
- Rarely used for large tables in Teradata due to performance impact
Benefits of RI:
- Query optimization using referential constraints
- Eliminates unnecessary joins
- Improves cardinality and selectivity estimates
- Helps optimizer generate smarter plans (especially with soft RI)
Teradata’s philosophy is that ETL pipelines manage RI, so soft RI is often preferred to maintain performance.
13. Explain the different types of Locks.
Teradata uses a fine-grained locking mechanism to maintain transactional consistency and avoid conflicts.
Main Lock Types:
- READ Lock
- Acquired during SELECT
- Allows concurrent reads
- Blocks write operations
- NON-BLOCKING for other readers
- WRITE Lock
- Allows UPDATE/DELETE/MERGE
- Only one session can hold a write lock at a time
- Blocks other read/write requests on the same object
- EXCLUSIVE Lock
- Needed for DDL operations (e.g., ALTER TABLE, DROP TABLE)
- No other session can access the object
- Highest level of locking
- ACCESS Lock
- Weakest lock that allows reading during write operations
- Used for SELECT … WITH ACCESS
- SEMI-EXCLUSIVE Lock
- Used by utilities (FastLoad, MultiLoad)
- Blocks normal writes but allows controlled utility operations
Lock Levels:
Teradata supports locking at various granularities:
- Database-level
- Table-level
- RowHash-level
- Partition-level (for PPI tables)
- Row-level (rare in Teradata)
The locking system ensures consistent reads/writes without significantly impacting performance.
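Lock behavior can also be requested explicitly with the LOCKING modifier (table names are illustrative):

```sql
-- "Dirty read": don't wait for writers; may see uncommitted data
LOCKING TABLE sales.orders FOR ACCESS
SELECT COUNT(*) FROM sales.orders;

-- Explicit exclusive lock, e.g. before a maintenance operation
LOCKING TABLE sales.orders FOR EXCLUSIVE
UPDATE sales.orders SET order_status = 'ARCHIVED' WHERE order_year < 2015;
```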
14. What is Lock Escalation?
Lock Escalation is when a database automatically converts fine-grained locks (e.g., row-level) into coarse-level locks (e.g., table-level) to reduce lock overhead.
✦ IMPORTANT:
Teradata does NOT support lock escalation, unlike SQL Server or Oracle.
Why Teradata avoids lock escalation:
- MPP architecture requires granular locking: rowhash-level locks minimize contention across AMPs.
- Predictable performance: since escalations never occur, developers avoid accidental table-level locks.
- Designed for high concurrency: many users can query/update the same table without deadlocks.
- Performance-oriented: Teradata’s rowhash locking is extremely lightweight.
Thus, Teradata always maintains the original lock granularity, ensuring stable performance.
15. Explain block-level vs row-level locking.
Teradata uses both block-level and rowhash-level locking depending on the situation.
Block-Level Locking
- Lock is applied on a physical data block (contains multiple rows)
- Typical operations:
- Bulk operations
- Full table scans
- Some UPDATE/DELETE operations
- Faster than row-level locking
- Uses fewer lock resources
- Good for batch-oriented workloads
Row-Level (RowHash-Level) Locking
- Lock is placed at the RowHash level
- The smallest lock granularity in Teradata
- Used during:
- Row-level updates
- WHERE clause narrowing operations
- Allows maximum concurrency
- Prevents unnecessary locking of unrelated rows
Comparison Summary:
| Lock Type | Scope | Use Case |
|---|---|---|
| Row-Level | Single rowhash | Tactical queries, updates to few rows |
| Block-Level | Multiple rows in a block | Bulk updates, scans, utilities |
Teradata intelligently chooses the lock type based on the SQL operation.
16. What is a Deadlock in Teradata?
A deadlock occurs when two or more transactions wait on resources locked by each other, and none can proceed.
Example scenario:
- Transaction A locks row1 and requests row2
- Transaction B locks row2 and requests row1
- Both are waiting → deadlock
How Teradata Detects Deadlocks:
- Uses a deadlock detection algorithm that regularly checks lock wait graphs.
- Detects circular waiting conditions.
- Automatically aborts one of the conflicting transactions.
- Rolls back changes of the canceled transaction.
- Releases locks immediately.
Teradata Deadlock Prevention Features:
- Rowhash-level locking to minimize collisions
- Optimized lock acquisition order
- Transaction rollback for fastest recovery
Deadlocks are rare in Teradata because of its consistent lock management and hashing-based row distribution.
17. What are Value Ordered NUSI?
A Value-Ordered NUSI (VO-NUSI) is a special type of Non-Unique Secondary Index whose index sub-table rows are stored in sorted order based on the index column values.
Characteristics:
- Sorted Storage: unlike a normal NUSI, a VO-NUSI stores index rows in sorted order → faster range scans.
- Best for Range Queries: such as:
  WHERE order_date BETWEEN '2023-01-01' AND '2023-01-15'
- Local to AMP: the index is local to each AMP, not globally distributed.
- Low Overhead: since a NUSI depends on data being local, it avoids data redistribution.
- Ideal Use Cases:
- Timestamp-based filtering
- Status-based filtering
- Columns with moderate selectivity
Value-Ordered NUSI improves performance when the optimizer expects range-based access.
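A value-ordered NUSI is created with the ORDER BY VALUES clause (index, column, and table names are illustrative):

```sql
-- NUSI whose sub-table rows are kept sorted by order_date → fast range scans
CREATE INDEX idx_ord_date (order_date)
ORDER BY VALUES (order_date)
ON sales.orders;
```

Note that Teradata restricts value ordering to a single column of a 4-byte (or smaller) numeric or DATE type.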
18. Difference between HASH JOIN and MERGE JOIN.
| Feature | Hash Join | Merge Join |
|---|---|---|
| Working Principle | Builds a hash table of join keys | Sorts both tables and merges matching rows |
| When used? | When join columns are not sorted | When tables are already sorted on join columns |
| Data Redistribution | Often required | Sometimes avoided |
| Performance | Better for large unsorted data | Excellent when data is pre-sorted |
| Memory Dependence | High spool usage | Less spool if sorted |
| Join Types | Inner joins, equality joins | All join types, especially inequality joins |
Hash Join Benefits:
- Great for large tables
- Efficient for equality joins
- No sorting required
Merge Join Benefits:
- Very fast when inputs are sorted
- Excellent for range joins:
A.date BETWEEN B.start AND B.end
The optimizer automatically chooses the best join type based on statistics and data characteristics.
19. What is Confidence Level in Explain Plan?
The Confidence Level in a Teradata Explain Plan indicates how accurate and reliable the optimizer’s row count and cost estimates are. It reflects how much the optimizer “trusts” its statistics.
Types of Confidence Levels:
- High Confidence
- Accurate statistics available
- Reliable cardinality and cost estimates
- Optimizer decisions are stable
- Medium Confidence
- Partial statistics available
- Some assumptions made
- Row estimates are moderately accurate
- Low Confidence
- No statistics collected
- Optimizer guesses row counts
- Execution plan may be suboptimal
- Risk of skewed or inefficient join choices
Why Confidence Level Matters:
- Determines join strategy
- Affects plan stability
- Impacts spool usage
- Influences data redistribution decisions
Collecting statistics raises confidence level and leads to much better performance.
20. How does Teradata handle NULL values?
Teradata follows ANSI SQL rules for handling NULL values, meaning NULL represents an unknown or missing value.
NULL Behavior:
- NULL ≠ NULL: NULL cannot be equal to any value, even another NULL.
- Comparisons Involving NULL Return UNKNOWN:
  WHERE col = NULL  -- never matches; evaluates to UNKNOWN
  WHERE col <> NULL -- never matches; evaluates to UNKNOWN
  Correct usage:
  WHERE col IS NULL
  WHERE col IS NOT NULL
- NULLs in Aggregations:
  - COUNT(*) counts all rows regardless of NULLs
  - COUNT(column) skips NULLs
  - SUM and AVG skip NULLs
- NULLs in Sorting
- NULLs appear first or last depending on session settings
- NULLs in Joins
- NULL never matches another NULL
- Outer joins preserve NULLs on non-matching side
- NULLs in Primary Index
- Allowed
- Hashing NULL generates a stable and consistent rowhash value
- NULLs in Statistics
- NULL values included in histogram statistics
- Improves optimizer accuracy
Teradata treats NULLs consistently across joins, filters, aggregations, and hashing functions to ensure predictable query results.
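The rules above in query form (table and column names are illustrative):

```sql
-- Wrong: never matches, because comparison with NULL is UNKNOWN
SELECT * FROM employee WHERE manager_id = NULL;

-- Right: use IS NULL / IS NOT NULL
SELECT * FROM employee WHERE manager_id IS NULL;

-- COALESCE substitutes a default for NULL
SELECT emp_id, COALESCE(bonus, 0) AS bonus_amt FROM employee;

-- COUNT(*) vs COUNT(column): the second value excludes NULL bonuses
SELECT COUNT(*), COUNT(bonus) FROM employee;
```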
21. What are the phases of FastLoad?
FastLoad operates using a two-phase architecture designed for the fastest possible bulk loading of huge volumes of data into empty Teradata tables.
Phase 1: Acquisition Phase
- FastLoad reads data from the client system.
- Rows are sent to Teradata AMPs over multiple sessions.
- The PE applies hashing on the Primary Index (PI).
- Rows are redistributed to their destination AMPs but are stored in work tables, not yet in the final table format.
- Duplicate row handling and data validation begin at this stage.
Key activities in Acquisition Phase:
- Data parsing
- Row hash computation
- Row redistribution
- Insertion into temporary work tables
Phase 2: Application Phase
- Each AMP converts the rows stored in its work table into proper data blocks for the target table.
- Rows are sorted by row hash.
- Blocks are written to disk in bulk I/O operations.
- Any fallback copies are created (if fallback protection is enabled).
Key activities in Application Phase:
- Bulk writing to primary data blocks
- Creation of fallback rows
- Cleanup of temporary work tables
Why only two phases?
FastLoad uses only INSERT in bulk and does not maintain secondary indexes or journaling, allowing maximum throughput.
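Both phases are driven by a single script; END LOADING is what triggers the Application Phase. A minimal hedged sketch (host, credentials, file layout, and table names are placeholders):

```
LOGON tdpid/loader,load_pwd;

BEGIN LOADING stg.sales
  ERRORFILES stg.sales_err1, stg.sales_err2
  CHECKPOINT 100000;

DEFINE txn_id (VARCHAR(10)),
       amount (VARCHAR(12))
  FILE = sales.txt;

INSERT INTO stg.sales
VALUES (:txn_id, :amount);

END LOADING;
LOGOFF;
```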
22. Why FastLoad does not support secondary indexes?
FastLoad prohibits secondary indexes because maintaining SIs would destroy the speed advantage of bulk loading.
Core reasons:
1. SI requires AMP-to-AMP communication
When an SI is present:
- Every inserted row requires SI sub-table updates.
- This demands extra message passing between AMPs.
This slows down bulk loads drastically.
2. SI sub-tables must be maintained synchronously
Every row inserted must also update:
- Unique Secondary Index (USI) tables
- Non-Unique Secondary Index (NUSI) tables
FastLoad skips this overhead to maintain high speed.
3. Bypassing SI allows bulk block writes
Without SIs:
- Teradata can bulk-write entire blocks efficiently
- No additional index maintenance is required
4. FastLoad requires empty tables
Secondary indexes are unnecessary during initial loads; they can be created after load completion.
23. How to improve MultiLoad performance?
MultiLoad is used for DML-heavy loads, and performance depends on the efficiency of data flow and AMP utilization.
Best Practices for Improving MultiLoad Performance
1. Use fewer indexes during load
Secondary indexes, join indexes, and referential integrity slow MultiLoad.
Drop them before load → recreate after load.
2. Increase number of sessions
MultiLoad supports multiple sessions.
More sessions → better parallelism → faster throughput.
3. Increase PACK factor
PACK > 1 means multiple rows sent at once.
This reduces network trips dramatically.
4. Optimize WHERE clause conditions
Target rows should be minimal to reduce AMP workload.
5. Improve data quality
Bad data slows the load by driving:
- ET (error table) growth
- UV (uniqueness violation) table writes
- Restart handling
Clean data before loading.
6. Use CHECKPOINTs optimally
An appropriate checkpoint interval reduces restart time after a failure without adding excessive checkpoint overhead.
7. Avoid unnecessary logging
Disable unnecessary journaling or triggers.
8. Use MultiLoad only for batch DML
For trickle feeds → use TPump
For bulk loads → use FastLoad
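Of the practices above, the PACK factor's effect is the most mechanical to see. This toy Python sketch (not a Teradata API; just the arithmetic) shows how batching multiple rows per request shrinks the number of network round trips:

```python
def network_trips(num_rows: int, pack: int) -> int:
    """Round trips needed to send num_rows when each request carries `pack` rows."""
    return -(-num_rows // pack)  # ceiling division

# One row per request vs. 20 rows per request for a million-row load:
print(network_trips(1_000_000, 1))   # 1000000 trips
print(network_trips(1_000_000, 20))  # 50000 trips
```

The trip count falls linearly with the PACK factor, which is why raising it reduces network overhead so dramatically.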
24. What is Checkpoint in MultiLoad?
A Checkpoint is a recovery feature in MultiLoad that periodically saves the state of the running job so the system can restart from that point in case of failure.
Key Features of Checkpoints
- Periodic Savepoints
Checkpoints save progress at defined intervals (e.g., every 10,000 rows).
- Fast Recovery
Instead of starting from scratch, MultiLoad restarts from the last successful checkpoint.
- User-Controlled Interval
You can specify checkpoints such as:
CHECKPOINT 10000 ;
- Applies to Import, DML, and Apply phases
- Useful for long-running jobs
Especially those processing millions of rows.
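The checkpoint-and-restart idea can be sketched in a few lines of Python (illustrative only, not MultiLoad internals): after a failure, the job resumes from the last savepoint rather than from row zero.

```python
def load_with_checkpoints(rows, interval, fail_at=None, resume_from=0):
    """Process rows, saving a checkpoint every `interval` rows.
    Returns (last_checkpoint, rows_processed_this_run)."""
    checkpoint = resume_from
    processed = 0
    for i in range(resume_from, len(rows)):
        if fail_at is not None and i == fail_at:
            return checkpoint, processed  # simulated abort; restart from checkpoint
        processed += 1
        if (i + 1) % interval == 0:
            checkpoint = i + 1  # savepoint: rows before this index are durable
    return checkpoint, processed

rows = list(range(30_000))
# First run fails at row 25,000: the last good checkpoint is 20,000.
cp, _ = load_with_checkpoints(rows, interval=10_000, fail_at=25_000)
# The restart re-processes only rows 20,000-29,999 instead of all 30,000.
cp2, redone = load_with_checkpoints(rows, interval=10_000, resume_from=cp)
```

This is the trade-off behind CHECKPOINT 10000: smaller intervals mean less rework after a failure but more savepoint overhead during the load.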
25. Explain TPT operators.
TPT (Teradata Parallel Transporter) uses a component-based architecture with operators that perform specific tasks.
Types of TPT Operators
1. Producer Operators
These generate data to be fed into Teradata.
- DataConnector Operator – reads files
- ODBC Operator – pulls data via ODBC
- Export Operator – exports Teradata data to files
2. Consumer Operators
These receive data and insert/update Teradata tables.
- Load Operator
Equivalent to FastLoad (bulk INSERT into an empty table)
- Update Operator
Equivalent to MultiLoad (INSERT/UPDATE/DELETE)
- Stream Operator
Equivalent to TPump (continuous trickle feed)
3. Schema Operators
Define metadata of data being moved.
4. Filter Operators
Transform or filter data before loading.
Why do TPT operators matter?
Because TPT is modular, it can:
- Load multiple tables
- Transform on the fly
- Parallelize operations
- Mix ETL flows in one script
26. What is SPOOL space overflow?
Spool Space is temporary disk space used during query execution for:
- JOIN operations
- SORT operations
- Aggregations
- Intermediate result sets
- Redistribution of rows
SPOOL Space Overflow Occurs When:
- A query produces more intermediate data than the assigned spool space.
- Bad joins generate huge result sets.
- Skew causes one AMP to accumulate more data.
- Poor PI selection leads to redistribution.
- Missing statistics causes the optimizer to choose inefficient plans.
Impact of Spool Overflow
- Query aborts with “No more spool space”
- Session terminated
- Performance degradation
How to Prevent Spool Overflow
- Fix skew issues
- Add or refresh statistics
- Rewrite joins
- Increase spool allocation
- Reduce redistribution
27. What is the role of DBS Control parameters?
DBS Control (DBSCtl) manages global configuration parameters that control database behavior, performance, and resource allocation.
Roles of DBS Control Parameters
- Workload Performance Tuning
Controls:
- CPU throttling
- Resource limits
- Checkpoint intervals
- Locking Behavior
Settings related to:
- Lock acquisition
- Deadlock management
- Logging and Journaling
Parameters for:
- Transient journal
- Before/after image logging
- Hashing & Dictionary Behavior
- Hash bucket configuration
- Dictionary updates
- Fault Tolerance Parameters
- Fallback settings
- AMP recovery tuning
- Concurrency Management
- Max sessions per node
- Parallel execution limits
DBSCtl parameters heavily influence overall system behavior and must be set cautiously, by DBAs only.
28. What is the difference between a Primary Index and a Primary Key in Teradata internally?
| Aspect | Primary Index | Primary Key |
| --- | --- | --- |
| Purpose | Data distribution & access path | Logical uniqueness constraint |
| Internal Role | Determines AMP location via hashing | Ensures row uniqueness |
| Enforced by | Database file system | SQL engine & dictionary |
| Impact on Performance | Huge | Minimal |
| Storage | Physical storage mapping | Logical constraint metadata |
| Duplicates Allowed? | Yes (if NUPI) | No |
Key Insight:
Primary Index affects AMPs, hashing, and parallelism.
Primary Key affects only constraint enforcement.
29. What are hashing collisions?
A hashing collision occurs when two different PI values produce the same hash value.
Why do collisions occur?
- Teradata uses a 32-bit hashing algorithm.
- Only 2^32 (about 4.3 billion) distinct hash values exist.
- In large tables with many millions of rows → collisions are statistically inevitable.
Effects of Hash Collisions:
- Multiple rows with different PI values end up in the same hash bucket.
- All rows in same bucket go to the same AMP.
- Can cause mild skew if too many collisions happen.
Teradata Handling of Collisions:
- Uses rowIDs to differentiate rows within a bucket.
- Transparent to the user.
- Very efficient — does not cause correctness issues.
Collisions increase skew only when the PI is poorly chosen.
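The inevitability of 32-bit collisions is easy to demonstrate. In this sketch, MD5 truncated to 32 bits stands in for Teradata's proprietary row-hash function; by the birthday paradox, two distinct keys with the same hash turn up after only tens of thousands of values:

```python
import hashlib

def hash32(value) -> int:
    # MD5 truncated to 32 bits: a stand-in for Teradata's proprietary row hash.
    return int.from_bytes(hashlib.md5(str(value).encode()).digest()[:4], "big")

seen = {}
collision = None
for key in range(500_000):  # birthday paradox: collision expected after ~2^16 keys
    h = hash32(key)
    if h in seen:
        collision = (seen[h], key)  # two distinct keys, identical 32-bit hash
        break
    seen[h] = key

a, b = collision
assert a != b and hash32(a) == hash32(b)
# Both rows would land in the same hash bucket (same AMP); Teradata tells
# them apart by appending a uniqueness value to form the full row ID.
```

The collision is harmless for correctness, exactly as described above: the row ID, not the hash alone, identifies a row.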
30. How does Teradata ensure even distribution of data?
Teradata uses several mechanisms to guarantee balanced distribution across AMPs:
1. Hashing Algorithm
The PI column is hashed using a high-quality 32-bit hash function.
This ensures uniform randomness.
2. Hash Bucket Mapping
Hash values are mapped into a large number of hash buckets (65,536 on older systems; newer releases support over a million).
Buckets are assigned evenly across AMPs.
3. Uniform PI Selection
Better PI = better distribution.
Columns with high cardinality → even hashing.
4. Statistics-Based Optimizer Choices
Optimizer uses statistics to minimize data redistribution during queries.
5. System Rebalancing After Node Changes
When more AMPs/nodes are added:
- Teradata redistributes hash buckets
- Balances disk & CPU workload across system
6. Avoiding Skew by Design
Good schema design, especially PI and PPI selection, prevents skew.
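The first two mechanisms can be sketched as a toy model in Python (MD5 as a stand-in hash, 8 hypothetical AMPs, a round-robin bucket-to-AMP map): a uniform hash plus even bucket assignment yields near-even row counts per AMP for a high-cardinality PI.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 65_536
NUM_AMPS = 8

# Toy hash map: deal buckets round-robin across AMPs.
bucket_to_amp = {b: b % NUM_AMPS for b in range(NUM_BUCKETS)}

def amp_for(pi_value) -> int:
    # 32-bit row hash (stand-in); high-order 16 bits select the hash bucket.
    h = int.from_bytes(hashlib.md5(str(pi_value).encode()).digest()[:4], "big")
    return bucket_to_amp[h >> 16]

# A high-cardinality PI (unique ids) spreads 100,000 rows almost evenly:
counts = Counter(amp_for(i) for i in range(100_000))
```

With 100,000 distinct PI values, each of the 8 toy AMPs ends up with close to 12,500 rows, which is the balance the real hashing and bucket mapping are designed to achieve.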
31. What is a Skew Factor?
A Skew Factor is a numerical measure that indicates how unevenly data is distributed across AMPs (Access Module Processors) in Teradata. Since Teradata’s performance depends on parallel processing, skewed data leads to unbalanced workloads, where one AMP does significantly more work than others.
How Skew Factor Is Calculated:
- Teradata compares the largest AMP’s row count with the average AMP’s row count.
- Formula:
Skew Factor (%) = (Max AMP rows / Average AMP rows) × 100
Examples:
- 100% → perfect distribution
- 150% → moderate skew
- 1000% → extreme skew (unusable)
Impact of High Skew Factor:
- Slow query execution
- Increased spool usage on skewed AMPs → possible query aborts
- Joins and aggregations become bottlenecked
- System-wide performance degradation
The skew factor is one of the critical metrics DBAs monitor to ensure database health.
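The formula above is straightforward to compute given the per-AMP row counts; a few lines of Python make the examples concrete:

```python
def skew_factor(rows_per_amp) -> float:
    """Skew Factor (%) = (Max AMP rows / Average AMP rows) x 100."""
    average = sum(rows_per_amp) / len(rows_per_amp)
    return max(rows_per_amp) / average * 100

print(skew_factor([1000, 1000, 1000, 1000]))  # 100.0 -> perfect distribution
print(skew_factor([4000, 500, 500, 1000]))    # ~266.7 -> one hot AMP
```

In the second case the average is 1,500 rows per AMP but one AMP holds 4,000, so that AMP does roughly 2.7x its fair share of the work.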
32. How to reduce skew in Teradata?
Reducing skew is essential for maintaining Teradata’s parallel efficiency.
Techniques to Reduce Skew:
- Choose a Better Primary Index (PI)
- Select a high-cardinality column
- Avoid columns with many duplicates (e.g., COUNTRY)
- Use a Composite Primary Index (Multiple Columns)
Combining columns increases uniqueness → reduces duplicates.
- Use a Surrogate Key / Identity Column
Ensures even hashing when natural keys are skewed.
- Use Partitioning Wisely
Partition on columns that distribute data evenly.
- Random AMP Sampling
To validate distribution before loading all data.
- Avoid NUPI on Skewed Columns
A NUPI on a low-cardinality column can send many rows to one AMP.
- Use the HASH BY Clause in TPT
Allows custom distribution logic.
- Use MULTISET Instead of SET Tables
SET tables incur duplicate-row checks, which add overhead when many rows share the same PI value.
- Redistribute Data Using CTAS
CREATE TABLE ... AS SELECT with a better PI selection.
Skew reduction directly improves system throughput and query performance.
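The first two techniques can be compared empirically. In this toy model (MD5 stand-in hash, 8 hypothetical AMPs, fabricated customer rows), a NUPI on a 3-value country column can reach at most 3 AMPs, while a composite PI on (country, customer_id) spreads rows across all of them:

```python
import hashlib
from collections import Counter

NUM_AMPS = 8

def amp_for(*pi_cols) -> int:
    # Hash the concatenated PI columns; 32-bit MD5 prefix stands in for the row hash.
    key = "|".join(map(str, pi_cols))
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return h % NUM_AMPS

countries = ["US", "UK", "DE"]
rows = [(countries[i % 3], i) for i in range(60_000)]

# NUPI on country: only 3 distinct hash values -> at most 3 AMPs do all the work.
nupi_counts = Counter(amp_for(country) for country, _ in rows)

# Composite PI on (country, customer_id): 60,000 distinct values -> even spread.
composite_counts = Counter(amp_for(country, cid) for country, cid in rows)
```

The same reasoning motivates surrogate keys: anything that raises the cardinality of the hashed value evens out the distribution.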
33. Explain Teradata’s Query Optimizer.
The Teradata Optimizer is one of the most advanced cost-based optimizers in the industry. It decides the best execution plan for SQL queries by evaluating multiple strategies and selecting the lowest-cost path.
Key Responsibilities:
- Parsing and Semantic Checking
Ensures query structure and metadata validity.
- Cost Calculation
Evaluates CPU cost, I/O cost, data movement cost, and spool usage.
- Join Strategy Selection
Chooses between:
- Merge Join
- Hash Join
- Nested Join
- Access Path Selection
Determines the best means to retrieve data:
- PI access
- NUSI usage
- Full table scan
- Partition elimination
- Statistics Utilization
Uses table and column stats to estimate row counts.
- Plan Optimization
Generates steps for AMPs to execute in parallel.
- Query Rewriting
Teradata may rewrite SQL internally for better efficiency.
Main Goals:
- Minimizing response time
- Minimizing data movement
- Maximizing parallelism
- Ensuring AMP balance
The optimizer is the core engine ensuring Teradata performs optimally under massive workloads.
34. What is Cost-Based Optimization?
Cost-Based Optimization (CBO) means the optimizer evaluates multiple execution strategies and picks the one with the lowest estimated cost.
Cost Metrics Include:
- Number of rows processed
- I/O required
- CPU cycles
- Data redistribution cost across BYNET
- Spool space usage
- Join method efficiency
How It Works:
- Teradata gathers statistics (COLLECT STATISTICS).
- The optimizer estimates:
- Row counts
- Selectivity
- Join cardinality
- Block access counts
- It then evaluates possible plans:
- PI access vs Full table scan
- Hash join vs Merge join
- Duplication vs redistribution
- The lowest-cost strategy becomes the Explain Plan.
Why It Matters:
- Ensures consistent performance
- Reduces query resource consumption
- Enables predictable, optimized workloads
CBO is the foundation of Teradata’s intelligent parallel processing.
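The core idea can be reduced to a toy illustration (hypothetical plans, made-up cost numbers, and a deliberately simple additive cost model): estimate each candidate plan's cost, then pick the minimum, which is what the final Explain plan reflects.

```python
# Hypothetical candidate plans with fabricated cost components.
plans = {
    "full_table_scan": {"io": 10_000, "cpu": 2_000, "redistribution": 0},
    "nusi_access":     {"io": 400,    "cpu": 80,    "redistribution": 120},
    "pi_access":       {"io": 50,     "cpu": 10,    "redistribution": 0},
}

def cost(components) -> int:
    # Real optimizers weight I/O, CPU, and BYNET transfer; a plain sum suffices here.
    return sum(components.values())

best_plan = min(plans, key=lambda name: cost(plans[name]))
print(best_plan)  # pi_access
```

Accurate statistics matter precisely because they feed these cost estimates: wrong row-count estimates mean wrong costs, and wrong costs mean the optimizer picks the wrong plan.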
35. What are Tactical vs Decision Support Queries?
Teradata classifies workloads into Tactical and Decision Support (DS) categories.
Tactical Queries
- Short-running queries
- Usually OLTP-like, sub-second or few-second response times
- Operate on small sets of data
- Tight SLAs
- Require:
- PI access
- Good indexing
- Low skew
- Examples:
- Fetch user details by customer_id
- Validate credentials
Goal → Fast response time.
Decision Support Queries
- Long-running analytical queries
- Scan large amounts of data
- Complex joins, aggregations, group by
- No strict deadlines
- Examples:
- Yearly sales analysis
- Customer behavior analytics
Goal → Throughput and parallel efficiency.
Teradata uses workload management to ensure tactical queries remain fast even when analytical workloads are heavy.