Informatica Interview Questions and Answers

Find 100+ Informatica interview questions and answers to assess candidates’ skills in ETL development, mappings, workflows, data integration, and performance optimization.
By WeCP Team

As organizations manage complex data integration, migration, and warehousing needs, recruiters must identify Informatica professionals who can build reliable, scalable, and high-performance ETL pipelines. Informatica is widely used in enterprise data warehouses, analytics platforms, and large-scale data integration projects across industries.

This resource, "100+ Informatica Interview Questions and Answers," is tailored for recruiters to simplify the evaluation process. It covers a wide range of topics—from Informatica fundamentals to advanced ETL design and optimization, including PowerCenter architecture, mappings, workflows, and performance tuning.

Whether you're hiring Informatica Developers, ETL Engineers, Data Engineers, or BI Developers, this guide enables you to assess a candidate’s:

  • Core Informatica Knowledge: PowerCenter architecture, source/target definitions, mappings, mapplets, workflows, sessions, and transformations.
  • Advanced Skills: Performance tuning, partitioning, pushdown optimization, error handling, reusable objects, and handling slowly changing dimensions (SCD).
  • Real-World Proficiency: Designing end-to-end ETL pipelines, integrating multiple data sources, ensuring data quality, and maintaining enterprise-grade data integration solutions.

For a streamlined assessment process, consider platforms like WeCP, which allow you to:

  • Create customized Informatica assessments tailored to data warehousing and enterprise integration roles.
  • Include hands-on tasks such as building mappings, debugging workflows, or optimizing ETL performance.
  • Proctor exams remotely while ensuring integrity.
  • Evaluate results with AI-driven analysis for faster, more accurate decision-making.

Save time, enhance your hiring process, and confidently hire Informatica professionals who can deliver scalable, reliable, and production-ready data integration solutions from day one.

Informatica Interview Questions

Informatica – Beginner (1–40)

  1. What is Informatica PowerCenter?
  2. What is an ETL process?
  3. What are the key components of Informatica PowerCenter?
  4. Define a repository in Informatica.
  5. What is a mapping in Informatica?
  6. What is a session in Informatica?
  7. What is a workflow?
  8. What is a transformation?
  9. Explain the Source Qualifier transformation.
  10. What is a lookup transformation?
  11. What are connected vs unconnected lookups?
  12. What is a filter transformation used for?
  13. Define aggregator transformation.
  14. What is the difference between aggregator and expression?
  15. What is a joiner transformation?
  16. Explain master and detail relationships in Joiner.
  17. What are different join types in Joiner transformation?
  18. What is a sequence generator?
  19. What is a router transformation?
  20. What is the difference between filter and router?
  21. What is a sorter transformation?
  22. Explain update strategy transformation.
  23. What is a parameter file in Informatica?
  24. What is a repository server?
  25. What is metadata?
  26. What is a target load plan?
  27. Explain the concept of reusable transformations.
  28. What is a shortcut?
  29. What is a mapplet?
  30. What is a domain in Informatica?
  31. What is a node?
  32. What is a service manager?
  33. What is a folder in Informatica?
  34. What is the difference between static and dynamic cache in Lookup?
  35. Explain data cleansing.
  36. What is a database connection in Informatica?
  37. What is normalized vs denormalized data?
  38. What is a flat file source?
  39. What is pushdown optimization?
  40. What are bad records and how does Informatica handle them?

Informatica – Intermediate (1–40)

  1. Explain the architecture of Informatica PowerCenter.
  2. How does the Integration Service work?
  3. Explain partitioning in Informatica.
  4. What is session partitioning and when is it used?
  5. What are types of caches used by Lookup transformation?
  6. Explain persistent cache.
  7. What is a workflow monitor?
  8. What is a command task?
  9. What is an event wait task?
  10. What is a pre-session and post-session command?
  11. Explain rank transformation with example use cases.
  12. When do you use union transformation?
  13. What is the difference between mapping variable and mapping parameter?
  14. Explain incremental aggregation.
  15. What are mapplet restrictions?
  16. How do you handle slowly changing dimensions (SCD Type 1, 2, 3)?
  17. Explain dynamic lookup cache in detail.
  18. What is pushdown optimization and what are its levels?
  19. What causes session failure?
  20. How do you debug mappings in Informatica?
  21. What are workflow variables?
  22. How do you schedule workflows?
  23. How does Informatica handle deadlock situations?
  24. Explain differences between sorter and aggregator.
  25. What is a recovery strategy in Informatica?
  26. How do you improve performance in Informatica sessions?
  27. What is a relational lookup vs flat file lookup?
  28. What is key range partitioning?
  29. What is hash partitioning?
  30. Explain pipeline partitioning.
  31. What is source row-level testing?
  32. Explain bottleneck identification in Informatica.
  33. What is metadata manager in Informatica?
  34. Explain indirect file loading.
  35. What is the role of Integration Service in handling transactions?
  36. Explain the concept of surrogate keys.
  37. What is code page compatibility?
  38. Explain schema drift handling.
  39. What are session properties and why are they important?
  40. How does Data Masking work in Informatica?

Informatica – Experienced (1–40)

  1. Explain PowerCenter domain architecture in enterprise environments.
  2. How do you design a high-performance ETL solution in Informatica?
  3. Explain the internal working of dynamic lookup cache.
  4. How does Informatica handle CDC (Change Data Capture)?
  5. Explain the design considerations for large fact table loading.
  6. How do you implement error logging frameworks?
  7. How do you optimize mappings with multiple lookups?
  8. Explain session and mapping-level pushdown optimization internals.
  9. Describe advanced partitioning strategies and when to avoid them.
  10. How do you tune aggregator-heavy mappings?
  11. What is session recovery and restartability design?
  12. How do you implement deduplication frameworks?
  13. Explain real-time processing using Informatica Real Time Edition.
  14. How do you handle schema evolution in production pipelines?
  15. What is the difference between PowerCenter and IICS?
  16. How do you design for multi-node grid environments?
  17. Explain the metadata-driven ETL framework.
  18. What security considerations exist for Informatica repositories?
  19. How do you manage repository upgrades and migrations?
  20. What is deployment group and how is it used?
  21. How do you integrate Informatica with Big Data platforms?
  22. Explain the internal working of Integration Service thread pools.
  23. Discuss load-balancing strategies in Informatica domain.
  24. How do you design ETL for high-volume CDC replication?
  25. What is pushdown failure recovery strategy?
  26. Explain version control best practices in Informatica.
  27. What are advanced SCD techniques beyond Type 1,2,3?
  28. How do you build reusable ETL templates?
  29. How do you implement audit/fact-less fact frameworks?
  30. Explain error/exception partitioning.
  31. How do you handle massive file ingestion (>1 TB)?
  32. Explain asynchronous processing in workflows.
  33. How do you reduce lookup cache memory footprint?
  34. What is enterprise data masking architecture in Informatica?
  35. How do you tune workflows containing many sessions?
  36. What is high availability mode in Informatica?
  37. How do you implement row-level security in ETL flows?
  38. Explain workflow recovery from a power outage scenario.
  39. What is a custom transformation and when is it required?
  40. How do you enforce data governance standards in Informatica ETL?

Informatica Interview Questions and Answers

Beginner (Q&A)

1. What is Informatica PowerCenter?

Informatica PowerCenter is a widely used enterprise ETL (Extract, Transform, Load) and data integration platform designed to move, transform, and manage large volumes of data across heterogeneous systems. It enables organizations to extract data from various sources such as relational databases, flat files, cloud platforms, mainframes, and applications, and then perform complex business transformations before loading the data into target systems like data warehouses, data marts, or operational databases.

PowerCenter uses a metadata-driven architecture, which means all design objects (mappings, workflows, transformations) are stored and managed in a common repository. This ensures consistency, version control, reusability, and easy maintenance.

It provides high scalability, fault tolerance, and performance tuning options, making it suitable for large-scale enterprise data integration needs. PowerCenter supports batch processing, real-time data integration, parallel execution, and sophisticated debugging capabilities, which help in creating highly reliable and optimized ETL pipelines.

2. What is an ETL process?

ETL stands for Extract, Transform, Load, which is a fundamental process in data warehousing and data integration projects.

  1. Extract – Data is collected from various source systems such as databases, ERP systems, cloud apps, CSV files, APIs, or mainframes. Extraction focuses on pulling the necessary data while ensuring minimal impact on the source systems.
  2. Transform – The extracted data is cleansed, validated, enriched, standardized, and manipulated according to business rules. Transformations can include:
    • Filtering and sorting
    • Aggregating data
    • Converting data types
    • Applying business logic
    • Handling nulls and errors
    • Merging data from multiple sources
  3. Load – The transformed data is then loaded into target systems such as data warehouses, operational systems, analytical platforms, or downstream applications. The loading can be full, incremental, or real-time depending on business needs.

ETL ensures that organizations have accurate, consistent, and up-to-date data for reporting, analytics, and decision-making.

3. What are the key components of Informatica PowerCenter?

Informatica PowerCenter architecture consists of several core components, each performing a specific role in the ETL lifecycle:

  1. Repository
    Stores all metadata such as mappings, workflows, sessions, configurations, and transformation rules.
  2. Repository Server
    Manages connections to the repository and handles metadata transactions.
  3. Integration Service
    Executes workflows and sessions. It performs extraction, transformation, and loading operations during runtime.
  4. Repository Service
    Enables client tools to read and write metadata to the repository.
  5. PowerCenter Client Tools
    Includes:
    • Designer – used to create mappings and transformations.
    • Workflow Manager – used to create and manage sessions and workflows.
    • Workflow Monitor – used to track workflow execution and performance.
    • Repository Manager – used to manage repository objects.
  6. Domain & Node Architecture
    The domain is the administrative unit; nodes are logical servers within it.

Together, these components allow users to design, execute, monitor, and manage complete ETL processes.

4. Define a repository in Informatica.

The Informatica repository is a centralized metadata storage location where all ETL-related objects and configurations are stored. It is typically hosted on a relational database such as Oracle, SQL Server, or DB2.

The repository stores:

  • Mappings and mapplets
  • Sessions and workflows
  • Transformation logic
  • Source and target definitions
  • User permissions and folder structures
  • Version history and deployment metadata

The repository is crucial because PowerCenter is a metadata-driven system, meaning all ETL operations rely on metadata definitions rather than hard-coded logic.

Repositories can be:

  • Global repository – for sharing common objects across the organization
  • Local repository – for project-specific development
  • Versioned repository – supports check-in/check-out, version control, and multi-user development

The repository ensures consistency, reusability, and centralized control for data integration processes.

5. What is a mapping in Informatica?

A mapping is a visual representation of how data moves from sources to targets along with the transformation rules applied in between. It serves as the core ETL logic in Informatica PowerCenter.

A mapping includes:

  • Source definitions
  • Target definitions
  • Transformations (filter, lookup, joiner, router, etc.)
  • Connections between transformations
  • Business rules and expressions

When a mapping is executed, the Integration Service reads the source data, applies the logic defined in the mapping, and loads the final processed data to the target.

Mappings enable:

  • Rule-based data processing
  • Reusable logic development
  • Complex transformation flows
  • Handling multiple source and target systems

Mappings form the heart of all data transformation workflows in Informatica.

6. What is a session in Informatica?

A session is a task that executes a mapping. It contains configuration details required to extract, transform, and load data based on the mapping logic.

A session includes:

  • Source and target connection details
  • Load properties (insert/update/delete settings)
  • Error handling configuration
  • Partitioning and performance settings
  • Pre-session and post-session commands

Sessions run under the control of the Integration Service, which allocates system resources, manages data movement, and logs execution statistics.

A mapping cannot run without a session, making sessions a crucial runtime object in Informatica.

7. What is a workflow?

A workflow is a container that defines the sequence and execution order of tasks such as sessions, email notifications, command tasks, event waits, and decision tasks. It orchestrates the complete ETL process.

Workflows allow you to:

  • Run multiple sessions in parallel or sequence
  • Define dependencies between tasks
  • Trigger conditional execution
  • Handle pre/post-processing activities
  • Schedule automated ETL jobs

The Workflow Manager is used to design workflows, while the Workflow Monitor is used to track their execution in real time.

Workflows provide a structured approach to automate and manage complex data pipelines.

8. What is a transformation?

A transformation is an object in Informatica mapping that modifies, filters, or routes data. Transformations allow developers to define business rules applied during data processing.

Transformations can:

  • Clean data
  • Aggregate data
  • Join datasets
  • Lookup reference data
  • Split data into multiple streams
  • Calculate derived values
  • Filter unwanted records

There are two categories:

  1. Active transformations – change the number of rows (e.g., filter, aggregator).
  2. Passive transformations – do not change the number of rows (e.g., expression).

Transformations are the building blocks of the ETL logic inside a mapping.

9. Explain the Source Qualifier transformation.

The Source Qualifier transformation (SQ) is an active, connected transformation automatically created for relational sources. It represents the rows that the Integration Service retrieves from the source database.

Key functions:

  • Acts as a bridge between source definition and the mapping pipeline.
  • Applies source filters, which translate into SQL WHERE clauses to reduce data volume.
  • Allows sorting data at the database level using ORDER BY.
  • Enables joining multiple relational sources using user-defined joins.
  • Generates the SQL query issued to the source database and supports SQL overrides for custom queries.

By pushing filtering, sorting, and joining operations to the database, Source Qualifier improves ETL performance and reduces unnecessary data movement.
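
For illustration (table and column names are hypothetical), a source filter of DEPT_ID = 10 combined with a sort on HIRE_DATE would cause the Integration Service to issue SQL along these lines:

SELECT EMP_ID, EMP_NAME, DEPT_ID, HIRE_DATE
FROM EMPLOYEES
WHERE DEPT_ID = 10
ORDER BY HIRE_DATE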

10. What is a lookup transformation?

A lookup transformation is used to retrieve related or reference data from a lookup table, file, or database during ETL processing. It is commonly used for data validation, dimension key lookups, deduplication, and enrichment.

Features:

  • Supports connected and unconnected modes.
  • Can use static, dynamic, or persistent cache.
  • Can return single or multiple columns.
  • Can be configured to return default values for missing records.
  • Allows SQL overrides to improve flexibility and performance.

Lookups are essential for operations like:

  • Fetching surrogate keys in data warehouses
  • Validating foreign key relationships
  • Performing reference data checks
  • Comparing current values with historical data

It significantly reduces database load by caching lookup data during runtime.

11. What are connected vs unconnected lookups?

A lookup transformation can be used in two distinct modes: connected and unconnected. Both retrieve reference data, but they differ in design, usage, and performance.

Connected Lookup

  • It is directly connected to the mapping data flow using input/output ports.
  • Executes for every incoming row.
  • Can return multiple columns as output.
  • Commonly used for:
    • Surrogate key lookups
    • Validations needing multiple fields
    • Real-time transformations

Advantages:

  • Easy to design and debug
  • Can return several output fields
  • Supports dynamic cache for SCD Type 2 logic

Disadvantages:

  • Higher processing overhead when executed per row

Unconnected Lookup

  • It is not connected to the main pipeline.
  • Called explicitly from an Expression (or other transformation) using the :LKP.lookup_name() syntax.
  • Can return only one column.
  • Runs only when explicitly invoked.

Advantages:

  • Better performance when lookup needs to be executed conditionally
  • Cleaner mapping design

Disadvantages:

  • Cannot return multiple columns
  • Harder to debug compared to connected lookups

Summary:
Connected lookups run automatically per row and return multiple columns, whereas unconnected lookups run only when needed and return a single value.
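
For example, a hypothetical unconnected lookup named lkp_Customer_Dim can be invoked conditionally from an Expression transformation so that the lookup runs only when the key is missing:

IIF(ISNULL(CUSTOMER_KEY), :LKP.lkp_Customer_Dim(CUSTOMER_ID), CUSTOMER_KEY)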

12. What is a filter transformation used for?

A Filter transformation is an active, connected transformation used to remove unwanted rows from the data pipeline based on a condition. It works similarly to the WHERE clause in SQL.

Key characteristics:

  • Only rows that meet the filter condition are passed downstream.
  • Rows failing the condition are dropped and not processed further.
  • It improves performance by reducing row volume early in the pipeline.

Example use cases:

  • Keeping only active customers
  • Filtering transactions above a certain threshold
  • Excluding null or invalid values

Filter conditions are Boolean expressions such as:

SALARY > 50000 AND DEPT_ID = 10

Filters help ensure only relevant and clean data flows to the target.

13. Define aggregator transformation.

The Aggregator transformation is an active, connected transformation used to perform aggregate calculations such as:

  • SUM
  • AVG
  • COUNT
  • MIN / MAX
  • MEDIAN
  • FIRST / LAST

It functions similarly to SQL's GROUP BY clause.

Key features:

  • Supports group-by ports for summarization
  • Can aggregate large datasets
  • Uses cache memory for storing intermediate results
  • Supports conditional aggregations

Example use cases:

  • Calculating total sales per region
  • Finding highest salary in each department
  • Counting customers per city

Because it processes data group by group, the aggregator often requires sorted input to improve performance and minimize memory usage.
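
For example, with REGION defined as a group-by port, aggregate output ports might be defined as follows (port and column names are illustrative; Informatica aggregate functions also accept an optional filter condition):

TOTAL_SALES = SUM(SALES)
VALID_ORDER_COUNT = COUNT(ORDER_ID, SALES > 0)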

14. What is the difference between aggregator and expression?

Although both transformations manipulate data, they serve different purposes.

Aggregator Transformation

  • Performs aggregate calculations (SUM, AVG, COUNT, MIN, MAX).
  • Processes data group-wise.
  • Requires cache memory.
  • Is active: changes the number of rows (e.g., multiple rows become one).

Expression Transformation

  • Performs row-level calculations such as string manipulation, arithmetic, date conversion.
  • Does not perform aggregation.
  • Does not change the number of rows.
  • Is passive: all input rows are passed through.

Example:

  • Use aggregator to calculate total sales for each region.
  • Use expression to calculate a 10% discount for each row.

15. What is a joiner transformation?

A Joiner transformation allows combining data from two heterogeneous sources (e.g., flat files, relational tables) that cannot be joined using Source Qualifier.

It is an active, connected transformation that performs joins at the mapping level.

Key capabilities:

  • Join data from different source types
  • Join based on conditions (e.g., customer_id = cust_id)
  • Supports multiple join types (normal, master outer, detail outer, full outer)
  • Useful when joining data from sources residing on different systems

Joiner is essential when:

  • Sources come from different databases
  • One source is a flat file
  • You need more control over join logic

16. Explain master and detail relationships in Joiner.

In a Joiner transformation, one input pipeline is chosen as master and the other as detail. This distinction is important for performance and correct join functionality.

Master Pipeline

  • Rows are read first and cached.
  • Should ideally contain the smaller dataset to reduce cache size and improve speed.
  • In a normal join, master rows that do not find a match are discarded.

Detail Pipeline

  • Rows are streamed and matched against cached master records.

Choosing the correct master table is critical:

  • Placing a smaller dataset as master reduces memory load.
  • Reduces disk spill-over.
  • Improves join execution time significantly.

Example:

  • If the customer dataset is small and the transaction dataset is large, CUSTOMER should be the master.

17. What are different join types in Joiner transformation?

Joiner supports four types of joins, analogous to SQL joins:

  1. Normal Join
    • Returns only rows where the join condition matches.
    • Equivalent to SQL INNER JOIN.
  2. Master Outer Join
    • Keeps all rows from the detail pipeline and only the matching rows from the master pipeline.
  3. Detail Outer Join
    • Keeps all rows from the master pipeline and only the matching rows from the detail pipeline.
  4. Full Outer Join
    • Returns all matching and non-matching rows from both pipelines.

Uses:

  • Normal join → common matching records
  • Master or detail outer join → keep all records from one pipeline while enriching them with the other
  • Full outer join → combine unmatched data for reporting or audits

18. What is a sequence generator?

A Sequence Generator is a passive transformation that generates unique numeric values (usually surrogate keys) for target rows.

It produces:

  • NEXTVAL → next number in sequence
  • CURRVAL → current number

You can configure:

  • Start value
  • Increment value
  • Cycle or no-cycle
  • Number of cached values

Common use cases:

  • Generating primary keys
  • Creating batch numbers
  • Producing unique sequence values in data warehouses

Sequence Generator ensures safe and consistent ID creation without relying on database sequences.

19. What is a router transformation?

A Router transformation is an active transformation used to route data into multiple groups based on different conditions. It works like an advanced version of the Filter transformation with multiple outputs.

Features:

  • Supports multiple filter conditions simultaneously
  • Has default group for rows not matching any condition
  • Helps avoid using multiple Filter transformations

Example:

  • Sending high-value, medium-value, and low-value transactions to separate targets
  • Routing data to different workflows based on region or category

Router improves clarity and reduces complexity in mapping design.
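
For instance, to split transactions by value, the Router groups could use conditions like the following (thresholds are illustrative), with unmatched rows falling into the default group:

HIGH_VALUE: TXN_AMOUNT > 100000
MEDIUM_VALUE: TXN_AMOUNT > 10000 AND TXN_AMOUNT <= 100000
LOW_VALUE: TXN_AMOUNT <= 10000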

20. What is the difference between filter and router?

Both transformations filter data, but they differ significantly.

Filter Transformation

  • Has only one output.
  • Rows not matching the filter condition are dropped.
  • Cannot send data to multiple destinations.

Router Transformation

  • Has multiple output groups based on different conditions.
  • Rows can be split into logical groups.
  • Includes a default group.
  • More flexible and avoids multiple filter transformations.

Example:

  • Use Filter to keep only ACTIVE customers.
  • Use Router to split customers into ACTIVE, INACTIVE, and HIGH-VALUE groups.

21. What is a sorter transformation?

A Sorter transformation is an active, connected transformation used to sort data in ascending or descending order based on one or more ports. It functions similarly to an SQL ORDER BY clause but performs the sort operation inside the Informatica pipeline instead of pushing it to the database.

Key features:

  • Supports sorting on multiple columns
  • Can perform case-sensitive or case-insensitive sorting
  • Allows sorting in ascending or descending order
  • Uses cache memory and may write temporary files if the dataset is large
  • Can be configured as a distinct sorter to remove duplicate rows

Example use case:

  • Sorting sales data by region and date
  • Ordering employees by salary
  • Preparing sorted input for downstream transformations like Aggregator (to improve performance)

Because sorting is memory-intensive, Informatica recommends using database-level sorting when possible. But when sources cannot push down sorting (e.g., flat files), Sorter is essential.

22. Explain update strategy transformation.

An Update Strategy transformation is an active transformation used to define how rows should be treated when loaded into a target. It allows you to specify whether each row should be:

  • Inserted
  • Updated
  • Deleted
  • Rejected

It uses the built-in constants DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT to flag each row.

Example logic:

IIF(NEW_FLAG = 'Y', DD_INSERT, DD_UPDATE)

Common use cases:

  • Managing Slowly Changing Dimensions (SCD)
  • Updating existing records in data warehouses
  • Rejecting invalid data
  • Applying custom business logic for load behavior

When used correctly, Update Strategy ensures that the ETL pipeline modifies target data accurately according to business rules.

23. What is a parameter file in Informatica?

A parameter file is a plain text file that defines parameters and variables used in mappings, workflows, and sessions. These values can be changed without modifying the original ETL objects.

A parameter file can include:

  • Mapping parameters
  • Mapping variables
  • Session parameters
  • Workflow variables

Format example:

[Global]
$$LoadDate=2024-06-01

[FolderName.WF:wf_LoadSales.ST:s_SalesSession]
$DBConnectionSales=Sales_DB
$$CountryCode=US

Benefits:

  • Promotes reusability and flexibility
  • Separates configuration from logic
  • Helps manage different environments (DEV, QA, PROD)
  • Allows runtime customization of ETL jobs

Parameter files make ETL pipelines more maintainable and scalable.
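
For example, assuming the mapping declares $$LoadDate as a mapping parameter and the source provides an ORDER_DATE column, the value supplied in the parameter file can be referenced in a Filter transformation condition such as:

ORDER_DATE >= TO_DATE('$$LoadDate', 'YYYY-MM-DD')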

24. What is a repository server?

The Repository Server (in older versions of Informatica) or Repository Service (in modern versions) manages interactions between client tools and the repository database.

It performs:

  • Read/write operations to the metadata repository
  • Authentication and authorization
  • Version control and object locking
  • Managing multi-user access
  • Repository backup and recovery

The Repository Service acts as a middleware layer ensuring that designers, workflow developers, and administrators can safely modify or retrieve metadata.

This server plays a foundational role because Informatica is a metadata-driven ETL tool, and without a functioning repository server, development cannot progress.

25. What is metadata?

Metadata is "data about data." In Informatica, it refers to all design-time and runtime information that defines ETL processes.

Examples of metadata include:

  • Source and target definitions
  • Mapping design, transformations, and logic
  • Workflow execution rules
  • Session configurations
  • User permissions
  • Data lineage and impact analysis details

Metadata helps the Integration Service understand:

  • What data to extract
  • How to transform it
  • Where to load it

Metadata is stored in the Informatica Repository, making PowerCenter a metadata-driven architecture. It improves consistency, traceability, and maintainability of all ETL operations.

26. What is a target load plan?

A target load plan specifies the order and method in which Informatica loads data into target tables when multiple targets exist in a mapping.

It governs:

  • Task execution sequence
  • Dependencies between targets
  • Performance and transactional behavior
  • Handling parent-child table relationships

For example:

  • Parent tables should load before child tables (to maintain referential integrity)
  • Fact tables may load after dimension tables
  • Landing tables may load before staging tables

In mappings with multiple target tables, Informatica allows defining:

  • Normal load order
  • Transaction control order
  • Constraint-based load order

Choosing the correct load plan ensures accurate, consistent, and efficient data loading.

27. Explain the concept of reusable transformations.

A reusable transformation is a transformation created once and saved in the repository for use in multiple mappings.

Benefits:

  • Promotes consistency and standardization
  • Reduces development time
  • Makes maintenance easier (one change reflects everywhere it is used)

Examples:

  • A reusable lookup for customer dimension key retrieval
  • A reusable expression for data cleansing logic
  • A reusable filter for active records

Reusable transformations are stored at the folder level and can be dragged into any mapping.

Difference from non-reusable transformation:

  • Non-reusable exists only within one mapping
  • Reusable can be shared across several mappings

Reusable objects improve modularity and reusability in ETL design.

28. What is a shortcut?

A shortcut in Informatica is a reference or pointer to an existing object in another folder. The actual object remains in the shared or global folder, but shortcuts allow access without duplication.

Shortcuts can be created for:

  • Sources
  • Targets
  • Transformations
  • Mapplets
  • Mappings

Benefits:

  • Avoids duplicating metadata
  • Ensures consistent use of shared objects
  • Simplifies maintenance—changes to the original object propagate automatically
  • Promotes cross-team development

Shortcuts support efficient multi-project architecture where objects are centrally managed.

29. What is a mapplet?

A mapplet is a reusable object that contains a set of transformations grouped together to perform a specific logic. It allows developers to encapsulate complex transformation logic into a single reusable component.

Uses:

  • Data cleansing routines
  • Standard validation logic
  • Reusable calculation modules
  • Dimension lookup logic

Characteristics:

  • Contains multiple transformations but no target definition
  • Can be invoked in multiple mappings
  • Supports both active and passive transformations

Mapplets help reduce complexity and increase productivity by modularizing commonly used ETL logic.

30. What is a domain in Informatica?

A domain is the highest-level administrative entity in Informatica. It is the logical container that manages all services and nodes in the Informatica platform.

A domain includes:

  • Nodes – physical servers that run services
  • Service Manager – manages all domain-level operations
  • Application services such as:
    • Repository Service
    • Integration Service
    • Reporting Service
    • SAP BW Service

Functions of a domain:

  • Authentication and security
  • Service configuration and monitoring
  • Load balancing and failover
  • Resource management
  • Centralized administration

Domains form the backbone of Informatica's distributed, scalable, enterprise-grade architecture.

31. What is a node?

A node in Informatica is a logical representation of a physical server where Informatica services run. It is part of the Informatica domain architecture and forms the execution environment for application services.

There are two types of nodes:

  1. Gateway Node
    • Acts as the entry point to the domain.
    • Handles client requests, authentication, and routing.
    • Can host application services as well.
  2. Worker Node
    • Primarily responsible for running application services such as:
      • Integration Service
      • Repository Service
      • Reporting Service
    • Used for load balancing and scaling.

Key functions of a node:

  • Executes ETL workloads
  • Hosts services in high-availability clusters
  • Manages service failover
  • Supports distributed execution and scalability

Nodes make Informatica a distributed and high-performance ETL platform capable of handling enterprise-level workloads.

32. What is a service manager?

The Service Manager is a core component of the Informatica domain responsible for managing and controlling all domain services. It runs inside the domain and ensures that all administrative and operational tasks are handled efficiently.

Key responsibilities of the Service Manager:

  • Authentication & Authorization:
    Validates users through domain security policies.
  • Service Lifecycle Management:
    Starts, stops, restarts, and monitors all application services.
  • Heartbeat & Health Monitoring:
    Continuously checks the status of nodes and services.
  • Metadata and Configuration Management:
    Stores configuration details of the domain, nodes, and services.
  • Load Balancing & Failover:
    Distributes workloads across multiple nodes and ensures high availability.

Simply put, the Service Manager functions as the brain of the domain, coordinating communication, service orchestration, and administrative tasks.

33. What is a folder in Informatica?

A folder in Informatica is a logical container used to organize and manage ETL objects within the repository. It groups related metadata objects for better project structure and security.

Folders can contain:

  • Mappings
  • Mapplets
  • Sessions
  • Workflows
  • Transformations
  • Source/target definitions

Key benefits of folders:

  • Allow team-based development
  • Maintain project boundaries
  • Support permissions and access control
  • Facilitate object reuse across projects
  • Enable version control for specific teams

Folders help maintain a clean, modular, and secure repository structure, allowing multiple development teams to work independently.

34. What is the difference between static and dynamic cache in Lookup?

Lookups use cache memory to speed up reference data retrieval. Informatica supports static and dynamic lookup caches.

Static Cache

  • Cache is created once at the beginning of the session.
  • Contents do not change as rows are processed.
  • Ideal for lookups on stable reference data (e.g., product list).
  • Faster performance because cache is not updated.
  • Cannot detect new entries in the target table during load.

Dynamic Cache

  • Cache can be updated on the fly as rows are processed.
  • Used in combination with Update Strategy and Lookup for SCD Type 2 or upsert logic.
  • Supports insert, update, and delete operations on cache.
  • Helps maintain consistency between target and lookup data in real time.

Example use case:
Dynamic cache is required when implementing:

  • Customer dimension table updates
  • Real-time record matching
  • Incremental data warehouse loads

Static cache is used when lookup data does not change during the session.

35. Explain data cleansing.

Data cleansing is the process of identifying, correcting, or removing inaccurate, incomplete, inconsistent, or irrelevant data. It ensures data quality before loading into target systems.

Common cleansing operations:

  • Removing duplicates
  • Fixing invalid formats (phone numbers, dates)
  • Standardizing text (upper/lowercase)
  • Replacing null values
  • Correcting spelling or missing values
  • Validating reference data (e.g., valid country codes)

Tools used in Informatica for cleansing:

  • Expression transformation
  • Lookup validation
  • Router for conditional cleansing
  • Filter for eliminating bad data
  • Aggregator for deduplication
  • Informatica Data Quality (IDQ) for advanced rules

Clean data ensures accurate analytics, consistent reporting, and reliable decision-making.

36. What is a database connection in Informatica?

A database connection in Informatica defines how the Integration Service connects to relational databases or other data stores during ETL operations.

Connections include details such as:

  • Database type (Oracle, SQL Server, MySQL, DB2, etc.)
  • Username and password
  • Hostname and port number
  • Service name or database name
  • Connection pooling settings

Types of connections:

  • Relational connections
  • Application connections (Salesforce, SAP, etc.)
  • FTP connections
  • ODBC connections

These connections are reused across mappings and workflows, ensuring standardized access and reducing configuration effort.

37. What is normalized vs denormalized data?

Normalized Data

Normalization organizes data into multiple related tables to:

  • Reduce redundancy
  • Maintain data integrity
  • Avoid anomalies

Characteristics:

  • Data is spread across many tables
  • Uses foreign keys to maintain relationships
  • Ideal for OLTP (transactional) systems

Example:
Separate tables for CUSTOMER, ADDRESS, and ORDERS.

Denormalized Data

Denormalization combines tables to reduce joins and improve query performance.

Characteristics:

  • More redundancy
  • Faster read performance
  • Ideal for OLAP (data warehousing)

Example:
A single fact table with customer, product, and region attributes embedded.

Summary:
Normalize for accuracy and integrity.
Denormalize for reporting and speed.

38. What is a flat file source?

A flat file source is a type of input source that contains raw data in plain text format. Informatica can read flat files such as:

  • CSV (Comma-separated values)
  • TXT (Tab or pipe-delimited)
  • Fixed-width files
  • XML or JSON (with special parsing)

Flat file sources are commonly used because:

  • They are easy to generate and transfer
  • They are lightweight and portable
  • Many legacy systems export data as flat files

Informatica Designer provides Flat File Wizard to define the structure, delimiters, and data types for the file.

These sources are common in ETL pipelines involving data exchange between external systems, vendors, or legacy applications.

39. What is pushdown optimization?

Pushdown Optimization (PDO) is a performance technique where Informatica pushes transformation logic to the underlying database instead of processing it in the Integration Service.

Three levels of pushdown:

  1. Source-side pushdown
    • Filters, joins, and calculations push into source SQL.
  2. Target-side pushdown
    • Insert, update, delete operations executed at target DB.
  3. Full pushdown
    • Majority of the mapping logic executed inside the database.

Benefits:

  • Reduces data movement
  • Utilizes the DB engine’s processing power
  • Improves throughput
  • Minimizes ETL server load

Pushdown is ideal when working with large volumes of structured data where databases are well-optimized.

40. What are bad records and how does Informatica handle them?

Bad records are rows that fail to meet data quality rules or cannot be loaded due to errors. Examples include:

  • Invalid data types
  • Constraint violations (PK/FK issues)
  • Missing mandatory values
  • Conversion errors (string to date)
  • Lookup failures
  • Business rule failures

Informatica handles bad records using multiple mechanisms:

  1. Session Log & Error Log Files
    • Automatically records reasons for failure.
  2. Reject File (.bad file)
    • Captures rejected rows for further analysis.
  3. Error Handling Transformations
    • Router and Expression for custom validation
    • Lookup for defaulting missing values
  4. Exception Tables
    • Target tables specifically created for storing failed rows.
  5. Error Ports (for update strategy or mapping logic failures)

Handling bad records ensures accuracy, prevents target corruption, and supports debugging and cleansing workflows.

Intermediate (Q&A)

1. Explain the architecture of Informatica PowerCenter.

The architecture of Informatica PowerCenter follows a client-server, scalable, and metadata-driven model designed for enterprise data integration. It consists of several major components that work together to design, manage, execute, and monitor ETL processes.

Key Components:

1. PowerCenter Domain

  • The top-level administrative unit.
  • Contains nodes, grids, and application services.
  • Manages security, configuration, licensing, and load balancing.

2. Nodes

  • Represent physical servers.
  • Can host integration services, repository services, and other components.

3. PowerCenter Repository

  • A relational database storing all metadata:
    • Mappings
    • Workflows
    • Transformations
    • Connections
    • Version history
  • Used by developers and administrators.

4. Repository Service (RS)

  • Provides access to repository metadata.
  • Handles check-in/check-out, version control, and object management.

5. Integration Service (IS)

  • Executes ETL workflows and sessions.
  • Responsible for extracting data, transforming it, and loading targets.
  • Handles partitioning, caching, error management, and pushdown optimization.

6. Client Tools

  • Designer – build mappings
  • Workflow Manager – build sessions & workflows
  • Workflow Monitor – monitor ETL jobs
  • Repository Manager – manage objects & security

7. Metadata Management

  • Maintains lineage, impact analysis, and reusable objects.

Flow Summary:

  1. Developers create mappings in Designer.
  2. Repository Service stores metadata.
  3. Integration Service reads metadata and executes ETL logic.
  4. Workflow Monitor tracks runtime performance.

The architecture ensures high performance, modular development, scalability, and strong metadata management, making PowerCenter an enterprise-grade ETL platform.

2. How does the Integration Service work?

The Integration Service (IS) is the execution engine of Informatica PowerCenter. It runs workflows, manages sessions, and performs the complete ETL process.

Working Mechanism:

  1. Reads Metadata from Repository
    • Mapping logic
    • Transformations
    • Session configurations
    • Connections
  2. Establishes Connections
    • Connects to source and target systems.
    • Uses relational, flat file, or application connections.
  3. Extracts Data
    • Reads data based on Source Qualifier or other sources.
    • Applies pushdown optimization when configured.
  4. Processes Transformations
    • Executes row-level transformations (Expression, Lookup, etc.)
    • Uses caches for Lookup, Aggregator, Joiner, and other operations.
  5. Manages Caches & Buffers
    • Maintains dynamic/static cache
    • Handles memory allocation and partitioning
  6. Loads Data into Target
    • Inserts, updates, deletes, or merges based on session properties
    • Manages commit intervals and transactions
  7. Error Handling & Logging
    • Creates session logs
    • Captures rejected rows
    • Manages failover in HA configurations

The Integration Service is designed for scalability, high performance, parallel processing, and reliability.

3. Explain partitioning in Informatica.

Partitioning is a performance optimization technique that allows the Integration Service to divide data into multiple segments and process them in parallel, increasing throughput.

Key Concepts:

  • A mapping or session can be split into multiple pipelines.
  • Each partition processes a subset of the data simultaneously.
  • Reduces ETL time significantly for large datasets.

Types of Partitioning:

  1. Database Partitioning
    • Uses the database's internal partitioning (Oracle partitioned tables).
  2. Pipeline Partitioning
    • Splits the flow at transformation level.
  3. Partition Points
    • Defined where data is divided or merged (e.g., SQ, Aggregator).
  4. Types of Partition Algorithms:
    • Round-Robin – evenly distributes rows
    • Hash – partitions based on key
    • Key Range – partitions based on value ranges
    • Pass-through – no actual partitioning

Benefits:

  • Faster ETL processing
  • Better utilization of CPU cores
  • Higher scalability

Partitioning is essential for big data volumes, large fact table loads, and high-performance ETL pipelines.

4. What is session partitioning and when is it used?

Session partitioning involves configuring multiple partitions at the session level so the Integration Service can run parallel threads and improve performance.

How it works:

  • The session divides the source data into partitions.
  • Each partition runs the mapping logic independently.
  • Results are combined before loading into targets.

When it is used:

  • Large-scale daily or hourly loads
  • Heavy transformations (Aggregator, Lookup)
  • Multiple CPUs available for parallelism
  • When the source supports partitioning (RDBMS, partitioned files)

Advantages:

  • Reduces session runtime
  • Increases pipeline throughput
  • Efficient resource usage

Caution:

Not recommended when:

  • Order of data must be preserved
  • Transformations like Rank or Normalizer cannot be partitioned

Session partitioning is a key performance tuning feature used in enterprise ETL systems.

5. What are types of caches used by Lookup transformation?

Lookup transformation uses cache memory to store lookup data for fast access during ETL processing. The following types of caches are used:

1. Static Cache

  • Created once at the start of the session
  • Not modified during processing
  • Default for lookup
  • Best for stable reference data

2. Dynamic Cache

  • Cache updated as new data is processed
  • Used in upsert scenarios and SCD implementations
  • Supports insert/update logic for cache entries

3. Persistent Cache

  • Cache stored on disk after session completion
  • Reused in future sessions
  • Avoids rebuilding large caches, improving performance

4. Shared Cache

  • Shared across multiple lookups in different mappings
  • Useful for consistent master data reference

5. Recache Option

  • Forces the cache to rebuild even if persistent cache exists

Lookup caching is essential for reducing database hits and improving ETL performance.

6. Explain persistent cache.

A persistent cache allows Informatica to store lookup cache data on disk after a session completes so it can be reused in subsequent ETL runs.

Why use persistent cache?

  • Reduces database load by avoiding repeated lookup table queries
  • Improves session performance
  • Useful when lookup source data changes infrequently
  • Ideal for reference or dimension tables

How it works:

  1. First session run → cache created & stored on disk.
  2. Subsequent runs → Informatica loads cache directly from disk.
  3. Only differences (if any) may need updating.

Where it's used:

  • Large lookup tables
  • Slow database connections
  • Situations requiring consistent lookup results across sessions

Persistent cache is a powerful optimization tool for improving lookup performance.

7. What is a workflow monitor?

Workflow Monitor is a client tool used to view, track, analyze, and troubleshoot workflows and sessions executed in PowerCenter.

Key Capabilities:

  • Shows execution status (Succeeded, Failed, Running, Queued)
  • Provides Gantt chart of task timelines
  • Displays session logs, error logs, and performance details
  • Allows stopping, restarting, or aborting workflows
  • Shows performance metrics:
    • Rows processed
    • Throughput
    • Cache usage
    • Transformation times

Benefits:

  • Helps developers debug ETL failures
  • Provides operational visibility
  • Useful for monitoring long-running data loads
  • Supports real-time performance tuning

Workflow Monitor is essential for ETL administrators and developers.

8. What is a command task?

A Command Task allows you to run operating system-level commands or scripts within a workflow.

Examples:

  • Copy or move files
  • Archive logs
  • Trigger shell or batch scripts
  • Call third-party utilities
  • Create FTP/SFTP operations
  • Clean temporary files

Usage Scenarios:

  • Preparing files before ETL starts
  • Running post-load cleanup scripts
  • Validating file arrival
  • Automating system tasks

Command Task improves automation and integration with external systems.

9. What is an event wait task?

An Event Wait Task pauses the workflow until a specified event occurs. Informatica supports:

  • Predefined (file-watch) events
  • User-defined events

How it works:

  • Workflow enters waiting state
  • Checks for event (e.g., file arrival, trigger signal)
  • Resumes when event is detected
  • Times out if event does not occur within the set interval

Use Cases:

  • Waiting for an upstream system to generate a file
  • Synchronizing workflows between multiple systems
  • Handling dependencies in batch processing

Event Wait Tasks ensure proper sequencing in data pipelines.

10. What is a pre-session and post-session command?

Pre-session and post-session commands allow execution of external scripts before or after a session runs.

Pre-session Command

Executed before the session starts.

Use Cases:

  • Clean or initialize staging tables
  • Create backup of target tables
  • Validate file existence
  • Set environment variables
  • Trigger database stored procedures

Post-session Command

Executed after the session completes.

Use Cases:

  • Archive processed files
  • Send notifications
  • Move bad files to an error folder
  • Execute cleanup routines
  • Run database indexing or maintenance scripts

Benefits:

  • Automates operational tasks
  • Integrates ETL with external systems
  • Reduces manual intervention

Pre/post session commands provide flexibility and automation for end-to-end ETL workflows.

11. Explain rank transformation with example use cases.

The Rank transformation is an active, connected transformation used to select the top or bottom records based on a specific measure or port. It works similarly to SQL’s ROW_NUMBER() or TOP N functionality.

Key Capabilities:

  • Can return top N or bottom N rows.
  • Allows ranking within groups using the Group By option.
  • Supports ties handling, meaning multiple rows can share the same rank.

How It Works:

  1. Identify a rank port (e.g., salary, sales).
  2. Set the number of ranks (e.g., top 5 employees).
  3. Optional: enable Group By (e.g., top 3 employees per department).

Example Use Cases:

  • Top 5 customers by revenue.
  • Top 10 selling products per month.
  • Employees with highest salaries.
  • Lowest-performing stores for analysis.
  • Top-3 students in each class.

Rank transformation is widely used in reporting, analytics, and business intelligence ETL scenarios.

12. When do you use union transformation?

A Union transformation is an active, connected transformation with multiple input groups that combines data from several input pipelines into a single output pipeline. It works similarly to SQL’s UNION ALL (not UNION).

When to Use:

  • When integrating homogeneous data from multiple sources with the same structure.
  • When consolidating data from:
    • Multiple regions
    • Multiple flat files
    • Multiple partitioned datasets
    • Historical + current data streams

Characteristics:

  • Does not remove duplicates (like UNION ALL).
  • All input groups must have the same metadata (same port names and data types).
  • Inputs can come from different sources or transformations.

Example:

Merging the following:

  • Sales_Q1
  • Sales_Q2
  • Sales_Q3
  • Sales_Q4

→ into a single unified dataset for annual reporting.

Union transformation is essential for data consolidation and ETL pipelines requiring dataset merging.

13. What is the difference between mapping variable and mapping parameter?

Both are metadata objects that store values, but they differ significantly in purpose and behavior.

Mapping Parameter

  • Value is constant during the entire session run.
  • Must be initialized using parameter file or default value.
  • Used for static filtering, database connections, date values, etc.
  • Cannot change within a session.

Example:
$$LoadDate = '2025-01-01'

Good for filtering data by date or environment-specific settings.

Mapping Variable

  • Value can change during session execution.
  • Stores last-run values for incremental loads.
  • Automatically saved in the repository.
  • Supports variable functions like:
    • SETVARIABLE()
    • SETMAXVARIABLE()
    • SETMINVARIABLE()
    • SETCOUNTVARIABLE()

Example:
$$MaxOrderID updates each session to track new rows.

Mapping variables are essential for incremental loads, watermarks, and capturing dynamic states.
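
A minimal incremental-load sketch, assuming an ORDER_ID column and a mapping variable $$MaxOrderID: the source filter picks up only new rows, while an Expression port advances the variable, whose final value is saved to the repository when the session succeeds.

Source filter: ORDER_ID > $$MaxOrderID
Expression port: SETMAXVARIABLE($$MaxOrderID, ORDER_ID)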

14. Explain incremental aggregation.

Incremental Aggregation improves performance by processing only newly added or changed data instead of recalculating aggregates from scratch.

How It Works:

  1. The first session run loads all data into the aggregator cache.
  2. The next run processes only new incoming records.
  3. Integration Service updates cached historical aggregate values.
  4. Updated cache is used for future calculations.

Benefits:

  • Dramatically reduces processing time.
  • Ideal for large fact tables.
  • Reduces memory and CPU usage.

Example Use Case:

Daily sales aggregation:

  • Day 1: Load all data → compute total revenue.
  • Day 2: Only process new sales records → update totals.

Incremental aggregation is commonly used in incremental ETL pipelines for data warehouses.

15. What are mapplet restrictions?

Mapplets are reusable logic containers, but they have certain limitations.

Major Restrictions:

  1. Cannot contain the following:
    • Source definitions
    • Target definitions
    • Normalizer transformation
    • XML transformations
    • COBOL sources
    • Pre or Post SQL commands
    • Transaction Control transformation
  2. Cannot contain transformations that generate multiple pipelines
    (like Joiner with heterogeneous inputs).
  3. Cannot use mapplets inside other mapplets (no nesting).
  4. Active transformations inside a mapplet may restrict usage of mapplet in certain mappings.
  5. Input/output ports must be properly defined to avoid conflicts.

Mapplets are great for reusability but are not suitable for every type of transformation flow.

16. How do you handle Slowly Changing Dimensions (SCD Type 1, 2, 3)?

SCDs manage how historical data is stored in dimension tables.

SCD Type 1 — Overwrite History

  • Updates existing records; no historical data preserved.
  • Use Update Strategy with DD_UPDATE.
  • Typically used for corrections, such as fixing a typo in a name.

Example:
Change customer address → overwrite old address.

SCD Type 2 — Preserve History

  • Creates new record for every change.
  • Maintains history with:
    • Surrogate key
    • Effective start/end dates
    • Current row flag

ETL Steps:

  1. Lookup customer on natural key.
  2. If data changed:
    • Expire old record
    • Insert new record
  3. Maintain dynamic lookup cache for high performance.

Used when historical correctness is required (e.g., sales reporting).

SCD Type 3 — Partial History

  • Stores current and previous values only.
  • Adds new columns such as:
    • PREV_ADDRESS
    • CURRENT_ADDRESS

Used when limited history is sufficient (e.g., tracking last job role).
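
As a simplified Type 2 sketch (port and column names are hypothetical), an Expression transformation can compare incoming values with the values returned by the dimension lookup, and an Update Strategy can act on the resulting flag:

CHANGE_FLAG = IIF(ISNULL(LKP_CUSTOMER_KEY), 'NEW', IIF(LKP_ADDRESS != ADDRESS, 'CHANGED', 'NOCHANGE'))
-- Update Strategy: IIF(CHANGE_FLAG = 'NOCHANGE', DD_REJECT, DD_INSERT)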

17. Explain dynamic lookup cache in detail.

A dynamic lookup cache is a lookup cache that updates itself as data flows through the pipeline, making the lookup and target table stay synchronized.

Key Features:

  1. Supports Insert, Update, Delete operations on cache.
  2. Avoids re-querying the target table repeatedly.
  3. Essential in SCD Type 2 and upsert logic.

How It Works:

  • Cache is built initially.
  • For each input row:
    1. Lookup performed
    2. If row not found → insert new row
    3. Cache is updated with new key
    4. If found → update logic applied
  • Cache always reflects the latest target state.

Benefits:

  • High performance
  • Eliminates stale cache issues
  • Supports real-time warehouse loads

Dynamic lookup cache is one of the most powerful features in Informatica for modern ETL strategies.

18. What is pushdown optimization and what are its levels?

Pushdown Optimization (PDO) pushes transformation logic to the underlying database to improve ETL performance.

Levels of Pushdown:

1. Source-Side Pushdown

  • Pushes SQL logic into the source database.
  • Source Qualifier filters, joins, and expressions are translated into SQL.
  • Reduces data movement from source → ETL server.

2. Target-Side Pushdown

  • Pushes INSERT, UPDATE, DELETE logic into the target database.
  • Uses SQL override with merge/update operations.
  • Useful for large fact table loads.

3. Full Pushdown (End-to-End)

  • Nearly all mapping logic is converted into SQL and executed inside the database.
  • Integration Service orchestrates execution but database executes transformations.

Benefits:

  • Minimizes network latency
  • Reduces CPU load on Integration Service
  • Uses DB engine’s power (indexes, optimizers, parallel execution)

Pushdown is ideal when working with large RDBMS sources and targets.
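
With full pushdown, a filter-plus-aggregate mapping may be collapsed into a single statement that the database executes, roughly along these lines (illustrative tables and columns):

INSERT INTO sales_summary (region, total_amount)
SELECT region, SUM(amount)
FROM sales_staging
WHERE sale_date >= DATE '2024-01-01'
GROUP BY region;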

19. What causes session failure?

A session may fail due to multiple reasons related to configuration, connectivity, transformation logic, or data issues.

Common Causes:

  1. Invalid source or target connections
  2. Lookup failures
  3. Insufficient cache memory
  4. SQL syntax errors
  5. File not found or file permissions
  6. Network connectivity failures
  7. Transformation errors (conversion, overflow, null mismatch)
  8. Database constraints (PK/FK/unique violations)
  9. Target database full or unreachable
  10. Incorrect parameter file path

Logs to Check:

  • Session log
  • Workflow log
  • Error log (.bad files)
  • Mapping logs (debugger)

A session fails when an error is severe enough to stop execution or when the configured error threshold is exceeded.

20. How do you debug mappings in Informatica?

Debugging is essential for identifying data flow issues, transformation errors, or incorrect logic.

Steps to Debug a Mapping:

1. Use the Mapping Debugger

  • Set breakpoints
  • Step through row-by-row execution
  • Inspect port values
  • Identify incorrect transformation logic

2. Check Session Logs

  • Review error messages
  • Identify failing transformation
  • Inspect SQL queries

3. Validate Mapping

  • Use Designer → Mapping → Validate to detect structural errors.

4. Test with Sample Data

  • Run subsets of data to isolate issues.
  • Use simplified sources or mock data.

5. Enable Verbose Data Logging

  • Captures transformation-level details.
  • Helps trace incorrect calculations.

6. Test Each Transformation Independently

  • Verify Lookup queries
  • Validate expressions
  • Test joins and filters

7. Use Reject Files

  • Analyze rejected rows and error messages.

Outcome:

Debugging ensures correctness, identifies performance issues, and validates business logic before production deployment.

21. What are workflow variables?

Workflow variables are dynamic values defined within a workflow that can change during execution and influence workflow behavior. They enable conditional logic, runtime decision-making, and state management in workflows.

Key Characteristics:

  • Workflow variables begin with $$ (e.g., $$FileExists, $$LoadFlag).
  • They store values that can be referenced in:
    • Decision tasks
    • Event wait tasks
    • Command tasks
    • E-mail notifications
  • Variables can be updated at runtime using Assignment tasks or task output values.

Types of Workflow Variables:

  1. Boolean Variables – True/False logic.
  2. String Variables – Hold dynamic text values.
  3. Numeric Variables – Used for counters, totals, or flags.
  4. Datetime Variables – Store timestamps for comparison.

Use Cases:

  • Checking if a file exists before running a session
  • Running different branches of workflow based on conditions
  • Tracking session run counts
  • Implementing restartability and recovery logic
  • Triggering downstream tasks only when required

Workflow variables make workflows intelligent and adaptable to runtime conditions.

22. How do you schedule workflows?

Workflows in Informatica can be scheduled to run automatically at specific intervals or based on events.

Methods to Schedule Workflows:

1. Using Workflow Manager Scheduler

  • The built-in scheduler allows you to configure:
    • Daily, weekly, monthly schedules
    • Custom calendar events
    • Start/End times
    • Repetition intervals

Steps:

  1. Open Workflow Manager
  2. Go to Workflow → Edit
  3. Select Schedule tab
  4. Set schedule and save

2. Using External Schedulers

Organizations often use enterprise schedulers like:

  • Control-M
  • Autosys
  • Crontab
  • Tidal
  • UC4

These tools invoke workflows using the pmcmd command-line utility.

3. Event-Based Scheduling

Workflows can be triggered when:

  • A file arrives
  • A server event is raised
  • Another workflow completes

Using Event Wait Task, workflows run based on external conditions.

4. Command-Line Scheduling (pmcmd)

Workflows can be triggered via shell or batch scripts using:

pmcmd startworkflow -sv IntegrationService -d Domain -u user -p pass -f FolderName wf_Name

Scheduling ensures timely data loads and automates end-to-end data integration pipelines.

23. How does Informatica handle deadlock situations?

Deadlocks occur when two or more processes compete for the same resources in a way that prevents progress.

In Informatica, deadlocks typically happen due to:

  • Multiple sessions loading the same target table
  • Contention for indexes or constraints
  • Long-running queries
  • Missing primary keys or incorrect update strategies

How Informatica Handles Deadlocks:

1. Automatic Retry Mechanism

When the database returns a deadlock error:

  • Integration Service retries the operation (default: 3 times)

2. Commit Intervals

Smaller commit intervals reduce transaction size and minimize lock durations.

3. Target Load Ordering

Ensures dependent tables are loaded in the correct sequence.

4. Constraint-Based Loading

Loads parent tables before child tables, reducing lock conflicts.

5. Index Management

  • Dropping indexes before load
  • Rebuilding them afterward
  • Reducing unnecessary constraints

6. Using Partitioning

Partitions help parallelize loads to reduce contention.

7. Database Tuning

  • Increasing lock wait time
  • Optimizing queries
  • Improving database configuration

Deadlocks are handled through retries, optimized workflows, and database-level strategies.

24. Explain differences between sorter and aggregator.

| Feature | Sorter Transformation | Aggregator Transformation |
|---|---|---|
| Function | Sorts data | Aggregates data (SUM, AVG, COUNT) |
| Type | Active | Active |
| Input Order | Produces sorted output | May require sorted input for performance |
| Cache Usage | Uses sort cache | Uses aggregation cache |
| Row Count | Does not reduce row count unless the distinct option is used | Usually reduces rows (grouping) |
| Operations | Sort, distinct | Group By, aggregate functions |
| Performance | Fast for small datasets | CPU/memory intensive for large datasets |

In Simple Terms:

  • Sorter organizes data.
  • Aggregator summarizes data.

Sometimes Sorter is used before Aggregator to reduce cache usage and improve performance.

25. What is a recovery strategy in Informatica?

A recovery strategy ensures that workflows or sessions can resume correctly after failure without restarting from the beginning.

Key Recovery Options:

1. Restart Workflow / Session

Informatica can restart:

  • From the failed task
  • From the beginning
  • From the last successful checkpoint

2. Recovery Strategy Settings

In Workflow Manager:

  • Fail Task and Continue Workflow
  • Restart Task
  • Stop on Error

3. Session-Level Recovery

Session settings allow:

  • Perform recovery – load resumes from last committed point
  • Recover from last checkpoint – minimizes reprocessing
  • Treat source rows as new – forces reload

4. Checkpointing

Checkpoints include:

  • Commit points
  • Cache states
  • Variable values

5. Using Workflow Variables

Variables help detect partial loads and re-run only required segments.

Recovery strategies improve system robustness and avoid reloading large volumes unnecessarily.

26. How do you improve performance in Informatica sessions?

Performance tuning involves optimizing mappings, sessions, databases, and system resources.

Techniques to Improve Performance:

1. Pushdown Optimization

Push transformation logic to database to reduce ETL workload.

2. Partitioning

Enables parallel processing using:

  • Hash
  • Key range
  • Round robin

3. Optimize Lookups

  • Use dynamic cache
  • Use persistent cache
  • Reduce number of lookup columns
  • Use indexes on lookup condition columns

4. Optimize Aggregator

  • Pre-sort input
  • Use sorted input option
  • Reduce group-by fields

5. Eliminate Unnecessary Transformations

Avoid:

  • Excess filters
  • Transformations that slow down pipeline
  • Data type conversions

6. SQL Overrides

Improve Source Qualifier performance with:

  • Filters
  • Joins
  • Optimized SQL

7. Increase Buffer and DTM Memory

Configurable in session properties.

8. Target Optimization

  • Use bulk loads
  • Drop indexes before load and recreate afterward
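
For example, the index handling above is often wired in as pre- and post-session SQL (index and table names are illustrative):

-- Pre-session SQL: drop the index so bulk inserts are not slowed down
DROP INDEX idx_fact_sales_cust;

-- Post-session SQL: rebuild it once the load completes
CREATE INDEX idx_fact_sales_cust ON fact_sales (customer_sk);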

Performance tuning is an iterative process involving mapping, database, and system-level optimizations.

27. What is a relational lookup vs flat file lookup?

Relational Lookup

  • Queries data from a database table.
  • Supports SQL overrides, joins, conditions.
  • Can use dynamic cache and persistent cache.
  • Faster when indexes are used.

Best for:

  • Dimension table lookups
  • Reference/master data stored in databases

Flat File Lookup

  • Lookup source is a flat file.
  • Does not support SQL overrides.
  • Entire file must be cached.
  • Limited to static or persistent cache (no dynamic cache).

Best for:

  • File-based reference data
  • Code lists provided by external vendors

Key Differences:

| Feature | Relational Lookup | Flat File Lookup |
|---|---|---|
| Source | Database | Flat file |
| SQL Override | Yes | No |
| Dynamic Cache | Yes | No |
| Performance | Faster with indexing | Slower for large files |
| Flexibility | High | Limited |

28. What is key range partitioning?

Key range partitioning is a partitioning strategy where data is divided based on specific ranges of key values.

How It Works:

  • Define partitions by ranges:
    • Partition 1: ID 1–1000
    • Partition 2: ID 1001–2000
    • Partition 3: ID 2001–3000

Each partition processes rows falling in its assigned range.

Use Cases:

  • Customer ID ranges
  • Date ranges (e.g., monthly partitions)
  • Transaction ID ranges
  • Order number ranges

Benefits:

  • Minimizes data skew
  • Provides predictable distribution
  • Ideal for sequential keys

Key range partitioning is effective for structured datasets with numeric or date-based keys.
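
In effect, each partition reads the source with its own range filter (column and boundaries illustrative):

SELECT * FROM orders WHERE order_id BETWEEN 1    AND 1000;  -- partition 1
SELECT * FROM orders WHERE order_id BETWEEN 1001 AND 2000;  -- partition 2
SELECT * FROM orders WHERE order_id BETWEEN 2001 AND 3000;  -- partition 3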

29. What is hash partitioning?

Hash partitioning distributes rows across partitions based on a hash function applied to one or more key fields.

How It Works:

  • Determine a partition key (e.g., Customer_ID).
  • The hash function ensures even distribution across partitions.

Benefits:

  • Avoids data skew
  • Suitable for random or non-sequential values
  • Good for parallel processing of large datasets

Use Cases:

  • Customer-based partitioning
  • Product or transaction keys
  • Scenarios where values are not naturally grouped

Hash partitioning helps achieve balanced workloads in large-scale ETL jobs.
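
Conceptually, the hash function maps every key value to one of N partitions; using MOD as a simple stand-in for the real hash (illustrative):

-- Each row is routed to partition number hash(customer_id) mod 4 (values 0-3)
SELECT customer_id, MOD(customer_id, 4) AS partition_no
FROM orders;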

30. Explain pipeline partitioning.

Pipeline partitioning improves performance by splitting the data flow into parallel pipelines, where each pipeline executes the same mapping logic independently.

How It Works:

  • Source data is divided into partitions.
  • Each partition flows through the same set of transformations.
  • All partitions run in parallel.
  • Output eventually merges at the target.

Benefits:

  • Major increase in throughput
  • Efficient use of multi-core CPUs
  • Reduces processing bottlenecks

Restrictions:

  • Some transformations do not support partitioning (Rank, Normalizer).
  • Requires careful management of cache and memory.

Pipeline partitioning is one of the most powerful features for optimizing large ETL pipelines.

31. What is source row-level testing?

Source row-level testing is a validation technique used to ensure that the ETL process correctly extracts, transforms, and loads every individual row from the source to the target without data loss or corruption.

Purpose:

  • To verify correctness of mappings at a granular level.
  • To ensure transformation logic (filters, joins, expressions) is applied correctly.
  • To confirm that no rows are unintentionally dropped or duplicated.

How It Works:

  1. Compare row counts between source and target.
  2. Validate sample row data before and after transformations.
  3. Trace incorrect rows using:
    • Verbose data logging
    • Debugger tool
    • Error/reject files
  4. Check SQL overrides for correct filter logic.
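
Typical validation queries for steps 1 and 2 (table names illustrative; MINUS is Oracle syntax, use EXCEPT on other databases):

-- Step 1: row-count reconciliation
SELECT COUNT(*) FROM src_orders;
SELECT COUNT(*) FROM tgt_orders;

-- Step 2: rows present in the source but missing from the target
SELECT order_id, amount FROM src_orders
MINUS
SELECT order_id, amount FROM tgt_orders;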

Use Cases:

  • Initial ETL pipeline development
  • Change request validations
  • Testing complex mappings (joins, aggregators, routers)
  • Ensuring data quality in production releases

Source row-level testing is crucial for ensuring mapping accuracy and data reliability.

32. Explain bottleneck identification in Informatica.

A bottleneck is any part of the ETL pipeline that slows down overall session performance. Identifying bottlenecks is essential for tuning and optimizing performance.

Common Bottleneck Areas:

1. Source Bottleneck

Occurs when:

  • Source query is slow
  • Table lacks indexes
  • Network latency exists
  • Large volumes are extracted

Fix:

  • Optimize SQL
  • Apply partitioning
  • Pushdown filtering to database

2. Target Bottleneck

Occurs when:

  • Target indexes slow down inserts
  • Constraints cause locking
  • Database is under heavy load

Fix:

  • Drop and recreate indexes
  • Use bulk load
  • Increase commit intervals

3. Lookup Bottleneck

Occurs when:

  • Lookup table is large
  • Cache is insufficient
  • No indexes on lookup keys

Fix:

  • Use persistent or dynamic cache
  • Reduce lookup ports

4. Transformation Bottleneck

Frequently caused by:

  • Aggregator (sort + grouping)
  • Rank (sort required)
  • Sorter (memory intensive)
  • Joiner (two pipelines processed)

Fix:

  • Use sorted input where applicable
  • Push join logic to database
  • Reduce group-by fields

Tools for Bottleneck Detection:

  • Session logs
  • Performance detail reports
  • Workflow Monitor throughput metrics

Effective bottleneck identification ensures high-performance ETL jobs.

33. What is metadata manager in Informatica?

Metadata Manager is a component of Informatica that provides metadata-centric visibility across the entire data environment.

Key Features:

  • Data Lineage:
    Shows source-to-target flow, including mapping rules.
  • Impact Analysis:
    Displays downstream impact of modifying a column or table.
  • Metadata Integration:
    Imports metadata from databases, Hadoop, ETL tools, and BI tools.
  • Metadata Repository:
    Stores metadata such as schemas, mappings, transformations, workflow dependencies.
  • Glossary & Business Dictionary:
    Helps define business terms and link them to technical assets.

Benefits:

  • Enables governance and compliance (GDPR, SOX).
  • Improves data quality by offering transparency.
  • Helps developers understand existing pipelines.
  • Useful for auditors and data governance teams.

Metadata Manager enables organizations to manage data assets efficiently and trace data movement end-to-end.

34. Explain indirect file loading.

Indirect file loading allows Informatica to load data from multiple flat files without explicitly listing each file in the session.

Instead of pointing to a single file, Informatica reads a file list that contains names of all data files.

How It Works:

  1. Create a master file (e.g., filelist.txt) that contains paths of all data files:
/data/sales_jan.csv
/data/sales_feb.csv
/data/sales_mar.csv
  2. In the session properties → Source → choose the Indirect option.
  3. Informatica loads each file listed in the file list sequentially.

Use Cases:

  • Daily batch files delivered in folders
  • Multiple vendor files with similar formats
  • Automating multi-file ingestion tasks

Advantages:

  • Reduces manual configuration
  • Automatically handles multiple input files
  • Simplifies maintenance

Indirect loading is essential when dealing with large-scale file-based ETL systems.

35. What is the role of Integration Service in handling transactions?

The Integration Service (IS) manages transaction control for ETL sessions to ensure data consistency and integrity.

How Integration Service Handles Transactions:

1. Transaction Boundaries

Defined by:

  • Commit interval
  • Transaction Control transformation
  • Target-based commit policy

2. Uses Commit and Rollback Operations

  • Commit: Saves data permanently to the target.
  • Rollback: Reverts data when errors occur.

3. Supports Row-level Error Handling

  • Reject files
  • Error tables
  • Session error thresholds

4. Manages Bulk vs Normal Loads

  • Bulk load disables logging
  • Normal load respects constraints and triggers

5. Maintains Transaction State

For recovery:

  • Tracks committed offsets
  • Supports session restartability
  • Logs transaction history

Integration Service ensures reliable, atomic, consistent, isolated, and durable (ACID) data loads.

36. Explain the concept of surrogate keys.

A surrogate key is an artificial and system-generated primary key used in data warehouse dimension tables.

Characteristics:

  • Has no business meaning
  • Usually sequential numbers (e.g., 1, 2, 3...)
  • Not exposed to the business users
  • Never changes
  • Independent of source system keys

Why Surrogate Keys Are Required:

  1. Natural keys may change over time — surrogate keys remain stable.
  2. They support SCD Type 2 by distinguishing historical versions.
  3. Ensure uniqueness across multiple source systems.
  4. Improve join performance due to small integer values.

Implementation in Informatica:

  • Generated using:
    • Sequence Generator transformation
    • Database sequences
    • Lookup on max surrogate key
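
A database-sequence approach looks like this (Oracle-style syntax; names are illustrative):

-- Sequence that supplies surrogate key values
CREATE SEQUENCE seq_customer_sk START WITH 1 INCREMENT BY 1;

-- The load assigns the surrogate key independently of the source's natural key
INSERT INTO dim_customer (customer_sk, customer_nk, customer_name)
VALUES (seq_customer_sk.NEXTVAL, :source_customer_id, :customer_name);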

Surrogate keys form the foundation of dimensional modeling in ETL.

37. What is code page compatibility?

Code page compatibility ensures that the character encodings used by the source, the target, and the Informatica environment are compatible, preventing data corruption.

Example Code Pages:

  • UTF-8
  • ASCII
  • ISO-8859
  • Unicode (UTF-16)

Why It Matters:

If a source is UTF-8 and target is ASCII, characters like “é”, “ø”, “ñ” may fail or get corrupted.

Compatibility Rules:

  • Source code page must be a subset of session code page.
  • Session code page must be compatible with target.
  • Integration Service converts data as needed.

Problems Without Compatibility:

  • Incorrect special characters
  • Data truncation
  • Load failures
  • Inconsistent sorting or filtering

Code page compatibility ensures global, multi-language, and Unicode-safe ETL operations.

38. Explain schema drift handling.

Schema drift refers to unexpected changes in source schema, such as:

  • New columns added
  • Columns removed
  • Data types modified
  • Column order changed

Informatica needs to handle schema drift to avoid failures during ETL.

How Informatica Handles Schema Drift:

1. Dynamic Schemas (IICS Cloud Data Integration)

Supports dynamic mapping tasks that auto-adjust to schema changes.

2. Field Rules & Port-Level Rules

Automatically add or drop ports during mapping execution.

3. Flat File Schema Drift Support

Can process new or missing columns using dynamic port definitions.

4. Error Handling

  • Capture unknown fields
  • Log schema mismatch errors

5. Manual Schema Updates

In PowerCenter, the Designer requires:

  • Re-importing changed source
  • Propagating changes to transformations
  • Updating mapping logic

Schema drift handling is essential in modern pipelines receiving schema-flexible data such as JSON, Kafka, or cloud streams.

39. What are session properties and why are they important?

Session properties define how a mapping is executed during runtime. They control almost every aspect of ETL execution.

Key Session Property Categories:

1. Source Properties

  • SQL overrides
  • Pre/post SQL
  • Partitioning
  • Reader properties

2. Target Properties

  • Load type (bulk/normal)
  • Commit interval
  • Constraint-based loading
  • Pre/post SQL

3. Transformation Properties

  • Lookup caches
  • Aggregator sorted input
  • Joiner master/detail settings

4. Error Handling

  • Row error thresholds
  • Bad file locations
  • Log levels

5. Performance Settings

  • Buffer memory
  • DTM buffer size
  • Partition numbers

Session properties determine:

  • How data is read
  • How transformations behave
  • How data is loaded
  • How errors are handled
  • Overall performance and stability

They are critical for successful ETL execution.

40. How does Data Masking work in Informatica?

Data Masking is used to protect sensitive information during development, testing, and analytics by transforming data into a non-identifiable format while preserving usability.

How Informatica Data Masking Works:

1. Static Data Masking

Applies masking rules at rest:

  • Masks data stored in databases or files
  • Used for test or development environments

2. Dynamic Data Masking

Applies masking in real time:

  • Data masked during retrieval
  • Used in scenarios where users should not see sensitive data

Masking Techniques:

  1. Substitution – Replace names, addresses with realistic but fake data
  2. Shuffling – Shuffle values across rows
  3. Blurring – Apply random noise (e.g., vary a salary by ±5%)
  4. Encryption – Encode data
  5. Nulling or Deletion – Remove sensitive values
  6. Tokenization – Replace sensitive values with tokens
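
As a plain-SQL illustration of substitution and nulling applied to a test copy (not Informatica's masking engine; names and the Oracle-style SUBSTR are illustrative):

-- Keep only the last 4 digits of the card number and remove the SSN
UPDATE customer_test_copy
SET card_number = '************' || SUBSTR(card_number, -4),
    ssn         = NULL;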

Use Cases:

  • Masking credit card numbers
  • Protecting PII (names, addresses, SSNs)
  • Anonymizing healthcare data
  • Complying with GDPR, HIPAA, PCI-DSS

Data masking ensures security, privacy, and regulatory compliance in data handling.
