Informatica Interview Questions and Answers

Find 100+ Informatica interview questions and answers to assess candidates’ skills in ETL development, mappings, workflows, data integration, and performance optimization.
By WeCP Team

As organizations manage complex data integration, migration, and warehousing needs, recruiters must identify Informatica professionals who can build reliable, scalable, and high-performance ETL pipelines. Informatica is widely used in enterprise data warehouses, analytics platforms, and large-scale data integration projects across industries.

This resource, "100+ Informatica Interview Questions and Answers," is tailored for recruiters to simplify the evaluation process. It covers a wide range of topics—from Informatica fundamentals to advanced ETL design and optimization, including PowerCenter architecture, mappings, workflows, and performance tuning.

Whether you're hiring Informatica Developers, ETL Engineers, Data Engineers, or BI Developers, this guide enables you to assess a candidate’s:

  • Core Informatica Knowledge: PowerCenter architecture, source/target definitions, mappings, mapplets, workflows, sessions, and transformations.
  • Advanced Skills: Performance tuning, partitioning, pushdown optimization, error handling, reusable objects, and handling slowly changing dimensions (SCD).
  • Real-World Proficiency: Designing end-to-end ETL pipelines, integrating multiple data sources, ensuring data quality, and maintaining enterprise-grade data integration solutions.

For a streamlined assessment process, consider platforms like WeCP, which allow you to:

  • Create customized Informatica assessments tailored to data warehousing and enterprise integration roles.
  • Include hands-on tasks such as building mappings, debugging workflows, or optimizing ETL performance.
  • Proctor exams remotely while ensuring integrity.
  • Evaluate results with AI-driven analysis for faster, more accurate decision-making.

Save time, enhance your hiring process, and confidently hire Informatica professionals who can deliver scalable, reliable, and production-ready data integration solutions from day one.

Informatica Interview Questions

Informatica – Beginner (1–40)

  1. What is Informatica PowerCenter?
  2. What is an ETL process?
  3. What are the key components of Informatica PowerCenter?
  4. Define a repository in Informatica.
  5. What is a mapping in Informatica?
  6. What is a session in Informatica?
  7. What is a workflow?
  8. What is a transformation?
  9. Explain the Source Qualifier transformation.
  10. What is a lookup transformation?
  11. What are connected vs unconnected lookups?
  12. What is a filter transformation used for?
  13. Define aggregator transformation.
  14. What is the difference between aggregator and expression?
  15. What is a joiner transformation?
  16. Explain master and detail relationships in Joiner.
  17. What are different join types in Joiner transformation?
  18. What is a sequence generator?
  19. What is a router transformation?
  20. What is the difference between filter and router?
  21. What is a sorter transformation?
  22. Explain update strategy transformation.
  23. What is a parameter file in Informatica?
  24. What is a repository server?
  25. What is metadata?
  26. What is a target load plan?
  27. Explain the concept of reusable transformations.
  28. What is a shortcut?
  29. What is a mapplet?
  30. What is a domain in Informatica?
  31. What is a node?
  32. What is a service manager?
  33. What is a folder in Informatica?
  34. What is the difference between static and dynamic cache in Lookup?
  35. Explain data cleansing.
  36. What is a database connection in Informatica?
  37. What is normalized vs denormalized data?
  38. What is a flat file source?
  39. What is pushdown optimization?
  40. What are bad records and how does Informatica handle them?

Informatica – Intermediate (1–40)

  1. Explain the architecture of Informatica PowerCenter.
  2. How does the Integration Service work?
  3. Explain partitioning in Informatica.
  4. What is session partitioning and when is it used?
  5. What are types of caches used by Lookup transformation?
  6. Explain persistent cache.
  7. What is a workflow monitor?
  8. What is a command task?
  9. What is an event wait task?
  10. What is a pre-session and post-session command?
  11. Explain rank transformation with example use cases.
  12. When do you use union transformation?
  13. What is the difference between mapping variable and mapping parameter?
  14. Explain incremental aggregation.
  15. What are mapplet restrictions?
  16. How do you handle slowly changing dimensions (SCD Type 1, 2, 3)?
  17. Explain dynamic lookup cache in detail.
  18. What is pushdown optimization and what are its levels?
  19. What causes session failure?
  20. How do you debug mappings in Informatica?
  21. What are workflow variables?
  22. How do you schedule workflows?
  23. How does Informatica handle deadlock situations?
  24. Explain differences between sorter and aggregator.
  25. What is a recovery strategy in Informatica?
  26. How do you improve performance in Informatica sessions?
  27. What is a relational lookup vs flat file lookup?
  28. What is key range partitioning?
  29. What is hash partitioning?
  30. Explain pipeline partitioning.
  31. What is source row-level testing?
  32. Explain bottleneck identification in Informatica.
  33. What is metadata manager in Informatica?
  34. Explain indirect file loading.
  35. What is the role of Integration Service in handling transactions?
  36. Explain the concept of surrogate keys.
  37. What is code page compatibility?
  38. Explain schema drift handling.
  39. What are session properties and why are they important?
  40. How does Data Masking work in Informatica?

Informatica – Experienced (1–40)

  1. Explain PowerCenter domain architecture in enterprise environments.
  2. How do you design a high-performance ETL solution in Informatica?
  3. Explain the internal working of dynamic lookup cache.
  4. How does Informatica handle CDC (Change Data Capture)?
  5. Explain the design considerations for large fact table loading.
  6. How do you implement error logging frameworks?
  7. How do you optimize mappings with multiple lookups?
  8. Explain session and mapping-level pushdown optimization internals.
  9. Describe advanced partitioning strategies and when to avoid them.
  10. How do you tune aggregator-heavy mappings?
  11. What is session recovery and restartability design?
  12. How do you implement deduplication frameworks?
  13. Explain real-time processing using Informatica Real Time Edition.
  14. How do you handle schema evolution in production pipelines?
  15. What is the difference between PowerCenter and IICS?
  16. How do you design for multi-node grid environments?
  17. Explain the metadata-driven ETL framework.
  18. What security considerations exist for Informatica repositories?
  19. How do you manage repository upgrades and migrations?
  20. What is deployment group and how is it used?
  21. How do you integrate Informatica with Big Data platforms?
  22. Explain the internal working of Integration Service thread pools.
  23. Discuss load-balancing strategies in Informatica domain.
  24. How do you design ETL for high-volume CDC replication?
  25. What is pushdown failure recovery strategy?
  26. Explain version control best practices in Informatica.
  27. What are advanced SCD techniques beyond Type 1,2,3?
  28. How do you build reusable ETL templates?
  29. How do you implement audit/fact-less fact frameworks?
  30. Explain error/exception partitioning.
  31. How do you handle massive file ingestion (>1 TB)?
  32. Explain asynchronous processing in workflows.
  33. How do you reduce lookup cache memory footprint?
  34. What is enterprise data masking architecture in Informatica?
  35. How do you tune workflows containing many sessions?
  36. What is high availability mode in Informatica?
  37. How do you implement row-level security in ETL flows?
  38. Explain workflow recovery from a power outage scenario.
  39. What is a custom transformation and when is it required?
  40. How do you enforce data governance standards in Informatica ETL?

Informatica Interview Questions and Answers

Beginner (Q&A)

1. What is Informatica PowerCenter?

Informatica PowerCenter is a widely used enterprise ETL (Extract, Transform, Load) and data integration platform designed to move, transform, and manage large volumes of data across heterogeneous systems. It enables organizations to extract data from various sources such as relational databases, flat files, cloud platforms, mainframes, and applications, and then perform complex business transformations before loading the data into target systems like data warehouses, data marts, or operational databases.

PowerCenter uses a metadata-driven architecture, which means all design objects (mappings, workflows, transformations) are stored and managed in a common repository. This ensures consistency, version control, reusability, and easy maintenance.

It provides high scalability, fault tolerance, and performance tuning options, making it suitable for large-scale enterprise data integration needs. PowerCenter supports batch processing, real-time data integration, parallel execution, and sophisticated debugging capabilities, which help in creating highly reliable and optimized ETL pipelines.

2. What is an ETL process?

ETL stands for Extract, Transform, Load, which is a fundamental process in data warehousing and data integration projects.

  1. Extract – Data is collected from various source systems such as databases, ERP systems, cloud apps, CSV files, APIs, or mainframes. Extraction focuses on pulling the necessary data while ensuring minimal impact on the source systems.
  2. Transform – The extracted data is cleansed, validated, enriched, standardized, and manipulated according to business rules. Transformations can include:
    • Filtering and sorting
    • Aggregating data
    • Converting data types
    • Applying business logic
    • Handling nulls and errors
    • Merging data from multiple sources
  3. Load – The transformed data is then loaded into target systems such as data warehouses, operational systems, analytical platforms, or downstream applications. The loading can be full, incremental, or real-time depending on business needs.

ETL ensures that organizations have accurate, consistent, and up-to-date data for reporting, analytics, and decision-making.

3. What are the key components of Informatica PowerCenter?

Informatica PowerCenter architecture consists of several core components, each performing a specific role in the ETL lifecycle:

  1. Repository
    Stores all metadata such as mappings, workflows, sessions, configurations, and transformation rules.
  2. Repository Server
    Manages connections to the repository and handles metadata transactions.
  3. Integration Service
    Executes workflows and sessions. It performs extraction, transformation, and loading operations during runtime.
  4. Repository Service
    Enables client tools to read and write metadata to the repository.
  5. PowerCenter Client Tools
    Includes:
    • Designer – used to create mappings and transformations.
    • Workflow Manager – used to create and manage sessions and workflows.
    • Workflow Monitor – used to track workflow execution and performance.
    • Repository Manager – used to manage repository objects.
  6. Domain & Node Architecture
    The domain is the administrative unit; nodes are logical servers within it.

Together, these components allow users to design, execute, monitor, and manage complete ETL processes.

4. Define a repository in Informatica.

The Informatica repository is a centralized metadata storage location where all ETL-related objects and configurations are stored. It is typically hosted on a relational database such as Oracle, SQL Server, or DB2.

The repository stores:

  • Mappings and mapplets
  • Sessions and workflows
  • Transformation logic
  • Source and target definitions
  • User permissions and folder structures
  • Version history and deployment metadata

The repository is crucial because PowerCenter is a metadata-driven system, meaning all ETL operations rely on metadata definitions rather than hard-coded logic.

Repositories can be:

  • Global repository – for sharing common objects across the organization
  • Local repository – for project-specific development
  • Versioned repository – supports check-in/check-out, version control, and multi-user development

The repository ensures consistency, reusability, and centralized control for data integration processes.

5. What is a mapping in Informatica?

A mapping is a visual representation of how data moves from sources to targets along with the transformation rules applied in between. It serves as the core ETL logic in Informatica PowerCenter.

A mapping includes:

  • Source definitions
  • Target definitions
  • Transformations (filter, lookup, joiner, router, etc.)
  • Connections between transformations
  • Business rules and expressions

When a mapping is executed, the Integration Service reads the source data, applies the logic defined in the mapping, and loads the final processed data to the target.

Mappings enable:

  • Rule-based data processing
  • Reusable logic development
  • Complex transformation flows
  • Handling multiple source and target systems

Mappings form the heart of all data transformation workflows in Informatica.

6. What is a session in Informatica?

A session is a task that executes a mapping. It contains configuration details required to extract, transform, and load data based on the mapping logic.

A session includes:

  • Source and target connection details
  • Load properties (insert/update/delete settings)
  • Error handling configuration
  • Partitioning and performance settings
  • Pre-session and post-session commands

Sessions run under the control of the Integration Service, which allocates system resources, manages data movement, and logs execution statistics.

A mapping cannot run without a session, making sessions a crucial runtime object in Informatica.

7. What is a workflow?

A workflow is a container that defines the sequence and execution order of tasks such as sessions, email notifications, command tasks, event waits, and decision tasks. It orchestrates the complete ETL process.

Workflows allow you to:

  • Run multiple sessions in parallel or sequence
  • Define dependencies between tasks
  • Trigger conditional execution
  • Handle pre/post-processing activities
  • Schedule automated ETL jobs

The Workflow Manager is used to design workflows, while the Workflow Monitor is used to track their execution in real time.

Workflows provide a structured approach to automate and manage complex data pipelines.

8. What is a transformation?

A transformation is an object in Informatica mapping that modifies, filters, or routes data. Transformations allow developers to define business rules applied during data processing.

Transformations can:

  • Clean data
  • Aggregate data
  • Join datasets
  • Lookup reference data
  • Split data into multiple streams
  • Calculate derived values
  • Filter unwanted records

There are two categories:

  1. Active transformations – change the number of rows (e.g., filter, aggregator).
  2. Passive transformations – do not change the number of rows (e.g., expression).

Transformations are the building blocks of the ETL logic inside a mapping.

9. Explain the Source Qualifier transformation.

The Source Qualifier transformation (SQ) is an active, connected transformation automatically created for relational sources. It represents the rows that the Integration Service retrieves from the source database.

Key functions:

  • Acts as a bridge between source definition and the mapping pipeline.
  • Applies source filters, which translate into SQL WHERE clauses to reduce data volume.
  • Allows sorting data at the database level using ORDER BY.
  • Enables joining multiple relational sources using user-defined joins.
  • Generates the SQL query issued to the source database and supports SQL overrides for custom queries.

By pushing filtering, sorting, and joining operations to the database, Source Qualifier improves ETL performance and reduces unnecessary data movement.
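
For illustration (table and column names are hypothetical), a source filter of DEPT_ID = 10 combined with a sort on HIRE_DATE would cause the Integration Service to issue SQL along these lines:

SELECT EMP_ID, EMP_NAME, DEPT_ID, HIRE_DATE
FROM EMPLOYEES
WHERE DEPT_ID = 10
ORDER BY HIRE_DATE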

10. What is a lookup transformation?

A lookup transformation is used to retrieve related or reference data from a lookup table, file, or database during ETL processing. It is commonly used for data validation, dimension key lookups, deduplication, and enrichment.

Features:

  • Supports connected and unconnected modes.
  • Can use static, dynamic, or persistent cache.
  • Can return single or multiple columns.
  • Can be configured to return default values for missing records.
  • Allows SQL overrides to improve flexibility and performance.

Lookups are essential for operations like:

  • Fetching surrogate keys in data warehouses
  • Validating foreign key relationships
  • Performing reference data checks
  • Comparing current values with historical data

It significantly reduces database load by caching lookup data during runtime.

11. What are connected vs unconnected lookups?

A lookup transformation can be used in two distinct modes: connected and unconnected. Both retrieve reference data, but they differ in design, usage, and performance.

Connected Lookup

  • It is directly connected to the mapping data flow using input/output ports.
  • Executes for every incoming row.
  • Can return multiple columns as output.
  • Commonly used for:
    • Surrogate key lookups
    • Validations needing multiple fields
    • Real-time transformations

Advantages:

  • Easy to design and debug
  • Can return several output fields
  • Supports dynamic cache for SCD Type 2 logic

Disadvantages:

  • Higher processing overhead when executed per row

Unconnected Lookup

  • It is not connected to the main pipeline.
  • Called explicitly from an Expression (or other transformation) using the :LKP.lookup_name() syntax.
  • Can return only one column.
  • Runs only when explicitly invoked.

Advantages:

  • Better performance when lookup needs to be executed conditionally
  • Cleaner mapping design

Disadvantages:

  • Cannot return multiple columns
  • Harder to debug compared to connected lookups

Summary:
Connected lookups run automatically per row and return multiple columns, whereas unconnected lookups run only when needed and return a single value.
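
For example, a hypothetical unconnected lookup named lkp_Customer_Dim can be invoked conditionally from an Expression transformation so that the lookup runs only when the key is missing:

IIF(ISNULL(CUSTOMER_KEY), :LKP.lkp_Customer_Dim(CUSTOMER_ID), CUSTOMER_KEY)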

12. What is a filter transformation used for?

A Filter transformation is an active, connected transformation used to remove unwanted rows from the data pipeline based on a condition. It works similarly to the WHERE clause in SQL.

Key characteristics:

  • Only rows that meet the filter condition are passed downstream.
  • Rows failing the condition are dropped and not processed further.
  • It improves performance by reducing row volume early in the pipeline.

Example use cases:

  • Keeping only active customers
  • Filtering transactions above a certain threshold
  • Excluding null or invalid values

Filter conditions are Boolean expressions such as:

SALARY > 50000 AND DEPT_ID = 10

Filters help ensure only relevant and clean data flows to the target.

13. Define aggregator transformation.

The Aggregator transformation is an active, connected transformation used to perform aggregate calculations such as:

  • SUM
  • AVG
  • COUNT
  • MIN / MAX
  • MEDIAN
  • FIRST / LAST

It functions similarly to SQL's GROUP BY clause.

Key features:

  • Supports group-by ports for summarization
  • Can aggregate large datasets
  • Uses cache memory for storing intermediate results
  • Supports conditional aggregations

Example use cases:

  • Calculating total sales per region
  • Finding highest salary in each department
  • Counting customers per city

Because it processes data group by group, the aggregator often requires sorted input to improve performance and minimize memory usage.
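
For example, with REGION defined as a group-by port, aggregate output ports might be defined as follows (port and column names are illustrative; Informatica aggregate functions also accept an optional filter condition):

TOTAL_SALES = SUM(SALES)
VALID_ORDER_COUNT = COUNT(ORDER_ID, SALES > 0)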

14. What is the difference between aggregator and expression?

Although both transformations manipulate data, they serve different purposes.

Aggregator Transformation

  • Performs aggregate calculations (SUM, AVG, COUNT, MIN, MAX).
  • Processes data group-wise.
  • Requires cache memory.
  • Is active: changes the number of rows (e.g., multiple rows become one).

Expression Transformation

  • Performs row-level calculations such as string manipulation, arithmetic, date conversion.
  • Does not perform aggregation.
  • Does not change the number of rows.
  • Is passive: all input rows are passed through.

Example:

  • Use aggregator to calculate total sales for each region.
  • Use expression to calculate a 10% discount for each row.

15. What is a joiner transformation?

A Joiner transformation allows combining data from two heterogeneous sources (e.g., flat files, relational tables) that cannot be joined using Source Qualifier.

It is an active, connected transformation that performs joins at the mapping level.

Key capabilities:

  • Join data from different source types
  • Join based on conditions (e.g., customer_id = cust_id)
  • Supports multiple join types (normal, master outer, detail outer, full outer)
  • Useful when joining data from sources residing on different systems

Joiner is essential when:

  • Sources come from different databases
  • One source is a flat file
  • You need more control over join logic

16. Explain master and detail relationships in Joiner.

In a Joiner transformation, one input pipeline is chosen as master and the other as detail. This distinction is important for performance and correct join functionality.

Master Pipeline

  • Rows are read first and cached.
  • Should ideally contain the smaller dataset to reduce cache size and improve speed.
  • In a normal join, master rows that do not find a match are discarded.

Detail Pipeline

  • Rows are streamed and matched against cached master records.

Choosing the correct master table is critical:

  • Placing a smaller dataset as master reduces memory load.
  • Reduces disk spill-over.
  • Improves join execution time significantly.

Example:

  • If the customer dataset is small and the transaction dataset is large, CUSTOMER should be the master.

17. What are different join types in Joiner transformation?

Joiner supports four types of joins, analogous to SQL joins:

  1. Normal Join
    • Returns only rows where the join condition matches.
    • Equivalent to SQL INNER JOIN.
  2. Master Outer Join
    • Keeps all rows from the detail pipeline and only the matching rows from the master pipeline.
  3. Detail Outer Join
    • Keeps all rows from the master pipeline and only the matching rows from the detail pipeline.
  4. Full Outer Join
    • Returns all matching and non-matching rows from both pipelines.

Uses:

  • Normal join → common matching records
  • Master or detail outer join → keep all records from one pipeline while enriching them with the other
  • Full outer join → combine unmatched data for reporting or audits

18. What is a sequence generator?

A Sequence Generator is a passive transformation that generates unique numeric values (usually surrogate keys) for target rows.

It produces:

  • NEXTVAL → next number in sequence
  • CURRVAL → current number

You can configure:

  • Start value
  • Increment value
  • Cycle or no-cycle
  • Number of cached values

Common use cases:

  • Generating primary keys
  • Creating batch numbers
  • Producing unique sequence values in data warehouses

Sequence Generator ensures safe and consistent ID creation without relying on database sequences.

19. What is a router transformation?

A Router transformation is an active transformation used to route data into multiple groups based on different conditions. It works like an advanced version of the Filter transformation with multiple outputs.

Features:

  • Supports multiple filter conditions simultaneously
  • Has default group for rows not matching any condition
  • Helps avoid using multiple Filter transformations

Example:

  • Sending high-value, medium-value, and low-value transactions to separate targets
  • Routing data to different workflows based on region or category

Router improves clarity and reduces complexity in mapping design.
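
For instance, to split transactions by value, the Router groups could use conditions like the following (thresholds are illustrative), with unmatched rows falling into the default group:

HIGH_VALUE: TXN_AMOUNT > 100000
MEDIUM_VALUE: TXN_AMOUNT > 10000 AND TXN_AMOUNT <= 100000
LOW_VALUE: TXN_AMOUNT <= 10000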

20. What is the difference between filter and router?

Both transformations filter data, but they differ significantly.

Filter Transformation

  • Has only one output.
  • Rows not matching the filter condition are dropped.
  • Cannot send data to multiple destinations.

Router Transformation

  • Has multiple output groups based on different conditions.
  • Rows can be split into logical groups.
  • Includes a default group.
  • More flexible and avoids multiple filter transformations.

Example:

  • Use Filter to keep only ACTIVE customers.
  • Use Router to split customers into ACTIVE, INACTIVE, and HIGH-VALUE groups.

21. What is a sorter transformation?

A Sorter transformation is an active, connected transformation used to sort data in ascending or descending order based on one or more ports. It functions similarly to an SQL ORDER BY clause but performs the sort operation inside the Informatica pipeline instead of pushing it to the database.

Key features:

  • Supports sorting on multiple columns
  • Can perform case-sensitive or case-insensitive sorting
  • Allows sorting in ascending or descending order
  • Uses cache memory and may write temporary files if the dataset is large
  • Can be configured as a distinct sorter to remove duplicate rows

Example use case:

  • Sorting sales data by region and date
  • Ordering employees by salary
  • Preparing sorted input for downstream transformations like Aggregator (to improve performance)

Because sorting is memory-intensive, Informatica recommends using database-level sorting when possible. But when sources cannot push down sorting (e.g., flat files), Sorter is essential.

22. Explain update strategy transformation.

An Update Strategy transformation is an active transformation used to define how rows should be treated when loaded into a target. It allows you to specify whether each row should be:

  • Inserted
  • Updated
  • Deleted
  • Rejected

It uses the built-in constants DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT to flag each row.

Example logic:

IIF(NEW_FLAG = 'Y', DD_INSERT, DD_UPDATE)

Common use cases:

  • Managing Slowly Changing Dimensions (SCD)
  • Updating existing records in data warehouses
  • Rejecting invalid data
  • Applying custom business logic for load behavior

When used correctly, Update Strategy ensures that the ETL pipeline modifies target data accurately according to business rules.

23. What is a parameter file in Informatica?

A parameter file is a plain text file that defines parameters and variables used in mappings, workflows, and sessions. These values can be changed without modifying the original ETL objects.

A parameter file can include:

  • Mapping parameters
  • Mapping variables
  • Session parameters
  • Workflow variables

Format example:

[Global]
$$LoadDate=2024-06-01

[FolderName.WF:wf_LoadSales.ST:s_SalesSession]
$DBConnectionSales=Sales_DB
$$CountryCode=US

Benefits:

  • Promotes reusability and flexibility
  • Separates configuration from logic
  • Helps manage different environments (DEV, QA, PROD)
  • Allows runtime customization of ETL jobs

Parameter files make ETL pipelines more maintainable and scalable.
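
For example, assuming the mapping declares $$LoadDate as a mapping parameter and the source provides an ORDER_DATE column, the value supplied in the parameter file can be referenced in a Filter transformation condition such as:

ORDER_DATE >= TO_DATE('$$LoadDate', 'YYYY-MM-DD')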

24. What is a repository server?

The Repository Server (in older versions of Informatica) or Repository Service (in modern versions) manages interactions between client tools and the repository database.

It performs:

  • Read/write operations to the metadata repository
  • Authentication and authorization
  • Version control and object locking
  • Managing multi-user access
  • Repository backup and recovery

The Repository Service acts as a middleware layer ensuring that designers, workflow developers, and administrators can safely modify or retrieve metadata.

This server plays a foundational role because Informatica is a metadata-driven ETL tool, and without a functioning repository server, development cannot progress.

25. What is metadata?

Metadata is "data about data." In Informatica, it refers to all design-time and runtime information that defines ETL processes.

Examples of metadata include:

  • Source and target definitions
  • Mapping design, transformations, and logic
  • Workflow execution rules
  • Session configurations
  • User permissions
  • Data lineage and impact analysis details

Metadata helps the Integration Service understand:

  • What data to extract
  • How to transform it
  • Where to load it

Metadata is stored in the Informatica Repository, making PowerCenter a metadata-driven architecture. It improves consistency, traceability, and maintainability of all ETL operations.

26. What is a target load plan?

A target load plan specifies the order and method in which Informatica loads data into target tables when multiple targets exist in a mapping.

It governs:

  • Task execution sequence
  • Dependencies between targets
  • Performance and transactional behavior
  • Handling parent-child table relationships

For example:

  • Parent tables should load before child tables (to maintain referential integrity)
  • Fact tables may load after dimension tables
  • Landing tables may load before staging tables

In mappings with multiple target tables, Informatica allows defining:

  • Normal load order
  • Transaction control order
  • Constraint-based load order

Choosing the correct load plan ensures accurate, consistent, and efficient data loading.

27. Explain the concept of reusable transformations.

A reusable transformation is a transformation created once and saved in the repository for use in multiple mappings.

Benefits:

  • Promotes consistency and standardization
  • Reduces development time
  • Makes maintenance easier (one change reflects everywhere it is used)

Examples:

  • A reusable lookup for customer dimension key retrieval
  • A reusable expression for data cleansing logic
  • A reusable filter for active records

Reusable transformations are stored at the folder level and can be dragged into any mapping.

Difference from non-reusable transformation:

  • Non-reusable exists only within one mapping
  • Reusable can be shared across several mappings

Reusable objects improve modularity and reusability in ETL design.

28. What is a shortcut?

A shortcut in Informatica is a reference or pointer to an existing object in another folder. The actual object remains in the shared or global folder, but shortcuts allow access without duplication.

Shortcuts can be created for:

  • Sources
  • Targets
  • Transformations
  • Mapplets
  • Mappings

Benefits:

  • Avoids duplicating metadata
  • Ensures consistent use of shared objects
  • Simplifies maintenance—changes to the original object propagate automatically
  • Promotes cross-team development

Shortcuts support efficient multi-project architecture where objects are centrally managed.

29. What is a mapplet?

A mapplet is a reusable object that contains a set of transformations grouped together to perform a specific logic. It allows developers to encapsulate complex transformation logic into a single reusable component.

Uses:

  • Data cleansing routines
  • Standard validation logic
  • Reusable calculation modules
  • Dimension lookup logic

Characteristics:

  • Contains multiple transformations but no target definition
  • Can be invoked in multiple mappings
  • Supports both active and passive transformations

Mapplets help reduce complexity and increase productivity by modularizing commonly used ETL logic.

30. What is a domain in Informatica?

A domain is the highest-level administrative entity in Informatica. It is the logical container that manages all services and nodes in the Informatica platform.

A domain includes:

  • Nodes – physical servers that run services
  • Service Manager – manages all domain-level operations
  • Application services such as:
    • Repository Service
    • Integration Service
    • Reporting Service
    • SAP BW Service

Functions of a domain:

  • Authentication and security
  • Service configuration and monitoring
  • Load balancing and failover
  • Resource management
  • Centralized administration

Domains form the backbone of Informatica's distributed, scalable, enterprise-grade architecture.

31. What is a node?

A node in Informatica is a logical representation of a physical server where Informatica services run. It is part of the Informatica domain architecture and forms the execution environment for application services.

There are two types of nodes:

  1. Gateway Node
    • Acts as the entry point to the domain.
    • Handles client requests, authentication, and routing.
    • Can host application services as well.
  2. Worker Node
    • Primarily responsible for running application services such as:
      • Integration Service
      • Repository Service
      • Reporting Service
    • Used for load balancing and scaling.

Key functions of a node:

  • Executes ETL workloads
  • Hosts services in high-availability clusters
  • Manages service failover
  • Supports distributed execution and scalability

Nodes make Informatica a distributed and high-performance ETL platform capable of handling enterprise-level workloads.

32. What is a service manager?

The Service Manager is a core component of the Informatica domain responsible for managing and controlling all domain services. It runs inside the domain and ensures that all administrative and operational tasks are handled efficiently.

Key responsibilities of the Service Manager:

  • Authentication & Authorization:
    Validates users through domain security policies.
  • Service Lifecycle Management:
    Starts, stops, restarts, and monitors all application services.
  • Heartbeat & Health Monitoring:
    Continuously checks the status of nodes and services.
  • Metadata and Configuration Management:
    Stores configuration details of the domain, nodes, and services.
  • Load Balancing & Failover:
    Distributes workloads across multiple nodes and ensures high availability.

Simply put, the Service Manager functions as the brain of the domain, coordinating communication, service orchestration, and administrative tasks.

33. What is a folder in Informatica?

A folder in Informatica is a logical container used to organize and manage ETL objects within the repository. It groups related metadata objects for better project structure and security.

Folders can contain:

  • Mappings
  • Mapplets
  • Sessions
  • Workflows
  • Transformations
  • Source/target definitions

Key benefits of folders:

  • Allow team-based development
  • Maintain project boundaries
  • Support permissions and access control
  • Facilitate object reuse across projects
  • Enable version control for specific teams

Folders help maintain a clean, modular, and secure repository structure, allowing multiple development teams to work independently.

34. What is the difference between static and dynamic cache in Lookup?

Lookups use cache memory to speed up reference data retrieval. Informatica supports static and dynamic lookup caches.

Static Cache

  • Cache is created once at the beginning of the session.
  • Contents do not change as rows are processed.
  • Ideal for lookups on stable reference data (e.g., product list).
  • Faster performance because cache is not updated.
  • Cannot detect new entries in the target table during load.

Dynamic Cache

  • Cache can be updated on the fly as rows are processed.
  • Used in combination with Update Strategy and Lookup for SCD Type 2 or upsert logic.
  • Supports insert, update, and delete operations on cache.
  • Helps maintain consistency between target and lookup data in real time.

Example use case:
Dynamic cache is required when implementing:

  • Customer dimension table updates
  • Real-time record matching
  • Incremental data warehouse loads

Static cache is used when lookup data does not change during the session.

35. Explain data cleansing.

Data cleansing is the process of identifying, correcting, or removing inaccurate, incomplete, inconsistent, or irrelevant data. It ensures data quality before loading into target systems.

Common cleansing operations:

  • Removing duplicates
  • Fixing invalid formats (phone numbers, dates)
  • Standardizing text (upper/lowercase)
  • Replacing null values
  • Correcting spelling or missing values
  • Validating reference data (e.g., valid country codes)

Tools used in Informatica for cleansing:

  • Expression transformation
  • Lookup validation
  • Router for conditional cleansing
  • Filter for eliminating bad data
  • Aggregator for deduplication
  • Informatica Data Quality (IDQ) for advanced rules

Clean data ensures accurate analytics, consistent reporting, and reliable decision-making.

36. What is a database connection in Informatica?

A database connection in Informatica defines how the Integration Service connects to relational databases or other data stores during ETL operations.

Connections include details such as:

  • Database type (Oracle, SQL Server, MySQL, DB2, etc.)
  • Username and password
  • Hostname and port number
  • Service name or database name
  • Connection pooling settings

Types of connections:

  • Relational connections
  • Application connections (Salesforce, SAP, etc.)
  • FTP connections
  • ODBC connections

These connections are reused across mappings and workflows, ensuring standardized access and reducing configuration effort.

37. What is normalized vs denormalized data?

Normalized Data

Normalization organizes data into multiple related tables to:

  • Reduce redundancy
  • Maintain data integrity
  • Avoid anomalies

Characteristics:

  • Data is spread across many tables
  • Uses foreign keys to maintain relationships
  • Ideal for OLTP (transactional) systems

Example:
Separate tables for CUSTOMER, ADDRESS, and ORDERS.

Denormalized Data

Denormalization combines tables to reduce joins and improve query performance.

Characteristics:

  • More redundancy
  • Faster read performance
  • Ideal for OLAP (data warehousing)

Example:
A single fact table with customer, product, and region attributes embedded.

Summary:
Normalize for accuracy and integrity.
Denormalize for reporting and speed.

38. What is a flat file source?

A flat file source is a type of input source that contains raw data in plain text format. Informatica can read flat files such as:

  • CSV (Comma-separated values)
  • TXT (Tab or pipe-delimited)
  • Fixed-width files
  • XML or JSON (with special parsing)

Flat file sources are commonly used because:

  • They are easy to generate and transfer
  • They are lightweight and portable
  • Many legacy systems export data as flat files

Informatica Designer provides Flat File Wizard to define the structure, delimiters, and data types for the file.

These sources are common in ETL pipelines involving data exchange between external systems, vendors, or legacy applications.

39. What is pushdown optimization?

Pushdown Optimization (PDO) is a performance technique where Informatica pushes transformation logic to the underlying database instead of processing it in the Integration Service.

Three levels of pushdown:

  1. Source-side pushdown
    • Filters, joins, and calculations push into source SQL.
  2. Target-side pushdown
    • Insert, update, delete operations executed at target DB.
  3. Full pushdown
    • Majority of the mapping logic executed inside the database.

Benefits:

  • Reduces data movement
  • Utilizes the DB engine’s processing power
  • Improves throughput
  • Minimizes ETL server load

Pushdown is ideal when working with large volumes of structured data where databases are well-optimized.

40. What are bad records and how does Informatica handle them?

Bad records are rows that fail to meet data quality rules or cannot be loaded due to errors. Examples include:

  • Invalid data types
  • Constraint violations (PK/FK issues)
  • Missing mandatory values
  • Conversion errors (string to date)
  • Lookup failures
  • Business rule failures

Informatica handles bad records using multiple mechanisms:

  1. Session Log & Error Log Files
    • Automatically records reasons for failure.
  2. Reject File (.bad file)
    • Captures rejected rows for further analysis.
  3. Error Handling Transformations
    • Router and Expression for custom validation
    • Lookup for defaulting missing values
  4. Exception Tables
    • Target tables specifically created for storing failed rows.
  5. Error Ports (for update strategy or mapping logic failures)

Handling bad records ensures accuracy, prevents target corruption, and supports debugging and cleansing workflows.

Intermediate (Q&A)

1. Explain the architecture of Informatica PowerCenter.

The architecture of Informatica PowerCenter follows a client-server, scalable, and metadata-driven model designed for enterprise data integration. It consists of several major components that work together to design, manage, execute, and monitor ETL processes.

Key Components:

1. PowerCenter Domain

  • The top-level administrative unit.
  • Contains nodes, grids, and application services.
  • Manages security, configuration, licensing, and load balancing.

2. Nodes

  • Represent physical servers.
  • Can host integration services, repository services, and other components.

3. PowerCenter Repository

  • A relational database storing all metadata:
    • Mappings
    • Workflows
    • Transformations
    • Connections
    • Version history
  • Used by developers and administrators.

4. Repository Service (RS)

  • Provides access to repository metadata.
  • Handles check-in/check-out, version control, and object management.

5. Integration Service (IS)

  • Executes ETL workflows and sessions.
  • Responsible for extracting data, transforming it, and loading targets.
  • Handles partitioning, caching, error management, and pushdown optimization.

6. Client Tools

  • Designer – build mappings
  • Workflow Manager – build sessions & workflows
  • Workflow Monitor – monitor ETL jobs
  • Repository Manager – manage objects & security

7. Metadata Management

  • Maintains lineage, impact analysis, and reusable objects.

Flow Summary:

  1. Developers create mappings in Designer.
  2. Repository Service stores metadata.
  3. Integration Service reads metadata and executes ETL logic.
  4. Workflow Monitor tracks runtime performance.

The architecture ensures high performance, modular development, scalability, and strong metadata management, making PowerCenter an enterprise-grade ETL platform.

2. How does the Integration Service work?

The Integration Service (IS) is the execution engine of Informatica PowerCenter. It runs workflows, manages sessions, and performs the complete ETL process.

Working Mechanism:

  1. Reads Metadata from Repository
    • Mapping logic
    • Transformations
    • Session configurations
    • Connections
  2. Establishes Connections
    • Connects to source and target systems.
    • Uses relational, flat file, or application connections.
  3. Extracts Data
    • Reads data based on Source Qualifier or other sources.
    • Applies pushdown optimization when configured.
  4. Processes Transformations
    • Executes row-level transformations (Expression, Lookup, etc.)
    • Uses caches for Lookup, Aggregator, Joiner, and other operations.
  5. Manages Caches & Buffers
    • Maintains dynamic/static cache
    • Handles memory allocation and partitioning
  6. Loads Data into Target
    • Inserts, updates, deletes, or merges based on session properties
    • Manages commit intervals and transactions
  7. Error Handling & Logging
    • Creates session logs
    • Captures rejected rows
    • Manages failover in HA configurations

The Integration Service is designed for scalability, high performance, parallel processing, and reliability.

3. Explain partitioning in Informatica.

Partitioning is a performance optimization technique that allows the Integration Service to divide data into multiple segments and process them in parallel, increasing throughput.

Key Concepts:

  • A mapping or session can be split into multiple pipelines.
  • Each partition processes a subset of the data simultaneously.
  • Reduces ETL time significantly for large datasets.

Types of Partitioning:

  1. Database Partitioning
    • Uses the database's internal partitioning (Oracle partitioned tables).
  2. Pipeline Partitioning
    • Splits the flow at transformation level.
  3. Partition Points
    • Defined where data is divided or merged (e.g., SQ, Aggregator).
  4. Types of Partition Algorithms:
    • Round-Robin – evenly distributes rows
    • Hash – partitions based on key
    • Key Range – partitions based on value ranges
    • Pass-through – no actual partitioning

Benefits:

  • Faster ETL processing
  • Better utilization of CPU cores
  • Higher scalability

Partitioning is essential for big data volumes, large fact table loads, and high-performance ETL pipelines.

4. What is session partitioning and when is it used?

Session partitioning involves configuring multiple partitions at the session level so the Integration Service can run parallel threads and improve performance.

How it works:

  • The session divides the source data into partitions.
  • Each partition runs the mapping logic independently.
  • Results are combined before loading into targets.

When it is used:

  • Large-scale daily or hourly loads
  • Heavy transformations (Aggregator, Lookup)
  • Multiple CPUs available for parallelism
  • When the source supports partitioning (RDBMS, partitioned files)

Advantages:

  • Reduces session runtime
  • Increases pipeline throughput
  • Efficient resource usage

Caution:

Not recommended when:

  • Order of data must be preserved
  • Transformations like Rank or Normalizer cannot be partitioned

Session partitioning is a key performance tuning feature used in enterprise ETL systems.

5. What are types of caches used by Lookup transformation?

Lookup transformation uses cache memory to store lookup data for fast access during ETL processing. The following types of caches are used:

1. Static Cache

  • Created once at the start of the session
  • Not modified during processing
  • Default for lookup
  • Best for stable reference data

2. Dynamic Cache

  • Cache updated as new data is processed
  • Used in upsert scenarios and SCD implementations
  • Supports insert/update logic for cache entries

3. Persistent Cache

  • Cache stored on disk after session completion
  • Reused in future sessions
  • Avoids rebuilding large caches, improving performance

4. Shared Cache

  • Shared across multiple lookups in different mappings
  • Useful for consistent master data reference

5. Recache Option

  • Forces the cache to rebuild even if persistent cache exists

Lookup caching is essential for reducing database hits and improving ETL performance.

6. Explain persistent cache.

A persistent cache allows Informatica to store lookup cache data on disk after a session completes so it can be reused in subsequent ETL runs.

Why use persistent cache?

  • Reduces database load by avoiding repeated lookup table queries
  • Improves session performance
  • Useful when lookup source data changes infrequently
  • Ideal for reference or dimension tables

How it works:

  1. First session run → cache created & stored on disk.
  2. Subsequent runs → Informatica loads cache directly from disk.
  3. Only differences (if any) may need updating.

Where it's used:

  • Large lookup tables
  • Slow database connections
  • Situations requiring consistent lookup results across sessions

Persistent cache is a powerful optimization tool for improving lookup performance.

7. What is a workflow monitor?

Workflow Monitor is a client tool used to view, track, analyze, and troubleshoot workflows and sessions executed in PowerCenter.

Key Capabilities:

  • Shows execution status (Succeeded, Failed, Running, Queued)
  • Provides Gantt chart of task timelines
  • Displays session logs, error logs, and performance details
  • Allows stopping, restarting, or aborting workflows
  • Shows performance metrics:
    • Rows processed
    • Throughput
    • Cache usage
    • Transformation times

Benefits:

  • Helps developers debug ETL failures
  • Provides operational visibility
  • Useful for monitoring long-running data loads
  • Supports real-time performance tuning

Workflow Monitor is essential for ETL administrators and developers.

8. What is a command task?

A Command Task allows you to run operating system-level commands or scripts within a workflow.

Examples:

  • Copy or move files
  • Archive logs
  • Trigger shell or batch scripts
  • Call third-party utilities
  • Create FTP/SFTP operations
  • Clean temporary files

Usage Scenarios:

  • Preparing files before ETL starts
  • Running post-load cleanup scripts
  • Validating file arrival
  • Automating system tasks

Command Task improves automation and integration with external systems.

9. What is an event wait task?

An Event Wait Task pauses the workflow until a specified event occurs. Informatica supports:

  • Predefined (file-watch) events
  • User-defined events

How it works:

  • Workflow enters waiting state
  • Checks for event (e.g., file arrival, trigger signal)
  • Resumes when event is detected
  • Times out if event does not occur within the set interval

Use Cases:

  • Waiting for an upstream system to generate a file
  • Synchronizing workflows between multiple systems
  • Handling dependencies in batch processing

Event Wait Tasks ensure proper sequencing in data pipelines.

10. What is a pre-session and post-session command?

Pre-session and post-session commands allow execution of external scripts before or after a session runs.

Pre-session Command

Executed before the session starts.

Use Cases:

  • Clean or initialize staging tables
  • Create backup of target tables
  • Validate file existence
  • Set environment variables
  • Trigger database stored procedures

Post-session Command

Executed after the session completes.

Use Cases:

  • Archive processed files
  • Send notifications
  • Move bad files to an error folder
  • Execute cleanup routines
  • Run database indexing or maintenance scripts

Benefits:

  • Automates operational tasks
  • Integrates ETL with external systems
  • Reduces manual intervention

Pre/post session commands provide flexibility and automation for end-to-end ETL workflows.

11. Explain rank transformation with example use cases.

The Rank transformation is an active, connected transformation used to select the top or bottom records based on a specific measure or port. It works similarly to SQL’s ROW_NUMBER() or TOP N functionality.

Key Capabilities:

  • Can return top N or bottom N rows.
  • Allows ranking within groups using the Group By option.
  • Supports ties handling, meaning multiple rows can share the same rank.

How It Works:

  1. Identify a rank port (e.g., salary, sales).
  2. Set the number of ranks (e.g., top 5 employees).
  3. Optional: enable Group By (e.g., top 3 employees per department).

Example Use Cases:

  • Top 5 customers by revenue.
  • Top 10 selling products per month.
  • Employees with highest salaries.
  • Lowest-performing stores for analysis.
  • Top-3 students in each class.

Rank transformation is widely used in reporting, analytics, and business intelligence ETL scenarios.

12. When do you use union transformation?

A Union transformation is an active, connected transformation with multiple input groups that combines data from several input pipelines into a single output pipeline. It works similarly to SQL’s UNION ALL (not UNION).

When to Use:

  • When integrating homogeneous data from multiple sources with the same structure.
  • When consolidating data from:
    • Multiple regions
    • Multiple flat files
    • Multiple partitioned datasets
    • Historical + current data streams

Characteristics:

  • Does not remove duplicates (like UNION ALL).
  • All input groups must have the same metadata (same port names and data types).
  • Inputs can come from different sources or transformations.

Example:

Merging the following:

  • Sales_Q1
  • Sales_Q2
  • Sales_Q3
  • Sales_Q4

→ into a single unified dataset for annual reporting.

Union transformation is essential for data consolidation and ETL pipelines requiring dataset merging.

13. What is the difference between mapping variable and mapping parameter?

Both are metadata objects that store values, but they differ significantly in purpose and behavior.

Mapping Parameter

  • Value is constant during the entire session run.
  • Must be initialized using parameter file or default value.
  • Used for static filtering, database connections, date values, etc.
  • Cannot change within a session.

Example:
$$LoadDate = '2025-01-01'

Good for filtering data by date or environment-specific settings.

Mapping Variable

  • Value can change during session execution.
  • Stores last-run values for incremental loads.
  • Automatically saved in the repository.
  • Supports variable functions like:
    • SETVARIABLE()
    • SETMAXVARIABLE()
    • SETMINVARIABLE()
    • SETCOUNTVARIABLE()

Example:
$$MaxOrderID updates each session to track new rows.

Mapping variables are essential for incremental loads, watermarks, and capturing dynamic states.
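
A minimal incremental-load sketch, assuming an ORDER_ID column and a mapping variable $$MaxOrderID: the source filter picks up only new rows, while an Expression port advances the variable, whose final value is saved to the repository when the session succeeds.

Source filter: ORDER_ID > $$MaxOrderID
Expression port: SETMAXVARIABLE($$MaxOrderID, ORDER_ID)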

14. Explain incremental aggregation.

Incremental Aggregation improves performance by processing only newly added or changed data instead of recalculating aggregates from scratch.

How It Works:

  1. The first session run loads all data into the aggregator cache.
  2. The next run processes only new incoming records.
  3. Integration Service updates cached historical aggregate values.
  4. Updated cache is used for future calculations.

Benefits:

  • Dramatically reduces processing time.
  • Ideal for large fact tables.
  • Reduces memory and CPU usage.

Example Use Case:

Daily sales aggregation:

  • Day 1: Load all data → compute total revenue.
  • Day 2: Only process new sales records → update totals.

Incremental aggregation is commonly used in incremental ETL pipelines for data warehouses.

15. What are mapplet restrictions?

Mapplets are reusable logic containers, but they have certain limitations.

Major Restrictions:

  1. Cannot contain the following:
    • Source definitions
    • Target definitions
    • Normalizer transformation
    • XML transformations
    • COBOL sources
    • Pre or Post SQL commands
    • Transaction Control transformation
  2. Cannot contain transformations that generate multiple pipelines
    (like Joiner with heterogeneous inputs).
  3. Cannot use mapplets inside other mapplets (no nesting).
  4. Active transformations inside a mapplet may restrict usage of mapplet in certain mappings.
  5. Input/output ports must be properly defined to avoid conflicts.

Mapplets are great for reusability but are not suitable for every type of transformation flow.

16. How do you handle Slowly Changing Dimensions (SCD Type 1, 2, 3)?

SCDs manage how historical data is stored in dimension tables.

SCD Type 1 — Overwrite History

  • Updates existing records; no historical data preserved.
  • Use Update Strategy with DD_UPDATE.
  • Typically used for corrections, such as fixing a typo in a name.

Example:
Change customer address → overwrite old address.

SCD Type 2 — Preserve History

  • Creates new record for every change.
  • Maintains history with:
    • Surrogate key
    • Effective start/end dates
    • Current row flag

ETL Steps:

  1. Lookup customer on natural key.
  2. If data changed:
    • Expire old record
    • Insert new record
  3. Maintain dynamic lookup cache for high performance.

Used when historical correctness is required (e.g., sales reporting).

SCD Type 3 — Partial History

  • Stores current and previous values only.
  • Adds new columns such as:
    • PREV_ADDRESS
    • CURRENT_ADDRESS

Used when limited history is sufficient (e.g., tracking last job role).
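
As a simplified Type 2 sketch (port and column names are hypothetical), an Expression transformation can compare incoming values with the values returned by the dimension lookup, and an Update Strategy can act on the resulting flag:

CHANGE_FLAG = IIF(ISNULL(LKP_CUSTOMER_KEY), 'NEW', IIF(LKP_ADDRESS != ADDRESS, 'CHANGED', 'NOCHANGE'))
-- Update Strategy: IIF(CHANGE_FLAG = 'NOCHANGE', DD_REJECT, DD_INSERT)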

17. Explain dynamic lookup cache in detail.

A dynamic lookup cache is a lookup cache that updates itself as data flows through the pipeline, making the lookup and target table stay synchronized.

Key Features:

  1. Supports Insert, Update, Delete operations on cache.
  2. Avoids re-querying the target table repeatedly.
  3. Essential in SCD Type 2 and upsert logic.

How It Works:

  • Cache is built initially.
  • For each input row:
    1. Lookup performed
    2. If row not found → insert new row
    3. Cache is updated with new key
    4. If found → update logic applied
  • Cache always reflects the latest target state.

Benefits:

  • High performance
  • Eliminates stale cache issues
  • Supports real-time warehouse loads

Dynamic lookup cache is one of the most powerful features in Informatica for modern ETL strategies.

18. What is pushdown optimization and what are its levels?

Pushdown Optimization (PDO) pushes transformation logic to the underlying database to improve ETL performance.

Levels of Pushdown:

1. Source-Side Pushdown

  • Pushes SQL logic into the source database.
  • Source Qualifier filters, joins, and expressions are translated into SQL.
  • Reduces data movement from source → ETL server.

2. Target-Side Pushdown

  • Pushes INSERT, UPDATE, DELETE logic into the target database.
  • Uses SQL override with merge/update operations.
  • Useful for large fact table loads.

3. Full Pushdown (End-to-End)

  • Nearly all mapping logic is converted into SQL and executed inside the database.
  • Integration Service orchestrates execution but database executes transformations.

Benefits:

  • Minimizes network latency
  • Reduces CPU load on Integration Service
  • Uses DB engine’s power (indexes, optimizers, parallel execution)

Pushdown is ideal when working with large RDBMS sources and targets.
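
With full pushdown, a filter-plus-aggregate mapping may be collapsed into a single statement that the database executes, roughly along these lines (illustrative tables and columns):

INSERT INTO sales_summary (region, total_amount)
SELECT region, SUM(amount)
FROM sales_staging
WHERE sale_date >= DATE '2024-01-01'
GROUP BY region;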

19. What causes session failure?

A session may fail due to multiple reasons related to configuration, connectivity, transformation logic, or data issues.

Common Causes:

  1. Invalid source or target connections
  2. Lookup failures
  3. Insufficient cache memory
  4. SQL syntax errors
  5. File not found or file permissions
  6. Network connectivity failures
  7. Transformation errors (conversion, overflow, null mismatch)
  8. Database constraints (PK/FK/unique violations)
  9. Target database full or unreachable
  10. Incorrect parameter file path

Logs to Check:

  • Session log
  • Workflow log
  • Error log (.bad files)
  • Mapping logs (debugger)

A session fails when an error is severe enough to stop execution or when the configured error threshold is exceeded.

20. How do you debug mappings in Informatica?

Debugging is essential for identifying data flow issues, transformation errors, or incorrect logic.

Steps to Debug a Mapping:

1. Use the Mapping Debugger

  • Set breakpoints
  • Step through row-by-row execution
  • Inspect port values
  • Identify incorrect transformation logic

2. Check Session Logs

  • Review error messages
  • Identify failing transformation
  • Inspect SQL queries

3. Validate Mapping

  • Use Designer → Mapping → Validate to detect structural errors.

4. Test with Sample Data

  • Run subsets of data to isolate issues.
  • Use simplified sources or mock data.

5. Enable Verbose Data Logging

  • Captures transformation-level details.
  • Helps trace incorrect calculations.

6. Test Each Transformation Independently

  • Verify Lookup queries
  • Validate expressions
  • Test joins and filters

7. Use Reject Files

  • Analyze rejected rows and error messages.

Outcome:

Debugging ensures correctness, identifies performance issues, and validates business logic before production deployment.

21. What are workflow variables?

Workflow variables are dynamic values defined within a workflow that can change during execution and influence workflow behavior. They enable conditional logic, runtime decision-making, and state management in workflows.

Key Characteristics:

  • Workflow variables begin with $$ (e.g., $$FileExists, $$LoadFlag).
  • They store values that can be referenced in:
    • Decision tasks
    • Event wait tasks
    • Command tasks
    • E-mail notifications
  • Variables can be updated at runtime using Assignment tasks or task output values.

Types of Workflow Variables:

  1. Boolean Variables – True/False logic.
  2. String Variables – Hold dynamic text values.
  3. Numeric Variables – Used for counters, totals, or flags.
  4. Datetime Variables – Store timestamps for comparison.

Use Cases:

  • Checking if a file exists before running a session
  • Running different branches of workflow based on conditions
  • Tracking session run counts
  • Implementing restartability and recovery logic
  • Triggering downstream tasks only when required

Workflow variables make workflows intelligent and adaptable to runtime conditions.

22. How do you schedule workflows?

Workflows in Informatica can be scheduled to run automatically at specific intervals or based on events.

Methods to Schedule Workflows:

1. Using Workflow Manager Scheduler

  • The built-in scheduler allows you to configure:
    • Daily, weekly, monthly schedules
    • Custom calendar events
    • Start/End times
    • Repetition intervals

Steps:

  1. Open Workflow Manager
  2. Go to Workflow → Edit
  3. Select Schedule tab
  4. Set schedule and save

2. Using External Schedulers

Organizations often use enterprise schedulers like:

  • Control-M
  • Autosys
  • Crontab
  • Tidal
  • UC4

These tools invoke workflows using the pmcmd command-line utility.

3. Event-Based Scheduling

Workflows can be triggered when:

  • A file arrives
  • A server event is raised
  • Another workflow completes

Using Event Wait Task, workflows run based on external conditions.

4. Command-Line Scheduling (pmcmd)

Workflows can be triggered via shell or batch scripts using:

pmcmd startworkflow -sv IntegrationService -d Domain -u user -p pass -f FolderName wf_Name

Scheduling ensures timely data loads and automates end-to-end data integration pipelines.

23. How does Informatica handle deadlock situations?

Deadlocks occur when two or more processes compete for the same resources in a way that prevents progress.

In Informatica, deadlocks typically happen due to:

  • Multiple sessions loading the same target table
  • Contention for indexes or constraints
  • Long-running queries
  • Missing primary keys or incorrect update strategies

How Informatica Handles Deadlocks:

1. Automatic Retry Mechanism

When the database returns a deadlock error:

  • Integration Service retries the operation (default: 3 times)

2. Commit Intervals

Smaller commit intervals reduce transaction size and minimize lock durations.

3. Target Load Ordering

Ensures dependent tables are loaded in the correct sequence.

4. Constraint-Based Loading

Loads parent tables before child tables, reducing lock conflicts.

5. Index Management

  • Dropping indexes before load
  • Rebuilding them afterward
  • Reducing unnecessary constraints

6. Using Partitioning

Partitions help parallelize loads to reduce contention.

7. Database Tuning

  • Increasing lock wait time
  • Optimizing queries
  • Improving database configuration

Deadlocks are handled through retries, optimized workflows, and database-level strategies.

24. Explain differences between sorter and aggregator.

| Feature | Sorter Transformation | Aggregator Transformation |
|---|---|---|
| Function | Sorts data | Aggregates data (SUM, AVG, COUNT) |
| Type | Active | Active |
| Input Order | Produces sorted output | May require sorted input for performance |
| Cache Usage | Uses sort cache | Uses aggregation cache |
| Row Count | Does not reduce row count unless the distinct option is used | Usually reduces rows (grouping) |
| Operations | Sort, distinct | Group By, aggregate functions |
| Performance | Fast for small datasets | CPU/memory intensive for large datasets |

In Simple Terms:

  • Sorter organizes data.
  • Aggregator summarizes data.

Sometimes Sorter is used before Aggregator to reduce cache usage and improve performance.

25. What is a recovery strategy in Informatica?

A recovery strategy ensures that workflows or sessions can resume correctly after failure without restarting from the beginning.

Key Recovery Options:

1. Restart Workflow / Session

Informatica can restart:

  • From the failed task
  • From the beginning
  • From the last successful checkpoint

2. Recovery Strategy Settings

In Workflow Manager:

  • Fail Task and Continue Workflow
  • Restart Task
  • Stop on Error

3. Session-Level Recovery

Session settings allow:

  • Perform recovery – load resumes from last committed point
  • Recover from last checkpoint – minimizes reprocessing
  • Treat source rows as new – forces reload

4. Checkpointing

Checkpoints include:

  • Commit points
  • Cache states
  • Variable values

5. Using Workflow Variables

Variables help detect partial loads and re-run only required segments.

Recovery strategies improve system robustness and avoid reloading large volumes unnecessarily.

26. How do you improve performance in Informatica sessions?

Performance tuning involves optimizing mappings, sessions, databases, and system resources.

Techniques to Improve Performance:

1. Pushdown Optimization

Push transformation logic to database to reduce ETL workload.

2. Partitioning

Enables parallel processing using:

  • Hash
  • Key range
  • Round robin

3. Optimize Lookups

  • Use dynamic cache
  • Use persistent cache
  • Reduce number of lookup columns
  • Use indexes on lookup condition columns

4. Optimize Aggregator

  • Pre-sort input
  • Use sorted input option
  • Reduce group-by fields

5. Eliminate Unnecessary Transformations

Avoid:

  • Excess filters
  • Transformations that slow down pipeline
  • Data type conversions

6. SQL Overrides

Improve Source Qualifier performance with:

  • Filters
  • Joins
  • Optimized SQL

7. Increase Buffer and DTM Memory

Configurable in session properties.

8. Target Optimization

  • Use bulk loads
  • Drop indexes before load and recreate afterward
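
For example, the index handling above is often wired in as pre- and post-session SQL (index and table names are illustrative):

-- Pre-session SQL: drop the index so bulk inserts are not slowed down
DROP INDEX idx_fact_sales_cust;

-- Post-session SQL: rebuild it once the load completes
CREATE INDEX idx_fact_sales_cust ON fact_sales (customer_sk);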

Performance tuning is an iterative process involving mapping, database, and system-level optimizations.

27. What is a relational lookup vs flat file lookup?

Relational Lookup

  • Queries data from a database table.
  • Supports SQL overrides, joins, conditions.
  • Can use dynamic cache and persistent cache.
  • Faster when indexes are used.

Best for:

  • Dimension table lookups
  • Reference/master data stored in databases

Flat File Lookup

  • Lookup source is a flat file.
  • Does not support SQL overrides.
  • Entire file must be cached.
  • Limited to static or persistent cache (no dynamic cache).

Best for:

  • File-based reference data
  • Code lists provided by external vendors

Key Differences:

| Feature | Relational Lookup | Flat File Lookup |
|---|---|---|
| Source | Database | Flat file |
| SQL Override | Yes | No |
| Dynamic Cache | Yes | No |
| Performance | Faster with indexing | Slower for large files |
| Flexibility | High | Limited |

28. What is key range partitioning?

Key range partitioning is a partitioning strategy where data is divided based on specific ranges of key values.

How It Works:

  • Define partitions by ranges:
    • Partition 1: ID 1–1000
    • Partition 2: ID 1001–2000
    • Partition 3: ID 2001–3000

Each partition processes rows falling in its assigned range.

Use Cases:

  • Customer ID ranges
  • Date ranges (e.g., monthly partitions)
  • Transaction ID ranges
  • Order number ranges

Benefits:

  • Minimizes data skew
  • Provides predictable distribution
  • Ideal for sequential keys

Key range partitioning is effective for structured datasets with numeric or date-based keys.
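
In effect, each partition reads the source with its own range filter (column and boundaries illustrative):

SELECT * FROM orders WHERE order_id BETWEEN 1    AND 1000;  -- partition 1
SELECT * FROM orders WHERE order_id BETWEEN 1001 AND 2000;  -- partition 2
SELECT * FROM orders WHERE order_id BETWEEN 2001 AND 3000;  -- partition 3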

29. What is hash partitioning?

Hash partitioning distributes rows across partitions based on a hash function applied to one or more key fields.

How It Works:

  • Determine a partition key (e.g., Customer_ID).
  • The hash function ensures even distribution across partitions.

Benefits:

  • Avoids data skew
  • Suitable for random or non-sequential values
  • Good for parallel processing of large datasets

Use Cases:

  • Customer-based partitioning
  • Product or transaction keys
  • Scenarios where values are not naturally grouped

Hash partitioning helps achieve balanced workloads in large-scale ETL jobs.
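
Conceptually, the hash function maps every key value to one of N partitions; using MOD as a simple stand-in for the real hash (illustrative):

-- Each row is routed to partition number hash(customer_id) mod 4 (values 0-3)
SELECT customer_id, MOD(customer_id, 4) AS partition_no
FROM orders;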

30. Explain pipeline partitioning.

Pipeline partitioning improves performance by splitting the data flow into parallel pipelines, where each pipeline executes the same mapping logic independently.

How It Works:

  • Source data is divided into partitions.
  • Each partition flows through the same set of transformations.
  • All partitions run in parallel.
  • Output eventually merges at the target.

Benefits:

  • Major increase in throughput
  • Efficient use of multi-core CPUs
  • Reduces processing bottlenecks

Restrictions:

  • Some transformations do not support partitioning (Rank, Normalizer).
  • Requires careful management of cache and memory.

Pipeline partitioning is one of the most powerful features for optimizing large ETL pipelines.

31. What is source row-level testing?

Source row-level testing is a validation technique used to ensure that the ETL process correctly extracts, transforms, and loads every individual row from the source to the target without data loss or corruption.

Purpose:

  • To verify correctness of mappings at a granular level.
  • To ensure transformation logic (filters, joins, expressions) is applied correctly.
  • To confirm that no rows are unintentionally dropped or duplicated.

How It Works:

  1. Compare row counts between source and target.
  2. Validate sample row data before and after transformations.
  3. Trace incorrect rows using:
    • Verbose data logging
    • Debugger tool
    • Error/reject files
  4. Check SQL overrides for correct filter logic.
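
Typical validation queries for steps 1 and 2 (table names illustrative; MINUS is Oracle syntax, use EXCEPT on other databases):

-- Step 1: row-count reconciliation
SELECT COUNT(*) FROM src_orders;
SELECT COUNT(*) FROM tgt_orders;

-- Step 2: rows present in the source but missing from the target
SELECT order_id, amount FROM src_orders
MINUS
SELECT order_id, amount FROM tgt_orders;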

Use Cases:

  • Initial ETL pipeline development
  • Change request validations
  • Testing complex mappings (joins, aggregators, routers)
  • Ensuring data quality in production releases

Source row-level testing is crucial for ensuring mapping accuracy and data reliability.

32. Explain bottleneck identification in Informatica.

A bottleneck is any part of the ETL pipeline that slows down overall session performance. Identifying bottlenecks is essential for tuning and optimizing performance.

Common Bottleneck Areas:

1. Source Bottleneck

Occurs when:

  • Source query is slow
  • Table lacks indexes
  • Network latency exists
  • Large volumes are extracted

Fix:

  • Optimize SQL
  • Apply partitioning
  • Pushdown filtering to database

2. Target Bottleneck

Occurs when:

  • Target indexes slow down inserts
  • Constraints cause locking
  • Database is under heavy load

Fix:

  • Drop and recreate indexes
  • Use bulk load
  • Increase commit intervals

3. Lookup Bottleneck

Occurs when:

  • Lookup table is large
  • Cache is insufficient
  • No indexes on lookup keys

Fix:

  • Use persistent or dynamic cache
  • Reduce lookup ports

4. Transformation Bottleneck

Frequently caused by:

  • Aggregator (sort + grouping)
  • Rank (sort required)
  • Sorter (memory intensive)
  • Joiner (two pipelines processed)

Fix:

  • Use sorted input where applicable
  • Push join logic to database
  • Reduce group-by fields

Tools for Bottleneck Detection:

  • Session logs
  • Performance detail reports
  • Workflow Monitor throughput metrics

Effective bottleneck identification ensures high-performance ETL jobs.

33. What is metadata manager in Informatica?

Metadata Manager is a component of Informatica that provides metadata-centric visibility across the entire data environment.

Key Features:

  • Data Lineage:
    Shows source-to-target flow, including mapping rules.
  • Impact Analysis:
    Displays downstream impact of modifying a column or table.
  • Metadata Integration:
    Imports metadata from databases, Hadoop, ETL tools, and BI tools.
  • Metadata Repository:
    Stores metadata such as schemas, mappings, transformations, workflow dependencies.
  • Glossary & Business Dictionary:
    Helps define business terms and link them to technical assets.

Benefits:

  • Enables governance and compliance (GDPR, SOX).
  • Improves data quality by offering transparency.
  • Helps developers understand existing pipelines.
  • Useful for auditors and data governance teams.

Metadata Manager enables organizations to manage data assets efficiently and trace data movement end-to-end.

34. Explain indirect file loading.

Indirect file loading allows Informatica to load data from multiple flat files without explicitly listing each file in the session.

Instead of pointing to a single file, Informatica reads a file list that contains names of all data files.

How It Works:

  1. Create a master file (e.g., filelist.txt) that contains paths of all data files:
/data/sales_jan.csv
/data/sales_feb.csv
/data/sales_mar.csv
  2. In the session properties → Source → choose the Indirect option.
  3. Informatica loads each file listed in the file list sequentially.

Use Cases:

  • Daily batch files delivered in folders
  • Multiple vendor files with similar formats
  • Automating multi-file ingestion tasks

Advantages:

  • Reduces manual configuration
  • Automatically handles multiple input files
  • Simplifies maintenance

Indirect loading is essential when dealing with large-scale file-based ETL systems.

35. What is the role of Integration Service in handling transactions?

The Integration Service (IS) manages transaction control for ETL sessions to ensure data consistency and integrity.

How Integration Service Handles Transactions:

1. Transaction Boundaries

Defined by:

  • Commit interval
  • Transaction Control transformation
  • Target-based commit policy

2. Uses Commit and Rollback Operations

  • Commit: Saves data permanently to the target.
  • Rollback: Reverts data when errors occur.

3. Supports Row-level Error Handling

  • Reject files
  • Error tables
  • Session error thresholds

4. Manages Bulk vs Normal Loads

  • Bulk load disables logging
  • Normal load respects constraints and triggers

5. Maintains Transaction State

For recovery:

  • Tracks committed offsets
  • Supports session restartability
  • Logs transaction history

Integration Service ensures reliable, atomic, consistent, isolated, and durable (ACID) data loads.

36. Explain the concept of surrogate keys.

A surrogate key is an artificial and system-generated primary key used in data warehouse dimension tables.

Characteristics:

  • Has no business meaning
  • Usually sequential numbers (e.g., 1, 2, 3...)
  • Not exposed to the business users
  • Never changes
  • Independent of source system keys

Why Surrogate Keys Are Required:

  1. Natural keys may change over time — surrogate keys remain stable.
  2. They support SCD Type 2 by distinguishing historical versions.
  3. Ensure uniqueness across multiple source systems.
  4. Improve join performance due to small integer values.

Implementation in Informatica:

  • Generated using:
    • Sequence Generator transformation
    • Database sequences
    • Lookup on max surrogate key
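
A database-sequence approach looks like this (Oracle-style syntax; names are illustrative):

-- Sequence that supplies surrogate key values
CREATE SEQUENCE seq_customer_sk START WITH 1 INCREMENT BY 1;

-- The load assigns the surrogate key independently of the source's natural key
INSERT INTO dim_customer (customer_sk, customer_nk, customer_name)
VALUES (seq_customer_sk.NEXTVAL, :source_customer_id, :customer_name);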

Surrogate keys form the foundation of dimensional modeling in ETL.

37. What is code page compatibility?

Code page compatibility ensures that the character encodings used by the source, the target, and the Informatica environment are compatible, preventing data corruption.

Example Code Pages:

  • UTF-8
  • ASCII
  • ISO-8859
  • Unicode (UTF-16)

Why It Matters:

If a source is UTF-8 and target is ASCII, characters like “é”, “ø”, “ñ” may fail or get corrupted.

Compatibility Rules:

  • Source code page must be a subset of session code page.
  • Session code page must be compatible with target.
  • Integration Service converts data as needed.

Problems Without Compatibility:

  • Incorrect special characters
  • Data truncation
  • Load failures
  • Inconsistent sorting or filtering

Code page compatibility ensures global, multi-language, and Unicode-safe ETL operations.

38. Explain schema drift handling.

Schema drift refers to unexpected changes in source schema, such as:

  • New columns added
  • Columns removed
  • Data types modified
  • Column order changed

Informatica needs to handle schema drift to avoid failures during ETL.

How Informatica Handles Schema Drift:

1. Dynamic Schemas (IICS Cloud Data Integration)

Supports dynamic mapping tasks that auto-adjust to schema changes.

2. Field Rules & Port-Level Rules

Automatically add or drop ports during mapping execution.

3. Flat File Schema Drift Support

Can process new or missing columns using dynamic port definitions.

4. Error Handling

  • Capture unknown fields
  • Log schema mismatch errors

5. Manual Schema Updates

In PowerCenter, the Designer requires:

  • Re-importing changed source
  • Propagating changes to transformations
  • Updating mapping logic

Schema drift handling is essential in modern pipelines receiving schema-flexible data such as JSON, Kafka, or cloud streams.

39. What are session properties and why are they important?

Session properties define how a mapping is executed during runtime. They control almost every aspect of ETL execution.

Key Session Property Categories:

1. Source Properties

  • SQL overrides
  • Pre/post SQL
  • Partitioning
  • Reader properties

2. Target Properties

  • Load type (bulk/normal)
  • Commit interval
  • Constraint-based loading
  • Pre/post SQL

3. Transformation Properties

  • Lookup caches
  • Aggregator sorted input
  • Joiner master/detail settings

4. Error Handling

  • Row error thresholds
  • Bad file locations
  • Log levels

5. Performance Settings

  • Buffer memory
  • DTM buffer size
  • Partition numbers

Session properties determine:

  • How data is read
  • How transformations behave
  • How data is loaded
  • How errors are handled
  • Overall performance and stability

They are critical for successful ETL execution.

40. How does Data Masking work in Informatica?

Data Masking is used to protect sensitive information during development, testing, and analytics by transforming data into a non-identifiable format while preserving usability.

How Informatica Data Masking Works:

1. Static Data Masking

Applies masking rules at rest:

  • Masks data stored in databases or files
  • Used for test or development environments

2. Dynamic Data Masking

Applies masking in real time:

  • Data masked during retrieval
  • Used in scenarios where users should not see sensitive data

Masking Techniques:

  1. Substitution – Replace names, addresses with realistic but fake data
  2. Shuffling – Shuffle values across rows
  3. Blurring – Apply random noise (e.g., vary a salary by ±5%)
  4. Encryption – Encode data
  5. Nulling or Deletion – Remove sensitive values
  6. Tokenization – Replace sensitive values with tokens
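
As a plain-SQL illustration of substitution and nulling applied to a test copy (not Informatica's masking engine; names and the Oracle-style SUBSTR are illustrative):

-- Keep only the last 4 digits of the card number and remove the SSN
UPDATE customer_test_copy
SET card_number = '************' || SUBSTR(card_number, -4),
    ssn         = NULL;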

Use Cases:

  • Masking credit card numbers
  • Protecting PII (names, addresses, SSNs)
  • Anonymizing healthcare data
  • Complying with GDPR, HIPAA, PCI-DSS

Data masking ensures security, privacy, and regulatory compliance in data handling.
