Reading: Introduction
Jump to Section
other

Informatica PowerCenter Complete Revision Cheat Sheet

Shaik Noor Shaik Noor
Updated on Apr 14, 2026
10 min read
A comprehensive, interview-ready reference covering Informatica PowerCenter architecture, transformations, workflows, partitioning, performance tuning, and real-world best practices for ETL developers and data engineers.

Overview and Core Concepts

Informatica PowerCenter is an enterprise-grade ETL (Extract, Transform, Load) platform used to build, manage, and operate data warehouses and data integration pipelines. It operates on a Service-Oriented Architecture (SOA) model, making it modular, scalable, and enterprise-ready.

  • ETL: Extract data from source, transform using business logic, then load into target (DW, data mart, flat file, etc.).

  • ELT vs ETL: ETL transforms before loading (PowerCenter); ELT loads raw data first then transforms in-target (common in cloud/big data).


Architecture: The Big Picture

PowerCenter architecture has four primary layers: Client Tools, Domain/Services Layer, Repository, and Source/Target Systems.

Domain

The Domain is the primary administrative unit. It is a collection of nodes and services. It includes a Master Gateway Node and optionally multiple Worker Nodes, providing high availability and load balancing.

  • Node: A physical or virtual machine that runs PowerCenter services.

  • Master Gateway Node: Entry point for all service requests in the domain.

  • Worker Node: Executes integration services on behalf of the domain.

Core Services

Service

Role

Repository Service

Manages all metadata: mappings, workflows, transformations, version control.

Integration Service

Executes workflows and data transformation; orchestrates the ETL flow.

Reporting Service

Generates and publishes operational reports.

Web Services Hub

Exposes PowerCenter as a web service to external clients.

Client Tools

  • PowerCenter Designer: Create sources, targets, mappings, and transformations.

  • Workflow Manager: Design and schedule workflows and sessions.

  • Workflow Monitor: Monitor real-time execution of workflows; view logs and statistics.

  • Repository Manager: Manage folders, users, permissions, and object deployment.


Repository and Metadata

The Repository is a relational database (Oracle, SQL Server, DB2, etc.) that stores all PowerCenter metadata.

  • Folder: Logical container for all repository objects (mappings, sessions, workflows).

  • Shared Folder: Objects can be shared and reused across folders.

  • Version Control: Tracks multiple versions of the same object; objects can be in states: Valid, Invalid, or Impacted.

  • Impacted Object: A parent object whose child objects have become invalid.

  • Object Locking: Prevents simultaneous edits; objects must be checked out before editing.


Mappings and Mapplets

Mapping

A Mapping defines the data flow from source to target, including all transformation logic. It is created in the Mapping Designer.

Steps to create a mapping:

  1. Open PowerCenter Designer and go to Mapping Designer.

  2. Import or define the Source definition via Source Analyzer.

  3. Import or define the Target definition via Target Designer.

  4. Drag and link transformations between source and target.

  5. Validate the mapping using Mapping then Validate.

  6. Save and check into the repository.

Mapplet

A Mapplet is a reusable set of transformations packaged as a single object. It can be embedded inside multiple mappings, promoting reuse and consistency.

  • Mapplets have Input and Output transformations (not real sources or targets).

  • Use mapplets for common logic: date formatting, data cleansing, code lookups.


Transformations

Transformations are repository objects that transform source data. They are either Active (change row count) or Passive (do not change row count), and either Connected (part of data flow pipeline) or Unconnected (called explicitly).

Key Transformations at a Glance

Transformation

Type

Purpose

Source Qualifier

Active, Connected

Reads from relational sources; generates SQL; acts as intermediary between source and pipeline.

Expression

Passive, Connected

Row-level calculations: arithmetic, string ops, conditionals. Does not change row count.

Filter

Active, Connected

Filters rows based on a condition; only passing rows continue downstream.

Aggregator

Active, Connected

Performs group-level calculations: SUM, AVG, COUNT, MAX, MIN.

Lookup

Passive or Active, Connected or Unconnected

Retrieves data from a lookup table or flat file based on a match condition.

Joiner

Active, Connected

Joins data from two heterogeneous sources.

Router

Active, Connected

Routes rows to multiple output groups based on conditions.

Sorter

Active, Connected

Sorts data based on specified ports; supports ascending and descending.

Sequence Generator

Passive, Connected

Generates unique numeric values (surrogate keys).

Update Strategy

Active, Connected

Marks rows as Insert, Update, Delete, or Reject.

Rank

Active, Connected

Selects top or bottom N rows within a group.

Union

Active, Connected

Merges data from multiple pipelines into one (like SQL UNION ALL).

Normalizer

Active, Connected

Converts a single row with repeating columns into multiple rows.

Data Masking

Passive, Connected

Replaces real data with masked test data for non-prod environments.

Custom Transformation

Active or Passive, Connected

User-defined logic via DLL or shared library.

Lookup: Connected vs Unconnected

  • Connected Lookup: Part of the data flow; can return multiple columns; supports default values.

  • Unconnected Lookup: Called via :LKP.lookup_name(arg) expression; returns only one port; reusable across multiple transformations.

Active vs Passive Quick Rule

  • Active: Filter, Aggregator, Joiner, Router, Rank, Union, Normalizer, Update Strategy, Sorter.

  • Passive: Expression, Lookup (usually), Sequence Generator, Data Masking.


Sessions and Workflows

Session

A Session is a task in Workflow Manager that runs a specific mapping and connects it to physical source and target connections.

Steps to configure a session:

  1. Open Workflow Manager, go to Task Developer, and create a Session.

  2. Select the mapping to associate with the session.

  3. Set source and target connections (database or file paths).

  4. Configure error handling: error threshold and reject file path.

  5. Set session properties: commit interval, buffer block size, partitioning options.

  6. Save and add to a workflow.

Workflow Task Types

  • Start Task: Always the entry point of any workflow.

  • Session Task: Runs a mapping.

  • Command Task: Runs shell or OS commands.

  • Email Task: Sends email notifications on success or failure.

  • Decision Task: Conditional branching based on a variable or expression.

  • Event Wait / Event Raise: Synchronization between workflow branches.

  • Timer Task: Pauses execution for a set duration or until a specific time.

  • Assignment Task: Assigns values to workflow variables.

Workflow Variables

  • Built-in examples: $PMWorkflowRunId, $PMSessionRunId, $PMTargetTableName.

  • User-defined: Declared in Workflow Manager; used for dynamic routing and conditional logic.


Partitioning and Performance

Partition Types

  • Round-Robin: Distributes rows evenly across all partitions. Best for evenly distributed data.

  • Hash Auto-Keys: Hashes rows based on group-by or key ports; ensures same key goes to same partition.

  • Database Partitioning: Uses native database partition info; reads from each partition node directly.

  • Key Range: Defines explicit key ranges for each partition.

  • Pass-Through: Passes all data through each partition without redistribution.

Performance Tuning Tips

  • Use Sorted Input for Aggregator to reduce cache usage.

  • Enable Persistent Cache in Lookup to avoid rebuilding cache on every run.

  • Apply Filter early in the pipeline to reduce data volume downstream.

  • Connect only required ports; unused ports waste memory and I/O.

  • Use pushdown optimization to push transformation logic to the database engine.

  • Tune DTM Buffer Size and Buffer Block Size for memory optimization.

  • Use bulk loading mode for targets when possible.

  • Minimize Joiner transformation; prefer Source Qualifier SQL JOIN for same-database sources.

  • Fine-tune Lookup, Joiner, and Aggregator cache sizes to reduce disk spill.


Lookup Caching Strategies

Cache Type

Description

Static Cache

Default; cache built once at session start, not updated during run.

Dynamic Cache

Updates cache in real-time as rows are processed; for SCD Type 1 and 2 patterns.

Persistent Cache

Cache saved to disk between runs; avoids rebuild on next execution.

Shared Cache

Multiple Lookup transformations share one cache file; reduces memory footprint.

No Cache

Queries DB for every row; use only for very small lookup tables.


Slowly Changing Dimensions (SCD)

  • SCD Type 1: Overwrite old value with new value. No history retained. Use Update Strategy with DD_UPDATE.

  • SCD Type 2: Add a new row for each change; retain full history. Uses surrogate key, effective date, and current flag. Flow: Lookup (find existing) then Router (new vs changed) then Update Strategy.

  • SCD Type 3: Add a new column to store the previous value. Limited history only.


Error Handling and Logging

  • Reject File: Rows that fail target constraints are written to a .bad reject file.

  • Error Threshold: Set max allowable errors before session aborts.

  • Session Log: Detailed log per session; shows rows read, written, rejected, and errors.

  • Workflow Log: High-level log for workflow execution events.

  • Pre/Post-Session SQL: Run SQL commands before and after a session (truncate target, update audit flags).

  • Recovery: Sessions can resume from the last checkpoint rather than restarting fully.


Pushdown Optimization (PDO)

PDO pushes transformation logic to the source or target database instead of the Integration Service, reducing data movement and improving speed.

  • Source-side PDO: Logic pushed to source database.

  • Target-side PDO: Logic pushed to target database.

  • Full PDO: Entire mapping logic runs in the database as generated SQL.

  • Not all transformations support PDO (e.g., Joiner across heterogeneous sources does not).


Common PowerCenter Expressions and Syntax

SQL
-- IIF (Conditional)
IIF(salary > 50000, 'High', 'Low')

-- DECODE (Multi-value switch)
DECODE(dept_id, 1, 'Sales', 2, 'HR', 'Other')

-- String functions
LTRIM(RTRIM(name))
SUBSTR(name, 1, 10)
UPPER(city)
CONCAT(first_name, ' ', last_name)

-- Date functions
TO_DATE('2026-01-01', 'YYYY-MM-DD')
ADD_TO_DATE(hire_date, 'MM', 6)
DATE_DIFF(SYSDATE, hire_date, 'DD')

-- NULL handling
ISNULL(column_name)
NVL(column_name, 'default_value')

-- Aggregate (in Aggregator)
SUM(sales_amount)
COUNT(*)
AVG(revenue)
MAX(order_date)

-- Unconnected Lookup call (in Expression)
:LKP.LKP_EMPLOYEE(emp_id)

-- Update Strategy flags
DD_INSERT  -- 0
DD_UPDATE  -- 1
DD_DELETE  -- 2
DD_REJECT  -- 3

Parameters and Variables

  • Mapping Parameters ($$ParamName): Defined in mapping; set at session runtime via parameter file; constant during session.

  • Mapping Variables ($$VarName): Can change during session run; saved between runs (ideal for incremental loads).

  • Session Parameters ($ParamName): Connection names, file names defined at session level.

  • Workflow Variables: Used for conditional branching and assignment tasks within a workflow.

SQL
[session_name.mapping_name]
$$START_DATE=2026-01-01
$$END_DATE=2026-01-31
$DBConnection_Source=SRC_ORACLE_PROD

Incremental Loading Pattern

  1. Use a Mapping Variable (e.g., $$LAST_EXTRACT_DATE) to store the last successful load timestamp.

  2. In Source Qualifier, add filter: WHERE modified_date > $$LAST_EXTRACT_DATE.

  3. At end of session, use SETVARIABLE($$LAST_EXTRACT_DATE, SYSDATE) to update the variable.

  4. Variable value is persisted in the repository between runs automatically.


Common Mistakes and Pitfalls

  • Connecting unused ports: wastes memory and CPU; only link necessary ports.

  • No early filtering: placing Filter too late causes unnecessary downstream processing.

  • Static cache on frequently changing lookup tables: causes stale data; use dynamic or no-cache instead.

  • Using Joiner for same-database sources: use Source Qualifier SQL JOIN for better performance.

  • Forgetting Pre/Post-session SQL: leads to duplicate data loads; always truncate or flag target appropriately.

  • Not using sorted input with Aggregator: increases cache size dramatically on large datasets.

  • Hardcoding file paths: use parameter files and mapping parameters for portability.

  • Ignoring reject files: causes silent data loss; always monitor reject files after production runs.

  • Over-partitioning: too many partitions can cause thread overhead; tune based on available CPU cores.


Top Interview Questions and Answers 2026

Q1: What is the difference between a connected and unconnected lookup?

A connected lookup is part of the main data flow pipeline, can return multiple columns, and supports default values. An unconnected lookup is called via :LKP.lookup_name() syntax from within another transformation and can only return one output port.

Q2: What is the difference between Router and Filter transformations?

Filter has only one output group: rows either pass or are dropped. Router supports multiple output groups with different conditions, routing rows appropriately. Router also has a default group that catches rows not matching any condition.

Q3: How do you improve Aggregator performance?

Sort the input data before the Aggregator and enable the Sorted Input option. This reduces the amount of data that needs to be cached. Also filter unnecessary rows before aggregation and connect only needed ports.

Q4: What is a Source Qualifier and what does it do?

Source Qualifier is an active, connected transformation that acts as an intermediary between Informatica and relational source systems. It generates the SQL query used to extract data and supports user-defined SQL overrides, joins between tables from the same source, and filter conditions.

Q5: Explain SCD Type 2 implementation in PowerCenter.

SCD Type 2 retains full historical data by inserting new rows for changes. Implementation: (1) Lookup existing record by business key, (2) Router to separate new vs changed records, (3) Update Strategy with DD_UPDATE for old record to set end date and current flag to false, (4) Insert new record with DD_INSERT using a new surrogate key and start date.

Q6: What is the difference between a mapping parameter and a mapping variable?

A mapping parameter ($$ParamName) is set once before a session runs and stays constant during execution. A mapping variable ($$VarName) can change in value during a session run and its final value is saved back to the repository for use in the next run, making it ideal for incremental loads.

Q7: What are the partitioning types and when do you use them?

Round-Robin distributes data evenly. Hash Auto-Keys ensures same-key rows go to the same partition, which is required before Aggregator or Joiner. Database Partitioning reads natively from partitioned DB tables. Key Range splits by explicit key boundaries. Pass-Through is a no-op used at pipeline boundaries.

Q8: What is pushdown optimization?

PDO pushes transformation logic to the source or target database, reducing data movement over the network. Instead of pulling data into the Integration Service for transformation, the IS generates SQL or stored procedures and runs them directly in the database engine.

Q9: What is the difference between a session and a workflow?

A session is a task that runs a single mapping. A workflow orchestrates multiple tasks (sessions, commands, emails, decisions) in a defined order, managing the full ETL pipeline end-to-end.

Q10: How do you handle errors and rejects in PowerCenter?

Set an error threshold in session properties. Rejected rows are written to a reject (.bad) file. Session and workflow logs capture error details. Use Pre/Post-session SQL for cleanup. Enable session recovery so the Integration Service can resume from the last checkpoint rather than restarting from scratch.


Quick-Reference Best Practices

  • Always validate mappings before saving and deploying to production.

  • Use parameter files for environment-specific configurations such as source paths, DB connections, and date ranges.

  • Enable persistent lookup caches for frequently queried, rarely changing tables.

  • Schedule workflows during off-peak hours for large data loads.

  • Use Reusable Transformations for common logic to avoid duplication.

  • Monitor session logs regularly; set up Email tasks for failure alerts.

  • Use version control in the repository; check out objects before editing.

  • Document mappings with mapping-level and transformation-level descriptions.

  • Apply bulk loading for large target inserts when rollback is not required.

  • Test with sample data in development and validate row counts match expectations end-to-end.

Home Videos Quiz Blog