IBM data integration · field notes · 2026

Data & AI landscape

1

Storage paradigms

Where data lives, how it's shaped, and who it serves. The foundation before anything else makes sense.

Data lake · Warehouse · Lakehouse · Data mart · OLTP
Diagram: data storage paradigms compared (data lake, data warehouse, data lakehouse, data mart, operational database).

Data lake: raw, unstructured or semi-structured data. Store everything, schema on read. Cheap storage (S3, ADLS, GCS). Best for: ML, EDA, archival.
Data warehouse: structured, curated, schema on write. ETL pipeline into rigid tables. Fast analytics (Snowflake, Redshift). Best for: BI, reporting, dashboards.
Data lakehouse (lake and warehouse merge into it): open table formats (Delta Lake, Iceberg, Hudi) on cheap object storage. ACID transactions + columnar layout + schema enforcement. Supports SQL analytics AND ML workloads from one store. Databricks, Snowflake, BigLake. Best for: unified analytics platform.
Data mart: subset of warehouse/lakehouse, scoped to one team or domain. Best for: self-serve domain BI.
Operational DB (OLTP): Postgres, MySQL, Db2 etc. Row-store, low latency, CRUD. Best for: app transactions. Feeds the analytical stores via CDC / ETL.

Axis 1: ← more flexibility (schema-on-read) · structure + governance (schema-on-write) →
Axis 2: ← cheaper, slower queries · faster queries, higher cost →
Key points
Store everything. Data lake: raw, cheap, schema on read. Best for ML and exploration.
Curate first. Warehouse: structured, fast SQL, expensive ETL. Best for BI dashboards.
Best of both. Lakehouse (Delta/Iceberg): ACID + cheap storage. Where modern stacks are heading.
Your context. Andromeda touches Db2 (OLTP). Data flows from OLTP → lake/warehouse, not the reverse.
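
To make the schema-on-read vs schema-on-write split concrete, here is a minimal Python sketch. It is an illustration only: the sample events and function names are made up, and an in-memory SQLite table stands in for a warehouse (real warehouses enforce types more strictly than SQLite does).

```python
import json
import sqlite3

# Hypothetical raw events, as they might land in a lake: shapes differ per record.
RAW_EVENTS = [
    '{"order_id": 1, "amount": 19.99, "currency": "USD"}',
    '{"order_id": 2, "amount": "5.00"}',
    '{"order_id": 3, "note": "amount missing"}',
]


def read_amounts_schema_on_read(raw_lines):
    """Lake style: keep every record as-is, apply a schema only when reading."""
    amounts = []
    for line in raw_lines:
        record = json.loads(line)
        if "amount" in record:
            amounts.append(float(record["amount"]))  # coerce at read time
    return amounts


def load_schema_on_write(raw_lines):
    """Warehouse style: the table's constraints are checked at insert time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    rejected = []
    for line in raw_lines:
        record = json.loads(line)
        try:
            conn.execute(
                "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                (record["order_id"], record.get("amount")),
            )
        except sqlite3.IntegrityError:
            # A missing amount violates NOT NULL, so the row never lands.
            rejected.append(record["order_id"])
    total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    conn.close()
    return total, rejected
```

In the lake-style path the malformed record is still stored and the reader decides what to do with it. In the warehouse-style path it is rejected before it reaches the table, which is where the ETL cost in the notes comes from.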
2

watsonx product map

watsonx is a brand umbrella. Three core pillars, two sub-products under data, two distinct orchestration things that are easy to confuse.

watsonx.ai · watsonx.data · watsonx.governance · Cloud Pak · Orchestrate
Diagram: watsonx product map. Three core pillars (watsonx.ai, watsonx.data, watsonx.governance), watsonx.orchestrate as a separate product, and Orchestration Pipelines as a platform feature. watsonx.data includes the watsonx.data integration and watsonx.data intelligence sub-products.

watsonx: IBM AI + data platform. A brand umbrella, not a single product.

watsonx.ai: model studio. Train, tune, deploy models. IBM Granite + open-source.
watsonx.data: open data lakehouse. A catalog that knows where data lives. Doesn't store it, connects to it.
watsonx.governance: AI lifecycle governance. Risk, compliance, bias monitoring. Governs Gen AI + ML models.

Cloud Pak for Data (data hub): sub-products under watsonx.data
watsonx.data integration: moving data. StreamSets, DataStage, DataBand, unstructured data.
watsonx.data intelligence: data quality + governance. Data quality, data governance, data product hub.

Separate products in the watsonx portfolio
watsonx.orchestrate: AI agent control plane. Build + deploy + manage agents. 700+ enterprise system connectors. HR, sales, procurement use cases.
Orchestration Pipelines: feature inside Cloud Pak for Data. Sequences DataStage jobs + ML models. Graphical canvas, train → deploy flow. Not the same as watsonx.orchestrate.

What you clarified
watsonx.orchestrate = IBM product, business agent automation (HR, sales, procurement).
Orchestration Pipelines = platform feature, sequences ML + DataStage jobs.
Same name family, completely different things.
To confirm: which one your team means by "standalone orchestrate".
Key points
Three pillars. watsonx.ai (models), watsonx.data (lakehouse), watsonx.governance (risk + compliance).
Your team's layer. Data integration + data intelligence live under Cloud Pak for Data: StreamSets, DataStage, DataBand.
Name collision. watsonx.orchestrate ≠ Orchestration Pipelines. One is a product, one is a platform feature.
It's a brand. watsonx is marketing. The actual products are distinct, separately licensed, separately deployed.
3

On-prem vs SaaS

Same products, completely different stacks underneath. Enterprise clients often run on-prem for data residency. IBM supports hybrid.

Cloud Pak for Data · OpenShift · IBM Cloud · VPC licensing · Hybrid
Diagram: IBM watsonx on-premises (Cloud Pak for Data on Red Hat OpenShift) vs SaaS (IBM Cloud), with a hybrid zone in the middle.

Runtime + infrastructure
On-premises / self-managed: Red Hat OpenShift. Kubernetes platform you install + manage.
SaaS (IBM Cloud): IBM Cloud (managed). IBM runs the infra, you just use it.

Platform layer
On-premises: Cloud Pak for Data. Software you install on OpenShift.
SaaS: watsonx as a Service. Fully managed, subscribe and go.

watsonx products available
On-premises: watsonx.ai, watsonx.data, watsonx.governance, plus DataStage · StreamSets · DataBand.
SaaS: watsonx.ai, watsonx.data, watsonx.governance, watsonx.orchestrate.

Licensing model
On-premises: VPC (virtual processor cores). Capacity you buy upfront, fixed.
SaaS: resource units / token consumption. Pay for what you use, scales up/down.

Key tradeoffs
On-premises: + full data residency control; + works air-gapped / offline; - you manage infra + upgrades.
SaaS: + IBM manages everything; + always on latest version; - data leaves your perimeter.

IBM supports hybrid: e.g. run DataStage on-prem for sensitive data, use watsonx.governance SaaS for model metadata.
Key points
On-prem stack. Red Hat OpenShift → Cloud Pak for Data → watsonx products. You manage everything.
SaaS stack. IBM Cloud → watsonx as a Service. IBM manages infra, you subscribe and use.
Licensing differs. On-prem: VPCs (fixed capacity). SaaS: resource units / tokens (consumption-based).
Andromeda relevance. Enterprise Db2 clients are often on-prem. Design must work in self-managed, constrained environments.
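
The fixed-capacity vs consumption tradeoff comes down to a break-even calculation. The Python sketch below shows only the arithmetic. Every price and quantity is a placeholder parameter, not an IBM list price, and the function names are invented for illustration.

```python
def fixed_capacity_cost(cores: int, price_per_core: float) -> float:
    """Capacity-style licensing: pay for provisioned cores regardless of usage."""
    return cores * price_per_core


def consumption_cost(units_used: float, price_per_unit: float) -> float:
    """Consumption-style licensing: pay only for the units actually used."""
    return units_used * price_per_unit


def break_even_units(cores: int, price_per_core: float, price_per_unit: float) -> float:
    """Usage level at which both models cost the same for one billing period."""
    return fixed_capacity_cost(cores, price_per_core) / price_per_unit


# Placeholder numbers only: 8 cores at 100.0 each vs 0.5 per consumed unit.
print(break_even_units(8, 100.0, 0.5))
```

Below the break-even usage the consumption model is cheaper. Above it, fixed capacity wins, provided the purchased cores can actually absorb the load.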
4

Data integration tools

What moves data, what sequences it, and what watches it. Three distinct jobs that are easy to conflate.

DataStage · StreamSets · DataBand · Kafka · Flink
Diagram: data integration tools compared (DataStage, DataBand, StreamSets, and related pipeline tools), split into batch / ETL and streaming / real-time.

Batch / ETL
IBM DataStage: enterprise batch ETL, roughly 30 years old. Visual flow canvas, parallel jobs. Tight Db2 + mainframe integration. Runs on IBM Cloud Pak for Data.
DataBand (IBM): pipeline observability & monitoring. Not a pipeline tool, it tracks them. Data quality, lineage, anomaly alerts. Works alongside DataStage, Airflow.

Streaming / real-time
StreamSets (IBM): modern batch + streaming pipelines. Smart pipelines with drift detection. Deploy via Docker or Kubernetes. Acquired by IBM in 2024 (from Software AG).
Kafka / Flink / Spark: open-source streaming engines. Kafka = message bus, Flink = compute. StreamSets can sit on top of these. More DIY, more control.

How they relate to each other
Source DB → DataStage / StreamSets → Destination. DataBand observes the pipeline from the side.

Key distinctions
DataStage moves data (batch, enterprise, Db2-native).
StreamSets moves data (modern, streaming-capable, K8s-native).
DataBand watches data pipelines. It's observability, not movement.
Key points
DataStage. Roughly 30-year-old enterprise batch ETL. Deep Db2 integration. Runs on Cloud Pak for Data. Your team's workhorse.
StreamSets. Modern batch + streaming. Drift detection. K8s-native. IBM acquired it in 2024 (from Software AG). Your other project.
DataBand. Not a pipeline tool: it's observability. Watches pipelines run. Alerts on quality issues.
Kafka / Flink. Open-source engines. StreamSets can sit on top of these for streaming workloads.
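
The "moves data" vs "watches data" split can be sketched in a few lines of Python. This is a toy model under stated assumptions: `move`, `observe`, and `RunReport` are hypothetical names, and nothing here reflects the actual DataStage, StreamSets, or DataBand APIs. It shows only that the watcher wraps the mover without changing what gets moved, and that schema drift can be flagged by comparing field names against an expected set.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class RunReport:
    """What the watcher records about one pipeline run."""
    rows_in: int = 0
    rows_out: int = 0
    drifted_fields: set = field(default_factory=set)


def move(rows: Iterable[dict], transform: Callable[[dict], dict]) -> list:
    """The mover: read rows, transform them, hand them to the destination."""
    return [transform(row) for row in rows]


def observe(rows: Iterable[dict], transform: Callable[[dict], dict], expected_fields: set):
    """The watcher: runs the mover unchanged and records what happened."""
    rows = list(rows)
    report = RunReport(rows_in=len(rows))
    for row in rows:
        # Symmetric difference: fields that appeared or went missing.
        report.drifted_fields |= set(row) ^ set(expected_fields)
    output = move(rows, transform)
    report.rows_out = len(output)
    return output, report


# Hypothetical source rows; the second one carries an unexpected "email" field.
source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b", "email": "x@y"}]
out, report = observe(source, lambda r: {**r, "name": r["name"].upper()}, {"id", "name"})
print(report)
```

The report is produced next to the data rather than in its path. That is the sense in which observability tooling sits alongside the movers in the diagram.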
5

Band progression

Band 6 is the start. Foundation cert is the first gate. Two project write-ups, 13 points, manager review, SME sign-off.

Band 6 → 7 · Foundation cert · 13 points · Giveback · Rubric
Diagram: IBM career band progression, band 6 to band 7+. Roadmap of career bands, developer profession certifications, skills assessed, and the application process for promotion from band 6 to band 7.

Bands
Band 6: you are here. Entry level, ~1 yr.
Band 7: cross-team scope.
Band 8: technical leadership.
Band 9+: industry influence.
Participation expands as you grow: Innovation → Responsibility to others → Technical leadership → Industry influence → Rewards + honors

Developer profession certifications
Level 1, Foundation: build + apply skills, deliver client value. Band 6→7.
Level 2, Experienced: understand client needs, high-quality solutions. Band 7→8.
Level 3, Expert: technical leadership + industry giveback. Band 8→9.

Skills assessed
Stay curious · Agile practices · Communication · Problem solving · Technical leadership · Managing risks · Knowledge sharing · Giveback (teaching, speaking, IC) · Formal training · Outside-team participation

How to apply for Foundation certification (band 6 → 7)
1. Write it up: 2 key projects, your role + learnings, cover all skills.
2. Hit 13 pts: points across rubric skills categories. Check the rubric.
3. Manager review: Jonathan reviews + endorses your submission.
4. SME judgment: subject matter experts verify you meet cert requirements.

Tips for your Foundation application
Read the rubric first. Write your project stories to match the skills categories explicitly.
Don't just describe what you did. Explain your role, decisions made, and what you learned.
Start collecting evidence now: contributions, giveback moments, cross-team work.
Your eng background is a strength. Use it to demonstrate technical leadership from day one.
Access the application via IBM's internal career portal (ask Jonathan for the direct link).
Key points
Foundation first. Level 1 cert is the Band 6→7 gate. Write two project stories that explicitly map to the rubric skills.
13 points. Points accumulate across skill categories. Read the rubric before writing, then reverse-engineer your stories to it.
Giveback counts. Teaching, speaking, IC contributions signal readiness. Start collecting evidence now.
Your edge. Engineering background = credible technical leadership claims from day one. Use it explicitly in your write-up.