IBM data + AI — field notes

1

Data architecture

Storage paradigms

Where data lives, how it's shaped, and who it serves. The foundation before anything else makes sense.

Data lake, Warehouse, Lakehouse, Data mart, OLTP

Key points

Store everything: a data lake is raw, cheap, and schema on read. Best for ML and exploration.

Curate first: a warehouse is structured, with fast SQL but expensive ETL. Best for BI dashboards.

Best of both: a lakehouse (Delta/Iceberg) combines ACID with cheap storage. Where modern stacks are heading.

Your context: Andromeda touches Db2 (OLTP). Data flows from OLTP → lake/warehouse, not the reverse.
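
The schema-on-read versus curate-first distinction can be sketched in a few lines of Python. This is a minimal illustration, not how any IBM product implements it: the event data, field names, and function names are all hypothetical. The lake-style writer stores raw lines untouched and leaves validation to the reader, while the warehouse-style writer validates and casts before storing.

```python
import json

# Hypothetical raw events as they might land in a data lake (illustrative only).
RAW_EVENTS = [
    '{"order_id": 1, "amount": "19.99", "note": "gift"}',
    '{"order_id": 2, "amount": "5.00"}',
    '{"order_id": "three", "amount": "oops"}',
]


def lake_write(store, line):
    """Schema on read: keep the raw line as-is; nothing is validated on write."""
    store.append(line)


def lake_read_amounts(store):
    """The reader applies a schema at query time and handles bad rows itself."""
    amounts = []
    for line in store:
        record = json.loads(line)
        try:
            amounts.append(float(record["amount"]))
        except (KeyError, ValueError):
            continue  # malformed rows only surface now, at read time
    return amounts


def warehouse_write(table, line):
    """Schema on write: validate and cast first; bad rows raise before storage."""
    record = json.loads(line)
    row = {"order_id": int(record["order_id"]), "amount": float(record["amount"])}
    table.append(row)


lake, warehouse, rejected = [], [], []
for event in RAW_EVENTS:
    lake_write(lake, event)
    try:
        warehouse_write(warehouse, event)
    except (KeyError, ValueError):
        rejected.append(event)
```

Here the lake accepts all three events and the cost of the malformed one is paid by every reader; the warehouse rejects it once, at load time, which is the "expensive ETL" trade-off.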

2

IBM platform

watsonx product map

watsonx is a brand umbrella: three core pillars, two product areas under data (data integration and data intelligence), and two distinct orchestration offerings that are easy to confuse.

watsonx.ai, watsonx.data, watsonx.governance, Cloud Pak, Orchestrate

Key points

Three pillars: watsonx.ai (models), watsonx.data (lakehouse), watsonx.governance (risk + compliance).

Your team's layer: data integration and data intelligence live under Cloud Pak for Data (StreamSets, DataStage, Databand).

Name collision: watsonx Orchestrate ≠ Orchestration Pipelines. The first is a product; the second is a platform feature.

It's a brand: watsonx is marketing. The actual products are distinct, separately licensed, and separately deployed.

3

Deployment model

On-prem vs SaaS

Same products, completely different stacks underneath. Enterprise clients often run on-prem for data residency. IBM supports hybrid.

Cloud Pak for Data, OpenShift, IBM Cloud, VPC licensing, Hybrid

Key points

On-prem stack: Red Hat OpenShift → Cloud Pak for Data → watsonx products. You manage everything.

SaaS stack: IBM Cloud → watsonx as a Service. IBM manages the infrastructure; you subscribe and use.

Licensing differs: on-prem uses VPCs (Virtual Processor Cores, fixed capacity); SaaS uses resource units / tokens (consumption-based).

Andromeda relevance: enterprise Db2 clients are often on-prem. Design must work in self-managed, constrained environments.
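
The fixed-capacity versus consumption-based split in the licensing point can be sketched as simple arithmetic. Every number and name below is a hypothetical placeholder, not an IBM price or metric definition, and real entitlements involve more terms than a single rate. The sketch only shows the shape of the trade-off: fixed cost is flat regardless of usage, consumption cost grows with usage, and the two cross at a break-even point.

```python
def fixed_capacity_cost(licensed_cores, price_per_core):
    """Fixed capacity: you pay for licensed cores whether or not they are busy."""
    return licensed_cores * price_per_core


def consumption_cost(units_used, price_per_unit):
    """Consumption-based: cost scales with the units actually used."""
    return units_used * price_per_unit


def break_even_units(licensed_cores, price_per_core, price_per_unit):
    """Usage level at which consumption pricing costs the same as fixed capacity."""
    return fixed_capacity_cost(licensed_cores, price_per_core) / price_per_unit


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
LICENSED_CORES = 16
PRICE_PER_CORE = 1000   # assumed cost per core per period
PRICE_PER_UNIT = 2      # assumed cost per consumed unit

fixed = fixed_capacity_cost(LICENSED_CORES, PRICE_PER_CORE)
light_usage = consumption_cost(5000, PRICE_PER_UNIT)
crossover = break_even_units(LICENSED_CORES, PRICE_PER_CORE, PRICE_PER_UNIT)
```

With these placeholder values, usage below the crossover favors consumption pricing and usage above it favors fixed capacity; data residency and operational constraints usually matter more than this arithmetic when choosing a deployment model.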

4

Integration tooling

Data integration tools

What moves data, what sequences it, and what watches it. Three distinct jobs that are easy to conflate.

DataStage, StreamSets, Databand, Kafka, Flink

Key points

DataStage: 30-year-old enterprise batch ETL. Deep Db2 integration. Runs on Cloud Pak. Your team's workhorse.

StreamSets: modern batch + streaming. Drift detection. K8s-native. IBM acquired it from Software AG (deal announced December 2023, closed 2024). Your other project.

Databand: not a pipeline tool but an observability tool. It watches pipelines run and alerts on quality issues.

Kafka / Flink: open-source engines. StreamSets can sit on top of these for streaming workloads.
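
The three jobs named above (moving data, sequencing it, watching it) can be kept apart in a toy Python sketch. None of this is the API of DataStage, StreamSets, or Databand; all function names, the sample rows, and the 25% drop threshold are assumptions made for illustration. The point is only that the step functions, the runner, and the quality check are separate pieces with separate responsibilities.

```python
from dataclasses import dataclass


@dataclass
class StepRun:
    """One observed execution of a step: what went in and what came out."""
    name: str
    rows_in: int
    rows_out: int


# Job 1, moving data: plain transformation steps (the ETL tool's role).
def drop_missing_amounts(rows):
    return [r for r in rows if r["amount"] is not None]


def tag_source(rows):
    return [{**r, "source": "db2"} for r in rows]


# Job 2, sequencing: run the steps in order and record each run.
def run_in_order(steps, rows, log):
    for step in steps:
        out = step(rows)
        log.append(StepRun(step.__name__, len(rows), len(out)))
        rows = out
    return rows


# Job 3, watching: inspect the run log and alert on suspicious row loss.
def quality_alerts(log, max_drop_ratio=0.25):
    alerts = []
    for run in log:
        dropped = run.rows_in - run.rows_out
        if run.rows_in > 0 and dropped / run.rows_in > max_drop_ratio:
            alerts.append(f"{run.name} dropped {dropped} of {run.rows_in} rows")
    return alerts


source_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 7.5},
]
run_log = []
result = run_in_order([drop_missing_amounts, tag_source], source_rows, run_log)
alerts = quality_alerts(run_log)
```

The watcher never touches the data itself; it only reads the run log, which is why an observability tool can sit beside a pipeline tool without replacing it.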

5

Career development

Band progression

Band 6 is the start. Foundation cert is the first gate. Two project write-ups, 13 points, manager review, SME sign-off.

Band 6 → 7, Foundation cert, 13 points, Giveback, Rubric

Key points

Foundation first: Level 1 cert is the Band 6→7 gate. Write two project stories that explicitly map to the rubric skills.

13 points: points accumulate across skill categories. Read the rubric before writing, then reverse-engineer your stories to it.

Giveback counts: teaching, speaking, IC contributions signal readiness. Start collecting evidence now.

Your edge: engineering background = credible technical leadership claims from day one. Use it explicitly in your write-up.