Salesforce Data Cloud (formerly CDP) has matured from a marketing tool into a platform-wide data foundation. The architectural decisions made early - around data streams, identity resolution, and calculated insights - have enormous downstream impact.
What Data Cloud Actually Is
Data Cloud is a native Salesforce data platform that:
- Ingests data from any source (S3, Snowflake, Salesforce orgs, real-time API streams)
- Unifies customer profiles via identity resolution across email, phone, CookieID, ECID
- Activates unified profiles into Salesforce CRM, Marketing Cloud, Commerce Cloud, and external platforms
It runs inside your Salesforce org but has its own metadata model. Ingested data lands in Data Lake Objects (DLOs), which you then map to Data Model Objects (DMOs) in the standard Customer 360 data model.
Data Stream Design Patterns
The first architectural decision is how to structure your data streams. There are three categories:
- Salesforce Bundles - pre-built connectors for Sales Cloud, Service Cloud, Commerce Cloud. These are the easiest to set up, but review the field mappings they auto-generate before you build on them.
- Cloud Storage - S3, Azure Blob, GCS. Ideal for high-volume transactional data (orders, events, web sessions). Use Parquet format at scale; CSV works for smaller datasets.
- API & Streaming - real-time events via the Ingestion API. Critical for behavioural signals (page views, cart events) where latency matters for personalisation.
S3 (order history) ──────────────┐
Commerce Cloud (live orders) ────┼──► Unified Individual (Identity Resolution)
Marketing Cloud (email events) ──┤                       │
Web Analytics (sessions) ────────┘                       ▼
                                                Calculated Insights
                                                         │
                                              Segments / Activations
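To make the API & Streaming category concrete, here is a minimal sketch of building a streaming ingestion request. The tenant endpoint, connector name, object name, and event fields are all hypothetical placeholders, and the URL path reflects my understanding of the Ingestion API's streaming endpoint, so check it against the current documentation before relying on it. The request is built but never sent.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own tenant endpoint and connector names.
TENANT_ENDPOINT = "your-tenant.example.com"
SOURCE_API_NAME = "web_events"   # Ingestion API connector name (assumed)
OBJECT_NAME = "cart_event"       # object from the connector schema (assumed)


def cart_event(session_id: str, email: str, sku: str, occurred_at: str) -> dict:
    """Shape one behavioural event; the field names are illustrative only."""
    return {
        "session_id": session_id,
        "email": email.strip().lower(),  # normalise early so exact match rules can fire
        "sku": sku,
        "event_type": "add_to_cart",
        "occurred_at": occurred_at,      # ISO 8601 timestamp
    }


def build_streaming_request(events: list, access_token: str) -> urllib.request.Request:
    """Build, but do not send, a POST carrying a batch of events."""
    url = (
        f"https://{TENANT_ENDPOINT}/api/v1/ingest/sources/"
        f"{SOURCE_API_NAME}/{OBJECT_NAME}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"data": events}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


request = build_streaming_request(
    [cart_event("sess-42", " Ana@Example.com ", "SKU-1001", "2024-04-01T09:30:00Z")],
    access_token="<token>",
)
```

Normalising the email at the point of capture is a design choice rather than a requirement: it keeps the identity rules downstream simple.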
Identity Resolution: The Hard Part
Identity resolution is where most projects stumble. The rules you configure determine whether a customer browsing on mobile and purchasing on desktop gets unified into one profile or counted as two.
Match rules operate in priority order. A typical configuration:
| Priority | Rule | Field |
|---|---|---|
| 1 | Exact | Email (normalised lower-case) |
| 2 | Exact | Phone (E.164 format) |
| 3 | Fuzzy | Name + PostalCode |
| 4 | Probabilistic | Device graph (ECID + CookieID) |
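The priority ordering in the table can be illustrated with a small sketch. This is not how Data Cloud implements matching; it only shows the idea that the first rule to match decides the outcome. The `Profile` class and rule functions are hypothetical, the "fuzzy" rule is a crude normalised comparison standing in for real fuzzy matching, and the probabilistic device-graph rule is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    email: Optional[str] = None
    phone: Optional[str] = None        # assumed already normalised to E.164
    last_name: Optional[str] = None
    postal_code: Optional[str] = None


def same_email(a: Profile, b: Profile) -> bool:
    if not (a.email and b.email):
        return False
    return a.email.strip().lower() == b.email.strip().lower()


def same_phone(a: Profile, b: Profile) -> bool:
    return bool(a.phone and b.phone) and a.phone == b.phone


def same_name_and_postcode(a: Profile, b: Profile) -> bool:
    # Stand-in for fuzzy matching: ignore case and whitespace only.
    if not (a.last_name and b.last_name and a.postal_code and b.postal_code):
        return False
    same_name = a.last_name.strip().casefold() == b.last_name.strip().casefold()
    same_code = (
        a.postal_code.replace(" ", "").upper()
        == b.postal_code.replace(" ", "").upper()
    )
    return same_name and same_code


# (priority, label, rule) - lower priority number is evaluated first.
MATCH_RULES = [
    (1, "email", same_email),
    (2, "phone", same_phone),
    (3, "name+postcode", same_name_and_postcode),
]


def first_matching_rule(a: Profile, b: Profile) -> Optional[str]:
    """Return the label of the highest-priority rule that matches, or None."""
    for _, label, rule in sorted(MATCH_RULES, key=lambda r: r[0]):
        if rule(a, b):
            return label
    return None


mobile = Profile(email="Ana@Example.com", phone="+447444123456")
desktop = Profile(email="ana@example.com ")
matched_by = first_matching_rule(mobile, desktop)
```

Here the mobile and desktop profiles unify on the email rule, so the lower-priority rules are never consulted.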
Key mistakes I’ve seen:
- Using phone as a primary key without normalising to E.164 first - +44 7444 123456, 07444123456, and 447444123456 all represent the same number but won’t match
- Running probabilistic device matching too aggressively and merging household members into a single profile
- Not accounting for shared email addresses (family accounts, info@ addresses)
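The phone normalisation problem is easy to demonstrate. The sketch below is a deliberately simple normaliser, assuming a single default country code and a UK-style leading `0` trunk prefix; the `to_e164` function is hypothetical, and the "starts with the country code" heuristic can misfire on national numbers that happen to begin with those digits. A production pipeline would more likely use a dedicated library such as `phonenumbers`.

```python
import re


def to_e164(raw: str, default_country_code: str = "44") -> str:
    """Normalise a phone number to E.164 under the assumptions stated above."""
    digits = re.sub(r"\D", "", raw)
    if not digits:
        raise ValueError(f"no digits in phone number: {raw!r}")
    if raw.strip().startswith("+"):
        return "+" + digits                               # already international
    if digits.startswith("00"):
        return "+" + digits[2:]                           # 00 international prefix
    if digits.startswith("0"):
        return "+" + default_country_code + digits[1:]    # drop trunk prefix
    if digits.startswith(default_country_code):
        return "+" + digits                               # country code, missing "+"
    return "+" + default_country_code + digits


variants = ["+44 7444 123456", "07444123456", "447444123456"]
normalised = {to_e164(v) for v in variants}
# All three variants collapse to the single value "+447444123456".
```

Run this before the data reaches the match rules, so the exact phone rule compares like with like.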
Reconciliation rules determine which field value “wins” when two profiles merge. For Commerce Cloud, set order history to Most Recent and loyalty tier to Most Frequent (not most recent - customers shouldn’t lose status from one bad month).
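The difference between the two reconciliation strategies shows up clearly on a toy history. This is an illustrative sketch, not Data Cloud's implementation: the function names are hypothetical, and breaking a frequency tie by recency is my own assumption.

```python
from collections import Counter
from datetime import date


def most_recent(observations: list) -> str:
    """Pick the value with the latest observation date."""
    return max(observations, key=lambda obs: obs[0])[1]


def most_frequent(observations: list) -> str:
    """Pick the most common value; ties go to the most recently seen (assumed)."""
    counts = Counter(value for _, value in observations)
    top = max(counts.values())
    tied = {value for value, count in counts.items() if count == top}
    candidates = [obs for obs in observations if obs[1] in tied]
    return max(candidates, key=lambda obs: obs[0])[1]


# (observed_on, loyalty_tier) pairs gathered from the profiles being merged.
tier_history = [
    (date(2024, 1, 1), "Gold"),
    (date(2024, 2, 1), "Gold"),
    (date(2024, 3, 1), "Gold"),
    (date(2024, 4, 1), "Silver"),   # one bad month
]

by_recency = most_recent(tier_history)      # "Silver"
by_frequency = most_frequent(tier_history)  # "Gold"
```

With Most Recent the customer drops to Silver after a single weak month; with Most Frequent they keep Gold.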
Calculated Insights
Calculated Insights (CI) are ANSI SQL queries that run on Data Cloud’s data lake and materialise their results as Calculated Insight Objects that segments and activations can reference. They’re the right place to compute metrics that span sources.
-- Lifetime value across online and in-store
SELECT
    ind.ssot__Id__c AS individualId,
    SUM(ord.TotalAmount__c) AS lifetimeValue,
    COUNT(DISTINCT ord.Id__c) AS orderCount,
    MAX(ord.OrderedDate__c) AS lastOrderDate,
    DATEDIFF(day, MAX(ord.OrderedDate__c), CURRENT_DATE) AS daysSinceLastOrder
FROM ssot__Individual__dlm ind
JOIN UnifiedOrderLineItem__dlm ord
    ON ind.ssot__Id__c = ord.IndividualId__c
GROUP BY ind.ssot__Id__c
Run CI on a scheduled cadence (hourly for active segments, daily for historical). Avoid materialising raw transaction rows - aggregate at the CI layer.
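To sanity-check what an insight like this should return, it helps to run the same aggregation on a few toy rows. The sketch below mirrors the metrics in plain Python; the row shape is hypothetical, and it assumes each line-item row carries its parent order id so that orders are counted once even when they span several line items.

```python
from datetime import date

# Toy line-item rows; o-1 has two line items.
line_items = [
    {"individual_id": "ind-1", "order_id": "o-1", "amount": 120.0, "ordered": date(2024, 1, 10)},
    {"individual_id": "ind-1", "order_id": "o-1", "amount": 30.0, "ordered": date(2024, 1, 10)},
    {"individual_id": "ind-1", "order_id": "o-2", "amount": 80.0, "ordered": date(2024, 3, 1)},
    {"individual_id": "ind-2", "order_id": "o-3", "amount": 45.5, "ordered": date(2024, 2, 15)},
]


def lifetime_metrics(rows: list, today: date) -> dict:
    """Aggregate per individual: value, distinct orders, recency."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(
            row["individual_id"],
            {"lifetimeValue": 0.0, "order_ids": set(), "lastOrderDate": row["ordered"]},
        )
        entry["lifetimeValue"] += row["amount"]
        entry["order_ids"].add(row["order_id"])
        entry["lastOrderDate"] = max(entry["lastOrderDate"], row["ordered"])

    return {
        individual_id: {
            "lifetimeValue": entry["lifetimeValue"],
            "orderCount": len(entry["order_ids"]),
            "lastOrderDate": entry["lastOrderDate"],
            "daysSinceLastOrder": (today - entry["lastOrderDate"]).days,
        }
        for individual_id, entry in grouped.items()
    }


metrics = lifetime_metrics(line_items, today=date(2024, 4, 1))
# ind-1: lifetimeValue 230.0, orderCount 2, daysSinceLastOrder 31
# ind-2: lifetimeValue 45.5,  orderCount 1, daysSinceLastOrder 46
```

Note that the output is one row per individual, which is the aggregation level the advice above recommends.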
Segmentation for Commerce Cloud
Once profiles are unified and insights are calculated, segmentation is fast. The key patterns for Commerce:
- Lapsed purchasers - daysSinceLastOrder > 90 AND lifetimeValue > 200 → reactivation flow
- High LTV at risk - lifetimeValue > 500 AND daysSinceLastOrder > 45 → winback with loyalty offer
- Browse-to-buy propensity - combine web session data with purchase history for predictive scoring
Segments activate into Marketing Cloud journeys via the native activation target, or into Commerce Cloud’s personalisation engine via Data Actions.
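The first two segment rules overlap: a customer with lifetime value above 500 who has not ordered for more than 90 days satisfies both. The sketch below expresses the rules as predicates and picks one flow per customer. The class, function, and flow names are hypothetical, and giving the winback offer precedence is my assumption, not something the platform decides for you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insight:
    lifetime_value: float
    days_since_last_order: int


def is_lapsed_purchaser(i: Insight) -> bool:
    return i.days_since_last_order > 90 and i.lifetime_value > 200


def is_high_ltv_at_risk(i: Insight) -> bool:
    return i.lifetime_value > 500 and i.days_since_last_order > 45


def assign_flow(i: Insight) -> Optional[str]:
    """Choose at most one flow; the high-LTV winback wins when both rules match."""
    if is_high_ltv_at_risk(i):
        return "winback_loyalty_offer"
    if is_lapsed_purchaser(i):
        return "reactivation"
    return None


both = Insight(lifetime_value=600.0, days_since_last_order=120)
overlap = is_lapsed_purchaser(both) and is_high_ltv_at_risk(both)  # True
chosen = assign_flow(both)                                         # "winback_loyalty_offer"
```

Without an explicit precedence (or mutually exclusive thresholds), the same customer can land in two journeys at once.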
Data Actions and Real-Time Activation
Data Actions fire when a profile enters or exits a segment, or when a calculated insight crosses a threshold. Common uses:
- Triggering a Service Cloud case when a high-value customer raises a complaint (cross-cloud signal)
- Updating a custom field on the Account to reflect LTV tier (so Sales reps see it in their UI)
- Calling an external webhook to update a loyalty platform
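Data Cloud evaluates these triggers itself, but the semantics are worth pinning down when you design the receiving side. The sketch below illustrates the two trigger types on plain Python values: entry and exit events derived from two membership snapshots, and an upward threshold crossing. All names are hypothetical.

```python
def membership_events(previous: set, current: set) -> list:
    """Return ("entered" | "exited", profile_id) events between two snapshots."""
    entered = sorted(current - previous)
    exited = sorted(previous - current)
    return [("entered", pid) for pid in entered] + [("exited", pid) for pid in exited]


def crossed_up(previous_value: float, current_value: float, threshold: float) -> bool:
    """True only on the refresh where the value first reaches the threshold."""
    return previous_value < threshold <= current_value


yesterday = {"ind-1", "ind-2"}
today = {"ind-2", "ind-3"}
events = membership_events(yesterday, today)
# [("entered", "ind-3"), ("exited", "ind-1")]

fires_once = crossed_up(480.0, 520.0, threshold=500.0)       # True
does_not_refire = crossed_up(520.0, 560.0, threshold=500.0)  # False
```

The point of `crossed_up` is that an action should fire on the transition, not on every refresh where the value remains above the threshold.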
Governance and Data Retention
Data Cloud stores data in a managed data lake with retention policies configurable per data stream. Default is 90 days for streaming data; you can extend to 3 years for transactional records.
Key governance decisions:
- Define your retention policy before going live - retroactively deleting data is possible but expensive
- Map every data stream to a data category (personal, sensitive, anonymous) in the Data Cloud consent model
- Use individual-level consent objects to honour GDPR deletion requests - Data Cloud has a native Delete API that removes an individual’s data across all unified profiles
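One way to enforce the first two decisions is a pre-launch check over your own inventory of data streams. The sketch below is a hypothetical checklist script, not a Data Cloud API: the `StreamPolicy` shape and stream names are invented for illustration, and the 3-year ceiling (taken as 1,095 days) comes from the retention figure quoted above.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_CATEGORIES = {"personal", "sensitive", "anonymous"}
MAX_RETENTION_DAYS = 3 * 365  # 3-year ceiling quoted above


@dataclass(frozen=True)
class StreamPolicy:
    name: str
    category: Optional[str]
    retention_days: Optional[int]


def governance_gaps(streams: list) -> list:
    """List human-readable problems that should block go-live."""
    gaps = []
    for s in streams:
        if s.category not in ALLOWED_CATEGORIES:
            gaps.append(f"{s.name}: missing or unknown data category")
        if s.retention_days is None:
            gaps.append(f"{s.name}: no retention policy defined")
        elif not 0 < s.retention_days <= MAX_RETENTION_DAYS:
            gaps.append(f"{s.name}: retention of {s.retention_days} days is out of range")
    return gaps


inventory = [
    StreamPolicy("web_sessions", "anonymous", 90),
    StreamPolicy("order_history", "personal", 1095),
    StreamPolicy("support_transcripts", None, None),
]

problems = governance_gaps(inventory)
```

Here only `support_transcripts` is flagged, once for its missing category and once for its missing retention policy.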
Performance Considerations
Data Cloud query performance degrades predictably with segment complexity. Rules of thumb:
- Segments referencing only DMO attributes: sub-second refresh
- Segments joining to Calculated Insights: 1–5 minutes per million profiles
- Segments using Related Attribute filters across multiple DMOs: can hit timeout at scale - denormalise into a CI first
The unified individual count is your primary cost driver. Model your identity resolution rules conservatively to start - you can loosen them as data quality improves.
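The middle rule of thumb turns into a quick capacity estimate. This is only arithmetic on the 1–5 minutes per million profiles figure quoted above, assuming the cost scales linearly with profile count; the function name is hypothetical.

```python
def ci_segment_refresh_minutes(profile_count: int) -> tuple:
    """Return (best_case, worst_case) minutes at 1-5 minutes per million profiles."""
    millions = profile_count / 1_000_000
    return (1 * millions, 5 * millions)


best, worst = ci_segment_refresh_minutes(12_000_000)
# best == 12.0 minutes, worst == 60.0 minutes
```

At 12 million unified profiles, a CI-joined segment could take up to an hour to refresh, which rules out an hourly cadence for that segment.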
Integration with the Rest of the Stack
Data Cloud sits at the centre of a modern Salesforce architecture:
Marketing Cloud ◄──── Activation Targets ────► Service Cloud
                               │
        Commerce Cloud ◄┤ Data Cloud ├► Einstein Personalisation
                               │
      External (CDW, BI tools) via Data Share (Snowflake)
The Data Share feature lets you expose your unified profiles directly to Snowflake without export - a significant advantage if your analytics team already lives there.