Building with Salesforce Data Cloud: A Practitioner's Architecture Guide

Salesforce Data Cloud (formerly CDP) has matured from a marketing tool into a platform-wide data foundation. The architectural decisions made early - around data streams, identity resolution, and calculated insights - have enormous downstream impact.

What Data Cloud Actually Is

Data Cloud is a native Salesforce data platform that:

Ingests data from any source (S3, Snowflake, Salesforce orgs, real-time API streams)
Unifies customer profiles via identity resolution across email, phone, CookieID, ECID
Activates unified profiles into Salesforce CRM, Marketing Cloud, Commerce Cloud, and external platforms

It runs inside your Salesforce org but has its own metadata model. Ingested data lands in Data Lake Objects (DLOs), which you then map to Data Model Objects (DMOs) based on the standard Customer 360 data model.

Data Stream Design Patterns

The first architectural decision is how to structure your data streams. There are three categories:

Salesforce Bundles - pre-built connectors for Sales Cloud, Service Cloud, Commerce Cloud. These are the easiest to set up, but you need to understand which field mappings get auto-generated.

Cloud Storage - S3, Azure Blob, GCS. Ideal for high-volume transactional data (orders, events, web sessions). Use Parquet format at scale; CSV works for smaller datasets.

API & Streaming - real-time events via the Ingestion API. Critical for behavioural signals (page views, cart events) where latency matters for personalisation.

S3 (order history) ─────────────┐
Commerce Cloud (live orders) ────┼──► Unified Individual (Identity Resolution)
Marketing Cloud (email events) ──┤         │
Web Analytics (sessions) ────────┘         ▼
                                    Calculated Insights
                                           │
                                    Segments / Activations

Identity Resolution: The Hard Part

Identity resolution is where most projects stumble. The rules you configure determine whether a customer browsing on mobile and purchasing on desktop gets unified into one profile or counted as two.

Match rules operate in priority order. A typical configuration:

Priority	Rule	Field
1	Exact	Email (normalised lower-case)
2	Exact	Phone (E.164 format)
3	Fuzzy	Name + PostalCode
4	Probabilistic	Device graph (ECID + CookieID)
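To make "priority order" concrete, here is a minimal Python sketch of evaluating rules in sequence and reporting the first one that links two records. It is an illustration only, not Data Cloud's matching engine: the record fields, the rule names, the 0.85 similarity threshold and the use of difflib as a stand-in for fuzzy matching are all my assumptions, and the probabilistic device rule is left out.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

Record = dict[str, str]  # hypothetical flat record with already-normalised fields


def _same(field: str) -> Callable[[Record, Record], bool]:
    """Build an exact-match rule on one field; empty values never match."""
    def rule(a: Record, b: Record) -> bool:
        return bool(a.get(field)) and a.get(field) == b.get(field)
    return rule


def _fuzzy_name_postcode(a: Record, b: Record, threshold: float = 0.85) -> bool:
    """Stand-in for fuzzy matching: identical postcode plus similar name."""
    pc_a = a.get("postal_code", "").replace(" ", "").upper()
    pc_b = b.get("postal_code", "").replace(" ", "").upper()
    if not pc_a or pc_a != pc_b:
        return False
    name_a = a.get("name", "").strip().lower()
    name_b = b.get("name", "").strip().lower()
    if not name_a or not name_b:
        return False
    return SequenceMatcher(None, name_a, name_b).ratio() >= threshold


# (priority, rule name, predicate) - lower number is evaluated first.
MATCH_RULES = [
    (1, "exact_email", _same("email")),
    (2, "exact_phone", _same("phone")),
    (3, "fuzzy_name_postcode", _fuzzy_name_postcode),
]


def first_matching_rule(a: Record, b: Record) -> Optional[str]:
    """Return the name of the highest-priority rule linking a and b, or None."""
    for _priority, name, rule in sorted(MATCH_RULES, key=lambda r: r[0]):
        if rule(a, b):
            return name
    return None
```

Logging which rule produced each merge is worth doing in any real rollout: it tells you how much of your unification depends on the weaker rules.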

Key mistakes I’ve seen:

Using phone as a primary key without normalising to E.164 first - +44 7444 123456, 07444123456, and 447444123456 all represent the same number but won’t match
Running probabilistic device matching too aggressively and merging household members into a single profile
Not accounting for shared email addresses (family accounts, info@ addresses)
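The phone problem is usually fixed upstream, before data reaches the stream. The sketch below shows one way to do that with the Python standard library only. The function name to_e164 and the default country code of 44 are my assumptions for illustration. It does not validate numbers against national numbering plans, so a dedicated phone-number library is the safer choice in production.

```python
import re


def to_e164(raw: str, default_country_code: str = "44") -> str:
    """Best-effort E.164 normalisation for one assumed home country."""
    # Drop the optional trunk prefix in forms such as "+44 (0)7444 123456".
    cleaned = re.sub(r"\(0\)", "", raw.strip())
    digits = re.sub(r"\D", "", cleaned)
    if not digits:
        raise ValueError(f"no digits in phone number: {raw!r}")

    if cleaned.startswith("+"):
        result = "+" + digits                               # already international
    elif digits.startswith("00"):
        result = "+" + digits[2:]                           # 00 international prefix
    elif digits.startswith("0"):
        result = "+" + default_country_code + digits[1:]    # national trunk prefix
    elif digits.startswith(default_country_code):
        # Ambiguous in principle: assumed to be a country code without the "+".
        result = "+" + digits
    else:
        result = "+" + default_country_code + digits

    if len(result) - 1 > 15:  # E.164 allows at most 15 digits
        raise ValueError(f"too many digits for E.164: {raw!r}")
    return result
```

All three variants from the first bullet normalise to +447444123456, so an exact match rule on the normalised field will link them.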

Reconciliation rules determine which field value “wins” when two profiles merge. For Commerce Cloud, set order history to Most Recent and loyalty tier to Most Frequent (not most recent - customers shouldn’t lose status from one bad month).
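The difference between the two reconciliation strategies is easy to see in a small Python sketch. It mimics the behaviour described above and is not Data Cloud's implementation. The observation format (date, value) and the tie-break by recency are my assumptions.

```python
from collections import Counter
from datetime import date

Observation = tuple[date, str]  # (when the value was seen, the value)


def most_recent(observations: list[Observation]) -> str:
    """Pick the value with the latest observation date."""
    return max(observations, key=lambda obs: obs[0])[1]


def most_frequent(observations: list[Observation]) -> str:
    """Pick the most common value; ties go to the more recently seen value."""
    counts = Counter(value for _, value in observations)
    latest: dict[str, date] = {}
    for seen_on, value in observations:
        latest[value] = max(seen_on, latest.get(value, seen_on))
    return max(counts, key=lambda value: (counts[value], latest[value]))


tier_history = [
    (date(2024, 1, 31), "Gold"),
    (date(2024, 2, 29), "Gold"),
    (date(2024, 3, 31), "Gold"),
    (date(2024, 4, 30), "Silver"),  # one bad month
]
```

With this history, most_recent returns Silver while most_frequent keeps the customer at Gold, which is the behaviour you want for loyalty status.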

Calculated Insights

Calculated Insights (CI) are ANSI SQL queries that run on Data Cloud’s data lake and materialise their results as a queryable insight object alongside your DMOs. They’re the right place to compute metrics that span sources.

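-- Lifetime value across online and in-store, one row per individual
SELECT
    ind.ssot__Id__c                           AS individualId,
    SUM(ord.TotalAmount__c)                   AS lifetimeValue,
    -- ord is a line-item object: check that Id__c identifies the order,
    -- not the line, or this will count line items instead of orders
    COUNT(DISTINCT ord.Id__c)                 AS orderCount,
    MAX(ord.OrderedDate__c)                   AS lastOrderDate,
    -- DATEDIFF argument order differs between SQL dialects;
    -- confirm it against the Data Cloud SQL reference before deploying
    DATEDIFF(day, MAX(ord.OrderedDate__c), CURRENT_DATE) AS daysSinceLastOrder
FROM ssot__Individual__dlm ind
JOIN UnifiedOrderLineItem__dlm ord ON ind.ssot__Id__c = ord.IndividualId__c
GROUP BY ind.ssot__Id__c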

Run CI on a scheduled cadence (hourly for active segments, daily for historical). Avoid materialising raw transaction rows - aggregate at the CI layer.

Segmentation for Commerce Cloud

Once profiles are unified and insights are calculated, segmentation is fast. The key patterns for Commerce:

Lapsed purchasers - daysSinceLastOrder > 90 AND lifetimeValue > 200 → reactivation flow
High LTV at risk - lifetimeValue > 500 AND daysSinceLastOrder > 45 → winback with loyalty offer
Browse-to-buy propensity - combine web session data with purchase history for predictive scoring
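The two threshold-based segments can be written as plain predicates. This Python sketch is only a way to test the thresholds against sample rows before building the segments in the UI. The Insight class and the segment names are hypothetical. The thresholds are the ones given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    """One row of calculated-insight output (hypothetical shape)."""
    individual_id: str
    lifetime_value: float
    days_since_last_order: int


def segments_for(row: Insight) -> list[str]:
    """Return every segment whose rule the row satisfies."""
    names = []
    if row.days_since_last_order > 90 and row.lifetime_value > 200:
        names.append("lapsed_purchasers")
    if row.lifetime_value > 500 and row.days_since_last_order > 45:
        names.append("high_ltv_at_risk")
    return names
```

Note that the rules overlap: a customer with a lifetime value above 500 and more than 90 days since their last order lands in both segments, so decide which flow takes precedence before activating them.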

Segments activate into Marketing Cloud journeys via the native activation target, or into Commerce Cloud’s personalisation engine via Data Actions.

Data Actions and Real-Time Activation

Data Actions fire when a profile enters or exits a segment, or when a calculated insight crosses a threshold. Common uses:

Triggering a Service Cloud case when a high-value customer raises a complaint (cross-cloud signal)
Updating a custom field on the Account to reflect LTV tier (so Sales reps see it in their UI)
Calling an external webhook to update a loyalty platform
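Conceptually, an enter or exit event is the difference between two snapshots of segment membership. The Python sketch below illustrates that idea with sets. It is a mental model, not a description of how Data Cloud detects changes internally, and the function name and ID values are hypothetical.

```python
def membership_changes(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Diff two membership snapshots into entered and exited profile IDs."""
    return {
        "entered": current - previous,  # in the segment now, not before
        "exited": previous - current,   # in the segment before, not now
    }


before = {"ind-001", "ind-002", "ind-003"}
after = {"ind-002", "ind-003", "ind-004"}
changes = membership_changes(before, after)
```

Here ind-004 has entered and ind-001 has exited; profiles present in both snapshots trigger nothing, which is why a noisy segment definition translates directly into noisy downstream actions.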

Governance and Data Retention

Data Cloud stores data in a managed data lake with retention policies configurable per data stream. Default is 90 days for streaming data; you can extend to 3 years for transactional records.

Key governance decisions:

Define your retention policy before going live - retroactively deleting data is possible but expensive
Map every data stream to a data category (personal, sensitive, anonymous) in the Data Cloud consent model
Use individual-level consent objects to honour GDPR deletion requests - Data Cloud has a native Delete API that removes an individual’s data across all unified profiles

Performance Considerations

Data Cloud query performance degrades predictably with segment complexity. Rules of thumb:

Segments referencing only DMO attributes: sub-second refresh
Segments joining to Calculated Insights: 1–5 minutes per million profiles
Segments using Related Attribute filters across multiple DMOs: can hit timeout at scale - denormalise into a CI first
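The middle rule of thumb is easy to turn into a back-of-envelope estimate. This Python sketch simply scales the 1–5 minutes per million profiles figure quoted above, assuming linear scaling, which is my simplification. Treat the output as a planning range, not a guarantee.

```python
def ci_segment_refresh_minutes(profile_count: int) -> tuple[float, float]:
    """Estimate (best, worst) refresh minutes for a segment joined to a CI."""
    millions = profile_count / 1_000_000
    return (1 * millions, 5 * millions)


low, high = ci_segment_refresh_minutes(8_000_000)
```

For 8 million unified profiles the estimate is 8 to 40 minutes, which already rules out an hourly refresh for more than a handful of such segments.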

The unified individual count is your primary cost driver. Model your identity resolution rules conservatively to start - you can loosen them as data quality improves.

Integration with the Rest of the Stack

Data Cloud sits at the centre of a modern Salesforce architecture:

Marketing Cloud ◄──── Activation Targets ────► Service Cloud
                              │
              Commerce Cloud ◄┤ Data Cloud ├► Einstein Personalisation
                              │
                    External (CDW, BI tools) via Data Share (Snowflake)

The Data Share feature lets you expose your unified profiles directly to Snowflake without export - a significant advantage if your analytics team already lives there.