
Use Case: Privacy-Preserving Consumer Behavior Model Training

Version: v1.0 (Draft) · Status: Draft · Authoritative: Informative

Scenario

A model operator wants to train a consumer-behavior model using protected receipt and purchase-history data distributed across many Vaults.

The model goal may include:

  1. purchase-propensity prediction
  2. category or brand affinity estimation
  3. coupon-response modeling
  4. privacy-preserving user clustering

This use case is informative. It shows how the revised SCP and SCS specifications map onto a training workflow that uses protected records without turning raw consumer purchase data into unrestricted centralized plaintext.

Canonical protocol terms used here follow SCP Protocol Overview and SCP Core Spec.

Why This Use Case Matters

This scenario is useful because it demonstrates several important properties of the revised Symphony stack:

  1. model-training intent is distinct from protocol-recognized training work
  2. protected source records may remain inside authorized Vault or TEE boundaries
  3. training may depend on derived features, query projections, or protected computation rather than unrestricted record export
  4. model artifacts and evaluation results must be linked to replayable evidence and version context
  5. the resulting model is downstream from governed training truth rather than a substitute for protocol truth

Entry Distinction

This scenario has two different layers:

  1. business-side or research-side desire to train a model
  2. protocol-recognized training work with fixed policy, semantic, and privacy constraints

The request to "train a consumer-behavior model on purchase data" is not yet a protocol event by itself. It becomes part of the SCP lifecycle only when the system admits an authorized training task with explicit feature scope, version context, and privacy-preserving execution rules.

Unlike the receipt-upload scenario and even the coupon-campaign scenario, this use case involves:

  1. a composite task with multiple iterative training rounds, not a single atomic execution
  2. multi-Vault feature construction where each Vault contributes locally derived features
  3. cross-Vault gradient or model-update aggregation under federated learning semantics
  4. significant privacy budget consumption across many data subjects over multiple rounds
  5. TEE-attested execution for every training round
  6. model artifact delivery to a model registry, not a single result to a task submitter

From a node perspective, this scenario spans both master nodes and enterprise nodes. Master nodes drive admission, orchestration, aggregation, and settlement. Enterprise nodes host the Vaults that store training data and run local computation inside TEE boundaries. The two node types cooperate through the inter-node communication layer defined in the SCS architecture.

Protocol View

From the protocol perspective, this scenario transforms training intent into a verifiable and governable model-training result.

1. Training Intent and Scope Definition

The requesting party defines the training goal and constraints, such as:

  1. target prediction or clustering task
  2. eligible user or market scope
  3. allowed feature families
  4. model output type
  5. evaluation threshold or acceptance criteria

At this point, the request is still a business or research intent rather than protocol truth.

2. Task Admission, Classification, and Constraint Binding

The system turns that request into authorized protocol work.

At a high level, admission establishes:

  1. who is authorized to submit the training task
  2. that the task is classified as task_class: train, which activates composite task semantics
  3. which record domains, feature families, or query projections may be used
  4. which privacy, policy, and version constraints apply, including total privacy budget allocation across all training rounds
  5. what outputs count as the protocol-evaluated training result
  6. the iteration policy: maximum training rounds, convergence threshold, and early-stop conditions

Upon successful admission, the Admission Plane produces a TaskEnvelope that carries all downstream-critical fields, including:

  1. iteration_policy: maximum rounds, convergence thresholds, and early-stop conditions for the composite training lifecycle
  2. coordination_envelope: the multi-Vault coordination parameters specifying participating Vaults, quorum rules, and aggregation method
  3. budget_lock_ref: reference to the SYM locked in the Escrow Contract to cover the estimated cost of multi-round, multi-Vault computation
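
As a sketch, the envelope's downstream-critical fields might be modeled like this. The dataclass shapes, field types, and example values are illustrative assumptions; only the field names `iteration_policy`, `coordination_envelope`, and `budget_lock_ref` come from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # the TaskEnvelope is immutable once admission completes
class IterationPolicy:
    max_rounds: int
    convergence_threshold: float        # e.g. minimum loss delta between rounds
    early_stop_conditions: tuple = ()

@dataclass(frozen=True)
class CoordinationEnvelope:
    participating_vaults: tuple         # Vault identifiers
    quorum: int                         # minimum completed slices per round
    aggregation_method: str = "federated_average"

@dataclass(frozen=True)
class TaskEnvelope:
    task_class: str                     # "train" activates composite semantics
    iteration_policy: IterationPolicy
    coordination_envelope: CoordinationEnvelope
    budget_lock_ref: str                # reference to SYM locked in escrow
    semantic_version: str
    policy_version: str
    total_epsilon_budget: float         # privacy budget across all rounds

envelope = TaskEnvelope(
    task_class="train",
    iteration_policy=IterationPolicy(max_rounds=50, convergence_threshold=1e-4),
    coordination_envelope=CoordinationEnvelope(
        participating_vaults=("vault-a", "vault-b", "vault-c"), quorum=2),
    budget_lock_ref="escrow:lock-ref",
    semantic_version="sem-v3",
    policy_version="pol-v7",
    total_epsilon_budget=8.0,
)
```

Freezing the dataclasses mirrors the spec's requirement that the envelope is immutable after admission.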

Before admission completes, the protocol also verifies:

  1. that participating Vaults hold records from data subjects who have granted consent for training usage scope
  2. that the estimated total privacy budget consumption across all rounds does not exceed the available budget for affected data subjects
  3. that sufficient monetary budget is locked to cover the estimated cost of multi-round, multi-Vault computation

Only after this step does the training workflow become protocol-recognized work.

3. Semantic Resolution, Feature Derivation, and Attribute Composition

Before training can proceed, the protocol must resolve the semantic meaning of the task under the active context.

That may include determining:

  1. which domains define the protected source records (typically commerce, behavior, identity)
  2. which canonical attributes are directly_queryable and can serve as raw features without derivation
  3. which features require explicit DerivationRule definitions — for example, purchase_frequency derived via an aggregate rule over the commerce domain's transaction records, or brand_affinity_score derived via a composite rule combining purchase counts across categories
  4. which AttributeComposition declarations are needed — training features typically combine attributes from multiple domains, requiring cross-domain composition with declared combined_privacy_class
  5. which local-only signals must remain private and whether any qualify as vault_scoped_query attributes
  6. which semantic_version and policy_version govern the run

This step matters because the revised SCP requires shared meaning, explicit derivation rules, and declared attribute composition before protected computation can be trusted. In the training context, this means the feature engineering plan must be protocol-visible and auditable, not a black-box transformation hidden inside the execution runtime.
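
For illustration, a protocol-visible feature plan might be declared roughly as follows. The rule and composition shapes are assumptions; the names `purchase_frequency`, `brand_affinity_score`, and `combined_privacy_class` come from the text, while `age_band` and the schema keys are hypothetical.

```python
# Hypothetical declaration shapes; the spec defines the real schema.
derivation_rules = {
    "purchase_frequency": {
        "rule_type": "aggregate",
        "source_domain": "commerce",
        "source_records": "transaction",
        "aggregate": "count_per_window",   # e.g. purchases per 30-day window
        "window_days": 30,
    },
    "brand_affinity_score": {
        "rule_type": "composite",
        "inputs": ["purchase_count_by_category"],
        "combine": "normalized_category_share",
    },
}

attribute_compositions = [
    {
        "name": "training_feature_vector",
        "attributes": ["purchase_frequency", "brand_affinity_score", "age_band"],
        "domains": ["commerce", "identity"],    # cross-domain composition
        "combined_privacy_class": "sensitive",  # declared, not inferred
    },
]

def is_protocol_visible(feature: str) -> bool:
    """A feature is usable only if it is directly queryable or has a declared rule."""
    directly_queryable = {"age_band"}           # assumed canonical attribute
    return feature in directly_queryable or feature in derivation_rules
```

The point of the check is exactly the sentence above: a feature with no declared rule and no canonical attribute cannot enter the training plan.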

4. Multi-Vault Feature Construction and Iterative Training Execution

Once the meaning is resolved, the composite task begins its iterative execution. Each training round is realized as a sub-task under the parent composite task, following the composite and iterative task semantics defined in the SCP Core Spec.

The protocol manages a three-level hierarchy for this task:

  1. Parent task: the root composite task admitted at the protocol level, managed by the master node Task Orchestrator
  2. Sub-task (round): each iterative training round, coordinated by the master node Multi-Vault Coordinator which fans out to participating Vaults
  3. Execution slice (per-Vault): each Vault's contribution within a single round, executed by the enterprise node Computation Runtime inside TEE

For a multi-Vault training task with N rounds and M Vaults, the master node manages: 1 parent task + N sub-tasks + (N x M) execution slices. Each level has its own lifecycle, evidence, and settlement context.

Per-Round Execution Flow

Each training round proceeds through:

  1. Per-Vault feature construction: each participating Vault independently derives approved feature aggregates from local records inside its TEE boundary
  2. Per-Vault local training step: each Vault computes local model updates (gradients or parameter deltas) using the locally constructed features
  3. Secure aggregation: local model updates from all participating Vaults are combined through the secure aggregation method specified at admission (typically federated_average), without exposing any Vault's individual gradients to other Vaults or to the model operator
  4. Global model update: the aggregated update is applied to produce the next-round global model parameters

Each round produces a sub-task with its own ExecutionResultBundle, Commitment references, and TEE attestation reports.
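
The aggregation step can be sketched as a weighted average of per-Vault parameter deltas. This is a plain-Python stand-in for the `federated_average` method named at admission; in the real protocol the step runs inside an attested TEE over secure-aggregation inputs, so no individual Vault's update is ever visible in the clear.

```python
def federated_average(updates, weights):
    """Weighted average of per-Vault model updates.

    `updates` maps vault_id -> list of parameter deltas; `weights` maps
    vault_id -> contribution weight (e.g. local sample count).
    """
    total_weight = sum(weights[v] for v in updates)
    dim = len(next(iter(updates.values())))
    aggregate = [0.0] * dim
    for vault_id, delta in updates.items():
        w = weights[vault_id] / total_weight
        for i, d in enumerate(delta):
            aggregate[i] += w * d
    return aggregate

# One round: three Vaults contribute local deltas weighted by sample count.
updates = {"vault-a": [0.1, -0.2], "vault-b": [0.3, 0.0], "vault-c": [-0.1, 0.2]}
weights = {"vault-a": 100, "vault-b": 300, "vault-c": 100}
global_delta = federated_average(updates, weights)  # applied to the global model
```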

Convergence and Iteration Control

  1. the iteration policy defines maximum rounds, convergence thresholds, and early-stop conditions
  2. after each round, the coordination layer evaluates whether convergence criteria are met
  3. if convergence is reached before the maximum rounds, the composite task proceeds to verification
  4. if the maximum rounds are reached without convergence, the result is flagged for verification with an explicit convergence_not_reached indicator
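
A minimal sketch of this round-by-round decision, folding in the privacy-budget stop condition described below; the policy field names and outcome labels are illustrative.

```python
def next_action(round_idx, loss_delta, epsilon_spent, policy):
    """Decide the composite task's next step after a round's aggregation."""
    if epsilon_spent >= policy["total_epsilon_budget"]:
        return "stop_budget_exhausted"          # verify with current state
    if loss_delta is not None and loss_delta < policy["convergence_threshold"]:
        return "stop_converged"                 # proceed to verification
    if round_idx + 1 >= policy["max_rounds"]:
        return "stop_convergence_not_reached"   # flagged for verification
    return "next_round"

policy = {"max_rounds": 50, "convergence_threshold": 1e-4,
          "total_epsilon_budget": 8.0}
```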

Privacy Budget Consumption per Round

  1. each training round consumes privacy budget for the data subjects whose records were used
  2. cumulative privacy budget consumption across all rounds must not exceed the allocation declared at admission
  3. if privacy budget is exhausted before convergence, the training must stop early and proceed to verification with the current state
  4. round-to-round state is carried through commitment references, not through plaintext intermediate artifacts
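
The cumulative-budget rule can be sketched as a per-subject epsilon ledger. The class shape and the all-or-nothing round charge are assumptions consistent with the rules above.

```python
class PrivacyBudgetLedger:
    """Per-data-subject epsilon ledger: cumulative consumption across rounds
    may never exceed the allocation declared at admission."""

    def __init__(self, allocations):
        self._remaining = dict(allocations)   # subject_id -> remaining epsilon

    def charge_round(self, per_subject_epsilon):
        # Check the whole round first so a rejected round charges nobody.
        for subject, eps in per_subject_epsilon.items():
            if self._remaining.get(subject, 0.0) < eps:
                return False                  # exhausted: training stops early
        for subject, eps in per_subject_epsilon.items():
            self._remaining[subject] -= eps
        return True

ledger = PrivacyBudgetLedger({"u1": 1.0, "u2": 0.5})
ok1 = ledger.charge_round({"u1": 0.4, "u2": 0.4})   # fits the allocation
ok2 = ledger.charge_round({"u1": 0.4, "u2": 0.4})   # u2 would go negative
```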

Throughout this phase:

  1. raw purchase history remains inside authorized Vault or TEE boundaries
  2. the protocol relies on commitments, evidence, lineage, and replayable artifacts rather than unrestricted plaintext movement
  3. training remains bound to the admitted feature, policy, and version constraints
  4. the model operator never sees individual Vault data or per-Vault gradients, only the securely aggregated model updates

5. Verification and Accepted Training Result

The protocol does not treat a produced model artifact as final truth immediately.

Because this is a composite task, verification operates at two levels:

  1. Per-round verification: each sub-task's execution and TEE attestation are individually validated
  2. Composite verification: the overall training result, including convergence state, total privacy budget consumption, and model quality metrics, is evaluated against the acceptance criteria defined at admission

The composite verification must confirm:

  1. that all sub-task TEE attestations are valid
  2. that total privacy budget consumption matches the sum of per-round records
  3. that the model artifact does not memorize or allow extraction of individual training records, verifiable through protocol-defined evaluation criteria such as membership inference resistance
  4. that convergence state is consistent with the declared iteration policy

In outcome terms, the result may be:

  1. accepted
  2. rejected
  3. challenged

Only accepted work can continue into settlement.

6. Settlement, Accounting, and Model-Use Basis

If the result is accepted, it can enter settlement as a composite settlement, linking all sub-task settlement contexts into a single finalized composite context.

Each sub-task settles in the epoch to which it was assigned at dispatch time. If an epoch closes while a sub-task is in progress, that sub-task retains its assigned epoch; subsequent sub-tasks are assigned to the next epoch. Composite settlement aggregates sub-task settlements across their respective epochs, and reward accounting for each sub-task follows the epoch to which that sub-task was assigned.

In this scenario:

  1. settlement turns the accepted training run into protocol-recognized training truth
  2. the composite settlement context preserves linkage to every round's sub-task settlement, TEE attestation, and privacy budget records
  3. finalized accounting becomes the basis for downstream model registry updates, deployment, or gated usage
  4. the deployed model remains a downstream system effect rather than the source of protocol truth by itself
  5. reward accounting reflects the aggregate verified contribution across all rounds and all participating Vaults
  6. SYM distribution is executed through the Reward Contract on Aptos, with per-actor shares calculated by the Reward Accounting Service under the active policy_version

7. Result Delivery to Model Registry

After settlement, the finalized model artifact must be delivered to the requesting party's model registry.

In this scenario:

  1. the task submitter (model operator) is the authorized delivery recipient
  2. the delivered result is the trained model artifact and its associated quality metrics, not the raw training data or per-Vault gradients
  3. delivery is recorded as a protocol-auditable event with linkage to the composite settlement context
  4. the model registry may then deploy the model for inference or further evaluation, but the protocol does not govern deployment mechanics

Node-Level Execution

From the SCS perspective, the same scenario maps onto the concrete two-node-type topology. This section traces how master nodes and enterprise nodes cooperate to realize each stage of the protocol lifecycle.

Master Nodes

The three master nodes (operated by the Symphony Foundation under BFT consensus) collectively run the following services for this training workflow:

  1. API Gateway: receives the model operator's training request, performs TLS termination, rate limiting, and routes the request to the Admission Plane services

  2. Admission Plane: validates submitter identity and authorization, verifies data-subject consent for training usage scope across participating Vaults, resolves semantics under semantic_version, reserves privacy budget, locks SYM via the Escrow Contract, and assembles the immutable TaskEnvelope with iteration_policy, coordination_envelope, and budget_lock_ref

  3. Task Orchestrator: drives the composite task state machine through canonical state transitions. For this training task, the Task Orchestrator manages the parent task lifecycle, sequences sub-tasks (rounds), delegates per-round multi-Vault fan-out to the Multi-Vault Coordinator, and tracks overall convergence through the Composite Iteration Controller

  4. Multi-Vault Coordinator: expands each round's coordination envelope into per-Vault execution assignments, dispatches assignments to target enterprise nodes, tracks per-Vault slice progress and enforces quorum, and triggers aggregation when sufficient slices complete

  5. Aggregation Runtime (TEE): runs inside an attested TEE on a master node. Per round, it receives per-Vault SliceResultBundle outputs (commitment references and privacy-safe model updates), executes federated_average to produce the aggregated global model update, enforces cardinality thresholds, and produces the aggregate result with cryptographic proof and AttestationReport. It does not retain per-Vault inputs after producing the aggregate.

  6. Composite Iteration Controller: after each round's aggregation, evaluates convergence criteria from the iteration_policy. Decides whether to proceed to the next round, stop early on convergence, or terminate on privacy budget exhaustion or maximum round count

  7. Settlement Plane: operates at two levels for this composite task. Per-round sub-task verification validates individual TEE attestations and privacy budget records. Composite verification evaluates the overall training result against acceptance criteria. The Reward Accounting Service calculates per-actor reward shares, and the Payout Service submits finalized reward distributions to the Reward Contract on Aptos

Enterprise Nodes

Each participating enterprise node runs the following services:

  1. Vault (Data Sovereignty Service): stores encrypted purchase-history and receipt records. Enforces consent checks before any data access for training scope. Maintains per-data-subject privacy budget ledger entries. No plaintext leaves the Vault except into an attested TEE.

  2. Computation Runtime (TEE): per round, performs two operations inside the TEE boundary and emits one result bundle:

    • Local feature construction: derives approved feature aggregates from Vault-internal records according to the admitted DerivationRule and AttributeComposition declarations
    • Local gradient computation: computes local model updates (gradients or parameter deltas) using the locally constructed features and the current global model parameters received from the master node
    • Output: a SliceResultBundle containing commitment-linked, deterministic outputs, together with the TEE AttestationReport and privacy budget consumption records
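
A rough sketch of assembling such a bundle. The bundle shape and the SHA-256 commitment construction are illustrative assumptions; only the field concepts (commitment reference, model update, attestation, budget record) come from the spec.

```python
import hashlib
import json

def make_slice_result_bundle(vault_id, round_idx, model_update,
                             epsilon_consumed, attestation_report):
    """Assemble a per-Vault, per-round SliceResultBundle (illustrative shape).

    The commitment binds the deterministic output so the slice is replayable;
    the model update is the privacy-safe payload destined for aggregation.
    """
    commitment = hashlib.sha256(
        json.dumps({"vault": vault_id, "round": round_idx,
                    "update": model_update}, sort_keys=True).encode()
    ).hexdigest()
    return {
        "vault_id": vault_id,
        "round": round_idx,
        "commitment_ref": commitment,           # deterministic, replayable
        "model_update": model_update,           # gradients / parameter deltas
        "privacy_budget_consumed": epsilon_consumed,
        "attestation_report": attestation_report,
    }
```

Because the commitment is computed over canonicalized (sorted-key) JSON, re-executing the same slice reproduces the same reference.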

Three-Level Hierarchy in Practice

The node-level execution realizes the SCP three-level hierarchy as follows:

| Level | Protocol Object | Managed By | Count |
| --- | --- | --- | --- |
| Parent task | Composite training task | Master node Task Orchestrator | 1 |
| Sub-task (round) | Per-round iteration | Master node Multi-Vault Coordinator | N (rounds) |
| Execution slice | Per-Vault contribution per round | Enterprise node Computation Runtime | N x M (rounds x Vaults) |

Each level has its own lifecycle, evidence chain, TEE attestation, and settlement context. The master node coordinates the upper two levels; enterprise nodes execute the lowest level.

Economic Flow

The training task follows the train pricing model defined in SCP Economics and Governance. This section traces the full monetary lifecycle from budget lock to reward distribution.

Fee Calculation

The model operator (Task Submitter) locks SYM in the Escrow Contract at admission. The locked amount is calculated as:

train_fee = base_fee_train
          + Σ_round(round_compute_fee + round_aggregation_fee)
          + (total_epsilon × per_epsilon_fee)

Where:

  1. base_fee_train: minimum fee for composite task setup, covering admission, orchestration, and verification overhead
  2. round_compute_fee: sum of per-Vault compute costs for that round; across N rounds and M Vaults, total compute cost therefore scales with the product N x M
  3. round_aggregation_fee: cost for the secure aggregation step (federated_average) per round
  4. total_epsilon: cumulative differential privacy budget consumed across all rounds
  5. per_epsilon_fee: price per unit of privacy budget, reflecting the finite and non-renewable nature of per-data-subject privacy

All fee parameters are set by the active policy_version and may vary by usage_scope.
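
A worked sketch of the fee formula, using illustrative numbers rather than real policy parameters.

```python
def train_fee(base_fee, per_round_compute, per_round_aggregation,
              total_epsilon, per_epsilon_fee):
    """train_fee = base_fee_train
                 + sum over rounds of (round_compute_fee + round_aggregation_fee)
                 + total_epsilon * per_epsilon_fee

    Each per_round_compute[r] is already the sum of that round's per-Vault costs.
    """
    round_fees = sum(c + a for c, a in
                     zip(per_round_compute, per_round_aggregation))
    return base_fee + round_fees + total_epsilon * per_epsilon_fee

# 3 rounds; compute cost per round already summed over participating Vaults:
fee = train_fee(
    base_fee=100.0,
    per_round_compute=[30.0, 30.0, 30.0],
    per_round_aggregation=[5.0, 5.0, 5.0],
    total_epsilon=6.0,
    per_epsilon_fee=10.0,
)  # 100 + 105 + 60
```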

Fee Distribution

On composite settlement, the locked fee is distributed among protocol actors:

| Recipient | Share | Basis |
| --- | --- | --- |
| Data Contributors | 30-50% | Proportional to records used and privacy budget consumed against their data across all rounds |
| Vault Operators | 15-25% | Proportional to records served and storage commitment across participating Vaults |
| Executors | 15-25% | Proportional to verified compute work (per-Vault training slices + aggregation) |
| Verifiers | 5-10% | Per verification decision (sub-task and composite levels) |
| Protocol Treasury | 5-10% | Fixed protocol fee for sustainability |

Exact percentages are set by policy_version. The sum must equal 100% of the locked fee.
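
A minimal sketch of the settlement split, using one illustrative share vector inside the published ranges; real shares are set by the active policy_version.

```python
def settle_distribution(locked_fee, shares):
    """Split the locked fee by policy shares; shares must sum to exactly 100%."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("policy shares must sum to 100% of the locked fee")
    return {actor: locked_fee * share for actor, share in shares.items()}

# One concrete point inside the published ranges (illustrative only):
shares = {
    "data_contributors": 0.40,
    "vault_operators": 0.20,
    "executors": 0.20,
    "verifiers": 0.10,
    "protocol_treasury": 0.10,
}
payout = settle_distribution(1000.0, shares)
```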

Reward Accounting

Reward accounting for this composite task reflects the aggregate verified contribution across all rounds and all Vaults:

  1. each participating Vault operator receives a share proportional to the records served and the privacy budget consumed for their data subjects across all rounds
  2. each executor that performed per-Vault training slices receives reward proportional to the computational work verified for those slices
  3. the aggregation executor (master node TEE) receives a separate reward for the per-round aggregation steps
  4. staked Data Producers whose records were used receive 100% of their calculated data usage dividend; unstaked Data Producers receive 50% (remainder returned to the reward pool)
  5. a Vault that was authorized but timed out or failed in a given round receives no reward for that round's slices
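
The staking rule in item 4 can be sketched directly; the 100%/50% split and the return of the remainder to the reward pool come from the text, while the function shape is illustrative.

```python
def data_dividend(calculated_dividend, staked):
    """Apply the Data Producer staking rule: staked producers receive 100%
    of their calculated usage dividend, unstaked producers receive 50%,
    and the remainder is returned to the reward pool."""
    paid = calculated_dividend if staked else calculated_dividend * 0.5
    returned_to_pool = calculated_dividend - paid
    return paid, returned_to_pool

paid_staked, back1 = data_dividend(10.0, staked=True)
paid_unstaked, back2 = data_dividend(10.0, staked=False)
```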

Settlement Timing

  1. each sub-task (round) settles in the epoch to which it was assigned at dispatch time
  2. if an epoch closes while a sub-task is in progress, the in-progress sub-task retains its assigned epoch; subsequent sub-tasks are assigned to the next epoch
  3. composite settlement aggregates sub-task settlements across their respective epochs
  4. reward accounting for each sub-task follows the epoch to which that sub-task was assigned
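
A small sketch of dispatch-time epoch assignment and the composite grouping it implies; epoch length and timestamps are illustrative.

```python
def assign_and_group(rounds, epoch_of_dispatch):
    """Assign each sub-task (round) the epoch current at its dispatch time,
    then group sub-task settlements by epoch for composite settlement.

    An in-progress sub-task keeps its assigned epoch even if that epoch
    closes mid-round; only later dispatches move to the next epoch.
    """
    assignments = {name: epoch_of_dispatch(t) for name, t in rounds}
    by_epoch = {}
    for name, epoch in assignments.items():
        by_epoch.setdefault(epoch, []).append(name)
    return assignments, by_epoch

# Illustrative: epochs are 100 time-units long.
epoch_of = lambda t: t // 100
rounds = [("round-1", 250), ("round-2", 280), ("round-3", 310)]
assignments, by_epoch = assign_and_group(rounds, epoch_of)
```

Here round-2 settles in epoch 2 even if epoch 2 closes while it is running; round-3, dispatched later, is assigned to epoch 3.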

On-Chain Execution

All SYM distribution flows through smart contracts on Aptos:

  1. Escrow Contract: locks the model operator's SYM at admission; holds funds until settlement finalization
  2. Reward Contract: receives finalized reward distributions from the Payout Service; distributes SYM to Data Contributors, Vault Operators, Executors, and Verifiers; applies staking multiplier for Data Producers
  3. Treasury Contract: receives the Protocol Treasury share

No protocol actor holds SYM on behalf of another actor outside of smart contract custody.

Semantic and Feature Handling

This scenario is also a good example of governed semantic interpretation.

If the requested model depends on concepts that do not map cleanly to the active canonical or query sets, several things may happen:

  1. the task may remain blocked until the semantics are clarified
  2. some feature logic may remain local under policy-permitted handling
  3. emerging behavioral or query concepts may later contribute to governed semantic evolution

This is where the use case touches:

  1. domain
  2. CanonicalAttribute
  3. QueryAttribute
  4. local attribute handling
  5. governed candidate evolution

The important point is that training logic does not get to redefine protocol meaning privately or silently.

Key Boundary Reminders

The desire to train on consumer behavior is not yet a full protocol event.

The full SCP lifecycle begins only when the system turns that training intent into authorized protocol work with explicit feature, privacy, and evaluation constraints.

Likewise, the trained model itself is not automatically the protocol truth. The protocol truth is the accepted and settled training result that downstream systems then use as the basis for deployment or activation.

This use case exercises the most demanding protocol path: composite task with iterative sub-tasks, multi-Vault fan-out at every round, federated secure aggregation, cumulative privacy budget management, TEE attestation across all Vaults and all rounds, composite settlement linking all round-level evidence, and model artifact delivery to a downstream registry. It serves as the upper-bound reference scenario for SCP protocol complexity.