Use Case: Privacy-Preserving Consumer Behavior Model Training
Version: v1.0 (Draft)
Status: Draft
Authoritative: Informative
Scenario
A model operator wants to train a consumer-behavior model using protected receipt and purchase-history data distributed across many Vaults.
The model goal may include:
- purchase-propensity prediction
- category or brand affinity estimation
- coupon-response modeling
- privacy-preserving user clustering
This use case is informative. It shows how the revised SCP and SCS specifications map onto a training workflow that uses protected records without turning raw consumer purchase data into unrestricted centralized plaintext.
Canonical protocol terms used here follow SCP Protocol Overview and SCP Core Spec.
Why This Use Case Matters
This scenario is useful because it demonstrates several important properties of the revised Symphony stack:
- model-training intent is distinct from protocol-recognized training work
- protected source records may remain inside authorized Vault or TEE boundaries
- training may depend on derived features, query projections, or protected computation rather than unrestricted record export
- model artifacts and evaluation results must be linked to replayable evidence and version context
- the resulting model is downstream from governed training truth rather than a substitute for protocol truth
Entry Distinction
This scenario has two different layers:
- business-side or research-side desire to train a model
- protocol-recognized training work with fixed policy, semantic, and privacy constraints
The request to "train a consumer-behavior model on purchase data" is not yet a protocol event by itself. It becomes part of the SCP lifecycle only when the system admits an authorized training task with explicit feature scope, version context, and privacy-preserving execution rules.
Unlike the receipt-upload scenario and even the coupon-campaign scenario, this use case involves:
- a composite task with multiple iterative training rounds, not a single atomic execution
- multi-Vault feature construction where each Vault contributes locally derived features
- cross-Vault gradient or model-update aggregation under federated learning semantics
- significant privacy budget consumption across many data subjects over multiple rounds
- TEE-attested execution for every training round
- model artifact delivery to a model registry, not a single result to a task submitter
From a node perspective, this scenario spans both master nodes and enterprise nodes. Master nodes drive admission, orchestration, aggregation, and settlement. Enterprise nodes host the Vaults that store training data and run local computation inside TEE boundaries. The two node types cooperate through the inter-node communication layer defined in the SCS architecture.
Protocol View
From the protocol perspective, this scenario transforms training intent into a verifiable and governable model-training result.
1. Training Intent and Scope Definition
The requesting party defines the training goal and constraints, such as:
- target prediction or clustering task
- eligible user or market scope
- allowed feature families
- model output type
- evaluation threshold or acceptance criteria
At this point, the request is still a business or research intent rather than protocol truth.
2. Task Admission, Classification, and Constraint Binding
The system turns that request into authorized protocol work.
At a high level, admission establishes:
- who is authorized to submit the training task
- that the task is classified as `task_class: train`, which activates composite task semantics
- which record domains, feature families, or query projections may be used
- which privacy, policy, and version constraints apply, including total privacy budget allocation across all training rounds
- what outputs count as the protocol-evaluated training result
- the iteration policy: maximum training rounds, convergence threshold, and early-stop conditions
Upon successful admission, the Admission Plane produces a TaskEnvelope that carries all downstream-critical fields, including:
- `iteration_policy`: maximum rounds, convergence thresholds, and early-stop conditions for the composite training lifecycle
- `coordination_envelope`: the multi-Vault coordination parameters specifying participating Vaults, quorum rules, and aggregation method
- `budget_lock_ref`: reference to the SYM locked in the Escrow Contract to cover the estimated cost of multi-round, multi-Vault computation
Before admission completes, the protocol also verifies:
- that participating Vaults hold records from data subjects who have granted consent for the `training` usage scope
- that the estimated total privacy budget consumption across all rounds does not exceed the available budget for affected data subjects
- that sufficient monetary budget is locked to cover the estimated cost of multi-round, multi-Vault computation
Only after this step does the training workflow become protocol-recognized work.
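As an informative sketch, the admission outcome can be modeled as the construction of an immutable envelope once privacy and monetary budget checks pass. The function name, field set, and check order below are illustrative assumptions for this document, not normative SCP definitions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IterationPolicy:
    max_rounds: int
    convergence_threshold: float
    early_stop_patience: int  # rounds without improvement before stopping

@dataclass(frozen=True)
class TaskEnvelope:
    task_class: str
    feature_families: tuple
    semantic_version: str
    policy_version: str
    iteration_policy: IterationPolicy
    coordination_envelope: dict
    budget_lock_ref: str
    total_epsilon_budget: float

def admit_training_task(request: dict, available_epsilon: float,
                        locked_sym: float, estimated_cost: float) -> TaskEnvelope:
    """Admit a training request only if both budgets suffice (sketch)."""
    if request["estimated_epsilon"] > available_epsilon:
        raise ValueError("insufficient privacy budget for affected data subjects")
    if locked_sym < estimated_cost:
        raise ValueError("insufficient SYM locked in escrow")
    # the envelope is frozen: downstream stages read it, never mutate it
    return TaskEnvelope(
        task_class="train",
        feature_families=tuple(request["feature_families"]),
        semantic_version=request["semantic_version"],
        policy_version=request["policy_version"],
        iteration_policy=IterationPolicy(**request["iteration_policy"]),
        coordination_envelope=request["coordination_envelope"],
        budget_lock_ref=request["budget_lock_ref"],
        total_epsilon_budget=request["estimated_epsilon"],
    )
```

The frozen dataclass mirrors the spec's requirement that the TaskEnvelope is immutable once admission completes.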
3. Semantic Resolution, Feature Derivation, and Attribute Composition
Before training can proceed, the protocol must resolve the semantic meaning of the task under the active context.
That may include determining:
- which domains define the protected source records (typically
commerce,behavior,identity) - which canonical attributes are
directly_queryableand can serve as raw features without derivation - which features require explicit
DerivationRuledefinitions — for example,purchase_frequencyderived via anaggregaterule over thecommercedomain's transaction records, orbrand_affinity_scorederived via acompositerule combining purchase counts across categories - which
AttributeCompositiondeclarations are needed — training features typically combine attributes from multiple domains, requiring cross-domain composition with declaredcombined_privacy_class - which local-only signals must remain private and whether any qualify as
vault_scoped_queryattributes - which
semantic_versionandpolicy_versiongovern the run
This step matters because the revised SCP requires shared meaning, explicit derivation rules, and declared attribute composition before protected computation can be trusted. In the training context, this means the feature engineering plan must be protocol-visible and auditable, not a black-box transformation hidden inside the execution runtime.
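To make the auditability point concrete, a protocol-visible derivation rule might be declared as data plus a named function, rather than a transformation hidden inside the runtime. The rule shape and the `purchase_frequency` logic below are hypothetical illustrations, not the normative SCP schema:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class DerivationRule:
    name: str            # derived feature name, e.g. "purchase_frequency"
    rule_type: str       # "aggregate" or "composite"
    source_domain: str   # e.g. "commerce"
    fn: Callable         # the declared, auditable derivation function

def purchase_frequency(records: list, window_days: int = 30) -> float:
    """Aggregate rule: purchases per day over a trailing window."""
    in_window = [r for r in records if r["age_days"] <= window_days]
    return len(in_window) / window_days

# the registry is what makes the feature plan protocol-visible:
# every derived feature maps to a declared, reviewable rule
RULES = {
    "purchase_frequency": DerivationRule(
        name="purchase_frequency", rule_type="aggregate",
        source_domain="commerce", fn=purchase_frequency),
}
```

The key design point is that the registry, not the execution runtime, is the source of truth for how each feature is derived.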
4. Multi-Vault Feature Construction and Iterative Training Execution
Once the meaning is resolved, the composite task begins its iterative execution. Each training round is realized as a sub-task under the parent composite task, following the composite and iterative task semantics defined in the SCP Core Spec.
The protocol manages a three-level hierarchy for this task:
- Parent task: the root composite task admitted at the protocol level, managed by the master node Task Orchestrator
- Sub-task (round): each iterative training round, coordinated by the master node Multi-Vault Coordinator which fans out to participating Vaults
- Execution slice (per-Vault): each Vault's contribution within a single round, executed by the enterprise node Computation Runtime inside TEE
For a multi-Vault training task with N rounds and M Vaults, the master node manages: 1 parent task + N sub-tasks + (N x M) execution slices. Each level has its own lifecycle, evidence, and settlement context.
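The object counts above reduce to a trivial helper, which can be useful when sizing orchestration state; the dictionary keys are illustrative:

```python
def hierarchy_counts(n_rounds: int, n_vaults: int) -> dict:
    """Protocol objects for one composite training task:
    1 parent task, N sub-tasks (rounds), N x M execution slices."""
    return {
        "parent_tasks": 1,
        "sub_tasks": n_rounds,
        "execution_slices": n_rounds * n_vaults,
    }
```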
Per-Round Execution Flow
Each training round proceeds through:
- Per-Vault feature construction: each participating Vault independently derives approved feature aggregates from local records inside its TEE boundary
- Per-Vault local training step: each Vault computes local model updates (gradients or parameter deltas) using the locally constructed features
- Secure aggregation: local model updates from all participating Vaults are combined through the secure aggregation method specified at admission (typically `federated_average`), without exposing any Vault's individual gradients to other Vaults or to the model operator
- Global model update: the aggregated update is applied to produce the next-round global model parameters
Each round produces a sub-task with its own ExecutionResultBundle, Commitment references, and TEE attestation reports.
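The aggregation step can be sketched as a weighted parameter average. This is a plain FedAvg-style computation; it deliberately omits the secure-aggregation cryptography the protocol requires, and weighting by per-Vault record counts is an assumption rather than a stated SCP rule:

```python
def federated_average(updates: list, weights: list) -> list:
    """Weighted element-wise average of per-Vault parameter updates.

    updates: one flat parameter-delta vector per Vault
    weights: per-Vault weights (assumed here: record counts used this round)
    """
    total = sum(weights)
    dim = len(updates[0])
    return [
        sum(u[i] * w for u, w in zip(updates, weights)) / total
        for i in range(dim)
    ]
```

In the real flow this computation runs inside the master node's attested Aggregation Runtime, so no party outside the TEE ever sees the individual `updates`.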
Convergence and Iteration Control
- the iteration policy defines maximum rounds, convergence thresholds, and early-stop conditions
- after each round, the coordination layer evaluates whether convergence criteria are met
- if convergence is reached before the maximum rounds, the composite task proceeds to verification
- if the maximum rounds are reached without convergence, the result is flagged for verification with an explicit `convergence_not_reached` indicator
Privacy Budget Consumption per Round
- each training round consumes privacy budget for the data subjects whose records were used
- cumulative privacy budget consumption across all rounds must not exceed the allocation declared at admission
- if privacy budget is exhausted before convergence, the training must stop early and proceed to verification with the current state
- round-to-round state is carried through commitment references, not through plaintext intermediate artifacts
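The convergence and budget rules above combine into a single control loop. The sketch below assumes a precomputed loss sequence and a constant per-round epsilon cost; neither assumption is mandated by the protocol:

```python
def run_training(max_rounds: int, convergence_threshold: float,
                 epsilon_budget: float, epsilon_per_round: float,
                 loss_per_round: list):
    """Iteration control sketch: stop on convergence, round cap,
    or privacy budget exhaustion. Returns (rounds_completed, outcome)."""
    epsilon_spent = 0.0
    prev_loss = None
    for rnd, loss in enumerate(loss_per_round[:max_rounds], start=1):
        # budget check happens BEFORE the round consumes any epsilon
        if epsilon_spent + epsilon_per_round > epsilon_budget:
            return rnd - 1, "privacy_budget_exhausted"
        epsilon_spent += epsilon_per_round
        # convergence: loss improvement fell below the admitted threshold
        if prev_loss is not None and abs(prev_loss - loss) < convergence_threshold:
            return rnd, "converged"
        prev_loss = loss
    return max_rounds, "convergence_not_reached"
```

All three exit paths lead to verification; the outcome label simply tells the verifier which iteration-policy branch terminated the composite task.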
Throughout this phase:
- raw purchase history remains inside authorized Vault or TEE boundaries
- the protocol relies on commitments, evidence, lineage, and replayable artifacts rather than unrestricted plaintext movement
- training remains bound to the admitted feature, policy, and version constraints
- the model operator never sees individual Vault data or per-Vault gradients, only the securely aggregated model updates
5. Verification and Accepted Training Result
The protocol does not treat a produced model artifact as final truth immediately.
Because this is a composite task, verification operates at two levels:
- Per-round verification: each sub-task's execution and TEE attestation are individually validated
- Composite verification: the overall training result, including convergence state, total privacy budget consumption, and model quality metrics, is evaluated against the acceptance criteria defined at admission
The composite verification must confirm:
- that all sub-task TEE attestations are valid
- that total privacy budget consumption matches the sum of per-round records
- that the model artifact does not memorize or allow extraction of individual training records, verifiable through protocol-defined evaluation criteria such as membership inference resistance
- that convergence state is consistent with the declared iteration policy
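A minimal composite-verification gate over these checks might look as follows. The membership-inference AUC threshold of 0.55 is an invented illustrative parameter, not a protocol constant:

```python
def verify_composite(attestations_valid: list,
                     per_round_epsilon: list,
                     reported_total_epsilon: float,
                     membership_inference_auc: float,
                     auc_threshold: float = 0.55) -> str:
    """Composite verification sketch over per-round evidence."""
    # every sub-task TEE attestation must be valid
    if not all(attestations_valid):
        return "rejected"
    # reported total must equal the sum of per-round budget records
    if abs(sum(per_round_epsilon) - reported_total_epsilon) > 1e-9:
        return "rejected"
    # an AUC near 0.5 indicates the model does not memorize
    # individual training records (membership inference resistance)
    if membership_inference_auc > auc_threshold:
        return "rejected"
    return "accepted"
```

A real verifier would also evaluate the `challenged` path and convergence-state consistency; this sketch only covers the binary accept/reject checks listed above.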
In outcome terms, the result may be:
- accepted
- rejected
- challenged
Only accepted work can continue into settlement.
6. Settlement, Accounting, and Model-Use Basis
If the result is accepted, it can enter settlement as a composite settlement, linking all sub-task settlement contexts into a single finalized composite context.
Each sub-task settles in the epoch to which it was assigned at dispatch time. If an epoch closes while a sub-task is in progress, that sub-task retains its assigned epoch; subsequent sub-tasks are assigned to the next epoch. Composite settlement aggregates sub-task settlements across their respective epochs, and reward accounting for each sub-task follows the epoch to which that sub-task was assigned.
In this scenario:
- settlement turns the accepted training run into protocol-recognized training truth
- the composite settlement context preserves linkage to every round's sub-task settlement, TEE attestation, and privacy budget records
- finalized accounting becomes the basis for downstream model registry updates, deployment, or gated usage
- the deployed model remains a downstream system effect rather than the source of protocol truth by itself
- reward accounting reflects the aggregate verified contribution across all rounds and all participating Vaults
- SYM distribution is executed through the Reward Contract on Aptos, with per-actor shares calculated by the Reward Accounting Service under the active `policy_version`
7. Result Delivery to Model Registry
After settlement, the finalized model artifact must be delivered to the requesting party's model registry.
In this scenario:
- the task submitter (model operator) is the authorized delivery recipient
- the delivered result is the trained model artifact and its associated quality metrics, not the raw training data or per-Vault gradients
- delivery is recorded as a protocol-auditable event with linkage to the composite settlement context
- the model registry may then deploy the model for inference or further evaluation, but the protocol does not govern deployment mechanics
Node-Level Execution
From the SCS perspective, the same scenario maps onto the concrete two-node-type topology. This section traces how master nodes and enterprise nodes cooperate to realize each stage of the protocol lifecycle.
Master Nodes
The three master nodes (operated by the Symphony Foundation under BFT consensus) collectively run the following services for this training workflow:
API Gateway: receives the model operator's training request, performs TLS termination, rate limiting, and routes the request to the Admission Plane services
Admission Plane: validates submitter identity and authorization, verifies data-subject consent for the `training` usage scope across participating Vaults, resolves semantics under `semantic_version`, reserves privacy budget, locks SYM via the Escrow Contract, and assembles the immutable `TaskEnvelope` with `iteration_policy`, `coordination_envelope`, and `budget_lock_ref`
Task Orchestrator: drives the composite task state machine through canonical state transitions. For this training task, the Task Orchestrator manages the parent task lifecycle, sequences sub-tasks (rounds), delegates per-round multi-Vault fan-out to the Multi-Vault Coordinator, and tracks overall convergence through the Composite Iteration Controller
Multi-Vault Coordinator: expands each round's coordination envelope into per-Vault execution assignments, dispatches assignments to target enterprise nodes, tracks per-Vault slice progress, enforces quorum, and triggers aggregation when sufficient slices complete
Aggregation Runtime (TEE): runs inside an attested TEE on a master node. Per round, it receives per-Vault `SliceResultBundle` outputs (commitment references and privacy-safe model updates), executes `federated_average` to produce the aggregated global model update, enforces cardinality thresholds, and produces the aggregate result with cryptographic proof and `AttestationReport`. It does not retain per-Vault inputs after producing the aggregate.
Composite Iteration Controller: after each round's aggregation, evaluates convergence criteria from the `iteration_policy` and decides whether to proceed to the next round, stop early on convergence, or terminate on privacy budget exhaustion or maximum round count
Settlement Plane: operates at two levels for this composite task. Per-round sub-task verification validates individual TEE attestations and privacy budget records. Composite verification evaluates the overall training result against acceptance criteria. The Reward Accounting Service calculates per-actor reward shares, and the Payout Service submits finalized reward distributions to the Reward Contract on Aptos
Enterprise Nodes
Each participating enterprise node runs the following services:
Vault (Data Sovereignty Service): stores encrypted purchase-history and receipt records, enforces consent checks before any data access for the `training` scope, and maintains per-data-subject privacy budget ledger entries. No plaintext leaves the Vault except into an attested TEE.
Computation Runtime (TEE): per round, performs two operations inside the TEE boundary and packages the round's output:
- Local feature construction: derives approved feature aggregates from Vault-internal records according to the admitted `DerivationRule` and `AttributeComposition` declarations
- Local gradient computation: computes local model updates (gradients or parameter deltas) using the locally constructed features and the current global model parameters received from the master node
- Result packaging: produces a `SliceResultBundle` containing commitment-linked, deterministic outputs with a TEE `AttestationReport` and privacy budget consumption records
Three-Level Hierarchy in Practice
The node-level execution realizes the SCP three-level hierarchy as follows:
| Level | Protocol Object | Managed By | Count |
|---|---|---|---|
| Parent task | Composite training task | Master node Task Orchestrator | 1 |
| Sub-task (round) | Per-round iteration | Master node Multi-Vault Coordinator | N (rounds) |
| Execution slice | Per-Vault contribution per round | Enterprise node Computation Runtime | N x M (rounds x Vaults) |
Each level has its own lifecycle, evidence chain, TEE attestation, and settlement context. The master node coordinates the upper two levels; enterprise nodes execute the lowest level.
Economic Flow
The training task follows the train pricing model defined in SCP Economics and Governance. This section traces the full monetary lifecycle from budget lock to reward distribution.
Fee Calculation
The model operator (Task Submitter) locks SYM in the Escrow Contract at admission. The locked amount is calculated as:

train_fee = base_fee_train + Σ_round(round_compute_fee + round_aggregation_fee) + (total_epsilon × per_epsilon_fee)

Where:
- `base_fee_train`: minimum fee for composite task setup, covering admission, orchestration, and verification overhead
- `round_compute_fee`: sum of per-Vault compute costs for a round; these fees accumulate across N rounds and M Vaults, so total compute cost scales with the product of rounds and participating Vaults
- `round_aggregation_fee`: cost of the secure aggregation step (`federated_average`) per round
- `total_epsilon`: cumulative differential privacy budget consumed across all rounds
- `per_epsilon_fee`: price per unit of privacy budget, reflecting the finite and non-renewable nature of per-data-subject privacy

All fee parameters are set by the active `policy_version` and may vary by `usage_scope`.
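Under this formula, the fee computation reduces to a short function. Parameter names mirror the formula terms; the values in the usage note are illustrative only:

```python
def train_fee(base_fee: float,
              per_round_compute: list,
              per_round_aggregation: list,
              total_epsilon: float,
              per_epsilon_fee: float) -> float:
    """train_fee = base + sum over rounds of (compute + aggregation)
    + total_epsilon * per_epsilon_fee."""
    round_fees = sum(c + a for c, a in zip(per_round_compute, per_round_aggregation))
    return base_fee + round_fees + total_epsilon * per_epsilon_fee
```

For example, two rounds with per-round compute fees of 1.0 SYM, per-round aggregation fees of 0.5 SYM, a base fee of 10.0 SYM, and 2.0 units of epsilon priced at 3.0 SYM per unit lock 19.0 SYM in total.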
Fee Distribution
On composite settlement, the locked fee is distributed among protocol actors:
| Recipient | Share | Basis |
|---|---|---|
| Data Contributors | 30-50% | Proportional to records used and privacy budget consumed against their data across all rounds |
| Vault Operators | 15-25% | Proportional to records served and storage commitment across participating Vaults |
| Executors | 15-25% | Proportional to verified compute work (per-Vault training slices + aggregation) |
| Verifiers | 5-10% | Per verification decision (sub-task and composite levels) |
| Protocol Treasury | 5-10% | Fixed protocol fee for sustainability |
Exact percentages are set by the active `policy_version`. The sum must equal 100% of the locked fee.
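The 100% constraint lends itself to a simple validation check. The concrete share values below are one hypothetical `policy_version` choice inside the stated ranges, not prescribed values:

```python
def validate_distribution(shares: dict) -> bool:
    """A policy's shares must sum to exactly 100% of the locked fee."""
    return abs(sum(shares.values()) - 1.0) < 1e-9

# illustrative example: one valid choice within the table's ranges
EXAMPLE_SHARES = {
    "data_contributors": 0.40,   # within 30-50%
    "vault_operators":   0.20,   # within 15-25%
    "executors":         0.20,   # within 15-25%
    "verifiers":         0.10,   # within 5-10%
    "protocol_treasury": 0.10,   # within 5-10%
}
```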
Reward Accounting
Reward accounting for this composite task reflects the aggregate verified contribution across all rounds and all Vaults:
- each participating Vault operator receives a share proportional to the records served and the privacy budget consumed for their data subjects across all rounds
- each executor that performed per-Vault training slices receives reward proportional to the computational work verified for those slices
- the aggregation executor (master node TEE) receives a separate reward for the per-round aggregation steps
- staked Data Producers whose records were used receive 100% of their calculated data usage dividend; unstaked Data Producers receive 50% (remainder returned to the reward pool)
- a Vault that was authorized but timed out or failed in a given round receives no reward for that round's slices
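The staked/unstaked dividend rule can be sketched directly; the function name and return shape are illustrative:

```python
def data_producer_payout(calculated_dividend: float, staked: bool):
    """Staked Data Producers receive 100% of their calculated dividend;
    unstaked receive 50%, with the remainder returned to the reward pool.
    Returns (payout, returned_to_pool)."""
    share = 1.0 if staked else 0.5
    paid = calculated_dividend * share
    return paid, calculated_dividend - paid
```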
Settlement Timing
- each sub-task (round) settles in the epoch to which it was assigned at dispatch time
- if an epoch closes while a sub-task is in progress, the in-progress sub-task retains its assigned epoch; subsequent sub-tasks are assigned to the next epoch
- composite settlement aggregates sub-task settlements across their respective epochs
- reward accounting for each sub-task follows the epoch to which that sub-task was assigned
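These timing rules amount to grouping sub-task settlements by their dispatch-time epoch; the record shape below is an assumption for illustration:

```python
from collections import defaultdict

def composite_settlement(sub_tasks: list) -> dict:
    """Aggregate sub-task rewards by dispatch-time epoch.

    A sub-task keeps its assigned epoch even if that epoch closed
    while the round was in progress; later rounds land in later epochs."""
    by_epoch = defaultdict(float)
    for st in sub_tasks:
        by_epoch[st["dispatch_epoch"]] += st["reward"]
    return dict(by_epoch)
```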
On-Chain Execution
All SYM distribution flows through smart contracts on Aptos:
- Escrow Contract: locks the model operator's SYM at admission; holds funds until settlement finalization
- Reward Contract: receives finalized reward distributions from the Payout Service; distributes SYM to Data Contributors, Vault Operators, Executors, and Verifiers; applies staking multiplier for Data Producers
- Treasury Contract: receives the Protocol Treasury share
No protocol actor holds SYM on behalf of another actor outside of smart contract custody.
Semantic and Feature Handling
This scenario is also a good example of governed semantic interpretation.
If the requested model depends on concepts that do not map cleanly to the active canonical or query sets, several things may happen:
- the task may remain blocked until the semantics are clarified
- some feature logic may remain local under policy-permitted handling
- emerging behavioral or query concepts may later contribute to governed semantic evolution
This is where the use case touches:
- `domain`
- `CanonicalAttribute`
- `QueryAttribute`
- local attribute handling
- governed candidate evolution
The important point is that training logic does not get to redefine protocol meaning privately or silently.
Key Boundary Reminders
The desire to train on consumer behavior is not yet a full protocol event.
The full SCP lifecycle begins only when the system turns that training intent into authorized protocol work with explicit feature, privacy, and evaluation constraints.
Likewise, the trained model itself is not automatically the protocol truth. The protocol truth is the accepted and settled training result that downstream systems then use as the basis for deployment or activation.
This use case exercises the most demanding protocol path: composite task with iterative sub-tasks, multi-Vault fan-out at every round, federated secure aggregation, cumulative privacy budget management, TEE attestation across all Vaults and all rounds, composite settlement linking all round-level evidence, and model artifact delivery to a downstream registry. It serves as the upper-bound reference scenario for SCP protocol complexity.