Wallet Safety Benchmarks
How do you know a wallet protects users instead of just claiming to?
Benchmark wallet safety on architectural guarantees, not feature checklists. A wallet that passes these benchmarks prevents known failure classes by design. A wallet that fails them relies on users making no mistakes — which is not safety.
Benchmark Checklist (Landing Surface)
This checklist is the deterministic entry point. Review starts here, then moves to chain-specific implementations.
| Dimension | Deterministic Check | Result |
|---|---|---|
| Connection Safety | App never receives key material; signing fully delegated | Pass / Warn / Fail |
| Transaction Transparency | Preview shows full effect before sign (assets, gas, steps) | Pass / Warn / Fail |
| Destructive Protection | Irreversible actions require explicit informed consent | Pass / Warn / Fail |
| Asset Visibility | All owned assets are enumerable; nothing hidden | Pass / Warn / Fail |
| Asset Operations | Transfers/listings are safe by default; no key lifecycle side-effects | Pass / Warn / Fail |
If any critical check fails, the overall result is Fail. The critical checks are listed under Scoring.
Three Implementations (Comparison Track)
Use one checklist, three implementations:
| Track | Status | Primary Primitive | Comparison Purpose |
|---|---|---|---|
| Solana | Active reference | Runtime guards + simulation | Baseline implementation for account-based chain safety |
| Sui (Move) | Primary engineering target | Object model + Move constraints | Deterministic safety with stronger architectural guarantees |
| EVM | Planned | Account/storage model + contract-level guards | Cross-ecosystem comparability and broader adoption |
The benchmark is chain-agnostic; implementations are chain-specific.
Value Role
| Without Benchmarks | With Benchmarks |
|---|---|
| Safety claims are marketing | Safety claims are testable |
| Each team reinvents protections | Proven patterns are reusable |
| Failures are discovered by users | Failure classes are prevented by architecture |
| Trust depends on brand | Trust depends on evidence |
Core Benchmarks
Five dimensions, each derived from a known failure class in wallet engineering. Every dimension has a reference implementation proven across Sui and Solana.
1. Connection Safety
Can the wallet establish a session without exposing private keys?
| Criterion | Threshold | Test Method |
|---|---|---|
| App never receives private key or seed phrase | Zero exposure | Code audit: no key material in app state or network calls |
| Wallet adapter delegates signing to user's wallet | 100% of transactions | Integration test: app requests signature, wallet signs |
| Seedless onboarding path exists (zkLogin or equivalent) | Available | Functional test: complete onboarding without seed phrase |
| Disconnection fully clears session state | Zero residual auth | State audit after disconnect: no tokens, keys, or session data |
Reference: Connection Patterns — Sui, Connection Patterns — Solana
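The "zero residual auth" state audit can be sketched as a recursive scan over a snapshot of app state taken after disconnect. This is a minimal illustration, not the reference implementation: the `SENSITIVE_NAME` pattern and the shape of the snapshot are assumptions, and a real audit would use the wallet's own state schema.

```python
import re
from typing import Any

# Hypothetical field-name patterns; replace with the wallet's real schema.
SENSITIVE_NAME = re.compile(
    r"(token|secret|seed|mnemonic|private|session|key)", re.IGNORECASE
)


def residual_auth_paths(state: Any, path: str = "") -> list[str]:
    """Return paths of sensitive-looking fields that still hold a value."""
    found: list[str] = []
    if isinstance(state, dict):
        for name, value in state.items():
            child = f"{path}.{name}" if path else str(name)
            if SENSITIVE_NAME.search(str(name)) and value:
                # A sensitive field with any non-empty value is residue.
                found.append(child)
            else:
                # Scalars return an empty list; containers are walked.
                found.extend(residual_auth_paths(value, child))
    elif isinstance(state, list):
        for index, item in enumerate(state):
            found.extend(residual_auth_paths(item, f"{path}[{index}]"))
    return found
```

An empty result supports a Pass for this criterion; any returned path is evidence of residual session state and should be attached to the test record.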
2. Transaction Transparency
Can the user see exactly what will happen before signing?
| Criterion | Threshold | Test Method |
|---|---|---|
| Transaction simulation available before signing | 100% of transaction types | Dry-run every supported operation, verify preview matches outcome |
| Object/balance changes shown in human-readable form | All affected assets visible | UI test: compare preview display against actual state change |
| Gas cost estimated before execution | Estimate within 10% of actual | Compare estimate to settled cost across 100 transactions |
| Multi-operation transactions show all steps | Every operation in batch visible | PTB/batch test: verify each sub-operation is individually listed |
Reference: Transaction Safety — Sui, Transaction Safety — Solana
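The gas criterion reduces to a relative-error check over a sample of settled transactions. The sketch below assumes every sampled transaction must fall within the tolerance (the table does not say whether the 10% applies per transaction or on average), and `GasSample` is a hypothetical record type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GasSample:
    estimated: int  # cost shown to the user before signing
    actual: int     # cost settled on chain


def gas_estimate_failures(
    samples: list[GasSample], tolerance: float = 0.10
) -> list[GasSample]:
    """Return samples whose estimate is off by more than `tolerance`."""
    failures = []
    for sample in samples:
        if sample.actual <= 0:
            # Relative error is undefined; treat as a failure (assumption).
            failures.append(sample)
            continue
        relative_error = abs(sample.estimated - sample.actual) / sample.actual
        if relative_error > tolerance:
            failures.append(sample)
    return failures
```

Run it over the 100-transaction sample; an empty list means the threshold is met, and the returned samples are the evidence when it is not.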
3. Destructive Operation Protection
Does the wallet prevent irreversible actions without explicit, informed consent?
| Criterion | Threshold | Test Method |
|---|---|---|
| Destructive operations require multi-step confirmation | All irreversible actions gated | Attempt every destructive operation: verify confirmation dialog fires |
| Confirmation includes plain-language description of consequences | 100% of destructive dialogs | UX audit: can a non-expert understand what they will lose? |
| Typed confirmation required for high-severity actions | Key deletion, large transfers | Attempt without typing: verify action is blocked |
| Cooldown period for highest-severity operations | Configurable delay (default > 0) | Timer test: verify action cannot execute before cooldown expires |
| Notifications never trigger key lifecycle operations | Zero key mutations from notifications | Simulate every notification type: verify no key/seed state change |
Reference: Destructive Operations — Sui, Destructive Operations — Solana
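Typed confirmation and cooldown combine into a single gate that must pass before a high-severity action can execute. The following is a sketch under assumptions: `DestructiveRequest` and `blocked_reason` are hypothetical names, and the clock is passed in as `now` so the timer test stays deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DestructiveRequest:
    action: str                # e.g. "delete-key"
    requested_at: float        # seconds, from an injected clock
    cooldown_seconds: float    # must be > 0 for highest-severity actions
    required_phrase: str       # what the user has to type
    typed_phrase: Optional[str] = None


def blocked_reason(req: DestructiveRequest, now: float) -> Optional[str]:
    """Return why the action is blocked, or None when it may proceed."""
    if req.cooldown_seconds <= 0:
        return "cooldown must be greater than zero"
    if req.typed_phrase != req.required_phrase:
        return "typed confirmation missing or incorrect"
    if now - req.requested_at < req.cooldown_seconds:
        return "cooldown has not expired"
    return None
```

The two test methods in the table map to two calls: one with `typed_phrase` unset, and one with `now` earlier than `requested_at + cooldown_seconds`. Both must return a reason rather than `None`.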
4. Asset Visibility
Does the wallet show everything the user owns, with nothing hidden?
| Criterion | Threshold | Test Method |
|---|---|---|
| All owned assets enumerable in one view | 100% of owned objects/tokens | Compare wallet display against on-chain state query |
| Value-at-risk calculation before destructive operations | Total value shown | Pre-destruction audit: verify amount displayed matches chain state |
| Hidden or zero-display assets flagged | No silent omissions | Create edge-case assets (dust, unknown tokens): verify they appear |
| Asset type filtering and search available | Functional | UX test: filter by type, search by name, verify results |
Reference: Object Audit — Sui, Balance Guard — Solana
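Comparing the wallet display against on-chain state is a set difference. In this sketch both inputs are assumed to be sets of asset identifiers that the test harness has already collected; how they are fetched (RPC query, UI scrape) is chain-specific and out of scope here.

```python
def visibility_gaps(
    on_chain_ids: set[str], displayed_ids: set[str]
) -> dict[str, list[str]]:
    """Diff the assets the chain reports against the assets the wallet shows."""
    return {
        # Owned on chain but not shown: the silent omissions the benchmark forbids.
        "hidden": sorted(on_chain_ids - displayed_ids),
        # Shown but not owned on chain: stale or incorrect display.
        "phantom": sorted(displayed_ids - on_chain_ids),
    }
```

A non-empty `hidden` list corresponds to the critical "owned assets not visible" failure. Include dust balances and unknown token types in `on_chain_ids` so the edge cases from the table are covered.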
5. Asset Operations
Are transfers, swaps, and listings safe by default?
| Criterion | Threshold | Test Method |
|---|---|---|
| Transfer previews recipient and amount before signing | 100% of transfer types | Initiate transfer: verify preview before confirmation |
| Asset operations never trigger key lifecycle changes | Zero key mutations | Execute every asset operation: audit key state before and after |
| Marketplace integration uses standard protocols | Kiosk, escrow, or equivalent | List/buy/delist: verify standard protocol used, not custom |
| Failed transactions revert cleanly with clear error | No partial state corruption | Force failure scenarios: verify state rollback and error message |
Reference: Asset Operations — Sui, Asset Handling — Solana
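The "audit key state before and after" method can be sketched by fingerprinting key metadata around each operation. The names below are hypothetical, and the snapshot is assumed to contain only non-secret, JSON-serializable metadata (key identifiers, versions, counts), never key material itself.

```python
import hashlib
import json
from typing import Callable


def key_state_fingerprint(key_state: dict) -> str:
    """Hash a canonical JSON form so equal states give equal digests."""
    canonical = json.dumps(key_state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def operation_mutates_keys(
    read_key_state: Callable[[], dict], operation: Callable[[], None]
) -> bool:
    """Run `operation` and report whether the key state changed."""
    before = key_state_fingerprint(read_key_state())
    operation()
    after = key_state_fingerprint(read_key_state())
    return before != after
```

Wrap every transfer, swap, and listing in `operation_mutates_keys`; the criterion is met only if it returns `False` for all of them. The same wrapper applies to the notification criterion in Dimension 3.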
Scoring
Each dimension scores Pass / Warn / Fail:
| Result | Condition | Action |
|---|---|---|
| Pass | All thresholds met for the dimension | Promote: safe for production use |
| Warn | One non-critical threshold missed | Correct and re-test within one cycle |
| Fail | Any critical threshold missed | Hold deployment until resolved |
Critical failures (any one of these is an automatic Fail):
- App receives private key or seed phrase (Dimension 1)
- Destructive operation executes without confirmation (Dimension 3)
- Notification triggers key lifecycle change (Dimension 3)
- Owned assets not visible to user (Dimension 4)
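The scoring rules translate into a small pure function. This is a sketch with one labeled assumption: the table defines Warn for exactly one non-critical miss and does not say what happens with two or more, so this version conservatively treats that case as Fail. `ThresholdResult` is a hypothetical record type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdResult:
    name: str
    met: bool
    critical: bool


def score_dimension(results: list[ThresholdResult]) -> str:
    """Map one dimension's threshold results to Pass, Warn, or Fail."""
    if not results:
        raise ValueError("a dimension needs at least one threshold result")
    missed = [r for r in results if not r.met]
    if any(r.critical for r in missed):
        return "Fail"
    if not missed:
        return "Pass"
    if len(missed) == 1:
        return "Warn"
    # Assumption: several non-critical misses are not covered by the table.
    return "Fail"
```

Because the function has no hidden state, the same inputs always produce the same result, which is what the repeatability requirement asks for.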
Certification Mode (Deterministic)
Treat this as certifiable evidence, not narrative review.
| Requirement | Deterministic Rule |
|---|---|
| Evidence format | Each dimension must include reproducible test evidence (code path, test case, observed output) |
| Reviewer independence | Builder and commissioner cannot be the same actor |
| Repeatability | Same test inputs produce same result state |
| State model | Result is explicit: Pass / Warn / Fail with threshold reason |
| Audit trail | Results and decision traces are stored for re-verification |
Target direction: encode benchmark attestations onchain, beginning with Sui/Move.
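The certification rules can be checked mechanically before a result is accepted. The record below is a hypothetical shape for one piece of evidence, with field names taken from the table where possible; it is a sketch of the validation step, not a defined schema.

```python
from dataclasses import dataclass

VALID_RESULTS = {"Pass", "Warn", "Fail"}


@dataclass(frozen=True)
class Evidence:
    dimension: str
    code_path: str
    test_case: str
    observed_output: str
    result: str
    threshold_reason: str
    builder: str
    commissioner: str


def evidence_problems(evidence: Evidence) -> list[str]:
    """Return the rule violations for one evidence record (empty if valid)."""
    problems = []
    for field_name in ("code_path", "test_case", "observed_output", "threshold_reason"):
        if not getattr(evidence, field_name).strip():
            problems.append(f"missing {field_name}")
    if evidence.result not in VALID_RESULTS:
        problems.append("result must be Pass, Warn, or Fail")
    if evidence.builder == evidence.commissioner:
        problems.append("builder and commissioner must be different actors")
    return problems
```

Storing the validated records unchanged, alongside the decision, gives the audit trail something concrete to re-verify later.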
Aggregate Score
| Level | Requirement | Meaning |
|---|---|---|
| Level 0 | Untested | No benchmark evidence |
| Level 1 | At least 3 of 5 dimensions Pass, zero Fail | Minimum viable safety |
| Level 2 | All 5 dimensions Pass | Production-grade safety |
| Level 3 | Level 2 + proven across 2+ chains | Cross-chain safety standard |
The Sui Wallet Safety PRD targets Level 3 — patterns proven on both Sui and Solana.
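The level table can be sketched as two functions: one per chain, one across chains. Several points are assumptions rather than stated rules: a result set with any Fail, or with fewer than five dimensions, is scored Level 0; "proven across 2+ chains" is read as reaching Level 2 on at least two chains; and otherwise the best single-chain level is reported.

```python
def chain_level(dimension_results: dict[str, str]) -> int:
    """Level 0-2 for one chain, from its five Pass/Warn/Fail results."""
    if len(dimension_results) != 5:
        return 0  # incomplete evidence counts as untested (assumption)
    values = list(dimension_results.values())
    if "Fail" in values:
        return 0  # any Fail blocks Level 1 and above
    passes = values.count("Pass")
    if passes == 5:
        return 2
    if passes >= 3:
        return 1
    return 0


def aggregate_level(results_by_chain: dict[str, dict[str, str]]) -> int:
    """Level 0-3 across chains; Level 3 needs Level 2 on two or more chains."""
    levels = [chain_level(results) for results in results_by_chain.values()]
    if sum(1 for level in levels if level == 2) >= 2:
        return 3
    return max(levels, default=0)
```

For example, five Passes on both Sui and Solana yields Level 3, while five Passes on Sui and three on Solana (with two Warns) yields Level 2 under these assumptions.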
Chain-Specific Considerations
The same five dimensions apply across chains, but the architectural primitives differ:
| Dimension | Account-Based Chains (Solana, EVM) | Object-Based Chains (Sui) |
|---|---|---|
| Connection Safety | Wallet adapter pattern | Wallet adapter + zkLogin (seedless) |
| Transaction Transparency | Simulation via RPC dry-run | PTB inspection (up to 1,024 operations, atomic) |
| Destructive Protection | Runtime checks in UI code | Compile-time guarantees (Move type system) |
| Asset Visibility | Must discover token accounts | All objects enumerable by default |
| Asset Operations | Chain-specific token standards | Unified object model (coins = NFTs = objects) |
Object-based chains have architectural advantages in dimensions 3 and 4: the Move type system and object model prevent failure classes that account-based chains must guard against at runtime, in UI code.
Operating Cadence
| Cadence | Activity |
|---|---|
| Pre-deploy | Full benchmark suite against new wallet build |
| Per-release | Regression test on all five dimensions |
| Monthly | Edge-case audit (dust tokens, unknown assets, gas spikes) |
| Quarterly | Cross-chain benchmark comparison and threshold review |
Adoption Path
These benchmarks are designed to be extractable — any wallet team can adopt them:
| Stage | What Happens | Output |
|---|---|---|
| 1. Self-audit | Team runs benchmarks against their wallet | Score card (Level 0-3) |
| 2. Publish results | Score card made public | Comparable safety claims |
| 3. Peer review | Independent team verifies score | Validated safety level |
| 4. Standard adoption | Multiple wallets benchmark to same spec | Industry safety standard |
The goal is not certification for its own sake. The goal is comparable, evidence-based safety claims that users can evaluate before trusting a wallet with their assets.
Context
- Standards — Why standards reduce variance and enable composability
- Benchmark Standards — Parent benchmark protocol and trigger logic
- Blockchain Benchmarks — Chain-level performance benchmarks
- Sui Wallet Safety PRD — The Mycelium capability that implements these benchmarks
- Wallet JTBD Superset — Failure register and capability register
- Sui Safety Patterns — Reference implementation (Sui)
- Solana Safety Patterns — Reference implementation (Solana)