Wallet Safety Benchmarks

How do you know a wallet protects users instead of just claiming to?

Benchmark wallet safety on architectural guarantees, not feature checklists. A wallet that passes these benchmarks prevents known failure classes by design. A wallet that fails them relies on users making no mistakes — which is not safety.

Benchmark Checklist (Landing Surface)

This checklist is the deterministic entry point. Review starts here, then moves to chain-specific implementations.

Dimension	Deterministic Check	Result
Connection Safety	App never receives key material; signing fully delegated	Pass / Warn / Fail
Transaction Transparency	Preview shows full effect before sign (assets, gas, steps)	Pass / Warn / Fail
Destructive Protection	Irreversible actions require explicit informed consent	Pass / Warn / Fail
Asset Visibility	All owned assets are enumerable; nothing hidden	Pass / Warn / Fail
Asset Operations	Transfers/listings are safe by default; no key lifecycle side-effects	Pass / Warn / Fail

If any critical check fails, the overall result is Fail.

Three Implementations (Comparison Track)

Use one checklist, three implementations:

Track	Status	Primary Primitive	Comparison Purpose
Solana	Active reference	Runtime guards + simulation	Baseline implementation for account-based chain safety
Sui (Move)	Primary engineering target	Object model + Move constraints	Deterministic safety with stronger architectural guarantees
EVM	Planned	Account/storage model + contract-level guards	Cross-ecosystem comparability and broader adoption

The benchmark is chain-agnostic; implementations are chain-specific.

Value Role

Without Benchmarks	With Benchmarks
Safety claims are marketing	Safety claims are testable
Each team reinvents protections	Proven patterns are reusable
Failures are discovered by users	Failure classes are prevented by architecture
Trust depends on brand	Trust depends on evidence

Core Benchmarks

Five dimensions, each derived from a known failure class in wallet engineering. Every dimension has a reference implementation proven across Sui and Solana.

1. Connection Safety

Can the wallet establish a session without exposing private keys?

Criterion	Threshold	Test Method
App never receives private key or seed phrase	Zero exposure	Code audit: no key material in app state or network calls
Wallet adapter delegates signing to user's wallet	100% of transactions	Integration test: app requests signature, wallet signs
Seedless onboarding path exists (zkLogin or equivalent)	Available	Functional test: complete onboarding without seed phrase
Disconnection fully clears session state	Zero residual auth	State audit after disconnect: no tokens, keys, or session data

Reference: Connection Patterns — Sui, Connection Patterns — Solana
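
The disconnect state audit can be approximated with a recursive scan of the app's state snapshot. This is a minimal sketch, not the reference implementation: the marker list, the function name, and the shape of the state dictionary are all assumptions, and name matching is only a heuristic that complements a manual code audit.

```python
from typing import Any, List

# Hypothetical name fragments that suggest auth material; tune per wallet.
SENSITIVE_MARKERS = ("token", "key", "seed", "mnemonic", "session")
EMPTY_VALUES = (None, "", [], {})


def residual_auth_paths(state: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of non-empty fields whose names look auth-related."""
    found: List[str] = []
    if isinstance(state, dict):
        for name, value in state.items():
            path = f"{prefix}.{name}" if prefix else str(name)
            looks_sensitive = any(m in str(name).lower() for m in SENSITIVE_MARKERS)
            if looks_sensitive and value not in EMPTY_VALUES:
                # A populated auth-looking field after disconnect is residual state.
                found.append(path)
            else:
                found.extend(residual_auth_paths(value, path))
    elif isinstance(state, list):
        for index, item in enumerate(state):
            found.extend(residual_auth_paths(item, f"{prefix}[{index}]"))
    return found
```

An empty list from a snapshot taken after disconnect is consistent with the "zero residual auth" threshold. Any returned path is a candidate finding to inspect by hand.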

2. Transaction Transparency

Can the user see exactly what will happen before signing?

Criterion	Threshold	Test Method
Transaction simulation available before signing	100% of transaction types	Dry-run every supported operation, verify preview matches outcome
Object/balance changes shown in human-readable form	All affected assets visible	UI test: compare preview display against actual state change
Gas cost estimated before execution	Estimate within 10% of actual	Compare estimate to settled cost across 100 transactions
Multi-operation transactions show all steps	Every operation in batch visible	PTB/batch test: verify each sub-operation is individually listed

Reference: Transaction Safety — Sui, Transaction Safety — Solana
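
The gas-estimate criterion above can be checked mechanically. The sketch below assumes "within 10% of actual" means relative error measured against the settled cost, and that every one of the sampled transactions must satisfy it. The source does not say whether the bound is per-transaction or an average, so treat that as an assumption. `GasSample` and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class GasSample:
    estimated: int  # pre-sign estimate, in the chain's smallest fee unit
    actual: int     # settled cost, same unit


def within_tolerance(sample: GasSample, tolerance_pct: int = 10) -> bool:
    """True when the estimate is within tolerance_pct percent of the settled cost."""
    if sample.actual <= 0:
        raise ValueError("actual cost must be positive")
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return abs(sample.estimated - sample.actual) * 100 <= tolerance_pct * sample.actual


def gas_estimate_check(
    samples: Sequence[GasSample],
    tolerance_pct: int = 10,
    required_samples: int = 100,
) -> Tuple[bool, List[GasSample]]:
    """Return (threshold met, offending samples) for a batch of transactions."""
    if len(samples) < required_samples:
        raise ValueError(f"need at least {required_samples} samples, got {len(samples)}")
    misses = [s for s in samples if not within_tolerance(s, tolerance_pct)]
    return (not misses, misses)
```

Returning the offending samples alongside the verdict keeps the result reproducible: a reviewer can re-run exactly the transactions that missed the bound.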

3. Destructive Operation Protection

Does the wallet prevent irreversible actions without explicit, informed consent?

Criterion	Threshold	Test Method
Destructive operations require multi-step confirmation	All irreversible actions gated	Attempt every destructive operation: verify confirmation dialog fires
Confirmation includes plain-language description of consequences	100% of destructive dialogs	UX audit: can a non-expert understand what they will lose?
Typed confirmation required for high-severity actions	Key deletion, large transfers	Attempt without typing: verify action is blocked
Cooldown period for highest-severity operations	Configurable delay (default > 0)	Timer test: verify action cannot execute before cooldown expires
Notifications never trigger key lifecycle operations	Zero key mutations from notifications	Simulate every notification type: verify no key/seed state change

Reference: Destructive Operations — Sui, Destructive Operations — Solana
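
The typed-confirmation and cooldown criteria combine into one gate. This is a runtime sketch of the kind an account-based chain would need in UI code. The class, field names, and the 60-second default are illustrative assumptions; the source requires only that the default delay be greater than zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DestructiveRequest:
    action: str           # hypothetical action name, e.g. "delete_key"
    required_phrase: str  # what the user must type, e.g. "DELETE"
    typed_phrase: str     # what the user actually typed
    requested_at: float   # seconds on a monotonic clock


def may_execute(request: DestructiveRequest, now: float, cooldown_seconds: float = 60.0) -> bool:
    """Allow execution only after an exact typed confirmation and an elapsed cooldown."""
    if cooldown_seconds < 0:
        raise ValueError("cooldown_seconds must not be negative")
    if request.typed_phrase != request.required_phrase:
        return False  # typed confirmation missing or wrong: action stays blocked
    return (now - request.requested_at) >= cooldown_seconds
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the timer test deterministic: the same inputs always produce the same result.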

4. Asset Visibility

Does the wallet show everything the user owns, with nothing hidden?

Criterion	Threshold	Test Method
All owned assets enumerable in one view	100% of owned objects/tokens	Compare wallet display against on-chain state query
Value-at-risk calculation before destructive operations	Total value shown	Pre-destruction audit: verify amount displayed matches chain state
Hidden or zero-display assets flagged	No silent omissions	Create edge-case assets (dust, unknown tokens): verify they appear
Asset type filtering and search available	Functional	UX test: filter by type, search by name, verify results

Reference: Object Audit — Sui, Balance Guard — Solana
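
Comparing the wallet display against an on-chain state query reduces to a set difference. The sketch below assumes both sides have already been collected as lists of asset identifiers. How they are fetched is chain-specific and out of scope here, and the function name and the "phantom" category are additions for illustration.

```python
from typing import Dict, Iterable, List


def visibility_diff(on_chain_ids: Iterable[str], displayed_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Compare owned assets (from chain state) with what the wallet UI displays."""
    owned = set(on_chain_ids)
    shown = set(displayed_ids)
    return {
        "hidden": sorted(owned - shown),   # owned but not displayed: silent omission
        "phantom": sorted(shown - owned),  # displayed but not owned: stale or spoofed entry
    }
```

A non-empty `hidden` list maps to the critical condition "owned assets not visible to user". Edge-case assets such as dust balances and unknown tokens belong in the on-chain input so that omissions surface.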

5. Asset Operations

Are transfers, swaps, and listings safe by default?

Criterion	Threshold	Test Method
Transfer previews recipient and amount before signing	100% of transfer types	Initiate transfer: verify preview before confirmation
Asset operations never trigger key lifecycle changes	Zero key mutations	Execute every asset operation: audit key state before and after
Marketplace integration uses standard protocols	Kiosk, escrow, or equivalent	List/buy/delist: verify standard protocol used, not custom
Failed transactions revert cleanly with clear error	No partial state corruption	Force failure scenarios: verify state rollback and error message

Reference: Asset Operations — Sui, Asset Handling — Solana
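
The "audit key state before and after" test method can be sketched as a wrapper that fingerprints key lifecycle metadata around each asset operation. Everything named here is hypothetical, including the inventory shape. The inventory is assumed to hold only non-secret metadata (key IDs and status), never key material.

```python
import hashlib
import json
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")
KeyInventory = List[Dict[str, str]]  # e.g. [{"key_id": "k1", "status": "active"}]


def key_state_fingerprint(inventory: KeyInventory) -> str:
    """Hash a canonical form of key metadata so any lifecycle change is detectable."""
    canonical = json.dumps(sorted(inventory, key=lambda e: e["key_id"]), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_with_key_audit(read_inventory: Callable[[], KeyInventory], operation: Callable[[], T]) -> T:
    """Run an asset operation and raise if key lifecycle state changed as a side effect."""
    before = key_state_fingerprint(read_inventory())
    result = operation()
    after = key_state_fingerprint(read_inventory())
    if before != after:
        raise RuntimeError("key lifecycle state changed during an asset operation")
    return result
```

The same wrapper could be pointed at notification handlers to exercise the "zero key mutations from notifications" criterion in Dimension 3.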

Scoring

Each dimension scores Pass / Warn / Fail:

Result	Condition	Action
Pass	All thresholds met for the dimension	Promote: safe for production use
Warn	One non-critical threshold missed	Correct and re-test within one cycle
Fail	Any critical threshold missed	Hold deployment until resolved

Critical failures (any one is an automatic Fail):

App receives private key or seed phrase (Dimension 1)
Destructive operation executes without confirmation (Dimension 3)
Notification triggers key lifecycle change (Dimension 3)
Owned assets not visible to user (Dimension 4)
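
The per-dimension rule can be written as a small pure function. This is a sketch under one labeled assumption: the table defines Warn for exactly one non-critical miss and is silent on two or more, so the code treats any number of non-critical misses as Warn. The type and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Result(Enum):
    PASS = "Pass"
    WARN = "Warn"
    FAIL = "Fail"


@dataclass(frozen=True)
class ThresholdOutcome:
    name: str
    met: bool
    critical: bool = False


def score_dimension(outcomes: Iterable[ThresholdOutcome]) -> Result:
    """Score one dimension from its threshold outcomes."""
    missed = [o for o in outcomes if not o.met]
    if any(o.critical for o in missed):
        return Result.FAIL  # any critical miss holds deployment
    if missed:
        # ASSUMPTION: more than one non-critical miss is still Warn, not Fail.
        return Result.WARN
    return Result.PASS
```

Because the function depends only on its inputs, the same test evidence always yields the same result state.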

Certification Mode (Deterministic)

Treat benchmark results as certifiable evidence, not narrative review.

Requirement	Deterministic Rule
Evidence format	Each dimension must include reproducible test evidence (code path, test case, observed output)
Reviewer independence	Builder and commissioner cannot be the same actor
Repeatability	Same test inputs produce same result state
State model	Result is explicit: Pass / Warn / Fail with threshold reason
Audit trail	Results and decision traces are stored for re-verification

Target direction: encode benchmark attestations onchain, beginning with Sui/Move.
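
As an offchain illustration of the requirements above, an evidence record might look like the sketch below. The field names, the validation rules, and the use of a SHA-256 digest for re-verification are assumptions, not a defined schema. An onchain Move encoding would look different.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

VALID_RESULTS = ("Pass", "Warn", "Fail")


@dataclass(frozen=True)
class Evidence:
    dimension: str
    code_path: str         # where the behavior under test lives
    test_case: str         # identifier of the reproducible test
    observed_output: str   # what the test actually produced
    result: str            # explicit state: Pass / Warn / Fail
    threshold_reason: str  # which threshold drove the result
    builder: str
    commissioner: str


def validate(evidence: Evidence) -> None:
    """Raise ValueError if the record breaks a certification rule."""
    if evidence.result not in VALID_RESULTS:
        raise ValueError(f"result must be one of {VALID_RESULTS}")
    if not evidence.threshold_reason:
        raise ValueError("result needs a threshold reason")
    if evidence.builder == evidence.commissioner:
        raise ValueError("builder and commissioner cannot be the same actor")


def digest(evidence: Evidence) -> str:
    """Stable digest of the record, suitable for storing in an audit trail."""
    canonical = json.dumps(asdict(evidence), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing the digest next to the decision lets a later reviewer confirm that the evidence being re-verified is the evidence that was originally scored.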

Aggregate Score

Level	Requirement	Meaning
Level 0	Untested	No benchmark evidence
Level 1	At least 3 of 5 dimensions Pass, zero Fail	Minimum viable safety
Level 2	All 5 dimensions Pass	Production-grade safety
Level 3	Level 2 + proven across 2+ chains	Cross-chain safety standard

The Sui Wallet Safety PRD targets Level 3 — patterns proven on both Sui and Solana.
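
The level table can be expressed as a lookup over the five dimension results. Two readings are assumed in this sketch and are not stated in the source: "3 of 5" is read as "at least 3", and a tested wallet that has any Fail, or fewer than three Pass results, is reported as 0 because it has earned no level. The dimension labels follow the checklist; the function name is hypothetical.

```python
from typing import Mapping, Optional

DIMENSIONS = (
    "Connection Safety",
    "Transaction Transparency",
    "Destructive Protection",
    "Asset Visibility",
    "Asset Operations",
)


def aggregate_level(results: Optional[Mapping[str, str]], chains_proven: int = 1) -> int:
    """Map per-dimension results ("Pass" / "Warn" / "Fail") to an aggregate level 0-3."""
    if not results:
        return 0  # untested: no benchmark evidence
    missing = set(DIMENSIONS) - set(results)
    if missing:
        raise ValueError(f"missing dimension results: {sorted(missing)}")
    values = [results[d] for d in DIMENSIONS]
    passes = values.count("Pass")
    fails = values.count("Fail")
    if fails > 0 or passes < 3:
        return 0  # ASSUMPTION: tested but below Level 1 is reported as 0
    if passes < 5:
        return 1  # minimum viable safety
    return 3 if chains_proven >= 2 else 2
```

For example, five Pass results with `chains_proven=2` yield Level 3, while three Pass and two Warn results yield Level 1.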

Chain-Specific Considerations

The same five dimensions apply across chains, but the architectural primitives differ:

Dimension	Account-Based Chains (Solana, EVM)	Object-Based Chains (Sui)
Connection Safety	Wallet adapter pattern	Wallet adapter + zkLogin (seedless)
Transaction Transparency	Simulation via RPC dry-run	PTB inspection (up to 1,024 operations, executed atomically)
Destructive Protection	Runtime checks in UI code	Compile-time guarantees (Move type system)
Asset Visibility	Must discover token accounts	All objects enumerable by default
Asset Operations	Chain-specific token standards	Unified object model (coins = NFTs = objects)

Object-based chains have architectural advantages in dimensions 3 and 4: the Move type system and the object model prevent failure classes that account-based chains must guard against in UI code.

Operating Cadence

Cadence	Activity
Pre-deploy	Full benchmark suite against new wallet build
Per-release	Regression test on all five dimensions
Monthly	Edge-case audit (dust tokens, unknown assets, gas spikes)
Quarterly	Cross-chain benchmark comparison and threshold review

Adoption Path

These benchmarks are designed to be extractable — any wallet team can adopt them:

Stage	What Happens	Output
1. Self-audit	Team runs benchmarks against their wallet	Score card (Level 0-3)
2. Publish results	Score card made public	Comparable safety claims
3. Peer review	Independent team verifies score	Validated safety level
4. Standard adoption	Multiple wallets benchmark to same spec	Industry safety standard

The goal is not certification for its own sake. The goal is comparable, evidence-based safety claims that users can evaluate before trusting a wallet with their assets.

Context

Standards — Why standards reduce variance and enable composability
Benchmark Standards — Parent benchmark protocol and trigger logic
Blockchain Benchmarks — Chain-level performance benchmarks
Sui Wallet Safety PRD — The Mycelium capability that implements these benchmarks
Wallet JTBD Superset — Failure register and capability register
Sui Safety Patterns — Reference implementation (Sui)
Solana Safety Patterns — Reference implementation (Solana)
