Architecting software is the process of converting requirements into a structured solution that meets both the technical and the business needs. In digital asset custody this is no easy task. Here are the broad strokes of what an institutional wallet involves; by the end you should know more than the basics of keeping private keys safe and setting a few transaction rules. In a follow-up piece, I will move from concept to implementation using Cordial Treasury and industry models, showing how different deployments introduce different risks and responsibilities.
So let’s dive in. We will first break the problem down into components; defining these well keeps closely related data and functionality packaged together in cohesive services. A great digital asset custody solution means a great solution in each of these areas.
There are a few different components to consider, much as organizations are broken down into smaller teams that collectively make up the whole. No component should be too large or carry too much unrelated functionality. Here are some of the most important ones, starting with the security-critical components.
Fundamentally, there are two goals to ensure proper safeguarding of digital assets: 1) private keys must never be lost or compromised, and 2) keys must only ever be used for operations that have been explicitly authorized. Anything that relates to these two goals is deemed security critical, and a failure could result in devastating loss. Let's explore how we can design a solution that satisfies them.
This module should cover all functionality related to cryptographic keys: generation, storage, destruction, and replacement. Keys under custody are often "long term secrets", meaning they are not easily rotated and must be kept safe for a very long time, because each one is a full or partial key to cryptocurrency or another blockchain-native financial asset. If compromised, the funds are gone. Other "short term" credentials, like API keys or user logins, can be rotated easily and do not directly hold funds.
These blockchain long term secrets are far more consequential, and normal key protections are not sufficient. The use of hardware security modules (HSMs) or trusted execution environments (TEEs) is common, but the trade-off is that they don't support policies and are hard to audit or update -- usability of the key matters just as much, as does the ability to support new blockchains. The “same actor principle” should also be obeyed: the user and creator of the key should be the same. This has more to do with appropriate confidentiality and guarding of secrets. Lastly, you do not want the key to be a single point of failure, so you will want multiple mutually distrusting participants to sign. Typically this involves a threshold signature scheme, which allows key generation and signing to happen in a distributed manner without ever combining all key shares. Here you can think of multi-party computation (MPC).
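To make the threshold idea concrete, here is a toy k-of-n secret-sharing sketch (Shamir's scheme). One important caveat: a production MPC setup never reconstructs the full key in one place as this demo does; the snippet only illustrates why any k shares suffice while fewer reveal nothing. The prime and parameters are illustrative choices.

```python
# Toy Shamir secret sharing over a prime field - illustration only.
# Real threshold-signature/MPC custody never reconstructs the key anywhere.
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a demo secret

def make_shares(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any k of them can reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

secret = 123456789
shares = make_shares(secret, k=2, n=3)
assert reconstruct(shares[:2]) == secret   # any 2 of the 3 shares suffice
assert reconstruct(shares[1:]) == secret
```

In a genuine MPC deployment the interpolation happens inside the signing protocol itself, so the full secret exists nowhere - that is the property that removes the single point of failure.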
From a scalability perspective, your business teams will need these keys to produce signatures across different blockchains following different signing algorithms (ECDSA, EdDSA…). That way you can sign transactions on as many blockchains as possible (Bitcoin, Solana, EVM chains, …) and not limit your business opportunities to a select few.
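One way this multi-scheme support might be structured is a chain-to-scheme registry that fails closed on unknown chains. The signer functions below are placeholders rather than real cryptography, and all names are illustrative assumptions; a real backend would wrap an MPC or HSM signer per scheme.

```python
# Sketch: route signing requests to the right scheme per chain.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ChainProfile:
    chain: str
    scheme: str          # "ecdsa-secp256k1", "eddsa-ed25519", ...
    sign: Callable[[bytes], str]

def fake_sign(scheme: str) -> Callable[[bytes], str]:
    # Placeholder: tags the payload instead of producing a real signature.
    return lambda payload: f"{scheme}:{payload.hex()}"

REGISTRY = {
    p.chain: p
    for p in [
        ChainProfile("bitcoin", "ecdsa-secp256k1", fake_sign("ecdsa-secp256k1")),
        ChainProfile("ethereum", "ecdsa-secp256k1", fake_sign("ecdsa-secp256k1")),
        ChainProfile("solana", "eddsa-ed25519", fake_sign("eddsa-ed25519")),
    ]
}

def sign_for_chain(chain: str, payload: bytes) -> str:
    profile = REGISTRY.get(chain)
    if profile is None:
        raise ValueError(f"unsupported chain: {chain}")  # fail closed
    return profile.sign(payload)

print(sign_for_chain("solana", b"\x01"))  # eddsa-ed25519:01
```

Adding a new blockchain then becomes a registry entry plus a scheme implementation, rather than a redesign of the signing service.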
This can be thought of as your permission layer. In the simplest terms, every operation on your custody product should be governed by policies, and if no explicit policy exists, the request should be denied (“default-deny”). Remembering goal 2) above: without policies and default-deny, any MPC or TEE is useless - your weakest short term credential could lead to total compromise. Policy covers everything related to traditional identity and access management (IAM), governs how the keys are used, and encodes any business logic or rules that transactions should obey.
You will want both human and programmatic users of your custody product to authenticate to the system using strong authentication (WebAuthn, YubiKeys, TPMs… not passwords or simple API keys). It is the job of the policy engine to validate these credentials and make a number of assertions: is this authentication signature valid? Does this user have the appropriate permissions and role? Does this request require additional approvals or additional steps?
Policy should also expand to cover activities on chain, like transfers and staking, by setting explicitly what is allowed - i.e. what has been signed off internally by risk and compliance - through allowlisting. The more granular and flexible the policy engine, the better it scales with your operations and automates controls into a straight-through-processing (STP) flow. Typical examples of what is allowlisted include destination addresses, specific assets or tokens, smart contracts, and staking validators.
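The default-deny behaviour described above can be sketched as follows. The rule fields (asset, allowlisted destinations, amount cap, approvals) are illustrative assumptions, not a fixed schema; the key property is that a request matching no rule is refused.

```python
# Minimal default-deny policy check: no matching rule means refusal.
from dataclasses import dataclass

@dataclass
class Rule:
    asset: str
    allowed_destinations: set[str]
    max_amount: float
    required_approvals: int = 1

@dataclass
class Request:
    asset: str
    destination: str
    amount: float
    approvals: int

def evaluate(rules: list[Rule], req: Request) -> bool:
    for rule in rules:
        if (req.asset == rule.asset
                and req.destination in rule.allowed_destinations
                and req.amount <= rule.max_amount
                and req.approvals >= rule.required_approvals):
            return True
    return False  # default-deny: nothing matched, so refuse

rules = [Rule("ETH", {"0xExchangeX"}, max_amount=100.0, required_approvals=2)]
assert evaluate(rules, Request("ETH", "0xExchangeX", 50.0, approvals=2))
assert not evaluate(rules, Request("ETH", "0xUnknown", 50.0, approvals=2))
assert not evaluate(rules, Request("BTC", "0xExchangeX", 1.0, approvals=2))
```

Note that an empty rule set denies everything - the safe failure mode, in contrast to IAM systems that fall back to a permissive default.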
In managing your digital assets, or those of your clients, you will want to back up the private keys. Backups are just as important as the live keys and can themselves become a single point of failure, so it’s important to encrypt the backups and place them in escrow, and to keep in mind how easily they can be tested. This protects your assets but doesn’t solve an outage problem. For that, you also need to back up the state of your system: if the system fails and you need to rehydrate, you want to restore to your latest backup point - ideally as close to real time as possible, or whatever your Recovery Point Objective (RPO) is deemed to be. By taking periodic snapshots, you can spin the system up again with your existing users, policies, wallets, and other resources - in effect recovering your primary system and control environment alongside the digital assets.
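A toy snapshot/restore loop illustrating the RPO idea is below. In a real deployment the snapshots would be encrypted and escrowed as described above; the JSON-on-disk persistence here is only a stand-in to show the mechanics.

```python
# Sketch: periodic state snapshots and restore-latest on rehydration.
import json
import os
import tempfile
import time

def take_snapshot(state: dict, directory: str) -> str:
    # Nanosecond timestamps are fixed-width, so lexicographic sort is chronological.
    path = os.path.join(directory, f"snapshot-{time.time_ns()}.json")
    with open(path, "w") as fh:
        json.dump(state, fh)
    return path

def restore_latest(directory: str) -> dict:
    snapshots = sorted(p for p in os.listdir(directory) if p.startswith("snapshot-"))
    if not snapshots:
        raise RuntimeError("no snapshot available - RPO cannot be met")
    with open(os.path.join(directory, snapshots[-1])) as fh:
        return json.load(fh)

with tempfile.TemporaryDirectory() as d:
    take_snapshot({"wallets": ["w1"]}, d)
    take_snapshot({"wallets": ["w1", "w2"]}, d)  # latest state
    assert restore_latest(d) == {"wallets": ["w1", "w2"]}
```

How frequently `take_snapshot` runs is exactly what your RPO decision governs: the gap between snapshots is the state you accept losing.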
These are not security critical: if they fail, goals 1) and 2) above are not impacted. Instead you might have an availability issue that interrupts the broader custody service. That could still be a major issue and require an incident report, but at least there is no risk of loss of funds.
This component creates the transaction packet that will be signed and broadcast to the blockchain - e.g. sending that intended 100 ETH to Exchange X, or meeting a customer withdrawal of 0.1 BTC. The transaction builder should only build from inputs that have passed policy; otherwise you are blind signing, which is a huge security risk (see below). This also comes back to the need to support multiple signing schemes for all the chains you want to target. For example, a wallet supporting just the ECDSA signing scheme and the EVM transaction standard will be limited to Ethereum, its L2s, and its side-chains (e.g. the MetaMask wallet), whereas a wallet supporting both EVM and SVM (Solana VM) can serve both ecosystems (e.g. the Phantom wallet).
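A minimal sketch of a builder that fails closed on unapproved inputs; the approval token and field names are assumptions for illustration. The point is structural: the only path to a signable packet runs through the policy engine.

```python
# Sketch: the builder refuses inputs that have not passed policy,
# so nothing unvetted ever reaches the signer.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PolicyApproval:
    request_id: str   # issued only by the policy engine

def build_transaction(asset: str, destination: str, amount: float,
                      approval: Optional[PolicyApproval]) -> dict:
    if approval is None:
        raise PermissionError("refusing to build: input has not passed policy")
    return {
        "asset": asset,
        "to": destination,
        "amount": amount,
        "approved_by": approval.request_id,
    }

tx = build_transaction("ETH", "0xExchangeX", 100, PolicyApproval("req-42"))
assert tx["to"] == "0xExchangeX"
try:
    build_transaction("BTC", "bc1qexample", 0.1, approval=None)
except PermissionError:
    pass  # unapproved inputs never become transactions
```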
Now for a major health warning! The cryptocurrency industry has internalized “blind signing” as an acceptable means of transaction creation and signing: in many flows it has somehow been deemed acceptable to sign a transaction that is not presented in a human readable format, hence the term. You sign and hope the transaction lands at your intended destination address, because you would not know if someone manipulated the details before signing. Therefore, go the extra mile and make sure you are only doing “clear signing”: only signing transactions that are human readable and confirmed as correct. The industry is moving in this direction, and blind signing remains one of the biggest risks not being talked about enough.
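Clear signing can be enforced mechanically: render the payload into human-readable fields, and refuse to sign unless the caller confirms that exact rendering. A sketch, assuming a JSON-encoded demo transaction and a placeholder signature:

```python
# Sketch: never sign bytes you cannot read back and confirm.
import json

def render_for_review(raw_tx: bytes) -> str:
    tx = json.loads(raw_tx)  # assume a JSON-encoded transaction for the demo
    return "\n".join(f"{k}: {tx[k]}" for k in sorted(tx))

def sign_if_confirmed(raw_tx: bytes, confirmed_text: str) -> bytes:
    # Proceed only if the caller confirms the exact rendering it reviewed;
    # any mismatch means the payload changed after review.
    if render_for_review(raw_tx) != confirmed_text:
        raise PermissionError("rendered transaction differs from what was reviewed")
    return b"signed:" + raw_tx  # placeholder signature

raw = json.dumps({"to": "0xExchangeX", "amount": "100 ETH"}).encode()
review = render_for_review(raw)
assert sign_if_confirmed(raw, review).startswith(b"signed:")
```

If an attacker swaps the destination between review and signing, the rendering no longer matches and the signer refuses - the failure mode blind signing silently accepts.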
JSON-RPC (Remote Procedure Call) remains necessary for writing data to the blockchain: RPC nodes are how you broadcast the transactions you built and signed, so you will need them for every blockchain you want to interact with. Despite its widespread use, JSON-RPC has several limitations. Calls typically go to a full node, and that kind of querying does not scale well with complexity or volume - it can be slow and resource-intensive, and it is primarily designed to handle straightforward queries. If you want to know your transactions for a specific token, or broader wallet activity, you shouldn’t query a node for that - it’s not what nodes were built for. Enter the blockchain indexer.
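For reference, this is the shape of a JSON-RPC payload an application would POST to a node - here using the standard Ethereum `eth_getBalance` method. The endpoint and address are placeholders; only the payload is constructed, since sending it requires a reachable node.

```python
# Sketch: constructing a JSON-RPC 2.0 request for an Ethereum node.
import json

def rpc_request(method: str, params: list, request_id: int = 1) -> bytes:
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }).encode()

payload = rpc_request("eth_getBalance", ["0xExchangeX", "latest"])
# POST `payload` to your node's HTTP endpoint with
# Content-Type: application/json to execute the call.
assert json.loads(payload)["method"] == "eth_getBalance"
```

Each call like this hits the node directly - fine for broadcasting and point lookups, but exactly the pattern that breaks down for history queries, which is the indexer’s job.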
This addresses the challenges commonly associated with RPC nodes. Blockchain indexers extract data from blockchains, transform them, and index them into more accessible formats. This allows for more efficient querying capabilities and faster data retrieval, significantly improving the experience of accessing blockchain data. In offering your blockchain based application to other institutional participants, or even an internal interface for your Ops team, you will need indexers in order to have a more application friendly real-time data interface, run reports, and other middle or back office tasks.
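A tiny extract-transform-index sketch shows why indexed data answers wallet-history questions far more cheaply than node calls; the block and transfer shapes below are illustrative.

```python
# Sketch: index on-chain transfers by address for O(1) history lookups.
from collections import defaultdict

blocks = [
    {"number": 1, "transfers": [{"from": "a", "to": "b", "token": "USDC", "amount": 5}]},
    {"number": 2, "transfers": [{"from": "b", "to": "c", "token": "USDC", "amount": 3}]},
]

index: dict[str, list[dict]] = defaultdict(list)
for block in blocks:                      # extract
    for t in block["transfers"]:          # transform
        record = {**t, "block": block["number"]}
        index[t["from"]].append(record)   # load: index by both parties
        index[t["to"]].append(record)

# "All USDC activity for address b" is now a direct lookup:
history_b = [r for r in index["b"] if r["token"] == "USDC"]
assert [r["block"] for r in history_b] == [1, 2]
```

Answering the same question via a node would mean scanning every block - the scaling problem described above.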
A core responsibility of custody is reconciliation. For example, you need to reconcile the transactions and account activity in the software application with the on-chain settlement data. This could be a chain of many different events, all happening in asynchronous systems (i.e. your software vs the blockchain). One example would be reflecting incoming deposits, or relaying back to the software that a transaction has finalised on-chain. Whether polling message queues or getting push notifications, it’s important to have an event-driven architecture and use service-to-service communication techniques that decouple services from one another. This allows two services to communicate without being dependent on one another in case one fails.
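A minimal queue-based sketch of that decoupling: the chain watcher publishes settlement events, and the ledger consumes them on its own schedule. The event fields are illustrative.

```python
# Sketch: event-driven reconciliation via a queue, so the watcher and
# ledger never block each other.
import queue

events: queue.Queue = queue.Queue()
ledger = {"tx-1": "pending", "tx-2": "pending"}

# Producer: a (simulated) chain watcher sees tx-1 finalise on-chain.
events.put({"type": "finalised", "tx_id": "tx-1"})

# Consumer: drains the queue whenever it runs - neither side waits on the other.
while not events.empty():
    event = events.get()
    if event["type"] == "finalised" and event["tx_id"] in ledger:
        ledger[event["tx_id"]] = "settled"

assert ledger == {"tx-1": "settled", "tx-2": "pending"}
```

If the consumer is down, events simply accumulate and reconciliation catches up on restart - the failure isolation the paragraph above calls for.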
You want to be able to create, update, and delete users - people join and leave teams all the time. But that’s only half the story. While you can create a user directly in the custody system, that is a bit like adding a taxi to a taxi rank: it needs to be paired with a passenger to give it greater meaning. Here you will want to impose Microsoft/Google single sign-on (SSO) against your corporate directory and link those employees to users in the custody application once they are approved to be added. You will also want to create unique permission sets and roles tailored to your operating environment.
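A sketch of pairing IdP identities with custody users and tailored roles; the role names and fields are assumptions for illustration.

```python
# Sketch: provision/deprovision custody users keyed by their SSO identity.
from dataclasses import dataclass

@dataclass
class CustodyUser:
    sso_subject: str       # stable ID from the Microsoft/Google IdP
    email: str
    role: str              # e.g. "viewer", "initiator", "approver"

directory: dict[str, CustodyUser] = {}  # sso_subject -> CustodyUser

def provision(sso_subject: str, email: str, role: str) -> CustodyUser:
    user = CustodyUser(sso_subject, email, role)
    directory[sso_subject] = user
    return user

def deprovision(sso_subject: str) -> None:
    directory.pop(sso_subject, None)  # leavers lose access immediately

provision("sso|123", "alice@example.com", "approver")
assert directory["sso|123"].role == "approver"
deprovision("sso|123")
assert "sso|123" not in directory
```

Keying on the IdP subject rather than the email means a renamed or reissued mailbox cannot silently inherit someone else’s permissions.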
This is perhaps a more general point: you want to instrument your product and keep an eye on the health of the system, with a centralised observability platform receiving log files and metrics. In effect, all of your software writes logs, metrics, and traces to known places where the same telemetry collector/forwarder sidecar can pick them up for further use. Besides the health of the different components, you will also want to maintain audit logs for your books and records. Many financial institutions are mandated to retain these for several years, so bear in mind a “write once, read many” approach to transactions. The audit trail should include the full path that effectuated each transaction: users, approvers, policies checked, timestamps, and so on.
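One way to get tamper-evident, append-only audit records is to hash-chain them - an implementation choice of mine rather than a stated requirement here, but in the spirit of “write once, read many”. The record fields are illustrative.

```python
# Sketch: append-only audit trail where each record chains the hash of
# the previous one, making tampering with history detectable.
import hashlib
import json
import time

audit_log: list[dict] = []

def append_audit(event: dict) -> None:
    prev = audit_log[-1]["hash"] if audit_log else "genesis"
    body = {"ts": time.time(), "prev": prev, **event}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    audit_log.append(body)

def verify_chain() -> bool:
    prev = "genesis"
    for rec in audit_log:
        body = {k: v for k, v in rec.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != digest:
            return False
        prev = rec["hash"]
    return True

append_audit({"user": "alice", "action": "sign", "policy": "eth-transfers"})
append_audit({"user": "bob", "action": "approve", "policy": "eth-transfers"})
assert verify_chain()
audit_log[0]["user"] = "mallory"   # tampering...
assert not verify_chain()          # ...is detected
```

A production system would additionally ship these records to immutable storage; the chain only makes tampering evident, it does not prevent it.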
Translating design into implementation is where the real work begins: deciding how to build each feature, which trade-offs to accept, and what security hardening to apply, alongside the usual concerns of fault tolerance, maintainability, speed, scalability, and ease of implementation. Luckily, in a follow-up post I will explain how all of this is addressed by Cordial Systems through the example of our Cordial Treasury product, bringing in industry practices and alternatives where relevant, and how a forward-looking risk assessment informs the best ways to build viable wallets today. Stay tuned!
Reach out to [email protected] to discuss further.