VASPs... meet DORA. Operational resilience.

June 19, 2024

If you are doing MiCA, you have to do DORA as well. That is the message, and it is rapidly getting more attention. I am sharing my thoughts on the matter as someone who has spent the last six years working with digital asset custodians, securing regulatory licenses for them, and building a client-side custody tech business. This five-minute read should raise your awareness and give you a steer on how to approach the readiness assessment phase. Character limits might be good for my verbosity, but I’m always happy to talk in more detail with anyone else looking at this.

Whilst the Markets in Crypto Assets Regulation (MiCA) contains some ICT requirements for operators of trading platforms specifically, the Digital Operational Resilience Act (DORA) will broadly capture all crypto and many tokenisation firms in the EU: any firm authorised under MiCA will be subject to DORA. This is part of a growing global trend among financial regulators. So if you are subject to NYDFS Part 500 in the US, or the PSA in Singapore, the specific details may not apply, but this will still be directionally correct for your firm.

https://www.eiopa.europa.eu/digital-operational-resilience-act-dora_en

Planning for Operational Failure

It is best to assume that you will at some point have to deal with failures, and that they will happen at exactly the moment you least want them to. Ensuring that you are operationally resilient starts with identifying the important business services that, if disrupted, would impact customers or otherwise harm the firm. This holds for all institutions, though regulated financial firms are held to a higher bar. You will therefore want to test the operational resiliency of any business-critical IT that you run locally, as well as any purchased from service providers or vendors.

Critical IT infrastructure

By any measure, technology solutions for digital asset custody will be deemed critical IT infrastructure. I could write a whole separate piece on how to think about appropriately securing private keys and the policy engine. Instead, let’s put forward the postulate that these are important components in the delivery of safe custody. Operational resiliency itself starts with focussing efforts and resources on achieving the continuity of important business services such as these. If the signing keys and policy engine live on servers, you should find out whether they run on a single primary server with an active backup running in parallel, and whether these are your servers or the vendor’s.

Servers & Data

Better yet, are there multiple active primary servers all running the same software under the hood? State machine replication would be one example of building a digital asset custody system with operational resilience built in by design. It is very resistant to operational faults by virtue of having multiple servers all independently running the custody software processes. Such a setup might have a coalition of 4 servers and only ever require 3 to be online to conduct normal business operations. This would provide an impact tolerance of 1 server failing. If a second server fails, service halts, but the security guarantees remain intact. It also provides multiple locations for maintaining synchronized books and records. Let’s not forget that your books and records likely need to be stored for a minimum period, usually several years. They also generally follow the “write once, read many” approach in the compliance world. Availability of data is therefore crucial. Being savvy risk officers, of course we are not forgetting about our data residency concerns: do you really want to, or are you even able to, trust a third-party provider with maintaining that?
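To make the 3-of-4 arithmetic concrete, here is a minimal Python sketch. The names `Cluster` and `status` are hypothetical and not taken from any particular custody product, and the sketch models availability only, not the replication protocol itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model of a replicated custody deployment."""
    total: int = 4   # servers in the coalition
    quorum: int = 3  # servers that must be online to operate

    @property
    def impact_tolerance(self) -> int:
        # Number of servers that can fail without interrupting service.
        return self.total - self.quorum

    def status(self, online: int) -> str:
        if not 0 <= online <= self.total:
            raise ValueError("online must be between 0 and total")
        # Below quorum the service halts rather than continuing unsafely.
        return "operational" if online >= self.quorum else "halted"


cluster = Cluster()
for online in range(cluster.total, -1, -1):
    print(online, "online:", cluster.status(online))
```

With the default values, the service stays operational with 4 or 3 servers online and halts at 2 or fewer, which is the impact tolerance of 1 described above.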

Going further, a stronger setup would be to assume that any one or more servers could be corrupt and acting maliciously at any time. This goes back to the principles of zero trust. A server should be able to run custody software processes without having to trust the other servers that it is communicating with. This goes beyond common operational failures and provides resiliency to more malicious threats, both internal and external, giving what is known as Byzantine fault tolerance. If any single party tampers with keys or policies, the change will be rejected. You should therefore know whether the policy database and accompanying custody processes exist client side or server side, and whether there is any helpful distribution or replication.
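A rough Python sketch of the idea follows. It assumes the classical Byzantine fault tolerance bound, under which n servers tolerate at most f faulty ones when n ≥ 3f + 1, so four servers tolerate one. The function names and the digest-voting scheme are illustrative assumptions, not a description of any specific product or protocol.

```python
import hashlib
from collections import Counter
from typing import List, Optional


def max_byzantine_faults(n: int) -> int:
    # Classical bound: n >= 3f + 1, so f = (n - 1) // 3.
    return (n - 1) // 3


def digest(policy: str) -> str:
    # Each server hashes the policy it independently validated.
    return hashlib.sha256(policy.encode()).hexdigest()


def accept_update(votes: List[str], n: int) -> Optional[str]:
    """Return the agreed digest if at least 2f + 1 servers report the same one."""
    quorum = 2 * max_byzantine_faults(n) + 1
    if not votes:
        return None
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count >= quorum else None


# Hypothetical policies, for illustration only.
honest = digest("withdrawals require two approvers")
tampered = digest("withdrawals require no approvers")

# One malicious server out of four is outvoted.
print(accept_update([honest, honest, honest, tampered], n=4) == honest)
# Without three matching votes, nothing is accepted.
print(accept_update([honest, honest, tampered, tampered], n=4))
```

Note that for four servers the quorum works out to three, which lines up with a 3-of-4 deployment.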

Lastly, this network design clearly involves some number of servers communicating with each other. It can be further bolstered by applying the cybersecurity best practice of defense in depth to protect an organization’s assets. Are the servers operating on a firewalled VPN such that outsiders cannot spam or attempt to intercept traffic on the network? Are they communicating with each other across an encrypted channel? The traffic is sensitive data, and you don’t want outsiders seeing your flow of funds or spamming the network and causing a denial of service.
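As one illustration of an encrypted, mutually authenticated channel, the sketch below uses Python’s standard `ssl` module to build a server-side context that requires every peer to present a certificate. The helper names and the file-path parameters are hypothetical, and it assumes certificates are issued by a private CA that you control; a real deployment would layer this on top of network-level controls such as the firewalled VPN.

```python
import ssl


def harden_for_mutual_tls(ctx: ssl.SSLContext) -> ssl.SSLContext:
    # Refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require the connecting peer to present a certificate we can verify.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def server_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """Build a context for one custody server (paths are placeholders)."""
    ctx = harden_for_mutual_tls(ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER))
    # This server's own identity.
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    # The private CA that signs the other servers' certificates.
    ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

The resulting context can be passed to `ctx.wrap_socket(sock, server_side=True)`, so that a peer without a certificate from the trusted CA fails the handshake.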

Recovery Time Objective

Ensuring operational resilience and business continuity is great, but you should still be prepared for the worst. Despite the best planning, it is possible that at some point you will experience an interruption to the custody service or a disaster situation. The more critical the service, the shorter your recovery time objective (RTO) will be. Appropriate handling of the situation and good incident management can save you from reputational damage or repercussions from clients and regulators. The first step is to understand the responsibility model. Are you responsible for the critical business service, or is your custody vendor? Where possible, you will want to have control and direct access to fix issues. It is much easier to plan and perform “dry-run” exercises when you can run them end to end. Performing a disaster recovery dry run should be part of your vendor selection process. Know that the recovery process exists and works in practice. Do not rely on paper-based compliance detailing what should happen in theory. Test it out.

Now for a cautionary postscript. Do not be easily swayed by escape hatch or disaster recovery services. It is tempting to lean on the argument that if the worst were to happen to your provider, you could always recover out of them because you have a third-party disaster recovery service promising a 24-hour SLA. You may have even tested it out yourself before onboarding. You saw it work. However, there is a big difference between an isolated recovery event for your account and a systemic vendor failure where you are perhaps joining 1,000 other firms trying to recover their keys from the same DR solution. Even assuming that you are comfortable with a 24-hour recovery time (you might be better off doing self backup and recovery), when there is a rush for the exit door you might find yourself waiting days or even weeks to recover your keys…
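A back-of-the-envelope calculation shows why the queue matters. The sketch below is purely illustrative: the throughput of 50 recoveries per day is an assumed figure, not a number from any real DR provider, and the function name is hypothetical.

```python
import math


def days_until_recovered(queue_position: int, recoveries_per_day: int) -> int:
    """Days until the firm at queue_position (1-based) gets its keys back."""
    if queue_position < 1 or recoveries_per_day < 1:
        raise ValueError("both arguments must be positive")
    return math.ceil(queue_position / recoveries_per_day)


# Isolated event: you are the only firm recovering.
print(days_until_recovered(1, recoveries_per_day=50))     # 1
# Systemic failure: you are last of 1,000 firms in the queue (assumed throughput).
print(days_until_recovered(1000, recoveries_per_day=50))  # 20
```

Under that assumption, the same service that met a 24-hour SLA in your isolated test takes almost three weeks to reach the last firm in line.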

In summary

This is just a quick tour of some of the risk items to think about, and it should equip you with the appropriate mindset to approach DORA. If you have identified that a vendor is responsible for maintaining a critical component or business service (as is the case with SaaS-based products, particularly for self custody), then you will need to dive deep into how the vendor is upholding their end of the bargain. Sitting on your hands while an issue is happening live is an uncomfortable situation and not a form of risk mitigation. You will need to be certain that service can be resumed in a timely fashion without a loss event. Operational resilience is not limited to technology. Firms should also identify and document the people, processes, and information needed to deliver each important business service. It is not uncommon for a regulated financial firm to have audit rights over a vendor: rights to audit cybersecurity programs, disaster recovery, business continuity plans, data centres and more, with a set remediation period to correct material deficiencies. At the end of the day you can delegate and outsource tasks, but you cannot delegate your regulatory responsibilities or fiduciary commitments to your clients. For this reason, and many other good ones, a lot of institutions are moving outsourced solutions back in-house, particularly mission-critical IT such as digital asset custody.

Reach out to seb@cordialsystems.com to discuss further.

Take back control.
Join the growing number of organizations opting out of pure SaaS wallets and taking control of their security back in-house.