AgilData Zero: Zero-Knowledge Encryption for MySQL
Today we are announcing a new open source project for protecting MySQL databases from security breaches. Data security breaches are constantly in the news and are hurting even the largest brands, both through reputational damage and, often, through direct financial impact from regulatory fines or lost revenue.
AgilData Zero implements Zero-Knowledge Encryption, which simply means that the database stores and operates on encrypted data with zero knowledge of how to decrypt that data. The database server never has access to the encryption or decryption keys. This is the critical difference between AgilData Zero and traditional approaches to database encryption. If the database is compromised and a hacker is able to log in and run SQL queries, the result sets would contain encrypted data, creating another barrier for the attacker.
TL;DR – Show me the code!
Why AgilData Zero
The AgilData Zero advantage:
- Sensitive data is always encrypted in the database
- The database never has access to encryption or decryption keys
- If the database is compromised, no sensitive data is exposed
The business benefits:
- Sensitive data can be hosted in the cloud and shared securely, enabling new revenue opportunities and reducing operational costs
- New data protection regulations, such as GDPR, can be met with minimal impact to existing applications
- Increase the trustworthiness of your brand by going beyond compliance and offering the state of the art in encryption
New data protection regulations, such as the General Data Protection Regulation (GDPR) in the EU, will require companies to report data breaches within 72 hours, with exemptions from notifying the affected individuals when the breached data is encrypted. A zero-knowledge encryption approach is the only way to ensure that a database breach does not reveal unencrypted data, because the data in the database is always encrypted.
How does this compare to existing database encryption features?
Most commercial databases already offer encryption features, with a focus on encryption “at rest” (the files are encrypted), and “in transit” (the communication channels use SSL/TLS). However, anyone logging into the database and running SQL queries has full access to the unencrypted data and can easily run a “SELECT * FROM customer” query and save the results to disk in clear text.
How does AgilData Zero work?
AgilData Zero acts as an encryption gateway, either co-located with the application server, or running on one or more dedicated servers. Each query passing through the gateway is examined and compared to the encryption schema to determine if any literal values or bound parameters need to be encrypted before the query is passed along to the database. Likewise, any encrypted columns returned in the result set are decrypted before the results are returned to the application.
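To make the gateway's role more concrete, here is a minimal Python sketch of the encrypt-on-the-way-in, decrypt-on-the-way-out flow. It is not the project's Rust implementation. The names `ENCRYPTION_SCHEMA`, `toy_encrypt`, `toy_decrypt`, `encrypt_params`, and `decrypt_row` are hypothetical, SQL parsing is skipped entirely (the bound parameters are assumed to be already matched to columns), and base64 merely stands in for real encryption so that the example runs with the standard library alone.

```python
import base64

# Hypothetical encryption schema: which columns of which tables are protected.
ENCRYPTION_SCHEMA = {
    "customer": {"ssn": "encrypted", "name": "clear"},
}


def toy_encrypt(value: str) -> str:
    """Placeholder only: base64 is NOT encryption.

    A real gateway would use a cipher such as AES-256 with keys that
    the database server never sees.
    """
    return "enc:" + base64.b64encode(value.encode("utf-8")).decode("ascii")


def toy_decrypt(value: str) -> str:
    """Inverse of toy_encrypt."""
    return base64.b64decode(value[len("enc:"):]).decode("utf-8")


def is_encrypted(table: str, column: str) -> bool:
    """Look the column up in the encryption schema."""
    return ENCRYPTION_SCHEMA.get(table, {}).get(column) == "encrypted"


def encrypt_params(table: str, params: dict) -> dict:
    """Encrypt bound parameters for protected columns before forwarding."""
    return {
        col: toy_encrypt(val) if is_encrypted(table, col) else val
        for col, val in params.items()
    }


def decrypt_row(table: str, row: dict) -> dict:
    """Decrypt protected columns in a result row before returning it."""
    return {
        col: toy_decrypt(val) if is_encrypted(table, col) else val
        for col, val in row.items()
    }


# Outbound: what the gateway would send to the database.
params = encrypt_params("customer", {"ssn": "123-45-6789", "name": "Alice"})

# The database only ever stores and returns the protected form.
stored_row = dict(params)

# Inbound: what the application receives from the gateway.
app_row = decrypt_row("customer", stored_row)
```

In this sketch the `ssn` value never reaches the database in its original form, while `name` passes through untouched because the schema marks it as clear text.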
Doesn’t this just move the problem?
A breach of an unencrypted database results in all resident data being exposed.
In an AgilData Zero architecture, a sophisticated hacker who gained access to an application server could potentially see the in-flight data on that one server. That exposure is orders of magnitude smaller than gaining access to the entire database.
Supported Encryption Algorithms
AgilData Zero currently supports the following levels of encryption:
- Clear text
  - Data is not encrypted
  - The database can operate on the data without restriction
- AES-256 with unique initialization vector per column
  - This is a form of deterministic encryption: encrypting the same input value multiple times always produces the same encrypted value
  - Supports equality operations, allowing the database to filter (WHERE ssn = ?)
  - If two columns share the same initialization vector and key, they can be joined
  - Not suitable for low-cardinality data because the encryption is deterministic. For example, a gender column storing M or F would contain only two distinct encrypted values
- AES-256 with unique initialization vector per value
  - Non-deterministic encryption: encrypting the same value multiple times produces a different encrypted value each time
  - More secure than using a fixed IV, but no support for equality
  - The database can include the column in a projection but cannot operate on the data
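The difference between a per-column IV and a per-value IV can be illustrated with a short Python sketch. It uses a toy stream cipher built from HMAC-SHA256 as a stand-in for AES-256, purely so the example runs with the standard library. The function names are hypothetical, and none of this is the project's actual implementation or suitable for production use.

```python
import hashlib
import hmac
import secrets


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with an HMAC-SHA256 keystream.

    Stand-in for AES-256 in this illustration only. Applying it twice
    with the same key and IV returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(4, "big")
        block = hmac.new(key, iv + counter, hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def encrypt_fixed_iv(key: bytes, column_iv: bytes, plaintext: bytes) -> bytes:
    """Deterministic: the same IV is reused for every value in a column."""
    return keystream_xor(key, column_iv, plaintext)


def encrypt_random_iv(key: bytes, plaintext: bytes) -> bytes:
    """Non-deterministic: a fresh IV per value, stored with the ciphertext."""
    iv = secrets.token_bytes(16)
    return iv + keystream_xor(key, iv, plaintext)


def decrypt_random_iv(key: bytes, blob: bytes) -> bytes:
    """Split off the 16-byte IV and reverse the keystream XOR."""
    iv, body = blob[:16], blob[16:]
    return keystream_xor(key, iv, body)


key = secrets.token_bytes(32)
ssn_iv = secrets.token_bytes(16)

# Per-column IV: equal plaintexts give equal ciphertexts, so the database
# can evaluate WHERE ssn = ? by comparing encrypted bytes.
a = encrypt_fixed_iv(key, ssn_iv, b"123-45-6789")
b = encrypt_fixed_iv(key, ssn_iv, b"123-45-6789")
deterministic_match = a == b

# Per-value IV: equal plaintexts give different ciphertexts, so equality
# cannot be evaluated on the database side.
c = encrypt_random_iv(key, b"123-45-6789")
d = encrypt_random_iv(key, b"123-45-6789")
random_match = c == d

# Low-cardinality data: a deterministic scheme leaks the value distribution.
gender_iv = secrets.token_bytes(16)
genders = [b"M", b"F", b"F", b"M", b"F"]
distinct_fixed = len({encrypt_fixed_iv(key, gender_iv, g) for g in genders})
distinct_random = len({encrypt_random_iv(key, g) for g in genders})
```

With the per-column IV the five gender values collapse to two distinct ciphertexts, which is exactly the frequency leak described above. With a per-value IV every ciphertext is distinct, barring a random IV collision.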
There are other forms of encryption, such as order-preserving encryption, that go beyond equality operations and support range queries and sorting. We have chosen not to implement order-preserving encryption at this time because it is a weaker form of encryption that should be used with extreme caution, although it can be suitable for some use cases.
The choice of encryption scheme will vary from application to application. Sensitive columns, such as social security number or credit card number, typically do not need to support range queries or joins, and should use the strongest encryption available.
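One way to reason about this choice is to list the operations the database must perform on a column and pick the strongest scheme that still allows them. The sketch below encodes the capabilities described in the list above. The scheme identifiers, the `CAPABILITIES` table, and `strongest_scheme_for` are hypothetical names for illustration and are not part of AgilData Zero.

```python
# Operations each scheme lets the database perform, per the list above.
# Joins under the per-column scheme assume both columns share IV and key.
CAPABILITIES = {
    "aes256_iv_per_value": {"projection"},
    "aes256_iv_per_column": {"projection", "equality", "join"},
    "clear_text": {"projection", "equality", "join", "range", "sort"},
}

# Ordered from strongest protection to none.
STRENGTH_ORDER = ["aes256_iv_per_value", "aes256_iv_per_column", "clear_text"]


def strongest_scheme_for(required_ops: set) -> str:
    """Return the strongest scheme whose capabilities cover required_ops."""
    for scheme in STRENGTH_ORDER:
        if required_ops <= CAPABILITIES[scheme]:
            return scheme
    raise ValueError(f"no scheme supports: {sorted(required_ops)}")


# A credit card number that is only ever read back.
card_scheme = strongest_scheme_for({"projection"})

# A social security number used in WHERE ssn = ? lookups.
ssn_scheme = strongest_scheme_for({"projection", "equality"})

# An order date that needs range queries and sorting.
date_scheme = strongest_scheme_for({"projection", "range", "sort"})
```

Here `card_scheme` resolves to the per-value IV scheme, `ssn_scheme` to the per-column IV scheme, and `date_scheme` to clear text, since none of the encrypted schemes listed supports range queries or sorting.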
Performance
The encryption gateway adds overhead to each query because it must parse the query and potentially rewrite it. It must also encrypt literal values in statements and decrypt data in result sets. Performance will vary greatly depending on the application and the chosen encryption scheme.
We used the TPC-C benchmark to measure the performance of AgilData Zero and found that the overhead for this particular test was around 20%. We have not spent much time profiling or optimizing the code at this point.
We chose TPC-C for this testing because it is more representative of real applications than micro-benchmarks that measure a single database operation.
What is the status of the project?
AgilData Zero is currently a proof-of-concept. The main limitations are:
- Only a subset of MySQL syntax is supported (just enough to run a TPC-C benchmark)
- It depends on rust-crypto, which has not yet been verified as secure
- The query planner handles only a subset of the validation required to ensure that no unencrypted data can leak to the database server
We are releasing this product as open source to gather feedback from the community and to help shape the roadmap.
What is the roadmap?
We use GitHub issues to track the roadmap for this product. Some of the major themes are:
- Add query engine in the gateway to allow for increased functionality against strongly encrypted data
- Add support for caching unencrypted index data in the gateway to support efficient range queries and sort operations on sensitive data
- Improve coverage of MySQL SQL syntax
- Develop tools to make recommendations for encryption schemes based on current query access patterns
How can I get involved?
There are a number of ways to get involved and contribute to the project, from testing with different applications and different versions of MySQL, to submitting pull requests for new features and improved functionality.
AgilData Zero is implemented in the Rust programming language and the source code is available at https://github.com/AgilData/agildata-zero, with documentation at https://agildata.github.io/agildata-zero/. AgilData Zero is distributed under the Apache 2.0 license.
Binary releases are also available: a tarball for Linux and a Docker image for any platform that supports Docker. See the installation instructions for more details.
Is this a good fit for my company?
AgilData offers consulting services to help you determine whether AgilData Zero is a good fit for your organization, and can assist with proof-of-concept projects.