Apache Cassandra® is becoming the best database for handling JSON documents. If you’re a Cassandra developer who finds that statement provocative, read on.
In a previous post, I discussed using data APIs and data modeling to shape Cassandra into a developer experience that is more idiomatic to the way developers think, improving developer productivity while preserving reasonable database performance and scale. It’s a great hypothesis, but one that needs to be tested in the context of a particular developer idiom and developer community.
In this post, I’ll discuss how to provide a developer-friendly JSON idiom using Cassandra together with Stargate, and how we’re working to do just that for Mongoose developers.
The Goldilocks of JS communities
In October 2022, we released a new version of Stargate. With the new Version 2, individual APIs are no longer embedded in the core Stargate coordinator code, but instead separated out into individual services. This improves Stargate’s operational efficiency; individual API services can now be deployed and scaled independently. This also makes new API services easier to develop. As long as they abide by the service boundary, these services can be developed in parallel with and independent of core Stargate development work.
For this approach, Mongoose is just right:
- It enjoys broad adoption, with roughly 2 million GitHub repositories listing Mongoose as a dependency
- Mongoose creator Valeri Karpov’s active leadership provides a clear focus
- It’s an open source project that has lacked an open source database since MongoDB’s decision to move to a shared source model with its Server Side Public License
Developers don’t really interact directly with a database so much as with a data model. In Stargate’s original Document API, the API handles JSON by making it look like a traditional Cassandra table. This forces JSON-oriented developers to think in terms of Cassandra data structures, and it strains Cassandra’s row-oriented indexing logic because a JSON document gets spread across multiple rows.
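To make the "spread across multiple rows" problem concrete, here is a minimal sketch of what shredding a document into one row per leaf value can look like. The `ShreddedRow` shape, the dotted-path convention and the `shredToRows` function are assumptions made for illustration; they are not the Document API's actual schema or code.

```typescript
// Illustrative only: these types and names are hypothetical, not Stargate's real schema.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface ShreddedRow {
  docId: string; // which document this row belongs to
  path: string; // dotted path to the leaf, e.g. "address.city" or "tags.0"
  value: string | number | boolean | null; // the leaf value found at that path
}

// Walk the document and emit one row per leaf value.
// Empty objects and empty arrays produce no rows in this simplified version.
function shredToRows(
  docId: string,
  node: Json,
  prefix: string = "",
  out: ShreddedRow[] = []
): ShreddedRow[] {
  if (node === null || typeof node !== "object") {
    out.push({ docId, path: prefix, value: node });
    return out;
  }
  // Arrays are walked by index ("0", "1", ...), objects by key.
  const container = node as { [key: string]: Json };
  for (const key of Object.keys(container)) {
    shredToRows(docId, container[key], prefix ? `${prefix}.${key}` : key, out);
  }
  return out;
}

const product: Json = {
  name: "Widget",
  price: 9.99,
  tags: ["tools", "sale"],
  address: { city: "Austin" },
};

const rows = shredToRows("doc-1", product);
console.log(rows.length); // 5: one row for each leaf value in the document
```

Even this small document turns into five rows, so reading it back or filtering on two fields at once means touching several rows rather than one.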
Our new JSON API departs from this data model, and instead relies on a data model we call “super shredding.” You can learn more about super shredding in Aaron Morton’s talk at Cassandra Forward, a free digital event on March 14. In short, we take advantage of Cassandra’s wide-column nature to store one document per row, knowing that a Cassandra row can handle even very large documents. That row also has a set of columns dedicated to storing standard metadata about the JSON document. The result is easier to index, and it gives us a means of preserving and retrieving metadata.
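As a rough sketch of the one-document-per-row idea, the example below builds a single row that holds the whole document plus a few metadata columns keyed by path. Every column name here (`docJson`, `existingPaths`, `arraySizes`, `textValues` and so on), along with the `superShred` and `matchesText` functions, is a hypothetical stand-in chosen for illustration; the real super shredding schema may look quite different.

```typescript
// Illustrative only: column names are hypothetical, not the JSON API's real schema.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface SuperShreddedRow {
  docId: string; // one row per document
  docJson: string; // the full document, kept for retrieval
  existingPaths: string[]; // every path present in the document
  arraySizes: { [path: string]: number }; // path -> array length
  textValues: { [path: string]: string }; // path -> string leaf
  numberValues: { [path: string]: number }; // path -> numeric leaf
  boolValues: { [path: string]: boolean }; // path -> boolean leaf
}

function superShred(docId: string, doc: Json): SuperShreddedRow {
  const row: SuperShreddedRow = {
    docId,
    docJson: JSON.stringify(doc),
    existingPaths: [],
    arraySizes: {},
    textValues: {},
    numberValues: {},
    boolValues: {},
  };
  const visit = (node: Json, path: string): void => {
    if (path !== "") row.existingPaths.push(path);
    if (typeof node === "string") { row.textValues[path] = node; return; }
    if (typeof node === "number") { row.numberValues[path] = node; return; }
    if (typeof node === "boolean") { row.boolValues[path] = node; return; }
    if (node === null) return; // path is recorded as present; no typed value to store
    if (Array.isArray(node)) row.arraySizes[path] = node.length;
    const container = node as { [key: string]: Json };
    for (const key of Object.keys(container)) {
      visit(container[key], path ? `${path}.${key}` : key);
    }
  };
  visit(doc, "");
  return row;
}

// In this sketch, an equality filter on a string field is a lookup in one column of one row.
function matchesText(row: SuperShreddedRow, path: string, expected: string): boolean {
  return row.textValues[path] === expected;
}

const row = superShred("doc-1", {
  name: "Widget",
  price: 9.99,
  inStock: true,
  tags: ["tools", "sale"],
  address: { city: "Austin" },
});

console.log(row.existingPaths.length); // 8 paths, all held in a single row
console.log(matchesText(row, "address.city", "Austin")); // true
```

The design choice this illustrates is that the document stays whole in one place, while the per-type columns give an index something uniform to work against.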
We will then front this data model with our new JSON API, using the same mQuery specification that Mongoose uses as our guiding requirement for which calls the API needs to support. When complete, this should enable any of the more than 2 million Mongoose-dependent applications to run against open source Cassandra or DataStax’s hosted Cassandra service, Astra DB, with just a configuration change.
Mongoose creator Karpov will also speak at the Cassandra Forward event, demonstrating a simple e-commerce application that uses the Stargate version of Mongoose, open source Stargate and the DataStax Enterprise (DSE) version of Cassandra. You’ll be able to download the working code for this application and the supporting platform pieces from GitHub. While we have enough code to run this application, we are not yet code complete. For example, we run against DSE right now because we need storage-attached indexing (SAI), which works with DSE and is planned for release in Cassandra 5.0 later this year.
Contributing back to Cassandra
Cassandra isn’t a static piece of software; it’s a vibrant and evolving open source project. So we are also continuing a longstanding Cassandra tradition of letting features that emerge client-side, like SAI, foster changes on the database side. Similarly, Stargate’s Mongoose work has prompted a set of proposals for Cassandra around global sort and advanced query filtering that will not only make Stargate’s JSON API and Mongoose client better, but will also add powerful new features to the Cassandra Query Language (CQL). This is a great reminder that data engineers and application developers are not two different communities, but complementary cohorts of the extended Cassandra community.
And JSON is just the first step. Essentially, what we will have done is to take the building blocks of Cassandra, Stargate and a reasonably efficient Cassandra data model and build a document database that you interact with through a JSON API. In other words, we’ve used super shredding to create a purpose-built database that better serves the community of Mongoose developers.
With the modular architecture of Stargate v2, and the proof point of Mongoose for the idiomatic approach, we are ready to take on new developer communities that organize around a particular software development idiom. The process by which we’ve harnessed Cassandra for Mongoose is repeatable – and it’s one that we will repeat. In so doing, we dramatically expand the number of developers and use cases that Cassandra can address, which is the sort of goal worthy of an open source project.