In a data-driven organization, choosing the right database or Database Management System (DBMS) is one of the most critical decisions to support end-user requirements. The choice of a DBMS should not be neglected, and all options should be investigated: from more traditional systems, like SQL, to less rigid ones, like NoSQL, or hybrid versions with a multi-model database. This article supports the database paradigm decision process by offering four tracks for creating a data model that reflects reality.

Avoid a lock-in choice

We can save time and money by avoiding a lock-in choice. Requirements may change as priorities and solutions evolve over time. As architects, we are driven by our understanding of the problem we are trying to solve. This vision is limited by our current knowledge and will require iteration to deliver the final solution. Blocking development until a holy grail is found is usually not feasible. It can even be counterproductive, because end-user feedback collected during feedback loops would be missed. A layered approach, based on abstraction, keeps business logic and storage loosely coupled. The database becomes a detail that supports the data model, which is the reflection of reality. The data model is more significant than the DBMS. Having a database that serves the data model is fundamental (Martin, 2018).
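As a minimal sketch of this layered approach, the Python snippet below hides the storage choice behind a small repository abstraction. The entity, interface, and in-memory implementation names are illustrative, not taken from the article or from any specific framework.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Order:
    """Domain entity: part of the data model, independent of any DBMS."""
    order_id: str
    customer_id: str
    total: float


class OrderRepository(ABC):
    """Abstraction the business logic depends on; the DBMS stays a detail."""

    @abstractmethod
    def save(self, order: Order) -> None: ...

    @abstractmethod
    def find_by_customer(self, customer_id: str) -> list[Order]: ...


class InMemoryOrderRepository(OrderRepository):
    """Throwaway implementation used until the DBMS decision is made."""

    def __init__(self) -> None:
        self._orders: list[Order] = []

    def save(self, order: Order) -> None:
        self._orders.append(order)

    def find_by_customer(self, customer_id: str) -> list[Order]:
        return [o for o in self._orders if o.customer_id == customer_id]


# Business logic only sees the abstraction; swapping in a SQL- or
# document-backed repository later does not touch this function.
def total_spent(repo: OrderRepository, customer_id: str) -> float:
    return sum(o.total for o in repo.find_by_customer(customer_id))
```

Replacing the in-memory implementation with a SQL-, document-, or polystore-backed one later only requires a new class honouring the same interface; the business logic stays untouched.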

DBMS migration and code refactoring are always possible but can take a lot of effort. Making sure the right decisions are taken in time, and that well-architected solutions are set up by respecting guidelines and frameworks, can be a crucial step towards the success of a project.

ACID, BASE, or something else? Looking beyond the stereotypes

Choosing a DBMS that follows the ACID (Atomicity, Consistency, Isolation, Durability) or BASE (Basic Availability, Soft-state, Eventual consistency) principles should not limit your choice. While these principles hint at a system's main concern, mitigations can be put in place to reduce their impact on your data. Database systems are generally split into two categories: those interfaced with the SQL (Structured Query Language) language, and NoSQL, or "not only SQL", for all database systems where the relation is not the only concern. Both types of systems had their advantages and disadvantages until the concept of the multi-model polystore appeared, which found a compromise to query and integrate multiple, heterogeneous storage engines.

The first type follows ACID principles, where the main concern is to keep data consistent. Meanwhile, "not only SQL" (NoSQL) stores data in other types of storage than relational tables. Different data models can be found under the NoSQL umbrella; the best known are key-value stores, column-oriented stores, document-oriented stores, and graph databases, and they follow BASE principles. The main concern of NoSQL is to store and retrieve data quickly.

A polystore consists of multiple data stores with heterogeneous storage engines that are integrated across all systems; BigDAWG and the Typhon Polystore are two examples. The advantage of a polystore over hybrid storage is the ability to make existing data models from different paradigms communicate through a unified interface. Does this mean a polystore follows both ACID and BASE principles? A polystore relies on the principles of its underlying stores, so different principles can co-exist when several types (SQL and NoSQL) need to be used simultaneously. Most of the time, the overall system inherits its approach from the least restrictive principles, BASE, for multi-model stores.
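As an illustration only (this is not the actual API of BigDAWG or the Typhon Polystore), the Python sketch below shows the idea behind a polystore: a unified interface routes each query to a heterogeneous engine, and the consistency guarantees are inherited from whichever store answers.

```python
from typing import Any, Protocol


class DataStore(Protocol):
    """Common contract each underlying engine is wrapped behind."""
    def query(self, expression: str) -> list[dict[str, Any]]: ...


class RelationalStore:
    """Stand-in for an ACID, SQL-backed engine."""
    def query(self, expression: str) -> list[dict[str, Any]]:
        # A real implementation would run SQL against e.g. PostgreSQL.
        return [{"source": "sql", "expression": expression}]


class DocumentStore:
    """Stand-in for a BASE, document-oriented engine."""
    def query(self, expression: str) -> list[dict[str, Any]]:
        # A real implementation would run a document query, e.g. MongoDB.
        return [{"source": "document", "expression": expression}]


class Polystore:
    """Unified entry point that routes each query to the right engine.

    ACID and BASE co-exist behind one interface because each answer
    carries the guarantees of the store that produced it.
    """

    def __init__(self, stores: dict[str, DataStore]) -> None:
        self._stores = stores

    def query(self, engine: str, expression: str) -> list[dict[str, Any]]:
        return self._stores[engine].query(expression)


polystore = Polystore({"sql": RelationalStore(), "document": DocumentStore()})
print(polystore.query("sql", "SELECT * FROM customers"))
print(polystore.query("document", '{"find": "customers"}'))
```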

“There is no data model in a NoSQL database”

Having an implicit data model does not mean having no data model. Both SQL and NoSQL systems have a data model behind their data. Relational databases require an explicit schema declaration before ingesting data into their systems. In contrast, NoSQL systems have an implicit data model: a clear structure can still be recognized by looking at the data from a meta point of view (Kleppmann, 2021). This distinction alone should not drive DBMS selection. Data can be stored in all systems, but the constraints on data are not the same from one system to another.
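The difference can be made concrete with a short Python sketch: the relational side declares its schema up front and the engine enforces it on write, while the document side accepts the same record as a plain JSON document whose structure is only interpreted when it is read back. The table, fields, and values are hypothetical.

```python
import json
import sqlite3

# Explicit data model: the relational engine enforces the declared
# schema before any data is ingested (schema-on-write).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        country     TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", ("c-1", "Ada", "BE"))

# Implicit data model: a document store would accept this JSON as-is,
# yet the same structure is clearly visible in the data itself and is
# only interpreted when the application reads it (schema-on-read).
document = {"customer_id": "c-1", "name": "Ada", "country": "BE"}
print(json.dumps(document))
```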

The cloud, a disruptive element

Tech giants have disrupted the market by offering scalable cloud services. Even if, in theory, the respective principles of each DBMS stay the same in a cloud environment, cloud solutions have pushed boundaries back by making scaling trivial. The shift from an upfront-investment paradigm to a pay-as-you-go approach has reduced the weight of infrastructure cost in the total cost of ownership of a solution. Owning and/or scaling a DBMS in the cloud makes it possible to perform more complex querying later. Constraints such as keeping the number of joins in a SQL query to a minimum now have less impact on the data model than before. Cloud solutions have also trivialized multi-model databases, where a single engine can store diverse types of data, as Cosmos DB (Azure) does. Boundaries have been pushed back but not removed: system limitations still exist, and choosing the right DBMS is still a fit-for-purpose exercise.

Not ‘one size fits all’

There is no DBMS that can cover all use cases. From ideas to business requirements, to applications, to technology, everything is designed to serve end-users. Costly errors can be avoided by loosely coupling the data model and the DBMS, by consciously choosing an explicit or implicit data model, and by using cloud capabilities, but in the end it is still a trade-off between advantages and disadvantages, and it will remain one for a long time.

 

1° Robert C. Martin, Clean Architecture: A Craftsman's Guide to Software Structure and Design, pp. 277-283, Pearson Education, 2018

2° Martin Kleppmann, Designing Data-Intensive Applications, p. 63, O'Reilly, 2021

 

Author: Sébastien Marbehant