When joining systems together, we have to take their differences into account. Based on experience with various clients, we describe some complex cases as thought experiments to consider when crafting new integration solutions.
The first case is when data from a source system that you want to link is structured differently or needs translation via mappings or code-value lists before it can be processed. These mappings need to be spelled out in detail. More complex issues arise when the data structures differ and a one-to-one mapping cannot bridge them: for example, when the source system delivers the address as a single string with street, number and postfix joined together, while your own systems require split data because you only work with postal code, number and postfix. Other problems occur when a field that is required in your system (like Gender) is not required in other systems, is populated with other values (e.g., X instead of M/F) or has a different length (e.g., remark fields of 255 characters in the source are cut off at 100 in the target system). Addressing these differences is key to a sound integration design.
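The address case above can be sketched as a small parsing step. This is a minimal illustration only: the field names and the assumed "street number postfix" format are assumptions, and real Dutch addresses contain edge cases (streets containing digits, ranges) that would need a more robust rule set.

```python
import re

# Hypothetical sketch: split a single-string address ("street number postfix")
# into the separate fields a target system expects. The pattern below is an
# assumption for illustration, not a complete address parser.
ADDRESS_RE = re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+)\s*(?P<postfix>\S*)$")

def split_address(raw: str) -> dict:
    match = ADDRESS_RE.match(raw.strip())
    if match is None:
        # Unparseable input should be flagged, not silently passed through.
        raise ValueError(f"Cannot split address: {raw!r}")
    # Empty postfix becomes None so the target system sees an explicit absence.
    return {k: v or None for k, v in match.groupdict().items()}
```

Making the failure explicit (raising instead of guessing) matters here: a record the mapping cannot handle should surface in the integration layer rather than arrive half-populated in the target system.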
When improving a global system at a large bank, it was decided that the country name should no longer be written in full (e.g., “Nederland”) but should comply with the ISO standard for countries (ISO 3166-1 alpha-2) to support internationalization of country names in other languages and systems. The side effect of this update was that the records of all customers would be updated in the source system, leading to a chain of questions: Should all linked systems also comply? Should this be done via the default API for updating customer data? How do we prevent a burst of millions of record-update messages when we migrate the source field to the new ISO standard? Is this change a real change (should the DateUpdated be reset?) or only technical?
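A migration like this can be sketched as a lookup plus an explicit marker that the change is technical. The mapping table below is a deliberately tiny assumption for illustration (a real migration would use a complete, maintained ISO 3166-1 code list), and the `technical_update` flag is a hypothetical convention, not an established API.

```python
# Hypothetical sketch: migrate full country names to ISO 3166-1 alpha-2 codes.
# The mapping and the 'technical_update' flag are assumptions for illustration.
COUNTRY_TO_ISO = {
    "Nederland": "NL",
    "België": "BE",
    "Duitsland": "DE",
}

def migrate_country(record: dict) -> dict:
    code = COUNTRY_TO_ISO.get(record["country"])
    if code is None:
        # Unknown names must be resolved by hand, not guessed.
        raise KeyError(f"No ISO mapping for {record['country']!r}")
    # Marking the update as technical lets downstream consumers decide not to
    # reset DateUpdated or fan out millions of functional change events.
    return {**record, "country": code, "technical_update": True}
```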
The same considerations apply when introducing new fields in a request/response: Can this property be NULL when initialized, or should it be defined with a default value? Who defines or derives the logic to set this value? What if new regulatory requirements force us to extend the information provided, but the source systems cannot implement the logic for the derived data in a structured manner? Should alternatives be offered to send the additional information and to derive and implement the logic elsewhere?
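The two design options for a new property can be made concrete in a small sketch. The field and function names below are assumptions for illustration; the point is the contrast between an explicitly nullable field and a default derived outside the source system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of introducing a new response property. All names are
# assumptions for illustration.
@dataclass
class CustomerResponse:
    customer_id: str
    # Option 1: nullable — absence stays explicit and visible to consumers.
    risk_category: Optional[str] = None

def derive_risk_category(resp: CustomerResponse) -> CustomerResponse:
    # Option 2: when the source system cannot supply the value, derive a
    # default in a separate enrichment step, owned by a named party.
    if resp.risk_category is None:
        resp.risk_category = "UNKNOWN"
    return resp
```

The trade-off: a NULL keeps the fact "we do not know" intact, while a derived default makes downstream processing simpler but hides who decided the value.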
We want systems to be linked and working together, but we don’t want to pile up garbage or replicate a haystack into a new system. When joining systems together, the first goal is often to integrate as much as possible. Reality usually kicks in when we learn that old data (e.g., old customers) might not be compliant with the new system or cannot fit. Invalid data from a bygone era should not be integrated into the system. To prevent issues, you have to understand the data quality, create assessments, set limitations on what to merge, and flag the parts that are intentionally allowed to fail. Validation rules might be active for current records but sometimes need to be disabled for historical (incomplete) data.
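Relaxing validation for historical records can be expressed as a rule that checks the record's age before enforcing. The cutoff date, field names and rule set below are assumptions for illustration.

```python
from datetime import date

# Hypothetical sketch: a validation rule that applies to current records but is
# relaxed for historical ones. Cutoff and field names are assumptions.
HISTORICAL_CUTOFF = date(2010, 1, 1)

def validate(record: dict) -> list:
    errors = []
    created = record.get("created")
    historical = created is not None and created < HISTORICAL_CUTOFF
    if not record.get("email") and not historical:
        # Required for current records; historical records may legitimately
        # predate the requirement and are intentionally allowed through.
        errors.append("email is required")
    return errors
```

Keeping the historical exemption inside the rule (rather than skipping validation entirely) means the exception is documented in one place and can be reported on, instead of silently widening the gate for all data.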
Refugees coming to the Netherlands do not always know their exact date of birth, as it was never registered or simply not known. For this reason, there are approximately 70,000 people registered with only a year of birth (as they usually know their age). Not all systems can handle this, and organizations such as CBR, UWV and the Dutch Tax Office have trouble matching these records (Wijland, 2015). To prevent this in the future, the legislation was changed and new registrations are assigned July 1st as date of birth. However, it was decided not to revert or update the existing records, so this is still the case. In integrating systems, we have to decide how to handle these kinds of edge cases: How do we store this (as a date or not)? Or should we create a separate marker? If the system is sending a message and our systems need to send a confirmation, are we sending the 00-00-jj date back or do we reconcile differently?
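One way to handle this edge case is to store the date with an explicit precision marker instead of a fabricated day and month. The class and method names below are a sketch under that assumption, not an established standard; the exchange format merely shows how zero-padding can be made explicit rather than accidental.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: a possibly-partial date of birth with an explicit
# precision marker. All names are assumptions for illustration.
@dataclass
class BirthDate:
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    @property
    def precision(self) -> str:
        # Tells consumers how much of the date is actually known.
        if self.day is not None:
            return "full"
        return "month" if self.month is not None else "year"

    def to_exchange_format(self) -> str:
        # When a partner system demands a full dd-mm-yyyy value, pad the
        # unknown parts with zeros so the receiver can recognize them as
        # placeholders rather than real data.
        return f"{self.day or 0:02d}-{self.month or 0:02d}-{self.year}"
```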
In traditional data structures, we work with relational databases (RDBMS) to store data. This has worked for decades and can still work for current and future applications. New developments in streaming messaging and multidimensional RDF- and OWL-structured messaging require a different approach to integration. Graph databases are structured differently: in an RDBMS, the tables contain the data, and primary and foreign keys hold the structure. In a graph structure, each triple stores a subject, a predicate (property/relationship) and an object. For example, when a book has multiple writers in an RDBMS, this can be stored as a 1-n (one-to-many) relationship. In a graph structure, there is a single book node with multiple links to different authors. In an interface that updates data on this book, all fields must be re-evaluated to determine which fields (properties) have been added, changed or deleted, or a new version of the record must overwrite the previous one.
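The book example can be sketched as sets of (subject, predicate, object) triples, where an update is reconciled by set difference. The identifiers below are assumptions for illustration.

```python
# Hypothetical sketch: a book with multiple authors as RDF-style triples
# (subject, predicate, object), and a set diff to detect what an incoming
# update adds or removes. All identifiers are assumptions.
old_triples = {
    ("book:1", "title", "Integration Patterns"),
    ("book:1", "author", "author:ann"),
    ("book:1", "author", "author:bob"),
}
new_triples = {
    ("book:1", "title", "Integration Patterns"),
    ("book:1", "author", "author:ann"),
    ("book:1", "author", "author:carol"),
}

added = new_triples - old_triples    # triples to insert in the graph
removed = old_triples - new_triples  # triples to delete from the graph
```

This is the re-evaluation the paragraph describes: because the graph holds one book node with several author links, an update message must be diffed against the stored triples (or replace them wholesale) rather than overwriting a single row.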
Handle with care and know what you are handling. Not all data has the right quality, the right structure or the same pattern. Understanding both your source and target system(s) and their triggers is key to integrating the correct data and structure, and to distributing your information across your organization.
For more information, please contact Joost Koedijk or Jochem van Galen.