After taking these two steps, you will have produced a master list of data. The next step will be to decide how you want to manage that data.
Master Data Management: Where to Begin
Once you’ve decided that master data management is the right solution for your organization, what comes next? To start, you’ll have to decide whether to purchase a tool or to build and maintain something on your own. Either way, creating master data involves two main steps. First, the data needs to be cleaned and standardized.
Next, you must match the data from every source within the organization and consolidate it. The purpose of this second step is to eliminate duplicate data. However, before you can take even the first step, you need to understand the master data model. Define the contents of each attribute, and map each application’s fields to your data model. Once that is done, you can define the transformations needed to clean the source data.
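To make the idea of a data model and per-application maps concrete, here is a minimal sketch in Python. The attribute names, the two source applications ("crm" and "billing"), and the `to_master` helper are all hypothetical, chosen only for illustration.

```python
# Hypothetical master model: attribute name -> definition of its contents.
MASTER_MODEL = {
    "customer_name": "Legal name of the customer",
    "phone": "Primary phone number",
    "postal_code": "Postal or ZIP code",
}

# Hypothetical maps, one per application: source field -> master attribute.
SOURCE_MAPS = {
    "crm": {"FullName": "customer_name", "Tel": "phone", "Zip": "postal_code"},
    "billing": {"cust_nm": "customer_name", "phone_no": "phone", "post_cd": "postal_code"},
}


def to_master(source, record):
    """Rename a source record's fields to master attributes.

    Fields with no entry in the map are dropped; master attributes the
    source does not supply are set to None.
    """
    mapping = SOURCE_MAPS[source]
    mapped = {mapping[field]: value for field, value in record.items() if field in mapping}
    return {attr: mapped.get(attr) for attr in MASTER_MODEL}


crm_row = to_master("crm", {"FullName": "Acme Corp", "Tel": "555-0100", "Zip": "02139"})
billing_row = to_master("billing", {"cust_nm": "Acme Corp", "phone_no": "555-0100"})
```

Keeping the maps as plain data, separate from the code that applies them, means a new application can be added by writing one more map rather than new logic.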
Cleaning and transforming data
This step is very similar to the Extract, Transform, and Load (ETL) processes used in data warehousing. If you are already familiar with these processes and have your ETL tools defined, you may be able to take a shortcut and use those tools for master data with only a few modifications. Otherwise, you will have to learn a new tool. Cleansing data involves the following functions: normalizing data formats, replacing missing values, standardizing values, and mapping attributes.
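Three of the cleansing functions named above (normalizing formats, standardizing values, and replacing missing values) can be sketched in a few lines of Python. The field names, the `STATE_CODES` lookup, and the "UNKNOWN" placeholder are assumptions made for this example, not part of any particular MDM tool.

```python
import re

# Hypothetical lookup used to standardize free-text state values.
STATE_CODES = {"massachusetts": "MA", "mass.": "MA", "ma": "MA"}


def normalize_phone(raw):
    """Normalize a phone number to digits only; None if no digits remain."""
    digits = re.sub(r"\D", "", raw or "")
    return digits or None


def standardize_state(raw):
    """Map known spellings of a state to one code; None if unrecognized."""
    key = (raw or "").strip().lower()
    return STATE_CODES.get(key)


def cleanse(record):
    """Return a cleansed copy of the record, leaving the input untouched."""
    cleaned = dict(record)
    cleaned["phone"] = normalize_phone(record.get("phone"))
    cleaned["state"] = standardize_state(record.get("state"))
    # Replace a missing or blank country with an explicit placeholder.
    if not (record.get("country") or "").strip():
        cleaned["country"] = "UNKNOWN"
    return cleaned


cleaned = cleanse({"phone": "(555) 010-0100", "state": " Mass. ", "country": ""})
```

Values that cannot be standardized come back as `None` rather than being guessed, so they can be caught by a later validation pass.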
If you have a tool for this job, it will most likely cleanse whatever data it can and deposit the rest in an error table. After a source is cleansed, inspect each output thoroughly so you can make sure the process is running correctly.
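The split between cleansed output and an error table might look like the following sketch. The validation rules (a required `customer_name` and a 10-digit `phone`) are hypothetical, and a real tool would write to database tables rather than Python lists.

```python
def validate(record):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    if not record.get("customer_name"):
        problems.append("missing customer_name")
    phone = record.get("phone") or ""
    if not (phone.isdigit() and len(phone) == 10):
        problems.append("phone is not 10 digits")
    return problems


def split_clean_and_errors(records):
    """Route each record to the clean output or to the error table."""
    clean, errors = [], []
    for record in records:
        problems = validate(record)
        if problems:
            # Keep the reasons with the record so a reviewer can see why it failed.
            errors.append({"record": record, "problems": problems})
        else:
            clean.append(record)
    return clean, errors


clean, errors = split_clean_and_errors([
    {"customer_name": "Acme Corp", "phone": "5550100100"},
    {"customer_name": "", "phone": "555-0100"},
])
```

Storing the reasons alongside each rejected record makes the inspection step far quicker than re-deriving why a row was rejected.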
Matching master data records to eliminate duplicates
This step is by far the most difficult for any organization, but it is also the most important. False matches can actually cause you to lose data. For example, if two companies with the same, or very similar, names are identified as being the same, you stand to lose one complete entry. If you have unique identifiers for all of your data (for example, Social Security numbers for your customers), matching data will be easy. Unfortunately, this is rarely the case, which is exactly why matching algorithms end up being so complex.
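The difference between matching on a unique identifier and matching on a name can be shown with a small Python sketch. The sample records and the `tax_id` field are invented for illustration.

```python
from collections import defaultdict


def group_by(records, key):
    """Group records that share the same key value."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


# Invented sample data: two distinct companies share a name, and one
# company appears twice under slightly different names.
records = [
    {"tax_id": "11-1111111", "name": "Summit Partners", "city": "Boston"},
    {"tax_id": "22-2222222", "name": "Summit Partners", "city": "Denver"},
    {"tax_id": "11-1111111", "name": "Summit Partners Inc.", "city": "Boston"},
]

# With a unique identifier, the duplicate is found and the two companies stay apart.
by_id = group_by(records, lambda r: r["tax_id"])

# With name alone, the Boston and Denver companies are wrongly merged,
# and the true duplicate ("Summit Partners Inc.") is missed.
by_name = group_by(records, lambda r: r["name"].lower())
```

Here grouping by name produces both kinds of error at once: a false match that would lose the Denver entry, and a missed match that leaves a duplicate in place.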
The more attribute matches you require, the higher the likelihood that the MDM system will make an appropriate match. You can set a threshold that a candidate match must meet in order to be accepted. Set the threshold very high where the consequences of an incorrect match are severe. Keep in mind when setting the threshold that your data steward will only be prompted to review matches that fall below it.
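One simple way to combine several attribute comparisons into a single score and apply a threshold is sketched below. It is only an illustration: the weights, the 0.9 default threshold, and the use of Python's `difflib.SequenceMatcher` for string similarity are assumptions, and commercial MDM tools use their own, usually more elaborate, algorithms.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how much each attribute contributes to the score.
WEIGHTS = {"name": 0.5, "phone": 0.3, "postal_code": 0.2}


def similarity(a, b):
    """Similarity between two strings, from 0.0 to 1.0; 0.0 if either is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(left, right):
    """Weighted sum of per-attribute similarities."""
    return sum(
        weight * similarity(left.get(attr), right.get(attr))
        for attr, weight in WEIGHTS.items()
    )


def decide(left, right, threshold=0.9):
    """Accept a match at or above the threshold; otherwise send it to the steward."""
    score = match_score(left, right)
    decision = "accept" if score >= threshold else "review"
    return decision, score


acme = {"name": "Acme Corp", "phone": "5550100100", "postal_code": "02139"}
zenith = {"name": "Zenith Ltd", "phone": "5559998888", "postal_code": "80202"}

same_decision, same_score = decide(acme, dict(acme))
diff_decision, diff_score = decide(acme, zenith)
```

Raising `threshold` sends more candidate pairs to the data steward and fewer through automatically, which is the trade-off to weigh when an incorrect match is costly.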