Grammar and Data Management | 2018

One of the largest challenges for analysts developing data management plans is organizing and making sense of data entered as free-form text. Any time users can enter free-form text into a system, the data inherits the mistakes that are natural to human input. Typos, spelling mistakes, and grammatical errors can be found throughout open-ended data.

These challenges make it difficult to organize and account for all variations of the data users provide. A simple typo can turn an entry about a “Tree” into “Three” in the data entry field. This type of situation can only be minimized by building grammar, spelling, and format checks into the data entry fields. With these elements in place, users can review and correct their work before it is submitted to larger databases.
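As a rough illustration of an entry-time check, the sketch below compares a free-form value against a list of accepted terms for one field and suggests close matches. It uses Python's standard `difflib` module; the vocabulary `KNOWN_TERMS`, the function name `check_entry`, and the 0.8 similarity cutoff are assumptions made for this example, not part of any particular system.

```python
import difflib

# Hypothetical list of accepted terms for a single data entry field.
KNOWN_TERMS = ["tree", "shrub", "grass", "flower"]


def check_entry(value, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Return (status, suggestions) for one free-form entry.

    status is "ok", "empty", "suggest" (close matches found),
    or "unknown" (nothing similar in the vocabulary).
    """
    cleaned = value.strip().lower()
    if not cleaned:
        return "empty", []
    if cleaned in vocabulary:
        return "ok", []
    # Look for accepted terms that are similar to what was typed.
    suggestions = difflib.get_close_matches(
        cleaned, vocabulary, n=3, cutoff=cutoff
    )
    return ("suggest" if suggestions else "unknown"), suggestions


if __name__ == "__main__":
    for raw in ["Tree", "Three", "xyz", "   "]:
        print(repr(raw), check_entry(raw))
```

A general spell checker would accept “Three” because it is a real word. Checking against the terms a specific field is expected to contain is one way to catch it and prompt the user before the value is saved.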

The first thing a data management analyst should investigate when starting a project that includes open-ended input is what checks and safety mechanisms are in place to account for possible errors. If sufficient checks are not in place, then it is the responsibility of the data manager to coordinate with project and system development leads to push for their implementation.

If adding these checks is not possible, the analyst should prepare a risk document that walks stakeholders through the possible errors and the impact the lack of these checks may have on the overall system. This may be a difficult document to complete, but it should include the fields where data may contain errors, examples of what those errors may be, and possible ways to identify them when the data is included in a higher-level enterprise data management plan.
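One possible way to gather examples for such a risk document is to scan the values already stored in a field and flag rare entries that closely resemble common ones. The sketch below is a minimal version of that idea using Python's standard library; the function name `find_suspect_values` and the `min_count` and `cutoff` thresholds are assumptions chosen for illustration and would need tuning against real data.

```python
import difflib
from collections import Counter


def find_suspect_values(values, min_count=2, cutoff=0.8):
    """Map each rare value to the frequent values it closely resembles."""
    # Normalize case and surrounding whitespace, and skip blank entries.
    counts = Counter(v.strip().lower() for v in values if v.strip())
    frequent = [v for v, c in counts.items() if c >= min_count]

    suspects = {}
    for value, count in counts.items():
        if count >= min_count:
            continue  # common values are treated as intended entries
        matches = difflib.get_close_matches(
            value, frequent, n=3, cutoff=cutoff
        )
        if matches:
            suspects[value] = matches
    return suspects


if __name__ == "__main__":
    field = ["Tree", "tree", "Three", "tree ", "Shrub", "shrub", "grass"]
    print(find_suspect_values(field))
```

Flagged values are only candidates for review. A rare value can still be correct, so the output is better suited to the examples section of a risk document than to automatic correction.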

Data management plans should clearly identify all systems at risk of containing data with grammar, spelling, and typing mistakes. If this is not included in the overall data management plan, then systems and decision makers can make long-term and costly errors.

How does your organization account for these types of data management trials?