Define Data for Better Maintainability & Utilization

Before working with data, take care to define it clearly. Data about data is termed metadata. A person may be known by different names at home, at the office or on official records, and friends sometimes know each other only by a nickname. Similarly, different pieces of data, referred to formally as data elements, are identified by a name or names along with other descriptions that help users understand the available data. These descriptions give users and businesses clarity on the importance and use of each data element.

Characteristics of clear Data Element Descriptions

Data Element Name

The name should be self-descriptive. FirstName, LastName, City and Zip, which we commonly encounter, are all examples of Data Element names. Data Element names should be unique within a database, which means FirstName cannot be the name of any other data element in the same database. The ISO/IEC 11179 metadata registry standard offers naming guidelines to standardize Data Element names for international recognition, portability and ease of use.
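The uniqueness rule can be checked mechanically. The following is a minimal Python sketch, not a prescribed method: the function name find_duplicate_names is hypothetical, and comparing names case-insensitively is an assumption made here so that FirstName and firstname count as a clash.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the lowercased names that occur more than once.

    Assumption: names are compared case-insensitively.
    """
    counts = Counter(name.lower() for name in names)
    return sorted(key for key, n in counts.items() if n > 1)


# Example element names; "firstname" is added here only to show a clash.
element_names = ["FirstName", "LastName", "City", "Zip", "firstname"]
duplicates = find_duplicate_names(element_names)
print(duplicates)
```

A check like this could run whenever a new data element is proposed, so that a clashing name is caught before it reaches the database.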

Data Element Definition

This is a phrase that describes a data element in more detail. Good data definitions enable better use of data because they help users understand the true nature of each data element. Clear data definitions also promote better data sets through meaningful combinations with other dependent or relevant data elements. Data Element Definitions describe the contents of the database plainly, enhancing its use and value.

Representation Terms

A Representation Term is a word or word combination that denotes the type of data an element holds. For example, the suffix ‘ID’ in the data elements ‘EmployeeID’ or ‘ProductID’ is a Representation Term denoting the identification of a person or object. Another example is the suffix ‘date’ in the data elements ‘employeejoiningdate’ or ‘employeeleavingdate’, denoting the date of an event. Representation terms are used to group or classify similar data elements.
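Grouping by Representation Term can be illustrated with a short Python sketch. The list of terms and the function names below are assumptions for illustration; matching on a case-insensitive suffix is also an assumption, and it would misclassify a name such as ‘Paid’ as an ‘ID’ element, so a real registry would need stricter naming rules.

```python
# Assumed list of representation terms; a real registry would define its own.
REPRESENTATION_TERMS = ("ID", "Date")


def representation_term(element_name):
    """Return the representation term that ends the name, or None."""
    lowered = element_name.lower()
    for term in REPRESENTATION_TERMS:
        if lowered.endswith(term.lower()):
            return term
    return None


def group_by_term(element_names):
    """Group element names under their representation term."""
    groups = {}
    for name in element_names:
        groups.setdefault(representation_term(name), []).append(name)
    return groups


names = ["EmployeeID", "ProductID", "employeejoiningdate",
         "employeeleavingdate", "City"]
groups = group_by_term(names)
print(groups)
```

Elements with no recognised term, such as City here, are collected under None so they can be reviewed separately.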

Codes & Values

A data element can take one of several coded values. For example, the data element ‘NewYorkDayTemperature’ can have three values: ‘H’ for high, ‘M’ for medium and ‘L’ for low. When codes such as H, M and L are used, each code should be described to explain its true meaning. As an example, H could stand for temperatures between 30 and 40 degrees Celsius, M for temperatures between 18 and 29 degrees Celsius and L for subzero temperatures up to 17 degrees Celsius.
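One way to keep codes and their meanings together is a lookup table stored alongside the data. The Python sketch below uses the example ranges from the text. The names TEMPERATURE_CODES, describe_code and encode_temperature are hypothetical, and the handling of boundaries is an assumption: readings between 29 and 30 degrees are treated as M, and readings above 40 degrees are rejected because the example defines no code for them.

```python
# Code descriptions taken from the example ranges in the text.
TEMPERATURE_CODES = {
    "H": "High: 30 to 40 degrees Celsius",
    "M": "Medium: 18 to 29 degrees Celsius",
    "L": "Low: subzero to 17 degrees Celsius",
}


def describe_code(code):
    """Return the documented meaning of a temperature code."""
    if code not in TEMPERATURE_CODES:
        raise ValueError(f"Unknown temperature code: {code!r}")
    return TEMPERATURE_CODES[code]


def encode_temperature(celsius):
    """Map a reading to a code (boundary handling is an assumption)."""
    if celsius > 40:
        raise ValueError("No code is defined above 40 degrees Celsius")
    if celsius >= 30:
        return "H"
    if celsius >= 18:
        return "M"
    return "L"


print(encode_temperature(35), "=", describe_code("H"))
```

Keeping the descriptions in one table means a report or query tool can show the meaning of a code without anyone having to remember it.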

Synonym Ring or Synset

When multiple databases are queried for relevant information, similar data elements could have different names in different databases. For example, ‘ZIPcode’ in one database could be ‘PINcode’ in another. A Synonym Ring, or Synset, is a set of similar data elements that have different names in different databases. Synsets are useful for retrieving data from different databases that hold similar data elements under different names.
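A Synonym Ring can be represented as a set of equivalent names. The Python sketch below is illustrative only: the function names are hypothetical, the two records stand in for rows from two different databases, and their values are invented sample data.

```python
# Each ring is a set of names that refer to the same kind of data element.
SYNONYM_RINGS = [
    {"ZIPcode", "PINcode"},
]


def synonyms_for(name):
    """Return the ring containing the name, or just the name itself."""
    for ring in SYNONYM_RINGS:
        if name in ring:
            return ring
    return {name}


def fetch_value(record, name):
    """Look up a value under the name or any of its synonyms."""
    for candidate in synonyms_for(name):
        if candidate in record:
            return record[candidate]
    return None


# Invented sample rows standing in for two databases.
record_a = {"City": "New York", "ZIPcode": "10001"}
record_b = {"City": "Bengaluru", "PINcode": "560001"}

print(fetch_value(record_a, "ZIPcode"))
print(fetch_value(record_b, "ZIPcode"))
```

The caller asks for ZIPcode in both cases, and the ring resolves the request to whichever name the record actually uses.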

Creating and maintaining data about data, that is, defining metadata, is an important practice which, when applied wisely, can vastly improve data ‘cleanliness’ and its usefulness for marketing or any other business activity.
