So, your HR data is dirty. It has missing values in people’s information, typos that corrupt categorical variables, wrong labels, duplicate records, and records that no one remembered to update. And so, you live with that. You accept the inevitable reality that you’ll never have perfect data in HR. Sadly, messy HR data is sometimes used as an excuse to avoid People Analytics projects, which means missing the opportunity to make an impact. However, HR data cleaning is part of your People Analytics journey.
Start with imperfect data assets
But what if you start with what you have, i.e., your imperfect data assets? Data quality should not be a barrier to a project. On the contrary! An analytics project is a practical step towards better and cleaner data. Whether you take a DIY (do it yourself) approach or implement a people analytics solution, experience shows that practicing People Analytics will improve your data quality. After all, no one cleans data just for the sake of it. But when you have a purpose derived from a business challenge, and gaining visibility into your workforce is a high priority, you’ll be motivated to put the time, effort, and resources into data cleanup.
Clean and tidy data is a milestone in your analytics project
Data preparation is a part of every analytics process in each vertical of your organization. But no one taught you that when you took analytics classes. The datasets you practiced on were perfect and ready for analysis. In real life, you don’t get ready-made, tidy data. Tidy data is a milestone in your analytics project, but you must get there yourself. In my experience with many quantitative research projects in organizations, preparing the data yourself is an effective way of engaging with it, understanding it, and getting deeply acquainted with it.
Systematic errors lead to new procedures for data maintenance
Right after the crucial stage of transforming a business question into the hypotheses that make up your analytics project and spotting relevant data sources, you pull and merge datasets and start exploring them. The exploration phase starts with finding gaps in data quality and fixing them. You will almost certainly find systematic errors. Those errors will eventually lead you to propose new procedures for maintaining data integrity, and new configurations of HR platforms, that your HR department can later adopt to prevent or continuously reduce errors.
Therefore, you should not be intimidated by the entire data lake of the HR department. Instead, focus on cleaning the datasets of your analytics project. When I meet HR professionals who have analytics questions on their minds, I am confident that they have a higher chance of finding a way to improve data quality. So, the win is double! You lighten the burden by cleaning up only the data you need, and HR data quality keeps changing for the better!
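To make the exploration phase described above a little more concrete, here is a minimal sketch of a first data-quality pass over a project dataset. It only uses the Python standard library. The records, the field names (`employee_id`, `department`, `hire_date`), and the `profile` helper are all hypothetical, invented for illustration; your own extract will look different.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"employee_id": "E001", "department": "Sales", "hire_date": "2019-03-01"},
    {"employee_id": "E002", "department": "sales ", "hire_date": ""},
    {"employee_id": "E003", "department": "Slaes", "hire_date": "2021-07-15"},
    {"employee_id": "E003", "department": "Slaes", "hire_date": "2021-07-15"},
    {"employee_id": "E004", "department": None, "hire_date": "2020-11-30"},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def profile(rows, key_field, category_field):
    """Count missing values per field, repeated keys, and raw category labels."""
    fields = sorted({name for row in rows for name in row})
    missing = {f: sum(is_missing(row.get(f)) for row in rows) for f in fields}
    key_counts = Counter(row[key_field] for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    labels = Counter(
        row[category_field] for row in rows if not is_missing(row[category_field])
    )
    return {
        "missing": missing,
        "duplicate_keys": duplicate_keys,
        "labels": dict(labels),
    }


report = profile(records, key_field="employee_id", category_field="department")
print(report)
```

A label that appears under several spellings, or a key that appears more than once, is often the first visible sign of a systematic error worth tracing back to the way the data is entered.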
How far should you go in data cleaning?
If you have reached this point, you are probably asking yourself two more questions: how far should you go? And how do you do it the right way? The first question is hard to answer; therefore, my predictable answer as a consultant would be: “It depends.” In workforce analytics, data accuracy is more critical in some cases, e.g., when predicting outcomes for individuals. However, there are many cases in which you study groups and deal with means as estimates. In such cases, you can tolerate some inaccuracy, as long as you state the appropriate notes and reservations about it. The expectations about data quality should be an ongoing discussion that depends on the context of the data usage. A definition of data quality can include accuracy, completeness, consistency, integrity, reasonability, timeliness, deduplication, validity, and more. Again, which of these matter most depends on the context of your analysis.
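For the group-level case mentioned above, one simple way to "make the appropriate notes" is to report, next to each group mean, how many values it was based on and how many were missing. The sketch below assumes a hypothetical list of (team, score) pairs where `None` marks a missing score; the data and the `group_summary` helper are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (team, engagement score) pairs; None marks a missing score.
scores = [
    ("Support", 3.0), ("Support", 4.0), ("Support", None),
    ("Engineering", 4.0), ("Engineering", 5.0), ("Engineering", 3.0),
]


def group_summary(pairs):
    """Per group: mean of the known values, plus counts of used and missing values."""
    known = defaultdict(list)
    missing = defaultdict(int)
    for group, value in pairs:
        if value is None:
            missing[group] += 1
        else:
            known[group].append(value)
    groups = sorted(set(known) | set(missing))
    return {
        g: {
            "mean": mean(known[g]) if known[g] else None,
            "n_used": len(known[g]),
            "n_missing": missing[g],
        }
        for g in groups
    }


summary = group_summary(scores)
for team, stats in summary.items():
    print(team, stats)
```

Whether a given share of missing values is acceptable remains a judgment call that depends on the context of the analysis; the counts simply make that reservation visible to the reader.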
The second question, regarding best practices for data cleaning, is easier to answer. I have incorporated data cleaning best practices into my workshops, which teach you how to fix your data while keeping your data handling well documented and your analytics project reproducible.
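As one possible illustration of documented, reproducible data handling (not the workshop material itself), the sketch below writes each fix as a small named function, applies the functions in a fixed order, and keeps a log of what every step did. The rows, the `DEPARTMENT_FIXES` typo map, and all function names are assumptions made up for this example.

```python
# Hypothetical raw rows; the raw extract is never modified in place.
raw_rows = [
    {"employee_id": "E001", "department": "Sales"},
    {"employee_id": "E002", "department": " sales"},
    {"employee_id": "E003", "department": "Slaes"},
    {"employee_id": "E003", "department": "Slaes"},
]

# Assumed hand-maintained map of known typos to their corrected labels.
DEPARTMENT_FIXES = {"slaes": "sales"}


def normalize_department(rows):
    """Trim whitespace, lowercase, and apply known typo corrections."""
    cleaned = []
    for row in rows:
        label = row["department"].strip().lower()
        cleaned.append({**row, "department": DEPARTMENT_FIXES.get(label, label)})
    return cleaned


def drop_duplicate_ids(rows):
    """Keep the first row seen for each employee_id."""
    seen, kept = set(), []
    for row in rows:
        if row["employee_id"] not in seen:
            seen.add(row["employee_id"])
            kept.append(row)
    return kept


def run_pipeline(rows, steps):
    """Apply the steps in order and log what each one did."""
    log = []
    for step in steps:
        result = step(rows)
        # Row-by-row comparison is only meaningful when no rows were dropped.
        same_length = len(rows) == len(result)
        modified = sum(b != a for b, a in zip(rows, result)) if same_length else None
        log.append({
            "step": step.__name__,
            "rows_in": len(rows),
            "rows_out": len(result),
            "rows_modified": modified,
        })
        rows = result
    return rows, log


clean_rows, cleaning_log = run_pipeline(
    raw_rows, [normalize_department, drop_duplicate_ids]
)
for entry in cleaning_log:
    print(entry)
```

Because the raw rows are left untouched and every step is recorded, anyone can rerun the same steps on a fresh extract and see exactly how the cleaned dataset was produced.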