So, your HR data is dirty. It has missing values in people’s information, typos that corrupt categorical variables, wrong labels, duplicate records, and records that no one has updated. And so, you live with that. You accept the inevitable reality that you’ll never have perfect data in HR. Sadly, messy HR data sometimes becomes an excuse to avoid People Analytics projects, and with them, the opportunity to make an impact.
Start with imperfect data assets
But what if you start with what you have, i.e., your imperfect data assets? Your data quality need not be a barrier to a project. On the contrary! An analytics project is a practical step towards better and cleaner data. Whether you take a DIY (do it yourself) approach or implement a people analytics solution, experience shows that practicing People Analytics will improve your data quality. After all, no one cleans data just for the sake of it. However, when you have a purpose derived from a business challenge, and when gaining visibility into your workforce is a high priority, you’ll be motivated to put time, effort, and resources into data cleanup.
Clean and tidy data is a milestone in your analytics project
Data preparation is part of every analytics process in every vertical of your organization. But no one taught you that when you took analytics classes. The datasets you practiced on were perfect and ready for analysis. In real life, you don’t get ready-made tidy data. Tidy data is a milestone in your analytics project, but you need to get there yourself. From my experience in many quantitative research projects in organizations, preparing the data yourself is an effective way to engage with it, understand it, and get to know it in depth.
Systematic errors lead to new procedures of data maintenance
Right after the crucial stage of transforming a business question into the hypotheses that make up your analytics project, and spotting the relevant data sources, you pull and merge datasets and start exploring them. The exploration phase starts with finding gaps in data quality and fixing them. You will surely find systematic errors. Eventually, they will lead you to propose new procedures for maintaining data integrity, and new configurations of HR platforms, that your HR department can later embrace to prevent or reduce errors on an ongoing basis.
Therefore, you should not be intimidated by the entire data lake of the HR department. Focus on cleaning the datasets of your analytics project. When I meet HR professionals who have analytics questions on their minds, I am confident that they have a higher chance of finding a way to improve data quality. So, the win is double! You lighten the burden by cleaning only the data you need, and HR data quality keeps changing for the better!
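To make the exploration phase concrete, here is a minimal sketch in Python (standard library only) of a first-pass audit for three of the issues mentioned earlier: missing values, duplicate records, and typos in a categorical variable. The records, the field names (employee_id, department, hire_date), and the list of valid departments are all invented for illustration; your own project data will look different.

```python
from collections import Counter

# Hypothetical sample of employee records; field names and values are invented.
records = [
    {"employee_id": "E001", "department": "Sales", "hire_date": "2019-03-01"},
    {"employee_id": "E002", "department": "sales ", "hire_date": ""},
    {"employee_id": "E003", "department": "Slaes", "hire_date": "2020-07-15"},
    {"employee_id": "E001", "department": "Sales", "hire_date": "2019-03-01"},
    {"employee_id": "E004", "department": None, "hire_date": "2021-01-10"},
]

# Assumed list of valid department labels.
VALID_DEPARTMENTS = {"Sales", "Engineering", "HR"}


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def audit(rows, key, category_field, valid_labels):
    """Count missing values, duplicate keys, and unrecognized labels."""
    # Missing values per field.
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if is_missing(value):
                missing[field] += 1

    # Keys that appear more than once.
    key_counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)

    # Non-missing labels that are not in the valid list (typos, stray spaces).
    unknown_labels = Counter(
        row[category_field]
        for row in rows
        if not is_missing(row[category_field])
        and row[category_field] not in valid_labels
    )
    return {
        "missing": dict(missing),
        "duplicate_keys": duplicates,
        "unknown_labels": dict(unknown_labels),
    }


report = audit(records, "employee_id", "department", VALID_DEPARTMENTS)
print(report)
```

Labels that keep showing up under unknown_labels are good candidates for the systematic errors discussed above, for example a free-text field that should become a drop-down list in the HR platform.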
How far should you go in data cleaning?
If you have reached this point, you are probably asking yourself two more questions: how far should you go? And how do you do it the right way? The first question is hard to answer; therefore, my obvious answer as a consultant would be: “It depends.” In the realm of workforce analytics, data accuracy may be more critical in some cases, e.g., when predicting personal outcomes. However, there are many cases in which you study groups and deal with means as estimates. In such cases, you can handle the inaccuracy and make the appropriate notes and reservations about it. The expectations about data quality should be an ongoing discussion that depends on the context of the data usage. The definition of data quality will include dimensions such as accuracy, completeness, consistency, integrity, reasonability, timeliness, deduplication, validity, and so forth, based on the context of your analysis.
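As an illustration of studying groups with imperfect data, the sketch below computes a mean per group and reports how complete the underlying data is, so that the reservations travel with the estimate. The engagement scores, the group names, and the 50% coverage threshold are assumptions made for the example, not recommendations.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical engagement scores by department; None marks a missing value.
rows = [
    ("Sales", 4.0), ("Sales", None), ("Sales", 3.0),
    ("Engineering", 5.0), ("Engineering", 4.0),
    ("HR", None), ("HR", None), ("HR", 2.0),
]


def group_means(rows, min_coverage=0.5):
    """Return the mean per group together with completeness information.

    min_coverage is an assumed threshold: groups with a smaller share of
    observed values get a caveat flag.
    """
    groups = defaultdict(list)
    for group, score in rows:
        groups[group].append(score)

    summary = {}
    for group, scores in groups.items():
        observed = [s for s in scores if s is not None]
        coverage = len(observed) / len(scores)
        summary[group] = {
            "mean": mean(observed) if observed else None,
            "n_total": len(scores),
            "n_observed": len(observed),
            "coverage": round(coverage, 2),
            "caveat": coverage < min_coverage,
        }
    return summary


summary = group_means(rows)
for group, stats in summary.items():
    print(group, stats)
```

In this made-up sample, the HR group's mean rests on one observed value out of three, so it is flagged. Whether such a group is reported with a note or left out is part of the ongoing discussion about expectations.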
The second question, regarding best practices in data cleaning, is easier to answer. In fact, I incorporated data cleaning best practices into a new workshop that teaches you how to fix your data while making sure your data handling is well documented and your analytics project is reproducible.
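One possible way to keep data handling documented and reproducible, sketched here under my own assumptions rather than as the workshop's method, is to write each cleaning step as a small named function and log what it did. The step names, the field names, and the sample rows are hypothetical.

```python
def strip_whitespace(rows):
    """Trim leading and trailing spaces from every text field."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def drop_duplicate_ids(rows):
    """Keep only the first record for each employee_id."""
    seen, kept = set(), []
    for row in rows:
        if row["employee_id"] not in seen:
            seen.add(row["employee_id"])
            kept.append(row)
    return kept


def run_pipeline(rows, steps):
    """Apply the steps in order and record row counts before and after each."""
    log = []
    for step in steps:
        rows_in = len(rows)
        rows = step(rows)
        log.append({"step": step.__name__, "rows_in": rows_in, "rows_out": len(rows)})
    return rows, log


# Hypothetical raw extract; the first two rows are the same person.
raw = [
    {"employee_id": " E001", "department": "Sales "},
    {"employee_id": "E001", "department": "Sales"},
    {"employee_id": "E002", "department": "HR"},
]

cleaned, log = run_pipeline(raw, [strip_whitespace, drop_duplicate_ids])
for entry in log:
    print(entry)
```

The raw data is never modified, the order of the steps is explicit (whitespace is stripped first so that the duplicate is actually detected), and the log doubles as documentation of what changed and why the row count dropped.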