Littal Shemer Haim

People Analytics, HR Data Strategy, Organizational Research – Consultant, Mentor, Speaker, Influencer

HR data cleaning is part of your People Analytics Journey

An analytics project starts with imperfect data assets. Clean and tidy data is a milestone in your analytics project, but the systematic errors you find lead to new procedures of data maintenance.

So, your HR data is dirty. It includes issues such as missing values in people’s information, typos that corrupt categorical variables, wrong labels, duplicate records, errors, or records that you neglect to update. And so, you live with that. You accept the inevitable reality that you’ll never have perfect data in HR. Sadly, messy HR data is sometimes an excuse to avoid People Analytics projects, and hence to miss the opportunity to make an impact.

Start with imperfect data assets

But what if you start with what you have, i.e., your imperfect data assets? Your data quality must not be a barrier to a project. On the contrary! An analytics project is a practical step towards better and cleaner data. Whether you take a DIY (do it yourself) approach or implement a people analytics solution, experience shows that practicing People Analytics will improve your data quality. After all, no one cleans data just for the sake of it. However, when you have a purpose derived from a business challenge, and when gaining visibility into your workforce is a high priority, you’ll be motivated to put the time, effort, and resources into data cleanup.

Clean and tidy data is a milestone in your analytics project

Data preparation is a part of every analytics process in every vertical in your organization. But no one taught you that when you took analytics classes. The datasets that you practiced on were perfect and ready for analysis. In real life, you don’t get ready-made tidy data. Tidy data is a milestone in your analytics project, but you need to get there yourself. From my experience in many types of quantitative research in organizations, preparing the data yourself is an effective way of engaging with your data, understanding it, and getting deeply acquainted with it.

Systematic errors lead to new procedures of data maintenance

Right after the crucial stage of transforming a business question into the hypothesis that your analytics project rests on, and spotting relevant data sources, you pull and merge datasets and start exploring them. The exploration phase starts with finding gaps in data quality and fixing them. You will surely find systematic errors. They will eventually lead you to propose new procedures to maintain data integrity, and new configurations of HR platforms, which your HR department will later embrace to prevent or reduce errors continuously.
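As an illustration of that first exploration pass, here is a minimal sketch in plain Python. The records, the field names (employee_id, department, hire_date), and the profile helper are all hypothetical, invented for this example; a real project would likely pull the rows from an HR platform export.

```python
from collections import Counter

# Hypothetical HR records; field names and values are invented for illustration.
records = [
    {"employee_id": "E001", "department": "Sales", "hire_date": "2019-03-01"},
    {"employee_id": "E002", "department": "sales ", "hire_date": "2020-07-15"},
    {"employee_id": "E003", "department": "Slaes", "hire_date": ""},
    {"employee_id": "E003", "department": "Slaes", "hire_date": ""},
    {"employee_id": "E004", "department": "R&D", "hire_date": None},
]


def profile(rows, key="employee_id", category="department"):
    """Count missing values per field, repeated keys, and raw category labels."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            # Treat None and blank strings as missing.
            if value is None or str(value).strip() == "":
                missing[field] += 1
    key_counts = Counter(row[key] for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    # Raw label counts expose typos and stray whitespace in a categorical field.
    labels = Counter(row[category] for row in rows)
    return {
        "missing": dict(missing),
        "duplicate_keys": duplicate_keys,
        "labels": dict(labels),
    }


report = profile(records)
print(report)
```

A label such as "Slaes" that appears more than once hints at a systematic error, for example a mistyped option in a source system, rather than a one-off slip. That is the kind of finding that can justify a new maintenance procedure.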

Therefore, you should not be intimidated by the entire data lake of the HR department. Focus on cleaning the datasets of your analytics project. When I meet HR professionals who have analytics questions on their minds, I am confident that they have a higher chance of finding a way to improve data quality. So, the win is double! You lighten the burden by cleaning up only the data you need, and HR data quality keeps changing for the better!

How far should you go in data cleaning?

If you reached this point, you are probably asking yourself two additional questions: how far should you go? And how do you do it the right way? The first question is hard to answer; therefore, my obvious answer as a consultant would be: “It depends.” In the realm of workforce analytics, data accuracy may be more critical in some cases, e.g., when predicting personal outcomes. However, there are many cases in which you study groups and deal with means as estimates. In such cases, you can tolerate some inaccuracy, as long as you make the appropriate notes and reservations about it. The expectations about data quality should be an ongoing discussion that depends on the context of the data usage. The definition of data quality will include dimensions such as accuracy, completeness, consistency, integrity, reasonability, timeliness, deduplication, validity, and so forth, based on the context of your analysis.
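To make the group-level case concrete, the sketch below computes a mean per team and reports completeness alongside it, so the reservation travels with the estimate. The scores, the team names, and the 0.75 completeness threshold are assumptions chosen for illustration, not a recommended standard.

```python
from statistics import mean

# Hypothetical engagement scores by team; None marks a missing value.
scores = {
    "Team A": [4.0, 3.5, None, 4.5],
    "Team B": [3.0, None, None, 2.0],
}


def group_estimate(values, min_completeness=0.75):
    """Return the group mean together with a completeness note.

    min_completeness is an arbitrary illustrative threshold; the right
    value depends on the context of the analysis.
    """
    observed = [v for v in values if v is not None]
    completeness = len(observed) / len(values) if values else 0.0
    return {
        "mean": round(mean(observed), 2) if observed else None,
        "completeness": completeness,
        "meets_threshold": completeness >= min_completeness,
    }


summary = {team: group_estimate(vals) for team, vals in scores.items()}
for team, result in summary.items():
    print(team, result)
```

Here the mean for Team B rests on only half of its records, so it would be reported with a reservation rather than dropped.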

The second question, regarding the best practice of data cleaning, is easy to answer. In fact, I incorporated data cleaning best practices into a new workshop that teaches you how to fix your data, keep your data handling well documented, and make your analytics project reproducible.
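One possible way to keep data handling documented and reproducible is to write each fix as a small named function and record what every step did. The sketch below is an assumption about what that could look like, not the workshop's method; the step names, the run_pipeline helper, and the sample rows are hypothetical.

```python
def strip_whitespace(rows):
    """Trim leading and trailing spaces from every string value."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each fully identical record."""
    seen, kept = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(row)
    return kept


def run_pipeline(rows, steps):
    """Apply cleaning steps in order and log row counts for each step."""
    log = []
    for step in steps:
        before = len(rows)
        rows = step(rows)
        log.append(
            {"step": step.__name__, "rows_before": before, "rows_after": len(rows)}
        )
    return rows, log


# Hypothetical raw extract; the original list is never modified.
raw = [
    {"employee_id": "E001", "department": " Sales"},
    {"employee_id": "E001", "department": "Sales"},
    {"employee_id": "E002", "department": "R&D"},
]

clean, log = run_pipeline(raw, [strip_whitespace, drop_exact_duplicates])
for entry in log:
    print(entry)
```

Because the raw data stays untouched and the steps run in a fixed order, rerunning the script on the same extract gives the same clean dataset, and the log doubles as documentation of what changed.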

Littal Shemer Haim

Littal Shemer Haim brings Data Science into HR activities, to guide organizations to base decision-making about people on data. Her vast experience in applied research, keen usage of statistical modeling, constant exposure to new technologies, and genuine interest in people’s lives, all led her to focus nowadays on HR Data Strategy, People Analytics, and Organizational Research.

All images and texts on this website are copyrighted © Littal Shemer Haim ALL RIGHTS RESERVED
