Littal Shemer Haim

People Analytics, HR Data Strategy, Organizational Research – Consultant, Mentor, Speaker, Influencer

Finding Hidden Patterns in Gender Pay Gap Data

Why are we failing to see the hidden patterns in the gender pay gap? How can HR professionals work better with data scientists to spot hidden patterns? How can we generalize this case study for up-skilling and re-skilling in critical thinking and an analytical mindset?
Photography by Littal Shemer Haim ©

Are there hidden patterns in the gender pay gap? The gap persists despite regulation, hype, and preoccupation with the subject. The extra mile for HR and People Analytics professionals on this topic lies in analytics skills and critical thinking. I’ll connect the two, presenting a short case study with practical advice and an opportunity to challenge and use your critical thinking. (This article is based on my lecture at People Analytics World, April 2022, where I offered out-of-the-box thinking on this long-unsolved issue. See also my list of public speaking engagements.)

In this article, I’ll answer four questions. First, why are we failing to see the hidden patterns in the gender pay gap? Second, what are some hidden patterns based on data? Third, how can HR professionals work better with data scientists to spot the hidden patterns? And lastly, how can we generalize this case study for up-skilling and re-skilling in critical thinking and an analytical mindset?

Why are we failing to see the hidden patterns in the gender pay gap?

I’m sure you have heard a lot about the gender pay gap, at least in the media in your country. For example, in Israel, my country, although employers must report on gender pay, data reveals that for every 1 NIS a man earns, a woman makes 68 agorot. Why is that happening? What can we do about it?

As a citizen and a professional in People Analytics, I consider the gender pay gap more than a compliance issue. It’s a mission to help women succeed at work and in life, which can influence families, communities, and society.

I believe this is a common goal or aspiration for most of you and many HR professionals. However, I’m not sure we necessarily share a common perspective on how to start making an impact.

My perspective comes from data and analytics. Start with the data in your organization: use it to shed light on the current situation, understand the factors behind it, direct your interventions, and ensure that your insights are discussed in the broader context of the business and the labor market.

However, many HR professionals start elsewhere, with programs, confident in the organizational development point of view. Even when data is their starting point, it often takes the form of reporting rather than exploration.

What is the difference between reporting and exploring? To explore data, you need an analytical mindset, which enables you to analyze information and identify patterns in the data to solve problems. You apply your curiosity by asking, “why?”

I don’t expect HR professionals to become data scientists and run advanced statistics to identify patterns in the data. Still, I’m sure that being a better internal client of data professionals and data solutions is essential, and a key to your success is asking “why?”. It will enable you to tell a clear story, make an impact on any people-related topic, track improvement and progress, and certainly contribute to closing the gender pay gap.

What are some hidden patterns based on data?  

Let’s go the extra mile beyond reporting and dashboarding, without jumping straight to recommended interventions and programs. Instead, we will focus on exploration.

What exactly is that extra mile beyond reporting and dashboarding? Dashboards enable us to present different metrics and KPIs and answer questions such as: Did we reach our goals? How far are we from achieving them? However, dashboards can’t answer the question: Why? To answer it, we need to analyze the factors that drive the KPIs presented on our dashboards.

Regarding the gender pay gap, we will explore the data beyond finding compensation differences between men and women. Instead, we will ask “why?” to explore how those differences came about and what that implies we should do about them.

While reading the rest of this article, I suggest you imagine that you are in a meeting with a data scientist who supports your work as an HR leader. Let’s assume that your academic background is no higher than a bachelor’s degree in social science. Trust me, your common sense and curiosity are good enough to lead the conversation. Don’t worry: this short case study involves no advanced analytics, only statistics suited to your hypothetical background. (If you’d like to explore the R code that generated the following visualizations and results, visit my GitHub profile.)

My source and inspiration for this case study was a dataset of employee salaries in a municipal authority. For the sake of public transparency, this organization shared its dataset a few years ago; it contained almost 10,000 records. The open data included annual salary information and some demographics. People Analytics in your organization probably involves integrating additional data sources into such an analysis, from platforms such as recruitment, learning, and performance, but this simple dataset is sufficient for our purpose.

First, I created an anonymized and randomized version of the dataset, so that it would be impossible to point to individuals or even recognize the organization from the following findings. Still, I guarantee that the dataset I used is realistic. Then, I ran some inferential statistics. I used only binary gender categories in the analysis, men vs. women, since that was the classification in the dataset. Some organizations use more gender categories, but that is beyond our scope. As with any public organization, I assumed that the contributing organization was subject to strict equal-pay regulations. But only by going beyond the basic comparison between men and women could I spot other patterns and reach some insights. So, without further ado, let’s explore the findings.
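The article does not describe the exact anonymization procedure, so what follows is only a minimal sketch of one common approach, written in Python rather than the R used for the case study: drop direct identifiers, replace each employee ID with a random token, and shuffle the row order. The helper `anonymize`, the field names (`employee_id`, `name`, `gender`, `salary`), and the records are hypothetical. Pseudonymization alone does not guarantee anonymity; a rare combination such as a unique job title may still identify someone.

```python
import random
import secrets


def anonymize(records, id_field="employee_id", drop_fields=("name",), seed=None):
    """Return a pseudonymized, shuffled copy of a list of record dicts.

    Hypothetical helper: the field names are assumptions, not the source schema.
    """
    rng = random.Random(seed)  # seeded only to make the shuffle reproducible
    pseudonyms = {}            # original id -> random token (stable within one run)
    result = []
    for record in records:
        # Drop direct identifiers such as names (builds a new dict; input is untouched)
        clean = {k: v for k, v in record.items() if k not in drop_fields}
        original_id = clean[id_field]
        if original_id not in pseudonyms:
            pseudonyms[original_id] = secrets.token_hex(8)  # 16 random hex characters
        clean[id_field] = pseudonyms[original_id]
        result.append(clean)
    rng.shuffle(result)  # row order no longer mirrors the source file
    return result


# Hypothetical records
staff = [
    {"employee_id": 101, "name": "A. Cohen", "gender": "F", "salary": 71_000},
    {"employee_id": 102, "name": "B. Levi", "gender": "M", "salary": 79_000},
    {"employee_id": 103, "name": "C. Mizrahi", "gender": "F", "salary": 74_000},
]
anonymized = anonymize(staff, seed=42)
for row in anonymized:
    print(row)
```

Salaries and gender survive untouched, so every statistic computed later is unaffected by this step.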

Do men and women in this public sector organization earn the same on average? Well, almost. Women’s and men’s average annual salaries were $73K and $77K, respectively. So, for every $1 a man in this organization earns, a woman makes 95 cents. It is not a huge gap, at least not compared to the pay gap reported in my country. But it is worth exploring the annual salary distributions.
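The “cents per dollar” figure is simply the ratio of two averages. A minimal Python sketch (the helper `cents_per_dollar` is hypothetical, not the author’s R code) makes the arithmetic explicit; the optional `center` parameter lets you swap the mean for the median, which is less sensitive to outliers.

```python
from statistics import fmean, median


def cents_per_dollar(women_salaries, men_salaries, center=fmean):
    """What women earn for every 100 cents men earn, using the chosen center."""
    return 100 * center(women_salaries) / center(men_salaries)


# The averages reported in the text, in thousands of dollars
print(round(cents_per_dollar([73], [77])))  # 95
print(round(cents_per_dollar([72], [78])))  # 92 (gender-diverse roles only)

# Hypothetical raw salaries: one high-earning woman pulls the mean up,
# so the mean-based and median-based ratios point in different directions
women = [60, 70, 75, 80, 120]
men = [65, 75, 80, 85, 90]
print(round(cents_per_dollar(women, men)))                 # 103
print(round(cents_per_dollar(women, men, center=median)))  # 94
```

The last two lines illustrate why the next step looks at the whole distribution rather than the averages alone.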

When you explore only the averages, you lose information, e.g., about outliers. Therefore, we want to examine indicators of both the center and the dispersion of the earnings distribution. So, I placed two visualizations here, a boxplot next to a histogram with a density plot, and stacked the genders vertically, one above the other, so that the comparison would be easy for the naked eye.

You typically won’t see such charts on your dashboards, but they are a common way to start exploring data, so I suggest you get to know and leverage these visualizations. Notice a slight difference between men and women in how the histograms on the left deviate from the shared normal approximation curve. Which gender deviates in the lower part of the distribution, and which in the higher part? I bet you can see the pattern. Also, look at the boxplots’ centers, which represent the medians. We’ll further examine the sources of variance in salaries to understand why men earn more than women.
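A boxplot rests on a handful of numbers: the median, the quartiles, and the points lying more than 1.5 interquartile ranges (IQR) beyond the box, Tukey’s usual outlier rule. The Python sketch below computes them per gender. The helper `box_summary` and the salary lists are hypothetical, and plotting libraries compute quartiles in slightly different ways, so the values may not match a particular chart exactly.

```python
from statistics import quantiles


def box_summary(salaries):
    """Quartiles, IQR, and 1.5 * IQR outliers: the numbers behind a boxplot."""
    # "inclusive" interpolation treats the min and max as the 0th and 100th
    # percentiles, which corresponds to R's default quantile() method (type 7)
    q1, q2, q3 = quantiles(salaries, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [s for s in salaries if s < low_fence or s > high_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical annual salaries in thousands of dollars
salaries_by_gender = {
    "women": [60, 65, 70, 72, 74, 76, 120],
    "men": [62, 68, 74, 77, 80, 84, 90],
}
for gender, salaries in salaries_by_gender.items():
    print(gender, box_summary(salaries))
```

Comparing the medians and IQRs side by side gives the same information as the stacked boxplots, and the outlier list shows which individual salaries a mean-only report would quietly absorb.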

But before we do so, if you were leading the conversation with a data scientist, how would you criticize these numbers? You would probably raise the question of male and female occupations: the dataset includes some roles held by both genders and other positions held only by men or only by women.

After screening out those single-gender occupations, I repeated the analysis and got similar results. The pay gap increased slightly, to $72K vs. $78K for women vs. men, respectively. In gender-diverse roles, women make 92 cents for every $1 men make. In your own analysis, you should explore each gender-diverse role and sort the roles by gender pay gap to report where the gap is largest. Since the dataset contains a few hundred occupations, that is beyond our scope here.
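Sorting gender-diverse roles by their gap, as suggested above, takes only a few lines. This Python sketch assumes hypothetical `(role, gender, salary)` records with gender coded `"F"` or `"M"`, matching the binary classification in the dataset; roles held by only one gender are skipped, mirroring the screening step. The helper `role_gaps` and the sample data are illustrative only.

```python
from collections import defaultdict
from statistics import fmean


def role_gaps(records):
    """Cents women earn per men's dollar, for each role held by both genders.

    records: iterable of (role, gender, salary), gender coded "F" or "M".
    Returns (role, cents_per_dollar) pairs, largest gap (lowest ratio) first.
    """
    pay = defaultdict(lambda: {"F": [], "M": []})
    for role, gender, salary in records:
        pay[role][gender].append(salary)
    gaps = [
        (role, 100 * fmean(by_gender["F"]) / fmean(by_gender["M"]))
        for role, by_gender in pay.items()
        if by_gender["F"] and by_gender["M"]  # skip single-gender occupations
    ]
    return sorted(gaps, key=lambda pair: pair[1])


# Hypothetical records (salaries in thousands of dollars)
records = [
    ("clerk", "F", 50), ("clerk", "M", 52),
    ("engineer", "F", 80), ("engineer", "M", 100),
    ("nurse", "F", 60),   # women only: screened out
    ("driver", "M", 55),  # men only: screened out
]
for role, cents in role_gaps(records):
    print(f"{role}: {cents:.0f} cents per $1")
```

With a few hundred occupations, the top of this sorted list is where a conversation with the data scientist would naturally start.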

If you report the gender pay gap using a dashboard, you may slice each gender’s annual salaries by age, tenure, and additional demographics. However, your dashboard slicer won’t reveal interactions between variables. An interaction occurs when the effect of one causal variable on an outcome depends on the state of a second causal variable, so it only becomes visible when you explore the relationship among more than two variables. Do we have such an interaction in our dataset? And if we do, what would it tell us about the causes of the gender pay gap?

Let’s explore gender with only one additional variable. What would be your first variable of choice from background variables and demographics? My choice was tenure.

Exploring average pay by gender across tenure ranges reveals that while both genders start at a similar earning level and are promoted as they gain tenure, men are promoted at higher rates, as the steeper slope of their line indicates. In addition, a two-way analysis of variance (ANOVA), which you may recall from your introductory statistics courses, reveals that the interaction between gender and tenure is significant, meaning that the difference between the slopes is unlikely to be due to chance.
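To make the interaction test concrete, here is a minimal Python sketch of the interaction F statistic in a balanced two-way ANOVA with gender and tenure band as factors. It is not the author’s R code: the helper `interaction_f`, the tenure bands, and the salaries are made up, and the sketch requires the same number of employees in every gender-by-band cell. Real HR data is rarely balanced, so in practice you would use a statistics package, for example `statsmodels` with the formula `salary ~ C(gender) * C(tenure_band)` followed by `anova_lm`, or R’s `aov`.

```python
from statistics import fmean


def interaction_f(cells):
    """Interaction F statistic for a balanced two-way ANOVA.

    cells maps (gender, tenure_band) -> list of salaries; every cell must
    contain the same number of observations.
    Returns (F, df_interaction, df_error).
    """
    genders = sorted({g for g, _ in cells})
    bands = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(genders) * len(bands) or any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch needs a complete, balanced design")

    grand = fmean(s for values in cells.values() for s in values)
    gender_mean = {g: fmean(s for b in bands for s in cells[(g, b)]) for g in genders}
    band_mean = {b: fmean(s for g in genders for s in cells[(g, b)]) for b in bands}
    cell_mean = {key: fmean(values) for key, values in cells.items()}

    # Interaction: how far each cell mean departs from what the two main effects predict
    ss_interaction = n * sum(
        (cell_mean[(g, b)] - gender_mean[g] - band_mean[b] + grand) ** 2
        for g in genders
        for b in bands
    )
    # Error: spread of individual salaries around their own cell mean
    ss_error = sum((s - cell_mean[key]) ** 2 for key, values in cells.items() for s in values)

    df_interaction = (len(genders) - 1) * (len(bands) - 1)
    df_error = len(cells) * (n - 1)
    f_stat = (ss_interaction / df_interaction) / (ss_error / df_error)
    return f_stat, df_interaction, df_error


# Hypothetical salaries (thousands of dollars), three employees per cell
cells = {
    ("F", "0-5 yrs"): [50, 52, 54], ("M", "0-5 yrs"): [51, 53, 55],
    ("F", "5-10 yrs"): [56, 58, 60], ("M", "5-10 yrs"): [61, 63, 65],
    ("F", "10+ yrs"): [62, 64, 66], ("M", "10+ yrs"): [71, 73, 75],
}
print(interaction_f(cells))  # (6.0, 2, 12)
```

In these made-up numbers, men’s average salary rises by 10K per band versus 6K for women, which yields F = 6.0 with 2 and 12 degrees of freedom. Converting it with `scipy.stats.f.sf(6.0, 2, 12)` gives a p-value of about 0.016, below the conventional 0.05 threshold.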

Interestingly, when we keep only the gender-diverse roles, the gap widens even more as the years go by. So, in this specific dataset, part of the gender pay gap is clearly related to things that happen during careers in this organization. Any intervention should therefore address what happens along the way. And we found this hidden pattern only by adding a single variable and analyzing it with multivariate statistics.

What if we add more variables? What if we use predictive analytics? What would we learn? Let me give you some clues on further exploration.

Assume you integrate compensation data with additional datasets covering performance reviews, promotions, and internal mobility. You would then be able to explore, across different roles, how women are compensated and promoted compared with men. You could also study biases in how women are evaluated compared with men. For example, are they perceived differently regarding performance, self-management, relationships, and leadership potential?

Furthermore, what is the correlation between yearly reviews and promotions for each gender? Such analyses in other real organizations have revealed gender differences in these perceptions. Do men and women who received similar reviews get similar promotions? These questions can shed more light on what happens during employees’ tenure and help explain the growing compensation gap between genders. The answers can build a story and a business case for intervention.
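Once compensation and review data are joined per employee, the question “do men and women with similar reviews get similar promotions?” becomes a comparison of promotion rates within each review score. The sketch below assumes hypothetical joined rows of `(gender, review_score, promoted)`; the helper `promotion_rates` and the data are illustrative. A difference at the same score is a prompt for further exploration rather than proof of bias, since sample sizes and other factors matter.

```python
from collections import defaultdict


def promotion_rates(rows):
    """Share of employees promoted within each (review_score, gender) group.

    rows: iterable of (gender, review_score, promoted), promoted being a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # (score, gender) -> [promoted, total]
    for gender, score, promoted in rows:
        counts[(score, gender)][0] += int(promoted)
        counts[(score, gender)][1] += 1
    return {key: hits / total for key, (hits, total) in sorted(counts.items())}


# Hypothetical joined review and promotion records
rows = [
    ("F", 4, True), ("F", 4, False), ("F", 4, False), ("F", 4, False),
    ("M", 4, True), ("M", 4, True), ("M", 4, False), ("M", 4, False),
    ("F", 5, True), ("F", 5, True), ("M", 5, True), ("M", 5, False),
]
for (score, gender), rate in promotion_rates(rows).items():
    print(f"review {score}, {gender}: {rate:.0%} promoted")
```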

HR professionals can work better with data scientists to spot hidden patterns

The practice of data science is multidisciplinary. It encompasses three general skill sets: business domain expertise, statistical modeling, and hacking skills. Therefore, a crucial part of your challenge in People Analytics is establishing communication between professionals who hold these different skills.

You have probably heard a lot about the People Analytics journey, which enables HR professionals to become more strategic because they speak the language of the business and make an impact with the right questions and insights derived from people data. But they can support decision-making only when they communicate those questions to data scientists.

I encourage you to be proactive in your conversations with the data scientist who supports your work. Ask “why?”, suggest hypotheses, challenge explanations, and offer alternative explanations that the data scientist can confirm or disprove. As a domain expert in human resources, organizations, and the workforce, feel free to be creative and lead the exploration of the data. Your domain expertise is an invaluable complement to the data scientist’s skills.

Part of your role in leading and leveraging People Analytics is being a translator who enables this communication. Make sure that the data scientists understand the business needs behind workforce-related analysis, and articulate the right business questions so that the research findings yield the best storytelling with data.

Generalizing the case study for up-skilling and re-skilling in critical thinking

This case study demonstrates how adding just a single variable to the analysis, beyond dashboarding, can give the story more impact. Your proactivity is critical. You don’t have to be a data scientist, but leveraging an analytical mindset when working with data scientists can move the needle beyond simple metrics.

The gender pay gap is part of the broader topic of Diversity, Equality, and Inclusion. Other groups should also be analyzed, so that bias and discrimination based on ethnicity, age, sexual orientation, disability, and other minority status can be addressed. But to strengthen our analytical muscles, it is much easier to start with data that the organization collects continuously, for as long as it pays its workforce. Pay data is usually complete and reliable, the topic is regulated in many countries, and it is easy to find benchmarks across industries and economies.

Do you need to procure dedicated compensation software to examine the gender pay gap? It might help. But I wanted to demonstrate that you can analyze it for free with R, perhaps with some help from a data scientist.

Many HR departments establish a People Analytics function staffed by data professionals or external consultants. As people processes are analyzed, the gender pay gap can be explored across recruitment, development, and retention. We focused on pay data in this demo, but I offered some directions for further potential projects.

We also see demand for this topic in the HR-Tech industry. Here, however, we focus on a DIY (Do It Yourself) approach. This approach may help you sharpen your analytical mindset, leverage data that is immediately available to you, and eventually become an informed player in the procurement of People Analytics tools as you continue your journey.

I offer the DIY approach in my introductory course to People Analytics, and I encourage HR leaders to be proactive when I support them in research and data science projects. I believe this approach ultimately enables you to make people data valid for informed decisions, improve employee experience and business performance, and enhance competitive advantage.


Littal Shemer Haim brings Data Science into HR activities, to guide organizations to base decision-making about people on data. Her vast experience in applied research, keen usage of statistical modeling, constant exposure to new technologies, and genuine interest in people’s lives, all led her to focus nowadays on HR Data Strategy, People Analytics, and Organizational Research.




All images and texts on this website are copyrighted © Littal Shemer Haim ALL RIGHTS RESERVED
