Littal Shemer Haim

People Analytics, HR Data Strategy, Organizational Research – Consultant, Mentor, Speaker, Influencer

Gender Pay Gap: More Hidden Patterns

Littal Shemer Haim
August 3, 2022
3:07 pm

Simulating work with a data scientist on an HR and People Analytics use case: the gender pay gap. Analyzing continuous variables instead of categorical variables, swapping between ANOVA and linear regression, and additional insights based on actual data.

In my previous article, Finding Hidden Patterns In Gender Pay Gap Data, I demonstrated that if you go beyond reporting to exploring, a single additional variable in your analysis can reveal a new insight. In this article, I continue the exploration to find more hidden patterns. The following analysis is based on my workshop at the People Analytics World in April 2022. (Also, read my list of past public speaking engagements.)

In the workshop, I simulated working with a data scientist. Before diving deeper into the data, here’s a quick review of the previously shared analysis (find the code on my GitHub profile). The simulation was based on open data that I anonymized and randomized. It included details about occupation, seniority, gender, mode of employment, and salary. The output presented women’s and men’s annual wages, and we noticed a slight difference between men and women, in general, and in a subset of gender-diverse roles. Then, we explored the interactions of variables by adding only one additional variable, tenure, and found that while both genders start at a similar earning level and are promoted while gaining tenure, men are promoted at higher rates.

Tenure as a continuous variable instead of categorical

Let’s take a more advanced point of view. When adding a single variable to the gender pay gap analysis, an alternative is using tenure as a continuous variable instead of creating tenure groups. When I previously added tenure to the gender pay gap analysis, I created tenure categories and ran ANOVA. But sometimes, continuous variables work better than categorical variables. How different are the results in this case? Will I get different results if I run linear regression instead of ANOVA?

As you recall from learning statistics fundamentals, linear regression is a statistical procedure that fits the best straight line through the data to describe how an outcome relates to one or more other variables. It helps us better understand the relationship between the variables and make predictions. When visualizing the continuous option, the insight is pretty much the same: as years of tenure go by, the gap between genders gets larger.

Swapping between ANOVA and linear regression

What’s beyond visualization? Can we swap between ANOVA and linear regression? Actually, yes! Let me explain why. In both options, we have a dependent variable, the annual salary, and the data reveal that it is affected by tenure and gender together. To be more formal, we can say that the effect of one explanatory variable (tenure) on the expected response (annual salary) changes depending on the value of another explanatory variable (gender). Therefore, these variables interact. Whether you run regression or ANOVA, you’ll reach the same conclusion about the interaction; only the statistical language used to report it differs.

Statistical analysis software, or open-source code, lets you run ANOVAs and linear regressions as separate procedures. In the ANOVA model, the predictors are often called factors, but we can call them predictors, as in the regression case. ANOVA assumes that all the predictors are categorical and therefore take a limited number of values, which the software codes into group indicators. The effect of each is then added to the outcome’s general, or grand, mean. This grand mean is essentially the same as the intercept in the regression model, to which the effects of all predictors are added.

You don’t have to dig deeper into the statistics formalities. All you need to know is there are differences in the workflow of conducting regressions and ANOVAs, including how we code the predictors into our data. This continuous or categorical coding changes the output we present but not the basic phenomenon we explore. So feel free to use whatever approach you prefer.

Mode of employment as an additional categorical variable

I want to proceed with tenure as a continuous variable and add another categorical variable: full-time versus part-time jobs. I visualized it by drawing the data points for part-time jobs as larger dots. The chart reveals that the compensation growth rate with tenure is lower for part-time employees. Moreover, it is clear from the chart that most of those part-time dots belong to women. And indeed, analyzing part-time jobs by gender reveals that 18% of women vs. 3% of men work part-time.
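The share of part-time employees within each gender is a simple count. Here is a minimal sketch, assuming each employee is recorded as a (gender, assignment) pair. The records are synthetic and will not reproduce the 18% and 3% figures above, and the helper name `part_time_share` is hypothetical.

```python
from collections import Counter

# Synthetic (gender, assignment) pairs, invented for illustration only.
employees = (
    [("F", "part-time")] * 2 + [("F", "full-time")] * 8
    + [("M", "part-time")] * 1 + [("M", "full-time")] * 9
)


def part_time_share(rows):
    """Return the fraction of part-time employees within each gender."""
    totals = Counter(gender for gender, _ in rows)
    part_time = Counter(gender for gender, assignment in rows
                        if assignment == "part-time")
    return {gender: part_time[gender] / totals[gender] for gender in totals}


shares = part_time_share(employees)
for gender, share in sorted(shares.items()):
    print(f"{gender}: {share:.0%} work part-time")
```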

What would happen to the linear relationship between compensation and tenure for each gender if we split it by the type of assignment, full-time vs. part-time? The lines for men and women who work full-time move somewhat closer together. So, the hidden pattern we thought we discovered now looks somewhat different.

We can support these findings with multiple regression, a statistical technique that enables us to analyze the relationship between a single dependent variable and several independent variables. Alternatively, we can use three-way ANOVA, in which three separate variables (gender, tenure, and assignment) affect the outcome (salary). You can follow the code if you’d like to produce both types of analysis output. The bottom line is that gender is not the only predictor interacting with tenure. Assignment type, full-time or part-time, interacts with it as well. The cumulative gap that opens between men and women as they gain tenure may therefore stem partly from the positions they hold, full-time or part-time. An intervention to close the pay gap should take these insights into account.

Additional insights on the Gender Pay Gap based on your actual data

The dataset that I used in the workshop is quite limited. However, assuming you integrate compensation data and additional datasets covering performance reviews, promotions, and internal mobility, you would be able to explore, across different roles, how women are compensated and promoted in comparison to men. You can also study biases in how women are evaluated compared to men. For example, are they perceived differently regarding performance, self-management, relationships, and potential leadership?

Furthermore, what is the correlation between yearly reviews and promotions for the two genders? In actual organizations, such an analysis may reveal differences in how the genders are perceived. For example, do men and women who received similar reviews get similar promotions? Such questions can shed more light on what happens during an employee’s tenure and help explain the growing compensation gap between genders. The answers can build a story and a business case for intervention.

But as you torture your data for additional insight, here’s an important warning: Sometimes, a statistical relationship that you found across the entire workforce in your organization is reversed, or simply different, within subgroups, e.g., gender, tenure, or assignment type groups. This is known as Simpson’s Paradox. So, my recommendation when you lead the conversation with your data scientist is to always raise hypotheses about how the statistical relationship behaves within subgroups.
