In my previous article, Finding Hidden Patterns In Gender Pay Gap Data, I demonstrated that when you go beyond reporting to exploring, adding even a single variable to your analysis can reveal new insights. In this article, I continue the exploration to find more hidden patterns. The following analysis is based on my workshop at the People Analytics World in April 2022. (Also, read my list of past public speaking engagements).
In the workshop, I simulated working with a data scientist. Before diving deeper into the data, here’s a quick review of the previously shared analysis (find the code on my GitHub profile). The simulation was based on open data that I anonymized and randomized. It included details about occupation, seniority, gender, mode of employment, and salary. The output presented women’s and men’s annual wages, and we noticed a slight difference between men and women, in general, and in a subset of gender-diverse roles. Then, we explored the interactions of variables by adding only one additional variable, tenure, and found that while both genders start at a similar earning level and are promoted while gaining tenure, men are promoted at higher rates.
Tenure as a continuous variable instead of categorical
Let’s take a more advanced point of view. When adding a single variable to the gender pay gap analysis, an alternative is using tenure as a continuous variable instead of creating tenure groups. When I previously added tenure to the gender pay gap analysis, I created tenure categories and ran ANOVA. But sometimes, continuous variables work better than categorical variables. How different are the results in this case? Will I get different results if I run linear regression instead of ANOVA?
As you may recall from statistics fundamentals, linear regression is a statistical procedure that fits the straight line that best describes the relationship between a dependent variable and one or more explanatory variables. It helps us better understand the relationship between the variables and make predictions. When visualizing the continuous option, the insight is pretty much the same: as years of tenure go by, the gap between genders gets larger.
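To make the continuous option concrete, here is a minimal sketch of a regression of salary on tenure, gender, and their interaction. It is not the workshop code: the function name `fit_gender_tenure`, the toy numbers, and the use of numpy's least-squares solver are all assumptions made for illustration. The coefficient on the interaction term is the extra salary growth per year of tenure for men relative to women, which is what the widening gap in the chart corresponds to.

```python
import numpy as np

def fit_gender_tenure(tenure, is_male, salary):
    """Ordinary least squares of salary on tenure, gender, and tenure x gender."""
    tenure = np.asarray(tenure, dtype=float)
    is_male = np.asarray(is_male, dtype=float)
    salary = np.asarray(salary, dtype=float)
    # Design matrix: intercept, tenure slope, gender shift, gender-specific extra slope
    X = np.column_stack([np.ones_like(tenure), tenure, is_male, tenure * is_male])
    coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
    return dict(zip(["intercept", "tenure", "male", "tenure:male"], coef))

# Hypothetical toy data (NOT the workshop dataset): both genders start at the
# same level, and men gain an extra 500 per year of tenure, plus random noise.
rng = np.random.default_rng(0)
tenure = rng.uniform(0, 20, size=200)
is_male = np.tile([0, 1], 100)
salary = (50_000 + 1_500 * tenure + 500 * tenure * is_male
          + rng.normal(0, 2_000, size=200))

fit = fit_gender_tenure(tenure, is_male, salary)
for name, value in fit.items():
    print(f"{name:12s} {value:10.1f}")
```

A positive `tenure:male` estimate means the two regression lines fan out as tenure grows, while a `male` estimate near zero means they start at a similar level.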
Swapping between ANOVA and linear regression
What’s beyond visualization? Can we swap between ANOVA and linear regression? Actually, yes! Let me explain why. In both options, we have a dependent variable, the annual salary, and the data reveal that it is affected by tenure and gender together. To be more formal, we can say that the effect of one explanatory variable (the tenure) on the expected response (the annual salary) changes depending on the value of another explanatory variable (the gender). Therefore, these variables interact. Whether you run a regression or an ANOVA, you’ll reach the same conclusion about the interaction, only expressed in different statistical terms.
Statistical analysis software and open-source code let you run ANOVAs and linear regressions as separate procedures. In the ANOVA model, the predictors are often called factors, but we can call them predictors, as in the regression case. ANOVA assumes that all the predictors are categorical, so each has a limited number of levels, which the software encodes for you. Their effects are then added to the outcome’s general or grand mean. This grand mean plays essentially the same role as the intercept in the regression model, to which the effects of all predictors are added.
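The link between the grand mean and the intercept can be checked with a few lines of code. The sketch below is an illustration under stated assumptions, not the workshop code: the helper `effect_coded_fit`, the three tenure bands, and the salary figures are hypothetical. It uses effect (sum-to-zero) coding, under which the regression intercept equals the mean of the group means, and with equal group sizes that is the grand mean. With the more common dummy (treatment) coding, the intercept would instead be the mean of the reference group.

```python
import numpy as np

def effect_coded_fit(groups, y):
    """Regress y on one categorical factor using effect (sum-to-zero) coding."""
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float)
    levels = np.unique(groups)          # sorted level names
    last = levels[-1]                   # level coded as -1 in every column
    columns = [np.ones(len(y))]         # intercept
    for level in levels[:-1]:
        col = np.where(groups == level, 1.0, 0.0)
        col[groups == last] = -1.0
        columns.append(col)
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return levels, coef

# Hypothetical balanced toy data: salary in thousands for three tenure bands.
groups = np.array(["junior", "junior", "mid", "mid", "senior", "senior"])
salary_k = np.array([50.0, 54.0, 60.0, 64.0, 70.0, 78.0])

levels, coef = effect_coded_fit(groups, salary_k)
print("intercept (regression):", round(coef[0], 3))
print("grand mean (ANOVA):    ", round(salary_k.mean(), 3))
print("junior effect:         ", round(coef[1], 3))
```

Each remaining coefficient is the distance of one group mean from the grand mean, which is how an ANOVA table describes a factor's effect.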
You don’t have to dig deeper into the statistical formalities. All you need to know is that the workflows for running regressions and ANOVAs differ, including how we code the predictors in our data. Coding a predictor as continuous or categorical changes the output we present but not the basic phenomenon we explore. So feel free to use whichever approach you prefer.
Mode of employment as an additional categorical variable
I want to proceed with tenure as a continuous variable and add another categorical variable: full-time versus part-time jobs. I visualized it by drawing the data points of part-time jobs as larger dots. The chart reveals that compensation grows more slowly with tenure for part-time employees. Moreover, it is clear from the chart that most of those part-time dots belong to women. And indeed, analyzing part-time jobs by gender reveals that 18% of women vs. 3% of men work part-time.
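A share like this is a simple count per gender. The following sketch shows one way to compute it. The function `part_time_share` and the records are hypothetical toy values, chosen only to show the calculation, and they do not reproduce the percentages above.

```python
from collections import Counter

def part_time_share(records):
    """Return the fraction of part-time employees within each gender."""
    totals, part_time = Counter(), Counter()
    for gender, mode in records:
        totals[gender] += 1
        if mode == "part-time":
            part_time[gender] += 1
    return {gender: part_time[gender] / totals[gender] for gender in totals}

# Hypothetical toy records of (gender, mode of employment)
records = (
    [("F", "part-time")] * 2 + [("F", "full-time")] * 8
    + [("M", "part-time")] * 1 + [("M", "full-time")] * 9
)

shares = part_time_share(records)
for gender, share in shares.items():
    print(f"{gender}: {share:.0%} part-time")
```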
What would happen to the linear relationship between compensation and tenure for each gender if we split it by type of assignment, full-time vs. part-time? The lines for women and men who work full-time move somewhat closer to each other. So, the hidden pattern we thought we had discovered now looks somewhat different.
We can support these findings with multiple regression, a statistical technique that enables us to analyze the relationship between a single dependent variable and several independent variables. Alternatively, we can use three-way ANOVA, in which three separate variables (gender, tenure, and assignment) affect the outcome (salary). You can follow my code if you’d like to produce both types of analysis output. The bottom line is that gender is not the only predictor interacting with tenure. The type of assignment, full-time or part-time, interacts with tenure too. The gap that accumulates between men and women as they gain tenure throughout their careers may therefore stem partly from their full-time and part-time positions. An intervention to close the pay gap should take these insights into account.
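The sketch below illustrates how the mode of employment can change the picture. It is a constructed example, not the workshop analysis: the helper `ols`, the group sizes, and every number in the salary formula are assumptions. In this toy data, part-time work slows salary growth, women work part-time more often than men, and the true extra growth for men is 200 per year. The reduced model, which ignores part-time status, attributes part of the part-time effect to gender.

```python
import numpy as np

def ols(columns, y):
    """Least-squares fit; `columns` maps predictor names to arrays."""
    names = list(columns)
    X = np.column_stack([np.ones(len(y))] + [columns[name] for name in names])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept"] + names, coef))

# Hypothetical toy data: for each tenure year 0..19 there are
# 20 women (4 part-time) and 20 men (1 part-time).
tenure = np.repeat(np.arange(20.0), 40)
male = np.tile(np.r_[np.zeros(20), np.ones(20)], 20)
part_time = np.tile(np.r_[np.ones(4), np.zeros(16), np.ones(1), np.zeros(19)], 20)

# Assumed salary formula: part-time slows growth by 1,000 per year,
# men gain an extra 200 per year, and there is no noise.
salary = 50_000 + 2_000 * tenure + 200 * tenure * male - 1_000 * tenure * part_time

reduced = ols({"tenure": tenure, "male": male, "tenure:male": tenure * male}, salary)
full = ols({"tenure": tenure, "male": male, "part_time": part_time,
            "tenure:male": tenure * male,
            "tenure:part_time": tenure * part_time}, salary)

print("gender x tenure, ignoring part-time:", round(reduced["tenure:male"], 1))
print("gender x tenure, with part-time:    ", round(full["tenure:male"], 1))
print("part-time x tenure:                 ", round(full["tenure:part_time"], 1))
```

In this constructed data, the gender-by-tenure estimate drops from 350 to the built-in 200 once part-time status enters the model, which mirrors the lines moving closer together after the split.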