Educating and mentoring HR professionals to embrace the practices of People Analytics is a challenge. There are barriers, and it takes time and effort to overcome them. However, one issue has remained unsolved for years: the lack of open HR data to practice on. Although there are many inspiring case studies of People Analytics, organizations obviously don’t share their people data for the sake of learning. Simulated data may be an alternative, though it is usually oversimplified and lacks real or interesting patterns to explore.
A Practice with Open Data
In my recent teaching initiatives, e.g., the People Analytics session in Lahav Executive Education at the University of Tel Aviv, I wanted to demonstrate to HR managers that their academic background, professional experience, and common sense are enough for exploring organizational occurrences and effects based on data. HR managers don’t have to become data scientists in order to conduct People Analytics projects. But they do need to communicate with Data Scientists, bring them business questions to study, and request research outputs. For that reason, I constantly search for open HR data and use it in learning sessions. Fortunately, I could present a case study of Gender Equality that was theoretically and methodologically based on a real project, while the analytics part was conducted on open data offered by other organizations.
For the Analysts and Data Science enthusiasts among my readers, it is worth mentioning that although this is not the first time I have demonstrated People Analytics practices based on open data, this time my objective is a bit different. I did not use practical Machine Learning in this case study. The analysis process was based on research methodology and Statistics that a Bachelor of Social Science, i.e., someone with a B.A. degree, should understand and be able to communicate comfortably. Nevertheless, I used R for my analysis, because I believe that HR people who may not have learned or used R, and who receive analytics from an internal supplier or an outsourced service, should have a grasp of what a Data Scientist’s desktop looks like, and what in the functionality of RStudio makes it so popular.
My source and inspiration for the dataset was Montgomery County Maryland’s employee salaries in 2017. The open data included annual salary information such as gross pay and overtime pay for all active, permanent employees, and some demographics. The reason for opening this dataset to the public is the Digital Government Strategy of Montgomery County Maryland which aims to serve residents, employees, and other partners better. In this case, it serves the purpose of education, in an open-source community of People Analytics students, professionals, and enthusiasts. However, the dataset used is anonymized and randomized.
Gender Pay Gap
Pay transparency is among the Global Talent Trends of 2019, according to LinkedIn. But “Transparency isn’t the goal. The goal is paying everyone fairly”, as Anil Dash, CEO at Glitch, was wisely quoted in the report. Transparency forces organizations to make sure they keep compensation balanced across genders and other group characteristics. Although people share salaries on sites like Glassdoor and LinkedIn, only 27% of companies are transparent about pay. The first step to establishing pay transparency, as recommended in LinkedIn’s report, is to conduct an internal audit, and explore how the company’s pay compares to competitors and whether it has a major pay gap across gender, race, and those in similar roles. If significant inequities are found, a detailed plan to fix them is recommended.
A pay gap audit or exploration may be a People Analyst’s task. However, in a People Analytics project, descriptive statistics are not enough. We need to go deeper into understanding the reasons for our findings and the directions for a solution. In the following analysis, I included some diagnostics and Inferential Statistics, to understand the reasons for the patterns in pay data. I assumed that, like any American public organization, Montgomery County Maryland is subject to some kind of strict regulation regarding equal pay. But only going beyond the basic descriptive statistics enabled me to find some interesting patterns. So, without further ado, let’s explore the findings.
Gender Pay in Montgomery County Maryland
“Telling a story with data” is almost a cliché in our field. Nevertheless, there is no substitute for exploring the data visually before moving on to test hypotheses. There are plenty of visual tools out there. The great thing about R, however, apart from its price (free!), is the flexibility it offers in creating the story and reproducing it again and again as the data is updated. In the following description of my analysis, I did not explain every term in statistics, since I assume the readers learned them in their undergraduate studies. But “no one remembers”, right? So, the links in every statistical term may walk you through a “memory refreshment experience”, if you choose to follow them.
I started my exploration, as shown in Figure 1, with the pay distributions. I intended to present, in a single slide, both the common and the separate gender pay distributions. I also wanted to explore indications of both center and dispersion, without losing information about outliers. So, I placed a boxplot next to a histogram with a density plot and ordered the genders vertically, one on top of the other, so the comparison would be easy for the naked eye.
If you look closely at Figure 1, you’ll notice a small difference between men and women, both in the deviation of the histograms from the shared distribution, i.e., the normal approximation curve, and in the center of the boxplot, which represents the median. Running a t-test resulted in a p-value below 0.05, which means that the difference in average pay between men and women is statistically significant. This significant result is partly driven by the large number of cases in the dataset (about 9,400 employees), since with so many cases even a modest difference reaches significance. The average yearly pay gap is about US$4.5k. (I repeated the visualization and t-tests for all pay variables I had in my dataset, but for the sake of simplicity, let’s stay with only one variable.)
Figure 1: Gender Pay Distributions
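The t-test described above can be sketched in a few lines. The analysis in this article was done in R; the example below is only an illustrative Python sketch using the standard library, run on synthetic salaries rather than the Montgomery County data. The group sizes, means, and standard deviation are assumptions chosen to mimic a gap of roughly US$4.5k across about 9,400 employees, and the p-value uses a normal approximation to the t distribution, which is reasonable only for samples this large.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's two-sample t statistic with a normal-approximation p-value."""
    mean_gap = mean(a) - mean(b)
    # Standard error of the difference, without assuming equal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = mean_gap / se
    # Two-sided p-value; the normal curve stands in for the t distribution.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return mean_gap, t, p


# Synthetic salaries: every number here is hypothetical, not the real data.
rng = random.Random(42)
men = [rng.gauss(77_000, 28_000) for _ in range(5_000)]
women = [rng.gauss(72_500, 28_000) for _ in range(4_400)]

gap, t_stat, p_value = welch_t_test(men, women)
print(f"mean gap: {gap:,.0f} US$, t = {t_stat:.2f}, p = {p_value:.4g}")
```

With thousands of cases in each group the standard error is small, which is why a gap of a few thousand dollars comes out as statistically significant.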
Obviously, the average pay gap is not the whole story. Additional variables should be added to understand the source of the gap more deeply. Adding background variables, e.g., full vs. part-time job and tenure, may change the story. For the analysis presented in Figure 2, I had to create new variables based on the raw data. I mention it because it is important to take into consideration that, usually, the data you download from your systems won’t be ready for analysis. A significant part of the Data Scientist’s time will be invested in cleaning, mounting, and preparing the data for the analysis.
Exploring gender pay averages across tenure ranges reveals that although both genders are promoted as they gain tenure, men are promoted at higher rates, as the different slopes indicate. Running ANOVA reveals that the interaction between the gender and tenure variables is significant, meaning that the different slopes are not a random occurrence. Such an interaction was not found between gender and full/part-time. However, we do see full-time employees promoted at a higher rate than part-time employees, as the slopes indicate. This interaction, between full/part-time and tenure, is also significant.
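As an illustration of this kind of preparation, here is a minimal Python sketch that derives a tenure range from a hire date. It is not the R code used for the article: the function names, the reference date, and the cut-offs of the tenure ranges are all hypothetical, chosen only to show the idea of building an analysis variable from a raw field.

```python
from datetime import date


def tenure_years(hire_date, as_of):
    """Completed years of service between hire_date and as_of."""
    years = as_of.year - hire_date.year
    # Subtract a year if the hiring anniversary has not yet occurred.
    if (as_of.month, as_of.day) < (hire_date.month, hire_date.day):
        years -= 1
    return years


def tenure_range(years):
    """Bin completed years into ranges (the cut-offs are an assumption)."""
    if years < 5:
        return "0-4"
    if years < 10:
        return "5-9"
    if years < 20:
        return "10-19"
    return "20+"


# Hypothetical employee hired in mid-2010, measured at the end of 2017.
years = tenure_years(date(2010, 6, 15), date(2017, 12, 31))
print(years, tenure_range(years))
```

Grouping tenure into a handful of ranges makes it possible to compare average pay per range and gender in a single chart.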
Figure 2: Gender effect, Tenure effect, Full/part-time effect
But who holds most of the part-time jobs? Apparently, the proportion of part-time employees in Montgomery County Maryland is significantly higher among women (18%) than among men (3%). In other words, the cumulative gap between men and women throughout their careers, as they gain tenure, may stem from their assignment to full-time and part-time jobs. In a linear regression model that explains the annual salary by gender, assignment, and tenure, gender is not a significant predictor, as opposed to the other variables: tenure and assignment. Together these variables explain 37% of the variance of annual pay, which is a fair result, but still, other factors impact it too. Positions and occupations may be among those factors.
Indeed, a critical reader may raise a question about men’s and women’s occupations. The dataset includes some occupations with both genders and other occupations with only men or only women. I repeated the whole analysis after screening out those single-gender occupations, and I got similar results. Yes, analysis within each occupation is also needed. However, there are 390 occupations in this dataset, so I prefer to leave this task to People Analysts in Montgomery County Maryland. (For dynamic charts of this case study, by departments for example, please visit my GitHub.)
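One common way to check whether two proportions such as 18% and 3% differ significantly is a two-proportion z-test. The article does not state which test was used, so the Python sketch below is an illustration under assumptions, not the original R analysis. The head counts are hypothetical, chosen only so that the shares come out at 18% and 3% with about 9,400 employees in total.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis of equal shares.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1, p2, z, p


# Hypothetical counts: part-time employees out of all employees per gender.
share_women, share_men, z, p_value = two_proportion_z_test(792, 4_400, 150, 5_000)
print(f"women: {share_women:.0%}, men: {share_men:.0%}, z = {z:.1f}, p = {p_value:.4g}")
```

With a difference this large and groups this big, the z statistic is far beyond any conventional threshold.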
The gender pay gap analysis in this article is straightforward. Most HR managers with a B.A. education can handle it, with a little help from a data scientist on some occasions. I encourage HR practitioners who are starting their journey in People Analytics to practice this analysis. The data is available, and the insights may be vital. According to Gartner’s Digital Employee Experience Survey in 2018, #1 among the top ten memorable experiences that affect employee experience is “Being discriminated against at work”. No doubt, transparency and closing the pay gap are crucial for employee engagement and, indirectly, for employer branding.
My last note may be the most important. Women still don’t get their fair share, according to an analysis by Visier. Data from this People Analytics platform reveals that the gender pay gap widened in 2017 rather than narrowing: in 2016, women made 81 cents for every dollar a man made, but in 2017, women made 78 cents on the dollar, according to Visier data. Organizations still have a long way to go to close the gender pay gap, so why don’t you start by analyzing the situation in your organization?
(To explore the R code used in this article, check my GitHub).