<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Data Archives - Littal Shemer Haim</title>
	<atom:link href="https://www.littalics.com/category/open-data/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.littalics.com/category/open-data/</link>
	<description>People Analytics, HR Data Strategy, Organizational Research - Consultant, Mentor, Speaker, Influencer</description>
	<lastBuildDate>Thu, 14 Mar 2024 15:38:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.2</generator>

<image>
	<url>https://www.littalics.com/wp-content/uploads/2021/02/cropped-grey-32x32.png</url>
	<title>Open Data Archives - Littal Shemer Haim</title>
	<link>https://www.littalics.com/category/open-data/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Gender Pay Gap and People Analytics: A Practice with Open Data</title>
		<link>https://www.littalics.com/gender-pay-gap-and-people-analytics-a-practice-with-open-data/</link>
					<comments>https://www.littalics.com/gender-pay-gap-and-people-analytics-a-practice-with-open-data/#comments</comments>
		
		<dc:creator><![CDATA[Littal Shemer Haim]]></dc:creator>
		<pubDate>Thu, 31 Jan 2019 16:54:32 +0000</pubDate>
				<category><![CDATA[Module 3]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[People Analytics]]></category>
		<category><![CDATA[Syllabus]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[case study]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[gender]]></category>
		<category><![CDATA[simulation]]></category>
		<guid isPermaLink="false">http://www.littalshemerhaim.com/?p=1476</guid>

					<description><![CDATA[<p>The gender pay gap analysis in this article is straightforward. HR managers with a B.A. education can handle it, with a little help from a data scientist. I encourage HR practitioners who start their journey in People Analytics to practice it. The data is available, and the insights may be vital.</p>
<p>The post <a href="https://www.littalics.com/gender-pay-gap-and-people-analytics-a-practice-with-open-data/">Gender Pay Gap and People Analytics: A Practice with Open Data</a> appeared first on <a href="https://www.littalics.com">Littal Shemer Haim</a>.</p>
]]></description>
										<content:encoded><![CDATA[<span class="span-reading-time rt-reading-time" style="display: block;"><span class="rt-label rt-prefix">(Reading Time: </span> <span class="rt-time"> 7</span> <span class="rt-label rt-postfix">minutes)</span></span>		<div data-elementor-type="wp-post" data-elementor-id="1476" class="elementor elementor-1476" data-elementor-post-type="post">
						<section class="elementor-section elementor-top-section elementor-element elementor-element-6a441d5f elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-id="6a441d5f" data-element_type="section">
						<div class="elementor-container elementor-column-gap-default">
					<div class="elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1791f12" data-id="1791f12" data-element_type="column">
			<div class="elementor-widget-wrap elementor-element-populated">
						<div class="elementor-element elementor-element-7d88621 elementor-widget elementor-widget-text-editor" data-id="7d88621" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
									<p>Educating and mentoring HR professionals to embrace the practices of People Analytics is a challenge. <b><a href="https://www.littalics.com/learning-culture-rituals-and-establishing-people-analytics/">There are barriers</a>,</b> and it takes time and effort to overcome them. However, one issue has remained unsolved for years: the lack of open HR data to practice on. Although there are many inspiring case studies of People Analytics, obviously, organizations don&#8217;t share their people data for the sake of learning. Simulation-based data may be an alternative, though usually it is oversimplified and lacks real or interesting patterns to explore.<br /><br /></p><p> </p><h1><span style="font-family: var( --e-global-typography-text-font-family ), Sans-serif;"><b style="font-size: 1.66667rem;">A Practice with Open Data</b></span></h1><p><span style="font-size: 16px; color: var( --e-global-color-text ); font-family: var( --e-global-typography-text-font-family ), Sans-serif;"><br />In my </span><a style="font-size: 16px; font-family: var( --e-global-typography-text-font-family ), Sans-serif; background-color: #ffffff;" href="https://www.littalics.com/people-analytics-public-speaking-media-coverage-recognition/"><b>recent teaching initiatives</b></a><span style="font-size: 16px; color: var( --e-global-color-text ); font-family: var( --e-global-typography-text-font-family ), Sans-serif;">, e.g., the People Analytics session in Lahav Executive Education at the University of Tel Aviv, I wanted to demonstrate to HR managers that their academic background, professional experience, and common sense are enough for exploring organizational occurrences and effects based on data. HR managers don&#8217;t have to become data scientists in order to conduct People Analytics projects. 
But they do need to </span><a style="font-size: 16px; font-family: var( --e-global-typography-text-font-family ), Sans-serif; background-color: #ffffff;" href="https://www.littalics.com/your-journey-to-people-analytics-makes-you-cry/"><b>communicate with Data Scientists</b></a><span style="font-size: 16px; color: var( --e-global-color-text ); font-family: var( --e-global-typography-text-font-family ), Sans-serif;">, bring them business questions to study, and request research outputs. For that reason, I constantly search for open HR data and use it in learning sessions. Fortunately, I was able to present a </span><a style="font-size: 16px; font-family: var( --e-global-typography-text-font-family ), Sans-serif; background-color: #ffffff;" href="https://www.littalics.com/gender-diversity-in-tech-simple-steps-forward/"><b>case study of Gender Equality</b></a><span style="font-size: 16px; color: var( --e-global-color-text ); font-family: var( --e-global-typography-text-font-family ), Sans-serif;"> that was based, theoretically and methodologically, on a real project, although the analytics part was conducted on open data offered by other organizations.</span></p><p>For the Analysts and Data Science enthusiasts among my readers, it is worth mentioning that although this is not the first time I have demonstrated <a href="https://www.littalics.com/predicting-employee-attrition-r-vs-dmway/"><b>People Analytics practices based on open data</b></a>, this time my objective is a bit different. I did not use practical Machine Learning in this case study. The analysis process was based on research methodology and Statistics that a Bachelor of Social Science, i.e., someone with a B.A. degree, should understand and can comfortably communicate. 
Nevertheless, I used R for my analysis, because I believe that HR people who have never learned or used R, and who receive analytics from an internal supplier or an outsourced service, should have a grasp of what a Data Scientist&#8217;s desktop looks like, and of what in the functionality of RStudio makes it so popular.</p><p>My source and inspiration for the dataset was <a href="https://data.montgomerycountymd.gov/Human-Resources/Employee-Salaries-2017/2qd6-mr43/data" target="_blank" rel="noopener noreferrer"><b>Montgomery County Maryland’s employee salaries</b></a> in 2017. The open data included annual salary information such as gross pay and overtime pay for all active, permanent employees, and some demographics. The reason for opening this dataset to the public is the Digital Government Strategy of Montgomery County Maryland, which aims to serve residents, employees, and other partners better. In this case, it serves the purpose of education, in an <a href="https://www.littalics.com/will-people-analytics-be-open-source/"><b>open-source community of People Analytics</b></a> students, professionals, and enthusiasts. Note, however, that the dataset I used is anonymized and randomized.<br /><br /></p><p> </p><h3><strong>Gender Pay Gap</strong></h3><p><br />Pay transparency is among <a href="https://business.linkedin.com/talent-solutions/recruiting-tips/global-talent-trends-2019" target="_blank" rel="noopener noreferrer"><b>Global Talent Trends in 2019</b></a>, according to LinkedIn. But &#8220;Transparency isn’t the goal. The goal is paying everyone fairly&#8221;, as Anil Dash, CEO at Glitch, was wisely quoted in the report. Transparency forces organizations to make sure they keep compensation balanced across genders and other group characteristics. Although people share salaries on sites like Glassdoor and LinkedIn, only 27% of companies are transparent about pay. 
The first step to establishing pay transparency, as recommended in LinkedIn&#8217;s report, is to conduct an internal audit, and explore how the company&#8217;s pay compares to competitors and whether it has a major pay gap across gender, race, and those in similar roles. If significant inequities are found, a detailed plan to fix them is recommended.</p><p>A pay gap audit or exploration may be a People Analyst&#8217;s task. However, in a People Analytics project, <a href="https://www.littalics.com/hr-dashboards-are-not-people-analytics-but-you-need-both/"><b>descriptive statistics is not enough</b></a>. We need to go deeper into understanding the reasons for our findings and the directions for a solution. In the following analysis, I included some diagnostics and Inferential Statistics, to understand the reasons for the patterns in pay data. I assumed that, like any American public organization, Montgomery County Maryland is subject to some kind of strict regulation regarding equal pay. But only by going beyond basic descriptive statistics was I able to find some interesting patterns. So, without further ado, let&#8217;s explore the findings.<br /><br /></p><h3><strong>Gender Pay in Montgomery County Maryland</strong></h3><p><br />&#8220;<a href="https://hbr.org/2013/04/how-to-tell-a-story-with-data" target="_blank" rel="noopener noreferrer"><b>Telling a story with data</b></a>&#8221; is almost a cliché in our field. Nevertheless, there is no substitute for exploring data visually before moving on to test hypotheses. There are <a href="https://www.creativebloq.com/design-tools/data-visualization-712402" target="_blank" rel="noopener noreferrer"><b>plenty of visual tools</b></a> out there. The great thing about <a href="https://www.r-project.org/"><b>R</b></a>, however, apart from its price (free!), is the flexibility it offers in creating the story and reproducing it again and again as the data is updated. 
In the following description of my analysis, I did not explain every term in statistics, since I assume the readers learned them in their undergraduate studies. But &#8220;no one remembers&#8221;, right? So, the links on every statistical term may walk you through a &#8220;memory refreshment experience&#8221;, if you choose to follow them. </p><p>I started my exploration, as shown in Figure 1, with the pay distributions. I intended to present, in a single slide, both the combined and the gender-specific pay distributions. I also wanted to explore indications of both center and dispersion, without losing information about outliers. So, I placed a <b><a href="https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51" target="_blank" rel="noopener noreferrer">boxplot</a> </b>near a <b><a href="https://en.wikipedia.org/wiki/Histogram" target="_blank" rel="noopener noreferrer">histogram</a> </b>with a <b><a href="https://datavizcatalogue.com/methods/density_plot.html" target="_blank" rel="noopener noreferrer">density</a> </b>plot and ordered the genders vertically, one on top of the other, so the comparison would be easy for the naked eye.</p><p>If you look closely at Figure 1, you&#8217;ll notice a small difference between men and women, both in the deviation of the histograms from the shared distribution, i.e., the normal approximation curve, and in the center of the boxplot, which represents the <a href="https://en.wikipedia.org/wiki/Median" target="_blank" rel="noopener noreferrer"><b>median</b></a>. Running a <a href="https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/t-test/" target="_blank" rel="noopener noreferrer"><b>t-test</b></a> resulted in a <a href="https://www.investopedia.com/terms/p/p-value.asp" target="_blank" rel="noopener noreferrer"><b>p-value</b></a> below 0.05, which means that the difference in average pay between men and women is statistically significant. 
This significance is partly driven by the large number of cases in the dataset (about 9,400 employees), since a large sample makes even a modest difference detectable. The average yearly pay gap is about 4.5k US$. (I repeated the visualization and t-tests for all pay variables I had in my dataset, but for the purpose of simplicity, let&#8217;s remain with only one variable).</p><p> </p><h4 style="text-align: center;"><strong>Figure 1: Gender Pay Distributions</strong></h4><p><img fetchpriority="high" decoding="async" src="https://www.littalics.com/wp-content/uploads/2021/06/Figure1.png" alt="" width="913" height="558" /></p><p>Obviously, the average pay gap is not the whole story. Additional variables should be added, to deeply understand the source of the gap. Adding background variables, e.g., full vs. part-time job and tenure, may change the story. For the analysis presented in Figure 2, I had to create new variables based on the raw data. I mention it because it is important to take into consideration that, usually, the data you download from your systems won&#8217;t be ready for analysis. A significant part of the Data Scientist&#8217;s time will be invested in cleaning, mounting, and preparing the data for the analysis.</p><p>Exploring gender pay averages across tenure ranges reveals that although both genders are promoted as they gain tenure, men are promoted at higher rates, as the different slopes indicate. Running <b><a href="https://en.wikipedia.org/wiki/Analysis_of_variance" target="_blank" rel="noopener noreferrer">ANOVA</a> </b>reveals that the <b><a href="http://statisticsbyjim.com/regression/interaction-effects/" target="_blank" rel="noopener noreferrer">interaction</a> </b>between the gender and tenure variables is significant, meaning that the different slopes are not a random occurrence. Such an interaction was not found between gender and full/part-time. However, we do see full-time employees promoted at a higher rate than part-time employees, as the slopes indicate. 
This interaction, between full/part-time and tenure, is also significant.</p><p> </p><h4 style="text-align: center;"><strong>Figure 2: Gender effect, Tenure effect, Full/part-time effect</strong></h4><p><img decoding="async" src="https://www.littalics.com/wp-content/uploads/2021/06/Figure2.png" alt="" width="913" height="558" /></p><p> </p><p>But who holds most of the part-time jobs? Apparently, the proportion of part-time employees in Montgomery County Maryland is significantly higher among women (18%), in comparison to men (3%). In other words, the cumulative gap between men and women throughout their careers, as they gain tenure, may stem from their assignment to full-time and part-time jobs. In a <a href="http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm" target="_blank" rel="noopener noreferrer">linear regression model</a> that explains the annual salary by gender, assignment, and tenure, gender is not a significant predictor, as opposed to the other variables: tenure and assignment. Together these variables explain 37% of the variance of annual pay, which is a fair result, but still, other factors impact it too. Positions and occupations may be among those factors.</p><p>Indeed, a critical reader may raise a question about men&#8217;s and women&#8217;s occupations. The dataset includes some occupations with both genders and other occupations with only men or only women. I repeated the whole analysis after screening out those single-gender occupations, and I got similar results. Yes, analysis within each occupation is also needed. However, there are 390 occupations in this dataset, so I prefer to leave this task to People Analysts in Montgomery County Maryland. 
(For dynamic charts of this case study, <a href="https://littal.shinyapps.io/GenderPayGapDepartments/" target="_blank" rel="noopener"><b>by departments for example</b></a><a href="https://littal.shinyapps.io/GenderPayGapDepartments/" target="_blank" rel="noopener">,</a> please visit <span style="font-size: 16px; font-style: normal; font-weight: 400; color: var( --e-global-color-text ); font-family: var( --e-global-typography-text-font-family ), Sans-serif;">my </span><a style="font-size: 16px; font-style: normal; font-family: var( --e-global-typography-text-font-family ), Sans-serif; background-color: #ffffff;" href="https://github.com/Littal" target="_blank" rel="noopener"><b>GitHub</b></a>)<br /><br /></p><p> </p><h3><strong>Additional thoughts</strong></h3><p><br />The gender pay gap analysis in this article is straightforward. Most HR managers with a B.A. education can handle it, with a little help from a data scientist on some occasions. I encourage HR practitioners who start their journey in People Analytics to practice this analysis. The data is available, and the insights may be vital. According to <a href="https://www.gartner.com/en/search?keywords=gender%20pay%20gap" target="_blank" rel="noopener noreferrer"><b>Gartner&#8217;s Digital Employee Experience Survey</b></a> in 2018, #1 in the top ten memorable experiences that affect employee experience is &#8220;Being discriminated against at work&#8221;. No doubt, transparency and closing the pay gap are crucial for employee engagement and, indirectly, for employer branding.</p><p>My last note may be the most important. Women still don’t get their fair share, according to an <a href="https://www.visier.com/clarity/radical-workforce-inclusion/" target="_blank" rel="noopener noreferrer"><b>analysis by Visier</b></a>. 
Data from this People Analytics platform reveals that the gender pay gap widened in 2017 rather than becoming smaller: In 2016, women made 81 cents to the dollar a man made, but in 2017, women made 78 cents to the dollar, according to Visier data. Organizations still have a long way to go to close the gender pay gap, so why don&#8217;t you start by analyzing the situation in your organization?</p><p><span style="font-size: 16px; font-style: normal; font-weight: 400;">(To explore the R code used in this article, check my </span><a href="https://github.com/Littal" target="_blank" rel="noopener"><b>GitHub</b></a><span style="font-size: 16px; font-style: normal; font-weight: 400;">).</span></p>								</div>
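<p>The t-test step described around Figure 1, and the remark that a large number of cases helps make a gap significant, can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind the article (that analysis was done in R). The pay values are hypothetical, the even split of about 9,400 employees into two groups is an assumption, and the p-value uses a normal approximation to avoid extra libraries.</p>

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Welch's two-sample t-test (does not assume equal variances).

    Returns (mean difference, t statistic, degrees of freedom, two-sided p).
    The p-value uses a normal approximation to keep the sketch
    dependency-free; a real analysis should use the t distribution.
    """
    se_a2 = variance(a) / len(a)  # squared standard error of each group mean
    se_b2 = variance(b) / len(b)
    diff = mean(a) - mean(b)
    t = diff / math.sqrt(se_a2 + se_b2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a2 + se_b2) ** 2 / (
        se_a2 ** 2 / (len(a) - 1) + se_b2 ** 2 / (len(b) - 1)
    )
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation
    return diff, t, df, p


# Hypothetical annual pay values (US$), NOT the Montgomery County data.
WOMEN = [40e3, 52e3, 61e3, 68e3, 75e3, 83e3, 90e3, 98e3, 110e3, 123e3]
MEN = [pay + 4_500 for pay in WOMEN]  # same spread, shifted by a 4.5k gap

# Same gap and spread at two sample sizes: 10 per group vs 4,700 per group
# (the even split of about 9,400 employees is an assumption).
small = welch_t_test(MEN, WOMEN)
large = welch_t_test(MEN * 470, WOMEN * 470)

results = (("n=10 per group", small), ("n=4,700 per group", large))
for label, (diff, t, df, p) in results:
    print(f"{label}: gap={diff:,.0f}  t={t:.2f}  df={df:.0f}  p={p:.3g}")
```

<p>The gap is identical in both runs; only the sample size changes. With ten people per group the test cannot distinguish the gap from noise, while with thousands per group the same gap is clearly significant. That is why a significant p-value should always be read together with the size of the gap itself.</p>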
				</div>
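<p>The argument that the overall gap may stem from full-time versus part-time assignment can also be illustrated with a toy example. The sketch below is hypothetical: it borrows only the part-time shares mentioned in the article (3% of men, 18% of women) and assumes, purely for illustration, that pay depends on assignment alone. The helper names and pay levels are made up.</p>

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pay levels (US$); by construction pay depends only on assignment.
PAY = {"full-time": 80_000.0, "part-time": 40_000.0}


def build_group(gender, n, part_time_share):
    """Create n employee records with the given share of part-time jobs."""
    n_part = round(n * part_time_share)
    rows = []
    for i in range(n):
        assignment = "part-time" if i in range(n_part) else "full-time"
        rows.append({"gender": gender, "assignment": assignment,
                     "pay": PAY[assignment]})
    return rows


def mean_pay_by(rows, *keys):
    """Average pay for every combination of the given grouping keys."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[k] for k in keys)].append(row["pay"])
    return {group: mean(pays) for group, pays in buckets.items()}


# Part-time shares taken from the article: 3% of men, 18% of women.
employees = build_group("M", 100, 0.03) + build_group("F", 100, 0.18)

overall = mean_pay_by(employees, "gender")
stratified = mean_pay_by(employees, "gender", "assignment")
raw_gap = overall[("M",)] - overall[("F",)]

print("Raw gap between men and women:", raw_gap)
for group in sorted(stratified):
    print(group, stratified[group])
```

<p>In this toy data the raw averages differ by several thousand dollars, yet men and women earn exactly the same within each assignment. A regression that includes assignment as a predictor captures the same idea: once assignment is accounted for, gender adds nothing. Real data is rarely this clean, so the sketch shows the logic, not the county&#8217;s actual numbers.</p>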
					</div>
		</div>
					</div>
		</section>
				</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.littalics.com/gender-pay-gap-and-people-analytics-a-practice-with-open-data/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title>Predicting Employee Attrition: R vs DMWay</title>
		<link>https://www.littalics.com/predicting-employee-attrition-r-vs-dmway/</link>
					<comments>https://www.littalics.com/predicting-employee-attrition-r-vs-dmway/#comments</comments>
		
		<dc:creator><![CDATA[Littal Shemer Haim]]></dc:creator>
		<pubDate>Sat, 11 Feb 2017 10:09:36 +0000</pubDate>
				<category><![CDATA[Module 3]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[People Analytics]]></category>
		<category><![CDATA[Syllabus]]></category>
		<category><![CDATA[attrition]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[predictive]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>
		<guid isPermaLink="false">http://www.littalshemerhaim.com/?p=471</guid>

					<description><![CDATA[<p>This article demonstrates how to predict employee attrition, using logistic regression in R programming vs DMWay software. It also encompasses some background about employee data and the cost of attrition.</p>
<p>The post <a href="https://www.littalics.com/predicting-employee-attrition-r-vs-dmway/">Predicting Employee Attrition: R vs DMWay</a> appeared first on <a href="https://www.littalics.com">Littal Shemer Haim</a>.</p>
]]></description>
										<content:encoded><![CDATA[<span class="span-reading-time rt-reading-time" style="display: block;"><span class="rt-label rt-prefix">(Reading Time: </span> <span class="rt-time"> 11</span> <span class="rt-label rt-postfix">minutes)</span></span>		<div data-elementor-type="wp-post" data-elementor-id="471" class="elementor elementor-471" data-elementor-post-type="post">
						<section class="elementor-section elementor-top-section elementor-element elementor-element-5e888889 elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-id="5e888889" data-element_type="section">
						<div class="elementor-container elementor-column-gap-default">
					<div class="elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-48165210" data-id="48165210" data-element_type="column">
			<div class="elementor-widget-wrap elementor-element-populated">
						<div class="elementor-element elementor-element-7d6f1d5a elementor-widget elementor-widget-text-editor" data-id="7d6f1d5a" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
									<p>We are all familiar with the story of David and Goliath, a shepherd who defeated a mighty warrior, and the allegory of the underdog beating the giant. Is this story applicable to “People Analytics”?</p><p>In our fictional battle today, the giant would be <a href="https://www.r-project.org/" target="_blank" rel="noopener noreferrer">R</a>, the super-power open-source programming language. The underdog would be <a href="http://dmway.com/" target="_blank" rel="noopener noreferrer">DMWay</a>, the “new kid in town”: An Israeli start-up that develops an AI approach to predictive analytics, and claims to enable faster and better predictive models. Will their combat end like the old myth? Which of the two rivals enables us to build better predictive models? What can we learn from their contest about “People Analytics” practices? Let’s begin the fight in the arena of predicting employee attrition.<br /><span style="font-size: 16px; font-style: normal; font-weight: 400;">(To explore the R code used in this article, check my </span><a style="font-size: 16px; font-style: normal; font-weight: 400; background-color: #ffffff;" href="https://github.com/Littal" target="_blank" rel="noopener">GitHub</a><span style="font-size: 16px; font-style: normal; font-weight: 400;">).</span></p><h4><strong>Table of Contents</strong></h4><ul><li><a href="#part1">Employee attrition data</a></li><li><a href="#part2">The cost of employee attrition</a></li><li><a href="#part3">How to predict attrition?</a></li><li><a href="#part4">Logistic regression: R vs DMWay</a></li><li><a href="#part5">The next step: Deployment</a></li><li><a href="#part6">And the winner is…?</a></li><li><a href="#part7">Infographics</a></li></ul><p><a name="part1"></a></p><h3> </h3><h3><b>Employee attrition data</b></h3><p>The reason I chose employee attrition as the first round in this fictional battle lies in the sad reality of HR open data. 
In a previous post, I mentioned how great it would be to practice analysis and coding based on <a href="https://www.littalics.com/will-people-analytics-be-open-source/" rel="noopener">HR open data</a>. Indeed, it was really encouraging for me to stumble upon an employee attrition case, and yet, it was the only open data I found. Nonetheless, employee attrition is a severe problem for many organizations, as I specify below, so both competing technologies, R and DMWay, may offer valuable analysis.</p><p>The dataset for the following comparison between R and DMWay was downloaded from <a href="https://www.kaggle.com/" target="_blank" rel="noopener noreferrer">Kaggle</a>, a platform for data science competitions, where data scientists can help organizations to solve problems with accurate algorithms based on real data. But not all datasets on Kaggle are real. Datasets published by users on the open data platform are different from those associated with competitions. Some users publish simulated data just for fun or for practice. Unfortunately, that is the case with this employee attrition data.</p><p>The dataset titled <a href="https://www.kaggle.com/ludobenistant/hr-analytics" target="_blank" rel="noopener noreferrer">Human Resources Analytics</a> includes some variables from the realm of HR: numeric variables, e.g., employee satisfaction, employee evaluation, average monthly hours, tenure, and number of projects, and categorical variables, e.g., work accidents, promotion in last 5 years, department, and salary level. All of these variables may predict the outcome of employee attrition: voluntarily leaving the company. 
The case study addresses a specific question: “Why are our best and most experienced employees leaving prematurely?” Answering this question would enable this fictional organization to take action to eliminate or decrease the undesirable outcomes.</p><p>At first sight, or rather, by exploring the variables, it seems that the dataset was simulated well, as if it were derived from a real HR database. The <a href="http://www.icpsr.umich.edu/icpsrweb/NAHDAP/support/faqs/2006/01/what-is-codebook" target="_blank" rel="noopener noreferrer">codebook</a> available is not detailed enough to fully understand the units of all the variables. However, you may assume that some variables would have been re-coded or calculated (e.g., salary level), and others were excluded (e.g., demographics), for the sake of confidentiality. In addition, the data is well structured and clean, in a way you would not expect from real data extracted from an HR information system. Nevertheless, let’s continue, assuming that this <a href="ftp://cran.r-project.org/pub/R/web/packages/tidyr/vignettes/tidy-data.html" target="_blank" rel="noopener noreferrer">tidy data</a> is real and complete.</p><p>Data exploration reveals that employees who left the company are actually better than those who stayed. As shown in Figure 1, although less satisfied and rewarded, employees who left are better evaluated, are involved in more projects, work more, have longer tenure, and are less involved in accidents. 
These results imply the enormous costs of employee attrition.</p><h4><strong>Figure 1: Employees who left the company in comparison to those who stayed</strong></h4><p>(Employees who left – “1”, Employees who stayed – “0”)</p><p><img loading="lazy" decoding="async" class="alignnone wp-image-4181 size-full" src="https://www.littalics.com/wp-content/uploads/2021/04/Figure1.png" alt="" width="719" height="718" srcset="https://www.littalics.com/wp-content/uploads/2021/04/Figure1.png 719w, https://www.littalics.com/wp-content/uploads/2021/04/Figure1-300x300.png 300w, https://www.littalics.com/wp-content/uploads/2021/04/Figure1-150x150.png 150w" sizes="(max-width: 719px) 100vw, 719px" /></p><p><a name="part2"></a></p><h3> </h3><h3><b>The cost of employee attrition</b></h3><p>Employee attrition is a huge issue for organizations in every industry. An organization can’t completely avoid employee turnover and attrition, but the rate of employees walking out the door may determine the organization’s doom.</p><p>Employees who leave take significant value with them: professional knowledge, specific practices and know-how, relations within the organization and outside (with clients, suppliers, business partners, etc.), and more. But the damage does not end with this. There are enormous costs, sometimes up to the sum of a few salaries per employee, which are tied to recruitment, onboarding, training, and ramping up of a new employee. Furthermore, an organization must take into account some alternative (opportunity) costs, namely the value of transactions that could have been made by a senior employee who actually left.</p><p>Indeed, <a href="http://www.predictiveanalyticsworld.com/patimes/reducing-the-costs-of-employee-churn-with-predictive-analytics-0521151/5398/" target="_blank" rel="noopener noreferrer">understanding the attrition cost</a> is recommended as the first step in any predictive analytics project. 
Modeling cost is an effective way to determine ROI (return on investment) of a predictive analytics project that addresses the issue of employee attrition. No wonder analysts, in <a href="https://www.hrdconnect.com/2016/04/27/what-walmart-learned-from-hr-analytics/" target="_blank" rel="noopener noreferrer">companies like Walmart</a>, make the effort to demonstrate to management that reducing the attrition rate by even 1% sometimes saves millions of dollars. Although modeling attrition costs is inapplicable in our simulated data, it is important to keep in mind that in a real project it would be a good practice to start with that. Furthermore, a decrease in attrition rate in this imaginary organization (31%) may not only save the recruitment and onboarding cost but probably reduce the alternative costs, since, in this example, employees who leave are considered to be the better ones.</p><p><a name="part3"></a></p><h3><b>How to predict attrition</b></h3><p>There are many modeling techniques that can be used to explain or predict attrition. However, in this article, I have chosen to cover logistic regression. There are two reasons for my choice: First, logistic regression is easier to interpret. It may not be the most accurate model, but it offers a pretty good solution without much effort. I believe that in the HR realm, the reasonable way to progress with predictive analytics is generally through good variable selection that can be easily explained, and not by showing off with overly complex models. Second, the objective of this article is not only to demonstrate the implementation of predictive analytics for employee attrition but also to compare two technologies: R vs DMWay. 
Since the innovative solution of DMWay relies on regression models, it is more practical to compare the workflow and results of these two technologies using the same modeling technique.</p><p>When analyzing attrition, the goal is essentially to explain or predict a binary or logical variable (voluntarily leaving the company) by various other available variables (in this simulation, all or some of the following: employee satisfaction, employee evaluation, average monthly hours, tenure, number of projects, work accidents, promotion in last 5 years, department, and salary level). There are plenty of resources on how <a href="https://www.r-bloggers.com/evaluating-logistic-regression-models/" target="_blank" rel="noopener noreferrer">Logistic Regression</a> works, but in a nutshell, logistic regression is suited for examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. It creates an equation that in effect predicts the likelihood of a two-category outcome using the selected predictors. Each of the predictors is associated with a significance value (p-value) that indicates whether the predictor is useful or not.</p><p>The implementation of logistic regression, both in R and DMWay, follows the same recipe: partitioning the data into training and testing sets, using logistic regression to model attrition as a function of other predictors in the training dataset, evaluating the model by predicting attrition in the testing dataset, and analyzing how good the model is in terms of prediction accuracy and predictors’ importance. I followed these exact steps in the two technologies but had a totally different user experience, and even different results.</p><p><a name="part4"></a></p><h3><b>Logistic regression: R vs DMWay</b></h3><p>The R output of the logistic regression model is presented in Figure 2. This is the fourth model I created, in an effort to generate the most accurate model, as specified below. 
In this model, all variables were included, except the department variable. Furthermore, the data were subset to include only highly evaluated, senior employees, who work full time. This is appropriate for the simulation objective: “Why are our best and most experienced employees leaving prematurely?” For the purpose of simplicity, and due to the lack of demographics, e.g., gender, the model does not include interactions of variables.</p><h4><strong>Figure 2: R output of the logistic regression model</strong></h4><p><img loading="lazy" decoding="async" class="alignnone wp-image-4182 size-full" src="https://www.littalics.com/wp-content/uploads/2021/04/Figure2.png" alt="" width="655" height="534" srcset="https://www.littalics.com/wp-content/uploads/2021/04/Figure2.png 655w, https://www.littalics.com/wp-content/uploads/2021/04/Figure2-300x245.png 300w" sizes="(max-width: 655px) 100vw, 655px" /></p><p>A quick orientation in the model results: The variable names are listed on the far left under “Coefficients”. In the case of categorical variables, the first value (e.g., “0” in work accident) is considered as a baseline, and other values are included in a separate line, indicating their impact relative to the variable baseline. The significance values are provided on the far right under Pr(&gt;|z|). In our case, all variables’ estimates are statistically significant (p-values equal to or lower than 0.05), i.e., they are unlikely to have been obtained by pure chance.</p><p>Not surprisingly, the intention to leave the company has a negative relation with many variables. Specifically, being promoted, having a medium or high salary, accumulating seniority, and also having a work accident, decrease the intention to leave. On the other hand, excessive working hours, the number of projects, the last evaluation, and also satisfaction level, have a positive correlation with the intention to leave.</p><p>The model output seems straightforward. 
It implies what the fictional organization should do in order to keep its best employees. But is it accurate? To figure this out, it’s time to use the model to predict the outcomes in the test dataset. The results enable us to build a “<a href="http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/" target="_blank" rel="noopener noreferrer">confusion matrix</a>”, in which predicted results are compared to observed results. As shown in Figure 3, which is again R output, the prediction is not perfect. The model “was right” in about 87% of the cases. The model “specificity”, i.e., its ability to correctly identify those who left is about 97%, whereas the model “sensitivity”, i.e., its ability to correctly identify those who stayed is about 59%.</p><h4><strong>Figure 3: R output for evaluation of logistic regression model</strong></h4><p><img loading="lazy" decoding="async" class="alignnone wp-image-4183 size-full" src="https://www.littalics.com/wp-content/uploads/2021/04/Figure3.png" alt="" width="381" height="433" srcset="https://www.littalics.com/wp-content/uploads/2021/04/Figure3.png 381w, https://www.littalics.com/wp-content/uploads/2021/04/Figure3-264x300.png 264w" sizes="(max-width: 381px) 100vw, 381px" /></p><p>Another way to evaluate this logistic regression model, and to compare it later to the model made by DMWay, is the AUC, which stands for Area Under the ROC Curve. To make a very long story short, the “<a href="http://www.dataschool.io/roc-curves-and-auc-explained/" target="_blank" rel="noopener noreferrer">ROC curve</a>” helps to figure out the trade-off between true positive rate (e.g., employees predicted to stay who actually stayed in the company) and false positive rate (e.g., employees predicted to stay but actually left the company), at different cutoff points of prediction, ranging from 0 to 1. In our model, the cutoff point was 0.5, meaning that predicted values equal to or above it were classified as “left”. 
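</p><p>To make the cutoff idea concrete, here is a minimal sketch in Python (not the R session behind the figures in this post). The observations and predicted probabilities are made up for illustration, the helper names <code>rates_at_cutoff</code> and <code>auc</code> are hypothetical, and “left” is assumed to be the positive class.</p><pre><code class="language-python">from operator import ge

# Hypothetical toy data, not taken from the Kaggle dataset:
# 1 = employee left, 0 = employee stayed.
observed = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
# Hypothetical predicted probabilities of leaving from some fitted model.
predicted = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7, 0.5, 0.05]


def rates_at_cutoff(observed, predicted, cutoff):
    """Classify as "left" when the probability is at or above the cutoff.

    Returns (accuracy, true positive rate, false positive rate),
    assuming "left" (1) is the positive class.
    """
    labels = [1 if ge(p, cutoff) else 0 for p in predicted]
    pairs = list(zip(observed, labels))
    tp = pairs.count((1, 1))  # left, predicted left
    fn = pairs.count((1, 0))  # left, predicted stayed
    fp = pairs.count((0, 1))  # stayed, predicted left
    tn = pairs.count((0, 0))  # stayed, predicted stayed
    accuracy = (tp + tn) / len(pairs)
    return accuracy, tp / (tp + fn), fp / (fp + tn)


def auc(observed, predicted):
    """Area under the ROC curve, using the trapezoid rule over all cutoffs."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(predicted), reverse=True):
        _, tpr, fpr = rates_at_cutoff(observed, predicted, cutoff)
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2
    return area


print(rates_at_cutoff(observed, predicted, 0.5))
print(auc(observed, predicted))
</code></pre><p>Each cutoff yields one pair of false positive and true positive rates, which is one point on the ROC curve; the area is then accumulated between consecutive points.</p><p>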
But we could choose other cutoff points and maybe gain better results. With different cutoff values, we move along a curve, where at each point we have different true positive and false positive rates. Higher and steeper ROC curves are desired and are indicated by a larger area under them (AUC). The perfect theoretical model would have an AUC of 1, and a completely non-predictive model would have an AUC of 0.5. What about our model? It is actually pretty good, but not excellent. It has an AUC of 0.78.</p><p>Here we end our session in R and move forward to DMWay. How different is the UX in this software? Will we end the process with a similar model and predictions? It is worth mentioning that each step that was done so far in R involved coding because that’s what R is all about. In DMWay, however, the whole process is hidden, and the user needs only to make some simple selections in only 4 menus. To illustrate how easy it is to start working with DMWay, you can simply <a href="https://youtu.be/7l2-u-NZnhI" target="_blank" rel="noopener noreferrer">watch this 7-minute video</a> in which a model is generated and deployed. However, in order to understand and evaluate it, you must already be familiar with the process of predictive analytics, though our R session already gave you the general idea.</p><p>I used the same simulated data, clicked buttons to run a model, since not even a single line of code is needed in DMWay, and then… Wow! The results were stunning. Take a look at the model ROC curve in Figure 4, with AUC as perfect as 0.96! But what is the model behind this impressive chart? Exploring the model, as shown in Figure 5, reveals interesting points: First, the excluded variable here is the promotion, while in the R session I excluded the department. Second, some other variables that are included in the model appear in piecewise form, i.e., each range of their values has a different coefficient and, hence, a different influence on the predicted outcome. 
That is the reason why there is no need to take a subset of excellent employees to extract a good model, as I did in the R session. I must admit, though, that at first sight, the variety of lines in the model output makes it a little harder to explain employee attrition in simple words. However, the red marking of the negative correlations makes this output much friendlier. Furthermore, in order to tell a story, namely to explain which variables contribute most to employee attrition in this specific model, all we need is a quick glance at Figure 6, which presents the variables in descending order of contribution.</p><h4><strong>Figure 4: DMWay output for ROC in a logistic regression model</strong></h4><p><img loading="lazy" decoding="async" class="alignnone wp-image-4184 size-full" src="https://www.littalics.com/wp-content/uploads/2021/04/Figure4.png" alt="" width="832" height="527" srcset="https://www.littalics.com/wp-content/uploads/2021/04/Figure4.png 832w, https://www.littalics.com/wp-content/uploads/2021/04/Figure4-300x190.png 300w, https://www.littalics.com/wp-content/uploads/2021/04/Figure4-768x486.png 768w" sizes="(max-width: 832px) 100vw, 832px" /></p><h4><strong> </strong></h4><h4><strong>Figure 5: DMWay output for a logistic regression model</strong></h4><p><strong style="font-size: 1.33333rem; font-family: var( --e-global-typography-text-font-family ), Sans-serif;"><img loading="lazy" decoding="async" class="alignnone wp-image-4185 size-full" src="https://www.littalics.com/wp-content/uploads/2021/04/Figure5.png" alt="" width="1130" height="644" srcset="https://www.littalics.com/wp-content/uploads/2021/04/Figure5.png 1130w, https://www.littalics.com/wp-content/uploads/2021/04/Figure5-300x171.png 300w, https://www.littalics.com/wp-content/uploads/2021/04/Figure5-1024x584.png 1024w, https://www.littalics.com/wp-content/uploads/2021/04/Figure5-768x438.png 768w" sizes="(max-width: 1130px) 100vw, 1130px" /></strong></p><h4><strong>Figure 6: DMWay output for 
significant variables</strong></h4><p><img loading="lazy" decoding="async" class="alignnone wp-image-4186 size-full" src="https://www.littalics.com/wp-content/uploads/2021/04/Figure6.png" alt="" width="947" height="686" srcset="https://www.littalics.com/wp-content/uploads/2021/04/Figure6.png 947w, https://www.littalics.com/wp-content/uploads/2021/04/Figure6-300x217.png 300w, https://www.littalics.com/wp-content/uploads/2021/04/Figure6-768x556.png 768w" sizes="(max-width: 947px) 100vw, 947px" /></p><p>The model made by DMWay outperforms the logistic regression in R. Perhaps more effort in feature selection would have yielded better results in R. But faster and better predictive models are the whole point of using DMWay. Ronen Meiri, Ph.D., Founder and CTO of DMWay, explains that “DMWay automated solution is powered by a sophisticated analytic engine that mimics all the steps taken by experienced data scientists during the analytic process.” Done manually, these steps may sometimes take weeks or months. “DMWay is led by leading data science experts”, says Meiri, “and encompasses many years of researching automation and simulating the work of a data scientist. While building predictive analytics models in the world of big data is time-consuming, costly, and risky, DMWay offers everyone the ability to build better predictive models in a matter of hours.”</p><p><a name="part5"></a></p><h3><b>The next step: Deployment</b></h3><p>According to the data mining process known as <a href="https://youtu.be/nNc_q08yWxw" target="_blank" rel="noopener noreferrer">CRISP-DM</a>, the next step after modeling and evaluation is deployment, namely putting the model in real use. While deploying a model generated in R involves considerable work, mainly to translate it to other programming languages used in the organization, DMWay offers an innovative solution: At the end of the model generation process, it can generate the code for deployment, in three programming languages: R, SQL, and Java. 
This code is ready for export to the organization’s database.</p><p>Does it mean that our fictional organization should now hurry to predict which of its best employees is the next to leave? Although it is technically possible, it is not ethically recommended. The only reason I mention deployment in this context is to point out this additional strength of DMWay. There are many other business outcomes related to employees that it would be wise to predict and deploy. However, in my opinion, pinpointing the next employee who is a “flight risk” at a given moment is not one of them.</p><p>The <a href="http://analytics-magazine.org/predictive-analytics-the-privacy-pickle-hewlett-packards-prediction-of-employee-behavior/" target="_blank" rel="noopener noreferrer">ethical issue of predicting employee attrition</a> has long been discussed, e.g., in the context of civil rights. I would, though, use this specific model to understand employee attrition, in order to reduce it, or to test the impact of some organizational interventions. I think that when singling out an employee who is not 100% intent on leaving, there is a chance for different outcomes, not all in favor of that employee. Furthermore, we should all remember that models will always be models; they can’t encompass the whole reality. The British statistician <a href="https://en.wikipedia.org/wiki/George_E._P._Box" target="_blank" rel="noopener noreferrer">George E. P. Box</a> said it most appropriately: “Essentially, all models are wrong, but some are useful”.</p><p><a name="part6"></a></p><h3><b>And the winner is…?</b></h3><p>To wrap up our test, let’s come back to the story of David and Goliath. As you probably recall, David goes into battle with only a sling, a simple and common tool among shepherds. He walks right up to Goliath and kills him with a single shot to the head. DMWay’s sling is neither simple nor common. 
But for contemporary business analysts, data scientists, domain experts, and also executives, it turns out that it is effective enough to defeat the giant R, in terms of time consumption, model accuracy, and deployment. The whole process of this battle is summarized in the following infographic.</p><p>Yet, to be honest, our fictional battle ends differently. In contrast to the biblical story, our giant is not defeated but rather lives happily ever after, since he is “open”, i.e., letting others use his power. In fact, like much contemporary software, DMWay relies on R behind the scenes.</p><p>So which tool would you pick for predicting employee attrition?<br />I’d appreciate it if you shared your thoughts in a comment.</p><p><a name="part7"></a></p><p><img loading="lazy" decoding="async" class="alignnone wp-image-4187" src="https://www.littalics.com/wp-content/uploads/2021/04/Predicting-Employee-Attrition-R-vs-DMWay.png" alt="" width="600" height="1500" srcset="https://www.littalics.com/wp-content/uploads/2021/04/Predicting-Employee-Attrition-R-vs-DMWay.png 800w, https://www.littalics.com/wp-content/uploads/2021/04/Predicting-Employee-Attrition-R-vs-DMWay-120x300.png 120w, https://www.littalics.com/wp-content/uploads/2021/04/Predicting-Employee-Attrition-R-vs-DMWay-410x1024.png 410w, https://www.littalics.com/wp-content/uploads/2021/04/Predicting-Employee-Attrition-R-vs-DMWay-768x1920.png 768w, https://www.littalics.com/wp-content/uploads/2021/04/Predicting-Employee-Attrition-R-vs-DMWay-614x1536.png 614w" sizes="(max-width: 600px) 100vw, 600px" /></p><p> </p>								</div>
				</div>
					</div>
		</div>
					</div>
		</section>
				<section class="elementor-section elementor-top-section elementor-element elementor-element-433a0e9f elementor-section-content-middle elementor-reverse-mobile elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-id="433a0e9f" data-element_type="section" data-settings="{&quot;background_background&quot;:&quot;classic&quot;}">
						<div class="elementor-container elementor-column-gap-no">
					<div class="elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ed94585" data-id="ed94585" data-element_type="column" data-settings="{&quot;background_background&quot;:&quot;classic&quot;}">
			<div class="elementor-widget-wrap elementor-element-populated">
						<section class="elementor-section elementor-inner-section elementor-element elementor-element-3523a529 elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-id="3523a529" data-element_type="section">
						<div class="elementor-container elementor-column-gap-default">
					<div class="elementor-column elementor-col-50 elementor-inner-column elementor-element elementor-element-6c223c2e" data-id="6c223c2e" data-element_type="column">
			<div class="elementor-widget-wrap elementor-element-populated">
						<div class="elementor-element elementor-element-334144ff elementor-widget elementor-widget-heading" data-id="334144ff" data-element_type="widget" data-widget_type="heading.default">
				<div class="elementor-widget-container">
					<h4 class="elementor-heading-title elementor-size-default"><a href="https://www.littalics.com/the-people-analytics-journey/" target="_blank">Related Course</a></h4>				</div>
				</div>
				<div class="elementor-element elementor-element-55f8d43d elementor-widget elementor-widget-heading" data-id="55f8d43d" data-element_type="widget" data-widget_type="heading.default">
				<div class="elementor-widget-container">
					<h4 class="elementor-heading-title elementor-size-default"><a href="https://www.littalics.com/the-people-analytics-journey/" target="_blank">The People Analytics Journey</a></h4>				</div>
				</div>
				<div class="elementor-element elementor-element-33f7c29e elementor-widget elementor-widget-text-editor" data-id="33f7c29e" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
									<p>An overview of the future role of HR leaders in improving business performance through informed decisions about people based on data. People Analytics transforming HR; The Role of People Analytics Leader; Case Studies and Simulations; Emerging trends of HR tech.</p>								</div>
				</div>
				<div class="elementor-element elementor-element-417e999a elementor-align-center elementor-widget elementor-widget-button" data-id="417e999a" data-element_type="widget" data-settings="{&quot;_animation&quot;:&quot;none&quot;}" data-widget_type="button.default">
				<div class="elementor-widget-container">
									<div class="elementor-button-wrapper">
					<a class="elementor-button elementor-button-link elementor-size-lg" href="https://www.littalics.com/the-people-analytics-journey/" target="_blank">
						<span class="elementor-button-content-wrapper">
									<span class="elementor-button-text">The Syllabus</span>
					</span>
					</a>
				</div>
								</div>
				</div>
					</div>
		</div>
				<div class="elementor-column elementor-col-50 elementor-inner-column elementor-element elementor-element-469bc07b" data-id="469bc07b" data-element_type="column">
			<div class="elementor-widget-wrap elementor-element-populated">
						<div class="elementor-element elementor-element-ce27edc elementor-widget elementor-widget-image" data-id="ce27edc" data-element_type="widget" data-widget_type="image.default">
				<div class="elementor-widget-container">
																<a href="https://www.littalics.com/the-people-analytics-journey/" target="_blank">
							<img decoding="async" width="300" height="300" src="https://www.littalics.com/wp-content/uploads/2020/12/ThePeopleAnalyticsJourney.png" class="attachment-full size-full wp-image-3536" alt="" srcset="https://www.littalics.com/wp-content/uploads/2020/12/ThePeopleAnalyticsJourney.png 300w, https://www.littalics.com/wp-content/uploads/2020/12/ThePeopleAnalyticsJourney-150x150.png 150w" sizes="(max-width: 300px) 100vw, 300px" />								</a>
															</div>
				</div>
					</div>
		</div>
					</div>
		</section>
					</div>
		</div>
					</div>
		</section>
				</div>
		<p>The post <a href="https://www.littalics.com/predicting-employee-attrition-r-vs-dmway/">Predicting Employee Attrition: R vs DMWay</a> appeared first on <a href="https://www.littalics.com">Littal Shemer Haim</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.littalics.com/predicting-employee-attrition-r-vs-dmway/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
	</channel>
</rss>
