Harnessing the Transparency of Hiring Algorithms

We had an interesting meeting here at Wharton this past week that brought together a group of data-science faculty members and a group of executives who manage HR analytics. The idea was to create a research agenda and try to find an answer to the question: “What’s interesting here?”

The answer: a lot of things.

Let’s take a look at one of them: the practice of hiring, where the interest in data analytics is greatest. After World War II, it was pretty common for a job candidate at a big company to be whisked to its headquarters for a week of assessments (or what we now call pre-employment tests). These would include IQ and personality tests, in-basket simulations, structured interviews with psychiatrists, and so forth.

What happened to all that? It disappeared in many companies and was scaled way back in others, and some new companies–many of which are now huge–never bothered doing such assessments at all.

Why was that? We don’t know for sure, but I’m betting at least part of it had to do with the wave of new equal-opportunity legislation and the enforcement of it by government agencies. The watershed event here, in my view, was a series of employment-discrimination lawsuits against AT&T in the 1970s. AT&T may well have had some of the most sophisticated hiring practices at the time, but the courts held that they discriminated against protected groups.

Many companies took note of that, and the perception grew that if you were doing anything sophisticated in testing or prediction, you might well be a target for an investigation. Why? A cynical view is that it would be easier in that case to assess adverse impact: “Let’s just look at the test you were using to hire and promote and see what the impact of that test was.” If, on the other hand, you were doing nothing systematic and hiring managers were all doing something different, it was harder to see whether what you were doing was discriminatory, in part because it was hard to see what you were doing at all.

That takes us back to data science and the current fascination with machine learning and the algorithms it generates. It is different from the statistical analysis in HR that most of us have at least a passing familiarity with, which typically starts with a hypothesis (e.g., does IQ predict who will be a good hire?) and then looks at the relationship between the candidates’ IQ scores and their subsequent performance as new employees.
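To make that contrast concrete, here is a minimal sketch of the hypothesis-driven approach. The data and column names are invented for illustration; the point is simply that you start with one predictor and one question.

```python
# Hypothesis-driven analysis: does a single predictor (a hypothetical IQ score)
# relate to later job performance? Data and column names are invented.
import pandas as pd
from scipy import stats

employees = pd.DataFrame({
    "iq_score":    [98, 104, 110, 121, 95, 130, 102, 115],
    "performance": [3.1, 3.4, 3.8, 4.2, 2.9, 4.5, 3.2, 4.0],  # later manager ratings
})

# One hypothesis, one test.
r, p_value = stats.pearsonr(employees["iq_score"], employees["performance"])
print(f"correlation r = {r:.2f}, p = {p_value:.3f}")
```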

Machine learning doesn’t start with a hypothesis. It starts with a goal, which is to create a model–an algorithm–that will track as closely as possible the performance of employees. The more data we have on those employees, the better. What you get at the end of that process is an algorithm that combines the data on previous candidates in such a way that it does a very good job of figuring out who succeeded, and it can then be used to predict which of the new candidates should be hired. Hiring managers can just plug their information into the algorithm and hire the new candidates with the highest scores.
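In code, that workflow might look roughly like the sketch below. This is not any vendor’s actual system, just an illustration under assumed file names, feature names and outcome column: fit a model to historical employee data, then use it to score new applicants.

```python
# Machine-learning approach: no single hypothesis, just a model fit to whatever
# historical data is available, then used to score new applicants.
# File names, feature names, and the outcome column are all hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

past_employees = pd.read_csv("past_employees.csv")
features = ["iq_score", "prior_tenure", "interview_rating", "referral"]
X, y = past_employees[features], past_employees["was_successful"]

model = GradientBoostingClassifier().fit(X, y)   # the resulting "algorithm"

new_candidates = pd.read_csv("new_candidates.csv")
new_candidates["hire_score"] = model.predict_proba(new_candidates[features])[:, 1]

# Hiring managers would simply rank applicants by score.
print(new_candidates.sort_values("hire_score", ascending=False).head())
```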

This algorithm will predict outcomes better than anything we’ve seen before. As with any formal model, though, it is easy to see whether it has a disparate impact on protected groups. It may very well have such an impact, too, because the models are based on who has been successful in the past, and we know that measurements of past performance often contain bias against women and minorities.
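Checking a formal model for that kind of impact is straightforward. One common heuristic is the EEOC’s four-fifths rule: compare selection rates across groups and flag a ratio below 0.8. A minimal sketch, with invented groups and decisions:

```python
# Adverse-impact check using the four-fifths rule: if any group's selection rate
# is less than 80% of the highest group's rate, that flags possible adverse impact.
# The groups and hiring decisions below are invented for illustration.
import pandas as pd

decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,   1,   1,   0,   1,   0,   0,   0],
})

rates = decisions.groupby("group")["selected"].mean()
impact_ratio = rates.min() / rates.max()
print(rates)
print(f"impact ratio = {impact_ratio:.2f}")   # below 0.8 suggests adverse impact
```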

To defend against charges of bias, employers would have to validate the algorithm, showing that it did, in fact, predict who would be a good employee better than alternative tests that did not have adverse impacts. That is especially tricky to do with algorithms because simply explaining what they do is hard, and justifying it is harder still: with so many things in the model working together, we don’t have a clear hypothesis as to why it works.

Traditional hiring practices are biased as well, and there is an extremely good chance that the algorithm will be much better at predicting who will be a good employee while also being less biased than the hiring decisions of untrained hiring managers.

But are companies willing to use the algorithm? My guess is the answer, at least for many organizations, will be no, because avoiding a risk they understand feels safer than coming to grips with the fact that they are doing a poor job of hiring. They’ve gotten used to ducking that issue for years.

Peter Cappelli, Wharton
Peter Cappelli is HRE’s Talent Management columnist and a fellow of the National Academy of Human Resources. He is the George W. Taylor Professor of Management and director of the Center for Human Resources at The Wharton School of the University of Pennsylvania in Philadelphia. He can be emailed at [email protected]