Sumser: Tackling the bias in your data

By John Sumser | November 16, 2020 • 4 min read
Emerging Intelligence columnist John Sumser is the principal analyst at HRExaminer. He researches the impact of data, analytics, AI and associated ethical issues on the workplace. John works with vendors and HR departments to identify problems, define solutions and clarify the narrative. He can be emailed at hreletters@lrp.com.

The moment you decide to measure something, you introduce bias into your data. The very fact that there is a number to manipulate is an indication that bias is present. Choosing to measure and monitor one thing always means choosing not to measure and monitor something else.

How Measuring Introduces Bias

In other words, all data requires context. If you measure only your feet and ignore your ankles and calves, your boots won't fit well. But it's nearly impossible to measure every contour of your lower leg and turn the result into a boot. So, when you buy custom boots, you measure selected parts of your leg.


You choose not to measure some things for efficiency, their relevance to the immediate task, affordability or because of their complexity. Those decisions inherently introduce bias in your data.
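A toy sketch can make the point concrete. All candidate names, attributes and weights below are invented for illustration; the only point is that the same people rank differently depending on which attributes we decided were worth measuring.

```python
# Hypothetical candidates with two measurable attributes.
candidates = {
    "A": {"years_experience": 10, "recent_training": 1},
    "B": {"years_experience": 4,  "recent_training": 5},
    "C": {"years_experience": 7,  "recent_training": 2},
}

# Scheme 1: we decided only experience was worth measuring.
by_experience = sorted(candidates,
                       key=lambda c: -candidates[c]["years_experience"])

# Scheme 2: we also chose to measure recent training (with an
# arbitrary weight). A different choice, a different "best" candidate.
by_combined = sorted(
    candidates,
    key=lambda c: -(candidates[c]["years_experience"]
                    + 2 * candidates[c]["recent_training"]),
)

print(by_experience)  # ['A', 'C', 'B']
print(by_combined)    # ['B', 'A', 'C']
```

Neither ranking is "the truth"; each simply reflects what its author decided to count.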

The most basic bias is what you think is important. We measure the things we want to understand and rarely measure anything else. In this way, our management systems resemble our nervous systems. The difference is that our unconscious processes handle the full volume of input; management schemes barely scrape the surface.

See also: Steps to avoid bias in a post-COVID-19 remote world

Our sensory systems take in vast volumes of information, but our conscious minds are only able to consume a tiny fraction. Our ability to pay attention is limited to four or five simultaneous objects, usually fewer.

We depend on bias to keep ourselves sane. Bias is best understood as a cognitive shortcut. We, like our management systems, don’t give attention to things that don’t seem important, and assume the world is the way we see and experience it. We relegate those “unimportant” things to the unconscious and are often blind to the experience of others.

On my office wall is a graphic representation of all of the known unconscious cognitive biases. I keep it there to remind me how hard it is to hold all of them in mind. There's a school of thought that holds the task is impossible.

But if you wanted to monitor and control all of your unconscious biases, you would have to remember and be able to identify all 183 of them. While I'm not saying that's impossible, it is highly impractical.

Bias is an immutable characteristic of data. Strip away the bias that ties data to its source and you are left with bare numbers: easy to manipulate but meaningless on their own. We want our data to mean something, down to its tiniest instance. Yet every shred of meaning carries the bias of both its content and its selection.

Bias and Discrimination in HR Tech

This is not what HR tech companies are talking about when they tell you they are going to remove or mitigate bias. They are talking about a very different topic. The bias that contemporary intelligent-systems providers are trying to address is a much narrower set of concerns: they want to control, manage or "eliminate" the biases associated with job discrimination.

They are wrestling with the important question of the ways in which people are treated differently when, legally, they are required to be treated the same. The intelligent tools providers that make claims about bias are all talking about trying to reduce discrimination in HR processes.

Related: Why ethics should be at the heart of HR tech decisions

This is an admirable goal. The marketing currently associated with it is either extremely cynical or completely naïve. Discrimination based on race, religion, gender, disability or other irrelevant variables is imbued in our history and culture. We should all be working towards its eradication.

But even classifying humans by protected classes has its own issues. The data used to validate the effectiveness of these offerings is self-reported. At some point, the people being evaluated are offered the opportunity to declare the protected classes to which they belong. And that’s where the problem begins.

If you give the tiniest bit of thought to the question, you’ll see that there is almost no real incentive or possibility for accurate reporting. Each of the protected classes is a hard-to-verify social construct. Whether or not I think I am a member is a highly personal choice.

Also see: How to use HR tech for an integrated DEI approach


More importantly, whether or not I think I am a member of one of those classes is never the question. Discrimination happens because of what someone else thinks I am, and that does not have to match my self-concept. No social mechanism corrects your attitudes toward me just because you believe I belong to one of those categories.
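A minimal sketch shows why validating against self-reported labels can miss the problem. The records, group names and selection outcomes below are entirely invented; the point is only that the disparity measured by self-reported labels can differ sharply from the disparity driven by what evaluators perceived.

```python
# Hypothetical audit records: each carries the group a person
# self-reported, the group an evaluator perceived, and the outcome.
records = [
    {"self": "X", "perceived": "X", "selected": True},
    {"self": "X", "perceived": "Y", "selected": False},
    {"self": "Y", "perceived": "Y", "selected": False},
    {"self": "Y", "perceived": "X", "selected": True},
    {"self": "X", "perceived": "Y", "selected": False},
    {"self": "Y", "perceived": "Y", "selected": True},
]

def selection_rate(records, label_key, group):
    """Fraction of people in `group` (by the chosen label) who were selected."""
    group_records = [r for r in records if r[label_key] == group]
    return sum(r["selected"] for r in group_records) / len(group_records)

# Measured against self-reported labels, the gap looks moderate...
print(selection_rate(records, "self", "X"))       # 0.333...
print(selection_rate(records, "self", "Y"))       # 0.666...

# ...but the evaluator acted on perceived labels, where the gap is stark.
print(selection_rate(records, "perceived", "X"))  # 1.0
print(selection_rate(records, "perceived", "Y"))  # 0.25
```

In this toy data, everyone perceived as group X was selected and only a quarter of those perceived as Y were, yet the self-reported view softens that gap considerably.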

Bias not only exists within the data itself; it also exists within the consumer of the data. Discrimination comes from people and organizations in the data they choose to use, how they interpret it and whether they can understand and overcome data’s inherent limitations and bias.

Data is always a reflection of what happened and what matters to the people creating and using the tools. Intelligent tools are good at surfacing new insights, raising interesting questions and probing why we believe the things we do. All intelligent tools contain bias. Some can also help reveal it. But no system based on data and logic will ever eliminate discrimination.