Algorithms Are Not Monoliths
Recap from Learning Collider’s Horizons Summit Panel on the Use of AI in Hiring
In July 2024, Learning Collider led a break-out session at Jobs for the Future’s Horizons Summit in Washington, D.C. Our panelists highlighted how AI algorithms can be designed and developed to support more equitable outcomes for workers, employers, and workforce development programs.
Below is an abbreviated recap of the session, edited for clarity. First, an introduction to our panelists:
Sara Nadel - Chief Operating Officer, Learning Collider
Sara has extensive experience in designing and building HRTech. She co-founded StellarEmploy, a hiring platform designed to improve recruiting and retention for frontline employers by connecting workers to best-fit jobs through a proprietary preferences survey and algorithm. (StellarEmploy is now owned and operated by Learning Collider.)
Nitya Raviprakash - Data Science Director, Learning Collider; Fellow, NationSwell
Nitya has a background in both computer science and economics. Her experience includes working as a research engineer for Microsoft where she helped build an AI chatbot. At Learning Collider, she leads a team of data scientists and works on projects related to algorithmic fairness.
Kadeem Noray - Postdoc Fellow, MIT Department of Economics; Associate, Opportunity Insights; Affiliate, Learning Collider
Kadeem’s research focuses on talent selection and allocation in the economy. He investigates how algorithms can be used to improve and increase the fairness of talent selection in various institutions.
Henry Hipps - Co-Founder & CEO, Diffusion Venture Studio
Henry’s expertise draws from the many intersections of education, workforce development, venture-building, and philanthropy. Before launching Diffusion Venture Studio with Learning Collider’s founder Peter Bergman, Henry was a Senior Advisor and Entrepreneur in Residence at Owl Ventures, the world’s leading education sector venture capital firm. At Diffusion, Henry’s team is positioning resources and research to launch education and workforce technology ventures. Unfortunately, a flight delay kept Henry from joining us for the panel. His important perspective was missed but not forgotten:
‘Building and scaling fair algorithms requires capital at every angle and phase. Policymakers, employers, investors, and technologists must consistently invest in efforts to ensure that algorithms supporting critical hiring decisions result in more fair and equitable judgments.’
AI in Hiring: Panel Highlights
Sara: I'll start by addressing the issue of “algorithms are not monoliths” head on. As we think about algorithm design and evaluation, let's pretend we’re talking informally with an executive who is considering an AI-based technology to use in their talent management department. To advise them on their purchase, what questions would you ask them? And what would you recommend they consider?
Nitya: I would first ask, “how are you planning to use the product?” and then, “for what purpose?” There is potential for AI to be used in every stage of the hiring funnel and each application has unique risks. You really have to think about its purpose and clearly define the problem you’re trying to solve.
You also need to determine the inputs into the algorithm you’re purchasing. Are there tangible reasons for each input? How do those inputs relate to the algorithm’s prediction? Are you trying to predict the retention rate of an applicant or their likelihood of being hired? Those two predictions require different inputs. And sometimes your inputs are accurate but the design of the algorithm can cause biases or result in low-quality predictions.
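For readers who want to see that point in code, here is a minimal, hypothetical sketch of how the choice of prediction target changes which inputs belong in the model. Every target and feature name below is invented for illustration; none comes from a real product.

```python
# A hypothetical sketch of how the prediction target shapes the inputs.
# Targets and feature names are invented for illustration only.

# Spec A: predict 90-day retention -- inputs should plausibly relate to
# whether a hired worker stays in the job.
retention_spec = {
    "target": "retained_90_days",
    "inputs": ["commute_minutes", "schedule_preference_match", "prior_tenure_months"],
}

# Spec B: predict likelihood of receiving an offer -- inputs describe how
# past recruiters behaved, so this spec risks replicating their biases.
offer_spec = {
    "target": "received_offer",
    "inputs": ["resume_keyword_overlap", "referral_flag", "interview_rating"],
}

def audit_spec(spec: dict) -> None:
    """Print the question a buyer should ask about every input."""
    print(f"Predicting '{spec['target']}':")
    for feature in spec["inputs"]:
        print(f"  - What is the tangible reason '{feature}' relates to '{spec['target']}'?")

for spec in (retention_spec, offer_spec):
    audit_spec(spec)
```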
A recent example of problematic AI design comes from AON Consulting, a purveyor of hiring technology. The American Civil Liberties Union (ACLU) has called for a probe by the FTC to investigate some of AON’s products, which seem to have adverse impacts on applicants with mental health conditions. Because the hiring survey asks certain stress-related questions, the algorithm down-scores applicants with mental health challenges. This is an example of using close proxies for certain demographic groups or types of applicants, which results in harmful bias in hiring AI.
Kadeem: One consideration that comes to mind is how a feature you expect to be helpful can end up making it harder to get what you're looking for.
For example, a project I'm working on with Learning Collider is trying to understand how ChatGPT - and generative AI more broadly - makes judgments that are different from humans in the context of recruiting. Recruiters decide who should be interviewed and generative AI can be used to help make these decisions. Many companies are already doing this.
In our project, which is in a technical interview setting, one thing that we're observing is that human recruiters seem to give the benefit of the doubt to non-white applicants. In other words, humans seem to think that non-white applicants should advance to the interview more frequently. But AI actually eliminates this difference. Just looking at the data, you might think, ‘well, humans seem to be engaging in a kind of biased way.’ And in this case you might want to use AI to eliminate this “bias.”
However, not all differences and judgments are necessarily biased in the way we traditionally think about them. A human recruiter might recognize that there are a lot of factors that contribute to minority programmers needing to be particularly resilient and talented to even get to a technical interview, and relax typical resume expectations. Or, they may be concerned about the diversity of their tech team because they know diversity improves team success, even if the team's collective experience is slightly less prestigious. It’s not necessarily bias that’s being exhibited.
AI doesn't necessarily know what the human recruiter is trying to do in maximizing the diversity of the applicant pool who passes the technical interview. AI doesn’t take that into account. So that’s one circumstance where we're seeing some preliminary evidence that AI may eliminate one sort of bias but actually end up making talent pipelines weaker, in a sense, or diverging from what the recruiter might consider to be a fairer outcome.
‘I’ve been working in this space of filtering applicants for a long time. There are so many ways an algorithmic filter can help and there are so many ways it can get it wrong.’
Sara: I’d like to add some comments to the case that Nitya presented first, where the hiring assessment is down-scoring people with mental health conditions. Legally, you cannot have a hiring process that is consistently biased against any protected class. There is no standard way to implement this, but there’s the “four-fifths” (80%) rule, where you need to demonstrate that any protected class in question is getting interviewed at no less than 80% of the rate of other groups. It doesn’t matter if those individuals actually end up not performing as well in the job; they need to be getting the same opportunity to interview.
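For illustration only (not legal guidance), here is a minimal sketch of that four-fifths (80%) check with invented numbers; a real adverse-impact analysis would cover every protected class and every stage of the hiring funnel.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Share of a group's applicants who advance to interview."""
    return selected / applicants

# Hypothetical interview counts for two applicant groups.
rate_reference = selection_rate(selected=120, applicants=400)   # 0.30
rate_protected = selection_rate(selected=60, applicants=300)    # 0.20

# The four-fifths rule compares each group's rate to the highest group's rate.
impact_ratio = rate_protected / max(rate_reference, rate_protected)
print(f"Adverse impact ratio: {impact_ratio:.2f}")               # 0.67
if impact_ratio < 0.8:
    print("Below the 80% threshold -- the process warrants review.")
else:
    print("Meets the 80% threshold on this metric.")
```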
Kadeem, I appreciate your example as well. I’ve been working in this space of filtering applicants for a long time. There are so many ways an algorithmic filter can help and there are so many ways it can get it wrong.
Another example from a Learning Collider paper presents methods for building experimentation into your hiring algorithm. You don't want an algorithm to just replicate what people have been doing, because people can be biased. You want your algorithm to experiment and give a chance to applicants who haven’t historically been given one. In a way, the humans in Kadeem’s example are doing that while the generative AI in question is not.
There are also a lot of hiring filters that are considering names. There’s an infamous example from the 2010s where a hiring AI system showed two unlikely factors that would predict job success: being named “Jared” and playing high school lacrosse. Similarly, Amazon’s so-called “holy grail” AI recruiting system filtered out resumes from women. You don’t want a filter looking at names, gender, or high school sports. You also don’t want a filter taking into consideration other characteristics that correlate with gender or race.
Since we’re in D.C., I’ll use the example of Howard University, a historically black university. There are fewer graduates from historically black colleges and universities (HBCUs) than there are from non-HBCUs, say the top 20 universities in the U.S. It follows that there are a lot of employers with few to zero current employees who are graduates of those universities. When these employers use an algorithm designed to replicate historical hiring decisions, a Howard graduate would be much less likely to be recommended for an interview than, say, a University of Michigan graduate, despite the high quality of their education. Unless the algorithm is designed to experiment, it will likely select for consistency and avoid characteristics that it’s unfamiliar with.
‘Using AI doesn’t absolve you of needing to think about exactly how you want to incorporate AI and consider the exact model specifications and its data inputs.’
Kadeem: A point I want people to take away from my previous example is that AI isn’t necessarily going to be blind to some of these normative concerns that a human decision maker might have. It depends on the algorithm design. Using AI doesn’t absolve you of needing to think about exactly how you want to incorporate AI and consider the exact model specifications and its data inputs.
The example you just presented, Sara, is one where starting out, you make sure to over-sample people from rare or underrepresented backgrounds who never make it into interviews so that we - both human recruiters and AI - learn what actually correlates with their success. This isn’t an attempt to just hit a quota; in the research you referenced (Hiring as Exploration), it actually improved the performance of applicants selected for interviews.
Because the algorithm learned that somebody from Howard - or another underrepresented background - would be successful in the interviews, we could better predict which of those Howard graduates are going to do well. In this case, you have more HBCU graduates in the hiring pipeline, you get to see more of their performance, and more of them are hired.
One thing I’m concerned about is that people purchasing hiring algorithms and AI recruiting tools may think that somehow, they won’t need to think about how to solve these kinds of normative problems. I actually think using these kinds of systems requires more complex thinking.
Nitya: One of the really interesting points that was brought up from that research is how considering the educational institution of an applicant can actually expand access rather than limit it. And this really depends on the way you're including those applicant features in your data.
The Howard example represents a situation where you incorporate the applicant’s university or college and the algorithm ends up limiting access when the institution is unknown. But if you’ve designed the algorithm to expand access when the institution is unknown, we get to see how applicants from more universities perform in interviews, or even on the job. This is an excellent way to expand, rather than limit, access and opportunity. AI used in this way is very productive in solving this type of problem.
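As a toy illustration of that design choice (in the spirit of, but not reproducing, the Hiring as Exploration approach), the sketch below adds an exploration bonus to an applicant's score that grows when the algorithm has seen few prior applicants from their institution. All numbers are invented.

```python
# Toy sketch: unfamiliar institutions get an uncertainty bonus so they are
# sampled into interviews rather than screened out by default.
import math

def interview_score(predicted_quality: float,
                    n_seen_from_institution: int,
                    bonus_weight: float = 0.5) -> float:
    """Predicted quality plus an exploration bonus for rarely-seen institutions."""
    exploration_bonus = bonus_weight / math.sqrt(n_seen_from_institution + 1)
    return predicted_quality + exploration_bonus

# Two hypothetical applicants with identical predicted quality:
print(interview_score(0.70, n_seen_from_institution=400))  # familiar school: ~0.72
print(interview_score(0.70, n_seen_from_institution=3))    # rarely seen school: 0.95
```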
Sara: I'd like to talk a little bit about real opportunities for well-designed AI and algorithms in the workforce. Kadeem, why don't you start because I know that you've done a lot of work in education and with applications that extend to upskilling or reskilling. Tell us what you're most excited about in that space.
Kadeem: Broadly, one of the advantages of AI is how it can solve computationally complex and difficult problems. I’m working on a project with a scholarship program that is very interested in selecting people who are very talented, and also in selecting a diverse cohort of applicants.
To maximize the diversity of the cohort, you have to first define what that means. Is it some balance of gender, race, and geography? Is it 50/50 gender parity? Is it representation from all 50 states in the U.S.? Is it considering diversity in other terms, such as coming from rural and urban areas, diverse mental and physical health statuses, household incomes, etc?
It turns out that maximizing diversity is a computationally complex problem, particularly when you have, say, 5,000 applicants and only 300 scholarships to award. That isn’t an astronomically large number, but it isn’t really possible for a human decision maker to look at all 5,000 applicants and weigh all of their characteristics in order to select the most talented and diverse cohort of 300.
My co-author and I designed an algorithm to trace out a performance-diversity frontier. So on one axis, you have performance on something like, say, a test score, and on the other axis, you have distance from the optimally diverse set of targets that you want. This allowed the scholarship program to visualize and figure out where they wanted their cohort to end up on the frontier.
We applied this algorithmic tool to their historical applicant data and found that the program could improve diversity significantly without affecting academic performance, and conversely, that they could improve academic performance without compromising diversity. So this is one circumstance where we could leverage AI to visualize and measure a problem that decision makers cared about but just didn’t have the tools to solve.
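For readers curious how such a frontier can be traced, here is a toy sketch with invented data: a greedy rule trades off each applicant's score against a penalty for overshooting hypothetical diversity targets, and sweeping the penalty weight produces points along a performance-diversity frontier. It illustrates the idea only; the scholarship program's actual algorithm is not shown here.

```python
# Toy performance-diversity frontier: one (performance, distance-from-target)
# point per penalty weight. Data, groups, and targets are all invented.
import random

random.seed(0)
GROUPS = ["A", "B", "C"]
TARGET_SHARES = {"A": 1/3, "B": 1/3, "C": 1/3}       # hypothetical diversity targets
COHORT_SIZE = 300

# 5,000 hypothetical applicants; group C is underrepresented in the pool.
applicants = [{"score": random.gauss(0, 1),
               "group": random.choices(GROUPS, weights=[0.7, 0.2, 0.1])[0]}
              for _ in range(5000)]

def distance_from_target(cohort):
    """Total deviation of the cohort's group shares from the target shares."""
    shares = {g: sum(a["group"] == g for a in cohort) / len(cohort) for g in GROUPS}
    return sum(abs(shares[g] - TARGET_SHARES[g]) for g in GROUPS)

def select_cohort(weight):
    """Greedily pick by score minus a penalty for overshooting a group's target share."""
    remaining = list(applicants)
    cohort, counts = [], {g: 0 for g in GROUPS}
    while len(cohort) < COHORT_SIZE:
        best = max(remaining, key=lambda a: a["score"] - weight *
                   max(0.0, counts[a["group"]] / COHORT_SIZE - TARGET_SHARES[a["group"]]))
        remaining.remove(best)
        cohort.append(best)
        counts[best["group"]] += 1
    return cohort

for weight in (0.0, 2.0, 10.0):
    cohort = select_cohort(weight)
    mean_score = sum(a["score"] for a in cohort) / COHORT_SIZE
    print(f"weight={weight}: performance={mean_score:.2f}, "
          f"distance from diversity target={distance_from_target(cohort):.2f}")
```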
Sara: Your example reminds me of some really important questions to pose to anyone building or purchasing AI models: who defines what optimal diversity is, who owns the ethical decisions around algorithm design, who is responsible for understanding the tradeoffs in hiring and recruiting technology, and who dictates which tradeoffs to make?
Nitya, you recently gave a very well-received presentation at NationSwell around algorithm design. What would you say now is your favorite future possibility for AI and algorithms in the workforce?
'Generative AI and LLMs are not meant to be sources of absolute truth. They can produce subjective judgments and generate inaccuracies or "hallucinations."'
Nitya: I really want people to think carefully about how they're using AI. Generative AI and LLMs are not meant to be sources of absolute truth. They can produce subjective judgments and generate inaccuracies or "hallucinations."
My favorite topic in this space is effectively building guardrails around these systems and the regulations that need to be in place. I think we need to upskill the creators of AI and the users of AI to think critically about what these guardrails need to be. It requires a lot of collaboration across many different types of stakeholders in order to make sure that you're engaging the right audience to answer these questions.
I encourage people to ask more questions about the tools they're using, data scientists to take ownership of data processing, and everyone to pursue more engagement and upskilling across the various roles and perspectives involved in the complex processes of building, implementing, and evaluating AI systems.
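As one small, hypothetical example of a guardrail in the sense Nitya describes: a pre-release check that flags model-generated screening notes mentioning protected attributes so a human reviews them before they influence a decision. A production guardrail would need to be far more comprehensive, and still paired with human oversight.

```python
# Hypothetical guardrail sketch: flag protected-attribute mentions in
# model-generated screening notes. The term list is illustrative, not complete.
import re

PROTECTED_TERMS = [
    r"\bpregnan\w*", r"\bdisabilit\w*", r"\breligio\w*",
    r"\bmarried\b", r"\bnationalit\w*", r"\bage[sd]?\b",
]

def flag_protected_references(generated_text: str) -> list[str]:
    """Return the protected-attribute terms found in model-generated text."""
    hits = []
    for pattern in PROTECTED_TERMS:
        match = re.search(pattern, generated_text, flags=re.IGNORECASE)
        if match:
            hits.append(match.group(0))
    return hits

note = "Strong Python skills, but the candidate mentioned a disability accommodation."
flags = flag_protected_references(note)
if flags:
    print("Hold for human review; flagged terms:", flags)
else:
    print("No protected-attribute references detected.")
```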
. . . . .
Learning Collider’s panelists would like to thank our thoughtful and engaged audience and those who asked questions and commented on the topic, highlighting important points around the fallibility of human decision making and who gets a seat at the table in the design, selection, implementation, and evaluation of AI systems.