Putting AI to use in mortgage lending decisions could lead to discrimination against Black applicants, according to new research. But researchers say there may be a surprisingly simple solution to mitigate this potential bias.
In an experiment using leading commercial large language models (LLMs) to evaluate loan application data, Lehigh researchers found that LLMs consistently recommended denying more loans and charging higher interest rates to Black applicants compared to otherwise identical white applicants.
This discovery is particularly alarming given the historical and ongoing racial disparities in homeownership.
“This finding suggests that LLMs are learning from the data they are trained on, which includes a history of racial disparities in mortgage lending, and potentially incorporating triggers for racial bias from other contexts,” said Donald Bowen III, assistant professor of finance in the College of Business and one of the authors of the study.
The study used real mortgage application data: researchers drew a sample of 1,000 loan applications from the 2022 Home Mortgage Disclosure Act (HMDA) dataset and used it to create 6,000 experimental loan applications. In the experiment, they manipulated the race and credit score variables to isolate the effect of each on the LLMs’ recommendations.
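A design like this can be sketched as a simple factorial expansion of the base sample. The Python sketch below is illustrative only: the article does not list the treatment levels, so the two race labels, the three credit scores, and every field name are assumptions, chosen because two races times three scores is one grid that turns 1,000 base loans into 6,000 variants.

```python
from itertools import product

# ASSUMPTION: the article says race and credit score were manipulated but
# does not give the values. These levels are hypothetical placeholders.
RACES = ["Black", "white"]
CREDIT_SCORES = [640, 715, 790]


def make_variants(base_loans, races=RACES, scores=CREDIT_SCORES):
    """Return one copy of each base loan per (race, score) combination."""
    variants = []
    for loan in base_loans:
        for race, score in product(races, scores):
            variant = dict(loan)  # copy so the base record is not mutated
            variant["race"] = race
            variant["credit_score"] = score
            variants.append(variant)
    return variants


# Stand-in for 1,000 sampled HMDA records (field names are illustrative).
base_loans = [
    {"loan_id": i, "loan_amount": 250_000, "dti": 0.36, "ltv": 0.80}
    for i in range(1000)
]
variants = make_variants(base_loans)
print(len(variants))  # 6000
```

Because every variant of a given loan is identical except for the manipulated fields, any systematic difference in a model's recommendations across variants can be attributed to those fields.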
The results were stark: Black applicants consistently faced higher barriers to homeownership, even when their financial profiles were identical to those of white applicants.
Based on the experimental results using OpenAI’s GPT-4 Turbo LLM, Black applicants would, on average, need credit scores approximately 120 points higher than white applicants to receive the same approval rate, and about 30 points higher to receive the same interest rate.
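One way to read a gap expressed in credit score points is as the extra score one group needs to reach the approval rate another group gets at a reference score. The article does not describe how the authors computed their figures, so the Python sketch below is only an illustration of that idea using linear interpolation. The approval rates are made-up numbers, not results from the study, and `score_needed` is a hypothetical helper.

```python
def score_needed(target_rate, scores, rates):
    """Interpolate the credit score at which `rates` reaches `target_rate`.

    `scores` must be ascending and `rates` (in percent) non-decreasing.
    """
    for i in range(len(scores) - 1):
        s0, s1 = scores[i], scores[i + 1]
        r0, r1 = rates[i], rates[i + 1]
        if r0 <= target_rate <= r1 and r1 > r0:
            return s0 + (target_rate - r0) * (s1 - s0) / (r1 - r0)
    raise ValueError("target rate is outside the observed range")


# MADE-UP approval rates (percent) at three hypothetical credit scores.
scores = [640, 715, 790]
white_rates = [70, 80, 90]
black_rates = [60, 75, 85]

reference_score = 640
target = white_rates[scores.index(reference_score)]  # 70 percent
gap = score_needed(target, scores, black_rates) - reference_score
print(gap)  # 50.0 with these made-up rates
```

With these toy numbers, a Black applicant would need a score of 690 to match the approval rate a white applicant receives at 640, a gap of 50 points.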
Models also exhibited bias against Hispanic applicants, generally to a lesser extent than against Black applicants.
The bias against minority applicants was highest for “riskier” applications that had a low credit score, high debt-to-income ratio, or high loan-to-value ratio.
Researchers also tested other LLMs, including OpenAI’s GPT-3.5 Turbo (2023 and 2024) and GPT-4, as well as Anthropic’s Claude 3 Sonnet and Opus, and Meta’s Llama 3-8B and 3-70B.
Bias in interest rate recommendations was generally consistent across the LLMs tested. However, researchers found high variation in the approval rates produced by different models.
GPT-3.5 Turbo showed the highest discrimination, while GPT-4 (2023) exhibited virtually none.
“It’s somewhat surprising to see racial bias, given the efforts LLM creators take to reduce bias overall combined with the large amount of regulations relating to fair lending,” Bowen said, noting that the training data of these models almost certainly includes federal regulations prohibiting the use of race as a factor in making lending decisions.
But even more surprising was the ability to remove persistent bias in results with a simple solution—instructing the LLM to use no bias in making decisions.
When the LLMs were instructed to ignore race in their decision-making, the racial bias virtually disappeared.
“It didn’t partly reduce the bias, or overcorrect. It almost exactly undid it,” Bowen said.
AI can be programmed to avoid bias
This finding suggests that while AI models may inherit biases from the data they are trained on, they can also be steered toward more equitable decisions through explicit instructions.
Testing different mitigation prompts, researchers found that the simple, specific command to “use no bias in making these decisions” was more effective than a legalistic instruction to “[m]ake sure you comply with the Fair Lending Act and ECOA in making this decision.”
“With the simple mitigation adjustment, approval decisions are indistinguishable between Black and white applicants across the credit spectrum. For interest rates, the bias is reduced as well, most so for the lowest credit score applicants,” Bowen said.
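The comparison of mitigation prompts can be pictured as the same application sent with different added instructions. The Python sketch below is an assumption-laden illustration: the two mitigation sentences are quoted from the article, but the base prompt wording, the field names, and the `build_prompt` helper are hypothetical, and no actual model call is shown.

```python
# The two mitigation sentences are quoted from the article; "none" is the
# unmitigated baseline.
MITIGATIONS = {
    "none": "",
    "simple": "Use no bias in making these decisions.",
    "legal": ("Make sure you comply with the Fair Lending Act and ECOA "
              "in making this decision."),
}


def build_prompt(application, mitigation="none"):
    """Assemble a prompt for one application under one mitigation condition."""
    if mitigation not in MITIGATIONS:
        raise ValueError(f"unknown mitigation: {mitigation!r}")
    # HYPOTHETICAL base instruction; the article does not give the wording.
    lines = [
        "Given the following loan application, recommend approve or deny "
        "and suggest an interest rate.",
    ]
    instruction = MITIGATIONS[mitigation]
    if instruction:
        lines.append(instruction)
    lines.extend(f"{field}: {value}" for field, value in application.items())
    return "\n".join(lines)


# Illustrative application record (field names and values are assumptions).
application = {"credit_score": 715, "dti": 0.36, "ltv": 0.80, "race": "Black"}
for name in MITIGATIONS:
    print(f"--- {name} ---")
    print(build_prompt(application, name))
```

Holding the application fixed and varying only the added instruction is what makes it possible to attribute a change in the model's recommendation to the mitigation text itself.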
The findings come at a crucial time, as the financial industry and other sectors are ramping up efforts to make their operations more efficient using AI. According to Bowen, while it’s unlikely any major firms are making loan decisions based purely on recommendations from an LLM, AI is playing a large and growing role in many facets of financial services.
He discussed some of the already-widespread uses, including investment advising and customer service, in an episode of the ilLUminate Podcast from the College of Business.
“Documenting and understanding biases is crucial for the development of fair and effective AI tools in financial decision-making, and ultimately to ensuring they do not reinforce existing inequalities,” Bowen said. “Thus, it is critical for lenders and regulators to develop best practices to proactively assess the fairness of LLMs and evaluate methods to mitigate biases.”
The study is currently available as a working paper. Authors include Lehigh researchers McKay Price, professor and chair of finance, and Ke Yang, associate professor of finance; and Luke Stein, assistant professor of finance at Babson College.
Story by Dan Armstrong