Nature.
Welcome back to The Nature Podcast. This week, how a ratings change could reduce racial discrimination, and the institutions with the most retractions. I'm Shamini Bundell. And I'm Nick Petrić Howe. Before we get to the show proper, I just wanted to welcome back the one and only Shamini Bundell. Shamini! I have returned. I've graced you with my presence once again.
I was a bit worried that you might not be choosing enough obscure evolutionary topics for the podcast, so I thought I'd better just come and, you know, get a handle on that. Well, I'm looking forward to doing more obscure evolutionary topics. They are your staple. But we've got many other things going on in the show this week. For example, first up, there's a paper on how changing from a conventional five-star rating scale to a thumbs-up or thumbs-down rating
could help reduce racial discrimination. We actually see this equalising of what workers were receiving regardless of demographics, and now workers were getting, on average, 95% upvotes regardless of whether they were likely to be perceived as white versus non-white. This is Tristan Botelho, one of the authors of the new Nature paper.
Using data from 70,000 ratings, they showed both that racial discrimination existed on an online platform when it used five-star ratings, and also that once the rating system changed to thumbs up or thumbs down or upvote-downvote, then that discrimination disappeared. People would receive equivalent ratings regardless of race.
The platform connected customers to qualified workers who would perform various home service jobs, like repairing a dishwasher or fixing a sink.
Afterwards, people would have the opportunity to rate the workers. And when Tristan had been speaking to people who run the platform as a potential data source for his research, he heard something that caught his attention. The CEO said, oh, one change you might be interested in was exactly this. We used to use five stars. Then we transitioned to upvoting and downvoting.
So that piqued my interest and I asked, okay, tell me more about that. And you might hope that these kinds of big companies making all these changes have this very data-driven process, like, we thought about this for months. But he was actually just using a popular online platform one night, saw that they were doing upvoting and downvoting, and just the next day at their meeting said to the CTO, we should change it. Then it just happened. This gave Tristan and his collaborators a semi-natural experiment which they could investigate.
They found that when the platform was using a five-star rating, there was a difference in how people were rated based on their perceived race. For example, black workers were rated on average 4.69 stars, whereas white workers were rated on average 4.79 stars.
Overall, workers who were categorised as non-white by Tristan and his colleagues had a rating of 4.72 versus the 4.79 that white workers had. Although this is a small difference, what's interesting is that many platforms actually use these ratings for many different purposes. One could be to allocate workers to new jobs, prioritising highly rated workers. Other platforms, such as the one we study, they also use the rating to figure out how much money the worker is going to get paid.
In this case, that difference in what they were getting paid meant that the workers categorised as non-white received about 9 cents less per dollar compared to the white workers. This disparity between ratings is not necessarily surprising. Past studies have shown that people rate differently on the basis of race.
For example, a study by a think tank showed that when people evaluated the work of a fictitious attorney, they were more critical of the writing when a photo of a black associate was shown versus a white associate. But five-star rating systems are everywhere. Think about if you've ever ordered a taxi through an app or got a food delivery, which means that this discrimination could also be everywhere.
So a simple change to the ratings, moving to a thumbs up or down, could have a big effect. We're bullish on the idea that it could work well, but we're even more motivated by the fact that this is actually a pretty simple thing for firms to do. I think what we're finding generalizes in a very kind of logical way.
And more importantly, the cost of testing this out is very, very low. So many different organizations of different types are using evaluation systems that they could easily kind of roll out even a test period to a subset of customers where they could see whether or not the 0-1 binary system works for them. But Tristan and his colleagues were interested to figure out why this simple change to the thumbs up or down could have such an impact on how people were rated.
So they did some follow-up studies. They thought that the five-star scale may be allowing people's subtle preferences to come into play in a way that a simple thumbs up or thumbs down wouldn't.
They figured that the binary scale makes the person focus just on whether the experience was good or bad. We were even able to run an experiment where we gave individuals a five-star scale but told them to really focus on good versus bad. And we found that when we really focused the individual on this good versus bad distinction, we really saw the differences go away.
Through this and other follow-up studies, Tristan and the team determined that essentially if people had the five-star scale, they could hesitate over giving someone the highest rating. They may not necessarily consider themselves to hold racist beliefs, but still hold biases that could hold them back from giving the perfect rating. The thumbs up or down focused the raters just on the good versus bad distinction.
The effect was also biggest in people who believe that black people face little employment discrimination in the US. In other words, those who held what are known as modern racist beliefs appeared to be driving the difference in ratings in the five-star system. But in the thumbs-up or thumbs-down approach, they rated people equally regardless of race.
Now, it's worth saying that this change to the rating system didn't change anyone's beliefs or tackle racism in any tangible sense, but it did prevent discrimination through these ratings, at least on this platform. And for Lauren Rivera, a researcher of workplace inequalities who's written a News & Views article on this work...
This study is a breakthrough. This is a change that is super fast and super cheap to implement. And so it's really efficient. It does not address the underlying issue of sources of structural racism or racism in our society, but it is one way to begin to enhance opportunity.
She also thought that this is a change that could be implemented even in the context of the current US administration, which has shown hostility towards work on diversity, equity and inclusion. One of the things that's so beautiful about this type of change is that this is a change that is possible. On its face, it's very neutral, right? We're just changing performance ratings. It doesn't require any training. It doesn't require any messaging, necessarily, that at the present political moment in certain places would be challenging. Lauren did have some questions about the work, though.
For example, in this case, people are being rated on things like whether or not they fixed a sink, something that is either done or not. So a thumbs up could simply be that the job was completed.
But for other areas where rating is a bit more complex, Lauren wonders what would happen there. So I think about it like online teletherapy providers for mental or physical health or, you know, online tutoring services, online educational services. And I do have a question as to what would happen here because performance isn't as clear cut.
Tristan suggests that maybe these ratings could be done in a granular way, allowing people to rate thumbs up or down on different aspects of the work, which could possibly help address this. But Lauren also wondered what would happen to the workers' motivations. If they're just getting a thumbs up or thumbs down, would they just phone it in, giving the bare minimum to receive the positive rating?
Tristan suspects that this may not be the case. Oftentimes, I think workers understand this idea that the five is expected. And if anything, they would see a four, a three, a two or a one as quite bad.
But I think the upvote-downvote would actually make that risk more salient to them. Like, if I don't do a really good job, I could get a zero, I could get a downvote, and that would be catastrophic. So I actually think it could motivate workers to work even harder, but it's definitely outside of the data, and something we would have to collect more data to understand. But I can say, at least in the field, the firm hasn't experienced that whatsoever.
This work potentially identifies an easy change that firms could make to reduce racial discrimination of their workers, at least on these online platforms, even if it isn't a panacea for racism. But this is only one part of the process. So in the future, Tristan is interested to look at the start of the process, where people are hired in the first place.
As here, there could be more pernicious effects. For example, people could cancel a job based on the race of the worker. So I would love to take that step back at a similar large scale and understand, OK, so how do we think about designing even that initial information that affects that first evaluation stage, which is, did the customer actually choose to go through with the job or not?
That was Tristan Botelho from Yale University in the US. You also heard from Lauren Rivera from the Kellogg School of Management, also in the US. For more on that story, check out the show notes for a link to Tristan's paper and Lauren's News & Views article. Coming up, a Nature investigation has uncovered where the most retractions come from. Right now though, it's the research highlights with Dan Fox.
If you're ever serving dinner for a cockatoo, new research can give you a useful taste tip: try dunking their food in soy yogurt. In 2022, researchers saw three Goffin's cockatoos soaking dry biscuits in water before eating them.
Now, the researchers have followed up that observation with more experiments, offering 18 of these birds a bowl of cooked food, including noodles, carrots and cauliflower, along with water and blueberry-flavored and plain soy yogurt.
Over 14 sessions, 9 out of 18 cockatoos dunked their food in yogurt, showing a clear preference for the blueberry-flavored yogurt over the plain and for the combination of yogurt and noodles over other pairings. None of the cockatoos dipped their food in water.
Japanese macaques are the only other non-human animals reported to flavour their food, but their behaviour hasn't been studied with controlled experiments, making this study the first experimental evidence for food flavouring in animals, according to the authors. You can tuck into that research in Current Biology.
Historical documents have revealed that in the 16th century, the people of Transylvania lived through climate extremes that their Western European neighbours avoided. The period known as the Little Ice Age was a time of climatic cooling that caused crop failures and famine across much of Europe. But evidence is patchy, so scientists have had to cobble together the story of how climate affected society from piecemeal records.
Now, a team searching through Transylvanian chronicles have found evidence of some 40 summers during the 16th century that were noted as being particularly warm for the region, with unusual bouts of heat and droughts between 1527 and 1544, a time when much of Western Europe was experiencing less extreme conditions.
Later in the century, Transylvania also experienced abnormally heavy rainfall resulting in the devastating floods of the 1590s. This research could help inform society how to respond to today's climate change. You don't have to search through historical records to find that research. It's in Frontiers in Climate.
Next up, Noah Baker has been catching up with Features Editor Richard Van Noorden about one of those topics that never fails to perk up our ears here at The Nature Podcast: retractions.
Retractions are a key and important part of the world of academic publishing. Papers can be found after publication to contain mistakes, oversights or administrative errors, but they can also be shown to be fraudulent, false or otherwise linked to misconduct. And no matter what the reason is, a robust procedure to retract those papers is vital. And over time, the rates of retractions around the world have been growing. So-called paper mills are churning out more and more sham papers and dishonest predatory journals yet further distort that picture.
Now in a new analysis, our very own Richard Van Noorden has been looking at some of the latest data on retractions to see what more he can glean, in particular identifying retraction hotspots. Richard, thank you so much for joining us. Hi Noah, pleasure to be here. So you've written this feature, which is essentially an analysis of a bunch of new data. I suppose to start with, can you tell me where that data's come from and what you're aiming to do with it? Yeah, so in the last few years a number of private firms have been launching research integrity software.
And that software tries to warn publishers and institutions, when a paper comes in, that it might be a bit dodgy. And the way they do that is to look at the characteristics of who wrote that paper, or what that paper is citing, or perhaps even some of the content in the paper. And a key part of that is data sets of retractions. Now, as a result of all that, there are now quite a lot of data sets of retracted papers, which are bigger even than the public data sets like that of Retraction Watch, which some of our listeners might be aware of. So I contacted...
these firms and said, well, can they help me do something which I've always wondered about, which is, okay, which institutions, which universities have the highest retraction rates? Because that would indicate perhaps that
the environment at that research institution is not conducive to reliable science. And maybe that's something that could be highlighted. But it turns out this is an incredibly difficult thing to do. It surprised me when I first read your feature that it's difficult to even decide what we mean by retractions here in the first place. I thought that seemed like a very simple metric, you know, there's a paper that got retracted. But how does one quantify retractions in these datasets? Well, the problem is the retraction data is really messy,
which is really the fault of publishers for not being clear. But if you look at various online sources of data, like PubMed, like Crossref, these are all massive repositories of data about papers. They actually differ from each other in what they say is retracted.
But the really big messiness, when it comes to institutions or universities, is how you can be sure that you've got all the author affiliations mapped to the right institutions, so that you can count how many retractions they've got. And in what I was looking at from these firms, there were quite a lot of errors, which I think is inevitable in a large data set, but they broadly agreed with each other, which is encouraging. Okay, so we're going to get into a little bit more about how you did that analysis. But I think the first thing that I wanted to hear about is: where are your hotspots? Essentially, I don't think anyone's going to be surprised to hear it's hospitals in China. And that's well known to be because hospitals in China were encouraging physicians to put out research papers. But a lot of them were very busy doing their medical work, and so it's known that they turned to paper mills in order to get the papers they needed. And then those got retracted as sleuths pointed out the problems.
And it's really interesting that hospitals in China have far greater retraction rates than universities in China. And Jining First People's Hospital is the one that came up with the highest retraction rate, of about 5%. So 5% of all the articles that it's ever put out have now been retracted, which is extraordinarily high, because the global average is like 0.1%. So it's enormous. But it's not just China. Also coming up in our lists were places in India,
in Ethiopia, in Pakistan, in Saudi Arabia. Saudi Arabia in particular, when you look at the last four or five years, universities in Saudi Arabia are coming sort of top of the list of institutes with the most retractions. King Saud University, Taif University, King Abdulaziz University, they're all coming up with hundreds of retractions.
Almost all of these retractions that I'm talking about here are misconduct-related. So yes, there can be retractions that are a result of honest error, but Retraction Watch's work has shown that those are a small minority compared to the retractions that are related to misconduct. But even when you're talking about a country like China, Saudi Arabia or India, what I thought was really interesting is that there's a scatter across the institutions.
So you can have some institutions in those places that have very low retraction rates and some that have very high retraction rates. And I think that's really interesting because it's suggesting there's something about the environment in the institutes with the high retraction rates that might be different
from those with the low retraction rates. And I suppose the other big question to ask here is it could be the environment of these institutions, but also we've seen many examples of individual researchers having extraordinarily high retraction rates, which could potentially, if you have one researcher at one institution, could tarnish the whole institution. To what extent is this individuals and to what extent is this institutions? Yeah, I did check that. And in most cases, the retractions are spread across a number of individuals.
So it seems like something's going on at the institution. Now, when I say a number, it could still be, you know, 20 or 30. It's not always the case. So at Gaza University in Pakistan, it was something like four individuals who were responsible for a quarter of all their retractions. So sometimes it was more connected to individuals and sometimes not. But in general, it was spread over quite a number of different author names. And we should also add here that these numbers of retractions can also be thought of as a kind of proxy for how much attention is put onto these institutions.
And this is also true of particular fields. So if you were to look at face value at a lot of the data, you might think that anesthesiology, for example, was extraordinarily rife with misconduct.
But in fact, that is to some extent an artifact of the fact that this particular community has been paying a great deal of attention to this and has been trying to make a difference here. Exactly, yeah. So the journal editors in anesthesiology have really worked hard to clean up their field. And as a result, many of the leading individuals by numbers of retractions are all anesthesiologists. Although it turns out it's apparently quite easy to make up data in that field, I don't know. But yeah, that's a perfect example. But what I also think is that as the retractions data grows, this will become more and more useful information. And this is a kind of starter way to do that, to look at the institutions. I've heard people argue against this and say, well, if we start doing league tables of institutions, then this will discourage institutions from getting work retracted.
But I think a key point there is that it's not usually the universities themselves that are doing the retracting. It's the journals and the publishers that are doing the retracting. So the institutions are not really in control of the process. Absolutely. And you mentioned that various sleuths you've been speaking to are under the impression that there should be
vastly more retractions made than currently are made. Is that a trend you expect to see going forward based on this analysis and based on your reporting? Yeah, I mean, it depends on publishers and journals' willingness to retract. And so that depends on how much heat sleuths and journalists can put on the publishers to get them to retract. Because there are hundreds of thousands of papers over which there are serious questions. Many of them, they look like paper mills.
they look like fakes. But nothing is going to be done about it unless publishers make decisions. And maybe not all of these are fakes. Essentially, it's got to be the journal editor's judgment that they no longer stand by the reliability of the paper. And that leaves quite a lot of leeway, because if you have doubts, and especially if the authors don't get back to you and don't allay your concerns, that is enough grounds for retraction. But it's a lot of work if you've got to do that for every single paper. And you need to be very careful, because by retracting, you're essentially saying these authors have produced untrustworthy work, and you're essentially degrading the reputation of the authors. So there are potentially legal concerns there. And that's why retractions take so long for journals to figure out. Absolutely. Richard, thank you so much. I'm sure there's going to be thousands more of these to follow in the next year, let alone the next couple of years, but we will be watching it very closely at Nature.
Noah Baker speaking to Nature's Richard Van Noorden there. For more on that story, check out the link to Richard's feature in the show notes. Finally on the show, it's time for the briefing chat, where we discuss a couple of articles that have been highlighted in the Nature Briefing. Shamini, what have you been reading about this week? Well, there's been a couple of articles in Nature about more impacts, or ongoing impacts, of the Trump administration on scientists in the US. So one of them that I've been reading about is the Environmental Protection Agency.
So that's regulating pollution, and it's about protecting human health and the environment. And it's been one of those agencies that Republican politicians have been worried about, saying that it hampers the US economy by having excessively strict regulations against pollution and things like that. And this was also a part of the government that Trump targeted in his previous administration. But what's he doing with it this time around? So this is all part of...
bigger plans to downsize and defund government agencies in general, trying to make, they say, a more efficient and effective federal government.
So one of the things is that there are staff members who are still in their probationary period, meaning they've usually only just started their job, within the past year, and can therefore be fired at any time. So there was an email that went around last week to a thousand employees involved with regulating air pollution, notifying them that they might be fired, but not everyone has actually heard yet. So some researchers who've spoken to Nature have said they're
quite scared. One of them said, I'm scared to open my computer every morning. No, I can certainly see how that would generate anxiety for people. And one of the Trump administration's actions has been to target research that includes DEI. These are programs that deal with diversity, equity and inclusion.
Is this part of that strategy? So this is general downsizing. So for example, for every four employees that leave, the agency heads can only hire one employee to replace them. So there's this massive downsizing. Then there's also, yes, the cancellation of all these DEI programs. Staff involved in those at the EPA have been placed on administrative leave.
So that's programs, for example, environmental justice, protecting communities that are vulnerable to pollution and climate change. That's definitely been a big impact at the EPA as well. And then the other article that I read is about the DEI purge.
at NASA. And it's particularly significant there because NASA has a long history of working towards inclusivity. There was a lot going on there that was related to DEI that's now being rolled back. And Nature reporters spoke to researchers at NASA, and other people outside NASA working in the space field, who feel quite betrayed by these changes after working for so long towards something, a lot of which is now being cancelled or given up. People are even talking about, in the NASA buildings, people covering up posters on the wall, like pride flags or pictures celebrating women in science, those kinds of things being taken down. And employees aren't supposed to put pronouns in their email signatures anymore, for example. It certainly sounds like this is quite a big change then to the agency. So what have been the reactions of the people working there?
So similarly, this has provoked a lot of emotional reactions from the people who Nature's spoken to. One person said it feels like a betrayal. It's inefficient, it's wasteful, and it's also just messed up. Another anonymous NASA scientist said, I get a sinking feeling in my stomach when I have to check my work email. Every time I reload it, it's like, oh God, will there be some new heinous missive in there?
But NASA employees are also talking about basically buckling down and sort of getting on with their work regardless. A senior scientist Nature spoke to said, we believe in the mission and we know that our work is important. We know that it matters for the nation. So to some extent, they're just getting on with it.
Well, this is a story that Nature will be keeping a keen eye on as we go forward. And speaking of things Nature is keeping an eye on, I've got a related story that I was reading in Nature. It's also to do with the Trump administration, because last Thursday Robert F. Kennedy Jr. became a powerful force in US science: he is now the Secretary of Health and Human Services there, which has broad oversight over things like the NIH, the National Institutes of Health, which is the biggest biomedical funder in the world, along with the CDC, the Centers for Disease Control and Prevention, and the Food and Drug Administration, the FDA. And so what's he aiming to do with this new position?
Well, the first thing to say about RFK Jr. is he's a vaccine sceptic. And so this has caused a lot of consternation from researchers because he now has this broad oversight, as I mentioned.
His main aims are to make America healthy again. That's his pledge. And he's promised a focus on things like food and pollution. However, that is at the cost of some other things. He wants to downsize the work on infectious diseases, for example. Is that just a matter of limited money and where they want to put the spending? Or has he sort of...
So as you mentioned, this is part of a broader effort to downsize the government. So Trump's close advisor, Elon Musk, has pledged to cut the federal budget by at least $2 trillion. So part of it is that, but it's also part of what RFK Jr. sees as his priorities.
In his eyes, he thinks that chronic conditions such as obesity and cancer, asthma and some other things have received less attention than infectious diseases, but they are responsible for more healthcare costs.
However, data from the NIH showed that actually cancer alone receives more federal funding than all infectious diseases combined. And many researchers have pointed out that infectious diseases and chronic conditions are two parts of the same thing. This focus on one rather than the other creates a false dichotomy.
Because, for example, many infectious diseases can lead to chronic conditions. Think of things like human papillomavirus, HPV, causing cervical cancer. So researchers are sort of concerned that this isn't the best news for human health in America. Yeah, I mean, there is concern because of some of the claims RFK Jr. has made about health. I've mentioned that he was a vaccine sceptic. He has...
spread misinformation about vaccines, such as the widely debunked myth that vaccines cause autism. And researchers are concerned that he doesn't have a good track record of scientific inquiry. One researcher who spoke to Nature for this article said, it's hard to know what he would do. And one thing that is quite concerning, especially with him wanting to downsize work on infectious diseases, is the continuing...
outbreak of bird flu, which so far has made at least 68 people in the United States ill since the start of 2024. And obviously researchers are keeping a keen eye on it in case it becomes more widespread and could potentially cause disease more broadly around the world. So this is a new appointment. So I guess we're just waiting to see what kind of actual impacts this is going to have. Yes. And Kennedy has been quite hard to get a handle on, because he has made some claims, such as false claims about vaccines, but then in a Senate hearing he said that he supports vaccines. So it's just a bit hard to know exactly what he will do. And as I mentioned, some of the things that he's said just aren't really backed by the science. For example, he said in that same hearing that scientists, quote, know that obesity is caused by, quote, an environmental toxin.
And he asked why researchers haven't dedicated themselves to finding and eliminating it. And one obesity researcher who spoke to Nature pointed out that obesity is a complex condition driven by environment, genetics, development and behavior. And this assertion by him reflects limited knowledge of that complexity. So that'll be another...
ongoing story that's going to continue to impact scientists. So I'm sure we'll be hearing more about it. Well, thanks for that, Nick. And listeners, if you want to read more about these stories and find out where you can sign up to the Nature Briefing and get more stories like this, you can check the show notes and we'll put some links there. That's all for this week. We'll be back next week with more news from the world of science. In the meantime, if you want to keep in touch, you can follow us on Bluesky or X, or you can even send us an email to podcast at nature.com.
I'm Nick Petrić Howe. And I'm Shamini Bundell. Thanks for listening.