
Episode 295: Statistics for the Boards

2024/11/18

Anesthesia and Critical Care Reviews and Commentary (ACCRAC) Podcast

Chapters
This chapter introduces the five classifications of data: interval (discrete and continuous) and categorical (dichotomous, ordinal, and nominal). It explains the differences and which statistical techniques to use for each.
  • Two types of interval variables and three types of categorical variables exist.
  • Interval data has equal distance between intervals; categorical data is divided into discrete categories.
  • T-test and ANOVA are used for interval data; chi-square test is used for categorical data.

Transcript


Hello and welcome back to ACCRAC. I'm Jed Wolpaw, and today we're going to tackle a difficult but highly tested topic, and that is statistics. This is something that appears on in-training exams and board exams.

And that, you know, really is something you can do well on. These should be kind of guaranteed questions you get right because they're not that complicated, but you do have to have that underlying understanding so that you have those key testable answers.

The other thing is that having a basic understanding of this, and believe me, I am no statistical expert, but having an understanding of the basics really will help you evaluate studies and literature as you go through your career. So I think this is worthwhile, and we're going to try to tackle it here.

All right, let's talk about the kind of data that is collected. This is tested a lot. And the question is, how do you classify data collected in an experiment? There are five classifications, and you have to know which of these classifications the data goes in in order to choose the correct statistical technique for analyzing them. That's another thing that comes up a lot on tests.

So the way to think about this is there are two kinds of interval variables and three kinds of categorical variables. So you want to divide in your head between interval and categorical, and then let's talk about each of those. So for interval data, there is an equal distance between intervals. So for example, 5, 10, 15, 20, 25, 30, right? That would be interval data.

You might think of this as, for example, the number of living children that someone has. That is either going to be 1, 2, 3, 4, etc. There is an equal distance between each of those intervals. Now, if the data is recorded as integer only, like those living children, there's no 1.3, right? It's just all integers, 1 or 2 or 3 or 4, etc., then that's called discrete data.

That's discrete interval data. The other form of interval data is continuous, and that can have decimals. So, for example, let's say you were measuring the temperature of a patient. It could be 35.9, 36.1, 36.3.

So that is still interval data because those are intervals, but it's continuous. You can define those intervals however you want: by tenths, by hundredths, depending on how far you want to go. But still, there is an equal distance between each one. Let's say you're going to tenths, meaning you are measuring at 36.1, then 36.4.

Between each possible measurement, 36.1, 36.2, 36.3, there is an equal distance. But because it's not just integers, it is continuous and not discrete. All right, so interval data can be either discrete or continuous. It's discrete if it is divided only into integers. It is continuous if it can have decimals as well.

The good news is that both of these are going to use the same statistical techniques, so you don't have to remember different statistical techniques for each one. Now, we'll talk more about the statistical tests later, but I just want you to have all this kind of in your mind up front. So, again, we'll get into the details later, but remember that for interval data, whether it's discrete or continuous, if you have just two samples...

So before and after an intervention, you have two sets of data that you've collected. Then you are going to use a t-test. Now it would either be paired or unpaired, and we'll talk about that later, but you're going to use the t-test.

That's assuming a normal distribution. Okay. If you have multiple samples, so you're collecting at multiple times, so maybe you're collecting, after an intervention, five different heart rate measurements at different times after the intervention, so you're comparing more than just two, then you would use ANOVA, analysis of variance. Okay.

If the distributions are not normal, then you're going to use non-parametric tests. Those are much less commonly tested. So for our purposes, remember that for interval data, whether it's discrete or continuous, you will use a t-test.

either paired or unpaired, a t-test for two samples, or an ANOVA test for multiple samples. All right? So that is what I want you to remember for interval data. For categorical data, what is categorical data? So categorical data is, as it sounds, that you are placing things into two or more discrete categories. There are three kinds of categorical data. There's dichotomous, so that's like mortality. There's only two options, dead or alive.

And then there's ordinal data, which has three or more categories, but it's categories that can be ranked, ordered relative to each other. So for example, ASA class is a great example of this. It doesn't mean there has to be the same difference between the categories. So

For example, there is not necessarily the same difference between an ASA 1 moving to an ASA 2 and an ASA 3 moving to an ASA 4. So there's no rule that the individual categories have to have the same distance between them or the same change from one to the other. But clearly we know the order that those five go in. They go from 1 to 2 to 3 to 4 to 5. So ordinal can be easily placed in order.

That's a good way to remember it. Ordinal data are categories that can be placed in a logical order. And then the final is nominal. So nominal has categories but with no logical ordering, like, for example, eye color. So there are more than two eye colors. There's brown, blue, green, hazel, etc.,

But there's no obvious order to put them in. So those are nominal. So again, no known order. So three kinds of categorical data. Dichotomous, which is just two categories like dead or alive. Ordinal, which has three or more categories that can be ranked in an obvious order. And

then nominal, which is three or more categories that have no logical order. A common mistake is to treat ordinal data as if it were interval data because both have what seem like discrete intervals. But remember, ordinal data, while it does have a logical order, does not necessarily have the same distance or the same change between each category, whereas

interval data does. So that's the difference. And you have to be careful because if you label it as the wrong kind of data, you may choose the wrong statistical technique. Now, what I want you to remember, and again, we'll talk about this more, but for categorical data, you are going to use the chi-squared test almost always. And if you just remember that, you will have a good chance of getting the question correct.
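For readers who like to see that rule of thumb written down, here is a tiny Python sketch that simply encodes the episode's heuristic; the function name and the returned strings are purely illustrative, not any standard library.

    # Hypothetical helper encoding the episode's rule of thumb for choosing a test.
    def pick_test(data_type, n_groups, normal=True):
        if data_type == "interval":
            if not normal:
                return "non-parametric test (covered later in the episode)"
            return "t-test (paired or unpaired)" if n_groups == 2 else "ANOVA"
        if data_type == "categorical":
            return "chi-square test (Fisher's exact also works for two samples)"
        raise ValueError("data_type should be 'interval' or 'categorical'")

    print(pick_test("interval", 2))     # t-test (paired or unpaired)
    print(pick_test("interval", 5))     # ANOVA
    print(pick_test("categorical", 2))  # chi-square test (Fisher's exact also works for two samples)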

So the best answer for any question that is giving you categorical data and asking what kind of statistical test you want to use is the chi-square test. And that kind of sounds similar, right? Categorical chi, both start with C, cat chi. So that's how you want to remember it.

For categorical data, any kind of categorical data that only has two samples, you can also use the Fisher's exact test. So they might trip you up a little bit and give you two-sample categorical data and then ask you which is the correct test, and maybe they don't give you chi-square as an answer choice. They only give you Fisher's exact test.

So that would be right. For multiple samples, you use chi-square, and even for only two samples, you can use chi-square. So keep that in mind. Again, we'll review those tests again later. The kinds of questions that you're going to see will often give you an experiment. They'll tell you what they were looking at, what the results were, and they'll ask you what the best

type of test to evaluate the results is. And so often these questions are getting at categorical data. So they want you to be able to identify categorical data. They'll give you, for example, a situation where they're giving a drug and then testing for mortality.

So dead or alive, those are two discrete categories. Or they're giving a drug and looking at post-op nausea and vomiting: did they have post-op nausea and vomiting, yes or no? That would be another kind of categorical data. Or they're looking at was there a heart rate increase, yes or no. That's different than measuring the heart rate increase.

So if you're measuring the heart rate at multiple times and looking at was it 72, 73, 74, 75, that's interval. But if it's just was there an increase, yes or no, that's categorical.
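As a concrete illustration, here is a minimal SciPy sketch with a made-up two-by-two table (drug versus placebo, died versus survived); the counts are invented purely to show the function calls.

    from scipy.stats import chi2_contingency, fisher_exact

    #            died  survived
    table = [[12, 88],    # drug
             [25, 75]]    # placebo

    chi2, p_chi2, dof, expected = chi2_contingency(table)
    odds_ratio, p_fisher = fisher_exact(table)

    print(f"chi-square p = {p_chi2:.3f}")      # categorical data -> chi-square
    print(f"Fisher exact p = {p_fisher:.3f}")  # also valid for a two-sample 2x2 table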

All right, let's touch a little bit on descriptive statistics. So this is trying to describe a population. Now, normally you can't describe an entire population. So you take a sample that you hope is representative of the population you want to describe, and you make inferences about the population based on your sample.

The most common descriptive statistics for interval data are going to be about central location. So these are mean, median, and mode, for example, and about variability. So how far does it vary from that central description? Now, the idea is really an assumption usually of normality. So the normal function, also known as the Gaussian function, uses population mean and population variance,

And when that's plotted, it produces a bell-shaped curve. And most biologic data obey a relatively normal distribution. There is a theory or a theorem in statistics called the central limit theorem, which allows the assumption of normality, at least for certain purposes, even if the population is not normally distributed.

There are different statistical tests for populations and for data that is not even close to normally distributed, and we'll talk about that.

But the three most common summary statistics of central location for interval data are arithmetic mean, median, and mode. So what are those? The mean is the average. So you're going to add up all of your data numbers and then divide by the number of samples you have. So very simply, let's say that you have three...

And those three measurements are 2, 4, and 6. You're going to add those up: 2 plus 4 plus 6 is going to be 12. And then you divide by 3 because there were three samples, and that will give you your mean, which is 12 divided by 3, which is 4. So the mean there is 4. The median is the sample's

central number. So if you have nine numbers, then the fifth number, whatever it is, is the median because there are four numbers on either side.

And the mode is the most common number. So if you have a data set that is 1, 2, 3, 3, 4, 4, 4, 5, 6, 7, 7, 8, 9, then 4 is the mode because there were three 4s and nothing else appeared three times. And so the number that appears the most times in your data set is the mode. You will almost always see either a mean or a median.
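Here are those worked numbers in Python's built-in statistics module, just to pin the definitions down.

    import statistics

    print(statistics.mean([2, 4, 6]))                      # (2 + 4 + 6) / 3 = 4
    print(statistics.median([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # middle of nine numbers: 5
    print(statistics.mode([1, 2, 3, 3, 4, 4, 4, 5, 6, 7, 7, 8, 9]))  # 4 appears most often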

And you want to pay attention because they're not the same thing. One way in which median is really nice and is commonly used, and this can also come up on test questions, is that if you have a sample with numbers that are all fairly close together but with a couple of really extreme outliers...

Those outliers are going to really throw off the mean because when you add them all up, so imagine that you have a data set that's 1, 2, 3, 4, and then an outlier of 110.

Right, your mean is going to be way skewed by that 110. But your median will not be, because it's still that middle number. It's not going to be way up near 110. It's not pulled towards the extreme outliers the same way that the mean is. And so that's where a median is nice.

You want to think about using median when you have a relatively tight group with some extreme outliers. The other thing we look at is spread or variability. So all sets of interval data will have variability unless they all have the exact same number. So a set that is just five threes would obviously not have any variability.

most sets will have variability. And the way to look at this, and I'm not going to get into the specific math of how you arrive at the variability, but just know that what you're looking at is how far each value deviates from the mean, and that is going to give you variance. And the sample variance, again, is used to estimate the population variance. If you take the square root of variance, that gets you the standard deviation.
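A quick sketch of that variance-to-standard-deviation relationship, using the same small sample from the mean example above.

    import statistics

    data = [2, 4, 6]
    var = statistics.variance(data)  # sample variance: squared deviations from the mean, / (n - 1)
    sd = statistics.stdev(data)      # standard deviation: the square root of that variance
    print(var, sd)                   # 4 and 2.0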

And so if you have a population from which you take a sample and it's roughly symmetric, roughly normal variation, then one standard deviation on either side of the mean is going to encompass 68% of your values. Two standard deviations will encompass 95% of your values, and three standard deviations will encompass about 99% of your values. This is highly tested. They might, for example, tell you the mean,

tell you the standard deviation, and then say which of the following is true. And they might say something like, let's say they give you that the mean is 10 and the standard deviation is 2. And one of the answer choices might be that between 8 and 12,

68% of the values will lie. That would be true, because one standard deviation above and one standard deviation below the mean, so if the mean is 10 and the standard deviation is 2, then going up to 12 and down to 8, between one standard deviation on either side, will encompass 68% of the values.

Or they might take it to the next standard deviation and say the mean is 10, the standard deviation is 2. Between 6 and 14, what percent of the values lie? And the answer there would be 95% because now they've given you two standard deviations on each side of the mean. So that's a really important concept to have understood.
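Those percentages are easy to check by simulation; here is a small NumPy sketch using the mean of 10 and standard deviation of 2 from the example.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=10, scale=2, size=100_000)  # roughly normal "population"

    print(np.mean((x > 8) & (x < 12)))   # ~0.68, within one SD of the mean
    print(np.mean((x > 6) & (x < 14)))   # ~0.95, within two SDs
    print(np.mean((x > 4) & (x < 16)))   # ~0.997, within three SDs (often rounded to 99%)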

All right, let's move to looking at types of research design. So there are longitudinal and cross-sectional studies. Longitudinal studies obviously look at change over time, and cross-sectional studies just look at one certain point in time. If we look at longitudinal, that can be divided into prospective studies, which are cohort studies, or retrospective, which are case control studies.

Now, this can be a little tricky to remember because when you're doing a case control study, what you're doing is identifying an outcome and then finding one cohort that has experienced that outcome and another cohort that has not. Now, you hear me using the word cohorts. That's why I'm saying this is tricky, because you're still identifying cohorts in a case control study, but it's not a cohort study. A case control study finds those cohorts and then works backwards to the exposures.

So what you're doing is you're saying, all right, let me find an outcome. So for example, maybe perioperative MI. I'm going to look and find a group of people who all had an MI post-op. And then I'm going to find another group who I think is pretty similar but who didn't have an MI post-op. And then I'm going to go back and look at what happened that may have caused them to have it. Maybe specifically I'm going to look at their exposure to one specific drug.

Or I'm going to look at their exposure to different drugs. So I might, for example, say, did one group get rocuronium more often and the other group got vecuronium more often? And then I can try to draw inferences about whether one or the other of those drugs may be higher risk to cause post-op MI.

So that's a really good study for really rare outcomes because doing prospective work, which would be to say, all right, I'm going to do a cohort study. I'm going to identify a cohort of people and then follow them through, either give them a treatment or just look to see what treatments they get, and then see who develops a post-op MI.

Well, post-op MI is pretty rare, as is post-op death or intra-op death. So those things that are really rare, you'd have to have an enormous cohort to follow prospectively to have enough of the outcomes to actually be able to figure anything out. So for rare outcomes...

Case control retrospective studies are very useful because you identify the people who had the outcome, and then you look back and compare them to people who didn't have the outcome and see what the differences were. So that's a really key thing to remember. And again, this comes up on tests. Case control studies are good for rare outcomes. But...

It's retrospective. And so the problem with retrospective studies is that they come with the potential for a lot of confounding and bias. Prospective studies are better in the sense of eliminating some of that. And of course, randomized controlled trials are a form of prospective study that is the best. But again, the problem is if you're looking for a rare outcome, you need a huge, huge study. So again, prospective studies, you start with an input.

And you look for an output. So, for example, you might take patients who are having neurosurgery and you divide them into two groups. You give one fentanyl and you give one remifentanil. And you look for who has the post-op myocardial infarction. So that's a prospective study. The same version but in a case control retrospective study would be to look, as I said, at who had the post-op MIs.

Find a matched group who didn't have the post-op MIs, then look back and see who got remi and who got fentanyl, and then try to make inferences from that.

You have to remember, of course, that you don't establish causality with these trials, especially the retrospective trial. You can just draw inferences. You can generate hypotheses. You can establish an association, but that does not establish causation.

Prospective studies can be further divided into a deliberate intervention where the investigator is actually deciding what intervention is going to happen versus purely observational where the investigator just observes but doesn't dictate what is to happen.

So the difference would be an investigator saying, all right, one group is going to get sevo and one is going to get iso, and then we'll see what happens. Or just watching to see what happens and seeing who happens to get iso and who happens to get sevo, that would be observational. Observational studies may show differences among groups, but whether it's due to the treatments that they were getting or differences among patients is very difficult to know if it's not a randomized trial.

Deliberate interventions can then be divided into interventions that have concurrent controls and others that have historical controls. And obviously, it's much better if you can to have concurrent controls. And a randomized controlled trial is an example of a longitudinal, prospective, deliberate intervention study with concurrent controls.

Randomization is really key if you want to try to eliminate the potential for bias and confounding, but you want to be careful because selection bias can still happen if the research personnel...

know which group the next person will be assigned to, because that can affect how they treat that person, how they arrange things, et cetera. And so you really need to not only randomize but conceal the randomization so that nobody knows. You have a computer, for example, that randomly does the assignments without the research personnel knowing whether the next person is going to go into arm one or arm two.

That's important to keep in mind if you're designing a randomized blinded study. And again, you want to blind not only after entrance to the study, but during that time of randomization. But ideally, everyone would be blinded: the patients, the researchers, the treating physicians. The more blinding there is, the more reliable the study. Let's talk about statistical hypotheses.

So you want to have two mutually exclusive statements about some parameter of the study population. For example, if you're looking at drug one and drug two and measuring which produces a higher creatinine after giving them, so you're looking at kidney damage from the drugs, statement one is drug one produces a higher creatinine than drug two. Statement two is that drug two produces a higher creatinine than drug one.

And the null hypothesis is that they are the same. And this can be confusing, this terminology of null hypothesis. And then the alternative hypothesis is that they are not the same. So this also comes up and can be tested. The "null" in null hypothesis refers to difference: there is no difference. The null hypothesis is always that there is no difference between the two drugs or the two interventions.

And then the alternative hypothesis is what most people think of as their hypothesis. So going into this trial, you're probably hypothesizing that there is a difference between these two drugs. So that's called the alternative hypothesis. And that is different from the null hypothesis. Stay with us. We'll be right back.

All right, and we're back. All right, let's talk about types of error. So if you incorrectly reject the null hypothesis...

Now, I hate this because it's like a double negative and it can be confusing. But remember, the null hypothesis is there is no difference. There is no difference. So if you incorrectly reject that, in other words, you say, yes, there is a difference when in fact there is not, that's a false positive. And a false positive is a way easier way to think about that because, again, we kind of understand that. False positive, meaning we said it's positive, but in fact it is not.

But the statistical way of thinking about that is wrongly rejecting the null hypothesis. That's called a type 1 or alpha error. And usually before you collect your data, you're going to select a value for alpha. Usually that's going to be 0.05. And what that means is that if there really is no difference, about one in 20 times there will be a false positive. And we kind of accept that. The probability of a type 1 error depends on your chosen level of significance, like the 0.05, and whether or not there is, in fact, a difference between the two conditions.

Now, if you fail to reject a false null hypothesis, in other words, you think there's no difference between the two things, but there really is a difference, that, of course, is a false negative because now you're saying, oh, these are the same, meaning we found a negative result here. But in fact, there was a difference that you missed. That's a false negative. It's also called type 2 or beta error. And when you hear the power of a test, the power of a test is 1 minus beta.

The probability of type 2 error depends on four things. First, as I mentioned, the size of alpha. A small alpha makes type 2 error more likely, which is why you don't just pick the smallest possible alpha you can.

Second, the more variability in the two populations being compared, the greater the chance of a type 2 error. Third, the number of subjects. So the more subjects you have in a study makes type 2 error less likely. And finally, and most importantly, the magnitude of the difference between the two conditions. So if there is really a difference, the bigger the difference, the lower the chance of a type 2 error.
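As a rough illustration of how those factors interact, here is a hedged sketch of a power calculation for an unpaired t-test using statsmodels; power is 1 minus beta, and the effect sizes and group sizes here are invented. The effect size is the standardized difference (the difference divided by the standard deviation), so more variability shows up as a smaller effect size.

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    # A bigger (standardized) difference or more subjects pushes power up,
    # i.e. pushes the chance of a type 2 error down; a smaller alpha pulls power back down.
    for effect, n in [(0.2, 30), (0.5, 30), (0.5, 100)]:
        power = analysis.power(effect_size=effect, nobs1=n, alpha=0.05, ratio=1.0)
        print(f"effect size {effect}, {n} per group: power = {power:.2f}")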

And that makes sense, of course. It's hard to think there's no difference when, in fact, there's a huge difference. It's hard to miss a huge difference. All right? So what you want to remember there is false positive, in other words, rejecting the null hypothesis when, in fact, you should not have. So saying there was a difference when, in fact, there was not, that's type 1 alpha error. And then the opposite is

accepting the null hypothesis, or not rejecting it when in fact you should have rejected it, is a false negative, type 2 or beta error. The p-value is something we see reported from studies all the time. And this is saying, for example, if the p-value most commonly given as significant is 0.05, then we're saying that there's a 95 percent or greater chance that in fact we have

correctly rejected the null hypothesis. So, for example, if you see a chart of result data and they put an asterisk by the data that is in fact

statistically significant, meaning that the p-value is less than 0.05, and they ask you, well, what does this really mean? So let's say that they tell you that, let's use our creatinine example, that the creatinine after giving one drug was 4, and the creatinine after giving another drug was 2, and they have an asterisk, and the asterisk means that the p is less than 0.05,

And they say, well, what does that mean? What are we saying to you with this data? And so the inference you can make there is that there's a 95 or greater percent chance that there is, in fact, a difference between the creatinine rise after these two drugs, that there's a 95 or greater percent chance that drug two causes a higher creatinine than drug one.

So that's the interpretation of the p-value, and you may see that come up in questions. Now, one thing you may hear is that that's a little bit of an oversimplification, and the reason is that

the traditional frequentist approach, the kind of thing we're used to seeing, is just giving us that p-value based on the data from the experiment. It doesn't look at prior knowledge. But Bayesian inference, so you may hear about Bayes' theorem or Bayesian analysis, Bayesian inference reports information not only

as a function of the observed data in the experiment, but also looks at prior knowledge. So let me give you an example. Let's say you start with a hypothesis that drug one has a

better mortality effect than drug two, but it's a real long shot. It's actually really unlikely that that's true, based on what we already know, maybe some prior studies; it's way more likely that those two drugs are the same. So let's just say it's a 19-to-1 long shot, and

then you do the study and find there is a difference with a p-value of 0.05. So if you use Bayesian inference and you're taking into account the fact that it was unlikely beforehand, you only actually now have an 11% chance of that actual result being true. In other words, of there really being a difference. Even though your p-value is 0.05, it's not a 95% chance, it's only an 11% chance. So that's if you use Bayesian thinking. I will say that that's

really unlikely to come up, that they'd ask you to do that kind of math and reasoning on a test question. I think it's important to know for your own evaluation of literature, but it's unlikely to come up. So for test questions, if they just tell you something is statistically significant and ask you what that means, I would say the answer should be that there's a 95% or greater chance that there is in fact a difference between those two things. That's the way to think about it on tests.
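For anyone curious where a figure like 11% can come from, here is one reconstruction of the arithmetic, assuming the commonly cited upper bound on how much evidence a p-value of 0.05 can provide (a Bayes factor of roughly 2.5); the episode does not spell out its calculation, so treat this as a sketch rather than the definitive derivation.

    import math

    p = 0.05
    prior_odds = 1 / 19                               # the "19-to-1 long shot" prior
    bayes_factor = 1 / (-math.e * p * math.log(p))    # upper bound on the evidence, ~2.46
    posterior_odds = bayes_factor * prior_odds
    posterior_prob = posterior_odds / (1 + posterior_odds)
    print(f"{posterior_prob:.2f}")                    # ~0.11, i.e. about an 11% chance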

But remember, Bayesian inference takes into account prior knowledge. That's the difference between Bayesian inference and the frequentist approach. All right, you're going to hear about confidence intervals. And what that's saying is how well the population parameter is estimated by a sample statistic like the mean. So if the study were done 100 times, a 95% confidence interval would mean that 95 of each 100 intervals would contain the true value of the mean.
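That coverage claim is easy to check by simulation; here is a minimal NumPy sketch with invented numbers: draw many samples from a population whose mean we know, build a 95% confidence interval for each, and count how often the interval contains the true mean.

    import numpy as np

    rng = np.random.default_rng(0)
    true_mean, sd, n = 10, 2, 50
    covered = 0
    for _ in range(1000):
        sample = rng.normal(true_mean, sd, n)
        se = sample.std(ddof=1) / np.sqrt(n)  # the standard error of the mean (see next paragraph)
        lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
        covered += (lo <= true_mean <= hi)
    print(covered / 1000)                     # close to 0.95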

You'll hear standard error of the mean, which is just the standard deviation divided by the square root of the sample size. Standard deviation is used to describe the spread of the sample values. Standard error is the precision with which the population mean is known. And remember with standard deviation...

As long as it's a normal distribution, then one standard deviation, as we talked about, will contain 68% of the values, two standard deviations on either side of the mean will contain 95% of the values, and three standard deviations on either side of the mean will contain 99% of the values. All right, let's go back to what we were discussing right up top where I said we'd come back to talk about the different statistical tests.

So the student's t-test, usually just referred to as a t-test, is for interval data. It's either going to be paired, when each subject has two measurements taken on themselves, so maybe one before an intervention and one after. So maybe you have somebody's pain score on the 1 to 10 scale before and after an intervention.

Those are two values, each from the same person. And that's interval data. And so for that interval data, you are going to use the paired t-test. Paired meaning that the subjects are paired with themselves. There are two values, each from the same subject. So paired t-test.

If you have two separate groups, so it's not the same person getting the two values, but you're looking at different people and getting measurements of the values from each, then you're going to be unpaired. So two separate groups, like an experimental group and a control group, if you're comparing them, that is going to be an unpaired or two-sample t-test because you're comparing the means of two groups. So interval data...

Discrete interval data would be something like the pain score on a 1 to 10 scale. You are going to use a student's t-test, either paired if the same subject has the measurements for themselves, or unpaired if you're comparing two groups. That can be tricky because of the unpaired. You're still thinking, oh, I'm pairing these two values, but it doesn't refer to that. It refers to whether it's the same person getting the two values or different people, like in a control group and an intervention group.
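A minimal SciPy sketch of that paired-versus-unpaired distinction; the pain scores are invented.

    from scipy.stats import ttest_rel, ttest_ind

    before = [7, 6, 8, 5, 7, 6]  # the same six patients, before the intervention...
    after  = [4, 5, 6, 3, 5, 4]  # ...and after: paired t-test
    print(ttest_rel(before, after).pvalue)

    control      = [7, 6, 8, 5, 7, 6]  # two separate groups: unpaired (two-sample) t-test
    intervention = [4, 5, 6, 3, 5, 4]
    print(ttest_ind(control, intervention).pvalue)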

All right. Analysis of variance, also known as ANOVA. So when there are more than two groups of data, so I gave the example earlier of tracking heart rate over time and collecting maybe five sets of heart rate measurements, then you want to use ANOVA. Sometimes a t-test is incorrectly used because you might compare

the heart rates at time one to time two, and then at time two to time three, and then time three to time four. So kind of make individual pairs and use a t-test. But this actually magnifies the chance of a type 1 error. And so you don't want to do that. If there are multiple, more than two, groups of data, you would use ANOVA.

So these tests, both the t-test and ANOVA, are what are called parametric tests. They depend on a relatively normal distribution of data. And a lot of times these test questions will tell you, they'll say, assuming a normal distribution of data.

If you have a non-normal distribution, then you need to use non-parametric tests. These are less commonly tested, but still probably worth knowing. So for non-normal distribution, you're going to use these non-parametric tests. For paired interval data, you would use the non-parametric sign test. For non-paired data, the Mann-Whitney rank sum test.

And if you have multiple groups, where you would have used ANOVA if it were a normal distribution, then if it's not a normal distribution, there's something called the Kruskal-Wallis one-way analysis of variance. So, again, just keep those in mind. But usually they're going to give you the assumption of normality.
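Here is a small sketch of those tests and their non-parametric counterparts as they appear in SciPy; the heart-rate numbers are invented.

    from scipy.stats import f_oneway, kruskal, mannwhitneyu, ttest_ind

    t1 = [72, 75, 71, 74, 73]  # heart rates at three time points (made up)
    t2 = [78, 80, 77, 79, 81]
    t3 = [85, 83, 86, 84, 87]

    print(f_oneway(t1, t2, t3).pvalue)  # >2 groups, roughly normal: ANOVA
    print(kruskal(t1, t2, t3).pvalue)   # >2 groups, non-normal: Kruskal-Wallis

    print(ttest_ind(t1, t2).pvalue)     # 2 groups, roughly normal: unpaired t-test
    print(mannwhitneyu(t1, t2).pvalue)  # 2 groups, non-normal: Mann-Whitney rank sum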

So this has all been talking about interval data. For categorical data, again, this is in many ways very simple. Most questions are going to give you data and have you figure out if it's categorical. So mortality, right, yes or no, that's categorical. So if they give you something that is categorical like that, or post-op nausea and vomiting, yes or no, then it's pretty easy.

If it is categorical data, whether it's dichotomous or nominal or ordinal, so any kind of categorical data, you can use the chi-square test. If it's two samples, you can use chi-square. If it's multiple samples, you can use chi-square.

And then the other thing that they might throw in there, remember, is Fisher's exact test, which you can also use for two samples, but not for multiple. But almost always these questions are going to want you to recognize that it is categorical data and therefore that you can use the chi-square test. Something that may come up with hypothesis testing is a risk ratio.

And probably most people know this, but if the confidence interval around that risk ratio includes one, then that's not significant, because one means there's no difference between the two treatments. Linear regression. So this is to determine a relationship between two variables. In other words, does Y depend on X? So you do a correlation analysis, which is going to produce a correlation coefficient, which is known as R.

This is a measure of linear covariation of x and y. It varies from negative 1 to 1. If it's 0, that means there's no linear correlation between the two variables x and y. You'll also hear a lot about r squared. That's the coefficient of determination and gives the fraction of the variation of y that is explained by variation in x.

So, for example, if r squared is 0.75, then that means 75% of the variation in y is explained by the variation in x.

So that might come up on questions as well, and it's really important to remember what R squared means. You'll also see it reported a lot. So R squared is going to tell you the variation in one thing that's explained by the variation in the other. In other words, the relationship between those two variables.

It's also really important to look at the data itself, because of what can happen to the slope. So think about a scatterplot with that straight line drawn through it to estimate the relationship between the variables.

Well, that line can really be changed quite a lot by one outlier that may be way higher or way lower. So if you looked and you saw data that was all going one way, and then one huge outlier that really pulled the slope of that line away from where it would have been, you might want to rethink how much trust you're going to put in that R-squared.
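A short SciPy sketch of r and R-squared, including the outlier point just made; the data are invented.

    from scipy.stats import linregress

    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2, 4, 6, 8, 10, 12, 14, 16]    # perfectly linear, so r = 1 and R-squared = 1
    fit = linregress(x, y)
    print(fit.rvalue, fit.rvalue**2)

    y_outlier = y[:-1] + [60]           # one extreme point at the end
    fit2 = linregress(x, y_outlier)
    print(fit2.rvalue, fit2.rvalue**2)  # the single outlier drags the slope and R-squared well below 1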

So that's important to think about as well. You also hear about multivariable linear regression. That's when you have a continuous response, like heart rate, and you want to know if it's related to multiple variables, which are called covariates. On the other hand, there's multivariable logistic regression. So for a continuous response, we use multivariable linear regression.

If we're talking about binary data like alive or dead, now we want to use multivariable logistic regression. And this helps control for confounding, which could occur because of an apparent association between two variables being influenced by a third variable, the confounder. So, for example, let's say you're looking at the relationship between smoking and getting an M.I.,

Well, male sex, poverty, and sedentary lifestyle could be confounders because they are associated with both smoking and coronary artery disease, and they may make it look like there's a relationship between smoking and coronary artery disease when in fact there is not. So multivariable logistic regression tries to control for those things.
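Here is a hedged sketch of what that can look like in code, using statsmodels with invented, simulated data; this is the general pattern, not the episode's analysis or any real data set.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "smoking":   rng.integers(0, 2, n),
        "male":      rng.integers(0, 2, n),
        "sedentary": rng.integers(0, 2, n),
    })
    # Simulated binary outcome (MI yes/no) that depends on the confounders as well as smoking
    logit = -2 + 0.8 * df["smoking"] + 0.6 * df["male"] + 0.5 * df["sedentary"]
    df["mi"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    X = sm.add_constant(df[["smoking", "male", "sedentary"]])
    model = sm.Logit(df["mi"], X).fit(disp=0)
    print(np.exp(model.params))  # adjusted odds ratios: smoking's effect controlling for the others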

There's another thing you may hear about called propensity score matching and analysis. So this basically provides an estimation of the treatment effect in non-randomized studies. You try to match subjects

who will have similar confounders. So a very basic example: let's say you took 100 people, gave them an intervention, and looked to see what the outcome was. Well, maybe gender actually is a confounder here, because there's something about being female or male that affects the results of this drug being given. So if you then propensity matched

in a very, very simple way, and you just took all the men, matched them with men, and looked at the outcomes, and took all the women together and looked at the outcomes, you're eliminating that variable of gender. Now, in reality, when propensity score matching and analysis is done, you're looking at a huge number of potential confounders and trying to match the groups so that each group has about the same number of people with each confounder.

So that now you have similar groups you're comparing. So a randomized trial does this kind of automatically, or at least we hope it does, by randomly assigning people with potentially different confounders to each group. And so we hope that if we have a good number of subjects, what that will mean is that both groups...

are relatively the same. Propensity score matching tries to do that same thing for non-randomized trials by trying to match, to pair up, people so that they are similar, so that each group has similar confounders, and therefore you can do better comparisons.
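Here is a very simplified sketch of that idea, assuming scikit-learn for the propensity model and greedy one-to-one nearest-neighbour matching; real implementations add calipers, many more covariates, and balance checks, and all names and data here are invented.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "treated": rng.integers(0, 2, n),
        "age":     rng.normal(60, 10, n),
        "female":  rng.integers(0, 2, n),
    })

    # Step 1: model the probability of treatment from the confounders (the propensity score)
    ps_model = LogisticRegression().fit(df[["age", "female"]], df["treated"])
    df["ps"] = ps_model.predict_proba(df[["age", "female"]])[:, 1]

    # Step 2: for each treated subject, take the untreated subject with the closest score
    treated = df[df["treated"] == 1]
    controls = df[df["treated"] == 0].copy()
    matches = []
    for _, row in treated.iterrows():
        if controls.empty:
            break                                        # ran out of potential matches
        idx = (controls["ps"] - row["ps"]).abs().idxmin()
        matches.append(idx)
        controls = controls.drop(idx)                    # match without replacement
    matched_controls = df.loc[matches]
    print(len(treated), len(matched_controls))           # two groups with similar measured confounders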

Lastly, systematic reviews and meta-analyses. So I think we all are aware that these are pooling other trials, usually randomized trials.

One of the really important things to do during a systematic review and meta-analysis is to try to assess the bias in the trials that you are including. Yes, even randomized trials can have bias. The types of bias you can see are, one, selection bias. This means there are differences between the patients in each arm. Even though they were randomized, you still can have differences.

Two, performance bias. This is when there are differences in the care between the groups that were not part of the trial. So if you're giving each group a different drug, or you're giving one group a drug and one a placebo, obviously you want that difference to be there, that one is getting the drug and one is getting the placebo. But there shouldn't be any other differences in the care they're receiving. But let's say that the group getting the drug

has a lot more physician time at the bedside. There's something about that group such that, for some reason, the physicians spent a lot more time at the bedside of the patients who got the drug compared to the bedside of the patients who got the placebo. That would be performance bias. And the way to remember this is that the performance of the trial, or the performance of the physicians caring for the patients, was different in the two groups, and that can affect the outcome.

Three, attrition bias is the difference in dropout between the two arms of a trial. And four, detection bias is differences in ascertaining and recording outcomes. So you're detecting your outcomes differently in one group or the other. To try to detect bias, you're going to look at the process of randomization,

the concealment of allocation, the use of blinding, and the reporting and analysis of dropouts. So those are key things to look at during meta-analysis. You'll see, of course, reported in these studies, these meta-analyses and systematic reviews, a forest plot. The forest plot is the name for that graph where you have the effect estimate from each study, showing you the effect size and how it fell.

Does it favor one treatment or another, for example? And then you'll have a summary statistic, so a diamond at the bottom, which will show you how it all comes together. And so what meta-analyses allow you to do is to maybe find a signal that is statistically significant, even when individual studies may not have been large enough to have statistical significance.
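Here is a tiny sketch of the idea behind that pooled diamond: a fixed-effect, inverse-variance meta-analysis of three invented studies, each non-significant on its own, whose pooled estimate does cross the usual significance threshold.

    import numpy as np
    from scipy.stats import norm

    log_or = np.array([-0.30, -0.25, -0.35])  # log odds ratios from three small studies (made up)
    se     = np.array([ 0.20,  0.18,  0.22])  # their standard errors (each study alone has p > 0.05)

    w = 1 / se**2                             # inverse-variance weights
    pooled = np.sum(w * log_or) / np.sum(w)   # the summary estimate (the diamond at the bottom)
    pooled_se = np.sqrt(1 / np.sum(w))
    z = pooled / pooled_se
    p = 2 * norm.sf(abs(z))
    print(np.exp(pooled), p)                  # pooled odds ratio and its p-value (below 0.05 here)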

Last little definition: you'll hear about efficacy and effectiveness and how they may be different. Efficacy means that something works under ideal conditions, like in an experiment. Effectiveness is about whether it works under typical circumstances in real life. All right. Hopefully you got as much out of that as I did. That was really fantastic. Let us know what you thought. Go to the website, accrac.com, where you can leave a comment. Others can learn from what you have to say.

If you are a fan of the show, you can follow us. We're on Twitter. We are on Facebook. We are on Reddit. And we are on Instagram.

I'm at jwolpa on Twitter and we're at ACCRAC Podcast, and you can find us on all those other platforms as well. If you are a fan of the show, please consider going to Apple Podcasts or wherever you get your podcasts and leaving a comment and a rating. It really helps others find the show. If you'd like to support the making of the show, please consider going to patreon.com slash ACCRAC. That's p-a-t-r-e-o-n dot com slash a-c-c-r-a-c, where you can become a patron of the show.

Even if it's just a dollar or two that you pledge, it makes a big difference and we really appreciate it. You can also make donations anytime by going to paypal.me slash ACCRAC or looking up Jed Wolpaw on Venmo. Thank you so much to those who have already made donations and become patrons. We really appreciate it. Thanks as always to our fantastic ACCRAC crew. Sonia Amanat is our tech lead and Sophia Wu is our social media manager. William Mao is our production assistant.

Thank you so much for the great work that you do. Our original ACCRAC music is by Dr. Dennis Kuo. You can check out his website at studymusicproject.com. All right. That is it for today. For the ACCRAC podcast, I'm Jed Wolpaw. Thanks for listening. Remember, what you're doing out there every day is really important and valued.
