So a couple of weeks ago I announced that our new group, the UQ Skeptics, were holding our first event: a double-blind beer taste test. The event was last Friday, 54 people participated in the taste test (that's all our budget could handle), and the results are in! Before looking at the results, however, here is some background info (skip this section if you just wanna see the pictures of the results):
Aims: The main aim of the evening was to test how well people could identify beers on taste alone. We were also interested in how people would rate how much they like different beers when they don't know what they're drinking.
Experimental setup: We chose six different brands of beer: Tooheys New, Becks, Hahn Super Dry, Corona, James Squire, and XXXX Gold. Every participant was given a score card where they first rated their confidence in their ability to recognise the beers on taste alone on a scale of 1 - 5. Participants then tasted the beers one at a time, and were given approximately 100mL of each beer. Beers were coded a, b, c, d, e, and f, and the order in which participants were required to taste the beers varied from participant to participant in a pre-specified way to ensure any order effects were controlled for. After each taste, each participant was required to give the beer an "enjoyment" rating out of 10, and then tell us which beer they thought they were drinking. So, for example, if a participant thought beer "a" was Tooheys New, they would place an "a" in a box below Tooheys New. Participants repeated this process until they had tasted all six beers.
Double-blinding procedure: No one, including the organisers, knew which codes corresponded to which beers. This double-blinding was achieved via a two-step procedure. First, one experimenter was given an order in which he must deliver the beers to a second experimenter. For example, they might have had to get Becks first, then Corona, then James Squire etc etc. Only this one experimenter knew the order that the beers came out. After delivering these beers, this experimenter left the test area. The second step involved a second experimenter. The second experimenter took the six beers in the order they were delivered, and then coded them using a new, unique order of codes. So, for example, the first beer might have been given the code "f", the second beer might have been given a code "d" etc etc.
Therefore, in order for anyone to know which beers were which, they would have to know both the order in which the beers were delivered AND the order in which they were coded.
Data entry: Each person's responses were entered into a pre-made spreadsheet BEFORE the beer codes were revealed. This actually makes the test a triple-blinded test: not even the data entry could be affected by knowledge of the beers.
Unblinding procedure: The two experimenters who knew each of the coding orders then revealed the order in which the beers were delivered, and the order in which they were coded. This identity information was entered into the results spreadsheet, which then automatically tallied all the results.
RESULTS! Figure information is given below each figure.
http://trog.qgl.org/up/1104/Skeptics-BeerTastingCorrelation-v02.png
How many beers can people identify correctly? The orange bar on the left shows us that, on average, the 53 people who participated correctly identified two out of six beers. This average is better than what we would expect if people were just guessing (which would be one out of six). Error bars represent standard error.
Did people’s confidence in their ability correspond to their actual ability? Yes! But only to a small degree. The circles show participants’ confidence prior to the tasting (plotted on the horizontal axis), and how many beers they correctly identified (plotted on the vertical axis). The colour and size of each dot scale up as more people shared that confidence/identification combination. The dotted blue line represents the correlation between confidence and accuracy. This positive correlation (the upwards slope) tells us that the more confident people were in their ability, the more beers they correctly identified. However, because the slope is fairly shallow, the predictiveness of people’s confidence isn’t great, and this relationship was only marginally statistically significant (r = 0.27, p = 0.052)!
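For anyone wondering where the "one out of six" guessing baseline comes from, here is a minimal sketch in Python. It assumes (the write-up doesn't spell this out) that every participant used each of the six codes exactly once, so that a pure guess amounts to a random shuffle of the labels. The function name `chance_distribution` is made up for illustration and is not part of the actual results spreadsheet.

```python
from fractions import Fraction
from itertools import permutations

# The six beers from the test; only the count matters for this calculation.
BEERS = ["Tooheys New", "Becks", "Hahn Super Dry",
         "Corona", "James Squire", "XXXX Gold"]


def chance_distribution(n_beers):
    """Exact distribution of correct matches for a pure guesser.

    Assumption: each label is used exactly once, so every guess is one of
    the n! possible orderings, all equally likely.
    Returns a list where entry k is P(exactly k beers correct).
    """
    counts = [0] * (n_beers + 1)
    for guess in permutations(range(n_beers)):
        # A match is a position where the guessed beer equals the true beer.
        correct = sum(1 for true_beer, guessed in enumerate(guess)
                      if true_beer == guessed)
        counts[correct] += 1
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


dist = chance_distribution(len(BEERS))
expected_correct = sum(k * p for k, p in enumerate(dist))

print(f"Expected number correct by guessing: {float(expected_correct):.2f}")
for k, p in enumerate(dist):
    print(f"P(exactly {k} correct) = {float(p):.3f}")
```

Under that assumption the expected score from guessing works out to exactly one beer. A single guesser would still get two or more right a bit over a quarter of the time (191 of the 720 possible shuffles), so it is the group average, not any one person's score, that makes the case against pure guessing.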
http://trog.qgl.org/up/1104/Skeptics-BeerTasting-SpecificIDs.png
Which beers could be identified? The orange bars are plotted on the left vertical axis. These bars tell us what percentage of our 53 participants could recognise each beer. James Squire was recognised by almost 75% of people, and was statistically significantly more recognisable than any other beer (p’s < .001)! Fewer people could recognise Becks than XXXX Gold (this difference was also statistically significant, p = 0.044).
How much did people like each beer when they didn’t know what they were drinking? The red stars are plotted on the right vertical axis. They tell us the average “enjoyment” rating out of ten for each beer. It seems like people enjoyed Corona and James Squire over anything else. Overall, it looks like people were pretty hesitant to say they really liked or really disliked a beer when they didn’t know what the beer was. Half the height of each star approximates the standard error.
http://trog.qgl.org/up/1104/Skeptics-BeerTasting-Progression.png
Were people just getting drunk and liking beers more and more? They sure were! This plot shows the average “enjoyment” of the beers tasted in the order they were tasted. Because of the way we balanced the order in which people would taste each beer, these enjoyment ratings represent the average of all beers just according to the time the beer was tasted. The blue dotted line represents the correlation of enjoyment over time. This correlation was significant, and tells us that people tended to give higher enjoyment ratings to beers they drank later in the evening (p = 0.003, partial eta squared = 0.16)! However, this does not affect any of the scores for the specific beers because we balanced the order that everyone tasted the beers!
But what if people were just getting drunk and getting wilder with their responses?
Probably not - we tested to see if enjoyment scores began to vary more and more as the night went on (this would be shown by the standard deviation error bars getting bigger). But this was not the case! So, this upward trend might reflect the fact that people just enjoy beer more as they drink more of it!
Hit up our Facebook group if you want to hear about more stuff happening soon, become a fan of our group to get your dose of daily skepticism, and you can contact the group via uqskeptics@gmail.com ! If you'd like the UQ Skeptics to host a similar event at your function, let us know. Double-blind testing is a fun way to test how well people can identify different things, like wine types, spirit types, mp3 audio quality, smells, and so much more! |
Fixed the images so they should actually be readable now.
|
Awesome work and somewhat interesting. Statistics that any true Aussie beer drinker could enjoy!
edit: Now, that would be a grand 1st year assignment for Intro to stats. If only the Uni would give approval. |
Out of curiosity, what was your process for either selecting participants or understanding their prior background/demographics? If there was, was there any correlation between backgrounds/demographics and the ability to correctly identify the beers? For instance I only drink once every few months so the chances of me being able to distinguish beers is highly unlikely, but someone who is a regular drinker can most likely at least figure out their favourite/least favourite? Or was the public nature of the experiment just too difficult to attempt to get legitimate backgrounds from participants?
|
not really impressed with your statistical resolution there bhb, looks like you just put down a Duplo set and picked up a stats package ;P
|
Out of curiosity, what was your process for either selecting participants or understanding their prior background/demographics? If there was, was there any correlation between backgrounds/demographics and the ability to correctly identify the beers? For instance I only drink once every few months so the chances of me being able to distinguish beers is highly unlikely, but someone who is a regular drinker can most likely at least figure out their favourite/least favourite? Or was the public nature of the experiment just too difficult to attempt to get legitimate backgrounds from participants? The only info we got about our participants was age and gender, and not everyone filled out that info. The mean age was 26, the youngest participant was 18, and the oldest 46. This is a nice range of ages, but I didn't bother running a correlation to see if there was a relationship with age and identification accuracy because only 35 people reported age. As for gender, there were 26 males, 21 females, and 6 people who didn't report their gender. I guess it might be interesting to see if ability/confidence/enjoyment correlates with gender? |
edit: Now, that would be a grand 1st year assignment for Intro to stats. If only the Uni would give approval. I wouldn't trust first years with this sort of stuff :) I actually think it could make a fantastic Honours thesis. There are so many different questions to explore that would all be really interesting. |
You havent included Co2 rise in those graphs.
Did everyone rinse their mouth out before tasting each beer ? Nobody swallowed the beer ? coz you should factor in Alc affecting the ability to judge even at a small level. I'd like to see a Water tasting test. Tap Vs Franklins no frills Vs etc |
Did everyone rinse their mouth out before tasting each beer ? We did... that's the last graph! There were definitely some practical limitations. For example, we didn't want to tell people not to drink anything before or in between tasting (tasting went for over an hour so people woulda got bored and left). So some people might have had a few beers beforehand, and others would taste one of the beers, go have some more of their own beers, then go back and taste the next beer. While this may have affected the results, it probably made the test far more representative of how we normally drink beer. |
So, let me get this straight. Yep! How could you not identify a Corona? Well there's a few different possibilities. In my opinion, people struggled to recognise it because it's simply not that recognisable. However, there are some other possibilities. For example, lots of people we tested here might have had little experience with Corona, or maybe Corona actually tastes a lot like at least one of the other beers. It's interesting to note, though, that even though it doesn't seem to be distinguishable from XXXX Gold, Tooheys New, or Super Dry, people still enjoyed Corona more than those beers. |
You havent included Co2 rise in those graphs. lol |
Faceman really is troll of the decade, if his leftism wasn't too damn nutty he might actually be funny, hints of it with the Co2 lol
|
Corona is yellow how can you not tell it apart from brown beers?
|
Corona is yellow how can you not tell it apart from brown beers? Yeh, we wanted to use opaque, black cups to make it hard to see any colour. Another option would have been to put some food dye in each beer... but it turns out we didn't need to: Either people weren't trying to use the colour of the beer, or, if they were, it didn't actually help. One thing we noted on the night was that the colour of the beer in the jugs seemed to vary greatly just according to how much beer was left in the jug, or where it was placed on the table in relation to the downlights. |
Also, one of the participants thought he would be able to guess entirely on colour of the beers - but ended up changing all his guesses after trying them! In the end he got 4/6, so it seems to me that colour didn't affect the results too much.
|
I know I wouldn't be able to tell the difference between tooheys new and xxxx. I hate both.
I think though am interesting test would have been to get people to have to write which beer they liked and disliked them blindly have them match it. Should ruin or make ego with those findings. Corona after another beer to me tastes like it has a hint of pine nuts.. Squire is distinctive but I can't afford it. Anyway good test Edit: Please ignore poorly used m and n's, phone is annoying |
I got three out of six, including Corona, although it was a bit of a guess, honestly - I thought I'd be able to nail Corona because I dislike it so much but in the end all the pov beers tasted basically the same to me (I had had several beers before we started though so I think my judgment was off).
The Squires was the only one I was 100% confident about from the get-go; I thought it was interesting that it was so clearly the most recognisable beer, but wasn't rated significantly higher taste-wise than the others. |
Good stuff, me rikey.
Results are quite surprising to me. Maybe because I'm a general beer slut I just overrate my ability to pick beers. I'd be interested to see how I would go, although I'm not about to carry out the experiment ;) Maybe next time at the pub I'll just get a mate to buy me a random beer, keep my eyes closed when he comes back and see if I can pick it, haha. Anyway, loving the experiment - I pity the committee that has to make the next UQ Skeptics meeting as interesting, haha. |
Yeh the next meeting will be hard to match in terms of enjoyment for large masses of people, but I think we might have a completely different focus for the next meeting, like have a lecture on the research into cognitive bias or something.
One other fun idea I had was that we get a s*** load of guys and girls (hopefully in the pub again to lubricate the mingling), and then have everyone write down a one sentence pick up line. We'd enter these into a spreadsheet as being something a guy would say to a girl, or vice versa, but not tell anyone who wrote each line. We could then have all girls rate the lines that they think would work on them, and we can have all guys rate the lines THEY think would work on the girls. Then we can compare differences between what the genders think... it's a work in progress at the mo' but I think it has potential to be quite entertaining and informative! |
Yeh the next meeting will be hard to match in terms of enjoyment for large masses of people, but I think we might have a completely different focus for the next meeting, like have a lecture on the research into cognitive bias or something. I'm not really sure how this society thing works (sounds like a cult), but you can be talking to me about ANYTHING and I'll enjoy it if you make it entertaining. If you've had a good think about your stage presence and know how to get an audience involved (i don't mean making them stand up and talk or reply to you with 'YES!' i mean in the simple sense of nodding along in agreement, or faces of confusion and thought when you've posed questions) then you'll be doing the right thing. If you know your audience (your society members?) then you should be able to easily entertain them if you know the kinds of things they want to hear. It's hard to beat anything like going out for beers for entertainment. But there's still plenty of worthwhile entertaining things you can do while on stage which is better than some alternatives (like going home and playing starcraft). Just got to entertain more than their alternative. Feel free to keep posting your future events here too. I'm interested. |
Nice write up!
I have a few questions about your results though.
1) Maybe I'm just being a bit dense here, but the way I'm interpreting the first graph seems to indicate there are a lot more than 53 unique prediction/actual combinations (assuming the lowest number for the range presented in the circle size legend). Am I just missing something obvious or has something gone astray?
2) Just eye-balling graph 2 it looks like the inclusion of Squire is almost the entire reason the subjects managed to beat the guess rate! (assuming that removing it didn't have any effect on guessing the remaining beers)
3) So is graph 3 telling us that even though there is no statistically significant difference between any given beer number in terms of enjoyability, that there is still a positive correlation? How would you interpret that, in terms of prediction etc?
Anyway, sounds like fun all round! |
1) Maybe I'm just being a bit dense here, but the way I'm interpreting the first graph seems to indicate there are a lot more than 53 unique prediction/actual combinations (assuming the lowest number for the range presented in the circle size legend). Am I just missing something obvious or has something gone astray?
Yep, you're absolutely right. I always knew something was up with that graph... but I was rushed when getting it finished and excuses excuses. Overall the pattern and combinations are accurate, the problem is purely in the representation of number of combinations. The data used to analyse the relationship between those variables was not contaminated with the same problem.
2) Just eye-balling graph 2 it looks like the inclusion of Squire is almost the entire reason the subjects managed to beat the guess rate! (assuming that removing it didn't have any effect on guessing the remaining beers)
Definitely helped, but we would predict that only 1/6 of all people (~17%) would guess each of the other beers. All but Becks were around TWICE that value, so it seems overall people could correctly identify more than one beer, but it seems like we need to do another test to find out whether results are specific to just the specific combination of beers here. I think if we excluded James Squire altogether it could be quite interesting.
3) So is graph 3 telling us that even though there is no statistically significant difference between any given beer number in terms of enjoyability, that there is still a positive correlation? How would you interpret that, in terms of prediction etc?
Actually, I ran one post-hoc test comparing the first beer to the last beer, and they were significantly different (the last beer was reliably rated as "more enjoyable" than the first beer, p < 0.05).
But, whatever the specific differences, the positive correlation tells us that, regardless of what beer you're drinking, you're more likely to say that you enjoy a beer if you've had about 4 or 5 other beers first! I think this is quite important, and I'd predict that these values start to plateau after 6 - 8 beers. That might be useful for future tests, because it would suggest that tasters should have a few beers before they start tasting, so that their baseline enjoyment ratings are raised to an even level. The other possibility that we didn't test for is that there could have been an interaction between one (or more) of the beer types and the time at which it was tasted. For example, people might have rated Corona ~5 if they tasted it first, but ~9 if they tasted it 6th. Our analyses can't tell whether or not this actually happened. |
Yep, you're absolutely right. I always knew something was up with that graph... but I was rushed when getting it finished and excuses excuses. Overall the pattern and combinations are accurate, the problem is purely in the representation of number of combinations. The data used to analyse the relationship between those variables was not contaminated with the same problem.
Phew, you had me worried that 10 years out of uni and I can't even read a graph anymore :P
Definitely helped, but we would predict that only 1/6 of all people (~17%) would guess each of the other beers. All but Becks were around TWICE that value, so it seems overall people could correctly identify more than one beer, but it seems like we need to do another test to find out whether results are specific to just the specific combination of beers here. I think if we excluded James Squire altogether it could be quite interesting.
Yeah that's what I was getting towards. Squire seems to be different enough to tell quite easily, so there might be grounds for excluding it (this is clearly very dangerous territory though). In the absence of Squire and assuming the other results stay the same, we'd be looking at something more like 1.25 average compared to a guess rate of 1. Probably still significant but not quite so impressive.
Actually, I ran one post-hoc test comparing the first beer to the last beer, and they were significantly different (the last beer was reliably rated as "more enjoyable" than the first beer, p < 0.05).
Ahh, in that case what are the error bars on graph 3 representing? Based on what you've just cited here it seems like it's the star height that represents the confidence interval and the bars are something else? |
Ahh, in that case what are the error bars on graph 3 representing? Based on what you've just cited here it seems like it's the star height that represents the confidence interval and the bars are something else?
Error bars in graph 3 represent standard deviation. I chose to show standard deviation instead of standard error so that people could see that, on average, an individual's enjoyment rating doesn't deviate more from the mean over time. That is, people don't get drunk and start to rate some beers really high and some beers really low, but they are in fact reporting higher and higher enjoyment ratings. |
Just a bump to let people know we're holding another event this coming Friday at the UQ Pizza Cafe. Details are on our Facebook events page !
|
joined the facebook group billy, will try to make an event at some point in the future!
|
joined the facebook group billy, will try to make an event at some point in the future! |
w0ot. We're planning an epic trivia night at the end of May/start of June. You won't wanna miss that one.
|