Some places the study was featured.
The following is reposted from a 2015 piece I wrote for Bluff magazine. It was originally located at this URL but has become unavailable due to Bluff going out of business. I saw this study mentioned recently in Maria Konnikova’s book ‘The Biggest Bluff’ and was reminded about this piece and noticed it was offline, so I wanted to share it again. A few notes on this piece:
The original title, shown below, was more negative-sounding than I liked; Bluff chose it. Also, if I could rewrite this piece now, I’d probably choose less negative-sounding phrasing in some places.
Regardless of the exact factors that might be at work in the found correlation, I realize it’s scientifically interesting that a significant correlation was found. But I also think it’s possible to draw simplistic and wrong conclusions from the study, and my piece hopefully gives more context about the factors that might be at work.
Image on left taken from Michael Slepian’s media page.
The Slepian Study on Betting Motions Doesn’t Pass Muster
A 2013 study¹ conducted at Stanford University by graduate student Michael Slepian and associates found a correlation between the “smoothness” of a betting motion and the strength of the bettor’s hand. In a nutshell, there was a positive correlation found between betting motions perceived as “smooth” and “confident” and strong hands. The quality of the betting motions was judged by having experiment participants watch short clips of players making bets (taken from the 2009 WSOP Main Event) and estimate the hand strength of those bets.
This experiment has gotten a lot of press over the last couple of years. I first heard about it on NPR. Since then, I’ve seen it referenced in poker blogs and articles and in a few mainstream news articles. I still occasionally hear people talk about it at the table when I play. I’ve had friends and family members reference it and send me links to it. It’s kind of weird how much attention it received, considering the tons of interesting studies that are constantly being done, but I guess it can be chalked up to the mystique and “sexiness” of poker tells.
The article had more than casual interest for me. I’m a former professional poker player and the author of two books on poker behavior: Reading Poker Tells and Verbal Poker Tells. I’ve been asked quite a few times about my opinion on this study, and I’ve been meaning to look at the study more closely and write up my thoughts for a while.
In this article, I’ll give some criticisms of the study and some suggestions for how this study (and similar studies) could be done better. This isn’t to denigrate the work of the experiment’s designers. I think this is an interesting study, and I hope it will encourage similar studies using poker as a means to study human behavior. But I do think it was flawed in a few ways, and it could be improved in many ways.
That’s not to say that I think their conclusion is wrong; in fact, in my own experience, I think their conclusion is correct. I do, however, think it’s a very weak general correlation and will only be practically useful if you have a player-specific behavioral baseline. My main point is that this study is not enough, on its own, to cause us to be confident about the conclusion.
I’ll give a few reasons for why I think the study is flawed, but the primary underlying reason is a common one for studies involving poker: the study’s organizers just don’t know enough about how poker works. I’ve read about several experiments involving poker where the organizers were very ignorant about some basic aspects of poker, and this affected the way the tests were set up and the conclusions that were reached (and this probably applies not just to poker-related studies but to many studies that involve an activity that requires a lot of experience to understand well).
Poker can seem deceptively simple to people first learning it, and even to people who have played it for decades. Many bad players lose money at poker while believing that they’re good, or even great players. In the same way, experiment designers may falsely believe they understand the factors involved in a poker hand, while being far off the mark.
Here are the flaws, as I see them, in this study:
1. The experimenters refer to all WSOP entrants as ‘professional poker players.’
This first mistake wouldn’t directly affect the experiment, but it does point to a basic misunderstanding of poker and the World Series of Poker, which might indirectly affect other aspects of the experiment and its conclusions.
Here are a couple examples of this from the study:
The World Series of Poker (WSOP), originating in 1970, brings together professional poker players every year (from the study’s supplemental materials)
These findings are notable because the players in the stimulus clips were highly expert professionals competing in the high-stakes WSOP tournament.
The WSOP Main Event is open to anyone and most entrants are far from being professional poker players. Categorizing someone’s poker skill can be difficult and subjective, but Kevin Mathers, a long-time poker industry worker, estimates that only 20% of WSOP Main Event entrants are professional (or professional-level) players.
This also weakens the conclusion that the results are impressive due to the players analyzed being professional-level. While the correlation found in this experiment is still interesting, it is somewhat expected that amateur players would have behavioral inconsistencies. I’d be confident in predicting that a similar study done on only video clips of bets made by professional poker players would not find such a clear correlation.
2. Hand strength is based on comparing players’ hands
This is a line from the study that explains their methodology for categorizing a player’s hand as ‘weak’ or ‘strong’:
Each player’s objective likelihood of winning during the bet was known (WSOP displays these statistics on-screen; however, we kept this information from participants by obscuring part of the screen).
They relied on the on-screen percentage graphics, which are displayed beside a player’s hand graphics in the broadcast. These graphics show the likelihood of a player’s hand winning, calculated by comparing it to the other players’ known hands. This makes them an illogical way to categorize whether a player believes he is betting a weak or strong hand.
If this isn’t clear, here’s a quick example to make my point:
A player has QQ and makes an all-in bet on a turn board of Q-10-10-8. Most people would say that this player has a strong hand and has every reason to believe he has a strong hand. But, if his opponent had 10-10, the player with Q-Q would have a 2.27% chance of winning with one card to come. According to this methodology, the player with the Q-Q would be judged as having a weak hand; if the test participants categorized that bet as representing a strong hand, they would be wrong.
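The 2.27% figure can be checked by counting river cards. Here is a minimal sketch in Python; the suits are assumed purely for illustration, since the example gives only ranks. Of the 44 unseen cards, only the one remaining queen rescues the player holding Q-Q, because quad queens would beat quad tens.

```python
from fractions import Fraction

RANKS = "23456789TJQKA"
SUITS = "cdhs"
deck = {rank + suit for rank in RANKS for suit in SUITS}

# Suits below are assumptions for illustration; only the ranks matter here.
hero = {"Qc", "Qd"}                # Q-Q: queens full of tens on this board
villain = {"Tc", "Td"}             # 10-10: quad tens on this board
board = {"Qh", "Th", "Ts", "8c"}   # turn board Q-10-10-8

# Cards that could still come on the river.
remaining = deck - hero - villain - board

# Hero loses unless the river is the last queen, which makes quad queens.
winning_rivers = {card for card in remaining if card[0] == "Q"}

equity = Fraction(len(winning_rivers), len(remaining))
print(equity, f"{float(equity):.2%}")  # 1/44 2.27%
```

The same count shows why the on-screen percentage says little about what the bettor believes: the number is driven entirely by the opponent's hidden cards.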
It’s not stated in the study or the supplemental materials if the experimenters accounted for such obvious cases of how using the percentage graphics might skew the results. It’s also not stated how the experimenters would handle river (last-round) bets, when one hand has a 100 percent winning percentage and the losing hand has 0 percent (the only exception would be a tie).
It’s admittedly difficult to come up with hard-and-fast rules for categorizing hand strength for the purposes of such an experiment. As someone who has thought more than most about this problem, for the purpose of analyzing and categorizing poker tells, I know it’s a difficult task. But using the known percentages of one hand beating another known hand is clearly a flawed approach.
The optimal approach would probably be to come up with a system that pits a poker hand against a logical hand range, considering the situation, or even a random hand range, and uses that percentage-of-winning to rank the player’s hand strength. If this resulted in too much hand-strength ambiguity, the experiment designers could throw out all hands where the hand strength fell within a certain medium-strength range. Such an approach would make it more likely that only strong hand bets and weak hand bets were being used and, equally important for an experiment like this, that the player believed he or she was betting either a strong or weak hand.
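As a rough sketch of that idea, the following Python shows how bets could be labeled once each hand's equity against a chosen range has been computed by some other means. The function name, the 0.35 and 0.65 cutoffs, and the clip values are all hypothetical, chosen only for illustration; nothing like this appears in the study.

```python
from typing import Optional


def classify_hand(equity_vs_range: float,
                  low: float = 0.35,
                  high: float = 0.65) -> Optional[str]:
    """Label a bet as 'strong' or 'weak', or return None to exclude it.

    equity_vs_range is the hand's chance of winning against an assumed
    opponent range (a logical range or a random one), from 0.0 to 1.0.
    The low/high cutoffs are hypothetical values for illustration.
    """
    if not 0.0 <= equity_vs_range <= 1.0:
        raise ValueError("equity must be between 0 and 1")
    if equity_vs_range >= high:
        return "strong"
    if equity_vs_range <= low:
        return "weak"
    return None  # medium-strength hand: thrown out of the experiment


# Hypothetical clips with made-up equities, not data from the study.
clips = {"clip_a": 0.91, "clip_b": 0.52, "clip_c": 0.12}
labels = {name: classify_hand(eq) for name, eq in clips.items()}
kept = {name: label for name, label in labels.items() if label is not None}
print(kept)
```

Widening the gap between the two cutoffs discards more clips but leaves a more clearly polarized set of strong and weak bets.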
3. Situational factors were not used to categorize betting motions
When considering poker-related behavior, situations are very important. A small continuation-bet on the flop is different in many ways from an all-in bet on the river. One way they are different: a small bet is unlikely to cause stress in the bettor, even if the bettor has a weak hand.
Also, a player making a bet on an early round has a chance for improving his hand; whereas a player betting on the river has no chance to improve his hand. When a player bets on the river, he will almost always know whether he is bluffing or value-betting; this is often not the case on earlier rounds, when hand strength is more ambiguous and undefined.
This experiment had no system for selecting the bets it included in the study. The usability of the clips was apparently based only on whether the clip met certain visual needs of the experiment: i.e., did the footage show the entirety of the betting action and did it show the required amount of the bettor’s body?
From the study:
Research assistants, blind to experimental hypotheses, extracted each usable video in each installment, and in total extracted 22 videos (a standard number of stimuli for such studies; Ambady & Rosenthal, 1993) for Study 2 in the main text.
Study 1 videos required a single player be in the frame from the chest-up, allowing for whole-body, face-only, and arms-only videos to be created by cropping the videos. These videos were therefore more rare, and the research assistants only acquired 20 such videos.
The fact that clips were chosen only based on what they showed is not necessarily a problem. If a hand can be accurately categorized as strong or weak, then it doesn’t necessarily matter when during a hand it occurred. If there is a correlation between perceived betting motion quality and hand strength, then it will probably make itself known no matter the context of the bet.
Choosing bets only from specific situations would have made the experiment stronger and probably would have led to more definite conclusions. It could also help address the problem of categorizing hand strength. For example, if the experiment designers had only considered bets above a certain size that had occurred on the river (when all cards are out and there are no draws or semi-bluffs to be made), then that would result in polarized hand strengths (i.e., these bets would be very likely to be made with either strong or weak hands).
Also, the experiment’s method for picking clips sounds like it could theoretically result in all strong-hand bets being picked, or all weak-hand bets being picked. There is nothing in the experiment description that requires a certain number of weak hands or strong hands. This is not in itself bad, but could affect the experiment in unforeseen ways.
For example, if most of the betting motion clips chosen were taken from players betting strong hands (which would not be surprising, as most significant bets, especially post-flop, are for value), then this could introduce some unforeseen bias into the experiment. One way this might happen: when a video clip shows only the betting motion (and not, for example, the bettor’s entire torso or just the face, as were shown to some study groups), this focus might emphasize the bet in the viewer’s mind and make the bet seem stronger. And if most of the hands-only betting clips were of strong-hand bets (and I have no idea how many were), the study participants watching only the hand-motion betting clips would falsely appear to be making good guesses.
My main point here is that thinking about the situational factors of a betting motion, and incorporating that into the experiment in some way, would have resulted in less ambiguity about the results. (It appears that it was difficult to find usable clips from a single WSOP event; in that case, the experimenters could just add footage from another WSOP Main Event to the study.)
4. The number of chips bet was not taken into account
The experiment designers did not take into account the chips that were bet. In their words:
During betting, each player pushes poker chips into the center of the table. Each chip has a specific color, which indicates a specific value. These values range from $25 to $100,000. This range of chip values has a crucial consequence for the current work. The number of chips does not correlate with the quality of the hand (see Table 1A in the main text). Players could move a stack of 20 chips into the center of the table, and this could be worth $500 or $2,000,000 (the winner of the 2009 WSOP won $8,547,042, thus the latter bet magnitude is a bet that can be made in the WSOP). Because no participants were professional poker players, nor considered themselves poker experts, they were not aware of chip values. They could not, then, use the number of chips as a valid cue to judge poker hand quality.
It’s true that your average person would not know what the chip colors at the WSOP Main Event mean. But it seems naïve to think that seeing the chips being bet couldn’t possibly have an effect on the experiment.
For one thing, the number of chips being bet could bias a participant to think a bet was stronger or weaker, whether correctly or incorrectly. What if all the strong-hand bets in the study were also bets that involved a lot of chips? (This is not implausible because smaller bets with weak hands are common early in a hand, when bets are small, whereas larger bets later in the hand are more likely to represent strong hands.) And what if some of the study participants were able to deduce (consciously or unconsciously) the strength of the bet from the number of chips?
Also, it’s possible that some of the test participants were knowledgeable (consciously or not) about some WSOP chip colors and what their denominations were. Or they were able to deduce (consciously or not), from the arrangement and number of chips, what the chip values were. (For example, large denomination chips are generally required to be kept at the front of a player’s stack.)
Again, this could have been addressed by selecting bets taken only from specific situations and only of certain bet sizes. If all bets chosen were above a certain bet size, and this was communicated to the study participants, then this would have lessened the impact of the chips being visible.
5. Quality of “smoothness” was subjective
The experiment was based on the perceptions of study participants watching the assembled video clips. It was not based on objective measurements of what constitutes “smoothness” of a betting motion. This was a known issue in the experiment:
Thus, both player confidence and smoothness judgments significantly predicted likelihoods of winning, which suggests that movement smoothness might be a valid cue for assessing poker hand quality. It is unknown, however, how participants interpreted “smoothness” or whether the players’ movements that participants rated as smooth were truly smoother than other players’ movements. Other physical factors, such as speed, likely played a role.
This is not a major criticism; I think using perception is a fine way to find a correlation, especially for a preliminary study. But I think it does mean that we have no reason to be confident in the idea that smoothness of betting motion is correlated with hand strength. If there are correlations between betting motion and hand strength (which I believe there are), these could be due to other aspects of arm motion or hand motion, such as: the betting speed, the position of the hands, the height of the hand, or other, more obscure, factors.
Again, I don’t mean to denigrate the experiment designers and the work they’ve done. I think this was an interesting experiment, and I think it’s probable the correlation they noticed exists (however weak the correlation may be).
Also, as someone who is very interested in poker behavior, I’d love to see similar studies be done. My main goal in writing these criticisms and suggestions was to emphasize that poker is complex, as is poker behavior. There are many behavioral factors in a seemingly simple hand of poker and taking these factors into account can make an experiment stronger and the results more conclusive.
Patricia Cardner, PhD, EdD, is a poker player and the author of Positive Poker, a book about the psychological characteristics of professional poker players. She had this to say about poker’s use in scientific studies:
“While researchers often have the best of intentions, it is difficult for them to fully understand the nuances of poker. Researchers who reach out to poker players for help can make more informed decisions about the research areas they choose to pursue, increase reliability and validity, and improve the overall quality of their results and conclusions.”
¹: Slepian, M.L., Young, S.G., Rutchick, A.M. & Ambady, N. Quality of Professional Players’ Poker Hands Is Perceived Accurately From Arm Motions. Psychological Science (2013) 24(11) 2335–2338.