Variability in men’s singles tennis strategy at the US Open
Introduction
Variables used in performance analysis differ in some key respects from those used in other sports science disciplines such as kinanthropometry. In kinanthropometry, variables such as height, body mass and even fitness test performances are relatively stable, with changes occurring over long-term periods. Variables used in performance analysis, on the other hand, can vary considerably from match to match as well as within the same match (O’Donoghue, 2004). The main source of variability in performance is opposition effects (McGarry and Franks, 1994), with other sources including scoreline effects within games (Shaw and O’Donoghue, 2004) and match venue (Devlin et al., 2004). Further evidence of match-to-match variability was provided by an exercise that compared independent samples of tennis players from different regions (Wells et al., 2004). When each player’s performance was represented by a single match, there was much greater within-sample variance than when 2, 3, 4 or 5 matches were used to derive a typical performance for each player.
It is worth considering the differences between variables and performance indicators (Hughes and Bartlett, 2002) in sports performance. All performance indicators are variables, but not all variables are performance indicators. In other disciplines such as computer science, performance indicators are individual raw measurements, or metrics derived from combinations of raw measurements (Jain, 1991), that have the following metric properties (Bevan et al., 1991):
- There is an objective measurement procedure.
- The measure is a valid measure of the aspect of performance of interest.
- There must be a means of interpreting the values made using the measurement.
These properties, together with demonstrated reliability, are essential for a sports performance variable to be considered a performance indicator. Even when performance indicators possess these properties, values are often unrepresentative of typical player or team performances. Sometimes this is not an issue, as the purpose of a performance analysis exercise might be to report on an individual performance. However, for scientific purposes, unrepresentative data can affect the conclusion of a study, leading to an increased chance of making a Type II error (Wells et al., 2004).
The purpose of the current paper is to explore the nature of variability in sports performance. The example performance indicator to be used is the percentage of points in a tennis match where a player goes to the net. The scope of the paper is restricted to men’s singles tennis at the US Open between 2002 and 2005 inclusive.
Methods
MATCHES
The percentage of points where a player went to the net was used as an indicator of strategy within the current study. Data from 319 US Open men’s singles matches played between 2002 and 2005 inclusive were gathered from the match statistics pages of the official tournament website (www.usopen.org, accessed on 9/9/2002, 8/9/2003, 13/9/2004, 12/9/2005), which allowed the percentage of points on which each player went to the net to be determined for each included match. Matches were included if they were completed without players withdrawing or being disqualified and if the number of net points was included in the match statistics provided on the official tournament website. This provided a total of 638 values for 171 different players.
RELIABILITY
The 2002 US Open men’s singles final between Pete Sampras and Andre Agassi was observed by the author, with points being recorded as net points where a player crossed the service box line and played at least one shot from within the service boxes during the rally. The author’s totals of 105 net points for Pete Sampras and 13 for Andre Agassi agreed with the totals reported in the match statistics on the official tournament website. There was also agreement that the match contained 277 points.
DATA PROCESSING
The frequency of net points played by the winning and losing players and the total number of points played in the match were recorded for each match, allowing the percentage of net points to be determined for the winning and losing players within the match. The 638 values from the 319 matches did not come from 638 different players but from 171 individuals. Therefore, the values recorded were arranged into sets for the 171 individual players. A record summarising the percentage of net points was produced for each player. This record consisted of the player name, the number of matches he played within the data set, the mean percentage of net points played and the standard deviation of the percentage of net points played.
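The data processing step can be illustrated with a minimal Python sketch. The function names and the tuple layout are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form was used. The only real counts included are those reported above for the 2002 final, which Sampras won.

```python
from collections import defaultdict
from statistics import mean, stdev


def match_percentages(winner_net, loser_net, total_points):
    """Percentage of all points in the match on which each player went to the net."""
    return (100.0 * winner_net / total_points,
            100.0 * loser_net / total_points)


def player_records(matches):
    """Arrange per-match values into sets per player and summarise each set.

    `matches` is an iterable of hypothetical tuples:
    (winner_name, loser_name, winner_net, loser_net, total_points).
    """
    values = defaultdict(list)
    for winner, loser, w_net, l_net, total in matches:
        w_pct, l_pct = match_percentages(w_net, l_net, total)
        values[winner].append(w_pct)
        values[loser].append(l_pct)

    records = {}
    for name, pcts in values.items():
        records[name] = {
            "matches": len(pcts),
            "mean": mean(pcts),
            # Assumption: sample SD; it is undefined for a single match.
            "sd": stdev(pcts) if len(pcts) > 1 else None,
        }
    return records


# Counts reported in the reliability check: 105 and 13 net points, 277 points.
final_2002 = ("Sampras", "Agassi", 105, 13, 277)
records = player_records([final_2002])
```

Each match contributes two values, one for the winner and one for the loser, which is how 319 matches yield 638 values.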
ANALYSIS OF VARIABILITY
Intra-player variability was evaluated using the sets of values for the four players who played more than 15 matches within the data set. Table 1 shows that in each case the player’s values were normally distributed (-1.96 < zSkew < +1.96; -1.96 < zKurt < +1.96). The mean and standard deviation for the percentage of points where each player went to the net were determined for these players.
Table 1. Skewness and kurtosis of percentage net points for the 4 players who played in more than 15 matches.
Inter-player variability was evaluated using the mean value recorded for each of the 171 players in the data set. The distribution of the player mean for percentage of net points was positively skewed (zSkew = +7.31) and leptokurtic (zKurt = +3.91). However, the distribution of the player mean for the natural logarithm of the percentage of net points was normal (zSkew = +0.64, zKurt = +0.38). Pearson’s r was used to explore whether there was any association between various derivatives of the mean and the standard deviation for a player’s percentage of net points. This was done using the data for those 42 players who had played 5 or more matches within the data set. The coefficient of determination, r², would indicate the proportion of the variance in the standard deviation that was explained by any derivative of the mean. Regression analysis was then done to model the standard deviation in terms of the mean, investigating the distribution of residual values for the 42 players.
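The text does not define how zSkew and zKurt were computed. A common convention, assumed in the sketch below, divides the bias-corrected sample skewness and excess kurtosis by their standard errors and treats values within ±1.96 as consistent with normality. The function names are hypothetical.

```python
import math


def z_skew_kurt(values):
    """Return (zSkew, zKurt) under the assumed convention:
    bias-corrected skewness and excess kurtosis over their standard errors."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 values are needed")
    m = sum(values) / n
    # Central moments of the sample (m2 must be non-zero, i.e. values not all equal).
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5          # moment skewness
    g2 = m4 / m2 ** 2 - 3.0      # moment excess kurtosis
    # Bias-corrected estimates.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    # Standard errors of the two estimates.
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


def looks_normal(values, limit=1.96):
    """True when both z-scores lie strictly inside (-limit, +limit)."""
    z_skew, z_kurt = z_skew_kurt(values)
    return abs(z_skew) < limit and abs(z_kurt) < limit
```

Under this convention, the log transformation used for the player means would be checked with `looks_normal([math.log(v) for v in player_means])`, where `player_means` is a hypothetical list of the 171 player mean values.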
Results
The mean value for percentage of net points was determined for each of the 171 players, and the distribution of these player means was found to be positively skewed (15.95 ± 9.17%, zSkew = +7.31, zKurt = +3.91). Figure 1 illustrates the leptokurtic nature of the variable as well as the positive skew that exists. However, the natural logarithm of player mean values was found to be normally distributed (2.62 ± 0.54, zSkew = +0.64, zKurt = +0.38).
The mean and standard deviation (SD) of the percentage of points on which a player went to the net were determined for the 42 players who were involved in 5 or more matches within the data set. A relationship was found between the natural logarithm of the mean and the standard deviation (SD = -3.93 + 3.50 ln(Mean), r² = 0.487), with no relationship between ln(Mean) and residual values (r = 0.000) and the residuals being normally distributed (0.00 ± 1.66, zSkew = +1.25, zKurt = -0.30). Figure 2 shows the relationship between the mean and SD. Thirty-nine of these 42 players had a lower intra-player standard deviation than the inter-player standard deviation of 9.17%.
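As a hedged illustration of the regression result, the sketch below fits a straight line by ordinary least squares and evaluates the reported equation. The function names are hypothetical and the fitting routine is a generic one, not necessarily the software used in the study. Note that least-squares residuals are uncorrelated with the predictor by construction, which is consistent with the reported r = 0.000 between ln(Mean) and the residuals.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


def expected_sd(mean_pct, intercept=-3.93, slope=3.50):
    """Expected SD from the reported equation SD = -3.93 + 3.50 ln(Mean)."""
    return intercept + slope * math.log(mean_pct)


# Expected SD for a player whose mean equals the overall mean of 15.95%.
sd_at_overall_mean = expected_sd(15.95)

# Mean %net points at which the reported equation gives an expected SD of zero.
zero_sd_mean = math.exp(3.93 / 3.50)
```

With the reported coefficients, `sd_at_overall_mean` is about 5.76 and `zero_sd_mean` is about 3.07%, so the equation predicts a negative SD only for players who very rarely approach the net.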
The data for the 4 players who played 15 or more matches in the data set are summarised in Table 2. The intra-player distribution of the percentage of points where they went to the net was found to be normal in each case.
Table 2. Percentage of points where 4 players went to the net during a series of men’s singles matches at the US Open between 2002 and 2005 inclusive.
Discussion
The current investigation has provided evidence that, for the percentage of net points in men’s singles tennis at the US Open, there is less intra-player variability than inter-player variability. This is not the case for all performance indicators in all sports: O’Donoghue (2004) provided examples of one soccer player (David Beckham) whose work rate varied between matches more than the player means varied between different midfielders, and another soccer player (Michael Owen) whose variability in work rate within the same match was comparable with the variability in player mean work rate between different players. However, the inter- and intra-player variability for this indicator of player strategy is distributed in fundamentally different ways. Match-to-match variability within individual players is normally distributed, in contrast to the skewed distribution between different players. This information should be taken into account when evaluating player performances from individual matches.
Knowledge of inter-player and intra-player distributions for the values of performance indicators allows realistic synthetic data to be produced. There are legitimate purposes for synthesising data, especially where investigations are testing profiling techniques (O’Donoghue and Ponting, 2005). Such investigations purposely and openly use synthetic data because the volume of data required cannot feasibly be collected through observational techniques or even from internet sources. A performance indicator may require over 30 matches to stabilise (Hughes et al., 2001), but a tennis player will play a maximum of 7 singles matches at a given Grand Slam tournament each year. The analysis undertaken of inter-player and intra-player variability in %net points has allowed a procedure for synthesising realistic data to be devised. The steps of the procedure are as follows:
1. Randomly determine the natural logarithm of the player’s mean value. This is done by firstly generating a random probability (between 0 and 1) and looking up the associated z-score from the standard normal distribution. Secondly, the ln(mean value) is calculated as 2.62 + 0.54 z.
2. The exponential of ln(mean value) will be the player’s mean value for %net points. The first 2 steps need to be repeated if a value of less than 0% or greater than 100% is produced.
3. The expected SD for the player’s value is determined from the regression equation, SD = -3.93 + 3.50 ln(Mean).
4. The actual SD for the player is synthesised by determining a random residual value and adding this to the expected SD. Firstly, a random probability (between 0 and 1) is produced and the associated z-score is looked up from the standard normal distribution. Secondly, the residual is determined as 1.66 z. Thirdly, the actual SD is the sum of the expected SD and the residual. This step needs to be repeated if an SD of less than 0% is produced.
5. A random value for %net points can now be synthesised for an individual match for the synthetic player. Firstly, a random probability (between 0 and 1) is produced and then the associated z-score from the standard normal distribution is used: individual performance value = player’s mean value + z × player’s SD. This step needs to be repeated if a value of less than 0% or greater than 100% is produced.
There are a number of different studies that can be undertaken using data synthesised with the kind of procedure described here. Firstly, the effect of limited reliability on the results of investigations can be determined. Reliability is a critically important issue in performance analysis (Hughes et al., 2004). This is the case not only in scientific research but also in coaching contexts, where player, coach and team decisions need to be supported by reliable data (O’Donoghue and Longville, 2004).
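The five-step synthesis procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author’s implementation: the function names are hypothetical, and the table lookup of a z-score is replaced by the inverse cumulative distribution function of the standard normal distribution from the standard library. The constants are those reported in the Results.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def random_z(rng):
    """Random probability in (0, 1) mapped to its standard normal z-score."""
    p = rng.random()
    while p == 0.0:  # the inverse CDF is undefined at exactly 0
        p = rng.random()
    return STD_NORMAL.inv_cdf(p)


def synth_player(rng):
    """Steps 1 to 4: return (mean, SD) of %net points for a synthetic player."""
    while True:
        # Steps 1 and 2: ln(mean) = 2.62 + 0.54 z, then take the exponential.
        mean_pct = math.exp(2.62 + 0.54 * random_z(rng))
        if 0.0 <= mean_pct <= 100.0:
            break
    # Step 3: expected SD from the regression equation.
    expected_sd = -3.93 + 3.50 * math.log(mean_pct)
    # Step 4: add a random residual (1.66 z); repeat if the SD is negative.
    while True:
        sd = expected_sd + 1.66 * random_z(rng)
        if sd >= 0.0:
            break
    return mean_pct, sd


def synth_match(mean_pct, sd, rng):
    """Step 5: one synthetic match value, redrawn until it lies in 0..100%."""
    while True:
        value = mean_pct + random_z(rng) * sd
        if 0.0 <= value <= 100.0:
            return value


# Example: one synthetic player with 7 matches, using a seeded generator.
rng = random.Random(2005)
player_mean, player_sd = synth_player(rng)
matches = [synth_match(player_mean, player_sd, rng) for _ in range(7)]
```

Redrawing out-of-range values, as the procedure specifies, truncates the distributions slightly, so the synthetic means and SDs will not reproduce the reported parameters exactly.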
The additional variability due to measurement error may mean that a significant difference fails to be produced (Atkinson, 2002). Therefore, any significant result found in the presence of measurement error is one in which there can be confidence. Synthetic data can be used to represent true values for athletes as well as measured values that synthesise the effect of measurement error. The results of inferential statistical procedures can be compared when applied to synthesised true and synthesised measured values. Future investigations should analyse the impact of measurement error in independent sample comparisons, related sample comparisons and correlation studies.
A second area for future research is the effect of using unrepresentative data. Wells et al. (2004) used real performance data to show the impact of using individual and multiple match data to derive values for players within a study. Using synthetic data would allow greater investigation of this problem. For different types of performance indicator, it would be useful to understand how many matches are required to produce a typical profile for a player. The combined effects of limited reliability of measurement and unrepresentative data can also be investigated using synthetic data.
In conclusion, player strategy in tennis can be indicated by the percentage of points where a player goes to the net. The value for this performance indicator is influenced by the player’s typical strategy but also by individual match effects, especially opposition effects. The amount of between-player variability is greater than within-player variability for most players. The distribution of this performance indicator is normal for an individual player’s performances but is not normal between different players.
References
- Bevan, N., Kirakowski, J. and Maissel, J. 1991 “What is usability?” Human aspects in computing: design and use of interactive systems with terminals, Bullinger, H-J. (Ed.), Amsterdam: Elsevier, 651-654.
- Devlin, G., Brennan, D.A. and O’Donoghue, P.G. 2004 “Time-motion analysis of work-rate during home and away matches in collegiate basketball” Performance Analysis of Sport 6, O’Donoghue, P.G. and Hughes, M.D. (Eds.), Cardiff: CPA Press, UWIC, 174-178.
- Hughes, M., Evans, S. and Wells, J. 2001 “Establishing normative profiles in performance analysis” International Journal of Performance Analysis of Sport (e), 1, 4-27.
- Hughes, M. and Bartlett, R. 2002 “The use of performance indicators in performance analysis” Journal of Sports Sciences, 20, 739-754.
- Hughes, M., Cooper, S.M. and Nevill, A. 2004 “Analysis of notation data: reliability” Notational analysis of sport, 2nd Edition, Hughes, M. and Franks, I.M. (Eds.), London: Routledge, 189-204.
- Jain, R. 1991 The art of computer systems performance analysis: techniques for experimental design, measurement, simulation and modelling, New York: Wiley.
- McGarry, T. and Franks, I.M. 1994 “A stochastic approach to predicting competition squash match-play” Journal of Sports Sciences, 12, 573-584.
- O’Donoghue, P.G. 2004 “Sources of variability in time-motion data; measurement error and within player variability in work-rate” International Journal of Performance Analysis of Sport-e, 4(2), 42-49.
- O’Donoghue, P.G. and Longville, J. 2004 “Reliability testing and the use of statistics in performance analysis support: a case study from an international netball tournament” Performance Analysis of Sport 6, O’Donoghue, P.G. and Hughes, M.D. (Eds.), Cardiff: CPA Press, UWIC, 1-7.
- O’Donoghue, P.G. and Ponting, R. 2005 “Equations for the Number of Matches Required for Stable Performance Profiles” International Journal of Computer Science in Sport(e), 4(2), 48-55.
- Shaw, J. and O’Donoghue, P.G. 2004 “The effects of scoreline on work-rate in amateur soccer” Performance Analysis of Sport 6, O’Donoghue, P.G. and Hughes, M.D. (Eds.), Cardiff: CPA Press, UWIC, 84-91.
- Wells, J., O’Donoghue, P.G. and Hughes, M.D. 2004 “The need to use representative player data from multiple matches in performance analysis” Performance Analysis of Sport 6, O’Donoghue, P.G. and Hughes, M.D. (Eds.), Cardiff: CPA Press, UWIC, 241-244.