RANDOMIZATION
AND COMPOSITION OF PREFERENCES IN HORSE RACES
Annibal Parracho Sant’Anna
Universidade Federal Fluminense
ABSTRACT
Models for preferences, established by ranking the available options according to isolated criteria, are developed here. Randomization of the ranks allows us to compute the probability of each option being raised to the position of best choice. These partial measures of preference are finally aggregated into global measures. It is verified that, in horse races, measures of preference built from the odds of each option holding the first position according to some criterion, or to a combination of criteria, are more strongly correlated with the measure of preference given by the final betting distribution than those given directly by the ranks.
Keywords: Random ranks – Modeling Preferences –
Multicriteria Decision Analysis
1. Introduction
The choices of the bettors in horse races constitute a collective decision-making process aimed at identifying the true probability of each horse on the track becoming the winner of the next race. As the chords announcing the countdown to the opening of the starting gates sound, hundreds of gamblers seek out the official booths where they will place their bets. Each bettor will choose the horse that, in his opinion, offers the lowest cost/benefit ratio. In this way they raise the ratio between placed bets and winning chances at the points where this ratio looks small. Thus, the moment the race starts captures a probability distribution of bets that faithfully mirrors the point of view of the group about the winning chances of the horses.
Obviously, this distribution of bets need not correctly represent the true probability distribution of each animal facing the starting gate becoming the winner by the end of the race, in the sense that only random disturbances, intervening during the race to modify that distribution, would determine the winner. It is possible that the process by which the bettors form their preferences does not take into due account factors that systematically affect the results of the races. Such factors may lie outside the range of knowledge of the bettors, who may also form their preferences based on erroneous theories.
Besides, it has been well established, since Kahneman and Tversky (1979), that, in many situations, the distribution of preferences effectively observed deviates from what would result from the rational application of the information and theoretical models available to the decision makers. Quantifying these deviations, thoroughly catalogued in McFadden (1999), has been an object of research in recent decades. We follow here the approach of Gomes and Lima (1992) of simply trying to derive the functional form that best mirrors the distortions objectively found throughout the whole set of possible odds, without trying to explain the effect of each factor on the evaluation of each option.
In the case of horse races, in the rush of last-minute gambling, it is possible that emotional factors deviate in the same direction from the objective of profiting from possible distortions in the observed cost/benefit ratio of each possible bet. Among these factors are, on the one hand, the convenience of substantiating the choice on a simple comparison and, on the other, the unreliability inherent in the result of any rational procedure of choice, given the aspects that people are always forced to leave outside any analytical model.
The conjugation of these two principles would lead bettors to concentrate their bets on the options of highest probability and to calibrate the chances of the other options with regard to those. If we rank from the least preferred to the most preferred, that is, by giving rank one to the worst option, and then replace such ranks by the probabilities of each option presenting the highest rank, we come to obey these two principles: we focus attention on the most probable results and strongly reduce the measures of preference for the less probable ones. Larger distances between the options of higher probability, leading to an approximately exponential probability distribution, have been observed in different contexts. Various situations of that nature are described by Lootsma (1993).
The replacement of ranks by probabilities of being first is applied here to the results of applying two ordering criteria: the preferences derived from the past performance of the competitors, supplied by the Track Official Program for special races and established before the riding commitments are signed by the jockeys, and the preference exercised by the best jockeys in choosing their mounts.
We start modeling the preferences by ranking the animals according to each criterion. Then, we consider the rank attributed to each animal as the observed value of a random variable, whose expected value it estimates. From the joint distribution of these random variables, built by applying hypotheses about the form of the distribution and hypotheses of independence and identical dispersion that may later be checked, we derive the probability of each option being the preferred one.
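To make the construction concrete, the probability of each option being raised to the first position can be approximated by simulation. The sketch below is ours, not the paper's: a Monte Carlo estimate under the uniform-disturbance assumptions just described (independent disturbances, common range equal to the sample range of the ranks).

```python
import random

def prob_best(ranks, dispersion=None, trials=100_000, seed=1):
    """Estimate, for each option, the probability of reaching the position
    of highest preference when every rank receives an independent uniform
    disturbance with a common range (here, the sample range of the ranks)."""
    if dispersion is None:
        dispersion = max(ranks) - min(ranks)   # sample range of the ranks
    half = dispersion / 2.0
    rng = random.Random(seed)
    wins = [0] * len(ranks)
    for _ in range(trials):
        draws = [r + rng.uniform(-half, half) for r in ranks]
        wins[draws.index(max(draws))] += 1
    return [w / trials for w in wins]

# ten options ranked 1 (worst) to 10 (best)
probs = prob_best(list(range(1, 11)))
```

The resulting distribution concentrates mass on the best-ranked options and leaves the lowest-ranked ones with negligible probability, in line with the approximately exponential shape of preferences mentioned earlier.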
If we were able to rank the options globally, this procedure might be applied directly to the global ranks to derive final preferences. But this is not usually the case and, after obtaining the probability of being the best according to each criterion, we still have to combine these partial evaluations. The principle of simplification leads us to combine them through comparison with the options in highest evidence.
In our case, the first criterion, the official preference derived from former performances, is more reliable than the preferences signaled by the jockeys. The latter not only may base their choices on their knowledge of the official ranking, but are also bound by long-term links with trainers and owners. In such a situation, the reference option will be the one preferred according to the main criterion, that is, betting on the animal in the best position in the official ranking. Following this approach, we reduce the pair of measures of preference for each animal according to the different criteria into a final one-dimensional measure by projecting the vector of preferences for each horse on the direction given by the vector of preferences for the animal preferred according to the main criterion.
These transformations are applied here to explain the preferences of the bettors in races for which the Jockey Club of Rio de Janeiro supplies ranks for the competitors. Conclusions can then be extracted from fitting models that explain the preference reflected in the final distribution of bets through the measures of preference derived from the two criteria referred to above.
In the first place, it is shown that the fit of the linear regression model improves when we replace the ranks by the odds derived from the probability that the option occupies the position of highest preference. It improves even more when we aggregate the criteria by projecting on the direction determined by the option preferred according to the main criterion.
Section 2 discusses the randomization of the ranks to generate the probabilities of winning and, subsequently, the probabilities of being the best choice. Section 3 deals with the composition of the preferences derived from distinct criteria. Section 4 presents the examples of application to the races at the Jockey Club of Rio de Janeiro. Section 5 brings final comments.
2. Adding a Random Component
In this section, a mechanism is developed to introduce a random component into the model for the preference. With the addition of this random component, the preference, initially postulated in a deterministic fashion, comes to be seen as an estimate of the center of a probability distribution.
The simplest way of supplying the initial measure of preference is by ordering the options from the least preferable to the most preferable. This ordering does not need to be strict: ties are admitted, as well as positions left empty to allow for larger distances between some options. What matters is that, once the indications of preference are transformed into numerical values, by treating these values as observations of random variables we are able to calculate the probability of each option coming to take the position of highest preference.
Measurement errors are usually modeled with a normal distribution. Instead, to increase the possibility that options classified close together exchange their ranks, we impose the uniform distribution on the random components of the preference measures.
The expected value of each preference is estimated by the position where the option is deterministically placed in the initial classification. In the uniform family, the distribution around the expected value is perfectly identified by the information on a dispersion parameter. If we wish to permit any two options to invert their relative positions, the range of the distribution must be larger than or equal to the difference between the initial highest and lowest preference measures. If the preferences are given in terms of ranks, this difference equals the number of available options less 1.
Formally, the transformation applied to the ranks will then consist of replacing the rank R_ij of the j-th option according to the i-th criterion by the probability that, by this criterion, that option would be placed in the position of highest preference, under the assumption that, for all i and j, the preference for the j-th option according to the i-th criterion is a random variable uniformly distributed around the respective register R_ij.
These uniform distributions are assumed independent, all those relative to the same criterion endowed with the same range parameter, given, for the i-th criterion, by the maximum of the differences R_ik − R_il for k and l varying over all the options.
Since the ratio between the expected value of the range of a random sample of size n, extracted from a population uniformly distributed on a given interval, and the range of that interval is (n−1)/(n+1), to derive an estimate of the uniform population range from the sample range, the latter should be divided by (n−1)/(n+1). In the case of ordinal preferences, this raises the range of each random rank to the number of available options plus 1. This correction may, however, become excessive in the present situation, where the initial attribution of preferences is not carried out randomly; on the contrary, the random disturbances just reflect the inaccuracy in the knowledge of an underlying ordering.
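The ratio (n−1)/(n+1) between the expected sample range and the population range can be checked numerically; a minimal simulation sketch (ours, for illustration only):

```python
import random

rng = random.Random(7)
n, trials = 10, 20_000
total = 0.0
for _ in range(trials):
    sample = [rng.random() for _ in range(n)]   # n draws from U(0, 1)
    total += max(sample) - min(sample)
mean_range = total / trials          # simulated expected sample range
ratio = (n - 1) / (n + 1)            # theoretical value: 9/11 ~ 0.818 for n = 10
```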
Analogously, the fact that the expected values of the variables in the sample are different would make estimates derived in the usual way from the sample standard deviation overestimate the dispersion. In fact, for the case of ranking n options, the sum of squared deviations of the integers from 1 to n about their mean is n(n+1)(n−1)/12, much larger than the variance (n−1)²/12 of the uniform distribution on an interval of range n−1. Thus, the relative range computed from it, the ratio of the range to the square root of this quantity, has order n^(−1/2), decreasing with the sample size, whereas the relative range of the uniform distribution on any interval is constant.
Thus, assuming a uniform distribution for the random disturbance, it is not advisable to estimate the standard deviation of each measured preference by the sample standard deviation. If, instead, we assume a normal distribution, for which the standard deviation is a natural parameter and the density gradually decreases with the distance from the mean, this may suitably increase the chance of rank inversion. In fact, in the normal case, the ratio between the dispersion attributed to each measure and the dispersion observed in the initial measurements must be larger than in the uniform case if we wish for non-negligible probabilities of inversion. For instance, in the case of 10 options, the expected value of the normal relative range, around 3, implies a probability of inversion between the first and the last ranked options of approximately 0.1%.
If the number of options is large, the hypothesis that every inversion is possible may be unrealistic. Nevertheless, for the goal of calculating the probabilities of being the preferred option, for samples of size 10 or more, little difference results from assuming the range to be 10 or any number closer to the precise number of options.
In the opposite case, when the number of options is small, it may be suitable to model the dispersion with a range larger than the sample size, to more correctly represent the chances of preference inversion. This may be performed, in practice, by adding one or two fictitious options at the extreme of lowest preference.
We might also relax the assumption of identical dispersion and amplify or reduce the standard deviation of one or another rank, to correspond to a stronger or weaker conviction about the position of some better or worse known options. However, precisely modeling the dispersion is often difficult.
The independence between the random components is also a simplifying assumption that may lead far from reality. As the ordering comes from comparing the options, it would be more reasonable to assume a negative correlation. To model that precisely, it would be enough to assume identical correlations and derive their value from the fact that the sum of the ranks is a constant. This correlation would, however, decrease quickly, in absolute value, as the number of options increases, ceasing to display considerable numerical effects in the case of 10 or more options.
Since the decisions evaluated in the present study involve from 10 to 20 options, we may feel comfortable with the assumptions of independence and of identical uniform distributions with range determined by the sample range. The results presented below use these hypotheses. The normal distribution with standard deviations given by the sample standard deviation was also applied and led to similar results.
3. Combination of Multiple Criteria
3.1. Classes of Alternatives
The determination of the preference in terms of the probability of the option being the best, starting from an initial deterministic classification, may be applied separately to simple criteria that will later be combined, or to a unique criterion that possibly results from a previous combination of simpler criteria. This section presents the aggregation alternatives that will be tried below. These alternatives are classified into two groups, according to whether equal importance is initially given to all the criteria involved or a previous weighting of the criteria is applied.
There are distinct alternatives to ensure equal importance of the criteria. Two of them are developed here. Under both, although entering initially with equal weight, the different criteria may come to exert very different influences on the final result. The first is based on the composition of the probabilities of being the preferred option into a final probability. The second, based on Data Envelopment Analysis (DEA), measures the preference by the proximity to the convex envelope of the set of preference vectors.
After those, forms of aggregation based on weighting the criteria are listed. Of these, to keep the practice of comparing to the option of highest preference, more attention is given to a new form of composition, based on weights derived from the projection on the direction determined by the vector of preferences for the option preferred according to the most important criterion.
3.2. Compositions with Equal Importance
Equal initial importance for all criteria may be applied by several means. The simplest of these consists in calculating the average of the measures of preference according to the various criteria, or any norm of this vector of preferences. A probabilistic way to compose giving equal importance to the criteria consists in using as the global measure the probability of the option being the preferred one by at least one of the criteria. Formally, denoting by P_ij the probability of the j-th option being the best according to the i-th criterion, the global measure of the preference for that option, and the respective odds, will then be given by 1 − Π_i(1 − P_ij) and by [1 − Π_i(1 − P_ij)] / Π_i(1 − P_ij), with i varying, in the product, over all the available criteria.
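This composition reduces to a few lines of code; a minimal sketch (the function name is ours):

```python
import math

def at_least_one(p_by_criterion):
    """p_by_criterion: probabilities P_ij of the option being the best
    according to each criterion i.  Returns the probability and the odds
    of the option being the preferred one by at least one criterion."""
    none = math.prod(1.0 - p for p in p_by_criterion)   # Π_i (1 - P_ij)
    return 1.0 - none, (1.0 - none) / none

# two criteria, as in the application of Section 4
p, odds = at_least_one([0.4, 0.25])   # p = 1 - 0.6*0.75 = 0.55
```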
Another aggregation alternative, following the same principle of attributing higher preference to the options nearer the position of preferred according to some criterion, consists in measuring the preference by the proximity to the convex envelope of the set of preference vectors. This is the criterion of global efficiency of Farrell (1957), whose calculation can be implemented through the algorithm of Data Envelopment Analysis with Constant Returns to Scale (DEA-CRS) oriented to the minimization of the input. To formulate the problem in the language of DEA, it is enough to treat each option as an evaluated unit, taking the preferences according to the different criteria as outputs resulting from the application of a constant amount of a single input.
This last approach can be
applied whether the initial preference measures according to each criterion are
given in terms of ranks or of probabilities. Besides, the aggregate measure
resulting from the application of this algorithm is invariant to changes of
scale, that is, to changes that preserve the proportionality between the values
of the preferences attributed to different options.
The optimization problem solved to apply this concept, assuming that the options under evaluation are ranked, according to each criterion, from the least preferable to the most preferable, has the following formulation. With R_ij denoting the position of the j-th option according to the i-th criterion, the global preference for the o-th option is given by e_o = max Σ_i w_i R_io, where the non-negative weights w_i obey the constraints Σ_i w_i R_ij ≤ 1 for j varying over the whole set of compared options. The sums run over all the criteria admitted in the analysis.
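For the two-criteria case treated in the application, this multiplier problem is a two-variable linear program and can be solved by brute force. The sketch below is ours: it enumerates the vertices of the feasible region instead of calling a linear-programming solver, and assumes all preference measures are positive (as ranks and odds are).

```python
def dea_crs_two_criteria(R, o):
    """Global preference e_o = max w1*R[0][o] + w2*R[1][o], subject to
    w1*R[0][j] + w2*R[1][j] <= 1 for every option j and w1, w2 >= 0.
    R: 2 x n matrix of positive preference measures (ranks or odds);
    o: index of the option under evaluation."""
    n = len(R[0])
    cands = [(0.0, 0.0)]
    for j in range(n):            # intersections of constraint j with the axes
        cands.append((1.0 / R[0][j], 0.0))
        cands.append((0.0, 1.0 / R[1][j]))
    for j in range(n):            # pairwise intersections of the constraints
        for k in range(j + 1, n):
            det = R[0][j] * R[1][k] - R[0][k] * R[1][j]
            if abs(det) > 1e-12:
                w1 = (R[1][k] - R[1][j]) / det
                w2 = (R[0][j] - R[0][k]) / det
                if w1 >= 0.0 and w2 >= 0.0:
                    cands.append((w1, w2))
    feasible = [(w1, w2) for (w1, w2) in cands
                if all(w1 * R[0][j] + w2 * R[1][j] <= 1.0 + 1e-9
                       for j in range(n))]
    return max(w1 * R[0][o] + w2 * R[1][o] for (w1, w2) in feasible)
```

Options on the convex envelope receive the maximal score 1; dominated options receive scores below 1.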
By allowing multipliers w_i with null value, we permit the global preference for any option to be increased by the exclusion of the criteria that place such an option in a disadvantaged position. This can result, for instance, in the attribution of a maximal final preference to an option presenting the same classification as another one in every criterion except some to which a null weight is given, even if, according to these last criteria, the other option would be preferred. In the approach based on ranks, to avoid this possibility, it is enough to prohibit ties.
The dual formulation of the optimization problem set above corresponds to the envelope formulation of the DEA model of Charnes, Cooper and Rhodes (1978) oriented to the input. In this formulation, the level of efficiency of a production unit is given by the minimum of the possible quotients with denominator given by the volume of aggregate input applied by the evaluated unit, and numerator given by the volume of aggregate input that a fictitious production unit, generated by combining the hypothetical results of reducing or increasing real production units proportionally in all of their inputs and outputs, must consume to produce a volume of output at least equal to that presented by the unit under evaluation. When the preferences according to the available criteria take the place of outputs resulting from the application of a fixed input, the global preference is given by the minimal fraction of that standard amount of input that, applied to a fictitious combination of options, would result in a mixed rank at least equal to that of the option under evaluation.
Formally, the preference for the o-th option will be given by the minimum value θ_o of θ such that θ Σ_j λ_j R_ij ≥ R_io for every criterion i, with all the λ_j non-negative and adding to 1, the sum being carried out over the whole set of evaluated options. Since all variables represent preferences, all grow in the same direction. This makes it easy to visualize the contribution λ_j of a general option in the composition of the fictitious aggregated option which, applying only a fraction θ_o of the hypothetical input, would surpass the position of the o-th option. This score θ_o corresponds to the sum of the contributions θλ_j of the options of reference.
The squared norm provides a simpler way of treating all the criteria equally from a global point of view while, in evaluating each particular option, giving higher importance to the criteria according to which that option receives higher preference. In fact, the squared norm measures the aggregate preference through a weighted average, with the weight of each criterion given by the option's own measure of preference according to that criterion. Ranking by the norm may then be thought of as a simplification of the DEA approach, eliminating the search for the prices that maximize the relative efficiency, which are replaced by prices proportional to the volumes of the outputs.
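The self-weighting property of the squared norm is easy to see on a toy example; the option names and probabilities below are hypothetical, chosen only for illustration:

```python
def squared_norm(prefs):
    """Aggregate preference as the squared Euclidean norm: each criterion
    enters with a weight equal to the option's own preference under it."""
    return sum(p * p for p in prefs)

# Hypothetical probabilities of being the best under two criteria.
# A and B have the same average (0.3), but A excels under criterion 1,
# so the squared norm ranks it first while the plain average ties them.
options = {"A": [0.5, 0.1], "B": [0.3, 0.3]}
ranking = sorted(options, key=lambda k: squared_norm(options[k]), reverse=True)
```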
3.3. Criteria Weighting
The need to develop the decision process starting from the set of effectively available options may lead to weights for the criteria that vary according to the set of options to be compared. In certain cases, some criteria can even be applied to compare some of the available options but not all of them. For other decision processes, we may be able to establish a hierarchy among the criteria, with higher weights for the criteria presenting properties such as relevance, reliability of the respective measurement tools, absence of correlation with other criteria, and so on.
A composition mechanism that, before proceeding to the aggregation, defines weights giving different importance to the different criteria consists in determining the global preference by a norm of the projection of the preference vectors on a unique direction. If this direction is chosen from the set of available preference vectors, it provides an example of extracting weights from the classification of the options supplied by the criteria themselves. The direction we will choose, to keep the simplified approach of comparing with the option in highest evidence, will be that determined by the vector of preferences for the option considered best according to the most important criterion.
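This projection is a plain inner product with the normalized reference vector; a minimal sketch (the function name and the sample preference values are ours):

```python
import math

def projection_lengths(pref_matrix, main_criterion=0):
    """pref_matrix[j]: vector of preference measures for option j, one per
    criterion.  Projects every vector onto the direction of the vector of
    the option best placed under the main criterion and returns the
    lengths of the projections (the one-dimensional global measures)."""
    ref = max(pref_matrix, key=lambda v: v[main_criterion])
    norm = math.sqrt(sum(x * x for x in ref))
    unit = [x / norm for x in ref]
    return [sum(p * u for p, u in zip(v, unit)) for v in pref_matrix]

# hypothetical odds-based preferences for three options under two criteria
prefs = [[0.5, 0.4], [0.3, 0.5], [0.2, 0.1]]
lengths = projection_lengths(prefs)
```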
In decisions such as the gamblers' bets, besides the preferences according to each criterion, we have a global measure of preference given by the amount bet on each option. We can then derive weights by fitting a regression model of the globally observed measure on the set of explanatory variables constituted by the partial preferences. But the weights derived from the estimates of the coefficients of the regression model do not necessarily apply well throughout the whole set of options. To improve the fit, we may try other monotonic transformations of the explanatory variables. In the application below, we compare results obtained using the squared norm instead of the length of the projection on the direction of the option preferred according to the main criterion.
The weights obtained from fitting a regression model can also be combined with variable a priori weights associated with peculiar kinds of preference vectors, or be made to vary according to other systems. Another alternative, if we aggregate through the probability of being the best according to at least one criterion, is to apply different exponents to each probability of not being the preferred option. To the same end, if we order according to the distance to the excellence frontier, we may bound the relations between the shadow prices, that is, bound the weight of each criterion.
4. Application to Horse Races
This section presents the results of an empirical investigation of the influence of uncertainty and of the matrix of preferences effectively observed on the structure of weights of the criteria eventually adopted. The data are the preferences of the bettors in big-prize races run in Rio de Janeiro during the week of the Big Prize Brazil. For these races we analyze, jointly and separately, the relation between the observed final bets and the distributions of preferences previously supplied by the Official Program and by the jockeys. This second ranking was determined by ordering the animals according to the number of victories of the respective jockey in the last season.
These are two important criteria for the bettors but, instead of two, more criteria should be combined to produce more realistic models. Since all the modeling alternatives studied extend trivially to more than two criteria, we use in this example the simplest model.
We consider first the 2001 meetings. Initially, a regression model is fitted having as dependent variable the vector of observed odds derived from the final betting distribution and as explanatory variables the vectors of preferences according to the two criteria: preference based on previous campaign and preference of the jockeys. These preferences are given, in the first fit, in the form of ranks. In a second fit, they are given in the form of odds of the option being the preferred one, calculated as indicated in Section 2.
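The fits above are ordinary least squares with two explanatory variables. A self-contained sketch follows, using synthetic data (the paper's betting records are not reproduced here) and solving the normal equations directly:

```python
def ols_two_vars(y, x1, x2):
    """Least-squares coefficients (b0, b1, b2) for y ~ b0 + b1*x1 + b2*x2,
    obtained from the normal equations (X'X) beta = X'y by Gaussian
    elimination with partial pivoting."""
    X = [[1.0, a, b] for a, b in zip(x1, x2)]
    A = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))   # pivot row
        A[c], A[p] = A[p], A[c]
        v[c], v[p] = v[p], v[c]
        for r in range(c + 1, 3):
            f = A[r][c] / A[c][c]
            for k in range(c, 3):
                A[r][k] -= f * A[c][k]
            v[r] -= f * v[c]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                                    # back substitution
        beta[r] = (v[r] - sum(A[r][k] * beta[k]
                              for k in range(r + 1, 3))) / A[r][r]
    return beta

# synthetic illustration (not the paper's data): y = 0.5 + 2*x1 + 3*x2 exactly
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [4.0, 1.0, 3.0, 2.0]
y = [0.5 + 2 * a + 3 * b for a, b in zip(x1, x2)]
b0, b1, b2 = ols_two_vars(y, x1, x2)
```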
The hypothesis investigated is whether more uniform weights to explain the odds effectively determined by the bets can be identified by measuring the preferences through the probabilistic transformations of the vectors of ranks rather than through the original ranks. The corroboration of this hypothesis makes it possible to develop a strategy for calculating the weights attributed by the bettors to each criterion, taking into account the distributions of preference eventually observed.
Afterwards, forms of internal aggregation prior to the attribution of weights are considered. First, the criteria are aggregated by attributing equal importance to both and establishing the preferences in terms of closeness to the excellence frontier. Two transformations of variables, based on the two forms of aggregation developed in Section 3.2, are then examined. The first is given by the odds of the option being the preferred one by at least one of the criteria. The second is the distance to the DEA envelope of the vector of preferences for the option, given in terms of odds according to the two criteria separately.
Finally, another model is fitted, as described in Section 3.3, through the projection of the vectors of ranks according to the two criteria on the direction determined by the ranks of the option preferred by the main criterion. Once the L2 norm, or the squared L2 norm, of this projection is determined, we assume these measurements subject to uniformly distributed independent disturbances, with mean equal to the measurement provided and range determined by the maximal observed distance, and compute the odds of each option reaching the first preference position.
The results of fitting the regression models are presented in Table 4.1 below.

Table 4.1. Regression of Observed Odds on Pairs of Explanatory Variables

        Ranks    Odds    Frontier    Projection
  R2    15%      49%     59%         70%
  F     8.0      43.2    64.5        105.6
In all the regression models, the estimates of the coefficients of the explanatory variables are significant at the 1% level, except those of the rank according to jockey choice in the first regression and of the odds derived from the L2 norm of the projection in the last one. The estimate of the coefficient of this last variable is negative. The same happens to that of the score of closeness to the excellence frontier. The p-value corresponding to this last estimate, although small, is also considerably higher than that of the other explanatory variable. This suggests that, between the two correlated explanatory variables employed, we should prefer, in the first two regressions, the official ranking and the odds derived from it; in the next, the odds of being preferred according to at least one of the two criteria; and, in the last one, the odds derived from the preference measured by the squared norm of the projection.
The analysis of the residuals of the four equations is clarifying. Examining point by point, it is easy to perceive that the fit improves from the first to the last regression as the model gives up fitting the large number of points with the dependent variable near the origin, that is, with a small volume of bets. The predictions for these points in the last regressions systematically overestimate the observations. The counterpart is a better approximation to the points corresponding to options of higher preference, which substantially improves the global explanatory power.
The simple regression of the odds derived from the bets on the odds derived from the squared norm of the projection of the vector of ranks on that of the option preferred by the main criterion presents an R2 of 69% and an F statistic equal to 207.6. When the explanatory variable of the simple regression is the norm of the projection itself, without the calculation of the odds of the option being the best, the coefficient of determination falls to 14% and the F statistic to 14.5. The sample correlation coefficient falls from 83% to 37%.
Thus, we find strong indication that the transformation of the ranks into odds of the option being the preferred one increases the precision of the estimates of the coefficients of the linear models explaining the odds found in the distribution of observed bets. The application of this transformation to explanatory variables built by combining the ranks through the projection on the direction found most important also increases the precision of the linear fit. After applying this transformation, we also find statistical support for the conjecture that the aggregation mechanism involves projection on the direction of the option preferred by the main criterion and the use of the squared norm of the vector of projected preferences.
Table 4.2 below presents the correlations, for each race examined, between the final odds offered by the bettors and each explanatory variable. As we advance from left to right in this Table, each pair of correlation columns corresponds to more complex transformations. In the first two, the preferences according to each criterion are given directly by the ranks. In the following two, they are given by the odds of the options being the preferred one. The variables of the following columns result from the composition of the two criteria using the two algorithms developed in Section 3.2. Finally, the last two result from the composition through projection, developed in Section 3.3.
The total referred to in the first line of this Table refers to the seven races at habitual distances for which preliminary ranks were provided, whose correlations are listed in the following lines of the Table. We can see, in the total, a correlation of 83% between the vector of odds derived from the bets and that of odds derived after projection on the direction determined by the option preferred according to the dominant criterion.
The two main big-prize races of the week, the Big Prize Brazil itself and the Marvelous City Prize, run one after the other, were analyzed separately because the job of ordering the competitors of these races is carried out jointly, resulting, on this occasion, in the assignment of two animals to number 1 of the Marvelous City Prize and in doubt about the effective composition of the field of animals lining up for this last race until a few hours before the races. Beyond that, a peculiar response to the official indications is expected of the bettors in those races because they are run over the long distance of a mile and a half, a distance seldom practiced, where factors such as the experience of the jockey and the genetic structure of the animal may be taken into account in a different way by the bettors. The correlations relative to these races are included at the end of Table 4.2.
In the Big Prize Brazil, the big favorite of the bettors, Canzone, jumped inside the starting gate, injured itself, and was withdrawn. The bets were reopened for a few minutes but, given the overcrowding of the track and the circumstances of the withdrawal, only a few bets were redirected. The correlations relative to this race were, then, calculated keeping Canzone among the competitors and attributing to it 30% of the finally observed bets. The correlations of the bets with the odds derived from squared norms of projections are, also for these races, among the highest.
Table 4.2. Correlation of Observed Odds with Other Preference Measures

Race       | Jockey Rank | Official Rank | Jockey Odd | Official Odd | DEA Score | Best for one at least | Projection Norm | Squared Norm
-----------|-------------|---------------|------------|--------------|-----------|-----------------------|-----------------|-------------
Total      | 23%         | 37%           | 35%        | 65%          | 48%       | 71%                   | 77%             | 83%
Breno      | 53%         | 54%           | 83%        | 85%          | 68%       | 93%                   | 93%             | 98%
Tirolesa   | 30%         | 43%           | 58%        | 64%          | 44%       | 71%                   | 71%             | 83%
Sukow      | 9%          | 64%           | -16%       | 92%          | 52%       | 62%                   | 80%             | 86%
Seabra     | 46%         | 52%           | 20%        | 26%          | 49%       | 39%                   | 62%             | 54%
Presidente | 36%         | 47%           | 28%        | 41%          | 51%       | 50%                   | 62%             | 52%
Delegações | -19%        | 82%           | -13%       | 92%          | 66%       | 53%                   | 83%             | 84%
Mossoró    | 68%         | 70%           | 36%        | 62%          | 71%       | 70%                   | 83%             | 79%
Brasil     | 19%         | 60%           | 1%         | 83%          | 63%       | 68%                   | 91%             | 93%
Cidade     | 48%         | 42%           | 44%        | 39%          | 42%       | 50%                   | 49%             | 49%
Using natural logarithms of the odds, the results obtained are similar, although the advantage of the projection approach is not so accentuated. This loss of correlation can be explained by the logarithm increasing the distances between options of low preference, distances that the probabilistic approach had reduced. Table 4.3 presents, for the set of seven races in the first lines of Table 4.2, the correlations of the logarithms of the odds derived from the bets with the same explanatory variables of that table, with the variables corresponding to odds also replaced by their natural logarithms.
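The stretching effect of the logarithm on the low-preference end can be seen in a small numerical sketch; the preference probabilities below are illustrative, not taken from the data:

```python
import math

# Illustrative preference probabilities for four options
probs = [0.60, 0.25, 0.10, 0.05]
log_probs = [math.log(p) for p in probs]

# Distance between the two least-preferred options, before and after the log
raw_gap = probs[2] - probs[3]          # 0.05 on the probability scale
log_gap = log_probs[2] - log_probs[3]  # ln(0.10 / 0.05) = ln 2, about 0.69
```

On the probability scale the two tail options are almost indistinguishable, while on the log scale their gap becomes comparable to the gaps among the favorites, which is the distortion blamed above for the loss of correlation.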
Table 4.3. Correlation of Observed Logarithmic Odds with Other Measures

Jockey Rank | Official Rank | ln Jockey Odd | ln Official Odd | DEA Score | ln Best for one at least | ln Projection Norm | ln Squared Norm
------------|---------------|---------------|-----------------|-----------|--------------------------|--------------------|----------------
34%         | 58%           | 41%           | 66%             | 65%       | 65%                      | 66%                | 71%
5. Conclusion
The strategy of ranking the options and then deriving the probability of each option being classified in first place led to distributions of the probability of each option finally being chosen that are closer to the distributions of observed bets than the vectors of ranks are. This transformation also improves prediction when applied to the projections on the dominant direction.
Setting the probabilistic approach aside, the squared norm of the projection produced the best transformation of the data. It is interesting to notice that this conclusion would not have been possible without the subsequent probabilistic transformation.
The results obtained seem impressive in the context of the formation of preferences of bettors in horse races. The two classifications provided explained most of the final preference. Extending this approach to other contexts requires only the assumption that the motivation to take the best option as reference is present. This assumption should, in its turn, be the object of empirical investigation.
References

Charnes, A., Cooper, W. W. and Rhodes, E. (1978). "Measuring the Efficiency of Decision Making Units". European Journal of Operational Research, 2: 429-444.

Farrell, M. J. (1957). "The Measurement of Productive Efficiency". Journal of the Royal Statistical Society, Series A, 120, 3: 253-290.

Gomes, L. F. A. M. and Lima, M. M. P. M. (1992). "From Modeling Individual Preferences to Multicriteria Ranking of Discrete Alternatives: a Look at Prospect Theory and the Additive Difference Model". Foundations of Computing and Decision Sciences, 17: 172-184.

Kahneman, D. and Tversky, A. (1979). "Prospect Theory: An Analysis of Decisions Under Risk". Econometrica, 47: 263-291.

Lootsma, F. A. (1993). "Scale Sensitivity in the Multiplicative AHP and SMART". Journal of Multicriteria Decision Analysis, 2: 87-110.

McFadden, D. (1999). "Rationality for Economists?". Journal of Risk and Uncertainty, Special Issue on Preference Elicitation, 19: 73-105.