Statistics

Wikipedia tells us that,

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.

I decided to study the statistics of horse racing. If ever a pursuit provided statistics that could be studied, it is horse racing.

The key is discovering the trend and, as I continually say, I want the facts, not the fiction. All too often in these strange times we have groups of people who think they have facts simply because they believe them. Sorry folks, it does not work that way.

One of the most overused words when it comes to punting is edge. You must have an edge, you need to find an edge, what is your edge, etc., etc., etc.

If an edge is finding more winners at better prices, then correct, we need the edge, but too many people speak about this edge as if it were some mythical beast like the unicorn or the Loch Ness monster. No.

If you want this most quoted edge there is only one means to achieve it. That is hard work combined with preparation.

If you have real data and an eye for seeing the trends that this data shows, you will find this edge. With no real data and just a “feeling”, you will have a blunt, not an edge.

Prepare the data, find the trends, find the facts and practice consistency.

Get Smart

Horse racing is a “chaos” sport. That is why I am not a speed map person. The gates crash back and your selection misses the kick by a length, or your selection cops a squeeze from other runners, and your speed map is tissue paper after a second. I can see why people love their speed maps, and each to their own, but they are not for me.

Do I watch video replays of races? I sure do, but I don’t watch videos to find winners, I watch videos to see if a runner was strong at the end of a race. Not those that were storming home but one that was strong.

Over the years I have watched many punters continually back the “one” that was storming home, and those punters continually go storming home in anger. Why? The horse that was storming home over 1000m then storms home over 1200m and then storms home over 1400m, etc., etc., etc.

The other type of runner that will drive you crazy, and broke, from watching videos is the runner that “found” trouble. Like the stormer, the one that found trouble continues to find trouble at most of its subsequent starts. Money munchers is the common term.

They always seem to “miss it by that much”.

The final thing I dislike about the stormers and the trouble finders is that when they finally win, they have been so obvious on the videos that everyone is on them, and you end up taking 2 or 3 points less than you should have taken. The real value is not available.

Data Driven

So I am a data driven punter, realistic enough to know that I will not back every winner but if I back enough winners at a better price than what my data says, if I show discipline and if I practice consistency, the results will come.

Carly Fiorina was the first female to head up a Fortune Top 20 company in the States and I have zero idea if Carly has ever backed a horse in her life but this quote of hers has been printed out by me and sits on my desk,

“The goal is to turn data into information, and information into insight”.

Or, as Seth Stephens-Davidowitz opined,

“The big data revolution is less about collecting more and more data. It is about collecting the right data”.

I have collected more data than I will ever use and am not afraid to chuck away data when it proves ineffective. I have also discovered that data that works, and continually works, for one set of runners will not magically work for another set of runners.

I am not a mathematician, so I am not 100% sure that the formulas I apply are correct in how I apply them. If the results are consistent then that will do me.

Settle Down

So what are the areas that I decided to analyse when it came to horse racing statistics? They will come as a surprise to no-one, as I assume they are some of the most common stats and are the information that lots of people use. It is how they are used that counts.

Criteria 2 - Age

Age | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 8+
Winning % | 0% | 54% | 31% | 11% | 3% | 1% | 0% | 0%
Avge. Runners Per Race | 0.02 | 4.67 | 4.30 | 1.60 | 0.44 | 0.17 | 0.07 | 0.03
Real Winning % | 0% | 12% | 7% | 7% | 6% | 3% | 0% | 0%

This Table is produced from 181 races with 2043 runners

This data indicates that 3yos have both the highest actual strike rate and the highest adjusted strike rate once the average number of runners per race is taken into account.

Interestingly, the adjusted strike rate is equal for 4 and 5 year olds even though 4 year olds have almost three times as many actual winners.
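The Real Winning % row appears to be each group's share of winners divided by its average runners per race, which gives a strike rate per runner. Here is a minimal Python sketch under that assumption, using the rounded figures from the Age table. The function name is only illustrative, and because the inputs are already rounded, some results can differ from the table by a point or two.

```python
def real_winning_pct(winning_pct, avg_runners_per_race):
    """Per-runner strike rate: share of races won / average runners per race."""
    if avg_runners_per_race == 0:
        return 0.0
    return winning_pct / avg_runners_per_race

# Rounded figures from the Age table: age -> (winning %, average runners per race)
age_table = {
    "3": (54, 4.67),
    "4": (31, 4.30),
    "5": (11, 1.60),
}

real_by_age = {age: real_winning_pct(w, r) for age, (w, r) in age_table.items()}

for age, real in real_by_age.items():
    print(f"{age}yo: about {real:.1f}% of runners win")
```

Seen this way, the 4 year olds win far more races than the 5 year olds mainly because there are far more of them in each race.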

Criteria 3 - Sex

Sex | Filly | Colt | Mare | Horse | Gelding
Winning % | 24% | 6% | 17% | 0% | 53%
Avge. Runners Per Race | 2.10 | 0.22 | 1.92 | 0.02 | 6.33
Real Winning % | 11% | 25% | 9% | 0% | 8%

This Table is produced from 144 races with 1523 runners

In this type of race, would you wait until a colt starts before having a bet? That’s about once every 5 races. Or would you try to pick which of the two fillies starting (on average) will win?
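To put rough numbers on that choice, here is a small Python sketch built on the rounded Sex table figures. It assumes one bet on every runner of the chosen sex over a notional 100 races; the names are illustrative only.

```python
RACES = 100  # notional number of races, for illustration

# Rounded figures from the Sex table: sex -> (winning %, average runners per race)
sex_table = {"Colt": (6, 0.22), "Filly": (24, 2.10)}

def per_100_races(winning_pct, avg_runners):
    """Return (bets placed, races won) when backing every runner of that sex."""
    bets = avg_runners * RACES
    winners = winning_pct / 100 * RACES
    return bets, winners

summary = {sex: per_100_races(w, r) for sex, (w, r) in sex_table.items()}

# How many races pass, on average, before a colt turns up
races_between_colts = 1 / sex_table["Colt"][1]

for sex, (bets, winners) in summary.items():
    print(f"{sex}: about {bets:.0f} bets for about {winners:.0f} winners")
print(f"A colt starts roughly once every {races_between_colts:.1f} races")
```

The colts need far fewer bets per winner, but you wait a long time between bets. The fillies give you more winners overall at the cost of many more losing bets.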

Criteria 4 - Prep. Stage

Prep. Stage | 1st Start | 1st Up | 2nd Up | Other
Winning % | 24% | 31% | 23% | 22%
Avge. Runners Per Race | 2.47 | 2.60 | 2.42 | 2.63
Real Winning % | 10% | 12% | 10% | 9%

This Table is produced from 152 races with 1539 runners

Nothing is highlighted here, as this is where I/you may ask the question: given the closeness of the Real Winning % figures, would you bother applying this criteria to this race?

Criteria 5 - Race Type

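Age/BM | 1st St | 2M | 2F | 3 M | 3F M | 4U M | FM M | OM | Picnic | OBM | MM | MS | LG
Winning % | 15% | 5% | 1% | 18% | 5% | 5% | 5% | 45% | 0% | 0% | 1% | 1% | 1%
Avge. Runners Per Race | 1.91 | 0.15 | 0.05 | 1.24 | 0.18 | 0.70 | 0.35 | 5.64 | 0.08 | 0.05 | 0.03 | 0.08 | 0.02
Real Winning % | 8% | 34% | 11% | 14% | 26% | 7% | 15% | 8% | 0% | 0% | 20% | 7% | 50%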

This Table is produced from 192 races with 2012 runners

Now when we get to Tables such as this, we need to use our analytical skills a bit harder if today’s runners dictate so. If we had runners who competed in 2yo maidens, 3F maidens and a Listed or Group race last start, it may be a race to sit out. If only one representative from those race types starts, it may be a clear indication. Picking a winner from those that ran in Open Maidens last start could be a problem.

Or, you concentrate solely on the last start open maiden runners?

I might go for the last start 3yo maiden runner.

It’s a conundrum.

Criteria 6 - Distance competed over last start

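Dist. | FS | 1000 | 1100 | 1200 | 1300 | 1400 | 1500 | 1600 | 17-18-19 | 20-21-22 | 2400 | 24 +
Winning % | 0% | 0% | 1% | 1% | 2% | 44% | 13% | 28% | 3% | 5% | 1% | 1%
Avge. Runners Per Race | 0.45 | 0.01 | 0.05 | 0.72 | 0.65 | 3.63 | 1.06 | 2.69 | 0.70 | 0.73 | 0.10 | 0.10
Real Winning % | 0% | 0% | 25% | 2% | 4% | 12% | 12% | 10% | 5% | 6% | 11% | 11%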

This Table is produced from 86 races with 939 runners. Today’s race is over 1600m.

So do we wait for the rare runner who competed over 1100m last start or look elsewhere? Apart from that runner there is no clear standout here, is there?

What about traditional form that says horses must progress by 200m each race in a prep.?

Again, we may come back to the category that wins most races over the 1600. Most starters as well.

Criteria 7 - SP in Yesterday’s race

SP | 1st Start | Odds On | $2 - $2.90 | $3 - $3.90 | $4 - $4.90 | $5 - $7.50 | $8 - $10 | $11 - $15 | $16 plus
Winning % | 4% | 1% | 8% | 10% | 6% | 20% | 13% | 13% | 26%
Avge. Runners Per Race | 0.49 | 0.05 | 0.28 | 0.49 | 0.40 | 1.15 | 0.96 | 1.13 | 5.88
Real Winning % | 8% | 18% | 29% | 21% | 15% | 17% | 13% | 11% | 4%

This Table is produced from 240 races with 2597 runners.

I know what I am avoiding this time, and that’s the ones who started as outsiders last start, although they do win 1 in 4 races this time around. Hmmm. I also know which ones I will be looking at a little more closely if I think a trifecta might be had.

Criteria 8 - Losing Margin

Beaten By (L) | 1st Start | 0.1 - 0.95 | 1 - 1.95 | 2 - 2.95 | 3 - 3.95 | 4 - 4.95 | 5 & +
Winning % | 18% | 13% | 14% | 12% | 9% | 8% | 25%
Avge. Runners Per Race | 1.82 | 0.60 | 0.69 | 0.91 | 0.99 | 0.97 | 5.04
Real Winning % | 10% | 22% | 21% | 14% | 10% | 8% | 5%

This Table is produced from 106 races with 1169 runners.

Again, our winners come from the best performed runners at their previous start. Those beaten a distance at their prior run are poor investments although, again, as a group they win 1 race in 4.
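The gap between "wins the most races" and "wins most often per runner" is easy to see by ranking the bands both ways. This Python sketch simply sorts the rounded Losing Margin figures; nothing is recalculated, and the variable names are illustrative.

```python
# Rounded figures from the Losing Margin table:
# beaten-by band -> (winning %, real winning %)
losing_margin = {
    "1st Start": (18, 10),
    "0.1 - 0.95": (13, 22),
    "1 - 1.95": (14, 21),
    "2 - 2.95": (12, 14),
    "3 - 3.95": (9, 10),
    "4 - 4.95": (8, 8),
    "5 & +": (25, 5),
}

# Rank by share of races won, then by strike rate per runner
by_share = sorted(losing_margin, key=lambda band: losing_margin[band][0], reverse=True)
by_runner = sorted(losing_margin, key=lambda band: losing_margin[band][1], reverse=True)

print("Most races won:", by_share[0])
print("Best per runner:", by_runner[0])
print("Worst per runner:", by_runner[-1])
```

The band that wins the most races is also the worst bet per runner, simply because about five of them line up in every race.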

Criteria 9 - Prize Money

Prize Money | FS | < 25k | 21k - 27k | 31k - 40k | 41k - 60k | 61k - 100k | 100k - 200k | 200k +
Winning % | 4% | 3% | 47% | 38% | 6% | 1% | 1% | 0.4%
Avge. Runners Per Race | 0.49 | 0.96 | 6.13 | 2.76 | 0.36 | 0.06 | 0.04 | 0.01
Real Winning % | 8% | 3% | 8% | 14% | 16% | 14% | 20% | 50%

This Table is produced from 240 races with 2597 runners. Prize money levels apply before recent increases.

This is probably the Table that makes the most “sense”, with the best % of “real” winners coming from the higher grades.

There is one Table to go. You may have noticed that Criteria 1 has been skipped so far, but first I wanted to mention some things about the above Tables.

All the above Tables have been taken from different maiden race types or track conditions.

The results over different maiden race types and track conditions vary considerably. However, most race types and track conditions are consistent individually. I wouldn’t use them otherwise.

For track conditions I group Good 3 & Good 4, Soft 5 & Soft 6, Soft 7 & Heavy 8, Heavy 9 and Heavy 10.
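One simple way to apply that grouping is a lookup from each track rating to its group. The Python sketch below assumes Heavy 9 and Heavy 10 form a single group, which is how the sentence above reads. The group labels, the function name and the sample list of races are made up for illustration.

```python
from collections import Counter

# Track rating -> group. Heavy 9 and Heavy 10 together is an assumption.
TRACK_GROUPS = {
    "Good 3": "Good 3 & Good 4",
    "Good 4": "Good 3 & Good 4",
    "Soft 5": "Soft 5 & Soft 6",
    "Soft 6": "Soft 5 & Soft 6",
    "Soft 7": "Soft 7 & Heavy 8",
    "Heavy 8": "Soft 7 & Heavy 8",
    "Heavy 9": "Heavy 9 & Heavy 10",
    "Heavy 10": "Heavy 9 & Heavy 10",
}

def track_group(condition):
    """Return the group for a track rating, or None if the rating is not grouped."""
    return TRACK_GROUPS.get(condition)

# Hypothetical sample of track ratings, for illustration only
sample_races = ["Good 4", "Good 3", "Soft 5", "Heavy 8", "Good 4", "Soft 7"]
races_per_group = Counter(track_group(c) for c in sample_races)

for group, count in races_per_group.items():
    print(f"{group}: {count} races")
```

Counting races per group like this also shows how quickly, or slowly, each group is building up enough data to stand on its own.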

I am undecided whether I will split them when there is enough data for each track condition. For example, the greatest number of races is in the Good 3 & Good 4 category; however, on current numbers it would take another 10 years to get enough Good 3 tracks to rate them alone.

I am unsure if qualified statisticians or form buffs agree with my maths. Unsure and could not care less. I practice Continuous Improvement and check the trends daily, weekly and monthly. If the trends change dramatically I will look at my work critically.

I do not mind change at all; I relish necessary change.

The recent change, with most tracks now giving a more exact race distance, really pissed me off when it first came in. Now I live with it and just get on with it, although it makes time ratings more difficult, in my opinion. No-one has yet convinced me one way or another whether a horse gets quicker or slower over the last 14 metres of a 1214m race. 50 cents each way, more than likely.

And now Criteria 1 - Today’s race Starting Price.

Criteria 1 - Today’s SP

Today's SP | Odds On | $2 - $2.90 | $3 - $3.90 | $4 - $4.90 | $5 - $7.50 | $8 - $10 | $11 - $15 | $16 plus
Winning % | 14% | 15% | 17% | 14% | 10% | 10% | 5% | 14%
Avge. Runners Per Race | 0.23 | 0.52 | 0.67 | 0.52 | 1.14 | 0.92 | 0.99 | 5.92
Real Winning % | 60% | 29% | 26% | 27% | 9% | 11% | 5% | 2%
Real Winning % as Real SP | $1.67 | $3.46 | $3.87 | $3.75 | $10.89 | $8.78 | $21.25 | $42.42

This Table is produced from 86 races with 939 runners.

I do not make selections solely from this Table’s criteria. However, if I see one in the last row in the Green, I take notice. The above is real data.

If I see a runner in this type of race that looks like it will start between $4.00 & $4.90, it is more than likely that I will have some sort of stake on it, or at least feature it in that leg of the Early Quaddy.

It has a winning strike rate better than the price it is paying so why wouldn’t you be on it?
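The Real SP figures in the Table look like the Real Winning % turned into a price, that is, 1 divided by the strike rate. Here is a minimal Python sketch of that check, assuming that relationship and using the rounded percentages, so the prices can differ by a few cents from the Table. The function names are illustrative.

```python
def real_sp(real_winning_pct):
    """Fair price implied by a strike rate given in percent (100 / %)."""
    return 100 / real_winning_pct

def is_overlay(market_price, real_winning_pct):
    """True when the market price is bigger than the implied fair price."""
    return market_price > real_sp(real_winning_pct)

# Rounded Real Winning % from the Today's SP table
print(f"Odds On fair price:     ${real_sp(60):.2f}")
print(f"$4 - $4.90 fair price:  ${real_sp(27):.2f}")

# A $4.00 runner in the $4 - $4.90 band versus a $2.50 runner in the $2 - $2.90 band
print(is_overlay(4.00, 27))
print(is_overlay(2.50, 29))
```

On these figures, every price in the $4 - $4.90 band sits above the implied fair price, while the $2 - $2.90 band sits below it.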

Conversely, I would be looking at my stake, as part of bankroll management, if the one I fancied was in the $2 range.

All I will say is that if I had backed every runner that looked to be starting between $4.00 and $4.90 in this type of race since March 2020 for $10 a win, I would have been a happy camper.
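As a rough guide to what "happy camper" could mean, the Python sketch below works out the average profit per $10 win bet at each end of the $4.00 to $4.90 band. It assumes the rounded 27% strike rate from the Table keeps holding and that the price shown is the full return per $1 staked. It is an expectation on those assumptions, not a record of actual results.

```python
def expected_profit(stake, price, strike_rate):
    """Average profit per bet: stake * (strike_rate * price - 1)."""
    return stake * (strike_rate * price - 1)

STAKE = 10
STRIKE_RATE = 0.27  # rounded Real Winning % for the $4 - $4.90 band

low = expected_profit(STAKE, 4.00, STRIKE_RATE)
high = expected_profit(STAKE, 4.90, STRIKE_RATE)

print(f"Expected profit per ${STAKE} bet: ${low:.2f} to ${high:.2f}")
```

The same function shows why the $2 range calls for care with the stake: plug in a price of $2.50 with the 29% strike rate for that band and the expected profit turns negative.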

Is this what they call SP profile? Another term that 5 years ago I considered balderdash.