Aerobatics Server

ACRO E-mail Archive Thread: SPAM:Re: [acro] TBL and scoring program


Disclaimer: These aerobatics pages are developed by individual IAC members and do not represent official IAC policy or opinion.







                



Message: RE: SPAM:Re: [acro] TBL and scoring program

Follow-Up To: ACRO Email list (for List Members only)

From: "Michael Golan" <mg at mivzak.com>

Date: Mon, 27 Dec 2004 00:09:52 UTC


Message:

Tim Carter wrote many comments against what I tried to show.
I'm sorry to say Tim's comments are mostly wrong. However, math cannot be
taught online, nor demonstrated, nor do I believe most members would like
a math refresher for Xmas!

Instead of discussing every point Tim has made, I'd like to point out a
few more things in general:

1. Throwing out the "low/high" scores DOES work, pretty well. Its major
disadvantage is that it requires at least 5 judges. Indeed, with 9
judges and throwing out the top/bottom 4 scores, I'm pretty sure I
can show statistically that TBLP and that method will produce very
similar rankings! And this is, to answer someone's question, what they
try to use in the Olympics.
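
As a rough illustration of the "throw out the low/high" method only (this
is not TBLP itself), here is a minimal Python sketch. The function name
trimmed_mean and its drop parameter are made up for this example; the
scores are the five-judge totals quoted further down in this thread.

```python
def trimmed_mean(scores, drop=1):
    """Average after discarding the `drop` lowest and `drop` highest scores."""
    if len(scores) <= 2 * drop:
        raise ValueError("need more than 2 * drop scores")
    kept = sorted(scores)[drop:len(scores) - drop]
    return sum(kept) / len(kept)

# Five-judge totals quoted later in this thread.
pilot_a = [3000, 2950, 2970, 3000, 2950]
pilot_b = [2950, 3000, 3000, 2900, 3000]

plain_a = sum(pilot_a) / len(pilot_a)   # straight average for PilotA
plain_b = sum(pilot_b) / len(pilot_b)   # straight average for PilotB
trim_a = trimmed_mean(pilot_a)          # one high and one low score dropped
trim_b = trimmed_mean(pilot_b)

print(plain_a, plain_b)
print(round(trim_a, 1), round(trim_b, 1))
```

With these numbers the straight average puts PilotA ahead, while the
trimmed average puts PilotB ahead. With only 3 judges, dropping one high
and one low score would leave a single score per pilot.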

2. When judging unlimited freestyle, if one good judge scores a 5
average while two poor judges score an 8 average, the good judge gets
promoted by TBLP! His average is moved up to something like 7, the other
judges' average goes down to something like 7.5, and additionally, where
the good judge has seen a "real 8" figure, it gets further promoted to a
near 9. This is the math.
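
To give a feel for what "moving a judge's average" means, here is a
simplified sketch. It is NOT the TBLP procedure from Appendix 2: it simply
shifts and rescales each judge's scores to the panel's overall mean and
spread. The function name align and all scores below are invented for
illustration. Note that this full alignment puts every judge exactly on
the panel average, whereas the adjustment described above moves them only
part of the way.

```python
from statistics import mean, pstdev

def align(judge_scores, target_mean, target_sd):
    """Shift and rescale one judge's scores to a target mean and spread."""
    m, sd = mean(judge_scores), pstdev(judge_scores)
    if sd == 0:
        return [target_mean for _ in judge_scores]
    return [target_mean + (s - m) / sd * target_sd for s in judge_scores]

# Invented scores for four figures: one low-scoring judge, two high-scoring.
good = [4.0, 5.0, 8.0, 3.0]    # average 5
poor_1 = [8.0, 8.5, 8.0, 7.5]  # average 8
poor_2 = [8.5, 8.0, 8.0, 7.5]  # average 8

pooled = good + poor_1 + poor_2
panel_mean, panel_sd = mean(pooled), pstdev(pooled)

aligned_good = align(good, panel_mean, panel_sd)
print(round(mean(aligned_good), 2))
print([round(s, 2) for s in aligned_good])
```

The low-scoring judge's average moves up to the panel average, his own
ordering of the figures is unchanged, and the figure he rated best ends up
above the raw 8 he gave it.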

3. Saying that "math cannot fix bad judging" is not true. To prove that
math CAN do it, I will have to use even higher math than the one in
Appendix 2 ... (yes, I've a math degree, and yes, I've a full
understanding of every word and formula in Appendix 2).  But I will give
you a basic explanation.

We assume that an "error factor" was added, randomly, to the actual
judges' results. That is, if each judge used a perfect video, and had
an infinite amount of time, he would eventually score each figure
perfectly with score X. We now assume the actual score was X+e, or X-e,
where "e" represents an "error". This error is caused by lack of time,
bad eyesight, stupidity, etc, OR bias. It is critical to note that bias
is only one issue in the overall design of TBLP.

TBLP can be shown, mathematically, to produce the "hidden X" better than
the "X+e" by using statistical analysis to eliminate the MOST LIKELY
ERRORS. This does not eliminate them all, nor can it make good judges
from bad ones. But it CAN minimize errors. And it can reject one bad
judge when you have 4 good ones. And YES, it will eliminate the one good
judge if you have 4 bad ones!
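
The "X+e" idea can be sketched with a toy rejection rule. The rule below
(drop any score further than max_gap from the panel median) is a stand-in
chosen for simplicity, not the statistical test TBLP actually uses; the
function name, the threshold of 1.5 and the scores are all assumptions
made for this example.

```python
from statistics import mean, median

def reject_outliers(scores, max_gap=1.5):
    """Keep only scores within `max_gap` of the panel median (toy rule)."""
    mid = median(scores)
    return [s for s in scores if abs(s - mid) <= max_gap]

# Assume the hidden "perfect" score X is 7.0 in both panels.
four_good_one_bad = [7.0, 7.5, 6.5, 7.0, 3.0]
four_bad_one_good = [3.0, 3.5, 2.5, 3.0, 7.0]

plain = mean(four_good_one_bad)                  # pulled down by the 3.0
est_1 = mean(reject_outliers(four_good_one_bad))
est_2 = mean(reject_outliers(four_bad_one_good))

print(plain, est_1, est_2)
```

In the first panel the lone 3.0 is dropped and the estimate lands on the
assumed X. In the second panel the same rule drops the one judge who was
right, which is exactly the limitation admitted above: the method follows
the majority of the panel.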

4. TBLP is a 3-part program. One is the re-alignment of the scores. Two
is the removal of some judges' scores as likely errors. Three is the
complete removal of a judge as likely biased.

Even if you feel Part 2 and/or Part 3 are wrong, you should accept that
part one is good, as I've shown in the 5-judges example. 

Again, it is easy to show mathematically that straight averages can
change the ranking order to such a degree that, if the judges simply
took a vote on "who was best", they would often get a different result
than what the simple average would give.
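
A small check of this "vote versus average" point, using the five-judge
scores quoted further down in this thread. The variable names are made up
for the example, and the "vote" here is simply a count of which pilot each
judge scored higher.

```python
scores = {
    "PilotA": [3000, 2950, 2970, 3000, 2950],
    "PilotB": [2950, 3000, 3000, 2900, 3000],
}

# Straight average per pilot.
averages = {pilot: sum(s) / len(s) for pilot, s in scores.items()}

# One vote per judge for whichever pilot that judge scored higher.
votes = {pilot: 0 for pilot in scores}
for a, b in zip(scores["PilotA"], scores["PilotB"]):
    if a > b:
        votes["PilotA"] += 1
    elif b > a:
        votes["PilotB"] += 1

winner_by_average = max(averages, key=averages.get)
winner_by_vote = max(votes, key=votes.get)
print(averages, votes)
print(winner_by_average, winner_by_vote)
```

Three of the five judges prefer PilotB, yet the straight average picks
PilotA.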

Again, logically, the simple average actually gives an advantage to the
high-scoring judges. I get a feeling that most judges opposing TBLP are
the "harsh/good/low scoring" judges ... That's a shame. Everyone needs to
understand that if I score "8s and 9s" and you score "5s and 5.5s", then
when we're applying averages, my big numbers dwarf yours, making them
less significant.
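
One way to read the "8s and 9s" versus "5s and 5.5s" remark is in terms of
the point gap each judge leaves between pilots. The sketch below assumes
two judges who disagree about which of two pilots was better; the pilot
names and the exact scores are invented for the example.

```python
# Two judges who rank the same two pilots in opposite order.
high_judge = {"Pilot1": 8.0, "Pilot2": 9.0}   # prefers Pilot2, gap of 1.0
low_judge = {"Pilot1": 5.5, "Pilot2": 5.0}    # prefers Pilot1, gap of 0.5

# Straight average of the two judges for each pilot.
combined = {p: (high_judge[p] + low_judge[p]) / 2 for p in high_judge}
winner = max(combined, key=combined.get)

high_gap = abs(high_judge["Pilot1"] - high_judge["Pilot2"])
low_gap = abs(low_judge["Pilot1"] - low_judge["Pilot2"])
print(combined, winner, high_gap, low_gap)
```

The two judges disagree, but the straight average sides with the judge
whose scores are spread over the bigger point gap, which in this example
is the high-scoring one.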

I'd be happy to sit down with anyone and explain the math in detail. It
is not TOO complicated. But since I'm staying in Israel for most of the
year, that's a little distance we need to cover :-)

--Michael


> -----Original Message-----
> From: Tim Carter [mailto:1timcarter at comcast.net] 
> Sent: Sunday, December 26, 2004 7:46 PM
> To: Michael Golan; acro at aerobaticsweb.org
> Subject: SPAM:Re: [acro] TBL and scoring program
> 
> 
> While I am no proponent of TBLP, I don't necessarily think
> it should be thrown out.  The main advantage to it is that 
> judges know it exists and are therefore less likely to 
> intentionally bias their scores.  (I don't think many would 
> do that anyway).  The much worse problem is poor quality 
> judging.  Judges are volunteers, just like everyone else at a 
> contest.  We barely have enough to support a contest, so we 
> have to accept what we get.  No system is going to be perfect 
> and we are always going to be subject to errors, missed items 
> in downgrades, and uncertainty on some items. We also have a 
> wide variance in judges' application of rules that are very
> specific.  Scores don't matter nearly as much to me as 
> others, but I do understand the concerns for accuracy.  We 
> can only accept that they will never be perfect.  We can try 
> to make them the best we can.
> 
> All that being said, you are going to have a very hard time 
> convincing me that TBLP accomplishes what it sets out to do. 
> I'm capable in math and would be happy to sit down and let 
> you walk me through it while I toss in examples that prove it 
> wrong.  The goal is to remove bias.  The fact is that it does 
> as much unintentionally as it does intentionally.
> 
> > Some more comments on TBLP. I've added numerical examples showing why
> > the classic claims against TBLP have no real foundation. Hope this helps
> > convince some of you.
> 
> Not convinced.
> 
> > Actually, I can easily explain mathematically how the winner with TBLP
> > will be ranked #1 by two judges, and ranked #2 by the third judge. Yet if
> > you use averages, he would not win!!!
> 
> That doesn't prove anything "mathematically".  When you can
> show me bias by a judge that is "mathematically" corrected,
> and prove that competence by a judge IS NOT "mathematically"
> removed, then I'll buy it.
> 
> > If the two judges are "harsh" enough, their level of differentiation
> > between pilot#1 and pilot#2 (in their rank) is small enough that the 3rd
> > judge gets to unfairly choose the winner.
> > TBLP is smartly designed to keep this "ranking order".
> 
> What if the first two judges are ones who can never see
> their way to give someone a 10 just because they think
> nobody should be given a "perfect" score?  Seen that several 
> times.  What if #3 gives the 10 when it is earned.  How does 
> TBLP decide that #3 is "biased" rather than "more correct"?
> 
> > > No program can counter incompetent judging. It was not designed to
> > > do that.
> >
> > Again, this isn't true. If we add a "robot" to score as a 5th judge,
> > and he just gives out "random" scores, simulating an "incompetent judge",
> > TBLP will do a pretty good job of throwing him out. The final rank will
> > look more like the rank with the robot removed, than with him averaged
> > in.
> 
> Again, you've selected a bad example to "prove" your point. 
> TBLP isn't supposed to throw out "incompetent judges", and 
> furthermore "incompetent" is far from the same as "random".
> I've seen a great deal of "bad judging" that rarely gives 
> anything outside of 7-8.5 range.  Take 2 of those, throw in 
> one with poor eyesight, or poor understanding of the rules, 
> and one strong judge and then tell me who gets TBLP "corrected".
> 
> > I'm sorry to note that you feel TBL does this. Average does this too, to
> > a greater extent. If you have 5 judges, ranking pilots A and B this way:
> > Judge #   1   2   3   4   5
> > Rank:
> > PilotA    1   2   2   1   3
> > PilotB    2   1   1   2   1
> >
> > Which pilot should be the winner?
> > If every judge scored 3000 points to his #1 pilot, 2900 to pilot#2, 2800
> > to pilot#3, then PilotB wins, no questions asked...
> > But what if the actual scores were like this:
> >
> > Judge #      1     2     3     4     5
> > PilotA    3000  2950  2970  3000  2950   average = 2974
> > PilotB    2950  3000  3000  2900  3000   average = 2970
> >
> > TBLP is very good at detecting this and helping PilotB rightfully win!
> 
> Wow.  TBLP accomplished exactly the same thing that throwing 
> out the high and low scores would have done.
> 
> PilotA average without high and low = 2973
> PilotB average without high and low =  2983
> 
> PILOT B WINS!!!!  Proving that throwing out the high and low 
> score is just as good as TBLP (TIC).
> 
> 
> > Well, 4+7+7 is 6 average, 1.41 as standard deviation. So your "4" is 1.4
> > standard deviations "away" from the average. Your score will be adjusted a
> > little bit, to something like 4.5.
> > This is pretty "fair". Furthermore, if you're correct and you always
> > "nit pick", your overall scores will be lower than the other judges'. As
> > a result, all your scores will be upped, without changing your overall
> > rank, so your score might begin with 4.5, and not get adjusted at all!
> 
> Really?  Is this what TBLP does?  Again, I'll be glad to sit 
> down with you and walk through the math to see if that is 
> correct.  My experience doesn't support that theory, but I'll 
> keep an open mind.
> 
> > > I have even seen an incident where a zero didn't carry but got past the
> > > chief's table and into the score room and posted. When they went back to
> > > correct the score and add 60 points to the score with no other changes to
> > > any other pilots, the pilot dropped one spot. That is wrong!
> >
> > Not really. It shows how well TBLP works. The "zero" was totally
> > tossed out by TBLP, as a clear mistake! When the real score went in
> > (likely the lowest score of the other judges), TBLP used it for the
> > computation, now lowering the pilot's scores. Furthermore, it might have
> > even removed that specific (clearly harsh) judge from the competitor's
> > calculation, and it was now restored!
> 
> Clear mistake?  Scoring system mistake, yes, because they
> are not supposed to be entered under current rules.
> Clearly harsh judge???  Maybe the only one who noticed that
> the roll was supposed to be the same direction as the
> previous one, but was really opposite?  The computer gets to
> decide if it was "clearly" a mistake, or bad judging or bias
> or whatever.  My contention is that you cannot use a
> mathematical formula to correct subjective judgements.  Nor
> can it differentiate between bias, incompetence,
> misapplication of rules, or poor eyesight.
> 
> The saving grace of the whole discussion is that (just my 
> opinion here) most of the time, the best pilots of the day 
> end up in the correct position.  I don't think that is a 
> RESULT of TBLP, but it happens.  That's why I really don't 
> care for it to be thrown out even though I don't buy into the 
> theory it is based on.
> 
> > I'm very sorry you feel this way. As someone who understands the math
> > involved, I can assure you only of these:
> >
> > 1. Ranking order on its own is not changed by TBLP. It actually tries to
> > make sense of what it means when judge A gives 2900 vs 3000 to
> > competitors 1 and 2, while another judge gave 2800 vs 3000.
> 
> Math can't do that ("make sense of what it means when....").
> 
> > 2. TBLP pretty rarely throws scores out, and when it does, it usually
> > makes good sense, e.g. four judges give 7s and 8s, the last judge gives
> > a 4, and still scores everything else similar to them.
> 
> And the last judge was "wrong" because the MATH says so?  I 
> don't think so.
> 
> > 3. Average is far from perfect. In fact, average will clearly make the
> > less harsh judges make more of a difference in the final rank than
> > judges like you. This is because in a simple average, a 9 is worth twice
> > as much as 4.5 ...
> 
> Oops.  I guess maybe you do understand the math of TBLP
> (sorry, I couldn't stop my sarcastic side from joining 
> in........I'm just pulling your chain here, because I really
> don't know how well you understand the math, and would 
> welcome the chance for ME to understand it better)
> 
> > 4. TBLP is nothing more than a very fancy/smart average, with clear
> > mistakes tossed out. Just as you don't expect all figures to have the
> > same value for the final ranking (hence the K factor!), you can't expect
> > all judges to have the same value for the final ranking (hence TBLP).
> 
> Again, I don't think so.  In the current unlimited
> freestyles there are so many parts of each figure that MOST 
> judges don't apply the real total deductions to them.  The 
> ones who do, get adjusted by this "fancy/smart" system.  For 
> example, if you took two 30K figures that each scored a 7.5,
> then combined the two figures into one 60K figure that was 
> flown with exactly the same mistakes, it should be scored a 
> 5.  Say that one judge did that while the others all gave it 
> a 7.  Say the rest of the figures were all less than 30K and 
> were all scored fairly by all judges.  The one good judge 
> gets adjusted.  It may or may not make a difference in the 
> results, but it is not correct, and the "fancy/smart" program 
> is wrong.
> 
> Let's take this to the next higher level, sit down and work 
> through a few thousand examples to see if we can make it fit 
> reality.  When we do, let's do it at a bar!
> 
> Cheers,
> 
> Tim Carter
> 
> 


                



Last Update: Fri May 4 13:13:12 2012


© Dr. Günther Eichhorn
Springer 233 Spring Street New York, NY 10013 USA, Email Guenther Eichhorn