Some critical words about the TIOBE Programming Community Index.
Open letter to the TIOBE Programming Community Index -
When reading your "Programming Community Index for February 2005" I stumbled across a strange "-tv" in your search pattern. I would very much like to know why you exclude sites containing the word "tv".
You see, I made a little positive test on Google to see the impact of this exclusion, and I noticed that there are many legitimate sites featuring embedded programming of "tv satellites", "tv satellite receiver", "software for television production" or "Digital Tv Tuner".
In the top ten this might not make a big difference, but further down I fear the impact is significant. I wonder how many of ABAP's 10 green arrows are due to the fact that ABAP is unsuitable for embedded programming, or how many of Ada's 5 red arrows are due to the fact that embedded programming is THE core arena for Ada programming.
Looking further down, I would hazard a guess that the "-tv" rule came into force between Mar.04 and Jun.04 - there was a major shuffle at that time.
If that is true, then all those red and green arrows are currently worthless, since they are calculated on a 12-month basis. And if my suspicion is correct, 12 months ago you collected your data with the old rules. That is comparing apples with oranges.
Some statistics of my own
The following little statistic shows the impact the "-tv" exclusion has on a selected group of programming languages:
|+"Ada programming" -tv||46,800|
|+"C programming" -tv||1,480,000|
|+"ABAP programming" -tv||42,300|
Of course this has been a Google search only. To see the whole picture one needs to run the searches on MSN, Yahoo and Google Groups as well.
I did make a little test with +tv - there are many legitimate sites that mention both programming and tv.
BTW: Ada takes such a hard hit because of the use of Ada in tv satellites and tv satellite receivers. Pages on that subject are not counted. I will point that out to them - whether it helps I don't know.
Now there are new rules where not only "tv" is excluded but "channel" as well. Changing the rules again means that the trend information of the index is again worthless for another year.
Some statistics of my own
So how do the new rules affect the index? The good news is that, on the pure numbers, the number of Ada-related sites has more than doubled. The bad news is that a whopping 17.4% have been excluded.
|+"Ada programming" -tv -channel||138,000|
|+"Java programming" -tv -channel||3,100,000|
|+"Perl programming" -tv -channel||1,610,000|
|+"C++ programming" -tv -channel||1,820,000|
|+"PHP programming" -tv -channel||2,320,000|
|+"C# programming" -tv -channel||734,000|
|+"ABAP programming" -tv -channel||131,000|
|+"C programming" -tv -channel||1,800,000|
I have triple-checked the C result - the search that excludes "tv" and "channel" returns more hits than the one without the exclusions - it must be a bug in the Google software. I see it as another reason why all these exclusions make the index worse and not better.
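For the three languages that appear in both tables, the change between the "-tv" counts and the later "-tv -channel" counts can be recomputed with a few lines of Python. This is only a rough sketch: the numbers are copied from the tables above, the variable and function names are made up for illustration, and the two sets of searches were run at different times, so the factors mix index growth with the rule change.

```python
# Hit counts copied from the two tables above (Google only).
hits_tv_only = {"Ada": 46_800, "C": 1_480_000, "ABAP": 42_300}
hits_tv_channel = {"Ada": 138_000, "C": 1_800_000, "ABAP": 131_000}


def growth_factor(before: int, after: int) -> float:
    """Return how many times larger the later count is than the earlier one."""
    return after / before


# Growth factor per language between the two measurements.
growth = {
    lang: growth_factor(hits_tv_only[lang], hits_tv_channel[lang])
    for lang in hits_tv_only
}

for lang, factor in growth.items():
    print(f"{lang}: x{factor:.2f}")
```

A factor above 2 for Ada matches the "more than doubled" remark; the much smaller factor for C shows how unevenly the raw counts moved.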
The Reason behind it
Thanks again for your critical view on the TPC index.
Let me first stress that I am not biased. I don't care about Ada being in 1st, 5th or 34th position. This has nothing to do with our portfolio, because we adjust our portfolio based on the TPC index and not the other way around. What we are trying to do is to measure programming popularity with as simple rules as possible in an objective way.
A while ago the language ABC (predecessor of Python) had a high score. This was mainly due to the TV channel ABC and not because of ABC's rising popularity. That's why we excluded "tv" and later on "channel". For ABC the difference is 26,000 hits versus 886 hits! That's what I call a difference (2934.5%)!
Now let me give you a brief course on statistics. If we add the "missing" 17.4% to Ada's ratings this will become 0.540% * 1.174 = 0.634%. Now Ada is passing Fortran, wow! But... Fortran also misses 14.3%, so Fortran's ratings will become 0.600% * 1.143 = 0.686%. In other words, the "-tv -channel" addition has no effect on the position of Ada whatsoever! To be honest, I think most statisticians will laugh at your conclusion. It is a beginner's mistake.
I agree with you that we now also exclude legitimate pages with "-tv -channel". But these are more or less equally distributed over all languages. They hardly influence the ratings of languages because all languages suffer from it more or less to the same degree. Only languages which have a high number of false positives are corrected in this way.
Finally, there is indeed a bug in Google with exclusions. I have submitted this already to Google.com, but apparently this has not been fixed yet.
Looking forward to your next review!
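The figures quoted in the reply can be rechecked with a short Python sketch. It simply repeats the reply's own arithmetic (multiplying each rating by one plus the excluded share); the function and variable names are hypothetical, and whether that multiplication is the right correction is an assumption taken from the reply, not something verified here.

```python
def adjusted_rating(rating_percent: float, excluded_share: float) -> float:
    """Scale a rating up by the share of hits said to be lost to the exclusion."""
    return rating_percent * (1.0 + excluded_share)


# (rating in percent, excluded share) as quoted in the reply.
quoted = {"Ada": (0.540, 0.174), "Fortran": (0.600, 0.143)}

adjusted = {
    name: round(adjusted_rating(rating, share), 3)
    for name, (rating, share) in quoted.items()
}

# Order the languages by adjusted rating, highest first.
ranking = sorted(adjusted, key=adjusted.get, reverse=True)

# ABC hits without and with the exclusions, expressed as a percentage.
abc_ratio_percent = round(26_000 / 886 * 100, 1)

print(adjusted)
print(ranking)
print(abc_ratio_percent)
```

With these inputs Fortran stays ahead of Ada, which is the point the reply makes; the result depends entirely on the two excluded shares being of similar size.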
At least they are not targeting Ada in particular. I personally would have used "-tv -channel" only for ABC and not for the other languages, since none of the other languages shares its name with a tv station.
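What I have in mind could look roughly like the following Python sketch. It is only an illustration of the idea: `build_query` and the `AMBIGUOUS` table are hypothetical names, the query format is copied from the tables above, and nothing here reflects how TIOBE actually builds its searches.

```python
# Exclusion terms per language; only names that clash with something
# unrelated (here: the TV channel ABC) get any exclusions at all.
AMBIGUOUS = {"ABC": ("tv", "channel")}


def build_query(language: str, exclusions=AMBIGUOUS) -> str:
    """Build a search string such as +"Ada programming" for one language."""
    query = f'+"{language} programming"'
    for term in exclusions.get(language, ()):
        query += f" -{term}"
    return query


for lang in ("Ada", "C", "ABC"):
    print(build_query(lang))
```

This way the legitimate "tv" pages for Ada, C and the rest stay in the count, and only the language with known false positives is corrected.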