Using QlikView and DistroWatch to report on the most popular open source distributions (BSD, Linux, Unix)

Ok, so I am into FreeBSD and open source software, but I have recently had to do a QlikView implementation for my company LANDesk. QlikView has a feature where you can pull data from the web and report on it. So I needed to learn how to use the web-based reporting, so I decided to do a report from www.distrowatch.com.

Report Goals
There are few things that interests me from the data at DistroWatch:

  • Which base platforms are the most used?
  • Which platforms should software companies focus on supporting?
  • Where does BSD sit in the rankings.

How the report was made
So on the main DistroWatch page, there is a report that will give you the average hits per day (hpd) that a Distro’s web site gets. At the bottom there is a link to full popularity page of just these reports:
http://distrowatch.com/stats.php?section=popularity

So at first glance, you see Ubuntu is the best and Fedora is second and so on. I wanted to take the statistics a bit further. I wanted to know what main base distribution was the most used. What I mean by base distro is this: Ubuntu is #1. But Ubuntu is not a base distribution, instead it is based on Debian. Mint is #3 and is also based on Debian. Debian itself is #6 and it is a base distribution. Fedora is a base distribution.

QlikView can connect to this web page and consume this data. It was also able to loop through and click go to the link for each distribution where it was able to pull the “Based on” results. I did a few little tweaks to clean it up.

So I used QlikView to match each Distribution to its base distribution and built my report. I gathered the cumulative hits per day (hpd) of each base distro by summing the hpd from itself and its child distros. The results are staggering.

Result of the Report
I am going to show you a screen shot of the report, but I am only going to show the top 10 base distributions because otherwise it is to hard to view the report.

# 1 – Debian
Well, I have to say that I new that Debian (13818 hpd) was popular because of Ubuntu, but I didn’t know how far ahead it was compared to other base distributions. I expected Red Hat to be a lot closer but its just not. Lets look at the top ten Debian platforms by hits. In QlikView this is easy, I can simply click on the Debian pillar in the report.

So not only is Debian’s cumulative hits per day first, but it is first by a long ways. The cumulative hits per day of distros based on Debian is more than three times larger than any other base distribution’s cumulative hits. It is pulling away from the pack and nobody is going to catch up any time soon.

What I don’t know is are these new users or are other distributions losing members to Debian or Debian-based distros?

You might be grumbling to yourself and saying some incorrect statement like: Well, Ubuntu doesn’t have Enterprise support like Red Hat. But like I said, that is an incorrect statement. See their support page:
http://www.ubuntu.com/support

# 2 – Red Hat
Now, lets look at the top ten distros under Red Hat.

Ok, can I tell you that I was surprised at these results. I realize that Fedora was huge, I mean it is second on the distro list under Ubuntu, but I had missed the fact that CentOS was getting more than twice the hits Red Hat itself gets. The rest are hardly worth mentioning.

Historically mong Enterprise environments Red Hat is the most known distro, but when you look at these stats, you have to wonder if Ubuntu has taken over. The numbers for Fedora are fine, but for Red Hat they are not really that good. In fact, I keep hearing about companies using CentOS instead of Red Hat and as you can see, CentOS is getting a lot more exposure than Red Hat.

I will make this statement. Based on this data, if you are a software company considering whether to support Debian or Red Hat first, based on this data you have to choose Debian. If you were to make up some fuzzy logic for Red Hat (which due to its enterprise presence may or may not actually be valid) and weight the distributions based on other factors and somehow found a way to say Red Hat and its distro’s cumulative hits per day were worth three for every one, it would still be less than the cumulative hits per day Debian gets.

# 3 and #4 – Mandriva and Slackware
Ok, back to the report. Something that shocked me from the first chart and I had to analyze it further. Slackware? I had no idea that it was third. However, is it really third? It has a lot of very small distros based on it and Slackware itself gets 590 hpd and most the distros get less than 100 hpd. Mandriva is fourth but arguable could be third over Slackware. In fact, I have to call Mandriva third over Slackware. Sometimes you have to look at the data and make a judgment like this. Sorry Slackware, I am not trying to be biased (otherwise I would be talking up FreeBSD). I have no bias to any Linux distribution. I just say this based on the fact that Mandriva (1048 hpd) and the based-on-Mandriva version PCLinuxOS (773 hpd) both get more hits by a long way than Slackware’s top distros. The only reason Slackware got more hpd was because it has a lot of distros that were really small, while there were very few small distros based on Mandriva. The difference in the amount of small base distros is most likely due to the fact that Slackware is one of the oldest Linux distros, if not the oldest remaining distro, so naturally it has more distros based on it.

# 5 – Gentoo
Gentoo’s cumulative 1804 hpd was fourth. I have to apologize to Sabayon (760 hpd) as I had never heard of it until now. Gentoo itself only gets 428 hpd.

# 6 – BSD
What is next. Well, finally BSD shows up at number 6 with 1743 hpd. For those of you that are reading this and only know about Linux, BSD is NOT Linux. It does not run on the Linux kernel and is not likely to use many GNU tools. I hope I don’t drip with too much bias as FreeBSD is my favorite open source distribution.

Lets pull up the chart of BSD distros. There are 15 distributions listed under BSD, which is probably more than most people would believe since BSD often claims that it is not as broken up as Linux, but it has had its share of forks.

FreeBSD (553 hpd) is the main distribution. Of the Linux distributions, only Debian has more software packages available than FreeBSD.

PC-BSD (355 hpd) is to FreeBSD as Ubuntu is to Debian. For being such a new distribution PC-BSD is doing rather well. It is pretty comparable in ease of use to Ubuntu, Fedora, and OpenSUSE. Yes, PC-BSD is fully featured, running a nice windows environment with everything you could want, including a working Flash Player, the ability to configure your wireless card, and more. I recommend that if you are looking for a new desktop distribution, you at least install PC-BSD and give it a try. Ok, so my bias does show a little here.

# 7 – SUSE
So I was very surprised that SUSE wasn’t on this list until #7. Well, OpenSUSE is doing its part getting 1327 hpd. Remember, OpenSUSE is #4 if you just go by distro and not cumulative base distros. I think in time SUSE could be more popular. SUSE is newer than some of the other base distros and so it only has four distros listed. Novell’s SUSE Linux Enterprise (121 hpd) is the second most popular SUSE distro, however, it just not getting any were near the hits I expected it to be getting.

The others
And then there are the rest of the top ten: #8 Arch, #9 Puppy, and #10 Solaris (Or is that Oracle now?). Sorry if your distro was left out, this report is in the control of those who visit the distro’s web pages.

How accurate is this data?
On DistroWatch’s popularity page, it says:

The Page Hit Ranking statistics have attracted plenty of attention and feedback. Originally, each distribution-specific page was pure HTML with a third-party counter at the bottom to monitor interest of visitors. Later the pages were transformed into plain text files with PHP generating all the HTML code, but the original counter remained unchanged. In May 2004 the site switched from publicly viewable third-party counters to internal counters. This was prompted by a continuous abuse of the counters by a handful of undisciplined individuals who had confused DistroWatch with a voting station. The counters are no longer displayed on the individual distributions pages, but all visits (on the main site, as well as on mirrors) are logged. Only one hit per IP address per day is counted.

There are other factors to consider, such as the fact that some of the distributions are Live CD distros and not really platforms meant to be installed. It would be interesting to exclude them and only include installable distros but for lack of time, I didn’t.

I did nothing to verify the accuracy of the data at DistroWatch and any errors you see are not likely mine, as all the data was pulled from DistroWatch, please report any error to them and once they fix these errors, the QlikView report’s data can be reloaded.

Also, this data includes all hits from all areas: Consumer, Enterprise, Education, etc. Unfortunately there is no way I know of to tell where the hits came from. If there is a distribution that is 100% education hits, there would be no way to know that. Obviously if your target is Enterprise, you are left wondering which open source distros are really the most used in Enterprise environments. Unfortunately this report doesn’t answer that question. This is not a report of installed platforms, it is a report of cumulative hits per day. It is what it is.


Copyright ® Rhyous.com – Linking to this article is allowed without permission and as many as ten lines of this article can be used along with this link. Any other use of this article is allowed only by permission of Rhyous.com.

6 Comments

  1. Lee Murray says:

    some people like the idea of being minimalistic in return for speed and spending less money on hardware which is why OS like arch and bsd are growing in popularity

  2. Steve Barker says:

    Hits on an information page may not transform themselves into users.

    Unfortunately, counters tend not to list distros, but occasionally real world examples can be found:

    http://www.northamptonshire.gov.uk/awstats/cgi-bin/awstats.pl?config=nccinternet

    See Operating Systems - Versions

  3. [...] distros are displayed. Note: For information on how these reports are created, see this post: Using QlikView and DistroWatch to report on the most popular open source distributions (BSD, Linux, ... Leave a [...]

  4. Craz says:

    I guess it would be interesting to see how the distribution evolves over time as well? I guess you are just using an average to calculate "hits per day"? But maybe you don't have the raw day data?

    • rhyous says:

      No I don't have the raw data.

      The data is coming from the first table (the 12 month table) in this site: http://distrowatch.com/stats.php?section=popularity

      I could pull the 1 month table every month and store it, then after a while, I would have trends. Or maybe I could contact someone at DistroWatch and try to get raw data.

Leave a Reply

Powered by sweet Captcha