Category: The size of the web

There is surprisingly little information about the size of the Hungarian web. Perhaps that is because, for those who deal with websites every day, the question is a bit like asking why the sun shines: we never really think about it, and the answer seems simple, but the deeper we go into the subject, the less simple it becomes.

To answer this question, we must first clarify what we consider to be Hungarian and what we consider to be a website. In my opinion, a Hungarian website is a web presence that has substantial content in Hungarian and is published on a unique domain name.

860 thousand Hungarian domain names?

I think we can safely treat a unique domain name as a minimum requirement: how can you take content seriously if its owner has not even invested a couple of euros in publishing it?

According to the statistics of domain.hu (the official .hu registry), approx. 860,000 domain names were registered under the .hu ccTLD in February 2023. However, this does not mean that there are that many active Hungarian websites. Many people hold on to a domain name in the hope of selling it for a lot of money one day, or simply to prevent someone else from owning it. In addition, many owners have not yet completed their website, or do not intend to publish one on the domain at all, e.g. because it is only used for email addresses.

My database contains 730,000 .hu domain names that have worked at some point, so I think I have sufficient data to determine the final numbers.

482 thousand working websites?

If we try to look up these hundreds of thousands of domain names, a little more than half of them will show a sign of life: that is, if we entered these addresses into a browser, we would get some kind of answer from a web server. Of course, a list of domain names of this magnitude can only be handled in an automated way, with the help of scripts.
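A minimal sketch of such a script, assuming that "a sign of life" means any HTTP response at all, including error statuses. The function names `probe` and `count_alive`, the five-second timeout, and the choice of plain HTTP are illustrative assumptions, not a description of the tooling actually used for the database.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable


def probe(domain: str, timeout: float = 5.0) -> bool:
    """Return True if a web server answers for the domain (any HTTP status)."""
    try:
        with urllib.request.urlopen(f"http://{domain}/", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (404, 500, ...).
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed address.
        return False


def count_alive(domains: Iterable[str], check: Callable[[str], bool] = probe) -> int:
    """Count the domains for which the given check reports a sign of life."""
    return sum(1 for domain in domains if check(domain))
```

The `check` parameter is injectable so that the counting logic can be tried out without touching the network. A real crawl of hundreds of thousands of names would also need concurrency, retries, and polite rate limiting, which this sketch leaves out.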

290 thousand websites with significant content?

Often, the web page returned in response does not contain significant information, only an error message or some default home page. Within this group, a few cases can be named separately where there is likewise no substantive content, and therefore no independent website to speak of:

- 21,000 default CMS home pages: the owners have set up some kind of content management system for their domain, but have not yet started to fill it with their own content, so the page is practically empty, even though the engine behind it is ready to go (“Welcome to WordPress! This is the first entry” and the like).

- 11,000 parked domains: there is no meaningful content on these sites; usually, they only tell us that the given domain name is for sale or rent. In addition, sometimes the same content is available on several domain names; I don’t think these should be counted as separate websites either.

- 6,000 domains using outdated technologies: e.g. sites built with frames, or old pages made with Macromedia/Adobe Flash, where navigation relied exclusively on a technology that today’s desktop and mobile browsers no longer handle, so we can rightly consider them abandoned.
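The three groups above could be separated with simple text heuristics along the following lines. This is only a sketch: the marker phrases, the category names, and the `classify` function are hypothetical examples, and a real classifier would need a far longer list of patterns.

```python
import re

# Hypothetical marker phrases; real pages vary far more than this.
CMS_MARKERS = ("welcome to wordpress",)
PARKED_MARKERS = ("domain is for sale", "this domain is parked", "ez a domain eladó")
# Frames and Flash embeds as signs of outdated technology.
LEGACY_PATTERN = re.compile(r"<frameset\b|\.swf\b")


def classify(html: str) -> str:
    """Assign a fetched home page to one of four rough categories."""
    text = html.lower()
    if any(marker in text for marker in CMS_MARKERS):
        return "default_cms"
    if any(marker in text for marker in PARKED_MARKERS):
        return "parked"
    if LEGACY_PATTERN.search(text):
        return "legacy_tech"
    return "content"
```

For example, `classify('<frameset cols="20%,80%">')` falls into the `legacy_tech` group, while a page with ordinary text and none of the markers is treated as `content`.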

250,000 Hungarian-language websites under the .hu domain?

Based on the method I used to compile the database behind the pedia.hu website, this is roughly the number of Hungarian-language websites that can be found under the .hu domain.

To this we can add a few tens of thousands of domains under top-level domains other than .hu, primarily general endings such as .com or .eu, as well as sites registered under the domain endings of neighboring countries. These sites are much harder to find, so their number is also harder to estimate. While we can safely assume that a website under the .hu ccTLD is probably written in Hungarian or related to Hungary, it is highly questionable whether certain websites should be counted as distinct Hungarian sites at all: for example, the Google domains that are registered under many top-level domains and have a Hungarian interface.

With this, we have reached a grey zone: on one side there are the non-Hungarian websites of Hungarian people, companies, and organizations, and on the other side foreign websites that are also available in Hungarian, often through low-quality automatic translation. And what if someone has Hungarian ancestors and therefore a typical Hungarian family name, but nothing more, or if a site deals with a dog breed of Hungarian origin: should we consider these sites part of the Hungarian web? Here again, everything depends on how exactly we define what is Hungarian and what is a website.

Three hundred thousand Hungarian websites — is that all?

It is not possible to determine exactly how many Hungarian websites there are. Even discovering an active domain runs into many technical difficulties, and among the nearly half a million domains that are in use in one way or another, there will always be one that started yesterday, one that shut down yesterday, or one that was simply unavailable yesterday due to an error and is therefore missing from the statistics. And of course, if we wanted to be very strict, we could label quite a few websites that have not been touched for years as inactive, and therefore too obsolete to make it onto the list.

However, it can be stated with great certainty that the number of active Hungarian websites is not in the millions, and does not even reach half a million. Taking the above into account, three hundred thousand active Hungarian websites in total is a good approximation.

Of course, these three hundred thousand websites also differ greatly in size: many sites consist of only one web page, while others, e.g. the Hungarian-language Wikipedia, have more than half a million web pages, and this makes the estimation quite difficult.

Of course, if you like, you could also add web presences that do not meet the criteria I outlined above: for instance, sites hosted on blog farms, or other sites with unique content hosted on subdomains. Ultimately, you could even think about counting Facebook pages, since many companies and organizations have a web presence only on social media sites.

How accurate is this estimate?

As I mentioned, the database serving as a starting point is comparable in size to what the domain.hu statistics show. However, there is another way to determine what proportion of existing pages have been discovered, namely by systematically querying all technically possible domain names. For example, if we examine domain names with a length of 4 characters, then taking into account the 26 letters of the English alphabet, the 10 digits, and the hyphen (which cannot be at the beginning or end), we get 36*37*37*36 = 1,774,224 variations.
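The count can be reproduced, and the candidate names generated, with a short sketch like the one below. It applies only the rules stated in the text (letters, digits, and a hyphen that may not be first or last); the function names `count_labels` and `labels` are illustrative, and any further registry restrictions are not modeled.

```python
import string
from itertools import product
from typing import Iterator

EDGE = string.ascii_lowercase + string.digits  # 36 characters allowed anywhere
INNER = EDGE + "-"                             # 37 characters allowed inside


def count_labels(length: int) -> int:
    """Number of possible labels of the given length under the rules above."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return len(EDGE)
    return len(EDGE) ** 2 * len(INNER) ** (length - 2)


def labels(length: int) -> Iterator[str]:
    """Yield every possible label of the given length, one at a time."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        yield from EDGE
        return
    for first in EDGE:
        for middle in product(INNER, repeat=length - 2):
            for last in EDGE:
                yield first + "".join(middle) + last


print(count_labels(4))  # 1774224
```

Because `labels` is a generator, the candidates can be streamed into a crawler one by one instead of being held in memory all at once.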

Before this double check, the database contained 8,600 working domains of this length, and crawling through these more than one and a half million variations increased that number by only 500. It is therefore highly likely that if I had systematically checked the longer domain names as well, I would have found a similar proportion of undiscovered domains, yielding about 6% more websites. That would raise the total from 250,000 to 265,000, which would not significantly alter the final estimate of roughly 300,000.
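The extrapolation can be written out as a small calculation. It rests on the assumption, made in the text, that the share of undiscovered domains among the 4-character names carries over to longer names; the function names are illustrative.

```python
def undiscovered_share(known_before: int, newly_found: int) -> float:
    """Share of extra domains found by the exhaustive check, relative to the known ones."""
    return newly_found / known_before


def adjusted_total(base_estimate: int, share: float) -> float:
    """Scale the base estimate up by the share of undiscovered domains."""
    return base_estimate * (1 + share)


share = undiscovered_share(8_600, 500)
print(f"{share:.1%}")                             # 5.8%, i.e. about 6%
print(round(adjusted_total(250_000, share), -3))  # 265000.0
```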