Redacting website addresses

I made a small change to the stats: website addresses that are visible in the names of Steam users are now redacted. I never really liked that I was, in a way, indirectly advertising these websites, but that shall no longer be the case. The algorithm I created to detect addresses is not very good, though, as it might be overly aggressive, but so far it seems to work nicely.
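To give a rough idea of what this kind of redaction can look like, here is a minimal Python sketch. It is not the actual algorithm used on this site: the regular expression, the short list of top-level domains and the `redact_addresses` name are all assumptions made up for illustration.

```python
import re

# Hypothetical, deliberately short list of top-level domains to look for.
TLDS = ("com", "net", "org", "ru", "gg", "tv", "io")

# Optional scheme and "www.", then dot-separated labels ending in a known TLD,
# followed by an optional path.
ADDRESS_RE = re.compile(
    r"(?:https?://)?(?:www\.)?"
    r"(?:[a-z0-9-]+\.)+(?:" + "|".join(TLDS) + r")\b"
    r"(?:/\S*)?",
    re.IGNORECASE,
)


def redact_addresses(name, placeholder="[redacted]"):
    """Replace anything that looks like a website address in a user name."""
    return ADDRESS_RE.sub(placeholder, name)


print(redact_addresses("Trader | www.example.com"))
print(redact_addresses("Plain name"))
```

A pattern like this is aggressive in the same way as described above: a harmless name such as "mr.net" would also be redacted, because it looks like an address.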

The main reason I decided to make this change right now is that I suspect some people are actually trying to get website addresses shown on this site. There is currently one very easy way of getting yourself visible here, which is having the highest Steam level compared to the number of owned badges. An incredibly easy way of achieving this is to get a single Steam sale badge and level it up enough. This is indeed what the currently best Steam user has done, and they are indeed advertising a website. Well, these Steam users shall still be visible on this website, but the website addresses will be hidden.

Sep 30th, 2016

Faster Steam user crawling

I recently made a rather big change to Steam Bot by making it crawl Steam users using multiple threads. The Steam Bot is now able to crawl users way faster than before, and I could make the stats show more backgrounds and names without losing accuracy of the stats. The code changes are also visible on GitHub if you happen to be interested.
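For readers who have not seen this pattern before, here is a minimal sketch of crawling with a pool of threads. It is not the code from the GitHub repository: `fetch_profile` is a hypothetical stand-in that returns fake data instead of making a network request, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_profile(user_id):
    """Stand-in for the network request; a real crawler would download the profile here."""
    return {"id": user_id, "name": "user%d" % user_id}


def crawl(user_ids, workers=8):
    """Fetch many profiles concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_profile, user_ids))


profiles = crawl(range(5))
print(len(profiles))
```

Threads tend to help with this kind of work because a crawler spends most of its time waiting for the network, so several requests can be in flight at once.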

Sep 10th, 2016

Source code released on GitHub

I have now released the source code of the Steam Bot on GitHub under the MIT license, as the bot is currently in very good shape and I'm not really planning to create any new features anytime soon. Please note, however, that you should not run your own instance of the Steam Bot, because there is already one running here and running more instances doesn't give any benefit. I'm only releasing the code for people to learn from.

The Steam crawler project on GitHub

Sep 5th, 2015

A bug in background images

It seems that some time ago Valve copied all the background images over to a new server, which in turn changed the URLs of the images. As the SteamBot used the full URLs to identify different backgrounds, there have been two entries for almost all of the most common backgrounds in the stats for quite a while now. One entry was using the old URL, which luckily still worked, and the other entry was using the new URL. Eventually the SteamBot would have fixed this situation automatically by recrawling the users with common backgrounds and updating those to use the new URL, but that would have taken really long.

Now the situation is fixed, however, and the new system that identifies backgrounds is way more robust and shouldn't break even if Valve decides to change the URLs again. Anyway, as you can see, there can still be rather big bugs although this project has been going on for seven and a half months now. This is not the only bug either, as I know that there are duplicates of some Steam users in the database. That bug is not very significant, however, as it doesn't really affect the stats seen on this site, but I do plan to fix it at some point.
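One way to make background identification survive a server move is to build the key from the part of the URL that is least likely to change. The sketch below is only an illustration of that idea, not necessarily how the SteamBot does it: the `background_key` function, the example URLs and the choice of keeping the last two path segments are all assumptions.

```python
from urllib.parse import urlparse


def background_key(url):
    """Return a key that ignores the scheme and host of a background URL."""
    path = urlparse(url).path
    # Keep only the last two path segments (assumed here to be an item folder and a file name).
    return "/".join(path.strip("/").split("/")[-2:])


# Made-up URLs standing in for the same image on an old and a new server.
old = "http://old-cdn.example.com/images/items/12345/background.jpg"
new = "https://new-cdn.example.net/v2/images/items/12345/background.jpg"
print(background_key(old) == background_key(new))
```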

Jul 13th, 2014

Back in action

SteamBot is now once again crawling new users after a month of only recrawling already crawled users. As I wrote earlier I had to stop adding new users to the database because it hit its limit. Now there is a new database system to store all the users that have been crawled, so SteamBot is back in action.

To say something about the new database, SteamBot is now actually crawling around 20 percent faster than with the old database, which is very surprising as I thought the new database could only make SteamBot slower. Analyzing all the data to get the statistics seen on this site does take about 1.5 times longer with the new database, as I thought it would, but there are some other things that actually make the database faster overall.

But let's look at the memory consumption, as that was the reason I had to create this new database system in the first place. SteamBot now uses only about 100 megabytes of memory compared to the old 1.3 gigabytes, which is just terrific! The memory consumption does rise to 700 megabytes when analyzing the data, but that is still a lot better than the previous 1.6 gigabytes. We'll see when that becomes a problem, but it shouldn't be too soon.

Also, if you still want to know more about the different databases that SteamBot has had, you should read my previous news post that is right below.

May 18th, 2014

SteamBot reached its limit

There are now more than 1.3 million users in the SteamBot database and sadly that seems to be the limit for the current version of SteamBot. The database of SteamBot is actually custom-made by me, and the current version is the second version of the database. Before I talk about the current situation and the future of SteamBot, let's first look at the past.

Now I can say that the first version of the database was really terrible. The way it worked was basically that it created a separate text file for every single user the SteamBot crawled and stored all those into a single folder. Now the limit for that database came much earlier and if I remember correctly it was around 0.3 million users. At that point it was almost impossible to open the folder containing the text files in my file manager because it would take ages. Analyzing all that data would take around five minutes as the SteamBot would have to open and read all those files. Also while the SteamBot was running it would really slow down the computer as it was constantly using the hard drive. That meant I really had to come up with something better if I wanted to continue with SteamBot.

The next and current version of the database stores everything in a single big file. Whereas the 0.3 million users could eat more than a gigabyte of hard drive space in the old database due to file system overhead, the current 1.3 million users use only about 100 megabytes of hard drive space in the new database. That's like 40 times better! But that is not the whole story, as the whole database is also kept in RAM all at once, and sadly it takes a lot more than 100 megabytes of memory there. It's around 1.3 gigabytes of memory, and the usage can rise to 1.6 gigabytes when analyzing the data. Because of that, the SteamBot has started crashing as it is now running out of memory. The positive thing about this approach has been that analyzing the 1.3 million users only takes around two minutes, and compared to the old database that's like 10 times faster. Anyway, the SteamBot itself is actually a Python program running on a 32-bit Python interpreter. I believe I could raise the memory limit by moving to 64-bit Python, but I would rather not, as I'm running this bot on an old laptop with only 4 gigabytes of memory and the current memory usage is already a pretty hefty chunk.
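The "40 times" and "10 times" figures are per-user comparisons, which is easy to miss since the two databases held different numbers of users. Here is the arithmetic spelled out, using only the rounded numbers from this post and the five-minute analysis time mentioned for the old database (decimal units are an assumption, and "more than a gigabyte" is taken as exactly one):

```python
MB = 1000 ** 2
GB = 1000 ** 3

# Rounded figures quoted in this post.
old_users, old_bytes, old_minutes = 300000, 1 * GB, 5
new_users, new_bytes, new_minutes = 1300000, 100 * MB, 2

# Disk space per user in each database, and how many times smaller the new one is.
old_per_user = old_bytes / old_users
new_per_user = new_bytes / new_users
space_ratio = old_per_user / new_per_user

# Users analyzed per minute in each database, and how many times faster the new one is.
speed_ratio = (new_users / new_minutes) / (old_users / old_minutes)

print(round(old_per_user), round(new_per_user), round(space_ratio, 1))
print(round(speed_ratio, 1))
```

This gives roughly 3.3 kilobytes per user before and under 80 bytes per user now, so a bit over 40 times less space, and close to 11 times more users analyzed per minute.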

But then, what is the plan? I'm thinking of creating yet another version of the database. It would still keep all the data in a single big file, as there is no reason to change that, but the idea would be to take it all out of RAM. Only the most recent crawls would be kept in RAM before they are combined with the other data stored on the hard drive. This way the only restriction would probably be the size of the database file and the hard drive space. As the current database file is around 100 megabytes and there are about 60 times more Steam users than currently crawled, the file would grow to about 6 gigabytes. That shouldn't be a problem, if I ever even get that far. The downside is that the analyzing time will probably grow, because the data has to be read from a file instead of directly from memory, but I guess that is the price I have to pay.
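The plan above could be sketched roughly like this. It is only an illustration of the idea, not the actual SteamBot database: the `BufferedStore` class, the one-JSON-line-per-user file format and the `limit` parameter are all assumptions.

```python
import json
import os
import tempfile


class BufferedStore:
    """Keeps only recent crawls in memory; everything else lives in one file."""

    def __init__(self, path, limit=1000):
        self.path = path
        self.limit = limit
        self.recent = {}  # user id -> record, not yet written to disk

    def add(self, user_id, record):
        self.recent[str(user_id)] = record
        if len(self.recent) >= self.limit:
            self.flush()

    def flush(self):
        """Stream the old file into a new one, replacing records that were recrawled."""
        pending = dict(self.recent)
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as out:
            if os.path.exists(self.path):
                with open(self.path) as old:
                    for line in old:
                        user_id, record = json.loads(line)
                        record = pending.pop(user_id, record)
                        out.write(json.dumps([user_id, record]) + "\n")
            # Whatever is left in pending belongs to users seen for the first time.
            for user_id, record in pending.items():
                out.write(json.dumps([user_id, record]) + "\n")
        os.replace(tmp_path, self.path)
        self.recent.clear()

    def scan(self):
        """Yield every stored record one at a time, as an analysis pass would."""
        with open(self.path) as f:
            for line in f:
                yield tuple(json.loads(line))


store = BufferedStore(os.path.join(tempfile.mkdtemp(), "users.db"), limit=2)
store.add(1, {"level": 10})
store.add(2, {"level": 20})  # reaches the limit, so this triggers a flush
store.add(1, {"level": 11})  # a recrawl of user 1
store.flush()
print(dict(store.scan()))
```

Memory use then depends on `limit` rather than on the total number of users, while every analysis pass has to read the whole file, which is the trade-off described above.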

However, I'm very busy at the moment, which means I can't code a new database right now or probably anytime soon. In the meantime I will let the SteamBot slowly keep on recrawling users that have already been crawled earlier. That's also one thing that SteamBot does to keep the statistics up to date.

TL;DR provided by Heavy: Bot is dead, not big surprise. But wait, not so dead as you think! SteamBot is not done with you yet! You SteamBot does not forget!

Apr 13th, 2014

No more inventory

There was quite a long break here that started in December almost three weeks ago during which the SteamBot was not crawling at all. Now this break was intentional and there is a good reason for it. Around that time the number on people's Steam profiles that tells their inventory size just disappeared. The numbers for the other stuff like game and friend counts are still there but for inventory there's just the link.

So, I decided to wait and hope that the number eventually returns as there can often be many kinds of problems in the Steam community. I was thinking that maybe it is because the winter sales were on and Valve wanted to lessen the stress on their servers. Well, the sales are now over but the number is still not back so I decided to just let it go.

Still, in the end I'm kind of happy to let it go because that is the one thing that has been causing the most trouble so far. The downside is that we don't get to know who has the largest inventory.

Jan 8th, 2014

Steam Bot launch

So, I guess this is the moment Steam Bot is officially launched. There shouldn't be too many bugs and so forth, although those are always possible.

Now we'll get to see how this goes and if this ever becomes anything.

Nov 27th, 2013