webDip dev coordination forum / public access todo list
All times are UTC

Posted: Sun Jun 02, 2013 5:04 am
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
For a webDiplomacy server even a little downtime can be a real nightmare. With live games even a few minutes down can keep people from entering orders, and since you can't go back to previous turns in Diplomacy it's crucial that the server is as reliable as possible.

webDiplomacy itself is pretty reliable; there hasn't been a bug within the software itself that has affected a game since 2007, any mid-process failures are rolled back and flagged, and the system has a great track record regarding security. However, there are a huge number of external things that can go wrong, and frankly I think I've experienced them all. Here is a list of what has gone wrong for me, and what to do about it.

Disk space
webDiplomacy accumulates data over time, so disk usage always goes up, and once it's full the server will stop working. You need to keep an eye on:
  • The database folder (usually /var/db/mysql). If this gets too full it means your /var partition is too small and you need to move the place the database files are stored, since there's not really much you can do to shrink the database without destroying data.
  • The log folder (usually /var/log). This can fill up fairly quickly, especially if you log all web request details.
  • The webdiplomacy cache folder. This can grow huge, since it contains all rendered game images for all turns. It can be wiped clean at any time without destroying any data, but it is best to do this during an off-peak period since there will be a big activity spike as lots of previously cached data is regenerated.
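To see which of these folders is actually growing, something like the following could be run from time to time. This is only a rough sketch: the function name is made up for illustration, and you would point it at wherever your own database, log and cache folders live.

```python
import os

def folder_size_bytes(root):
    """Total size of the regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; ignore it and carry on.
                pass
    return total
```

Logging the result for each folder once a day gives a growth trend, which says more than a single reading.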

CPU/memory usage
You don't normally need a huge amount of CPU or memory; the official server has a pretty feeble Core 2 Duo and does okay with 4GB of RAM. But if you do go over the limit for whatever reason, users will start making page requests faster than the server can respond to them, the number of web processes will increase, the server gets less efficient as it tries to juggle more tasks, and pretty quickly you're exceeding connection limits and requests start getting dropped.
The worst part is that some requests will still work, so you can have players who can't lodge their orders even while the gamemaster is processing games.
  • If you are on shared hosting there's not much you can do; another user can hog the CPU and you're pretty much screwed. All you can do is use "top" to at least check that it isn't you, and perhaps see who it is.
  • Unoptimized queries can have a big effect. For most pages the real work webDip does is in the database, so if pages are loading slowly that might be a good place to start looking. Check "SHOW FULL PROCESSLIST" to identify if there is a particular query it's getting stuck on. This is a problem that may only appear as your server grows, so it can seemingly come out of nowhere. (Hopefully you won't need to worry about this if you're using unmodified code)
  • Database deadlocks. Related to query optimization, but harder to pin down because it may depend on a couple of users doing something at the same time. InnoDB will recognize deadlocks and throw an exception, but MyISAM tables can get locked with a spinlock that uses 100% CPU and blocks all other requests. This can cause a locked up server which will apparently be completely fine again after you restart MySQL. (Again hopefully this won't happen if you're using unmodified code)
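When reading through SHOW FULL PROCESSLIST output by hand gets tedious, the rows can be filtered down to the statements that have been running longest. A minimal sketch, assuming you already have the rows as dicts keyed by the processlist column names (Id, Command, Time, Info and so on); how you fetch them depends on your MySQL client library and is left out, and the function names are made up.

```python
def _seconds(row):
    """Seconds the statement has been running, treating a missing Time as 0."""
    return int(row.get("Time") or 0)

def slow_queries(processlist_rows, min_seconds=10):
    """Return running statements at least min_seconds old, slowest first."""
    slow = [
        row for row in processlist_rows
        if row.get("Command") == "Query"   # skip idle (Sleep) connections
        and row.get("Info")                # Info holds the statement text
        and _seconds(row) >= min_seconds
    ]
    return sorted(slow, key=_seconds, reverse=True)
```

If the same statement keeps turning up at the top of that list, that is probably the query to look at first.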

inode limits
You can get "disk is full" errors all of a sudden even with plenty of disk space free, which is very confusing. This is because UNIX filesystems have inode limits, i.e. limits on the number of files you can have. Run "df -hi" to check whether any of your partitions are running out.
In particular keep an eye on /tmp/phpsess: this fills up with PHP $_SESSION files, which are tiny, but hundreds of thousands can accumulate over time. Once it fills up you will need to delete and recreate the folder, because there will be too many files in it to delete individually, and it will then take a minute or two for the filesystem to free up the inodes.
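The same inode numbers that "df -hi" prints can be read from a script, which makes them easy to include in a routine check. A small sketch using os.statvfs; the function name is made up, and the threshold at which you alert is up to you.

```python
import os

def inode_usage(path):
    """Fraction of inodes in use on the filesystem holding path.

    Returns None when the filesystem does not report an inode count.
    """
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    return (st.f_files - st.f_ffree) / st.f_files
```

Calling it for /tmp, /var and / covers the partitions mentioned above, assuming those are separate partitions on your system.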

Security
  • If you are on shared hosting, file permissions become very important. Make sure only your account can access the files within your account's folder. In particular make sure that your config.php file cannot be read by any other users.
  • If you're on a dedicated server make sure you're running a firewall that only allows access to the shell server and web server. Keep remote MySQL connections disabled.
  • Avoid running other software on the same account / system as your webDiplomacy server. Even if webDiplomacy is secure it doesn't matter if you're running a vulnerable phpBB on the same server.
  • If you have order logs and error logs available make sure they're not accessible from the web.
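The config.php permission point is easy to check mechanically. A sketch, assuming a UNIX-style host; the function name is made up, and whether owner-only permissions are workable depends on which user your host runs PHP as.

```python
import os
import stat

def readable_by_others(path):
    """True if the group or other permission bits allow reading path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

For example, readable_by_others("config.php") returning True means users other than the owner may be able to read the file.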

Game processing
  • Keep your config.php's downtimeTriggerMinutes setting low, so that if game processing hasn't run for that long it stops until you start it back up. If you've skipped even a couple of gamemaster process cycles people might not be getting their orders in, so better safe than sorry.
  • Don't run the gamemaster script from the server that's running webDiplomacy. If the server is online and running but the whole datacenter goes offline, the gamemaster script will keep processing games even though no-one can access the server. Run the gamemaster script from another server hosted in a different datacenter.
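The point of the remote trigger is that it fails in the same way your players' requests do. A sketch of what the other server could run on a schedule, assuming game processing is kicked off by an HTTP request to some URL on your webDiplomacy server; the function name is made up and the URL is whatever your own setup uses.

```python
import urllib.request

def trigger_gamemaster(url, timeout=30, opener=urllib.request.urlopen):
    """Request url; return True only if the server answered with HTTP 200.

    Network failures, timeouts and HTTP error responses all raise OSError
    subclasses, so they are reported as False instead of crashing the job.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False
```

If the datacenter is unreachable from outside, the request fails, no games are processed, and the downtime trigger eventually kicks in.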

Updates and backups
  • Subscribe to security mailing lists for Apache, MySQL, phpMyAdmin, and your OS. Don't upgrade your software unless there is a security vulnerability or other reason to do so, since an upgrade may include a breaking change.
  • Use MySQL data journalling to take database backups if possible, to avoid having to take the server down to back it up.

Monitoring
  • Use an uptime monitoring service like pingability, so you'll be informed when your site goes down. pingability is $10/mo, so if that's too much use pingability-lite; it won't call you if things go down, but you'll get an e-mail (and a call probably won't wake you anyway). If you add this to your config.php footer code, the server will warn you about various issues even while it's online:
    if( isset($_REQUEST['pingability']) ) {
        // Generate a status code for the uptime monitor; later checks override earlier ones
        $status = "pSxqXCuhhq (normal)";
        if( defined('ERROR') )
            $status = "42cqO55jiA (error)";
        if( !isset($Misc) ) {
            $status = "I489TWdvz2 (misc environment)";
        } else {
            // Only read $Misc once we know it exists
            if ( $Misc->Maintenance )
                $status = "7BYDynhGpx (maintenance)";
            if ( $Misc->Panic )
                $status = "a/j7BwXSKk (panic)";
            // The next line is cut off in the original post
            if ( ( time() - $Misc->LastProcessTime ) > Config::$dow$
                $status = "J9RhUjPGOS (downtime)";
        }

        // Flag any partition with less than 10% free space
        if( (disk_free_space('/var')/disk_total_space('/var') < 0.1)
        || (disk_free_space('/tmp')/disk_total_space('/tmp') < 0.1)
        || (disk_free_space('/usr')/disk_total_space('/usr') < 0.1)
        || (disk_free_space('/')/disk_total_space('/') < 0.1) )
            $status = "uhN5ClX8/a (diskspace)";

        $buf .= '<p style="font-size:6pt">Status code: '.$status.'</p>';
    }

    return $buf;
  • Check your registration page regularly. If games stop working you'll hear about it quickly, but it may be a while before anyone tells you that your registration page is broken, and it relies on captcha code and e-mail code which may break while the rest of the site is working fine.
  • Give a few trusted regular users access to the Panic mode button, so that if things are screwing up and you're not around they can at least pause the server.
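If you would rather check the footer status line from your own script than rely on a monitoring service, the line can be pulled out of the page with a regular expression. A sketch, assuming the footer prints "Status code: <code> (<label>)" as in the PHP snippet above; the function names are made up, and fetching the page is left to whatever HTTP client you use.

```python
import re

# Matches e.g. "Status code: pSxqXCuhhq (normal)"
STATUS_RE = re.compile(r"Status code: (\S+) \(([^)]*)\)")

def parse_status(page_html):
    """Return (code, label) from the footer status line, or None if absent."""
    match = STATUS_RE.search(page_html)
    return (match.group(1), match.group(2)) if match else None

def is_healthy(page_html):
    """True only when the status line is present and its label is 'normal'."""
    status = parse_status(page_html)
    return status is not None and status[1] == "normal"
```

Treating a missing status line as unhealthy is deliberate: if the footer didn't render at all, something is wrong.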

If you've had any other experiences / have any other tips please post them here.

Posted: Thu Jul 11, 2013 6:30 am

Joined: Sat Mar 28, 2009 7:13 am
Posts: 185
For the cache folder, what about setting up a cron job that runs every day removing files whose modified date is more than 30 days ago or so? That way you wouldn't get a spike of all the current phase maps having to be redone and instead only when someone looks back far enough will a map have to be redone.

And could something similar be done for /tmp/phpsess?
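A cleanup along those lines could look roughly like this. It is only a sketch: the function name and parameters are made up, it defaults to a dry run that just lists what it would delete, and the folder and age you pass in are your own choice.

```python
import os
import time

def prune_old_files(root, max_age_days, now=None, dry_run=True):
    """Find regular files under root last modified more than max_age_days ago.

    Returns the list of matching paths; deletes them only when dry_run is False.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.islink(full) or os.path.getmtime(full) >= cutoff:
                    continue  # keep symlinks and recently modified files
                if not dry_run:
                    os.remove(full)
                removed.append(full)
            except OSError:
                # The file vanished or could not be removed; skip it.
                pass
    return removed
```

Run daily from cron with 30 days on the cache folder, this would only ever remove maps nobody has regenerated in a month. The same function could be pointed at the session folder with a shorter age, assuming sessions that old are safe to drop.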

Posted: Thu Jul 11, 2013 11:52 am

Joined: Wed Jul 29, 2009 10:22 am
Posts: 841
I have something like this for vDip.

You can specify:
File age (> 50 kB) (in days):
File age (files < 50 kB) (in days):

but you need to run this manually.

Posted: Thu Jul 11, 2013 12:13 pm
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
Yup, that's an option, though there are so many files to search through, and deleting them all at once is so efficient, that you wonder whether a one-time spike might be better than a smaller daily one.

Either way it's best to keep an eye on it, because however much you try to automate, stuff always slips through.
