
WoW-GPS Dev Blog

WoW-GPS 2.0 - Progress

In case you missed it, I mentioned in the comments of my last post (http://stormspire.net/blogs/kathroma...2-0-links.html) that the proposed structure of many tiny, linked, raw json files didn't appear to be a viable option anymore, and that I'd be pursuing sqlite as a potential alternative. Well, here's an update on how that's going:

-Rewrote the API code to write to a sqlite DB instead of json files on the filesystem. Quite a few hiccups along the way, but I think I've got everything optimized as much as possible. I've currently got some optimization parameters in place that leave the DBs more susceptible to corruption, so I'll have to keep an eye on that to ensure I strike the proper balance.
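
To give a concrete idea of the kind of trade-off I mean, here's a minimal sketch, assuming the Data server scripts use PDO's sqlite driver - the path and exact PRAGMA values are just for illustration, not necessarily what I've settled on:

    <?php
    // Open one of the per-realm sqlite DBs (path is hypothetical).
    $db = new PDO('sqlite:/data/us/illidan/alliance.sqlite');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Speed up bulk writes, at the cost of durability: a crash or power loss
    // mid-write can leave the DB file corrupted.
    $db->exec('PRAGMA synchronous = OFF');
    $db->exec('PRAGMA journal_mode = MEMORY');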

-RE: Speed - mixed feelings so far. For the most part, parsing through a single realm file takes between 30-70 seconds, and the results are unpredictable, unfortunately. As an example, I recently added both Illidan and Tichondrius to the list of realms I was fetching, to test the workers under presumably max load, and the Illidan file was taking a good 60-70 seconds initially, whereas the Tichondrius file was closer to 15-20 seconds despite containing slightly more data.

-In order to balance out the processing, I split the worker process into 3 independent scripts (per region) and configured the queue DB to automatically cycle through all 3 workers (rough sketch below). Accounting for connected realms, I really only need to get through about 140 US realms, or approx. 50 per worker. Definitely manageable. The problem, however, lies with the EU realms, where the connected realms only drop the unique total down to about 210, which would mean about 70 realms per worker, and the EU API data has been far more consistent (as noted by TUJ, TSM, etc.) for months now. With the workers running every 10 minutes, that means I'd have to get through ~12 realms each run, leaving about 50 secs for each one. Cutting it a bit close, so I'll have to keep an eye on this as well. There's definitely not much room for error, so if I can't lock down some sort of redundancy that can absolutely prevent DB corruption, I guess I might have to look at throwing in a fourth worker (which I really don't want to do, because it's already a bit of a pain to manage 3...). I'm also curious to see what happens if a worker job runs over 10 minutes: if it merely delays the next one from starting, I doubt this will be an issue, but I'll need to test that out further.
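
For the curious, the queue cycling boils down to simple round-robin assignment. A rough sketch - the table and column names here are placeholders, not the actual schema on the Data server:

    <?php
    // Assign each newly fetched raw file to the next worker in the cycle.
    $queue = new PDO('sqlite:/data/queue/us.sqlite');   // hypothetical path
    $queue->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $workerCount = 3;

    function enqueueRawFile(PDO $queue, int $workerCount, string $realm, string $fileUrl): void
    {
        // Cycle workers 0,1,2,0,1,2,... based on how many files have been queued.
        $queued = (int) $queue->query('SELECT COUNT(*) FROM queue')->fetchColumn();
        $worker = $queued % $workerCount;

        $stmt = $queue->prepare(
            'INSERT INTO queue (realm, file_url, worker_id, status)
             VALUES (:realm, :url, :worker, :status)'
        );
        $stmt->execute([
            ':realm'  => $realm,
            ':url'    => $fileUrl,
            ':worker' => $worker,
            ':status' => 'pending',
        ]);
    }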

-For connected realms, I'm going to keep running with the symlink approach I started on with the json files and just symlink the entire "main realm" directory. Ideally, Blizzard links up a few more of the EU realms to cut that ~210 number down a bit closer to the US total; if they don't, I'm hoping the other side of that coin means smaller realm files that won't take as long to process.
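
The linking itself is trivial - something along these lines, with the paths made up purely for illustration:

    <?php
    // Point a connected realm at its "main realm" directory instead of
    // duplicating the data. Paths are hypothetical.
    $mainRealmDir      = '/data/us/illidan';
    $connectedRealmDir = '/data/us/some-connected-realm';

    if (!is_link($connectedRealmDir) && !file_exists($connectedRealmDir)) {
        symlink($mainRealmDir, $connectedRealmDir);
    }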

At this point, I'm running 8 heroku apps to fetch/parse this API data:

2 worker apps (1 US, 1 EU) that check for new realms and realm connections on a daily basis. They also check for new API data for each realm every 10 minutes and then initiate the requests to copy the raw files to the Data server. These apps then add the raw file URLs to a DB queue on the data server, and any new files are automatically split evenly across the 3 separate worker processes. This script will also initialize any directory/db structure for new realms.
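
The Data server side of that copy step is about as simple as it gets - roughly this, with the endpoint, parameter name and paths made up for illustration (the Heroku app just sends over the final auction file URL):

    <?php
    // Receive the raw file URL from the Heroku fetch app and CURL it down
    // into the raw file directory. Paths/parameter names are hypothetical.
    $fileUrl  = $_POST['file_url'];
    $destPath = '/data/raw/' . basename(parse_url($fileUrl, PHP_URL_PATH));

    $fp = fopen($destPath, 'w');
    $ch = curl_init($fileUrl);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);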

6 worker apps (3 US, 3 EU) that query the DB queue every 10 minutes for new raw files to process. Once found, they initiate the script on the Data server to parse through the raw files, insert the data into the sqlite DB, clean up old data, and create a quick DB backup. After all files have been handled, this script also cleans up the DB queue and the raw files that have been processed.
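
Each processing worker's check against the queue is just a filtered SELECT - something like this (again, placeholder schema):

    <?php
    // Grab this worker's pending files from the queue DB.
    $queue = new PDO('sqlite:/data/queue/us.sqlite');   // hypothetical path
    $stmt  = $queue->prepare(
        'SELECT id, realm, file_url FROM queue
         WHERE worker_id = :worker AND status = :status
         ORDER BY id'
    );
    $stmt->execute([':worker' => 1, ':status' => 'pending']);

    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $job) {
        // parse the raw file, insert into the realm DB, back it up,
        // then mark the queue row as done...
    }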

So, theoretically, I'll have 8 different processes all accessing the server simultaneously. The nice thing about the setup is that they'll never be writing to the same sqlite DB at the same time, so we shouldn't have to worry about DB corruption from that route. I've tried to offload as much of the processing work to the heroku apps as possible - they handle all the checking, etc. with the Blizz API, then simply send the final file url to the data server to initiate the CURL transfer. One possibility, if the load on the Data server proves to be too high, would be to have the 6 worker apps (which might need to expand into 8 or 10 at that point) fetch the raw file data from the Data server, do the processing, then send the final data back to the Data server when finished, meaning the majority of the processing would occur on the Heroku servers.

There are 2 potential issues with this. The first is the amount of data that would need to be transferred and the possibility that it increases how long the script runs. Given a 5-10MB file that would then need to be transferred both TO and FROM the heroku server, this has the potential to be a large bottleneck, so it would definitely need to be worth the effort. The second potential issue is just HOW much more efficient this would actually end up being for the Data server. As it stands, the processing script really only runs through each auction and adds the raw data to an INSERT statement inside a BEGIN/COMMIT transaction block. No calculations, no ordering, etc. Just loop and insert. So, the only way to optimize this further would be for the Heroku app to generate one long string consisting of a single "INSERT INTO ... VALUES (a,b,c), (a,b,c)..." statement. Perhaps it's just me, but the idea of that gives me an uneasy feeling...
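
For reference, the current "loop and insert" on the Data server boils down to roughly this - the column list and JSON layout are simplified here, not the exact schema:

    <?php
    // Parse a raw auction dump and bulk-insert it inside a single transaction.
    $db  = new PDO('sqlite:/data/us/illidan/alliance.sqlite');   // hypothetical path
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $raw = json_decode(file_get_contents('/data/raw/illidan.json'), true);

    $db->beginTransaction();
    $stmt = $db->prepare(
        'INSERT INTO auctions (auc, item, owner, buyout, quantity)
         VALUES (:auc, :item, :owner, :buyout, :quantity)'
    );
    foreach ($raw['auctions'] as $a) {
        $stmt->execute([
            ':auc'      => $a['auc'],
            ':item'     => $a['item'],
            ':owner'    => $a['owner'],
            ':buyout'   => $a['buyout'],
            ':quantity' => $a['quantity'],
        ]);
    }
    $db->commit();

    // The alternative would be for the Heroku app to build one giant
    // "INSERT INTO auctions (...) VALUES (a,b,c), (a,b,c), ..." string
    // and ship that over instead.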

Another possibility, if I were to offload the data processing to the Heroku apps, would be to skip the whole CURL copy / DB queue part of the process. The biggest issue here for me would be figuring out how to split the load between several workers. It would be possible to have the Heroku apps share a small DB (the free Heroku DB has an approx. 10,000 row limit - which should be enough to store info for every realm), but I'd need to figure out some way to predetermine file sizes, so that a process would know whether it needed to grab, say, 3 large realms or 10 small ones (rough sketch of that idea below). Depending on how the Heroku task scheduler handles overlapping tasks, this might not be as big of an issue, but I'd have to test that out first.

My other mental hangup here is "putting all my eggs in one basket", so to speak. Part of what I like about the existing setup is that it's a relatively quick process to copy the raw data files from Blizz over to the Data server. Then, if some error occurs during the processing, the raw data is still sitting there, in the queue, waiting to be picked up next time around. If the Heroku apps were doing the fetch and parsing and some error were to occur, there's no guarantee that the same file would still be available once the process got back around again. I.e. if the processing script hit a file 5 minutes before a fresh data dump, an error occurred, and it ended up being 10 minutes before the script came around again, that entire snapshot would be lost. So really, it's about striking a good balance between redundancy and performance.
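
The load-splitting itself wouldn't be complicated if I had a recent file size per realm sitting in that shared DB - a greedy "biggest file to the lightest worker" pass would probably do. A quick sketch, with made-up realm names and sizes:

    <?php
    // Split realms across workers by approximate file size (MB).
    // Realm names and sizes are hypothetical.
    $realms = [
        'illidan'      => 9.5,
        'tichondrius'  => 9.8,
        'small-realm'  => 1.2,
        'medium-realm' => 4.0,
    ];
    $workerCount = 3;
    $buckets = array_fill(0, $workerCount, ['size' => 0.0, 'realms' => []]);

    // Biggest files first, each to the currently lightest worker.
    arsort($realms);
    foreach ($realms as $realm => $sizeMb) {
        $lightest = 0;
        foreach ($buckets as $i => $bucket) {
            if ($bucket['size'] < $buckets[$lightest]['size']) {
                $lightest = $i;
            }
        }
        $buckets[$lightest]['size']     += $sizeMb;
        $buckets[$lightest]['realms'][]  = $realm;
    }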

I think what I'll likely end up doing is setting up another set of Heroku apps for a handful of other realms (I'm up to 6 right now, via the current process) to test the alternative approach. Ideally, I'd like to benchmark the same realms against each other, but I'd have to create a second set of distinct folders on the Data server to keep things separate, so I think it makes more sense to try to find 6 similar realms instead.

All in all, I'm pretty pleased with how things have been progressing the past couple days, and would like to get something solid in place before slowly scaling up to full realm capacity.

As always, any thoughts, suggestions, concerns or general feedback are welcome and appreciated.

Comments

  1. Kathroman:
    So, I started to test out the alternative approach (fetch AND parse via Heroku), and it seems like there's actually no performance gain with sqlite when using "INSERT INTO ... VALUES (a,b,c), (a,b,c)..." instead of a transaction. So really, I'd only be freeing up whatever resources the Data server was using to open and loop through the JSON data, which really should be minimal. In other words, this process would add extra processing time in exchange for a small reduction in Data server load.

    Not a great trade-off, but it would also mean fewer moving parts to manage, so I'm interested to see if there are other performance gains or stability improvements as well. I think I'll likely keep both running in conjunction (once the second process is ready) and scale them both up as I work towards 100% US realm coverage before choosing between the two.
  2. Kathroman:
    Update: Seems like Plan B might be dead in the water, already.

    After getting the script all ready to go, I tried with a single realm (a relatively small one, even) to send the sql statement through to the Data server. Although I was only sending 1 faction at a time (since each has its own DB), I was getting HTTP 413 errors - Request Entity Too Large. Unfortunately, I've not found any way to override the setting on the Data server. The htaccess files don't seem to be configured to accept php_value parameters, and the php.ini file doesn't seem to be picking up new values, either.
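
    For anyone wondering, the kind of overrides I was trying look like this - the values are just examples, and since the host doesn't honour them, they don't actually help here:

        # .htaccess (only works if the host allows php_value with mod_php)
        php_value post_max_size 64M
        php_value upload_max_filesize 64M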

    Gotta love shared hosting...

    I guess this means we're moving forward with Plan A
  3. Kathroman:
    So, I've bumped the US realm queue up to 50 realms, and I've also finished and tested the connected realms linking script. I was also able to successfully query a DB for a connected realm. Good progress so far - no DB corruption over the past couple days.

    Unfortunately, I noticed a typo in the delete/cleanup code: it was actually deleting any non-current auctions, rather than any older than 2 weeks. The good news is that the delete code seems to be playing nicely with the insert code. The bad news is that I'm going to have to wait 2 weeks before I can get a realistic stress-test going.
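
    The fix itself is basically a one-liner - roughly this, with a placeholder column name for the snapshot timestamp:

        <?php
        // Only trim rows older than 14 days, instead of everything that
        // isn't part of the current snapshot. Column name is hypothetical.
        $db     = new PDO('sqlite:/data/us/illidan/alliance.sqlite');
        $cutoff = time() - (14 * 24 * 60 * 60);
        $stmt   = $db->prepare('DELETE FROM auctions WHERE snapshot_time < :cutoff');
        $stmt->execute([':cutoff' => $cutoff]);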

    Overall, though, I'm pretty pleased with how things have been progressing. I'd still like to test out the queue workers to see how they handle overflow. I think next week I'll probably add 50 new realms at once (either that, or I'll jump right in with EU data, running 50-100 to start), since that will automatically queue up the first file for each of them, and 50 realms should be more than the 3 workers can process in 10 mins.
  4. Kathroman:
    Well, I've got some good news and some bad news.

    The bad news is that a couple of DBs got corrupted last night. Probably when I was fiddling around with @Sapu94's new TSM server

    The good news is that a couple of DBs got corrupted last night. I know that might not seem like good news, but it provided me with an opportunity to test a few pretty important things and adjust a few settings that should hopefully help things down the road. I was able to confirm that my backup system is functional, and also that the heroku scheduler doesn't run on a hard 10 min schedule, so queue overlap doesn't appear to be an issue.

    I also think I may have accidentally optimized my inserts. I've only seen a couple of examples so far, but I had mentioned in a previous comment that the Illidan results were strangely (and significantly) longer than Tichondrius. So, I've been keeping a close eye on the Illidan data the past few days: after watching it creep up to 80-110 seconds, it's now down to 6-8 seconds over the last few batches. Very strange, but not something I'm going to complain about, that's for sure
  5. Kathroman:
    Well, Illidan jumped up to a whopping 11.3 seconds earlier

    Very exciting turn of events - one that might even allow scaling back down to 1 queue worker per server if things hold up until we hit max capacity
  6. Kathroman:
    Quick update - I'm now running ALL US realms, and we're only a couple days away from the first 14-day mark, at which point the DB process will start trimming old data, so I'm very curious to see how the processing times respond.

    I've also started running the first 50 EU realms, so I think within about 3 weeks we should be able to scale up to full capacity for data.

    Very exciting
  7. Kathroman:
    Had my first batch of deletes this morning. Everything seemed to go smoothly and I've not noticed any significant increase in the script parsing times.

    Just bumped the EU script up to 150 realms (which will push it past the US total), which leaves about 60 to go. I'll probably take those a bit slower and keep a close eye on things, but everything seems to be plugging along nicely so far