
WoW-GPS Dev Blog

WoW-GPS 2.0 Stats

Now that I've got a good chunk of data to work with on the back-end, I thought it might be fun to start running some item stats calculations, both to test response time from the Data server and to see how my alternative method stacks up against other data sources.

For those of you who have been following along, you might recall that I mentioned in the comments of this post that I wanted to try a different pricing method. I'd like to take a moment to further explain WHY I've decided to go with this method, and then run through some comparisons with other data sources.

So, why change? It's been suggested more than once that I'm running the risk of merely reinventing the wheel, so is this all REALLY necessary? The primary reason for changing this up is performance. Sources like TUJ, Wowuction and TSM store data for EVERY SINGLE "hourly" snapshot. So if an item is posted for a full 48 hours, for example, there will be 48 individual rows for that single auction. With my setup, I store only 1 row per auction and update that row each time the auction appears in a subsequent scan. The result is a smaller DB, which means less disk space on the server and faster read/write times. Given the speed handicap we're already working with because I'm using sqlite instead of a "proper" production DB, this becomes a much more important consideration. The flipside is that I end up with significantly fewer data points to run through pricing algorithms, etc. This leads to my next point: "Do we need all those data points, anyway?"
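A minimal sketch of the one-row-per-auction idea, using Python's built-in sqlite3 module. The `auctions` table, its columns and the `record_scan` helper are hypothetical (this is not the actual WoW-GPS schema), and the `ON CONFLICT ... DO UPDATE` upsert needs SQLite 3.24 or newer. Each scan inserts auctions it hasn't seen before and only bumps `last_seen`/`scans_seen` for ones it has:

```python
import sqlite3

# Hypothetical schema: one row per auction, not one row per snapshot.
SCHEMA = """
CREATE TABLE IF NOT EXISTS auctions (
    auc_id     INTEGER PRIMARY KEY,  -- auction ID from the scan data
    item_id    INTEGER NOT NULL,
    buyout     INTEGER NOT NULL,     -- total buyout in copper
    quantity   INTEGER NOT NULL,
    first_seen INTEGER NOT NULL,     -- timestamp of first scan containing it
    last_seen  INTEGER NOT NULL,     -- timestamp of latest scan containing it
    scans_seen INTEGER NOT NULL DEFAULT 1
)
"""

# Insert new auctions; for ones already stored, update the existing row.
UPSERT = """
INSERT INTO auctions (auc_id, item_id, buyout, quantity, first_seen, last_seen)
VALUES (:auc_id, :item_id, :buyout, :quantity, :ts, :ts)
ON CONFLICT(auc_id) DO UPDATE SET
    last_seen  = excluded.last_seen,
    scans_seen = auctions.scans_seen + 1
"""


def record_scan(conn, scan_ts, auctions):
    """Apply one hourly snapshot (a list of auction dicts) to the table."""
    rows = [dict(a, ts=scan_ts) for a in auctions]
    with conn:  # commit the whole scan as a single transaction
        conn.executemany(UPSERT, rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Illustrative values: one 30g stack (300000 copper) listed for 48 scans.
    ore = {"auc_id": 1, "item_id": 1, "buyout": 300000, "quantity": 20}
    for hour in range(48):
        record_scan(conn, hour, [ore])
    print(conn.execute("SELECT COUNT(*), scans_seen FROM auctions").fetchone())
    # prints (1, 48): one row, seen in 48 snapshots
```

A snapshot-per-row model would hold 48 rows for the same auction; here it stays at one, with `scans_seen` and the timestamps still recording how long it was listed.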

At first glance, you might be thinking "more is obviously better" when it comes to data/stats, and in many cases that's absolutely correct. When we're talking about something like prospecting, milling or DE yields, the more (accurate) data points you can gather, the better. Likewise, with rarely seen items, looking at something like region-wide data instead of a single item on a single realm can absolutely give you a more accurate idea of an item's value. But what exactly are we getting with all this snapshot data? With both TUJ and Wowuction, and to a lesser degree TSM (because its data is weighted by recency), each of these snapshot data points gets equal weight in the pricing calculations. But are they truly equal?

Consider 2 stacks of Ghost Iron Ore. One is posted for 30g and sells after 4 snapshots. The other is posted at 70g and sits on the AH for all 48 snapshots. In the traditional data model, the first auction (which actually sold) contributes 4 rows to the other's 48, so it gets 12x LESS weight in the pricing calculations than the (probably overpriced) unsold one. Is this really better data? My contention is that it is not. In fact, these "shorter" auctions should arguably be given higher priority, which may be possible at some point; for the time being, I believe that at least giving them equal priority is a step in the right direction.
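A quick worked example of the two weighting schemes, using the two stacks above and treating 30g and 70g as the stack prices. This is a simplified illustration of "one row per snapshot" versus "one row per auction", not the exact formula TUJ, Wowuction or TSM actually use:

```python
from statistics import mean

# (stack price in gold, snapshots the auction appeared in), from the example above.
auctions = [(30.0, 4), (70.0, 48)]


def snapshot_weighted_mean(auctions):
    """Every snapshot row counts once, so long-lived listings dominate."""
    total = sum(price * seen for price, seen in auctions)
    return total / sum(seen for _, seen in auctions)


def per_auction_mean(auctions):
    """Each auction counts once, no matter how long it stayed listed."""
    return mean(price for price, _ in auctions)


print(round(snapshot_weighted_mean(auctions), 2))  # 66.92
print(per_auction_mean(auctions))                  # 50.0
```

The snapshot-weighted figure is (30 × 4 + 70 × 48) / 52 ≈ 66.92g, almost entirely driven by the unsold 70g stack, while counting each auction once gives 50g.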

So, after all this build-up, let's get to some results:

One of the first things I wanted to check was response time. Since the Data server is going to be completely separate from the Application server, and since I've (again) gone with a less commonly used DB solution, I wanted to make sure we wouldn't see any crippling bottlenecks when fetching data from it. I have to say, I was quite pleased with the results. I tested a few items, some with large quantities, to approximate a full request load. My process queries the DB, then returns ALL the raw data to the Application server, where any processing then occurs. Every response completed in under 1 second, and when I realized I'd have to loop through a second time to sanitize unreliable data (such as gold-cap auctions for Windwool), there was no significant increase. For me, this is a huge win, because it means I can probably leave all data processing/calculations to the Application server and keep the Data server as quick and lean as possible.
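The post doesn't describe how the sanitizing pass works, so here is one possible approach, sketched as Application-server code that runs over the raw prices after they come back from the Data server. The `sanitize` helper and its `max_ratio` cutoff are hypothetical, and the prices are made up for illustration; the idea is simply to drop listings priced absurdly far above the median:

```python
from statistics import median


def sanitize(unit_prices, max_ratio=10.0):
    """Drop prices more than max_ratio times the median unit price.

    max_ratio is an assumed cutoff for illustration, not a tuned value.
    """
    if not unit_prices:
        return []
    mid = median(unit_prices)
    return [p for p in unit_prices if p <= mid * max_ratio]


# Illustrative Windwool unit prices (gold), including one gold-cap style listing.
prices = [1.0, 1.05, 0.97, 1.10, 99999.0]
print(sanitize(prices))  # [1.0, 1.05, 0.97, 1.1]
```

Using the median as the reference point matters here: a single gold-cap listing would drag a mean-based cutoff up with it, while the median barely moves.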

Here are some comparisons of the actual pricing data:

Brilliant Primordial Ruby:
TUJ: Market = 63.05g. Mean = 55.78g
Wowuction: Market Price = 63.09g. 14 day median market = 44.59g
WoWGPS: Current Market Price (15th percentile) = 63.09g. Current Mean = 66.06g. 1 week Market Price: 43.23g. 1 week Mean = 53.80g

Ghost Iron Ore:
TUJ: Market = 1.71g. Mean = 1.93g
Wowuction: Market = 1.90g. 14 day median = 1.75g
WoWGPS: Market = 1.91g. Mean = 2.12g. 1 week Market = 1.69g. 1 week Mean = 1.89g

Windwool Cloth:
TUJ: Market = 1.01g. Mean = 1.12g
Wowuction: Market = 1.05g. 14 day median = 0.99g
WoWGPS: Market = 1.05g. Mean = 1.53g. 1 week Market = 0.97g. 1 week Mean = 1.71g

Living Steel: (Higher Priced Item)
TUJ: Market = 240g. Mean = 265.07g
Wowuction: Market = 240g. 14 day median = 259.99g
WoWGPS: Market = 240g. Mean = 288g. 1 week Market = 250g. 1 week Mean = 280.01g

Relic of Chi-Ji: (Lower qty Item)
TUJ: Market = 7000g. Mean = 10399g
Wowuction: Market = 6999.99g. 14 day median = 11999g
WoWGPS: Market = 6999.99g. Mean = 6999.99g. 1 week Market = 9998.04g. 1 week Mean = 10498.99g
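The WoWGPS lines above report a Market Price taken as the 15th percentile alongside a plain mean. The post doesn't say which percentile convention is used, so the sketch below assumes the nearest-rank method over one unit price per auction (not weighted by stack quantity); the function names and prices are hypothetical. The same functions could be run over a 1-week window of stored rows instead of just the current auctions:

```python
import math
from statistics import mean


def percentile_nearest_rank(values, pct):
    """Smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no prices to summarize")
    ordered = sorted(values)
    # pct * n is computed before dividing to avoid float error in pct / 100.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def price_summary(unit_prices, market_pct=15):
    """Low-percentile 'market' price plus the mean, one price per auction."""
    return {
        "market": percentile_nearest_rank(unit_prices, market_pct),
        "mean": mean(unit_prices),
    }


# Illustrative unit prices (gold) for ten current auctions.
prices = [2.0, 1.5, 3.0, 1.8, 2.2, 1.6, 2.5, 1.9, 2.1, 4.0]
summary = price_summary(prices)
print(summary["market"])  # 1.6
```

A low percentile largely ignores the expensive tail of listings, while the mean is pulled upward by it, so the two figures can diverge when a few auctions are heavily overpriced.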

Interestingly enough, there isn't a huge difference across most of the numbers, other than the Mean in a couple of places. I'll admit that before running the numbers I was a bit nervous I might see some pretty wild discrepancies.

So, to draw a conclusion from all of this: it appears possible to extract meaningful pricing data using my more efficient WoWGPS data process. Suffice it to say, I'm EXTREMELY pleased with these results.


  1. Spirituality
    Looking great! Keep up the good work and keep us up to date.
  2. Kathroman
    The realm I was using for testing (Emerald Dream-US) has finally reached full data capacity, so I decided to rerun the stats.

    I noticed a very slight increase in request time, but it wasn't consistent with data size (i.e., current auctions vs. the full 2 weeks), so I can't conclusively attribute it to the increased data. It's probably just some general latency that I won't really be able to impact much.

    So, things are looking good right now.