    [Coding] Parsing the AH using the New Armory/Scripting Questions


    I run my own little parser and AH script -- nothing on the scale of TUJ, but I was wondering if you would be willing to share some of the tips and tricks you've come across for parsing the AH.

    1. Do you know how to switch characters using the new armory? (It seems to send a POST request, but when I try to do the same, I get a 404.) If you still use the old JSON interface and haven't run into this issue, that's fair.

    2. How long does it take you to scan the entire AH for a given realm? It takes me about 10 minutes, and I've read posts mentioning sub-minute "parse times" but I'm not sure what you're referring to by that.

    Thank you -- I love the Undermine Journal and look forward to seeing it grow.

    Maybe I can help: I run my own scans on my EU realm at the moment (eu.wowarmory), and a scan takes about 2-3 min. I am not using keep-alive properly and I am not multithreading; as of now the loop is: log in, then as long as there are more pages: fetch page, parse JSON, execute SQL insert.
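
    The loop described above could be sketched roughly as follows. This is only an illustration: `fetch_page`, the `auctions` key and the `id`/`item`/`buyout` field names are made up for the example and are not the armory's actual response format. In a real scanner, `fetch_page` would probably reuse a single connection (for example one `http.client.HTTPSConnection`) so that keep-alive actually helps.

```python
import json
import sqlite3
from typing import Callable, Optional


def scan_realm(fetch_page: Callable[[int], Optional[str]],
               db: sqlite3.Connection) -> int:
    """Fetch pages until one comes back empty; insert every auction found.

    fetch_page(n) is a hypothetical callable returning the JSON text of
    page n, or None when there are no more pages.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS auctions "
        "(auction_id INTEGER PRIMARY KEY, item_id INTEGER, buyout INTEGER)"
    )
    total = 0
    page = 0
    while True:
        body = fetch_page(page)
        if not body:
            break
        # Assumed layout: {"auctions": [{"id": ..., "item": ..., "buyout": ...}]}
        auctions = json.loads(body).get("auctions", [])
        if not auctions:
            break
        # One executemany per page instead of one INSERT per auction.
        db.executemany(
            "INSERT OR REPLACE INTO auctions VALUES (:id, :item, :buyout)",
            auctions,
        )
        total += len(auctions)
        page += 1
    db.commit()
    return total
```

    Batching the inserts per page and committing once at the end tends to matter more for speed than the JSON parsing itself.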

    And afaik the new armory does not output JSON/XML for browsing. Has that changed? Do you really parse that ugly HTML?

    I'm on Mal'Ganis/Horde, one of the largest AHs -- 60,000 auctions isn't uncommon. I get 200 at a time.
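
    For a sense of scale, here is the arithmetic behind those numbers as a small sketch. It assumes the 10-minute scan time mentioned in the first post applies to a full 60,000-auction scan; the function names are just for illustration.

```python
import math


def pages_needed(total_auctions: int, page_size: int) -> int:
    """Number of requests needed to page through the whole AH."""
    return math.ceil(total_auctions / page_size)


def seconds_per_page(scan_seconds: float, total_auctions: int,
                     page_size: int) -> float:
    """Average wall-clock time spent on each page of a sequential scan."""
    return scan_seconds / pages_needed(total_auctions, page_size)


print(pages_needed(60_000, 200))               # 300
print(seconds_per_page(10 * 60, 60_000, 200))  # 2.0
```

    So a sequential scan at that size spends about two seconds per request, which is where keep-alive or parallel requests would be expected to help.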

    I'm using JSON for everything, but there's no JSON method to see the active auctions listed by the logged-in character, so I figured I'd try the XHTML -- it isn't too bad to parse using XPath. The problem is that it seems to use whatever character I (manually) log into the armory with, and not the one I'm normally logged in as.
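
    A minimal sketch of the XPath approach using only Python's standard library. The markup in `SAMPLE` (the `my-auctions` table, the `auction`, `item` and `buyout` classes, the item names) is invented for the example; the real armory page will look different, so the expressions would need adjusting.

```python
import xml.etree.ElementTree as ET

# Invented XHTML standing in for the "my auctions" page.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<table id="my-auctions">
<tr class="auction"><td class="item">Example Item A</td><td class="buyout">125000</td></tr>
<tr class="auction"><td class="item">Example Item B</td><td class="buyout">98000</td></tr>
</table></body></html>"""

# XHTML elements live in a namespace, so every step needs the prefix.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def parse_my_auctions(xhtml: str) -> list:
    """Return (item name, buyout) tuples for each auction row."""
    root = ET.fromstring(xhtml)
    rows = root.findall(
        ".//x:table[@id='my-auctions']/x:tr[@class='auction']", NS
    )
    result = []
    for row in rows:
        item = row.find("x:td[@class='item']", NS).text
        buyout = int(row.find("x:td[@class='buyout']", NS).text)
        result.append((item, buyout))
    return result


print(parse_my_auctions(SAMPLE))
```

    ElementTree only supports a subset of XPath and needs well-formed XHTML; a page that is not well-formed would need a more forgiving parser.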

    If I recall correctly from the dev blog, TUJ uses the XML methods on wowarmory rather than the JSON - I believe XML gets compressed and JSON doesn't, so bandwidth usage on the scanner machines is lower.

    For me JSON is easier to parse and I don't have bandwidth problems at the moment. But one should really just use whatever is easier to parse for him/her.

    I'm not running anything on the new armory, so I don't know offhand how to change characters there. I still use the old feeds. I've never had to "switch characters" with them; every request gets sent with the character, faction and server, so I never bothered.

    It takes about 3-4 minutes to scan a realm (which means gathering all the info from the Armory into a file) and now under 5 seconds to parse it (update old auctions and insert new auctions into the database).
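
    The "parse" step described above (update old auctions, insert new ones) might look something like this sketch. It is not TUJ's actual code: the table layout, the `first_seen`/`last_seen` columns and the snapshot format (a dict of auction id to buyout) are assumptions made for the example.

```python
import sqlite3


def apply_snapshot(db: sqlite3.Connection, snapshot: dict,
                   scan_time: int) -> tuple:
    """Merge one finished scan into the database.

    snapshot maps auction_id -> buyout (hypothetical format).
    Returns (number of auctions updated, number inserted).
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS auctions (auction_id INTEGER PRIMARY KEY, "
        "buyout INTEGER, first_seen INTEGER, last_seen INTEGER)"
    )
    # Load the known ids once instead of querying per auction.
    known = {row[0] for row in db.execute("SELECT auction_id FROM auctions")}
    updates = [(scan_time, aid) for aid in snapshot if aid in known]
    inserts = [(aid, buyout, scan_time, scan_time)
               for aid, buyout in snapshot.items() if aid not in known]
    with db:  # a single transaction for the whole snapshot
        db.executemany(
            "UPDATE auctions SET last_seen = ? WHERE auction_id = ?", updates
        )
        db.executemany("INSERT INTO auctions VALUES (?, ?, ?, ?)", inserts)
    return len(updates), len(inserts)
```

    Separating the slow network scan from the database step like this means the database is only busy for the few seconds the merge takes.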

    I've always used XML for The Undermine Journal because the Armory can gzip the result. It won't gzip JSON. The gzipped XML is about 25% the size of comparable JSON. I'm also more comfortable with parsing XML, so it's win/win.
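
    On the client side, taking advantage of gzip is mostly a matter of asking for it and decompressing when the server agrees. A rough sketch, assuming the server signals compression with the standard `Content-Encoding: gzip` response header; the helper names are made up for the example.

```python
import gzip
from typing import Optional


def request_headers() -> dict:
    """Headers telling the server we can accept a gzipped response."""
    return {"Accept-Encoding": "gzip"}


def decode_body(body: bytes, content_encoding: Optional[str]) -> str:
    """Decompress the body only if the server says it is gzipped."""
    if content_encoding and content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


def compressed_fraction(text: str) -> float:
    """Size of the gzipped text relative to the raw text (0.25 means 25%)."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)
```

    Running `compressed_fraction` over a saved XML and a saved JSON response for the same page would be one way to check the bandwidth difference on your own realm.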



