User:Katie Horn/TODO Nightly Audit Refactor

From Wikitech

General Notes

  • Considering the timing of the GC WX cutover, I'm not even going to try to port WR1 processing at this stage. Nuke all the old stuff. Nuke it and be glad.
  • Submodules for gateway-specific settings
    • parser library location for their files
    • Source location for their audit downloads
    • Completed location for their audit downloads
    • Working file location
    • A "disable message sending" mode.
    • Plus Or Minus Search (in days) - Okay, this one takes some explaining, but it makes total sense: it's how far before and after a transaction's suspected date we're willing to search the logs. And yes: we still need it, per gateway.
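The settings list above could be sketched as a data structure like the following. This is Python rather than the module's actual PHP, and every key, path, and gateway name here is invented for illustration:

```python
# Hypothetical shape for the per-gateway settings bundle. All keys,
# paths, and gateway names are illustrative, not the real module's config.
GATEWAY_SETTINGS = {
    "examplegateway": {
        "parser_library": "/path/to/parser/lib",    # parser library location
        "recon_source_dir": "/audit/incoming",      # source for audit downloads
        "recon_completed_dir": "/audit/completed",  # completed audit downloads
        "working_dir": "/audit/working",            # working file location
        "disable_message_sending": False,           # dry-run switch
        "log_search_window_days": 3,                # plus-or-minus search, in days
    },
}

def get_setting(gateway, key):
    """Fetch one setting, failing loudly if a submodule forgot to define it."""
    return GATEWAY_SETTINGS[gateway][key]
```

If we end up with multiple accounts per gateway, this probably grows another level of nesting keyed on account.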
  • Make a template for new submodules that includes the required tokenized voodoo: [gateway]_get_recon_data() and such.
  • There seems to be some disagreement about where recon data normalization should happen. I don't necessarily want it in the recon parser itself (because sharing pure libraries there would be nice), but as long as it happens in the submodule, I'm okay deferring this skirmish for later.
  • How on Earth are we currently deciding that payments logs are missing? Seems to be drastically over-reporting.

Structure

  • First piece would seem to be a thing that takes recon files, and returns the distilled (and normalized!) version of only what's missing from the db. This crosses the line between universal and gateway-specific several times. Enjoy the Drupal submodule voodoo problem.
    1. Open, parse, normalize one recon file
    2. Throw out everything that we already know about at this stage, because by *far* the most processor-intensive part is digging the missing information out of the payments logs. We want to do that as infrequently as possible.
      • ...and if there isn't anything in that file we don't know about, move the whole recon file to completed.
    3. Pass back the rest for processing in the second piece.
      • Dumb Current Thing: Negative transactions. I did something extremely silly there... like add a negative sign before the oid as the array key, or some other BS. Stop it.
    4. Controller decides to go for more recon files or not, based on run parameters. Maybe the default run is only... the last three days or so, to appropriately handle logrotate timing issues.
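The first piece, sketched for shape only. The real version would be Drupal/PHP; the function names, arguments, and helpers here are all hypothetical:

```python
import os
import shutil

def process_recon_file(path, known_ids, parse, normalize, completed_dir):
    """Steps 1-3 for one recon file: parse, normalize, and throw out
    everything the db already knows about. If nothing new is left,
    the whole recon file gets moved to completed."""
    records = [normalize(rec) for rec in parse(path)]                 # step 1
    missing = [rec for rec in records if rec["id"] not in known_ids]  # step 2
    if not missing:
        shutil.move(path, os.path.join(completed_dir, os.path.basename(path)))
        return []
    return missing                                                    # step 3

def controller(recon_files, file_age_days, days_back=3, **per_file_args):
    """Step 4: by default only chew through recon files from the last
    few days, to sidestep the logrotate timing problem."""
    missing = []
    for path in recon_files:
        if file_age_days(path) <= days_back:
            missing.extend(process_recon_file(path, **per_file_args))
    return missing
```

Passing `parse` and `normalize` in as callbacks is one way to keep the universal controller out of the gateway-specific voodoo.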
  • Second piece takes all the missing transactions, rebuilds them, and stuffs them into the donations or refund queue
    1. The Great Log Hunt Mechanism. This is going to take a great deal of investigation and tuning, as it's the majority of processor time spent.
      • Make sure all the right logs are available in distilled format in the working dir.
      • Get the distribution of dates for when we think the payments were initiated. (Ow.) Hopefully we've already built this in the process of finding all the missing transactions.
      • Open the most likely... f***. How does this work? Are we happy with that? (no)
      • Don't re-investigate log files you've already looked at.
      • Don't look at every damn thing. In fact, delete really old cruft.
      • If we can't find the relevant log no matter what we do, *and the transaction itself is of a certain age*, we should rebuild with defaults and be glad about that too.
    2. Rebuilding the message
    3. Send the message to the right queue (donations or refunds)
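A very rough sketch of the second piece, assuming each missing transaction carries a best-guess initiation date. Everything here is hypothetical scaffolding, not the real module:

```python
from datetime import timedelta

def candidate_log_dates(init_date, window_days):
    """Most-likely-first search order: the suspected initiation date,
    then +/-1 day, +/-2 days, out to the per-gateway window."""
    yield init_date
    for offset in range(1, window_days + 1):
        yield init_date + timedelta(days=offset)
        yield init_date - timedelta(days=offset)

def hunt(txn, search_log, window_days, max_age_days, today, defaults):
    """Step 1, roughly: hunt the distilled logs most-likely-date-first.
    If the log never turns up *and the transaction is of a certain age*,
    rebuild with defaults and be glad about that too. (Caching of
    already-searched logs is the caller's problem in this sketch.)"""
    for day in candidate_log_dates(txn["date"], window_days):
        found = search_log(day, txn["id"])
        if found is not None:
            return found
    if (today - txn["date"]).days >= max_age_days:
        return {**defaults, **txn}  # old enough: defaults will do
    return None                     # too fresh to give up on; retry next run

def queue_for(msg):
    """Step 3: route by the sign of the amount, instead of the old
    negative-sign-in-the-array-key silliness."""
    return "refunds" if msg["gross"] < 0 else "donations"
```

The `search_log` callback is where all the real tuning lives; this only pins down the search order and the give-up condition.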
  • Third... output? Reporting? File moving aroundness? Cleanup?
    1. As stated... somewhere above, it would be sort of neat to somehow target payments log files that haven't had any action in a long time, and delete them.
      • but that's really fiddly from here. I think.
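The delete-stale-logs idea, sketched under one big assumption: that we track when each distilled payments log last produced a hit. The tracking mechanism and all names here are invented:

```python
import os
import time

def prune_stale_logs(working_dir, last_hit, max_idle_days, now=None):
    """Delete distilled payments logs that haven't had any action in ages.
    `last_hit` maps filename -> unix time of the last successful search hit;
    files with no recorded hit at all count as infinitely stale."""
    now = now if now is not None else time.time()
    cutoff = now - max_idle_days * 86400
    doomed = [name for name in os.listdir(working_dir)
              if last_hit.get(name, 0) < cutoff]
    for name in doomed:
        os.remove(os.path.join(working_dir, name))
    return doomed
```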


Tricky Bits

Oh, come on. It's all much trickier than it should be.

Audit Modes

  • Running one file per day is a nice thought, but due to our logrotate latency, we probably won't have all of today's information available when the new audit files come in.
    • +1 day is usually necessary just so we have all the records in one place.
    • Unless you want to tune the job to run just after the freshest logrotate, instead of just after the latest download... hurm.
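The +1 day shuffle, pinned down as arithmetic: given an audit file dated D, which log dates do we need on hand before the run is worth starting? This helper is purely hypothetical, just to make the window explicit:

```python
from datetime import timedelta

def required_log_dates(audit_date, latency_days=1, window_days=1):
    """Logs we need before processing an audit file dated `audit_date`:
    the search window around the audit date itself, plus `latency_days`
    extra because today's records only land after the next logrotate."""
    first = audit_date - timedelta(days=window_days)
    last = audit_date + timedelta(days=latency_days)
    span = (last - first).days + 1
    return [first + timedelta(days=i) for i in range(span)]
```

If any of those dates isn't distilled in the working dir yet, we either wait for the next logrotate or accept that some transactions will get rebuilt on a later run.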

Newish Ideas

  • It would be SUPER GREAT if we could come up with a way to intelligently archive or nuke old working versions of the payments logs.
  • In fact, just... go ahead and redo the whole piece that figures out which payments logs to distill / keep. Because currently, it's terrible.
  • Do I need to worry about accounts at this point? You know, for... times in which we are going to have multiple accounts per gateway. :/
    • Probably.
  • Array of function callbacks? I mean, since we don't actually have class inheritance in Drupal modules. Cheap way to do it to start with, but w/e.
    • I mean, it's a bit less cheap than the standard Drupal way of relying on function-naming-scheme voodoo, which we have already established I am not above anyway. Why not cheap out entirely?
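The array-of-callbacks idea, sketched in Python (the real thing would be a PHP array in the Drupal module; the gateway name and hook names here are made up):

```python
# Cheap polymorphism without class inheritance: each gateway registers
# the handful of callbacks the controller needs, instead of relying on
# [gateway]_get_recon_data()-style naming voodoo.
GATEWAY_HOOKS = {}

def register_gateway(name, **hooks):
    GATEWAY_HOOKS[name] = hooks

def call_hook(gateway, hook, *args):
    """Dispatch to a gateway's callback, with a loud failure if the
    submodule forgot to register one."""
    try:
        fn = GATEWAY_HOOKS[gateway][hook]
    except KeyError:
        raise NotImplementedError("%s has no '%s' hook" % (gateway, hook))
    return fn(*args)

# A submodule would register something like:
register_gateway(
    "examplegateway",
    get_recon_data=lambda path: [],   # parse one recon file
    normalize=lambda record: record,  # normalize one record
)
```

The explicit registry also makes the "new submodule template" easy to enforce: a missing hook fails loudly at dispatch time instead of silently falling through to nothing.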