So I was fiddling a bit. Sadly, despite being able to skip the OCR step, there are obviously a lot of OCR errors in the text as is, which would make reliably stripping out the data somewhat more difficult.

The reality is that I don't really have the time to commit to this unless everything works perfectly. Ah well, maybe someone else will do it.