On my machine that produces 33k of output. I like one-liners and using the shell, but this tool was built not for fun but because I was debugging things that were painful.
If you're worried about paging, you could always just pipe it to less, or if you still wanted colors you could swap out diff for vimdiff. Nice tool, though.
This is very nice. I'm working on something similar as part of my build process so that I can more directly notice changes as they are made in markup, and catch changes that I didn't expect.
One issue I've hit, which you may want to consider, is the need for a suppression list of sorts. The ability to silence diffs on things that look like dates, for example, would be rather valuable.
I wrote a similar tool while debugging a web scraper for my current dayjob employer, but it was buried inside of our application rather than standalone.
Nice tool jgrahamc. You should consider expanding how you detect/show diffs of the response bodies, since that has a lot of applications: detecting content changes, detecting ads/malicious code, detecting crawl duplicates, security audits, etc.
Years ago I found the Levenshtein distance is super helpful for determining how different two responses are, and used it as part of a black-box web security scanner. You can do this on just the raw HTML, but that's noisy and flags spurious differences. It's better to use an HTML-aware string distance function that diffs just the page content. I used that as a channel for detecting blind SQL injection (in combination with some other things).
I also found that you can go a level higher, and use Levenshtein on just the HTML tag structure of different responses. By looking at page structure, and applying different weights based on the HTML tags that were added/removed, you can group similar pages, which usually maps to the different functional areas/templates of a site. As in, you can say "these 5 pages are all product detail pages", "these 10 pages are all blog posts", etc. Super helpful for a security scanner, since this can inform crawling/auditing choices and speed up audits. It also allowed us to say "you have an XSS vulnerability in your blog comments form" instead of just "you have XSS vulnerabilities in these 100 pages".
Anyway, there is a lot of value in detecting how different/similar various responses are. See some of Google's published work on detecting near duplicates for web crawling...
Why the makefile and src directory? Why not put the source in the root directory, so that go get github.com/jgrahamc/httpdiff would just work? The makefile doesn't do anything useful, and just means that poor Windows folks can't build your code for no good reason.
Yes, you have to do the archaic `go get github.com/jgrahamc/httpdiff/src/httpdiff`. Truly a travesty. While the makefile allows you to avoid the whole GOPATH nonsense, which gets a plus from me.
I recently did this manually by piping the responses to temp files called before and after, and then using vimdiff as my differ, which proved quite effective.
This tool looks great, but would not have worked with my particular use case, which was doing some migration of user data, and diffing the user accounts to make sure that they had changed in the expected way.
It might be an idea to add the ability to query the same host twice, with user input triggering when each request is made.
I'm working on a project[1] that is based around this functionality. The big "diff" is that I called it response-diff in our original spec, and we're planning to write it in Node.js.
Not OP but I'll give it a shot. Go has a great standard HTTP package that makes a project like this a breeze to implement. Combine that with cross-compilation and statically linked binaries and you get a tool that can be run virtually anywhere a developer would want, without needing to set up a new environment.
It doesn't really make a difference in a case like this. Though certainly not an ideal algorithm, the chance of a collision for something like this is low enough to never be a concern.
Counterpoint: is there any reason not to use a stronger hash function?
Why leave something dangerous lying around when /probably/ nothing is going to go wrong... until someone picks it up and decides to do something with it that was unexpected, when better alternatives abound?