httpdiff – diff responses to two HTTP/HTTPS requests (github.com/jgrahamc)
197 points by jgrahamc on March 24, 2015 | hide | past | favorite | 39 comments


The Unix way:

    diff <(curl -vs https://news.ycombinator.com/ 2>&1) <(curl -vs https://news.ycombinator.com/ 2>&1)
As a shell function:

    httpdiff () {
        diff <(curl -vs "$1" 2>&1) <(curl -vs "$2" 2>&1)
    }

    httpdiff https://news.ycombinator.com/ https://news.ycombinator.com/


Which works well for that one example. But not others.

    diff <(curl -Lvs https://www.google.com/ 2>&1) <(curl -Lvs http://www.google.com/ 2>&1)
On my machine that produces 33k of output. I like one-liners and using the shell, but this tool wasn't built for fun: I was debugging things that were painful.


That's nothing a couple of "1> /dev/null"s can't solve (or redirect to a temporary file and then "diff -q"). But yeah, yours is a bit nicer.


    $ diff -y -W $COLUMNS <(curl -Lvs https://www.google.com/ 2>&1) <(curl -Lvs http://www.google.com/ 2>&1)
Edit: furthermore, with curl you can use options like -b, -c, -d, which would all have to be reimplemented in your system.


If you're worried about paging, you could always just pipe it to less, or if you still wanted colors you could swap out diff for vimdiff. Nice tool, though.


Came here to say similar.

"Those who don't understand Unix are condemned to reinvent it, poorly." – Henry Spencer


There's this thing called UX... John's program has a nice one; diff, curl and bash (used together in the way indicated here) have a terrible one.


I said "similar". I use plan9, so my shell script would have been slightly different and my interface cleaner.

Using a 1970s TTY in the 21st century is dumb.


How does plan9 help make your interface cleaner?


Because I can use the built in plumber to do more interesting things. The terminal scrolls, I can send commands from the mouse interface.

I would enhance the shell script a bit to output a few more commands, should I need them.

If it was a tool I used regularly I could sharpen it.


If you got it sharp enough would you share it?


Sure. But I have no interest in doing it right now.

You can look at my shell script HTTP client if you like.

http://plan9.bell-labs.com/sources/contrib/maht/rc/httplib.r...

and some of my other plan9 code

http://plan9.bell-labs.com/sources/contrib/maht/


This is very nice. I'm working on something similar as part of my build process so that I can more directly notice changes as they are made in markup, and catch changes that I didn't expect.

The issue I've hit, that you may want to consider, is a suppression list of sorts. The ability to silence diffs on things that look like dates for example would be rather valuable.


Awesome tool! Our Traffic Inspector also has this feature for requests captured through our API debugging proxy: https://www.runscope.com/docs/comparisons


I made a gist with a collections of one-liners here:

https://gist.github.com/Nijikokun/d6606c036d89d3b1574c


I wrote a similar tool while debugging a web scraper for my current dayjob employer, but it was buried inside of our application rather than standalone.

(Also, it was PHP, which a lot of people hate.)


Nice tool jgrahamc. You should consider expanding how you detect/show diffing of the response bodies, since that has a lot of applications: detecting content changes, detecting ads/malicious code, detect crawl duplicates, security audits. etc.

Years ago I found the Levenshtein distance is super helpful for determining how different two responses are, and used it as part of a black box web security scanner. You can do this just on the raw HTML, but that's noisy and shows a number of spurious differences. It's better to use an HTML-aware string distance function that diffs just the page content. I used that as a channel for detecting blind SQL injection (in combination with some other things).

I also found that you can go a level higher, and use Levenshtein on just the HTML tag structure of different responses. By looking at page structure, and applying different weights based on the HTML tags that were added/removed, you can group similar pages, which usually maps to the different functional areas/templates of a site. As in, you can say "these 5 pages are all product details pages", "these 10 pages are all blog posts", etc. Super helpful for a security scanner, since this could inform crawling/auditing choices and speed up audits. It also allowed us to say "you have an XSS vulnerability in your Blog comments form" instead of just saying "you have XSS vulnerabilities in these 100 pages".

Anyway, there is a lot of value in detecting how different/similar various responses are. See some of Google's published work on detecting near duplicates for web crawling...


Why not just write this as a bash one-liner?

    function httpdiff {
        diff <(curl -Ls "$1") <(curl -Ls "$2")
    }


Because `diff` only understands naïve textual differences. A tool that understands the HTTP format can give you more meaningful info.


Try

    httpdiff https://www.google.com http://www.google.com/


Getting x509 issues. Output is below:

    $ httpdiff https://www.google.com http://www.google.com
    Doing GET: https://www.google.com http://www.google.com
    Error doing GET https://www.google.com: Get https://www.google.com: x509: certificate signed by unknown authority

I'm sitting behind a pretty heavy proxy though; could be that. That, or an OS X (10.9.5) certificate store issue maybe?


Taking it down a level or two, occasionally you want to do something similar with DNS - https://gist.github.com/chair6/1748a6676120a0aacea2.


Very cool!

I'm a bit of a newb though - how do I install this?


    1. You need Go

    2a. Type 'make'. It will build the binary and place it in bin/
    2b. Use the go tool chain directly to build. Does the same thing as 2a.


Why the makefile and src directory? Why not put the source in the root directory, so that go get github.com/jgrahamc/httpdiff would just work? The makefile doesn't even do anything useful, and just means that poor Windows folks can't build your code, for no good reason.


Yes, you have to do the archaic `go get github.com/jgrahamc/httpdiff/src/httpdiff`. Truly a travesty. While the makefile allows you to avoid the whole GOPATH nonsense, which gets a plus from me.


Because... bad habits from larger projects.


I wrote a tool for doing this with arbitrarily nested json docs a while back: https://github.com/ChannelIQ/jsoncompare


I recently did this manually by piping responses to temp files called "before" and "after", and then using vimdiff as my differ, which proved quite effective.

This tool looks great, but would not have worked with my particular use case, which was doing some migration of user data, and diffing the user accounts to make sure that they had changed in the expected way.

It might be an idea to add the ability to query the same host twice, but have a user input trigger when to test each host.


I'm working on a project[1] that is based around this functionality. The big "diff" is that I called it response-diff in our original spec, and we're planning to write it in Node.js.

[1] https://github.com/Mashape/changehook


Should rename to "diff two HTTP(S) responses" instead of requests.


Better?


Perfect use case for Go.


Can you elaborate?


Not OP, but I'll give it a shot. Go has a great standard HTTP package that makes this project a breeze to implement. Combine that with cross-compilation and statically linked binaries and you get a tool that can be run virtually anywhere a developer would want, without needing to set up a new environment.


You took the words right out of my mouth!


is there any reason to use md5? (apart from backward compatibility etc.)


It doesn't really make a difference in a case like this. Though certainly not an ideal algorithm, the chance of a collision for something like this is low enough to never be a concern.


counterpoint: is there any reason not to use a stronger hash function?

Why leave something dangerous lying around when /probably/ nothing is going to go wrong... until someone picks it up and decides to do something with it that was unexpected, when better alternatives abound?



