httpdiff – diff responses to two HTTP/HTTPS requests (github.com/jgrahamc)
197 points by jgrahamc on March 24, 2015 | hide | past | favorite | 39 comments


The Unix way:

    diff <(curl -vs https://news.ycombinator.com/ 2>&1) <(curl -vs https://news.ycombinator.com/ 2>&1)
As a shell function:

    httpdiff () {
        diff <(curl -vs "$1" 2>&1) <(curl -vs "$2" 2>&1)
    }

    httpdiff https://news.ycombinator.com/ https://news.ycombinator.com/


Which works well for that one example. But not others.

    diff <(curl -Lvs https://www.google.com/ 2>&1) <(curl -Lvs http://www.google.com/ 2>&1)
On my machine that produces 33k of output. I like one-liners and using the shell, but this tool wasn't built for fun: I was debugging things that were painful.


That's nothing a couple of "1> /dev/null"s can't solve (or redirect to a temporary file and then "diff -q"). But yeah, yours is a bit nicer.


    $ diff -y -W $COLUMNS <(curl -Lvs https://www.google.com/ 2>&1) <(curl -Lvs http://www.google.com/ 2>&1)
Edit: furthermore, with curl you can use options like -b, -c, -d, which would all have to be reimplemented in your system.


If you're worried about paging, you could always just pipe it to less, or if you still wanted colors you could swap out diff for vimdiff. Nice tool, though.


Came here to say similar.

"Those who don't understand Unix are condemned to reinvent it, poorly." – Henry Spencer


There's this thing called UX... John's program has a nice one; diff, curl and bash (used together in the way indicated here) have a terrible one.


I said "similar". I use plan9, so my shell script would have been slightly different and my interface cleaner.

Using a 1970s TTY in the 21st century is dumb.


How does plan9 help make your interface cleaner?


Because I can use the built in plumber to do more interesting things. The terminal scrolls, I can send commands from the mouse interface.

I would enhance the shell script a bit to output a few more commands, should I need them.

If it was a tool I used regularly I could sharpen it.


If you got it sharp enough would you share it?


Sure. But I have no interest in doing it right now.

You can look at my shell script HTTP client if you like.

http://plan9.bell-labs.com/sources/contrib/maht/rc/httplib.r...

and some of my other plan9 code

http://plan9.bell-labs.com/sources/contrib/maht/


This is very nice. I'm working on something similar as part of my build process so that I can more directly notice changes as they are made in markup, and catch changes that I didn't expect.

The issue I've hit, that you may want to consider, is a suppression list of sorts. The ability to silence diffs on things that look like dates for example would be rather valuable.


Awesome tool! Our Traffic Inspector also has this feature for requests captured through our API debugging proxy: https://www.runscope.com/docs/comparisons


I made a gist with a collections of one-liners here:

https://gist.github.com/Nijikokun/d6606c036d89d3b1574c


I wrote a similar tool while debugging a web scraper for my current dayjob employer, but it was buried inside of our application rather than standalone.

(Also, it was PHP, which a lot of people hate.)


Nice tool jgrahamc. You should consider expanding how you detect/show diffing of the response bodies, since that has a lot of applications: detecting content changes, detecting ads/malicious code, detect crawl duplicates, security audits. etc.

Years ago I found the Levenshtein distance is super helpful for determining how different two responses are, and used it as part of a black box web security scanner. You can do this just on the raw HTML, but that's noisy and shows a number of spurious differences. It's better to use an HTML-aware string distance function that diffs just the page content. I used that as a channel for detecting blind SQL injection (in combination with some other things).

I also found that you can go a level higher, and use Levenshtein on just the HTML tag structure of different responses. By looking at page structure, and applying different weights based on the HTML tags that were added/removed, you can group similar pages, which usually maps to the different functional areas/templates of a site. As in, you can say "these 5 pages are all product details pages", "these 10 pages are all blog posts", etc. Super helpful for a security scanner, since this could inform crawling/auditing choices and speed up audits. It also allowed us to say "you have an XSS vulnerability in your Blog comments form" instead of just saying "you have XSS vulnerabilities in these 100 pages".

Anyway, there is a lot of value in detecting how different/similar various responses are. See some of Google's published work on detecting near duplicates for web crawling...


Why not just write this as a bash one-liner?

    function httpdiff {
        diff <(curl -Ls "$1") <(curl -Ls "$2")
    }


Because `diff` only understands naïve textual differences. A tool that understands the HTTP format can give you more meaningful info.


Try

    httpdiff https://www.google.com http://www.google.com/


Getting x509 issues. Output is below:

    $ httpdiff https://www.google.com http://www.google.com
    Doing GET: https://www.google.com http://www.google.com
    Error doing GET https://www.google.com: Get https://www.google.com: x509: certificate signed by unknown authority

I'm sitting behind a pretty heavy proxy though; could be that. That, or an OS X (10.9.5) certificate store issue maybe?


Taking it down a level or two, occasionally you want to do something similar with DNS - https://gist.github.com/chair6/1748a6676120a0aacea2.


Very cool!

I'm a bit of a newb though - how do I install this?


    1. You need Go

    2a. Type 'make'. It will build the binary and place it in bin/
    2b. Use the go tool chain directly to build. Does the same thing as 2a.


Why the makefile and src directory? Why not put the source in the root directory, so that go get github.com/jgrahamc/httpdiff would just work? The makefile doesn't even do anything useful, and just means that poor Windows folks can't build your code, for no good reason.


Yes, you have to do the archaic `go get github.com/jgrahamc/httpdiff/src/httpdiff`. Truly a travesty. While the makefile allows you to avoid the whole GOPATH nonsense, which gets a plus from me.


Because... bad habits from larger projects.


I wrote a tool for doing this with arbitrarily nested json docs a while back: https://github.com/ChannelIQ/jsoncompare


I recently did this manually by piping responses to temp files called "before" and "after", and then using vimdiff as my differ, which proved quite effective.

This tool looks great, but would not have worked with my particular use case, which was doing some migration of user data, and diffing the user accounts to make sure that they had changed in the expected way.

It might be an idea to add the ability to query the same host twice, but have a user input trigger when to test each host.


I'm working on a project[1] that is based around this functionality. The big "diff" is that I called it response-diff in our original spec, and we're planning to write it in Node.js.

[1] https://github.com/Mashape/changehook


Should rename to "diff two HTTP(S) responses" instead of requests.


Better?


Perfect use case for Go.


Can you elaborate?


Not OP, but I'll give it a shot. Go has a great standard HTTP package that makes this project a breeze to implement. Combine that with cross-compilation and statically linked binaries and you get a tool that can be run virtually anywhere a developer would want, without needing to set up a new environment.


You took the words right out of my mouth!


is there any reason to use md5? (apart from backward compatibility etc.)


It doesn't really make a difference in a case like this. Though certainly not an ideal algorithm, the chance of a collision for something like this is low enough to never be a concern.


counterpoint: is there any reason not to use a stronger hash function?

Why leave something dangerous lying around when /probably/ nothing is going to go wrong... until someone picks it up and decides to do something with it that was unexpected, when better alternatives abound?



