That was also the conclusion I drew, but it makes me wonder why that would take too much power to be practical on a smartphone as the article suggests. It seems like it would only need to capture the text of the question and push it over the network to IBM's servers.
I think what they mean is that the datacenters actually running Watson would consume too much power. A mobile app with millions of users would require many, many instances of Watson, and at ten racks of servers per instance, that doesn't sound feasible.
While I agree with this interpretation, here's an explicit quote from the OP:
"Even though most of the computations occur at the data center, a Watson smartphone application would still consume too much power for it to be practical today."
Perhaps they are referring to speech recognition, which would either consume a lot of bandwidth if the audio is sent over the wire, or consume a lot of power if processed on-device.
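For a rough sense of the bandwidth side of that tradeoff, here's a back-of-envelope sketch. The sample rate, bit depth, and codec bitrate are my own illustrative assumptions (16 kHz 16-bit mono PCM for raw audio, ~16 kbit/s for a typical speech codec), not figures from the article:

```python
# Back-of-envelope: bytes needed to ship a spoken question as audio.
# All parameters are illustrative assumptions, not from the article.

def raw_pcm_bytes(seconds, sample_rate=16_000, bits=16, channels=1):
    """Uncompressed PCM audio size in bytes."""
    return seconds * sample_rate * (bits // 8) * channels

def codec_bytes(seconds, bitrate_kbps=16):
    """Approximate size after a speech codec at the given bitrate."""
    return seconds * bitrate_kbps * 1000 // 8

q = 10  # a ten-second spoken question
print(raw_pcm_bytes(q))  # 320000 bytes (~313 KB) uncompressed
print(codec_bytes(q))    # 20000 bytes (~20 KB) compressed
```

So with a speech codec the upload is tens of kilobytes per question, which suggests raw bandwidth alone isn't prohibitive; the power cost of keeping the radio active, or of running recognition locally, is more plausibly the bottleneck.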