You get to do projections
as in the Pythagorean theorem.
The coefficients you need in the
projections are just the
values of some inner products.
With random variables, those
coefficients are covariances,
that is, much the same as
correlations, which you commonly
can estimate from data.
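A minimal sketch of that point, in
plain Python. The data, the slope 2.0,
the noise level, and the helper names
mean and cov are all made up for
illustration. It estimates the
projection coefficient of y on x as
cov(x, y) / cov(x, x), then checks that
the residual is uncorrelated with x and
that the variances add up as in the
Pythagorean theorem.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Sample covariance: the inner product of the centered variables.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
n = 10_000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical data: y is linear in x plus independent noise.
y = [2.0 * xi + random.gauss(0.0, 0.5) for xi in x]

# Projection coefficient <x, y> / <x, x>, estimated from the data.
beta = cov(x, y) / cov(x, x)

mx, my = mean(x), mean(y)
residual = [(yi - my) - beta * (xi - mx) for xi, yi in zip(x, y)]

# Orthogonality: the residual is uncorrelated with x (up to rounding).
resid_cov = cov(residual, x)

# Pythagorean theorem: var(y) = var(projection) + var(residual).
var_y = cov(y, y)
var_proj = beta ** 2 * cov(x, x)
var_resid = cov(residual, residual)

print(beta, resid_cov, var_y - (var_proj + var_resid))
```

The estimated beta should land close to
the slope used to generate the data,
and the other two printed values should
be zero up to floating point rounding.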
In the multivariate Gaussian
case, uncorrelated implies
independent.
Fourier theory is easier in
L2 than in L1. E.g., in
classic Fourier series, the
error in the approximation
is measured in L2, and the
result follows from the
L2 orthogonality of the
harmonics.
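A rough numerical sketch of the
orthogonality and the L2 error, in plain
Python. The grid size N, the square wave
test function, and the helper names
inner, sin_k, and l2_error_sq are
assumptions chosen for illustration.
Integrals are approximated by sums over
a uniform grid on one period.

```python
import math

N = 2048  # grid points on [0, 2*pi); an arbitrary choice for this sketch
xs = [2.0 * math.pi * k / N for k in range(N)]

def inner(u, v):
    # Approximates (1/pi) * integral of u*v over one period,
    # so that each sin(k x) has norm 1.
    return sum(a * b for a, b in zip(u, v)) * 2.0 / N

def sin_k(k):
    return [math.sin(k * x) for x in xs]

# Orthogonality of the harmonics in L2.
g12 = inner(sin_k(1), sin_k(2))  # close to 0
g22 = inner(sin_k(2), sin_k(2))  # close to 1

# Square wave: +1 on the first half period, -1 on the second.
f = [1.0 if x < math.pi else -1.0 for x in xs]

def l2_error_sq(num_harmonics):
    # Squared L2 error of the partial sum with the given number of sine terms.
    approx = [0.0] * N
    for k in range(1, num_harmonics + 1):
        s = sin_k(k)
        b = inner(f, s)  # Fourier coefficient as an inner product
        approx = [a + b * si for a, si in zip(approx, s)]
    diff = [fi - ai for fi, ai in zip(f, approx)]
    return inner(diff, diff)

errors = [l2_error_sq(m) for m in (1, 3, 5, 9)]
print(g12, g22, errors)
```

The squared L2 error drops each time
more harmonics are added, even though
the partial sums still overshoot near
the jumps.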
Yes, L-infinity can
also be nice: The
uniform limit of a sequence
of continuous functions
is continuous.
Or, with L2, you often get a
Hilbert space, but with
L1 or L-infinity you usually
get at best just a Banach space --
that is, a complete, normed
vector space. Then, yes, you can
get the Hahn-Banach theorem,
but the same thing in Hilbert
space is easier.
There is a sense in which L1 and
L-infinity are duals of
each other (the dual of L1 is
L-infinity, at least on sigma-finite
measure spaces), but L2 is self-dual,
which is nicer.
Filling in all these details and
more is part of functional analysis
101. There it's tough to miss at least
three books of W. Rudin:
Principles of Mathematical Analysis,
Real and Complex Analysis,
and Functional Analysis.
There's more, but I've
got some bugs to get out
of the software of my
Web pages!
I like the question -- asked
it myself at the NIST early in my career.
The answer I gave here is better
than what people told me then.
I've indicated likely most of the
main points, but my answer here is
rough and ready (I typed too fast),
and a quite polished answer is
also possible -- I just don't have
time today
to dig out my grad school
course notes, scan through Rudin,
Dunford and Schwartz,
Kolmogorov and Fomin, much of
digital filtering, much of
multi-variate statistics, etc.
I intuit that you're getting at the real answer with the self-dual. That makes a lot of sense. Also, from a practical perspective L2 is very nice because it causes the problem of error reduction to be quadratic, so it scales well.
No: If what you want is the L-infinity
norm, then go for it. A standard place
for that is numerical approximations of
special functions -- want guarantees on
the worst case error. And there is some
math to help achieve that. It's
sometimes called Chebyshev approximation.
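A toy sketch of the difference, in
plain Python: approximate some data by
a single constant. The data values and
the helper names l2_err and linf_err are
made up for illustration. The constant
that minimizes the L2 error is the mean,
and the constant that minimizes the
worst case (L-infinity) error is the
midrange. Real Chebyshev approximation
of special functions uses polynomials
or rational functions and more
machinery; this only shows the minimax
idea in its simplest form.

```python
# Hypothetical data with one large value.
data = [1.0, 2.0, 2.5, 3.0, 10.0]

def l2_err(c):
    # Root of the sum of squared errors for the constant c.
    return sum((d - c) ** 2 for d in data) ** 0.5

def linf_err(c):
    # Worst case absolute error for the constant c.
    return max(abs(d - c) for d in data)

c_l2 = sum(data) / len(data)            # mean: best constant in L2
c_linf = (min(data) + max(data)) / 2.0  # midrange: best constant in L-infinity

print("L2 choice:   ", c_l2, l2_err(c_l2), linf_err(c_l2))
print("L-inf choice:", c_linf, l2_err(c_linf), linf_err(c_linf))
```

Each choice wins under its own norm and
loses under the other, which is the
reason to pick the norm that matches
the guarantee you actually need.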
But, in practice, the usual situation,
e.g., signal processing, multi-variate
statistics, there's no good reason
not to use L2 and many biggie reasons
to use it. E.g., for a given box
of data, commonly the better tools in
L2 just let you do better.
Or, to the customer: "If you will go for
a good L2 approximation, then we
are in good shape. If you insist
on L1 or L-infinity, then we will
need a lot more data and still
won't do as well."
Again, a biggie example is just
classic Fourier series. Sure,
if you are really concerned about
the Gibbs phenomenon, then maybe
work on that. Otherwise, L2 is the
place to be.
E.g., L1 and L-infinity can commonly
take you into linear programming.
Generally you will be much happier
with the tools available to you
in L2.
Again, just now I just don't
have time for a more full,
complete, and polished explanation.
A really good explanation would
require much of a good ugrad
and Master's in math, with
concentration on analysis and
a wide range of applications.
I've been there, done that but
just don't have time to
write out even a good summary of
all that material here.