Yeah, this approach has been proposed in an even more aggressive form: see David W. Wall, "Global Register Allocation at Link Time", 1986 [0], specifically for machines with a large number of registers. The paper straight up says the problem is much smaller if you (a) have fewer registers and (b) don't compile things separately. And this problem with large register files was predicted long before that, too; see the wonderfully named "How to Use 1000 Registers", 1979, by Richard L. Sites [1] (he would later go on to work on the VAX and design the Alpha ISA at DEC).
Register allocation at link time would work only for ISAs where all registers are equivalent, or where they are partitioned into only a small number of equivalence classes.
It cannot work for an ISA like x86-64, where register allocation and instruction selection are extremely interdependent (e.g. the 16 Intel/AMD general-purpose registers fall into 11 equivalence classes, instead of the at most 2 to 4 classes found in most other ISAs, where only a stack pointer, and perhaps a return-link register and/or a zero register, have special behavior). On such an ISA the compiler must know the exact register allocation before choosing instructions; otherwise the generated program can be much worse than optimal. Moreover, an ideal optimizing compiler for x86-64 might need to generate instructions for several alternative register allocations, then select the best allocation and emit the definitive instruction sequence.
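To make the interdependence concrete, here's a small sketch (using GCC/Clang inline asm, x86-64 only; the function names are mine) of two instructions whose register operands are hard-wired by the ISA, so the "choice of instruction" and the "choice of register" cannot be made independently:

```c
#include <stdint.h>

/* One-operand MUL always reads RAX and writes the product into RDX:RAX.
 * If you want the widening multiply, the allocator has no freedom here:
 * the operands *must* end up in those registers. */
uint64_t widening_mul_hi(uint64_t a, uint64_t b) {
    uint64_t hi;
    __asm__("mulq %2" : "=d"(hi), "+a"(a) : "r"(b));
    return hi;
}

/* Variable-count SHL only accepts the shift count in CL (the "c"
 * constraint); if the count lives in any other register, the compiler
 * must insert a move into RCX first. */
uint64_t shift_left(uint64_t x, uint8_t n) {
    __asm__("shlq %%cl, %0" : "+r"(x) : "c"(n));
    return x;
}
```

RAX/RDX for MUL and CL for shift counts are just two of the special cases behind those 11 equivalence classes; an allocator that ignores them forces extra moves around every such instruction.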
With a non-orthogonal ISA like x86-64, the best results require allocating registers based on the register usage of the invoked procedures, as assembly programmers typically do, instead of following a uniform, sub-optimal ABI, as most compilers do.
[0] https://dl.acm.org/doi/10.1145/12276.13338
[1] https://files01.core.ac.uk/download/pdf/9412584.pdf