There have been theories floating around that the 128 GB version could be the best value for on-premises LLM inference. The RAM is split between CPU and GPU at a user-configurable ratio.
So this might be the holy grail of "good enough GPU" and "over 100GB of VRAM" if the rest of the system can keep up.
> It seems like tools will have to adapt to dynamic VRAM allocation, as none of the monitoring tools I've tested assume VRAM can be increased on the fly.
amdgpu_top shows VRAM (the old fixed carve-out) and GTT (the dynamic pool) separately.
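Monitoring tools that want to track both pools can read the per-device counters the amdgpu driver exposes in sysfs. A sketch, assuming the GPU shows up as card0 (the index varies per machine); values are in bytes:

    # fixed firmware carve-out
    cat /sys/class/drm/card0/device/mem_info_vram_total
    cat /sys/class/drm/card0/device/mem_info_vram_used
    # dynamically allocated GTT (system RAM mapped for the GPU)
    cat /sys/class/drm/card0/device/mem_info_gtt_total
    cat /sys/class/drm/card0/device/mem_info_gtt_used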
No need for a reboot: echo 9999 > /sys/module/ttm/parameters/pages_limit
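Note that pages_limit is counted in pages rather than bytes, so if you want a specific size you have to convert first. A sketch, assuming 4 KiB pages and picking ~110 GiB purely as an example target (run as root):

    # current limit, in pages
    cat /sys/module/ttm/parameters/pages_limit
    # allow roughly 110 GiB of system RAM to be handed to the GPU
    echo $(( 110 * 1024 * 1024 * 1024 / 4096 )) > /sys/module/ttm/parameters/pages_limit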
You're talking about an allocator policy for when to allow GTT allocations and when not, not the old firmware-level VRAM split where whatever size the BIOS sets for VRAM is permanently taken away from the CPU. The max GTT limit is there to reduce accidental footguns, not because of a technological limitation; at least in earlier drivers the default policy was to reserve 1/4 of RAM for non-GPU use, and 1/4 × 128 GB = 32 GB is more than enough, so what you want is to adjust the policy. It's just an if statement in the kernel; GTT as a mechanism doesn't impose the limit, and deallocating a chunk of memory used by the GPU returns it to the general kernel memory pool, where the CPU can use it again.
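If you want the adjusted policy to survive reboots instead of echoing into sysfs each time, the same knobs are exposed as module parameters. A sketch, assuming amdgpu's gtt_size (in MiB) and ttm's pages_limit (in 4 KiB pages) are the relevant parameters on your kernel; the ~110 GiB target and the file name are just illustrative:

    # /etc/modprobe.d/gtt.conf (hypothetical file name)
    # let GTT grow to ~110 GiB instead of the default fraction of RAM
    options amdgpu gtt_size=112640
    # same ~110 GiB expressed in 4 KiB pages for TTM
    options ttm pages_limit=28835840

Depending on the distro you may need to regenerate the initramfs before module options set this way take effect at boot.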
You're still thinking of the old school thing, where you set the split in the firmware and it's fixed for that boot. There's dynamic allocation on top of it these days.
I have that split set at the minimum 2 GB and I'm giving the GPU a 20 GB model to process.