A Qwen 2.5 14B at IQ3_M should fit entirely in your VRAM, with a longish context and acceptable quality.
An IQ4_XS quant will just barely overflow VRAM, but should still be fast at short context.
And while I have not tried it yet, the 14B is allegedly smart.
Also, what I do on my PC is hook up my monitor to the iGPU so the GPU’s VRAM is completely empty, lol.
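If you want to sanity-check the fit yourself, here is a rough back-of-the-envelope sketch. Everything in it is an assumption on my part, not a measurement: the approximate bits-per-weight figures llama.cpp lists for these quants (about 3.66 for IQ3_M, about 4.25 for IQ4_XS), roughly 14.7B parameters, the architecture numbers I believe the 14B uses (48 layers, 8 KV heads, head dim 128), an fp16 KV cache, and a made-up 0.5 GiB overhead for compute buffers. Real GGUF files run a bit larger because some tensors are kept at higher precision.

```python
GIB = 1024 ** 3

# Assumed values, not measured: approximate bits per weight per quant type.
BPW = {"IQ3_M": 3.66, "IQ4_XS": 4.25}

# Assumed Qwen 2.5 14B shape; check the model card before trusting these.
N_PARAMS = 14.7e9
N_LAYERS = 48
N_KV_HEADS = 8
HEAD_DIM = 128


def weights_gib(n_params, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_ctx, bytes_per_elem=2):
    """Approximate KV cache size in GiB (keys + values, fp16 by default)."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return per_token * n_ctx / GIB


def estimate_gib(quant, n_ctx, overhead_gib=0.5):
    """Weights + KV cache + a guessed overhead for compute buffers."""
    return weights_gib(N_PARAMS, BPW[quant]) + kv_cache_gib(n_ctx) + overhead_gib


if __name__ == "__main__":
    vram_gib = 8.0  # hypothetical card size; plug in your own
    for quant in BPW:
        for n_ctx in (2048, 4096, 8192):
            total = estimate_gib(quant, n_ctx)
            verdict = "fits" if total <= vram_gib else "overflows"
            print(f"{quant} @ {n_ctx} ctx: ~{total:.2f} GiB ({verdict})")
```

Swap in your actual VRAM size for `vram_gib`. Treat the result as a ballpark only: whatever else is using the GPU (like a desktop session) eats into the budget too.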
Same issue here.
I think it’s just a temporary issue, but I really will stop using Reddit if old.reddit.com goes down. There are niches that just aren’t on here… but that UI.