My local AI journey
A couple of months ago, my employer provided me with a Framework Desktop with 128 GB of unified memory so that I could explore LLMs locally. I had two complementary goals: first, I wanted to use a privacy-friendly code assistant, and second, I wanted to explore this ecosystem and learn more about using AI and building software on top of it.
To add to the learning curve, I decided to install Arch Linux. I've been on Ubuntu for a long time on my laptops, and I recently switched from Windows to Bazzite on my gaming PC. But I felt that, as an experienced Linux user, it was time to explore a more... advanced flavor of Linux. Framework provided a useful installation guide, so while the process sometimes felt like just retyping commands, I got through it and learned a bit about Linux fundamentals along the way. The Arch Wiki is also a very useful resource. I'm having fun installing and using it, but honestly: I would generally not recommend Arch over a stable distribution like Ubuntu. The stability of an OS is probably worth a lot more than being on the bleeding edge.
After figuring out how to set up an SSH server, I wanted to get up and running with AI fast, so I went ahead and installed Ollama. It did not run. I knew there were firmware issues, and I eventually figured out that I needed to downgrade the kernel. Yay! Once I'd worked out how to do that (through the Arch Wiki, again), I finally got Ollama running. I started running gpt-oss:120b and integrated it with IntelliJ and a local UI client.
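If you want to check that your own setup is reachable before wiring up an IDE, a few lines of Python against Ollama's local HTTP API will do. This is just a minimal sketch, assuming Ollama is serving on its default port (11434) and that you've already pulled a model; swap in whatever model name you're running:

```python
# Minimal sanity check against Ollama's local HTTP API.
# Assumes Ollama is listening on its default port (11434) and that
# the model below has already been pulled; adjust model/prompt to taste.
import json
import urllib.request

payload = json.dumps({
    "model": "gpt-oss:120b",
    "prompt": "Say hello in one sentence.",
    "stream": False,  # ask for one complete JSON response instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["response"])
```

If this prints a greeting, the server is up and the model loads; anything else (connection refused, a 404 for the model) tells you which half of the setup to debug.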
This was usable! But I was not done. This model 'only' uses about 60 GB of memory, while I have 128 GB available. Through Donato Capitella (whom I know from the LLM Chronicles), I knew it should be possible to run even bigger models.
In a follow-up post, I'll talk a bit about how I got that to work, and what I found. Till then!