Unloading Ollama Models Instantly
Ollama is great. It makes running open-source large language models accessible to the masses. Paired with a frontend such as open-webui or continue.dev, it's a very powerful tool for local and private AI completions.
However, most of those frontends take advantage of the keep_alive parameter to maintain a model in VRAM for a set period of time. 99% of the time, this is exactly what you want. The other 1% of the time, you want that model out of your VRAM immediately so you can use your GPU for something else (think gaming or model fine-tuning).
Fortunately, there's an easy solution to this. To clear a model from memory immediately, you just need to pass a value of 0 for keep_alive AND the same model that's currently in memory.
E.g.,
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}'
If you pass a model other than the one that's currently in memory, this doesn't work. For the sake of speed and not having to remember this command, we can create a bash/zsh alias that:
- Looks up the in-use model via ollama ps
- Uses the output of ollama ps to pass the current model into the curl command
The output of ollama ps looks like:
NAME        ID              SIZE     PROCESSOR    UNTIL
llama3.2    38056bbcbb2d    21 GB    100% GPU     2 minutes from now
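To see what the extraction step does without a running Ollama server, the sample output can be fed through awk directly. This is only a minimal sketch: the helper name first_model_name is made up for illustration, and the column spacing in the sample text is an assumption.

```shell
# Hypothetical helper: read 'ollama ps'-style output on stdin and
# print the NAME column of the first row after the header.
first_model_name() {
  awk 'NR==2 {print $1}'
}

# Sample output copied from the article (column spacing is assumed).
sample_ps_output='NAME        ID              SIZE     PROCESSOR    UNTIL
llama3.2    38056bbcbb2d    21 GB    100% GPU     2 minutes from now'

# Prints: llama3.2
printf '%s\n' "$sample_ps_output" | first_model_name
```

If only the header line is present, awk finds no second line and prints nothing, which is what an emptiness check can rely on.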
The code
clear_ollama_model() {
    # Extract the model name from 'ollama ps'
    local model_name=$(ollama ps | awk 'NR==2 {print $1}')

    # Check if a model is loaded
    if [ -z "$model_name" ]; then
        echo "No model is currently loaded."
        return 1
    fi

    # Use the extracted model name in the curl command
    curl http://localhost:11434/api/generate -d "{\"model\": \"$model_name\", \"keep_alive\": 0}"
}

# Create an alias for this function
alias clear_ollama="clear_ollama_model"
How it works
- ollama ps | awk 'NR==2 {print $1}': This command extracts the model name from the second line of the ollama ps output, which corresponds to the currently loaded model.
  - NR==2: Selects the second line.
  - {print $1}: Prints the first field (the model name) from that line.
- Function Definition: The function clear_ollama_model encapsulates the logic for clearing the model. It checks if a model is loaded and then sends the appropriate curl command to clear it.
- Alias Creation: The alias clear_ollama is created to call the function, making it easy to run from your terminal.
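Because the function reads only the second line, it unloads just the first model listed if several happen to be loaded at once. A possible variant, sketched here on the assumption that ollama ps prints one model per row after the header, loops over every row instead. The names list_loaded_models and clear_all_ollama_models are hypothetical and not part of Ollama.

```shell
# Hypothetical helper: print the NAME column of every non-empty row
# after the header of 'ollama ps'-style output read from stdin.
list_loaded_models() {
  awk 'NR>1 && NF {print $1}'
}

# Hypothetical variant of clear_ollama_model: send keep_alive 0
# once for each model that 'ollama ps' lists.
clear_all_ollama_models() {
  local models
  models=$(ollama ps | list_loaded_models)

  # Nothing after the header means nothing to unload
  if [ -z "$models" ]; then
    echo "No model is currently loaded."
    return 1
  fi

  # One request per listed model name
  printf '%s\n' "$models" | while IFS= read -r model_name; do
    curl -s http://localhost:11434/api/generate \
      -d "{\"model\": \"$model_name\", \"keep_alive\": 0}"
  done
}
```

The names are collected before any request is sent, so the list does not change while the loop is running.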
Usage
After adding this code to your ~/.zshrc, you need to reload the configuration file or restart your terminal session:
source ~/.zshrc
Now, you can simply type clear_ollama in your zsh terminal to clear the currently loaded Ollama model from VRAM.
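To confirm that the model is actually gone, ollama ps can be run again and checked for rows after the header. The helper below is a small sketch of that check; the name no_models_listed is hypothetical, and it assumes that an idle Ollama prints only the header line.

```shell
# Hypothetical helper: succeed (exit status 0) when 'ollama ps'-style
# output on stdin contains no model rows after the header.
no_models_listed() {
  [ -z "$(awk 'NR>1 && NF {print $1}')" ]
}

# Possible usage with a running Ollama install:
#   clear_ollama && ollama ps | no_models_listed && echo "No models listed."
```

If the check fails right after clearing, running ollama ps once more a moment later shows whether the model has since been released.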