Unloading Ollama Models Instantly

Ollama is great. It makes running open-source large language models accessible to the masses. When paired with a frontend such as open-webui or continue.dev, it's a powerful tool for local, private AI completions. However, most of those frontends use the keep_alive parameter to keep a model in VRAM for a set period of time. 99% of the time, this is exactly what you want. The other 1% of the time, you want that model out of your VRAM immediately so you can use your GPU for something else (think gaming or model fine-tuning).

Fortunately, there's an easy solution. To clear a model from memory immediately, pass a value of 0 for keep_alive AND the name of the model that's currently in memory. For example:

curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}'
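The request body only has two fields: the model name and keep_alive. As a minimal sketch, a hypothetical helper called unload_payload (it is not part of Ollama) can build that body for any model name, so the same request works for whatever model you have loaded:

```shell
# Hypothetical helper (not part of Ollama): print the JSON body that asks
# the Ollama server to unload the given model immediately (keep_alive = 0).
unload_payload() {
  printf '{"model": "%s", "keep_alive": 0}\n' "$1"
}

# Example: prints the same body used in the curl call above.
unload_payload llama3.2
```

You could then send it with curl http://localhost:11434/api/generate -d "$(unload_payload llama3.2)", assuming the default server address.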

If you pass a model other than the one that's currently in memory, this doesn't work. For the sake of speed (and so you don't have to remember this command), we can create a bash/zsh function, plus an alias for it, that:

  1. Looks up the in-use model via ollama ps
  2. Uses the output of ollama ps to pass the current model into the curl command

The output of ollama ps looks like this:

NAME        ID              SIZE     PROCESSOR    UNTIL
llama3.2    38056bbcbb2d    21 GB    100% GPU     2 minutes from now
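To see how the model name can be pulled out of that table, here is a small sketch that runs the same awk extraction on the sample output above. The sample is stored in a variable (sample_ps_output, a name chosen for illustration) so it can be tried without a running Ollama server:

```shell
# Sample 'ollama ps' output from above, stored in a variable so the
# parsing can be tried without a running Ollama server.
sample_ps_output='NAME        ID              SIZE     PROCESSOR    UNTIL
llama3.2    38056bbcbb2d    21 GB    100% GPU     2 minutes from now'

# NR==2 skips the header row; $1 is the first whitespace-separated field (NAME).
model_name=$(printf '%s\n' "$sample_ps_output" | awk 'NR==2 {print $1}')
echo "$model_name"   # prints llama3.2
```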

The code

clear_ollama_model() {
  # Extract the model name from 'ollama ps'
  local model_name=$(ollama ps | awk 'NR==2 {print $1}')

  # Check if a model is loaded
  if [ -z "$model_name" ]; then
    echo "No model is currently loaded."
    return 1
  fi

  # Use the extracted model name in the curl command
  curl http://localhost:11434/api/generate -d "{\"model\": \"$model_name\", \"keep_alive\": 0}"
}

# Create an alias for this function
alias clear_ollama="clear_ollama_model"

How it works

  • ollama ps | awk 'NR==2 {print $1}': This extracts the model name from the second line of the ollama ps output. The first line is the column header, so the second line is the first (and usually the only) loaded model.
    • NR==2: Selects the second line.
    • {print $1}: Prints the first field (the model name) from that line.
  • Function Definition: The clear_ollama_model function encapsulates the logic for clearing the model. It checks whether a model is loaded and, if one is, sends the curl request with keep_alive set to 0 to unload it.
  • Alias Creation: The alias clear_ollama is created to call the function, making it easy to run from your terminal.
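Because NR==2 only looks at the first row, the function above unloads just one model, while Ollama can keep more than one model in memory at once. A minimal variant, assuming the same default server address and the same ollama ps column layout, loops over every row after the header. The name clear_all_ollama_models is hypothetical:

```shell
# Hypothetical variant: unload every model listed by 'ollama ps',
# not just the first one. Assumes the default server at localhost:11434.
clear_all_ollama_models() {
  local model_names
  # NR>1 skips the header row; print the NAME column of every remaining row.
  model_names=$(ollama ps | awk 'NR>1 {print $1}')

  if [ -z "$model_names" ]; then
    echo "No model is currently loaded."
    return 1
  fi

  # Send one keep_alive=0 request per loaded model, one name per line.
  printf '%s\n' "$model_names" | while IFS= read -r model_name; do
    echo "Unloading $model_name"
    curl -s http://localhost:11434/api/generate \
      -d "{\"model\": \"$model_name\", \"keep_alive\": 0}" > /dev/null
  done
}
```

It can be added to your shell configuration alongside clear_ollama_model and given its own alias in the same way.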

Usage

After adding this code to your ~/.zshrc (or ~/.bashrc if you use bash), reload the configuration file or restart your terminal session:

source ~/.zshrc

Now you can simply type clear_ollama in your terminal to clear the currently loaded Ollama model from VRAM.