I’ve been playing around with Ollama in a VM on my machine and it is really useful.

To get started, make sure you have capable hardware; you will need fairly recent hardware, so that old computer you have lying around may not be enough. I created a VM on my laptop with KVM and gave it 8 GB of RAM and 12 cores.
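
If you also go the KVM route, a virt-install command along these lines is roughly what I mean; the VM name, disk size, OS variant, and ISO path below are just illustrative placeholders, so adjust them for your own setup:

# name, disk size, os-variant and ISO path are placeholders - adjust to taste
virt-install --name ollama-vm --memory 8192 --vcpus 12 \
  --disk size=64 --os-variant ubuntu22.04 \
  --cdrom /path/to/ubuntu-server.iso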

Next, read the README. You can find it in the GitHub repo:

https://github.com/ollama/ollama
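
The README covers installation in detail; at the time of writing the Linux install is a one-liner (check the README itself in case it has changed):

curl -fsSL https://ollama.com/install.sh | sh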

Once you run the install script you will need to download some models. I would download Llama 2, Mistral, and LLaVA. As an example, you can pull down Llama 2 with:

ollama pull llama2
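
The other two are pulled the same way, using the names the library uses for their default tags:

ollama pull mistral
ollama pull llava

You can also sanity-check a model straight from the terminal with ollama run llama2 before bothering with a web UI.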

Ollama models are available in the online repo. You can see all of them here: https://ollama.com/library
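
To see what you already have downloaded locally, use:

ollama list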

Once the models are downloaded you need to set up Open WebUI. First, install Docker; I am going to assume you already know how to do that. Once Docker is installed, pull and deploy Open WebUI with this command (notice it's a little different from the one in the Open WebUI docs):

docker run -d --net=host -e OLLAMA_BASE_URL="http://localhost:11434" -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Notice that the container's networking is shared with the host (--net=host); this is what lets Open WebUI reach Ollama. I am also setting the OLLAMA_BASE_URL environment variable to point Open WebUI at the Ollama API.
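
Before opening the UI it's worth confirming that both pieces came up. Ollama should answer with a short "Ollama is running" message on its API port, and the container logs will show any connection errors:

curl http://localhost:11434
docker logs --tail 20 open-webui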

Once that's done, open the host's IP on port 8080 in a browser and create an account. After that you should be all set.

  • Scrubbles@poptalk.scrubbles.tech
    8 months ago

    Personally I’ve really enjoyed text-generation-webui. It made it really easy to ramp up and learn. Very cool stuff you’ve got though, I’ll probably be looking at a comparison between them!

  • RedNight@lemmy.ml
    8 months ago

    Ollama has been great for self-hosting, but also check out vLLM as it’s the new shiny self-hosting toy.

  • mozz@mbin.grits.dev
    8 months ago

    Does it work out okay with 12 cores purely on CPU? About how fast is the interaction?

    I played around a little with Ollama and gpt4all but it seemed to me like it wasn’t fast enough to be useful on pure CPU, but if I could just throw cores at it then I might revisit the issue.

    • Possibly linux@lemmy.zip (OP)
      8 months ago

      It wasn’t usable a few months ago. However, when I set up Ollama it was “fast” and it works OK. It takes anywhere from instant to 5 minutes for a response. LLaVA seems to take the longest, which makes sense. Llama 2 is fairly fast unless you ask it for obscure information.

  • thantik@lemmy.world
    8 months ago

    The biggest thing that I want to learn is how to either A: add “tools” for the AI to run, or B: “fine-tune” the model by feeding it data that’s relevant to me.