Hi all, I am quite an old fart, so I just recently got excited about self-hosting an AI, some LLM…
What I want to do is:
- chat with it
- eventually integrate it into other services, where needed
I read about OLLAMA, but it’s all unclear to me.
Where do I start? Preferably with containers, but “bare metal” is also fine.
(I already have a Linux server rig with all the good stuff on it, from Immich to Forgejo to the *arrs and more, reverse proxy, WireGuard and the works. I am looking for input on AI/LLM, what to self-host and such, not general self-hosting hints.)
There’s another community for this: [email protected]
Though we mostly discuss the news and specific questions there; beginner questions are a bit more rare.

I think you already got a lot of good answers here: LM Studio, Open WebUI, LocalAI…
I’d like to add KoboldCpp, which is kind of made for gaming/dialogue, but it can do everything. And from my experience it’s very easy to set up and bundles everything into one program.

You can host Ollama and Open WebUI in containers. If you want to wire up web search, you can connect Open WebUI to Playwright (also a container) and SearXNG (also a container), and the LLM will search the web for answers.
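For the container route, here’s a minimal sketch of that stack as plain `docker run` commands. The container names, host ports, and volume names are my own picks, and hooking SearXNG up as Open WebUI’s search engine happens afterwards in its admin settings:

```sh
# Ollama: the LLM engine, on its default port 11434
docker run -d --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Open WebUI: the chat frontend, pointed at the Ollama container
docker run -d --name open-webui \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  -p 3000:8080 \
  ghcr.io/open-webui/open-webui:main

# SearXNG: a metasearch engine Open WebUI can use for web results
docker run -d --name searxng \
  -p 8888:8080 \
  searxng/searxng
```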
I just started using https://lemonade-server.ai/
It has so far been pretty effortless, and it would be a good fit if you are new to self-hosting.
Open WebUI is awesome and allows you to use it as an API for all the models you have it hooked up to. You can point it at Ollama or any OpenAI-API-compatible endpoint (like OpenRouter’s).
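To make that concrete, here’s a sketch of a chat request against Ollama’s OpenAI-compatible endpoint; the port assumes a default local install, and the model name assumes you’ve pulled qwen3:14b:

```sh
# OpenAI-style chat completion against a local Ollama
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:14b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```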
> I read about OLLAMA, but it’s all unclear to me.
There’s really nothing more to it than the initial instructions tell you. Literally just a “curl -fsSL https://ollama.com/install.sh | sh”. Then you’re just an “ollama run qwen3:14b” away from having a chat with the model in your terminal.
That’s the “chat with it” part done.

After that you can make it more involved by serving the model via its API, manually adding .gguf quantizations (usually smaller or special-purpose modified bootleg versions of big published models) to your Ollama library with a Modelfile, ditching Ollama altogether for a different environment, or, the big upgrade, giving your chats a shiny frontend in the form of Open WebUI.
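If you go the .gguf route, importing one into Ollama only takes a few lines. A minimal sketch, assuming you’ve already downloaded a quantized file (the filename below is a placeholder):

```sh
# Point Ollama at a local GGUF via a Modelfile (filename is hypothetical)
echo 'FROM ./mistral-7b-instruct.Q4_K_M.gguf' > Modelfile

# Register it in your local library under a name of your choosing
ollama create my-local-model -f Modelfile

# Chat with it like any pulled model
ollama run my-local-model
```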
Sounds like you already know what you need to know to host Ollama in a Docker container. Ollama is an LLM “engine”: you can interact with models via a CLI, or you can integrate them into other services via an API.
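For the API side, Ollama’s native endpoint looks like this; a quick sketch of a non-streaming chat call, again assuming qwen3:14b has been pulled:

```sh
# Ollama's native chat endpoint; "stream": false returns one JSON reply
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:14b",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "stream": false
}'
```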
To have a web-page chat like ChatGPT or others, I installed Open WebUI. I love it! A friend of mine likes LM Studio, which I think is a desktop app, but I don’t know anything about it.
+1 LM Studio, so easy to use and so powerful
LM Studio is a beast.
I had to run a particular command to get the AMD GPU properly available in Docker. I can’t find it right now; ask if you need it.
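For anyone else hitting this: with Ollama, the usual AMD route is the ROCm image plus passing the GPU device nodes through, per Ollama’s Docker docs. A sketch (this may not be the exact command meant above):

```sh
# Run Ollama's ROCm build and pass through the AMD GPU device nodes
docker run -d --name ollama \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:rocm
```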
One of these projects might be of interest to you:
Do note that CPU inference is quite a lot slower than GPU or the well-known SaaS providers. I currently like the quantized DeepSeek models as the best balance between reply quality and inference time when not using a GPU.
There’s a good tutorial on hosting Ollama and a vector database here.
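On the vector-database angle, the building block is Ollama’s embeddings endpoint, which turns text into vectors you can store in whatever database the tutorial uses. A sketch, assuming the nomic-embed-text model:

```sh
# Pull an embedding model, then request a vector for a piece of text
ollama pull nomic-embed-text

curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "Self-hosted LLMs can answer questions about your own documents."
}'
```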