Self-hosting LLMs

GreenSofaBed@lemmy.zip · 18 days ago

Avid Amoeba@lemmy.ca · 18 days ago

If you only need to serve one user at a time, Ollama + Open WebUI works great. If you need to serve multiple users at the same time, check out vLLM.

Why can’t it serve multiple users? Open WebUI seems to support multiple users.

The Hobbyist@lemmy.zip · 18 days ago

I didn’t say it can’t, but I’m not sure how well it is optimized for that. In my initial testing, it queued queries and submitted them to the model one after another. I have not seen it batch-compute the queries, though that may be a setup issue on my side. vLLM, on the other hand, is designed specifically for the multi-user concurrent use case and has multiple optimizations for it.
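
If anyone wants to check the queuing-versus-batching behavior on their own setup, here is a rough sketch of how it could be measured: send the same prompts once sequentially and once concurrently, then compare wall-clock time. It assumes an OpenAI-compatible `/v1/completions` endpoint, which both vLLM and Ollama expose. The URL, port, and model name below are placeholders for illustration, not anything from this thread, so adjust them to your server. If the concurrent run takes about as long as the sequential one, the backend is probably processing requests one by one.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholders (assumptions): point these at your own server and model.
URL = "http://localhost:8000/v1/completions"
MODEL = "my-model"


def send_prompt(prompt, url=URL, model=MODEL):
    """POST one completion request and return the parsed JSON response."""
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": 64}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


def time_sequential(prompts, send=send_prompt):
    """Send prompts one after another; return elapsed seconds."""
    start = time.perf_counter()
    for p in prompts:
        send(p)
    return time.perf_counter() - start


def time_concurrent(prompts, send=send_prompt):
    """Send all prompts at once from separate threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        list(pool.map(send, prompts))  # list() waits for every request to finish
    return time.perf_counter() - start


if __name__ == "__main__":
    prompts = [f"Write one sentence about topic {i}." for i in range(4)]
    seq = time_sequential(prompts)
    con = time_concurrent(prompts)
    print(f"sequential: {seq:.1f}s, concurrent: {con:.1f}s")
```

This only measures end-to-end latency, so it cannot tell you why a server is slow, but a concurrent time close to the sequential time is a decent hint that requests are being queued rather than batched.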