“It’s safe to say that the people who volunteered to “shape” the initiative want it dead and buried. Of the 52 responses at the time of writing, all rejected the idea and asked Mozilla to stop shoving AI features into Firefox.”
You can do this now:
The hardest part is hosting open-webui because AFAIK it only ships as a Docker image.
Edit: s/openai/open-webui
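For anyone who wants to try it, here’s a minimal sketch of the usual container setup (the image and port mapping follow Open WebUI’s own quick start; the volume name is just an example):

    # Open WebUI ships as a container image; this serves the UI at http://localhost:3000
    # and persists chats/settings in a named Docker volume.
    docker run -d --name open-webui \
      -p 3000:8080 \
      -v open-webui:/app/backend/data \
      ghcr.io/open-webui/open-webui:main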
Open WebUI isn’t very ‘open’ and kinda problematic last I saw. Same with ollama; you should absolutely avoid either.
…And actually, why is Open WebUI even needed? For an embeddings model or something? All the browser should need is an OpenAI-compatible endpoint.
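To illustrate the point: an OpenAI-compatible endpoint is a single HTTP route plus a JSON body, so any local server exposing it should be enough for the browser. A sketch, with placeholder URL, port, and model name:

    # Hypothetical local server exposing the OpenAI-style chat completions route.
    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "local-model",
            "messages": [{"role": "user", "content": "Summarize this page."}]
          }'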
The Firefox AI sidebar embeds an external open-webui. It doesn’t roll its own UI for chat. Everything with AI is done in the quickest, laziest way.
What exactly isn’t very open about open-webui or ollama? Are there some binary blobs or weird copyright licensing? What alternatives are you suggesting?
https://old.reddit.com/r/opensource/comments/1kfhkal/open_webui_is_no_longer_open_source/
https://old.reddit.com/r/LocalLLaMA/comments/1mncrqp/ollama/
Basically, they’re both using their popularity to push proprietary bits, which is where their development is shifting. They’re enshittifying.
In addition, ollama is just a demanding leech on llama.cpp that contributes nothing back, while hiding the connection to the underlying library at every opportunity. They do scummy things like:
They rename models for SEO, like “Deepseek R1”, which is really the 7B distill.
They ship really bad default settings (like a 2K default context limit and imatrix-free quants by default), which give local LLM runners a bad impression of the whole ecosystem (the context limit, at least, can be overridden per request; see the sketch after this list).
They mess with chat templates and, on top of that, introduce other bugs that don’t exist in base llama.cpp.
Sometimes they lag behind llama.cpp on GGUF support.
And other times, they make their own sloppy implementations for ‘day 1’ support of trending models. These often work poorly; the support’s just there for SEO. But this also leads to some public GGUFs not working with the underlying llama.cpp library, or working inexplicably badly, polluting llama.cpp’s issue tracker.
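On the context-limit point above: a sketch of the per-request workaround, assuming a stock ollama install on its default port and a placeholder model name; the “num_ctx” option raises the window for that request only:

    # Hypothetical request to a local ollama instance (default port 11434);
    # "num_ctx" overrides the 2K default context for this request.
    curl http://localhost:11434/api/chat -d '{
      "model": "some-model",
      "messages": [{"role": "user", "content": "hello"}],
      "options": { "num_ctx": 8192 }
    }'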
I could go on and on with examples of their drama, but needless to say, most everyone in r/LocalLLaMA hates them. Even the llama.cpp maintainers hate them, and they’re nice devs.
You should use llama.cpp’s llama-server as an API endpoint. Or, alternatively, the ik_llama.cpp fork, kobold.cpp, or croco.cpp. Or TabbyAPI as an ‘alternate’ GPU-focused quantized runtime. Or SGLang if you just batch small models. llama-cpp-python, LM Studio; literally anything but ollama.
As for the UI, that’s a muddier answer and totally depends on what you use LLMs for. I use mikupad for its ‘raw’ notebook mode and logit displays, but there are many options. Llama.cpp has a pretty nice built-in one now.
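A minimal sketch of the llama-server route suggested above, with placeholder model path, context size, and port; the same process exposes an OpenAI-compatible API under /v1 and the built-in web UI at the server root:

    # Serve a local GGUF with an explicit context size (-c) instead of a tiny default;
    # -ngl 99 offloads layers to the GPU if one is available.
    # OpenAI-compatible API lives under http://localhost:8080/v1; built-in web UI at the root.
    llama-server -m ./models/some-model.gguf -c 8192 -ngl 99 --port 8080

Anything that expects an OpenAI-style endpoint, including the curl example earlier in the thread, can then be pointed at http://localhost:8080/v1.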