• 1 Post
  • 36 Comments
Joined 9 months ago
Cake day: March 22nd, 2024

  • What are bitnet models and what does that change in a nutshell?

    Read the pitch here: https://github.com/ridgerchu/matmulfreellm

    Basically, using ternary weights (each weight is -1, 0, or +1), all inference-time matrix multiplication can be replaced with much simpler addition and subtraction. This is theoretically more efficient on GPUs, and astronomically more efficient on dedicated hardware (as adders take up a fraction of the silicon area that multipliers do). This would be particularly fantastic for, say, local inference on smartphones or laptop ASICs.
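
    To make that concrete, here's a toy sketch of the idea in plain Python. This is my own illustration, not code from the linked repo: the function names and the tiny example matrix are made up, and real implementations pack the weights and run fused kernels rather than looping like this. It only shows that a matrix-vector product with ternary weights never needs a multiply.

```python
def ternary_matvec(weights, x):
    """Matrix-vector product for weights restricted to -1, 0, +1.

    Uses only additions and subtractions, no multiplications.
    (Hypothetical helper for illustration.)
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: skip, it contributes nothing
        out.append(acc)
    return out


def dense_matvec(weights, x):
    """Ordinary multiply-accumulate version, for comparison."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Made-up example values
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]

print(ternary_matvec(W, x))
print(dense_matvec(W, x))
```

    Both functions give the same result for ternary weights; the first just gets there with an adder instead of a multiplier, which is where the hardware savings would come from.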

    The catch is no one has (publicly) risked a couple of million dollars to test it with a large model, as (so far) training it isn’t more efficient than “regular” LLMs.

    Doesn’t Open AI just have the same efficiency issue as computing in general due to hardware from older nodes?

    No one really knows, because they’re so closed and opaque!

    But it appears that their models perform relatively poorly for their “size.” Qwen is nearly matching GPT-4 in some metrics, yet is probably an order of magnitude smaller, while Google/Claude and some Chinese models are also pulling ahead.


  • The environmental cost of training is a bit of a meme. The details are spread around, but basically, Alibaba trained a GPT-4 level-ish model on a relatively small number of GPUs… probably on par with a steel mill running for a long time, a comparative drop in the bucket compared to industrial processes. OpenAI is extremely inefficient, probably because they don’t have much pressure to optimize GPU usage.

    Inference cost is more of a concern with crazy stuff like o3, but this could dramatically change if (hopefully when) bitnet models come to fruition.

    Still, I 100% agree with this. Closed LLM weights should be public domain, as many good models already are.







  • They’re kinda already there :(. Maybe even worse than Raspberry Pis.

    Intel has all but said they’re exiting the training/research market.

    AMD has great hardware, but the MI300X is not gaining traction due to a lack of “grassroots” software support, and they were too stupid to undercut Nvidia and sell high-VRAM 7900s to devs, or to even prioritize their support in ROCm. Same with their APUs. For all the marketing, they just didn’t prioritize getting them runnable with LLM software.





  • B580 24GB and B770 32GB

    They would be incredible, as long as they’re cheap. Intel would be utterly stupid for not doing this.

    With OpenAPI being backed by so many big names, do you think they will be able to upset CUDA in the future or has Nvidia just become too entrenched?

    OpenAI does not make hardware. Also, their model progress has stagnated, already matched or surpassed by Google, Claude, and even some open source Chinese models trained on far fewer GPUs… OpenAI is circling the drain, forget them.

    The only “real” competitor to Nvidia, IMO, is Cerebras, which has a decent shot due to a silicon strategy Nvidia simply does not have.

    The AMD MI300X is actually already “better” than Nvidia’s best… but they just can’t stop shooting themselves in the foot, as AMD does. Google TPUs are good, but Google-only, like Amazon’s hardware. I am not impressed with Groq or Tenstorrent.



  • It’s complicated.

    So there’s Intel’s own project/library, which is the fastest way to run LLMs on their IGPs and GPUs. But also the hardest to set up, and the least feature packed.

    There’s more than one Intel-compatible llama.cpp ‘backend,’ including the Intel-contributed SYCL one, another PR for AMX support on CPUs, I think another one branded as ipex-llm, and the Vulkan backend that the main llama.cpp devs seem to be focusing on now. The problem is that each of these backends has its own bugs, incomplete features, installation quirks, and things it doesn’t support, while AMD’s ROCm kinda “just works” because it inherits almost everything from the CUDA backend.

    It’s a hot mess.

    Hardcore LLM enthusiasts largely can’t keep up, much less the average person just trying to self-host a model.

    oneAPI is basically a nothingburger so far. You can run many popular CUDA libraries on AMD through ROCm, right now, but that is not the case with Intel, and no devs are interested in changing that because Intel isn’t selling any “3090 class” GPU hardware worth buying.