☆ Yσɠƚԋσʂ ☆

  • 139 Posts
  • 151 Comments
Joined 6 years ago
Cake day: March 30th, 2020

  • We already agree that companies like OpenAI are a problem. That said, even these companies have an incentive to use newer models that perform better in order to reduce their own costs and stay competitive. If OpenAI needs a data centre to do what you can do on consumer grade hardware with a model like Qwen or DeepSeek, they’re not gonna stay in business for very long.

    And yeah, the Global Times article is specifically talking about multimodal LLMs, which is the same type of AI.



  • That’s actually a really interesting point that I hadn’t thought of. LLMs make open source even more useful because they pave the way for users to modify the source themselves even if they’re not technical. If somebody grabs an open source app and wants a specific feature, they can describe what they want in natural language and have a coding agent add it for them. Maybe it won’t be as good as what an actual human dev would do, but if it solves their problem then it’s a net win. This makes the whole open source model of development incredibly powerful, because anybody can adapt software to their needs and create their own customized computing environment.

    I also think that it would make sense to move away from the way applications are currently structured, where the UI is tightly coupled to the backend. It would make far more sense to structure apps using a client/server model with a well defined API between them. That approach makes it possible to write scripts that combine functionality from different apps, the same way we do with scripts in the shell. And this is where LLMs could be really useful. You could have a UI that’s just a canvas with a prompt, with the LLM connecting to the APIs all your apps provide. The LLM could then leverage functionality from different apps and render a UI for the specific request you make. You could also have it create custom UIs for whatever workflow you happen to have.
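
    To make the idea concrete, here’s a rough sketch of the plumbing (all of the app and tool names are made up for illustration, and the “LLM” is faked with a trivial keyword match just to show the shape of it):

    ```python
    # Hypothetical sketch: each app exposes a small API, a registry maps
    # tool names to those endpoints, and the LLM decides which tools to
    # call based on a natural language request.
    def calendar_list_events(day: str) -> list[str]:   # calendar app's API
        return [f"{day}: standup 10:00", f"{day}: design review 14:00"]

    def mail_search(query: str) -> list[str]:          # mail app's API
        return [f"message matching '{query}'"]

    TOOLS = {
        "calendar.list_events": calendar_list_events,
        "mail.search": mail_search,
    }

    def handle_prompt(prompt: str) -> list[str]:
        """Stand-in for the LLM: route the request to the right app API."""
        if "meeting" in prompt or "calendar" in prompt:
            return TOOLS["calendar.list_events"]("today")
        return TOOLS["mail.search"](prompt)

    print(handle_prompt("what meetings do I have today?"))
    ```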


  • > this is definitely fair. i think my big issue with it is the inordinate amount of capital (land, carbon emissions, water) that goes into it. maybe i’ve unfairly associated all ai with openai and gemini and meta.

    I very much expect the whole bubble to pop because these companies still haven’t found a viable business model. I agree the way these big companies approach things is incredibly problematic. At the same time, the best thing to do is to promote development of this tech outside corporate control. We already saw the panic over DeepSeek being open sourced, and the more development happens in the open the less leverage these companies will have. There’s also far more incentive to make open solutions efficient because people want to run them locally on commodity hardware.

    > my understanding of deepseek is that most of their models are trained by engaging in dialogue with existing models. the cost of training and running those models should be taken into account in that case. if it is from scratch that might change things, if the carbon and water numbers are good.

    Sure, but that also shows that you don’t need to train models from scratch going forward. The work has already been done and now it can be leveraged to make better models on top of it.

    > i thought we were talking specifically about language models and photo and video generation and whatnot

    Doing text, image, and video generation is just one application of these models. Another application of multimodal AI is integrating information from different sensors like vision, sound, and tactile feedback, which makes it useful for building world models that robots can leverage to interact with the environment. https://www.globaltimes.cn/page/202507/1339392.shtml
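
    To give a rough sense of what that fusion looks like mechanically, here’s a toy sketch (the dimensions and “encoders” are made up, this isn’t any specific model):

    ```python
    # Toy late-fusion sketch: each modality gets its own encoder, and the
    # projected embeddings are averaged into one shared representation
    # that a downstream world model or policy could consume.
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in "encoders": random projections into a shared 64-dim space.
    W_vision  = rng.normal(size=(64, 2048))  # e.g. ViT features -> shared
    W_audio   = rng.normal(size=(64, 512))   # e.g. spectrogram features -> shared
    W_tactile = rng.normal(size=(64, 32))    # e.g. pressure sensors -> shared

    def fuse(vision, audio, tactile):
        """Project each modality into the shared space and average."""
        parts = [W_vision @ vision, W_audio @ audio, W_tactile @ tactile]
        return np.mean(parts, axis=0)

    state = fuse(rng.normal(size=2048), rng.normal(size=512), rng.normal(size=32))
    print(state.shape)  # (64,) -- one fused state vector per timestep
    ```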






  • I’ve been programming for a long time, and I can tell you that learning to use a language effectively takes a long time in practice. The reality is that it’s not just the syntax you have to learn, but the tooling around the language, the ecosystem, its libraries, best practices, and so on. Then there are families of languages. If you know one imperative language, the core concepts transfer well to another, but they won’t be nearly as useful if you’re working with a functional language. The effort involved in learning languages should not be trivialized. This is precisely the problem LLMs solve: you can focus on what you want to do conceptually, which is a transferable skill, while the LLM knows the language and ecosystem details, which is the part you’d otherwise spend time learning.

    Meanwhile, studies about GPT-3 are completely meaningless today. Efficiency has already improved dramatically, and models that outperform ones which required a data centre even a year ago can now run on your laptop. You can make the argument that the aggregate demand for LLM tools is growing, but that just means these tools are genuinely useful and people reach for them more than the tools they used before. It’s also worth noting that people are still discovering new techniques for optimizing models, and there’s no indication that we’re hitting any sort of plateau here.


  • If you’ve just been using the web UI for DeepSeek, I highly recommend checking out tools that let you run models against the actual codebase you’re working with. It’s a much better experience because the model has a lot more context to work with.

    There are two broad categories of tools. One is a REPL-style interface where you start a chat in the terminal, and the agent manages all the code changes while you prompt it with what you want done. You don’t have as much control here, but the agents tend to do a pretty good job of analyzing the codebase holistically. The two main ones to look at are Aider and Plandex.

    The other approach is editor integration, as seen with Cursor. Here you’re doing most of the driving and high level planning, and then using the agent contextually to add code, like writing individual functions. You have much more granular control over what the agent is doing this way. There’s a chat mode here as well, and you can get the agent to analyze the code, find things in the project, and so on. I find this is an underappreciated aspect: you can use the LLM to find the relevant code you need to change. A couple of projects to look at are Continue and Roo-Code.

    All these projects work with ollama locally, but I’ve found DeepSeek API access is pretty cheap, and you do tend to get better results that way. The obvious caveat is that you’re sending your code to their servers.
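
    For reference, DeepSeek exposes an OpenAI-compatible endpoint, so calling the API directly from a script looks roughly like this (the base URL and model name below are from their docs as I remember them, so double check before relying on this):

    ```python
    # Minimal sketch of calling the DeepSeek API via the openai client
    # (pip install openai). DeepSeek's endpoint is OpenAI-compatible.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_DEEPSEEK_API_KEY",      # from platform.deepseek.com
        base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint
    )

    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a concise code assistant."},
            {"role": "user", "content": "Explain: def f(xs): return [x*x for x in xs if x % 2 == 0]"},
        ],
    )
    print(resp.choices[0].message.content)
    ```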


  • I very much agree with all that. This is already a very useful tool, and it can save you a lot of time once you learn how to apply it effectively. As with any tool, it takes time to develop intuition for the cases where it works well and for how to get the results you want. I get the impression that a lot of people try LLMs out of spite, already convinced the tool is not useful; then they naturally fail to produce good results on the first try and declare it useless.

    As you point out, it’s an excellent tool for learning to work with new languages, discovering tricks for system configuration, and so on. I’ve been doing software development professionally for over 20 years now, and I know some languages well and others not so much. With LLMs, I can basically use any language like an expert. For example, I recently had to work on a JS project, and I hadn’t touched the language in years. I wasn’t familiar with the ecosystem, current best practices, or popular libraries. Using an LLM let me get caught up very quickly.

    I’m also not too worried about the loss of skill or thinking capacity, because the really useful skills lie in understanding the problem you’re trying to solve conceptually and designing a solution for it. High level architecture tends to be the important part, and I find that’s basically where the focus is when working with agents. The LLM can handle the nitty gritty aspects of writing the code, while I focus on the structure and the logic flow. One approach I’ve found very effective is to stub out the functions myself and have the agent fill in the blanks, as in the sketch below. This helps focus the LLM and keeps it from going off into the weeds.
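
    A stubbed out file might look something like this (the functions are made up purely for illustration); the agent only has to implement the bodies:

    ```python
    # Stub-first workflow: I pin down the structure with signatures and
    # docstrings, and the agent fills in the NotImplementedError bodies.
    from dataclasses import dataclass

    @dataclass
    class Order:
        order_id: str
        amount_cents: int
        country: str

    def load_orders(path: str) -> list[Order]:
        """Parse a CSV export of orders, one Order per row; skip malformed rows."""
        raise NotImplementedError  # agent fills this in

    def tax_for(order: Order) -> int:
        """Return tax in cents based on order.country; unknown country -> 0."""
        raise NotImplementedError  # agent fills this in

    def total_with_tax(orders: list[Order]) -> int:
        """Sum amount plus tax across all orders."""
        raise NotImplementedError  # agent fills this in
    ```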

    Another trick I’ve found handy is to ask the agent to first write a plan for the solution. Then I can review the plan and tell the agent to adjust it as needed before implementing. Agents are also pretty good at writing tests, and tests are much easier to evaluate for correctness, because good tests are just independent functions that do one thing and don’t have a deep call stack. My current approach is to get the LLM to write the plan, add the tests, and then focus on making sure I understand the tests and that they pass. At that point I have a fairly high degree of confidence that the code is doing what’s needed. The tests act as a contract for the agent to fill.
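
    To show what I mean by tests being easy to review, here’s the shape of it (continuing the made-up order example, with a throwaway tax_for inlined just so the file runs on its own):

    ```python
    # Small, flat tests that each check one thing are easy to audit even
    # when the implementation isn't. Run with: pytest test_tax.py
    def tax_for(amount_cents: int, country: str) -> int:
        # Throwaway reference implementation: 10% tax for "CA", else none.
        return amount_cents // 10 if country == "CA" else 0

    def test_tax_applied_for_ca():
        assert tax_for(1000, "CA") == 100

    def test_no_tax_for_unknown_country():
        assert tax_for(1000, "??") == 0

    def test_zero_amount_means_zero_tax():
        assert tax_for(0, "CA") == 0
    ```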

    I suspect that programming languages might start shifting in the direction of contracts in general. I can see something like this becoming the norm, where you simply specify the signature of the function, along with constraints like computational complexity and memory usage, and the agent tries to figure out how to fulfill the contract you’ve defined. It would be akin to a genetic algorithm approach, where the agent converges on a solution over time. If that’s the direction things move in, then current skills could be akin to being able to write assembly by hand: useful in some niche situations, but not necessary the vast majority of the time.
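
    Something like this is what I have in mind (the contract decorator here is purely hypothetical, just to illustrate the idea):

    ```python
    # Hypothetical "contract" style: the human specifies the signature and
    # constraints; an agent's job would be to produce an implementation
    # that satisfies them.
    from typing import Callable

    def contract(complexity: str, extra_memory: str):
        """Hypothetical marker recording constraints the agent must meet."""
        def wrap(fn: Callable) -> Callable:
            fn.__contract__ = {"complexity": complexity, "extra_memory": extra_memory}
            return fn
        return wrap

    @contract(complexity="O(n log n)", extra_memory="O(1)")
    def sort_events(events: list[tuple[int, str]]) -> list[tuple[int, str]]:
        """Return events ordered by timestamp (the first tuple element)."""
        raise NotImplementedError  # the agent converges on an implementation
    ```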

    Finally, it’s very helpful to structure things using small components that can be tested independently and composed together to build bigger things. As long as a component functions in the intended way, I don’t necessarily care about the quality of the code internally. I can treat it as a black box as long as it’s doing what’s expected. This is already the approach we take with libraries. We don’t audit every line of code in a library we include in a project. We just look at its surface level API.

    Incidentally, I’m noticing that a functional style seems to work really well here. Having an assembly line of pure functions naturally breaks a problem up into small building blocks that you can reason about in isolation. It’s kind of like putting Lego blocks together. The advantage over something like microservices is that you don’t have to deal with the complexity of orchestration and communication between the services.
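
    Here’s the assembly line idea in miniature (the steps are made up); the pipeline is just function composition, with no orchestration layer:

    ```python
    # Each step is a pure function that's testable in isolation; the
    # pipeline composes them left to right.
    def strip_comments(lines: list[str]) -> list[str]:
        return [ln for ln in lines if not ln.lstrip().startswith("#")]

    def drop_blank(lines: list[str]) -> list[str]:
        return [ln for ln in lines if ln.strip()]

    def count_words(lines: list[str]) -> int:
        return sum(len(ln.split()) for ln in lines)

    def pipeline(*steps):
        def run(value):
            for step in steps:
                value = step(value)
            return value
        return run

    word_count = pipeline(strip_comments, drop_blank, count_words)
    print(word_count(["# header", "", "hello world", "foo bar baz"]))  # 5
    ```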