• Tetragrade@leminal.space · 8 up / 4 down · edited · 2 days ago

    You cannot know this a priori. The commenter is clearly producing a stochastic average of the explanations that best advance their material conditions.

    For instance, many SoTA models are trained using reinforcement learning, so it’s plausible that it’s learned that spamming meaningless tokens can delay negative reward (this isn’t even particularly complex). There’s no observable difference in the response; without probing the weights we’re just yapping.
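
    Toy sketch of the incentive I mean (made-up numbers, not Claude’s actual training setup): with a discount factor, a penalty that lands later is worth less to the policy, so padding with filler tokens before the step that triggers it can look “better” to the learner.

    ```python
    # Toy sketch only: why a delayed penalty can be preferable under discounting.
    # GAMMA and PENALTY are invented values for illustration.
    GAMMA = 0.99     # discount factor
    PENALTY = -1.0   # negative reward delivered when the bad outcome lands

    def discounted_penalty(filler_tokens: int) -> float:
        """Discounted value of the penalty if it is pushed back by
        `filler_tokens` extra steps of meaningless output."""
        return (GAMMA ** filler_tokens) * PENALTY

    print(discounted_penalty(0))    # -1.0: penalty hits immediately
    print(discounted_penalty(50))   # ~-0.61: same penalty, mostly discounted away
    ```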

    • entwine@programming.dev · 8 up / 3 down · 2 days ago

      I’m not sure I understand what you’re saying. By “the commenter” do you mean the human or the AI in the screenshot?

      Also,

      For instance, many SoTA models are trained using reinforcement learning, so it’s plausible that its learned that spamming meaningless tokens can delay negative reward

      What’s a “negative reward”? You mean a penalty? First of all, I don’t believe this makes sense either way because if the model was producing garbage tokens, it would be obvious and caught during training.

      But even if it wasn’t caught, and the model did in fact generate a bunch of garbage that didn’t print out in the Claude UI, and the “simulated progress” explanation was just the model coming up with a plausible story for those garbage tokens, it still does not make it sentient (or even close).

      • Tetragrade@leminal.space · 3 up / 1 down · edited · 2 days ago

        I’m not sure I understand what you’re saying. By “the commenter”

        I was talking about you, but not /srs, that was an attempt @ satire. I’m dismissing the results by appealing to the fact that there’s a process.

        negative reward

        Reward is an AI maths term. It’s the scalar signal the weights get updated to maximise, playing a role similar to “loss” or “error” (which get minimised), if you’ve heard those.
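
        Rough sketch of the analogy (hand-rolled toy update, not any lab’s real training code): supervised learning nudges weights downhill on a loss, while RL nudges them uphill on a reward, so a negative reward acts as a penalty.

        ```python
        # Toy illustration of "reward" vs "loss"; names and numbers are invented.
        LR = 0.01  # learning rate

        def supervised_step(weight: float, loss_grad: float) -> float:
            # Supervised learning: gradient *descent* on a loss/error signal.
            return weight - LR * loss_grad

        def rl_step(weight: float, reward: float, logprob_grad: float) -> float:
            # REINFORCE-style RL: gradient *ascent* on expected reward.
            # A negative reward pushes the sampled behaviour's probability down.
            return weight + LR * reward * logprob_grad

        print(supervised_step(0.5, loss_grad=2.0))          # 0.48
        print(rl_step(0.5, reward=-1.0, logprob_grad=2.0))  # 0.48, same direction
        ```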

        I don’t believe this makes sense either way because if the model was producing garbage tokens, it would be obvious and caught during training.

        Yes, this is also possible; it depends on minute details of the training set, which we don’t know.

        Edit: As I understand it, these models are trained in multiple modes: one where the model is trying to predict text (supervised learning), but also others where it’s given a prompt and the response is sent to another system to be graded, e.g. for factual accuracy. It could learn to identify which “training mode” it’s in and behave differently. Although, I’m sure the ML guys have already thought of that & tried to prevent it.
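
        Very loosely, the two regimes I mean look something like this (every class and method name below is made up for illustration, not anyone’s real pipeline):

        ```python
        import random

        class ToyModel:
            """Stand-in model; the actual weight update is elided."""
            def next_token_loss(self, text: str) -> float:
                return random.random()              # pretend cross-entropy
            def generate(self, prompt: str) -> str:
                return prompt + " ... some generated response"
            def update(self, signal: float) -> None:
                pass                                # weight update omitted

        class ToyGrader:
            """Stand-in for the external system that scores responses."""
            def score(self, prompt: str, response: str) -> float:
                return 1.0 if "response" in response else -1.0  # pretend accuracy check

        def pretraining_step(model: ToyModel, text: str) -> None:
            # Supervised mode: minimise next-token prediction loss on existing text.
            model.update(-model.next_token_loss(text))

        def rl_finetune_step(model: ToyModel, prompt: str, grader: ToyGrader) -> None:
            # Graded mode: generate a response, an external grader returns a
            # scalar reward, and that scalar drives the update.
            response = model.generate(prompt)
            model.update(grader.score(prompt, response))
        ```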

        it still does not make it sentient (or even close).

        I agree; I noted this in my comment. Just saying, this isn’t evidence either way.

        • MadhuGururajan@programming.dev · 1 up / 1 down · 11 hours ago

          I’m sure the ML Guys thought of that & tried to prevent it.

          Deferring to authority is fine as long as you don’t make assumptions about what happened or didn’t happen.

          • Tetragrade@leminal.space · 1 up · edited · 10 hours ago

            I mean, I assume it because it’s a risk that’s obvious even to me, and it’s not my job to think about it all day. I guess they could just be stupid. 🤷