• Tar_Alcaran@sh.itjust.works
      2 days ago

      The big problem with training LLMs is that you need good data, but there’s so much of it that you can’t manually separate all the “good” data from all the “bad”. You have to use the set of all data, plus a much, much smaller set of tagged and marked “good” data.
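
To make that concrete, here is a rough sketch of one way a small tagged set could be used to sort through a much larger untagged one. This is only an illustration, not a description of how any real LLM pipeline works: the function names (`build_vocab`, `quality_score`, `filter_corpus`), the word-overlap score, and the `0.5` threshold are all made up for the example, and real systems would likely use a trained classifier rather than simple vocabulary overlap.

```python
def build_vocab(good_docs):
    """Collect the lowercase words seen in the small hand-tagged "good" set."""
    vocab = set()
    for doc in good_docs:
        vocab.update(doc.lower().split())
    return vocab


def quality_score(doc, vocab):
    """Fraction of a document's words that also appear in the good set."""
    words = doc.lower().split()
    if not words:
        return 0.0  # empty documents get the lowest score
    return sum(w in vocab for w in words) / len(words)


def filter_corpus(all_docs, good_docs, threshold=0.5):
    """Keep documents from the big set that resemble the tagged good set.

    The threshold is an arbitrary assumption chosen for this toy example.
    """
    vocab = build_vocab(good_docs)
    return [d for d in all_docs if quality_score(d, vocab) >= threshold]


# Tiny hand-tagged "good" set versus a (here, very small) untagged corpus.
good = ["the cat sat on the mat", "a dog ran in the park"]
corpus = ["the dog sat in the park", "zxqv click here buy now!!!", ""]

kept = filter_corpus(corpus, good)
print(kept)
```

The point of the sketch is the shape of the problem: nobody labels the big corpus by hand, so whatever the small tagged set gets wrong or leaves out carries over into what is kept from the large one.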