• HereIAm@lemmy.world
    19 days ago

    I understand you’re trying to be nice to minority languages, but if you write research papers, you either limit your audience to your own country or you publish in English (I guess Spanish is pretty worldwide as well). If you set out to read a new paper in your field, I doubt you’d pick up something in Mongolian.

    Even in Sweden I would write a serious paper in English, so that more of the world could read it. Yes, we have textbooks for our courses that are in Swedish, but I doubt there are many books covering LLMs being published in Swedish currently, for example.

    • chloroken@lemmy.ml
      19 days ago

      I’m not “trying to be nice to minority languages”, I’m directly pushing back against the chauvinistic idea that the English Wikipedia is so important that those without it are somehow inferior. There is no “doom spiral”.

      As for scientific papers, it’s called a translation. One can write academic literature in one’s native language and have it translated for more reach. That isn’t the case with Wikipedia, which is constantly being edited.

      • Alaknár@sopuli.xyz
        19 days ago

        I’m not “trying to be nice to minority languages”, I’m directly pushing back against the chauvinistic idea that the English Wikipedia is so important that those without it are somehow inferior. There is no “doom spiral”.

        I think you missed the problem described here.

        The “doom spiral” is not caused by the English Wiki, which has nothing to do with it.

        The problem described is that people who don’t know a “niche” language try to contribute to a niche Wiki by using machine translation/LLMs.

        As per the article:

        Virtually every single article had been published by people who did not actually speak the language. Wehr, who now teaches Greenlandic in Denmark, speculates that perhaps only one or two Greenlanders had ever contributed. But what worried him most was something else: Over time, he had noticed that a growing number of articles appeared to be copy-pasted into Wikipedia by people using machine translators. They were riddled with elementary mistakes—from grammatical blunders to meaningless words to more significant inaccuracies, like an entry that claimed Canada had only 41 inhabitants. Other pages sometimes contained random strings of letters spat out by machines that were unable to find suitable Greenlandic words to express themselves.

        Now, another problem is Model Collapse (or, well, a similar phenomenon strictly in terms of language itself).

        We now have a bunch of “niche” languages’ Wikis containing such errors… that are being used to train machine translators and LLMs to handle these languages. This is contaminating their input data with errors and hallucinations, but since this is the training data, these LLMs consider everything in there as the truth, propagating the errors/hallucinations forward.

        I honestly have no clue where you’re getting anything chauvinistic here. The problem is imperfect technology being misused by irresponsible people.

        • AA5B@lemmy.world
          19 days ago

          Is it even getting misused? Spreading knowledge via machine translation where there are no human translators available has to be better than not translating. As long as there is transparency so people can judge the results…

          And AI training trusting everything it reads is a larger systemic issue, not limited to this niche.

          Perhaps part of the solution is machine-readable citations. Maybe a search engine or AI could provide better results if it knew what was human-generated vs. machine-generated. But even then you have huge gaps: on one side, untrustworthy human content (like comedy), and on the other, machine-generated facts such as those from a database.

          • DoPeopleLookHere@sh.itjust.works
            19 days ago

            Is it even getting misused? Spreading knowledge via machine translation where there are no human translators available has to be better than not translating. As long as there is transparency so people can judge the results

            That assumes the AI is accurate, which is debatable.

            Also, how do you do citations on a translation?

            It’s an interpretation, not a fact.

            • AA5B@lemmy.world
              19 days ago

              Sure, there are limitations. The point still stands: an imperfect machine translation is better than no translation, as long as people understand that it is imperfect.

              Can we afford to let a high bar deprive people of knowledge just because of the language they speak?

              The article complains about the effect of poor machine translations on languages, but the effect of no translations is worse. Yes, those Greenlanders should be able to read all of Wikipedia without learning English, even if the project has no human translators.

              • Euphoma@lemmy.ml
                19 days ago

                Wikipedia already has a button that takes you to another language’s version of the page, which you can then machine translate yourself.

                  • chloroken@lemmy.ml
                    18 days ago

                    Chauvinism is the term you’re seeking. And we all in the West suffer some degree of it.