Re-posted to fix my filename emoji. You can’t make this shit up

  • NeatNit@discuss.tchncs.de · 70 upvotes · 21 days ago (edited)

    You complain about ASCII filenames but a few of the examples are obviously Unicode, namely using emoji, well outside of the ASCII character set. But since you’ve brought up Unicode file names, let me introduce you to bidirectional text!

    If you use Hebrew or Arabic, some of your directories or files will have right-to-left text in them. This is a recipe for disaster.

    If in English you’d have “C:\Users\Adam\Documents\Research\Paper.pdf”, which breaks down to:

    1. C:\
    2. Users\
    3. Adam\
    4. Documents\
    5. Research\
    6. Paper.pdf

    In Hebrew you’d have: “C:\משתמשים\אדם\מסמכים\מחקר\מאמר.pdf”, which breaks down to:

    1. C:\
    2. משתמשים\
    3. אדם\
    4. מסמכים\
    5. מחקר\
    6. מאמר.pdf

    The entire path goes backwards, and the “.pdf” extension ends up visually attached to the “Users” folder (משתמשים) when the path is rendered as one plain string. It’s insane. Fortunately many GUI shells nowadays render each path item separately so they can’t get intermixed like this. Example: [screenshot not preserved]

    But if you copy a path into plain text, it will still look visually wrong, and there is literally nothing that anyone can do about it: as far as Unicode is concerned, this is the correct way to render this text.

    The exact same issues occur in Arabic and the few other RTL languages used in the world. It’s a massive pain.
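For anyone who wants to poke at this, here is a rough Python sketch of what the renderer has to work with. It only inspects Unicode bidi classes with the standard `unicodedata` module; it does not implement the reordering itself. The helper names `first_strong_class` and `describe` are made up for illustration.

```python
import unicodedata
from pathlib import PureWindowsPath


def first_strong_class(text):
    """Return the bidi class of the first strongly directional character, or None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("L", "R", "AL"):
            return bidi
    return None


def describe(path):
    """Pair each path component (in logical order) with its first strong class."""
    parts = PureWindowsPath(path).parts
    return [(part, first_strong_class(part)) for part in parts]


hebrew_path = "C:\\משתמשים\\אדם\\מסמכים\\מחקר\\מאמר.pdf"
for part, direction in describe(hebrew_path):
    print(direction, repr(part))

# The backslash has no strong direction of its own, so between two
# Hebrew names it simply goes along with the right-to-left text.
print(unicodedata.bidirectional("\\"))  # ON (other neutral)
print(unicodedata.bidirectional("."))   # CS (common separator)
```

In memory the components are still in the order you typed them; only the on-screen ordering changes, which is why the file still opens fine even when the path looks scrambled.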

    Edit: oh, and on the command line on Windows, the required characters aren’t even available by default, so you get this lovely thing: [screenshot not preserved]

    • optional@sh.itjust.works · 31 upvotes · 21 days ago

      Why not use

      ꟻbq.משתמשים/אדם/מסמכים/מחקר/מאמר/:ↄ

      instead? If you want to write from right to left, you should go all the way.

    • lemming741@lemmy.world (OP) · English · 12 upvotes, 1 downvote · 21 days ago (edited)

      Excuse me, officer- this guy right here

      this meme was fueled by sleep deprivation, alcohol, and caffeine. any views implied or expressed are not to be taken seriously and may result in side effects such as nausea and vomiting

    • Cethin@lemmy.zip · English · 3 upvotes · 20 days ago

      The way this should work is it’s set as either left-to-right or right-to-left. (C:)/1/2/3.ext or ext.3/2/1/(:C). It shouldn’t render part of it one direction and part of it the other direction logically. It’s probably impossible to fix at this point, but this makes a lot more sense.

      • NeatNit@discuss.tchncs.de · 8 upvotes · 20 days ago (edited)

        Yeah, that is pretty much how it works in some GUIs like in the screenshot, where each slash is replaced by >. But if you represent the path in a string, and put that string in some context that doesn’t know it’s a path and that it should be rendered by some special rules, then it’ll just be subject to the usual Unicode Bidirectional Algorithm (UBA).

        The UBA is a masterpiece, and I’m not being sarcastic. For everyday text with mixed directionality, such as a WhatsApp chat in Arabic/Hebrew with a bit of English or just some numbers mixed in, the UBA’s default output is the ideal way to order the characters.

        The problem is, special cases (such as file paths) just can’t be covered by a universal algorithm. You can insert special characters into the path, namely FSI and PDI (“First Strong Isolate”, U+2068, and “Pop Directional Isolate”, U+2069) to make the text render the way you want under the UBA… But then, when you copy that path, the special characters would still be there, so software would consider them part of the path, and then of course, File Not Found.
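A minimal sketch of that trade-off in Python, assuming you control both the display side and the side that opens the file. `isolate_components` and `strip_bidi_controls` are hypothetical helper names, not any library’s API, and how the isolated string actually looks depends on the renderer.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# Invisible bidi formatting characters, mapped to None so that
# str.translate() deletes them.
BIDI_CONTROLS = dict.fromkeys(map(ord, (
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u200e\u200f\u061c"              # LRM, RLM, ALM
)))


def isolate_components(path, sep="\\"):
    """Wrap every non-empty component in FSI...PDI, for display only."""
    return sep.join(FSI + part + PDI if part else part
                    for part in path.split(sep))


def strip_bidi_controls(text):
    """Remove the formatting characters again before touching the filesystem."""
    return text.translate(BIDI_CONTROLS)


path = "C:\\משתמשים\\אדם\\מאמר.pdf"
shown = isolate_components(path)

print(shown == path)                       # False: this is the File Not Found case
print(strip_bidi_controls(shown) == path)  # True
```

The catch is that stripping is only safe if no real file name contains one of these characters on purpose, which nothing guarantees.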

        • AnarchistArtificer@slrpnk.net · English · 3 upvotes · 20 days ago

          I was already interested based on your first comment but this:

          “The UBA is a masterpiece, and I’m not being sarcastic.”

          has thoroughly piqued my interest. Thank you for being an opinionated nerd on the internet.

  • rtxn@lemmy.world · 43 upvotes · 21 days ago

    Fun fact: C:\: is a perfectly valid NTFS path. Windows won’t let you create it, though, because Windows doesn’t even fully support the NTFS specification. That’s why you have to specify the windows_names option when mounting an NTFS filesystem on Linux if you want the driver to refuse names that Windows can’t handle.
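For a feel of which names fall into that gap, here is a rough Python check based on the commonly documented Win32 naming rules (reserved characters, control characters, trailing space or dot, reserved device names). It is an illustration only: `is_win32_safe_name` is a made-up helper, and it is not claimed to match what windows_names enforces rule for rule.

```python
RESERVED_CHARS = set('<>:"/\\|?*')
RESERVED_NAMES = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def is_win32_safe_name(name):
    """Return True if a single file or folder name avoids the usual Win32 pitfalls."""
    # Empty names and names ending in a space or a dot are rejected.
    if not name or name[-1] in " .":
        return False
    # Reserved punctuation and ASCII control characters are rejected.
    if any(ch in RESERVED_CHARS or ord(ch) < 32 for ch in name):
        return False
    # Device names are reserved even when followed by an extension (NUL.txt).
    stem = name.split(".", 1)[0].upper()
    return stem not in RESERVED_NAMES


for candidate in (":", "Paper.pdf", "מאמר.pdf", "NUL.txt", "notes."):
    print(repr(candidate), is_win32_safe_name(candidate))
```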

  • DarkAri@lemmy.blahaj.zone · 7 upvotes · 21 days ago

    I guess the most annoying part of it to me is that you have to put your locations in quotes if you use them in a shell. I do use spaces in file names sometimes, except when writing code or something; then I use underscores.
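If the shell command is being built from a script, the quoting can be automated. A small Python sketch using the standard library’s `shlex.quote`, which targets POSIX shells (not cmd.exe or PowerShell); `shell_command` is just an illustrative helper name.

```python
import shlex


def shell_command(program, *paths):
    """Build a POSIX shell command line, quoting each path only when needed."""
    return " ".join([program, *(shlex.quote(p) for p in paths)])


print(shell_command("ls", "my file.txt", "my_file.txt"))
# ls 'my file.txt' my_file.txt
```

The name with underscores passes through untouched, while the one with a space gets quoted. Passing an argument list to `subprocess.run` instead of a single string avoids the quoting question altogether.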

  • S_H_K@lemmy.dbzer0.com · 3 upvotes · 21 days ago

    I remember hiding Wacky Wheels for DOS using ASCII characters in the folder name. The professor tried to delete it using Win 3.1… Hahahahaha… Good luck with that!

  • rdri@lemmy.world · 1 upvote · 19 days ago

    I hate software that doesn’t support Unicode, especially since it’s not difficult to implement. At one point I wrote a DLL that hacked the way one app handled filenames, forcing it to use CreateFileW instead of CreateFileA. That alone basically let it support Unicode filenames.
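A rough Python illustration of why the A/W distinction matters. It only approximates the idea: the real “A” functions convert through whatever the system’s active code page is, with their own substitution rules, while the “W” functions take UTF-16. The code page `cp1252` and both helper names are assumptions picked for the example.

```python
def as_ansi_bytes(name, codepage="cp1252"):
    """Roughly what an ANSI code page can hold; unmappable characters become '?'."""
    return name.encode(codepage, errors="replace")


def as_wide_bytes(name):
    """UTF-16LE, the encoding the wide (W) Windows APIs expect."""
    return name.encode("utf-16-le")


name = "מאמר.pdf"
print(as_ansi_bytes(name))                              # b'????.pdf'
print(as_wide_bytes(name).decode("utf-16-le") == name)  # True
```

Once the name has been squeezed through the code page, the original letters are gone, so no amount of cleverness further down the line can find the file again.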