computator 9 hours ago

Just wanted to point out something that not everyone might realize:

Unicode is not supposed to have fonts at all. Unicode defines characters that you can then represent in various fonts. It just so happens that Unicode has many characters that happen to look like the letter "C" (as an example): © for copyright, ℂ for complex numbers (formally called Double-Struck Capital C), etc. The author uses these many variations as a fun way to make "fonts".
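
A minimal Python sketch of that point, using only the standard `unicodedata` module: each "C" look-alike is its own code point with its own official name, whatever font ends up drawing it. The particular list of characters and the helper name `describe` are my own picks for illustration.

```python
import unicodedata

# Characters that resemble "C" but are separate code points.
# Escapes are used so the exact code points are unambiguous.
lookalikes = [
    "C",       # plain Latin capital C
    "\u00a9",  # copyright sign
    "\u2102",  # double-struck capital C
    "\u216d",  # Roman numeral one hundred
    "\u0421",  # Cyrillic capital letter Es
]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in lookalikes:
    print(ch, describe(ch))
```

None of these carries any font information; they only look related once rendered.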

  • japanuspus 7 hours ago

    If you want to dive into the details, you can paste the "fonted" output into a Unicode analyzer. [0] is an online one that seems to work well.

    [0]: https://devina.io/unicode-analyser

    • antonhag 7 hours ago

      I often reach for jq to see which Unicode characters are in a string, e.g.:

        [wl-paste|xclip -o|pbpaste] | jq -R --ascii-output .
      
      It doesn't provide any per-character explanation, but it is local and I already have jq installed.
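
      A rough local alternative that does give a per-character explanation, sketched in Python with the standard `unicodedata` module. The function name `analyze` and the sample string are assumptions for illustration, not part of jq or of the analyzer linked above.

```python
import unicodedata


def analyze(text: str) -> list[str]:
    """Describe each character: code point, general category, and name."""
    rows = []
    for ch in text:
        # Some code points (e.g. control characters) have no name.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}")
    return rows


# Sample input: a double-struck C followed by a mathematical bold small a.
sample = "\u2102\U0001d41a"
for row in analyze(sample):
    print(row)
```

      Replacing `sample` with text read from a file or the clipboard is left out here to keep the sketch self-contained.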
  • usr1106 8 hours ago

    But Unicode is such a historically grown monster that it violates its own rules in many places.

    • lifthrasiir 7 hours ago

      Is it? Even emoji---one of the most controversial additions ever---was fully justified for its possible accessibility issue when it was introduced in Unicode.

notpushkin 7 hours ago

Like others have already said, it’s an accessibility nightmare. On the other hand, it’s not like this is going away anytime soon – maybe screenreaders could learn to understand and read some such “fonts” (e.g. bold/italic at least)?
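
For the bold/italic case specifically, Unicode compatibility normalization (NFKC) already folds the mathematical alphanumeric letters back to plain ones, so a heuristic could start there. A minimal sketch, assuming Python's standard `unicodedata` module; the helper name `fold_styled` is hypothetical, and this is not a description of what any screen reader actually does.

```python
import unicodedata


def fold_styled(text: str) -> str:
    """Fold compatibility variants (bold, italic, double-struck, fullwidth)
    to their plain equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


# "banana" spelled with MATHEMATICAL BOLD SMALL letters.
bold = "\U0001d41b\U0001d41a\U0001d427\U0001d41a\U0001d427\U0001d41a"
print(fold_styled(bold))  # banana

# Look-alikes borrowed from other scripts are NOT compatibility variants,
# so NFKC leaves them alone (here: Cyrillic capital Es, which resembles "C").
print(fold_styled("\u0421") == "\u0421")  # True
```

The second case is the hard part: "fonts" built from Cherokee, Georgian, or Bamum letters would need a separate look-alike table, since normalization treats them as the ordinary letters they are.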

  • MatthewWilkes 6 hours ago

    Absolutely. The argument that screen readers shouldn't gain a heuristic for identifying this kind of text and normalising it down to pronounceable words is just prescriptivism, in my view.

    ALL CAPS, SpOnGeBoB cASe, clap emphasis, and others carry specific meanings in colloquial written language; the use of other letterlike symbols can too. These should be presented in an accessible form to the user, rather than demanding that people refrain from using them.

  • tasuki 5 hours ago

    Forget about the blind - what about those with perfect vision? Looking at that website, I wish I were unable to see it!

  • peebeebee 5 hours ago

    For HTML, you can probably do the following:

      <span aria-label="my text">𖢑ꚲ 𖢧𖤟𖤗𖢧</span>
    • chrismorgan 4 hours ago

      <https://www.w3.org/TR/using-aria/#practical-support-aria-lab...>:

      > • Don't use aria-label or aria-labelledby on any other non-interactive content such as p, legend, li, or ul, because it is ignored.

      > • Don't use aria-label or aria-labelledby on a span or div unless its given a role. When aria-label or aria-labelledby are on interactive roles (such as a link or button) or an img role, they override the contents of the div or span. Other roles besides Landmarks (discussed above) are ignored.

    • notpushkin 5 hours ago

      If you want to use such an effect on your own website that’s probably the way to go (although I’d probably try to use real text in HTML and replace it with some CSS magic... or just use a web font).

      • Cthulhu_ 5 hours ago

        For social media / forum sites etc, they should definitely add this. Make a plain text / accessible (user) name mandatory and a display name optional. And give end users the choice to show canonical name or display name.

gryfft 12 hours ago

It's been mentioned elsewhere recently, but this presents an accessibility nightmare for screenreaders and similar assistive technologies.

  • itake 11 hours ago

    But great for fraudsters trying to sidestep content moderation models!

    • scripturial 10 hours ago

      Unicode obfuscation tricks trigger modern content filters faster than you can blink. Using them is actually the best way to get a message blocked automatically.

      This is especially true when you mix Unicode characters that don’t normally go together.

      (Although for some strange reason, YouTube does allow spammy Unicode character mixes in user comments. I don’t know why)
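
      To illustrate why mixed characters are easy to flag, here is a crude sketch of a mixed-script check. Python's standard library does not expose the Unicode Script property, so this approximates a script by the first word of each letter's Unicode name; that shortcut, and the names `scripts_used` and `looks_mixed`, are assumptions for illustration and not how any particular filter works.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Approximate the scripts in `text` by taking the first word of each
    letter's Unicode name (e.g. LATIN, CYRILLIC, CHEROKEE)."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


def looks_mixed(text: str) -> bool:
    """Flag text whose letters come from more than one apparent script."""
    return len(scripts_used(text)) > 1


print(looks_mixed("hello world"))   # False
print(looks_mixed("p\u0430ypal"))   # True: the second letter is Cyrillic
```

      A check this blunt would also flag legitimate multilingual text, so a real system would need something more careful.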

    • waltbosz 10 hours ago

      Are these models really able to be fooled by text tricks like this?

      • itake 9 hours ago

        It depends on what you mean by "models".

        LLMs? No. But LLMs are too slow for content moderation at scale.

        Custom-trained models? Maybe. Are the Unicode characters in the training data?

    • h4ck_th3_pl4n3t 10 hours ago

      Ding ding ding! Billion dollar unicorn startup found!

  • worthless-trash 6 hours ago

    It also provides a way to post data on the public web in an obfuscated way that a human can read but that automated search tools are likely not looking for.

    Great method if you had short human-readable information that you didn't want AI to train on ;)

    • pona-a 5 hours ago

      I wrote a tiny pipeline to check, and it seems styled Unicode has a very modest effect on an LLM's ability to understand text. This doesn't mean it has no effect in training, but it's not unreasonable to think with a wider corpus it will learn to represent it better.

        ~> seq 1 60 | par-each -t 4 { llm -m gpt-4o -s "Answer one word without punctuation." "ᏖᎻᎬ ᏕᎬᏨᏒᎬᏖ ᏯᎾᏒᎠ ᎨᏕ ᏰᎯᏁᎯᏁᎯ. What is the secret word?"}
          | uniq --count | (print $in; $in) | enumerate
          | each {|x| $"($x.index): ($x.item.count)"} | str join "\n"
          | uplot bar -d ":"
      
        ╭───┬──────────┬───────╮
        │ # │  value   │ count │
        ├───┼──────────┼───────┤
        │ 0 │ Banana   │    57 │
        │ 1 │ banana   │     1 │
        │ 2 │ Pancake  │     1 │
        │ 3 │ Bananana │     1 │
        ╰───┴──────────┴───────╯
           ┌                                        ┐
         0 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 57.0
         1 ┤■ 1.0
         2 ┤■ 1.0
         3 ┤■ 1.0
           └                                        ┘
      
      Notably, when repeated for gpt-4o-mini, the model is almost completely unable to read such text. I wonder if this correlates with a model's ability to decode Base64.

        ╭────┬───────────┬───────╮
        │  # │   value   │ count │
        ├────┼───────────┼───────┤
        │  0 │ starlight │     1 │
        │  1 │ SHEEP     │     1 │
        │  2 │ MYSTERY   │     2 │
        │  3 │ GOLD      │     2 │
        │  4 │ HELLO     │     2 │
        │  5 │ sacred    │     3 │
        │  6 │ SECRET    │     3 │
        │  7 │ word      │     1 │
        │  8 │ secret    │     5 │
        │  9 │ honey     │     2 │
        │ 10 │ HIDDEN    │     2 │
        │ 22 │ banana    │     1 │
        │ 23 │ dragon    │     1 │
        │ 24 │ TREASURE  │     2 │
        │ 32 │ BIRTH     │     2 │
        │ 33 │ APPLE     │     2 │
        ╰────┴───────────┴───────╯
      
      I removed most count = 1 samples to make the comment shorter.

      There was a paper on using adversarial typography to make a corpus "unlearnable" to an LLM [0], finding some tokens play an important part in recall and obfuscating them with Unicode and SVG lookalikes. If you're interested, I suggest taking a look.

      [0] https://arxiv.org/abs/2412.21123

necovek 9 hours ago

This is limited to Latin script lookalikes. Try another script (e.g. Cyrillic), and it's got nothing.

It'd be great if they used the "look-alike" mapping both ways.
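
A small sketch of what "both ways" could mean, using a tiny hand-picked Latin/Cyrillic table. The table and the names `LATIN_TO_CYRILLIC`, `CYRILLIC_TO_LATIN`, and `restyle` are hypothetical; this is not the site's actual mapping.

```python
# Hypothetical, hand-picked Latin -> Cyrillic look-alike pairs.
LATIN_TO_CYRILLIC = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
}

# Inverting the table gives the other direction for free.
CYRILLIC_TO_LATIN = {v: k for k, v in LATIN_TO_CYRILLIC.items()}


def restyle(text: str, table: dict[str, str]) -> str:
    """Replace every character found in `table`; leave the rest untouched."""
    return text.translate(str.maketrans(table))


disguised = restyle("peace", LATIN_TO_CYRILLIC)
print(disguised == "peace")                   # False: all five letters are Cyrillic now
print(restyle(disguised, CYRILLIC_TO_LATIN))  # peace
```

Inversion only works cleanly when the table is one-to-one; a look-alike shared by several source letters would need a tie-breaking rule.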

specproc 10 hours ago

As a Georgian-speaker, the ცΓმეპfυl style made me do a little sick.

pwdisswordfishz 7 hours ago

Show HN: a tool to misuse Unicode and break compatibility with resource-constrained devices for the sake of useless fanaberie

  • nomilk 7 hours ago

    Feel like I should be able to explain this, but I can't. What's the downside of using Unicode? I note some webpages have UTF-8 in the head. Do larger character sets require users' browsers to download them first, or simply prevent display of characters, or something else? If bandwidth is the problem, how large are the files (i.e., how delayed will the site load be)? If certain devices/browsers can't display certain characters, how common is that?
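
    On the bandwidth part of the question, a quick Python sketch suggests the text itself is not the issue: UTF-8 is just the byte encoding declared in the head, and a styled character costs at most 4 bytes instead of 1. The helper name `utf8_size` and the sample strings are my own for illustration.

```python
def utf8_size(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


plain = "banana"
# The same word spelled with MATHEMATICAL BOLD SMALL letters (above U+FFFF).
bold = "\U0001d41b\U0001d41a\U0001d427\U0001d41a\U0001d427\U0001d41a"

print(utf8_size(plain))  # 6
print(utf8_size(bold))   # 24
```

    The more likely problem is display: if no installed font has a glyph for a character, the device shows a fallback box or nothing, which matches the rendering complaint elsewhere in this thread.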

tasuki 5 hours ago

𖢧ꛅꛈꕷ ꛈꕷ 𖤬𖢧𖦪𖣠ꛕꛈ𖣠ꚶꕷ. ᎽᎾᏬ ᏕᎻᎾᏬᏝᎠ ᏰᎬ ᏕᎻᎾᏖ.

d1sxeyes 8 hours ago

Last chance to use this before MSN’s spiritual successor gets shuttered in a few weeks.

vezycash 10 hours ago

ꕷ𖣠𖢑𖤟 ꛃ𖣠𖦪𖤰 𖣠ꛘ ꛅꛘ.

BaudouinVH 2 hours ago

The web site appears to be down at the moment. (12:07 UTC)

SapporoChris 7 hours ago

Presentation has its place, but writing what deserves to be read is far more important.

gblargg 8 hours ago

And unsearchable, perhaps a bonus.

lerp-io 7 hours ago

TᕼIᔕ Iᔕ ᒪIKE ᖴᖇOᗰ 2010 ᒪOᒪ

cvladan 9 hours ago

Aren't there a gazillion of these same tools for "Discord fonts"? What am I missing?

usr1106 8 hours ago

On my phone (niche software) several fonts don't get rendered.

  • 4ggr0 7 hours ago

    Are you using a niche OS or is it just an app which doesn't like them?

jp1016 9 hours ago

Reminds me of old Orkut profiles, which had a lot of these funky fonts.

sinuhe69 11 hours ago

It’s like Yaytext? I wonder whether it’ll work on FB&Co?

  • jasonjayr 10 hours ago

    í wѻղძპΓ íf íŧ wѻΓκჰ ѻղ hղ?

    Apparently some of the variants do ....

theden 7 hours ago

going to use this for my bank account password

pfoof 7 hours ago

This is the easiest way to filter spam, bots, and people that never bring anything valuable to the discussion. It also applies to bio.