Commit 1aa8f1a

cb341 and claude committed
08.06.2026
08.06.2026 draft2 Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> 08.06.2026 draft3 Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> 08.06.2026 draft4 Formatting Polish Polish Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent f72117c commit 1aa8f1a

1 file changed

Lines changed: 219 additions & 0 deletions

pages/threads.md

@@ -9,6 +9,225 @@ math: true

Conversations, thoughts, half-ideas, things I am starting to explore.

## 08.06.2026

while studying DNS for exam prep, i ended up browsing the list of all top-level domains:

<https://www.iana.org/domains/root/db>

a few things stood out.
first, some testing-related entries:

- `.test` is not listed
- `.invalid` is not listed

because they are reserved and not delegated, they can never resolve in the public DNS. [RFC 6761](https://datatracker.ietf.org/doc/html/rfc6761) marks them special-use: [`.test` (6.2)](https://datatracker.ietf.org/doc/html/rfc6761#section-6.2) and [`.invalid` (6.4)](https://datatracker.ietf.org/doc/html/rfc6761#section-6.4).

then some amusing generic TLDs:

- `.foo` - generic, Charleston Road Registry Inc.
- `.bar` - generic, Punto 2012 Sociedad Anonima Promotora de Inversion de Capital Variable

and then:

- `.app` - generic, Charleston Road Registry Inc. (Google's registry entity)
- `.mov` - generic, Charleston Road Registry Inc. (Google's registry entity)
- `.zip` - generic, Charleston Road Registry Inc. (Google's registry entity)

`.app` feels normal enough.
`.mov` and `.zip` immediately felt odd.

they are file extensions first, domains second.

historically:

```text
invoice.zip -> file
```

now:

```text
invoice.zip -> file
invoice.zip -> domain
```

same string.
two meanings.

there is even a recent paper dedicated to this:

> "The namespace for filenames and DNS names has overlapped since the introduction of DNS in 1985: .com was the original binary format used for DOS and CP/M systems. Recently the introduction of gTLDs such as .zip and .mov, coupled with the growing prevalence of web resources, has ignited new concerns about potential issues related to DNS and filename confusion."
>
> <https://arxiv.org/abs/2604.04805>

another quote:

> "document[.]zip ... could refer to either a compressed archive file ... or a website"

and:

> "A potent scenario for confusion occurs when filenames are automatically hyperlinked"

the paper does **not** claim that `.zip` broke the Internet.
it does show that the confusion is observable across software ecosystems.

the authors write:

> "harms primarily arise when filenames are mistakenly interpreted as DNS names"

and:

> "the DNS query itself leaks information about local filenames"

the technical argument is simple:

**DNS does not care.**
**humans do.**

DNS sees valid labels:

```text
.com
.zip
.mov
.app
```

while humans have spent decades learning that `something.zip` is a compressed archive. the protocol and the user now disagree.
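
a minimal sketch of that disagreement, not a real resolver or linkifier. the regex encodes the classic letters-digits-hyphen hostname rule; the two sets are hand-picked samples for illustration (assumptions, not the IANA list or any real file-type registry), and `readings` is a hypothetical helper name.

```python
import re

# One DNS label under the classic "letters, digits, hyphen" hostname rule:
# 1-63 characters, no leading or trailing hyphen.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

# Hand-picked samples for illustration only (assumptions, not complete lists).
SAMPLE_TLDS = {"com", "zip", "mov", "app"}
SAMPLE_EXTENSIONS = {"com", "zip", "mov", "pdf", "docx", "exe", "jpg"}


def is_valid_hostname(name: str) -> bool:
    """Syntax check only: every dot-separated label must match LABEL."""
    labels = name.rstrip(".").split(".")
    return len(name) <= 253 and all(LABEL.match(label) for label in labels)


def readings(name: str) -> list[str]:
    """Return every way the string could plausibly be read."""
    if "." not in name:
        return []
    suffix = name.rsplit(".", 1)[-1].lower()
    found = []
    if suffix in SAMPLE_EXTENSIONS:
        found.append("file")
    if suffix in SAMPLE_TLDS and is_valid_hostname(name):
        found.append("domain")
    return found


print(readings("invoice.zip"))  # ['file', 'domain']
print(readings("invoice.pdf"))  # ['file']
print(readings("example.app"))  # ['domain']
```

nothing in the syntax check can tell the two readings apart. only the suffix lists decide, and for `.zip` both lists say yes.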

some examples of abuse:

<https://www.fortinet.com/blog/industry-trends/threat-actors-add-zip-domains-to-phishing-arsenals>

---

another thing i noticed is the number of non-ASCII TLDs:

* `.公司` - generic, China Internet Network Information Center (CNNIC)
* `.联通` - generic, China United Network Communications Corporation Limited
* `.vermögensberatung` - generic, Deutsche Vermögensberatung Aktiengesellschaft DVAG
* `.இந்தியா` - country-code, National Internet Exchange of India
* `.москва` - generic, Foundation for Assistance for Internet Technologies and Infrastructure Development (FAITID)

i like this.

the Internet should not be limited to ASCII.
but it introduces another class of ambiguity.
my first question was:

> can Unicode only be used in the TLD?

no.

it can appear in any label of an internationalized domain name.
this leads directly to homograph attacks.

example:

```text
paypal.com
```

leading with [`U+0070 LATIN SMALL LETTER P`](https://unicode-explorer.com/c/0070).

versus:

```text
рaypal.com
```

leading with [`U+0440 CYRILLIC SMALL LETTER ER`](https://unicode-explorer.com/c/0440).

they look almost identical.
only the code points differ.
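
the difference is easy to make visible with the Python standard library. a small sketch: the spoofed name is written with a `\u0440` escape so the Cyrillic letter cannot be lost in copy-paste. one assumption to keep in mind: Python's built-in `idna` codec implements the older IDNA 2003 rules, which may differ from what a browser applies, though the ASCII form of this particular label should come out the same. `first_char` is a hypothetical helper name.

```python
import unicodedata

latin = "paypal.com"
spoof = "\u0440aypal.com"  # first letter is U+0440, not U+0070


def first_char(name: str) -> str:
    """Describe the first character by code point and Unicode name."""
    ch = name[0]
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(first_char(latin))  # U+0070 LATIN SMALL LETTER P
print(first_char(spoof))  # U+0440 CYRILLIC SMALL LETTER ER
print(latin == spoof)     # False

# What actually goes on the wire is the ASCII-compatible (Punycode) form.
ascii_form = spoof.encode("idna").decode("ascii")
print(ascii_form)         # xn--aypal-uye.com
```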
139+
140+
the Unicode Consortium uses essentially this example:
141+
142+
> Suppose that you get an email notifying you that your paypal.com account has a problem. You, being a security-savvy user, realize that it might be a spoof ... But actually it is going to a spoof site that has a fake "paypal.com", using the Cyrillic letter that looks precisely like a 'p'. You use the site without suspecting, and your password ends up compromised.
143+
>
144+
> <https://www.unicode.org/L2/L2005/05110-tr36-draft3/tr36-3r.html>
145+
146+
same pattern as `.zip`.
147+
148+
the computer sees one thing.
149+
the human sees another.
150+
151+
that difference is what enables IDN homograph attacks.

**how do browsers mitigate this?**

when a label trips the browser's IDN filter, the browser shows the Punycode form (`xn--...`) instead of the deceptive Unicode ([example: Chromium](https://chromium.googlesource.com/chromium/src/+/main/docs/idn.md)).

two cases:

1. **mixed-script:** `рaypal` (one Cyrillic `р`)
   trips the filter → shown as `xn--aypal-uye.com` → caught.
2. **whole-script:** `аррӏе` (all Cyrillic)
   slips past it → stays Cyrillic, reads as `apple.com` → missed.

the first is the example i used above. the second is the dangerous one, still live as Xudong Zheng's 2017 proof-of-concept.

the link below reads `apple.com`. it does not go there:

[аррӏе.com](https://www.xn--80ak6aa92e.com/) ([writeup](https://www.xudongz.com/blog/2017/idn-phishing/))

a modern browser shows the Punycode now, because the filter learned to flag lookalikes of *known* top domains. in 2017 it showed `apple.com`. a confusable of a less famous domain can still slip through.
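
the two cases can be told apart with a rough script check. this is only a sketch and not Chromium's actual algorithm: the Python standard library has no Unicode script property, so the first word of each character's Unicode name stands in for its script (an approximation that happens to work for Latin and Cyrillic letters). `scripts` and `is_mixed_script` are hypothetical helper names, and both labels are written as escapes to keep the code points unambiguous.

```python
import unicodedata


def scripts(label: str) -> set[str]:
    """Approximate the scripts in a label via the first word of each name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """True when letters from more than one (approximate) script appear."""
    return len(scripts(label)) > 1


mixed = "\u0440aypal"                    # Cyrillic ER followed by Latin "aypal"
whole = "\u0430\u0440\u0440\u04cf\u0435"  # five Cyrillic letters that read as "apple"

print(sorted(scripts(mixed)), is_mixed_script(mixed))  # ['CYRILLIC', 'LATIN'] True
print(sorted(scripts(whole)), is_mixed_script(whole))  # ['CYRILLIC'] False

# The whole-script label still has a distinct ASCII form on the wire.
print("xn--" + whole.encode("punycode").decode("ascii"))  # xn--80ak6aa92e
```

a mixed-script rule alone catches the first label and passes the second, which is why the second case needs a separate check against lookalikes of known domains.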

inspect any conversion yourself: <https://www.punycoder.com/>

---

while reading about `.zip`, i kept noticing the same tradeoff:

> small local benefit. large distributed cost.

more examples:

- **Office macros: automation ↔ malware delivery**
  Microsoft now blocks internet-sourced macros by default, citing their routine abuse for ransomware and remote-access trojans.
  <https://techcommunity.microsoft.com/blog/microsoft_365blog/helping-users-stay-safe-blocking-internet-macros-by-default-in-office/3071805>

- **SMS 2FA: easy deployment ↔ SIM swapping**
  all five US prepaid carriers studied used authentication challenges an attacker could easily subvert (Lee, Kaiser, Mayer, Narayanan, SOUPS 2020). NIST 800-63B restricts SMS as an authenticator.
  <https://www.usenix.org/conference/soups2020/presentation/lee>
  <https://pages.nist.gov/800-63-4/sp800-63b.html>

- **MIME sniffing: compatibility ↔ content-sniffing XSS**
  a browser that guesses a file's type can be tricked into running an uploaded "image" as HTML (Barth, Caballero, Song, IEEE S&P 2009). that work informed the WHATWG MIME Sniffing standard.
  <https://www.adambarth.com/papers/2009/barth-caballero-song.pdf>
  <https://mimesniff.spec.whatwg.org/>

- **email tracking: rich HTML ↔ surveillance on open**
  85% of bulk emails embed third-party content; ~29% leak your address to a third party the moment you view them (Englehardt, Han, Narayanan, PoPETs 2018).
  <https://petsymposium.org/popets/2018/popets-2018-0006.php>
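
the MIME sniffing item is the easiest of these to sketch in code. the toy below is not any real browser's algorithm and not the WHATWG one: `upload_filter_accepts` and `naive_sniff` are hypothetical names, and the byte patterns are assumptions chosen for illustration. it only shows the shape of the problem: the upload filter and the sniffer look at the same bytes and reach different answers.

```python
GIF_MAGIC = (b"GIF87a", b"GIF89a")
HTML_HINTS = (b"<html", b"<script", b"<body")


def upload_filter_accepts(data: bytes) -> bool:
    """Hypothetical server-side check: trusts the leading magic number."""
    return data.startswith(GIF_MAGIC)


def naive_sniff(data: bytes, declared: str) -> str:
    """Toy sniffer: HTML-looking bytes near the start override the declared type."""
    head = data[:512].lower()
    if any(hint in head for hint in HTML_HINTS):
        return "text/html"
    return declared


# A file that starts like a GIF but carries markup right after the signature.
payload = b"GIF89a<script>alert(1)</script>"

print(upload_filter_accepts(payload))      # True
print(naive_sniff(payload, "image/gif"))   # text/html
```

the server believes it stored an image. the toy sniffer would hand the same bytes to an HTML parser.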

different technologies.

same shape.

the feature owner gets the benefit.
everyone else inherits the complexity.

---

for `.zip`, i still struggle to see the upside.

most arguments eventually reduce to branding.

the downside is permanent ambiguity in a namespace that billions of people already associate with files.

maybe the cost is insignificant.
maybe nobody notices.

but if someone proposed:

```text
.pdf
.docx
.exe
.jpg
```

as top-level domains today, most engineers would (hopefully) immediately see the problem.

`.zip` and `.mov` feel different mostly because they already (unfortunately) exist.

## 02.06.2026

exam prep:
