Skip to content

Hash hosts#15

Merged
attilaolah merged 2 commits intomasterfrom
hash
Jan 25, 2015
Merged

Hash hosts#15
attilaolah merged 2 commits intomasterfrom
hash

Conversation

@attilaolah
Copy link
Copy Markdown
Contributor

Instead of keeping a map of hosts, we can hash them and keep the hash sums to save space. Hashes are arrays, which means a fixed size, which is easier to manage and behaves more predictably wrt memory consumption.

After some profiling with the hosts file from http://hosts-file.net/download/hosts.txt, I noticed the following:

Memory string [Mb] [8]byte [Mb] improvement
Alloc 107.87 60.56 43.86%
TotalAlloc 215.78 209.75 2.79%
Sys 131.24 67.86 48.29%

The runtime for parsing the entire host file did not change noticeably (it stayed around 3.4 seconds).

So far the only downside of this approach seems to be that there is no way to list blocked hosts in the web UI. Because of that, we can only use this approach for the public hosts files (which are the huge ones anyway, private hosts files will likely be much smaller).

Closes #9, for now (we could still investigate that later).

@attilaolah attilaolah added this to the Gopher Gala milestone Jan 25, 2015
@attilaolah
Copy link
Copy Markdown
Contributor Author

Note that I'm halving the MD5 sum here, reducing it to 8 bytes (64) bits instead of 16 bytes. This gives us a chance of collision of one in sqrt(2⁶⁴), which is roughly one in four billion (1/4294967296). A hosts file containing that many hosts would be more than a gigabyte.

@attilaolah attilaolah self-assigned this Jan 25, 2015
@attilaolah
Copy link
Copy Markdown
Contributor Author

Out of curiosity, I also tried using uint64 as the may key, which is also 8 bytes. It behaves exactly like [8]byte, both speed and memory wise.

I'll merge this, seems like nothing broke.

@attilaolah attilaolah merged commit fe1697f into master Jan 25, 2015
@attilaolah attilaolah deleted the hash branch January 25, 2015 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Blocklist Tree

1 participant