Speed up attribute lookup #651

Merged: philss merged 1 commit into philss:main from preciz:speed_up_attribute_lookup_2 on Feb 27, 2026

@preciz commented Dec 18, 2025

Contributor

Improvements:
exact attribute => ~5% faster, ~50% lower memory usage
attribute present => ~10% faster, ~40% lower memory usage
attribute includes => ~100% faster, ~60% lower memory usage

This change uses more built-in functions instead of Enum, and uses String.contains?/2 to check for a possible match before performing String.split.
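
The String.contains? pre-check can be sketched roughly as follows. This is a minimal illustration of the idea, not the code from this PR: the module and function names are hypothetical, and it assumes the attribute value is a binary split on whitespace.

```elixir
defmodule AttrIncludesSketch do
  # Hypothetical helpers for illustration only; not Floki's actual code.

  # Naive version: always splits the attribute value on whitespace,
  # allocating a list of words even when no match is possible.
  def includes_naive?(value, word) when is_binary(value) and is_binary(word) do
    word in String.split(value)
  end

  # Pre-checked version: String.contains?/2 is a plain substring test, so
  # values that cannot contain the word skip the split entirely. The split
  # is still needed afterwards to reject partial matches such as
  # "wikitable-wide" when looking for "wikitable".
  def includes_fast?(value, word) when is_binary(value) and is_binary(word) do
    String.contains?(value, word) and (word in String.split(value))
  end
end
```

Both functions should return the same result for any non-empty word; the second only avoids work on elements whose attribute value does not contain the word at all, which is the common case for a selector like `[class~='wikitable']`.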

  read_file = fn name ->
    __ENV__.file
    |> Path.dirname()
    |> Path.join(name)
    |> File.read!()
    |> Floki.parse_document!()
  end

  inputs = %{
    "big" => read_file.("big.html")
  }

  Benchee.run(
    %{
      "exact attribute" => fn doc -> Floki.find(doc, "[class='noprint']") end,
      "attribute present" => fn doc -> Floki.find(doc, "[title]") end,
      "attribute includes" => fn doc -> Floki.find(doc, "[class~='wikitable']") end
    },
    time: 10,
    inputs: inputs,
    memory_time: 2
  )

- defp get_value(attr_name, attributes) do
-   Enum.find_value(attributes, "", fn
+ defp get_value(attr_name, attributes) when is_list(attributes) do
+   case List.keyfind(attributes, attr_name, 0) do
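
For context, the two lookup styles in this diff can be sketched as below. Only the first two lines of each function appear in the diff; the function bodies, the module name, and the `present?/2` helper are assumptions added to make the example complete, and attributes are assumed to be a list of `{name, value}` tuples.

```elixir
defmodule AttrLookupSketch do
  # Hypothetical module for illustration; the bodies below are assumed,
  # not copied from Floki.

  # Enum-based lookup: calls an anonymous function for each attribute
  # until one returns a truthy value, falling back to "".
  def get_value_enum(attr_name, attributes) do
    Enum.find_value(attributes, "", fn
      {^attr_name, value} -> value
      _other -> nil
    end)
  end

  # List.keyfind/3 lookup: searches for a tuple whose element at
  # position 0 equals attr_name, without calling a function per element.
  def get_value_keyfind(attr_name, attributes) when is_list(attributes) do
    case List.keyfind(attributes, attr_name, 0) do
      {_name, value} -> value
      nil -> ""
    end
  end

  # A presence check in the same style, using List.keymember?/3.
  def present?(attr_name, attributes) when is_list(attributes) do
    List.keymember?(attributes, attr_name, 0)
  end
end
```

Both lookups return the attribute value, or an empty string when the attribute is missing. The trade-off raised in the review below is readability of the Enum version against the speed and memory use of the key-based version.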

Owner

How much of the improvements came from this change?

If it is not that much, I would prefer to keep Enum.find_value/3, just because it's easier to read and maintain.

@preciz commented Dec 19, 2025

Contributor Author

I just ran the benchmarks back and forth, changing only this part, and it is faster in all cases. The biggest speedup is in the "exact attribute" case, which is ~11% faster, but more significantly it halves the memory usage in the "exact attribute" and "attribute includes" cases.

I believe this library is used heavily by a lot of companies where every speedup has a huge effect on throughput.
That is the case for our company.

Owner

OK, that's fair. Thank you for the research and explanation!

@philss philss merged commit 26b91e8 into philss:main Feb 27, 2026
6 checks passed