Speed up Find-shortest-function-signature (110M-iteration is_unique fix + buffer cache) #31
Merged
Background
A real-world "Find shortest function signature" run (issue #17 / PR #26) on a large database took 7.7 minutes for a single function. I profiled it in IDA with cProfile and found that is_unique was enumerating every match of each candidate signature: 110+ million bin_search iterations, with idaapi.user_cancelled() polling alone costing 135 seconds (29% of the run). The short signatures produced while a signature is being grown match hundreds of thousands of locations, and the old code counted all of them just to answer "is this unique?".

What I fixed
- is_unique bails at the second match. Uniqueness only depends on whether the count is 0, 1, or 2+, so there is no reason to enumerate every match. This is the dominant win: 110M iterations collapse to a few thousand. count_matches still enumerates fully, so the partial-on-cancel count from issue #22 (return a partial signature when a search is cancelled, even if it isn't unique) is unchanged.
- The segment buffer is loaded once per generate(), not once per is_unique. Profiling showed InMemoryBuffer.load(SEGMENTS) being re-run on every uniqueness check (84% of wall time after the first fix). I thread an optional cached buffer through is_unique / count_matches / find_all / _find_all_simd.
- MinimalFunctionSignatureGenerator now pre-decodes the whole function into a small list up front (_DecodedInstruction + _decode_function_for_anchors) and grows anchors over the cached data, instead of re-walking and re-reading bytes for every anchor.
- The _speedups extension now loads in source/symlink layouts. When the plugin is loaded from a source tree while a pip-installed sigmaker namespace package without a matching compiled extension shadows it, the package-level import misses. I added a fallback that loads the _speedups extension sitting next to __init__.py, so SIMD_SPEEDUP_AVAILABLE is true in those layouts (the shipped single-file plugin is unaffected, since it has no sibling _speedups).

I also added start_profiling / stop_profiling helpers, exposed as Edit/Plugins menu actions, so this kind of slowdown can be diagnosed inside IDA without a sys.path dance.

Changes
- _DecodedInstruction dataclass
- _decode_function_for_anchors
- benchmark_predecode_function_sig

Net behavior
Measured on the same function (the test binary's largest), before and after:
On the real-world database that motivated this, the 7.7-minute run now completes effectively instantly. Signatures produced are byte-identical; only the work to reach them changed.
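To make the dominant win concrete, here is a minimal pure-Python sketch of the two uniqueness strategies. It is not the plugin's code: iter_matches, MatchCounter and the list-of-ints pattern format (None as a wildcard byte) are hypothetical, and each visits increment stands in for one bin_search call that finds the next match. A short pattern that matches 1,000 times costs 1,000 steps to count but only 2 steps to reject as non-unique.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence

# Hypothetical pattern format: one entry per byte, None means "?" (wildcard).
Pattern = Sequence[Optional[int]]


@dataclass
class MatchCounter:
    """Counts matches visited; each visit stands in for one bin_search call."""
    visits: int = 0


def iter_matches(buf: bytes, pattern: Pattern, counter: MatchCounter) -> Iterator[int]:
    """Lazily yield each offset where `pattern` matches `buf`."""
    n = len(pattern)
    for start in range(len(buf) - n + 1):
        if all(p is None or buf[start + i] == p for i, p in enumerate(pattern)):
            counter.visits += 1
            yield start


def count_matches(buf: bytes, pattern: Pattern, counter: MatchCounter) -> int:
    """Full enumeration: cost grows with the number of matches."""
    return sum(1 for _ in iter_matches(buf, pattern, counter))


def is_unique(buf: bytes, pattern: Pattern, counter: MatchCounter) -> bool:
    """Early bail: only 0, 1 or 2+ matters, so stop at the second match."""
    seen = 0
    for _ in iter_matches(buf, pattern, counter):
        seen += 1
        if seen == 2:
            return False
    return seen == 1


if __name__ == "__main__":
    buf = bytes([0x90, 0x48, 0x8B] * 1000)  # "48 ?" occurs 1000 times
    short = [0x48, None]
    full, early = MatchCounter(), MatchCounter()
    print(count_matches(buf, short, full), full.visits)  # 1000 1000
    print(is_unique(buf, short, early), early.visits)    # False 2
```

A pattern that really is unique still has to be searched to the end to rule out a second match; the saving comes from the 2+ case, which is what the short signatures produced during growth almost always hit.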
Verification
- idapro-tests (9.0/9.1)
- idapro-tests-9.2

Zero regressions on either image.
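Separately from the idapro-tests images, the 0/1/2+ argument behind the early bail can be sanity-checked in isolation with a small differential test. This is a hypothetical sketch, not part of the PR's test suite: it compares a count-everything predicate with an itertools.islice-based early-bail predicate on random buffers drawn from a tiny alphabet, so that zero, one and many matches all occur often.

```python
import itertools
import random
from typing import Iterator, Optional, Sequence

# Hypothetical pattern format: None is a wildcard byte.
Pattern = Sequence[Optional[int]]


def iter_matches(buf: bytes, pattern: Pattern) -> Iterator[int]:
    """Yield each offset where `pattern` matches `buf`."""
    n = len(pattern)
    for start in range(len(buf) - n + 1):
        if all(p is None or buf[start + i] == p for i, p in enumerate(pattern)):
            yield start


def is_unique_full(buf: bytes, pattern: Pattern) -> bool:
    """Reference behavior: count every match, then compare with 1."""
    return sum(1 for _ in iter_matches(buf, pattern)) == 1


def is_unique_early(buf: bytes, pattern: Pattern) -> bool:
    """Optimized behavior: look at no more than two matches."""
    return sum(1 for _ in itertools.islice(iter_matches(buf, pattern), 2)) == 1


def differential_check(trials: int = 2000, seed: int = 0) -> int:
    """Compare both predicates on random inputs; return the number of trials run."""
    rng = random.Random(seed)
    for _ in range(trials):
        # A three-symbol alphabet makes 0, 1 and 2+ matches all common.
        buf = bytes(rng.choice(b"\x00\x01\x02") for _ in range(rng.randint(0, 24)))
        pattern = [rng.choice([0, 1, 2, None]) for _ in range(rng.randint(1, 4))]
        assert is_unique_full(buf, pattern) == is_unique_early(buf, pattern), (buf, pattern)
    return trials


if __name__ == "__main__":
    print(differential_check())  # 2000
```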
Notes
- The public API (SignatureMaker.make_signature, XrefFinder.find_xrefs, SigMakerConfig kwargs, the Signature format specs) is unchanged. New parameters are optional with behavior-preserving defaults.
- CREATE_UNIQUE (UniqueSignatureGenerator) still uses count_matches for its real-count display, so it does not get the early bail. If it proves slow on very large databases, that is a separate follow-up.
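The "optional parameter with a behavior-preserving default" pattern used for the cached buffer can be sketched as follows. load_segments, is_unique and generate here are hypothetical simplifications (plain bytes and bytes.find instead of InMemoryBuffer and bin_search); the point is that callers who omit buffer still get one load per call, while generate() loads once and passes the same buffer to every check.

```python
from typing import Optional, Sequence

# Hypothetical stand-in for InMemoryBuffer.load(SEGMENTS); counts calls so the
# effect of caching is visible.
LOADS = {"count": 0}


def load_segments() -> bytes:
    LOADS["count"] += 1
    return bytes(range(256)) * 4 + b"\xde\xad\xbe\xef"


def is_unique(needle: bytes, buffer: Optional[bytes] = None) -> bool:
    """`buffer` is optional: omitting it keeps the old load-per-call behavior."""
    if buffer is None:
        buffer = load_segments()
    first = buffer.find(needle)
    # Search again one byte later so overlapping second matches are found too.
    return first != -1 and buffer.find(needle, first + 1) == -1


def generate(candidates: Sequence[bytes], buffer: Optional[bytes] = None) -> Optional[bytes]:
    """Load once per generate() and share the buffer with every uniqueness check."""
    if buffer is None:
        buffer = load_segments()
    for sig in candidates:
        if is_unique(sig, buffer):
            return sig
    return None


if __name__ == "__main__":
    print(generate([b"\x10", b"\x10\x11", b"\xde\xad"]), LOADS["count"])  # b'\xde\xad' 1
```

Defaulting to None rather than loading at import time keeps existing call sites working unchanged while letting the hot path opt into the cache.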