Fix #17: shortest unique signature for current function (with xref fallback) #26

Merged: mahmoudimus merged 8 commits into main from fix/issue-17-shortest-across-function on May 27, 2026

mahmoudimus (Owner) commented May 19, 2026

Closes #17.

Background

Per issue #17, the existing "Create unique signature for current code address" action grows a signature from wherever the cursor sits. That works for "I want a hook right here" use cases, but it is a poor fit for the much more common case of "give me a stable signature for this function so I can find it again after a binary update":

  • The cursor position is arbitrary. The shortest unique signature for the function might start somewhere else entirely.
  • If the function body is not unique (small thunks, common stubs, library inlines), the current action just fails.

This PR adds a new form action that addresses both shortcomings.

Algorithm note

The minimal-function search treats every instruction in the function as a possible start point and grows a signature from each one until it is unique. The inner search is bounded both by max_single_signature_length and by the size of the current best candidate, so once a small candidate exists, every subsequent start point is pruned aggressively. An "ideal candidate" (size <= 5 bytes and zero wildcards) ends the outer loop early. Degenerate sigs (< 5 bytes) are rejected even when unique, since those are almost always a lone CALL with all-wildcard operand bytes that match thousands of places by accident.
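As a rough illustration of the search described above, here is a minimal sketch that works on a plain bytes buffer instead of an IDA database. The function name, its parameters, and the byte-by-byte growth are assumptions made for the example; the real generator grows instruction by instruction and handles wildcards, which this sketch leaves out.

```python
from typing import Optional, Sequence, Tuple

MIN_USEFUL_SIG_BYTES = 5  # floor named in the PR's test list


def count_matches(image: bytes, pattern: bytes) -> int:
    """Count (possibly overlapping) occurrences of pattern in image."""
    count, pos = 0, image.find(pattern)
    while pos != -1:
        count += 1
        pos = image.find(pattern, pos + 1)
    return count


def shortest_unique_in_function(
    image: bytes,
    insn_starts: Sequence[int],  # hypothetical: offsets of instruction starts
    func_end: int,               # hypothetical: exclusive end offset of the function
    max_len: int = 32,           # stands in for max_single_signature_length
) -> Optional[Tuple[int, bytes]]:
    best: Optional[Tuple[int, bytes]] = None
    for start in insn_starts:
        # Prune: a new candidate only helps if it is shorter than the best so far.
        limit = max_len if best is None else min(max_len, len(best[1]) - 1)
        limit = min(limit, func_end - start)
        # Start at the floor so degenerate (< 5 byte) candidates are never accepted.
        for size in range(MIN_USEFUL_SIG_BYTES, limit + 1):
            pattern = image[start:start + size]
            if count_matches(image, pattern) == 1:
                best = (start, pattern)
                break
        # Ideal candidate: already at the floor (and no wildcards in this sketch).
        if best is not None and len(best[1]) <= MIN_USEFUL_SIG_BYTES:
            break
    return best
```

Because this sketch has no wildcards, "better" reduces to "strictly shorter". With the (size, wildcards) ranking, an equal-size candidate with fewer wildcards could also win.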

When no body-internal unique sig exists, the orchestrator automatically falls back to XrefFinder.find_xrefs(pfn.start_ea, config), which generates a unique signature rooted at each caller of the function and returns the best. The xref fallback path was already in the codebase; I just wire it up as the automatic next step.
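The fallback flow can be pictured roughly as follows. This is a sketch only: `Unexpected` here is a local stand-in for the project's exception, and `body_search` / `xref_search` are hypothetical callables standing in for the function-body generator and the xref finder, not their real signatures.

```python
from typing import Any, Callable, Sequence, Tuple


class Unexpected(Exception):
    """Local stand-in for the error raised when no candidate exists."""


def find_function_signature(
    func_start: int,
    body_search: Callable[[int], Any],            # hypothetical body-internal search
    xref_search: Callable[[int], Sequence[Any]],  # hypothetical xref candidate search
    log: Callable[[str], None] = print,
) -> Tuple[str, Any]:
    try:
        # First choice: a unique signature inside the function body.
        return "body", body_search(func_start)
    except Unexpected:
        log(f"No unique signature inside function {func_start:#x}; "
            "trying xref signatures...")
        candidates = xref_search(func_start)
        if not candidates:
            raise Unexpected("no body-internal or xref signature found")
        # Candidates are assumed to be orderable, best first under min().
        return "xref", min(candidates)
```

A cancel exception is deliberately not caught here, so it would propagate to whatever handler the caller already has.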

The candidate ranking changed too. GeneratedSignature.__lt__ now compares (size, wildcards) ascending rather than just size. Same-length sigs with fewer wildcards rank first. This is a strict improvement and the existing XREF action picks it up for free.
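To make the new ordering concrete, here is a small sketch. `SketchSignature` is a hypothetical stand-in, not the real GeneratedSignature class; it models a signature as a tuple in which None marks a wildcard byte.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchSignature:
    pattern: Tuple[Optional[int], ...]  # None = wildcard byte

    @property
    def wildcards(self) -> int:
        return sum(1 for b in self.pattern if b is None)

    def _key(self) -> Tuple[int, int]:
        # Shorter first; among equal sizes, fewer wildcards first.
        return (len(self.pattern), self.wildcards)

    def __lt__(self, other: "SketchSignature") -> bool:
        return self._key() < other._key()


loose = SketchSignature((0x48, None, None, 0x13, 0x37))
tight = SketchSignature((0x48, 0xB9, 0x38, 0x13, 0x37))
short = SketchSignature((0xE8, None, None, None))

ranked = sorted([loose, tight, short])
```

Here `ranked` comes out as `[short, tight, loose]`: size decides first, and the wildcard count only breaks ties between the two 5-byte signatures.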

Wildcard policy: stop wildcarding x86 immediates

While reviewing this work I realized the function-search algorithm could not pick up obvious candidates like mov rcx, 0x13371338 even when that immediate was unique across the binary, because WildcardPolicy.for_x86() included BaseKind.IMM in its wildcardable set. An x86 immediate is a literal value baked into the instruction encoding; it does not shift between binary builds, so wildcarding it only removes bytes that would have made the signature unique. MEM, FAR, and NEAR still get wildcarded because those operands DO encode addresses that move between builds.

This change improves signature quality for every action with wildcard_operands=True, not just the new FIND_FUNCTION_SIG. It is what lets the function-search algorithm pick up short, distinctive constants as unique signatures.
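The effect of the policy change can be sketched with a toy masker. Everything here is illustrative: `Kind`, `OLD_X86`, `NEW_X86`, and `render` are hypothetical names, and operand positions are passed in by hand rather than decoded. The byte strings are one valid encoding of mov rcx, 0x13371338 (REX.W C7 /0 with a 32-bit immediate) and a near call with a rel32 operand.

```python
from enum import Enum, auto
from typing import FrozenSet, Iterable, Tuple


class Kind(Enum):
    IMM = auto()
    MEM = auto()
    FAR = auto()
    NEAR = auto()


OLD_X86 = frozenset({Kind.IMM, Kind.MEM, Kind.FAR, Kind.NEAR})  # before this PR
NEW_X86 = frozenset({Kind.MEM, Kind.FAR, Kind.NEAR})            # immediates kept


def render(insn: bytes,
           operands: Iterable[Tuple[Kind, int, int]],  # (kind, offset, length)
           policy: FrozenSet[Kind]) -> str:
    """Render instruction bytes, masking operands whose kind is in the policy."""
    masked = set()
    for kind, offset, length in operands:
        if kind in policy:
            masked.update(range(offset, offset + length))
    return " ".join("??" if i in masked else f"{b:02X}" for i, b in enumerate(insn))


mov_rcx = bytes([0x48, 0xC7, 0xC1, 0x38, 0x13, 0x37, 0x13])
call_rel = bytes([0xE8, 0x10, 0x20, 0x30, 0x40])

before = render(mov_rcx, [(Kind.IMM, 3, 4)], OLD_X86)  # "48 C7 C1 ?? ?? ?? ??"
after = render(mov_rcx, [(Kind.IMM, 3, 4)], NEW_X86)   # "48 C7 C1 38 13 37 13"
call = render(call_rel, [(Kind.NEAR, 1, 4)], NEW_X86)  # "E8 ?? ?? ?? ??"
```

Under the old set, only three opcode bytes survive from the mov; under the new set, the distinctive immediate stays in the signature while the call target is still masked.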

The change cannot be unit-tested under the existing harness because the mocked idaapi collapses every o_* constant to a single int, aliasing every BaseKind enum member onto one value. The behavior is exercised by the integration tests against the real IDA test binary.
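The aliasing problem can be reproduced in isolation. The sketch below assumes the harness ends up converting mocked o_* attributes with int(); the `Kind` enum is a hypothetical stand-in for BaseKind. It relies on the documented MagicMock default that `int()` of a mock returns 1.

```python
from enum import IntEnum
from unittest.mock import MagicMock

idaapi = MagicMock()  # stand-in for the mocked module in the unit-test harness

# Every attribute of a MagicMock is another MagicMock, and int() on it yields 1.
values = {name: int(getattr(idaapi, name))
          for name in ("o_imm", "o_mem", "o_far", "o_near")}


class Kind(IntEnum):
    IMM = values["o_imm"]
    MEM = values["o_mem"]    # same value as IMM, so this becomes an alias
    FAR = values["o_far"]    # alias
    NEAR = values["o_near"]  # alias
```

With all four values equal, `Kind.MEM is Kind.IMM` holds and the enum has a single distinct member, so a test cannot tell "IMM is wildcarded" apart from "MEM is wildcarded".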

Net behavior

  • New action. "Find shortest unique signature for current function" radio on the main form. Existing CREATE_UNIQUE / FIND_XREF / COPY_RANGE / SEARCH actions are byte-identical to current main.
  • Inside a function with a body-internal unique sequence. Prints "Function signature (offset +0x10 into function 0x140001040):" followed by the existing "Signature for 0x140001050: <bytes>" line, and copies the bytes to the clipboard.
  • Inside a function with no body-internal unique sequence but with xrefs. Prints "No unique signature inside function 0x...; trying xref signatures...", then "Xref signature into 0x... (from 0x...):" followed by the existing display line.
  • Cursor outside any function. Prints "Place cursor inside a function first." and exits cleanly.
  • Cancel mid-search. Prints "Operation cancelled by user", with no traceback.
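For reference, the two annotation lines quoted in the list above could be produced along these lines. The helper names are hypothetical; only the message wording comes from the list.

```python
def function_signature_header(func_start: int, match_ea: int) -> str:
    """Header for a body-internal signature; offset is relative to the function start."""
    offset = match_ea - func_start
    return f"Function signature (offset +{offset:#x} into function {func_start:#x}):"


def xref_signature_header(target_ea: int, caller_ea: int) -> str:
    """Header for a signature rooted at a caller of the target function."""
    return f"Xref signature into {target_ea:#x} (from {caller_ea:#x}):"


header = function_signature_header(0x140001040, 0x140001050)
```

With the addresses from the example, the offset works out to 0x140001050 - 0x140001040 = 0x10, matching the quoted message.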

Verification

| Suite | Before | After |
| --- | --- | --- |
| Host unit (tests/unit_test_sigmaker.py) | 118 OK | 130 OK (+12 new) |
| Docker idapro-tests (9.0/9.1) | 134 OK | 147 OK (+12 unit, +1 integration) |
| Docker idapro-tests-9.2 | 134 OK | 147 OK |

Zero regressions. New test coverage:

  • TestGeneratedSignatureOrdering: smaller-size beats larger; equal-size, fewer-wildcards beats more; equal-size-equal-wildcards is not strictly less; _wildcard_count helper works.
  • TestActionEnumAddsFunctionSig: FIND_FUNCTION_SIG == 4; existing enum values unchanged.
  • TestMinimalFunctionSignatureGenerator: returns the shortest-unique candidate; prune caps the inner search by current best; ideal-candidate early exit fires; raises Unexpected when no candidate exists; rejects degenerate sigs under 5 bytes; MIN_USEFUL_SIG_BYTES == 5.
  • test_minimal_function_signature_against_real_function (integration): end-to-end against the compiled test binary; asserts the returned sig parses, matches exactly one place, and that the match falls inside the original function body. Skips cleanly when the chosen function has no body-internal unique sig.
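The three assertions of the integration test (parses, matches exactly one place, match lies inside the function) can be sketched against a plain bytes buffer. The helper names are hypothetical and the parser is a simplified assumption that accepts "?" or "??" as a wildcard token; the real test runs against the compiled binary inside IDA.

```python
from typing import List, Optional


def parse_ida_signature(text: str) -> List[Optional[int]]:
    """Parse 'E8 ? ? ? ? 45' style text; None marks a wildcard byte."""
    return [None if tok in ("?", "??") else int(tok, 16) for tok in text.split()]


def find_matches(image: bytes, pattern: List[Optional[int]]) -> List[int]:
    """Return every offset in image where the pattern matches."""
    hits = []
    for pos in range(len(image) - len(pattern) + 1):
        if all(p is None or image[pos + i] == p for i, p in enumerate(pattern)):
            hits.append(pos)
    return hits


def check_signature(image: bytes, sig_text: str,
                    func_start: int, func_end: int) -> int:
    pattern = parse_ida_signature(sig_text)
    assert pattern, "signature must parse to at least one byte"
    hits = find_matches(image, pattern)
    assert len(hits) == 1, "signature must match exactly one place"
    assert func_start <= hits[0] < func_end, "match must lie inside the function"
    return hits[0]
```

For example, with a buffer of eight NOPs, the bytes E8 11 22 33 44 45 33 F6, and eight more NOPs, `check_signature(image, "E8 ? ? ? ? 45 33 F6", 8, 16)` returns 8.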

Adds _wildcard_count helper and extends __lt__ so equal-length
signatures rank by wildcard count next. XrefFinder picks this up
automatically since it sorts via the same __lt__, which is a strict
improvement (less-wildcarded sigs are more specific).

Wires up the action constant ahead of the radio button and dispatch
branch. Existing values (0..3) are unchanged.

…function

Iterates every instruction in the given function as a possible start
point. Each inner search is bounded by the current best candidate's
size, so as soon as a small candidate exists subsequent starts get
pruned aggressively. An ideal candidate (size <= 5 bytes, 0 wildcards)
ends the outer loop early. Degenerate short sigs (< 5 bytes) are
rejected even when unique. Raises Unexpected if no candidate exists.

Adds a 5th radio button to the main form: 'Find shortest unique
signature for current function'. Wires SigMakerPlugin.run to invoke
MinimalFunctionSignatureGenerator on the containing function of the
cursor; on Unexpected (no body-internal unique sig) falls back to
XrefFinder.find_xrefs and prints the best xref candidate with a
'Xref signature into X (from Y):' annotation. UserCanceledError
bubbles up to the existing handler for a clean cancel.

End-to-end against the compiled test binary inside the IDA container.
Asserts the generator returns a unique signature, that its IDA-format
text actually matches exactly one place, and that the match falls
inside the original function body. Skips cleanly if the chosen
function happens to have no body-internal unique sig.

An x86 immediate like the 0x13371338 in 'mov rcx, 0x13371338' is a
literal value baked into the instruction encoding. It does not shift
between binary builds, so wildcarding it just removes bytes that
would otherwise have made the signature unique. MEM/FAR/NEAR still
get wildcarded because those operands DO encode addresses that move
between builds.

This improves signature quality for every action that has
wildcard_operands=True, not just the new FIND_FUNCTION_SIG action
introduced earlier in this branch. It is also the difference that
makes the function-search algorithm pick up short, distinctive
constants like the example above as unique signatures.

The change cannot be unit-tested under the existing harness because
the MagicMock idaapi collapses all o_* operand-type constants to a
single int, aliasing every BaseKind enum member onto one value. The
behavior is exercised by the integration tests against the real IDA
test binary.

Targets the exact scenario from the reddit feedback that motivated
the WildcardPolicy.for_x86 change: a 64-bit `mov rax, imm`
instruction whose immediate must survive the wildcard_operands=True
path. Finds the 10-byte encoding in the compiled test binary, runs
SignatureMaker.make_signature with wildcard_operands=True and
wildcard_optimized=True (the form defaults), and asserts the
resulting signature contains the literal immediate bytes 0x28 and
0x15 as non-wildcard bytes.

Without the for_x86() fix this test fails (immediate bytes blanked
to ??); with it the test passes (immediate stays concrete).

Brings in PR #24 (issue #18 cancel), PR #25 (issue #22 partial on
cancel), and PR #28 (1.7.0 release). The auto-merge handled
src/sigmaker/__init__.py cleanly; tests/unit_test_sigmaker.py had
overlapping test-class insertions at the same anchor, resolved by
taking main's test file as the base and appending PR #26's three
new test classes (TestMinimalFunctionSignatureGenerator,
TestActionEnumAddsFunctionSig, TestGeneratedSignatureOrdering)
before the if __name__ block.

mahmoudimus merged commit 446d7cd into main May 27, 2026
2 checks passed
mahmoudimus deleted the fix/issue-17-shortest-across-function branch May 27, 2026 16:04
mahmoudimus added a commit that referenced this pull request May 27, 2026
* docs(changelog): add [1.7.1] entry for #26 (issue #17)

Documents the new 'Find shortest unique signature for current
function' action with xref fallback, the WildcardPolicy.for_x86()
change that preserves x86 immediates, and the new (size, wildcards)
ranking on GeneratedSignature.__lt__.

* docs(readme): clarify acknowledgements

Make explicit that the initial port drew from @A200K's
IDA-Pro-SigMaker, and that @kweatherman's sigmakerex is independent
prior work within the SigMaker ecosystem. Members of the community
later requested compatibility and feature parity with parts of
sigmakerex's functionality (see #17), so the link is worth flagging
directly. The long-form credits chain from sigmakerex's README is
preserved verbatim below the paragraph.

* chore: bump version to 1.7.1

Cuts the 1.7.1 release covering the shortest-unique-signature-for-
current-function action with xref fallback (#26 / #17), the
WildcardPolicy.for_x86 change that preserves x86 immediates, and
the (size, wildcards) ranking on GeneratedSignature.__lt__. See
CHANGELOG.md for the full set of changes.
Development

Successfully merging this pull request may close these issues.

“shortest unique signature across function”/“use xref sigs if function is not unique”
