Optimize InsertionContext.InsertText with SearchValues and 2-phase algo for tab expansion and newline normalization #574

Open: udlose wants to merge 17 commits into AvaloniaUI:master from udlose:performance/insertion-context
udlose (Contributor) commented Apr 4, 2026

Note

I've attached the perf benchmark I used, along with its results. I found the optimal stackalloc size for the char buffer passed to ValueStringBuilder to be 512 chars (1,024 bytes). Benchmark results are listed at the end.

This pull request introduces several improvements and code quality updates across the codebase, focusing on performance, memory efficiency, code clarity, and modern C# idioms. The most significant change is a complete rewrite of the InsertionContext.InsertText method to use a high-performance, allocation-minimizing algorithm for snippet insertion. Additionally, the code now consistently uses GetValueOrDefault for dictionary lookups, improves resource management, and adds code analysis suppressions for clarity.

Performance and Memory Efficiency:

  • InsertionContext.InsertText has been rewritten to use a two-phase, SIMD-accelerated algorithm for tab expansion and newline normalization, leveraging ValueStringBuilder and SearchValues<char> to minimize allocations and improve speed. This also ensures correct handling of custom tab strings and CRLF boundaries.
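
The two-phase idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the class, method, and field names are hypothetical, and it uses `StringBuilder` and `string.Replace` in place of the PR's internal `ValueStringBuilder` so that it stays self-contained. It assumes .NET 8 or later for `SearchValues<char>`.

```csharp
using System;
using System.Buffers;
using System.Text;

static class InsertTextSketch
{
    // Hypothetical names; the PR's actual fields may differ.
    private static readonly SearchValues<char> TabOrNewline = SearchValues.Create("\t\r\n");
    private static readonly SearchValues<char> Newline = SearchValues.Create("\r\n");

    public static string Normalize(string text, string tab, string lineTerminator)
    {
        ReadOnlySpan<char> source = text;

        // Fast path: no tabs or newlines, so the input is returned unchanged.
        if (source.IndexOfAny(TabOrNewline) < 0)
            return text;

        // Phase 1: expand tabs into the configured indentation string.
        string expanded = text;
        if (tab != "\t" && source.Contains('\t'))
            expanded = text.Replace("\t", tab);

        // Phase 2: normalize every \r, \n, or \r\n to the target line terminator.
        // Running this over the expanded text lets a \r that came from the tab
        // string merge with a following \n from the source into one line break.
        ReadOnlySpan<char> rest = expanded;
        if (rest.IndexOfAny(Newline) < 0)
            return expanded;

        var builder = new StringBuilder(expanded.Length);
        while (true)
        {
            int index = rest.IndexOfAny(Newline);
            if (index < 0)
            {
                builder.Append(rest);
                break;
            }

            builder.Append(rest[..index]);
            builder.Append(lineTerminator);

            // Treat "\r\n" as a single line break.
            bool isCrLf = rest[index] == '\r' && index + 1 < rest.Length && rest[index + 1] == '\n';
            rest = rest[(index + (isCrLf ? 2 : 1))..];
        }

        return builder.ToString();
    }
}
```

For example, `InsertTextSketch.Normalize("a\tb\r\nc", "    ", "\n")` yields `"a    b\nc"`, and text containing neither tabs nor newlines comes back as the same string instance.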

Modern C# Idioms and Code Quality:

  • Dictionary lookups throughout the codebase now use GetValueOrDefault instead of TryGetValue(..., out ...) ? ... : null for improved clarity and conciseness. This applies to HighlightingManager, XmlHighlightingDefinition, and InsertionContext.
  • Default argument assignments have been updated to use the null-coalescing assignment operator (??=) for brevity and clarity in InsertionContext.
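
As a rough before/after illustration of these two idioms (the dictionary, its contents, and the method names here are hypothetical stand-ins, not code from the PR):

```csharp
#nullable enable
using System.Collections.Generic;

static class LookupSketch
{
    // Hypothetical registry standing in for the PR's highlighting dictionaries.
    private static readonly Dictionary<string, string> DefinitionsByName = new()
    {
        ["C#"] = "csharp-definition",
    };

    // Before: TryGetValue combined with a conditional expression.
    public static string? FindBefore(string name) =>
        DefinitionsByName.TryGetValue(name, out var definition) ? definition : null;

    // After: GetValueOrDefault returns default(TValue) when the key is missing,
    // which is null for reference types.
    public static string? FindAfter(string name) =>
        DefinitionsByName.GetValueOrDefault(name);

    // ??= assigns the fallback only when the argument is null.
    public static string ResolveTab(string? tab = null)
    {
        tab ??= "\t";
        return tab;
    }
}
```

One point to keep in mind: the two lookup forms are only equivalent when the value type is a reference type (or nullable), because `GetValueOrDefault` returns `default(TValue)`, not `null`, for value types such as `int`.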

Resource Management and Reliability:

  • DataTransfer objects are now disposed using using statements in EditingCommandHandler to ensure proper resource cleanup.
  • Code analysis suppressions have been added to clarify ownership and disposal responsibilities for disposable objects in several classes, including TextMate, V2Loader, and TextEditor.
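
A small sketch of both patterns, using a hypothetical `TrackedResource` type rather than Avalonia's DataTransfer, and an illustrative justification string rather than the wording used in the PR:

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Hypothetical disposable that records whether Dispose was called.
sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

static class DisposalSketch
{
    // A using statement disposes the resource even when the body throws.
    public static bool DisposedAfterException()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed so the caller can inspect the resource state.
        }

        return resource.Disposed;
    }

    // When ownership passes to the caller, a suppression documents that intent
    // instead of leaving the analyzer warning unexplained.
    [SuppressMessage("Reliability", "CA2000:Dispose objects before losing scope",
        Justification = "Illustrative: the caller owns and disposes the returned resource.")]
    public static TrackedResource CreateForCaller() => new TrackedResource();
}
```

Here `DisposalSketch.DisposedAfterException()` returns `true`, while the object returned by `CreateForCaller()` stays undisposed until the caller releases it.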

Compatibility and Documentation:

  • Segment reversal now calls System.Linq.Enumerable.Reverse explicitly to avoid a breaking change in C# 14, where Reverse() may resolve to MemoryExtensions.Reverse. A comment references the relevant compiler change.
  • Extensive documentation has been added to InsertionContext.InsertText explaining the new algorithm, its rationale, and its benefits.
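
To illustrate the Reverse change, here is a minimal sketch with a hypothetical array of offsets (the PR reverses segments, whose exact type is not shown here). Under C# 14, a call written as `offsets.Reverse()` on an array can bind to the in-place, void-returning `MemoryExtensions.Reverse`, whereas the explicit static call always selects the LINQ method:

```csharp
using System.Linq;

static class ReverseSketch
{
    // Calling the static method by name pins overload resolution to LINQ,
    // which yields a reversed sequence and leaves the source array untouched.
    public static int[] ReversedCopy(int[] offsets) =>
        Enumerable.Reverse(offsets).ToArray();
}
```

With `int[] offsets = { 1, 2, 3 }`, `ReverseSketch.ReversedCopy(offsets)` produces `{ 3, 2, 1 }` and `offsets` keeps its original order.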

Miscellaneous:

  • Minor formatting and copyright header fixes.
  • .runsettings has been added to the solution items for improved test configuration management.

Benchmark Results

| Area | Result |
| --- | --- |
| Memory allocations | Ranged from parity to 3.62x less memory; 20 of 30 cases allocated 1.58x to 3.62x less. |
| Fast paths (Plain, TabsHeavy) | Roughly 15x-21x faster than Original, depending on text length and tab mode. |
| Newline-heavy paths (LfHeavy, CrLfHeavy) | Usually allocate about 2.0x-2.8x less memory than Original, but are often near parity or slower in runtime. |
| Mixed workloads | Generally close to baseline, with modest wins in some Spaces4 cases and slight regressions in some KeepTabs cases. |
| Overall average across 30 cases | Average allocated bytes drop from about 75.7 KB in Original to about 41.4 KB in both optimized variants. Average mean time stays in the same broad range, which hides the split between large fast-path wins and slower newline-heavy cases. |

Benchmark File

inserttext_benchmark_comparison_uncapped_reference_style.html

InsertTextBenchmark.linq.txt

Test Coverage

image

udlose added 16 commits March 29, 2026 07:29
Replaced TryGetValue with GetValueOrDefault in several methods to simplify dictionary value retrieval. This change affects methods in HighlightingManager.cs, XmlHighlightingDefinition.cs, and InsertionContext.cs, improving code readability and conciseness.
Added static readonly SearchValues<char> fields for newline and tab/newline characters in InsertionContext. This enables faster, SIMD-accelerated character searching during snippet text insertion, improving performance and efficiency.
Refactored CompressingTreeListTests.cs to improve readability and align with modern C# conventions. Reformatted class and method braces, improved indentation, and replaced new string[0] with Array.Empty<string>(). No functional changes were made; all updates are stylistic.
Introduces InsertionContextTests.cs with extensive unit tests covering all major behaviors of the InsertionContext class, including constructor validation, InsertText logic (tabs, newlines, edge cases), active element management, state transitions, event firing, undo grouping, and regression scenarios. These tests ensure correctness and reliability for snippet insertion and editing features.
Introduces an internal static ReflectionTestHelper class in the test utilities. This class provides strongly-typed methods to invoke private and static methods, and to get or set private fields via reflection. It enables unit tests to access internal logic without exposing implementation details, improving test coverage and maintainability. Includes argument validation and clear exception messages.
Add comprehensive unit tests to InsertionContextTests.cs to verify that InsertText correctly normalizes embedded newlines in the Tab (indentation) string. Introduce helpers for custom indentation scenarios and ensure TestActiveElement always has a Segment. Improves coverage for edge cases involving indentation strings with newlines.
Refactored code to use C#'s using statement for DataTransfer, HeightTree, and StringWriter to ensure proper disposal of resources. Also marked BuildLongString as private static for better encapsulation and clarity.
Standardize copyright headers and improve code formatting. Move test classes to AvaloniaEdit.Tests namespaces and reformat for readability. Add SuppressMessage attributes for CA2213 and CA2000 with justifications. No functional changes; improves code style and maintainability.
Included the .runsettings file in the "Solution Items" folder of the AvaloniaEdit.slnx solution to support test configuration and management.
…net/runtime MIT source. The class is still internal currently.

Introduced ValueStringBuilder and related files to AvaloniaEdit.Utils, based on .NET runtime sources. This includes a stack-allocated, high-performance string builder (ValueStringBuilder), an extension for appending ISpanFormattable values, and a generic version for .NET 11+ supporting Utf8Char/Utf16Char. These utilities enable efficient, low-allocation string manipulation and formatting. No existing code was modified.
Added ValueStringBuilderTests.cs with a comprehensive suite of NUnit tests for the ValueStringBuilder class in AvaloniaEdit.Utils. Tests are adapted from .NET runtime's XUnit tests and cover constructors, append/insert operations, span handling, capacity management, disposal, indexer, and behavioral parity with StringBuilder. Licensing and attribution comments included.
Added SuppressMessage attribute to LoadDefinition to suppress the CA2000 warning about disposing XmlReader, with justification that disposal is handled by the caller. No functional changes made.
Added three unit tests covering edge cases where tab expansion and newlines (\r, \n, CRLF) interact at boundaries in InsertText. These tests ensure correct merging of CR and LF across tab and source text boundaries, preventing double-newline bugs and verifying proper newline normalization.
…gment reversal

Explicitly call System.Linq.Enumerable.Reverse when reversing segments to avoid a breaking change in C# 14, where Reverse() may resolve to MemoryExtensions.Reverse. Added a clarifying comment and moved using statements for consistency. This ensures backwards compatibility and correct behavior.
Rewrite InsertText with a two-phase algorithm: SIMD-optimized tab expansion and newline normalization using ValueStringBuilder. Handles custom Tab strings with newlines, provides a fast path for simple cases, and reduces allocations and document updates. Removes legacy substring loop and adds detailed documentation. Cleans up unused fields and adds necessary utility imports.
… to 512 based on benchmark results

Raised ValueStringBuilder's stack-allocated buffer from 256 to 512 chars during newline normalization. This reduces heap allocations for larger snippets and updates comments to reflect the new threshold.
@udlose udlose marked this pull request as ready for review April 4, 2026 23:17