Commit 5bf10f2
perf(parquet/file): avoid double bool bitmap conversion (#707)
### Rationale for this change
Boolean columns are currently converted twice when transferring between
Arrow and Parquet: the bit-packed data is expanded into a `[]bool` slice
and then packed back into bits.
### What changes are included in this PR?
**1. Arrow bitutil (`arrow/bitutil/bitmaps.go`)**
- Added `AppendBitmap()` method to `BitmapWriter`
- Directly copies bits from source bitmap using efficient `CopyBitmap()`
**2. Parquet encoder (`parquet/internal/encoding/boolean_encoder.go`)**
- Added `PutBitmap()` method to `PlainBooleanEncoder`
- Writes bitmap data directly without bool slice conversion
**3. Parquet decoder (`parquet/internal/encoding/boolean_decoder.go`)**
- Added `DecodeToBitmap()` method to `PlainBooleanDecoder`
- Reads directly into output bitmap
- Optimized fast path for byte-aligned cases
**4. Column writer (`parquet/file/column_writer_types.gen.go`)**
- Added `WriteBitmapBatch()` for non-nullable boolean columns
- Added `WriteBitmapBatchSpaced()` for nullable boolean columns
- Internal helper methods `writeBitmapValues()` and
`writeBitmapValuesSpaced()`
**5. Arrow-Parquet bridge (`parquet/pqarrow/encode_arrow.go`)**
- Modified `writeDenseArrow()` to detect boolean arrays
- Uses bitmap methods when available
- Falls back to original `[]bool` path if needed (backward compatible)
### Are these changes tested?
Yes, and new benchmarks are added as appropriate.
### Are there any user-facing changes?
Just performance:
### Non-Nullable Boolean Columns
```
BenchmarkBooleanBitmapWrite/1K-16 314847 19126 ns/op 6.54 MB/s 36057 B/op 237 allocs/op
BenchmarkBooleanBitmapWrite/10K-16 174715 33985 ns/op 36.78 MB/s 53266 B/op 247 allocs/op
BenchmarkBooleanBitmapWrite/100K-16 34099 175655 ns/op 71.16 MB/s 218866 B/op 340 allocs/op
BenchmarkBooleanBitmapWrite/1M-16 3778 1568818 ns/op 79.68 MB/s 1763712 B/op 1237 allocs/op
```
### Nullable Boolean Columns (10% null rate)
```
BenchmarkBooleanBitmapWriteNullable/1K-16 214921 28002 ns/op 4.46 MB/s 39706 B/op 249 allocs/op
BenchmarkBooleanBitmapWriteNullable/10K-16 44618 134483 ns/op 9.29 MB/s 113690 B/op 268 allocs/op
BenchmarkBooleanBitmapWriteNullable/100K-16 5239 1149658 ns/op 10.87 MB/s 657178 B/op 451 allocs/op
BenchmarkBooleanBitmapWriteNullable/1M-16 556 10926274 ns/op 11.44 MB/s 5575200 B/op 2219 allocs/op
```
**Key Observations:**
- Direct bitmap path successfully avoids `[]bool` conversion
- Throughput scales well with data size (6.5 → 80 MB/s for non-nullable)
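- Allocation counts grow slowly with input size (237 allocs/op at 1K
values, 1237 allocs/op at 1M values for non-nullable columns)
- Nullable columns are slower because of validity bitmap processing
(11.44 MB/s versus 79.68 MB/s at 1M values), which is expected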
---------
Co-authored-by: Matt <zero@gibson>

1 parent fc20f37 commit 5bf10f2
11 files changed
Lines changed: 868 additions & 9 deletions
File tree
- arrow/bitutil
- parquet
  - file
  - internal
    - encoding
    - utils
  - pqarrow