Commit 54e702c
fix: correct fieldCount accounting and add reverse lookup in inverted index
Two related fixes in the inverted index:
1. fieldCount bug: `fieldCount` is incremented once per field during
build (each `addField` call adds one field), but `removeFromInvertedIndex`
decremented it once per posting removed. A field with 5 unique terms
therefore decremented `fieldCount` by 5 instead of 1, skewing IDF
calculations after any removal. Fixed by collecting the distinct
(keyIdx, subIdx) pairs across all removed postings and decrementing
by that count.
2. Reverse doc→terms map: `removeFromInvertedIndex` used to iterate over
every term in the vocabulary to find postings belonging to the removed
document. Added `docTerms: Map<number, Set<string>>`, which is populated
during build and add and deleted during remove. Removal now visits only
the terms that belong to the document: O(terms_in_doc) instead of
O(vocabulary_size).
1 parent e550ab1 commit 54e702c
13 files changed
Lines changed: 312 additions & 29 deletions
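A minimal TypeScript sketch of both fixes, under stated assumptions: the `Posting` shape, the whitespace tokenizer, the `InvertedIndex` class, the `addField(keyIdx, subIdx, text)` signature, and the smoothed `idf` formula are all hypothetical, and `keyIdx` is assumed to be the document number with `subIdx` the field within it. Only the names `fieldCount`, `addField`, `removeFromInvertedIndex`, `docTerms`, `keyIdx`, and `subIdx` come from the commit message; the real code may differ.

```typescript
// Hypothetical posting shape (assumption): keyIdx = document number,
// subIdx = field index within that document.
interface Posting {
  keyIdx: number;
  subIdx: number;
  tf: number; // term frequency within the field
}

class InvertedIndex {
  postings = new Map<string, Posting[]>();
  // Reverse map (fix 2): document number -> set of terms it contains.
  docTerms = new Map<number, Set<string>>();
  // Number of indexed fields; used as N in the IDF formula.
  fieldCount = 0;

  // Hypothetical signature: index one field of one document.
  addField(keyIdx: number, subIdx: number, text: string): void {
    // Simple whitespace tokenizer (assumption, for illustration only).
    const counts = new Map<string, number>();
    for (const term of text.toLowerCase().split(/\s+/).filter((t) => t.length > 0)) {
      counts.set(term, (counts.get(term) ?? 0) + 1);
    }
    let terms = this.docTerms.get(keyIdx);
    if (terms === undefined) {
      terms = new Set<string>();
      this.docTerms.set(keyIdx, terms);
    }
    for (const [term, tf] of counts) {
      let list = this.postings.get(term);
      if (list === undefined) {
        list = [];
        this.postings.set(term, list);
      }
      list.push({ keyIdx, subIdx, tf });
      terms.add(term); // keep the reverse map in sync
    }
    this.fieldCount += 1; // exactly one field per addField call
  }

  removeFromInvertedIndex(keyIdx: number): void {
    const terms = this.docTerms.get(keyIdx);
    if (terms === undefined) return; // unknown document: nothing to do
    // Fix 1: count distinct (keyIdx, subIdx) pairs, not postings.
    // Caveat of this sketch: a field with no terms has no postings,
    // so it is not found (or subtracted) here.
    const removedFields = new Set<string>();
    // Fix 2: visit only this document's terms, not the whole vocabulary.
    for (const term of terms) {
      const list = this.postings.get(term);
      if (list === undefined) continue;
      const kept: Posting[] = [];
      for (const p of list) {
        if (p.keyIdx === keyIdx) removedFields.add(`${p.keyIdx}:${p.subIdx}`);
        else kept.push(p);
      }
      if (kept.length > 0) this.postings.set(term, kept);
      else this.postings.delete(term); // drop terms with no remaining postings
    }
    this.fieldCount -= removedFields.size;
    this.docTerms.delete(keyIdx);
  }

  // Smoothed IDF (assumption; not necessarily the project's formula).
  idf(term: string): number {
    const df = this.postings.get(term)?.length ?? 0;
    return Math.log((this.fieldCount + 1) / (df + 1)) + 1;
  }
}
```

For example, indexing fields (0, 0) "red apple red", (0, 1) "green apple", and (1, 0) "banana split" gives `fieldCount === 3` and four postings for document 0. Removing document 0 subtracts 2 (its distinct fields) and leaves 1; the pre-fix per-posting decrement would have subtracted 4 and left -1.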