Commit 80eefc9

Merge pull request #3 from grove-platform/gdcd-compare-page-counts
GDCD script for page counts, remove deprecated project
2 parents 2166b6a + ec1379c commit 80eefc9

3 files changed

Lines changed: 416 additions & 6 deletions

audit/gdcd/scripts/README.md

Lines changed: 123 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,15 +1,18 @@
11
# Log Parser Scripts
22

3-
This directory contains a script to parse GDCD log files and analyze page changes, specifically identifying moved pages vs truly new/removed pages and tracking applied usage examples.
3+
This directory contains scripts to parse GDCD log files and analyze page changes, specifically identifying moved pages vs truly new/removed pages and tracking applied usage examples.
44

55
## Files
66

7-
- `parse-log.go` - Main Go script that performs the log parsing and analysis
7+
- `parse-log.go` - Go script that performs log parsing and analysis for page changes
8+
- `compare-page-counts.go` - Go script that compares page counts from log files with audit-cli output
89
- `README.md` - This documentation file
910

1011
## Purpose
1112

12-
The script analyzes log files to distinguish between:
13+
### parse-log.go
14+
15+
The parse-log.go script analyzes log files to distinguish between:
1316

1417
1. **Moved Pages**: Pages that appear to be removed and created but are actually the same page moved to a new location within the same project
1518
2. **Maybe New Pages**: Pages that may be genuinely new additions
@@ -18,9 +21,43 @@ The script analyzes log files to distinguish between:
1821

1922
All results are reported with **project context** to clearly show which project each page belongs to.
2023

24+
### compare-page-counts.go
25+
26+
The compare-page-counts.go script compares page counts between:
27+
28+
1. **Log File**: Page counts extracted from GDCD log files (lines like "Found 78 docs pages for project csharp")
29+
2. **audit-cli**: Current page counts from running `audit-cli count pages --current-only --count-by-project`
30+
31+
This helps identify discrepancies between what was processed during a GDCD run and the current state of the documentation repository. Differences can indicate:
32+
- Pages added or removed since the log was generated
33+
- Project name mismatches between systems
34+
- Data inconsistencies that need investigation
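The log-side counts come from lines like "Found 78 docs pages for project csharp". A minimal sketch of extracting the project name and count from such a line is shown here. The regular expression and the `parsePageCount` helper are assumptions inferred from that single example line, not the script's actual implementation, and the timestamp prefix in `main` is purely illustrative.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pageCountLine matches messages such as "Found 78 docs pages for project csharp".
// The pattern is inferred from the README's example line (an assumption).
var pageCountLine = regexp.MustCompile(`Found (\d+) docs pages for project (\S+)`)

// parsePageCount returns the project name and page count found in one log line.
// ok is false when the line carries no page-count message.
func parsePageCount(line string) (project string, count int, ok bool) {
	m := pageCountLine.FindStringSubmatch(line)
	if m == nil {
		return "", 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return "", 0, false
	}
	return m[2], n, true
}

func main() {
	// The timestamp prefix is hypothetical; only the message text comes from the README.
	project, count, ok := parsePageCount("2025-12-10 17:58:47 Found 78 docs pages for project csharp")
	fmt.Println(project, count, ok)
}
```

Because the pattern is unanchored, it tolerates whatever prefix the logger adds before the message.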
35+
36+
The script automatically:
37+
1. Runs audit-cli once to identify projects that exist only in audit-cli (not in the log)
38+
2. Re-runs audit-cli with those projects excluded using the `--exclude-dirs` flag
39+
3. Compares the filtered results for a cleaner comparison
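The first of these steps amounts to a set difference between the two sources. The sketch below shows one way to compute it, assuming both sources have already been loaded into maps keyed by project name. `onlyInAudit` is a hypothetical helper, and the counts for `realm` and `guides` are made-up illustration values. How the resulting names are passed to `--exclude-dirs` is not shown because the source does not specify it.

```go
package main

import (
	"fmt"
	"sort"
)

// onlyInAudit returns, in sorted order, the projects that appear in
// auditCounts but have no entry in logCounts.
func onlyInAudit(logCounts, auditCounts map[string]int) []string {
	var extra []string
	for project := range auditCounts {
		if _, found := logCounts[project]; !found {
			extra = append(extra, project)
		}
	}
	sort.Strings(extra) // map iteration order is random, so sort for stable output
	return extra
}

func main() {
	logCounts := map[string]int{"csharp": 78, "java": 90}
	// The realm and guides counts are illustrative, not taken from a real run.
	auditCounts := map[string]int{"csharp": 77, "java": 89, "realm": 12, "guides": 5}
	fmt.Println(onlyInAudit(logCounts, auditCounts))
}
```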
40+
41+
The script includes built-in project name mappings to handle known differences between log file project names and audit-cli project names:
42+
- `scala` → `scala-driver`
43+
- `cloud-docs` → `atlas`
44+
- `c` → `c-driver`
45+
- `cloudgov` → `atlas-government`
46+
- `django` → `django-mongodb`
47+
- `docs` → `manual`
48+
- `docs-relational-migrator` → `relational-migrator`
49+
- `laravel` → `laravel-mongodb`
50+
- `pymongo` → `pymongo-driver`
51+
- `pymongo-arrow` → `pymongo-arrow-driver`
52+
- `mck` → `kubernetes`
53+
54+
The script also excludes deprecated projects from comparison:
55+
- `docs-k8s-operator` (deprecated)
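A minimal sketch of how the mappings and the deprecated-project exclusion might be applied to the log counts before comparison. It assumes the left-hand name in each pair is the log file name and the right-hand name is the audit-cli name, and that deprecated projects are dropped before renaming. Neither assumption is confirmed by the source, and `normalize` is a hypothetical helper.

```go
package main

import "fmt"

// logToAudit mirrors the mapping list above (log name -> audit-cli name).
// The direction is an assumption.
var logToAudit = map[string]string{
	"scala":                    "scala-driver",
	"cloud-docs":               "atlas",
	"c":                        "c-driver",
	"cloudgov":                 "atlas-government",
	"django":                   "django-mongodb",
	"docs":                     "manual",
	"docs-relational-migrator": "relational-migrator",
	"laravel":                  "laravel-mongodb",
	"pymongo":                  "pymongo-driver",
	"pymongo-arrow":            "pymongo-arrow-driver",
	"mck":                      "kubernetes",
}

// deprecated lists projects left out of the comparison.
var deprecated = map[string]bool{"docs-k8s-operator": true}

// normalize drops deprecated projects and renames the remaining log
// projects to their audit-cli names so both sources share the same keys.
func normalize(logCounts map[string]int) map[string]int {
	out := make(map[string]int, len(logCounts))
	for name, count := range logCounts {
		if deprecated[name] {
			continue
		}
		if mapped, ok := logToAudit[name]; ok {
			name = mapped
		}
		out[name] += count
	}
	return out
}

func main() {
	// The docs-k8s-operator count is illustrative only.
	fmt.Println(normalize(map[string]int{"docs": 1668, "c": 86, "docs-k8s-operator": 40, "csharp": 78}))
}
```

Names without a mapping, such as `csharp`, pass through unchanged.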
56+
2157
## Dependencies
2258

2359
- Go
60+
- `audit-cli` command (required for compare-page-counts.go) - must be available in your PATH
2461

2562
## How It Works
2663

@@ -59,7 +96,9 @@ moved, we must manually adjust the count of new applied usage examples to omit t
5996

6097
## Usage
6198

62-
**Important**: You must be in the scripts directory to run the Go script directly:
99+
**Important**: You must be in the scripts directory to run the Go scripts directly:
100+
101+
### parse-log.go
63102

64103
```bash
65104
# Navigate to the scripts directory first
@@ -70,9 +109,22 @@ go run parse-log.go ../logs/2025-09-24-18-01-30-app.log
70109
go run parse-log.go /absolute/path/to/your/log/file.log
71110
```
72111

112+
### compare-page-counts.go
113+
114+
```bash
115+
# Navigate to the scripts directory first
116+
cd /Your/Local/Filepath/tooling/audit/gdcd/scripts
117+
118+
# Then run the Go script with log file and docs repo path
119+
go run compare-page-counts.go ../logs/2025-12-10-17-58-47-app.log /path/to/docs-mongodb-internal
120+
go run compare-page-counts.go /absolute/path/to/log/file.log /absolute/path/to/docs/repo
121+
```
122+
73123
## Output Format
74124

75-
The script produces four sections:
125+
### parse-log.go
126+
127+
The parse-log.go script produces four sections:
76128

77129
### 1. MOVED PAGES
78130
```
@@ -108,6 +160,72 @@ APPLIED USAGE [pymongo]: data-formats|custom-types|type-codecs (1 applied usage
108160
Total new applied usage examples: 17
109161
```
110162

163+
### compare-page-counts.go
164+
165+
The compare-page-counts.go script compares page counts from the log file with the current state from audit-cli and produces output like:
166+
167+
```
168+
=== INITIAL COMPARISON ===
169+
Found 6 projects only in audit-cli: [app-services guides mongodb-analyzer mongodb-intellij mongodb-vscode realm]
170+
171+
Re-running audit-cli with exclusions...
172+
173+
=== PAGE COUNT COMPARISON ===
174+
175+
Projects with differences:
176+
--------------------------------------------------
177+
atlas Log: 777 Audit: 703 (diff: -74)
178+
atlas-architecture Log: 124 Audit: 121 (diff: -3)
179+
atlas-cli Log: 1276 Audit: 930 (diff: -346)
180+
atlas-operator Log: 58 Audit: 57 (diff: -1)
181+
c-driver Log: 86 Audit: 56 (diff: -30)
182+
cloud-manager Log: 490 Audit: 482 (diff: -8)
183+
compass Log: 117 Audit: 115 (diff: -2)
184+
cpp-driver Log: 56 Audit: 52 (diff: -4)
185+
csharp Log: 78 Audit: 77 (diff: -1)
186+
database-tools Log: 61 Audit: 53 (diff: -8)
187+
django-mongodb Log: 30 Audit: 27 (diff: -3)
188+
drivers Log: 21 Audit: 20 (diff: -1)
189+
entity-framework Log: 13 Audit: 14 (diff: +1)
190+
golang Log: 143 Audit: 68 (diff: -75)
191+
java Log: 90 Audit: 89 (diff: -1)
192+
java-rs Log: 56 Audit: 55 (diff: -1)
193+
kotlin Log: 88 Audit: 87 (diff: -1)
194+
kotlin-sync Log: 95 Audit: 66 (diff: -29)
195+
landing Log: 27 Audit: 23 (diff: -4)
196+
laravel-mongodb Log: 58 Audit: 57 (diff: -1)
197+
manual Log: 1668 Audit: 1596 (diff: -72)
198+
mongocli Log: 403 Audit: 17 (diff: -386)
199+
mongoid Log: 60 Audit: 59 (diff: -1)
200+
mongosync Log: 73 Audit: 88 (diff: +15)
201+
node Log: 77 Audit: 76 (diff: -1)
202+
ops-manager Log: 632 Audit: 628 (diff: -4)
203+
php-library Log: 259 Audit: 258 (diff: -1)
204+
pymongo-arrow-driver Log: 8 Audit: 9 (diff: +1)
205+
pymongo-driver Log: 67 Audit: 66 (diff: -1)
206+
relational-migrator Log: 135 Audit: 109 (diff: -26)
207+
ruby-driver Log: 91 Audit: 62 (diff: -29)
208+
rust Log: 76 Audit: 74 (diff: -2)
209+
scala-driver Log: 44 Audit: 43 (diff: -1)
210+
spark-connector Log: 16 Audit: 17 (diff: +1)
211+
voyage Log: 0 Audit: 1 (diff: +1)
212+
213+
=== SUMMARY ===
214+
Total projects: 43
215+
Matching counts: 8
216+
Different counts: 35
217+
218+
Total pages in log: 7869
219+
Total pages in audit-cli: 6771
220+
Difference: -1098
221+
```
222+
223+
This helps identify:
224+
- **Matching counts**: Projects where log and audit-cli agree
225+
- **Different counts**: Projects where counts differ (with the difference shown)
226+
- **Only in log**: Projects found in the log but not in audit-cli output (may indicate project name mismatches)
227+
- **Total pages**: Sum of all page counts from each source, excluding deprecated projects and projects only in audit-cli
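The per-project rows and the matching count in the sample output can be derived along these lines. This is a sketch under stated assumptions, not the script's code: `diffRow` and `compare` are hypothetical names, the difference is taken as audit-cli minus log (which agrees with the sample rows), and `example-project` is an invented entry used only to show a matching count.

```go
package main

import (
	"fmt"
	"sort"
)

// diffRow holds one line of the "Projects with differences" table.
type diffRow struct {
	Project    string
	Log, Audit int
	Diff       int // Audit minus Log
}

// compare returns the rows whose counts differ, sorted by project name,
// plus the number of projects whose counts match. Projects missing from
// auditCounts are skipped here; they belong under "Only in log".
func compare(logCounts, auditCounts map[string]int) ([]diffRow, int) {
	var rows []diffRow
	matching := 0
	for project, logN := range logCounts {
		auditN, found := auditCounts[project]
		if !found {
			continue
		}
		if logN == auditN {
			matching++
			continue
		}
		rows = append(rows, diffRow{Project: project, Log: logN, Audit: auditN, Diff: auditN - logN})
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i].Project < rows[j].Project })
	return rows, matching
}

func main() {
	logCounts := map[string]int{"atlas": 777, "entity-framework": 13, "example-project": 10}
	auditCounts := map[string]int{"atlas": 703, "entity-framework": 14, "example-project": 10}
	rows, matching := compare(logCounts, auditCounts)
	for _, r := range rows {
		// %+d prints an explicit sign, as in "(diff: -74)" and "(diff: +1)".
		fmt.Printf("%-25s Log: %-6d Audit: %-6d (diff: %+d)\n", r.Project, r.Log, r.Audit, r.Diff)
	}
	fmt.Println("Matching counts:", matching)
}
```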
228+
111229
## Log Format Requirements
112230

113231
The scripts expect log lines in the following formats:
