audit/gdcd/scripts/README.md: 123 additions & 5 deletions
@@ -1,15 +1,18 @@
 # Log Parser Scripts

-This directory contains a script to parse GDCD log files and analyze page changes, specifically identifying moved pages vs truly new/removed pages and tracking applied usage examples.
+This directory contains scripts to parse GDCD log files and analyze page changes, specifically identifying moved pages vs truly new/removed pages and tracking applied usage examples.

 ## Files

-- `parse-log.go` - Main Go script that performs the log parsing and analysis
+- `parse-log.go` - Go script that performs log parsing and analysis for page changes
+- `compare-page-counts.go` - Go script that compares page counts from log files with audit-cli output
 - `README.md` - This documentation file

 ## Purpose

-The script analyzes log files to distinguish between:
+### parse-log.go
+
+The parse-log.go script analyzes log files to distinguish between:

 1. **Moved Pages**: Pages that appear to be removed and created but are actually the same page moved to a new location within the same project
 2. **Maybe New Pages**: Pages that may be genuinely new additions
@@ -18,9 +21,43 @@ The script analyzes log files to distinguish between:

 All results are reported with **project context** to clearly show which project each page belongs to.

+### compare-page-counts.go
+
+The compare-page-counts.go script compares page counts between:
+
+1. **Log File**: Page counts extracted from GDCD log files (lines like "Found 78 docs pages for project csharp")
+2. **audit-cli**: Current page counts from running `audit-cli count pages --current-only --count-by-project`
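
The log-side extraction described above could be sketched roughly as follows. This is a minimal illustration, not the code from `compare-page-counts.go`: the helper name `parsePageCountLine` and the regular expression are assumptions, based only on the single example line quoted in the README ("Found 78 docs pages for project csharp").

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pageCountPattern matches log lines shaped like the README's example:
// "Found 78 docs pages for project csharp".
// Assumption: project names contain no whitespace; the real log format
// may differ beyond this one example.
var pageCountPattern = regexp.MustCompile(`Found (\d+) docs pages for project (\S+)`)

// parsePageCountLine is a hypothetical helper (not taken from the PR).
// It returns the project name and page count when the line matches,
// and ok=false for any other line.
func parsePageCountLine(line string) (project string, count int, ok bool) {
	m := pageCountPattern.FindStringSubmatch(line)
	if m == nil {
		return "", 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return "", 0, false
	}
	return m[2], n, true
}

func main() {
	lines := []string{
		"Found 78 docs pages for project csharp",
		"some unrelated log line",
	}
	for _, line := range lines {
		if project, count, ok := parsePageCountLine(line); ok {
			fmt.Printf("%s: %d\n", project, count)
		}
	}
}
```

Because the pattern is not anchored to the start of the line, it would also match lines that carry a timestamp or log-level prefix, which seems likely for real GDCD logs but is not confirmed by the README.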
+
+This helps identify discrepancies between what was processed during a GDCD run and the current state of the documentation repository. Differences can indicate:
+- Pages added or removed since the log was generated
+- Project name mismatches between systems
+- Data inconsistencies that need investigation
+
+The script automatically:
+1. Runs audit-cli once to identify projects that exist only in audit-cli (not in the log)
+2. Re-runs audit-cli with those projects excluded using the `--exclude-dirs` flag
+3. Compares the filtered results for a cleaner comparison
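
The comparison logic behind these three steps might look something like the sketch below. It works on two in-memory maps and does not invoke audit-cli. The function names `onlyInAudit` and `countDiffs`, and all sample values other than `csharp: 78`, are hypothetical; how the actual script parses audit-cli output and formats the `--exclude-dirs` argument is not shown in this diff.

```go
package main

import (
	"fmt"
	"sort"
)

// onlyInAudit returns the projects present in auditCounts but absent from
// logCounts, sorted by name. In the described flow these would be the
// projects excluded on the second audit-cli run.
func onlyInAudit(logCounts, auditCounts map[string]int) []string {
	var extra []string
	for project := range auditCounts {
		if _, ok := logCounts[project]; !ok {
			extra = append(extra, project)
		}
	}
	sort.Strings(extra)
	return extra
}

// countDiffs returns audit count minus log count for every project that
// appears in both maps with different counts. Projects found only in the
// log are ignored in this sketch.
func countDiffs(logCounts, auditCounts map[string]int) map[string]int {
	diffs := make(map[string]int)
	for project, logN := range logCounts {
		if auditN, ok := auditCounts[project]; ok && auditN != logN {
			diffs[project] = auditN - logN
		}
	}
	return diffs
}

func main() {
	// Illustrative values only; "golang" and "extra-project" are made up.
	logCounts := map[string]int{"csharp": 78, "golang": 40}
	auditCounts := map[string]int{"csharp": 80, "golang": 40, "extra-project": 12}

	excluded := onlyInAudit(logCounts, auditCounts)
	fmt.Println("projects to exclude on the second run:", excluded)

	// Stand-in for re-running audit-cli with --exclude-dirs:
	// drop the excluded projects from the audit-side map.
	for _, project := range excluded {
		delete(auditCounts, project)
	}
	fmt.Println("count differences (audit - log):", countDiffs(logCounts, auditCounts))
}
```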
+
+The script includes built-in project name mappings to handle known differences between log file project names and audit-cli project names: