Commit 3c15871

Merge pull request #12 from grove-platform/add-csv-utils
create-url-list: Add debug utils to fix CSV issues
2 parents 93b575e + 70a6c52 commit 3c15871

7 files changed

Lines changed: 407 additions & 1 deletion

create-url-list/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -2,3 +2,5 @@ output/
 test-output/
 create-url-list
 config.yml
+utils/convert-csv
+utils/debug-csv

create-url-list/README.md

Lines changed: 81 additions & 0 deletions
@@ -117,3 +117,84 @@ The tool exits with code 1 and displays an error message if:
- Required columns are missing from the CSV
- URL structure doesn't match expected format (must start with `www.`)
- Range format is invalid

## Troubleshooting Utilities

The `utils/` directory contains helper tools for diagnosing and fixing CSV format issues.

### CSV Format Debugger

If you're getting a "missing required columns" error, use the debug tool to inspect your CSV file:

```bash
# Build the debug tool
cd utils
go build -o debug-csv debug-csv.go

# Run it on your CSV file
./debug-csv /path/to/your/file.csv
```

The debug tool shows:
- How many columns were detected
- The exact name of each column (in quotes, to reveal whitespace)
- The byte representation of each name, to reveal hidden characters (BOM, special encoding, etc.)
- Whether each required column was found
- Warnings about common issues (BOM, extra whitespace, etc.)

**Example output:**
```
Found 5 columns in header:

Column 0: "Page"
Bytes: [80 97 103 101]
Length: 4
✓ Matches required column 'Page'

...

Column 2: "Measure Names"
Bytes: [77 101 97 115 117 114 101 32 78 97 109 101 115]
Length: 13
✓ Matches required column 'Measure Names'
...
```
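The source of `debug-csv.go` is not shown on this page. As a rough illustration of the checks listed above, here is a minimal Go sketch, not the tool's actual code: `describeColumn` and `missingColumns` are hypothetical helpers, the output wording only approximates the example, and the header row is assumed to be already parsed into a `[]string`.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredColumns lists the header names create-url-list expects.
var requiredColumns = []string{"Page", "Measure Names", "Measure Values"}

// describeColumn (hypothetical) reports one header entry: the quoted name,
// its raw bytes, and its length in bytes, plus a whitespace warning.
func describeColumn(i int, name string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Column %d: %q\n", i, name) // %q escapes invisible runes such as a BOM
	fmt.Fprintf(&b, "Bytes: %v\n", []byte(name))
	fmt.Fprintf(&b, "Length: %d\n", len(name))
	if name != strings.TrimSpace(name) {
		b.WriteString("Warning: leading or trailing whitespace\n")
	}
	return b.String()
}

// missingColumns (hypothetical) returns the required names that do not
// appear verbatim in header, in the order they are required.
func missingColumns(header, required []string) []string {
	present := make(map[string]bool, len(header))
	for _, name := range header {
		present[name] = true
	}
	var missing []string
	for _, name := range required {
		if !present[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// A header whose first column starts with a UTF-8 BOM and ends with a space.
	header := []string{"\ufeffPage ", "Measure Names", "Measure Values"}
	for i, name := range header {
		fmt.Print(describeColumn(i, name))
		fmt.Println()
	}
	fmt.Println("Missing required columns:", missingColumns(header, requiredColumns))
}
```

Quoting with `%q` escapes an invisible BOM (U+FEFF) as `\ufeff`, which explains why an exact match against `Page` can fail even when the column looks correct in a spreadsheet.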

### CSV Format Converter

If your CSV file is UTF-16 encoded and tab-delimited (common with Excel/Tableau exports), use the converter tool. It assumes both properties, so it will not correctly convert a file that is UTF-8 encoded or already comma-delimited.

```bash
# Build the converter tool
cd utils
go build -o convert-csv convert-csv.go

# Convert your file
./convert-csv /path/to/input.csv /path/to/output.csv
```

This tool converts:
- **From:** UTF-16 encoding with tab delimiters
- **To:** UTF-8 encoding with comma delimiters (standard CSV)

**Example:**
```bash
# Convert a Tableau export
./convert-csv ~/Downloads/tableau-export.csv ~/temp/converted.csv

# Then use the converted file
cd ..
./create-url-list ~/temp/converted.csv 1-250 output.csv
```

### Common CSV Issues

1. **UTF-16 encoding with BOM** - The file starts with a byte order mark (bytes `255 254`, i.e. `FF FE`, the UTF-16 little-endian BOM)
   - **Solution:** Use the `convert-csv` tool

2. **Tab-delimited instead of comma-delimited** - Columns are separated by tabs
   - **Solution:** Use the `convert-csv` tool, provided the file is also UTF-16 encoded

3. **Extra whitespace in column names** - A column is named `" Page "` instead of `"Page"`
   - **Solution:** Edit the CSV header row to remove the extra spaces

4. **Wrong column names** - Different capitalization or spelling
   - **Solution:** Rename the columns to match exactly: `Page`, `Measure Names`, `Measure Values`
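To tell which of the first two issues applies, you can look at the file's first bytes and at the separators in its header. The Go sketch below is illustrative only. `detectBOM` and `guessDelimiter` are hypothetical helpers, and the delimiter check assumes the header line has already been decoded to a Go string.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// detectBOM (hypothetical) names the byte order mark at the start of data,
// or returns "" if there is none. UTF-32LE also begins with FF FE; that
// case is not distinguished here.
func detectBOM(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8"
	case bytes.HasPrefix(data, []byte{0xFF, 0xFE}):
		return "UTF-16LE"
	case bytes.HasPrefix(data, []byte{0xFE, 0xFF}):
		return "UTF-16BE"
	default:
		return ""
	}
}

// guessDelimiter (hypothetical) picks tab or comma for a decoded header line
// by counting occurrences of each.
func guessDelimiter(headerLine string) rune {
	if strings.Count(headerLine, "\t") > strings.Count(headerLine, ",") {
		return '\t'
	}
	return ','
}

func main() {
	// First bytes of a UTF-16LE export: BOM, then "Pa" as 2-byte code units.
	sample := []byte{0xFF, 0xFE, 'P', 0, 'a', 0}
	fmt.Println("BOM:", detectBOM(sample))
	fmt.Printf("Delimiter: %q\n", guessDelimiter("Page\tMeasure Names\tMeasure Values"))
}
```

Counting separators is only a heuristic, because a quoted header field containing a comma would skew it. Treat the result as a hint and confirm it with `debug-csv`.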

create-url-list/go.mod

Lines changed: 4 additions & 1 deletion
@@ -2,4 +2,7 @@ module create-url-list
 
 go 1.25.4
 
-require gopkg.in/yaml.v3 v3.0.1 // indirect
+require (
+	golang.org/x/text v0.33.0 // indirect
+	gopkg.in/yaml.v3 v3.0.1 // indirect
+)

create-url-list/go.sum

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+golang.org/x/text v0.33.0 h1:B3njUFyqtHDUI5jMn1YIr5B0IE2U0qck04r6d4KPAxE=
+golang.org/x/text v0.33.0/go.mod h1:LuMebE6+rBincTi9+xWTY8TztLzKHc/9C1uBCG27+q8=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
 gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

create-url-list/utils/README.md

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
# Utility Tools for create-url-list

This directory contains diagnostic and conversion tools for troubleshooting CSV format issues with `create-url-list`.

## Tools

### debug-csv - CSV Format Inspector

Inspects a CSV file to diagnose format issues and verify column names.

**Build:**
```bash
go build -o debug-csv debug-csv.go
```

**Usage:**
```bash
./debug-csv <csv-file-path>
```

**What it shows:**
- Number of columns detected
- Exact column names (in quotes, to reveal whitespace)
- Byte representation of each column name (to detect encoding issues)
- Length of each column name
- Whether the required columns (`Page`, `Measure Names`, `Measure Values`) are present
- Warnings about common issues:
  - BOM (Byte Order Mark) at the start of the file
  - Leading/trailing whitespace in column names

**Example:**
```bash
./debug-csv ~/Downloads/analytics-data.csv
```

**Sample output:**
```
Found 5 columns in header:

Column 0: "Page"
Bytes: [80 97 103 101]
Length: 4
✓ Matches required column 'Page'

Column 1: "Page Subsite"
Bytes: [80 97 103 101 32 83 117 98 115 105 116 101]
Length: 12

Column 2: "Measure Names"
Bytes: [77 101 97 115 117 114 101 32 78 97 109 101 115]
Length: 13
✓ Matches required column 'Measure Names'

Column 3: "Measure Values"
Bytes: [77 101 97 115 117 114 101 32 86 97 108 117 101 115]
Length: 14
✓ Matches required column 'Measure Values'

Required columns check:
✓ 'Page' found
✓ 'Measure Names' found
✓ 'Measure Values' found

✓ All required columns present!
```

---

### convert-csv - CSV Format Converter

Converts UTF-16 tab-delimited CSV files to UTF-8 comma-delimited format (standard CSV).

**Build:**
```bash
go build -o convert-csv convert-csv.go
```

**Usage:**
```bash
./convert-csv <input-file> <output-file>
```

**What it does:**
- Reads UTF-16 encoded files, with or without a BOM (without a BOM, little-endian byte order is assumed)
- Parses tab-delimited data
- Outputs standard UTF-8 comma-delimited CSV

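The BOM rule above can be illustrated with the standard library alone. This is not how `convert-csv` works internally, since it uses `golang.org/x/text/encoding/unicode`. It is a hypothetical `decodeUTF16` helper that assumes the whole input fits in memory and applies the same policy: a leading `FF FE` or `FE FF` selects the byte order and is dropped, and otherwise little-endian is assumed.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 (hypothetical) converts UTF-16 bytes to a Go (UTF-8) string.
// A leading BOM selects the byte order and is stripped; without one,
// little-endian is assumed.
func decodeUTF16(data []byte) (string, error) {
	var order binary.ByteOrder = binary.LittleEndian
	switch {
	case len(data) >= 2 && data[0] == 0xFF && data[1] == 0xFE:
		data = data[2:]
	case len(data) >= 2 && data[0] == 0xFE && data[1] == 0xFF:
		order = binary.BigEndian
		data = data[2:]
	}
	if len(data)%2 != 0 {
		return "", errors.New("odd number of bytes in UTF-16 input")
	}
	units := make([]uint16, len(data)/2)
	for i := range units {
		units[i] = order.Uint16(data[2*i:])
	}
	// utf16.Decode joins surrogate pairs and maps invalid ones to U+FFFD.
	return string(utf16.Decode(units)), nil
}

func main() {
	inputs := [][]byte{
		{0xFF, 0xFE, 'P', 0, 'a', 0, 'g', 0, 'e', 0}, // little-endian with BOM
		{0xFE, 0xFF, 0, 'P', 0, 'a', 0, 'g', 0, 'e'}, // big-endian with BOM
		{'P', 0, 'a', 0, 'g', 0, 'e', 0},             // no BOM: little-endian assumed
	}
	for _, in := range inputs {
		s, err := decodeUTF16(in)
		fmt.Printf("%q %v\n", s, err)
	}
}
```

For simplicity, this helper rejects input with an odd number of bytes instead of trying to recover from it.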
**Example:**
```bash
# Convert a Tableau or Excel export
./convert-csv ~/Downloads/tableau-export.csv ~/temp/converted.csv

# Then use with create-url-list
cd ..
./create-url-list ~/temp/converted.csv 1-250 output.csv
```

**Sample output:**
```
Successfully converted 51396 rows from /path/to/input.csv to /path/to/output.csv
Input format: UTF-16 tab-delimited
Output format: UTF-8 comma-delimited
```

---

## Common Workflow

When you encounter a "missing required columns" error:

1. **Diagnose the issue:**
   ```bash
   ./debug-csv /path/to/problematic-file.csv
   ```

2. **If the file is UTF-16 encoded and tab-delimited, convert it:**
   ```bash
   ./convert-csv /path/to/problematic-file.csv /path/to/fixed-file.csv
   ```

3. **Verify the conversion worked:**
   ```bash
   ./debug-csv /path/to/fixed-file.csv
   ```

4. **Use the fixed file with create-url-list:**
   ```bash
   cd ..
   ./create-url-list /path/to/fixed-file.csv 1-250 output.csv
   ```

## Dependencies

The `convert-csv` tool requires the `golang.org/x/text` module (its `encoding/unicode` and `transform` packages):

```bash
go get golang.org/x/text/encoding/unicode
go get golang.org/x/text/transform
```

Because `golang.org/x/text` is already listed in `go.mod` and `go.sum`, `go build` downloads it automatically. You only need the `go get` commands if those entries are missing.

create-url-list/utils/convert-csv.go

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
package main

import (
	"encoding/csv"
	"fmt"
	"os"

	"golang.org/x/text/encoding/unicode"
	"golang.org/x/text/transform"
)

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintf(os.Stderr, "Usage: %s <input-file> <output-file>\n", os.Args[0])
		fmt.Fprintf(os.Stderr, "Converts UTF-16 tab-delimited CSV to UTF-8 comma-delimited CSV\n")
		os.Exit(1)
	}

	inputPath := os.Args[1]
	outputPath := os.Args[2]

	// Open input file
	inputFile, err := os.Open(inputPath)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error opening input file: %v\n", err)
		os.Exit(1)
	}
	defer inputFile.Close()

	// Create UTF-16 decoder: little-endian by default; a leading BOM
	// overrides the byte order and is stripped from the decoded text
	decoder := unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewDecoder()
	reader := transform.NewReader(inputFile, decoder)

	// Create CSV reader with tab delimiter
	csvReader := csv.NewReader(reader)
	csvReader.Comma = '\t'
	csvReader.LazyQuotes = true

	// Read all records
	records, err := csvReader.ReadAll()
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error reading CSV: %v\n", err)
		os.Exit(1)
	}

	// Create output file
	outputFile, err := os.Create(outputPath)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error creating output file: %v\n", err)
		os.Exit(1)
	}

	// Create CSV writer (defaults to comma delimiter)
	csvWriter := csv.NewWriter(outputFile)

	// Write all records
	for _, record := range records {
		if err := csvWriter.Write(record); err != nil {
			fmt.Fprintf(os.Stderr, "Error writing record: %v\n", err)
			os.Exit(1)
		}
	}

	// Flush buffered output and check for write errors before reporting success
	csvWriter.Flush()
	if err := csvWriter.Error(); err != nil {
		fmt.Fprintf(os.Stderr, "Error writing output file: %v\n", err)
		os.Exit(1)
	}
	if err := outputFile.Close(); err != nil {
		fmt.Fprintf(os.Stderr, "Error closing output file: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("Successfully converted %d rows from %s to %s\n", len(records), inputPath, outputPath)
	fmt.Printf("Input format: UTF-16 tab-delimited\n")
	fmt.Printf("Output format: UTF-8 comma-delimited\n")
}
