Commit 58b5a85

Merge pull request #5 from grove-platform/create-url-list
Add a tool to parse metrics output and extract URLs within ranges by page view count
2 parents 80eefc9 + 9bae423 commit 58b5a85

14 files changed

Lines changed: 1175 additions & 0 deletions

create-url-list/.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
output/
2+
test-output/
3+
create-url-list
4+
config.yml

create-url-list/README.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
1+
# create-url-list
2+
3+
A Go CLI tool that extracts and ranks URLs by pageviews from CSV data containing page analytics.
4+
5+
## Build
6+
7+
```bash
8+
go build
9+
```
10+
11+
## Usage
12+
13+
```bash
14+
./create-url-list [--quiet] <csv-file-path> [range] [output-path]
15+
```
16+
17+
### Arguments
18+
19+
1. **--quiet** (optional): Suppress all informational output (warnings, info messages, and success messages). Only errors will be displayed. Useful when using this tool in pipelines.
20+
2. **csv-file-path** (required): Path to the input CSV file
21+
3. **range** (optional): Rank range in format `min-max` (e.g., `1-50`). Default: `1-250`
22+
- Specifies which ranked entries to include in the output
23+
- `1-50` means "get the top 50 pages by pageviews"
24+
- `51-100` means "get pages ranked 51-100 by pageviews"
25+
4. **output-path** (optional): Custom output file path. Default: `output/YYYY-MM-DD_HH-MM-SS_range.csv`
26+
27+
### Examples
28+
29+
```bash
30+
# Get top 250 pages by pageviews (default)
31+
./create-url-list data.csv
32+
33+
# Get top 50 pages by pageviews
34+
./create-url-list data.csv 1-50
35+
36+
# Get pages ranked 101-200 by pageviews
37+
./create-url-list data.csv 101-200
38+
39+
# Specify custom output path
40+
./create-url-list data.csv 1-100 results/top-100.csv
41+
42+
# Use in a pipeline with quiet mode (no informational output)
43+
./create-url-list --quiet data.csv 1-50 output.csv
44+
```
45+
46+
## Input Requirements
47+
48+
The input CSV file must contain the following columns:
49+
- `Page`: URL of the page (must start with `www.`)
50+
- `Measure Names`: Type of metric
51+
- `Measure Values`: Integer value of the metric
52+
53+
The tool will:
54+
- Collect all rows where `Measure Names` equals `Pageviews`
55+
- Rank them by `Measure Values` (highest to lowest)
56+
- Extract entries within the specified rank range
57+
- Validate that URLs start with `www.` (to ensure consistent format without `https://`)
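
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: the `entry` type and `rankPageviews` function are hypothetical names, the `www.` check is applied only to `Pageviews` rows, and a range that extends past the end of the data is silently clamped.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// entry is a hypothetical record type: one page and its pageview count.
type entry struct {
	URL       string
	Pageviews int
}

// rankPageviews collects Pageviews rows, sorts them from highest to lowest,
// and returns the entries ranked lo..hi (1-based, inclusive).
func rankPageviews(csvText string, lo, hi int) ([]entry, error) {
	if lo < 1 || hi < lo {
		return nil, fmt.Errorf("invalid range %d-%d", lo, hi)
	}
	rows, err := csv.NewReader(strings.NewReader(csvText)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, fmt.Errorf("empty CSV")
	}
	// Locate the required columns by header name.
	col := map[string]int{}
	for i, name := range rows[0] {
		col[strings.TrimSpace(name)] = i
	}
	for _, name := range []string{"Page", "Measure Names", "Measure Values"} {
		if _, ok := col[name]; !ok {
			return nil, fmt.Errorf("missing required column %q", name)
		}
	}
	var entries []entry
	for _, row := range rows[1:] {
		if row[col["Measure Names"]] != "Pageviews" {
			continue // other metrics are ignored
		}
		url := row[col["Page"]]
		if !strings.HasPrefix(url, "www.") {
			return nil, fmt.Errorf("unexpected URL format %q", url)
		}
		n, err := strconv.Atoi(row[col["Measure Values"]])
		if err != nil {
			return nil, fmt.Errorf("non-integer value for %q: %w", url, err)
		}
		entries = append(entries, entry{URL: url, Pageviews: n})
	}
	sort.SliceStable(entries, func(i, j int) bool {
		return entries[i].Pageviews > entries[j].Pageviews
	})
	if lo > len(entries) {
		return nil, nil // range lies entirely past the data
	}
	if hi > len(entries) {
		hi = len(entries)
	}
	return entries[lo-1 : hi], nil
}

func main() {
	data := "Page,Measure Names,Measure Values\n" +
		"www.example.com/a,Pageviews,10\n" +
		"www.example.com/b,Pageviews,30\n" +
		"www.example.com/c,Pageviews,20\n"
	top, err := rankPageviews(data, 1, 2)
	fmt.Println(top, err)
}
```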
58+
59+
## Output
60+
61+
By default, the output CSV file contains two columns (no headers):
62+
1. Rank (integer) - Position in the ranking (1 = highest pageviews)
63+
2. URL (string) - Page URL
64+
65+
Rows are sorted by rank in ascending order (rank 1 first).
66+
67+
## Configuration (Optional)
68+
69+
You can create a `config.yml` file in the same directory as the executable to configure URL filtering and output format:
70+
71+
```yaml
72+
# List of URLs to exclude from the output
73+
ignore_urls:
74+
- www.example.com/page-to-ignore
75+
- www.example.com/another-page-to-ignore
76+
77+
# Whether to show pageviews as a third column in the output
78+
show_pageviews: true
79+
80+
# Whether to include headers in the output CSV
81+
show_headers: true
82+
```
83+
84+
### Configuration Options
85+
86+
**`ignore_urls`** (optional)
87+
- URLs listed here will be completely removed from the ranking (not just hidden)
88+
- Excluded before ranking is calculated, so remaining URLs move up without gaps
89+
- For example, if you ignore rank #2, the former rank #3 becomes the new rank #2
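
To illustrate why filtering before ranking leaves no gaps, here is a small sketch. The `page` and `rankedPage` types and the `rankExcluding` function are hypothetical names chosen for this example and are not taken from the tool's source.

```go
package main

import (
	"fmt"
	"sort"
)

// page is a hypothetical input record.
type page struct {
	URL       string
	Pageviews int
}

// rankedPage is a hypothetical output record with its 1-based rank.
type rankedPage struct {
	Rank      int
	URL       string
	Pageviews int
}

// rankExcluding drops ignored URLs first and assigns ranks afterwards,
// so the remaining pages are numbered consecutively.
func rankExcluding(pages []page, ignoreURLs []string) []rankedPage {
	ignored := make(map[string]bool, len(ignoreURLs))
	for _, u := range ignoreURLs {
		ignored[u] = true
	}
	kept := make([]page, 0, len(pages))
	for _, p := range pages {
		if !ignored[p.URL] {
			kept = append(kept, p)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].Pageviews > kept[j].Pageviews
	})
	ranked := make([]rankedPage, len(kept))
	for i, p := range kept {
		ranked[i] = rankedPage{Rank: i + 1, URL: p.URL, Pageviews: p.Pageviews}
	}
	return ranked
}

func main() {
	pages := []page{
		{"www.example.com/first", 300},
		{"www.example.com/page-to-ignore", 200},
		{"www.example.com/third", 100},
	}
	for _, r := range rankExcluding(pages, []string{"www.example.com/page-to-ignore"}) {
		fmt.Println(r.Rank, r.URL)
	}
}
```

In this sample, the page that would have been rank 3 is reported as rank 2 because the ignored URL never enters the ranking.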
90+
91+
**`show_pageviews`** (optional, default: `false`)
92+
- When `false`: Output contains 2 columns (rank, URL)
93+
- When `true`: Output contains 3 columns (rank, URL, pageviews)
94+
95+
**`show_headers`** (optional, default: `false`)
96+
- When `false`: No headers in output (just data rows)
97+
- When `true`: Adds header row with column names
98+
- Without pageviews: `Rank,Page`
99+
- With pageviews: `Rank,Page,Number of Page Views`
100+
101+
The config file is optional. If it doesn't exist or can't be loaded, the tool will display a warning and continue with default settings.
102+
103+
## Error Handling
104+
105+
The tool exits with code 1 and displays an error message if:
106+
- Input file path is invalid or file doesn't exist
107+
- Required columns are missing from the CSV
108+
- URL structure doesn't match expected format (must start with `www.`)
109+
- Range format is invalid

create-url-list/config.yml.example

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
# Configuration file for create-url-list
2+
# Copy this file to config.yml and customize as needed
3+
4+
# List of URLs to exclude from the output
5+
# These URLs will be filtered out before ranking, so they won't create gaps
6+
# in the ranking numbers
7+
ignore_urls:
8+
- www.example.com/page-to-ignore
9+
- www.example.com/another-page-to-ignore
10+
11+
# Whether to show pageviews as a third column in the output
12+
# Default: false (output only rank and URL)
13+
# When true: output rank, URL, and pageviews
14+
show_pageviews: false
15+
16+
# Whether to include headers in the output CSV
17+
# Default: false (no headers)
18+
# When true: adds "Rank", "Page", and optionally "Number of Page Views" as headers
19+
show_headers: false
20+

create-url-list/go.mod

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
module create-url-list
2+
3+
go 1.25.4
4+
5+
require gopkg.in/yaml.v3 v3.0.1 // indirect

create-url-list/go.sum

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
2+
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
3+
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
