| 1 | +# create-url-list |
| 2 | + |
| 3 | +A Go CLI tool that extracts and ranks URLs by pageviews from CSV data containing page analytics. |
| 4 | + |
| 5 | +## Build |
| 6 | + |
| 7 | +```bash |
| 8 | +go build |
| 9 | +``` |
| 10 | + |
| 11 | +## Usage |
| 12 | + |
| 13 | +```bash |
| 14 | +./create-url-list [--quiet] <csv-file-path> [range] [output-path] |
| 15 | +``` |
| 16 | + |
| 17 | +### Arguments |
| 18 | + |
| 19 | +1. **--quiet** (optional): Suppress all informational output (warnings, info messages, and success messages). Only errors will be displayed. Useful when using this tool in pipelines. |
| 20 | +2. **csv-file-path** (required): Path to the input CSV file |
| 21 | +3. **range** (optional): Rank range in format `min-max` (e.g., `1-50`). Default: `1-250` |
| 22 | + - Specifies which ranked entries to include in the output |
| 23 | + - `1-50` means "get the top 50 pages by pageviews" |
| 24 | + - `51-100` means "get pages ranked 51-100 by pageviews" |
| 25 | +4. **output-path** (optional): Custom output file path. Default: `output/YYYY-MM-DD_HH-MM-SS_range.csv` |
| 26 | + |
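
The `min-max` range argument described above could be parsed along these lines. This is a minimal sketch, not the tool's actual code: `parseRange` is a hypothetical helper, and the rules that ranks start at 1 and that `min` must not exceed `max` are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange turns a string such as "1-50" into inclusive rank bounds.
// Hypothetical helper: the real tool may validate differently.
func parseRange(s string) (lo, hi int, err error) {
	left, right, ok := strings.Cut(s, "-")
	if !ok {
		return 0, 0, fmt.Errorf("invalid range %q: expected min-max", s)
	}
	if lo, err = strconv.Atoi(left); err != nil {
		return 0, 0, fmt.Errorf("invalid range %q: %w", s, err)
	}
	if hi, err = strconv.Atoi(right); err != nil {
		return 0, 0, fmt.Errorf("invalid range %q: %w", s, err)
	}
	// Assumption: ranks are 1-based and the range must not be reversed.
	if lo < 1 || hi < lo {
		return 0, 0, fmt.Errorf("invalid range %q: need 1 <= min <= max", s)
	}
	return lo, hi, nil
}

func main() {
	lo, hi, err := parseRange("1-250")
	fmt.Println(lo, hi, err)
}
```

Returning an error rather than exiting inside the helper keeps the exit-code decision in `main`.
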
| 27 | +### Examples |
| 28 | + |
| 29 | +```bash |
| 30 | +# Get top 250 pages by pageviews (default) |
| 31 | +./create-url-list data.csv |
| 32 | + |
| 33 | +# Get top 50 pages by pageviews |
| 34 | +./create-url-list data.csv 1-50 |
| 35 | + |
| 36 | +# Get pages ranked 101-200 by pageviews |
| 37 | +./create-url-list data.csv 101-200 |
| 38 | + |
| 39 | +# Specify custom output path |
| 40 | +./create-url-list data.csv 1-100 results/top-100.csv |
| 41 | + |
| 42 | +# Use in a pipeline with quiet mode (no informational output) |
| 43 | +./create-url-list --quiet data.csv 1-50 output.csv |
| 44 | +``` |
| 45 | + |
| 46 | +## Input Requirements |
| 47 | + |
| 48 | +The input CSV file must contain the following columns: |
| 49 | +- `Page`: URL of the page (must start with `www.`) |
| 50 | +- `Measure Names`: Type of metric |
| 51 | +- `Measure Values`: Integer value of the metric |
| 52 | + |
| 53 | +The tool will: |
| 54 | +- Collect all rows where `Measure Names` equals `Pageviews` |
| 55 | +- Rank them by `Measure Values` (highest to lowest) |
| 56 | +- Extract entries within the specified rank range |
| 57 | +- Validate that URLs start with `www.` (to ensure consistent format without `https://`) |
| 58 | + |
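
The filter, rank, and slice steps listed above can be sketched as follows. The `row` and `ranked` types and the `rankPageviews` function are hypothetical names for illustration. Two behaviours are assumptions rather than documented facts: the `www.` check is applied only to `Pageviews` rows, and ties keep their input order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// row mirrors the three required input columns (hypothetical type).
type row struct {
	Page    string
	Measure string
	Value   int
}

// ranked is one output entry (hypothetical type).
type ranked struct {
	Rank int
	Page string
}

// rankPageviews keeps the Pageviews rows, sorts them from highest to
// lowest value, and returns the entries whose rank lies in [lo, hi].
func rankPageviews(rows []row, lo, hi int) ([]ranked, error) {
	var views []row
	for _, r := range rows {
		if r.Measure != "Pageviews" {
			continue
		}
		// Assumption: only Pageviews rows are validated.
		if !strings.HasPrefix(r.Page, "www.") {
			return nil, fmt.Errorf("unexpected URL format: %q", r.Page)
		}
		views = append(views, r)
	}
	// Stable sort so that ties keep their input order (an assumption).
	sort.SliceStable(views, func(i, j int) bool {
		return views[i].Value > views[j].Value
	})
	var out []ranked
	for i, r := range views {
		if rank := i + 1; rank >= lo && rank <= hi {
			out = append(out, ranked{Rank: rank, Page: r.Page})
		}
	}
	return out, nil
}

func main() {
	rows := []row{
		{"www.example.com/a", "Pageviews", 120},
		{"www.example.com/a", "Visits", 90},
		{"www.example.com/b", "Pageviews", 300},
		{"www.example.com/c", "Pageviews", 45},
	}
	top, err := rankPageviews(rows, 1, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range top {
		fmt.Printf("%d,%s\n", e.Rank, e.Page)
	}
}
```
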
| 59 | +## Output |
| 60 | + |
| 61 | +By default, the output CSV file contains two columns (no headers): |
| 62 | +1. Rank (integer) - Position in the ranking (1 = highest pageviews) |
| 63 | +2. URL (string) - Page URL |
| 64 | + |
| 65 | +Rows are sorted by rank in ascending order (rank 1 first). |
| 66 | + |
| 67 | +## Configuration (Optional) |
| 68 | + |
| 69 | +You can create a `config.yml` file in the same directory as the executable to configure URL filtering and output format: |
| 70 | + |
| 71 | +```yaml |
| 72 | +# List of URLs to ignore from the output |
| 73 | +ignore_urls: |
| 74 | + - www.example.com/page-to-ignore |
| 75 | + - www.example.com/another-page-to-ignore |
| 76 | + |
| 77 | +# Whether to show pageviews as a third column in the output |
| 78 | +show_pageviews: true |
| 79 | + |
| 80 | +# Whether to include headers in the output CSV |
| 81 | +show_headers: true |
| 82 | +``` |
| 83 | + |
| 84 | +### Configuration Options |
| 85 | + |
| 86 | +**`ignore_urls`** (optional) |
| 87 | +- URLs listed here are removed from the ranking entirely (not just hidden in the output) |
| 88 | +- They are excluded before the ranking is calculated, so the remaining URLs move up without gaps |
| 89 | +- For example, if you ignore rank #2, the former rank #3 becomes the new rank #2 |
| 90 | + |
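
To illustrate why filtering before ranking leaves no gaps, here is a small sketch. `dropIgnored` is a hypothetical helper, and the input is assumed to be already sorted by pageviews, so the slice index plus one is the rank.

```go
package main

import "fmt"

// dropIgnored removes every page listed in ignore while preserving the
// order of the remaining pages. Hypothetical helper for illustration.
func dropIgnored(pages, ignore []string) []string {
	skip := make(map[string]bool, len(ignore))
	for _, u := range ignore {
		skip[u] = true
	}
	var kept []string
	for _, p := range pages {
		if !skip[p] {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	// Assumed to be sorted by pageviews, highest first.
	pages := []string{
		"www.example.com/first",
		"www.example.com/page-to-ignore",
		"www.example.com/third",
	}
	kept := dropIgnored(pages, []string{"www.example.com/page-to-ignore"})
	// Ranks are assigned after filtering, so they stay contiguous.
	for i, p := range kept {
		fmt.Printf("%d,%s\n", i+1, p)
	}
}
```

Because ranks are assigned only after the ignored entries are gone, the page that was third in the raw data is emitted as rank 2.
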
| 91 | +**`show_pageviews`** (optional, default: `false`) |
| 92 | +- When `false`: Output contains 2 columns (rank, URL) |
| 93 | +- When `true`: Output contains 3 columns (rank, URL, pageviews) |
| 94 | + |
| 95 | +**`show_headers`** (optional, default: `false`) |
| 96 | +- When `false`: No headers in output (just data rows) |
| 97 | +- When `true`: Adds header row with column names |
| 98 | + - Without pageviews: `Rank,Page` |
| 99 | + - With pageviews: `Rank,Page,Number of Page Views` |
| 100 | + |
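
The effect of `show_pageviews` and `show_headers` on the output can be sketched with Go's standard `encoding/csv` package. The `entry` type and `renderCSV` function are hypothetical names; only the column layout and header text come from the options described above.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// entry is one ranked page (hypothetical type).
type entry struct {
	Rank      int
	Page      string
	Pageviews int
}

// renderCSV builds the output text, adding the optional header row and
// the optional third column according to the two flags.
func renderCSV(entries []entry, showPageviews, showHeaders bool) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if showHeaders {
		header := []string{"Rank", "Page"}
		if showPageviews {
			header = append(header, "Number of Page Views")
		}
		if err := w.Write(header); err != nil {
			return "", err
		}
	}
	for _, e := range entries {
		rec := []string{strconv.Itoa(e.Rank), e.Page}
		if showPageviews {
			rec = append(rec, strconv.Itoa(e.Pageviews))
		}
		if err := w.Write(rec); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	entries := []entry{{Rank: 1, Page: "www.example.com/a", Pageviews: 120}}
	out, err := renderCSV(entries, true, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```
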
| 101 | +The config file is optional. If it doesn't exist or can't be loaded, the tool will display a warning and continue with default settings. |
| 102 | + |
| 103 | +## Error Handling |
| 104 | + |
| 105 | +The tool exits with code 1 and displays an error message if: |
| 106 | +- Input file path is invalid or file doesn't exist |
| 107 | +- Required columns are missing from the CSV |
| 108 | +- URL structure doesn't match expected format (must start with `www.`) |
| 109 | +- Range format is invalid |