|
| 1 | +# Cleanup Notes |
| 2 | + |
| 3 | +> Code and design cleanup opportunities for span. The codebase targets Go 1.24 |
| 4 | +> but largely uses patterns from the Go 1.11-1.17 era. Most items below are |
| 5 | +> incremental and can be adopted file-by-file. |
| 6 | +
|
| 7 | +## 1. Error handling |
| 8 | + |
| 9 | +Only 3 files in the entire codebase use `%w` for error wrapping; zero use of |
| 10 | +`errors.Is`, `errors.As`, or `errors.Join`. |
| 11 | + |
| 12 | +**Wrap with `%w`** -- most `fmt.Errorf` calls pass errors via `%v`, which |
| 13 | +discards the error chain. Wrapping lets callers inspect causes when needed. |
| 14 | + |
| 15 | +```go |
| 16 | +// before |
| 17 | +return fmt.Errorf("error in stage 1: %v", err) |
| 18 | + |
| 19 | +// after |
| 20 | +return fmt.Errorf("stage 1: %w", err) |
| 21 | +``` |
| 22 | + |
| 23 | +**Use `errors.Join`** -- `crossref/snapshot.go` and other places with |
| 24 | +multi-stage cleanup return only the first error. `errors.Join` (Go 1.20) |
| 25 | +preserves all of them. |
| 26 | + |
| 27 | +**Use `errors.As` for Skip** -- `span.Skip` is type-asserted in several `cmd/` |
| 28 | +files via `if _, ok := err.(span.Skip); ok`. The idiomatic replacement is |
| 29 | +`errors.As`, which works through wrapped errors. |
| 30 | + |
| 31 | +## 2. Structured logging with `log/slog` |
| 32 | + |
| 33 | +21 files import logrus, ~15 more use stdlib `log`. Some cmd files alias both |
| 34 | +under the same name. `log/slog` (Go 1.21) is now the standard approach and |
| 35 | +removes the external dependency. |
| 36 | + |
| 37 | +```go |
| 38 | +// before |
| 39 | +log.Printf("[%s] fetching: %s", is.ID, link) |
| 40 | + |
| 41 | +// after |
| 42 | +slog.Info("fetching", "id", is.ID, "link", link) |
| 43 | +``` |
| 44 | + |
| 45 | +Migrating one package at a time is fine; logrus and slog can coexist during |
| 46 | +transition. |
| 47 | + |
| 48 | +## 3. Replace `interface{}` with generics or `any` |
| 49 | + |
| 50 | +81 occurrences of `interface{}` across 23 files. |
| 51 | + |
| 52 | +**Quick win** -- rename `interface{}` to `any` (Go 1.18 alias). No behavior |
| 53 | +change, just readability. |
| 54 | + |
| 55 | +**`container.StringSet` -> `Set[T comparable]`** -- the current type is only |
| 56 | +usable with strings and reimplements iteration, intersection, and difference by |
| 57 | +hand. A generic set backed by `map[T]struct{}` covers all uses and removes |
| 58 | +duplicate set logic elsewhere. |
| 59 | + |
| 60 | +**`FormatMap` in span-import** -- currently `map[string]func() interface{}`. |
| 61 | +Could become `map[string]func() IntermediateSchemaer` or use a small generic |
| 62 | +registry, removing the type assertions in `processXML`/`processJSON`. |
| 63 | + |
| 64 | +## 4. Adopt `slices` and `maps` packages |
| 65 | + |
| 66 | +Neither `slices` nor `maps` are imported anywhere. Several patterns would |
| 67 | +benefit: |
| 68 | + |
| 69 | +```go |
| 70 | +// container/string.go -- before |
| 71 | +func (set *StringSet) SortedValues() (values []string) { |
| 72 | + for k := range set.Set { |
| 73 | + values = append(values, k) |
| 74 | + } |
| 75 | + sort.Strings(values) |
| 76 | + return values |
| 77 | +} |
| 78 | + |
| 79 | +// after |
| 80 | +func (set *StringSet) SortedValues() []string { |
| 81 | + v := slices.Collect(maps.Keys(set.Set)) |
| 82 | + slices.Sort(v) |
| 83 | + return v |
| 84 | +} |
| 85 | +``` |
| 86 | + |
| 87 | +Also applies to: `cmd/span-import` (sorting format names), |
| 88 | +`formats/finc/intermediate.go` (dedup helpers), `filter/filter.go` (key |
| 89 | +extraction). |
| 90 | + |
| 91 | +## 5. `context.Context` for long-running operations |
| 92 | + |
| 93 | +Neither `parallel.Processor` nor `crossref.CreateSnapshot` accept a context. |
| 94 | +Adding it enables cancellation, timeouts, and tracing without changing the |
| 95 | +public API shape much (context is typically the first parameter). |
| 96 | + |
| 97 | +`folio/folio.go` makes HTTP requests without context -- `http.NewRequestWithContext` |
| 98 | +is the standard replacement. |
| 99 | + |
| 100 | +## 6. Filter registry boilerplate |
| 101 | + |
| 102 | +`filter/filter.go:unmarshalFilter` is a 70-line switch that repeats the same |
| 103 | +unmarshal-and-return pattern 11 times. A small registry map would collapse this: |
| 104 | + |
| 105 | +```go |
| 106 | +var registry = map[string]func() Filter{ |
| 107 | + "any": func() Filter { return &AnyFilter{} }, |
| 108 | + "doi": func() Filter { return &DOIFilter{} }, |
| 109 | + // ... |
| 110 | +} |
| 111 | + |
| 112 | +func unmarshalFilter(name string, raw json.RawMessage) (Filter, error) { |
| 113 | + ctor, ok := registry[name] |
| 114 | + if !ok { |
| 115 | + return nil, fmt.Errorf("unknown filter: %s", name) |
| 116 | + } |
| 117 | + f := ctor() |
| 118 | + if err := json.Unmarshal(raw, f); err != nil { |
| 119 | + return nil, err |
| 120 | + } |
| 121 | + return f, nil |
| 122 | +} |
| 123 | +``` |
| 124 | + |
| 125 | +## 7. `crossref/snapshot.go` complexity |
| 126 | + |
| 127 | +At 700+ lines with three processing stages, external tool invocations (`sort`, |
| 128 | +`zstd`, `filterline`), and manual concurrency, this is the most complex file in |
| 129 | +the project. |
| 130 | + |
| 131 | +Possible steps: |
| 132 | + |
| 133 | +* Split into `stage1.go`, `stage2.go`, `stage3.go` within the same package. |
| 134 | +* Replace the custom worker pool with `errgroup.Group` for structured |
| 135 | + concurrency and proper error propagation. |
| 136 | +* Consider `context.Context` for cancellation (snapshot creation can take hours). |
| 137 | + |
| 138 | +The shell pipeline in stage 2 is intentional (leveraging coreutils sort |
| 139 | +parallelism) and probably not worth replacing, but it could be behind an |
| 140 | +interface so it's testable. |
| 141 | + |
| 142 | +## 8. `parallel/processor.go` concurrency |
| 143 | + |
| 144 | +The processor silently races on `wErr` -- multiple goroutines write to the same |
| 145 | +error variable without synchronization. The comment says "we don't care about |
| 146 | +synchronisation" but this is technically a data race. Using `errgroup` or an |
| 147 | +`atomic.Pointer[error]` would fix it cleanly. |
| 148 | + |
| 149 | +Also: `BytesBatch.Reset()` sets `bb.b = nil` -- `bb.b = bb.b[:0]` retains the |
| 150 | +backing array and avoids re-allocation. |
| 151 | + |
| 152 | +## 9. Version management |
| 153 | + |
| 154 | +The version string lives in three places: `Makefile:2`, `common.go:31`, and |
| 155 | +`packaging/*/control|spec`. The Makefile has an `update-version` target that |
| 156 | +patches them with sed. |
| 157 | + |
| 158 | +An alternative: inject at build time via `-ldflags`: |
| 159 | + |
| 160 | +```makefile |
| 161 | +$(TARGETS): %: cmd/%/main.go |
| 162 | + go build -ldflags "-X github.com/miku/span.AppVersion=$(VERSION)" -o $@ $< |
| 163 | +``` |
| 164 | + |
| 165 | +This reduces the places to update to one (the Makefile) and makes `common.go` |
| 166 | +the single declaration site with a default value. |
| 167 | + |
| 168 | +## 10. Test modernization |
| 169 | + |
| 170 | +* No use of `t.Run()` for subtests -- adding it gives named sub-cases and |
| 171 | + allows `-run` filtering. |
| 172 | +* Table-driven tests use `t.Errorf` instead of `t.Fatalf` for setup failures -- |
| 173 | + a failed setup should not continue to assertions. |
| 174 | +* No benchmarks outside of `crossref/` -- `parallel/`, `encoding/`, and |
| 175 | + `filter/` are performance-sensitive and would benefit from them. |
| 176 | + |
| 177 | +## 11. Minor items |
| 178 | + |
| 179 | +| Item | Location | Notes | |
| 180 | +|------|----------|-------| |
| 181 | +| 64 TODO/FIXME/XXX comments | 34 files | Triage: close stale ones, convert actionable ones to issues | |
| 182 | +| `sort.Strings` → `slices.Sort` | 6+ call sites | Drop `sort` import where possible | |
| 183 | +| `io.ReadAll` available | several files still use `ioutil` patterns | Already migrated, but double-check | |
| 184 | +| `range N` (Go 1.22) | `for i := 0; i < p.NumWorkers; i++` | Minor, but nice in new code | |
| 185 | +| Unused `KeyLengthLimit` const | `common.go:34` | Comment says "obsolete"; remove | |
| 186 | +| `golint` in Makefile | `Makefile:53` | Deprecated; replace with `staticcheck` or `golangci-lint` | |
| 187 | +| logrus fork | `github.com/lytics/logrus` imported alongside `github.com/sirupsen/logrus` | Consolidate to one (or migrate both to slog) | |
0 commit comments