Skip to content

Commit 7924387

Browse files
committed
docs: add cleanup ideas
1 parent 6e9f4fb commit 7924387

File tree

1 file changed

+187
-0
lines changed

1 file changed

+187
-0
lines changed

docs/2026-02-16-cleanup-notes.md

Lines changed: 187 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,187 @@
1+
# Cleanup Notes
2+
3+
> Code and design cleanup opportunities for span. The codebase targets Go 1.24
4+
> but largely uses patterns from the Go 1.11-1.17 era. Most items below are
5+
> incremental and can be adopted file-by-file.
6+
7+
## 1. Error handling
8+
9+
Only 3 files in the entire codebase use `%w` for error wrapping; zero use of
10+
`errors.Is`, `errors.As`, or `errors.Join`.
11+
12+
**Wrap with `%w`** -- most `fmt.Errorf` calls pass errors via `%v`, which
13+
discards the error chain. Wrapping lets callers inspect causes when needed.
14+
15+
```go
16+
// before
17+
return fmt.Errorf("error in stage 1: %v", err)
18+
19+
// after
20+
return fmt.Errorf("stage 1: %w", err)
21+
```
22+
23+
**Use `errors.Join`** -- `crossref/snapshot.go` and other places with
24+
multi-stage cleanup return only the first error. `errors.Join` (Go 1.20)
25+
preserves all of them.
26+
27+
**Use `errors.As` for Skip** -- `span.Skip` is type-asserted in several `cmd/`
28+
files via `if _, ok := err.(span.Skip); ok`. The idiomatic replacement is
29+
`errors.As`, which works through wrapped errors.
30+
31+
## 2. Structured logging with `log/slog`
32+
33+
21 files import logrus, ~15 more use stdlib `log`. Some cmd files alias both
34+
under the same name. `log/slog` (Go 1.21) is now the standard approach and
35+
removes the external dependency.
36+
37+
```go
38+
// before
39+
log.Printf("[%s] fetching: %s", is.ID, link)
40+
41+
// after
42+
slog.Info("fetching", "id", is.ID, "link", link)
43+
```
44+
45+
Migrating one package at a time is fine; logrus and slog can coexist during
46+
transition.
47+
48+
## 3. Replace `interface{}` with generics or `any`
49+
50+
81 occurrences of `interface{}` across 23 files.
51+
52+
**Quick win** -- rename `interface{}` to `any` (Go 1.18 alias). No behavior
53+
change, just readability.
54+
55+
**`container.StringSet` -> `Set[T comparable]`** -- the current type is only
56+
usable with strings and reimplements iteration, intersection, and difference by
57+
hand. A generic set backed by `map[T]struct{}` covers all uses and removes
58+
duplicate set logic elsewhere.
59+
60+
**`FormatMap` in span-import** -- currently `map[string]func() interface{}`.
61+
Could become `map[string]func() IntermediateSchemaer` or use a small generic
62+
registry, removing the type assertions in `processXML`/`processJSON`.
63+
64+
## 4. Adopt `slices` and `maps` packages
65+
66+
Neither `slices` nor `maps` are imported anywhere. Several patterns would
67+
benefit:
68+
69+
```go
70+
// container/string.go -- before
71+
func (set *StringSet) SortedValues() (values []string) {
72+
for k := range set.Set {
73+
values = append(values, k)
74+
}
75+
sort.Strings(values)
76+
return values
77+
}
78+
79+
// after
80+
func (set *StringSet) SortedValues() []string {
81+
v := slices.Collect(maps.Keys(set.Set))
82+
slices.Sort(v)
83+
return v
84+
}
85+
```
86+
87+
Also applies to: `cmd/span-import` (sorting format names),
88+
`formats/finc/intermediate.go` (dedup helpers), `filter/filter.go` (key
89+
extraction).
90+
91+
## 5. `context.Context` for long-running operations
92+
93+
Neither `parallel.Processor` nor `crossref.CreateSnapshot` accept a context.
94+
Adding it enables cancellation, timeouts, and tracing without changing the
95+
public API shape much (context is typically the first parameter).
96+
97+
`folio/folio.go` makes HTTP requests without context -- `http.NewRequestWithContext`
98+
is the standard replacement.
99+
100+
## 6. Filter registry boilerplate
101+
102+
`filter/filter.go:unmarshalFilter` is a 70-line switch that repeats the same
103+
unmarshal-and-return pattern 11 times. A small registry map would collapse this:
104+
105+
```go
106+
var registry = map[string]func() Filter{
107+
"any": func() Filter { return &AnyFilter{} },
108+
"doi": func() Filter { return &DOIFilter{} },
109+
// ...
110+
}
111+
112+
func unmarshalFilter(name string, raw json.RawMessage) (Filter, error) {
113+
ctor, ok := registry[name]
114+
if !ok {
115+
return nil, fmt.Errorf("unknown filter: %s", name)
116+
}
117+
f := ctor()
118+
if err := json.Unmarshal(raw, f); err != nil {
119+
return nil, err
120+
}
121+
return f, nil
122+
}
123+
```
124+
125+
## 7. `crossref/snapshot.go` complexity
126+
127+
At 700+ lines with three processing stages, external tool invocations (`sort`,
128+
`zstd`, `filterline`), and manual concurrency, this is the most complex file in
129+
the project.
130+
131+
Possible steps:
132+
133+
* Split into `stage1.go`, `stage2.go`, `stage3.go` within the same package.
134+
* Replace the custom worker pool with `errgroup.Group` for structured
135+
concurrency and proper error propagation.
136+
* Consider `context.Context` for cancellation (snapshot creation can take hours).
137+
138+
The shell pipeline in stage 2 is intentional (leveraging coreutils sort
139+
parallelism) and probably not worth replacing, but it could be behind an
140+
interface so it's testable.
141+
142+
## 8. `parallel/processor.go` concurrency
143+
144+
The processor silently races on `wErr` -- multiple goroutines write to the same
145+
error variable without synchronization. The comment says "we don't care about
146+
synchronisation" but this is technically a data race. Using `errgroup` or an
147+
`atomic.Pointer[error]` would fix it cleanly.
148+
149+
Also: `BytesBatch.Reset()` sets `bb.b = nil` -- `bb.b = bb.b[:0]` retains the
150+
backing array and avoids re-allocation.
151+
152+
## 9. Version management
153+
154+
The version string lives in three places: `Makefile:2`, `common.go:31`, and
155+
`packaging/*/control|spec`. The Makefile has an `update-version` target that
156+
patches them with sed.
157+
158+
An alternative: inject at build time via `-ldflags`:
159+
160+
```makefile
161+
$(TARGETS): %: cmd/%/main.go
162+
go build -ldflags "-X github.com/miku/span.AppVersion=$(VERSION)" -o $@ $<
163+
```
164+
165+
This reduces the places to update to one (the Makefile) and makes `common.go`
166+
the single declaration site with a default value.
167+
168+
## 10. Test modernization
169+
170+
* No use of `t.Run()` for subtests -- adding it gives named sub-cases and
171+
allows `-run` filtering.
172+
* Table-driven tests use `t.Errorf` instead of `t.Fatalf` for setup failures --
173+
a failed setup should not continue to assertions.
174+
* No benchmarks outside of `crossref/` -- `parallel/`, `encoding/`, and
175+
`filter/` are performance-sensitive and would benefit from them.
176+
177+
## 11. Minor items
178+
179+
| Item | Location | Notes |
180+
|------|----------|-------|
181+
| 64 TODO/FIXME/XXX comments | 34 files | Triage: close stale ones, convert actionable ones to issues |
182+
| `sort.Strings``slices.Sort` | 6+ call sites | Drop `sort` import where possible |
183+
| `io.ReadAll` available | several files still use `ioutil` patterns | Already migrated, but double-check |
184+
| `range N` (Go 1.22) | `for i := 0; i < p.NumWorkers; i++` | Minor, but nice in new code |
185+
| Unused `KeyLengthLimit` const | `common.go:34` | Comment says "obsolete"; remove |
186+
| `golint` in Makefile | `Makefile:53` | Deprecated; replace with `staticcheck` or `golangci-lint` |
187+
| logrus fork | `github.com/lytics/logrus` imported alongside `github.com/sirupsen/logrus` | Consolidate to one (or migrate both to slog) |

0 commit comments

Comments
 (0)