While crawling through sites, sometimes you hit something like a video file, or just an extremely large response. This usually wouldn't be a problem; however, because gocrawl uses a single goroutine per host, this can lock the entire crawl up for a long time.
My suggestion is that rather than using ioutil.ReadAll() (https://github.com/PuerkitoBio/gocrawl/blob/master/worker.go#L331), gocrawl could support a configuration option to read only the first N bytes of the body and then just proceed.
The problem I foresee is that this would probably require an API change, because your Visit function would need to receive a flag telling it whether the entire body was downloaded. The only workaround I can think of that avoids an API change is to pass everything forward to a separate VisitBodyNotCompleted function, but that isn't as ideal.
How open are you to breaking API changes, or can you think of a way around this?
What are your thoughts on supporting this?
Thanks,
Jake