I used this successfully to ingest a large batch of recent ehrQL jobs for investigation. Just recording a few rough edges I encountered so we don't lose track of them. There's no particular urgency about resolving these.
I was using `rg` to find relevant log files and then trying to supply those to the command one at a time using `xargs -n 1`.
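For concreteness, the shape of the pipeline was roughly this (the search pattern and log root are placeholders, not the real ones):

```sh
# List files matching the pattern, then feed them to the telemetry
# command one at a time. Pattern and path are illustrative only.
rg -l 'some-job-identifier' /path/to/logs \
  | xargs -n 1 just jobrunner/cli ehrql_telemetry
```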
The first issue I hit was when trying to use the `just jobrunner/cli ehrql_telemetry` command from here. This worked when running `--help`, but when trying to use `xargs` I would get the error:
`input device is not a TTY`
The reason for this is probably obvious if I think carefully enough about it, but I didn't want to do that. I found that manually constructing the `docker compose run` command and using the `--no-TTY` argument made it work.
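What I ended up running looked something like the following; the compose service name is a guess on my part, so treat this as a sketch rather than the exact invocation:

```sh
# Disable pseudo-TTY allocation so xargs can drive the command.
# "jobrunner" is a placeholder for whatever the compose service is called.
rg -l 'some-job-identifier' /path/to/logs \
  | xargs -n 1 docker compose run --no-TTY jobrunner ehrql_telemetry
```

(`--no-TTY` also has the short form `-T`.)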
The second issue is that the command doesn't actually take paths to log files: it takes paths to log directories (or plain job IDs). I ended up using `cut` to remove the final part of the path, but it would be nice if it could just do the right thing here.
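The `cut` incantation was something along these lines (the path is a placeholder; `xargs -n 1 dirname` would do the same job more directly):

```sh
# Strip the trailing filename so we pass the containing directory instead:
# rev/cut/rev drops everything after the last "/".
echo '/path/to/logs/some-job-id/metadata.log' | rev | cut -d/ -f2- | rev
# -> /path/to/logs/some-job-id
```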
There turned out to be some logs that the ehrQL tooling couldn't parse. These were more annoying to debug than they needed to be because I would get a "non-zero subprocess exit" error from the tooling but nothing to tell me what the error was or which file triggered it. So a couple of helpful changes would be (a shell-level stopgap is sketched after the list):
- Print the file path for each log file before processing (which would also give a rough idea of progress).
- Show stderr from the subprocess if it fails.
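The first of these can at least be approximated from the shell in the meantime. It doesn't recover the swallowed stderr, but it does make the failing file easy to pin down (all names below are illustrative):

```sh
# Announce each directory before processing and flag failures on stderr,
# so a bad log file is easy to identify. Service name is a placeholder.
while read -r dir; do
  echo "processing: $dir"
  docker compose run --no-TTY jobrunner ehrql_telemetry "$dir" \
    || echo "FAILED: $dir" >&2
done < log-dirs.txt
```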
There was also one mysterious job whose metadata file triggered an error because `container_metadata` was empty, so `metadata["container_metadata"]["Config"]["Image"]` threw a `KeyError`. Job Server shows this as "Cancelled by user" while the metadata file reports it as "cancelled by system", so possibly something racy happened here. Probably not worth worrying about but just noting it.
I guess the ultimate in usability would be to have a single command which just ingests the last 60 days' worth of ehrQL jobs. I think we could probably knock together something which did this without massive amounts of work, and it would make the process much easier for next time.
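A very rough sketch of what that could look like, assuming one directory per job under a known log root (both of which are assumptions on my part):

```sh
# Pick up job log directories modified in the last 60 days and ingest each.
# Root path, directory layout, and service name are all guesses.
find /path/to/logs -mindepth 1 -maxdepth 1 -type d -mtime -60 \
  | xargs -n 1 docker compose run --no-TTY jobrunner ehrql_telemetry
```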