Generated compute_size() returns incorrect value on packed repeated fields #281

@mjadczak

I'm writing to a format that consists of length-delimited proto messages (TensorFlow's TFRecord format), and my output was not being read back correctly. Because I write to a stream, I call compute_size to get the message length and write it to the stream before the message data. However, the length it returned was smaller than the number of bytes actually written.
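For context, here is a minimal sketch of the kind of length-prefixed framing involved. It is a simplified stand-in, not the actual TFRecord layout: `write_framed` is a hypothetical helper, and the 8-byte little-endian prefix is an assumption made for the example. It takes the prefix from an already serialized buffer, so the prefix cannot disagree with the bytes written, which is the property the compute_size-based approach was failing to provide.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `payload` preceded by its length as a
/// little-endian u64. The prefix is taken from the serialized bytes
/// themselves rather than from a separately computed size.
fn write_framed<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    out.write_all(&(payload.len() as u64).to_le_bytes())?;
    out.write_all(payload)
}

fn main() -> io::Result<()> {
    let mut stream = Vec::new();
    // `b"abc"` stands in for a serialized proto message.
    write_framed(&mut stream, b"abc")?;
    // 8 bytes of length prefix followed by 3 bytes of payload.
    assert_eq!(stream.len(), 8 + 3);
    Ok(())
}
```

The trade-off is that the whole message has to be buffered before it is written, which is what computing the size up front is meant to avoid.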

After some debugging, I tracked it down to the following line (in this particular case, it's a packed repeated field of 32-bit floats):

my_size += 1 + ::protobuf::rt::compute_raw_varint32_size(self.value.len() as u32) + (self.value.len() * 4) as u32;

Here my_size is the size of the entire message in bytes. A length-delimited field consists of a tag (hence the 1), the length in bytes of the data it contains, and then the data itself (value.len() floats of 4 bytes each). The bug is that this line computes the size of the varint that would encode the number of elements in the list, when it should compute the size of the varint that encodes the number of bytes the list contents occupy:

let list_bytes = (self.value.len() * 4) as u32;
my_size += 1 + ::protobuf::rt::compute_raw_varint32_size(list_bytes) + list_bytes;
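To show when the two calculations actually diverge, here is a small self-contained sketch. `varint32_size` is a local stand-in written for this example, not the crate's compute_raw_varint32_size, and `buggy_field_size` / `fixed_field_size` are hypothetical names that mirror the two snippets above for a repeated float field with a 1-byte tag.

```rust
/// Number of bytes needed to encode `value` as a base-128 varint.
/// Local stand-in for this example, not the protobuf crate's function.
fn varint32_size(value: u32) -> u32 {
    match value {
        0..=0x7f => 1,
        0x80..=0x3fff => 2,
        0x4000..=0x1f_ffff => 3,
        0x20_0000..=0xfff_ffff => 4,
        _ => 5,
    }
}

/// Mirrors the generated line: the length prefix is sized from the
/// element count.
fn buggy_field_size(len: usize) -> u32 {
    1 + varint32_size(len as u32) + (len * 4) as u32
}

/// Mirrors the hand fix: the length prefix is sized from the payload
/// byte count.
fn fixed_field_size(len: usize) -> u32 {
    let list_bytes = (len * 4) as u32;
    1 + varint32_size(list_bytes) + list_bytes
}

fn main() {
    for len in [1usize, 31, 32, 100, 127, 128] {
        println!(
            "len={} buggy={} fixed={}",
            len,
            buggy_field_size(len),
            fixed_field_size(len)
        );
    }
}
```

With these definitions the two results agree for up to 31 elements (at most 124 payload bytes, so both varints fit in one byte) and differ by one byte for 32 to 127 elements, where the byte count needs a two-byte varint but the element count still fits in one. That would explain why short lists serialize correctly and only longer ones come out with a too-small length.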

I've fixed this by hand in my generated files, since the protos I'm using contain only a couple of lists. I don't know where the bug is in the actual codegen, but hopefully this will help someone fix it.
