
pubsub: Old messages fail to ack because "expired" #1485

@jameshartig

Description


This is related to #1247, but I'm filing a new issue since I have more data now and different questions. We've been running our "high throughput" queues with Synchronous=false, but it's not clear what the disadvantage would be of always setting it to true.

We don't usually see any failed acks (with either true or false), but this time in particular we were catching up on a subscription that was >9 million messages and >2 hours behind. These "expired" errors are making it almost impossible to catch up, since at some points over half of the messages we ack are failing and being retried.

I'm not sure whether there's an issue where we're receiving already-expired messages or whether the client is holding onto messages too long (since we have Synchronous=false). I raised the Ack Deadline in the Google Console to 60 seconds and didn't see any change (the change was made at 19:30, relative to the graphs below), so I'm inclined to think something is wrong here. I'm also not sure if we should just be using Synchronous=true instead.

Client

PubSub (aef6eeb)

Describe Your Environment

CentOS 7 on GCE (specifically in us-east1)
2 workers in each region with

MaxOutstandingMessages = 2000
Timeout = 15*time.Second
NumConsumers = 2000
NumGoroutines = 20 (10 x CPUs)
Synchronous = false

Expected Behavior

Acks succeed and we're not losing half of our acks to "expired" errors.

Actual Behavior

We're seeing thousands of messages failing ack with the error "expired":
[screenshot: graph of acks failing with "expired"]

In #1247 (comment) you mentioned that if Synchronous=false, the client fetches more than MaxOutstandingMessages. Over that same window, comparing the pubsub_pull_count OpenCensus metric to our internal count of acks shows that we're acking all of the pulled messages:
[screenshot: pubsub_pull_count vs. internal ack count]

Does pubsub_pull_count not include the "extra" messages that are pulled? If not, how can we determine that and graph it to aid with debugging?

This subscription in particular uses more CPU if a job is duplicated, so more duplicates cause the CPU to spike and jobs to take longer to ack. The 95th percentile of ack time is < 5 seconds, so I would imagine that even if the client fetched 2x MaxOutstandingMessages we could still ack all of them before the deadline (even with no ModAcks).
[screenshot: ack-latency percentiles]

Metadata

Labels

api: pubsub (Issues related to the Pub/Sub API)
type: question (Request for information or clarification. Not an issue.)
