What can we help you with?
I’m new to the tiered storage feature and am currently building and testing it. I have successfully built Aiven's tiered-storage plugins with my KRaft-mode Kafka cluster. On the producer side, everything works just fine: messages flow into the topic with remote storage enabled, and the segment files are then moved to MinIO. At this point, about 100k messages still remain on the brokers' local disks.
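For context, this is roughly the shape of the topic-level configuration involved (a sketch only; the topic name, partition count, and byte sizes here are placeholders, while the config keys follow the Apache Kafka tiered-storage docs, KIP-405):

```shell
# Create a topic with tiered (remote) storage enabled.
# local.retention.bytes caps how much stays on the broker's local disk;
# older segments are offloaded to the remote store (MinIO in my setup).
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic my-tiered-topic --partitions 3 --replication-factor 1 \
  --config remote.storage.enable=true \
  --config local.retention.bytes=104857600 \
  --config retention.bytes=-1
```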
I then tried to start consumers, which in my case are an Iceberg Sink Kafka Connector (https://github.com/databricks/iceberg-kafka-connect) and a Python application.
My expectation was that when any consumer starts consuming from the beginning of a topic with remote storage enabled, the log segments stored in MinIO would be fetched into the broker's cache folder, and the consumer would then receive both the messages currently on local disk and those fetched into the cache.
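To make my mental model concrete, here is the offset arithmetic I expected (the numbers are made up to match the ~100k figure above; the offset names follow KIP-405 terminology):

```python
# Illustrative offset arithmetic for one partition (hypothetical numbers).
# With tiered storage, each partition has three key offsets:
#   log_start_offset        -> earliest offset still readable (may live in MinIO)
#   local_log_start_offset  -> earliest offset still on the broker's local disk
#   high_watermark          -> last committed offset + 1
log_start_offset = 0
local_log_start_offset = 900_000
high_watermark = 1_000_000

# Offsets in [log_start_offset, local_log_start_offset) should be served
# from remote storage; offsets in [local_log_start_offset, high_watermark)
# come straight from local disk.
remote_count = local_log_start_offset - log_start_offset
local_count = high_watermark - local_log_start_offset

print(f"remote: {remote_count}, local: {local_count}")
```

So a consumer starting from the earliest offset should, as I understand it, receive `remote_count + local_count` messages, not just the local ones.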
In practice, however, the consumer only consumes the messages on local disk (the ~100k). When I checked the broker's container, I could see the cache folder being populated with log files whose names match those in MinIO, but the consumer never receives those messages.
After checking the logs of both the brokers and the Kafka Connect worker, I found no errors.
Where would you expect to find this information?
I would greatly appreciate any tutorials or documentation on how to consume messages (via Kafka Connect, Python/PySpark, Apache Pinot, etc.) from a topic with remote storage enabled!