A setting has been added for configuring sleep intervals between OAI calls for specific harvesting clients, making it possible to harvest uninterrupted from servers that enforce rate-limit policies. See the configuration guide for details. Additionally, this release fixes a problem with harvesting from DataCite OAI-PMH, where initial long-running harvests were failing on sets with large numbers of records.
doc/sphinx-guides/source/installation/config.rst (15 additions, 0 deletions)
@@ -4672,6 +4672,21 @@ Examples:
 ``curl -X PUT -d '{"default":"0", "CSV":"268435456"}' http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit``
 
+.. _:HarvestingClientCallRateLimit:
+
+:HarvestingClientCallRateLimit
+++++++++++++++++++++++++++++++++
+
+This setting configures sleep intervals between OAI calls for specific harvesting clients, which makes it possible to harvest from servers that enforce rate limits.
+
+The setting value is a serialized JSON object that maps harvesting client names to sleep intervals in (fractional) seconds. A ``default`` key can also set a universal interval for all other harvesting clients on the instance, although this is unlikely to be needed in practice.
+
+In the following example, the harvester sleeps for 0.9 seconds (900 milliseconds) between calls when running the client named ``harvarddv``, and does not sleep for any other client:
+
+``curl -X PUT -d "{\"harvarddv\": 0.9, \"default\": 0}" "http://localhost:8080/api/admin/settings/:HarvestingClientCallRateLimit"``
+
+Note that the ``default`` entry in the example above is included for illustration only. It is otherwise redundant, since not sleeping between calls is already the default behavior.
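The diff shown here does not include the code that reads this setting. As a rough illustration only, the following Java sketch shows one way a harvester could resolve and apply the interval: it looks up the client's own entry, falls back to the ``default`` key, and otherwise does not sleep. It assumes the JSON value has already been parsed into a ``Map<String, Double>``; the class ``HarvestCallPacer`` and its methods are hypothetical names, not Dataverse's actual implementation.

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of Dataverse) showing one way a
 * :HarvestingClientCallRateLimit value could be applied by a harvester.
 * Assumes the JSON setting was already parsed into a map of
 * client name -> sleep interval in (fractional) seconds.
 */
public class HarvestCallPacer {

    private static final String DEFAULT_KEY = "default";

    private final Map<String, Double> intervalsInSeconds;

    public HarvestCallPacer(Map<String, Double> intervalsInSeconds) {
        // A missing setting means "no delay" for every client.
        this.intervalsInSeconds = intervalsInSeconds == null ? Map.of() : intervalsInSeconds;
    }

    /**
     * Returns the delay for a client in milliseconds: the client's own entry
     * if present, otherwise the "default" entry, otherwise 0.
     */
    public long resolveDelayMillis(String clientName) {
        // Immutable maps from Map.of() throw on null lookups, so guard against a null name.
        Double seconds = clientName == null ? null : intervalsInSeconds.get(clientName);
        if (seconds == null) {
            seconds = intervalsInSeconds.get(DEFAULT_KEY);
        }
        if (seconds == null) {
            return 0L; // no entry for the client and no default: do not sleep
        }
        // Convert fractional seconds to whole milliseconds; negative values mean no delay.
        return Math.max(0L, Math.round(seconds * 1000.0));
    }

    /** Sleeps between two consecutive OAI calls if a delay is configured for the client. */
    public void pauseBetweenCalls(String clientName) throws InterruptedException {
        long delayMillis = resolveDelayMillis(clientName);
        if (delayMillis > 0) {
            Thread.sleep(delayMillis);
        }
    }

    public static void main(String[] args) {
        // Same values as {"harvarddv": 0.9, "default": 0} in the curl example.
        HarvestCallPacer pacer = new HarvestCallPacer(Map.of("harvarddv", 0.9, "default", 0.0));
        System.out.println(pacer.resolveDelayMillis("harvarddv"));   // 900
        System.out.println(pacer.resolveDelayMillis("otherclient")); // 0
    }
}
```

Converting to whole milliseconds matches the ``Thread.sleep(long)`` signature. In this sketch, a harvester would call ``pauseBetweenCalls`` between successive OAI requests for the same client; where exactly the real harvester applies the delay is not shown in this commit.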
src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java (3 additions, 1 deletion)
@@ -765,7 +765,7 @@ Whether Harvesting (OAI) service is enabled
 FileCategories,
 CreateDataFilesMaxErrorsToDisplay,
 
-ContactFeedbackMessageSizeLimit,
+ContactFeedbackMessageSizeLimit,
 //Experimental setting to allow connecting to a GET external search service expecting a GET request with query parameter mirroring the search API query parameters (without search_service)
 GetExternalSearchUrl,
 //Experimental setting to provide a display name for the GET external search service
@@ -779,6 +779,8 @@ Whether Harvesting (OAI) service is enabled
 COARNotifyRelationshipAnnouncementTriggerFields,
 // JSON specification of the targets to send announcements to
 COARNotifyRelationshipAnnouncementTargets,
+// Configurable delay between harvesting calls, when required to avoid triggering rate limits