Merged
Changes from 14 commits
Commits
21 commits
fc0f758
#10217 - Adding Source name in harvesting client to improve metadata …
luddaniel Jan 30, 2025
062edf5
Merge branch 'develop' into 10217-source-name-harvesting-client
luddaniel Feb 5, 2025
ef4360d
renaming sql file with good version
luddaniel Feb 5, 2025
a424b4e
Added a sourceName test to HarvestingClientsIT
luddaniel Feb 5, 2025
5238eda
adding release note
luddaniel Feb 5, 2025
d36cca4
Improved documentation
luddaniel Feb 5, 2025
668ac77
Added a missing feature flag
luddaniel Feb 5, 2025
c489f8a
improve release note #10217
pdurbin Feb 27, 2025
8eb0d45
renamed: V6.5.0.4.sql -> V6.5.0.5.sql #10217
pdurbin Feb 27, 2025
46bfa20
Merge branch 'develop' into 10217-source-name-harvesting-client #10217
pdurbin Feb 27, 2025
9af422b
further clarify release note #10217
pdurbin Feb 28, 2025
3f9e9b2
the UI and API call it nickname, not name #10217
pdurbin Feb 28, 2025
02d8a0d
renamed: src/main/resources/db/migration/V6.5.0.5.sql -> src/main/…
pdurbin Feb 28, 2025
093ea91
Merge branch 'develop' into 10217-source-name-harvesting-client #10217
pdurbin Feb 28, 2025
957ac31
tweak source name docs #10217
pdurbin Feb 28, 2025
3ccb7ff
switch harvesting client CRUD to standard doc style #10217
pdurbin Feb 28, 2025
374a1cd
bump sql script #10217
pdurbin Mar 4, 2025
61654b6
Merge branch 'develop' into 10217-source-name-harvesting-client #10217
pdurbin Mar 4, 2025
e341fee
use Title Case to match other fields #10217
pdurbin Mar 4, 2025
7deb300
reword source name help text #10217
pdurbin Mar 4, 2025
f7c4c42
always show source name help text #10217
pdurbin Mar 4, 2025
13 changes: 13 additions & 0 deletions doc/release-notes/11217-source-name-harvesting-client.md
@luddaniel can you please look again at this release note snippet? I just rewrote it again, based on my updated understanding of the history of this feature. Thanks!

@@ -0,0 +1,13 @@
### Metadata Source Facet Can Now Differentiate Between Harvested Sources

The behavior of the feature flag `index-harvested-metadata-source` and the "Metadata Source" facet, which were added and updated, respectively, in [Dataverse 6.3](https://github.com/IQSS/dataverse/releases/tag/v6.3) (through pull requests #10464 and #10651), has been updated. A new field called "Source Name" has been added to harvesting clients.

Before Dataverse 6.3, all harvested content (datasets and files) appeared together under "Harvested" under the "Metadata Source" facet. This is still the behavior of Dataverse out of the box. Since Dataverse 6.3, enabling the `index-harvested-metadata-source` feature flag (and reindexing) resulted in harvested content appearing under the nickname for whatever harvesting client was used to bring in the content. This meant that instead of having all harvested content lumped together under "Harvested", content would appear under "client1", "client2", etc.

Now, as of this release, enabling the `index-harvested-metadata-source` feature flag, populating a new field for harvesting clients called "Source Name" ("sourceName" in the [API](https://dataverse-guide--11217.org.readthedocs.build/en/11217/api/native-api.html#create-a-harvesting-client)), and reindexing (see upgrade instructions below) results in the source name appearing under the "Metadata Source" facet rather than the harvesting client nickname. This gives you more control over the name that appears under the "Metadata Source" facet and allows you to group harvested content from various harvesting clients under the same name if you wish (by reusing the same source name).
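For example, grouping can be achieved with harvesting client configurations like the following sketch. The `sourceName` field is the one added by this PR; the other field names follow the harvesting client JSON described in the API Guide, and all values here are illustrative:

```json
{
  "nickName": "client1",
  "dataverseAlias": "root",
  "harvestUrl": "https://repo1.example.edu/oai",
  "metadataFormat": "oai_dc",
  "sourceName": "Library Consortium"
}
```

A second client (say, "client2") configured with the same `"sourceName": "Library Consortium"` would have its harvested content appear under the same "Metadata Source" facet entry.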

Previously, `index-harvested-metadata-source` was not documented in the guides, but now you can find information about it under [Feature Flags](https://dataverse-guide--11217.org.readthedocs.build/en/11217/installation/config.html#feature-flags). See also #10217 and #11217.

## Upgrade instructions

If you have enabled the `dataverse.feature.index-harvested-metadata-source` feature flag and given some of your harvesting clients a source name, you should reindex to have those source names appear under the "Metadata Source" facet.
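As a reminder, an in-place reindex can be started with the usual admin API call (a sketch assuming a standard deployment where the API is reachable on localhost; it requires a running instance, and the full reindexing procedure is described in the Admin Guide):

```shell
# Start an asynchronous in-place reindex of all objects
curl http://localhost:8080/api/admin/index
```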
3 changes: 3 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -5697,6 +5697,8 @@ Shows a Harvesting Client with a defined nickname::
}


.. _create-a-harvesting-client:

Create a Harvesting Client
~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -5715,6 +5717,7 @@ You must supply a JSON file that describes the configuration, similarly to the o

The following optional fields are supported:

- sourceName: Only has an effect when the ``index-harvested-metadata-source`` feature flag (see :ref:`feature-flags`) is enabled. The source name replaces the client nickname under the "Metadata Source" facet, and the same source name can be reused across multiple harvesting clients to group their harvested content under one name.
- archiveDescription: What the name suggests. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data."
- set: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything".
- style: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation).
3 changes: 3 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -3493,6 +3493,9 @@ please find all known feature flags below. Any of these flags can be activated u
* - globus-use-experimental-async-framework
- Activates a new experimental implementation of Globus polling of ongoing remote data transfers that does not rely on the instance staying up continuously for the duration of the transfers and saves the state information about Globus upload requests in the database. Added in v6.4. Affects :ref:`:GlobusPollingInterval`. Note that the JVM option :ref:`dataverse.files.globus-monitoring-server` described above must also be enabled on one (and only one, in a multi-node installation) Dataverse instance.
- ``Off``
* - index-harvested-metadata-source
- Index the harvesting client's source name (see the ``sourceName`` optional field in :ref:`create-a-harvesting-client`), or its nickname if no source name is set, as the "Metadata Source" of harvested datasets and files. When enabled, the "Metadata Source" facet shows a separate entry for the content harvested from each source, instead of the default behavior where all such content appears under a single "Harvested" entry.
- ``Off``

**Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable
``DATAVERSE_FEATURE_XXX`` (e.g. ``DATAVERSE_FEATURE_API_SESSION_AUTH=1``). These environment variables can be set in your shell before starting Payara. If you are using :doc:`Docker for development </container/dev-usage>`, you can set them in the `docker compose <https://docs.docker.com/compose/environment-variables/set-environment-variables/>`_ file.
1 change: 1 addition & 0 deletions docker-compose-dev.yml
@@ -17,6 +17,7 @@ services:
SKIP_DEPLOY: "${SKIP_DEPLOY}"
DATAVERSE_JSF_REFRESH_PERIOD: "1"
DATAVERSE_FEATURE_API_BEARER_AUTH: "1"
DATAVERSE_FEATURE_INDEX_HARVESTED_METADATA_SOURCE: "1"
DATAVERSE_FEATURE_API_BEARER_AUTH_PROVIDE_MISSING_CLAIMS: "1"
DATAVERSE_MAIL_SYSTEM_EMAIL: "dataverse@localhost"
DATAVERSE_MAIL_MTA_HOST: "smtp"
140 changes: 60 additions & 80 deletions src/main/java/edu/harvard/iq/dataverse/HarvestingClientsPage.java
@@ -78,7 +78,7 @@ public class HarvestingClientsPage implements java.io.Serializable {
private Long dataverseId = null;
private HarvestingClient selectedClient;
private boolean setListTruncated = false;

//private static final String solrDocIdentifierDataset = "dataset_";

public enum PageMode {
@@ -242,6 +242,7 @@ public void editClient(HarvestingClient harvestingClient) {
setSelectedClient(harvestingClient);

this.newNickname = harvestingClient.getName();
this.sourceName = harvestingClient.getSourceName();
this.newHarvestingUrl = harvestingClient.getHarvestingUrl();
this.customHeader = harvestingClient.getCustomHttpHeaders();
this.initialSettingsValidated = false;
@@ -323,10 +324,9 @@ public void deleteClient() {
}

public void createClient(ActionEvent ae) {

HarvestingClient newHarvestingClient = new HarvestingClient(); // will be set as type OAI by default

newHarvestingClient.setName(newNickname);

// will be set as type OAI by default
HarvestingClient newHarvestingClient = fillHarvestingClient(new HarvestingClient());

if (getSelectedDestinationDataverse() == null) {
JsfHelper.JH.addMessage(FacesMessage.SEVERITY_ERROR,BundleUtil.getStringFromBundle("harvest.create.error"));
@@ -338,35 +338,6 @@ public void createClient(ActionEvent ae) {
}
getSelectedDestinationDataverse().getHarvestingClientConfigs().add(newHarvestingClient);

newHarvestingClient.setHarvestingUrl(newHarvestingUrl);
newHarvestingClient.setCustomHttpHeaders(customHeader);
if (!StringUtils.isEmpty(newOaiSet)) {
newHarvestingClient.setHarvestingSet(newOaiSet);
}
newHarvestingClient.setMetadataPrefix(newMetadataFormat);
newHarvestingClient.setHarvestStyle(newHarvestingStyle);

if (isNewHarvestingScheduled()) {
newHarvestingClient.setScheduled(true);

if (isNewHarvestingScheduledWeekly()) {
newHarvestingClient.setSchedulePeriod(HarvestingClient.SCHEDULE_PERIOD_WEEKLY);
if (getWeekDayNumber() == null) {
// create a "week day is required..." error message, etc.
// but we may be better off not even giving them an opportunity
// to leave the field blank - ?
}
newHarvestingClient.setScheduleDayOfWeek(getWeekDayNumber());
} else {
newHarvestingClient.setSchedulePeriod(HarvestingClient.SCHEDULE_PERIOD_DAILY);
}

if (getHourOfDay() == null) {
// see the comment above, about the day of week. same here.
}
newHarvestingClient.setScheduleHourOfDay(getHourOfDay());
}

// make default archive url (used to generate links pointing back to the
// archival sources, when harvested datasets are displayed in search results),
// from the harvesting url:
@@ -412,51 +383,9 @@ public void createClient(ActionEvent ae) {
// this saves an existing client that the user has edited:

public void saveClient(ActionEvent ae) {

HarvestingClient harvestingClient = getSelectedClient();

if (harvestingClient == null) {
// TODO:
// tell the user somehow that the client cannot be saved, and advise
// them to save the settings they have entered.
// as of now - we will show an error message, but only after the
// edit form has been closed.
}

// nickname is not editable for existing clients:
//harvestingClient.setName(newNickname);
harvestingClient.setHarvestingUrl(newHarvestingUrl);
harvestingClient.setCustomHttpHeaders(customHeader);
harvestingClient.setHarvestingSet(newOaiSet);
harvestingClient.setMetadataPrefix(newMetadataFormat);
harvestingClient.setHarvestStyle(newHarvestingStyle);

if (isNewHarvestingScheduled()) {
harvestingClient.setScheduled(true);

if (isNewHarvestingScheduledWeekly()) {
harvestingClient.setSchedulePeriod(HarvestingClient.SCHEDULE_PERIOD_WEEKLY);
if (getWeekDayNumber() == null) {
// create a "week day is required..." error message, etc.
// but we may be better off not even giving them an opportunity
// to leave the field blank - ?
}
harvestingClient.setScheduleDayOfWeek(getWeekDayNumber());
} else {
harvestingClient.setSchedulePeriod(HarvestingClient.SCHEDULE_PERIOD_DAILY);
}

if (getHourOfDay() == null) {
// see the comment above, about the day of week. same here.
}
harvestingClient.setScheduleHourOfDay(getHourOfDay());
} else {
harvestingClient.setScheduled(false);
}

// will try to save it now:

try {
HarvestingClient harvestingClient = fillHarvestingClient(getSelectedClient());

harvestingClient = engineService.submit( new UpdateHarvestingClientCommand(dvRequestService.getDataverseRequest(), harvestingClient));

configuredHarvestingClients = harvestingClientService.getAllHarvestingClients();
@@ -477,9 +406,50 @@ public void saveClient(ActionEvent ae) {
}
setPageMode(PageMode.VIEW);


}


/**
* Updates the basic fields of a new or existing HarvestingClient with the values currently entered in the UI form.
* @param harvestingClient new or existing harvestingClient to update
* @return the harvestingClient with updated values
*/
private HarvestingClient fillHarvestingClient(HarvestingClient harvestingClient) {
// the nickname is only set for new clients; it is not editable for existing ones
if (harvestingClient.getId() == null) {
harvestingClient.setName(newNickname);
}
harvestingClient.setSourceName(sourceName);
harvestingClient.setHarvestingUrl(newHarvestingUrl);
harvestingClient.setCustomHttpHeaders(customHeader);
if (!StringUtils.isEmpty(newOaiSet)) {
harvestingClient.setHarvestingSet(newOaiSet);
}
harvestingClient.setMetadataPrefix(newMetadataFormat);
harvestingClient.setHarvestStyle(newHarvestingStyle);

harvestingClient.setScheduled(isNewHarvestingScheduled());
if (isNewHarvestingScheduled()) {
if (isNewHarvestingScheduledWeekly()) {
harvestingClient.setSchedulePeriod(HarvestingClient.SCHEDULE_PERIOD_WEEKLY);
if (getWeekDayNumber() == null) {
// create a "week day is required..." error message, etc.
// but we may be better off not even giving them an opportunity
// to leave the field blank - ?
}
harvestingClient.setScheduleDayOfWeek(getWeekDayNumber());
} else {
harvestingClient.setSchedulePeriod(HarvestingClient.SCHEDULE_PERIOD_DAILY);
}

if (getHourOfDay() == null) {
// see the comment above, about the day of week. same here.
}
harvestingClient.setScheduleHourOfDay(getHourOfDay());
}
return harvestingClient;
}

public void validateMetadataFormat(FacesContext context, UIComponent toValidate, Object rawValue) {
String value = (String) rawValue;
UIInput input = (UIInput) toValidate;
@@ -717,6 +687,7 @@ public void backToStepThree() {
UIInput selectedDataverseMenu;

private String newNickname = "";
private String sourceName = "";
private String newHarvestingUrl = "";
private String customHeader = null;
private boolean initialSettingsValidated = false;
@@ -741,6 +712,7 @@ public void backToStepThree() {
public void initNewClient(ActionEvent ae) {
//this.selectedClient = new HarvestingClient();
this.newNickname = "";
this.sourceName = "";
this.newHarvestingUrl = "";
this.customHeader = null;
this.initialSettingsValidated = false;
@@ -842,6 +814,14 @@ public int getHarvestingScheduleRadio() {
public void setHarvestingScheduleRadio(int harvestingScheduleRadio) {
this.harvestingScheduleRadio = harvestingScheduleRadio;
}

public String getSourceName() {
return sourceName;
}

public void setSourceName(String sourceName) {
this.sourceName = sourceName;
}

public boolean isNewHarvestingScheduled() {
return this.harvestingScheduleRadio != harvestingScheduleRadioNone;
@@ -278,7 +278,10 @@ public Response modifyHarvestingClient(@Context ContainerRequestContext crc, Str
// Go through the supported editable fields and update the client accordingly:
// TODO: We may want to reevaluate whether we really want/need *all*
// of these fields to be editable.


if (newHarvestingClient.getSourceName() != null) {
harvestingClient.setSourceName(newHarvestingClient.getSourceName());
}
if (newHarvestingClient.getHarvestingUrl() != null) {
harvestingClient.setHarvestingUrl(newHarvestingClient.getHarvestingUrl());
}
@@ -29,13 +29,11 @@
import jakarta.persistence.NamedQueries;
import jakarta.persistence.NamedQuery;
import jakarta.persistence.OneToMany;
import jakarta.persistence.OneToOne;
import jakarta.persistence.OrderBy;
import jakarta.persistence.Table;
import jakarta.persistence.Temporal;
import jakarta.persistence.TemporalType;
import jakarta.validation.constraints.Pattern;
import jakarta.validation.constraints.Size;
import org.apache.commons.lang3.StringUtils;
import org.hibernate.validator.constraints.NotBlank;

/**
@@ -192,6 +190,20 @@ public void setHarvestingUrl(String harvestingUrl) {
this.harvestingUrl = harvestingUrl.trim();
}
}

private String sourceName;

public String getSourceName() {
return sourceName;
}

public void setSourceName(String sourceName) {
this.sourceName = sourceName;
}

public String getMetadataSource() {
return StringUtils.isNotBlank(this.sourceName) ? this.sourceName : this.name;
}

private String archiveUrl;

@@ -476,5 +488,4 @@ public boolean equals(Object object) {
public String toString() {
return "edu.harvard.iq.dataverse.harvest.client.HarvestingClient[ id=" + id + " ]";
}

}
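The nickname-vs-source-name fallback that `getMetadataSource()` implements above can be exercised in isolation. This is a standalone sketch, not part of the PR: the class and method names are illustrative, and the blank-check mirrors `StringUtils.isNotBlank`:

```java
public class MetadataSourceDemo {

    // Prefer the optional source name; fall back to the client nickname.
    // Mirrors: StringUtils.isNotBlank(sourceName) ? sourceName : name
    static String metadataSource(String sourceName, String nickname) {
        return (sourceName != null && !sourceName.trim().isEmpty())
                ? sourceName
                : nickname;
    }

    public static void main(String[] args) {
        // No source name set: the facet falls back to the nickname
        System.out.println(metadataSource(null, "client1"));                 // client1
        // A blank source name is treated as unset
        System.out.println(metadataSource("   ", "client2"));                // client2
        // Two clients sharing a source name are grouped under one facet entry
        System.out.println(metadataSource("Library Consortium", "client1")); // Library Consortium
        System.out.println(metadataSource("Library Consortium", "client2")); // Library Consortium
    }
}
```

In the last two calls, both "client1" and "client2" end up under "Library Consortium", which is exactly the grouping behavior the feature flag enables.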
@@ -1005,7 +1005,7 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long
// New - as of 6.3 - option of indexing the actual origin of
// harvested objects as the metadata source:
solrInputDocument.addField(SearchFields.METADATA_SOURCE,
dataset.getHarvestedFrom() != null ? dataset.getHarvestedFrom().getName() : HARVESTED);
dataset.getHarvestedFrom() != null ? dataset.getHarvestedFrom().getMetadataSource() : HARVESTED);
} else {
solrInputDocument.addField(SearchFields.METADATA_SOURCE, HARVESTED);
}
@@ -1577,7 +1577,7 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long
// New - as of 6.3 - option of indexing the actual origin of
// harvested objects as the metadata source:
datafileSolrInputDocument.addField(SearchFields.METADATA_SOURCE,
dataset.getHarvestedFrom() != null ? dataset.getHarvestedFrom().getName() : HARVESTED);
dataset.getHarvestedFrom() != null ? dataset.getHarvestedFrom().getMetadataSource() : HARVESTED);
} else {
datafileSolrInputDocument.addField(SearchFields.METADATA_SOURCE, HARVESTED);
}
@@ -1058,6 +1058,7 @@ public String parseHarvestingClient(JsonObject obj, HarvestingClient harvestingC
String dataverseAlias = obj.getString("dataverseAlias",null);

harvestingClient.setName(obj.getString("nickName",null));
harvestingClient.setSourceName(obj.getString("sourceName",null));
harvestingClient.setHarvestStyle(obj.getString("style", "default"));
harvestingClient.setHarvestingUrl(obj.getString("harvestUrl",null));
harvestingClient.setArchiveUrl(obj.getString("archiveUrl",null));
@@ -1051,6 +1051,7 @@ public static JsonObjectBuilder json(HarvestingClient harvestingClient) {
}

return jsonObjectBuilder().add("nickName", harvestingClient.getName()).
add("sourceName", harvestingClient.getSourceName()).
add("dataverseAlias", harvestingClient.getDataverse().getAlias()).
add("type", harvestingClient.getHarvestType()).
add("style", harvestingClient.getHarvestStyle()).
2 changes: 2 additions & 0 deletions src/main/java/propertyFiles/Bundle.properties
@@ -579,6 +579,8 @@ harvestclients.newClientDialog.nickname.helptext=Consists of letters, digits, un
harvestclients.newClientDialog.nickname.required=Client nickname cannot be empty!
harvestclients.newClientDialog.nickname.invalid=Client nickname can contain only letters, digits, underscores (_) and dashes (-); and must be at most 30 characters.
harvestclients.newClientDialog.nickname.alreadyused=This nickname is already used.
harvestclients.newClientDialog.sourcename=Source Name
harvestclients.newClientDialog.sourcename.helptext=When the index-harvested-metadata-source feature flag is enabled, the source name replaces the client nickname in the Metadata Source facet. The same source name can be shared by multiple harvesting clients to group their content under one name.
harvestclients.newClientDialog.customHeader=Custom HTTP Header
harvestclients.newClientDialog.customHeader.helptext=(Optional) Custom HTTP header to add to requests, if required by this OAI server.
harvestclients.newClientDialog.customHeader.watermark=Enter an http header, as in header-name: header-value
2 changes: 2 additions & 0 deletions src/main/resources/db/migration/V6.5.0.6.sql
@@ -0,0 +1,2 @@
-- Adds a column that allows customizing the name shown in the Metadata Source facet
ALTER TABLE harvestingclient ADD COLUMN IF NOT EXISTS sourcename TEXT;
We are close to releasing 6.6 and will be at code freeze (no more PRs merged) in a week on March 7th. With lots of merging going on I've already bumped the version of this SQL script twice.

Mostly I want to communicate that with little time left, this PR might need to get bumped to 6.7. We'll see.

@landreev landreev Mar 4, 2025

Yeah, V6.5.0.6 has already been merged; and it appears that there are multiple contenders for V6.5.0.7... So, some last minute renaming will be needed, depending on which one is ready to be merged first.
