Commit 75066c5

Centralize CSV parsing (CsvUtil) + CORS origin echo & Vary header improvements
1 parent 618ee4f commit 75066c5

29 files changed

Lines changed: 349 additions & 163 deletions
Lines changed: 25 additions & 54 deletions
@@ -1,62 +1,33 @@
-# 11744: CORS header handling fixes (echo single Origin, add Vary: Origin, multi-origin allow, sanitization)
+# 11744: CORS handling improvements

-This branch adjusts the CORS filter so browser clients work correctly when multiple origins are allowed.
+Modernizes CORS so browser integrations (previewers, external tools, JS clients) work correctly with multiple origins and proper caching.

-## What changed
-- Access-Control-Allow-Origin (ACAO) now echoes the single request `Origin` when it matches an allowlist from `dataverse.cors.origin`.
-- `Vary: Origin` is added when echoing a specific origin to keep caches correct across different origins.
-- Comma-separated origin lists are supported; surrounding quotes in CSV configs are stripped.
-- Sanitization is applied to CORS header lists (methods/allow/expose) to avoid quoted values that can break preflight checks.
-- Deprecated DB fallback for enabling CORS is removed; CORS is considered enabled only when `dataverse.cors.origin` is set as a JVM options/Microprofile setting.
+## Highlights
+* Echoes the request origin (`Access-Control-Allow-Origin`) when it matches `dataverse.cors.origin`.
+* Adds `Vary: Origin` for per-origin responses (not for wildcard).
+* Supports comma‑separated origin list; any `*` in the list = wildcard mode.
+* CORS now only enabled when `dataverse.cors.origin` is set (deprecated `:AllowCors` no longer enables it).
+* Sanitizes CORS CSV settings (`dataverse.cors.methods`, `dataverse.cors.headers.allow`, `dataverse.cors.headers.expose`).
+* Docs updated (Installation, Big Data Support, External Tools, File Previews); new tests cover edge cases.

-## Upgrade / run notes (non-SQL)
-To keep CORS working after pulling this branch:
+## Admin Action
+Set `dataverse.cors.origin` explicitly (required). Use explicit origins (not `*`) for credentialed requests. Ensure proxies keep `Vary: Origin`.

-1) Configure origins as JVM options/Microprofile settings (no quotes):
-- Single origin:
-- `dataverse.cors.origin=https://example.org`
-- Multiple origins (comma-separated):
-- `dataverse.cors.origin=https://libis.github.io,https://gdcc.github.io`
-- Wildcard:
-- `dataverse.cors.origin=*`
-- Note: Browsers reject `*` when credentialed requests are used (cookies/Authorization headers). Prefer explicit origins for those cases.
-
-2) Optional headers/methods lists (unquoted, comma-separated CSV):
-- `dataverse.cors.methods`
-- `dataverse.cors.headers.allow`
-- `dataverse.cors.headers.expose`
-
-Avoid surrounding values with quotes (e.g., do not use `"Accept, Content-Type"`). Quotes will be stripped but may cause confusion.
-
-3) If you previously relied on the database setting to enable CORS (deprecated `AllowCors`), set `dataverse.cors.origin` instead. The DB fallback is no longer used.
-
-4) Reverse proxies/caches: `Vary: Origin` is now emitted. Ensure your proxy does not drop this header.
-
-## Verify
-Preflight (replace DV_URL with your base URL):
-
-```bash
-curl -i -X OPTIONS \
--H "Origin: https://libis.github.io" \
--H "Access-Control-Request-Method: GET" \
-"${DV_URL}/api/info/version"
+Examples:
 ```
-
-Expected:
-- `Access-Control-Allow-Origin: https://libis.github.io`
-- `Vary: Origin` present
-
-Actual request:
-
-```bash
-curl -i \
--H "Origin: https://libis.github.io" \
-"${DV_URL}/api/info/version"
+dataverse.cors.origin=https://example.org
+dataverse.cors.origin=https://libis.github.io,https://gdcc.github.io
+dataverse.cors.origin=*
 ```
+Optional (unquoted):
+```
+dataverse.cors.methods=GET, POST, OPTIONS, PUT, DELETE
+```
+
+## Compatibility
+* Must configure `dataverse.cors.origin`; `:AllowCors` no longer sufficient.
+* Any `*` triggers wildcard (no per-origin echo / no Vary header).

-Expected:
-- Same ACAO echo as above
+## Docs
+See updated `dataverse.cors.origin` section and related notes in Big Data Support (S3), External Tools, and File Previews.

-## Backward compatibility
-- Instances relying on the deprecated DB-based CORS enablement must set `dataverse.cors.origin` to keep CORS enabled.
-- Quoted CORS configuration values may behave differently; remove quotes going forward.
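
The origin-selection rules described in the note above (unset means no CORS headers, any `*` means wildcard mode, otherwise echo a matching `Origin` and add `Vary: Origin`) can be sketched roughly as follows. This is a minimal illustration, not the filter code from this commit: the class and method names (`CorsOriginSketch`, `parseOrigins`, `corsHeaders`) are hypothetical, and sending no CORS headers at all for a non-matching origin is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; names and the non-matching-origin behavior are assumptions.
public class CorsOriginSketch {

    /** Splits a comma-separated origin setting, trimming whitespace and dropping empty entries. */
    static List<String> parseOrigins(String setting) {
        if (setting == null) {
            return List.of();
        }
        return Arrays.stream(setting.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns the CORS response headers for one request; an empty map means none are sent. */
    static Map<String, String> corsHeaders(String setting, String requestOrigin) {
        Map<String, String> headers = new LinkedHashMap<>();
        List<String> allowed = parseOrigins(setting);
        if (allowed.isEmpty()) {
            return headers; // dataverse.cors.origin unset: CORS stays disabled
        }
        if (allowed.contains("*")) {
            // Wildcard mode: no per-origin echo and no Vary header.
            headers.put("Access-Control-Allow-Origin", "*");
            return headers;
        }
        if (requestOrigin != null && allowed.contains(requestOrigin)) {
            // Echo the single matching origin and mark the response as origin-dependent.
            headers.put("Access-Control-Allow-Origin", requestOrigin);
            headers.put("Vary", "Origin");
        }
        return headers;
    }

    public static void main(String[] args) {
        String setting = "https://libis.github.io, https://gdcc.github.io";
        System.out.println(corsHeaders(setting, "https://libis.github.io"));
        System.out.println(corsHeaders("*", "https://example.org"));
        System.out.println(corsHeaders(setting, "https://unlisted.example"));
    }
}
```

Keeping the wildcard check ahead of the list match mirrors the stated rule that any `*` in the list switches the whole setting to wildcard mode.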

doc/sphinx-guides/source/api/external-tools.rst

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ Introduction

 External tools are additional applications the user can access or open from your Dataverse installation to preview, explore, and manipulate data files and datasets. The term "external" is used to indicate that the tool is not part of the main Dataverse Software.

+.. note::
+Browser-based preview or explore tools that make XHR/fetch calls back to the Dataverse API must have CORS explicitly enabled on the Dataverse installation via :ref:`dataverse.cors.origin <dataverse.cors.origin>`. The legacy ``:AllowCors`` database setting is deprecated and no longer enables CORS by itself. Be sure the origins hosting your tool (or ``*`` when appropriate) are included in ``dataverse.cors.origin``; otherwise requests from your tool will be blocked by the browser even if the tool itself loads correctly.
+
 Once you have created the external tool itself (which is most of the work!), you need to teach a Dataverse installation how to construct URLs that your tool needs to operate. For example, if you've deployed your tool to fabulousfiletool.com your tool might want the ID of a file and the siteUrl of the Dataverse installation like this: https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org

 In short, you will be creating a manifest in JSON format that describes not only how to construct URLs for your tool, but also what types of files your tool operates on, where it should appear in the Dataverse installation web interfaces, etc.

doc/sphinx-guides/source/developers/big-data-support.rst

Lines changed: 9 additions & 0 deletions
@@ -57,6 +57,15 @@ Allow CORS for S3 Buckets
 **IMPORTANT:** One additional step that is required to enable direct uploads via a Dataverse installation and for direct download to work with previewers and direct upload to work with dvwebloader (:ref:`folder-upload`) is to allow cross site (CORS) requests on your S3 store.
 The example below shows how to enable CORS rules (to support upload and download) on a bucket using the AWS CLI command line tool. Note that you may want to limit the AllowedOrigins and/or AllowedHeaders further. https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3 has some additional information about doing this.

+Dataverse itself will only emit the necessary ``Access-Control-*`` headers to browsers when CORS has been explicitly enabled via the JVM/MicroProfile setting :ref:`dataverse.cors.origin <dataverse.cors.origin>`. The legacy database setting ``:AllowCors`` no longer turns CORS on. You must both:
+
+* Configure an appropriate ``dataverse.cors.origin`` value (single origin, comma-separated list, or ``*``) on the Dataverse application server; and
+* Configure a matching/compatible CORS policy on each S3 bucket (and any CDN/proxy in front of it) that will be used for direct upload or for redirect (download-redirect) operations consumed by previewers.
+
+If you specify multiple origins in ``dataverse.cors.origin`` Dataverse will echo back the requesting origin (when it matches) and will include ``Vary: Origin`` so that shared caches do not serve one origin's response to another. If you configure ``*`` Dataverse will respond with ``Access-Control-Allow-Origin: *`` (note that browsers will not allow credentialed requests with a wildcard).
+
+Make sure the bucket CORS configuration ``AllowedOrigins`` is at least as permissive as the origins you configure in ``dataverse.cors.origin``. If the bucket allows ``*`` but the Dataverse application only allows a subset, the browser will still enforce the more restrictive application response.
+
 If you'd like to check the CORS configuration on your bucket before making changes:

 ``aws s3api get-bucket-cors --bucket <BUCKET_NAME>``

doc/sphinx-guides/source/installation/config.rst

Lines changed: 14 additions & 11 deletions
@@ -3667,17 +3667,22 @@ The following settings control Cross-Origin Resource Sharing (CORS) for your Dat
 dataverse.cors.origin
 +++++++++++++++++++++

-Allowed origins for CORS requests. The default with no value set is to not include CORS headers. However, if the deprecated :AllowCors setting is explicitly set to true the default is "\*" (all origins).
-When the :AllowsCors setting is not used, you must set this setting to "\*" or a list of origins to enable CORS headers.
+Allowed origins for CORS requests. If this setting is not defined, CORS headers are not added. Set to ``*`` to allow all origins (note that browsers will not allow credentialed requests with ``*``) or provide a comma-separated list of explicit origins.

-Multiple origins can be specified as a comma-separated list.
+Multiple origins can be specified as a comma-separated list (whitespace is ignored):

 Example:

 ``./asadmin create-jvm-options '-Ddataverse.cors.origin=https://example.com,https://subdomain.example.com'``

 Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_CORS_ORIGIN``.

+Behavior:
+
+* When a list of origins is configured, Dataverse echoes the single matching request ``Origin`` value in ``Access-Control-Allow-Origin`` and adds ``Vary: Origin`` to support correct proxy/CDN caching.
+* When ``*`` is configured, ``Access-Control-Allow-Origin: *`` is sent and ``Vary`` is not modified.
+* The legacy database setting ``:AllowCors`` is deprecated and no longer enables CORS automatically; you must configure ``dataverse.cors.origin``.
+
 .. _dataverse.cors.methods:

 dataverse.cors.methods
@@ -4917,19 +4922,17 @@ This can be helpful in situations where multiple organizations are sharing one D
 or
 ``curl -X PUT -d '*' http://localhost:8080/api/admin/settings/:InheritParentRoleAssignments``

-:AllowCors (Deprecated)
--+++++++++++++++++++++++
+:AllowCors (Deprecated – no longer used once dataverse.cors.* settings exist)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 .. note::
-This setting is deprecated. Please use the JVM settings above instead.
-This legacy setting will only be used if the newer JVM settings are not set.
+This legacy database setting has been superseded by the ``dataverse.cors.*`` JVM/MicroProfile settings. In current versions CORS is only enabled when ``dataverse.cors.origin`` is explicitly set. Existing values of ``:AllowCors`` are ignored if ``dataverse.cors.origin`` is unset.

-Enable or disable support for Cross-Origin Resource Sharing (CORS) by setting ``:AllowCors`` to ``true`` or ``false``.
+Historical behavior (prior versions) allowed setting ``:AllowCors`` to ``true``/``false``. Administrators should migrate to the JVM/MicroProfile setting:

-``curl -X PUT -d true http://localhost:8080/api/admin/settings/:AllowCors``
+``./asadmin create-jvm-options '-Ddataverse.cors.origin=*'``

-.. note::
-New values for this setting will only be used after a server restart.
+or a comma-separated list of allowed origins.

 :ChronologicalDateFacets
 ++++++++++++++++++++++++

doc/sphinx-guides/source/user/dataset-management.rst

Lines changed: 3 additions & 0 deletions
@@ -175,6 +175,9 @@ File Previews

 Dataverse installations can add previewers for common file types uploaded by their research communities. The previews appear on the file page. If a preview tool for a specific file type is available, the preview will be created and will display automatically, after terms have been agreed to or a guestbook entry has been made, if necessary. File previews are not available for restricted files unless they are being accessed using a Preview URL. See also :ref:`previewUrl`. When the dataset license is not the default license, users will be prompted to accept the license/data use agreement before the preview is shown. See also :ref:`license-terms`.

+.. note::
+Some previewers run purely in the browser and make direct (JavaScript) requests back to the Dataverse API endpoints to retrieve file contents, metadata, or signed URLs. For these previewers to function when hosted on a different origin (e.g., a CDN or a separate previewer service), the Dataverse installation must have CORS enabled via :ref:`dataverse.cors.origin <dataverse.cors.origin>`. Administrators should configure the list of allowed origins to include the host serving the previewers. The deprecated ``:AllowCors`` database setting no longer enables CORS.
+
 Previewers are available for the following file types:

 - Text

src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java

Lines changed: 5 additions & 4 deletions
@@ -52,6 +52,7 @@
 import org.apache.http.protocol.HttpContext;
 import org.apache.http.util.EntityUtils;
 import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
+import edu.harvard.iq.dataverse.util.CsvUtil;

 /**
 *
@@ -853,12 +854,12 @@ public String getFieldLanguage(String languages, String localeCode) {
 // If the fields list of supported languages contains the current locale (e.g.
 // the lang of the UI, or the current metadata input/display lang (tbd)), use
 // that. Otherwise, return the first in the list
-String[] langStrings = languages.split("\\s*,\\s*");
-if (langStrings.length > 0) {
-if (Arrays.asList(langStrings).contains(localeCode)) {
+final List<String> langStrings = CsvUtil.split(languages);
+if (!langStrings.isEmpty()) {
+if (langStrings.contains(localeCode)) {
 return localeCode;
 } else {
-return langStrings[0];
+return langStrings.get(0);
 }
 }
 return null;

src/main/java/edu/harvard/iq/dataverse/FileMetadata.java

Lines changed: 2 additions & 1 deletion
@@ -48,6 +48,7 @@
 import edu.harvard.iq.dataverse.datavariable.DataVariable;
 import edu.harvard.iq.dataverse.datavariable.VarGroup;
 import edu.harvard.iq.dataverse.datavariable.VariableMetadata;
+import edu.harvard.iq.dataverse.util.CsvUtil;
 import edu.harvard.iq.dataverse.util.DateUtil;
 import edu.harvard.iq.dataverse.util.StringUtil;
 import java.util.HashSet;
@@ -609,7 +610,7 @@ public int compare(FileMetadata o1, FileMetadata o2) {
 public static void setCategorySortOrder(String categories) {
 categoryMap=new HashMap<String, Long>();
 long i=1;
-for(String cat: categories.split(",\\s*")) {
+for(String cat: CsvUtil.split(categories)) {
 categoryMap.put(cat.toUpperCase(), i);
 i++;
 }

src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java

Lines changed: 6 additions & 5 deletions
@@ -14,7 +14,7 @@
 import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
 import edu.harvard.iq.dataverse.settings.SettingsServiceBean.Key;
 import edu.harvard.iq.dataverse.util.BundleUtil;
-import edu.harvard.iq.dataverse.util.MailUtil;
+import edu.harvard.iq.dataverse.util.CsvUtil;
 import edu.harvard.iq.dataverse.util.StringUtil;
 import edu.harvard.iq.dataverse.util.SystemConfig;
 import edu.harvard.iq.dataverse.UserNotification.Type;
@@ -396,7 +396,7 @@ public boolean isRsyncOnly() {
 if (uploadMethods==null){
 rsyncOnly = false;
 } else {
-rsyncOnly = Arrays.asList(uploadMethods.toLowerCase().split("\\s*,\\s*")).size() == 1 && uploadMethods.toLowerCase().equals(SystemConfig.FileUploadMethods.RSYNC.toString());
+rsyncOnly = CsvUtil.split(uploadMethods).size() == 1 && uploadMethods.toLowerCase().equals(SystemConfig.FileUploadMethods.RSYNC.toString());
 }
 }
 }
@@ -428,7 +428,7 @@ public Integer getUploadMethodsCount() {
 if (uploadMethods==null){
 uploadMethodsCount = 0;
 } else {
-uploadMethodsCount = Arrays.asList(uploadMethods.toLowerCase().split("\\s*,\\s*")).size();
+uploadMethodsCount = CsvUtil.split(uploadMethods).size();
 }
 }
 return uploadMethodsCount;
@@ -502,7 +502,8 @@ public boolean shouldBeAnonymized(DatasetField df) {
 if (anonymizedFieldTypes == null) {
 anonymizedFieldTypes = new ArrayList<String>();
 String names = get(SettingsServiceBean.Key.AnonymizedFieldTypeNames.toString(), "");
-anonymizedFieldTypes.addAll(Arrays.asList(names.split(",\\s")));
+// Use CsvUtil for consistent CSV parsing instead of raw regex split
+anonymizedFieldTypes.addAll(CsvUtil.split(names));
 }
 return anonymizedFieldTypes.contains(df.getDatasetFieldType().getName());
 }
@@ -830,7 +831,7 @@ private Boolean getUploadMethodAvailable(String method){
 if (uploadMethods==null){
 return false;
 } else {
-return Arrays.asList(uploadMethods.toLowerCase().split("\\s*,\\s*")).contains(method);
+return CsvUtil.splitToLowerCaseSet(uploadMethods).contains(method);
 }
 }


src/main/java/edu/harvard/iq/dataverse/api/Admin.java

Lines changed: 2 additions & 1 deletion
@@ -111,6 +111,7 @@
 import edu.harvard.iq.dataverse.userdata.UserListResult;
 import edu.harvard.iq.dataverse.util.ArchiverUtil;
 import edu.harvard.iq.dataverse.util.BundleUtil;
+import edu.harvard.iq.dataverse.util.CsvUtil;
 import edu.harvard.iq.dataverse.util.FileUtil;
 import edu.harvard.iq.dataverse.util.SystemConfig;
 import edu.harvard.iq.dataverse.util.URLTokenUtil;
@@ -2167,7 +2168,7 @@ public Response addRoleAssignementsToChildren(@Context ContainerRequestContext c
 boolean inheritAllRoles = false;
 String rolesString = settingsSvc.getValueForKey(SettingsServiceBean.Key.InheritParentRoleAssignments, "");
 if (rolesString.length() > 0) {
-ArrayList<String> rolesToInherit = new ArrayList<String>(Arrays.asList(rolesString.split("\\s*,\\s*")));
+ArrayList<String> rolesToInherit = new ArrayList<>(CsvUtil.split(rolesString));
 if (!rolesToInherit.isEmpty()) {
 if (rolesToInherit.contains("*")) {
 inheritAllRoles = true;

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

Lines changed: 4 additions & 2 deletions
@@ -5222,7 +5222,8 @@ public Response getPrivateUrlDatasetVersion(@PathParam("privateUrlToken") String
 }
 JsonObjectBuilder responseJson;
 if (isAnonymizedAccess) {
-List<String> anonymizedFieldTypeNamesList = new ArrayList<>(Arrays.asList(anonymizedFieldTypeNames.split(",\\s")));
+// Use CsvUtil for consistent CSV parsing
+List<String> anonymizedFieldTypeNamesList = new ArrayList<>(CsvUtil.split(anonymizedFieldTypeNames));
 responseJson = json(dsv, anonymizedFieldTypeNamesList, true, returnOwners);
 } else {
 responseJson = json(dsv, null, true, returnOwners);
@@ -5248,7 +5249,8 @@ public Response getPreviewUrlDatasetVersion(@PathParam("previewUrlToken") String
 }
 JsonObjectBuilder responseJson;
 if (isAnonymizedAccess) {
-List<String> anonymizedFieldTypeNamesList = new ArrayList<>(Arrays.asList(anonymizedFieldTypeNames.split(",\\s")));
+// Use CsvUtil for consistent CSV parsing
+List<String> anonymizedFieldTypeNamesList = new ArrayList<>(CsvUtil.split(anonymizedFieldTypeNames));
 responseJson = json(dsv, anonymizedFieldTypeNamesList, true, returnOwners);
 } else {
 responseJson = json(dsv, null, true, returnOwners);
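
The hunks above replace three slightly different regex splits (`",\\s"`, `",\\s*"`, `"\\s*,\\s*"`) with `CsvUtil.split`, whose source is not part of this excerpt. The sketch below shows why the old `",\\s"` pattern was fragile, using a hypothetical stand-in for the helper. It assumes the helper trims each token and drops empty ones; the real `CsvUtil` may differ, and `CsvSplitSketch` and the sample field names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CsvUtil.split; trim-and-drop-empties is an assumption.
public class CsvSplitSketch {

    /** Splits on commas, trims each token, and skips tokens that are empty after trimming. */
    static List<String> split(String csv) {
        List<String> tokens = new ArrayList<>();
        if (csv == null) {
            return tokens;
        }
        for (String raw : csv.split(",")) {
            String token = raw.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String names = "author,contact,  depositor";

        // Old pattern: only a comma followed by one whitespace character is a delimiter,
        // so "author,contact" stays fused and " depositor" keeps a leading space.
        System.out.println(names.split(",\\s").length); // 2

        // Stand-in helper: every comma is a delimiter and tokens are trimmed.
        System.out.println(split(names).size()); // 3

        // An empty setting: the regex split yields one empty string, the helper yields nothing.
        System.out.println("".split(",\\s").length); // 1
        System.out.println(split("").size()); // 0
    }
}
```

Under these assumptions, a setting written without a space after each comma would no longer be treated as a single field name in the anonymization checks.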
