Commit 950029b

Merge branch 'IQSS:develop' into codemeta_structure
2 parents 948ba58 + 72d7d9c commit 950029b

13 files changed

Lines changed: 107 additions & 4 deletions

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+### video subtitles (vtt files)
+
+The `IQSS/dataverse` PR sets the content type of new (not existing) files with the extension `vtt` to `text/vtt`,
+which is displayed as "_Web Video Text Tracks_". The PR also enables full-text indexing of these files
+if [configured](https://guides.dataverse.org/en/latest/installation/config.html#solrfulltextindexing).
+
+The `gdcc/dataverse-previewer` PRs provide a new version of the video previewer.
+It presents `vtt` files as subtitles for a video
+when they follow the naming convention `<video-basename>.<language-tag>.vtt`.
+The previewer matches subtitles by file name and does not rely on the content type.
+A proper content type may still prompt users to request access to the subtitles together with the video.
+
+Existing files with the extension `vtt` keep the content type `application/octet-stream`, displayed as "_Unknown_".
+The following query counts the files with an "_Unknown_" content type, grouped by extension:
+
+    SELECT substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2)) AS extension, COUNT(DISTINCT f.id) AS count
+    FROM datafile f LEFT JOIN filemetadata m ON f.id = m.datafile_id
+    WHERE f.contenttype = 'application/octet-stream'
+    GROUP BY extension;
+
+If `vtt` does not appear in the result, you are done.
+Otherwise, you may want to update the content type of the existing files and reindex the affected datasets.
+
+First, before changing anything, find the datasets that will need [reindexing](https://guides.dataverse.org/en/latest/admin/solr-search-index.html#manual-reindexing) afterwards:
+
+    select distinct
+    o.protocol, o.authority, o.identifier,
+    v.versionnumber, v.minorversionnumber, v.versionstate
+    from datafile f
+    left join filemetadata m on f.id = m.datafile_id
+    left join datasetversion v on v.id = m.datasetversion_id
+    left join dvobject o on o.id = v.dataset_id
+    WHERE f.contenttype = 'application/octet-stream'
+    AND 'vtt' = substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2))
+    ;
+
+Then update the content type of the files:
+
+    UPDATE datafile SET contenttype = 'text/vtt' WHERE id IN (
+    SELECT datafile_id FROM filemetadata m
+    WHERE contenttype = 'application/octet-stream'
+    AND 'vtt' = substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2))
+    );
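Both name-based rules in this release note can be sketched in plain Java: the extension that the SQL queries extract (the text after the last dot, or an empty string when there is none) and the `<video-basename>.<language-tag>.vtt` convention that pairs subtitles with a video. This is a hypothetical helper for illustration only, not code from Dataverse or `gdcc/dataverse-previewer` (which is not Java). The class and method names are made up, and it assumes that `<video-basename>` is the video file name without its last extension and that the language tag contains no dot.

```java
import java.util.Optional;

/** Hypothetical helpers for the vtt naming rules; not Dataverse or previewer code. */
final class VttNames {

    private VttNames() {}

    /**
     * Same result as the SQL expression
     * substring(label from (length(label) - strpos(reverse(label), '.') + 2)):
     * the text after the last '.', or "" when the label contains no dot.
     */
    static String extension(String label) {
        int lastDot = label.lastIndexOf('.');
        return lastDot < 0 ? "" : label.substring(lastDot + 1);
    }

    /**
     * Returns the language tag if subtitleName follows
     * <video-basename>.<language-tag>.vtt for the given video; otherwise empty.
     * Assumes the language tag itself contains no '.'.
     */
    static Optional<String> languageTag(String videoName, String subtitleName) {
        int videoDot = videoName.lastIndexOf('.');
        String basename = videoDot < 0 ? videoName : videoName.substring(0, videoDot);
        String prefix = basename + ".";
        String suffix = ".vtt";
        // The name must be long enough to hold a non-empty tag between prefix and suffix.
        if (subtitleName.length() <= prefix.length() + suffix.length()
                || !subtitleName.startsWith(prefix)
                || !subtitleName.endsWith(suffix)) {
            return Optional.empty();
        }
        String tag = subtitleName.substring(prefix.length(), subtitleName.length() - suffix.length());
        return tag.contains(".") ? Optional.empty() : Optional.of(tag);
    }
}
```

For example, `languageTag("lecture.mp4", "lecture.en.vtt")` yields `en`, while `lecture.vtt` yields nothing because it has no language tag. Like the SQL, `extension` is case-sensitive: `talk.VTT` gives `VTT`, which the `'vtt' = ...` condition in the queries would not match.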
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+The "string" type has been added as a new field type for metadata fields.
+
+In contrast to "text" fields, "string" fields are stored and indexed exactly as provided, without any text analysis or transformations.
+
+This field type is suitable for fields like IDs (e.g. ORCIDs) or enums, where exact matches are required when searching.
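Why exact matching matters for IDs can be pictured with a toy model: a "text" field is analyzed, here crudely approximated as lowercasing and splitting on anything that is not a letter or digit, so a query matches individual tokens; a "string" field matches only the exact stored value. This is a simplification made up for illustration. Solr's real analyzers for text fields are defined in its schema and do more than this; the only grounded part is that "string" fields are indexed as provided.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy comparison of analyzed "text" matching vs exact "string" matching; not Solr code. */
final class FieldMatching {

    private FieldMatching() {}

    /** Crude stand-in for text analysis: lowercase, then split on non-letter/non-digit runs. */
    static Set<String> analyze(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toSet());
    }

    /** "text" field: matches if every query token occurs among the stored value's tokens. */
    static boolean matchesText(String storedValue, String query) {
        Set<String> queryTokens = analyze(query);
        return !queryTokens.isEmpty() && analyze(storedValue).containsAll(queryTokens);
    }

    /** "string" field: the value is indexed as provided, so only an exact match counts. */
    static boolean matchesString(String storedValue, String query) {
        return storedValue.equals(query);
    }
}
```

With an ORCID such as `0000-0002-1825-0097`, the query `0000` matches the analyzed "text" version (it is one of the tokens) but not the "string" version, which only the full identifier matches.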
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+### Tabular Tags can now be replaced
+
+Previously, the API endpoint `POST /files/{id}/metadata/tabularTags` could only add tags to a file's list of tabular tags. With the new query parameter `?replace=true`, the existing list is replaced instead.
+
+See also [the guides](https://dataverse-guide--11359.org.readthedocs.build/en/11359/api/native-api.html#updating-file-tabular-tags), #11292, and #11359.
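The difference between the default mode and `?replace=true` can be modeled as a small list operation. This is a hypothetical model of the documented behavior, not the Dataverse implementation: the class name is made up, tags are plain strings, and skipping duplicates in the default mode is an assumption (the release note does not say how duplicates are handled). `Survey` and `Panel` are used only as example tag names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the tabularTags add/replace semantics; not Dataverse code. */
final class TabularTagUpdate {

    private TabularTagUpdate() {}

    /**
     * Returns the tag list after a POST with the requested tags.
     * replace=false: requested tags are added to the existing ones.
     * replace=true: existing tags are dropped first, so an empty request clears all tags.
     * Duplicates are skipped (an assumption of this sketch); insertion order is kept.
     */
    static List<String> apply(List<String> existing, List<String> requested, boolean replace) {
        Set<String> result = new LinkedHashSet<>();
        if (!replace) {
            result.addAll(existing);
        }
        result.addAll(requested);
        return new ArrayList<>(result);
    }
}
```

Note that an empty request only clears the tags together with `replace=true`; without it, an empty list leaves the existing tags unchanged.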

doc/sphinx-guides/source/admin/metadatacustomization.rst

Lines changed: 7 additions & 0 deletions
@@ -144,6 +144,7 @@ Each of the three main sections own sets of properties:
 | | | | \• email |
 | | | | \• text |
 | | | | \• textbox |
+| | | | \• string |
 | | | | \• url |
 | | | | \• int |
 | | | | \• float |
@@ -315,6 +316,12 @@ FieldType definitions
 | | section of the Dataset + File |
 | | Management page in the User Guide. |
 +---------------+------------------------------------+
+| string | Any text may be entered into this |
+| | field. The value is stored and |
+| | indexed exactly as provided, |
+| | without any text analysis or |
+| | transformations. |
++---------------+------------------------------------+
 | url | If not empty, field must contain |
 | | a valid URL. |
 +---------------+------------------------------------+

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 11 additions & 0 deletions
@@ -4669,6 +4669,8 @@ Updating File Tabular Tags

 Updates the tabular tags for an existing tabular file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the tabular tag names.

+The tags in "tabularTags" are added to the existing list unless the optional ``replace=true`` query parameter is included, in which case the pre-existing tags are deleted and replaced by the "tabularTags" sent. Sending an empty list with ``replace=true`` removes all of the pre-existing tags.
+
 The JSON representation of tabular tags (``tags.json``) looks like this::

 {
@@ -4698,6 +4700,9 @@ The fully expanded example above (without environment variables) looks like this
 curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
 "http://demo.dataverse.org/api/files/24/metadata/tabularTags" \
 -H "Content-type:application/json" --upload-file tags.json
+curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
+"http://demo.dataverse.org/api/files/24/metadata/tabularTags?replace=true" \
+-H "Content-type:application/json" --upload-file tags.json

 A curl example using a ``PERSISTENT_ID``

@@ -4711,6 +4716,9 @@ A curl example using a ``PERSISTENT_ID``
 curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
 "$SERVER_URL/api/files/:persistentId/metadata/tabularTags?persistentId=$PERSISTENT_ID" \
 -H "Content-type:application/json" --upload-file $FILE_PATH
+curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
+"$SERVER_URL/api/files/:persistentId/metadata/tabularTags?persistentId=$PERSISTENT_ID&replace=true" \
+-H "Content-type:application/json" --upload-file $FILE_PATH

 The fully expanded example above (without environment variables) looks like this:

@@ -4719,6 +4727,9 @@ The fully expanded example above (without environment variables) looks like this
 curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
 "https://demo.dataverse.org/api/files/:persistentId/metadata/tabularTags?persistentId=doi:10.5072/FK2/AAA000" \
 -H "Content-type:application/json" --upload-file tags.json
+curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
+"https://demo.dataverse.org/api/files/:persistentId/metadata/tabularTags?persistentId=doi:10.5072/FK2/AAA000&replace=true" \
+-H "Content-type:application/json" --upload-file tags.json

 Note that the specified tabular tags must be valid. The supported tags are:

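For clients written in Java rather than curl, the same request can be built with the JDK's `java.net.http` API. This is a sketch, not an official Dataverse client: the class name is made up, the server URL, file id `24` and token are the placeholder values from the curl examples, and the body is assumed to be a JSON object with a `tabularTags` array, which is the key the endpoint reads in `Files.java`. The request is only built here; sending it would use `HttpClient.newHttpClient().send(...)` against a real installation.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical builder for the tabularTags request; not part of any Dataverse client library. */
final class TabularTagsRequest {

    private TabularTagsRequest() {}

    /** Builds POST {serverUrl}/api/files/{fileId}/metadata/tabularTags, adding ?replace=true if requested. */
    static HttpRequest build(String serverUrl, long fileId, String apiToken,
                             String tagsJson, boolean replace) {
        String url = serverUrl + "/api/files/" + fileId + "/metadata/tabularTags"
                + (replace ? "?replace=true" : "");
        return HttpRequest.newBuilder(URI.create(url))
                .header("X-Dataverse-key", apiToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(tagsJson))
                .build();
    }
}
```

With `replace` set to `false`, the URL carries no query parameter, which matches the default add-only behavior described above.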

src/main/java/edu/harvard/iq/dataverse/DatasetFieldType.java

Lines changed: 4 additions & 2 deletions
@@ -36,8 +36,8 @@ public class DatasetFieldType implements Serializable, Comparable<DatasetFieldTy
 * The set of possible metatypes of the field. Used for validation and layout.
 */
 public enum FieldType {
-TEXT, TEXTBOX, DATE, EMAIL, URL, FLOAT, INT, NONE
-};
+TEXT, TEXTBOX, STRING, DATE, EMAIL, URL, FLOAT, INT, NONE
+};

 @Id
 @GeneratedValue(strategy = GenerationType.IDENTITY)
@@ -558,6 +558,8 @@ public SolrField getSolrField() {
 solrType = SolrField.SolrType.INTEGER;
 } else if (fieldType.equals(FieldType.FLOAT)) {
 solrType = SolrField.SolrType.FLOAT;
+} else if (fieldType.equals(FieldType.STRING)) {
+solrType = SolrField.SolrType.STRING;
 }

 Boolean anyParentAllowsMultiplesBoolean = false;
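The two Java changes fit together: a metadata block value such as `string` corresponds to the new `FieldType.STRING` constant, which `getSolrField()` now maps to `SolrField.SolrType.STRING`. The sketch below illustrates that path with local copies of the enums. The case-insensitive conversion via `valueOf` is an assumption about how a TSV value could be parsed, not a quote of Dataverse's loader; the `SolrType` copy is a subset, and only the three branches visible in the diff are mapped.

```java
import java.util.Locale;
import java.util.Optional;

/** Standalone illustration of the FieldType -> Solr type path; enum copies are local, not Dataverse's. */
final class FieldTypeMapping {

    private FieldTypeMapping() {}

    /** Copy of the FieldType constants after this commit. */
    enum FieldType { TEXT, TEXTBOX, STRING, DATE, EMAIL, URL, FLOAT, INT, NONE }

    /** Subset of Solr types that appear in the diff. */
    enum SolrType { INTEGER, FLOAT, STRING }

    /** Converts a TSV fieldType value such as "string" (assumed conversion, case-insensitive). */
    static FieldType parse(String tsvValue) {
        return FieldType.valueOf(tsvValue.trim().toUpperCase(Locale.ROOT));
    }

    /** Mirrors only the branches shown in getSolrField(); other types are outside this sketch. */
    static Optional<SolrType> solrTypeFor(FieldType fieldType) {
        switch (fieldType) {
            case INT:
                return Optional.of(SolrType.INTEGER);
            case FLOAT:
                return Optional.of(SolrType.FLOAT);
            case STRING:
                return Optional.of(SolrType.STRING);
            default:
                return Optional.empty();
        }
    }
}
```

Using a dedicated enum constant, rather than a flag on `TEXT`, keeps the Solr mapping a single branch per type, as in the diff.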

src/main/java/edu/harvard/iq/dataverse/api/Files.java

Lines changed: 5 additions & 1 deletion
@@ -1,5 +1,6 @@
 package edu.harvard.iq.dataverse.api;

+import com.google.api.client.util.Lists;
 import com.google.gson.Gson;
 import com.google.gson.JsonObject;
 import edu.harvard.iq.dataverse.*;
@@ -947,7 +948,7 @@ public Response setFileCategories(@Context ContainerRequestContext crc, @PathPar
 @AuthRequired
 @Path("{id}/metadata/tabularTags")
 @Produces(MediaType.APPLICATION_JSON)
-public Response setFileTabularTags(@Context ContainerRequestContext crc, @PathParam("id") String dataFileId, String jsonBody) {
+public Response setFileTabularTags(@Context ContainerRequestContext crc, @PathParam("id") String dataFileId, String jsonBody, @QueryParam("replace") boolean replaceData) {
 return response(req -> {
 DataFile dataFile = execCommand(new GetDataFileCommand(req, findDataFileOrDie(dataFileId)));
 if (!dataFile.isTabularData()) {
@@ -957,6 +958,9 @@ public Response setFileTabularTags(@Context ContainerRequestContext crc, @PathPa
 try (StringReader stringReader = new StringReader(jsonBody)) {
 jsonObject = Json.createReader(stringReader).readObject();
 JsonArray requestedTabularTagsJson = jsonObject.getJsonArray("tabularTags");
+if (replaceData) {
+dataFile.setTags(Lists.newArrayList());
+}
 for (JsonValue jsonValue : requestedTabularTagsJson) {
 JsonString jsonString = (JsonString) jsonValue;
 try {
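Because `replaceData` is a primitive `boolean` `@QueryParam`, a missing `replace` parameter yields `false`, and only a value that converts to `true` triggers the replacement. The sketch below approximates that conversion by hand so the edge cases are visible. It assumes the JAX-RS runtime converts the value the way `Boolean.parseBoolean` does (only `true`, in any letter case, is true); the class and method names are hypothetical, and this is not how the endpoint itself reads the parameter.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical approximation of how a boolean "replace" query parameter is interpreted. */
final class ReplaceFlag {

    private ReplaceFlag() {}

    /** Absent parameter -> false; otherwise the first value, converted with Boolean.parseBoolean. */
    static boolean replaceFlag(URI uri) {
        String query = uri.getRawQuery();
        if (query == null) {
            return false;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = URLDecoder.decode(eq < 0 ? pair : pair.substring(0, eq), StandardCharsets.UTF_8);
            if (name.equals("replace")) {
                String value = eq < 0 ? "" : URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                return Boolean.parseBoolean(value);
            }
        }
        return false;
    }
}
```

Under this assumption, `?replace=TRUE` replaces the tags, while `?replace=1` or `?replace=yes` silently falls back to adding them.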

src/main/java/propertyFiles/MimeTypeDetectionByFileExtension.properties

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ mat=application/matlab-mat
 md=text/markdown
 mp3=audio/mp3
 m4a=audio/mp4
+vtt=text/vtt
 nii=image/nii
 nc=application/netcdf
 ods=application/vnd.oasis.opendocument.spreadsheet

src/main/java/propertyFiles/MimeTypeDisplay.properties

Lines changed: 1 addition & 0 deletions
@@ -217,6 +217,7 @@ video/x-m4v=MPEG-4 Video
 video/ogg=OGG Video
 video/quicktime=Quicktime Video
 video/webm=WebM Video
+text/vtt=Web Video Text Tracks
 # Network Data
 text/xml-graphml=GraphML Network Data
 # 3D Data

src/main/java/propertyFiles/MimeTypeFacets.properties

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ text/richtext=Text
 text/turtle=Text
 application/xml=Text
 text/xml=Text
+text/vtt=Text
 # Code
 text/x-c=Code
 text/x-c++src=Code
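The three properties entries work as a chain: the extension selects the content type, which in turn selects the display name and the search facet. The sketch below loads the new `vtt` lines, plus one neighboring line from each file, into `java.util.Properties` and performs those lookups. It is an illustration only: how Dataverse actually loads these files is not part of this diff, the class name is hypothetical, and the fallbacks (`application/octet-stream` and "Unknown") are assumptions based on the release note's description of existing `vtt` files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical lookup chain over excerpts of the three properties files; not Dataverse code. */
final class VttMimeLookup {

    private VttMimeLookup() {}

    // Excerpts from the diffs: the new vtt line plus one existing neighbor per file.
    static final Properties BY_EXTENSION = load("m4a=audio/mp4\nvtt=text/vtt\n");
    static final Properties DISPLAY = load("video/webm=WebM Video\ntext/vtt=Web Video Text Tracks\n");
    static final Properties FACETS = load("text/xml=Text\ntext/vtt=Text\n");

    /** Parses properties text; java.util.Properties handles key=value lines and # comments. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading an in-memory string should not fail, but load() declares IOException.
            throw new IllegalStateException(e);
        }
        return props;
    }

    /** Extension to content type; unknown extensions fall back to application/octet-stream (assumed). */
    static String contentType(String extension) {
        return BY_EXTENSION.getProperty(extension, "application/octet-stream");
    }

    /** Content type to display name; "Unknown" as fallback (assumed mechanism). */
    static String displayName(String contentType) {
        return DISPLAY.getProperty(contentType, "Unknown");
    }

    /** Content type to search facet, if the excerpt defines one. */
    static Optional<String> facet(String contentType) {
        return Optional.ofNullable(FACETS.getProperty(contentType));
    }
}
```

For `vtt`, the chain gives `text/vtt`, then "Web Video Text Tracks", then the "Text" facet; an extension missing from the first file ends at "Unknown", which is how existing `vtt` files appear until their content type is updated.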
