
Commit f8401b2

Merge branch 'main' into usability_create_readme
2 parents 7291a6f + 6efc353 commit f8401b2

52 files changed

Lines changed: 4551 additions & 2191 deletions

README.md

Lines changed: 14 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -8,13 +8,20 @@
88
En aquest repositori el [Consorci de Serveis Universitaris de Catalunya (CSUC)](https://www.csuc.cat/ca) publica scripts que les institucions i els usuaris del [Repositori de Dades de Recerca](https://dataverse.csuc.cat/) poden fer servir per realitzar tasques de forma automatitzada. Tots els scripts requereixen usar l'[API de Dataverse](https://guides.dataverse.org/en/latest/api/).
99

1010
## Descripció dels scripts
11-
12-
- **Crear fitxers README.txt**: Aquest script permet crear un fitxer README automaticament a partir de les metadades d'un dataset depositat al repositori Dataverse.
13-
- **Pujada automàtica de fitxers**: Aquest script permet pujar fitxers automàticament a un repositori Dataverse.
14-
- **Moure datasets entre instàncies**: Aquest script permet moure datasets entre instàncies d'un repositori Dataverse.
15-
- **Extreure metadades en un fitxer tabular**: Aquest script permet descarregar les metadades d'un conjunt de dades en format tabular.
16-
- **REVISAT**: Aquest script automatitza i facilita la revisió d'un dataset fent servir la majoria dels criteris del [REVISAT](https://confluence.csuc.cat/display/RDM/REVISAT).
17-
- **Descarregar datasets sencers**: Aquest script permet descarregar un conjunt de dades d'un repositori Dataverse.
11+
12+
- **REVISAT**: Automatitza la revisió d’un dataset segons els criteris del [REVISAT](https://confluence.csuc.cat/display/RDM/REVISAT).
13+
- **change_CSV_delimiter**: Canvia el delimitador dels fitxers CSV.
14+
- **create_Readme**: Genera automàticament un fitxer README a partir de les metadades del dataset.
15+
- **dataset_size_calculator**: Calcula la mida total d’un conjunt de dades.
16+
- **extract_metadata**: Extreu metadades de datasets i les desa en format tabular.
17+
- **metrics**: Obté mètriques d’ús o descàrregues dels datasets.
18+
- **move_dataset**: Permet moure datasets entre diferents instàncies de Dataverse.
19+
- **multiple_datasets_metadata**: Extreu metadades de múltiples datasets de manera massiva.
20+
- **persistent_link**: Comprova i mostra l’enllaç persistent correcte d’un dataset o fitxer.
21+
- **related_publication_check**: Comprova si un dataset té una publicació relacionada vinculada correctament.
22+
- **transform_excel**: Transforma fitxers Excel segons formats compatibles amb el repositori.
23+
- **upload_files**: Automatitza la pujada de fitxers a Dataverse.
24+
- **verification_readme**: Verifica si el fitxer README és a dins dels datasets d'una instància.
1825

1926
## Contacte
2027

README_ENG.md

Lines changed: 14 additions & 6 deletions
@@ -8,12 +8,20 @@ In this repository the [Consortium of University Services of Catalonia (CSUC)](h
88

99
## Description of the scripts
1010

11-
- **Create README.txt files**: This script creates a README.txt file automatically using the metadata in a dataset record.
12-
- **Automatic file upload**: This script uploads files automatically to a dataset record in a Dataverse repository.
13-
- **Moving datasets between Dataverses**: This script moves datasets between different Dataverses in a Dataverse repository.
14-
- **Extract metadata in a tabular file**: This script downloads the metadata of a dataset in tabular format.
15-
- **REVISAT**: This script automates and facilitates the review of a dataset using most of the [REVISAT criteria](https://confluence.csuc.cat/display/RDM/REVISAT).
16-
- **Download full datasets**: This script downloads a dataset from a Dataverse repository.
11+
- **REVISAT**: Automates the review of a dataset based on the [REVISAT](https://confluence.csuc.cat/display/RDM/REVISAT) checklist.
12+
- **change_CSV_delimiter**: Changes the delimiter of CSV files.
13+
- **create_Readme**: Automatically generates a README file based on the dataset metadata.
14+
- **dataset_size_calculator**: Calculates the total size of a dataset.
15+
- **extract_metadata**: Extracts metadata from datasets and saves it in tabular format.
16+
- **metrics**: Retrieves usage or download metrics for datasets.
17+
- **move_dataset**: Allows moving datasets between different Dataverse instances.
18+
- **multiple_datasets_metadata**: Extracts metadata from multiple datasets in bulk.
19+
- **persistent_link**: Checks and displays the correct persistent link of a dataset or file.
20+
- **related_publication_check**: Checks whether a dataset has a properly linked related publication.
21+
- **transform_excel**: Transforms Excel files into formats compatible with the repository.
22+
- **upload_files**: Automates the upload of files to Dataverse.
23+
- **verification_readme**: Verifies whether the README file is included in the datasets of an instance.
24+
1725

1826
## Contact
1927
If you have questions or comments about these scripts, open an issue or send an e-mail to <aco@csuc.cat>.
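
To make the `change_CSV_delimiter` entry above more concrete, here is a minimal sketch of a delimiter conversion using only the Python standard library. It is not the repository's implementation: the function name `convert_delimiter` and the default delimiters (`;` to `,`) are assumptions chosen for illustration.

```python
import csv
import io


def convert_delimiter(text: str, old: str = ";", new: str = ",") -> str:
    """Rewrite CSV text, replacing the field delimiter `old` with `new`.

    Hypothetical helper: parsing with the csv module (rather than a plain
    string replace) keeps quoted fields intact and adds quotes where the
    new delimiter appears inside a value.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=old)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=new, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = 'name;value\n"a;b";1\n'
    print(convert_delimiter(sample))
```

Parsing and re-serialising is slower than a string replace, but it avoids corrupting values that already contain either delimiter.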

REVISAT/README.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
11
[![ca](https://img.shields.io/badge/lang-ca-blue.svg)](https://github.com/CSUC/RDR-scripts/blob/main/REVISAT/README.md)
22
[![en](https://img.shields.io/badge/lang-en-green.svg)](https://github.com/CSUC/RDR-scripts/blob/main/REVISAT/README_ENG.md)
3+
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CSUC/RDR-scripts/blob/main/REVISAT/REVISAT_script.ipynb)
4+
35
# Script d'Avaluació de datasets (REVISAT)
46
Per a qualsevol consulta sobre el codi, poseu-vos en contacte amb rdr-contacte@csuc.cat
57

REVISAT/README_ENG.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
11
[![ca](https://img.shields.io/badge/lang-ca-blue.svg)](https://github.com/CSUC/RDR-scripts/blob/main/REVISAT/README.md)
22
[![en](https://img.shields.io/badge/lang-en-green.svg)](https://github.com/CSUC/RDR-scripts/blob/main/REVISAT/README_ENG.md)
3+
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CSUC/RDR-scripts/blob/main/REVISAT/REVISAT_script.ipynb)
4+
35
# Dataset Evaluation Script (REVISAT / CURATED)
46
For any queries regarding the code, contact rdr-contacte@csuc.cat
57

REVISAT/REVISAT.py

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
# =======================
# CONFIGURATION PARAMETERS
# =======================
doi = ""  # Full DOI, e.g., "doi:10.34810/data123456"
token = ""  # API token from https://dataverse.csuc.cat/dataverseuser.xhtml?selectTab=apiTokenTab
driver = None  # Use: webdriver.Chrome(), webdriver.Firefox(), or None
opcions = [
    "Universitat Rovira i Virgili",
    "Universitat Pompeu Fabra",
    "Universitat Oberta de Catalunya",
    "Vall d’Hebron Institut de Recerca",
    "Centre for Research on Ecology and Forestry Applications",
    "Universitat Ramon Llull",
    "Consorci Institut D'Investigacions Biomèdiques August Pi i Sunyer",
    "Centre de Recerca en Agrigenòmica",
    "Institut Català de Nanociència i Nanotecnologia",
    "Institut de Recerca Sant Joan de Déu",
    "Universitat Autònoma de Barcelona",
    "Universitat Politècnica de Catalunya",
    "Consorci de Serveis Universitaris de Catalunya",
    "Institut de Física d'Altes Energies",
    "Universitat Internacional de Catalunya",
    "Centre de Recerca Matemàtica",
    "Institut d'Investigació Biomèdica de Bellvitge",
    "Universitat de Lleida",
    "Universitat de Girona",
    "i2CAT",
    "Institut de Recerca i Tecnologia Agroalimentàries",
    "Fundación Josep Carreras Contra la Leucemia",
    "Centre for Demographic Studies",
    "Centre Tecnològic Forestal de Catalunya",
    "Universitat de Vic - Universitat Central de Catalunya",
    "IrsiCaixa",
    "Institute for Bioengineering of Catalonia",
    "Biomedical Research Institute of Lleida",
    "Institut Barcelona d'Estudis Internacionals",
    "Barcelona University",
    "Catalan Institute for Water Research",
    "Institute of Research and Innovation Parc Taulí",
    "Institut Català de Paleoecologia Humana i Evolució Social",
    "Universitat de les Illes Balears",
    "Institute of Photonic Sciences",
    "Institute for Research in Biomedicine",
    "Agrotecnio - Centre for Food and Agriculture Research",
    "Institut d'Investigació Biomèdica de Girona",
    "Institut Català d'Arqueologia Clàssica",
    "Barcelona Institute for Global Health"
]


# =======================
# IMPORTS & INSTALLATION
# =======================
import os
import sys
import subprocess
from datetime import date
from pyDataverse.api import NativeApi
from selenium import webdriver
from selenium.webdriver.common.by import By
from collections import Counter
from IPython.display import HTML, display

# Install necessary packages if running interactively (optional)
def install_packages():
    subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "pip"])
    subprocess.check_call([sys.executable, "-m", "pip", "install", "pyDataverse"])
    subprocess.check_call([sys.executable, "-m", "pip", "install", "selenium"])
    subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "tensorflow-probability"])

# =======================
# MAIN FUNCTION
# =======================
def Meta(doi, token, driver, opcions):
    today = date.today()
    print("Data:", today)

    base_url = 'https://dataverse.csuc.cat/'
    api = NativeApi(base_url, token)
    Metadata = api.get_dataset(doi)

    # Citation-block fields of the latest version and their type names
    fields_metadata = Metadata.json()["data"]["latestVersion"]["metadataBlocks"]["citation"]["fields"]
    metadata_repositori = [field["typeName"] for field in fields_metadata]

    # Required fields that are missing = required XOR (required AND present)
    Metadata_min_req = ['title', 'datasetContact', 'dsDescription', 'keyword', 'subject', 'kindOfData', 'author']
    intersect_metadata = list(set(metadata_repositori) & set(Metadata_min_req))
    same_metadata = len(list(set(Metadata_min_req) ^ set(intersect_metadata)))

    print("\nConté les metadades mínimes obligatòries?")
    if same_metadata != 0:
        print("NO", list(set(Metadata_min_req) ^ set(intersect_metadata)))
    else:
        print("SÍ")

    # Title
    index_title = metadata_repositori.index('title')
    titol = fields_metadata[index_title]["value"]
    print("\nTítol dataset:\n{}\n".format(titol))

    titol_1 = titol.split(":")

    # Related publication
    print("En el cas que el dataset tingui una publicació relacionada, inclou la citació?")
    if 'publication' in metadata_repositori:
        print("SÍ")
        index_publication = metadata_repositori.index('publication')
        Rel_pub = [pub["publicationCitation"]["value"] for pub in fields_metadata[index_publication]["value"]]
        for citation in Rel_pub:
            print(citation)

        if "Replication Data for" in titol_1[0] and len(titol_1) > 1:
            # Drop the 21-character "Replication Data for:" prefix; strip the
            # leading space so the comparison below can actually match
            only_title = titol[21:].strip()
            print("\nEl títol inclou: Replication data for")
            for i in Rel_pub[0].split("."):
                if only_title.casefold() == i.strip().casefold():
                    print("Els títols coincideixen")
        else:
            print("\nNo és rèplica de l'article")
    else:
        print("\nNo té publicacions relacionades")

    # Author info
    index_author = metadata_repositori.index('author')
    author_id = []
    afiliacion = []
    institucion = []

    for author in fields_metadata[index_author]["value"]:
        aff = author.get("authorAffiliation", {})
        aff_val = aff.get("expandedvalue", {}).get("termName") or aff.get("value")
        if aff_val:
            afiliacion.append(aff_val)

        if "authorIdentifier" in author:
            author_id.append("SÍ")

    # An affiliation matches when it contains any institution name from opcions
    for aff in afiliacion:
        matched = any(inst in aff for inst in opcions)
        institucion.append("SÍ" if matched else "NO")

    print("\nAlmenys un/a dels/les autors/es pertany a la institució on es diposita:", "SÍ" if "SÍ" in institucion else "NO")
    print("Almenys un/a dels/les autors/es informa del seu ORCID?")
    print("ORCID: ", "SÍ" if "SÍ" in author_id else "NO")

    # Description
    index_descripcion = metadata_repositori.index('dsDescription')
    descripcion = fields_metadata[index_descripcion]["value"][0]['dsDescriptionValue']["value"]
    print("\nDescripció:\n", descripcion)

    # File formats
    print("\nFormat de fitxers")
    total_files = len(Metadata.json()['data']['latestVersion']['files'])
    files = [file['dataFile']['filename'] for file in Metadata.json()['data']['latestVersion']['files']]
    extensions = [os.path.splitext(f)[1] for f in files]
    print(Counter(extensions))

    lowercase_files = [f.lower() for f in files]
    if "readme.txt" in lowercase_files:
        print("Sí que conté el fitxer readme.txt")

    # License
    print("\nLlicència:")
    license_info = Metadata.json()["data"]['latestVersion'].get("license", {}).get("name") \
        or Metadata.json()["data"]['latestVersion'].get('termsOfUse')
    print(license_info)

    # F-UJI
    if driver is None:
        print("\nAvalueu el dataset manualment a F-UJI: https://www.f-uji.net/")
    else:
        driver.get("https://www.f-uji.net/")
        driver.find_element(By.XPATH, '/html/body/div[1]/div[1]/div/p/a').click()
        driver.find_element(By.XPATH, '//*[@id="pid"]').send_keys(doi)
        driver.find_element(By.XPATH, '//*[@id="assessment_form"]/div/form/div[4]/button').click()


# =======================
# EXECUTE SCRIPT
# =======================
if __name__ == "__main__":
    Meta(doi, token, driver, opcions)
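
The minimum-metadata and file-format checks in `Meta` can be pulled out into small pure functions and exercised without a Dataverse connection. The following is a sketch under stated assumptions, not code from the commit: the helper names `missing_required`, `extension_counts` and `has_readme` are hypothetical. The script finds missing fields with a symmetric difference between the required list and its intersection with the present fields. Because that intersection is always a subset of the required list, the result equals a plain set difference, which is what the sketch uses.

```python
import os
from collections import Counter

# Same required field names as Metadata_min_req in the script
REQUIRED_FIELDS = ['title', 'datasetContact', 'dsDescription', 'keyword',
                   'subject', 'kindOfData', 'author']


def missing_required(present_fields, required=REQUIRED_FIELDS):
    """Return required field names absent from present_fields, in `required` order."""
    present = set(present_fields)
    return [name for name in required if name not in present]


def extension_counts(filenames):
    """Count files per extension, as the script does with os.path.splitext."""
    return Counter(os.path.splitext(name)[1] for name in filenames)


def has_readme(filenames):
    """Case-insensitive check for a readme.txt file, mirroring the script."""
    return any(name.lower() == "readme.txt" for name in filenames)


if __name__ == "__main__":
    print(missing_required(['title', 'author', 'subject']))
    print(extension_counts(['a.csv', 'b.csv', 'README.txt']))
    print(has_readme(['a.csv', 'README.txt']))
```

Returning a list in the order of the required fields, rather than an unordered set, keeps the printed report stable between runs.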

REVISAT/REVISAT_script.ipynb

Lines changed: 52 additions & 39 deletions
@@ -2,50 +2,26 @@
22
"cells": [
33
{
44
"cell_type": "markdown",
5-
"id": "7852b47e-53f8-48ca-ae07-79f98e37dba2",
65
"metadata": {
7-
"id": "7852b47e-53f8-48ca-ae07-79f98e37dba2"
6+
"id": "view-in-github",
7+
"colab_type": "text"
88
},
99
"source": [
10-
"## REVISAT\n",
11-
"REVISAT is a script that allows reviewing a dataset, in draft, before being published, to ensure compliance with good open access practices. It is a first version, and as the repository software is updated and/or metadata is updated, the script will be changed accordingly.\n",
12-
"If you as a user have any doubts about the operation, proposal, or suggestion for improvement and want to incorporate it into the script, please write to us at: rdr-contacte@csuc.cat\n",
13-
"\n",
14-
"Last updated: 2023-11-14"
10+
"<a href=\"https://colab.research.google.com/github/CSUC/RDR-scripts/blob/main/REVISAT/REVISAT_script.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
1511
]
1612
},
1713
{
18-
"cell_type": "code",
19-
"execution_count": null,
20-
"id": "3fff23c3-7143-41d6-9123-a55fcfb4b596",
14+
"cell_type": "markdown",
15+
"id": "7852b47e-53f8-48ca-ae07-79f98e37dba2",
2116
"metadata": {
22-
"cellView": "form",
23-
"id": "3fff23c3-7143-41d6-9123-a55fcfb4b596"
17+
"id": "7852b47e-53f8-48ca-ae07-79f98e37dba2"
2418
},
25-
"outputs": [],
2619
"source": [
27-
"# @title Install or update libraries (Click execution button &#x25B6; )\n",
28-
"import ipywidgets as widgets\n",
29-
"from IPython.display import display, HTML, clear_output\n",
20+
"## REVISAT\n",
21+
"REVISAT is a script that allows reviewing a dataset, in draft, before being published, to ensure compliance with good open access practices. It is a first version, and as the repository software is updated and/or metadata is updated, the script will be changed accordingly.\n",
22+
"If you as a user have any doubts about the operation, proposal, or suggestion for improvement and want to incorporate it into the script, please write to us at: rdr-contacte@csuc.cat\n",
3023
"\n",
31-
"# Function to install required packages\n",
32-
"def install_packages(b):\n",
33-
" clear_output(wait=True)\n",
34-
" !pip install --upgrade pip -q\n",
35-
" !pip --upgrade tensorflow-probability -q\n",
36-
" !pip install pyDataverse -q\n",
37-
" !pip install selenium -q\n",
38-
" print(\"S'han descarregat o actualitzat les llibreries.\")\n",
39-
"\n",
40-
"# Displaying installation message\n",
41-
"display(HTML(\"<p style='font-size:14px;'><b>Feu clic al botó següent per instal·lar les llibreries.</b></p>\"))\n",
42-
"\n",
43-
"# Creating installation button\n",
44-
"install_button = widgets.Button(description='Instal·lar llibreries')\n",
45-
"install_button.on_click(install_packages)\n",
46-
"\n",
47-
"# Displaying the installation button\n",
48-
"display(install_button)"
24+
"Last updated: 2025-03-25"
4925
]
5026
},
5127
{
@@ -58,15 +34,51 @@
5834
},
5935
"outputs": [],
6036
"source": [
61-
"# @title Introduir DOI (doi:10.34810/dataXXX), el token i el nom complet de la institució. Clicar botó d'executar cel·la &#x25B6;\n",
37+
"# @title First enter the token (If you don't have your API token, you can get it from the following link <a href='https://dataverse.csuc.cat/dataverseuser.xhtml?selectTab=apiTokenTab' target='_blank'>Get API Token</a>).</p> After that, enter the LAST DIGITS of the DOI (for example, if the DOI ends in <strong>dataXYZ</strong>, only write the number <strong>XYZ</strong> ).</p> Finally, click the &#x25B6; button to execute the script.\n",
38+
"import os\n",
39+
"import subprocess\n",
40+
"import sys\n",
41+
"\n",
42+
"# Function to install required packages\n",
43+
"def install_packages():\n",
44+
" \"\"\"\n",
45+
" Function to install or update necessary Python packages.\n",
46+
" \"\"\"\n",
47+
" # Upgrade pip first\n",
48+
" subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"--upgrade\", \"pip\", \"-q\"])\n",
49+
"\n",
50+
" # Install the required libraries\n",
51+
" subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"pyDataverse\", \"-q\"])\n",
52+
" subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"selenium\", \"-q\"])\n",
53+
" subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"pyDataverse\", \"-q\"])\n",
54+
" subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"--upgrade\", \"tensorflow-probability\", \"-q\"])\n",
55+
"\n",
56+
"\n",
57+
" print(\"Libraries have been downloaded or updated.\")\n",
58+
"\n",
59+
"# Install libraries if they are not installed already\n",
60+
"try:\n",
61+
" import pyDataverse\n",
62+
"except ImportError:\n",
63+
" print(\"Installing libraries...\")\n",
64+
" install_packages()\n",
65+
"\n",
66+
"try:\n",
67+
" import google.colab\n",
68+
" IN_COLAB = True\n",
69+
"except ImportError:\n",
70+
" IN_COLAB = False\n",
71+
"\n",
72+
"if IN_COLAB: from google.colab import output\n",
73+
"import ipywidgets as widgets\n",
74+
"from IPython.display import display, HTML, clear_output\n",
75+
"\n",
6276
"from datetime import date\n",
6377
"from pyDataverse.api import NativeApi, DataAccessApi\n",
6478
"from selenium import webdriver\n",
6579
"from selenium.webdriver.common.keys import Keys\n",
6680
"from selenium.webdriver.common.by import By\n",
67-
"import sys\n",
6881
"import numpy as np\n",
69-
"import os\n",
7082
"from collections import Counter\n",
7183
"import textwrap\n",
7284
"import pprint\n",
@@ -75,7 +87,7 @@
7587
"identifier = \"\" # @param {type:\"string\"}\n",
7688
"token = \"\" # @param {type:\"string\"}\n",
7789
"driver = None ## choose webdriver.Chrome(), webdriver.Firefox() or None to evaluate the dataset in F-UJI. Choose None if you run the script in Colab.\n",
78-
"doi = identifier\n",
90+
"doi = 'doi:10.34810/data'+identifier\n",
7991
"\n",
8092
"#Choose an institution\n",
8193
"institucions = [\n",
@@ -352,7 +364,8 @@
352364
],
353365
"metadata": {
354366
"colab": {
355-
"provenance": []
367+
"provenance": [],
368+
"include_colab_link": true
356369
},
357370
"kernelspec": {
358371
"display_name": "Python 3 (ipykernel)",

change_CSV_delimiter/README.md

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
11
[![ca](https://img.shields.io/badge/lang-ca-blue.svg)](https://github.com/CSUC/RDR-scripts/blob/main/change_CSV_delimiter/README.md)
22
[![en](https://img.shields.io/badge/lang-en-green.svg)](https://github.com/CSUC/RDR-scripts/blob/main/change_CSV_delimiter/README_ENG.md)
3+
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CSUC/RDR-scripts/blob/main/change_CSV_delimiter/csv_delimiter_converter.ipynb)
34

45
# Convertidor de Delimitador CSV
56
