Merged
1 change: 0 additions & 1 deletion README.md
@@ -152,7 +152,6 @@ Same as above but use a JVM option in domain.xml such as the example below.
### Differences from Kaggle

- I see an `encodingFormat` of `text/comma-separated-values`. Kind of curious about that since I think `text/csv` is more the MIME type that's on https://www.iana.org/assignments/media-types/media-types.xhtml and https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types . See https://github.com/IQSS/dataverse/issues/4943#issuecomment-2145333830
- One big difference I see is that you have many `recordSets` (and each one containing a single `field`) despite there being only 1 CSV. My understanding was that a `recordSet` maps roughly to a table and a `field` maps roughly to a column. So you'll see that our implementation has only 1 `recordSet` with many `field`s. This might be a good thing to get clarification on.
- Another thing that sticks out is that I see all of the `field`s have a `dataType` of `sc:Integer`. But nearly all of the columns (excluding `quality` and `Id`) are `sc:Float`. On the Kaggle side, we have a column type of "Id" and so if that's set on a column, we set the `dataType` to `sc:Text` since Ids can often be non-numerical. Just a minor difference there, though, so nothing alarming to me personally.
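
As a rough illustration of the mapping described above (one `recordSet` per table, one `field` per column), here is a minimal Java sketch using plain collections. The class name `RecordSetSketch`, the method `buildRecordSet`, the flat `source` value, and the blanket `sc:Float` type are all hypothetical simplifications; the actual exporter builds its JSON with `JsonObjectBuilder` and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordSetSketch {

    // Hypothetical helper: builds ONE record set for a tabular file,
    // holding one cr:Field entry per column name.
    public static Map<String, Object> buildRecordSet(String fileId, List<String> columns) {
        List<Map<String, Object>> fields = new ArrayList<>();
        for (String column : columns) {
            Map<String, Object> field = new LinkedHashMap<>();
            field.put("@type", "cr:Field");
            field.put("name", column);
            // Assumption: every column is treated as a float for brevity.
            field.put("dataType", "sc:Float");
            // Simplification: the file reference is stored as a plain string.
            field.put("source", fileId);
            fields.add(field);
        }
        // The record set is created once, outside the per-column loop.
        Map<String, Object> recordSet = new LinkedHashMap<>();
        recordSet.put("@type", "cr:RecordSet");
        recordSet.put("field", fields);
        return recordSet;
    }

    public static void main(String[] args) {
        Map<String, Object> rs =
                buildRecordSet("data/stata13-auto.dta", List.of("price", "mpg", "rep78"));
        List<?> fields = (List<?>) rs.get("field");
        System.out.println(rs.get("@type") + " with " + fields.size() + " fields");
    }
}
```

The point of the sketch is only the shape: the list of fields is accumulated across all columns and attached to a single record set, rather than wrapping each field in its own record set.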

### Differences from pyDataverse
@@ -193,6 +193,8 @@ public void exportDataset(ExportDataProvider dataProvider, OutputStream outputSt
int fileCounter = 0;
for (JsonValue jsonValue : datasetFileDetails) {

+ JsonObjectBuilder recordSetContent = Json.createObjectBuilder();
+ recordSetContent.add("@type", "cr:RecordSet");
JsonObject fileDetails = jsonValue.asJsonObject();
/**
* When there is an originalFileName, it means that the file has gone through ingest
@@ -306,9 +308,9 @@ public void exportDataset(ExportDataProvider dataProvider, OutputStream outputSt
"fileObject",
Json.createObjectBuilder()
.add("@id", fileId))));
- fieldSetObject.add("field", fieldSetArray);
- recordSet.add(fieldSetObject);
}
+ recordSetContent.add("field", fieldSetArray);
+ recordSet.add(recordSetContent);
fileIndex++;
}
fileCounter++;
77 changes: 11 additions & 66 deletions src/test/resources/cars/expected/cars-croissant.json
@@ -126,12 +126,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "price",
@@ -143,12 +138,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "mpg",
@@ -160,12 +150,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "rep78",
@@ -177,12 +162,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "headroom",
@@ -194,12 +174,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "trunk",
@@ -211,12 +186,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "weight",
@@ -228,12 +198,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "length",
@@ -245,12 +210,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "turn",
@@ -262,12 +222,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "displacement",
@@ -279,12 +234,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "gear_ratio",
@@ -296,12 +246,7 @@
"@id": "data/stata13-auto.dta"
}
}
- }
- ]
- },
- {
- "@type": "cr:RecordSet",
- "field": [
+ },
{
"@type": "cr:Field",
"name": "foreign",