Error loading data with julia 1.7 #932
Closed
Description
There appears to be a bug when loading data with Julia 1.7.0-rc1 (2021-09-12). The following example reproduces it.
import FileIO
import Downloads
url = "https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/HG7NV7/KM2QOA"
fpath = Downloads.download(url, "2003.csv.bz2")
using CSV, DataFrames, Mmap, CodecBzip2
df = DataFrame(CSV.File(transcode(Bzip2Decompressor, Mmap.mmap(fpath))))
gives the following error:
ERROR: TaskFailedException
nested task error: TypeError: in typeassert, expected Dict{Union{Missing, String3}, UInt32}, got a value of type Dict{Union{Missing, String7}, UInt32}
Stacktrace:
[1] syncrefs!(#unused#::Type{String3}, col::CSV.Column, task_col::CSV.Column, task_rows::Int64)
@ CSV ~/.julia/packages/CSV/owrEo/src/file.jl:477
[2] multithreadpostparse(ctx::CSV.Context, ntasks::Int64, pertaskcolumns::Vector{Vector{CSV.Column}}, rows::Vector{Int64}, finalrows::Int64, j::Int64, col::CSV.Column)
@ CSV ~/.julia/packages/CSV/owrEo/src/file.jl:425
[3] (::CSV.var"#28#33"{CSV.Context, Vector{Int64}, Vector{Vector{CSV.Column}}, Int64, CSV.Column, Int64, Int64})()
@ CSV ./threadingconstructs.jl:178
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:369
[2] macro expansion
@ ./task.jl:388 [inlined]
[3] CSV.File(ctx::CSV.Context, chunking::Bool)
@ CSV ~/.julia/packages/CSV/owrEo/src/file.jl:259
[4] File
@ ~/.julia/packages/CSV/owrEo/src/file.jl:225 [inlined]
[5] #File#25
@ ~/.julia/packages/CSV/owrEo/src/file.jl:221 [inlined]
[6] CSV.File(source::Vector{UInt8})
@ CSV ~/.julia/packages/CSV/owrEo/src/file.jl:220
[7] top-level scope
@ REPL[8]:1
I tried loading the uncompressed .csv file directly. The following code
using CSV, DataFrames
df = CSV.read("2003.csv", DataFrame)
gives the following error:
ERROR: TaskFailedException
nested task error: TypeError: in typeassert, expected Dict{Union{Missing, String3}, UInt32}, got a value of type Dict{Union{Missing, String7}, UInt32}
Stacktrace:
[1] syncrefs!(#unused#::Type{String3}, col::CSV.Column, task_col::CSV.Column, task_rows::Int64)
@ CSV ~/.julia/packages/CSV/owrEo/src/file.jl:477
[2] multithreadpostparse(ctx::CSV.Context, ntasks::Int64, pertaskcolumns::Vector{Vector{CSV.Column}}, rows::Vector{Int64}, finalrows::Int64, j::Int64, col::CSV.Column)
@ CSV ~/.julia/packages/CSV/owrEo/src/file.jl:425
[3] (::CSV.var"#28#33"{CSV.Context, Vector{Int64}, Vector{Vector{CSV.Column}}, Int64, CSV.Column, Int64, Int64})()
@ CSV ./threadingconstructs.jl:178
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:369
[2] macro expansion
@ ./task.jl:388 [inlined]
[3] CSV.File(ctx::CSV.Context, chunking::Bool)
@ CSV ~/.julia/packages/CSV/owrEo/src/file.jl:259
[4] File
@ ~/.julia/packages/CSV/owrEo/src/file.jl:225 [inlined]
[5] #File#25
@ ~/.julia/packages/CSV/owrEo/src/file.jl:221 [inlined]
[6] File
@ ~/.julia/packages/CSV/owrEo/src/file.jl:220 [inlined]
[7] read(source::String, sink::Type; copycols::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CSV ~/.julia/packages/CSV/owrEo/src/CSV.jl:68
[8] read(source::String, sink::Type)
@ CSV ~/.julia/packages/CSV/owrEo/src/CSV.jl:65
[9] top-level scope
@ REPL[10]:1
I also tried other data sets, and loading them into a DataFrame works most of the time. I also tested on Julia 1.6.2 (2021-07-14), where everything works fine. Here are my installed packages.
(@v1.7) pkg> status
Status `~/.julia/environments/v1.7/Project.toml`
[69666777] Arrow v2.1.0
[6e4b80f9] BenchmarkTools v1.2.0
[336ed68f] CSV v0.9.6
[523fee87] CodecBzip2 v0.7.2
[a93c6f00] DataFrames v1.2.2
[31c24e10] Distributions v0.25.19
[5789e2e9] FileIO v1.11.1
[09f84164] HypothesisTests v0.10.4
[babc3d20] JDF v0.4.4
[429524aa] Optim v1.4.1
[b98c9c47] Pipe v1.3.0
[91a5bcdd] Plots v1.22.4
[c3e4b0f8] Pluto v0.16.1
[2913bbd2] StatsBase v0.33.11
[f3b207a7] StatsPlots v0.14.28
[f269a46b] TimeZones v1.5.7
[44d3d7a6] Weave v0.10.10