replace ASCIIString & UTF8String with String#16058
Conversation
74bb1af to
d80ed00
Compare
|
145 files changed == smaller change, eh? |
| convert(::Type{ASCIIString}, a::Array{UInt8,1}, invalids_as::AbstractString) = | ||
| convert(ASCIIString, a, ascii(invalids_as)) | ||
| convert(::Type{ASCIIString}, s::AbstractString) = ascii(bytestring(s)) | ||
| ascii(x) = ascii(convert(String, x)) |
There was a problem hiding this comment.
So are we going to keep the ascii methods here around instead of deprecating? I guess there's still a use-case for when you want to ensure your string is only ASCII characters?
There was a problem hiding this comment.
For the first stage, I figured it would be less disruptive to keep it. We should probably get rid of ascii and utf8 later.
There was a problem hiding this comment.
Why not deprecate them at the same time as the type renaming? It's easier to fix everything at the same time for users.
Also, EDIT: I misread the code.ascii should definitely continue checking that the input is ASCII, for both safety and reliability.
There was a problem hiding this comment.
When I say "at the same time", I mean that it should be done soon, but not necessarily in this PR, which is already big enough.
There was a problem hiding this comment.
shouldn't this nevermind it does so, got confusedascii(x) method verify its input is ascii?
|
The downside of losing an |
|
Yeah, this change is a beast but there's even more to come... |
| gc_enable(on::Bool) = ccall(:jl_gc_enable, Cint, (Cint,), on)!=0 | ||
|
|
||
| bytestring(str::ByteString) = str | ||
| bytestring(str::String) = str |
There was a problem hiding this comment.
Similar to my question above about ascii; is there a reason to keep bytestring around as opposed to deprecating? I also understand that we want to keep things minimal, but it might be nice to just mark the places we want to follow up on later.
There was a problem hiding this comment.
Thanks for taking the time to review, @quinnj. Yes, we should deprecate all of ascii, utf8, bytestring, maybe string and replace them all with String, which will take any number of values, convert each to a string and produce a single String object for the concatenation of all of those.
There was a problem hiding this comment.
I started down that road, but that's a very big sweater once you start pulling the strings, and this change plus that one were just too much when combined.
There was a problem hiding this comment.
Yep, understood. Definitely excited for this and I'll try to take some more time to review/test things out to help out.
|
@iamed2: that can be provided by an ASCII string type in an external package. |
| catch ex | ||
| if isa(ex, TypeError) && (ex.func == :store_cell) | ||
| T = ex.expected | ||
| elseif get(optsd, :ignore_invalid_chars, false) |
There was a problem hiding this comment.
does this option need to be removed from docs and elsewhere in the code?
There was a problem hiding this comment.
I think I'll just reinstate it instead.
There was a problem hiding this comment.
Actually, this whole option doesn't really make sense, which is why I deleted it. The transformation is now a no-op.
There was a problem hiding this comment.
Also, I hate this file. It should not be in Base, imo.
There was a problem hiding this comment.
OT: Judging by the frequent complaints about performance we get, I also think these functions should be deprecated in favor of CSV.jl as soon as it's ready. CSV.jl+DataStreams.jl will also unify reading data into different structures like Array, DataFrame, TimeSeries, data bases, etc.
base/boot.jl
Outdated
| immutable String <: AbstractString | ||
| data::Array{UInt8,1} | ||
| ASCIIString(d::Array{UInt8,1}) = new(d) | ||
| String(d::Array{UInt8,1}) = new(d) |
There was a problem hiding this comment.
Isn't this inner constructor useless?
There was a problem hiding this comment.
Yes, I think it is. Of course, the old code had it too.
There was a problem hiding this comment.
Why not remove it then, we're likely to forget about it after merging.
There was a problem hiding this comment.
Because I'm trying to get this PR merged before someone introduces more conflicts and/or breaks the tests. You have no idea how many times I've rebased this and how often it breaks when someone changes something seemingly unrelated.
There was a problem hiding this comment.
That was indeed my concern too (hence the quick review). Go ahead then.
There was a problem hiding this comment.
Since this is no longer fresh and I'm going to have to rebase and rerun CI anyway, I'll include this change (and a few others) when I do.
dac2e0c to
d74cbe7
Compare
|
I'm inclined to merge this but it's going to wreak havoc on packages. Thoughts? |
|
I guess I should add |
|
|
||
| """ | ||
| readdlm(source, delim::Char, T::Type, eol::Char; header=false, skipstart=0, skipblanks=true, use_mmap, ignore_invalid_chars=false, quotes=true, dims, comments=true, comment_char='#') | ||
| readdlm(source, delim::Char, T::Type, eol::Char; header=false, skipstart=0, skipblanks=true, use_mmap, quotes=true, dims, comments=true, comment_char='#') |
There was a problem hiding this comment.
there's another mention of ignore_invalid_chars below
d74cbe7 to
8509862
Compare
|
Nice! |
|
Lineedit change is due to Before: After: I believe the following is the correct patch: |
|
Ah, I see. That's pretty subtle. Thanks for figuring it out. |
|
Since |
|
This PR seems to have dropped code coverage by nearly 10% (https://codecov.io/gh/JuliaLang/julia/commit/5de52cf9c9343cfcf50be4c7c736290d3f985961/changes). |
|
Is this expected? julia> String("1")
ERROR: MethodError: no method matching convert(::Type{Array{UInt8,1}}, ::String)
you may have intended to import Base.convert
Closest candidates are:
convert(::Type{Any}, ::ANY)
convert{T}(::Type{T}, ::T)
in String(::String) at ./boot.jl:222
in eval(::Module, ::Any) at ./boot.jl:228 |
|
Yeah, turns out there was a reason to leave this "useless" inner constructor in and it's the problem @yuyichao just found. I think I discovered that months ago and had left it in for a reason. That's the problem with really long-running PRs. Fixed here: #16212. |
|
Hard to believe that wasn't tested. |
|
@tkelman: Yeah, I made a PR but I'm also adding tests to it for that. @vtjnash: I'm not sure what's up with that coverage drop. Some of it is legit in that we relied on ACSIIString and UTF8String to exercise different code paths. But most of it seems totally unrelated – if you look at the actual changes, they look like code paths that don't depend on string types at all. I wonder if the code coverage machinery broke somehow? |
|
The coverage tests run with inlining off. There are occasionally tests that fail in that situation but pass when inlining is on. https://build.julialang.org/builders/coverage_ubuntu14.04-x64/builds/216/steps/Run%20non-inlined%20tests/logs/stdio |
|
Ah, I guess I should make sure that tests pass with inlining off. |
It's definitely well past time to get moving on this change, and I've tried bigger PRs but I think I need to start smaller. Here I'm separating out just the part that merges ASCIIString and UTF8String and replaces them with a single String type, which is basically just a rename of UTF8String. The deprecation is pretty minimal, we'll have to run PkgEval and figure out what breaks out there and try to mitigate.
This is kind of an awkward middle state since UTF8String no longer exists but UTF16String and UTF32String still do. Eventually, these should all move out into a package: if you want fully validated UTF8-encoded strings, then you should use a UTF8String type, etc. The Base String type will then be the somewhat permissive default string type that can handle any UTF8-like data, valid or not.