- Make DataFrames.jl hashing consistent with Julia 1.13 and take into account column names when hashing `AbstractDataFrame` (#3507)
- Require Julia 1.10 and add PrettyTables.jl v3 support (#3510)
- Make DataFrames.jl support DataStructures.jl version 0.19 (#3503)
- Allow passing multiple values to add in `push!`, `pushfirst!`, `append!`, and `prepend!` (#3372)
- `rename` and `rename!` now allow applying a function transforming column names only to a subset of the columns specified by the `cols` keyword argument (#3380)
- `mapcols` and `mapcols!` now allow applying a function transforming columns only to a subset of the columns specified by the `cols` keyword argument (#3386)
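
As a rough illustration of the `cols` keyword argument described above, the following sketch assumes a DataFrames.jl version that includes these changes; the data frame and its column names are made up for the example.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a=1:2, b=3:4, c=5:6)

# Uppercase only the names of columns :a and :b; :c keeps its name.
renamed = rename(uppercase, df; cols=[:a, :b])

# Double only the values stored in column :c; other columns are left as they are.
doubled = mapcols(x -> 2 .* x, df; cols=[:c])
```

Both calls return new data frames, so `df` itself is not modified.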
- Correctly throw an error if negative number of rows is passed to `first` or `last` (#3402)
- Always use the default thread pool for multithreaded operations, instead of using the interactive thread pool when Julia was started with `-tM,N` with N > 0 (#3385)
- Correctly return `Bool[]` in the `nonunique` function applied to a data frame with a pooled column that has zero levels in the pool (#3393)
- Correctly index `eachrow` and `eachcol` with `CartesianIndex` (#3413)
- Correctly handle non-standard integers when converting them to `BigInt` (#3419)
- The `by` and `aggregate` functions that were deprecated before 1.0 release are now removed. (#3422)
- Ensure that `allunique(::AbstractDataFrame, ::Any)` always gets interpreted as test for uniqueness of rows in the first positional argument (#3434)
- Make sure that an empty vector of `Any` or of `AbstractVector` is treated as having no columns when a data frame is being processed with `combine`/`select`/`transform`. (#3435)
- Fix error in specification of dependency on DataStructures.jl (#3359)
- Objects inheriting from `Tables.AbstractRow` are now treated in the same way as `DataFrameRow` by `select`/`transform`/`combine` functions. In previous versions they were treated as a scalar, but this was inconsistent with the intention of `Tables.AbstractRow` definition (#3348)
- Add `Iterators.partition` support for `DataFrameRows` (#3299)
- Add support for `renamecols` keyword argument in `crossjoin` (#3314)
- `DataFrameRows` and `DataFrameColumns` now support `nrow`, `ncol`, and `Tables.subset` (#3311)
- `Not` allows passing multiple positional arguments that are treated as if they were wrapped in `Cols` and does not throw an error when a vector of duplicate indices is passed when doing column selection (#3302)
- Added the kwarg `checkunique` to sorting related functions (`issorted`, `sort`, `sort!` and `sortperm`) that throws an error when duplicate elements make multiple sort orders valid (#3312)
- `reduce` performing `vcat` on a collection of data frames now accepts `init` keyword argument (#3310)
- Allow to pass column names in `DataFrame` constructor that replace the names generated by default (#3320)
- `describe` now has `:sum` available as a descriptive statistic. (#3303)
- `deleteat!` correctly handles the situation when vector of rows to be dropped from a data frame is its column or might alias with some of its columns (#3304)
- Add `Iterators.partition` support (#3212)
- Add `allunique` and allow transformations in `cols` argument of `describe` and `nonunique` when working with `SubDataFrame` (#3232)
- Add support for `Tables.AbstractRow` for `push!`, `pushfirst!`, and `insert!` (#3245)
- Add support for `operator` keyword argument in `Cols` to take a set operation to apply to passed selectors (`union` by default) (#3224)
- Allow to pass multiple predicates in `Cols` and mix them with other selectors (#3279)
- Improve support for setting group order in `groupby` (#3253)
- Joining functions now support `order` keyword argument allowing the user to specify the order of the rows in the produced table (#3233)
- Add `keep` keyword argument to `nonunique`, `unique`, and `unique!` allowing to specify which duplicate rows should be kept (#3260)
- Add `haskey` and `get` methods to `DataFrameColumns` to make it support dictionary interface more completely (#3282)
- Allow passing `scalar` keyword argument in `flatten` (#3283)
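
A minimal sketch of the `keep` and `order` keyword arguments listed above, assuming a DataFrames.jl version that includes them. The tables and column names are invented for illustration.

```julia
using DataFrames

# Hypothetical data with duplicated ids.
df = DataFrame(id=[1, 1, 2, 3, 3], val=["a", "b", "c", "d", "e"])

# keep=:first (the default) retains the first row of each duplicated id,
# keep=:last retains the last one, and keep=:noduplicates drops every
# id that occurs more than once.
first_kept = unique(df, :id; keep=:first)
last_kept = unique(df, :id; keep=:last)
no_dups = unique(df, :id; keep=:noduplicates)

# order=:left makes the join result follow the row order of the left table.
left = DataFrame(id=[3, 1, 2], x=[30, 10, 20])
right = DataFrame(id=[1, 2, 3], y=["one", "two", "three"])
joined = innerjoin(left, right; on=:id, order=:left)
```

Without `order`, the row order of a join result is not guaranteed, so passing it explicitly is useful when later code relies on positions.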
- passing very many data frames to `innerjoin` and `outerjoin` does not lead to stack overflow (#3233)
- fixed incorrect handling of passing no conditions in `subset` and `subset!` (#3264)
- fixed error in fast aggregation in `sum` and `mean` of columns only having `missing` values (#3268)
- fixed error in indexing of `SubDataFrame` that has no columns selected from its parent (#3273)
- `dropmissing` creates new columns in a single pass if `disallowmissing=true` (#3256)
- Fix bug in `select` and `transform` with `copycols=false` on `SubDataFrame` that incorrectly allowed passing transformations (#3231)
- Fix incorrect handling of column metadata in `insertcols!` and `insertcols` (#3220)
- Correctly handle `GroupedDataFrame` with no groups in multi-column operation specification syntax (#3122)
- Improve printing of grouping keys when displaying `GroupedDataFrame` (#3213)
- Support updates of metadata API introduced in DataAPI.jl 1.13.0 (#3216)
- Make sure `flatten` works correctly on a data frame with zero rows (#3198)
- Make sure we always copy the indexing value when calling `getindex` on `DataFrameRows` object (#3192)
- DataFrames.jl 1.4 requires Julia 1.6 (#3145)
- `subset` and `subset!` now allow passing zero column selectors (#3025)
- `subset` and `subset!` processing `GroupedDataFrame` allow using a scalar as a subsetting condition (this will result in including/excluding a whole group); for `AbstractDataFrame` processing only `AbstractVector` subsetting condition is allowed as accepting scalars can lead to hard to catch bugs in users' code (#3032)
- `permutedims` now supports a `strict` keyword argument that allows for a more flexible handling of values stored in a column that will become a new header (#3004)
- `unstack` now allows passing a function in `combine` keyword argument; this allows for a convenient creation of two dimensional pivot tables (#2998, #3185)
- `filter` for `GroupedDataFrame` now accepts `ungroup` keyword argument (#3021)
- Add special syntax for `eachindex`, `groupindices`, and `proprow` to transformation mini-language (#3001).
- Add support for `reverse!`, `permute!`, `invpermute!`, `shuffle`, and `shuffle!` functions. Improve functionality of `reverse`. (#3010).
- `first` and `last` for `GroupedDataFrame` now support passing number of elements to get (#3006)
- Add `insertcols`, which is a version of `insertcols!` that creates a new data frame (#3020)
- Add `fillcombinations` function that generates all combinations of levels of selected columns of a data frame (#3012)
- Guarantee that `permute!` and `invpermute!` throw on invalid input (#3035)
- Add `allcombinations` function that returns a data frame created from all combinations of the passed vectors (#3031)
- Add `resize!`, `keepat!`, `pop!`, `popfirst!`, and `popat!`, make `deleteat!` signature more precise (#3047)
- Add `pushfirst!` and `insert!` (#3072)
- New `threads` argument allows disabling multithreading in `combine`, `select`, `select!`, `transform`, `transform!`, `subset` and `subset!` (#3030)
- Add support for table-level and column-level metadata using DataAPI.jl interface (#3055)
- `completecases` and `nonunique` no longer throw an error when data frame with no columns is passed (#3055)
- `describe` now accepts two predefined arguments: `:nnonmissing` and `:nuniqueall` (#3146)
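
The sketch below illustrates a few of the additions listed above (`allcombinations`, `insertcols`, `unstack` with `combine`, and the metadata functions), assuming DataFrames.jl 1.4 or newer. All table contents and the metadata keys are invented for the example.

```julia
using DataFrames

# allcombinations builds a data frame from every combination of the inputs.
grid = allcombinations(DataFrame, a=1:2, b=["x", "y", "z"])

# insertcols is the non-mutating counterpart of insertcols!.
df = DataFrame(a=1:3)
df2 = insertcols(df, :b => ["p", "q", "r"])

# unstack with combine=sum aggregates duplicated row/column pairs,
# which gives a simple two dimensional pivot table.
long = DataFrame(row=["r1", "r1", "r1", "r2"],
                 col=["c1", "c1", "c2", "c1"],
                 val=[1, 2, 3, 4])
wide = unstack(long, :row, :col, :val; combine=sum)

# Table-level and column-level metadata through the DataAPI.jl interface.
metadata!(df2, "source", "example"; style=:note)
colmetadata!(df2, :a, "label", "identifier"; style=:note)
```

The `:note` style marks metadata that should be propagated by operations on the data frame; the keys `"source"` and `"label"` are arbitrary.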
- On Julia 1.7 or newer broadcasting assignment into an existing column of a data frame replaces it. Under Julia 1.6 or older it is an in place operation. (#3022)
- `allowduplicates` keyword argument in `unstack` is deprecated, `combine` keyword argument should be used instead (#3185)
- `DataFrame` is now a `mutable struct` and has three new fields `metadata`, `colmetadata`, and `allnotemetadata`; this change makes `DataFrame` objects serialized under earlier versions of DataFrames.jl incompatible with version 1.4 (#3055)
- fix dispatch ambiguity in `rename` and `rename!` when only source data frame is passed (#3055)
- Make sure that `AsTable` accepts only valid argument (#3064)
- Make sure we avoid aliasing when repeating the same column in `select[!]` and `transform[!]` on `GroupedDataFrame` (#3070)
- Make `vcat` correctly handle `cols` keyword argument if only data frames having no columns are passed (#3081)
- Make `subset` preserve group ordering when `ungroup=false` like `subset!` already does (#3094)
- Fix incorrect behavior of `GroupedDataFrame` indexing in corner cases (#3179)
- Fix errors in `insertcols!` when no columns to add are passed (#3179)
- Fix errors in `minimum` and `maximum` aggregates when processing `GroupedDataFrame` with `combine` in corner cases (#3179)
- Speed up `permute!` and `invpermute!` (and therefore sorting) 2x-8x for large tables by using cycle notation (#3035)
- Make one-dimensional multi-element indexing of `DataFrameRows` return `DataFrameRows` (#3037)
- Make `transform!` on `SubDataFrame` faster (#3070)
- Support `Tables.subset` and move `ByRow` definition to Tables.jl (#3158)
- Fix overly restrictive type assertion in `filter` and `filter!` (#3155)
- Allow version 4 of Compat.jl
- Fix handling of `variable_eltype` in `stack` (#3043)
- Fix handling of `matchmissing` keyword argument in joins (#3040)
- Make sure that `select!`/`transform!` and `select`/`transform` (with `copycols=false`) do not produce aliases of the same source column consistently (currently only `transform[!]` ensured it for an unwrapped column renaming operation) (#2983)
- Fix aliasing detection in `sort!` (now only identical columns passing `===` test are considered aliases) (#2981)
- Make sure `ByRow` calls wrapped function exactly once for each element in all cases (#2982)
- Fix `getindex` that incorrectly allowed vectors of `Pair`s (#2970)
- Improve `sort` keyword argument in `groupby` (#2812).

  In the `groupby` function the `sort` keyword argument now allows three values:
  - `nothing` (the default) leaves the order of groups undefined and allows `groupby` to pick the fastest available grouping algorithm;
  - `true` sorts groups by key columns;
  - `false` creates groups in the order of their appearance in the parent data frame;

  In previous versions, the `sort` keyword argument allowed only `Bool` values and `false` (which was the default) corresponded to the new behavior when `nothing` is passed. Therefore the only user-visible change affecting existing code is when `sort=false` is passed explicitly. The order of groups was undefined in that case, but in practice groups were already created in their order of appearance, except when grouping columns implemented the `DataAPI.refpool` API (notably `PooledArray` and `CategoricalArray`) or when they contained only integers in a small range. (#2812)

- the `unstack` function receives new keyword argument `fill` (with `missing` default) that is used to fill combinations of not encountered rows and columns. This feature allows to distinguish between missings in value column and just missing row/column combinations and to easily fill with zeros non existing combinations in case of counting. (#2828)

- Allow adding new columns to a `SubDataFrame` created with `:` as column selector (#2794).

  If `sdf` is a `SubDataFrame` created with `:` as a column selector then `insertcols!`, `setindex!`, and broadcasted assignment allow for creation of new columns, automatically filling filtered-out rows with `missing` values;

- Allow replacing existing columns in a `SubDataFrame` with `!` as row selector in assignment and broadcasted assignment (#2794).

  Assignment to existing columns allocates a new column. Values already stored in filtered-out rows are copied.

- Allow `SubDataFrame` to be passed as an argument to `select!` and `transform!` (also on `GroupedDataFrame` created from a `SubDataFrame`) (#2794).

  Assignment to existing columns allocates a new column. Values already stored in filtered-out rows are copied. In case of creation of new columns, filtered-out rows are automatically filled with `missing` values. If `SubDataFrame` was not created with `:` as column selector the resulting operation must produce the same column names as stored in the source `SubDataFrame` or an error is thrown.

- `Tables.materializer` when passed the following types or their subtypes: `AbstractDataFrame`, `DataFrameRows`, `DataFrameColumns` returns `DataFrame`. (#2839)

- the `insertcols!` function receives new keyword argument `after` (with `false` default) that specifies if columns should be inserted after or before `col`. (#2829)

- Added support for `deleteat!` (#2854)

- `leftjoin!` performs a left join of two data frame objects by updating the left data frame with the joined columns from the right data frame. (#2843)

- the `DataFrame` constructor when column names are passed to it as a second argument now determines if a passed vector of column names is valid based on its contents and not element type (#2859)

- the `DataFrame` constructor when matrix is passed to it as a first argument now allows `copycols` keyword argument (#2859)

- `Cols` now accepts a predicate accepting column names as strings. (#2881)

- In `source => transformation => destination` transformation specification minilanguage now `destination` can be also a `Function` generating target column names and taking column names specified by `source` as an argument. (#2897)

- `subset` and `subset!` now allow passing multiple column selectors and vectors or matrices of `Pair`s as specifications of selection conditions (#2926)

- When using broadcasting in `source .=> transformation .=> destination` transformation specification minilanguage now `All`, `Cols`, `Between`, and `Not` selectors when used as `source` or `destination` are properly expanded to selected column names within the call data frame scope. (#2918)

- `describe` now accepts `:detailed` as the `stats` argument to compute standard deviation and quartiles in addition to statistics that are reported by default. (#2459)

- `sort!` now supports general `AbstractDataFrame` (#2946)

- `filter` now supports `view` keyword argument (#2951)
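
The following sketch illustrates several of the keyword arguments described above (`sort` in `groupby`, `fill` in `unstack`, `after` in `insertcols!`) together with `leftjoin!`, assuming DataFrames.jl 1.3 or newer. The tables are made up for the example.

```julia
using DataFrames

df = DataFrame(k=["b", "a", "b", "c"], v=1:4)

# sort=true orders groups by key; sort=false keeps the order of first appearance.
sorted_keys = [key.k for key in keys(groupby(df, :k; sort=true))]
appearance_keys = [key.k for key in keys(groupby(df, :k; sort=false))]

# fill supplies the value for row/column combinations absent from the long table.
long = DataFrame(id=[1, 1, 2], var=["x", "y", "x"], val=[10, 20, 30])
wide = unstack(long, :id, :var, :val; fill=0)

# after=true inserts the new column after the reference column :k.
insertcols!(df, :k, :flag => true; after=true)

# leftjoin! adds the columns of the right table to the left table in place.
lookup = DataFrame(k=["a", "b", "c"], label=["A", "B", "C"])
leftjoin!(df, lookup; on=:k)
```

Because `leftjoin!` updates `df` in place, the rows of `df` keep their original order and only the new `label` column is appended.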
- fix a problem with `unstack` on empty data frame (#2842)
- fix a problem with not specialized `Pair` arguments passed as transformations (#2889)
- sorting related functions now more carefully check passed arguments for correctness. Now all keyword arguments are correctly checked to be either scalars or vectors of scalars. (#2946)
- for selected common transformation specifications like e.g. `AsTable(...) => ByRow(sum)` use custom implementations that lead to lower compilation latency and faster computation (#2869), (#2919)
- `delete!` is deprecated in favor of `deleteat!` (#2854)
- In `sort`, `sort!`, `issorted` and `sortperm` it is now documented that the result of passing an empty column selector uses lexicographic ordering of all columns, but this behavior is deprecated. (#2941)
- In DataFrames.jl 1.4 release on Julia 1.7 or newer broadcasting assignment into an existing column of a data frame will replace it. Under Julia 1.6 or older it will be an in place operation. (#2937)
- fix a bug in `crossjoin` if the first argument is `SubDataFrame` and `makeunique=true` (#2826)
- Add workaround for `deleteat!` bug in Julia Base in `delete!` function (#2820)
- add option `matchmissing=:notequal` in joins; in `leftjoin`, `semijoin` and `antijoin` missings are dropped in right data frame, but preserved in left; in `rightjoin` missings are dropped in left data frame, but preserved in right; in `innerjoin` missings are dropped in both data frames; in `outerjoin` this value of keyword argument is not supported (#2724)
- correctly handle selectors of the form `:col => AsTable` and `:col => cols` by expanding a single column into multiple columns (#2780)
- if `subset!` is passed a `GroupedDataFrame` the grouping in the passed object gets updated to reflect rows removed from the parent data frame (#2809)
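
A small sketch of how `matchmissing=:notequal` behaves in `innerjoin` and `leftjoin`, following the description above and assuming a DataFrames.jl version that supports this option. The tables are invented for illustration.

```julia
using DataFrames

left = DataFrame(id=[1, missing, 3], x=["a", "b", "c"])
right = DataFrame(id=[1, missing, 4], y=["p", "q", "r"])

# innerjoin: rows with a missing key are dropped from both tables,
# so only id == 1 can match.
inner = innerjoin(left, right; on=:id, matchmissing=:notequal)

# leftjoin: the missing key is dropped from the right table but preserved
# in the left one, where it never finds a match.
lj = leftjoin(left, right; on=:id, matchmissing=:notequal)
```

In the left join result, both the row with the missing key and the row with `id == 3` end up with a `missing` value in column `y`.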
- fix bug in how `groupby` handles grouping of float columns; now `-0.0` is treated as not integer when deciding on which grouping algorithm should be used (#2791)
- fix bug in how `issorted` handles custom orderings and improve performance of sorting when complex custom orderings are passed (#2746)
- fix bug in `combine`, `select`, `select!`, `transform`, and `transform!` that incorrectly disallowed matrices of `Pair`s in `GroupedDataFrame` processing (#2782)
- fix location of summary in `text/html` output (#2801)
- `SubDataFrame`, `filter!`, `unique!`, `getindex`, `delete!`, `leftjoin`, `rightjoin`, and `outerjoin` are now more efficient if rows selected in internal operations form a continuous block (#2727, #2769)
- `hcat` of a data frame with a vector is now deprecated to allow consistent handling of horizontal concatenation of data frame with Tables.jl tables in the future (#2777)
- `text/plain` rendering of columns containing complex numbers is now improved (#2756)
- in `text/html` display of a data frame show full type information when hovering over the shortened type with a mouse (#2774)
- fix performance issue when aggregation function produces multiple rows in split-apply-combine (#2749)
- `completecases` is now optimized and only processes columns that can contain missing values; additionally it is now type stable and always returns a `BitVector` (#2726)
- fix performance bottleneck when displaying wide tables (#2750)
- make sure `subset` checks if the passed condition function returns a vector of values (in the 1.0 release also returning scalar `true`, `false`, or `missing` was allowed which was unintended and error prone) (#2744)
- fix of performance issue of `groupby` when using multi-threading (#2736)
- fix of performance issue of `groupby` when using `PooledVector` (#2733)
- No breaking changes are planned for v1.0 release
- DataFrames.jl now checks that passed columns are 1-based as this is a current design assumption (#2594)
- `mapcols!` makes sure not to create columns being `AbstractRange` consistently with other methods that add columns to a `DataFrame` (#2594)
- `transform` and `transform!` always copy columns when column renaming transformation is passed. If similar issues are identified after 1.0 release (i.e. that a copy of data is not made in scenarios where it normally should be made) these will be considered bugs and fixed as non-breaking changes (#2721)
- `firstindex`, `lastindex`, `size`, `ndims`, and `axes` are now consistently defined and documented in the manual for `AbstractDataFrame`, `DataFrameRow`, `DataFrameRows`, `DataFrameColumns`, `GroupedDataFrame`, `GroupKeys`, and `GroupKey` (#2573)
- add `subset` and `subset!` functions that allow to subset rows (#2496)
- `names` now allows passing a predicate as a column selector (#2417)
- `vcat` now allows a `source` keyword argument that specifies the additional column to be added in the last position in the resulting data frame that will identify the source data frame. (#2649)
- `GroupKey` and `DataFrameRow` are consistently behaving like `NamedTuple` in comparisons and they now implement: `hash`, `==`, `isequal`, `<`, `isless` (#2669)
- since Julia 1.7 using broadcasting assignment on a `DataFrame` column selected as a property (e.g. `df.col .= 1`) is allowed when column does not exist and it allocates a fresh column (#2655)
- `delete!` now correctly handles the case when columns of a data frame are aliased (#2690)
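
As a hedged illustration of `subset`, predicate selectors in `names`, and the `source` keyword argument of `vcat` listed above, the following sketch uses invented data and assumes DataFrames.jl 1.0 or newer.

```julia
using DataFrames

df = DataFrame(a1=[1, 2, 3], a2=[3, 2, 1], b=["x", "y", "z"])

# subset keeps the rows for which all conditions hold; each condition
# must return a vector, hence the ByRow wrapper.
kept = subset(df, :a1 => ByRow(>(1)), :a2 => ByRow(>(1)))

# names accepts a predicate that receives column names as strings.
a_cols = names(df, startswith("a"))

# source adds a last column identifying which input each row came from.
stacked = vcat(df, df; source=:origin)
```

When only a column name is passed to `source`, the inputs are identified by consecutive integers starting from 1.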
- in `leftjoin`, `rightjoin`, and `outerjoin` the `indicator` keyword argument is deprecated in favor of `source` keyword argument; `indicator` will be removed in 2.0 release (#2649)
- Using broadcasting assignment on a `SubDataFrame`s column selected as a property (e.g. `sdf.col .= 1`) is deprecated; it will be disallowed in the future. (#2655)
- Broadcasting assignment to an existing column of a `DataFrame` selected as a property (e.g. `df.col .= 1`) being an in-place operation is deprecated. It will allocate a fresh column in the future (#2655)
- all deprecations present in 0.22 release now throw an error (#2554); in particular `convert` methods, `map` on `GroupedDataFrame` that were deprecated in 0.22.6 release now throw an error (#2679)
- `innerjoin`, `leftjoin`, `rightjoin`, `outerjoin`, `semijoin`, and `antijoin` are now much faster and check if passed data frames are sorted by the `on` columns and take into account if shorter data frame that is joined has unique values in `on` columns. These aspects of input data frames might affect the order of rows produced in the output (#2612, #2622)
- `DataFrame` constructor, `copy`, `getindex`, `select`, `select!`, `transform`, `transform!`, `combine`, `sort`, and join functions now use multiple threads in selected operations (#2647, #2588, #2574, #2664)
- `convert` methods from `AbstractDataFrame`, `DataFrameRow` and `GroupKey` to `Array`, `Matrix`, `Vector` and `Tuple`, as well as from `AbstractDict` to `DataFrame`, are now deprecated: use corresponding constructors instead. The only conversions that are retained are `convert(::Type{NamedTuple}, dfr::DataFrameRow)`, `convert(::Type{NamedTuple}, key::GroupKey)`, and `convert(::Type{DataFrame}, sdf::SubDataFrame)`; the deprecated methods will be removed in 1.0 release
- as a bug fix `eltype` of vector returned by `eachrow` is now `DataFrameRow` (#2662)
- applying `map` to `GroupedDataFrame` is now deprecated. It will be an error in 1.0 release. (#2662)
- `copycols` keyword argument is now respected when building a `DataFrame` from `Tables.CopiedColumns` (#2656)
- the rules for transformations passed to `select`/`select!`, `transform`/`transform!`, and `combine` have been made more flexible; in particular now it is allowed to return multiple columns from a transformation function (#2461 and #2481)
- CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays` to use it (#2404). In the same vein, the `categorical` and `categorical!` functions have been deprecated in favor of `transform(df, cols .=> categorical .=> cols)` and similar syntaxes (#2394). `stack` now creates a `PooledVector{String}` variable column rather than a `CategoricalVector{String}` column by default; pass `variable_eltype=CategoricalValue{String}` to get the previous behavior (#2391)
- `isless` for `DataFrameRow`s now checks column names (#2292)
- `DataFrameColumns` is now not a subtype of `AbstractVector` (#2291)
- `nunique` is not reported now by `describe` by default (#2339)
- stop reordering columns of the parent in `transform` and `transform!`; always generate columns that were specified to be computed even for `GroupedDataFrame` with zero rows (#2324)
- improve the rule for automatically generated column names in `combine`/`select(!)`/`transform(!)` with composed functions (#2274)
- `:nmissing` in `describe` now produces `0` if the column does not allow missing values; earlier `nothing` was produced in this case (#2360)
- fast aggregation functions for `GroupedDataFrame` now correctly choose the fast path only when it is safe; this resolves inconsistencies with what the same functions not using fast path produce (#2357)
- joins now return `PooledVector` not `CategoricalVector` in indicator column (#2505)
- `GroupKeys` now supports `in` for `GroupKey`, `Tuple`, `NamedTuple` and dictionaries (#2392)
- in `describe` the specification of custom aggregation is now `function => name`; old `name => function` order is now deprecated (#2401)
- in joins passing `NaN` or real or imaginary `-0.0` in `on` column now throws an error; passing `missing` throws an error unless `matchmissing=:equal` keyword argument is passed (#2504)
- `unstack` now produces row and column keys in the order of their first appearance and has two new keyword arguments `allowmissing` and `allowduplicates` (#2494)
- PrettyTables.jl is now the default back-end to print DataFrames to `text/plain`; the print option `splitcols` was removed and the output format was changed (#2429)
- add `filter` to `GroupedDataFrame` (#2279)
- add `empty` and `empty!` function for `DataFrame` that remove all rows from it, but keep columns (#2262)
- make `indicator` keyword argument in joins allow passing a string (#2284, #2296)
- add new functions to `GroupKey` API to make it more consistent with `DataFrameRow` (#2308)
- allow column renaming in joins (#2313 and #2398)
- add `rownumber` to `DataFrameRow` (#2356)
- allow passing column name to specify the position where new columns should be inserted in `insertcols!` (#2365)
- allow `GroupedDataFrame`s to be indexed using a dictionary, which can use `Symbol` or string keys and is not dependent on the order of keys. (#2281)
- add `isapprox` method to check for approximate equality between two dataframes (#2373)
- add `columnindex` for `DataFrameRow` (#2380)
- `names` now accepts `Type` as a column selector (#2400)
- `select`, `select!`, `transform`, `transform!` and `combine` now allow `renamecols` keyword argument that makes it possible to avoid adding transformation function name as a suffix in automatically generated column names (#2397)
- `filter`, `sort`, `dropmissing`, and `unique` now support a `view` keyword argument which if set to `true` makes them return a `SubDataFrame` view into the passed data frame.
- add `only` method for `AbstractDataFrame` (#2449)
- passing empty sets of columns in `filter`/`filter!` and in `select`/`transform`/`combine` with `ByRow` is now accepted (#2476)
- add `permutedims` method for `AbstractDataFrame` (#2447)
- add support for `Cols` from DataAPI.jl (#2495)
- add `reverse` function for `AbstractDataFrame` that reverses the rows (#2944)
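
A brief sketch of some of the functionality listed above (`renamecols`, dictionary indexing of a `GroupedDataFrame`, the `view` keyword argument, and `names` with a type), assuming a DataFrames.jl version that includes these changes. The data is invented for the example.

```julia
using DataFrames

df = DataFrame(g=[1, 1, 2], x=[1.0, 2.0, 3.0])
gdf = groupby(df, :g)

# renamecols=false keeps the source column name instead of generating :x_sum.
sums = combine(gdf, :x => sum; renamecols=false)

# A GroupedDataFrame can be indexed with a dictionary of key values.
group2 = gdf[Dict(:g => 2)]

# view=true returns a SubDataFrame into df instead of copying the rows.
large = filter(:x => >(1.5), df; view=true)

# names accepts a type and selects columns whose element type matches it.
float_cols = names(df, Float64)
```

Since `large` is a view, it shares its data with `df` rather than holding an independent copy.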
- `DataFrame!` is now deprecated (#2338)
- several non-standard `DataFrame` constructors are now deprecated (#2464)
- all old deprecations now throw an error (#2350)
- Tables.jl version 1.2 is now required.
- DataAPI.jl version 1.4 is now required. It implies that `All(args...)` is deprecated and `Cols(args...)` is recommended instead. `All()` is still supported.
- Documentation is now available also in Dark mode (#2315)
- add rich display support for Markdown cell entries in HTML and LaTeX (#2346)
- limit the maximal display width the output can use in `text/plain` before being truncated (in the `textwidth` sense, excluding `…`) to `32` per column by default and fix a corner case when no columns are printed in situations when they are too wide (#2403)
- Common methods are now precompiled to improve responsiveness the first time a method is called in a Julia session. Precompilation takes up to 30 seconds after installing the package (#2456).