- Added support for arrow (Thanks to @Kopilov for contributing PR 150)
- Improved support for large Excel tables (Thanks to @ayvazj for contributing PR 126)
- Added second version of unfold() to work with property accessors instead
cars.unfold("cars", listOf(Car::brand, Car::ps))
Minor enhancements
- Fixed #63: Cannot print schema() of empty data-frame
Released 2021-07-17
- New Jupyter kernel integration with auto-import and improved rendering for DataFrame and DataFrame.schema()
- Added DataFrame.letsPlot() to ease integration with lets-plot
- New tutorial (jupyter notebook): Mammalian Sleep
- Updated to kotlin v1.5 and added support for value class in List<Any>.toDataFrame() and DataFrame.unfold()
- Added timestamp support for database support API (fixes #124)
Released 2021-04-13
krangl is now deployed to maven-central and no longer to jcenter
Features
- Added support for fixed-width files with readFixedWidth()
- Added support for more compact column type specification when reading tsv
- Fixed: NA and empty cell handling in excel-reader
- Fixed: Use correct cell types when writing Excel file
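As a rough illustration of what reading a fixed-width file involves, the sketch below slices each line at fixed character offsets using plain Kotlin. It does not use krangl: `parseFixedWidth` and its `widths` parameter are hypothetical names chosen for this example and do not reflect the signature of readFixedWidth().

```kotlin
// Hypothetical helper, not part of krangl: splits each line into columns of fixed widths.
fun parseFixedWidth(lines: List<String>, widths: List<Int>): List<List<String>> =
    lines.map { line ->
        var start = 0
        widths.map { width ->
            // Clamp the bounds so that short lines yield empty cells instead of throwing.
            val from = minOf(start, line.length)
            val to = minOf(start + width, line.length)
            start += width
            line.substring(from, to).trim()
        }
    }
```

Calling `parseFixedWidth(listOf("Anna 1996"), listOf(5, 4))` splits the line into a 5-character and a 4-character column and trims the padding from each cell.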
Republished to maven central https://search.maven.org/artifact/com.github.holgerbrandl.krangl/krangl
- Fixed gather conversion in case of mixed number types
- Indicate guessed column type with prefix Any for basic types in schema and print
- Fixed asDataFrame to include parent type properties
- Added DataFrame.filterNotNull to remove records with nulls. A column selector can be provided to check only a subset of columns.
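The idea behind null filtering with an optional column subset can be sketched in plain Kotlin. This is only an illustration under stated assumptions: rows are modelled as maps, and `dropRowsWithNulls` and its `columns` parameter are hypothetical names, not the krangl API.

```kotlin
// Hypothetical stand-in for a data-frame: each row maps a column name to a nullable value.
// Keeps only rows without nulls; if `columns` is given, only those columns are checked.
fun dropRowsWithNulls(
    rows: List<Map<String, Any?>>,
    columns: Set<String>? = null
): List<Map<String, Any?>> =
    rows.filter { row ->
        val checked = columns ?: row.keys
        checked.all { name -> row[name] != null }
    }
```

Without a column subset every cell of a row must be non-null. With a subset, nulls in the other columns are tolerated.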
New Features
- #97 Added Excel read/write support (by LeandroC89)
// read
df = DataFrame.readExcel("data.xlsx", sheetName = "sales")
df = DataFrame.readExcel("data.xlsx", cellRange = CellRangeAddress.valueOf("A1:D10"))
// write
df.writeExcel("results.xlsx")
- #95 Improved column type casts
dataFrameOf("foo")(1, 2, 3).addColumn("stringified_foo") { it["foo"].toStrings() }.schema()
> DataFrame with 3 observations
> foo [Int] 1, 2, 3
> stringified_foo [Str] 1, 2, 3
dataFrameOf("foo")("1", "2", "3").addColumn("parsed_foo") { it["foo"].toInts() }.schema()
> DataFrame with 3 observations
> foo [Str] 1, 2, 3
> parsed_foo [Int] 1, 2, 3
- #99 Added filtering by list (similar to R's %in% operator)
irisData.filter { it["Species"].inList("setosa", "versicolor") }
Bug Fixes
- #84 Builder now supports mixed numbers in column
- #96 & #94 Fixed bugs in join
- #100 Improved SQL bindings
- #99 Fixed median
- Fixed missing by values overhanging RHS in outer join (fixes #94)
- Added addRow (via PR92 by LeandroC89)
- Added column type text to sql interface (fixes #72)
Released: 2020-06-02
- Added column transformation cumSum to calculate cumulative sum
sales
.sortedBy("quarter")
.addColumn("cum_sales" to { it["sold_units"].cumSum()})
- Added column transformation pctChange to calculate percentage change between the current and a prior element. Similar to pct_change in pandas (contributed by @amorphous1 in PR85)
sales
.groupBy("product")
.addColumn("sales_pct_change" to { it["sold_units"].pctChange() })
- Added lead and lag (contributed by @amorphous1 in PR85)
sales
.groupBy("product")
.sortedBy("quarter")
.addColumn("prev_quarter_sales" to { it["sold_units"].lag() })
- Significantly improved join performance (contributed by @amorphous1 in PR85)
- New: Extended bindRows API to combine data row-wise (see PR #77 by @CrystalLord)
val person1 = mapOf("person" to "James", "year" to 1996)
val person2 = mapOf("person" to "Anne", "year" to 1998)
emptyDataFrame().bindRows(person1, person2).print()
internal release
- New: Added built-in support for Long columns (PR #69 by @davidpedrosa)
Major:
- New: summarizeAt for simplified column aggregations
- New: setNames to replace column headers of a data-frame
- New: Deparse Iterables more conveniently using lambdas in deparseRecords
Minor:
- Fixed: Cannot read csv-tables without header
- Added option to skip lines in csv reader.
- Fixed schema() should not throw memory exception (#53)
- Fixed DataFrame.readTSV default format (#56)
- Added where() for conditional column creation (relates to #54)
- Added writeTSV
- Fixed grouping by Any columns
- Added: toDoubleMatrix() helper extension method
Major Enhancements
- DataFrame.fromJson will now flatten nested json data
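To make "flattening" concrete, here is a minimal sketch in plain Kotlin that turns nested maps (standing in for parsed JSON objects) into a single-level record. It is not krangl's implementation: `flattenNested`, the dot separator, and the resulting key names are assumptions made for illustration only.

```kotlin
// Hypothetical helper, not krangl's implementation: flattens nested maps into
// a single-level map whose keys are joined with a separator.
fun flattenNested(
    value: Map<String, Any?>,
    prefix: String = "",
    separator: String = "."
): Map<String, Any?> {
    val flat = linkedMapOf<String, Any?>()
    for ((key, child) in value) {
        val name = if (prefix.isEmpty()) key else prefix + separator + key
        if (child is Map<*, *>) {
            // Recurse into nested objects; non-string keys are converted with toString().
            val nested = child.entries.associate { (k, v) -> k.toString() to v }
            flat.putAll(flattenNested(nested, name, separator))
        } else {
            flat[name] = child
        }
    }
    return flat
}
```

Each flattened key can then serve as a column name, so one nested record becomes one table row.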
Minor
- Added sum() extension for column summaries/transformation
- Added dataFrameOf() that accepts Iterable of names
- Added bindRows() alias that accepts data frames as varargs
- Added bindCols() extension for list of DataCol
- Fill missing cells with NA in bindRows and bindCols
- Resolve duplicated column names in bindCols()
- Added new builder to create data-frame from DataFrameRow iterator
- Added addRowNumber to add the row number as column to a data-frame
- Fixed: Incorrect types in gathered columns
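The NA filling mentioned for bindRows above can be pictured with a small plain-Kotlin sketch. It is an illustration under stated assumptions rather than krangl code: tables are modelled as lists of row maps, `null` stands in for NA, and `bindRowsFillingGaps` is a hypothetical name.

```kotlin
// Hypothetical sketch, not krangl code: stacks tables (lists of rows) and fills
// cells for columns missing from a table with null, which stands in for NA here.
fun bindRowsFillingGaps(vararg tables: List<Map<String, Any?>>): List<Map<String, Any?>> {
    // Collect the union of all column names, preserving first-seen order.
    val columns = linkedSetOf<String>()
    tables.forEach { table -> table.forEach { row -> columns.addAll(row.keys) } }

    // Rebuild every row against the full column set; absent cells become null.
    return tables.flatMap { table ->
        table.map { row -> columns.associateWith { name -> row[name] } }
    }
}
```

Every resulting row carries the same set of columns, which keeps the combined table rectangular.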
Released 2018-04-11
Major Enhancements
- Allow index access for column model (fixes #46): irisData[1][2]
- Improved DataFrame.count to respect existing groupings and to simply count rows if no grouping is defined
- Added moveLeft and moveRight to rearrange column order
- Added nest and unnest to wrap columns into sub-tables and back
- Added expand and complete to expand column value-sets into data-frames
- Added function literal support for count and groupBy (fixes #48): irisData.groupByExpr{ it["Sepal.Width"] > 3 }
- Added receiver context for sortBy lambdas with sorting specific API (fixes #44)
Improved data-frame rendering
- Improved print() and schema() output of data-frames to have better alignment and more formatting options
- Print row numbers by default when using print (fixes #49)
Minor Enhancements
- Renamed select2/remove2 to selectIf and removeIf
- Fixed #39: Cannot add scalar object as column
- Started submodule for documentation
- Hide columns in print after exceeding maximum line length (fixes #50)
- Fixed #45: sleepData.sortedBy{ "order" } should fail with informative exception
Released 2018-03-21
Major Enhancements
- Added property unfolding df.unfold<Person>("user", properties=listOf("address"))
- Added text matching helper: irisData.filter{ it["Species"].isMatching{ startsWith("se") }} (fixes #21)
- Added sortedByDescending and desc and added more sorting tests
- Added more elegant object bindings via reflection. Example: val objPersons : Iterable<User> = users.rowsAs<User>() (fixes #22)
- Added compressed csv write support, configurable or by filename guessing
Minor Enhancements
- More robust row to object conversion
- Made List<Boolean?>.not() public
- Use regex instead of string as separate separator
- Replaced fixed temporary column names with uuids
- Fixed incorrect coercion of incomplete inplace data to df
- Added concat operator for string column arithmetics
- Fixed arithmetic comparison operators
- Added beakerx display adapter
Released 2018-03-14
Major Enhancements
- Allow specifying column types when reading csv data (Thanks to LeanderG for providing the PR)
- Added groupedBy to provide distinct set of grouping tuples as data-frame
- Read support for URLs (Example: DataFrame.readCSV("https://git.io/vxks7").glimpse())
- Added basic read/write support for JSON data
- Added generic collection conversion Iterable<Any>.asDataFrame() via reflection (fixes #24)
Incompatible API changes
- Renamed structure to columnTypes
- Renamed all table read functions from .from* to .read*
- Fixed #29: mapNonNull should use parameter and not receiver
Minor Enhancements
- Namespace cleanup to hide internal helpers
- Bundled irisData
- Enhanced: DataCol.toDouble() should work for int columns as well (and vice versa)
- Added MIT License
- Use iterable instead of list for object conversions
Released: 2017-11-11
- More idiomatic API mimicking kotlin stdlib where possible
- Added DataFrame.remove to drop columns from data-frames
- Added DataFrame.addColumn to add columns to data-frames
- Added DataFrame.sortBy(TableFormula)
- Added DataFrame.filterByRow
- Reworked column selector API
- Changed column expression API from Any to a constrained set of supported types
- Fixed issues when combining columns of different types (e.g. DoubleCol + IntCol)
- Dropped most unary operators
Skipped.
Released 2017-04-12
New Features
- spread()-gather() support for elegant data reshaping (fixes #2)
- Improve reshaping functionality by adding unite and separate (fixes #9)
- Added sampleFrac() and sampleN() for random sub-sampling of data-frames (either with or without replacement)
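The difference between sampling with and without replacement can be sketched in plain Kotlin on an ordinary list. This is only an illustration: `sampleCount`, `sampleFraction`, and their parameters are hypothetical names and do not mirror the signatures of sampleN() or sampleFrac().

```kotlin
import kotlin.math.roundToInt
import kotlin.random.Random

// Hypothetical helpers, not krangl's implementation.
// Draws n elements; with replacement an element may be picked repeatedly.
fun <T> sampleCount(
    items: List<T>,
    n: Int,
    replace: Boolean = false,
    random: Random = Random.Default
): List<T> {
    require(n >= 0) { "n must not be negative" }
    if (replace) {
        require(items.isNotEmpty() || n == 0) { "cannot sample from an empty list" }
        return List(n) { items[random.nextInt(items.size)] }
    }
    require(n <= items.size) { "cannot draw more elements than available without replacement" }
    // Shuffling and taking a prefix yields n distinct positions.
    return items.shuffled(random).take(n)
}

// Draws a fraction of the elements, rounding the sample size to the nearest integer.
fun <T> sampleFraction(
    items: List<T>,
    fraction: Double,
    replace: Boolean = false,
    random: Random = Random.Default
): List<T> = sampleCount(items, (fraction * items.size).roundToInt(), replace, random)
```

Passing a seeded `Random` makes a draw reproducible, which is convenient in tests.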
Important Bug Fixes
- mutate() can now change existing columns without altering column positions
Other
- New property accessor DataFrame.cols to access all columns of a data-frame
- Incremented kotlin version to 1.1
Initial Release
- Implement all dplyr core verbs
- Implement all join types
- Table write support using csv-commons wrapper
- Extensive unit test coverage
- TravisCI integration
- Support for count() and distinct()
- Basic benchmarking framework (without jvm usage)