There are a number of issues with the Dataframe component as it exists today, and we need to do some work to fix the outstanding ones and to improve its usability for humans.
We can use this issue to keep track of the issues that have been reported and come up with a design that addresses the usability issues. I'll start with a simple proposal and we can discuss from there.
python API Changes
Today the dataframe API looks like this:
Dataframe(
headers=None,
row_count=3,
col_count=3,
default_value=None,
datatype="str",
label=None,
col_width=None,
type="pandas",
optional=False
)
modifying column width
Proposal: col_width should be removed.
I think allowing users to set the column width is not a good idea. The whole purpose of gradio is to generate high quality web apps to share and showcase models, and these should work across device sizes and screen widths. Tables automatically adapt column widths to accommodate their content; providing an API that will almost definitely break the UI is not 'pit of success' stuff.
fixed column and row count
Proposal: `row_count` and `col_count` should take either a number or a tuple of `(number, "fixed"|"dynamic")`.
in #868 @osanseviero wrote:
Allow making the number of columns or rows fixed, since for some use cases you don't want users creating new rows.
We do not currently have a mechanism to prevent end-users from creating new columns and rows. I think we have two options here:
- create `col_fixed` and `row_fixed` boolean kwargs.
This adds more options to a component that already has a lot of kwargs, and it could get overwhelming if we keep adding to the API, but it is simple and would work.
- allow `row_count` and `col_count` to take either a number (e.g. `3`) or a tuple of `(number, "fixed"|"dynamic")`.
This does expand the API for `*_count` but I quite like it as it binds two highly related options together. This would look like `col_count=(3, "fixed")`. `col_count=3` would essentially be shorthand for `col_count=(3, "dynamic")`.
Note: We could rename these kwargs to `col` and `row`.
I propose the second option (tuple) but I do not feel strongly about it.
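To make the tuple option concrete, here is a minimal sketch of how the shorthand could be normalised. `normalize_count` is a hypothetical helper name, not something that exists in gradio today, and the error wording is only illustrative.

```python
from typing import Tuple, Union

# Hypothetical type for the proposed kwarg: 3 or (3, "fixed" | "dynamic").
CountArg = Union[int, Tuple[int, str]]

VALID_MODES = ("fixed", "dynamic")


def normalize_count(value: CountArg, name: str = "col_count") -> Tuple[int, str]:
    """Expand the shorthand `3` into `(3, "dynamic")` and check the tuple form."""
    if isinstance(value, int):
        # A bare number is shorthand for a dynamic count.
        count, mode = value, "dynamic"
    elif isinstance(value, tuple) and len(value) == 2:
        count, mode = value
    else:
        raise TypeError(
            f"`{name}` must be a number or a (number, mode) tuple, got {value!r}"
        )
    if not isinstance(count, int) or count < 0:
        raise ValueError(f"`{name}` needs a non-negative integer, got {count!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"`{name}` mode must be one of {VALID_MODES}, got {mode!r}")
    return count, mode
```

With this, `normalize_count(3)` and `normalize_count((3, "dynamic"))` give the same result, so the rest of the component would only ever deal with the tuple form.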
conflicts and confusements with kwargs relating to col and row quantity
Proposal: `headers`, `col_count`, `row_count` and `default_value` should be validated to ensure there are no conflicts.
More specifically: all kwargs that can set the column count must agree on the same number of columns, and kwargs that can set the row count must not result in provided data being hidden.
`headers` and `col_count` can conflict; `default_value` and `headers` can conflict (kinda); `default_value`, `col_count` and `row_count` can conflict.
This is easiest to explain with examples.
This is confusing but not necessarily an issue:
Dataframe(
col_count=3,
headers=["One", "Two"]
)
However this is just wrong and will lead to unexpected behaviour:
Dataframe(
col_count=2,
headers=["One", "Two", "Three"]
)
What should happen here:
Dataframe(
headers=["One", "Two", "Three"],
default_value=[[1, 2], [3, 4]]
)
And here:
Dataframe(
default_value=[[1, 2, 3, 4], [5, 6, 7, 8]],
col_count=2
)
We need to figure out simple rules to validate dataframe inputs that affect the number of columns and rows, or decide how to normalise them.
Some possible rules aimed at removing ambiguity:
- if `headers` and `col_count` are provided, the length of `headers` must be equal to `col_count`.
- if `headers` and `default_value` are provided, the length of each row in `default_value` should match the length of `headers`.
- if `default_value` and `col_count` are provided, the length of each row in `default_value` should be equal to `col_count`.
- if `default_value` and `row_count` are provided, the number of rows in `default_value` should be equal to or less than `row_count`. (This rule isn't essential, but without it we would get weird behaviour.)
The obvious counter to this is that we could add additional values to `default_value` or `headers` to 'fill in the gaps', but I think the API will be far easier to reason about for users if we have clear rules. It will allow us to easily detect errors and provide helpful messages to users. Trying to guess what users want without being explicit is how Perl happened.
This validation would happen on the Python side, and we could provide error messages like:
`col_count` is 3 but you passed 2 headers. Set `col_count` to 2 or add 1 header, even if it is an empty string.
`col_count` is 2 but you passed 4 headers. Set `col_count` to 4 or remove 2 headers.
`default_value` contains data for 4 columns but `col_count` is set to 2. Set `col_count` to 4 or remove some data from `default_value`.
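As a rough sketch of what this validation could look like, assuming plain integer counts (the tuple form would be unpacked first) and a hypothetical function name `validate_dataframe_args`. Nothing here is existing gradio code, and the message wording is only a starting point.

```python
from typing import Optional, Sequence


def validate_dataframe_args(
    headers: Optional[Sequence[str]] = None,
    col_count: Optional[int] = None,
    row_count: Optional[int] = None,
    default_value: Optional[Sequence[Sequence[object]]] = None,
) -> None:
    """Raise ValueError on the first conflict between the size-related kwargs."""
    # Rule 1: headers and col_count must agree.
    if headers is not None and col_count is not None and len(headers) != col_count:
        diff = abs(col_count - len(headers))
        fix = "add" if len(headers) < col_count else "remove"
        raise ValueError(
            f"`col_count` is {col_count} but you passed {len(headers)} headers. "
            f"Set `col_count` to {len(headers)} or {fix} {diff} header(s)."
        )

    if default_value is None:
        return

    for index, row in enumerate(default_value):
        # Rule 2: every row must match the number of headers.
        if headers is not None and len(row) != len(headers):
            raise ValueError(
                f"Row {index} of `default_value` has {len(row)} values "
                f"but you passed {len(headers)} headers."
            )
        # Rule 3: every row must match col_count.
        if col_count is not None and len(row) != col_count:
            raise ValueError(
                f"`default_value` contains data for {len(row)} columns but "
                f"`col_count` is set to {col_count}. Set `col_count` to "
                f"{len(row)} or remove some data from `default_value`."
            )

    # Rule 4: row_count must not hide rows that were provided.
    if row_count is not None and len(default_value) > row_count:
        raise ValueError(
            f"`default_value` contains {len(default_value)} rows but "
            f"`row_count` is set to {row_count}."
        )
```

Failing on the first conflict keeps the messages short; collecting every conflict into one message would also work if we think users usually hit several at once.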
proposed python API
Dataframe(
headers=None, # validated
row_count=(3, "dynamic"), # validated
col_count=(3, "dynamic"), # validated
default_value=None, # validated
datatype="str",
label=None,
type="pandas",
optional=False
)
UX improvements
make cells easier to interact with
in #868 @osanseviero said:
Modifying the input of a cell requires double-clicking on it. I would love to be able to just click and add my input. There were also a couple of dev experience improvements I would love to see
I'm not certain about this.
The current behaviour mimics how most spreadsheets work, but users of spreadsheets frequently move around the spreadsheet before editing. Our dataframe is not a power tool but a quick user entry tool, so perhaps ease of data entry is more important than ease of cell navigation.
If we change click behaviour, we also need to change keyboard behaviour for parity of usability. Essentially this feature request is to remove the different 'states' from the dataframe so that it is 'edit only' rather than having view/edit modes, since without a click triggering the edit state it would be impossible to reach. Static or output dataframes would keep the current behaviour.
@omarespejel Could you add some more details about how you would like to interact with the dataframe? Not just click: how would you like to move to a different cell, and how would that work for keyboard users who do not or cannot use a mouse?
better inputs when the datatype is given
in #868 @osanseviero said:
With using datatype="number", I would love if users can only write numbers and not strings. I know this modifies what is passed to the interface, but modifying the user input type as well would be great.
Currently everything is treated as a string by the frontend, even when we know the datatype. I think we can improve this significantly.
- `number` fields could use a number input which will only allow number entry
- `boolean` fields could use a toggle or checkbox
- `date` fields could use a date input
- `string` fields will use the current textbox functionality
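One way to picture the mapping above is a simple lookup from datatype to HTML input type. This is only a sketch: the datatype names `"bool"` and `"date"` are assumptions (only `"str"` and `"number"` appear in this issue), and the real decision would live in the frontend rather than in Python.

```python
# Hypothetical lookup from a column's datatype to an HTML <input> type.
# "bool" and "date" are assumed names; "str" and "number" come from this issue.
INPUT_TYPE_FOR_DATATYPE = {
    "str": "text",
    "number": "number",
    "bool": "checkbox",
    "date": "date",
}


def input_type_for(datatype: str) -> str:
    """Fall back to a plain textbox for any datatype we do not recognise."""
    return INPUT_TYPE_FOR_DATATYPE.get(datatype, "text")
```

Falling back to `"text"` keeps today's behaviour for anything unknown, so adding new datatypes later would not break existing apps.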
But I think we can go further if we expanded the datatype kwarg.
- enums/ unions would render a dropdown or autocompleting dropdown thing.
- we could support prefix + suffixes for currencies + measurements. `datatype=[()]
- it might be possible to support custom validators in the future as long as they are regex based.
Be good to get your thoughts @gary149
populate dataframe from file
@merveenoyan created issue #945 discussing uploading csv/tsv files into the dataframe.
I think this makes perfect sense. We are already doing this for the timeseries, so adapting it for the dataframe should be straightforward.
@merveenoyan could you clarify the first part of that issue? Are you saying it would be good for the dataframe to accept different values in the python library (i.e. the default_value kwarg), or that it would be good to be able to modify what is displayed after a user uploads the file via the UI (i.e. only showing the first/last 5 rows, etc.)?
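For discussion, a minimal sketch of turning uploaded csv/tsv text into headers plus row data using Python's standard `csv` module. The helper name `table_from_delimited_text` is hypothetical, and treating the first row as the headers is an assumption we would need to agree on.

```python
import csv
import io
from typing import List, Tuple


def table_from_delimited_text(
    text: str, delimiter: str = ","
) -> Tuple[List[str], List[List[str]]]:
    """Split delimited text into (headers, rows); the first row is assumed to be headers."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip completely empty lines, e.g. a trailing newline at the end of the file.
    rows = [row for row in reader if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]
```

For a tsv upload the same helper could be called with `delimiter="\t"`. Every value comes back as a string, so any conversion based on datatype would still need to happen afterwards.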
row + col creation and deletion
#631
Row and column creation and deletion needs some work. Deletion isn't currently possible.
Would love to get people's thoughts on what good creation and deletion might look like. Are there other datatables you have seen in the wild that do this well while remaining very compact?
Another for @gary149
Redesign
Just putting this here for posterity. Things are being redesigned.
bugs
We have bugs:
- [blocks-dev] Dataframe input and output both behave incorrectly #890
Issues relating to features for tracking purposes:
Let me know if I have missed anything. It would be good to get people's thoughts on this.
cc @abidlabs @aliabid94 @dawoodkhan82 @aliabd @farukozderim