Dataframe Improvements #962

@pngwn

There are a number of issues with the Dataframe component as it exists today. We need to do some work to fix the outstanding issues and also improve the usability for humans.

We can use this issue to keep track of the issues that have been reported and come up with a design that addresses the usability issues. I'll start with a simple proposal and we can discuss from there.

python API Changes

Today the dataframe API looks like this:

Dataframe(
  headers=None, 
  row_count=3, 
  col_count=3, 
  default_value=None, 
  datatype="str", 
  label=None, 
  col_width=None, 
  type="pandas", 
  optional=False
)

modifying column width

Proposal: col_width should be removed.

I think allowing users to set the column width is not a good idea. The whole purpose of gradio is to generate high quality web apps to share and showcase models. This should work across device sizes and screen widths. Tables automatically adapt column widths to accommodate their content. Providing an API that will almost definitely break the UI is not 'pit of success' stuff.

fixed column and row count

Proposal: row_count and col_count should take either a number or a tuple of (number, "fixed"|"dynamic").

in #868 @osanseviero wrote:

Allow making the number of columns or rows fixed, since for some use cases you don't want users creating new rows.

We do not currently have a mechanism to prevent end-users from creating new columns and rows. I think we have two options here:

  • create col_fixed and row_fixed boolean kwargs.
    This adds additional options to dataframe when it already has a lot of kwargs, it could start to get overwhelming if we keep adding to the API, but it is simple and would work.
  • allow row_count and col_count to take either a number (e.g. 3) or a tuple of (number, "fixed"|"dynamic")
    This does expand the API for *_count but I quite like it as it binds two highly related options together. This would look like col_count=(3, "fixed"). col_count=3 would essentially be shorthand for col_count=(3, "dynamic").

Note: We could rename these kwargs to col and row.

I propose the second option (tuple) but I do not feel strongly about it.
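
As a rough sketch of how the tuple option could be normalised on the Python side (the helper name `normalize_count` is hypothetical and not part of gradio), a bare number is treated as shorthand for the "dynamic" form:

```python
from typing import Tuple, Union

# Either a bare number or a (number, mode) tuple, as proposed above.
Count = Union[int, Tuple[int, str]]


def normalize_count(value: Count) -> Tuple[int, str]:
    """Expand the shorthand `3` into `(3, "dynamic")` and check the tuple form.

    Hypothetical helper, for illustration only.
    """
    if isinstance(value, int):
        return (value, "dynamic")
    count, mode = value
    if mode not in ("fixed", "dynamic"):
        raise ValueError(f'mode must be "fixed" or "dynamic", got {mode!r}')
    return (count, mode)
```

With this, `col_count=3` and `col_count=(3, "dynamic")` would end up as the same internal value, and only `(3, "fixed")` would stop end-users from adding columns.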

conflicts and confusements with kwargs relating to col and row quantity

Proposal: headers, col_count, row_count, default_value should be validated to ensure there are no conflicts.

More specifically: Any combination of kwargs that can set the column count must always equal the same number of columns. Any kwargs that can set the row count must not result in provided data being hidden.

headers and col_count can conflict; default_value and headers can conflict (kinda); default_value, col_count and row_count can conflict.

This is easiest to explain with examples.

This is confusing but not necessarily an issue:

Dataframe(
  col_count=3,
  headers=["One", "Two"]
)

However this is just wrong and will lead to unexpected behaviour:

Dataframe(
  col_count=2,
  headers=["One", "Two", "Three"]
)

What should happen here:

Dataframe(
  headers=["One", "Two", "Three"],
  default_value=[[1, 2], [3, 4]]
)

And here:

Dataframe(
  default_value=[[1, 2, 3, 4], [5, 6, 7, 8]],
  col_count=2
)

We need to figure out simple rules to validate dataframe inputs that affect the columns + widths, or decide how to normalise.

Some possible rules aimed at removing ambiguity:

  • if headers and col_count are provided, the length of headers must be equal to col_count.
  • if headers and default_value are provided, the length of each piece of column data in default_value should match the length of headers.
  • if default_value and col_count are provided, the length of each piece of column data in default_value should be equal to col_count.
  • if default_value and row_count are provided, the length of the row data in default_value should be equal to or less than the row_count. (This rule isn't essential, but breaking it would lead to weird behaviour.)

The obvious counter to this is that we could add additional values to default_value or headers to 'fill in the gaps', but I think the API will be far easier to reason about for users if we have clear rules. It will allow us to easily detect errors and provide helpful messages to users. Trying to guess what users want without being explicit is how Perl happened.

This validation would happen at python time, and we could provide error messages like:

`col_count` is 3 but you passed 2 headers. Set `col_count` to 2 or add 1 header, even if it is an empty string.
`col_count` is 2 but you passed 4 headers. Set `col_count` to 4 or remove 2 headers.
`default_value` contains data for 4 columns but `col_count` is set to 2. Set `col_count` to 4 or remove some data from `default_value`.
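
A minimal sketch of what this validation could look like, assuming `default_value` is a list of rows and that the counts are plain integers. The function name `validate_dataframe_kwargs` and the exact message wording are hypothetical, not gradio's API:

```python
from typing import List, Optional, Sequence


def validate_dataframe_kwargs(
    headers: Optional[Sequence[str]] = None,
    col_count: Optional[int] = None,
    row_count: Optional[int] = None,
    default_value: Optional[Sequence[Sequence[object]]] = None,
) -> List[str]:
    """Return one message per conflict; an empty list means no conflicts."""
    errors: List[str] = []

    # Rule 1: headers and col_count must agree.
    if headers is not None and col_count is not None and len(headers) != col_count:
        errors.append(
            f"`col_count` is {col_count} but you passed {len(headers)} headers."
        )

    if default_value is not None:
        for i, row in enumerate(default_value):
            # Rule 2: every row must match the number of headers.
            if headers is not None and len(row) != len(headers):
                errors.append(
                    f"Row {i} of `default_value` has {len(row)} values "
                    f"but there are {len(headers)} headers."
                )
            # Rule 3: every row must match col_count.
            if col_count is not None and len(row) != col_count:
                errors.append(
                    f"Row {i} of `default_value` has {len(row)} values "
                    f"but `col_count` is {col_count}."
                )
        # Rule 4: row_count must not hide any of the provided rows.
        if row_count is not None and len(default_value) > row_count:
            errors.append(
                f"`default_value` has {len(default_value)} rows "
                f"but `row_count` is {row_count}."
            )

    return errors
```

Collecting every conflict instead of raising on the first one is a design choice made for this sketch; it would let a user fix all of their kwargs in one pass.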

proposed python API

Dataframe(
  headers=None,             # validated
  row_count=(3, "dynamic"), # validated
  col_count=(3, "dynamic"), # validated
  default_value=None,       # validated
  datatype="str", 
  label=None, 
  type="pandas", 
  optional=False
)

UX improvements

make cells easier to interact with

in #868 @osanseviero said:

Modifying the input of a cell requires double-clicking on it. I would love to be able to just click and add my input. There were also a couple of dev experience improvements I would love to see

I'm not certain about this.

The current behaviour mimics how most spreadsheets work, but users of spreadsheets frequently move around the spreadsheet before editing. Our dataframe is not a power tool but a quick user entry tool, so perhaps ease of data entry is more important than ease of cell navigation.

If we change click behaviour, we also need to change keyboard behaviour for parity of usability. Essentially this feature request is to remove the different 'states' from the dataframe, so that it is 'edit only' rather than having view/edit modes, as without a click triggering that state it would be impossible to get to. Static or output dataframes would still have this behaviour.

@omarespejel Could you add some more details about how you would like to interact with the dataframe? Not just click, but how would you like to change to a different cell, and how would that work for keyboard users who do not or cannot use a mouse?

better inputs when the datatype is given

in #868 @osanseviero said:

With using datatype="number", I would love if users can only write numbers and not strings. I know this modifies what is passed to the interface, but modifying the user input type as well would be great.

Currently everything is treated as a string by the frontend, even when we know the datatype. I think we can improve this significantly.

  • number fields could use a number input which will only allow number entry
  • boolean fields could use a toggle or checkbox
  • date fields could use a date input
  • string fields will use the current textbox functionality
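
One way to picture the list above is as a lookup from datatype to the kind of HTML input the frontend would render. This is only a sketch: the `"bool"` and `"date"` datatype names and the helper `input_type_for` are assumptions for illustration, not existing gradio behaviour.

```python
# Hypothetical mapping from a column's datatype to an HTML <input> type.
INPUT_TYPE_FOR_DATATYPE = {
    "str": "text",       # current textbox functionality
    "number": "number",  # only allows number entry
    "bool": "checkbox",  # assumed datatype name
    "date": "date",      # assumed datatype name
}


def input_type_for(datatype: str) -> str:
    """Fall back to the plain textbox for any datatype we do not recognise."""
    return INPUT_TYPE_FOR_DATATYPE.get(datatype, "text")
```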

But I think we can go further if we expanded the datatype kwarg.

  • enums/ unions would render a dropdown or autocompleting dropdown thing.
  • we could support prefix + suffixes for currencies + measurements. `datatype=[()]
  • it might be possible to support custom validators in the future as long as they are regex based.

Be good to get your thoughts @gary149

populate dataframe from file

@merveenoyan created issue #945 discussing uploading csv/tsv files into the dataframe.

I think this makes perfect sense. We are already doing this for the timeseries, so adapting it for the dataframe should be straightforward.
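
To make the idea concrete, here is a small sketch of turning uploaded csv/tsv text into headers plus rows. It uses only the Python standard library; the function name `table_from_delimited_text` is hypothetical, and treating the first line as the header row is an assumption rather than something decided in #945.

```python
import csv
import io
from typing import List, Tuple


def table_from_delimited_text(
    text: str, delimiter: str = ","
) -> Tuple[List[str], List[List[str]]]:
    """Split csv/tsv text into (headers, rows), using the first line as headers."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip completely empty lines
    if not rows:
        return [], []
    return rows[0], rows[1:]
```

All values come back as strings here; converting them according to `datatype` would be a separate step.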

@merveenoyan could you clarify the first part of that issue? Are you saying it would be good for the dataframe to accept different values in the python library (i.e. the default_value kwarg), or that it would be good to be able to modify what is displayed after a user uploads the file via the UI (i.e. only showing the first/last 5 rows, etc.)?

row + col creation and deletion

#631

Row and column creation and deletion needs some work. Deletion isn't currently possible.

Would love to get people's thoughts on what good creation and deletion might look like. Are there other datatables you have seen in the wild that do this well while remaining very compact?

Another for @gary149

Redesign

Just putting this here for posterity. Things are being redesigned.

bugs

We have bugs:

Issues relating to features for tracking purposes:


Let me know if I have missed anything and would be good to get people's thoughts on this.

cc @abidlabs @aliabid94 @dawoodkhan82 @aliabd @farukozderim
