col_max() that can be used with dplyr::mutate() #26

@michaelaoash

col_max() is a useful function: given a data frame with numeric columns, it returns a data frame with a new character column holding the name of the numeric column that has the largest value in each row. Although this is not documented, col_max() appears to ignore existing character columns when finding the maximum and to retain them in the returned data frame.

For example, given a dataset of the shares of heating types by Census tract with a character identifier of the tract, it might be useful to know the most prevalent heating type:

          GEOID    heat_util_gas    heat_electricity
1   25013810901       44.9122807           32.631579
2   25013811400       28.6046512           63.604651
3   25013800300       61.5200479            7.839617
4   25013801402       50.5154639           18.041237
5   25013812201       47.8042086           34.903934
> census_data %>% select(GEOID, heat_util_gas, heat_electricity) %>% tidyfst::col_max()
           GEOID    heat_util_gas    heat_electricity          max_col
  1: 25013810901         44.91228           32.631579    heat_util_gas
  2: 25013811400         28.60465           63.604651 heat_electricity
  3: 25013800300         61.52005            7.839617    heat_util_gas
  4: 25013801402         50.51546           18.041237    heat_util_gas
  5: 25013812201         47.80421           34.903934    heat_util_gas

(Electric heat is the most prevalent in observation 2; gas heat is the most prevalent in the other 4 Census tracts.)

It would be helpful if col_max() could be configured (1) to take a specified set of columns rather than all the numeric columns in the supplied data frame (for example, this dataset may contain additional columns describing features of the Census tract that have nothing to do with heat source); and (2) to return only the vector of column names identifying the maximum in each row, rather than the entire data frame with the new column appended. In effect, it would be helpful to be able to use col_max() inside mutate() on a specified set of numeric columns. I hope that I have expressed the issue clearly. Please follow up if I can clarify anything. Thanks.
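
To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not tidyfst code: max_col_name() is a hypothetical helper name, built on base R's max.col(), and the example assumes dplyr 1.1.0 or later for pick(). The data are the five rows shown above.

```r
library(dplyr)

# Hypothetical helper (not part of tidyfst): given a data frame of numeric
# columns, return a character vector naming the column that holds the
# largest value in each row. Ties go to the first such column.
max_col_name <- function(cols) {
  m <- as.matrix(cols)
  colnames(m)[max.col(m, ties.method = "first")]
}

# The five Census tracts from the example above.
census_data <- data.frame(
  GEOID = c("25013810901", "25013811400", "25013800300",
            "25013801402", "25013812201"),
  heat_util_gas = c(44.9122807, 28.6046512, 61.5200479,
                    50.5154639, 47.8042086),
  heat_electricity = c(32.631579, 63.604651, 7.839617,
                       18.041237, 34.903934)
)

# pick() hands only the named columns to the helper, so any other
# columns in the data frame are left out of the comparison.
result <- census_data %>%
  mutate(max_col = max_col_name(pick(heat_util_gas, heat_electricity)))

print(result)
```

With this sketch, max_col matches the column that col_max() appends in the output above, but the set of columns is chosen explicitly and the helper returns a plain vector that mutate() can assign to any name.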
