col_max() is a useful function: given a data frame with numeric columns, it returns a data frame with a new character column holding the name of the numeric column that has the largest value in each row. Although this is not documented, col_max() appears to ignore existing character columns when finding the maximum and to retain them in the returned data frame.
For example, given a dataset of the shares of heating types by Census tract with a character identifier of the tract, it might be useful to know the most prevalent heating type:
GEOID heat_util_gas heat_electricity
1 25013810901 44.9122807 32.631579
2 25013811400 28.6046512 63.604651
3 25013800300 61.5200479 7.839617
4 25013801402 50.5154639 18.041237
5 25013812201 47.8042086 34.903934
> census_data %>% select(GEOID, heat_util_gas, heat_electricity) %>% tidyfst::col_max()
GEOID heat_util_gas heat_electricity max_col
1: 25013810901 44.91228 32.631579 heat_util_gas
2: 25013811400 28.60465 63.604651 heat_electricity
3: 25013800300 61.52005 7.839617 heat_util_gas
4: 25013801402 50.51546 18.041237 heat_util_gas
5: 25013812201 47.80421 34.903934 heat_util_gas
(Electric heat is the most prevalent in observation 2; gas heat is the most prevalent in the other 4 Census tracts.)
It would be helpful if col_max() could (1) take a specified set of columns rather than all the numeric columns in the supplied data frame (for example, this dataset may have additional columns describing features of the Census tract that have nothing to do with heat source); and (2) return only the vector naming the column with the maximum value in each row, rather than the entire data frame with the new column appended. In effect, it would be helpful to be able to use col_max() inside mutate() on only a specified set of numeric columns. I hope that I have expressed the issue clearly. Please follow up if I can clarify. Thanks.
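To make the request concrete, here is a minimal sketch of the behaviour described above. It is not tidyfst code: `col_max_name()` is a hypothetical helper built on base R's `max.col()`, and the call assumes dplyr 1.1.0 or later for `pick()`. The data are the first three rows of the example above.

```r
library(dplyr)

# Hypothetical helper (not part of tidyfst): given a data frame of numeric
# columns, return a character vector naming the column with the largest
# value in each row. Ties go to the first column listed.
col_max_name <- function(cols) {
  m <- as.matrix(cols)
  colnames(m)[max.col(m, ties.method = "first")]
}

# First three rows of the example data shown earlier.
census_data <- data.frame(
  GEOID = c("25013810901", "25013811400", "25013800300"),
  heat_util_gas = c(44.9122807, 28.6046512, 61.5200479),
  heat_electricity = c(32.631579, 63.604651, 7.839617),
  stringsAsFactors = FALSE
)

# Use inside mutate() on an explicit set of columns; pick() (dplyr >= 1.1.0)
# hands the selected columns to the helper as a data frame.
result <- census_data %>%
  mutate(max_col = col_max_name(pick(heat_util_gas, heat_electricity)))

print(result)
```

The helper returns only the vector of names, so it can be assigned to any column name inside mutate(), and any other numeric columns in the data frame are left out of the comparison. Handling of missing values is not addressed in this sketch.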