Add a flagging callback to save JSON files to a Hugging Face dataset #1821
Hi @chrisemezue, This is very cool, two quick high-level questions:
I think the title of this PR is a bit misleading.
Thanks @osanseviero for your feedback
Hey @chrisemezue, thank you! I was a bit confused by the mentions of csvs, but now that you mention it's a
Please let us know whenever this is ready for review :)
@osanseviero I am done now. Ready for review.
osanseviero
left a comment
Thanks a lot for this! This is very cool! I left some minor comments 🤗
I would love to see an example output of using this flagging callback (a small dataset, since https://huggingface.co/datasets/chrisjay/crowd-speech-africa has too many files and does not load :( ).
cc @lhoestq
@osanseviero here is an example of a small dataset with this flagging callback.
Hi @chrisemezue this looks really good! I left some suggestions / clarification questions in the PR, but once these are addressed, we should be good to merge
Pushed some changes which should fix the tests. As discussed over Slack, we just have a couple of minor fixes, and then we should be good to merge!
Thanks so much @chrisemezue for making the PR and addressing the suggestions! And thanks all for reviewing. LGTM -- will merge in after the tests run
osanseviero
left a comment
This looks good! Thanks for working on this!
    for component in components:
        headers.append(component.label)
        headers.append(component.label)
This is repeated above, is that intended?
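To make the concern concrete, here is a minimal sketch of what the repeated line would do if both appends run inside the loop (an assumption, since the diff's indentation is not visible here). `Component` and `build_headers` are hypothetical stand-ins, not Gradio's own classes or functions:

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical stand-in for a UI component that carries a label."""
    label: str


def build_headers(components, repeat=False):
    """Collect one header per component; `repeat` mimics the duplicated line."""
    headers = []
    for component in components:
        headers.append(component.label)
        if repeat:
            # Same append a second time, as in the reviewed diff.
            headers.append(component.label)
    return headers


components = [Component("audio"), Component("text")]
duplicated = build_headers(components, repeat=True)
single = build_headers(components)
```

With the repeat, every label appears twice, so the header row would no longer line up with one value per component.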
I'll resolve the conflicts and fix the last few suggestions you made @osanseviero, so that we can get this merged in. Thanks a bunch @chrisemezue!
Description
Based on issue #1676, I have created the HuggingFaceDatasetJSONSaver class, which saves the files as JSONL.
Specifically, for each flagged sample:
folder_name.
folder_name
metadata.jsonl file inside the folder_name folder.
Advantages of this:
HuggingFaceDatasetSaver.
Closes: #1676
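As a rough illustration of the layout described above (one folder per flagged sample, with a metadata.jsonl file inside it), here is a standard-library sketch. It is not the code in this PR: `save_flagged_sample`, `load_flagged_samples`, the UUID-based folder naming, and the sample fields are all assumptions made for the example, and nothing is pushed to the Hub.

```python
import json
import tempfile
import uuid
from pathlib import Path


def save_flagged_sample(dataset_dir, sample):
    """Write one flagged sample into its own folder as a metadata.jsonl file."""
    folder_name = uuid.uuid4().hex  # hypothetical naming scheme
    folder = Path(dataset_dir) / folder_name
    folder.mkdir(parents=True)
    with open(folder / "metadata.jsonl", "w", encoding="utf-8") as f:
        # JSONL: one JSON object per line.
        f.write(json.dumps(sample) + "\n")
    return folder


def load_flagged_samples(dataset_dir):
    """Read every metadata.jsonl found one level below dataset_dir."""
    rows = []
    for path in sorted(Path(dataset_dir).glob("*/metadata.jsonl")):
        with open(path, encoding="utf-8") as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    return rows


dataset_dir = tempfile.mkdtemp()
save_flagged_sample(dataset_dir, {"text": "hello", "flag": "incorrect"})
save_flagged_sample(dataset_dir, {"text": "world", "flag": "offensive"})
samples = load_flagged_samples(dataset_dir)
```

Keeping each sample in its own folder means a new flag only adds files and never rewrites an existing one.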
Checklist: