parallel json parsing #33
Find dividers
merging from upstream
josiahseaman
left a comment
python -m pip install ray
Collecting ray
ERROR: Could not find a version that satisfies the requirement ray (from versions: none)
ERROR: No matching distribution found for ray
There's clearly a project here: https://pypi.org/project/ray/ and I've updated pip. Perhaps this is a PyPy-only distro? Could you please give me more info on ray install compatibility, and on whether we'll be able to build it into requirements.txt or some other automated solution?
|
The issue appears to be that wheels are only built for Linux and macOS, but I'm testing on a Windows machine. The lack of Windows support is understandable, but it should at least be noted that this would be Pantograph's first departure from cross-platform compatibility. Would you mind writing an OS switch and a local import instead?
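For illustration, an OS switch with a local import could look roughly like the sketch below. This is not the code from this PR: the function name pick_backend and the backend labels are hypothetical, and the fallback to the standard-library multiprocessing module is an assumption about what a portable alternative might be.

```python
import platform


def pick_backend(system=None):
    """Return the name of the parallel backend to use (hypothetical helper).

    `system` defaults to the current platform and is only a parameter so
    the switch can be exercised without changing machines.
    """
    system = system or platform.system()
    if system == "Windows":
        # No ray import is attempted on Windows at all.
        return "multiprocessing"
    try:
        # Local import: ray is only required on platforms that reach this line.
        import ray  # noqa: F401
    except ImportError:
        # ray is optional; fall back to the standard library if it is missing.
        return "multiprocessing"
    return "ray"


if __name__ == "__main__":
    print(pick_backend())
```

Keeping the import inside the function means a missing ray package never breaks module import, so requirements.txt could list ray as optional or platform-specific.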
|
Also consider using an OS-independent library (joblib?).
|
Do we really need support for Windows?
|
Please check now. The current change works on Linux and Windows.
|
Reading in the data on a 28-core machine is much faster now! Thanks @dimatr. Just one minor thing: how about a parameter that lets users specify the number of cores to use?
|
I will add a new parameter, --parallel-cores, with default os.cpu_count().
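A minimal sketch of how such a flag might be declared with argparse is shown below. Only the flag name --parallel-cores and the os.cpu_count() default come from the comment above; the build_parser helper, the help text, and the use of argparse itself are assumptions, not the PR's actual code.

```python
import argparse
import os


def build_parser():
    """Build a parser exposing only the --parallel-cores flag (hypothetical)."""
    parser = argparse.ArgumentParser(description="parallel-cores flag sketch")
    parser.add_argument(
        "--parallel-cores",
        type=int,
        # os.cpu_count() may return None on platforms where the count
        # cannot be determined; callers should handle that case.
        default=os.cpu_count(),
        help="number of cores to use for parallel JSON parsing",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse maps --parallel-cores to the attribute parallel_cores.
    print(args.parallel_cores)
```

Passing --parallel-cores 4 would then cap the worker count at four, while omitting the flag uses every core the OS reports.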
|
Should be ready now, please check.
On a 2.2 GB test this drops the JSONparser time from 3 min to 30 sec