parallel json parsing by dimatr · Pull Request #33 · graph-genome/component_segmentation

dimatr · 2020-04-08T18:10:38Z

On a 2.2 GB test file this drops the JSONparser time from 3 min to 30 sec.

Find dividers

merging from upstream

josiahseaman

python -m pip install ray
Collecting ray
ERROR: Could not find a version that satisfies the requirement ray (from versions: none)
ERROR: No matching distribution found for ray

There's clearly a package here: https://pypi.org/project/ray/ and I've updated pip. Perhaps this is a PyPy-only distro? Could you please give me more info on ray's install compatibility, and on whether we'll be able to build it into requirements.txt or some other automated solution?

josiahseaman · 2020-04-08T21:37:34Z

The issue appears to be that wheels are only built for Linux and macOS, but I'm testing on a Windows machine. The lack of Windows support makes sense really, but it should at least be noted that this would be Pantograph's first departure from cross-platform compatibility. Would you mind writing an OS switch and a local import instead?

https://ray.readthedocs.io/en/latest/installation.html
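For reference, the "OS switch and local import" idea could look roughly like the sketch below. It is not the code from this PR: the function name `pick_parallel_backend` and the `"serial"` fallback label are made up for illustration, and it only assumes that `ray` may be missing or not installable on Windows, as reported above.

```python
import platform


def pick_parallel_backend(system=None):
    """Return "ray" when it is usable on this OS, otherwise "serial".

    Hypothetical helper for illustration; `system` can be passed in
    explicitly to make the switch easy to test.
    """
    system = system or platform.system()
    if system == "Windows":
        # Per the thread, no ray wheels were available for Windows,
        # so skip the import attempt entirely.
        return "serial"
    try:
        # Local import: a missing ray only disables the fast path
        # instead of breaking the whole module at import time.
        import ray  # noqa: F401
    except ImportError:
        return "serial"
    return "ray"


if __name__ == "__main__":
    print(pick_parallel_backend())
```

Keeping the import inside the function means requirements.txt does not have to list ray as a hard dependency on every platform.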

lomereiter · 2020-04-09T08:03:30Z

Also consider using an OS-independent library (joblib?)
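A minimal sketch of what the joblib route might look like, assuming the input can be split into independent JSON records (one per line). That layout is an assumption for illustration; the actual file format and the parser code in this PR are not shown in the thread. `parse_chunk` and `parse_json_lines` are hypothetical names.

```python
import json

try:
    from joblib import Parallel, delayed
except ImportError:
    # joblib not installed: the sketch falls back to a plain loop.
    Parallel = None


def parse_chunk(lines):
    """Parse one batch of JSON records serially inside a worker."""
    return [json.loads(line) for line in lines]


def parse_json_lines(lines, n_jobs=2, chunk_size=1000):
    """Parse a list of JSON strings, spreading chunks over n_jobs workers."""
    # Batch the records so each task carries enough work to pay for
    # the cost of sending it to another process.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    if Parallel is None or n_jobs == 1:
        parsed = [parse_chunk(chunk) for chunk in chunks]
    else:
        # joblib returns results in the same order the tasks were submitted.
        parsed = Parallel(n_jobs=n_jobs)(delayed(parse_chunk)(chunk) for chunk in chunks)
    # Flatten the per-chunk lists back into one list of records.
    return [record for chunk in parsed for record in chunk]


if __name__ == "__main__":
    sample = ['{"id": %d}' % i for i in range(10)]
    print(parse_json_lines(sample, n_jobs=2, chunk_size=3))
```

joblib is pure Python and installs on Linux, macOS and Windows, which is the property the suggestion above is after.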

subwaystation · 2020-04-09T10:06:42Z

Do we really need support for Windows?
odgi, for example, has no Windows support either.
We will have our Docker pipeline, so as long as Docker is available, the whole thing will work.

dimatr · 2020-04-09T10:20:05Z

Please check now. The current change works on both Linux and Windows.

subwaystation · 2020-04-09T13:57:13Z

Reading in the data on a 28-core machine is much faster now! Thanks @dimatr.

Just one minor thing: How about a parameter where users can specify the number of cores to use?

dimatr · 2020-04-09T14:02:19Z

I will add one new parameter --parallel-cores with default os.cpu_count()
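As a rough illustration of the proposed option, an argparse definition might look like this. Only the flag name `--parallel-cores` and the `os.cpu_count()` default come from the comment above; `build_parser`, the help text, and the `or 1` guard are assumptions of this sketch.

```python
import argparse
import os


def build_parser():
    """Build a parser with the --parallel-cores option (illustrative only)."""
    parser = argparse.ArgumentParser(description="hypothetical CLI sketch")
    parser.add_argument(
        "--parallel-cores",
        dest="parallel_cores",
        type=int,
        # os.cpu_count() can return None on some platforms, so fall back to 1.
        default=os.cpu_count() or 1,
        help="number of cores to use for parallel JSON parsing",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.parallel_cores)
```

The parsed value could then be handed to whatever worker-count argument the parallel backend takes, for example `n_jobs` in joblib.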

dimatr · 2020-04-09T15:28:43Z

should be ready now, please check

dimatr added 4 commits April 7, 2020 17:13

Merge pull request #1 from lomereiter/find_dividers

ce3087a

Find dividers

Merge pull request #2 from graph-genome/master

ec95bbf

merging from upstream

parallel json parsing with Ray module

1988b71

ray module version

8b1679b

josiahseaman self-requested a review April 8, 2020 21:29

josiahseaman requested changes Apr 8, 2020

now with joblib module as a crossplatform solution

684a659

dimatr added 2 commits April 9, 2020 16:34

a new input parameter --parallel-cores to control the parallelization level

a459114

--parallel-cores default to os.cpu_count()

64e07b8

subwaystation merged commit 8bc49fb into graph-genome:master Apr 10, 2020

subwaystation merged 7 commits into graph-genome:master from dimatr:master

4 participants
