{devel,lib}[GCCcore/7.3.0,foss/2018b] TensorFlow v1.12.0, Bazel v0.18.0 w/ Python 3.6.6#7157

Merged
bartoldeman merged 6 commits into easybuilders:develop from
boegel:20181113163523_new_pr_TensorFlow1120
Nov 15, 2018

Conversation

@boegel
Member

@boegel boegel commented Nov 13, 2018

(created using eb --new-pr)

@boegel boegel added the update label Nov 13, 2018
@boegel
Member Author

boegel commented Nov 13, 2018

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node3152.skitty.os - Linux centos linux 7.5.1804, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 2.7.5
See https://gist.github.com/9dfe14c97bf9b3c08bf2ecee2a3cd643 for a full test report.

@boegel boegel changed the title {devel,lib}[GCCcore/7.3.0,foss/2018b] TensorFlow v1.12.0, Bazel v0.18.0 w/ Pyrthon 3.6.6 {devel,lib}[GCCcore/7.3.0,foss/2018b] TensorFlow v1.12.0, Bazel v0.18.0 w/ Python 3.6.6 Nov 13, 2018
@boegel boegel requested a review from bartoldeman November 13, 2018 16:10
@bartoldeman
Contributor

Test report by @bartoldeman
FAILED
Build succeeded for 4 out of 5 (2 easyconfigs in this PR)
lg-1r17-n06 - Linux centos 6.6, Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, Python 2.6.6
See https://gist.github.com/2866c5e8ba7c802f03daa19a455191b9 for a full test report.

@bartoldeman
Contributor

Test report by @bartoldeman
FAILED
Build succeeded for 0 out of 1 (2 easyconfigs in this PR)
lg-1r17-n06 - Linux centos 6.6, Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, Python 2.6.6
See https://gist.github.com/a3bdbc058907f69680424d070e5366af for a full test report.

@boegel
Member Author

boegel commented Nov 13, 2018

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node2709.swalot.os - Linux centos linux 7.5.1804, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz, Python 2.7.5
See https://gist.github.com/0e4f72a61cb3bcee95301d0a17297cb0 for a full test report.

@boegel
Member Author

boegel commented Nov 13, 2018

@migueldiascosta Up for testing this too on CentOS 6?

@migueldiascosta
Member

Test report by @migueldiascosta
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in this PR)
grc-cluster1 - Linux centos 6.10, Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz, Python 2.7.14
See https://gist.github.com/c90021b9166bc4875f951fef23b45551 for a full test report.

@migueldiascosta
Member

I can get further by adding a few more -lrt flags [1], but then it still fails with

  /opt/apps/util/easybuild/software/OpenMPI/3.1.1-GCC-7.3.0-2.30/bin/mpicc -o bazel-out/k8-opt/bin/external/protobuf_archive/protoc '-fuse-ld=gold' -Wl,-no-as-needed -Wl,-z,relro,-z,now -B/opt/apps/util/easybuild/software/OpenMPI/3.1.1-GCC-7.3.0-2.30/bin -B/opt/apps/util/easybuild/software/binutils/2.30-GCCcore-7.3.0/bin -pass-exit-codes -Wl,--gc-sections -Wl,@bazel-out/k8-opt/bin/external/protobuf_archive/protoc-2.params)
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build

[1] https://gist.github.com/migueldiascosta/415e946223e9d5154e91a1c0388bfb79

@boegel
Member Author

boegel commented Nov 14, 2018

@migueldiascosta Can you upload a full log for that last attempt?

@migueldiascosta
Member

@boegel
Member Author

boegel commented Nov 14, 2018

The only additional info I could find is this:

ERROR: /dev/shm/tmp/eb-mzdivg/tmp6DAelR-bazel-build/external/protobuf_archive/BUILD:373:1: Linking of rule '@protobuf_archive//:protoc' failed (Exit 1): mpicc failed: error executing command 

Does that happen to ring any bells for you @akesandgren?

@akesandgren
Contributor

Out of space?

I never build in /dev/shm, these things just take way too much space...

@migueldiascosta
Member

@akesandgren you're right (:man_facepalming:)

@boegel with the additional -lrt flags it builds. I'm using your pip_check branch and it flagged this:

tensorflow 1.12.0 requires wheel, which is not installed.
tensorboard 1.12.0 requires wheel, which is not installed.
keras-applications 1.0.6 requires h5py, which is not installed.
tensorflow 1.12.0 has requirement protobuf>=3.6.1, but you have protobuf 3.6.0.

I suppose wheel should be a dependency instead of builddependency (?), protobuf should be bumped up and h5py added.
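The four lines flagged above fall into two groups: packages that are missing entirely, and packages installed in a version that does not satisfy a requirement. A minimal Python sketch of sorting such a report into those two groups follows. The regular expressions are assumptions based only on the message wording quoted in this comment, and `classify_pip_check` is a hypothetical helper, not part of pip or EasyBuild.

```python
import re

# Patterns derived from the message wording quoted above (an assumption,
# not a documented pip output format).
MISSING = re.compile(r"^(\S+) (\S+) requires (\S+), which is not installed\.$")
CONFLICT = re.compile(r"^(\S+) (\S+) has requirement (\S+), but you have (\S+) (\S+)\.$")


def classify_pip_check(output):
    """Split a dependency report into missing packages and version conflicts."""
    missing, conflicts = [], []
    for line in output.splitlines():
        line = line.strip()
        match = MISSING.match(line)
        if match:
            # (package that needs it, package that is missing)
            missing.append((match.group(1), match.group(3)))
            continue
        match = CONFLICT.match(line)
        if match:
            # (package that needs it, requirement, installed version)
            conflicts.append((match.group(1), match.group(3), match.group(5)))
    return missing, conflicts


report = """\
tensorflow 1.12.0 requires wheel, which is not installed.
tensorboard 1.12.0 requires wheel, which is not installed.
keras-applications 1.0.6 requires h5py, which is not installed.
tensorflow 1.12.0 has requirement protobuf>=3.6.1, but you have protobuf 3.6.0.
"""
missing, conflicts = classify_pip_check(report)
```

Here `missing` lists wheel (twice) and h5py, and `conflicts` holds the single protobuf entry, which matches the three fixes suggested in the comment.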

@boegel
Member Author

boegel commented Nov 15, 2018

@akesandgren Is there any way we can make it more clear that the TF build may have failed because of a lack of disk space?

@akesandgren
Contributor

akesandgren commented Nov 15, 2018

Not really, it's a generic problem though. Any build may fail because of OOS but detecting it correctly from a build system is basically impossible. You could perhaps do it for CMake/ConfigureMake by scanning the logs for certain messages, but generally, and bazel specifically, not a chance.

(And no, looking at the df output of the builddir/tmpdir etc doesn't work since something else that was using the same space may have removed files from there before you look)

@boegel
Member Author

boegel commented Nov 15, 2018

@akesandgren I know it's a general problem, but usually there's a pretty clear error...

How about looking at the output of df -h %(builddir)s when building the TF wheel failed?

@boegel
Member Author

boegel commented Nov 15, 2018

@migueldiascosta Patch updated, problems with required Python packages fixed.

The dependency variants check is going to fail now though, since we now have two versions of protobuf & protobuf-python in the 2018b easyconfigs... :(

@migueldiascosta
Member

@boegel your commit only has the new protobuf easyconfigs, did you forget to add the TF easyconfig and the lrt patch?

…obuf-pythion version to required 3.6.1, move wheel to runtime deps
@boegel boegel force-pushed the 20181113163523_new_pr_TensorFlow1120 branch from e76d70e to 57d4b0f Compare November 15, 2018 08:50
@boegel
Member Author

boegel commented Nov 15, 2018

@migueldiascosta Woops! Fixed with a git push --force...

@akesandgren
Contributor

@boegel As I said, checking df -h output won't help if something else removed files.
But if it is indeed still at 100% then it is a good indication of problems.
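The idea discussed here, reporting disk usage of the build directory after a failed build, could look roughly like the Python sketch below. It is only an illustration: `describe_disk_usage` and the 95% threshold are hypothetical, it assumes Python 3 (`shutil.disk_usage` is not available in Python 2), and it is not what EasyBuild does. As noted above, a low usage figure proves nothing, since space may have been freed between the failure and the check; only a filesystem that is still nearly full is a useful hint.

```python
import shutil


def describe_disk_usage(path, warn_fraction=0.95):
    """Return a one-line summary of disk usage for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    msg = "%s: %.1f%% used, %.1f GiB free" % (path, 100 * used_fraction, usage.free / 2**30)
    if used_fraction >= warn_fraction:
        # Hint only: a full filesystem suggests, but does not prove, the cause.
        msg += " (nearly full: the failure may be due to lack of disk space)"
    return msg


summary = describe_disk_usage(".")
```

A caller would pass the build directory (for example the location behind `%(builddir)s`) and append the summary to the error message shown when building the wheel fails.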

@boegel
Member Author

boegel commented Nov 15, 2018

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in this PR)
node2424.golett.os - Linux centos linux 7.5.1804, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, Python 2.7.5
See https://gist.github.com/8193348addf3760c3c67f8077b6253af for a full test report.

@boegelbot
Collaborator

Travis test report: 7/7 runs failed - see https://travis-ci.org/easybuilders/easybuild-easyconfigs/builds/455391107

Only showing partial log for 1st failed test suite run 11716.1;
full log at https://travis-ci.org/easybuilders/easybuild-easyconfigs/jobs/455391108

...
FAIL: test_dep_versions_per_toolchain_generation (test.easyconfigs.easyconfigs.EasyConfigTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/easybuilders/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 299, in test_dep_versions_per_toolchain_generation
    self.assertFalse(multi_dep_vars, error_msg)
AssertionError: No multi-variant deps found for '^.*-(?P<tc_gen>201[89][ab]).*\.eb$' easyconfigs:

found 2 variants of 'protobuf' dependency in easyconfigs using '2018b' toolchain generation
* version: 3.6.0; versionsuffix:  as dep for set(['protobuf-python-3.6.0-foss-2018b-Python-3.6.6.eb', 'TensorFlow-1.11.0-foss-2018b-Python-3.6.6.eb', 'Keras-2.2.2-fosscuda-2018b-Python-2.7.15.eb', 'TensorFlow-1.10.1-fosscuda-2018b-Python-2.7.15.eb', 'protobuf-python-3.6.0-fosscuda-2018b-Python-2.7.15.eb', 'TensorFlow-1.10.1-foss-2018b-Python-3.6.6.eb', 'TensorFlow-1.10.0-fosscuda-2018b-Python-2.7.15.eb'])
* version: 3.6.1; versionsuffix:  as dep for set(['protobuf-python-3.6.1-foss-2018b-Python-3.6.6.eb', 'TensorFlow-1.12.0-foss-2018b-Python-3.6.6.eb'])

found 3 variants of 'protobuf-python' dependency in easyconfigs using '2018b' toolchain generation
* version: 3.6.0; versionsuffix: -Python-2.7.15 as dep for set(['TensorFlow-1.10.0-fosscuda-2018b-Python-2.7.15.eb', 'Keras-2.2.2-fosscuda-2018b-Python-2.7.15.eb', 'TensorFlow-1.10.1-fosscuda-2018b-Python-2.7.15.eb'])
* version: 3.6.0; versionsuffix: -Python-3.6.6 as dep for set(['TensorFlow-1.10.1-foss-2018b-Python-3.6.6.eb', 'TensorFlow-1.11.0-foss-2018b-Python-3.6.6.eb'])
* version: 3.6.1; versionsuffix: -Python-3.6.6 as dep for set(['TensorFlow-1.12.0-foss-2018b-Python-3.6.6.eb'])


----------------------------------------------------------------------
Ran 9950 tests in 728.746s

FAILED (failures=1)
ERROR: Not all tests were successful.

(bleep, bloop, I'm just a bot, please talk to my owner @boegel if you notice me acting stupid)

@migueldiascosta
Member

Test report by @migueldiascosta
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in this PR)
grc-cluster1 - Linux centos 6.10, Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz, Python 2.7.14
See https://gist.github.com/1b64e017b2ac4330b97fb82f3404df22 for a full test report.

@boegel
Member Author

boegel commented Nov 15, 2018

As expected, the dep variant check fails... Thoughts on how to deal with that @akesandgren, @migueldiascosta?

Would it be acceptable in this case to bypass the check for protobuf(-python) in case they're a dependency of TensorFlow?
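To make the proposal concrete, here is a toy Python sketch of a multi-variant dependency check with such a bypass. It is not the code from `test/easyconfigs/easyconfigs.py` and not necessarily what the follow-up commit does: `multi_variant_deps`, the `exempt_prefix` parameter, and the rule "ignore a variant when every easyconfig using it is a TensorFlow easyconfig" are assumptions made for illustration. The input is a small subset of the easyconfigs named in the Travis report above.

```python
from collections import defaultdict


def multi_variant_deps(ec_deps, exempt_prefix=None):
    """Return dependencies that occur in more than one (version, versionsuffix) variant.

    ec_deps maps an easyconfig file name to a list of
    (dep_name, version, versionsuffix) tuples.
    """
    variants = defaultdict(lambda: defaultdict(set))
    for ec_name, deps in ec_deps.items():
        for dep_name, version, versionsuffix in deps:
            variants[dep_name][(version, versionsuffix)].add(ec_name)

    result = {}
    for dep_name, by_variant in variants.items():
        kept = {}
        for variant, users in by_variant.items():
            # Hypothetical bypass: skip variants only used by exempt easyconfigs.
            if exempt_prefix and all(u.startswith(exempt_prefix) for u in users):
                continue
            kept[variant] = users
        if len(kept) > 1:
            result[dep_name] = kept
    return result


ec_deps = {
    "TensorFlow-1.10.1-fosscuda-2018b-Python-2.7.15.eb": [("protobuf-python", "3.6.0", "-Python-2.7.15")],
    "Keras-2.2.2-fosscuda-2018b-Python-2.7.15.eb": [("protobuf-python", "3.6.0", "-Python-2.7.15")],
    "TensorFlow-1.11.0-foss-2018b-Python-3.6.6.eb": [("protobuf-python", "3.6.0", "-Python-3.6.6")],
    "TensorFlow-1.12.0-foss-2018b-Python-3.6.6.eb": [("protobuf-python", "3.6.1", "-Python-3.6.6")],
}
strict = multi_variant_deps(ec_deps)
relaxed = multi_variant_deps(ec_deps, exempt_prefix="TensorFlow-")
```

Without the bypass, `strict` reports three variants of protobuf-python, as in the failing test. With `exempt_prefix="TensorFlow-"`, only the variant that Keras also uses remains, so `relaxed` is empty and the check would pass.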

@akesandgren
Contributor

I'd say so.

@boegel
Member Author

boegel commented Nov 15, 2018

@akesandgren Thoughts on the changes in the last commit? (c2bc9b5)

@bartoldeman
Contributor

Test report by @bartoldeman
SUCCESS
Build succeeded for 5 out of 5 (4 easyconfigs in this PR)
lg-1r17-n06 - Linux centos 6.6, Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, Python 2.6.6
See https://gist.github.com/9b47fedc292a165e5195abd8dc349c24 for a full test report.

@boegel
Member Author

boegel commented Nov 15, 2018

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
node3146.skitty.os - Linux centos linux 7.5.1804, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 2.7.5
See https://gist.github.com/ae0dac537483e8b8f89ae9be0007b209 for a full test report.

@akesandgren
Contributor

I think it looks ok.

@bartoldeman
Contributor

Test report by @bartoldeman
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
lg-1r17-n06 - Linux centos 6.6, Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, Python 2.6.6
See https://gist.github.com/023dc6ccf755899f72f95c15454fced6 for a full test report.

Contributor

@bartoldeman bartoldeman left a comment

I'm not sure if %(name)s should be used religiously. I'm happy either way, since it's %(version)s that saves editing usually, not %(name)s.
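The trade-off can be illustrated with plain Python dictionary-based string formatting, which uses the same `%(key)s` syntax as easyconfig templates. The file name patterns below are hypothetical and are not taken from the Bazel easyconfig in this PR.

```python
# Hypothetical template values and file name patterns, for illustration only.
template_values = {"name": "Bazel", "version": "0.18.0"}

# Name hardcoded, version templated: only the version needs no editing on an update.
hardcoded_name = "bazel-%(version)s-dist.zip" % template_values

# Both templated: nothing to edit, but the upstream file name must match the
# software name exactly, including its case.
fully_templated = "%(name)s-%(version)s.tar.gz" % template_values
```

When a version bump is the usual change, `%(version)s` does most of the work. `%(name)s` only helps where the upstream name matches character for character, which is one reason it need not be applied everywhere.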

Comment thread easybuild/easyconfigs/b/Bazel/Bazel-0.18.0-GCCcore-7.3.0.eb Outdated
@boegel
Member Author

boegel commented Nov 15, 2018

@bartoldeman I'd like to see a test report from @migueldiascosta too before we merge this in.

@migueldiascosta
Member

Test report by @migueldiascosta
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
grc-cluster1 - Linux centos 6.10, Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz, Python 2.7.14
See https://gist.github.com/c7d9cf14285fa67091d83134fea2d1e0 for a full test report.

@boegel
Member Author

boegel commented Nov 15, 2018

That was quick @migueldiascosta! 👍 ;)

Contributor

@bartoldeman bartoldeman left a comment

lgtm

@bartoldeman bartoldeman added this to the 3.8.0 milestone Nov 15, 2018
@bartoldeman
Contributor

Test report by @bartoldeman
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
lg-1r17-n06 - Linux centos 6.6, Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, Python 2.6.6
See https://gist.github.com/bec8f8bdfcc8fbb2e3d8968300568b39 for a full test report.

@bartoldeman
Contributor

Going in, thanks @boegel!

@bartoldeman bartoldeman merged commit 2bfc573 into easybuilders:develop Nov 15, 2018
@boegel
Member Author

boegel commented Nov 15, 2018

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
node2607.swalot.os - Linux centos linux 7.5.1804, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz, Python 2.7.5
See https://gist.github.com/1d7ed2c1e826caab264bc03d09228dec for a full test report.

@boegel boegel deleted the 20181113163523_new_pr_TensorFlow1120 branch November 15, 2018 19:43
@boegel
Member Author

boegel commented Nov 16, 2018

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
node2109.delcatty.os - Linux centos linux 7.5.1804, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, Python 2.7.5
See https://gist.github.com/2c2b96ca3d044b1611ed960c274af869 for a full test report.

5 participants