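There is a bug: when the device argument does not start from 0, for example 2,3 or 1,2, the program fails with AssertionError: Invalid device id. The cause is that line 75 sets, for the whole process, which GPUs are visible: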
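os.environ["CUDA_VISIBLE_DEVICES"] = args.device  # selects which GPUs this program may use
The multi-GPU DataParallel model is then created like this:
model = DataParallel(model, device_ids=[int(i) for i in args.device.split(',')])
The problem is that device_ids receives the physical GPU IDs. Once CUDA_VISIBLE_DEVICES is set, the visible GPUs are renumbered starting from 0, so the IDs passed to DataParallel no longer match the IDs the process actually sees.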
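The renumbering can be illustrated without a GPU. The sketch below is plain Python with hypothetical helper names (visible_device_map, check_device_ids); it assumes CUDA_VISIBLE_DEVICES holds a comma-separated list of integers, and it only mimics the kind of range check that raises "Invalid device id". It is not PyTorch's actual code.

```python
from typing import Dict, Iterable


def visible_device_map(cuda_visible_devices: str) -> Dict[int, int]:
    """Map logical CUDA IDs to physical GPU IDs for a CUDA_VISIBLE_DEVICES value.

    Assumes a comma-separated list of integers such as "1,2"; the real
    variable also accepts GPU UUIDs, which this sketch does not handle.
    """
    physical = [int(s) for s in cuda_visible_devices.split(",") if s.strip()]
    # The process sees the listed GPUs renumbered 0, 1, 2, ... in list order.
    return {logical: phys for logical, phys in enumerate(physical)}


def check_device_ids(device_ids: Iterable[int], cuda_visible_devices: str) -> None:
    """Illustrative range check: every ID must be a valid *logical* ID."""
    count = len(visible_device_map(cuda_visible_devices))
    for d in device_ids:
        if not 0 <= d < count:
            raise AssertionError("Invalid device id")


if __name__ == "__main__":
    print(visible_device_map("1,2"))  # {0: 1, 1: 2}
    check_device_ids([0, 1], "1,2")   # logical IDs: passes
    try:
        check_device_ids([1, 2], "1,2")  # physical IDs, as in the original code
    except AssertionError as e:
        print(e)  # Invalid device id
```

With "1,2" only logical IDs 0 and 1 exist, so passing [1, 2] fails on ID 2.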
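Take two GPU IDs as an example:
With 0,1, the environment variable exposes two GPUs whose IDs inside the process are 0 and 1, so everything works.
With 1,2, the environment variable also exposes two GPUs, but inside the process their IDs are again 0 and 1. If DataParallel still receives device_ids 1,2, it fails with AssertionError: Invalid device id, because there is no device 2.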
Solution: change the DataParallel line from
model = DataParallel(model, device_ids=[int(i) for i in args.device.split(',')])
to
model = DataParallel(model, device_ids=list(range(len(args.device.split(',')))))
That is, build the ID list starting from 0, based on how many GPUs were passed in.
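The same fix can be wrapped in a small helper. This is a minimal sketch, assuming args.device is a comma-separated list of integer GPU IDs; dataparallel_device_ids is a hypothetical name, and the DataParallel call is shown only as a comment so the snippet runs without PyTorch installed.

```python
import os
from typing import List


def dataparallel_device_ids(device_arg: str) -> List[int]:
    """Logical device IDs to pass to DataParallel after
    CUDA_VISIBLE_DEVICES has been set to device_arg.

    The visible GPUs are renumbered from 0, so only their count matters:
    "1,2" and "2,3" both give [0, 1].
    """
    count = len([s for s in device_arg.split(",") if s.strip()])
    return list(range(count))


if __name__ == "__main__":
    device_arg = "1,2"  # hypothetical value of args.device
    # Set this before anything initializes CUDA, otherwise it has no effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = device_arg
    print(dataparallel_device_ids(device_arg))  # [0, 1]
    # With PyTorch available, the model would then be wrapped as:
    # model = DataParallel(model, device_ids=dataparallel_device_ids(device_arg))
```

DataParallel expects the module's parameters to be on device_ids[0], so the model would be moved to cuda:0 first; in the "1,2" case that is physical GPU 1. Since torch.cuda.device_count() only counts the visible GPUs, list(range(torch.cuda.device_count())) should give the same list.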