Table of Contents
1. Installation
2. Training Setup
    Step 1: Argument Parsing
    Step 2: Initializing the Distributed Backend
    Step 3: Training Initialization
3. Training Code

This is copied more or less straight from the official documentation, kept here as a note:
https://deepspeed.readthedocs.io/en/latest/initialize.html
The usage examples come from: https://github.com/OvJat/DeepSpeedTutorial

The pain point of large-model training is that the models are simply too big, often tens of billions of parameters, so training on a single GPU is essentially impossible. Multi-GPU or distributed training is needed to get the job done.
DeepSpeed is a distributed training library from Microsoft, built to support larger models and to provide more optimization strategies and tooling. For training larger models, DeepSpeed offers additional strategies such as ZeRO and Offload.
This post is a quick walkthrough of how to use DeepSpeed.

1. Installation
pip install deepspeed

2. Training Setup
Step 1: Argument Parsing
DeepSpeed uses argparse to pick up settings from the command line. Calling deepspeed.add_config_arguments() adds DeepSpeed's built-in arguments to your own application's argument parser:
import argparse

import deepspeed

parser = argparse.ArgumentParser(description="My training script.")
parser.add_argument("--local_rank", type=int, default=-1,
                    help="local rank passed from distributed launcher")
# Include DeepSpeed configuration arguments
parser = deepspeed.add_config_arguments(parser)
cmd_args = parser.parse_args()
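As a quick sanity check (this snippet is mine, not from the docs or the tutorial), after add_config_arguments() the parser understands DeepSpeed's own flags in addition to --local_rank, so the parsed namespace carries both:

# Launched e.g. as: deepspeed my_script.py --deepspeed --deepspeed_config ds_config.json
# (the script and config file names are just placeholders)
print(cmd_args.local_rank)        # filled in per process by the deepspeed launcher
print(cmd_args.deepspeed)         # True -- flag contributed by add_config_arguments()
print(cmd_args.deepspeed_config)  # "ds_config.json" -- also contributed by add_config_arguments()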
Step 2: Initializing the Distributed Backend

Unlike deepspeed.initialize() in Step 3, this is just a direct call with nothing else to set up. It is typically only needed in the following scenarios: when using model parallelism, pipeline parallelism, or certain data loader scenarios. Call it before the initialize() of Step 3:

deepspeed.init_distributed()
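A minimal sketch (my own, not from the tutorial): once deepspeed.init_distributed() has run, the regular torch.distributed primitives are available, which is useful for rank-dependent setup that has to happen before the engine exists.

import torch
import deepspeed

# Bring up the distributed backend early (defaults to NCCL) so that
# rank/world-size information is available before deepspeed.initialize().
deepspeed.init_distributed()

rank = torch.distributed.get_rank()
world_size = torch.distributed.get_world_size()
if rank == 0:
    print(f"distributed backend ready: {world_size} processes")

# ...build model-parallel groups or rank-aware data loaders here,
# then call deepspeed.initialize() as shown in Step 3.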
Step 3: Training Initialization

First, call deepspeed.initialize(); it is the entry point for everything DeepSpeed does during training. If the distributed backend has not been initialized yet, this call will initialize it. Usage example:
model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args,
                                                     model=net,
                                                     model_parameters=net.parameters(),
                                                     training_data=ds)

The API is as follows:
def initialize(args=None,
               model: torch.nn.Module = None,
               optimizer: Optional[Union[Optimizer, DeepSpeedOptimizerCallable]] = None,
               model_parameters: Optional[torch.nn.Module] = None,
               training_data: Optional[torch.utils.data.Dataset] = None,
               lr_scheduler: Optional[Union[_LRScheduler, DeepSpeedSchedulerCallable]] = None,
               distributed_port: int = TORCH_DISTRIBUTED_DEFAULT_PORT,
               mpu=None,
               dist_init_required: Optional[bool] = None,
               collate_fn=None,
               config=None,
               config_params=None):
    """Initialize the DeepSpeed Engine.

    Arguments:
        args: an object containing local_rank and deepspeed_config fields.
            This is optional if config is passed.
        model: Required: nn.module class before apply any wrappers
        optimizer: Optional: a user defined Optimizer or Callable that returns an Optimizer object.
            This overrides any optimizer definition in the DeepSpeed json config.
        model_parameters: Optional: An iterable of torch.Tensors or dicts.
            Specifies what Tensors should be optimized.
        training_data: Optional: Dataset of type torch.utils.data.Dataset
        lr_scheduler: Optional: Learning Rate Scheduler Object or a Callable that takes an Optimizer
            and returns a Scheduler object. The scheduler object should define a get_lr(), step(),
            state_dict(), and load_state_dict() methods
        distributed_port: Optional: Master node (rank 0)'s free port that needs to be used for
            communication during distributed training
        mpu: Optional: A model parallelism unit object that implements
            get_{model,data}_parallel_{rank,group,world_size}()
        dist_init_required: Optional: None will auto-initialize torch distributed if needed,
            otherwise the user can force it to be initialized or not via boolean.
        collate_fn: Optional: Merges a list of samples to form a
            mini-batch of Tensor(s). Used when using batched loading from a
            map-style dataset.
        config: Optional: Instead of requiring args.deepspeed_config you can pass your deepspeed config
            as an argument instead, as a path or a dictionary.
        config_params: Optional: Same as config, kept for backwards compatibility.

    Returns:
        A tuple of engine, optimizer, training_dataloader, lr_scheduler

        * engine: DeepSpeed runtime engine which wraps the client model for distributed training.
        * optimizer: Wrapped optimizer if a user defined optimizer is supplied, or if
          optimizer is specified in json config else None.
        * training_dataloader: DeepSpeed dataloader if training_data was supplied,
          otherwise None.
        * lr_scheduler: Wrapped lr scheduler if user lr_scheduler is passed, or
          if lr_scheduler specified in JSON configuration. Otherwise None.
    """
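Since config accepts either a path or a dictionary, the JSON file can also be skipped and the configuration passed inline. The sketch below illustrates this; the concrete values (batch size, Adam, ZeRO stage 2) are illustrative assumptions rather than settings from this post or the tutorial repo, and net / ds are the same model and dataset placeholders as in the snippet above.

import deepspeed

ds_config = {
    "train_batch_size": 16,             # global batch = micro batch * grad accumulation * world size
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},          # mixed-precision training
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4},
    },
    "zero_optimization": {"stage": 2},  # partition optimizer states and gradients across ranks
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=net,
    model_parameters=net.parameters(),
    training_data=ds,
    config=ds_config,
)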
3. Training Code

import logging
import time

import numpy as np
import torch
import torch.nn.functional as tnf

import deepspeed

# MyClassifier and MyDataset are defined in the tutorial repo:
# https://github.com/OvJat/DeepSpeedTutorial


def parse_arguments():
    import argparse
    parser = argparse.ArgumentParser(description="deepspeed training script.")
    parser.add_argument("--local_rank", type=int, default=-1,
                        help="local rank passed from distributed launcher")
    # Include DeepSpeed configuration arguments
    parser = deepspeed.add_config_arguments(parser)
    args = parser.parse_args()
    return args


def train():
    args = parse_arguments()

    # init distributed
    deepspeed.init_distributed()

    # init model
    model = MyClassifier(3, 100, ch_multi=128)

    # init dataset
    ds = MyDataset((3, 512, 512), 100, sample_count=int(1e6))

    # init engine
    engine, optimizer, training_dataloader, lr_scheduler = deepspeed.initialize(
        args=args,
        model=model,
        model_parameters=model.parameters(),
        training_data=ds,
        # config=deepspeed_config,
    )

    # load checkpoint
    engine.load_checkpoint("./data/checkpoints/MyClassifier/")

    # train
    last_time = time.time()
    loss_list = []
    echo_interval = 10

    engine.train()

    for step, (xx, yy) in enumerate(training_dataloader):
        step += 1
        xx = xx.to(device=engine.device, dtype=torch.float16)
        yy = yy.to(device=engine.device, dtype=torch.long).reshape(-1)

        outputs = engine(xx)
        loss = tnf.cross_entropy(outputs, yy)
        engine.backward(loss)
        engine.step()

        loss_list.append(loss.detach().cpu().numpy())
        if step % echo_interval == 0:
            loss_avg = np.mean(loss_list[-echo_interval:])
            used_time = time.time() - last_time
            time_p_step = used_time / echo_interval
            if args.local_rank == 0:
                logging.info(
                    "[Train Step] Step:{:10d}  Loss:{:8.4f} | Time/Batch: {:6.4f}s".format(
                        step, loss_avg, time_p_step,
                    )
                )
            last_time = time.time()

    # save checkpoint
    engine.save_checkpoint("./data/checkpoints/MyClassifier/")
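For completeness: such a script is normally started through the deepspeed launcher, which spawns one process per GPU and passes --local_rank to each process. The file names train.py and ds_config.json below are assumptions for illustration.

deepspeed --num_gpus=2 train.py --deepspeed --deepspeed_config ds_config.json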
Finally ~ writing this up wasn't easy ~~ and shared joy beats solitary joy ~~ if it helped, likes and bookmarks are very welcome ~~