lightning¶
Lightning utilities.
- cneuromax.fitting.deeplearning.utils.lightning.instantiate_trainer(trainer_partial, logger_partial, device, output_dir, save_every_n_train_steps)[source]¶
Instantiates trainer_partial.
- Parameters:
  - trainer_partial (partial[Trainer])
  - logger_partial (partial[WandbLogger])
  - output_dir (str) – See output_dir.
  - save_every_n_train_steps (int | None) – See save_every_n_train_steps.
- Return type: Trainer
- Returns: A lightning.pytorch.Trainer instance.
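A minimal usage sketch is shown below; the partial arguments, device string, and paths are illustrative placeholders rather than values taken from the library's configs.

```python
from functools import partial

from lightning.pytorch import Trainer
from lightning.pytorch.loggers.wandb import WandbLogger

from cneuromax.fitting.deeplearning.utils.lightning import instantiate_trainer

# Hypothetical arguments; in practice these come from the Hydra config.
trainer = instantiate_trainer(
    trainer_partial=partial(Trainer, max_epochs=10),
    logger_partial=partial(WandbLogger, project="my_project"),
    device="gpu",  # assumed device string; check your config
    output_dir="outputs/run_0",  # placeholder path
    save_every_n_train_steps=1000,
)
```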
- cneuromax.fitting.deeplearning.utils.lightning.set_batch_size_and_num_workers(trainer, datamodule, litmodule, device, output_dir)[source]¶
Sets attribute values for a BaseDataModule.
See find_good_per_device_batch_size() and find_good_per_device_num_workers() for more details on how these variables' values are determined.
- Parameters:
  - output_dir (str) – See output_dir.
- Return type: None
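The sketch below shows one way this helper might be called once a trainer, datamodule, and litmodule exist; the wrapper function, device string, and output path are hypothetical placeholders.

```python
from lightning.pytorch import Trainer

from cneuromax.fitting.deeplearning.utils.lightning import (
    set_batch_size_and_num_workers,
)


def tune_datamodule(trainer: Trainer, datamodule, litmodule) -> None:
    """Illustrative wrapper: `datamodule` and `litmodule` stand in for your
    own BaseDataModule / BaseLitModule instances (hypothetical names)."""
    set_batch_size_and_num_workers(
        trainer=trainer,
        datamodule=datamodule,
        litmodule=litmodule,
        device="gpu",  # assumed device string; check your config
        output_dir="outputs/run_0",  # placeholder path
    )
    # After the call, the probed batch size and num_workers values are
    # stored as attributes on `datamodule`.
```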
- cneuromax.fitting.deeplearning.utils.lightning.find_good_per_device_batch_size(litmodule, datamodule, device, device_ids, output_dir)[source]¶
Probes a per_device_batch_size value.
This functionality makes the following, not always correct, but generally reasonable assumptions:
- As long as the total_batch_size / dataset_size ratio remains small (e.g. < 0.01, so as to benefit from the stochasticity of gradient updates), running the same number of gradient updates with a larger batch size will yield better training performance than running the same number of gradient updates with a smaller batch size.
- Loading data from disk to RAM is a larger bottleneck than loading data from RAM to GPU VRAM.
- If you are training on multiple GPUs, each GPU has roughly the same amount of VRAM.
- Parameters:
  - device_ids (list[int]) – See lightning.pytorch.Trainer.device_ids.
  - output_dir (str) – See output_dir.
- Return type: int
- Returns: A roughly optimal per_device_batch_size value.
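The entry above states the assumptions but not the search procedure itself. The sketch below shows one simple procedure those assumptions justify: keep doubling the per-device batch size until memory runs out or the total_batch_size / dataset_size ratio exceeds the small-ratio cap. The function name and the `try_batch_size` callable are hypothetical; this is not the library's actual implementation.

```python
import torch


def probe_per_device_batch_size(
    try_batch_size,  # hypothetical callable: runs a few steps, raises on OOM
    dataset_size: int,
    num_devices: int,
    max_ratio: float = 0.01,
) -> int:
    """Illustrative only: doubles the batch size until OOM or the ratio cap."""
    batch_size, best = 1, 1  # falls back to 1 if even that fails
    while batch_size * num_devices <= max_ratio * dataset_size:
        try:
            try_batch_size(batch_size)
            best = batch_size
            batch_size *= 2
        except torch.cuda.OutOfMemoryError:
            break
    return best
```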
- cneuromax.fitting.deeplearning.utils.lightning.find_good_per_device_num_workers(datamodule, per_device_batch_size, max_num_data_passes=100)[source]¶
Probes a per_device_num_workers value.
Iterates through a range of num_workers values and measures the time it takes to iterate through a fixed number of data passes, returning the value that yields the shortest time.
- Parameters:
  - per_device_batch_size (int) – The return value of find_good_per_device_batch_size().
  - max_num_data_passes (int, default: 100) – Maximum number of data passes to iterate through.
- Return type: int
- Returns: A roughly optimal per_device_num_workers value.
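The timing strategy described above can be sketched as follows. Here a "data pass" is interpreted as fetching one batch, and the candidate num_workers range is an assumption rather than the library's actual search space.

```python
import time

from torch.utils.data import DataLoader, Dataset


def probe_per_device_num_workers(
    dataset: Dataset,
    per_device_batch_size: int,
    max_num_data_passes: int = 100,
) -> int:
    """Illustrative only: times a fixed number of batch fetches per worker count."""
    best_num_workers, best_time = 0, float("inf")
    for num_workers in range(0, 9):  # candidate range is an assumption
        loader = DataLoader(
            dataset,
            batch_size=per_device_batch_size,
            num_workers=num_workers,
        )
        start, num_passes = time.time(), 0
        while num_passes < max_num_data_passes:
            for _ in loader:
                num_passes += 1
                if num_passes == max_num_data_passes:
                    break
        elapsed = time.time() - start
        if elapsed < best_time:
            best_num_workers, best_time = num_workers, elapsed
    return best_num_workers
```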
- class cneuromax.fitting.deeplearning.utils.lightning.InitOptimParamsCheckpointConnector(trainer)[source]¶
Bases: _CheckpointConnector
Tweaked Lightning ckpt connector.
Allows making use of the instantiated optimizers' hyper-parameters rather than the checkpointed hyper-parameters. For use when resuming training with different optimizer hyper-parameters (e.g. with a PBT Hydra Sweeper).
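A possible way to wire the connector into a Trainer is sketched below. It assumes Lightning's private `_checkpoint_connector` attribute, which is where the stock `_CheckpointConnector` lives; since that attribute is not public API, treat this as a sketch rather than the documented usage.

```python
from lightning.pytorch import Trainer

from cneuromax.fitting.deeplearning.utils.lightning import (
    InitOptimParamsCheckpointConnector,
)

trainer = Trainer()
# Replace the stock connector so that, when resuming from a checkpoint,
# the freshly instantiated optimizer hyper-parameters (e.g. a new learning
# rate from a PBT sweep) take precedence over the checkpointed ones.
# `_checkpoint_connector` is a private Lightning attribute; this wiring is
# an assumption about how the class is meant to be used.
trainer._checkpoint_connector = InitOptimParamsCheckpointConnector(trainer)
```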