When Work in Process is piling up, big batches are a usual suspect. Batch size is the quantity of stuff contained in a unit of work, e.g. the number (or size) of features in a story, or the number of stories in an epic. Making batch size smaller means reducing the number of tasks that are bundled together and that you need to finish together before you can consider yourself done.
The benefits of reducing batch size apply at every level of software engineering (be it planning or delivery), but also to plenty of activities that have nothing to do with software (organizing school trips, doing the laundry, writing a blog post or fixing your house). In the rest of this post I will refer to work and tasks instead of stories and software, to keep things as generic as possible. The benefits of small batches are:

- It reduces the time it takes to get feedback on changes, which makes problems easier to triage and remediate (the sketch after this list makes this concrete).
- It increases efficiency and motivation: you finish things more often.
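To see why smaller batches speed up feedback, here is a minimal back-of-the-envelope simulation (my own sketch, not from the original sources). It assumes each task takes one day of work and that feedback on a task only arrives once its whole batch ships; the names TASKS, DAYS_PER_TASK and average_feedback_delay are made up for this illustration.

```python
TASKS = 10          # total units of work (hypothetical)
DAYS_PER_TASK = 1   # effort per task, in days (hypothetical)

def average_feedback_delay(batch_size: int) -> float:
    """Average time (in days) a finished task waits before getting feedback,
    when feedback only arrives once its whole batch ships."""
    delays = []
    completed = 0
    day = 0
    while completed < TASKS:
        batch = min(batch_size, TASKS - completed)
        day += batch * DAYS_PER_TASK   # work through the entire batch first
        delays.extend([day] * batch)   # every task in the batch ships together
        completed += batch
    return sum(delays) / len(delays)

for size in (10, 5, 1):
    print(f"batch size {size:2d}: "
          f"average feedback delay {average_feedback_delay(size):.1f} days")
```

Under these assumptions, one batch of 10 tasks means every task waits 10 days for feedback, while batches of one cut the average wait to 5.5 days; in general, n one-day tasks delivered in batches of b wait (n + b) / 2 days on average, so halving the batch size roughly halves the avoidable part of the delay.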
I got lots of inspiration from The Principles of Product Development Flow by Donald Reinertsen, which I still highly, highly recommend.