Researchers developing methods: please do this, and don't do that

Have you ever tried to use someone else's code? If you are a researcher developing methods, you have definitely gone through this. In fact, you have probably attempted something even more complicated: using someone else's code on data that the method was never tested on.

Sharing your code is great. I personally think that it is completely pointless to develop a method and not share your code. Do you want people to use your method? Make it easy for them. But what's the point of sharing your code if it's so badly written that no one can use it? Over the years, I have found code so badly written that I had two choices: either ignore it completely and find another method with better code, or spend a lot of time refactoring and debugging it. By doing the latter, I might accidentally change or drop some crucial part, leading to underperformance (which, come to think of it, might make you happy when you are the one doing the refactoring, because your method could then easily, for some reason, beat the one you spent the whole weekend refactoring). Not to mention that people who enjoy coding less than you do are not willing to waste their time on your code.

Here I have compiled a (work-in-progress) list of things that you should never do, things that you should do, and things that would be great to do.

Do not do this

Don't upload Jupyter Notebooks (JNs)

JNs are great for data exploration, writing tutorials, and a bunch of other things. But presenting your method in a JN makes it impossible to run from the command line, which is how most researchers will want to run it, and it won't run on a cluster either. What you can do instead is pack it nicely into a few Python scripts (see the sketch below).

An even worse practice is to copy-paste all the JN cells directly into a Python script. Besides ending up with unreadable code, this is error-prone: in a JN, once you run a cell, its data and functions stay in memory, so the notebook may only work if you run, say, cell #10 before cell #8, and a script that executes top to bottom will then fail.
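As a rough sketch of what "packing it into scripts" can look like (the file layout, module names, and functions here are made up for illustration, not a prescribed structure):

# train.py: runnable from the command line and submittable to a cluster
import argparse

from mymethod.data import load_dataset   # hypothetical module
from mymethod.model import build_model   # hypothetical module

def main():
    parser = argparse.ArgumentParser(description="Train the model.")
    parser.add_argument("--data", required=True, help="path to the dataset")
    parser.add_argument("--epochs", type=int, default=100)
    args = parser.parse_args()

    dataset = load_dataset(args.data)     # hypothetical function
    model = build_model()                 # hypothetical function
    model.fit(dataset, epochs=args.epochs)

if __name__ == "__main__":
    main()

Now "python train.py --data /path/to/data" works in a terminal, in a batch job on a cluster, everywhere.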


Do not import libraries in non-standard ways
import scipy as sp  # non-standard alias: readers expect plain "import scipy"
import torch as T   # non-standard alias: readers expect plain "import torch"

Also:

from library import *

This is a very bad practice because: 1) the script will import functions that it doesn't use, 2) when those functions are called, we won't know which library they come from, and 3) they may override already-loaded functions with the same name.
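To see how sneaky point 3 can be, here is a classic example: a wildcard import from NumPy silently shadows Python's built-in sum.

from numpy import *  # brings in hundreds of names, including "sum"

# Built-in sum(range(5), -1) would start the sum at -1 and return 9;
# NumPy's sum interprets -1 as the axis argument and returns 10.
print(sum(range(5), -1))  # 10, not the 9 you might expect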


Do not pass unnecessary parameters to functions
config = {...}  # dictionary with all the experimental details

def train(config):
    ...

def test(config):
    ...

class NeuralNetwork:
    def __init__(self, config):
        ...

This act of laziness makes it very difficult for everyone to know which variables each function actually uses. That includes you: try reading your own code two years later. Another problem is that, when someone wants to use your method/class in their own code, they are forced to build that "config" dictionary and figure out what it must contain.
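The alternative costs a few extra keystrokes and pays off immediately: each function declares exactly what it needs (the names below are just illustrative):

def train(model, train_loader, optimizer, epochs):
    # Dependencies are visible in the signature, not hidden in a dict.
    ...

class NeuralNetwork:
    def __init__(self, in_channels, num_classes):
        ...

# No mystery "config" to reverse-engineer:
model = NeuralNetwork(in_channels=1, num_classes=2)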

Do not give random names to variables
tmp1 = ...
tt = ...
a = ...
xx0 = ...
xyz = ...

Why does the user of your code (including yourself) need to spend time guessing what those variables contain? Especially if the code is uncommented.
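A made-up before and after; the descriptive name needs no comment to be understood:

tmp1 = img.mean(0)                              # what does tmp1 hold?
mean_intensity_per_channel = img.mean(axis=0)   # self-explanatory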

Do this

Use typing

Typing makes your functions much easier to understand. It also saves space in the docstrings, since you no longer need to specify the type of each argument there.

from pathlib import Path
from types import ModuleType
from typing import Callable, List, Optional, Union

import monai
import torch
from torch.nn.modules.loss import _Loss as Loss  # "Loss" was unspecified; torch's base loss class is one option
from torch.optim import Optimizer
from torch.optim.lr_scheduler import _LRScheduler

def trainer(model: torch.nn.Module,
            tr_loader: monai.data.dataloader.DataLoader,
            loss: Loss,
            opt: Optimizer,
            scheduler: Union[_LRScheduler, None],
            iteration_start: int,
            iterations: int,
            val_loader: monai.data.dataloader.DataLoader,
            val_interval: int,
            metrics: List[monai.metrics.CumulativeIterationMetric],
            datalib: ModuleType,
            postprocessing: monai.transforms.Compose,
            path_exp: Path,
            device: str,
            callbacks: Optional[List[Callable]] = None) -> None:  # None, not a mutable [] default
    ...
Make your code general

Consider the following function that reshapes "tensors":

import numpy as np

def reshapeTensor(tensor):
    r, c, ch, f = tensor.shape
    new_dim = [c, r * ch * f]
    return np.reshape(tensor, new_dim)

Besides the ambiguity problem (at first, one could think that the input is a torch.Tensor, but then one realizes that the function uses NumPy to reshape it), the problem with this function is that it won't work on 5D tensors, which are common when working with 3D images. A better function is the following, which works on both 4D (2D convs) and 5D (3D convs) arrays:

def reshapeArray(arr):
    arr = np.moveaxis(arr, 0, 1)          # bring the original axis 1 to the front
    return arr.reshape(arr.shape[0], -1)  # flatten all remaining axes, whatever their number
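For instance (shapes chosen arbitrarily), the same function handles both cases:

x4 = np.zeros((2, 3, 4, 5))      # 4D array
x5 = np.zeros((2, 3, 4, 5, 6))   # 5D array
print(reshapeArray(x4).shape)    # (3, 40)
print(reshapeArray(x5).shape)    # (3, 240)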
Use random seeds
Use random seeds to make your results reproducible, and remember to seed every source of randomness your code touches.
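A minimal sketch covering the usual suspects; extend it with whatever other libraries your code draws random numbers from:

import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy's global RNG
    torch.manual_seed(seed)           # PyTorch, CPU
    torch.cuda.manual_seed_all(seed)  # PyTorch, all GPUs
    # Optional: make cuDNN deterministic at some speed cost.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False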