Augmentation Strategies

In this notebook we explore different augmentation strategies for neural differential equations. This time, we’ll make use of the 3D concentric spheres dataset.

[1]:
import sys
sys.path.append('../')
[2]:
from torchdyn.models import *; from torchdyn.datasets import *
from torchdyn import *

Data

[6]:
# Generate 3D nested spheres data
d = ToyDataset()
X, yn = d.generate(n_samples=2 << 12, dataset_type='spheres')
[7]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

c = ['blue', 'orange']
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111, projection='3d')
for i in range(2):
    ax.scatter(X[yn==i,0], X[yn==i,1], X[yn==i,2], s=5, alpha=0.5, c=c[i])
../_images/tutorials_04_augmentation_strategies_6_0.png
[8]:
import torch
import torch.utils.data as data
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load data into dataloader
bs = len(X)
X_train = torch.Tensor(X).to(device)
y_train = torch.LongTensor(yn.long()).to(device)
train = data.TensorDataset(X_train, y_train)
trainloader = data.DataLoader(train, batch_size=bs, shuffle=True)

Learner

[9]:
import torch.nn as nn
import pytorch_lightning as pl

class Learner(pl.LightningModule):
    def __init__(self, model:nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = nn.CrossEntropyLoss()(y_hat, y)
        logs = {'train_loss': loss}
        return {'loss': loss, 'log': logs}

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=0.01)

    def train_dataloader(self):
        return trainloader

Neural ODE (0-augmentation)

The first model under consideration involves 0-augmentation (often referred to as ANODE), which consists in augmenting the state \(z\) with additional dimensions \(a\) initialized to 0:

\[\begin{split}\left\{ \begin{aligned} \begin{bmatrix} \dot z(s)\\ \dot a(s) \end{bmatrix} &= f([z(s), a(s)], \theta)\\ \begin{bmatrix} z(0)\\ a(0) \end{bmatrix} &= \begin{bmatrix} x\\ 0 \end{bmatrix}\\ \hat y &= Linear([z(1),a(1)]) \end{aligned}\right.\end{split}\]

With torchdyn, turning a neural ODE into an ANODE is as simple as using the Augmenter class as follows:

[14]:
# We consider 5 augmenting dimensions, i.e. the DEFunc must accommodate 8 inputs
func = nn.Sequential(nn.Linear(8, 64),
                     nn.Tanh(),
                     nn.Linear(64, 8))


# Define NeuralDE
neuralDE = NeuralDE(func, solver='dopri5').to(device)

# Here we tell the Augmenter to add 5 extra dimensions. For 0-augmentation, no additional arguments are needed
model = nn.Sequential(Augmenter(augment_dims=5),
                      neuralDE,
                      nn.Linear(8, 2)).to(device)

Note that the Augmenter is specified entirely outside the NeuralDE model. This makes it straightforward to switch between augmented and non-augmented variants at will, as in the sketch below.
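For reference, a minimal sketch of the non-augmented counterpart (the names func_vanilla and model_vanilla are only illustrative) drops the Augmenter and shrinks the vector field to the 3 data dimensions:

# Non-augmented baseline: no Augmenter, the vector field acts directly on the 3 data dimensions
func_vanilla = nn.Sequential(nn.Linear(3, 64),
                             nn.Tanh(),
                             nn.Linear(64, 3))

neuralDE_vanilla = NeuralDE(func_vanilla, solver='dopri5').to(device)

model_vanilla = nn.Sequential(neuralDE_vanilla,
                              nn.Linear(3, 2)).to(device)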

[15]:
# Train the model
learn = Learner(model)
trainer = pl.Trainer(max_epochs=200)
trainer.fit(learn)
GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type       | Params
-------------------------------------
0 | model | Sequential | 1 K

[15]:
1

Plots

We plot the model's outputs \(\hat y\):

[18]:
# Evaluate outputs
y_hat = model(X_train).detach().cpu()

# Plot results
c = ['blue', 'orange']
fig = plt.figure(figsize=(3,3))
ax = fig.add_subplot(111)
for i in range(2):
    ax.scatter(y_hat[yn==i,0], y_hat[yn==i,1], s=2, alpha=0.2, c=c[i])
../_images/tutorials_04_augmentation_strategies_16_0.png
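Beyond the scatter plot, a quick sanity check is to compute the training accuracy from the logits; the following sketch assumes the model and tensors defined above:

# Sanity check: training accuracy from the model logits
with torch.no_grad():
    preds = model(X_train).argmax(dim=1)
    acc = (preds == y_train).float().mean().item()
print(f'Training accuracy: {acc:.3f}')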

Neural ODE (Input-Layer (IL)-Augmentation)

Input layer augmentation (IL-augmentation) is also easy to implement with torchdyn.

\[\begin{split}\left\{ \begin{aligned} \begin{bmatrix} \dot z(s)\\ \dot a(s) \end{bmatrix} &= f([z(s), a(s)], \theta)\\ \begin{bmatrix} z(0)\\ a(0) \end{bmatrix} &= \begin{bmatrix} x\\ g(x, \omega) \end{bmatrix}\\ \hat y &= Linear([z(1),a(1)]) \end{aligned}\right.\end{split}\]

where \(g(x, \omega)\) is a neural network, usually a single linear layer. The following is an example of a partial IL-augmentation, where a linear layer is trained to determine the initial condition of the 5 additional dimensions. Notice that the DEFunc still maps 8 dimensions to 8 dimensions.

[19]:
# We consider 5 augmenting dimensions, i.e. the DEFunc must accommodate 8 inputs
func = nn.Sequential(nn.Linear(8, 64),
                     nn.Tanh(),
                     nn.Linear(64, 8))


# Define NeuralDE
neuralDE = NeuralDE(func, solver='dopri5').to(device)

# Here we just need to pass the input layer to the Augmenter via augment_func, in this case an nn.Linear(3, 5)
model = nn.Sequential(Augmenter(augment_func=nn.Linear(3, 5)),
                      neuralDE,
                      nn.Linear(8, 2)).to(device)
[21]:
# Train the model
learn = Learner(model)
trainer = pl.Trainer(max_epochs=200)
trainer.fit(learn)
GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type       | Params
-------------------------------------
0 | model | Sequential | 1 K

[21]:
1

Plots

We plot the model's outputs \(\hat y\):

[22]:
# Evaluate outputs
y_hat = model(X_train).detach().cpu()

# Plot results
c = ['blue', 'orange']
fig = plt.figure(figsize=(3,3))
ax = fig.add_subplot(111)
for i in range(2):
    ax.scatter(y_hat[yn==i,0], y_hat[yn==i,1], s=2, alpha=0.2, c=c[i])
../_images/tutorials_04_augmentation_strategies_22_0.png

Note that for a full IL-augmentation, the following model definition can replace the above.

\[\begin{split}\left\{ \begin{aligned} \begin{bmatrix} \dot z(s)\\ \dot a(s) \end{bmatrix} &= f([z(s), a(s)], \theta)\\ \begin{bmatrix} z(0)\\ a(0) \end{bmatrix} &= g(x, \omega)\\ \hat y &= Linear([z(1),a(1)]) \end{aligned}\right.\end{split}\]

Determining the entire initial condition with a linear layer \(g\) can be achieved without the Augmenter as follows:

[24]:
func = DEFunc(nn.Sequential(nn.Linear(8, 64),
                            nn.Tanh(),
                            nn.Linear(64, 8)))

neuralDE = NeuralDE(func).to(device)

model = nn.Sequential(nn.Linear(3, 8),
                      neuralDE,
                      nn.Linear(8, 2)).to(device)
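The same Learner and Trainer setup used above can then be reused to fit this fully IL-augmented model, for example:

# Train the fully IL-augmented model with the same Learner
learn = Learner(model)
trainer = pl.Trainer(max_epochs=200)
trainer.fit(learn)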

Neural ODE (Higher-Order Augmentation)

Finally, we take a look at increasing the order of the ODE as an augmentation method, e.g. a second-order neural ODE:

\[\begin{split}\left\{ \begin{aligned} \begin{bmatrix} \dot z(s)\\ \dot a(s) \end{bmatrix} &= \begin{bmatrix} a(s)\\ f([z(s),a(s)],\theta) \end{bmatrix}\\ \begin{bmatrix} z(0)\\ a(0) \end{bmatrix} &= g(x, \omega)\\ \hat y &= Linear([z(1),a(1)]) \end{aligned}\right.\end{split}\]

In torchdyn, the DEFunc can be specified to evolve according to higher orders by passing func_type='higher_order' and order=n as arguments. The Augmenter is still used to add data dimension * (n - 1) additional dimensions: here, 3 extra dimensions for the 3-dimensional data with n = 2.

[26]:
func = nn.Sequential(nn.Linear(6, 64),
                     nn.Tanh(),
                     nn.Linear(64, 3))

neuralDE = NeuralDE(func, solver='dopri5', order=2).to(device)

model = nn.Sequential(Augmenter(augment_dims=3),
                      neuralDE,
                      nn.Linear(6,2)).to(device)
[27]:
learn = Learner(model)
trainer = pl.Trainer(max_epochs=300)
trainer.fit(learn)
GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type       | Params
-------------------------------------
0 | model | Sequential | 657

[27]:
1

Plots

We plot the model's outputs \(\hat y\):

[29]:
# Evaluate outputs
y_hat = model(X_train).detach().cpu()

# Plot results
c = ['blue', 'orange']
fig = plt.figure(figsize=(3,3))
ax = fig.add_subplot(111)
for i in range(2):
    ax.scatter(y_hat[yn==i,0], y_hat[yn==i,1], s=2, alpha=0.2, c=c[i])
../_images/tutorials_04_augmentation_strategies_30_0.png
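To inspect how the second-order flow transports the data along the depth variable \(s\), one can also sample state trajectories. The sketch below assumes that NeuralDE exposes a trajectory(x, s_span) method and that applying the Augmenter yields the augmented initial condition, as in the other torchdyn tutorials:

# Sample state trajectories along the depth variable s (sketch; assumes NeuralDE.trajectory
# and Augmenter behave as in the other torchdyn tutorials)
s_span = torch.linspace(0, 1, 100)
x_aug = model[0](X_train)                                  # augmented initial condition [z(0), a(0)]
traj = model[1].trajectory(x_aug, s_span).detach().cpu()   # shape: [len(s_span), batch, 6]

# Depth evolution of the first state coordinate, colored by class (subset of points for readability)
fig = plt.figure(figsize=(4, 3))
ax = fig.add_subplot(111)
for i in range(2):
    ax.plot(s_span, traj[:, yn == i, 0][:, :100], c=c[i], alpha=0.1)
ax.set_xlabel('s'); ax.set_ylabel('first state coordinate')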