Streaming flow policy with stabilizing conditional flow

import jupyviz as jviz
import matplotlib.pyplot as plt
import numpy as np
import torch
torch.set_default_dtype(torch.double)  # use float64 for all torch tensors
from streaming_flow_policy.all import StreamingFlowPolicyASpace
from streaming_flow_policy.toy.plot_aspace import (
    plot_probability_density,
    plot_probability_density_and_vector_field,
    plot_probability_density_and_streamlines,
    plot_probability_density_with_static_trajectories,
    plot_probability_density_and_streamlines_with_animated_trajectories,
)

from pydrake.all import (
    CompositeTrajectory,
    PiecewisePolynomial,
    Trajectory,
)

# Set seed
np.random.seed(0)

Set hyperparameters#

σ0 = 0.05
k = 2.5

def demonstration_traj_right() -> Trajectory:
    """
    Returns a trajectory x(t) that is 0 for 0 <= t <= 0.25, followed by a cubic
    spline for 0.25 <= t <= 1 that starts at 0, rises to about 0.7, and ends
    at 0.5.
    """
    piece_1 = PiecewisePolynomial.FirstOrderHold(
        breaks=[0, 0.25],
        samples=[[0, 0]],
    )
    piece_2 = PiecewisePolynomial.CubicWithContinuousSecondDerivatives(
        breaks=[0.25, 0.50, 0.75, 1.0],
        samples=[[0.00, 0.62, 0.70, 0.5]],
        sample_dot_at_start=[[0.0]],
        sample_dot_at_end=[[-0.7]],
    )
    return CompositeTrajectory([piece_1, piece_2])

def demonstration_traj_left() -> Trajectory:
    """
    Returns a trajectory x(t) that is 0 for 0 <= t <= 0.25, followed by a cubic
    spline for 0.25 <= t <= 1 that starts at 0, dips to about -0.7, and ends
    at -0.5. It is the mirror image of the right trajectory.
    """
    piece_1 = PiecewisePolynomial.FirstOrderHold(
        breaks=[0, 0.25],
        samples=[[0, 0]],
    )
    piece_2 = PiecewisePolynomial.CubicWithContinuousSecondDerivatives(
        breaks=[0.25, 0.50, 0.75, 1.0],
        samples=[[0.00, -0.62, -0.70, -0.5]],
        sample_dot_at_start=[[0.0]],
        sample_dot_at_end=[[0.7]],
    )
    return CompositeTrajectory([piece_1, piece_2])

traj_right = demonstration_traj_right()
traj_left = demonstration_traj_left()

Plot demonstration trajectories#

"""
Plot the demonstration trajectories with the action on the x-axis, in [-1, 1],
and time on the y-axis, in [0, 1].
"""
times = np.linspace(0, 1, 100)
plt.plot(traj_right.vector_values(times)[0], times, color='blue', alpha=0.9)
plt.plot(traj_left.vector_values(times)[0], times, color='red', alpha=0.9)
plt.xlim(-1, 1)
plt.ylim(0, 1)
plt.xlabel('Action')
plt.ylabel('Time ⟶')
plt.title('Demonstration Trajectories')
plt.grid(True)
plt.show()

_images/dcaaaae4087e11249631427f90f4349dcc307f70955132e6e79ac923e9aa2a41.png

Notation#

Symbol	Space	Meaning
\(t\)	\([0, 1]\)	Time
\(a\)	\(\mathcal{A}\)	Action
\(x = a\)	\(\mathcal{A}\)	State (may not equal action in general)
\(v\)	\(T\mathcal{A}\)	Velocity
\(h\)	\(\mathcal{H}\)	Observation history
\(v_\theta(x, t \mid h)\)	\(T\mathcal{A}\)	Learned flow policy
\(\xi \sim \mathcal{D}\)	\([0, 1] \rightarrow \mathcal{A}\)	Random variable for demonstration trajectories
\(\xi(t)\)	\(\mathcal{A}\)	Action in the demonstration at time \(t\)
\(\dot{\xi}(t)\)	\(T\mathcal{A}\)	Velocity in the demonstration at time \(t\)

In this notebook, \(x \equiv a\).

Hyperparameters#

Symbol	Space	Meaning
\(\sigma_0\)	\(\mathbb{R}_{\geq 0}\)	Initial standard deviation
\(k\)	\(\mathbb{R}_{\geq 0}\)	Stabilizing gain for the conditional flow

Conditional flow#

fp = StreamingFlowPolicyASpace(dim=1, trajectories=[traj_right], prior=[1.0], σ0=σ0, k=k)

Given a demonstration \(\xi \sim \mathcal{D}\) with associated observation history \(h\), we define a conditional flow \(v(a, t \mid \xi)\) that serves as the regression target for the learned flow policy \(v_\theta(a, t \mid h)\).

First, we sample the initial action \(a_0\) from a Gaussian centered at the initial action of the demonstration trajectory:

\[ a_0 \sim \mathcal{N}(\xi(0), \sigma_0^2) \]

where \(\sigma_0\) is a small value. Then, the stabilizing velocity field is given by:

\[ \begin{align*} v(a, t \mid \xi) &= \underbrace{-k(a - \xi(t))}_{\text{Stabilization term}} ~~+~~ \hspace{-1em}\underbrace{\dot{\xi}(t)}_{\text{Path velocity}} \end{align*} \]

To solve for the actual flow, we must solve the following ordinary differential equation (ODE):

\[\begin{split} \begin{align*} \frac{\mathrm{d}a}{\mathrm{d}t} &= -k(a - \xi(t)) + \dot{\xi}(t)\\ \implies \frac{\mathrm{d}}{\mathrm{d}t} (a - \xi(t)) &= -k(a - \xi(t))\\ \implies a(t \mid \xi) &= \xi(t) ~+~ \hspace{-1em}\underbrace{\left(a_0 - \xi(0)\right) e^{-kt}}_{\text{Error decays exponentially}} \end{align*} \end{split}\]

Due to the stabilizing velocity field, the initial error \((a_0 - \xi(0))\) decays exponentially with time.

Since \(a_0\) is Gaussian and \(a(t \mid \xi)\) is affine in \(a_0\), the marginal distribution of the conditional flow at any time \(t\) is also Gaussian, with mean \(\xi(t)\) and standard deviation \(\sigma_0 e^{-kt}\):

\[\mathbb{P}(a \mid t, \xi) = \mathcal{N} \left( a \,\big\vert\, \xi(t)\,,\, \sigma_0^2 e^{-2kt} \right)\]
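The closed-form solution above can be checked numerically. The following is a minimal sketch, not part of the `streaming_flow_policy` package: it integrates the conditional ODE with a hand-written RK4 scheme and compares the result against \(\xi(t) + (a_0 - \xi(0)) e^{-kt}\). The demonstration `xi` is a hypothetical smooth curve chosen for illustration (it is not the spline used in this notebook), and all function names are assumptions.

```python
import math

SIGMA_0 = 0.05  # initial standard deviation (σ0 in the notebook)
K = 2.5         # stabilizing gain (k in the notebook)


def xi(t: float) -> float:
    """Hypothetical demonstration trajectory, for illustration only."""
    return 0.5 * math.sin(2.0 * t)


def xi_dot(t: float) -> float:
    """Time derivative of the hypothetical demonstration trajectory."""
    return math.cos(2.0 * t)


def conditional_velocity(a: float, t: float) -> float:
    """Stabilizing conditional flow: -k (a - ξ(t)) + ξ̇(t)."""
    return -K * (a - xi(t)) + xi_dot(t)


def closed_form_action(a0: float, t: float) -> float:
    """Closed-form ODE solution: ξ(t) + (a0 - ξ(0)) exp(-k t)."""
    return xi(t) + (a0 - xi(0.0)) * math.exp(-K * t)


def integrate_rk4(a0: float, t_end: float = 1.0, num_steps: int = 1000) -> float:
    """Integrate da/dt = conditional_velocity(a, t) from 0 to t_end with RK4."""
    h = t_end / num_steps
    a = a0
    for i in range(num_steps):
        t = i * h
        k1 = conditional_velocity(a, t)
        k2 = conditional_velocity(a + 0.5 * h * k1, t + 0.5 * h)
        k3 = conditional_velocity(a + 0.5 * h * k2, t + 0.5 * h)
        k4 = conditional_velocity(a + h * k3, t + h)
        a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a


def conditional_std(t: float) -> float:
    """Standard deviation of the conditional marginal: σ0 exp(-k t)."""
    return SIGMA_0 * math.exp(-K * t)
```

Calling `integrate_rk4(a0)` and `closed_form_action(a0, 1.0)` with the same `a0` should agree up to integration error, and `conditional_std(t)` shrinks monotonically as `t` grows, which is why the probability path narrows around the demonstration over time.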

Plot conditional probability path of right trajectory#

fig, ax = plt.subplots(dpi=120)
ts = torch.linspace(0, 1, 200)  # (T,)
xs = torch.linspace(-1, 1, 200)  # (X,)
ts, xs = torch.meshgrid(ts, xs, indexing='ij')  # (T, X)
plot_probability_density(fp, ts, xs, ax)
# plt.tight_layout()
plt.show()

_images/28e43c5ad6e7b9bd478081851c5338a289b7ea31451ba609dd7bd5c88671504f.png

Plot conditional vector field of right trajectory#

fig = plt.figure(figsize=(8, 4), dpi=300)
gs = fig.add_gridspec(1, 2, width_ratios=[1, 1])
ax1 = fig.add_subplot(gs[0])
ax2 = fig.add_subplot(gs[1])
im1 = plot_probability_density_and_vector_field(fp, ax1)
im2 = plot_probability_density_and_streamlines(fp, ax2)
plt.tight_layout()  # Adjust spacing between subplots
plt.show()

_images/93cd6bf1e13f81edff5562b4d834987a37d3cf49866ccf941c3f6175db92a658.png

Plot trajectories under conditional flow of right trajectory#

fig, ax = plt.subplots(figsize=(5, 4), dpi=120)
im = plot_probability_density_with_static_trajectories(fp, ax, [None] * 20)
plt.show()

_images/64f533e1257ef892edaa4cb1b7be138fc35d844b1cf50c94eafc7d3f303552c2.png

Marginal flow#

fp = StreamingFlowPolicyASpace(dim=1, trajectories=[traj_right, traj_left], prior=[0.5, 0.5], σ0=σ0, k=k)

Training#

Flow matching loss:

\[ \mathcal{L}_\mathrm{FM}(\theta, \mathcal{D}) = \mathbb{E}_{\xi \sim \mathcal{D},\ t \sim \mathrm{Uniform}([0, 1]),\ a \sim \mathcal{N}\left(\xi(t),\, \sigma_0^2 e^{-2kt}\right)} \big\| v_\theta(a, t \mid h) - v(a, t \mid \xi) \big\|_2^2 \]

1. Sample a trajectory from the dataset: \(\xi \sim \mathcal{D}\).
2. Define the conditional flow \(v(a, t \mid \xi) = -k(a - \xi(t)) + \dot{\xi}(t)\).
3. Sample \(t \sim \text{Uniform}([0, 1])\).
4. Sample \(a \sim \mathcal{N} \left( \xi(t)\,,\, \sigma_0^2 e^{-2kt} \right)\).
5. Compute the L2 loss: \(\| v_\theta(a, t \mid h) - v(a, t \mid \xi) \|_2^2\).
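The steps above can be sketched as a Monte Carlo estimate of the loss. This is a minimal illustration under stated assumptions, not the training code of `streaming_flow_policy`: a demonstration is represented as a hypothetical pair of callables `(xi, xi_dot)`, the model is any callable `model(a, t)`, and the observation history \(h\) is omitted because it is fixed in this toy setting.

```python
import math
import random
from typing import Callable, Sequence, Tuple

# Hypothetical representation of a demonstration: (ξ, ξ̇) as scalar functions of t.
Demo = Tuple[Callable[[float], float], Callable[[float], float]]


def sample_training_pair(
    dataset: Sequence[Demo], sigma_0: float, k: float, rng: random.Random
) -> Tuple[float, float, float]:
    """Draw one (a, t, target velocity) tuple following the five steps above."""
    xi, xi_dot = rng.choice(dataset)                     # sample ξ ~ D
    t = rng.uniform(0.0, 1.0)                            # sample t ~ Uniform([0, 1])
    a = rng.gauss(xi(t), sigma_0 * math.exp(-k * t))     # sample a ~ N(ξ(t), σ0² e^{-2kt})
    target = -k * (a - xi(t)) + xi_dot(t)                # conditional flow v(a, t | ξ)
    return a, t, target


def flow_matching_loss(
    model: Callable[[float, float], float],
    dataset: Sequence[Demo],
    sigma_0: float,
    k: float,
    num_samples: int,
    rng: random.Random,
) -> float:
    """Average squared error between model(a, t) and the conditional flow."""
    total = 0.0
    for _ in range(num_samples):
        a, t, target = sample_training_pair(dataset, sigma_0, k, rng)
        total += (model(a, t) - target) ** 2
    return total / num_samples
```

With a single demonstration in the dataset, a model that reproduces the conditional flow exactly attains zero loss. With several demonstrations the minimum is generally positive, because the same \((a, t)\) can be paired with different targets.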

Flow matching theorem#

If \(v^*(a, t \mid h)\) is the optimal velocity field that minimizes the flow matching loss, then the marginal distribution \(\mathbb{P}^*(a \mid t, h)\) induced by \(v^*\) at every time \(t\) is the average of the conditional flow distributions \(\mathbb{P}(a \mid t, \xi)\) over the training distribution:

\[\mathbb{P}^*(a \mid t, h) = \mathbb{E}_{\xi} \left[ \mathbb{P}(a \mid t, \xi) \right], \ \forall a \in \mathcal{A}, \forall t \in [0, 1]\]
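For a finite set of demonstrations, the marginal density is a Gaussian mixture, and the minimizer of the flow matching loss has a closed form. The sketch below assumes the standard flow-matching identity, which this notebook does not state explicitly: the marginal velocity is the average of the conditional velocities, weighted by the posterior probability of each demonstration given \((a, t)\). The `Demo` representation and function names are hypothetical, and this is not how `StreamingFlowPolicyASpace` is necessarily implemented.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical representation of a demonstration: (ξ, ξ̇) as scalar functions of t.
Demo = Tuple[Callable[[float], float], Callable[[float], float]]


def gaussian_pdf(a: float, mean: float, std: float) -> float:
    """Density of N(mean, std²) evaluated at a."""
    z = (a - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def marginal_density(
    a: float, t: float, demos: Sequence[Demo], priors: Sequence[float],
    sigma_0: float, k: float,
) -> float:
    """Mixture of the conditional Gaussians N(ξ_i(t), σ0² e^{-2kt})."""
    std = sigma_0 * math.exp(-k * t)
    return sum(p * gaussian_pdf(a, xi(t), std) for (xi, _), p in zip(demos, priors))


def marginal_velocity(
    a: float, t: float, demos: Sequence[Demo], priors: Sequence[float],
    sigma_0: float, k: float,
) -> float:
    """Posterior-weighted average of the conditional velocities at (a, t)."""
    std = sigma_0 * math.exp(-k * t)
    # Log of prior times likelihood, up to a constant shared by all demonstrations.
    log_w = [
        math.log(p) - 0.5 * ((a - xi(t)) / std) ** 2
        for (xi, _), p in zip(demos, priors)
    ]
    # Subtract the maximum before exponentiating so the weights never all underflow.
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    velocities = [-k * (a - xi(t)) + xi_dot(t) for xi, xi_dot in demos]
    return sum(w * v for w, v in zip(weights, velocities)) / sum(weights)
```

Close to one demonstration the posterior weight of that demonstration dominates, so the marginal velocity nearly coincides with its conditional velocity. The weights are computed in log space because the conditional Gaussians become very narrow as \(t\) grows.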

Plot marginal probability path#

fig, ax = plt.subplots(dpi=120)
ts = torch.linspace(0, 1, 200)  # (T,)
xs = torch.linspace(-1, 1, 200)  # (X,)
ts, xs = torch.meshgrid(ts, xs, indexing='ij')  # (T, X)
plot_probability_density(fp, ts, xs, ax)
# plt.tight_layout()
plt.show()

_images/ced6a3bc7b9afc62501ef83734c58b5b43bc65f1d3a17402ff70c366e10f3da6.png

Plot marginal vector field#

fig = plt.figure(figsize=(8, 4), dpi=300)
gs = fig.add_gridspec(1, 2, width_ratios=[1, 1])
ax1 = fig.add_subplot(gs[0])
ax2 = fig.add_subplot(gs[1])
im1 = plot_probability_density_and_vector_field(fp, ax1)
im2 = plot_probability_density_and_streamlines(fp, ax2)
plt.tight_layout()  # Adjust spacing between subplots
plt.show()

_images/14d382ecaae8b490c395340259acb7852394ab596044b3a602317b4fc33cf564.png

Plot trajectories under marginal flow#

fig, ax = plt.subplots(figsize=(5, 4), dpi=200)
frames = plot_probability_density_and_streamlines_with_animated_trajectories(fp, ax, [None] * 20, num_frames=50, circle_radius=10, dpi=200)
jviz.gif(frames, time_in_ms=3000, hold_last_frame_time_in_ms=1000).html(width=500, pixelated=False).display()
plt.close()  # Close the static figure so that only the animation is shown

Pathology when starting from \(a=0\)#

Let us integrate the marginal flow starting from the initial action \(a=0\).

fig, ax = plt.subplots(figsize=(5, 4), dpi=120)
im = plot_probability_density_with_static_trajectories(fp, ax, [0], linewidth=2)
plt.tight_layout()
plt.show()

_images/8b553c06d3dc0fa5250dd73701dae049b5e77f88761b7cdbc224d41678a19eb7.png

Explanation#

The trajectory that starts at \(a=0\) stays at \(a=0\) and follows neither demonstration. This is due to two facts:

1. The flow is deterministic, which means that for a fixed starting point (i.e. initial action), the trajectory is fixed.
2. In this particular example, the demonstration trajectories are symmetric about \(a=0\). This causes the learned velocity field to be zero at \(a=0\) for all \(t \in [0, 1]\). Therefore, the trajectory sampled from \(a=0\) is pathological.

The sampled trajectory is not near the demonstration trajectories. Flow matching only guarantees that the marginal distribution of actions is matched at each timestep. Note that the probability of sampling exactly the pathological trajectory is zero, so the flow matching guarantees are still satisfied.
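The symmetry argument can be spelled out as a short derivation. It assumes the standard flow-matching identity for the optimal velocity field (a posterior-weighted average of the conditional velocities), which this notebook does not state explicitly. Write \(\sigma_t = \sigma_0 e^{-kt}\), and let \(\xi_R\) and \(\xi_L = -\xi_R\) denote the right and left demonstrations, each with prior \(\tfrac{1}{2}\). At \(a = 0\) the two Gaussian weights are equal, because \(\mathcal{N}(0 \mid \xi_R(t), \sigma_t^2) = \mathcal{N}(0 \mid -\xi_R(t), \sigma_t^2)\), so each demonstration receives posterior weight \(\tfrac{1}{2}\):

\[\begin{split} \begin{align*} v^*(a, t \mid h) &= \frac{\sum_{i \in \{R, L\}} \tfrac{1}{2}\, \mathcal{N}\left(a \mid \xi_i(t), \sigma_t^2\right) v(a, t \mid \xi_i)}{\sum_{i \in \{R, L\}} \tfrac{1}{2}\, \mathcal{N}\left(a \mid \xi_i(t), \sigma_t^2\right)}\\ \implies v^*(0, t \mid h) &= \tfrac{1}{2}\, v(0, t \mid \xi_R) + \tfrac{1}{2}\, v(0, t \mid \xi_L)\\ &= \tfrac{1}{2} \left( k\,\xi_R(t) + \dot{\xi}_R(t) \right) + \tfrac{1}{2} \left( -k\,\xi_R(t) - \dot{\xi}_R(t) \right)\\ &= 0 \end{align*} \end{split}\]

Since the velocity vanishes at \(a = 0\) for every \(t\), the constant path \(a(t) = 0\) solves the ODE, which is the trajectory shown in the plot above.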