TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 1
Training Passive Photonic Reservoirs with
Integrated Optical Readout
Matthias Freiberger, Student Member, IEEE, Andrew Katumba, Student Member, IEEE,
Peter Bienstman, Member, IEEE, and Joni Dambre, Member, IEEE
Abstract—As Moore’s law comes to an end, neuromorphic PD
approaches to computing are on the rise. One of these, passive
ADC
photonic reservoir computing, is a strong candidate for comput- reservoir ...
ing at high bitrates (>10 Gbps) and with low energy consumption.
Currently though, both benefits are limited by the necessity ADC
to perform training and readout operations in the electrical
domain. Thus, efforts are currently underway in the photonic ADC
community to design an integrated optical readout, which allows
ADC
to perform all operations in the optical domain. In addition to
the technological challenge of designing such a readout, new ADC
algorithms have to be designed in order to train it. Foremost, ...
suitable algorithms need to be able to deal with the fact that
the actual on-chip reservoir states are not directly observable. ADC
In this work, we investigate several options for such a training
algorithm and propose a solution in which the complex states of readout
the reservoir can be observed by appropriately setting the readout Fig. 1: Swirl reservoir and readout system of existing RC
weights, while iterating over a predefined input sequence. We
perform numerical simulations in order to compare our method photonic chip prototypes, with the nodes of the reservoir being
with an ideal baseline requiring full observability as well as with collected in the readout section, where optical output signals
an established black-box optimization approach (CMA-ES). are converted to electrical signals and then processed to a final
Index Terms—Cognitive Computing, Reservoir Computing, output. PD: photodiode, ADC: AD converter, MP: Micropro-
Photonic Computing, Neuromorphic Computing, Nonlinearity cessor. The blue and orange parts represent respectively the
Inversion, Integrated Optical Readout, Limited Observability. optical and electronic signals and components.
I. INTRODUCTION and header recognition tasks relevant for telecom applications
AS Moore’s law appears to come to an end [1], novel at data rates higher than 10 Gbps.
computing approaches, that go beyond traditional com- Nevertheless, the technology is still afflicted by a number
puting paradigms of silicon semiconductors are on the rise. of limitations, which prevent its application to problems on a
Among these, Reservoir Computing [2]–[4], a machine learn- larger scale. One of these limitations lies in the fact that the
ing technique proposed to perform brain-inspired computing, training, as well as the mixing of signals to solve a meaningful
has been applied in unconventional computing recently in task, has so far only happened in the electrical domain. Indeed,
attempts to overcome the Von Neumann bottleneck. While Vandoorne et al. [14] transferred the signal at each node from
promising results have been shown on various hardware plat- the optical to the electrical domain using a photodetector, and
forms [5]–[7], Photonic Reservoir Computing [8]–[13] makes then sent it through an AD converter. Finally, the required lin-
an especially strong case as a future-proof technology platform ear combination of signals and readout weights was performed
due to its potential to perform high-bandwidth tasks with little in the electrical domain using a microprocessor. See Figure 1
energy consumption. Especially, integrated passive photonic for a detailed illustration of the process.
reservoir computing [14]–[16] is a promising approach since In order to truly reap the benefits of optical computing
it exploits common CMOS fabrication technology and thus though, signals need to be processed at very high data rates
ensures an easy route to mass market production. Integrated in an energy-efficient way. Considering ecological as well
passive photonic reservoirs have been shown to solve parity bit as economical factors, minimizing power consumption is
of paramount importance for future computing technologies.
M. Freiberger and J. Dambre are with the Ghent University - imec From that perspective, the approach pursued in [14] for reading
IDLab, Department of Electronics and Information Systems, Technologiepark-
Zwijnaarde 15, B-9052 Ghent, Belgium. A. Katumba and P. Bienstman are out integrated photonic reservoirs is inefficient, since there is
with the Ghent University - imec Photonics Research Group, Department a significant energy and latency cost associated to it. Hence,
of Information Technology, Technologiepark-Zwijnaarde 15, B-9052 Ghent, it is desirable to perform the summing of signals in the
Belgium.
Email: Matthias.Freiberger@UGent.be optical domain instead of in the electrical domain. Using
Manuscript received 13.10.2017; revised 29.06.2018 such an integrated optical readout, only a single photodetec-
TNNLS-2017-P-8539.R1 ©c 2018 IEEE
arXiv:1810.03377v1  [cs.NE]  8 Oct 2018
MP
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 2
on real-valued signals, whereas here we need to use complex-
OM
bias valued weights on complex-valued signals.
A second possible solution could be to train the weights
OM
reservoir ... based on simulations of the behaviour of a virtual reservoir,
using photonic circuit simulation software, which will obvi-
OM
ously have full observability of all the nodes. However, the
PD
OM fabrication tolerances of these devices are such that the prop-
agation phase of two nominally identical waveguides could
OM be completely different. This prohibits the successful transfer
OM of weights trained using the idealized simulated reservoir
... to actual hardware. In summary: finding a suitable training
algorithm is not straight-forward.
OM
In this paper, we evaluate several approaches to train in-
tegrated photonic reservoirs without full state observability.
readout
We simulate the behavior of passive photonic reservoirs with
Fig. 2: Illustration of a swirl reservoir connected to a fully integrated optical readout and train the simulated readouts of
optical readout. Each optical output signal is modulated by these reservoirs to perform 3 bit header recognition. We eval-
an Optical Modulator (OM) implementing the weights. The uate the investigated approaches by comparing their achieved
optical outputs are then sent to a combiner structure where all bit error rates over a wide range of input signal bit rates.
signals are summed and finally converted to an electric output In Section II, we give an overview on our methodology and
signal using a photo diode. the simulation setup which is applied throughout the paper.
Thereafter, in Section III, we introduce a baseline approach
inspired by classic reservoir training methods, against which
tor, which receives the weighted sum of all optical signals, all other approaches will be evaluated. We apply CMA-ES, a
is required. A straightforward low-power optical weighting well known general-purpose black-box optimization approach,
element can take the form of a reverse-biased pn-junction. A in Section IV to train simulated reservoirs with integrated
better solution would be to use non-volatile optical weighting readout and evaluate a possibility of a combination of our
elements, such as the ones that are currently being developed baseline with CMA-ES in Section V. Finally, in Section VI
by several groups [17]–[19]. The necessary summing of the we propose a technique to estimate the complex-valued states
individual weighted reservoir state signals can be performed of the reservoir using only the final photodiode by running a
by a combiner tree, i.e. by summing signals pairwise using predetermined input sequence several times, while appropri-
an 2x1 optical combiner structure and repeating this step on ately setting the readout weights. The estimated complex states
the resulting intermediate signals until a single optical output can then be used for classic training algorithms on a digital
signal is obtained. Figure 2 further illustrates the concept of a computer and the resulting weights can be programmed on the
fully optical integrated readout. actual readout.
However, by employing such an integrated optical readout,
we lose direct observability of the states of the photonic
reservoir. Observing all states is mandatory though, in order to II. METHODOLOGY
use classical linear readout training algorithms such as ridge We evaluate the suitability of our proposed training ap-
regression [20] and other least-squares approaches. At first proaches by comparing the achieved bit error rates of sim-
glance, a possible solution could be to add a separate high- ulated reservoirs over a wide range of input signal bitrates.
speed photodetector to each reservoir node, which is only In more detail, we train the readout weights of simulated
used during training to observe the states. The weights would passive photonic reservoirs with integrated optical readout
then be calculated in the electrical domain, while the trained to perform the 3-bit header recognition task. Our simulation
reservoir could still be operated entirely in the optical domain. setup builds upon the setup in [16]: Katumba et al. inject an
This is unfortunately challenging due to a number of reasons. intensity-encoded bit signal into a simulated optical circuit
First, high-speed photodetectors tend to be costly in terms of corresponding to an integrated 4x4 passive photonic reservoir
chip footprint. Therefore, such an architecture would not scale using the swirl [14] architecture as as illustrated in Figure
well when increasing the number of nodes in the photonic 1. Further, they convert the complex-valued optical signals at
reservoirs to numbers common in classic echo state networks each of the 16 nodes of the reservoir, from the optical into the
[3]. Second, since passive photonic reservoirs make use of electrical domain using a previously introduced photodetector
coherent light for added richness, the tunable readout weights model that takes into account the responsitivity, bandwidth
in the optical domain have to be complex-valued as well. and background noise of an integrated optical photodetector.
However, unless we go to even more complicated coherent Finally, they arrange the 16 sampled electrical signals into
detectors, these photodiodes can only measure the intensities a reservoir state matrix and detect 1 bit delayed XOR bit
of the states and not their phases, so we cannot calculate the patterns in the bit sequence which has previously been fed
correct complex-valued weights. Note that this is different into the simulated photonic reservoir. In this work, contrary to
from the approach in [14], which used real-valued weights Katumba et al., we do not convert all the reservoir’s node
combiner structure
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 3
signals into the electrical domain prior to training. Instead 〈Id〉 = 0.1 nA, T = 300 K and RL = 1 MΩ. Finally, to model
we assume an integrated optical readout operating directly the limited bandwidth B of the integrated optical detector,
on the optical signals of the reservoir capable of weighting a fourth-order Butterworth low-pass filter is applied to the
and summing signals which are complex-valued by nature. resulting output signal.
By tuning the weights of this integrated optical readout, which We fix the delay time between any two connected nodes in
is simulated on the functional level, we are able to perform our simulated reservoir to 62.5 ps and assume the waveguide
meaningful computations by composing an optical signal from loss to be a rather pessimistic 3 dB/cm. We inject the input
the signals occurring at the individual reservoir nodes. This signal into the reservoir through nodes 5, 6, 9 and 10, where
optical signal is then converted from the optical into the node indices are ordered row by row and from left to right.
electrical domain using the detector model used in [16]. This input node configuration offers a good trade-off between
In more detail, we use Caphe [21] to simulate an optical performance and wiring effort [16].
circuit corresponding to an integrated 4x4 passive photonic As mentioned before, we use intensity modulation to encode
reservoir using the swirl [14] architecture. We subsample our bit patterns into the optical signals sent to the simulated
the intensity-modulated input signal 24 times per bit period, reservoir. In more detail, we distribute a total power ptotal =
smoothen it using a single-pole low-pass filter, and simulate 0.1 Watt over the 4 input nodes, which leaves the maximal
the response of our optical integrated swirl circuit using Caphe. amplitude power per node input signal at 0.1W
4 = 0.025W.
We obtain a sampled complex output signal of each reservoir Moreover, we have found that training our reservoirs with
node as result, denoting amplitude and phase of the optical a bias term is beneficial to classifier performance. Thus we
signal at that node at a certain instant in time. We arrange the implement an additional optical bias line which does not
sampled complex optical signals at each reservoir node as a lead into the reservoir, but directly into the integrated optical
complex state-node matrix X ∈ C and simulate the integrated readout (see Figure 2). This line carries a constant input power
optical readout by computing an inner product between this of 0.02W. Every training algorithm in this paper treats the bias
matrix and a complex weight vector w, which represents like an additional reservoir state signal, with the exception that
the complex optical weights. The resulting complex-valued the bias signal is not regularized during training. Assuming an
signal is fed into a photodetector model to obtain the electrical equal occurrence probability of symbols 1 and 0 in the input
output signal of the integrated photonic reservoir. This output signal, the average signal power to drive the reservoir amounts
power signal is then downsampled at a predetermined optimal to 0.1W · 0.5 + 0.02W = 0.07W. The power consumption for
sampling point and thresholded in order to obtain a clean actual implementations of an integrated optical readout with
binary output bit sequence. We place a threshold T in the possibly active weighting elements needs of course also to be
middle of the signal range of our output signal y[n] as taken into account, but is beyond the scope of this work.
T = P5(y[n]) + (P95(y[n])− P5(y[n]))/2, (1) Since we seek to find a training algorithm that works well
over a wide range of dynamics, we assess the performance
where P5(y[n]) and P95(y[n]) are the 5th and 95th percentile of our classifiers by training integrated photonic reservoirs
of y[n] respectively. We simulate the reservoir’s optical readout excited by input signals over a wide range of bitrates. We
as sweep the bit rate of the input signal in 1 Gbps steps between
y = σ(Xw), (2) 1 and 31 Gbps.
where X ∈ CN×F is the matrix of complex reservoir states, As a machine learning task to assess the performance of our
containing N samples of the complex signal occurring at F classifiers, we use the header recognition task. We expect the
reservoir nodes of an integrated passive photonic reservoir. reservoir to present 1 at the output whenever a certain sought
w ∈ CF×1 is a vector holding the complex weights of the header bit sequence occurs in the input signal and 0 otherwise.
integrated optical readout. σ(a) : CN → RN is the mapping More precisely, given an input signal u[n] and a predefined
from the optical to the electrical domain that is realized by header bit pattern h[n] we define our ideal desired signal
the photodetector. In this paper, the photodetector model of dideal[n] to be
[16] is used for all simulations and experiments. This model { ∑
if M−1
1 Jh[(M − 1)−m] = u[n−m]K = M
computes the electric current of a sampled complex signal a d m=0
ideal[n] =
as 0 else
i(a) = R|a|2, (3) (5)
where the notation above is Iversons bracket notation, defined
where R is the responsitivity of the photodetector. Thereafter as {
a zero-mean Gaussian noise vector n with a variance σ2
n is 1 if P true
added to i(a). The variance σ2
n is computed as JP K = (6)
0 else
σ2
n = 2qB(〈I〉+ 〈Id〉) + 4kBTB/RL (4)
∑ and M is the length of h[n] in bits. We start out setting
where q is the elementary particle charge, B is the bandwidth M = 3 and h[n] = δ[n] + δ[(n − 2)], where δ denotes
of the photodetector, 〈I〉 N
= 1
N n i[n] is the photocurrent, the dirac delta function, thus seeking to detect the header
〈Id〉 is the dark current, kB is the Boltzmann constant, T is bit pattern ”101”. We use this pattern throughout this work
the temperature and RL is the load impedance of the photode- as we discuss our findings. Thereafter, to demonstrate that
tector. In our simulations we set R = 0.5 A
W , B = 25 GHz, our introduced approaches exhibit consistent performance on
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 4
the task, we train classifiers to detect all possible 3 bit function. Consequently, if we train our readout weights in a
headers. While it is more common in recent literature [13], classical way, using ridge regression [20], the desired product
[14] to compute errors in a winner-takes-all fashion jointly vector d of states and readout weights will be transformed by
for all trained classifiers and patterns, we analyze the detection the detector output function. In more detail if we train w as
performance of each header bit pattern individually. We argue H 2 −1 H
that individual classifiers can easier be analyzed employing w = (X X + α I) X d, (10)
the latter approach, especially when investigating reservoir where α ∈ R is the regularization strength and I ∈ RF×F is
performance over a large range of bitrates. the identity matrix. Assuming w is ideal,
We train our reservoir readouts using a modified version of
the ideal desired signal σ(Xw) = σ(d). (11)
d[n] = dideal[n] · ptotal, (7) Therefore, at the model output, we obtain
where again ptotal = 0.1 W, the maximal attainable power of y = σ(Xw) = σ(d). (12)
the input signal. Since we would like our model to output d rather than
We generate 10010 random bits as training data as well as
σ(d), we need to find an approximate inversion of the readout
10010 random bits of test data. Since we remove the reservoir nonlinearity σ̂−1 to invert σ such that
signals generated by the first 10 input bits for both training
and test states in order to minimize the impact of transients, σ(σ̂−1(d)) ≈ d. (13)
this leaves us with 10000 bits of training data as well as 10000
bits of test data. To account for manufacturing variations, in We therefore train w as
general, we simulate each reservoir 10 times with identical w = (XHX + α2I)−1XH σ̂−1(d), (14)
train and test input, but different random phase configurations
of the waveguides between nodes, as well as of the waveguides We can approximate the detector model in [16] (for details,
feeding the input signals to the nodes. Subsequently we train see Section II) as a function in closed form, if we neglect its
the output weights of the simulated readout for each instance. bandlimiting lowpass filter for simplicity :
To compare the performance of our trained classifiers, we
σ(a) = R|a|2 + n. (15)
use the bit error rate (BER). The BER is defined as
N Again, R denotes the responsitivity of the photodetector and
1 ∑
ebit = JyT [n] =6 dideal[n]K, (8) n is a noise vector. One can see that σ(a) cannot be inverted
N
n=1 exactly, primarily since we take the absolute value of a, and
where y [n] is the subsampled, thresholded output signal of also due to the added, unknown noise vector. Nevertheless we
T
the reservoir and dideal[n] is again the ideal desired signal. can approximate the inverse of th√e detector function above as
With 10000 bits of test data, the minimal detectable bit −1 a
error rate with a confidence level of ≈ 90% is 10−3 [22]. σ̂ (a) = . (16)
R
Before computing the bit error rate on the test data, for all
trained classifiers we determine the optimal sampling point By doing so, we minimize the sum of squared errors
minimizing the bit error rate on the train data, which we ∑[ √ ]
N 2
(n) − d[n]
then use to downsample the test prediction and compute the (x w) (17)
bit error rate as described above. We have found that this R
n=0
additional step leads to more robust results for all classifiers where we denote the row vector with index n of X as x(n).
investigated. While this approach obviously can not be used on actual
devices due to the fact that the optical signals on the chip are
III. BASELINE: COMPLEX-VALUED RIDGE REGRESSION not observable, its similarity to classic, real-valued training
Consider again the model of an integrated optical readout approaches makes it a suitable candidate to be used as a
as introduced in Section II, defined as baseline to assess and compare novel training approaches for
(9) actual integrated reservoirs. Whenever we refer to this baseline
y = σ(Xw).
we call it plainly complex-valued ridge regression. Whenever
X ∈ CN×F is the matrix of complex reservoir states, contain- we train classifiers using complex-valued ridge regression, we
ing N samples of the complex signal occurring at F reservoir use 5-fold cross validation to find a suitable regularization
nodes of an integrated passive photonic reservoir. w ∈ CF×1 is parameter α for an optimal training result.
a vector holding the complex weights of the integrated optical We assess the performance of this baseline by training
readout. σ(a) : CN → RN is the mapping realized by the integrated photonic reservoirs for different input signal bitrates
photodetector of the readout. on the 3-bit header recognition task. Figure 3 shows the
Contrary to classic reservoir readouts, the readout weights achieved bit error rates.
of this model do not lie after the reservoir’s nonlinear detector As one can see, our proposed baseline works well for
function, but before it. This implies that any result of the dot bitrates between 2 and 20 Gbps with minor increases of the
product Xw of the model will be passed through this nonlinear error for bitrates at 14 Gbps and 17 Gbps.
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 5
Error Rate vs Bit Rate Error Rate vs Bit Rate
100 100
10−1 10−1
10−2 10−2
10−3 10−3
Baseline
CMA-ES
10−4 10−4
0 5 10 15 20 25 30 0 5 10 15 20 25 30
Bit Rate[Gbps] Bit Rate[Gbps]
Fig. 3: Bit error rates as a function of input signal bitrate for Fig. 4: Bit error rates as a function of input signal bitrate for
our baseline approach, complex-valued ridge regression. Clas- the black-box approach, CMA-ES. Baseline approach is shown
sifiers have been trained to perform 3 bit header recognition for comparison. Classifiers have been trained to perform 3 bit
(pattern 101) on a 4x4 passive photonic swirl. All results are header recognition (pattern 101) on a 4x4 passive photonic
averaged over 10 different reservoirs, the minimal detectable swirl. All results are averaged over 10 different reservoirs, the
error rate is 10−3. minimal detectable error rate is 10−3.
IV. BLACK-BOX OPTIMIZATION from the reservoir’s subsequent output and feeding that error
measure back to the CMA-ES algorithm. The algorithm tra-
Since the states of our integrated photonic reservoir are not verses the loss function space suggesting new candidates to
observable, a straight-forward approach is to train the read- minimize the loss function. We encode the real and imaginary
outs using a black-box optimization approach. Among these, parts of the complex weight vector w ∈ C into a real-valued
Convergence Matrix Adaption - Evolution Strategy (CMA-ES) vector ( )
[23] appears to be a suitable candidate, since it usually deals ′ Re(w)
well with non-convex search spaces, which are likely to occur w = Im (18)
(w)
for our problem since we perform optimization in the complex
domain. Evolution strategies solve optimization problems in an prior to handing it to the CMA-ES algorithm. The inverse
iterative way using a repeated two-step process based on of the transformation is applied to weight vectors suggested by the
variation and selection of possible solutions. In the variation algorithm before setting them to the readout. We initialize the
′
step, new candidate solutions are generated by adding noise to algorithm with a zero vector w0 = 0 and perform sweeps
−5
the currently known best solution. This noise is drawn from a over the initial variance in steps of one decade between 10
2
normal distribution called the mutation distribution. Thereafter, and 10 . As recommended in [24], we set the population size
newly generated candidate solutions are evaluated using a to 4 + b3 log(F )c. Note that we might achieve better results
predefined cost function. Candidate solutions which perform by cross-validating initial variance and population size, from
better than the previously best known solution are selected which we refrain due to the typical very long training times
and used to update the parameters of the mutation distribution. of CMA-ES. As an objective function, we minimize the sum
Using the updated best known solution as well as the updated of squared errors
mutation distribution, the steps described above are repeated. ∑N
2
The overall process can be repeated until a satisfactory solution eSSE = (y[n]− d[n]) . (19)
to the given optimization problem has been found. The exact n=1
rules how the mutation distribution is updated vary inbetween We assess the performance of CMA-ES by training inte-
evolution strategies. CMA-ES adapts the mean value and grated photonic reservoirs with different input signal bitrates
covariance matrix of the used mutation distribution such that on the 3-bit header recognition task. Figure 4 shows the
the probability to draw updates which lead to previously resulting error rates (results are averaged over 10 different
selected solutions increases. The underlying rationale is that reservoirs).
conducting updates which lead to good solutions in the past As one can see, CMA-ES performs only slightly worse than
might result in even better solutions when performed again. the baseline, achieving the minimal detectable bit error rate of
For details see [23]. 10−3 for bitrates between 4 and 17 Gbps. Therefore, it is in
Integrated photonic reservoirs can be trained using CMA-ES principle capable of training integrated photonic reservoirs.
by transferring a candidate solution suggested by the algorithm However, because CMA-ES typically involves high training
to the reservoir’s readout weights, presenting the training bit time and requires many iterations of the input data, we run an
sequence to the reservoir, computing a chosen loss function additional experiment investigating its convergence behavior.
Error Rate[.]
Error Rate[.]
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 6
Maximum Phase Perturbation
Error Rate vs Evaluations 0 0.1π 0.2π 0.3π > 0.3π
100 < 0.001 0.506 0.643 0.673 ≈ 0.70
TABLE I: Bit error rate for the 3 bit header recognition
10−1 task (pattern 101) for increasing phase perturbations in the
reservoir’s waveguides (readout trained in simulation with
10−2 complex-valued ridge regression).
10−3 2) We create simulation models of 10 possible physical
instances of the previous reservoir that reflect possible
−4 waveguide phase variations in a physical reservoir due to
10
0 200 400 600 800 1000 manufacturing. This is done by adding random variations
Number of evaluations[.] ηi and ηc to the phase configurations of input and con-
nection waveguides of the nominal reservoir. The phase
Fig. 5: Bit error rate as a function of iterations over the training variations are drawn from a uniform distribution U(0, b),
data for CMA-ES. Classifiers have been trained to perform 3 where b represents the maximal phase perturbation in
bit header recognition (pattern 101) on a 4x4 passive photonic each waveguide.
swirl. All results are averaged over 10 different reservoirs, the 3) We reapply the previously trained readout weight vector
minimal detectable error rate is 10−3. to each of these physical reservoir models and record
the resulting bit error rate.
This procedure is repeated for for all perturbations
We simulate a passive photonic reservoir and train it with b ∈ {0.1π, 0.2π, 0.3π, 0.4π, 0.5π, 0.6π, 0.7π, 0.8π, 0.9π, π}.
CMA-ES where we record the error rate at each iteration. We In addition, for each value of b, the resulting bit error rates are
again average our results over 10 simulated reservoirs driven averaged across 10 different instances of the nominal reservoir
by a bit rate of 10 Gbps to obtain the graph seen in Figure 5. (each with different random phases of all connections).
The results of this experiment show that the full input Table I summarises the results. The bit error rate already
training sequence needs to be presented about 500 times before increases by two orders of magnitude for amounts of ran-
CMA-ES reaches satisfactory results. Since a short training dom phase noise that are currently well below fabrication
process on the actual hardware is mandatory for our devices tolerances. This renders a pretraining-retraining approach chal-
in order to obtain mass-market maturity, a way to drastically lenging. In the next section, we therefore pursue a radically
reduce the number of necessary iterations over the input is different approach, in which we estimate the reservoir’s states
necessary. Since stand-alone CMA-ES training converges too through the available photodetector in order to train a weight
slowly, a promising alternative is the pretraining of models in vector from the actual states on a digital computer. This weight
simulation and the refining them using CMA-ES on the actual vector can then be transferred back to the hardware where no
devices to speed up the training process. We investigate the significant increase in error is to be expected.
feasibility of such a pretraining-retraining approach in the next
section.
VI. NONLINEARITY INVERSION
V. PRETRAINING IN SIMULATION Consider again the model of an integrated optical readout
as introduced in Section II, defined as
As mentioned in Section I, training reservoirs in simulation
allows full observability. However the resulting weights are y = σ(Xw), (20)
not directly transferable to hardware, due to the fabrication
tolerances of integrated photonic reservoirs. However, if the the closed form approximation of the photodetector function
changes in reservoir output signals due to process variations
2
are small enough, a pretraining approach could be useful if σ(a) = R|a| + n, (21)
weights trained in simulation still perform considerably better
than random weights. In such a case, one could use them to as well as its approximate inversion
initialise, e.g., the training with CMA-ES, leading to much √
a
faster training convergence. In this section, we quantify how σ̂−1(a) = , (22)
phase variability of in our reservoirs affects the quality of R
weights trained in simulation. Our approach is as follows: introduced in Section III.
1) We train the weights of a nominal simulated reservoir σ̂−1 can be used to estimate the reservoir state matrix |X|
with integrated readout using complex-valued ridge re- when it cannot be observed directly.
gression. We choose the bit rate of the input signal to Indeed, taking a closer look at Equation 20, one can see
be 5 Gbps, based on Figure 3. that it is possible to observe the powers of the state matrix X
Error Rate[.]
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 7
Pk exp(jφk)+Pl exp(jφl). The absolute value of their relative
OM
bias phase difference φkl = φk − φl can be computed using the
phase estimation equation ((see Appendix A fo)r derivation)
OM
reservoir ... 2 2 2
| P
φ | = arccos kl − (Pk + Pl )
kl . (27)
OM 2PkPl
PD
OM Due to the fact that arccos(x) is injective in the sense that
for any given input x there are two possible solutions on
OM the interval [−π, π], using Equation 27 we can only find the
absolute value |φkl|, while the sign of (φkl) remains unknown.
OM
... To resolve this issue, one performs this estimation of φkl
twice with different values Pkl and P ′kl: While Pkl is being
OM computed as shown before, a phase difference of π
2 is added
to state k such that
readout
′ | π
P = P 2
Fig. 6: Illustration of modulus (light intensity) observation kl k exp(jφk) exp(j ) + Pl exp(jφl)|
2 (28)
procedure. The weight highlighted in red is set to 1, all =|Pk exp(jφ′k) + Pl exp(jφl)|2.
remaining weights are set to 0, the observable output is the By comparing the estimates of |φ
square of the modulus of the corresponding reservoir state. ( kl| and
′ )
2
| ′ | Pkl − (P 2
k + P 2
l )
φkl = arccos , (29)
2PkPl
through an appropriate selection ofthe input weight vector w.
If we choose we are able to infer {the sign of φkl.
1 ′
  |φkl| if |φkl| ∈ [0, π ]
0 φkl = 2 (30)
w =  .. (23) −|φkl| else
.
0 (see Appendix A for proof). For illustration, see Figure 7.
Since
as a weight vector and present an input signal u[n] to the input
of the integrated photonic reservoir, we obtain |Pk exp(jφk) + Pl exp(jφl)| = |Pk + Pl exp(jφkl)| (31)
y = σ(x1) = R|x1|2 + n, (24) and thus in consequence
∀q 6=k
at the output of the integrated readout, where x1 is the first ∑
|P exp(jφ ) + P
column of the state matrix . Since we can observe | |2 k k ∑ q exp(jφq)| =
X R x1 + (32)
n, we can estimate the modulus |x1| of ∀q=6 k
|Pk + Pq exp(jφkq)|,
x1 = |x1| exp(j arg(x1)). (25)
we can pick a certain state x(t,k) as reference state and
If we neglect n, assuming we know R, we can apply σ̂−1 determine the relative phase φkq between the value of this
to y and estimate |x1| as √ state x(t,k) and the value of every other state x(t,q).
After having estimated each reservoir state this way, we
| n
xˆ1| = σ̂−1(R|x |21 + n) = |x1|2 + . (26) apply complex-valued ridge regression to find the optimal
R weights for our reservoir.
Since the above operation is an approximate inversion of In summary, after presenting the same input 3F − 2 times,
the nonlinearity of the photodetector, we call our method with F being the number of output channels, we are able to
nonlinearity inversion. measure the full complex time evolution of each of the F
Repeating this procedure for every channel of the read- output channels, even though we only have a single detector.
out/column of the state matrix, we are able to estimate the This information can then be used to calculate the required
moduli of state values in the state matrix |X|. See Figure 6 weights in a single pass in software on a digital computer,
for illustration. after which the computed weights are transferred back to
Nevertheless, in order to tune the weights of the integrated the readout. For our simulated 16 node reservoirs, taking
optical readout, we need information about the arguments (the into account the bias channel, which can be treated like any
phases) of |X| as well. Absolute phase information about the other readout channel, this requires to present the input data
states is lost as soon as it passes the photodetector. However, 3F − 2 = 3 × (16 + 1) − 2 = 49 times to the reservoir.
only the relative phases between states are of interest since As we have seen in Section IV, 3F − 2 measurements is
they influence the output sum of the readout. much less than what is typically necessary using CMA-ES.
Consider two given complex state values x(t,k) and x(t,l) In addition, it is also a deterministic number, in contrast to
at a certain instant in time t, with moduli Pk and Pl as a black-box optimisation technique, for which it is hard to
well as the modulus Pkl of their sum Pkl exp(jφkl) = determine beforehand how many iterations will be needed.
combiner structure
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 8
OM Error Rate vs Delay
bias 100
OM
reservoir ...
10−1
OM
PD
OM 10−2
OM
− Baseline
10 3
OM CMA-ES
... NL-Inv
10−4
OM
0 5 10 15 20 25 30
Bit Rate[Gbps]
readout
Fig. 7: Illustration of phase estimation procedure. The weights Fig. 8: Bit error rates as a function of input signal bitrate for
highlighted in red are set to 1, the remaining weights are our proposed approach, nonlinearity inversion. Baseline and
set to 0. This results in the power of the summed states at CMA-ES approaches are shown for comparison. Classifiers
the output, which allows us to calculate the argument (phase have been trained to perform 3 bit header recognition (pattern
angle) between states of the highlighted channels via the 101) on a 4x4 passive photonic swirl. All results are averaged
relationship of the sum power and the powers of the individual over 10 different reservoirs, the minimal detectable error rate
states obtained in the previous step. is 10−3.
In our experiments, we simulate the repeated measurements on the other hand, the performance varies more in between
for the nonlinearity inversion procedure by setting the cor- bit patterns, where it performs very well for some patterns
responding rows of the weight vector in the readout model such as ’000’ and ’100 ’and slightly less well for others such
according to our estimation procedure. For every setting of the as ’011’. Nevertheless, CMA-ES attains the minimal error
weight vector we apply the complete model of our readout as rate for all bitpatterns in a range between 6 and 16 Gbps.
described in Section II. We collect the corresponding output Interestingly, it manages to minimize the error rate for the
signal from the detector and replace any samples of the output ’100’ pattern between 20 and 22 Gbps where our baseline
signal smaller than 0 (which might occur due to noise or performs two orders of magnitude worse. This is consistent
ringing of the bandlimiting filter of the photodetector) with 0. with the performance of our nonlinearity inversion approach
Thereafter X is estimated using Equations 22, 27 and 30. We for that pattern, which also manages to minimize the error
again train integrated photonic reservoirs with input bit rates rate in the same region contrary to the baseline. The overall
between nodes on the 3-bit header recognition task to asses performance across bitrates is slightly smaller than for the two
the performance of our proposed method. Figure 8 shows the former approaches, still, nonlinearity inversion minimizes the
bit error rate as function of the input signal bit rate. error rate for the given task on all possible header bit patterns
The nonlinearity inversion approach performs only slightly on a range between 6 and 14 Gbps, and is thus almost on par
worse than the the CMA-ES approach and the complex- with CMA-ES here. In a nutshell, since nonlinearity inversion
valued ridge regression baseline. It is remarkable though, that shows only slightly worse performance than CMA-ES, while
for some bitrates, for instance at 14 Gbps, the nonlinearity requiring significantly less (3F − 2 = 3× (16 + 1)− 2 = 49
inversion approach outperforms the baseline. As the non- times) iterations of the input data, it appears to be the most
linearity inversion approach operates on an estimate of the suitable among the training approaches for integrated photonic
states used by the baseline, one would expect it to perform reservoirs that have been investigated in this paper.
at best just as well as the baseline. A possible explanation
for this phenomenon is that the noise introduced by the
detector model in the estimation step acts as an additional VII. CONCLUSION
regularizer for training. To provide a comprehensive picture on In this paper, we have assessed the suitability of different
the investigated training approaches, we have performed the training methods for passive photonic reservoirs with inte-
last experiment measuring the performance when recognizing grated optical readout. The most important boundary condition
all possible bit patterns in a 3 bit header. Figure 9 shows for this type of hardware is the fact that the internal reservoir
the results for all approaches discussed in this work. The states are not directly observable in the electronic domain. In
results for all headers are largely consistent with the results we addition, the variability between individual devices makes it
already observed for bit pattern ’101’: Our baseline approach necessary to perform the training on the actual hardware. For
performs consistently well over a wide range of bitrates for this reason, our analysis has focused on the total training time
all bitpatterns, approximately between 2 and 18 Gbps, with while staying below the maximally allowed bit-error rate of
minor increases in error at 14 and 17 Gbps. For CMA-ES 10−3.
Error Rate[.]
combiner structure
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 9
recorded output signals allow us to estimate the amplitude and
Baseline
100 phase of the reservoir’s states within 3F − 2 iterations over
the training data. We have shown that this method performs
10−1 as well as a classic training approach, which requires full
observability, although for a more narrow bitrange. While the
10−2 CMA-ES black box algorithm performs slightly better than
our own method with respect to task performance, our method
10−3 requires significantly fewer iterations over the input data. We
conclude that nonlinearity inversion is a suitable candidate for
10−4 training integrated passive photonic reservoirs.
0 5 10 15 20 25 30 While all our approaches are roughly comparable with
Bit Rate[Gbps] respect to task performance, we find that the baseline approach
still outperforms the remaining approaches. On has to take
CMA-ES
100 into account here though, that the baseline approach requires
full observability and thus can not be implemented without
10−1 considerable hardware effort which affects the scalability
of our systems. In contrast, CMA-ES and our nonlinearity
10−2 inversion approach exhibit slightly worse task performance,
but are still applicable for a large range of bit rates. It is
10−3 remarkable that our approach exhibits better results than the
baseline at certain bit rates, especially since it operates on
10−4 an estimate of the states used by the baseline. A possible
0 5 10 15 20 25 30 explanation for this phenomenon is that the noise introduced
Bit Rate[Gbps] by the detector model in the estimation step acts as an
additional regularizer for training. In future work, we will
Nonlinearity Inversion
100 further investigate the reasons behind this better performance
000 in order to come up with a more accurate but still efficient way
−1 001
10 of training the weights based on the estimated states obtained
010 from nonlinearity inversion. Moreover we will investigate the
−2 011
10 performance on actual hardware implementations of integrated
100 optical readouts as soon as they are available. While we have
10−3 101 assumed in this work that amplitude and phase of the readout
110 are largely independently tunable, we are aware that this will
10−4 111 not be perfectly realizable for all hardware implementations,
0 5 10 15 20 25 30 especially considering nonvolatile tunable weights. We will
Bit Rate[Gbps] address this problem in future work which could be met for
instance by refining weights trained by nonlinearity inversion
Fig. 9: Bit error rates as a function of input signal bitrate using reinforcement learning.
for all possible 3 bit headers as performed by our baseline,
CMA-ES and nonlinearity inversion approaches. Classifiers APPENDIX A
have been trained to perform 3 bit header recognition on a DERIVATION OF THE PHASE ESTIMATION PROCESS
4x4 passive photonic swirl. All results are averaged over 10 Two given optical signals of our reservoir states x(t,k) and
different reservoirs, the minimal detectable error rate is 10−3. x(t,l) can be represented as
x(t,k) = Pk exp(jφk) (33)
In the process, we found that CMA-ES delivers acceptable
performance, but has to be ruled out as a practical training and
method due to its long training times. Due to the high x(t,l) = Pl exp(jφl) (34)
fabrication tolerances of integrated photonic reservoirs, the respectively. Given Pk, Pl , as well as the squared sum of both
use of pretraining-retraining as a way to reduce these training signals
times does also not seem very promising. As an alternative, P 2
kl = |Pk exp(jφk) + Pl exp(jφ ))|2l (35)
we have proposed a novel method for training such reservoirs
which we call nonlinearity inversion. Our method essentially we want to gain information on the state phases φk and φl.
resolves the issue of limited observability of the states of Applying Eulers identity on the previous equation, separating
an integrated passive photonic reservoir by estimating the real and imaginary parts and computing the absolute value
reservoir’s states through a single photodetector at its output. gives us
In more detail, we iterate over the training data several times P 2
kl = [Pk cos(φk)+Pl cos(φl)]
2 +[Pk sin(φk)+Pl sin(φ 2
l)] .
while setting the weights according to a certain pattern. The (36)
Error Rate[.] Error Rate[.] Error Rate[.]
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 10
After expanding the squared brackets, we get If φkl has been mapped correctly in Equation 47, then 0 ≤
2 2 2 φkl ≤ π. In consequence, (0− π ) ≤ φ′ ≤ (π− π ), and thus
Pkl =Pk cos(φk) + 2PkPl cos(φl) cos(φ 2 kl 2
k)+ will be mapped on the interval [0, π2 ] by the arccos function.
P 2
l cos(φ )2l + P 2
k sin(φk)2+ (37) In all remaining cases, φkl has been mapped wrong and the
2P P sin(φ ) sin(φ ) + P 2
k l l k l sin(φl)
2. sign of the result of Equation 47 needs to be reversed. This
leaves us with a simp{le rule to determine the sign of φkl from
By setting
φ′ :
sin(φ 2
l) = 1− cos(φ 2 kl
l) (38)
|φ | if |φ′kl kl| ∈ [0, π2 ]
and applying the trigonometric identities φst = (50)
−|φkl| else.
1 1
cos(φl) cos(φk) = cos(φl − φk) + cos(φl + φk) (39)
2 2 ACKNOWLEDGMENT
and This research was funded by the EU Horizon 2020
1 1 PHRESCO Grant (Grant No. 688579) and the BELSPO IAP
sin(φl) sin(φk) = cos(φl − φk)− cos(φl + φk), (40)
2 2 P7-35 program Photonics@be.
we obtain
P 2 = P 2 REFERENCES
kl k + P 2
l + 2PkPl cos(φl − φk). (41)
[1] W. Haensch, E. Nowak, R. Dennard, P. Solomon, A. Bryant, O. Doku-
As expected, we can see in the last equation that the output of maci, A. Kumar, X. Wang, J. Johnson, and M. Fischetti, “Silicon cmos
the system only depends on the difference of the input signal devices beyond scaling,” IBM Journal of Research and Development,
phases, but not the absolute phases themselves. Thus if we vol. 50, no. 4/5, p. 339, 2006.
[2] W. Maas, T. Natschlaeger, and H. Markram, “Real-time computing
consider without stable states: a new framework for neural computation based
|P + P exp(j(φ − φ ))|2 (42) on perturbations,” Neural Computation, vol. 14, no. 11, pp. 2531–2560,
k l l k 2002.
we get [3] H. Jaeger and H. Haas, “Harnessing nonlinearity: predicting chaotic
systems and saving energy in wireless communication,” Science, vol.
P 2 + 2P P cos(φ − φ ) + P 2 cos(φ − φ )2+ 304, no. 5667, 2004.
k k l l k l l k
(43) [4] D. Verstraeten, B. Schrauwen, and D. S. M. D’Haene, “An experimental
P 2 sin(φl − φ 2
k) unification of reservoir computing methods,” Neural Networks, vol. 20,
l
no. 3, pp. 391–403, 2007.
which, after substitution with Equation 38 again yields [5] J. Bürger and C. Teuscher, “Variation-tolerant computing with mem-
ristive reservoirs,” in Nanoscale Architectures (NANOARCH), 2013
P 2 + P 2 + 2P P cos(φ − φ ) = P 2
k l k l l k kl (44) IEEE/ACM International Symposium on, 2013.
[6] E. Demis, R. Aguilera, H. Sillin, K. Scharnhorst, E. Sandouk, M. Aono,
and thus A. Stieg, and J. Gimzewski, “Atomic switch networks-nanoarchitectonic
design of a complex system for natural computing,” Nanotechnology,
| vol. 26, no. 20, p. 204003, 2015.
Pk exp(jφk) + Pl exp(jφl)|2 = |Pk + Pl exp(j(φ 2
l − φk))| . [7] M. Hermans, M. Soriano, J. Dambre, P. Bienstman, and I. Fischer,
(45) “Photonic delay systems as machine learning implementations.” Journal
Therefore we can just set the phase φ as a reference point of Machine Learning Research, vol. 16, pp. 2081–2097, 2015.
k
for our system and restructure Equation 41 such that [8] G. V. der Sande, D. Brunner, and M. Soriano, “Advances in photonic
reservoir computing,” Nanophotonics, vol. 6, no. 3, pp. 561–576, 2017.
P 2 − (P 2 + P 2) [9] L. Larger, M. Soriano, D. Brunner, L. Appeltant, J. Gutiérrez, L. Pes-
cos(φ − φ ) = cos(φ − φ kl k l quera, C. Mirasso, and I. Fischer, “Photonic information processing be-
k l k l) = . (46)
2PkPl yond turing: an optoelectronic implementation of reservoir computing,”
Optics express, vol. 20, no. 3, pp. 3241–3249, 2012.
Note that, since φkl = φl − φk and φlk = φk − φl mirror [10] A. Smerieri, F. Duport, Y. Paquot, M. Haelterman, B. Schrauwen, and
each other along the real axis, their cosines are identical. S. Massar, “Towards fully analog hardware reservoir computing for
Consequently, when applying an inverse cosine operation to speech recognition,” in AIP Conference Proceedings, 2012.
[11] M. Fiers, T. V. Vaerenbergh, F. Wyffels, D. Verstraeten, B. Schrauwen,
Equation 46 we can not determine the sign of φkl but its J. Dambre, and P. Bienstman, “Nanophotonic reservoir computing with
magnitude via ( ) photonic crystal cavities to generate periodic patterns,” IEEE transac-
tions on neural networks and learning systems, vol. 25, no. 2, pp. 344–
| − | P 2 − (P 2 + P 2) 355, 2014.
φk φl = |φkl| = arccos kl k l . (47)
2P P [12] K. Vandoorne, J. Dambre, D. Verstraeten, B. Schrauwen, and P. Bi-
k l enstman, “Parallel reservoir computing using optical amplifiers,” IEEE
We refer to Equation 47 as the phase estimation equation. transactions on neural networks, vol. 22, no. 9, pp. 1469–1481, 2011.
[13] J. Qin, Q. Zhao, H. Yin, Y. Jin, and C. Liu, “Numerical simulation
To find the actual sign of φkl, additional information needs and experiment on optical packet header recognition utilizing reservoir
to be found. Assume we know not only P , but also P ′kl kl computing based on optoelectronic feedback,” IEEE Photonics Journal,
defined as vol. 9, no. 1, pp. 1–11, 2017.
[14] K. Vandoorne, P. Mechet, T. V. Vaerenbergh, M. Fiers, G. Morthier,
P ′
π
=|P exp(jφ ) exp(j ) + P exp(jφ )|2 D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, “Exper-
kl k l l l
2 (48) imental demonstration of reservoir computing on a silicon photonics
=|P exp(jφ′ ) + P exp(jφ )|2. chip,” Nature Communications, vol. 5, no. 3541, 2014.
k k l l [15] A. Katumba, P. Bienstman, and J. Dambre, “Photonic reservoir com-
Then, puting approaches to nanoscale computation,” in Proceedings of the
′ π Second Annual International Conference on Nanoscale Computing and
φkl = φl − φk − . (49) Communication, 2015.
2
TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. X, X 201X 11
[16] A. Katumba, M. Freiberger, P. Bienstman, and J. Dambre, “A multiple- Peter Bienstman was born in Ghent, Belgium, in
input strategy to efficient integrated photonic reservoir computing,” 1974. He received a degree in electrical engineering
Cognitive Computation, vol. pp1-8, 2017. from Ghent University, Belgium, in 1997 and a
[17] S. Abel, T. Stferle, C. Marchiori, C. Rossel, M. Rossell, R. Erni, Ph.D. from the same university in 2001, at the
D. Caimi, M. Sousa, A. Chelnokov, B. Offrein, and J. Fompeyrine, Department of Information Technology (INTEC),
“A strong electro-optically active lead-free ferroelectric integrated on where he is currently a full professor. His research
silicon,” Nature Communications, vol. 4, p. 1671, 2013. interests include several applications of nanopho-
[18] C. Rı́os, M. Stegmaier, P. Hosseini, D. Wang, T. Scherer, C. Wright, tonics (biosensors, photonic information processing,
H. Bhaskaran, and W. Pernice, “Integrated all-photonic non-volatile ...) as well as nanophotonics modelling. He has
multi-level memory,” Nature Photonics, vol. 9, no. 11, pp. 725–732, published over 110 papers and holds several patents.
2015. He has been awarded a ERC starting grant for
[19] B. V. Bilzen, P. Homm, L. Dillemans, C. Su, M. Menghini, M. Sousa, the Naresco-project: Novel paradigms for massively parallel nanophotonic
C. Marchiori, L. Zhang, J. Seo, and J. Locquet, “Production of vo 2 information processing.
thin films through post-deposition annealing of v 2 o 3 and vo x films,”
Thin Solid Films, vol. 591, pp. 143–148, 2015.
[20] A. Hoerl and R. Kennard, “Ridge regression: Biased estimation for
nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67,
1970.
[21] M. Fiers, T. V. Vaerenbergh, K. Caluwaerts, D. V. Ginste,
B. Schrauwen, J. Dambre, and P. Bienstman, “Time-domain and
frequency-domain modeling of nonlinear optical components at the
circuit-level using a node-based approach,” J. Opt. Soc. Am. B,
vol. 29, no. 5, pp. 896–900, May 2012. [Online]. Available:
http://josab.osa.org/abstract.cfm?URI=josab-29-5-896
[22] M. Jeruchim, “Techniques for estimating the bit error rate in the
simulation of digital communication systems,” IEEE Journal on Selected
Areas in Communications, vol. 2, no. 1, pp. 153–170, 1984.
[23] N. Hansen and A. Ostermeier, “Completely derandomized self-
adaptation in evolution strategies,” Evolutionary computation, vol. 9,
no. 2, pp. 159–195, 2001.
[24] N. Hansen, “The cma evolution strategy: A tutorial,” arXiv preprint
arXiv:1604.00772, 2016.
Matthias Freiberger was born in Graz, Austria, in Joni Dambre was born in Ghent, Belgium, in 1973.
1983. He received a M.Sc. degree in information She received the M.Sc. degree in electronics engi-
and computer engineering from Graz University of neering and the Ph.D. degree in computer science
Technology, Graz, Austria, in 2016. He is currently engineering from Ghent University, Ghent, in 1996
pursuing a Ph.D degree at the UGent-imec IDLab, and 2003, respectively. She is currently a professor
Department of Electronics and Information Systems with the UGent-imec IDLab, and the Department
at Ghent University. His current research focuses on of Electronics and Information Systems at Ghent
training algorithms for neuromorphic devices. His University. She performs research on machine learn-
research interests include scaling up neuromorphic ing and neural networks, with applications in sensor
systems, deep learning, and recurrent neural net- processing and robotics, as well as innovative un-
works on chip. conventional and neuromorphic computing hardware
that exploit the computational power of physical dynamical systems.
Andrew Katumba was born in Masaka, Uganda in
1985. He received an M.Sc. degree in Optics and
Photonics from Karlsrule Institute of Technology,
Germany in 2013. He is pursuing a Ph.D. degree
in Photonics Engineering at the Photonics Research
Group, Gent University-imec, Belgium. His current
research focuses on photonic neuromorphic archi-
tectures for high speed optical telecommunications
systems. He is a student member of IEEE Photonics
Society and the International Society for Optics and
Photonics.