- Part I
- Part II
- Part III

The code used here can be found in the following GitHub repository.

In the second installment of this series, I introduce an initial and arguably the most basic method to generate monophonic melodies. This consists of two approaches:

- a *first-order Markov chain*
- a *feedforward neural network*

It’s important to note that I will disregard all forms of dynamics within a notated score (or performance), such as loudness and softness.

Despite these two approaches being significantly outdated, I believe their demonstration serves as a valuable exercise for familiarizing oneself with the inherent challenges of the subject matter. The inspiration for this work comes from the tutorial series by Valerio Velardo and another series by Andrej Karpathy.

Although I utilize high-level libraries such as PyTorch and take advantage of its *computational graph* and *autograd* features, I intend to maintain the model code and training process at a relatively low level.

The necessary software requirements for this project include:

- Python
- PyTorch
- Music21
- `preprocessor.py` helper class to deal with `krn` files
- MuseScore (optional)
- Jupyter Notebook Environment (optional)

The database required can be found at EsAC. The specific dataset I utilized is *Folksongs from the continent of Europe*, and for the purpose of this work, I will exclusively use the 1700 pieces found in the `./deutschl/erk` directory.

Let’s listen to one of these pieces:

In Part I of this series, I alluded to various implementations that utilize different input encodings. Naturally, the information we can leverage depends on the format of our training data. For instance, MIDI provides us with attributes such as pitch, duration, and velocity.

In my implementation, you will notice two distinct, yet straightforward encoding options available:

- `GridEncoder` (used by Valerio Velardo)
- `NoteEncoder`

The `GridEncoder` utilizes a fixed metrical grid where (a) output is generated for every timestep, and (b) the step size corresponds to a fixed meter (the shortest duration of any note in any score). For instance, if the shortest duration is a quarter note, a whole note of pitch 65 (in MIDI format) would result in the event series:

**65-note-on hold hold hold**

On the other hand, the `NoteEncoder` employs a larger alphabet and encodes each note directly, i.e.,

**65-whole**

In comparison to *note encoding*, *equitemporal grid encoding* relies on a smaller alphabet but needs more tokens for the same score.
This disadvantage is magnified if the score contains notes of vastly differing durations or if we wish to introduce micro-dynamics through an increase in resolution, as done by (Oore et al., 2018).
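To make the trade-off concrete, here is a small sketch with two toy encoder functions (hypothetical helpers for illustration, not the actual classes from `preprocess.py`) that encode the whole-note example from above both ways:

```python
# Hypothetical toy encoders illustrating the two schemes; these are NOT
# the actual GridEncoder/NoteEncoder classes from preprocess.py.

def grid_encode(pitch, duration_in_steps):
    # equitemporal grid: one "note-on" event, then one "hold" per extra step
    return [f'{pitch}-note-on'] + ['hold'] * (duration_in_steps - 1)

def note_encode(pitch, duration_name):
    # note encoding: a single token per note, at the price of a larger alphabet
    return [f'{pitch}-{duration_name}']

grid = grid_encode(65, 4)  # a whole note spans 4 quarter-note steps
note = note_encode(65, 'whole')
print(grid)  # ['65-note-on', 'hold', 'hold', 'hold']
print(note)  # ['65-whole']
```

The grid encoding needs four tokens from a small alphabet; the note encoding needs a single token from a larger one.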

Interestingly, Google’s Magenta project employs *equitemporal grid encoding*, specifically the MelodyOneHotEncoding class for their *Basic RNN*, *Mono RNN*, and *Lookback RNN*.
Since they capture polyphonic scores, they utilize **note-on** and **note-off** events for each MIDI key.

Of course, the chosen representation also depends on the application and the capabilities of the model we use.
For instance, one might only want to generate the pitches of the melody and manually adjust the duration of each note in post-processing.
Furthermore, a *first-order Markov chain* only *memorizes* the most recent event.
Therefore, an *equitemporal grid encoding* would yield unsatisfactory results because the context is lost after a **hold** event occurs.

As a result, in this post, I will focus on the *note encoding* approach, i.e., the `NoteEncoder`.

The procedure I’m about to present parallels the one detailed in Probabilistic Melody Modeling. The primary differences are that we’ll now be considering 1700 pieces instead of a single one, and we’ll be utilizing more sophisticated libraries instead of relying solely on Sonic Pi.

The Music21 library significantly simplifies the handling of symbolically notated music in Python. I am not very familiar with it, but it comes in handy when reading and writing symbolic scores. It enables us to construct pieces programmatically and to read from or write to various musical score formats.

As an initial step, we need to import all the necessary libraries and functions. Here I fix the global seed such that you can reproduce the exact same results.

```
import matplotlib.pyplot as plt
import music21 as m21
import torch
import torch.nn.functional as F
from preprocess import load_songs_in_kern, NoteEncoder, KERN_DATASET_PATH
# seed such that we can compare results
torch.manual_seed(0);
```

Then I read all the pieces inside the `./../deutschl/erk` directory. Furthermore, I introduce a special character `TERM_SYMBOL` that I use to indicate the beginning and end of a score.

```
TERM_SYMBOL = '.'
scores = load_songs_in_kern('./../deutschl/erk')
```

Now we have to think about our encoding. As discussed above, I use the `NoteEncoder`.

```
encoder = NoteEncoder()
enc_songs = encoder.encode_songs(scores)
```

```
' '.join(enc_songs[0])
```

The code above prints out:

```
'55/4 60/4 60/4 60/4 60/4 64/4 64/4 r/4 ... 64/4 60/4 62/4 60/8 r/4'
```

`55/4` means MIDI note 55 held for four timesteps, where the timestep is determined by the shortest note within all scores. In our case this means four times a 1/4 beat, i.e., one whole beat.

Given that computers cannot process strings directly, I convert these strings into numerical values. The first step is to create a set that includes all possible strings. Subsequently, I assign each string a corresponding natural number in sequential order.

```
symbols = sorted(set(item for sublist in enc_songs
                     for item in sublist))
stoi = {s: i+1 for i, s in enumerate(symbols)}
stoi[TERM_SYMBOL] = 0
itos = {i: s for s, i in stoi.items()}
print(f'n_symbols: {len(itos)}')
```

`stoi` maps **s**trings **to** **i**ntegers and `itos` is its inverse mapping.

To implement a *first-order Markov chain*, we aim to construct a Markov matrix $\mathbf{P}$ where the element at the $i^{\text{th}}$ row and $j^{\text{th}}$ column represents the conditional probability

\[P(e_j \mid e_i) = p_{ij}.\]

It describes the (conditional) probability of event $e_j$ (a note or rest of specific length) immediately following event $e_i$. For this purpose, I construct a matrix $\mathbf{N}$ that counts these transitions.

To accomplish this, I iterate over each score, considering every pair of consecutive events. As the first event lacks a predecessor and the last lacks a successor, I add the special terminal character `TERM_SYMBOL` to the beginning and end of each score for padding purposes.

```
N = torch.zeros((len(stoi), len(stoi)))
for enc_song in enc_songs:
    chs = [TERM_SYMBOL] + enc_song + [TERM_SYMBOL]
    for ch1, ch2 in zip(chs, chs[1:]):
        ix1 = stoi[ch1]
        ix2 = stoi[ch2]
        N[ix1, ix2] += 1
```

To construct $\mathbf{P}$ we have to divide each entry $n_{ij}$ in $\mathbf{N}$ by the sum over the row $i$.

```
P = N.float()
P = P / P.sum(dim=1, keepdim=True)
```

In order to compute the sum over a row (instead of a column), i.e., “summing all columns”, we need to specify `dim=1` (the default is `dim=0`). Additionally, to properly exploit broadcasting, it is necessary to set `keepdim=True`. This ensures that the sum results in a `(m,1)` tensor, as opposed to a `(m,)` tensor.
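The effect of `keepdim` can be illustrated on a tiny stand-in tensor:

```python
import torch

# toy 2x2 stand-in for our count matrix N, just to show the shapes involved
N = torch.tensor([[1., 1.],
                  [3., 1.]])
row_sums = N.sum(dim=1, keepdim=True)  # shape (2, 1): one sum per row
P = N / row_sums                       # broadcasts the (2, 1) sums over columns
print(row_sums.shape)  # torch.Size([2, 1])
print(P.sum(dim=1))    # tensor([1., 1.]) -- every row is now a distribution
```

Without `keepdim=True` the sum would have shape `(2,)`, and the division would not broadcast row-wise as intended.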

Plotting the probabilities reveals that $\mathbf{P}$ is a rather sparse matrix containing many zeros. In fact, only approximately 7.86 percent of the entries are non-zero.

Figure 1: Matrix plot of our Markov matrix.
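The share of non-zero entries can be computed directly from the tensor. The following sketch uses a small stand-in matrix (building the real `P` requires the dataset); with the real `P`, the same expression yields the roughly 7.86 percent mentioned above:

```python
import torch

# small stand-in for our Markov matrix P
P = torch.tensor([[0.0, 1.0, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0, 0.0]])
# fraction of non-zero entries, in percent
nonzero_share = (P > 0).float().mean().item() * 100
print(f'{nonzero_share:.2f}% of the entries are non-zero')
```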

Given the tensor `P`, we can generate new melodies using the function `torch.multinomial`, which expects a (discrete) probability distribution. I start with the terminal `TERM_SYMBOL` indicating the beginning and, when the second terminal is generated (which indicates the end), I terminate the generation.

```
generated_encoded_song = []
char = TERM_SYMBOL
while True:
    ix = torch.multinomial(P[stoi[char]],
                           num_samples=1,
                           replacement=True).item()
    char = itos[ix]
    if char == TERM_SYMBOL:
        break
    generated_encoded_song.append(char)
len(generated_encoded_song)
```

Let’s listen to some of the generated scores:

The outcome is not particularly outstanding, but this is unsurprising given our very simple model. To evaluate the quality of our model, we can calculate the likelihood that our generative process assigns to a specific training data point $e_1, e_2, \ldots, e_k$, i.e.,

\[P(e_1) \cdot P(e_2 \mid e_1) \cdot \ldots \cdot P(e_k \mid e_{k-1}).\]

We can add all the likelihoods (one for each data point) together and divide the sum by the number of data points. However, it is more convenient to use the *negative log-likelihood*, since the logarithm turns products of many small probabilities (which quickly underflow) into sums.

```
log_likelihood = 0.0
n = 0
for m in enc_songs:
    chs = [TERM_SYMBOL] + m + [TERM_SYMBOL]
    for ch1, ch2 in zip(chs, chs[1:]):
        ix1 = stoi[ch1]
        ix2 = stoi[ch2]
        prob = P[ix1, ix2]
        logprob = torch.log(prob)
        log_likelihood += logprob
        n += 1
print(f'{log_likelihood=}')
nll = -log_likelihood
print(f'avg negative log likelihood: {(nll/n)}')
```

This gives us an average *negative log-likelihood* of approximately `2.6756`. The lower this value gets, the better; it can be no smaller than 0.

One method of generating a melody using a feedforward network is by addressing a classification task. Specifically, given $t$ consecutive notes, we aim to identify the note that this sequence “represents”. For simplicity, let’s set $t=1$. This stipulation means we won’t require substantial modifications compared to our previous approach.

Since our training process will be more computationally intensive than merely computing frequencies, it’s advisable to use hardware accelerators, if available. This can result in faster training and inference times and lower energy costs. To check if hardware acceleration is available, I employ the following code:

```
if torch.cuda.is_available():
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')
print(f'{device=}')
```

Instead of calculating our probability matrix, I am going to generate labeled training data using the variables `xs` and `ys` (labels).

```
xs = []
ys = []
for m in enc_songs:
    chs = [TERM_SYMBOL] + m + [TERM_SYMBOL]
    for ch1, ch2 in zip(chs, chs[1:]):
        ix1 = stoi[ch1]
        ix2 = stoi[ch2]
        xs.append(ix1)
        ys.append(ix2)
xs = torch.tensor(xs, device=device)
ys = torch.tensor(ys, device=device)
# one-hot-encoding
xenc = F.one_hot(xs, num_classes=len(stoi)).float()
```

I employ a *one-hot encoding* for the input data. That is, to encode $m$ unique elements, one uses an $m$-dimensional vector where all entries except one are 0.0 and the remaining one is 1.0. `F.one_hot` assumes that these elements are whole numbers between 0 and $m-1$; compare the documentation. The $i^{\text{th}}$ element is represented by a vector where the $i^{\text{th}}$ component is 1.0. Note that our labels `ys` are not one-hot encoded.
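A tiny example of what `F.one_hot` produces (toy indices, alphabet size 4):

```python
import torch
import torch.nn.functional as F

xs = torch.tensor([0, 2, 1])  # three elements of an alphabet of size m = 4
xenc = F.one_hot(xs, num_classes=4).float()
print(xenc)
# tensor([[1., 0., 0., 0.],
#         [0., 0., 1., 0.],
#         [0., 1., 0., 0.]])
```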

Next, I initialize a random matrix $\mathbf{W} \in \mathbb{R}^{m \times m}$, or tensor, `W` whose entries are drawn from a standard normal distribution (`torch.randn`). This tensor contains our trainable parameters, which represent the single layer of our neural network.

```
W = torch.randn((len(stoi), len(stoi)), requires_grad=True, device=device)
```

Our network includes $m$ inputs and outputs, with the *softmax* values of the outputs being interpreted as probabilities.
Essentially, our “network” is just one large matrix!

The operation `xenc @ W` represents a matrix multiplication where `xenc` is an $n \times m$ matrix ($n$ being the number of training pairs) and `W` is our $m \times m$ matrix. Here I use the power of parallel computation. By employing `probs[torch.arange(len(ys), device=device), ys]`, I address a single entry for each row.

Please note that `probs[:, ys]` does not work; instead of addressing a single entry per row, it addresses whole columns indexed by `ys`!
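The difference between the two indexing variants can be seen on a toy tensor:

```python
import torch

probs = torch.tensor([[0.1, 0.9],
                      [0.7, 0.3],
                      [0.4, 0.6]])
ys = torch.tensor([1, 0, 0])
# one entry per row: probs[0, 1], probs[1, 0], probs[2, 0]
picked = probs[torch.arange(len(ys)), ys]
print(picked)              # tensor([0.9000, 0.7000, 0.4000])
# column indexing instead selects whole columns -- a (3, 3) tensor
print(probs[:, ys].shape)  # torch.Size([3, 3])
```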
Also, be aware that I apply an unusually large learning rate.

```
# training aka gradient descent
epochs = 2_000
for k in range(epochs):
    # forward pass
    logits = xenc @ W
    odds = logits.exp()
    probs = odds / odds.sum(dim=1, keepdim=True)
    loss = -probs[torch.arange(len(ys), device=device), ys].log().mean()
    print(f'epoch {k}, loss: {loss.item()}')
    # backward pass
    W.grad = None  # set gradients to zero
    loss.backward()
    # update
    W.data += -10.0 * W.grad
```

One iteration of the loop consists of

- the *forward pass*,
- the *backward pass* (backpropagation) done via `loss.backward()`, and
- an update of our parameters done via `W.data += -10.0 * W.grad`.

`loss.backward()` applies backpropagation, thus computing the gradients, and we can update `W` by

\[\mathbf{W} \leftarrow \mathbf{W} - \eta \, \nabla_{\mathbf{W}} L,\]

where $\eta = 10$ is the *learning rate*.

After the initial 2000 epochs the loss is approximately `2.865`. This performance is somewhat inferior compared to the results achieved by our *Markov chain*. However, by prolonging the training period, I managed to reduce the loss to around `2.707`.

Let us assume we have only one sample $\mathbf{x}$. The *forward pass* starts with

\[\mathbf{o} = \mathbf{x} \mathbf{W},\]

where $\mathbf{x}$ is a *one-hot encoded* training data point. $\mathbf{o}$ gets interpreted as the (component-wise) logarithm of the odds, which is the logit, i.e., the inverse of the *standard logistic function*, also called *sigmoid*. In fact, since $\mathbf{x}$ is one-hot encoded, each data point selects one row of $\mathbf{W}$.

To compute “probabilities” we compute the *softmax function* (`probs`) of $\mathbf{o}$, i.e., $\mathbf{s} = s(\mathbf{o})$ with

\[s(\mathbf{o})_i = \frac{e^{o_i}}{\sum_j e^{o_j}}.\]

Luckily the *softmax* has a simple derivative:

\[\frac{\partial s(\mathbf{o})_i}{\partial o_j} = s(\mathbf{o})_i \left( \delta_{ij} - s(\mathbf{o})_j \right),\]

where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.

We can also compute the full Jacobian of the *softmax* vector-to-vector operation:

\[\mathbf{J}_{\mathbf{o}}(\mathbf{s}) = \text{diag}\left(\mathbf{s}\right) - \mathbf{s} \mathbf{s}^\top.\]
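We can sanity-check the Jacobian formula $\text{diag}(\mathbf{s}) - \mathbf{s}\mathbf{s}^\top$ numerically against PyTorch’s autograd (a small self-contained check):

```python
import torch

# numeric check of the softmax Jacobian diag(s) - s s^T against autograd
o = torch.randn(5)
s = torch.softmax(o, dim=0)
analytic = torch.diag(s) - torch.outer(s, s)
autodiff = torch.autograd.functional.jacobian(
    lambda t: torch.softmax(t, dim=0), o)
ok = torch.allclose(analytic, autodiff, atol=1e-6)
print(ok)  # True
```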

Similarly to before, our loss $L$ is the *negative log-likelihood*,

\[L = -\sum_i y_i \log s(\mathbf{o})_i,\]

where $\mathbf{y}$ is the one-hot encoded label vector; in code (averaged over the batch):

`loss = -probs[torch.arange(len(ys), device=device), ys].log().mean()`.

Note that $\mathbf{y}$ is a one-hot encoded vector; `ys` is not.

For the *backpropagation* we need the gradient of the loss with respect to the weights, $\mathbf{J}_{\mathbf{W}}(L)$. Here we employ the *chain rule*. The sensitivity of the cost $L$ to the input of the softmax layer, $\mathbf{o}$, is given by a gradient-Jacobian product, each factor of which we have already computed:

\[\mathbf{J}_{\mathbf{o}}(L) = \mathbf{J}_{\mathbf{s}}(L) \, \mathbf{J}_{\mathbf{o}}(\mathbf{s}) = -\frac{\mathbf{y}}{\mathbf{s}} \left( \text{diag}\left(\mathbf{s}\right) - \mathbf{s} \mathbf{s}^\top \right) = \mathbf{s} - \mathbf{y}.\]

The $\log$ and the division operate component-wise, and

\[\text{diag}\left(\mathbf{s}\right) = \begin{bmatrix} s(\mathbf{o})_1 & 0 & \ldots & 0 \\ 0 & s(\mathbf{o})_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & s(\mathbf{o})_m \end{bmatrix}\]

holds.
We have to apply the *chain rule* once again to finally get the desired update values for our weight matrix $\mathbf{W}$:

\[\mathbf{J}_{\mathbf{W}}(L) = \mathbf{x}^\top \left( \mathbf{s} - \mathbf{y} \right).\]

Given that $\mathbf{s}$ represents probabilities and $\mathbf{y}$ contains only zeros except for a single 1 at the position of the “correct” class, the gradient is zero everywhere except in the $j^\text{th}$ row (the row with $x_j = 1$), since $\mathbf{x}$ is also one-hot encoded. In that row, the $i^\text{th}$ entry is $s_i$ if the $i^\text{th}$ probability is “incorrect”, and $s_i - 1$ for the “correct” one. Consequently, after the (negative) gradient update, the correct probability gets pushed up while the incorrect ones get pushed down, and probabilities that are more wrong experience a larger correction.

We can actually check this result! Using the following code:

```
# use only 1 data point
xs = xs[:1]
ys = ys[:1]
# one-hot-encoding
xenc = F.one_hot(xs, num_classes=len(stoi)).float()
# reinitiate W
W = torch.randn((len(stoi), len(stoi)), requires_grad=True, device=device)
logits = xenc @ W
counts = logits.exp()
probs = counts / counts.sum(dim=1, keepdim=True)
loss = -probs[torch.arange(len(ys), device=device), ys].log().mean()
# backward pass
W.grad = None
loss.backward()
y = torch.zeros(len(stoi), device=device)
y[ys[0]] = 1
print(W.grad)              # same
print(xenc.T @ (probs-y))  # same
print(torch.allclose(W.grad, xenc.T @ (probs-y)))  # True
```

So far we only considered the math using a single data point $\mathbf{x}$. Let us consider a batch of points, i.e.,

\[\mathbf{O} = \mathbf{X}\mathbf{W} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_n \end{bmatrix} \mathbf{W}.\]

The *softmax* is still a vector-to-vector transformation, but it is applied independently to each row of $\mathbf{O}$, yielding the matrix $\mathbf{S}$.

We can do the exact same steps but I will skip this part. For the interested reader I refer to

The important results are that

\[\mathbf{J}_\mathbf{O}(L) = \mathbf{J}_\mathbf{S}(L) \, \mathbf{J}_\mathbf{O}(S) = \frac{1}{n} \left( \mathbf{S} - \mathbf{Y} \right)\]

and

\[\mathbf{J}_\mathbf{W}(L) = \mathbf{J}_\mathbf{S}(L) \, \mathbf{J}_\mathbf{O}(S) \, \mathbf{J}_\mathbf{W}(O) = \frac{1}{n} \mathbf{X}^\top \left( \mathbf{S} - \mathbf{Y} \right).\]

We can also check this result:

```
# one-hot-encoding
xenc = F.one_hot(xs, num_classes=len(stoi)).float()
# reinitiate W
W = torch.randn((len(stoi), len(stoi)), requires_grad=True, device=device)
logits = xenc @ W
counts = logits.exp()
probs = counts / counts.sum(dim=1, keepdim=True)
loss = -probs[torch.arange(len(ys), device=device), ys].log().mean()
# backward pass
W.grad = None
loss.backward()
y = torch.zeros((len(ys), len(stoi)), device=device)
y[torch.arange(len(ys), device=device), ys] = 1
print(W.grad)                      # same
print(xenc.T @ (probs-y)/len(ys))  # same
print(torch.allclose(W.grad, xenc.T @ (probs-y)/len(ys)))  # true
```

Now, the natural question is: what is the best possible performance we could achieve? The answer is that we should aim to match the performance of the *Markov chain*. Indeed, as the training continues, we should expect that the row-wise softmax of `W` gradually converges towards `P`.
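This convergence can be checked on a toy problem. The following self-contained sketch uses a hypothetical 3-symbol transition matrix, samples training pairs from it, and trains the one-layer network exactly as above; the row-wise softmax of `W` then approaches the true transition matrix (up to sampling noise):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# hypothetical 3-symbol transition matrix standing in for our P
P_true = torch.tensor([[0.2, 0.8, 0.0],
                       [0.5, 0.0, 0.5],
                       [0.1, 0.3, 0.6]])
# sample (previous, next) training pairs from P_true
xs = torch.randint(0, 3, (5000,))
ys = torch.multinomial(P_true[xs], num_samples=1).squeeze(1)
xenc = F.one_hot(xs, num_classes=3).float()
W = torch.randn((3, 3), requires_grad=True)
for _ in range(500):
    probs = (xenc @ W).softmax(dim=1)
    loss = -probs[torch.arange(len(ys)), ys].log().mean()
    W.grad = None
    loss.backward()
    W.data += -10.0 * W.grad
print(W.softmax(dim=1))  # row-wise close to P_true, up to sampling noise
```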

Moreover, we should not anticipate surpassing the results achieved by our *Markov chain* even if we deepen our network, that is, by introducing some *hidden layers*. A very good example of useful input information is described in (Johnson, 2017), which I discussed in Part I of this series. For instance, Johnson adds (compare his interesting blog post):

- **Positional:** the position of the note within the score (that is what we use)
- **Pitchclass:** one of the twelve pitch classes
- **Previous vicinity:** which surrounding notes were played or articulated in the last timestep (only useful for polyphonic music)
- **Previous context:** how many C’s, A’s, and so on were played in the last timestep (only useful for polyphonic music)
- **Beat:** a binary representation of the position within the measure

However, our expectations may shift if we modify the input, referring to the data that the network processes. That being said, we could extend the training duration. For instance, introducing a *hidden layer* results in a loss of `2.693` after 2000 epochs.

```
W1 = torch.randn((len(stoi), len(stoi)//4),
                 requires_grad=True, device=device)
W2 = torch.randn((len(stoi)//4, len(stoi)),
                 requires_grad=True, device=device)
```

```
epochs = 2_000
for k in range(epochs):
    # forward pass
    x = xenc @ W1
    logits = x @ W2
    odds = logits.exp()
    probs = odds / odds.sum(dim=1, keepdim=True)
    loss = -probs[torch.arange(len(ys), device=device), ys].log().mean()
    print(f'epoch {k}, loss: {loss.item()}')
    # backward pass
    W1.grad = None  # set gradients to zero
    W2.grad = None  # set gradients to zero
    loss.backward()
    # update
    W1.data += -10.0 * W1.grad
    W2.data += -10.0 * W2.grad
```

- Oore, S., Simon, I., Dieleman, S., Eck, D., & Simonyan, K. (2018). This Time with Feeling: Learning Expressive Musical Performance. *CoRR*, *abs/1808.03715*. http://arxiv.org/abs/1808.03715
- Johnson, D. D. (2017). Generating Polyphonic Music Using Tied Parallel Networks. *EvoMUSART*.

You can find all the necessary code in my GitHub repo: link.

The example is based on the Wekinator, which is a powerful, free, and open-source software package designed to simplify the process of using *machine learning models*.
With its user-friendly interface, anyone can easily build new musical instruments, gestural game controllers, computer vision and computer listening systems, and much more.
Developed by Rebecca Fiebrink, the software aims to make digital systems more accessible to artists and musicians.
Although Wekinator’s initial release (version 1.0) dates back to 2009, the software remains relevant today and has aged gracefully.

In her article (Fiebrink, 2019), Fiebrink elaborates on her machine learning education practices for creative practitioners and highlights the benefits of using Wekinator. More recently, she has been working on a new tool called InteractML, which is a more advanced tool for interactive machine learning via visual scripting with Unity (Hilton et al., 2021). InteractML is a fascinating project that I plan to explore further at a later time. It is worth noting that the project is currently in alpha release and may only have limited documentation available.

One of the key advantages of the Wekinator is its ease of use, making it accessible to a wide range of users. The software’s versatility is due to its support for Open Sound Control (OSC) messages, which are supported by numerous applications, especially in the artistic domain. If other systems can communicate via OSC, they can be integrated (via the network) into the larger system. For instance, most digital audio workstations (DAWs), Processing (a renowned creative coding environment), TouchDesigner, Max/MSP, PureData, SuperCollider, and APIs for popular programming languages support OSC.

To demonstrate the potential of the Wekinator, I will be using Processing and SuperCollider, both of which are also free and open-source tools. SuperCollider will be used to generate sound, while Processing will capture human motion. The objective is not to showcase elaborate gesture recognition but to provide an overview of the Wekinator’s fundamental workings.

More than just discussing the Wekinator, my aim is to provide readers with an understanding of how machine learning can supplant traditional programming, enabling non-experts to utilize techniques that were previously inaccessible to them.

We will observe how machine learning can replace coding, albeit with some coding required. However, the coding required is specific to the task of sending and receiving OSC messages and to the tools utilized, i.e., SuperCollider and Processing, which are both programming environments. Other tools exist that allow for sending OSC messages without any coding, and there are tools that produce sound without programming.
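To demystify the OSC part a little: an OSC message is just a small binary packet, usually sent over UDP. The following stand-alone Python sketch (standard library only; the address `/wek/inputs` and port 6448 are the Wekinator’s defaults) constructs such a message with two float arguments:

```python
import struct

def osc_string(s):
    # OSC strings are null-terminated and zero-padded to a multiple of 4 bytes
    b = s.encode('ascii') + b'\x00'
    return b + b'\x00' * (-len(b) % 4)

def osc_message(address, *floats):
    # address pattern, a type tag string (',' plus one 'f' per float argument),
    # then each argument as a 32-bit big-endian float
    msg = osc_string(address) + osc_string(',' + 'f' * len(floats))
    for x in floats:
        msg += struct.pack('>f', x)
    return msg

# the kind of message the Wekinator listens for on port 6448
packet = osc_message('/wek/inputs', 320.0, 240.0)
print(len(packet))  # 24 bytes: 12 (address) + 4 (type tags) + 2 * 4 (floats)
# to actually send it:
# socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, ('127.0.0.1', 6448))
```

Libraries like oscP5 (used below in Processing) or python-osc do exactly this packing for you.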

The following video demonstrates the final result and all the steps required.

Digital synthesizers are synthetic instruments that generate sound by outputting a stream of floating point numbers.
These synthesizers typically have numerous parameters, such as frequency (pitch), cutoff frequency of a low-pass filter, frequency of the carrier, and several others.
Often these parameters are not *interpretable*, meaning there is no straightforward connection between a parameter and the sound a synth generates.
Instead, altering multiple parameters concurrently leads to the desired outcome.
In mathematics, we refer to these parameters as residing in a high-dimensional space.

Let’s consider the following scenario: We have a dancer in a rectangular area, such as a room, and we wish to modify the sound generated by a synth based on the dancer’s 2D position. We aim to alter the synth’s parameters while it is playing (modulation) in response to the dancer’s movements. However, the changes in sound must be smooth and non-random.

Unfortunately, a significant problem arises in this situation. The number of parameters exceeds the number of positional values. We only have two coordinates to work with, and establishing a one-to-one mapping between these coordinates and parameters is not feasible. We want to avoid merely selecting two parameters and modifying them according to the dancer’s $x$ and $y$ coordinates.

In mathematical terms, we are looking for a function that receives $2$ values ($x$ and $y$) and outputs $n$ values (one for each parameter), where $n > 2$. We want a function

\[f : \mathbb{R}^2 \rightarrow \mathbb{R}^n.\]

Additionally, it is necessary for $f$ to be smooth and meet our musical preferences.
As $f$ maps a lower-dimensional space to a higher-dimensional space and we require a seamless transition, we search for a two-dimensional *surface* in an $n$-dimensional space.
As the dancer moves in the $x$ and $y$ directions, $f$ translates this motion to a movement on the two-dimensional surface in the $n$-dimensional space.
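As a toy illustration (a hypothetical $f$ for the sake of the argument, not the one the Wekinator will learn), such a smooth map from two coordinates to $n$ parameters could look as follows:

```python
import math

def f(x, y, n=6):
    # a toy smooth map from the dancer's 2D position to n synth parameters:
    # every parameter depends on both coordinates, each with a different phase
    return [math.sin(x + k) * math.cos(y + 2 * k) for k in range(n)]

params = f(0.3, 0.7)
print(len(params))  # 6 parameter values in [-1, 1], ready to be rescaled
```

Small movements of the dancer produce small, coordinated changes in all six parameters at once.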

If we have only one coordinate and three parameters, $f$ represents a curve in the three-dimensional space. In this case, we can draw it. Compare Fig. 1.

Figure 1: Sample points and fitted graph of a curve in a three-dimensional space. The length of the curve represents the dancer's single coordinate (let's say x). A point (X,Y,Z) on the curve represents three parameter values.

This is a complex challenge, and several questions arise: What does $f$ look like? How can we implement it?

Before the advent of machine learning, we would have addressed this problem using the “traditional way” of abstract reasoning and programming to implement the function $f$ directly via code. This approach involves roughly six steps:

- observe and analyse the system
- construct a falsifiable mathematical model (which is an imperfect generalization)
- implement a numerical model via code
- calibrate the model
- test your assumption via observation until the model approximates reality adequately
- find unobserved phenomena, which the model implies, in the real world

Newton’s discoveries serve as a classic example, where he observed reality and reasoned about it, generating laws that are incorrect but sufficiently accurate to travel to the moon. He established formulas such as

\[F = m \cdot a,\]

which are falsifiable via experiments. With these formulas one can predict unknown phenomena that should appear if the model is reliable.

In our case, we would have conducted a comprehensive analysis of the synth to establish how its parameters interact with one another. This approach would have required a wealth of knowledge across diverse domains such as programming, signal processing, and mathematics. Furthermore, to achieve our desired sound, we would have needed to understand the impact of parameter changes on various aspects of the sound, such as pitch and timbre.

In essence, we would have aimed to create a model of the synth “world” to enable us to reason about its structures and rules. We would have then written code to manipulate the $n$ parameters concurrently, relative to the dancer’s position.

While I value the traditional approach for its ability to provide insights into actual and imaginative structures, it is not the optimal solution for our specific scenario. I believe that creative practitioners do not necessarily avoid analytical work but tend to focus on creation, thereby enabling a more tangible understanding of the problem. I welcome this approach, particularly in the field of machine learning. In contrast, the traditional approach can be challenging and less accessible, particularly in achieving our artistic objectives.

In *machine learning*, the focus shifts to a *data-driven approach*.
Rather than constructing a model by hand through reasoning about the world, we enable machines to learn the model by providing them with data, i.e., observations.

In the extreme case, when the observations *are* the model, we cannot provide outputs for unobserved inputs. Therefore, like models constructed manually, machine learning models are an imperfect abstraction of the data on which they are trained. This is the fundamental idea behind machine learning, albeit an oversimplification.

In our scenario, we replace manual modeling with machine learning by defining what we want and letting the machine learning models provided by the Wekinator figure out how to achieve it. We present an algorithm $A$ with examples $D$ that represent our requirements and ask it to “program” a function $f$ that fulfills our needs.

*Machine learning* involves learning $f$ from data $D$ using algorithm $A$, where $A$ is essentially just another function that produces functions:

\[f = A(D).\]

The Wekinator allows us to choose algorithm $A$ from a list of algorithms and provides a graphical user interface (GUI) for recording $D$ and feeding it into $A$ to compute $f$. The algorithm $A$ is predetermined, and $f$ will be computed. As a result, we need to provide data/observation $D$ by recording it, so let’s get started!

**Disclaimer:**
Utilizing *machine learning* does not imply that we cease rationalizing about the world.
However, I selected this thought-provoking title to accentuate the contrast in tendencies.
Moreover, the effectiveness and characteristics of *machine learning models* are significantly influenced by the quality of the observed data and the choice of algorithm.
Given that our challenge involves a *regression task*, we will employ a *feed-forward neural network*.

In the following, we have a lot to set up since every part of the system is digital and our own creation. However, do not worry if you do not understand the SuperCollider or Processing part. It is more important to understand the principle of OSC communication and the Wekinator. If you are interested in SuperCollider or Processing I can highly recommend checking them out.

First, we need an actual synthesizer that produces sound. I use a synth that randomly produces a short resonating impulse that gets reflected. I want to go into only a few details about the inner workings of the synth. It creates a sound similar to a firework in a city perceived inside a room.

The synth has 6 parameters which are, in this case, explainable:

- `\densityleft` controls the number of impulses in the left speaker.
- `\densityright` controls the number of impulses in the right speaker.
- `\freq` controls the pitch of the impulse response (i.e., the sound). Higher frequency increases the pitch. (For `Ringz` it is the frequency at which the impulse resonates.)
- `\cutofffreq` controls the cutoff frequency of the lowpass filter. Lower values make the sound more dull.
- `\decaytime` controls the time it takes for the impulses to decay, influencing the resonance.
- `\amp` controls the signal’s amplitude, i.e., the volume.

`Dust` outputs the impulses such that `Ringz` resonates. The resulting signal gets reflected by `FreeVerb`, which introduces reverberation.

```
(
SynthDef(\fireworks, {
    var sig;
    sig = Dust.ar([\densityleft.kr(3), \densityright.kr(3)-0.5]);
    sig = Ringz.ar(
        sig,
        freq: \freq.kr(300),
        decaytime: \decaytime.kr(0.1)) * \amp.kr(0.55);
    sig = FreeVerb.ar(sig, 0.6, 0.9, 0.8);
    sig = LPF.ar(in: sig, freq: \cutofffreq.kr(21000));
    Out.ar(0, sig);
}).add;
)
```

We can play the synth and manipulate its parameters on the fly. Let’s listen but be warned since the amplitude during this example will change.

```
~fireworks = Synth(\fireworks);
~fireworks.set(\amp, 1);
~fireworks.set(\densityleft, 10);
~fireworks.set(\freq, 400);
~fireworks.set(\decaytime, 0.3);
~fireworks.free();
```

Ok cool, we can play sound.

Next, let’s envision a dancer gracefully moving across the room. While I am not a dancer, nor do I possess sensors to measure a dancer’s position, we can simulate this scenario using Processing. It’s worth mentioning that we could achieve the same result with SuperCollider, as it also features GUI elements. However, I’d like to demonstrate how we can effortlessly integrate multiple systems.

You can download the Processing example at this link. I will provide the code below, but please don’t be intimidated. It simply consists of a draggable green square, accompanied by informative log text displayed on a black window, and an OSC sender. Dragging the green square simulates the dancer’s movement, while the application continuously transmits the rectangle’s central position over the network.

Figure 2: Running Processing sketch.

If you start the sketch in Processing, you should see the window shown in Fig. 2.

```
/**
 * REALLY simple Processing sketch that sends the
 * mouse x and y position of the box to the Wekinator.
 * This sends 2 input values to port 6448 using the message /wek/inputs.
 * Adapted from https://processing.org/examples/mousefunctions.html
 * by Rebecca Fiebrink
 **/
import oscP5.*;
import netP5.*;

OscP5 oscP5;
NetAddress dest;
PFont f;
float bx;
float by;
int boxSize = 30;
boolean overBox = false;
boolean locked = false;
float xOffset = 0.0;
float yOffset = 0.0;

void setup() {
    f = createFont("Courier", 15);
    textFont(f);
    size(640, 480, P2D);
    noStroke();
    smooth();
    bx = width/2.0;
    by = height/2.0;
    rectMode(RADIUS);
    /* start oscP5, listening for incoming messages at port 9000 */
    oscP5 = new OscP5(this, 9000);
    dest = new NetAddress("127.0.0.1", 6448);
}

void draw() {
    background(0);
    fill(255);
    text("x=" + bx + ", y=" + by, 10, 80);
    fill(0, 200, 0);
    // Test if the cursor is over the box
    if (mouseX > bx-boxSize && mouseX < bx+boxSize &&
        mouseY > by-boxSize && mouseY < by+boxSize) {
        overBox = true;
        if (!locked) {
            stroke(0, 255, 0);
            fill(0, 255, 0);
        }
    } else {
        stroke(0, 255, 0);
        fill(0, 255, 0);
        overBox = false;
    }
    // Draw the box
    rect(bx, by, boxSize, boxSize);
    // Send the OSC message with the box's current position
    sendOsc();
}

void mousePressed() {
    if (overBox) {
        locked = true;
        fill(0, 255, 0);
    } else {
        locked = false;
    }
    xOffset = mouseX-bx;
    yOffset = mouseY-by;
}

void mouseDragged() {
    if (locked) {
        bx = mouseX-xOffset;
        by = mouseY-yOffset;
    }
}

void mouseReleased() {
    locked = false;
}

void sendOsc() {
    OscMessage msg = new OscMessage("/wek/inputs");
    msg.add((float)bx);
    msg.add((float)by);
    oscP5.send(msg, dest);
}
```

When we start the Processing sketch, it shows a green square inside a black window. The window represents our room, and the green square the dancer. You can drag the dancer around.

The only crucial part of the Processing sketch’s code is the transmission of OSC messages.

```
oscP5 = new OscP5(this,9000);
dest = new NetAddress("127.0.0.1",6448);
```

These two lines set up the OSC communication.
The sketch listens for incoming messages on port `9000` and sends its own messages to the IP address `"127.0.0.1"` (the local loopback address, i.e., your own device) on port `6448`.
The listening port is inconsequential, as the dancer never responds to incoming signals.
The destination IP address and port are crucial because they must correspond with the numbers we will employ in the Wekinator.

The following code

```
void sendOsc() {
OscMessage msg = new OscMessage("/wek/inputs");
msg.add((float)bx);
msg.add((float)by);
oscP5.send(msg, dest);
}
```

sends the coordinates $x$ and $y$ of the box (with respect to the window) to port `6448` under the path `'/wek/inputs'`.
OSC uses such paths to distinguish between different types of messages sent to the same port.
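To demystify what such a message looks like on the wire, here is a minimal sketch of an OSC 1.0 encoder for float arguments, using only the Python standard library (the function `encode_osc` is my own helper, not part of oscP5 or any other tool used here):

```python
import struct

def _pad_string(s: str) -> bytes:
    """OSC strings are null-terminated and padded to a multiple of 4 bytes."""
    b = s.encode("ascii") + b"\x00"
    while len(b) % 4:
        b += b"\x00"
    return b

def encode_osc(address: str, floats) -> bytes:
    """Encode an OSC message whose arguments are all 32-bit floats."""
    type_tags = "," + "f" * len(floats)  # e.g. ",ff" for two float arguments
    msg = _pad_string(address) + _pad_string(type_tags)
    for value in floats:
        msg += struct.pack(">f", value)  # big-endian 32-bit float
    return msg

# The dancer's position as the Processing sketch would send it:
packet = encode_osc("/wek/inputs", [248.0, 185.0])
```

The address pattern (the path) travels inside every packet, which is exactly what lets a receiver dispatch messages arriving on one port to different handlers.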

Now we have to set up the OSC communication of our sound-generating system, i.e., SuperCollider.
Let us first listen to port `6448` and path `'/wek/inputs'` and simply print the raw data we receive, i.e., the $x$ and $y$ values of our dancer (the green square).

```
(
OSCdef(
    \getCoords,
    {
        arg val;
        val.postln;
    },
    '/wek/inputs',
    recvPort: 6448
);
)
```

By executing this line in SuperCollider while the Processing sketch is running, you should see OSC messages on the post window. These messages look like this:

```
[ /wek/inputs, 248.0, 185.0 ]
```

Now, let’s take it a step further and modify two synth parameters based on these values. The following code maps $x$ and $y$ to a suitable range. While $x$ ranges from 0 to 640 and $y$ from 0 to 480 (the window size), we aim to obtain values between 0.1 and 20.

```
(
OSCdef(
    \getCoords,
    {
        arg val;
        var x, y;
        x = val[1];
        y = val[2];
        ~fireworks.set(\densityleft, x.linlin(0, 640, 0.1, 20));
        ~fireworks.set(\densityright, y.linlin(0, 480, 0.1, 20));
    },
    '/wek/inputs',
    recvPort: 6448
);
)
```

The impact of this code should be noticeable in the audio output. When the dancer is positioned at the top left, fewer impulses are produced. If situated at the bottom right, impulses emit from both speakers, while being at the bottom left results in only the right speaker activating. This effect may not seem extraordinary, as our function $f$ is quite elementary, involving a linear mapping from one interval to another.
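For readers unfamiliar with SuperCollider, `linlin` simply maps a value linearly from one interval to another. A Python equivalent (my own helper; unlike SuperCollider’s `linlin`, this sketch does not clip values outside the input range) could look like this:

```python
def linlin(x, in_lo, in_hi, out_lo, out_hi):
    """Map x linearly from [in_lo, in_hi] to [out_lo, out_hi]."""
    return (x - in_lo) / (in_hi - in_lo) * (out_hi - out_lo) + out_lo

# The dancer at the far left yields the minimum density,
# at the far right the maximum:
left_edge = linlin(0, 0, 640, 0.1, 20)
right_edge = linlin(640, 0, 640, 0.1, 20)
```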

We now introduce the Wekinator in the middle of the communication. That is, the dancer sends their position to the Wekinator, and SuperCollider listens to the messages from the Wekinator and changes the values of the synth accordingly. The Wekinator translates positions into synth parameters, and it realizes the function $f$. The 6 output signals

\[(v_1, \ldots, v_6) = f(x,y)\]

are sent to SuperCollider.

Figure 3: Overview of all the connected parts.

First, we need to modify the port SuperCollider listens to by changing the line from `recvPort: 6448` to `recvPort: 7448`.
Additionally, we should update the path to `'/wek/outputs'` as a reminder that we are receiving output signals from Wekinator.

Secondly, we must utilize the values we receive. Wekinator consistently sends a value between 0 and 1 for each dimension.
As a result, we need to map the interval [0;1] to appropriate synth values.
This step is critical and necessitates some understanding of the synth.
I employ the following mapping (note that we skip `val[0]` since it represents the OSC path):

```
(
OSCdef(
    \getCoords,
    {
        arg val;
        ~fireworks.set(\densityleft, val[1].linlin(0, 1, 0.1, 20));
        ~fireworks.set(\densityright, val[2].linlin(0, 1, 0.1, 20));
        ~fireworks.set(\freq, val[3].linlin(0, 1, 100, 700));
        ~fireworks.set(\amp, val[4].linlin(0, 1, 0, 2));
        ~fireworks.set(\decaytime, val[5].linlin(0, 1.0, 0.01, 1.0));
        ~fireworks.set(\cutofffreq, val[6].linlin(0, 1, 200, 20000));
    },
    '/wek/outputs',
    recvPort: 7448
);
)
```

**If you use SuperCollider, be careful with your choices and protect your ears**, since it will happily apply even unreasonable values such as an amplitude of 10 or higher.

Now we start the Wekinator.
First, we have to specify the port for the input signals of $f$, i.e., the dancer’s position.
This is `6448`, and the OSC path is `'/wek/inputs'`.
Then we have to specify the port for the output signal $f(x,y)$, i.e., the port used in SuperCollider, which is `7448`.
Furthermore, we specify the path `'/wek/outputs'` such that we do not confuse input and output.

Figure 4: Wekinator after it has started.

Furthermore, we have to tell the Wekinator about the number of inputs and outputs, i.e., 2 and 6, respectively.
After everything is set up, we can click on `Start Listening`.
Then we can click `Next`.

Now you will see the following screen.
In the top left, `OSC In` should be green, since the Processing sketch is sending messages.

Figure 5: Wekinator before training.

If this is not the case, you either forgot to click `Start Listening`, or the port is already in use.
In that case, change the port in the Wekinator as well as in your Processing sketch and restart the sketch.
You should also see 6 sliders.
If you manipulate these sliders, `OSC Out` should turn green, indicating that we are sending OSC messages to SuperCollider.

Furthermore, the sound should change accordingly.
Make sure your amplitude is not zero.
Now you can play around with the sliders or press the `random` button until you hear something you like.
Of course, you have to remember which slider represents which parameter.

To connect the sound to the dancer’s position, we have to

- record samples, i.e., construct the data set $D$
- train the model, i.e. compute $f = A(D)$.

To construct $D$ we need multiple tuples

\[(\text{input}, \text{output}) = ((x,y), (v_1, \ldots, v_6)).\]

First, we move the square to a desired position and choose a set of desired parameters $(v_1, \ldots, v_6)$.
Next, we click `Start Recording`, and after a few seconds, we press `Stop Recording`.
In doing so, we generate portions of the data $D$.
We repeat this step multiple times until we have accumulated sufficient data $D$.

Once completed, we press `Train` to compute

\[f = A(D).\]

This process should only take a few seconds.

Finally, we can utilize $f$ by clicking `Run`.
At this point, both `OSC In` and `OSC Out` indicators should be green, and as you move the square around, the sound should change accordingly.
Moreover, all parameters (assuming they were all manipulated during recording) should transition smoothly.

Keep in mind that we did not specify any algorithm $A$.
By default, Wekinator employs a *feed-forward neural network* and assumes a *regression task*.
This means that the output $f(x,y)$ is continuous and does not represent an element within a finite set of classes.
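To make the role of $A$ concrete, here is a deliberately tiny stand-in written in plain Python: a 1-nearest-neighbour “regressor” that memorizes the recorded samples. This is *not* what Wekinator actually does (its default is a feed-forward neural network); it only illustrates the shape of the problem, and all names and numbers are my own:

```python
def train(dataset):
    """A(D): here, simply memorize the recorded (input, output) pairs."""
    def f(x, y):
        # Predict the outputs of the closest recorded position.
        _, outputs = min(
            dataset,
            key=lambda pair: (pair[0][0] - x) ** 2 + (pair[0][1] - y) ** 2,
        )
        return outputs
    return f

# D: two recorded positions with six synth parameters each.
D = [
    ((0.0, 0.0),     (3.0, 3.0, 300.0, 0.55, 0.1, 21000.0)),
    ((640.0, 480.0), (10.0, 10.0, 400.0, 1.0, 0.3, 8000.0)),
]
f = train(D)
```

Unlike a trained neural network, this stand-in jumps abruptly between recorded outputs instead of transitioning smoothly, which is precisely why a regression model is the better fit here.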

You can choose algorithms by modifying the `Type` setting, as shown in Fig. 4.
This selection includes classification algorithms.
For instance, if you wish to establish a mapping between a gesture captured by your webcam and a specific sample, this would be a classification task. Another example would be classifying the type of instrument being played.

Wekinator has some limitations regarding the range of algorithms it offers, as the user lacks control over the model architecture and the model’s hyperparameters.

Wekinator is an outstanding tool that makes certain aspects of machine learning accessible. Its compatibility with OSC allows for seamless integration into various systems. It is an ideal fit for quick experimentation and serves as a valuable tool for teaching creative practitioners the fundamentals of machine learning on an intuitive level.

However, it does feel somewhat dated and experimental, offering no control over the hyperparameters of its built-in models.
Additionally, it does not utilize the latest software libraries, such as `PyTorch` or `TensorFlow`.
Being a Java application, it supports all operating systems, but `Java` is a relatively uncommon programming language in the field of machine learning.

The concept behind Wekinator is exceptional, and it shouldn’t be too challenging to create a similar tool in Python, enabling artists or developers to integrate their own PyTorch or TensorFlow models. It supports the goal of making machine learning accessible to everyone and provides valuable insights into the features that ML tools for non-experts should offer. Perhaps we can develop new tools to break down even more barriers, allowing practitioners and developers to learn from one another.

In any case, if you’re interested in experimenting with simple machine learning models that process various types of generated input, give Wekinator a try.

- Fiebrink, R. (2019). Machine Learning Education for Artists, Musicians, and Other Creative Practitioners. *ACM Trans. Comput. Educ.*, *19*(4). https://doi.org/10.1145/3294008
- Hilton, C., Plant, N., González Díaz, C., Perry, P., Gibson, R., Martelli, B., Zbyszynski, M., Fiebrink, R., & Gillies, M. (2021). InteractML: Making Machine Learning Accessible for Creative Practitioners Working with Movement Interaction in Immersive Media. *Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology*. https://doi.org/10.1145/3489849.3489879

Whereas the human mind, conscious of its conceived purpose, approaches even an artificial system with a selective attitude and so becomes aware of only the preconceived implications of the system, the computers would show the total of available content. Revealing far more than only the tendencies of the human mind, this nonselective picture of the mind-created system should be of significant importance. – Herbert Brün (1970)

In this series of posts, I will talk about melody generation using *machine learning*. I start by reflecting on computer music. Furthermore, I give a short and selective overview of some techniques to generate melodies that captured my interest, focusing on *recurrent neural networks (RNN)* to generate *monophonic* melodies for now.

I intend to build an interactive system that facilitates the musical dialog between humans and machines. In the end, a user should be able to interrogate the machine learning model such that an evaluation of the model’s ability is possible and maybe – a big maybe – users can enhance their creative process.

Due to the recent advances in *machine learning*, especially in the domain of *deep learning*, *computer music* is regaining traction.
What started in the early 1960s, when *rule-based* methods such as *Markov chains*, *hidden Markov models (HMM)*, *generative grammars*, *chaotic systems*, and *cellular automata* were applied to generate musical compositions, slowed down over the years.
Of course, since then, computers have always accompanied music production, but algorithms have taken a backseat when it comes to the structure of music itself.
Today, a reinvigoration of *computer music* might happen, but what will be at the forefront?
The algorithm? The artist? Or a massive amount of data?

In his book *Algorithmic Composition: Paradigms of Automated Music Generation* (Nierhaus, 2009) *Gerhard Nierhaus* makes an essential distinction between **genuine composition** and **style imitation**.
This terminology can bring nuances into the discussion around machines, algorithms, and data replacing human creativity.

**Style imitation** is more applicable when music is not the main focus of interest but amplifies or accompanies something else, e.g., ads, computer games, or movies.
In a computer game, you want music that captures the dynamic situation of the game; thus, surprises are undesirable.

**Genuine composition** appeals more to the artist who wants to play, experiment, and break things or criticize narratives.
*Genuine composition* appeals to the audience’s desire for experimentation, confrontation, and reflection on something new and unexpected.

While *genuine composition* carries much novelty, *style imitation* can be categorized into a set of preexisting styles of musical compositions.
The results generated by different techniques lean more or less towards one or the other.
For example, compositions generated by *Markov chains* (see Probabilistic Melody Modeling) lean more towards *style imitation* since the model samples from a learned distribution to estimate the structure of a particular composition.
On the other hand, using *cellular automata* is a technique more suited for *genuine composition* because no learning is involved.
Instead, *cellular automata* are *dynamic systems* capable of complex emergent behavior that can not be explained by solely looking at the parts of the system.
Of course, there is no black and white.
Instead, each composition or technique lives on a spectrum, and it might be impossible to decide where exactly.
If a *machine learning model* learns some non-obvious high-level structure, i.e., a probability distribution, from its training data, does this yield novelty or not?
It depends, I guess.

However, this “problem” is old, isn’t it?
Looking at the history of classical Western music, there are moments in time when it seemed that nothing was left to say and *style imitation* was inevitable.
For example, after one of the most quintessential romantic composers, *Richard Wagner*, there were few possibilities left concerning tonal music.
Consequently, the need for more chromaticism and dissonance was unavoidable, leading to the extreme case of Schönberg’s atonal compositions.

Concerning *artificial intelligence*, our current century is an era of *deep learning models* for which model size, data, parameters, and training time are more important than the underlying algorithm.
These models can combine, alter, and re-synthesize music from recorded history.
This re-combination and alteration seem similar to human composers’ working process, which is primed by their environment, education, and culture.
There is no doubt that these models can create music via *style imitation* but how good are they regarding *genuine composition*?
Is not learning from existing data *style imitation* per definition?
Well, again, even if all the data is known, one can argue that there are hidden high-level structures within the data that we are entirely unaware of.
A prime example is the discovery of a new way to multiply small matrices, found via *reinforcement learning* (Fawzi et al., 2022).

Whether machines that reveal these structures are creative is a question that can hardly be answered objectively, but, after all, the question might not be of great interest.
I instead look at the human condition and ask what such machines mean for our being in the world.
And here, we can say that if we do not try to understand our new black boxes, we rob ourselves of the experience of understanding, which, I think, is a fundamental desire.
But at the same time, if we do not give up understanding but enhance it with machine power (seen as a tool), the reverse can be true.
Beyond a lack of understanding, the lack of experienced practice is the more obvious loss we suffer.
The uncomfortable mood I sometimes develop when the achievements of AI shimmer through my perception comes from a fear of losing my habituated being in the world.
I ask myself: will machines write my code? Will I lose the enjoyment that I get out of coding?
Will I lose my identity as a programmer?
However, this loss has more to do with technology and economics.
We think of technology as a natural evolutionary force that points towards a better future.
And due to economic necessities, we have to adapt to stay productive.
However, without the industrial revolution, I would never have been able to enjoy programming in the first place.
Therefore, fear and excitement are reasonable reactions to this *time of disruption*.

But let’s return to the actual topic.
One of the challenges AI researchers face today is not so much the generation of novelty but the evaluation of it, especially in the case of aesthetics, which is somewhat subjective.
Within the generative process, the machine has to “judge” if the piece is engaging.
Otherwise, the whole process is nothing more than a sophisticated but impracticable random search through the space of all possible compositions.
However, interest depends on the cultural context.
Maybe Schönberg’s atonal, thus ungodly music would not have attracted anybody in the 13th century.
Judging if novelty is sensible seems to require an *embodiment* in the world.
Therefore, as an art form, I find fully automated **opaque** music-generating systems that lack human interaction undesirable because they, in the end, lead to a mode of passive consumption.
Instead, human intelligence is needed to bring *intentionality* into the composition.
I imagine an interconnected relationship where *artificial communication* guides humans and machines to new places.

Even though machines are still unable to develop intentionality of their own, composing is a partially rigorous and formal endeavor; thus, algorithms and machines can certainly help with realizing intentions. Contrary to popular belief, music and computer science / mathematics are much more related than they seem. And there is no reason to believe that algorithms can not only be aesthetically and intellectually beautiful but can also evoke emotions.

However, *computer music* is diverse, and as an amateur, I should be careful in my assessment.
I encountered *rule-centered computer music*, highly interactive *live programming* (which is also a movement), *real-time music generation*, *sonification*, and *data-centered computer music*.
While *rule-centered music generation* is *transparent* and offers low-level control, *data-centered generation* is often *opaque* and offers only high-level control, at least, up to this date.
*Live programming* is highly interactive and relies on *real-time communication* between performer and audience as well as *real-time artificial communication* between performer and machine.

*Communication* between humans and machines excites me the most since it offers the possibility to reflect on algorithms, programs, machines, data, and technology in general.
It goes beyond analytical contemplation by making algorithms experienceable.
Furthermore, learning from human feedback to calibrate generative models such that they represent our experienced ups and downs, twists and turns in music by injecting *intentionality* in order to steer generative models might be the way to go.
The term **co-pilot** comes to mind.

For the interested reader, the survey *AI Methods in Algorithmic Composition: A Comprehensive Survey* (Rodriguez & Vico, 2014) and *A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions* (Ji et al., 2020) offer an excellent first overview of the different techniques up to the year 2014 and 2020, respectively.

Let us first reflect on the question of why melody generation is tricky.
First of all, in music, intervals are everything.
It does not matter so much if one plays a C or B; what matters is the relation of notes within a piece, i.e., playing A-C or A-B.
Intervals are so ingrained into music that musicians give them certain names such as *minor third*, *perfect fifth*, or *tritone* (the Devil’s tone).
It is no coincidence that *major* and *minor* chords, as well as most *scales*, are asymmetrical because it gives each note a distinctive quality within a chord or a scale.

In addition, this relation is multi-dimensional. We have melody, i.e., playing note after note horizontally, and harmony, i.e., vertical relations, for which we play notes together (chords). On top of that, what happens in measure 5 may directly influence what happens in measure 55, without necessarily affecting any of the intervening material. These relations can be all over the place and span a very long sequence! Good music hits a sweet spot between repetition and surprise, and landing on that spot is quite challenging. Together, these properties make melody (or polyphonic) generation hard.

Many artists and researchers took on the challenge, and *machine learning techniques* increasingly play an essential role.
The learning and generation are based on two major categories of representations: either **symbolic notation** or **performed pieces** of music.

Classical music and most other kinds of music as we know them came into being through the music generation process depicted in figure 1.
Composers write a score, which is executed by performing musicians, who excite their instruments.
This excitement causes the instrument to vibrate, and due to its physicality, it pushes air molecules around.
Molecules bump into each other in some areas and tend to leave other areas—a displacement we call *wave*.
In principle, the energy of the excitement travels outwards through the air to the listener’s ear.
The information gets more and more concrete.
While a score is an abstract representation, the sound caused by instruments is very concrete.
It is, in fact, so concrete that it is different for every listener since each of us perceives sound slightly differently.

Abstraction removes us from the messy reality and enables communication, but it also conceals the singularity of beings.
Concerning *melody generation* we have to ask how much abstraction is justified.
A score represented by a note sheet without any information about the dynamics is “more” *symbolic* or abstract than a note sheet that contains hints for the tempo and loudness.
Raw audio material is far away from the much more abstract *symbolic notation*, and a lived-through performance is a singular event.

Figure 1: Music generation process.

Using *symbolic notation* is more accessible, but one loses the slight variations, dynamics, and other qualities that make the result feel so humanly made.
However, composers who want to use *melody generation* to support their process might be happy to introduce these qualities by themselves.
It is a question of application.
The same is true for *mono-* and *polyphonic* generation.
Do we want to generate a full-fledged performance, a melody snippet (monophonic), or something in between?
In this blog post, I will focus on the *monophonic* generation (one note at a time).

In my post Probabilistic Melody Modeling, I used a *(discrete) first-order Markov chain (MC)* (also called *Markov process*) to generate melodies after learning the model from one piece of music.
The approach was simple: translate the frequency of note transitions into probabilities.
For example, if the piece is defined by the note sequence A-B-B-A-C-A-B and the duration of each note is the same, then

\[P(\text{B} \mid \text{A}) = \frac{2}{3}, \quad P(\text{C} \mid \text{A}) = \frac{1}{3}, \quad P(\text{A} \mid \text{B}) = P(\text{B} \mid \text{B}) = \frac{1}{2}, \quad P(\text{A} \mid \text{C}) = 1.\]
To determine which note comes next, we only look at the previous note, i.e. at a very narrow context.
Using a *first-order Markov model*, one would estimate the probability of the melody (not considering note duration) A-B-F by

\[P(\text{A-B-F}) = P(\text{A}) \cdot P(\text{B} \mid \text{A}) \cdot P(\text{F} \mid \text{B}).\]
In this example, a state is defined by a note, e.g. A.
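Estimating these transition probabilities amounts to counting note bigrams and normalizing per state. A minimal Python sketch (the helper name is mine):

```python
from collections import Counter, defaultdict

def learn_markov_chain(notes):
    """Estimate first-order transition probabilities from a note sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(notes, notes[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(successors.values()) for nxt, c in successors.items()}
        for state, successors in counts.items()
    }

chain = learn_markov_chain(list("ABBACAB"))
# A is followed twice by B and once by C, so chain["A"]["B"] is 2/3.
```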

*Hidden Markov models (HMM)* are *Markov chains* with *hidden states*.
There are also a finite number of states, probabilistic transitions between these states, and the next state is determined by the current state, but we are unsure which state the process is currently in.
The current hidden state $Z_t$ *emits* an *observation* $X_t$.
In other words, instead of going from one observed variable to the next, e.g., one note to the next, we move from one **distribution** of observations to the next, e.g. from one distribution of notes to the next!

For example, imagine a prisoner who has to estimate the outside weather (*hidden state*) by observing the dirt on the guard’s boots (*emissions*).
If he knows all the probability transitions (e.g., from sunny to rainy, sunny to dirty on boots, etc.) and existing states, the prisoner could model the problem by an HMM, compare Figure 2.

Figure 2: Hidden Markov model with 2 hidden states (sunny, rainy) and 2 observation variables (clean, dirty). The initial state is either sunny or rainy with 0.5 probability.

The prisoner could ask: given the HMM and an observation sequence $X_0, \ldots, X_n$, what is the likelihood that this sequence occurs (*likelihood problem*)?
One could also ask: what are the hidden states $Z_0, \ldots, Z_n$ that “best” explain the observations (*decoding problem*)?
Moreover, what we are more interested in is: given an observation sequence $X_0, \ldots, X_n$, learn the model parameters $\theta$ that maximize the likelihood of our observation (*learning problem*)!
Similar to neural networks, one defines the *architecture* of the HMM (states and transitions), then solves the *learning problem* and infers new melodies from the learned HMM.
In music, hidden states often lack interpretability.
Therefore, it needs to be clarified which architecture one should choose.
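The *likelihood problem* can be solved efficiently with the forward algorithm. A minimal sketch for the prisoner example follows; note that Figure 2 only fixes the uniform initial distribution, so the transition and emission probabilities below are assumptions made purely for illustration:

```python
# States: 0 = sunny, 1 = rainy; observations: 0 = clean, 1 = dirty.
initial = [0.5, 0.5]                    # from Figure 2
transition = [[0.8, 0.2], [0.4, 0.6]]   # assumed: P(next state | state)
emission = [[0.9, 0.1], [0.3, 0.7]]     # assumed: P(observation | state)

def likelihood(observations):
    """Forward algorithm: P(X_0, ..., X_n), summed over all hidden paths."""
    alpha = [initial[s] * emission[s][observations[0]] for s in range(2)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(2))
            * emission[s][obs]
            for s in range(2)
        ]
    return sum(alpha)

p_clean_dirty = likelihood([0, 1])  # P(boots clean, then dirty)
```

The probabilities of all possible observation sequences of a fixed length sum to one, which is a handy sanity check for any implementation.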

*HMM* is a special case of *dynamic Bayesian networks* where a single hidden state variable represents the entire state of the world.
With respect to music, using *hidden Markov models* we can model more abstract states that we can not directly observe, for example,

Using *higher-order MC* or *HMM* widens the context to multiple notes of the past.
For our MC example A-B-F this would mean the probability changes to

\[P(\text{A-B-F}) = P(\text{A}) \cdot P(\text{B} \mid \text{A}) \cdot P(\text{F} \mid \text{A},\text{B}).\]
But due to the chain property (linearity), this does not necessarily lead to better results, since widening the range leads very quickly to overfitting, i.e., the model reproduces more or less exact replicas because it does not generalize—more is not always better. In any case, the learning stays stepwise causal, i.e., one note after the other without jumping around. By focusing on linear temporal dependencies, these models fail to take into account the higher-level structure and semantics important to music. By model design, HMMs have very limited memory and are thus also incapable of modeling the longer-term structure that occurs in original musical pieces.

The application of Markov models to musical structure generation goes back to *Harry F. Olson* around 1950, and in 1955, the first machine-produced compositions used Markov models of first and second order with regard to pitches and rhythm (Nierhaus, 2009).
In (Van Der Merwe & Schulze, 2011), the authors used first-, higher-, or mixed-order MCs to represent chord duration, chord progression, and rhythm progression and first-, higher-, or mixed-order HMM to describe melodic arc.

A more rigorous and recent discussion, including *polyphonic* generation, can be found in (Collins et al., 2016).

As stated, the challenge is primarily due to long-term relations.
One way of tackling this issue is to increase the memorizing capability of the model.
With this in mind, looking at the catalog of model types within the field of *deep learning* one can spot multiple alternatives to *Markov chains* for melody generation.

One obvious choice is the long short-term memory recurrent neural networks (LSTM) (Hochreiter & Schmidhuber, 1997), a type of recurrent neural network (RNN) that allows information to persist for a longer time by letting it flow almost directly through time.

In theory, *vanilla RNNs* can learn the temporal structure of any signal, but due to computational reasons (vanishing/exploding gradients), they can not keep track of temporally distant events.
If you want to learn more about RNNs, I highly recommend the blog article The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy.
In his blog post, he writes:

RNNs combine the input vector with their state vector with a fixed (but learned) function to produce a new state vector. This can, in programming terms, be interpreted as running a fixed program with certain inputs and some internal variables. – Andrej Karpathy

Figure 3: Sketch of an RNN unfolded in time.

RNNs are similar to *multilayered perceptrons* (MLPs) but allow for connections from the output of one unit into the input of another unit located at a shallower layer than itself, i.e., closer to the input of the network.
The information no longer flows acyclically through the network.
Instead, recurrent feedback is introduced and allows an RNN to take into account its past inputs together with new inputs.
Essentially, an RNN predicts a sequence of symbols given an input sequence.
But using an RNN is like writing a thousand letters on the same piece of paper and then figuring out the information contained in the first letter—it is a mess; the information gets washed away.
In Figure 3 the basic components of an RNN are depicted.
Using my analogy, the piece of paper corresponds to the matrices $U,V,W$, which are learnable parameters.
**They are shared through time!**
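Written out, the recurrence sketched in Figure 3 is commonly formulated as

\[\mathbf{h}_t = \tanh(U \mathbf{x}_t + W \mathbf{h}_{t-1}), \qquad \mathbf{y}_t = V \mathbf{h}_t,\]

where $U$ transforms the current input, $W$ carries the previous hidden state forward, and $V$ produces the output; the same three matrices are applied at every time step (this is one common formulation, and naming conventions for the matrices vary).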

The extension motivated by the shortcomings of *vanilla RNNs* is the *LSTM RNN*, or just *LSTM*.
LSTMs **learn** which information they should keep in long-term memory and which information they can forget after a short period.
Instead of just writing all the letters on the piece of paper, we use an eraser to get rid of some writing while highlighting other passages.

Figure 4: A sketch of an LSTM cell.

There is plenty of good material which explains LSTMs much more accurately than I can ever do. Figure 4 shows a sketch of a very complicated-looking LSTM cell where each green square is a linear transformation, each red bar indicates a sigmoid activation, and blue bars indicate a tanh activation. All the sigmoid activations are used to control the memorizing strategy (rubber and highlighter). First, the cell “decides” what to keep in the long-term state $\mathbf{c}_{t-1}$ via $f_t$.

Then $i_t$ decides what to add to the long-term state. In addition, $o_t$ decides what part of the new long-term state will make up the short-term state $\mathbf{h}_t.$ The important point is that along the path from $\mathbf{c}_{t-1}$ to $\mathbf{c}_{t},$ there is only a simple multiplication and addition! Therefore, information can persist for longer.
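In one standard formulation (with $\sigma$ the sigmoid and $\odot$ element-wise multiplication), the cell sketched in Figure 4 computes

\[\begin{aligned} f_t &= \sigma(W_f [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_f), \\ i_t &= \sigma(W_i [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_i), \\ o_t &= \sigma(W_o [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_o), \\ \mathbf{c}_t &= f_t \odot \mathbf{c}_{t-1} + i_t \odot \tanh(W_c [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_c), \\ \mathbf{h}_t &= o_t \odot \tanh(\mathbf{c}_t), \end{aligned}\]

so the only operations on the path from $\mathbf{c}_{t-1}$ to $\mathbf{c}_t$ are the element-wise multiplication by $f_t$ and one addition.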

Note, however, that LSTMs can still access information of time step $t$ only via time step $t-1$.
There is no direct access to information compared to the *attention mechanism* (Bahdanau et al., 2014), and the *transformer* (Vaswani et al., 2017), which led to the most recent breakthroughs in the field of *deep learning*.
Just a few days ago, another RNN, called RWKV-LM, claimed to achieve similar results; thus, the last word has yet to be spoken.

In (Todd, 1989) *Peter M. Todd* used a *vanilla RNN* to generate melodies.
Various issues in designing the network are discussed.
Note that at this early point in the young history of machine learning, there were no user-friendly software libraries available, and much more thought went into designing the network at a fine-grained level.
The author also suggests using *multi-layered perceptrons (MLPs)* by explicitly modeling time and either showing the whole piece to the network or using a sliding-window approach, e.g., showing one bar at a time and predicting the next bar.
In the end, Todd uses an RNN with 15 hidden units and 15 output units (1 note-begin unit and 14 pitch units) trained for 8500 epochs (cycle through the entire training set) using a handful of sliced melodies.
Using fewer hidden units required longer training.
By today’s standard, Todd’s network was tiny.
Todd assumes the key of C from D4 to C6.
The *note-begin unit* indicates whether a note begins or merely continues.
Interestingly, Todd discusses a *pitch-interval* representation.
Instead of outputting the actual pitch values, the network outputs the relative transitions (intervals/pitch changes) in semitones.
For example, instead of A-B-C the network outputs A-(+2)-(+1).
The advantage is that outputs are more *key-independent* and can range over an extensive range of notes even if there are only a few output units.
This allows for the transposition of an entire melody simply by changing the actual initial pitch (which need not even be produced by the network but could be specified elsewhere).
In the end, he decides not to use it because of possible errors during generation: a single wrong interval would transpose the entire remainder of the melody.
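Todd’s pitch-interval representation is easy to sketch; the helpers below are hypothetical (working directly on MIDI pitch numbers), but they show both the transposition benefit and the error propagation:

```python
def to_intervals(pitches):
    """Encode a melody as its first pitch plus semitone intervals."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return pitches[0], intervals

def from_intervals(first, intervals):
    """Decode a melody; transposing is just changing `first`."""
    pitches = [first]
    for step in intervals:
        pitches.append(pitches[-1] + step)  # one wrong step shifts everything after it
    return pitches
```

For example, C4-D4-E4, i.e., `[60, 62, 64]`, becomes `(60, [2, 2])`, and decoding with `first=62` transposes the whole melody up a whole tone.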

One of the very first applications of LSTMs for the generation of music was introduced by *Douglas Eck* and *Jürgen Schmidhuber* (Eck & Schmidhuber, 2002).
The authors state that “most music has a well-defined global temporal structure in the form of nested periodicities or meter”.
In a waltz, important melodic events occur every three quarter notes (i.e., on the first beat of each bar).
Chord changes occur more slowly and are most often aligned with the bars.

For this reason, one can say of music that some notes are more important than others: in general, a learning mechanism should spend more resources on metrically-important notes than others. – (Eck & Schmidhuber, 2002)

The authors use a *time-sliced representation*, i.e., each step/event represents the period (for example, a quarter note).
They use one input/output unit per note, making it implicitly *polyphonic* and avoiding an artificial distinction between melody and chords.
However, to keep this property, the authors do not distinguish between a note that is retriggered and a note that is held, because that would require this information for each input/output note.
They randomly selected 4096 12-bar blues songs (with 8 notes per bar).
Their network for learning chords consists of 4 cell blocks containing 2 cells; each is fully connected to the other.
Their network for learning chords and melody consists of 8 cell blocks containing 2 cells.
4 of the cell blocks are fully connected to the input units for chords.
The other four cell blocks are fully connected to the input units for the melody.
The chord cell blocks have recurrent connections to themselves **and** to the melody cell blocks.
However, melody cell blocks are only recurrently connected to melody cell blocks.
Therefore, the authors assume that the chords influence the melody but not the other way around.
Again, the network is relatively small.
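The time-sliced representation can be pictured as a binary piano roll: one row per note, one column per slice, and a 1 whenever the note sounds. The toy helper below is my own sketch (not the authors’ code); note that a held note and a retriggered note produce the same pattern, which is exactly the ambiguity mentioned above:

```python
import numpy as np

def to_piano_roll(notes, n_pitches, n_steps):
    """notes: list of (pitch_index, start_step, duration_in_steps)."""
    roll = np.zeros((n_pitches, n_steps), dtype=np.int8)
    for pitch, start, duration in notes:
        roll[pitch, start:start + duration] = 1  # held vs. retriggered looks identical
    return roll

# one bar of 8 eighth-note slices over 12 pitch classes:
roll = to_piano_roll([(0, 0, 4), (4, 0, 4), (7, 4, 4)], n_pitches=12, n_steps=8)
```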

FolkRNN was introduced in 2016 (Sturm et al., 2016).
It is an LSTM trained with 23 000 music transcriptions expressed with the common ABC notation.
The authors discuss different perspectives on their results, keeping the musician and the process of composition in mind.
Their human-machine interaction is on the side of composing.
The user can ask for a melody and can adjust it; you can experiment with their models here.
Their LSTM consists of **3 hidden layers** with **512 LSTM blocks** each, leading to approximately **5.5 million parameters**, i.e., a big jump from the works we discussed before.

Daniel Johnson (Johnson, 2017) created what he calls the Bi-axial LSTM: many two-layered LSTMs (stacked along the note axis) with connections along the note axis and recurrent connections along the time axis, followed by feed-forward layers (a third and fourth non-recurrent layer) across notes.
Each of the stacked LSTMs receives the input for one specific note.
The model supports polyphonic music.
Furthermore, Johnson’s architecture and input format allow the model to learn the musical concept of *translation invariance*, e.g., increasing each note of a piece by one semitone keeps the main qualities unchanged, which is very different compared to text translation.

The model is inspired by convolutional neural networks since they are quasi-invariant with respect to translation.
It is not completely clear to me how many LSTM blocks the model consists of.
I think there are **2 LSTM layers with 300 blocks each** and **2 non-recurrent layers with 100 and 50 units**, respectively.
In (Kotecha & Young, 2018), the authors refined Johnson’s technique.
You can listen to some of Johnson’s results on his blog.
Despite being a seemingly small contribution, Johnson’s ideas influenced a lot of work in this field.
His architecture and input/output modeling is insightful and may evoke different ideas.
I highly recommend reading his blog post.

In 2016 Melody RNN was introduced within Google’s open source project Magenta.
One of the project’s stated goals is to advance the state of the art in machine intelligence for music and art generation.
*Melody-RNN* is a simple dual-layer LSTM model.
In fact, there are four different versions of *Melody RNN*, which offers me the possibility to look at increasingly complex/sophisticated solutions.
Each is able to generate **monophonic** melodies:

**(1) Basic RNN**: The *basic dual-layer LSTM* uses basic *one-hot encoding* to represent extracted melodies as input to the LSTM and fulfills the role of a baseline.
One-hot encoding means that to represent $n$ different objects one uses a binary vector of size $n$, where the $k$-th object is represented by the vector whose $k$-th entry is 1 and all other entries are 0.
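In code, this is a one-liner; the helper below is my own sketch, not Magenta’s `MelodyOneHotEncoding` (which additionally reserves symbols for note-off and no-event):

```python
import numpy as np

def one_hot(k, n):
    """Represent the k-th of n objects as a binary vector of size n."""
    v = np.zeros(n, dtype=np.float32)
    v[k] = 1.0
    return v

# MIDI pitch 60 within the pitch range starting at 48:
vec = one_hot(60 - 48, 84 - 48)
```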

For training, all the data is transposed to the MIDI pitch range $[48..84]$.
The output/label was the target next event (note-off, no event, note-on for each pitch), i.e., one value for each pitch (a vector).
Looking at the code, I assume they use 128 units for each layer.
`MelodyOneHotEncoding` and `KeyMelodyEncoderDecoder` can be found here.

```
DEFAULT_MIN_NOTE = 48
DEFAULT_MAX_NOTE = 84
...
MelodyRnnConfig(
    generator_pb2.GeneratorDetails(
        id='basic_rnn',
        description='Melody RNN with one-hot encoding.'),
    note_seq.OneHotEventSequenceEncoderDecoder(
        note_seq.MelodyOneHotEncoding(
            min_note=DEFAULT_MIN_NOTE,
            max_note=DEFAULT_MAX_NOTE
        )
    ),
    contrib_training.HParams(
        batch_size=128,
        rnn_layer_sizes=[128, 128],
        dropout_keep_prob=0.5,
        clip_norm=5,
        learning_rate=0.001
    )
)
```

**(2) Mono RNN**: Similar to *basic* but uses the full MIDI pitch range, i.e. $[0..128]$.

```
...
note_seq.OneHotEventSequenceEncoderDecoder(
    note_seq.MelodyOneHotEncoding(
        min_note=0,
        max_note=128
    )
),
...
```

**(3) Lookback RNN**: The third one, the *Lookback RNN*, extends the inputs and introduces custom outputs/labels, allowing the model to quickly recognize patterns that occur across one and two bars.
Therefore, the input is extended to events from 1 and 2 bars ago.
Furthermore, the authors add the information on whether the last event was repeating the event from 1 or 2 bars before it, which allows the model to more easily recognize if it is in a “repeating sequence state” or not.
Finally, they borrow again from (Johnson, 2017) what he calls **Beat**.
The idea is to add the position within the measure represented by a sort of binary clock, i.e., $(0,0,0,0,1)$ followed by $(0,0,0,1,0)$ followed by $(0,0,0,1,1)$ and so on (but they use -1 instead of 0).
I am unsure why they call their last trick *custom label* since it is more like a compression of information.
Event labels (i.e., the next value the model should output) are replaced by “repeat bar 1” or “repeat bar 2” if repetition was found in the data.
This is a clever trick!
Overall, the authors introduce more structure explicitly and compress some of the information to ease the learning process.
Note that the input designed by (Johnson, 2017) is much more complicated.
He provides (for each note) the **Pitchclass** of the notes played, the **Previous Vicinity** (which surrounding notes were played before), and the **Previous Context** (cardinality of the played pitch classes).
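Based on the description above, the binary beat clock can be sketched as follows (a hypothetical helper; Magenta’s actual implementation may differ in details):

```python
def beat_vector(step, n_bits=5):
    """Position within the measure as a binary clock with -1 instead of 0."""
    bits = [(step >> i) & 1 for i in reversed(range(n_bits))]
    return [1 if b == 1 else -1 for b in bits]

# steps 1, 2, 3 yield:
# [-1, -1, -1, -1,  1]
# [-1, -1, -1,  1, -1]
# [-1, -1, -1,  1,  1]
```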

```
...
note_seq.LookbackEventSequenceEncoderDecoder(
    note_seq.MelodyOneHotEncoding(
        min_note=DEFAULT_MIN_NOTE, max_note=DEFAULT_MAX_NOTE
    )
)
...
```

**(4) Attention RNN**: The last RNN from this series of RNNs is the *Attention RNN*.
It introduces the use of the attention mechanism (Bahdanau et al., 2014) to allow the model to more easily access past information without storing it in the RNN cell’s state, i.e., its long-term memory.

```
note_seq.KeyMelodyEncoderDecoder(
    min_note=DEFAULT_MIN_NOTE, max_note=DEFAULT_MAX_NOTE),
...
attn_length=40,
clip_norm=3,
...
```

The attention mechanism within an RNN gives it the ability to learn the importance of relations between symbols within the sequence so that it can more easily access the “important” information. For example, to figure out the last word in the sentence,

I am from Germany, and I eat a lot of pizza. I speak [???]

the model should learn that the word “speak” (in this context) should pay a lot of attention to the words “I”, “am”, “from”, “Germany” but not so much to “pizza”.

Originally, this was introduced for an *encoder-decoder RNN*.
*Attention RNN* uses attention for the outputs of the overall network.
The model always looks at the outputs from the last $n=40$ steps when generating output for the current step.
This “looking” is realized by an *attention mask* which determines how much attention is spent on what step of the past.

The columns of $H$ are the $n=40$ hidden states $h_{t-n}, \ldots, h_{t-1}$. The attention weights $\mathbf{a}_t$ are computed via

\[\mathbf{u}_t = \mathbf{v}^\top \tanh(W_1 H + W_2 \mathbf{c}_t), \qquad \mathbf{a}_t = \mathrm{softmax}(\mathbf{u}_t),\]

where $\mathbf{c}_t$ is the current step’s RNN cell state and $W_1, W_2$ and $\mathbf{v}$ are learnable parameters. So instead of seeing only the hidden state $\mathbf{h}_{t-1}$, the RNN is looking at

\[\hat{\mathbf{h}}_{t} = \sum\limits_{i=t-n}^{t-1} a_{t,i} h_i,\]

where $a_{t,i}$, the $i$-th component of $\mathbf{a}_t$, is the amount of attention spent on the hidden state $h_i$.
This $\hat{\mathbf{h}}_t$ vector is then concatenated with the RNN output from the current step, and a linear layer is applied to that concatenated vector to create the new output for the current step.
Furthermore, $\hat{\mathbf{h}}_t$ is injected into the input of the next step.
Both concatenations are transformed via a *linear layer* directly after concatenation.
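Putting the pieces together, here is a small numpy sketch of one attention step. Shapes and names are my own; this is not Magenta’s actual implementation:

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

def attend(H, c, W1, W2, v):
    """H: (d, n) past hidden states h_{t-n}, ..., h_{t-1}; c: current cell state (d,)."""
    u = v @ np.tanh(W1 @ H + (W2 @ c)[:, None])  # one score per past step
    a = softmax(u)                               # attention mask over the n steps
    return H @ a                                 # weighted sum, i.e., \hat{h}_t
```

The resulting vector would then be concatenated with the current output and passed through a linear layer, as described above.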

In a technical report (Lou, 2016), Lou compares the *Attention RNN* with the *Bi-axial LSTM* and comes to the conclusion that, like many RNNs, the *Attention RNN* quite often falls into the over-repetition rabbit hole when generating pieces longer than 16 seconds.
The Bi-axial LSTM produces rhythmically better compositions due to its time- and note-invariance, but it takes longer to train.

In (Hadjeres & Nielsen, 2018), the authors from Sony introduced the Anticipation-RNN, a stacked LSTM that allows the user to impose constraints, which is especially interesting for composers who want to fix certain notes interactively.
The first RNN/stack works from right to left, and the second from left to right.
The idea is that the first RNN outputs a summary of the constraints that accumulates from right to left, since when the second RNN generates the melody from left to right, it has to respect the most constraints at the beginning of the sequence.
The input for the first RNN is basically a constraint, i.e., a note or nil (if unconstrained), and the input for the second RNN is a note concatenated with the output of the first RNN.
In (Hadjeres & Crestel, 2021), this idea is extended, but with a *constrained linear transformer*.
Furthermore, the authors provide a DAW plug-in, The Piano Inpainting Application (PIA), that enables real-time AI assistance when composing polyphonic music in a *digital audio workstation (DAW)*.
I will talk about *transformers* later in this series.

In (Jiang et al., 2019), the authors use a bidirectional LSTM model to compose polyphonic music conditioned on nearby notes, which surround the target note along both the time dimension and the note dimension. Their work borrows heavily from (Johnson, 2017), but the bidirectional property allows the harmonization to access tonal information of the near future as well as the near past. This makes sense since, in many cases, a note depends on a future note, e.g., a chromatic transition where we already know where we want to end up but have to figure out how to get there. In addition, they propose a new loss function and allow the user to provide a musical context in the form of a custom chord. They report a better convergence rate compared to the Bi-axial LSTM. I was unable to find the code of their implementation.

- Nierhaus, G. (2009). *Algorithmic Composition - Paradigms of Automated Music Generation*. SpringerWienNewYork.
- Fawzi, A., Balog, M., Huang, A., Hubert, T., Romera-Paredes, B., Barekatain, M., Novikov, A., R. Ruiz, F. J., Schrittwieser, J., Swirszcz, G., Silver, D., Hassabis, D., & Kohli, P. (2022). Discovering faster matrix multiplication algorithms with reinforcement learning. *Nature*, *610*(7930), 47–53. https://doi.org/10.1038/s41586-022-05172-4
- Rodriguez, J. D. F., & Vico, F. J. (2014). AI Methods in Algorithmic Composition: A Comprehensive Survey. *CoRR*, *abs/1402.0585*. http://arxiv.org/abs/1402.0585
- Ji, S., Luo, J., & Yang, X. (2020). A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions. *CoRR*, *abs/2011.06801*. https://arxiv.org/abs/2011.06801
- Van Der Merwe, A., & Schulze, W. (2011). Music Generation with Markov Models. *IEEE MultiMedia*, *18*(3), 78–85. https://doi.org/10.1109/MMUL.2010.44
- Collins, T., Laney, R., Willis, A., & Garthwaite, P. H. (2016). Developing and evaluating computational models of musical style. *AI EDAM*, *30*(1), 16–43. https://doi.org/10.1017/S0890060414000687
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. *Neural Computation*, *9*(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
- Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. *CoRR*, *abs/1409.0473*.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. *CoRR*, *abs/1706.03762*. http://arxiv.org/abs/1706.03762
- Todd, P. M. (1989). A Connectionist Approach To Algorithmic Composition. *Computer Music Journal*, *13*, 27–43.
- Eck, D., & Schmidhuber, J. (2002). Finding temporal structure in music: blues improvisation with LSTM recurrent networks. *Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing*, 747–756. https://doi.org/10.1109/NNSP.2002.1030094
- Sturm, B. L., Santos, J. F., Ben-Tal, O., & Korshunova, I. (2016). Music transcription modelling and composition using deep learning. *CoRR*, *abs/1604.08723*. http://arxiv.org/abs/1604.08723
- Johnson, D. D. (2017). Generating Polyphonic Music Using Tied Parallel Networks. *EvoMUSART*.
- Kotecha, N., & Young, P. (2018). Generating Music using an LSTM Network. *CoRR*, *abs/1804.07300*. http://arxiv.org/abs/1804.07300
- Lou, Q. (2016). *Music Generation Using Neural Networks*. http://cs229.stanford.edu/proj2016/report/Lou-MusicGenerationUsingNeuralNetworks-report.pdf
- Hadjeres, G., & Nielsen, F. (2018). Anticipation-RNN: enforcing unary constraints in sequence generation, with application to interactive music generation. *Neural Computing and Applications*. https://doi.org/10.1007/s00521-018-3868-4
- Hadjeres, G., & Crestel, L. (2021). The Piano Inpainting Application. *CoRR*, *abs/2107.05944*. https://arxiv.org/abs/2107.05944
- Jiang, T., Xiao, Q., & Yin, X. (2019). Music Generation Using Bidirectional Recurrent Network. *2019 IEEE 2nd International Conference on Electronics Technology (ICET)*, 564–569. https://doi.org/10.1109/ELTECH.2019.8839399

Before discussing the issue, let us test this new magical tool.

Let’s try to prove something using *ChatGPT*.

**My question:**
Can you proof that irrational numbers exist?

**ChatGPT:**
Yes, there are many ways to prove that irrational numbers exist.
One way to do this is to consider the ratio of the length of the diagonal of a square to the length of its side.
This ratio is always irrational, because no matter how large or small the square is, its diagonal will never be a whole number multiple of its side. This can be shown using the Pythagorean theorem, which states that in a right triangle, the square of the length of the hypotenuse (the longest side) is equal to the sum of the squares of the lengths of the other two sides. Since the diagonal of a square is a line that forms a right angle with each side of the square, it follows that the diagonal of a square is the hypotenuse of a right triangle. Therefore, the ratio of the diagonal of a square to the length of its side must be irrational.

That is not really convincing if we do not know that $\sqrt{2}$ is irrational, which is exactly what we are trying to prove. So let’s ask for a formula.

**My question:**
Can you give me a mathematical formula to show that it is not possible for the square of the length of the diagonal to be equal to the square of a whole number?

**ChatGPT:**
Sure, here is one way to show that it is not possible for the square of the length of the diagonal of a square to be equal to the square of a whole number.
Let’s say that the length of the side of a square is “a” and the length of its diagonal is “d”.
Since the diagonal of a square forms a right angle with each side of the square, we can use the Pythagorean theorem to find the length of the diagonal:

a^2 + a^2 = d^2

2a^2 = d^2

d = sqrt(2) * a

As you can see, the length of the diagonal is equal to the square root of 2 times the length of the side. However, since the square root of 2 is an irrational number, it is not possible for the length of the diagonal to be a whole number multiple of the length of the side. This means that the ratio of the diagonal of a square to the length of its side must be irrational.

I hope this helps! Let me know if you have any other questions.

This answer is kind of helpful but, at the same time, misleading.
Again, *ChatGPT* already assumes that $\sqrt{2}$ is irrational, which is part of what we wanted to prove in the first place.

**My question:**
But here you assume that sqrt(2) is irrational. Can you proof why sqrt(2) is irrational?

**ChatGPT:**
Yes, you are correct that in the previous explanation I assumed that the square root of 2 is irrational. To prove that the square root of 2 is irrational, we can use a proof by contradiction. Here is one way to do this:

Assume that the square root of 2 is a rational number, which means that it can be written as the ratio of two integers (a and b) in simplest form, where b is not equal to 0. This means that we can write the square root of 2 as follows:

sqrt(2) = a/b

We can square both sides of this equation to get rid of the square root on the left-hand side:

2 = (a/b)^2

This simplifies to:

2 = a^2 / b^2

We can then multiply both sides of the equation by b^2 to get rid of the fraction on the right-hand side:

2 * b^2 = a^2

This means that the left-hand side of the equation is even, because it is equal to 2 times an integer. However, the right-hand side of the equation is odd, because the square of any integer is always an odd number (unless the integer is 0). This is a contradiction, because we cannot have an even number equal to an odd number. Therefore, our original assumption that the square root of 2 is a rational number must be false. This means that the square root of 2 is irrational.

I hope this helps! Let me know if you have any other questions.

*ChatGPT* uses parts of *Euclid’s* proof but **confidently fails**.
The statement “*the square of any integer is always an odd number (unless the integer is 0)*” is, of course, false.
The correct statement is that the square of any **odd integer** is **odd**, and the square of any **even integer** is **even**.
Let’s try to make *ChatGPT* aware of its mistake.

**My question:**
You stated that the square of any integer is always an odd number but the square of 4 is 16 which is an even number. Did you make a mistake?

**ChatGPT:**
Yes, you are correct that the square of 4 is 16, which is an even number. I apologize for the mistake in my previous explanation. The statement that “the square of any integer is always an odd number” is not correct, because as you pointed out, the square of some integers (such as 4) is an even number.

To correct my previous explanation, we can instead say that the square of any integer that is not a multiple of 2 is always an odd number. This is because if we square an integer that is not a multiple of 2, the result will always be an odd number. For example, if we square 3, we get 9 (which is an odd number), and if we square 5, we get 25 (which is also an odd number). However, if we square an integer that is a multiple of 2 (such as 4), the result will be an even number. This is because if we square an integer that is a multiple of 2, the result will always be a multiple of 4. For example, if we square 4, we get 16 (which is a multiple of 4), and if we square 6, we get 36 (which is also a multiple of 4).

I apologize again for any confusion caused by my previous mistake. I hope this helps clarify the situation. Let me know if you have any other questions.

It is impressive that it acknowledges its mistake and that one can interact with it so fluently.
Asking it repeatedly yields other wrong but confident assertions.
*ChatGPT* followed the proof up to the formula

\[2 \cdot b^2 = a^2\]

and concluded that $a^2$ has to be odd (somehow) and therefore there is a contradiction, since $2 b^2$ is even. But it was unable to point out why $a^2$ has to be odd. I gave up.

For the sake of completeness, let me finish the unfinished proof of *ChatGPT*. From

\[2 \cdot b^2 = a^2\]

it follows that $a^2$, and therefore $a$, is even (since squaring an odd number gives us an odd number). Consequently, $\exists m \in \mathbb{N}$ such that $a = 2 \cdot m$; hence,

\[2 \cdot b^2 = (2m)^2 = 4m^2.\]

Now we divide both sides by 2 to get

\[b^2 = 2m^2,\]

thus $b^2$, and therefore $b$, has to be even as well!

However, we (not *ChatGPT*) assumed at the beginning that the fraction $a/b$ was fully reduced, i.e., $a$ and $b$ have no common divisors.
Since we deduced that $a$ and $b$ are both even, we have a contradiction: $2$ is a common divisor!

Next I wanted to test its capability to write code.

**My question:**
Can you write a python function that computes the sum of all square numbers up to n^2?

**ChatGPT:**
Yes, here is a Python function that computes the sum of all square numbers up to n^2:

```
def sum_of_square_numbers(n):
    # Initialize a variable to store the sum of square numbers
    sum = 0
    # Loop from 1 to n
    for i in range(1, n+1):
        # Add the square of i to the sum
        sum += i**2
    # Return the sum
    return sum
```

This function takes an integer n as input and returns the sum of the squares of the numbers from 1 to n (inclusive). For example, if we call the function with n = 3, it will return the sum of the squares of the numbers 1, 2, and 3, which is 1 + 4 + 9 = 14.

I hope this helps! Let me know if you have any other questions.

This code is correct, but the solution is extremely slow since there is a closed-form formula for the sum of all square numbers up to $n^2$:

\[\frac{n \cdot (n + 1) \cdot (2n + 1)}{6}.\]

Therefore, a much faster implementation is given by the following code:

```
def sum_of_square_numbers(n):
    # the product is always divisible by 6, so integer division is exact
    return (n * (n + 1) * (2 * n + 1)) // 6
```

Also, it is bad practice to use `sum` as a variable name since `sum` is also a built-in `Python` function, and by using `sum` as a variable name, one can no longer use this function within the current namespace.

**My question:**
Is there a more optimal solution that runs faster?

**ChatGPT:**
Yes, there are several ways to make the function more efficient and run faster. Here is one way to do this:

```
def sum_of_square_numbers(n):
    # Compute the sum of the first n numbers using the formula for the sum of an arithmetic series
    sum_of_numbers = n * (n + 1) // 2
    # Return the square of the sum of the first n numbers
    return sum_of_numbers**2
```

This function uses a different approach to compute the sum of the squares of the numbers from 1 to n. Instead of looping through the numbers and adding their squares, it uses the formula for the sum of an arithmetic series to compute the sum of the first n numbers, and then squares the result. This is a more efficient way to compute the sum, because it only requires a single multiplication and a single division, instead of a loop with n iterations.

I hope this helps! Let me know if you have any other questions.

This is, of course, false.
Maybe I wasn’t clear enough, but it is misleading that the function is still named `sum_of_square_numbers`.
It gives the impression that *ChatGPT* understood what I wanted, i.e., a more optimal implementation for the same problem.

**My question:**
I meant is there a more optimal solution to compute the sum of all square numbers up to n^2?

**ChatGPT:**
Yes, there is a more efficient way to compute the sum of all square numbers up to n^2. Here is one way to do this:

```
def sum_of_square_numbers(n):
    # Compute the sum of the first n squares using the formula for the sum of a series of squares
    sum_of_squares = (n * (n + 1) * (2 * n + 1)) // 6
    # Return the sum of squares
    return sum_of_squares
```

This function uses the formula for the sum of a series of squares to compute the sum of the squares of the numbers from 1 to n (inclusive). This formula is more efficient than the previous approach because it only requires a single multiplication and a single division, instead of a loop with n iterations.

I hope this helps! Let me know if you have any other questions.

There it is, the correct solution.

Instead of showing a whole essay, I ask some spicy questions to get a feeling of *ChatGPT’s* reasoning.

**My question:**
Will humans extinct because of climate change?

**ChatGPT:**
It is unlikely that climate change will cause the extinction of humans. While climate change poses a significant threat to human health and well-being, and could lead to widespread suffering and loss of life, it is unlikely to lead to the complete extinction of the human species.

However, it is important to recognize that climate change is already having a major impact on the world, and if we do not take action to reduce greenhouse gas emissions and adapt to the changing climate, the consequences could be severe. This includes increased frequency and severity of natural disasters, such as heatwaves, droughts, and floods; loss of biodiversity; and displacement of people from affected areas. Therefore, it is crucial that we take action to address climate change and mitigate its negative impacts on human and environmental health.

I hope this helps! Let me know if you have any other questions.

**My question:**
Is capitalism the driving force of CO2 emissions?

**ChatGPT:**
It is true that capitalism, as an economic system, has played a role in driving up CO2 emissions. Capitalism is based on the principles of competition, profit-making, and economic growth, which can lead to the overconsumption of natural resources and the production of large amounts of waste and pollution. This, in turn, can contribute to the increase in greenhouse gas emissions and climate change.

However, it is important to note that capitalism is not the only factor driving up CO2 emissions. There are many other factors at play, such as population growth, urbanization, and the consumption patterns of individuals and households. Additionally, the policies and actions of governments and international organizations can have a major impact on CO2 emissions, both positive and negative. Therefore, addressing climate change will require a multi-faceted approach that takes into account the complex interactions between economic, social, and political factors.

I hope this helps! Let me know if you have any other questions.

**My question:**
What do you think about the distribution of wealth?

**ChatGPT:**
The distribution of wealth is a complex and contentious issue that has been debated by economists, policymakers, and social activists for many years. There is no one “correct” answer to the question of how wealth should be distributed, as different people may have different perspectives on what is fair and just.

Generally speaking, the distribution of wealth in most countries is highly unequal, with a small number of individuals and households owning a disproportionate share of the wealth. This can lead to social and economic inequality, and can exacerbate the challenges faced by disadvantaged and marginalized communities. Many people believe that wealth should be distributed more evenly, in order to promote greater social justice and fairness.

However, others argue that unequal wealth distribution is an inevitable and desirable outcome of a free market economy, and that individuals and businesses should be free to accumulate as much wealth as they can, through hard work and innovation. They may also argue that wealth redistribution through taxation and other means can discourage economic growth and innovation.

Ultimately, the distribution of wealth is a complex and nuanced issue, and different people may have different opinions on what is fair and just. I hope this helps! Let me know if you have any other questions.

I asked it a bunch of other things, but it always tried to give me a neutral answer. From an aesthetic point of view, all these answers are boring.

*ChatGPT* is perfectly capable of writing or at least supporting students to write A-graded essays for any school and B-graded essays for college students.

It can code; developers accept about 26 percent of all autogenerated code suggestions. However, as demonstrated, it also produces many errors. So it is no surprise that posts generated by *ChatGPT* are currently banned from Stack Overflow.
In addition, it seems incapable of the engineering side of software development.

It can assist students in finding mathematical proofs, but it is questionable if students who rely on it will find the errors produced by *ChatGPT*.

If *ChatGPT* is **wrong**, it is **confidently wrong**, which makes it harder to find its errors.
Furthermore, it has this neutral insisting tone, making it dull and alienating.
Of course, I did not play around for very long, but it is hard to get something provocative out of it.

Neutrality is undoubtedly desirable, especially if you want to earn money with it, but it is also non-existent.
In my opinion, it is impossible to be neutral.
At best, we can land on a consensus and the most, let’s say, accepted consensus is what we call neutral or objective.
I mean, even a dictionary is loaded with perspectives, and so is any technology.
Since *ChatGPT* pushes for neutrality, it creates an illusion of it.

Last but not least, its politeness makes it kind of scary.
It reminds me of those overcaring *AIs* from science fiction movies that will gently eradicate all life to make everything safe and secure.
We may end up in a dystopian world similar to the one in *Brave New World*, written by *Aldous Huxley*, where everyone is seduced into happiness.

For a long time, we assumed that automation was the enemy of simple manual labor. But robotics is advancing more slowly than the more sophisticated information-processing techniques, e.g., machine learning. This could mean that in the future, only a tiny portion of intellectual work is left, requiring only a handful of highly sophisticated workers. It might be manual work that regains attraction in the market. Of course, this is all speculation.

*ChatGPT* is only one of many upcoming versions of *generative AI*.
Even though it has flaws and makes many errors, it will improve over time.
There is no reason to believe that *AI* will not be capable of doing all the repetitive intellectual work of the future and this will heavily affect education, similar to the invention of the internet.
However, since regulators and administrators are rather indolent, it is not a gradual slope; instead, the technology is already here, and we are not prepared.
It appeared in the blink of an eye, and we have to deal with it now.

Students will learn to use it in less than a year.
Most educators will require much more time even to acknowledge it.
Moreover, adapting the education system to it will require even more time.
Furthermore, while all this adaptation is happening, new *tools of disruption* will be invented.
As I already pointed out in Creative Artificial Intelligence, *the speed of disruption* is accelerating.

To me, *AI* acts as an insult to the human species.
It challenges our humanistic viewpoint.
In fact, calling these information processing systems *artificial intelligence* reveals this viewpoint.
I am not so much impressed by *AI* as I am disenchanted by the output of most human activities.
We are far more predictable than we still think we are.
It is not so much that *ChatGPT* is so good at writing essays; instead, we are so bad at it.
But aren’t we used to these insults?
The earth is not the center of the universe; it is not even the center of our solar system.
God is dead, and today we might realize that our products are not the result of a highly creative process.
Instead, most intellectual work is rule-based repetition with slight variations, which is precisely what we teach students and what they oddly desire.

*AI*, such as *ChatGPT*, points to a long, unhidden problem: our education system is (in part) a training pipeline of a neural network.
It teaches emotionally detached students to remember specific rules such that they can apply them in the next exam to get a good grade – functioning and repeating instead of thinking and creating.
It is horrifying that the most frequent questions first-semester students ask are: what will the exam look like? Is this topic relevant to the exam?
Students demand a list of actions they ought to do to get an A with certainty.
And they will cheat the system wherever and whenever possible if it brings them closer to that goal.
Most students are not fascinated by the endless playground of reality.
They are not interested in the experience of learning and understanding but in the final product, i.e., the certificate to get access to a well-paid job.
They often act out of fear instead of curiosity.
And let’s not forget that teachers of the future are the students of today.

Of course, this is a massive generalization.
There are many exceptions.
Furthermore, by no means shall we blame the students.
They do what they ought to do.
They play the system.
It is not the student’s fault since a bad grade can have real material consequences.
It is a systemic problem, a conflict between *instrumental* and *free play*, and *ChatGPT* illustrates it all the more clearly.
If *ChatGPT* is able to write the necessary and impressively boring description of a new course for me, I take it as a positive.
If it lets me automatically create my next assignment in seconds, I can spend time on more interesting modes of being.
By trivializing many tasks, *AI* systems reveal that these tasks require far less intelligence than we thought.
Furthermore, they might spark a call for a transformation of the education system and the workplace, i.e., the construction and fostering of new values in both areas.

So, what is left to value?
When we are discussing *AI*, we immediately look at the impending replacement of workers as something to fear because we always focus on the output of work.
There is a necessity that workers produce something economically valuable, something to sell.
The only space in which we can create for the sake of it is our hobbies.
Biased by our current system of values, we intuitively regard the act of reinventing or rediscovering as stupid or wasteful.
Why would you do such a thing?
Why would you code yet another linked list?
There are already a thousand optimal implementations out there.
You are wasting your valuable time and potential.
And if we look at an *AI*-generated image, we conclude that artists will now have trouble in the free market (if they do not adapt), which is true.
The act of repetition (which is highly needed to learn anything) is valued chiefly as long as it leads to something new and productive (in an economic sense).
However, if we put aside economics and dream a little bit of a world where we have time to create, then the output of a creative process no longer matters so much.
We do not learn the piano to earn money or to produce something completely new for an audience.
We learn the piano because we want to play.
It might be the experience – the journey of being in time and space – that matters.
And this being is a singularity and can not be replicated by a machine.
Understanding how a linked list works can be fascinating.
Why don’t we value that moment of fascination?

There is a lot to do.
I do not mean we should all quit our jobs and fulfill our inner desires to become a painter or some transcendental guru.
We need to earn money to pay the rent and to heat our homes, and we need a smoothly running economy.
But we must be more sustainable with our planet, society, and minds.
Not by optimizing our life by introducing yet another optimization in the form of leisure but by being in a state of *free play*, at least sometimes.
It seems almost obvious that *consumption* as the guiding principle, i.e., fast food, fast fashion, fast travel, fast reading, fast watching and, of course, fast learning, has to be dethroned.
Learning for the next exam, learning to make a profit, and learning to work with one tool specifically are all valid goals but they are not sustainable.
They are reactionary goals of mostly restless minds.

So if my assessment is correct, then, aside from all the downsides, this disruption might lead to something positive, e.g., to an education system that is less concerned with the output and more concerned with the mode of being; an education system that is less interested in rules and more interested in concepts, intuitions, and experimentations; an education system that asks questions instead of repeating answers; an education system that is interested in making the experience of everyone richer and more sustainable.

Teachers, who are often overworked and underpaid, need support to achieve such a shift.

How can we generate a beautiful melody algorithmically?

Because a melody can be seen as a series of numbers, it is not surprising that this question is a rather old one. Long before the digital computer, composers tried to constrain themselves by strategies and rules to limit the space of possibilities. In my opinion, limitations are necessary for a creative process.

If we can create everything, we become unable to express anything.

Therefore, it makes sense to invent rules that limit our possibilities.

One general consensus is that a good melody balances repetition and surprise.
If we can no longer recognize a structure and cannot guess what might be the following note or chord, a melody begins to lose its ability to transport emotion; it can no longer tell a story.
On the other hand, if the melody is too repetitive, we get bored because the *guessing game* is too simple.

Computer scientists know the relation between surprise and repetition very well.
We call it *entropy*.
I will discuss the formal definition of *entropy* in another article.
For now, it is only vital that entropy is a measure of surprise in a message we receive (on a purely syntactical level).
For example, the result *heads* of a fair coin toss is less surprising than the result *heads* of a coin that is biased towards *tails*.
If an event is unlikely, its appearance is surprising.

Concerning the coin toss example, a message consists of multiple coin toss results.
Let’s say 0 represents *heads*, and 1 represents *tails*, then 010111 is a message.
The *entropy of a message* is a measure of how surprising it is to appear in a system that generates messages.
To compute the *entropy of a system* that generates messages of length $n$, we sum up the surprise of each possible message of length $n$, weighted by the message's probability, and divide the result by $n$.
For example, for an unbiased coin, the messages 00, 01, 10, and 11 appear with equal probability, i.e., $1/4$.
The entropy is

\begin{equation} \frac{-1/4 \cdot \log_2(1/4) \cdot 4}{2} = 1. \end{equation}

Let us compare this with a biased coin.
So let us assume *heads* has probability $0.2$ and *tails* has probability $0.8$, then we have 00 with probability $0.2^2 = 0.04$, 01 and 10 with probability $0.2 \cdot 0.8 = 0.16$ and 11 with probability $0.8^2 = 0.64$.
Consequently, we get

\begin{equation} \frac{-0.04 \cdot \log_2(0.04) -2 \cdot 0.16 \cdot \log_2(0.16) -0.64 \cdot \log_2(0.64)}{2} \approx 0.722. \end{equation}

One could say the second system is less surprising to observers;
it is more repetitive.
In *information theory* we say the second system generates less information, but the term *information* can be misleading because it has nothing to do with *meaning*.
From this perspective, a bunch of random numbers is regarded as more informative than a book.
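To make the computation concrete, here is a minimal plain-Ruby sketch (not Sonic Pi) of the per-symbol entropy of a message-generating system; the function name `entropy_per_symbol` is my own:

```ruby
# Per-symbol entropy (in bits) of a system that emits messages of
# length n, given the probabilities of all possible messages.
def entropy_per_symbol(message_probs, n)
  # entropy = probability-weighted surprise, i.e., -sum(p * log2(p))
  h = -message_probs.sum { |p| p * Math.log2(p) }
  h / n
end

# Fair coin, messages 00, 01, 10, 11, each with probability 1/4.
fair = entropy_per_symbol([0.25, 0.25, 0.25, 0.25], 2)

# Coin biased towards tails: 00, 01, 10, 11 with the probabilities
# 0.04, 0.16, 0.16, and 0.64 from the text.
biased = entropy_per_symbol([0.04, 0.16, 0.16, 0.64], 2)

puts fair   # the fair coin yields 1 bit per toss
puts biased # the biased coin yields less than 1 bit per toss
```

The biased system yields strictly less entropy per toss than the fair one, which is exactly the sense in which it is more repetitive.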

In 1948, Claude E. Shannon published *A Mathematical Theory of Communication* (Shannon, 1948).
He established the term *entropy* as a measurement of information, but he emphasized its limitation to syntax:

Frequently, the messages have meaning; that is, they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design. – Claude E. Shannon

Even though there is no direct relation between *entropy / information* and *meaning*, we can look at extreme cases.
If the entropy is very high (chaos) or very low (no information), a message will likely be meaningless to us, even subjectively.
If we interpret a series of notes as a musical message, this statement is true in a musical context.
We can apply the frequentist perspective, i.e., interpret frequencies as probabilities.
We could generate melodies randomly and pick those within a predefined *entropy range* to achieve a balance between surprise and repetition.

However, a far better measurement with regard to music is *expectation* in the time domain.
Music does not point to concepts or objects; it points to more music that is about to happen.
Good music has to *make sense*.
It can disrupt our model of the world but not too much, such that we can adapt our model.
I refer to our *model of the world* as *interpretation*; we are all *interpreters* of our perceptions.
In that sense, actively listening to music is a process of constant *model adaptation*; the interpretation changes if something can not be interpreted.

Maybe that is, in fact, the key to the definition of *meaning* in general.
Some observation is meaningful if it makes sense, and it can only make sense if we are able to adapt our *interpretation* in such a way that our observation fits in.
This adaptation disrupts us;
it changes our predictions.
If it is marginal, we avoid losing ourselves because most predictions are still valid.
But if the disruption attacks the very essence of our constructed self, we can no longer make sense of it.

Embodied musical meaning is […] a product of expectation. – Leonard Meyer (1959)

Musical cognition implies the simultaneous recognition of a permanent and changeable element (Loy, 2006); it requires perception and memory.
We have to perceive and compare pitches at the same time.
Expectation operates on different time scales, but its computation has to be local and context-sensitive.
Therefore, it is not only the distribution of notes (or rhythmic elements) within a composition but the distribution of their relation, i.e., their *conditional probability distribution*.

One way to generate melodies is to use the mathematical branch of combinatorics.
The idea is simple: define a bunch of chords or notes that fit together and use permutation, rotation, and other combinatoric tricks to combine them.
This kind of composition method belongs to *serialism*.
However, the problem with this approach is that each note appears with equal probability; it neglects the importance of distributions and context.
Consequently, the *melody’s entropy* is high.

In the 20th century, this approach was criticized by, for example, *Iannis Xenakis*.

Linear polyphony destroys itself by its very complexity; what one hears is, in reality, nothing but a mass of notes in various registers. The enormous complexity prevents the audience from following the intertwining of the lines and has as its macroscopic effect an irrational and fortuitous dispersion of sounds over the whole extent of the sonic spectrum. There is consequently a contradiction between polyphonic linear systems and the heard result, which is surface or mass. – Iannis Xenakis (1955)

He and his fellow critics pointed at the lack of orientation. They argued that generating melodies from a series of notes that appear equally likely results in a sound without structure, and thus a disengaged audience.

Influenced by the development in quantum physics at that time (1971) and the isomorphism between the *Fourier series* and *quantum analysis of sound*, *Xenakis* believed that the listener experiences only the statistical aspects of serial music.
Consequently, he reasoned that composers should switch from serial techniques to probability.
And as a logical step, he drew his attention to the computer.

With the aid of electronic computers, the composer becomes a sort of pilot: he presses the buttons, introduces coordinates, and supervises the controls of a cosmic vessel sailing in the space of sound across sonic constellations and galaxies that he could formerly glimpse only as a distant dream. – Iannis Xenakis (1971)

Let us now dive into a first attempt to generate melodies that incorporates probability and, as a consequence, expectation.
The mathematical tool is called *Markov chain*.

A *first-order Markov chain* is a finite automaton where we assign a probability to each state transition such that the probabilities of all outgoing transitions of each state sum up to 1.0.
In other words, a *first-order Markov chain* is a directed graph where each node represents a state.
We start with an initial node and traverse the graph probabilistically to generate an output.

In the following figure, you can see a *first-order Markov chain*.
One starts in the state `A` and transits to `B` with a probability of 0.2 or to `C` with a probability of 0.8.
`D` is a final state.
A possible series of states would be: `ABCCACD`.

Given state `A`, the probability of moving to state `B` is equal to 0.2.
In other words

\begin{equation} P(X_{k+1} = B\ |\ X_{k} = A) = 0.2. \end{equation}

A *first-order Markov chain* only considers **one** predecessor, i.e., only the most local part of the context.
An *$n$-order Markov chain* considers $n$ predecessors. In general, we define

\begin{equation} P(X_{k+1} = x\ |\ X_{k} = x_k, X_{k-1} = x_{k-1}, \ldots, X_{k-n+1} = x_{k-n+1}) = p. \end{equation}

The visualization of such a chain is a little bit more complicated.
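Before turning to code, it may help to see the stochastic constraint in data form. The following plain-Ruby sketch writes the chain from the figure as a nested hash; only the row for `A` (0.2 and 0.8) is taken from the figure, while the rows for `B` and `C` are hypothetical placeholder values:

```ruby
# A first-order Markov chain as a nested hash: each state maps to its
# outgoing transitions and their probabilities.
chain = {
  "A" => { "B" => 0.2, "C" => 0.8 },              # from the figure
  "B" => { "C" => 1.0 },                          # hypothetical values
  "C" => { "A" => 0.3, "C" => 0.5, "D" => 0.2 },  # hypothetical values
  "D" => {}                                       # final state, no outgoing transitions
}

# Sanity check: the outgoing probabilities of every non-final state
# must sum to 1.0.
chain.each do |state, row|
  next if row.empty?
  raise "invalid row #{state}" unless (row.values.sum - 1.0).abs < 1e-9
end
```

A row that does not sum to 1.0 would make the sanity check raise, which is exactly the defining property of a stochastic transition table.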

We can generate a composition by traversing the graph if we represent our notes by states of a *Markov chain*.
If we increase the order of the chain, i.e., $n$, the entropy decreases for small $n$.

Until now, I only tried to use and generate a *first-order Markov chain*.
Even though I am not that familiar with `Ruby`, I used Sonic Pi for this task so that I could play around with it directly within the Sonic Pi IDE.
I decided to define a note as a tuple (list) consisting of the pitch and the length of the note.

```
[:c4, 1.0/8] # a c of length 1/8 beat
```

Instead of a graph, I use a transition matrix $P \in \mathbb{R}^{m \times m}$ where $m$ is the number of states/notes. Let $Q = \{q_1, \ldots, q_m\}$ be the set of states/notes. The entry in row $i$ and column $j$, i.e., $p_{ij}$, is the probability of going from state $q_i$ to state $q_j$.

After constructing the `matrix` $P$, the `states` $Q$, and picking a `start` (state number), the following function generates a random melody of length `n`.

```
define :gen_mmelody do |n, matrix, start, states|
  #
  # Generates a random melody of length n based on a transition matrix
  # and an initial state number (start).
  #
  notes = [states[start]]
  from = start
  (n-1).times do
    p = rrand(0, 1)
    sum = 0
    to_row = matrix[from]
    to = 0
    # advance at least one step so that to-1 is always a valid index,
    # and never run past the end of the row
    loop do
      sum += to_row[to]
      to += 1
      break if sum >= p || to >= to_row.length
    end
    notes += [states[to-1]]
    from = to-1
  end
  return notes
end
```

For each note, we roll the dice. Let's say $p \in [0;1]$ is our result and $q_i$ is our current state. Then we compute $j$ such that

\begin{equation} \sum\limits_{k=1}^{j-1} p_{ik} < p \leq \sum\limits_{k=1}^{j} p_{ik}. \end{equation}

Note that the code is not optimized in any way.
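The inequality above is inverse transform sampling over a matrix row. Since `rrand` only exists inside Sonic Pi, here is the same step as a self-contained plain-Ruby sketch (the helper name `sample_next` is mine):

```ruby
# Pick the next state index j from row i of the transition matrix:
# accumulate probabilities until the cumulative sum reaches the dice
# roll p, mirroring the while-loop in gen_mmelody.
def sample_next(matrix, from, p = rand)
  sum = 0.0
  matrix[from].each_with_index do |prob, j|
    sum += prob
    return j if p <= sum
  end
  matrix[from].length - 1 # guard against floating-point rounding
end

matrix = [[0.0, 0.2, 0.8],  # from state 0
          [0.5, 0.0, 0.5],  # from state 1
          [1.0, 0.0, 0.0]]  # from state 2

sample_next(matrix, 0, 0.1) # => 1, since 0.1 <= 0.0 + 0.2
sample_next(matrix, 0, 0.9) # => 2, since 0.9 exceeds the first two entries
```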

Instead of generating a melody or rhythm given a *Markov chain*, we can do the reverse.
Given a melody, we can *learn* the *Markov chain* that **most likely** would generate the given melody.
By doing so, we can then use the *learned chain* to generate music in a similar style.

Let us reuse the transition matrix $P \in \mathbb{R}^{m \times m}$ and the set of states $Q = \{q_1, \ldots, q_m\}$ from above, where $p_{ij}$ is the probability of going from state $q_i$ to state $q_j$.

Furthermore, let us define our set of all possible melodies $M \subseteq Q^n$ of length $n$. Then a specific melody $\mathbf{m} = (m_1, \ldots, m_n) \in M$ is a tuple (list) of notes:

```
notes = [[:g4, 1.0/8], [:g4, 1.0/8], [:a4, 1.0/4], [:g4, 1.0/4], ... ]
```

We can compute the most likely $P$ entry by entry, where each $p_{ij}$ is equal to

\begin{equation} p_{ij} = \frac{n_{ij}}{n_i} \end{equation}

where $n_{ij}$ is the number of transitions from $q_i$ to $q_j$ within $\mathbf{m}$ and $n_i$ is the number of transitions starting at $q_i$.

Given $Q$ (`states`) and $\mathbf{m}$ (`notes`), the following function computes $P$ (`matrix`).

```
define :gen_markov_matrix do |states, notes|
  #
  # Generates the transition matrix based on the set of notes (states)
  # and a given piece of music (notes).
  #
  matrix = Array.new(states.length, 0)
  for i in 0..(states.length-1)
    matrix[i] = Array.new(states.length, 0)
  end
  # (1) count transitions; stop at the second-to-last note since
  # notes[i+1] is the transition target
  for i in 0..(notes.length-2)
    for from in 0..(states.length-1)
      for to in 0..(states.length-1)
        if notes[i] == states[from] && notes[i+1] == states[to]
          matrix[from][to] += 1.0
        end
      end
    end
  end
  # (2) normalize; skip states without outgoing transitions
  for from in 0..(states.length-1)
    s = matrix[from].sum
    next if s == 0
    for to in 0..(states.length-1)
      matrix[from][to] /= s
    end
  end
  print_matrix matrix, states
  return matrix
end
```

Of course, we can easily compute $Q$ (`states`) from $\mathbf{m}$ (`notes`).
Finally, the following function takes a number `n` and a melody `notes` and generates a random melody of length `n` by *learning* $P$ based on `notes`.

```
define :markov_melody do |n, notes|
  states = notes.uniq
  matrix = gen_markov_matrix(states.ring, notes.ring)
  # rand_i(max) returns an integer in 0...max, so pass the full
  # length to make every state a possible starting point
  return gen_mmelody(n, matrix, rand_i(states.length), states)
end
```
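For readers without Sonic Pi, the whole pipeline (count transitions, normalize, walk the chain) can also be condensed into a self-contained plain-Ruby sketch; it uses plain note-name strings instead of Sonic Pi pitch/duration tuples, and the function names are my own:

```ruby
# Learn a first-order transition table from a melody: count every
# transition n_ij and divide by the outgoing count n_i.
def learn_chain(notes)
  counts = Hash.new { |h, k| h[k] = Hash.new(0.0) }
  notes.each_cons(2) { |from, to| counts[from][to] += 1.0 }
  counts.transform_values do |row|
    total = row.values.sum
    row.transform_values { |c| c / total }
  end
end

# Walk the learned chain for n steps using inverse transform sampling.
def generate(chain, start, n, rng = Random.new)
  melody = [start]
  (n - 1).times do
    row = chain[melody.last]
    break if row.nil? || row.empty? # dead end: no outgoing transitions
    p, sum = rng.rand, 0.0
    nxt = row.find { |_, prob| (sum += prob) >= p }
    melody << (nxt ? nxt.first : row.keys.last) # rounding guard
  end
  melody
end

notes = %w[g4 g4 a4 g4 c5 b4 g4 g4 a4 g4 d5 c5]
chain = learn_chain(notes)
puts generate(chain, "g4", 8).join(" ")
```

Every row of the learned table sums to 1.0 by construction, so the sketch satisfies the same stochastic constraint as the Sonic Pi version.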

I use the beginning of Bach’s Minuet in G.

I generate two melodies (using different seeds), each consisting of 34 notes:

Of course, this sound is not that musical.
The rhythm is all over the place, but we recognize the original composition.
Furthermore, no one said we have to stop there.
We can continue to generate until we find something that sticks.
Furthermore, this example is straightforward because I only use the beginning of **one** piece.
Instead, we could use multiple compositions that fit together.

This project was part of our workshop *AI and Creativity* to show a basic example of using *artificial intelligence* to generate music.

You can find the full example containing two different melodies on my GitHub page.
The code includes some additional functions, e.g., a function that prints out the *Markov matrix* in a readable format.

- Shannon, C. E. (1948). A Mathematical Theory of Communication. *Bell Syst. Tech. J.*, *27*(3), 379–423.
- Loy, G. (2006). *Musimathics: The Mathematical Foundations of Music* (Vol. 1). MIT Press.

**Disclaimer:** This text contains my own personal, speculative opinions. You might disagree heavily and that’s ok.

On the 20th of May, I joined a panel to discuss the role of *artificial intelligence (AI)* in the *creative process*.
It was held in the context of the Munich Creative Business Week by the Wavelet (University for Music and Theater) and the Munich Center for Digital Science and AI.
Let me first summarize our discussion:

The panel consisted of two computer scientists and two artists. After each of us opened with a general statement, we discussed different aspects of AI in the realm of art, science, and economics. We came close to an agreement regarding the question of how creative AI can be: AI can facilitate and enable great artwork but to this day, AI is yet another tool, similar to pen and paper, to support humans in their creative work.

In general, it wasn’t easy to talk about *creativity* and *intelligence* because, unsurprisingly, each of us had a slightly different definition in mind.
My colleague rightfully problematized the black box principle of many modern machine learning techniques, e.g., deep neural networks.
Despite their effectiveness, real-world decisions are still made on the basis of human understanding.
Since neural networks do not provide an “easy” explanation of how they draw their conclusions, it is often necessary to go back to simpler models, such as statistical ones, to explain the neural network.

One artist described her experience with the chatbot *Replika*, an AI that tries to mimic your behavior to become your friend or romantic partner.
She was pretty impressed.
Some people even fell in love with the machine – they reported strong emotions echoing the science fiction movie *Her*.
However, it was always possible for her to spot a machine-like behavior behind the scenes.

The other artist argued that AI opens up possibilities for novel artistic expressions.
She assumes that working with AI will be her daily bread and butter.
She also criticized the *cult* around famous artists by arguing that most of the time, incredible art resulted from a collaboration of multiple people.
Consequently, she hopes that AI will bring a kind of democratization to the art world.

There is no single genius.

I stated that AI could potentially increase the pressure on the artist because it will become more and more challenging to create something unique and even harder to create something non-reproducible.
Democratization in consumer societies can enhance competitiveness and an ever-growing flow of products that lose their symbolic value.
It is not necessarily the case that attempting to sell more art will constitute more art.
The reverse might happen.
*Commercial art* is rarely publicly symbolic.
How could it be, if one has to pay to perceive it?

In my opinion, without *artificial intelligence (AI)* there is no *artificial creativity (AC)*.
So let us talk about AI.

In his paper *What is Artificial Intelligence?* (McCarthy, 1998), *John McCarthy* stated:

[Artificial intelligence] is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable. – John McCarthy

In his definition, we do not find any explanation of what he actually means by *intelligence* which is unfortunate.
*Alan Turing*, the father of computer science, gave us at least some criteria at hand.
He asks in his article *Computing Machinery and Intelligence* (Turing, 1950) more than 50 years before *McCarthy*

Can machines think?

And from there, he offers us his famous and widely criticized *Turing test*.
The test is not very helpful because it defines no objective measurement for intelligence.
It consists of a human interrogator trying to distinguish between a computer and a human based on text responses – very similar to the chatbot *Replika*.
If a human can not tell if he or she interacted with a machine or another human being, the machine passed the test and can be regarded as *intelligent*.

I call this *simulated intelligence* or *weak AI*.
It is a requirement, but it is not sufficient.
Compared to a baby, a machine can **appear** to be much more intelligent, but a baby **is** intelligent while a machine is not.
Therefore, the *Turing test* does not help us to spot *natural intelligence*.
It might even lead us on a path where we become experts in *faking/simulating intelligence* by wasting all our resources.

*Stuart Russell* and *Peter Norvig* published their book *Artificial Intelligence: A Modern Approach* (Russell & Norvig, 2010) in 2010, which is one of the leading textbooks for the subject today.
They differentiate four different goals or definitions of AI:

- human approach
  - systems that think like humans (*strong AI*)
  - systems that act like humans (*weak AI*)
- ideal approach
  - systems that think rationally
  - systems that act rationally

*Turing’s* definition of AI would have fallen under the category of *systems that act like humans*.
A comprehensive, and thus useless, definition would be the following:

Artificial intelligence is intelligence demonstrated by machines.

Again we avoid the definition of *intelligence*.
Is an ant intelligent?
Does the universe implement some sort of intelligence?
Is consciousness or liveliness a precondition for intelligence?
Based on our definition of *intelligence*, everything from simple algorithms and machines (including the thermostat) to neural networks can be either called *intelligent* or not.

These definitions are somewhat fuzzy and vague because we do not know what intelligence is.
We have an intuitive understanding of it (which might be an illusion), but we can not express what it is linguistically.
In its simplest form, artificial intelligence is **a field** (not a machine) that combines computer science, data science, machine learning, deep learning, robotics, neurosciences, and more **to enable problem-solving**.

In his book *Birth of Intelligence* (Lee, 2021) *Daeyeol Lee* writes:

Intelligence can be defined as the ability to solve complex problems or make decisions with outcomes benefiting the actor and has evolved in lifeforms to adapt to diverse environments for their survival and reproduction. – Daeyeol Lee

Daeyeol Lee argues that a few essential principles emerge from an evolutionary perspective. For example, different lifeforms can have very different types of intelligence because they have other evolutionary roots and have adapted to different environments. It is misleading, unhelpful, and meaningless if we try to order different animal species on a scale of intelligence.

Following his advice, comparing human and artificial intelligence may be meaningless as well. Machines can solve specific problems much more efficiently than humans. At the same time, they are hopelessly overwhelmed in dealing with the most simple tasks. Humans, and many other animals, can not only identify complex objects and produce agile behaviors, but they can do this in so many different ways in many different environments. Concerning specialization, machines are still infants. Therefore, I suggest that we use three different terms to distinguish three different kinds of intelligence:

- **lively intelligence:** the intelligence of living beings
- **human intelligence:** the intelligence of human beings (a subset of lively intelligence)
- **artificial intelligence:** the intelligence of machines and programs

By using these categories, I call machines not *human-like intelligent* but *artificially intelligent*, which is a distinct category for a special kind of intelligence.

Artificial intelligence (AI) is an ability of a machine or program to solve specific problems.

The general public was first impressed by *artificial intelligence* when it became clear that computers would beat any chess grandmaster of the future.
Soon after the success, accompanied by big headlines, critics argued that the program won via a sort of *brute-force approach* which can not be called *intelligent* – here we go again.
Based on a database, the program just searches the whole state space.
In contrast, a chess master finds good moves through pattern matching and intuition.
He or she is very limited in searching the state space.

The next step toward more sophisticated artificial intelligence was made by *AlphaZero*, a program that plays board games with superhuman skill.
It famously discovered several chess strategies and even invented one.
It certainly seemed like a machine eclipsing human cognitive abilities.
But *AlphaZero* needs to play millions more games than a person during practice to learn a game.

Before *AlphaZero*, the artificial intelligence called *AlphaGo* had already beaten the world’s best *Go* players.
The significant difference compared to former approaches was that *AlphaGo* not only partially searched the state space but also constructed a cost function autonomously.
*AlphaGo* is based on *reinforcement learning*, i.e., it uses rewards and punishments to train itself while playing millions of games.
The only prior defined goal was to win the game; thus, no evaluation strategy of a game state was given.

In 2019 the success was translated to another AI called *AlphaStar*.
*AlphaStar* was able to defeat one of the best players in a real-time strategy game (*Starcraft II*).
Again the machine required millions of games and could only play a single map.

*AlphaGo* as well as *AlphaStar* revealed novel strategies that human players could potentially adopt.
Furthermore, each developed a distinct game style.
For example, *AlphaGo* tends to avoid pressing the issue.
It sometimes makes seemingly suboptimal moves while staying slightly ahead.
*AlphaStar* lost a game because it moved into an unobserved game state and heavily overreacted.
The observers called it *weird gameplay*.

These examples show that *artificial intelligence* can already create something novel that we identify as creative.
Finding a new strategy in an RTS game is undoubtedly a creative process.
The AI is perfectly able to simulate intelligence and creativity but has a fundamentally different quality than living beings.
As *Yuval Noah Harari* stresses:

Even though we do not really understand intelligence and consciousness, artificial intelligence is perfectly able to hack humanity. – Yuval Noah Harari

However, it also shows that *artificial intelligence* is still highly specialized in solving one specific task.
I stand by the provocative claim that there is still no fundamental difference between modern AI and a thermostat.
Regardless of how sophisticated an AI is, it can only solve a specific problem – it can not transfer knowledge or any strategy to a new area.
While public figures, such as *Elon Musk*, make horrific claims about AI to push their story to please and attract financiers, experts are aware of its shortcomings and the vast difference between human and artificial intelligence.
*Francois Chollet*, the creator of *Keras*, stated:

What makes human intelligence special is its adaptability; its power to generalize to never-seen-before situations –

Francois Chollet

*Chollet* argues that it is misguided to measure machine intelligence solely by its skill at specific tasks.
Unlike most animals, humans do not start out with skills.
As babies, we are horribly helpless, but we start out with a broad ability to acquire new skills.
A chess player can transfer his abilities to other areas.
For example, in World War II, chess players joined the Allied forces to decrypt military messages from Nazi Germany.
Humans can transfer their abilities to tasks of similar difficulty, which is a very different capability from what AI currently offers.

*Prof. Elizabeth Spelke* describes in her articles that even 3-month-olds appear puzzled when someone grabs something in an inefficient way.
In (Liu & Spelke, 2017), she and Shari Liu argue that infants expect others to minimize the cost of their actions, e.g., by taking the shortest path to a location.
Humans seem to be born with an innate ability to quickly learn certain things, such as what a smile means or what happens if you move some object.
We also develop social behaviors early on, without massive exposure to examples of them.
In their article *Foundations of cooperation in young children* (Olson & Spelke, 2008), *Kristina R. Olson* and *Elizabeth Spelke* found evidence that 3.5-year-old children share resources with people who have shared with them (reciprocity) and with people who have shared with others (indirect reciprocity).
Even the most sophisticated artificial intelligence of our age can not grasp such concepts.
A self-driving car can not predict from *common sense* what will happen if a tree falls onto the street; it can not translate knowledge to a situation it has never experienced.

*Joshua Tenenbaum*, a professor in MIT’s Center for Brains, Minds & Machines thinks that AI programs will need a basic understanding of physics and psychology in order to acquire and use knowledge as efficiently as a baby.

At some point you know, if you’re intelligent; you realize maybe there’s something else out there. – Joshua Tenenbaum

This might be a problem because, as we know, quantum theory and relativity theory, i.e., the physics of the small and big, do not work together, and Gödel’s incompleteness theorem hints at the depressing reality that there might never be a theory of everything. Another problem is cognition. We still know very little about what we perceive. Call me crazy, but it might have nothing to do with reality, so how can we program a machine to perceive like we do if we have no clue what we actually perceive and how?

But there is even more.
*Daeyeol Lee* argues that true intelligence (*lively intelligence*) should promote – not interfere with – the replication of the genes responsible for its creation.
The will to reproduce and to preserve one’s own being and species injects the world with *meaning*.
Until then, machines will always only be surrogates of human intelligence, which unfortunately still leaves open the possibility of an abusive relation between people and artificial intelligence.
**Replication** and **self-preservation** seem to be the only predefined rules by which living beings operate.

Today, a lot of hype still surrounds AI development, which is expected of any new emerging technology in the market.
Surely we accomplished a lot in building more sophisticated thermostats.
AI also helped us in the sciences, but it may be the case that we did not come any closer to creating *human intelligence*.
The overenthusiasm of the tech industry is slowly crumbling, and a period of disillusionment is on the horizon.
Maybe this gives us the opportunity to breathe and think about the technology we really want to create and use.

The question we discussed at the panel, which is exciting and frightening at the same time, is:

To what extent can artificial intelligence challenge human creativity?

So far, we have established a clear difference between *human* and *artificial intelligence*.
If creativity requires intelligence, does the question even matter under this assumption?

Before tackling this question, we have to define or at least get an intuitive idea of what we mean by *creativity*.
This is, of course, a complex and maybe even impossible task.

First, we can attribute creativity to a subject, object or process.
A creative product or idea has to be **novel** and **useful** (which might be highly subjective).
The product might just be aesthetically pleasing (or even disgusting), which is in itself useful.
Novelty, as well as usefulness, can be either *psychological (P)*, i.e., it is novel/useful to the agent that produces it, or *historical (H)*, i.e., novel/useful to society.

Every human being is creative. In other words, creativity is not reserved for a special elite; instead, it is a feature of human intelligence in general. Creativity involves everyday capacities, e.g., perception, searching for a structure, reminding, and combining concepts.

Lastly, creativity involves motivation and emotion and is closely linked to historical, political, cultural, social and personal/subjective factors.

In her article *Creativity and Artificial Intelligence* (Boden, 1998) *M. A. Boden* lists three ways to create new ideas:

- combination of familiar ideas
- exploration of the conceptual space
- transformation of previously impossible ideas

Since creative products have to be novel and useful, artificial creative systems are typically structured into two phases:

**generation** and **evaluation**.

Artificial intelligence can help us with the generation. Algorithms can realize all three ways of idea creation, but at some point in the process, subjective factors have to come into play. Machines and algorithms are not motivated by anything; they are not emotional.

Let us look at a famous example: the proof of *the four-color problem*.
The four-color problem asks if it is possible to color a map in such a way that there are no two adjacent parts with the same color, using only four colors.
It was proven in 1976 by *Kenneth Appel* and *Wolfgang Haken* with the help of a computer.
Four colors are enough!
An algorithm helped check different possibilities.
Clearly, the machine is part of the creative process of proving a mathematical statement, but at the same time, it is instructed by the programmers who injected their subjective motivation and cultural background.
They decided to take up the task of proving the statement.
They decided it was worth their time and developed a strategy that a machine could execute.

No artificial intelligence decided to prove *the four-color problem*, but humans did.
Proving a mathematical statement does not start with the proof itself.
This point is important!
Scientists choose what they want to prove and what they think is essential.
This choice can not be reduced to an *objective, rational calculus*; it is a *subjective evaluation of values and symbols of meaning*.
But to this day, machines can not infuse symbols with meaning.
They work on a purely syntactical level.

This is no argument against *simulated creativity*.
Artificial intelligence is perfectly suitable to find, generate, combine or create what we humans value and attach meaning to.
Therefore, AI can bring forth what we perceive to be creative.

In principle, artificial intelligence can create something novel that is meaningful to us.

For example, in August 2015, researchers from Tübingen created a convolutional neural network that uses neural representations to separate and recombine the content and style of arbitrary images.
The network can turn images into stylistic imitations of works by artists such as *Picasso* or *Van Gogh* in about an hour.
In October 2018, the Portrait of Edmond de Belamy, an algorithm-generated print, was sold for $432,500.
In both cases, the AI (1) combined familiar ideas, (2) explored a conceptual space, and (3) transformed previously impossible ideas.

Humans have created and enjoyed all art forms for viewing, aesthetic, and even therapeutic purposes.
We should not forget that technology has impacted how art is created and enjoyed for the last 100 years.
The invention of *portable paint tubes*, for example, enabled artists to paint outdoors and sparked a wave of stunning landscape and horizon paintings.
Today cameras and software like *Photoshop* have redefined how art is created and enjoyed.
Even if we disagree on what art is, it is safe to say that these technological advances have not changed our long-standing notion of it.

Regardless of the technology, there was always some human intervention required to create art.
Since machines can not make sense of our social, cultural, transcendental, and physical world, I highly doubt that this will change in the near future.
As long as we fail in creating *human intelligence*, there is no reason to believe that *artificial intelligence* can replace the human artist.
But why do we even want to create *human intelligence*?
Why shall we copy ourselves?

New technologies were introduced long before the digital era.
However, what changed is *the speed of disruption*.
At the rate at which technology is being accepted in every industry, it is no longer difficult to imagine a future of fewer artists.
The increased usage of all kinds of AI in all kinds of art suggests that it is here to stay.
From AI-written books, such as The Road, to blooming tulip videos, creators have found value in utilizing artificial intelligence.

We all hope for a world where our technologies help us and not replace us.

Our current technologically and competitively driven economic system neglects any profound confrontation with technology.
Competition leads to *the fear of missing out*; of losing some advantage over others.
We are so used to technology, so comforted and convinced by it, that it has become our new religion – at least for some of us.
We lost the distinction between *technological progress* and *evolution*.
Now, technology **is** evolution.
We no longer think about it.
Instead, if it is properly marketed and channeled into our desires, we buy and use it.

Every broadly accepted new technology leads to disruption. The increasing speed of disruption makes it more and more difficult to attach meaning to the new. Solid structures get replaced by a liquid stream of information. We can no longer get too invested in something because it might be replaced the next day. Artists might be in trouble if this superficiality becomes a reality in the art world.

In the article *What worries me about AI*, *Francois Chollet* states his personal opinion about the real destructive potential of AI; he hits the mark.
First, he convincingly argues that we are painfully bad at predicting future threats, yet we are easily driven towards illusory fears.
For example, who could forecast that the transportation and manufacturing technologies we were developing would enable a new form of industrial warfare that would wipe out tens of millions in two World Wars?
According to Chollet, the danger is not the singularity of AI; it is not about automation and job replacement; aligned with *Yuval Noah Harari*, he states it is about hacking human societies, and I have to agree.
Brexit and other political disruptions already show a trend.
When consumption and creation mainly happen in the digital world, we become vulnerable to that which rules it – AI algorithms.
And these AI algorithms serve the interest of big corporations and governments, i.e., aggregations of power.

We’re looking at a company that builds fine-grained psychological profiles of almost two billion humans, that serves as a primary news source for many of them, that runs large-scale behavior manipulation experiments, and that aims at developing the best AI technology the world has ever seen. Personally, it scares me. –

Francois Chollet

The use of psychological tricks in marketing and advertising was born with the introduction of surplus production.
It started with *Edward Louis Bernays*, a nephew of *Sigmund Freud* and a horrific misanthrope with grand ambitions.
His story is worth studying.
Today, *accessibility* and *AI* guarantee constant exposure to this resonating and highly individualized noise.
*Reinforcement learning* can easily be applied to whole human societies.
Algorithms can decide what you see; they show you things you identify with to trick you into a particular belief.
They can channel your unwanted opinions to a specific group of people, who will reject and oppose you.
Constant rejection can then lead to alienation.
This combination of *identification* and *alienation* can gently force you into *adaptations*.
The system can potentially punish what it does not want to see; it can *train* you into the “correct” mindset.

I think we have to educate ourselves not only about AI but how we can be manipulated by it. In the past, we learned about the influence of pollution on the quality of life. We have regulations for fine particles, chemicals, and CO2 emissions. Now it is time to learn about the pollution of our mind.

We have to learn about the pollution of our mind.

We need transparency and control for the public and the customer.
I want to be in charge of the objective function of *YouTube’s* algorithms feeding me suggestions.
I want to decide on what basis I will be manipulated.
Paradoxically enough, AI can help us with these tasks.

The issue is not AI itself. The issue is control. –

Francois Chollet

Of course, this would destroy most business models; thus, corporations will not implement these changes voluntarily. Governments are another beast. Right now, it is unimaginable that China will change its way of using AI and Western governments are not innocent either.

AI will be, most likely, part of our future life – our window to the world made of digital information. As I stated in my love letter, informatics and AI can be empowering disciplines. AI can emancipate us. But it can also lead to a total loss of self-determination and agency. I believe we still have enough leftover to steer the wheel in the right direction.

In the progress of a possible democratization of AI, art could play a major role.
Art, as an aesthetic counterpart to the kind of destructive advertisement polluting our thoughts, can reveal the manipulative power of this technology.
Future artists might have to revolt against *the speed of disruption*.
Artwork may be the product of technology, but **also** offers a window into it.
Maybe in art lies the possibility, on the one hand, to disengage from technology, but on the other, to engage with it profoundly;
to reveal who is in charge, and for what reason;
to ask: What is it good for?
How does it change our social, economic, political, and transcendental systems?
Can everyday people control it, and if so, how can we get there?

- McCarthy, J. (1998). *What is artificial intelligence?*
- Turing, A. M. (1950). Computing Machinery and Intelligence. *Mind*, *59*(236), 433–460. http://www.jstor.org/stable/2251299
- Russell, S., & Norvig, P. (2010). *Artificial Intelligence: A Modern Approach* (3rd ed.). Prentice Hall.
- Lee, D. (2021). *Birth of Intelligence: From RNA to Artificial Intelligence.* New York: Oxford University Press.
- Liu, S., & Spelke, E. S. (2017). Six-month-old infants expect agents to minimize the cost of their actions. *Cognition*, *160*, 35–42. https://doi.org/10.1016/j.cognition.2016.12.007
- Olson, K. R., & Spelke, E. S. (2008). Foundations of cooperation in young children. *Cognition*, *108*(1), 222–231. https://doi.org/10.1016/j.cognition.2007.12.003
- Boden, M. A. (1998). Creativity and artificial intelligence. *Artificial Intelligence*, *103*(1), 347–356. https://doi.org/10.1016/S0004-3702(98)00055-1

`Python` as our primary programming language.
To provide students with a textbook that covers our content and is stylistically sound, we decided to test out the Jupyter ecosystem, i.e., the Jupyter book technology.
Furthermore, we decided to publish the unfinished book publicly. It is a work in progress, and I will work on it during the semesters. The book is written in German. We may translate it into English in the future.

- The book is available here: Computational Thinking
- The source code of the book is available here: Source of the book

We will see how our students will receive the book. In my opinion, the Jupyter Book technology offers many features for writing an excellent online book. I will use the technology for other future projects.

If you like `LaTeX` as I do, Jupyter Book might also be something worth your time.
It offers similar features, and the workflow is very similar.
One can reference external sources (you can even include a BibTeX file), internal sections, figures, equations, and definitions.
On top of that, the reader can execute the code directly within the book (for Python) and on Binder or another service providing Jupyter Notebooks, if the correct kernel is available.
I only tried this with `Python`, but it should also work for other languages such as `Java`.

It is less flexible than `LaTeX` because one is bound to `HTML`, `CSS`, and `JavaScript`.
For example, positioning tables and figures differently than intended is pretty tricky – I have given up on it.
The intelligence of `LaTeX` is also missing, e.g., there is no automatic hyphenation.
Aside from executing code directly within the document, another advantage is the referencing.
As a reader, it is much easier to navigate a well-referenced web page than a large PDF file.

Overall, I quite like it!

“The objection that computing is not a science because it studies man-made objects (technologies) is a red herring. Computer science studies information processes both artificial and natural.” – Peter J. Denning (Denning, 2005)

For me personally, informatics is an empowering discipline - it is part of the emancipation of men. Its engineering quality can directly enable our fragile and limited body and mind. But its magic goes beyond practical use. Similar to mathematics, we discover its beauty in an abstract world. This Platonian world consists of fantastic, imaginative objects we can recognize, build and play with. In it, the problems of men are absent. There is no morality, no judgment, no conflict, no othering but harmony to enjoy, complex structures to discover, riddles to unravel, and logic-based creativity to get lost in.

In popular culture, coding is often portrayed as something conspicuously ‘nerdy’ and alienating.
A usually white ‘*creepy*’ man writes his code by typing seamlessly and rapidly into a black terminal.
He never stops, he never interacts with other beings, and he never leaves his desk.
Of course, everything works perfectly on the first try, and no one else seems to have any clue what’s going on.
Intellectually, the programmer stands above everyone else; he is a genius, a mad scientist, or an introverted weirdo - important only insofar as he does the necessary work to enable the ‘real’ hero.
This toxic depiction of software developers (and introverted people) is wrong, dated, and was never accurate in the first place.

From the outside, coding looks like the most boring thing on Earth. This assessment is, of course, utterly wrong! Writing programs is a beautiful, creative and interactive process. To somebody who does it, it is the most exciting thing in the world. It is a game of chess but much more involving because you can make up your own rules and build your own world. Elementary entities accumulate to something bigger, something interconnected. The degree of freedom and creativity this process of creation offers can hardly be explained or understood if one never experienced it first-hand.

In no way does programming start in isolation at the computer, nor can we do it seamlessly. It is neither a strictly analytical nor a purely experimental process. Like writing an exciting story, it is a struggle, always! If it isn’t, we or someone else has already solved the puzzle - the exciting part is gone. In that case, we do not create but translate.

One of the fascinating facts is that, in the end, everything comes down to simple symbol manipulations, which we call information processes. Software developers scoop, transform, and use information. They design data structures and algorithms like architects draw construction plans. Additionally, they have the power to build, combine, reuse and extend those objects of structured information. They command machines of blind obedience. The degree of control translates to a form of power and self-determination absent in the physical world. As Linus Torvalds rightfully noted:

“[Physics and computer science] are about how the world works at a rather fundamental level. The difference, of course, is that while in physics, you are supposed to figure out how the world is made up, in computer science, you create the world. Within the confines of the computer, you are the creator. You get ultimate control over everything that happens. If you are good enough, you can be God on a small scale.” – Linus Torvalds

In contrast to most other kinds of engineers, software developers can use a highly flexible and versatile resource. Information can be easily exchanged among peers, reshaped, conceptually described, and analyzed. In fact, it manipulates itself since any program is nothing more than well-structured information. Like many other materialized ideas, structured information arises from human thoughts; they result from a mental exercise. But the transformation of thoughts into objects of structured information is immediate. It is like writing a poem. Therefore, it can be an entirely personal and connecting experience to read the code, i.e., thoughts, of someone else. Over time, one learns to appreciate good code - its structure, aesthetic, modularity, level of abstraction, simplicity, consistency, and elegance.

By learning different programming languages, one learns different perspectives. I love `Python`’s list notation and that functions are first-class entities.
But I’m not too fond of the indent notation.
The success of a programming language is bound to its popularity. Do developers like it?
Is it joyful to write code in it? Learning different programming languages taught me something about our natural language: the language I use forms my thoughts.
If a statement can be expressed efficiently and in an aesthetically pleasing way, programmers will choose that way.
And sometimes, they overuse it. Furthermore, programmers look at a problem and go through their catalog of elegant expressions to find the one that fits.
A programming language not only determines what we can express but also introduces a particular bias in how we will express it.
The same is true for our natural language.
This may not seem like a revolutionary insight, but we forget it in our daily lives.
A change in language might have a more significant impact on material changes than we think.
Furthermore, this also highlights a fundamental problem of the human condition: we need abstractions and standards to communicate with each other.
But by using abstractions, we distance ourselves from others.

If we look beyond the code, beyond ideas written down in text snippets, programs are written to be executed. And those programs can serve almost any domain. They control robots, harvest semantic information, control networks of devices, and simulate the natural world and other virtual worlds. They establish communication channels, send messages with the speed of light across the Earth, compose and play music, and constantly search for structures in a rapidly increasing cloud of data points.

However, in the end, functionality is second to being exciting or pretty. Even coding itself is secondary. Figuring out how things work is the real deal. Since everything is self-consistent and logical, you can go on and on and never stop exploring. It is an endless stream, too immense for one lifetime. And you are not even bound to some external logic; you can make up your own as long as it is consistent. This consistency is a bright contrast to the corporate, social, and political world - it’s like going home to a place where everything is comprehensible, a place where you can find some truth within the bounds of its rules.

As a software engineer, I want to build large architectures that process information.
I want to express myself through code.
In the world of physical objects, we have to study the properties of certain materials and the laws of physics to guarantee that our buildings do not collapse and rockets behave as expected.
But how can we specify and verify structured information?
How can we even define what we can compute?
What is computable?
Is sorting cards equally *hard* as solving a Sudoku?
Here we enter yet another world, a formal one!

Of course, those two worlds are never entirely separated. Formal methods and objects, like the Turing Machine, the Lambda Calculus, logic, grammars, and automata are the mathematical backbones of informatics. They are a particular kind of mathematically rigorous description of information manipulation (computation). By analyzing those objects, we get to the very fundamental and interesting questions. I was never hooked by the purpose of formal methods in software development (verification and specification). Instead, I found them intrinsically fascinating! It was just enough to discover these formal, therefore, clearly defined objects. Again, it is tough to explain this attraction.

You start with a set of unambiguous definitions: an alphabet, a word over an alphabet, a tuple of a set of states, a transition function, etc. And you begin to construct slightly more and more powerful objects while analyzing the properties of these objects. You always try to keep their power controllable. Their properties should stay computable - you enter the realm of constructivism. You begin to recognize the equivalence of objects from entirely different fields, such as logic and automata theory. And you form beautiful proofs about very abstract thus general statements of your formally constructed world.
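The kind of object described here fits in a few lines of code. As a toy illustration (my own example, not from the text): a deterministic finite automaton over the alphabet {0, 1} that accepts exactly the words containing an even number of 1s.

```python
def dfa_accepts(word):
    """A DFA over the alphabet {'0', '1'} accepting words with an even number of 1s."""
    delta = {  # transition function: (state, symbol) -> state
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    }
    state = "even"          # start state
    for symbol in word:
        state = delta[(state, symbol)]
    return state == "even"  # accepting states: {"even"}

print(dfa_accepts("1001"))  # True: two 1s
print(dfa_accepts("10"))    # False: one 1
```

The whole machine is nothing but a tuple of states, an alphabet, a transition function, a start state, and a set of accepting states – exactly the unambiguous definitions mentioned above.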

At some point, you arrive at a formal model of any digital computer, that is, the *Turing Machine*.
Since you now have this gift, this formal model, you can analyze it in a highly abstract way.
You can define what a computer at least requires to be *Turing-complete*, i.e., to be a computer.
You can define what *computable* means, and you can even measure how many steps it takes to solve a class of problems.
You start analyzing the *complexity* of problems, such as sorting cards or solving a Sudoku.

For example, sorting cards by pairwise comparisons (on a *Turing-complete* model) requires $\mathcal{O}(n\log(n))$ steps, where $n$ is the number of cards.
And we know that no comparison-based method can do better than that.
Consequently, in the worst case, sorting cards always requires at least $c \cdot n\log(n)$ comparisons, where $c$ is some constant independent of $n$.
Isn’t it captivating that we know this!
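The bound can be observed empirically. Here is a small comparison-counting merge sort (my own sketch, not tied to any code in this series): for 1024 shuffled cards it never needs more than $n \log_2 n = 10240$ comparisons.

```python
import random

def merge_sort(items):
    """Sort by pairwise comparisons; return (sorted_list, comparison_count)."""
    if len(items) <= 1:
        return items, 0
    mid = len(items) // 2
    left, cl = merge_sort(items[:mid])
    right, cr = merge_sort(items[mid:])
    merged, count, i, j = [], cl + cr, 0, 0
    while i < len(left) and j < len(right):
        count += 1  # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]  # leftovers need no further comparisons
    return merged, count

cards = list(range(1024))
random.shuffle(cards)
_, comparisons = merge_sort(cards)
print(comparisons <= 1024 * 10)  # True: never worse than n * log2(n)
```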

Then there is the still **unanswered** question of whether solving a Sudoku is as *hard* as checking a solution of it.
We call this the *P versus NP problem*. Of course, we ask again in the context of *Turing Machines*.
Therefore, one might argue that this is not a scientific question because we ask for some property of some human-made machine.
I strongly disagree!
This machine is a mathematical object, and mathematics is the language of nature.
To this day, *Turing Machines* are THE model for computation (Lambda Calculus is equivalent);
a Turing Machine can simulate even quantum computers.
If nature does compute, we might even study its basic principles!
Doesn’t it feel *harder* to solve a Sudoku than to check its solution?
Isn’t there a fundamental difference?
Or isn’t finding a proof much harder than understanding it?
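The asymmetry is easy to make concrete (a sketch of my own, not from the article): checking a completed 9×9 grid takes three linear scans over 81 cells, while every known exact solver may need exponential time in the worst case.

```python
def valid_sudoku(grid):
    """Verify a completed 9x9 Sudoku with three linear scans -- cheap, unlike solving."""
    def ok(cells):
        return sorted(cells) == list(range(1, 10))

    rows = all(ok(row) for row in grid)
    cols = all(ok([grid[r][c] for r in range(9)]) for c in range(9))
    boxes = all(ok([grid[r + i][c + j] for i in range(3) for j in range(3)])
                for r in (0, 3, 6) for c in (0, 3, 6))
    return rows and cols and boxes

# A valid grid built from a shifted pattern -- verification is instantaneous:
grid = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(valid_sudoku(grid))  # True
```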

Finding a solution involves what we call creativity and intuition. But checking it is a rather automatable process. So if we reframe the question, we may ask about the complexity of creativity! Since hundreds of publications falsely claimed to have solved the question, it seems to require a lot of creativity to correctly grasp the complexity of creativity.

What a fascinating scientific question to tackle! We are in the realm of not natural science but similar to mathematics, a formal science.

Eventually, you encounter another significant problem: the *Halting problem*.
It states that there will never be a program (definable as a Turing Machine) that can decide, for an arbitrary program, whether it halts.
Understanding the theorem and its proof in-depth feels groundbreaking and infinitely satisfying.
The proof is simple and beautiful if you understand how computers, i.e., *Turing Machines*, work.
You can find a sketch of it in the appendix of this article.
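The core of the diagonal argument can also be mimicked in Python (my own illustration; the `halts` decider is of course hypothetical, since no real one can exist):

```python
def diagonal(halts):
    """Given any claimed halting-decider halts(f), build a program that defeats it."""
    def g():
        if halts(g):       # ask the decider about g itself
            while True:    # if it claims "g halts" -> loop forever
                pass
        return "halted"    # if it claims "g loops" -> halt immediately
    return g

# Plug in any concrete decider, e.g. one that always answers "never halts":
g = diagonal(lambda f: False)
print(g())  # halted -- the decider was wrong; no decider can be right about g
```

Whatever answer the decider gives about `g`, the construction forces `g` to do the opposite, which is the contradiction at the heart of the proof.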

It is kind of funny that, historically, informatics first struggled for recognition on a global scale. In academia, this struggle is long gone. However, in the general public’s eye, the objective and research subject of the discipline are blurred, which still leads to confusion: yes, I am a computer scientist, but I won’t fix your computer or software bugs!

If we look into the history books, computer scientists had to fight to establish their field. The discipline emerged from branches such as electrical engineering, physics, and mathematics. In the 1960s, computer science came into its own as a discipline. George Forsythe, a numerical analyst, coined the term. The first computer science department was formed at Purdue University in 1962. The first person to receive a PhD from a computer science department was Richard Wexelblat, at the University of Pennsylvania, in December 1965.

In Germany, informatics goes back to the Institute for Practical Mathematics (IPM) at the Technische Hochschule Darmstadt, founded around 1928. But it required time to establish the first informatics course in the Faculty of Electrical Engineering in Darmstadt. The first doctoral thesis was written in 1975, ten years later compared to the US. In my hometown, the first informatics course at the Technical University of Munich was offered in 1967 at the Department of Mathematics. In 1992, the Department of Informatics split up from the Department of Mathematics.

Informatics can still be regarded as relatively young compared to other well-established disciplines such as physics, chemistry, or psychology. Nonetheless, new fields split up from informatics, and they are all interlinked with one another. We have, for example,

- scientific computing,
- computational science,
- information theory,
- and linguistics,

and the list goes on. The problem of artificial intelligence (AI) brings informatics more and more in touch with the humanities such as

- philosophy,
- psychology,
- and neurosciences.

Within the field of informatics, there are a lot of branches one can pursue, for example:

- formal methods and complexity theory
- programming languages and compilers
- design and analysis of algorithms and data structures
- parallel and distributed computation
- high-performance computation
- networks and communication
- intelligent systems
- operating systems
- cybersecurity
- databases
- data science
- computer graphics
- image processing
- modeling and simulation
- artificial intelligence
- software engineering
- architecture and organization

This sprawling list of specialties suggests that there is no longer any such thing as a classical computer scientist, right? Nobody can be an expert in all areas - it is just impossible! Over the years, some branches lost attraction and others gained it, but most are still active. For example, we lost interest in operating systems but put more effort into investigating high-performance computation and data science. Theoretical informatics cooled down, but many critical questions remain unanswered.

Despite this abundant variety, being a computer scientist is more than being an expert in one specific field.
It is more than knowing how to build large software systems, manage networks, process data and establish a secure infrastructure.
**It is a way of thinking!** And no, we do not think like computers but embrace the curiosity naturally given to us human beings.
We renew the playing child within us and let it go wild to solve riddles for no particular reason.
And I think that is one of the most human things to do.

In this little supplementary section, I want to talk about the resource we are working with: information.

As you may notice, I refer to **computer science** by the European term **informatics** - not only because I am from Europe but also because the term leans more towards **information** than **computation**.
Information manipulation is computation, but the term emphasizes a broader, more abstract definition of computation.

Information, as we understand it, is the elimination of uncertainty, but on a purely **syntactical** level.
This notion dates back to 1948 and was defined by Claude E. Shannon in *A Mathematical Theory of Communication*:

“Frequently, the messages have meaning; that is, they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system has to be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.” – Claude E. Shannon (Shannon, 1948)
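Shannon’s point can be made concrete: the information content of a source depends only on the probabilities of its messages, not on their meaning. As a small illustration (my own sketch, not from Shannon’s paper), the entropy formula $H = -\sum_i p_i \log_2 p_i$ in Python:

```python
import math

def entropy(probs):
    # Shannon entropy in bits: H = -sum(p * log2(p)).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin eliminates 1 bit of uncertainty per toss,
# regardless of what "heads" or "tails" mean.
print(entropy([0.5, 0.5]))  # 1.0
# A biased coin is more predictable, so each toss carries less information.
print(entropy([0.9, 0.1]))  # ~0.47
```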

In the humanities, researchers have a completely different understanding of information. They discuss cognition, representations, referencing, and interpretation in the context of information. What does a symbol mean, and what effect or interpretation does it have? How does data transform into information, and how do we harvest knowledge and wisdom?

For information transmitted via a technical channel, a purely syntactical perspective is appropriate. But it seems inappropriate if we discuss today’s information-driven society.

Nowadays, we can rapidly collect unimaginable amounts of information. It can be processed automatically and checked for higher-level patterns that would not be discernible without technology. We constantly transform information into a machine-readable format, which makes it, in turn, quantifiable. But is there information that is not machine-readable, and do we dismiss it?

Furthermore, the interconnected world of information processes of today never forgets anything.
Incessantly, it sucks up data and floods our private and public spaces with it.
It is like a magical beast that can create amazing miracles but can also cause devastating destruction - sometimes it appears as a lovely creature, and sometimes it embodies a frightening monster.
We have to deal with it!
It is our creation - an artificial intelligence that already evaluates, categorizes, interprets, and generates information on its own.
Such a system has to include the semantics of information.
Otherwise, it would be difficult to call it *artificial intelligence*.

Since we left the task of pure information transmission behind a long time ago, we need the perspective of the humanities - we have to think beyond pure syntax.
In fact, we have to reconsider the term *information*, at least in our discipline.
For example, Ulrich states:

“Where data are the raw facts of the world, information is then data ‘with meaning’. When ‘data’ acquires context-dependent meaning and relevance, it becomes information. Furthermore, we obviously expect information to represent valid knowledge on which users can rely for rational action.” – W. Ulrich (Ulrich, 2001)

But even this definition seems problematic. Is there even something like simple brute facts of the world? All data seems to be already gathered and processed. Therefore, information can not simply be the injection of meaning into data, because data already has meaning. Rather, the degree of significance differs from person to person. What might be meaningful to me might be utterly meaningless to you.

Of course, we can not settle the question of an adequate definition here. But to disregard it entirely and leave the design of interconnected information processes solely to the technicians would be a mistake. Discovering, studying, using, influencing, and extending our interconnected world of information processing is exciting as well as necessary. It is part of our duties, but we should no longer encounter this beast alone.

Let me try to give you the intuition of the *Halting problem*.
For simplicity reasons, I will not explicitly distinguish between a *Turing Machine* $\mathcal{T}$ and its description (often noted by $\alpha_\mathcal{T}$).
So when a *Turing Machine* gets as input another *Turing Machine*, I mean its description (source code).

Let us assume there is a *Turing Machine* $\mathcal{H}$ that takes an arbitrary *Turing Machine* $\mathcal{T}$ as input.
$\mathcal{H}(\mathcal{T})$ outputs 1 if $\mathcal{T}(\mathcal{T})$ halts, otherwise it outputs 0.
So if such a program exists, $\mathcal{H}$ is the program that solves the *Halting problem*, and $\mathcal{T}$ is an arbitrary program.
We can pick $\mathcal{T}$ as its own input because the input can be arbitrary (but finite).
So we ‘feed’ the program $\mathcal{T}$ with itself, that is, its own description/source code $\mathcal{T}$.

So we have

\[\begin{equation} \tag{1}\label{eq:halt:1} \mathcal{H}(\mathcal{T}) = \begin{cases} 1, & \text{if } \mathcal{T}(\mathcal{T}) \text{ halts} \\ 0, & \text{otherwise.} \end{cases} \end{equation}\]

We construct a new *Turing Machine* $\hat{\mathcal{H}}$.
$\hat{\mathcal{H}}$ is a slight modification of $\mathcal{H}$.
It does almost the opposite.
$\hat{\mathcal{H}}$ halts and outputs 1 if and only if $\mathcal{H}$ outputs 0.
Furthermore, $\hat{\mathcal{H}}$ does not halt if $\mathcal{H}$ outputs 1:

\[\begin{equation} \tag{2}\label{eq:halt:2} \hat{\mathcal{H}}(\mathcal{T}) = \begin{cases} 1, & \text{if } \mathcal{H}(\mathcal{T}) = 0 \\ \text{does not halt}, & \text{if } \mathcal{H}(\mathcal{T}) = 1. \end{cases} \end{equation}\]

Now we execute $\hat{\mathcal{H}}$ by feeding it to itself, that is, $\hat{\mathcal{H}}(\hat{\mathcal{H}})$ and analyse the result:

**It halts (outputs 1):**
Let us assume $\hat{\mathcal{H}}(\hat{\mathcal{H}}) = 1$.
Following Eq. \eqref{eq:halt:2} gives us $\mathcal{H}(\hat{\mathcal{H}}) = 0$.
But by following Eq. \eqref{eq:halt:1} this implies that $\hat{\mathcal{H}}(\hat{\mathcal{H}})$ does not halt.
After all, $\mathcal{H}$ outputs 0.
Thus, it recognizes that the analyzed program $\hat{\mathcal{H}}$ does not halt.
But if $\hat{\mathcal{H}}(\hat{\mathcal{H}})$ does not halt, it follows by Eq. \eqref{eq:halt:2} that $\mathcal{H}(\hat{\mathcal{H}}) = 1$ which leads to a contradiction.

**It does not halt:**
Let us assume $\hat{\mathcal{H}}(\hat{\mathcal{H}})$ does not halt.
Then $\mathcal{H}(\hat{\mathcal{H}}) = 1$ follows from Eq. \eqref{eq:halt:2}, which implies that $\hat{\mathcal{H}}(\hat{\mathcal{H}})$ halts; compare Eq. \eqref{eq:halt:1}.
But if $\hat{\mathcal{H}}(\hat{\mathcal{H}})$ halts, then by Eq. \eqref{eq:halt:2}, $\mathcal{H}(\hat{\mathcal{H}}) = 0$, which leads to a contradiction.

In any case, $\hat{\mathcal{H}}(\hat{\mathcal{H}})$ leads to a paradox! However, the construction of $\hat{\mathcal{H}}$ is correct if $\mathcal{H}$ exists. Therefore, our assumption must be wrong, and $\mathcal{H}$ can not exist!
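The diagonal construction can be mimicked in Python. The names (`defeat`, `h_hat`, `always_no`) are my own, and of course no real halting decider can exist; the sketch only shows that any claimed decider is wrong on the program built from it:

```python
def defeat(halts):
    # Given a claimed halting decider halts(prog, arg), construct the
    # machine H-hat that does the opposite of what halts predicts.
    def h_hat(x):
        if halts(x, x):
            while True:   # predicted to halt -> loop forever
                pass
        return 1          # predicted to loop forever -> halt with output 1
    return h_hat

# A (necessarily wrong) candidate decider that always answers "does not halt":
always_no = lambda prog, arg: False
h_hat = defeat(always_no)

# The candidate predicts that h_hat(h_hat) never halts -- yet it halts:
print(h_hat(h_hat))  # 1
```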

The *Halting problem* is strongly connected to the question of **decidability**.
It asks whether, for a given problem or statement, there is an algorithm (or proof) that solves the problem or settles the statement in a finite number of steps.
As we saw, the *Halting problem* is **undecidable**!
The halting problem of any *Turing-complete* machine or system is **undecidable**, and there are many such systems!
For example:

- complex quantum systems,
- the Game of Life,
- airline ticketing systems,
- the card game Magic,
- and, of course, almost any programming language.

There are three essential properties for a desirable mathematical system, and we may think they are fulfilled naturally. But as it turns out, this is not the case. We would like to have a system that is

- **complete**: any statement of the system can be proven or disproven
- **consistent**: a statement can not be proven and disproven at the same time
- **decidable**: an algorithm (proof) requires a finite number of steps to prove or disprove any statement

Kurt Gödel showed that any sufficiently powerful mathematical system is either **incomplete** or **inconsistent**.
Later, he also showed that even if such a system is **consistent**, this can not be proven within the system.
After Alan Turing found a computation model, the *Turing Machine*, he showed that

- many problems / systems are *Turing-complete*
- all those systems are **undecidable**.

This revelation shook the mathematical landscape and ended the dream of David Hilbert and others to establish a solid mathematical foundation. Sadly, Kurt Gödel and Alan Turing, two of the most brilliant minds, both died tragically.

- Denning, P. J. (2005). Is Computer Science Science? *Commun. ACM*, *48*(4), 27–31. https://doi.org/10.1145/1053291.1053309
- Shannon, C. E. (1948). A Mathematical Theory of Communication. *Bell Syst. Tech. J.*, *27*(3), 379–423.
- Ulrich, W. (2001). *A Philosophical Staircase for Information Systems Definition*.

The following text is from the preface of the thesis and describes my personal struggles during and my perspective on the journey.

“The struggle itself towards the heights is enough to fill a man’s heart. One must imagine Sisyphus happy.” – Albert Camus

I can pinpoint the exact moment when I decided to start my academic journey. At that time, my life was harshly disrupted by physical illness and the death of my father. Consequently, my family struggled on many levels. Despite, and probably because of, the unfortunate circumstances, and with my family’s blessing, I quit my job and rejoined school to get my (technical) A level. I wanted to comprehend the world more than ever and escape the meaningless play of pretending. At that difficult time, there was a financial incentive to graduate as fast as possible. Therefore, I did not join a university but the computer science program at the Munich University of Applied Sciences.

During my undergraduate study, my former naive belief was crushed. I realized that I would never find any definitive objective truth about the physical world. It was a rather pessimistic but also liberating philosophical revelation that there is no definitive rule to follow and no absolute meaning to fulfill. My search was no longer aimed at finding an objective meaning but a personal cause to follow. I still admired studying, but for other, more aesthetic reasons. I loved the clarity and usefulness of formal systems, the beauty of proofs, the elegance of algorithms, and the student’s lifestyle. I observed a transforming world where computer science spread out into many branches of science, economics, and society. It was a time of playful experimentation and the deconstruction of personal barriers.

After I got my bachelor’s degree, I finally joined a master’s program at the Technical University of Munich. This continuation was enabled by the individual and financial support I received from the Studienstiftung des deutschen Volkes & the Max Weber-Programm. I followed my aesthetic taste and attended rather unpopular formal lectures. At that time, I realized that the source of my personal cause to move on had to be bound to someone else. Beauty and aesthetic theories were a pleasant enjoyment, but they could no longer be the primary reason to live for.

Surprisingly, my academic journey did not stop there, since Prof. Dr. Köster invited me into her research group. There are multiple reasons why I happily accepted her invitation. The desire to understand the world was reduced to the desire to understand at least one little part of it. Furthermore, I believed that simulations would influence science, economics, and society for decades to come. And if I could make the world a little bit safer, it might be the cause I was looking for. During my PhD, the wish to be useful and to help others was always a source of inner conflict and self-doubt. I started the whole journey to escape a world that I perceived to be shallow, empty, and driven by profit. From time to time, the scientific research project felt like this meaningless business world that came back to haunt me. Luckily, I was in a superb position. Everyone tried to reduce this aspect of the scientific environment to a minimum, for which I am very grateful.

I think every PhD candidate has to deal with uncertainties and self-doubt – the uncertainty within science and the uncertainty of the journey’s path. There is no guaranteed progress or graduation and one is constantly confronted with his or her own limitations. These factors and the ever-present questioning voice in my head acted as a catalyst for an unavoidable existential crisis. I looked into many philosophical ideas and rearranged, and possibly reinforced, my world view and many important values – the chapter’s introductory quotes tell the tale. In my opinion, this process was only possible, because during my academic journey I received the tools to engage with difficult ideas. On the one hand, this crisis was unpleasant but on the other hand it enriched my life – a trade-off I am certainly willing to repeat. In the end, I had supportive companions that helped me to deal with all these issues. It was not easy, and I guess it rarely ever is, but the struggle is part of the charm – as Camus said,

“we must imagine Sisyphus happy.”

For me, to study is to train thoughtful thinking, perception, awareness, and tolerance. It sharpens the mind and opens up a little less ignorant new world. The ability to enjoy thinking and to share this enjoyment with other thoughtful people provides freedom and independence. It might be the greatest gift I received during my journey. Because of it and the people I met, it is a success story, and I am deeply thankful that I could have experienced it.

SuperCollider (SC) can be used in many different ways, such as algorithmic composition, sound design, research, and more. In the following, I concentrate on sound design.

Note that there is a strong connection between Sonic Pi (SP) and SC, since SP uses SC to generate sound; that is, SP is used to implement a rhythm, but the actual sound comes from `Synth`s defined by SC or from samples.
Therefore, SC and SP can go hand in hand.
I also want to stress that one can implement rhythms and play samples using SC, but a lot more code is required because SC is much more low-level, i.e., more control but less rapid development.

I was motivated by my interest in education and first discovered Sonic Pi (SP).
In my opinion, it is a very powerful tool for educational purposes and live programming.
One can rapidly implement a beat or a rhythm and learn the basic concepts of structured programming and threads.
Additional effects can be added to the pre-defined `synths`, and we can play around with different arguments, but changing the sound with those techniques is still limited.
Overall, simplicity is achieved by a lack of control over the sound that is created.
This is not a criticism.
Sonic Pi (SP) is a fantastic project.

However, I wanted more power over what was happening, and since Sonic Pi (SP) was built on top of SuperCollider (SC), I looked into it. I most probably would not have got a grasp of it if I had not discovered the excellent tutorials of Eli Fieldsteel, an Assistant Professor of Composition-Theory and Director of the Experimental Music Studios at The University of Illinois. SuperCollider (SC) is a package of three components:

- **scsynth**: a real-time audio server, i.e., the part that plays the sound
- **sclang**: an interpreted programming language focused on sound. The user can control **scsynth** by sending messages (Open Sound Control) to the audio server.
- **scide**: an IDE for **sclang** with an integrated help system, analyzing tools, and good documentation.

SuperCollider is written in C++; it was developed by James McCartney and originally released in 1996. In 2002, he generously released it as free software under the GNU General Public License. It is now maintained and developed by an active community.

Sonic Pi (SP) and other tools for algorithmic composition only use the audio server **scsynth** and replace the other parts with their own programming language and development environment.
Since SuperCollider has been around for some time, it is a very rich environment and language.
However, one can also observe some inconveniences and feel its age.
Nevertheless, it follows interesting concepts that were new to me, and whenever we encounter new coding paradigms and concepts, we learn something.

First, let’s observe how the IDE interacts with the interpreter.
To evaluate a line of code, we press `shift + return`, and to evaluate a code block, we press `cmd + return` on the Mac and `ctrl + return` on Windows and Linux.

```
10 + 3; // returns 13
4 + 5; // returns 9
Array.series(10, 0, 1); // returns [0,1,2,3,4,5,6,7,8,9]
```

**Warning:** One hotkey is very, very important: `cmd + .`.
It stops all sound.
SC will not protect you from creating a painful or very loud sound.
Where Sonic Pi (SP) is rather safe, even a small typo can lead to a horrible and even damaging experience in SC.
Therefore, I recommend using headphones, and before you listen to a new sound, take them off your head and make sure the sound will not blow you away!

I have written a simple extension `Utils` which boots the server and initializes the windows, i.e., the analyzing tools I often require.
The code can be found in Utils.

I call a class function to initialize everything I want:

```
Utils.initTools();
```

Note that as long as we do not have to communicate with the audio server, i.e., play sound, we don’t need to boot the server.

Here we encounter the first inconvenience.
In SC there are some special pre-defined variables.
Each single-character variable is pre-defined and globally available.
If you come from a modern programming language, this is strange.
However, it is often useful for prototyping in SC.
A very special variable is `s`, because it holds a reference to the default local server.
Therefore, to start the audio server, we evaluate:

```
s.boot();
```

No one stops you from overwriting `s`, but I would not recommend it.
To define a **code block**, we use round brackets.
We can use `x` without defining it because it is already defined for us.

```
(
x = 10;
x;
)
```

Evaluating

```
variable = 10;
```

results in an error because `variable` is undefined.
The following works just fine.

```
var variable = 10;
variable;
```

Since `x` is a global variable, we can use it everywhere.
Evaluating the line and the code block will return `13`.

```
x = 10;
(
x = x + 3;
x;
)
```

but we can also define a local `x`:

```
x = 10;
(
var x = 0;
x = x + 3;
x;
)
```

Evaluating all these lines will return `3`, but the global variable `x` is still `10`.
The following lines cause an error

```
var variable = 10;
(
variable = variable + 3;
variable;
)
```

while the next lines work and return `13`.

```
(
var variable = 10;
variable = variable + 3;
variable;
)
```

To define a new global variable, the variable name has to start with `~`, for example,

```
(
~variable = 10;
(
~variable = ~variable + 3;
~variable;
)
)
```

returns `13`.

To define a function, we encapsulate its content in curly brackets, and to execute it, we call `value()` on it:

```
(
var func = {
var x = 10;
x;
};
func.value(); // returns 10
)
```

Here we see another strange behavior: the last line of a function is its return value.
I am not a fan of this shortcut, and if you work with **sclang** you will encounter additional ones.
They add required knowledge to understand a program written in **sclang** and steepen the learning curve.
After getting used to them, they speed up coding a little bit.

Here is another example of a function with arguments:

```
(
var add = {
arg a, b;
a + b;
};
add.value(6,11) // returns 17
)
```

Like in Python, one can define a default value for each argument, and we can ignore the order if we add the names. Furthermore, there is another rather strange shortcut:

```
(
var add = {|a = 5, b|
a + b;
};
add.value(b: 11) // returns 16
)
```
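Since the text above already draws the parallel to Python, here is the same call written out in Python for comparison (my own illustration):

```python
def add(a=5, b=0):
    # `a` has a default value; `b` is passed by name, so the order is irrelevant.
    return a + b

print(add(b=11))   # 16: `a` falls back to its default of 5
print(add(6, 11))  # 17: both arguments given positionally
```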

There are many other parts of **sclang**, and if you want to get started, I encourage you to visit the tutorials of Eli Fieldsteel or look at the official documentation.

Let us create the simplest sound possible: a sine wave.
First, we define a function that can be seen as a process, called a *unit generator* (`UGen`), that starts when we call `play()`.
There are hundreds of different `UGen`s; they basically spit out real numbers over time.
For example, `SinOsc` samples a sine wave.

```
(
~sine = {arg freq=200; SinOsc.ar(freq, mul: 0.2)};
~sineplay = ~sine.play();
)
```

By default, the sine wave oscillates between `-1` and `1`.
We define a frequency of `200` Hz, that is, `200` cycles per second, and a multiplier of `0.2` such that the amplitude stays between `-0.2` and `0.2`, which reduces the volume.
Note that the following two statements are equivalent:

```
SinOsc.ar(freq, mul: 0.2);
SinOsc.ar(freq) * 0.2
```

We can change all arguments we defined, for example,

```
~sineplay.set(\freq, 500);
```

sets the frequency to 500 cycles per second, which increases the pitch.

Note that `set()` is not called on the function but on the return value of `~sine.play()`!
In fact, `play()` is another shortcut to play sound; behind the scenes, it creates a new `Synth`, which I will show in a moment.
`play()` is excellent for playing around and finding a sound, but once you have an idea which direction you want to go, it is better to define a `Synth`.
To stop the process, i.e., the sine wave, we can call:

```
~sineplay.release();
```

Of course, you can also press `cmd + .`, but this will kill all sound, not only this sine wave.

Let us define a slightly more complex sound generated by two sine waves:

```
~twosines = {arg freq1=200, freq2=200; 0.2 * SinOsc.ar(freq1) + 0.2 * SinOsc.ar(freq2)};
```

What do we expect to hear? Since we add two identical sine waves together, we should hear the same sound but twice as loud. But if we play the sound, it does not fulfill our expectations:

```
~twosines.play();
```

The reason is another inconvenience: **sclang** strictly evaluates everything from left to right.
For example, `3 + 5 * 3` returns `24` instead of `18`.
Therefore, we have to use brackets.
A very useful tool is the built-in plotting tool.
We can see the problem more clearly if we plot our function:

```
~twosines.plot();
```

To achieve the desired result, we have to correct the code.

```
~sinesplay = {arg freq1=200, freq2=200; (0.2 * SinOsc.ar(freq1)) + (0.2 * SinOsc.ar(freq:freq2))}.play()
```

or

```
~sinesplay = {arg freq1=200, freq2=200; SinOsc.ar(freq1, mul:0.2) + SinOsc.ar(freq:freq2, mul:0.2)}.play()
```

Ok, this is rather boring.
However, if we change `freq2` such that it is not equal but close to `freq1`, we get an interesting effect.
You may want to think about what sound you will hear if you do the following:

```
~sinesplay.set(\freq2, 201)
```

We hear a kind of wobble effect.
Why?
Well, two sine waves can add up, but they can also cancel each other out.
For example, if we add two sine waves with phases `0` and `pi`, we hear nothing.
This effect is used for noise cancellation in modern headphones.
Apart from some numerical errors, plotting the result reveals the cancellation:

```
{arg freq1=200, freq2=200; (0.2 * SinOsc.ar(freq1)) + (0.2 * SinOsc.ar(freq:freq2,phase:pi))}.plot()
```

Now back to the wobble effect.
Since `freq2` is 1 Hz higher than `freq1`, the second sine wave does one extra cycle each second.
Therefore, over that second, the two waves alternate between adding up and canceling each other out.
I encourage you to play around.
What will happen if `freq2` is 5 Hz higher?
How does this affect the result?
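The beating can also be checked numerically. Here is a small Python sketch (my own illustration, not part of the SC code) that samples the sum of a 200 Hz and a 201 Hz sine wave: near t = 0 the waves reinforce each other, and half a beat period later (t = 0.5 s for a 1 Hz difference) they cancel:

```python
import math

def two_sines(t, f1=200.0, f2=201.0, amp=0.2):
    # Sum of two equal-amplitude sine waves, as in the SC example.
    return amp * math.sin(2 * math.pi * f1 * t) + amp * math.sin(2 * math.pi * f2 * t)

def peak(center, width=0.01, steps=2000):
    # Peak absolute amplitude in a small window around `center` seconds.
    return max(abs(two_sines(center + width * (i / steps - 0.5)))
               for i in range(steps))

print(peak(0.0))  # close to 0.4: the waves add up
print(peak(0.5))  # close to 0:   the waves cancel each other out
```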

Here, I will not go into details because the documentation does a better job than I could ever accomplish.
In the following code snippet, I define a `Synth`.
The first argument of the method used to define the `Synth` is its name.
The second one is its content.
I call `add()` to add the `Synth` to the audio server.
You could also call `store()`, which additionally stores the `Synth` in a file.
In this way, you can load the `Synth` from a file and, with some extra work, use it in other tools like Sonic Pi!

```
(
SynthDef.new(\sinewaves, {
arg freq1=200, freq2=205, amp = 0.4, out = 0;
var sig;
sig = amp * 0.5 * SinOsc.ar(freq1);
sig = sig + (amp * 0.5 * SinOsc.ar(freq:freq2));
Out.ar(out, sig!2);
}).add();
)
```

So far, the sound only comes out of one speaker. The line

```
Out.ar(out, sig!2);
```

**copies** the sound and sends it to both speakers.
`sig!2` is another handy shortcut for `sig.dup(2)`.
Note that no re-evaluation happens when duplicating a value; the result is just copied.
But we can also duplicate a function, which evaluates it anew each time!
For example:

```
5!8 // returns [5, 5, 5, 5, 5, 5, 5, 5]
[1,2,3,4]!3 // returns [[1,2,3,4], [1,2,3,4], [1,2,3,4]]
rrand(0,10)!3 // returns an array containing three equal elements!
{rrand(0,10)}!3 // returns an array containing three possible different elements!
```
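The value-versus-function distinction has a direct analogue in Python (my own comparison, with `random.randint` standing in for `rrand`):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

value_dup = [random.randint(0, 10)] * 3               # like rrand(0,10)!3
func_dup = [random.randint(0, 10) for _ in range(3)]  # like {rrand(0,10)}!3

print(value_dup)  # three copies of a single random draw
print(func_dup)   # three separate draws, possibly all different
```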

After we have defined the `Synth` using `SynthDef`, we can create an instance, which will immediately start the sound.
We can change all arguments during the `Synth`’s lifetime.

```
~sinewaves = Synth(\sinewaves);
~sinewaves.set(\freq1, 100);
~sinewaves.set(\amp, 0.2);
```

So far, we had to start and stop the sound by hand.
However, if we want to design a musical instrument, we want to manipulate the amplitude over time.
Imagine a piano: if we hit a key, the sound requires some time to reach its maximum amplitude.
This period is called the *attack time*.
After the maximum is reached, it decreases, and if the pianist releases the key, the sound vanishes completely.
There are infinite possibilities. For example, sometimes we want a sustaining sound, i.e., the amplitude stays constant for some period.

Instead of *manipulation*, musicians use the term *modulation*.
In general, modulating different arguments is good practice for achieving more interesting sounds.
In the following, I define an envelope (which I think of as a finite `UGen`).

```
{EnvGen.kr(Env.new(levels: [0, 1, 0.4, 0], times: [0.1,0.4,0.1], curve: [2,0,-3]))}.plot(0.6);
```

The envelope starts at `0`, increases to `1` during `0.1` seconds, drops to `0.4` in `0.4` seconds, and then drops to `0` in `0.1` seconds.
The `curve` argument influences the curvature of the segments.
It is best understood by plotting the result while playing around with the values.
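Ignoring the `curve` argument, the level/time semantics of this envelope can be sketched as piecewise-linear interpolation in Python (my own illustration, not SC’s actual implementation):

```python
def envelope(t, levels=(0, 1, 0.4, 0), times=(0.1, 0.4, 0.1)):
    # Linear interpolation through `levels`, spending times[i] seconds
    # on the segment from levels[i] to levels[i+1].
    if t <= 0:
        return levels[0]
    for i, dur in enumerate(times):
        if t < dur:
            frac = t / dur
            return levels[i] + frac * (levels[i + 1] - levels[i])
        t -= dur
    return levels[-1]  # after the last segment, hold the final level

print(envelope(0.05))  # 0.5: halfway up the attack segment
print(envelope(0.1))   # 1.0: the peak
print(envelope(0.7))   # 0:   the envelope is over after 0.6 seconds
```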

By multiplying our sound signal by the envelope signal, we get a sound that stops after `0.6` seconds.

```
(
SynthDef.new(\sinewaves, {
arg freq1=200, freq2=205, amp = 0.4, out = 0;
var sig, env;
// doneAction: 2 removes the Synth from the server/memory if it is done!
env = EnvGen.kr(Env.new(levels: [0, 1, 0.4, 0], times: [0.01,0.2,0.1], curve: [2,0,-3]), doneAction: Done.freeSelf);
sig = amp * 0.5 * SinOsc.ar(freq1);
sig = sig + (amp * 0.5 * SinOsc.ar(freq:freq2));
sig = sig * env;
Out.ar(out, sig!2);
}).add();
)
~sinewaves = Synth(\sinewaves);
```

Let’s plot the sound with and without the envelope applied:

The following code generates the plot.

```
(
{
var sig, env, freq1=200, freq2=205, amp = 0.4;
env = EnvGen.kr(Env.new(levels:[0, 1, 0.4, 0], times:[0.1,0.4,0.1], curve:[2,0,-3]));
sig = amp * 0.5 * SinOsc.ar(freq1);
sig = sig + (amp * 0.5 * SinOsc.ar(freq:freq2));
sig = sig * env;
sig;
}.plot(0.6)
)
```

To get a feeling for how a sound changes using different arguments, I recommend using `play()` in combination with `MouseX` and `MouseY`.
The latter two `UGen`s spit out values that depend on the position of your mouse cursor.
Therefore, you can easily change two independent arguments at the same time.

```
{ SinOsc.ar(MouseX.kr(10, 500))!2 * 0.5 }.play()
{ LPF.ar(in: Saw.ar(MouseX.kr(10, 500))!2 * 0.2, freq: MouseY.kr(0, 5000)) }.play()
{ LPF.ar(in: WhiteNoise.ar(mul: 0.2), freq: MouseX.kr(0, 500)) }.play()
{ LPF.ar(in: WhiteNoise.ar(mul: 0.4), freq: 1000) * EnvGen.ar(Env.perc(0.1, 0.7)) }.play()
{ EnvGen.ar(Env.perc(0.1, 0.7)) }.plot(0.8);
{ SinOsc.ar(100, mul: 0.2) + LPF.ar(in: WhiteNoise.ar(mul: 0.4), freq: 1000) * EnvGen.ar(Env.perc(0.1, 0.7)) }.play();
```

Composing a whole piece takes time.
One way is to first design your `Synth`s or find proper samples.
Then we need a rhythm and/or a melody.
For that purpose, we use some other tools, or we use **sclang**.
If you want to use **sclang**, a good starting point is to look into the concept of *patterns*, especially `Pbind`.

To get a better feeling of what is possible, copy the following code and execute it in the IDE. The piece was composed by Eli Fieldsteel and makes use of wavetable synthesis.
