Introduction

Mélodium is a language and tool to implement robust data flows and transformations. It offers a wide range of operations for many kinds of data, from simple text parsing to live transformation of media streams.

Originally designed for scientific research in audio and music, it has since been extended to handle many big data problems. It runs on many platforms and environments, and can process large amounts of data in real time with a high level of optimization.

This book summarizes its usage and presents the purpose of Mélodium, its internal concepts, and how to program and work with it. Please refer to the Standard Reference for detailed documentation, and to the repository for development. Also take a look at the Official Project Website.

Mélodium and this book are a work in progress, aiming to evolve and grow significantly over time. Some of the information given here may not be up to date with the current state of the project. All this work is done with passion, and any feedback is welcome.

Purpose

Mélodium has been created to fill the gap between most programming languages and the concept of data streams.

Most languages, whatever paradigms they use, see data as individual boxes that must be moved around in memory step by step. Developers need to define how these boxes are organized, passed as parameters, processed through loops, and returned, while keeping track of what is happening and handling error cases in many places (or assuming everything will go fine, otherwise the program crashes). This approach is sufficient in many cases, but obviously not in all of them.

Now comes the problem of data streams.

What if we need a program able to run continuously and efficiently on data that doesn't fit in a box? What if we need to read tens of thousands of files? What if we need to read files that are giga- or terabytes long? What if we need processing to continue even if a problem happens somewhere or at some point? What if we need a program that ensures everything is fine at startup and takes care of future problems that may occur?

Here comes Mélodium.

Concepts

The Mélodium language is organized around five main concepts: sequences, treatments, models, connections, and tracks.

Sequences & Treatments

Sequences are the main elements of the language, and they can take parameters. Unlike functions, which list instructions to execute and apply changes to variables, sequences describe treatments and flows. A sequence can be seen as a map on which paths connect sources to destinations, passing through different locations with different purposes. The order of declaration does not matter: treatments run whenever data is ready to be processed, and all treatments can be considered as running simultaneously.

Within sequences are treatments, which are basically other sequences that take parameters and inputs, and provide outputs to the hosting sequence. Treatments are declared once within a sequence, and can then be connected as many times as needed.

sequence MySequence(var foo: u64, var bar: f64)
{
    TreatmentA(alpha=foo)
    TreatmentB(beta=bar, gamma=0.01)

    TreatmentA.output --> TreatmentB.input
}
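As a loose analogy only, and not a description of how the Mélodium engine works internally, the sequence above can be pictured in Rust as two concurrent stages joined by a channel: each stage processes values as soon as they arrive, and the order in which the stages are started does not matter. The bodies of treatment_a and treatment_b, and the source vector feeding TreatmentA, are hypothetical, since the example does not say what TreatmentA and TreatmentB compute.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for TreatmentA(alpha=foo): adds `alpha` to each value.
fn treatment_a(alpha: u64, source: Vec<u64>, output: mpsc::Sender<u64>) {
    for value in source {
        // Each value is sent downstream as soon as it is produced.
        if output.send(value + alpha).is_err() {
            break; // the downstream treatment has stopped
        }
    }
    // `output` is dropped here, which closes the connection and ends the stream.
}

/// Hypothetical stand-in for TreatmentB(beta=bar, gamma=0.01): scales and shifts values.
fn treatment_b(beta: f64, gamma: f64, input: mpsc::Receiver<u64>) -> Vec<f64> {
    // `iter()` waits for each incoming value and stops once the sender is dropped.
    input.iter().map(|value| value as f64 * beta + gamma).collect()
}

/// Mirrors MySequence(foo, bar): both treatments are declared, then
/// TreatmentA.output is connected to TreatmentB.input through a channel.
fn my_sequence(foo: u64, bar: f64, source: Vec<u64>) -> Vec<f64> {
    let (tx, rx) = mpsc::channel::<u64>();
    // The downstream treatment is started first: start order does not matter.
    let b = thread::spawn(move || treatment_b(bar, 0.01, rx));
    let a = thread::spawn(move || treatment_a(foo, source, tx));
    a.join().expect("TreatmentA panicked");
    b.join().expect("TreatmentB panicked")
}

fn main() {
    println!("{:?}", my_sequence(1, 2.0, vec![1, 2, 3]));
}
```

The channel plays the role of the `-->` connection: it carries values of a single type from one output to one input.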

Models

Models are elements that live through the whole execution of a program. They can be declared and configured, and then instantiated in a sequence declaration.

use std/audio/host::AudioOutput

model HostAudioOutput() : AudioOutput
{
    early_end = false
}

Reference for AudioOutput

Models are instantiated by sequences in their prelude.

sequence PlayAudio(const early_end: bool)
  model audioOut: AudioOutput(early_end=early_end)
  input signal: Stream<f32>
{
    SendAudio[output=audioOut]()

    Self.signal -> SendAudio.signal
}

Reference for SendAudio

In most cases, models are instantiated internally by sequences and not exposed: developers can directly call model-dependent sequences without instantiating their own model, just giving the required parameters to the sequence. Users typically provide their own defined model to configure elements such as connections to external software or interfaces.

Connections

Connections are the paths that data follows. A connection can link treatment outputs to inputs, but can also refer to the inputs and outputs of the hosting sequence itself. A connection always links an output and an input of the same type.

sequence Demonstration()
  input  floating_point_value: Stream<f32>
  output positive_increased_value: Stream<u64>
  output integer_value: Stream<i64>
{
    ToU64()
    ToI64()
    StaticAdd(value=1)

    Self.floating_point_value -> ToU64.value,value -> StaticAdd.value,value -> Self.positive_increased_value
    Self.floating_point_value -> ToI64.value,value --------------------------> Self.integer_value
}

Reference for ToU64, ToI64, StaticAdd

Multiple connections from the same element are perfectly legal; however, connecting more than one source to the same treatment input or sequence output is forbidden. Also, while leaving a treatment output unused is legal, every treatment input must be satisfied (connected). Finally, all outputs of a sequence must be satisfied.
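These rules are mechanical enough to be checked before anything runs. The Rust sketch below is a hypothetical checker, not Mélodium's own validation code; it assumes a simplified model in which each port is identified by an element name, a port name and a type name, and in which `targets` lists every treatment input and every sequence output, i.e. every port that must receive data.

```rust
use std::collections::HashMap;

/// One end of a connection: element name, port name and data type.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Port {
    element: &'static str,
    port: &'static str,
    ty: &'static str,
}

impl Port {
    fn new(element: &'static str, port: &'static str, ty: &'static str) -> Self {
        Port { element, port, ty }
    }
}

/// Hypothetical checker: same type on both ends, and every target
/// (treatment input or sequence output) connected exactly once.
fn check_connections(connections: &[(Port, Port)], targets: &[Port]) -> Result<(), String> {
    let mut incoming: HashMap<&Port, usize> = HashMap::new();
    for (from, to) in connections {
        // A connection always links an output and an input of the same type.
        if from.ty != to.ty {
            return Err(format!(
                "type mismatch: {}.{} is {} but {}.{} is {}",
                from.element, from.port, from.ty, to.element, to.port, to.ty
            ));
        }
        // Fan-out from `from` is fine; only what arrives at `to` is counted.
        *incoming.entry(to).or_insert(0) += 1;
    }
    for target in targets {
        match incoming.get(target).copied().unwrap_or(0) {
            0 => return Err(format!("{}.{} is not connected", target.element, target.port)),
            1 => {}
            _ => {
                return Err(format!(
                    "{}.{} is connected more than once",
                    target.element, target.port
                ))
            }
        }
    }
    Ok(())
}

fn main() {
    let source = Port::new("Self", "floating_point_value", "f32");
    let to_u64 = Port::new("ToU64", "value", "f32");
    let to_i64 = Port::new("ToI64", "value", "f32");

    // One output feeding two inputs: legal.
    let fan_out = [(source.clone(), to_u64.clone()), (source.clone(), to_i64.clone())];
    println!("{:?}", check_connections(&fan_out, &[to_u64.clone(), to_i64.clone()]));

    // Two connections arriving at the same input: rejected.
    let doubled = [(source.clone(), to_u64.clone()), (source.clone(), to_u64.clone())];
    println!("{:?}", check_connections(&doubled, &[to_u64.clone()]));
}
```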

Inputs and outputs (and therefore connections) are either streaming or blocking. A streaming connection, Stream<…>, is expected to continuously send values of the specified type. A blocking connection, Block<…>, is expected to send its data all at once. This distinction mainly relies on the core treatments being used and the intrinsic logic applied to the data. What developers should keep in mind is that streaming is the default, unless blocking is required.

A specific kind of connection uses the void data type. It is useful to transmit the information that something happens or should be triggered, such as scheduled events (Block<void>), or to indicate the continuation of something that doesn't convey data by itself (Stream<void>).
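In Rust terms, and only as an analogy (the correspondence is an assumption, not a description of Mélodium's internals), a Stream<T> resembles a channel that delivers many values over time until it is closed, a Block<T> one that delivers its data in a single send, and void resembles Rust's unit type `()`, which carries no data. The function names below are hypothetical.

```rust
use std::sync::mpsc;

/// Stream<u32>-like input: many values, handled one by one until the stream closes.
fn sum_stream(input: mpsc::Receiver<u32>) -> u64 {
    input.iter().map(u64::from).sum()
}

/// Block<u32>-like input: a single value, received all at once.
fn read_block(input: mpsc::Receiver<u32>) -> Option<u32> {
    input.recv().ok()
}

/// Stream<void>-like input: `()` carries no data, only the fact that something happened.
fn count_signals(input: mpsc::Receiver<()>) -> usize {
    input.iter().count()
}

fn demo() -> (u64, Option<u32>, usize) {
    let (stream_tx, stream_rx) = mpsc::channel();
    for value in 1..=4u32 {
        stream_tx.send(value).unwrap();
    }
    drop(stream_tx); // closing the sender ends the stream

    let (block_tx, block_rx) = mpsc::channel();
    block_tx.send(42u32).unwrap();

    let (signal_tx, signal_rx) = mpsc::channel();
    signal_tx.send(()).unwrap();
    signal_tx.send(()).unwrap();
    drop(signal_tx);

    (sum_stream(stream_rx), read_block(block_rx), count_signals(signal_rx))
}

fn main() {
    println!("{:?}", demo());
}
```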

Tracks

Tracks are at the same time the most implicit and the most important concept in Mélodium. When developers instantiate models and connect treatments together, they create a potential track. A track is the whole set of treatments and the flows between them, which are created together, live together, and disappear together.

A track always originates from a model, which requests its initialization when needed and as many times as needed: for each file found, each incoming connection, or whatever the model's purpose involves. Each track follows the same defined chain of treatments, but independently of the others. This is one of the core elements giving Mélodium its strong scalability.
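To picture tracks, here is a Rust sketch under loose assumptions: a hypothetical model (spawn_tracks) requests one track per item it discovers, and every track runs the same hypothetical chain of treatments (parse, then double) on its own thread. Mélodium's actual scheduling is not shown here and is likely quite different.

```rust
use std::thread;

/// The chain of treatments every track follows (hypothetical): parse, then double.
fn track(item: String) -> Result<i64, String> {
    let number = item
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("{item:?}: {e}"))?;
    Ok(number * 2)
}

/// Hypothetical model: requests one track for each item it discovers
/// (a file found, an incoming connection...), as many times as needed.
fn spawn_tracks(items: Vec<&str>) -> Vec<Result<i64, String>> {
    let handles: Vec<thread::JoinHandle<Result<i64, String>>> = items
        .into_iter()
        .map(|item| {
            let item = item.to_string();
            // Each track is created, runs and ends on its own.
            thread::spawn(move || track(item))
        })
        .collect();
    // A failing track only affects its own result, not the other tracks.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("track panicked"))
        .collect()
}

fn main() {
    for result in spawn_tracks(vec!["1", "20", "oops", "300"]) {
        println!("{result:?}");
    }
}
```

Because each track owns its data and shares nothing with its siblings, adding more items only adds more independent tracks, which is the scalability property described above.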

Programming

This section is under construction; please go directly to the subsections.

Script files

Launch

Mélodium scripts can be launched using the melodium command:

melodium main_script.mel

Or if the main script includes a shebang:

./main_script.mel

The recommended shebang is #!/usr/bin/env melodium.

Hierarchy and inclusion

There are 3 roots when referring to external elements in Mélodium:

  • main, relative to the main script file;
  • core, relative to the built-in language elements;
  • std, relative to the standard library.

The local root can also be used; it refers to the location of the current script. Areas in Mélodium match the filesystem hierarchy.

// Import element John from local/foo/bar, coming from file ./foo/bar.mel
use local/foo/bar::John

// Import element Doe from main/baz, coming from file <main dir>/baz.mel
use main/baz::Doe
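The mapping from areas to files can be sketched as a small resolver. The helpers parse_use and resolve_area are hypothetical, written only for illustration: they cover the local and main roots as described above, assume `current_dir` is the directory of the current script, and deliberately leave core and std unresolved. Mélodium's real loader may behave differently.

```rust
use std::path::{Path, PathBuf};

/// Splits `use main/baz::Doe` into its area ("main/baz") and element ("Doe").
fn parse_use(statement: &str) -> Option<(&str, &str)> {
    statement.trim().strip_prefix("use ")?.trim().split_once("::")
}

/// Maps an area to the script file it comes from, for the `local` and `main` roots.
/// `core` and `std` are not resolved by this sketch, and unknown roots are rejected.
fn resolve_area(area: &str, current_dir: &Path, main_dir: &Path) -> Option<PathBuf> {
    let mut segments = area.split('/');
    let base = match segments.next()? {
        "local" => current_dir,
        "main" => main_dir,
        _ => return None,
    };
    let mut path = base.to_path_buf();
    let mut has_segment = false;
    for segment in segments {
        path.push(segment);
        has_segment = true;
    }
    if !has_segment {
        return None; // a bare root does not name a file
    }
    // Areas match the filesystem hierarchy: the last segment is a `.mel` file.
    path.set_extension("mel");
    Some(path)
}

fn main() {
    let current = Path::new("project/scripts");
    let main_dir = Path::new("project");
    for statement in ["use local/foo/bar::John", "use main/baz::Doe"] {
        if let Some((area, element)) = parse_use(statement) {
            println!("{element} <- {:?}", resolve_area(area, current, main_dir));
        }
    }
}
```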

Note about encoding

Mélodium script files are plain UTF-8 text, without a byte order mark. This choice is made for three main reasons:

  1. a choice on encoding, even arbitrary, is better than no choice;
  2. Unicode provides the widest support for characters from all human languages and scripts, existing and future, ensuring continuity;
  3. the Mélodium engine is implemented in Rust, itself natively representing text as UTF-8.
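Both rules (valid UTF-8, no byte order mark) can be enforced on a script's raw bytes. The function below is a hypothetical validation step in Rust, a sketch rather than the check Mélodium itself performs.

```rust
/// The UTF-8 encoding of U+FEFF, used as a byte order mark.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Checks raw script bytes against the rule above: valid UTF-8, no byte order mark.
fn check_script_bytes(bytes: &[u8]) -> Result<&str, String> {
    if bytes.starts_with(&UTF8_BOM) {
        return Err("script starts with a byte order mark".to_string());
    }
    // `from_utf8` rejects any byte sequence that is not valid UTF-8.
    std::str::from_utf8(bytes).map_err(|e| format!("script is not valid UTF-8: {e}"))
}

fn main() {
    let scripts: [&[u8]; 3] = [
        "use main/baz::Doe\n".as_bytes(),   // plain UTF-8: accepted
        b"\xEF\xBB\xBFuse main/baz::Doe\n", // UTF-8 with a BOM: rejected
        &[0x4D, 0xE9, 0x6C],                // "Mél" in Latin-1: not UTF-8, rejected
    ];
    for bytes in scripts {
        println!("{:?}", check_script_bytes(bytes));
    }
}
```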

Data types

Mélodium has multiple core data types, split into four main categories, plus bool, byte and void:

  • unsigned integers
  • signed integers
  • floating-point numbers
  • textual data

All these types are described in their respective sections. Each type has been selected because it meets a very specific purpose.

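Unsigned integers | Signed integers | Floating-point numbers | Text   | Logic
----------------- | --------------- | ---------------------- | ------ | -----
u8                | i8              | f32                    | char   | byte
u16               | i16             | f64                    | string | bool
u32               | i32             |                        |        | void
u64               | i64             |                        |        |
u128              | i128            |                        |        |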

Byte

Type | Values         | Size
---- | -------------- | ---------------
byte | Any 8-bit data | 8 bits / 1 byte

A byte is the most atomic unit of data that Mélodium can manipulate. It represents any 8-bit data, without any further assumption about what it could be.

Bool

Type | Values        | Size
---- | ------------- | ---------------
bool | true or false | 8 bits / 1 byte

A bool is a boolean value that can be set to either true or false. Conversion treatments are available to turn bools into bytes, numbers, or other kinds of values.

Void

Type | Values | Size
---- | ------ | ---------------
void | None   | 0 bits / 0 bytes

The void data type does not hold any value; it only indicates that something exists. It is used in connections to transmit triggers or streaming indicators.
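Since the Mélodium engine is written in Rust, the sizes above can be cross-checked against Rust's closest primitive types. That u8, bool and `()` are the exact counterparts of byte, bool and void is an assumption made for illustration, and bool_to_byte is a hypothetical helper, not a Mélodium treatment.

```rust
use std::mem::size_of;

/// A conversion analogous to turning a bool into a byte (true gives 1, false gives 0).
fn bool_to_byte(value: bool) -> u8 {
    u8::from(value)
}

fn main() {
    // Assumption: Mélodium's byte, bool and void correspond to Rust's u8, bool and ().
    println!("byte: {} byte(s)", size_of::<u8>());
    println!("bool: {} byte(s)", size_of::<bool>());
    println!("void: {} byte(s)", size_of::<()>());
    println!("true as byte: {}", bool_to_byte(true));
}
```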

Unsigned integers

Type | Range                      | Size
---- | -------------------------- | -------------------
u8   | 0 to 2⁸-1 (255)            | 8 bits / 1 byte
u16  | 0 to 2¹⁶-1 (65,535)        | 16 bits / 2 bytes
u32  | 0 to 2³²-1 (4,294,967,295) | 32 bits / 4 bytes
u64  | 0 to 2⁶⁴-1 (> 18×10¹⁸)     | 64 bits / 8 bytes
u128 | 0 to 2¹²⁸-1 (> 34×10³⁷)    | 128 bits / 16 bytes

Signed integers

Type | Range                                          | Size
---- | ---------------------------------------------- | -------------------
i8   | -2⁷ (-128) to 2⁷-1 (127)                       | 8 bits / 1 byte
i16  | -2¹⁵ (-32,768) to 2¹⁵-1 (32,767)               | 16 bits / 2 bytes
i32  | -2³¹ (-2,147,483,648) to 2³¹-1 (2,147,483,647) | 32 bits / 4 bytes
i64  | -2⁶³ (≈ -9×10¹⁸) to 2⁶³-1 (≈ 9×10¹⁸)           | 64 bits / 8 bytes
i128 | -2¹²⁷ (≈ -17×10³⁷) to 2¹²⁷-1 (≈ 17×10³⁷)       | 128 bits / 16 bytes

Floating-point numbers

Type | Values          | Size
---- | --------------- | -----------------
f32  | See description | 32 bits / 4 bytes
f64  | See description | 64 bits / 8 bytes

Floating-point numbers follow IEEE 754-2008. They can mostly be thought of as numbers with a fractional part; for a deeper explanation, please refer to the Single-precision floating-point format (for f32) and Double-precision floating-point format (for f64) articles on Wikipedia.

Besides positive or negative values, they can also be in one of these three special states:

  • positive infinity, which can result from something like 1.0/0.0;
  • negative infinity, which can result from something like -1.0/0.0;
  • not a number (NaN), which can result from the square root of a negative number (whose mathematical result is an imaginary number).
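Rust's f64 also follows IEEE 754, so the three special states can be reproduced directly; this sketch assumes Mélodium's f64 behaves the same way.

```rust
/// Produces the three special states described above.
fn special_values() -> (f64, f64, f64) {
    let positive_infinity = 1.0_f64 / 0.0;
    let negative_infinity = -1.0_f64 / 0.0;
    let not_a_number = (-1.0_f64).sqrt();
    (positive_infinity, negative_infinity, not_a_number)
}

fn main() {
    let (pos, neg, nan) = special_values();
    println!("{pos} {neg} {nan}");
    // NaN is the only floating-point value that is not equal to itself.
    println!("nan == nan: {}", nan == nan);
}
```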

Textual data

Type   | Values                         | Size
------ | ------------------------------ | -----------------
char   | Any valid Unicode scalar value | 32 bits / 4 bytes
string | Any valid UTF-8 text           | Variable

All textual information is represented as Unicode. A char uses 4 bytes to store any Unicode scalar value, as defined in the Unicode Standard. Unlike many other programming languages, Mélodium does not consider a char and a byte (or a combination of bytes) to be equivalent at all, for many reasons, such as:

  • a byte only has 256 possible values, while all human languages combined have many more "letters";
  • a single character encoded in UTF-8 (Unicode Transformation Format) can take up to 4 bytes;
  • many values are not valid according to Unicode (such as surrogate code points, or anything above U+10FFFF);
  • the Unicode Standard provides a strong, universal definition of which textual data can be represented;
  • it keeps data types reliable, each one having its own purpose: a char guarantees valid text data, while a byte only assumes it is data.
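Several of these points can be observed with Rust's char type, which, like Mélodium's char, holds exactly one Unicode scalar value. is_scalar_value is a hypothetical helper built on the standard char::from_u32.

```rust
/// True if `code` is a Unicode scalar value, i.e. a valid `char`.
fn is_scalar_value(code: u32) -> bool {
    char::from_u32(code).is_some()
}

fn main() {
    // 'é' (U+00E9) is one char but two bytes in UTF-8.
    println!("U+00E9 valid: {}, UTF-8 bytes: {}", is_scalar_value(0xE9), '\u{E9}'.len_utf8());
    // Musical symbol G clef (U+1D11E): one char, four bytes in UTF-8.
    println!("U+1D11E UTF-8 bytes: {}", '\u{1D11E}'.len_utf8());
    // Surrogates and values above U+10FFFF are not scalar values.
    println!("U+D800 valid: {}", is_scalar_value(0xD800));
    println!("0x110000 valid: {}", is_scalar_value(0x11_0000));
    // A lone byte 0xE9 is not valid UTF-8 text.
    println!("byte 0xE9 as text: {:?}", std::str::from_utf8(&[0xE9]));
}
```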

The string data type can represent any UTF-8 text, and its size depends on the length of the text. Interestingly, strings are not a combination of chars, but real UTF-8 strings. Storing the text Mélodium as a vector of chars uses 32 bytes (8 chars × 4 bytes), but as a string it uses only 9 bytes, since é takes 2 bytes in UTF-8 and every other letter takes 1. This technical subtlety is transparent for users, and conversion treatments are provided if needed.
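The Mélodium example can be reproduced with Rust's own String and char types. This is an analogy: the counts cover the text data only, not the overhead of the containers themselves, and the text is written with an escaped, precomposed é so the byte count does not depend on how the source file is normalized.

```rust
use std::mem::size_of;

/// Returns (bytes used as a vector of chars, bytes used as a UTF-8 string).
fn storage_sizes(text: &str) -> (usize, usize) {
    let chars: Vec<char> = text.chars().collect();
    let as_chars = chars.len() * size_of::<char>(); // 4 bytes per char
    let as_string = text.len(); // `len` counts UTF-8 bytes, not chars
    (as_chars, as_string)
}

fn main() {
    // "M\u{e9}lodium" is "Mélodium" with a precomposed 'é'.
    let (as_chars, as_string) = storage_sizes("M\u{e9}lodium");
    println!("as chars: {as_chars} bytes, as string: {as_string} bytes");
}
```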

Mélodium can handle many encodings through its encoders and decoders, taking and providing byte streams.

Parameters

Sequences and models declare parameters. As in any language, parameters are elements given by the caller to set up the behavior of the model or sequence.

Const and var

In Mélodium, parameters can be either constant or variable, respectively declared with the keywords const and var. A constant parameter designates something that keeps the same value during the whole execution, on all tracks generated through the given call. Constants are mostly used to configure models, whose parameters must all be constant. A variable parameter designates something that may have a different value on each generated track.

While a constant can be used to set up both constant and variable parameters, variable elements (parameters, but also contexts) can only be used to set up other variables.
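The assignment rule can be summarized as a small compatibility check. In this Rust sketch, Variability and can_set are hypothetical names, and contexts are treated like any other variable element.

```rust
/// Whether a value is fixed for all tracks or may differ per track.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Variability {
    Const,
    Var,
}

/// Can a `source` value be used to set up a parameter declared as `target`?
fn can_set(source: Variability, target: Variability) -> bool {
    match (source, target) {
        // A constant can set up both constant and variable parameters.
        (Variability::Const, _) => true,
        // A variable (parameter or context) can only set up other variables.
        (Variability::Var, Variability::Var) => true,
        (Variability::Var, Variability::Const) => false,
    }
}

fn main() {
    // PlayAudio passes its `const early_end` to a model parameter, which must be const.
    println!("const -> model parameter: {}", can_set(Variability::Const, Variability::Const));
    // The same call would be rejected if early_end were declared `var`.
    println!("var -> model parameter: {}", can_set(Variability::Var, Variability::Const));
}
```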

Reference

The Mélodium reference is available on doc.melodium.tech. The whole Standard Library is documented there, and can be browsed through areas.

Runtime

First of all, Mélodium is not a compiled language: it uses a runtime engine. The script files are fully parsed, and their logic is built and checked before any execution starts. When a Mélodium script is launched, multiple stages happen:

  1. Script textual parsing
  2. Script semantic representation building
  3. Usage resolution
  4. Descriptors building
  5. Logic building
  6. Model instantiation
  7. Execution triggering
  8. Track launches
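The key property of this ordering is that every check happens before execution is triggered. Below is a hypothetical Rust sketch of such a staged launch: the stage names mirror the list, while the launch function and its error handling are assumptions, not the engine's API.

```rust
/// The launch stages listed above, in order.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Stage {
    TextualParsing,
    SemanticRepresentation,
    UsageResolution,
    DescriptorsBuilding,
    LogicBuilding,
    ModelInstantiation,
    ExecutionTriggering,
    TrackLaunches,
}

const STAGES: [Stage; 8] = [
    Stage::TextualParsing,
    Stage::SemanticRepresentation,
    Stage::UsageResolution,
    Stage::DescriptorsBuilding,
    Stage::LogicBuilding,
    Stage::ModelInstantiation,
    Stage::ExecutionTriggering,
    Stage::TrackLaunches,
];

/// Runs each stage in order and stops at the first failure, so nothing
/// executes unless every preparation stage has succeeded.
/// Returns the stages that completed, and the first error if any.
fn launch(
    mut run_stage: impl FnMut(Stage) -> Result<(), String>,
) -> (Vec<Stage>, Result<(), String>) {
    let mut completed = Vec::new();
    for stage in STAGES {
        if let Err(error) = run_stage(stage) {
            return (completed, Err(error));
        }
        completed.push(stage);
    }
    (completed, Ok(()))
}

fn main() {
    // Simulate an unknown `use` element: usage resolution fails, execution never starts.
    let (completed, result) = launch(|stage| {
        if stage == Stage::UsageResolution {
            Err("unknown element main/baz::Doe".to_string())
        } else {
            Ok(())
        }
    });
    println!("{completed:?} -> {result:?}");
}
```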

About the author

Quentin Vignaud is an IT engineer who graduated from CESI, and holds an M.Sc. in computer science from UQÀM. Working at Doctolib as a data and software engineer, he originally authored Mélodium during his studies at UQÀM, while doing scientific research in music analysis.

Website: https://www.quentinvignaud.com/