Introduction
Mélodium is a language and tool for implementing robust data flows and transformations. It offers a wide range of operations for many kinds of data, from simple text parsing to live transformation of media streams.
Originally designed for scientific research in audio and music, it has since been extended to handle many big data problems. It runs on many platforms and environments and can process large amounts of data in real time with a high level of optimization.
This book summarizes its usage and presents the purpose of Mélodium, its internal concepts, and how to program and work with it. Please refer to the Standard Reference for detailed documentation, and to the repository for development. Also take a look at the Official Project Website.
Mélodium and this book are a work in progress, intended to evolve and grow significantly over time. The information presented here might not be up to date with the current state of the project. All this work is done with passion, and any comment is welcome.
Purpose
Mélodium was created to fill the gap between most programming languages and the idea of data streams.
Most languages, whatever paradigms they use, see data as individual boxes that need to be moved in memory step by step. Developers need to define how these boxes are organized and passed as parameters, processed through loops, and returned, while keeping track of what is happening and managing error cases in many locations (or assuming all will be fine, otherwise the program will crash). This approach can be sufficient in many cases, and many languages already do a great job at it.
Now comes the problem of data streams.
What if we need a program able to run continuously and efficiently on data without being able to assume it will fit in a box, or even in memory? What if we need it to continue processing even if a problem happens somewhere or at some point? What if we need a program that ensures everything is in order at startup, and that safely handles, by design, the errors that may occur?
Here comes Mélodium.
Concepts
The Mélodium language is organized around four main concepts: treatments, models, connections, and tracks.
Treatments
Treatments are the main element of the language, and they can take parameters. Unlike functions, which list instructions to execute and apply changes to variables, treatments describe flows of operations that apply to data. A treatment can be seen as a map on which paths connect sources to destinations, passing through different locations with different purposes. The order of declaration does not matter: treatments run when data is ready to be processed, and all treatments can be considered as running simultaneously.
Treatments contain other treatments, which also take parameters and inputs, and provide outputs to the hosting treatment. Treatments are declared once, and can then be connected as many times as needed.
treatment myTreatment(var foo: u64, var bar: f64)
{
    treatmentA(alpha=foo)
    treatmentB(beta=bar, gamma=0.01)
    treatmentA.output -> treatmentB.input
}
Models
Models are elements that live through the whole execution of a program. They can be declared and configured, and then instantiated in a treatment declaration.
use audio/host::AudioOutput
model HostAudioOutput() : AudioOutput
{
    early_end = false
}
The audio module is in active development; no reference is available for now.
Models are instantiated by treatments in their prelude.
treatment playAudio(const early_end: bool)
    model audioOut: AudioOutput(early_end=early_end)
    input signal: Stream<f32>
{
    sendAudio[output=audioOut]()
    Self.signal -> sendAudio.signal
}
In most cases, models are instantiated internally by treatments and not exposed: developers can call model-dependent treatments directly without instantiating their own model, just by giving the required parameters to the treatment. Users typically provide their own model when they need to configure elements such as external software connections or interfaces.
Connections
Connections are basically the paths data will follow. A connection can link treatment outputs to inputs, but can also refer to the inputs and outputs of the hosting treatment itself. A connection always links an output and an input of the same type.
treatment Demonstration()
    input floating_point_value: Stream<f32>
    output positive_value: Stream<u64>
    output integer_value: Stream<i64>
{
    toU64()
    toI64()
    Self.floating_point_value -> toU64.value,value -> Self.positive_value
    Self.floating_point_value -> toI64.value,value -> Self.integer_value
}
Multiple connections from the same element are perfectly legal; however, overloading a treatment input or a host treatment output (Self) is forbidden.
Also, while leaving a treatment output unused is legal, every input must be satisfied.
Finally, all host treatment outputs must be satisfied.
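The three rules above can be pictured as a small validation pass. The following is only an illustrative sketch written in Rust (not Mélodium code, and not the engine's actual implementation): the function name check_connections and the representation of endpoints as plain strings are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Checks the connection rules on a list of (source, destination) pairs.
///
/// `required` lists every destination that must be fed: the inputs of the
/// inner treatments and the outputs of the host treatment (`Self`).
/// Only destinations are counted, so a source may be used zero or many times.
/// Hypothetical helper, for illustration only.
fn check_connections(required: &[&str], connections: &[(&str, &str)]) -> Result<(), String> {
    // Count how many connections arrive at each destination.
    let mut incoming: HashMap<&str, usize> = HashMap::new();
    for &(_source, destination) in connections {
        *incoming.entry(destination).or_insert(0) += 1;
    }

    for endpoint in required {
        match incoming.get(endpoint).copied().unwrap_or(0) {
            // No connection arrives: the input or host output is not satisfied.
            0 => return Err(format!("{endpoint} is not satisfied")),
            // Exactly one connection: valid.
            1 => {}
            // More than one connection: the destination is overloaded.
            n => return Err(format!("{endpoint} is overloaded ({n} connections)")),
        }
    }
    Ok(())
}
```

With the four connections of the Demonstration treatment, such a check would succeed; dropping one of them, or adding a second connection into Self.positive_value, would make it fail.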
Inputs and outputs (and so connections) are either streaming or blocking. A streaming connection Stream<…> is expected to continuously send values of the specified type.
A blocking connection Block<…> is expected to send everything at once.
This distinction mainly depends on the core treatments that are used and the intrinsic logic applied to the data.
What developers should keep in mind is that streaming is the default unless blocking is required.
A specific kind of connection exists that uses the data type void. It is useful for signalling that something happened or should be triggered, such as scheduled events (Block<void>), or for indicating the continuation of something that does not convey data by itself (Stream<void>).
Tracks
Tracks are at once the most implicit and the most important thing in Mélodium. When a developer instantiates models and connects treatments together, this creates a potential track. A track is the whole set of treatments and the flows between them, which are created together, live together, and disappear together.
A track always originates from a model, which requests its initialization when needed and as many times as needed: for each file found, each incoming connection, or whatever the purpose of the model calls for. Each track follows the same defined chain of treatments, but on its own. This is one of the core elements that give Mélodium its strong scalability.
Programming
This section is being built; please go directly to the subsections.
Script files
Launch
Mélodium scripts can be launched using the melodium command:
melodium main_script.mel
Or, if the main script includes a shebang:
./main_script.mel
The recommended shebang is #!/usr/bin/env melodium.
A script file must contain basic identity information to be used as an entrypoint:
- name (required),
- version (optional, following semver semantics),
- requirements (optional).
#!/usr/bin/env melodium
#! name = my_script
#! version = 0.1.0
#! require = conv fs …
// Content
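To make the role of these header lines concrete, here is a rough sketch in Rust (not Mélodium code) of how `#! key = value` lines could be collected and how the "name is required" rule could be checked. The exact header grammar is not specified in this book, so the parsing rules below, as well as the names parse_header and is_entrypoint, are assumptions for illustration and do not describe how the melodium command actually reads scripts.

```rust
use std::collections::HashMap;

/// Collects `#! key = value` lines from the top of a script into a map.
/// Reading stops at the first line that does not start with `#!`.
/// Hypothetical helper: the real header grammar may differ.
fn parse_header(script: &str) -> HashMap<String, String> {
    let mut fields = HashMap::new();
    for line in script.lines() {
        let Some(rest) = line.strip_prefix("#!") else { break };
        // The shebang line (`#!/usr/bin/env melodium`) has no `=` and is skipped.
        if let Some((key, value)) = rest.split_once('=') {
            fields.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    fields
}

/// A script can serve as an entrypoint only if it declares a non-empty name.
fn is_entrypoint(script: &str) -> bool {
    parse_header(script)
        .get("name")
        .map_or(false, |name| !name.is_empty())
}
```

Under these assumptions, the header shown above yields the name my_script and the version 0.1.0, while a file containing only `// Content` would not qualify as an entrypoint.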
Note about encoding
Mélodium script files are plain UTF-8 text, without byte order mark. This choice is made for three main reasons:
- a choice on encoding, even arbitrary, is better than no choice;
- Unicode provides the widest support for characters from all human languages and scripts, existing and future, ensuring continuity;
- the Mélodium engine is implemented in Rust, itself natively representing text as UTF-8.
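As a rough illustration of what "plain UTF-8 text, without byte order mark" means for a file's raw bytes, the following Rust sketch rejects input that starts with the UTF-8 byte order mark (bytes EF BB BF) or that is not valid UTF-8. The function name check_script_encoding is hypothetical, and this is not claimed to be the check the engine performs.

```rust
/// Returns the script text if `bytes` is valid UTF-8 without a byte order mark,
/// or an error message otherwise. Hypothetical helper, for illustration only.
fn check_script_encoding(bytes: &[u8]) -> Result<&str, String> {
    // The UTF-8 encoding of U+FEFF, used as a byte order mark.
    const BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];
    if bytes.starts_with(&BOM) {
        return Err("script starts with a byte order mark".to_string());
    }
    // `from_utf8` validates the bytes without copying them.
    std::str::from_utf8(bytes).map_err(|e| format!("invalid UTF-8: {e}"))
}
```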
Data types
Mélodium has multiple core data types, spread across four main categories, plus bool, byte and void:
- unsigned integers
- signed integers
- floating-point numbers
- textual data
All those types are described in their respective sections. Each type has been selected because it meets a very specific purpose.
Unsigned integers | Signed integers | Floating-point numbers | Text | Logic |
---|---|---|---|---|
u8 | i8 | f32 | char | byte |
u16 | i16 | f64 | string | bool |
u32 | i32 | | | void |
u64 | i64 | | | |
u128 | i128 | | | |
Byte
Type | Values | Size |
---|---|---|
byte | Any 8-bit data | 8 bits / 1 byte |
A byte is basically the most atomic unit of data that can be manipulated through Mélodium.
It represents any 8-bit data, without further assumption about what it could be.
Bool
Type | Values | Size |
---|---|---|
bool | true or false | 8 bits / 1 byte |
A bool is a boolean value that can be set to either true or false.
Conversion treatments are available to turn bools into bytes, numbers, or any other kind of value.
Void
Type | Values | Size |
---|---|---|
void | None | 0 bits / 0 bytes |
The void data type does not hold any value; it just indicates that something exists.
It is used in connections to transmit triggers or streaming indicators.
Unsigned integers
Type | Range | Size |
---|---|---|
u8 | 0 to 2⁸-1 (255) | 8 bits / 1 byte |
u16 | 0 to 2¹⁶-1 (65,535) | 16 bits / 2 bytes |
u32 | 0 to 2³²-1 (4,294,967,295) | 32 bits / 4 bytes |
u64 | 0 to 2⁶⁴-1 ( > 18×10¹⁸) | 64 bits / 8 bytes |
u128 | 0 to 2¹²⁸-1 ( > 34×10³⁷) | 128 bits / 16 bytes |
Signed integers
Type | Range | Size |
---|---|---|
i8 | -2⁷ (-128) to 2⁷-1 (127) | 8 bits / 1 byte |
i16 | -2¹⁵ (-32,768) to 2¹⁵-1 (32,767) | 16 bits / 2 bytes |
i32 | -2³¹ (-2,147,483,648) to 2³¹-1 (2,147,483,647) | 32 bits / 4 bytes |
i64 | -2⁶³ ( ≈ -9.2×10¹⁸) to 2⁶³-1 ( ≈ 9.2×10¹⁸) | 64 bits / 8 bytes |
i128 | -2¹²⁷ ( ≈ -17×10³⁷) to 2¹²⁷-1 ( ≈ 17×10³⁷) | 128 bits / 16 bytes |
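The bounds listed in the integer tables can be checked with a few lines of Rust (not Mélodium code). This sketch assumes that Mélodium integer types have the same ranges as the Rust types of the same name, which seems plausible since the engine is implemented in Rust but is not stated explicitly in this book; the helper names are hypothetical.

```rust
/// Largest value of a 64-bit unsigned integer, 2^64 - 1.
fn u64_max() -> u64 {
    u64::MAX
}

/// (minimum, maximum) of a 64-bit signed integer: -2^63 and 2^63 - 1.
fn i64_range() -> (i64, i64) {
    (i64::MIN, i64::MAX)
}

/// (minimum, maximum) of a 128-bit signed integer: -2^127 and 2^127 - 1.
fn i128_range() -> (i128, i128) {
    (i128::MIN, i128::MAX)
}

/// Computes 2^bits - 1 in 128-bit arithmetic, valid for 1 <= bits <= 127.
fn two_pow_minus_one(bits: u32) -> u128 {
    (1u128 << bits) - 1
}
```

For instance, two_pow_minus_one(64) equals u64_max(), and two_pow_minus_one(63) equals the upper bound returned by i64_range().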
Floating-point numbers
Type | Values | Size |
---|---|---|
f32 | See description | 32 bits / 4 bytes |
f64 | See description | 64 bits / 8 bytes |
Floating-point numbers are defined in IEEE 754-2008. They can mostly be considered as decimal numbers; for a deeper explanation, please refer to the Single-precision floating-point format (for f32) and Double-precision floating-point format (for f64) articles on Wikipedia.
They can store positive or negative values, but can also be in one of these three states:
- positive infinity, which can result from something like 1.0/0.0;
- negative infinity, which can result from something like -1.0/0.0;
- not a number, which can result from taking the square root of a negative number (whose result is not a real number).
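The three special states can be observed directly with IEEE 754 arithmetic. The sketch below uses Rust's f64 (not Mélodium code), on the assumption that Mélodium's f64 behaves like any IEEE 754 double; the function name classify is made up for this example.

```rust
/// Names the state of a floating-point value: one of the three special
/// states, or "finite" for an ordinary number.
fn classify(x: f64) -> &'static str {
    if x.is_nan() {
        "not a number"
    } else if x == f64::INFINITY {
        "positive infinity"
    } else if x == f64::NEG_INFINITY {
        "negative infinity"
    } else {
        "finite"
    }
}

/// Square root of a negative number, which IEEE 754 defines as NaN.
fn sqrt_of_negative() -> f64 {
    (-1.0_f64).sqrt()
}
```

Here classify(1.0 / 0.0) gives "positive infinity", classify(-1.0 / 0.0) gives "negative infinity", and classify(sqrt_of_negative()) gives "not a number". Note that a NaN value never compares equal to anything, including itself, which is why is_nan is used rather than ==.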
Textual data
Type | Values | Size |
---|---|---|
char | Any valid Unicode scalar value | 32 bits / 4 bytes |
string | Any valid UTF-8 text | Variable |
All textual information is represented as Unicode. A char uses 4 bytes to store any Unicode scalar value, as defined in the Unicode Standard. Unlike many other programming languages, Mélodium does not assume a char and a byte (nor a combination of bytes) to be equivalent at all, for many reasons such as:
- a byte only has 256 values, while all human languages combined have many more "letters";
- a letter encoded in a Unicode Transformation Format such as UTF-8 can take up to 4 bytes;
- many values are illegal according to Unicode;
- the Unicode Standard provides strong universality in what textual data can be represented;
- data types stay reliable, each one having its own purpose: char guarantees valid text data while byte only assumes it is data.
The string data type can represent any UTF-8 text, and its size depends on the length of the text. Interestingly, strings are not a combination of chars, but real UTF-8 strings. Taking the text Mélodium and storing it as a vector of chars uses 32 bytes (8 chars × 4 bytes), whereas storing it as a string uses only 9 bytes. This technical subtlety is transparent for users, and conversion treatments are provided if needed.
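The byte counts for the text Mélodium can be reproduced in Rust (not Mélodium code), whose char and str types follow the same model: a 4-byte Unicode scalar value and a UTF-8 string. The helper names below are hypothetical, and the letter é is written as the escape \u{e9} so that the example does not depend on how the source file is normalized.

```rust
use std::mem::size_of;

/// Bytes needed to hold `text` as a vector of 4-byte chars.
fn bytes_as_chars(text: &str) -> usize {
    text.chars().count() * size_of::<char>()
}

/// Bytes needed to hold `text` as a UTF-8 string.
fn bytes_as_utf8(text: &str) -> usize {
    text.len()
}

/// The sample text from the paragraph above, with "é" as U+00E9.
fn sample() -> &'static str {
    "M\u{e9}lodium"
}
```

bytes_as_chars(sample()) returns 32 and bytes_as_utf8(sample()) returns 9: seven ASCII letters take one byte each in UTF-8, and é takes two.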
Mélodium can handle many encodings through its encoders and decoders, taking and providing byte streams.
Parameters
Treatments and models declare parameters. As in any language, parameters are elements given by the caller to set up the behavior of the model or treatment.
Const and var
In Mélodium, parameters can be either constant or variable, declared with the keywords const and var respectively.
A constant parameter designates something that keeps the same value during the whole execution, on all tracks generated through the given call. Constants are mostly used to configure models, whose parameters are all required to be constant.
A variable parameter designates something that may have a different value on each generated track.
While a constant can be used to set up both constant and variable parameters, variable elements (parameters but also contexts) can only be used to set up other variables.
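The rule in the last sentence amounts to a small compatibility table, sketched here in Rust (not Mélodium code). The enum Variability and the function can_set_up are names invented for this illustration; they are not part of Mélodium or its engine.

```rust
/// Whether a value is fixed for the whole execution or may differ per track.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Variability {
    Const,
    Var,
}

/// Returns true when an element of variability `source` may be used to set up
/// a parameter declared with variability `target`.
fn can_set_up(source: Variability, target: Variability) -> bool {
    match (source, target) {
        // A constant can feed both constant and variable parameters.
        (Variability::Const, _) => true,
        // A variable element can only feed other variables.
        (Variability::Var, Variability::Var) => true,
        (Variability::Var, Variability::Const) => false,
    }
}
```

Of the four combinations, only a variable source with a constant target is rejected, which matches the requirement that model parameters, being constant, can never depend on a per-track value.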
Reference
The Mélodium reference is available on doc.melodium.tech. The whole Standard Library is documented there, and can be browsed through areas.
Runtime
Mélodium uses a runtime engine. The script files are fully parsed and their logic is built and checked before any execution starts. When a Mélodium script is launched, multiple stages happen:
- Script textual parsing and semantic build
- Usage and dependencies resolution
- Logic building
- Models instantiation
- Execution and tracks triggering
About the author
Quentin Vignaud is an IT engineer who graduated from CESI and holds an M.Sc. in computing science from UQÀM. He works at Doctolib as a data and software engineer, and originally authored Mélodium during his studies at UQÀM, while doing scientific research in music analysis.
Website: https://www.quentinvignaud.com/