Expressive timing in German late Romantic organ music

Abstract: This paper proposes a mathematical model for expressive timing in German late Romantic organ music based on Riemannian phrasing principles. The model is built on a symmetric hierarchical motivic scheme and then applied to the organ work of Max Reger. We show that the expressive content of the music is carried by a model parameter that we define as temporal elasticity. In addition to standard analytical and empirical evaluation, the model was also evaluated in terms of its naturalness. We found that the correlation between the model and human performance is comparable to the correlation between distinct professional human interpretations.


1. Introduction
During its centuries-long history, the organ has been perceived in contradictory ways. Some composers were impressed by its sonic capacities (for example, W. A. Mozart ( ) crowned it "the king of all instruments"), while others criticized it as too mechanical and inexpressive. For instance, Igor Stravinsky said: "I dislike the organ's legato sostenuto, and its blur of octaves, as well as the fact that the monster never breathes" (Craft and Stravinsky , p ). This paper proposes an interdisciplinary approach to quantitative organ performance analysis and aims to reveal the instrument's expressive potential. Every performance, live or recorded, contains a specific human expression that the musician conveys to the listeners. W. Göbl wrote that "without this expressivity, the music would not attract people; it is an integral part of the music" (Göbl et al. ). In organ performance, the instrument's constraints do not allow refined, continuous control over note intensity or local variations of timbre. Timing, however, always remains available, which makes it one of the most important tools for realizing the organist's expressive intent.
Although organ music is a significant part of the Western musical tradition, very few empirical studies on organ performance have been published so far (Nielsen ; Jerkert ; Gingras ). To our knowledge, no one has yet carried out quantitative research on German late Romantic organ music, which is the main focus of the present work. The specific features of expression in this style are discussed in more detail in the following section.

2. Quantitative research on expressive timing
Quantitative research on expressive timing in music performance has so far focused primarily on the piano (Palmer ; Gabrielsson ) and on a few other instruments, such as the violin (Cheng and Chew ), the clarinet (Vines et al. ), the cello (Hong ), the guitar (Juslin ), and the harpsichord (Gingras et al. ), while the organ has generally been ignored. The development of MIDI technology has also contributed significantly to piano performance research (Windsor and Clarke ; Repp ). However, because of the nature of the instrument, conclusions drawn for the piano do not apply directly to organ performance. The main difference lies in how sound is produced: in the organ, air is driven through pitched pipes, whereas in the piano, strings are struck by felt hammers. Dynamic changes on the organ are by default discrete and are usually controlled by "pulling" stops (sets of pipes of different shapes that produce different timbres), while the dynamic range of the piano is continuous and controlled directly by the force with which the key is struck. That is why dynamics, as a concept managed by the fingertips on piano keys, do not translate to the organ. In addition, the organ is unique in producing a sound that does not decay: an organ tone holds indefinitely as long as the key is pressed and the instrument is powered on. All these features make expression in organ performance substantially different from expression on the piano and other instruments.
The initial motivation for this study came from two papers (Gingras et al. ; Gingras and McAdams ), which are the only empirical studies published so far on expressive pipe organ performance using MIDI. Both studies, however, focused on Baroque music and J. S. Bach, so their observations are not stylistically appropriate for Romantic music. The main difference lies in the use of articulation as an expressive tool: while Baroque music allows different types of articulation, in the Romantic style articulation is prescribed as an absolute legato or is unambiguously defined by the composer (Laukvik ). In this paper, we present a new analytical concept tailored to the specific properties of the instrument and to the stylistic attributes of German late Romantic organ music.

3. Mathematical model for Riemannian phrasing
The phrasing theory of Hugo Riemann ( -), a renowned German musicologist of the nineteenth century, was chosen as the basis for our model. Several studies on music performance have shown the importance of Riemannian principles for a stylistically correct interpretation of German late Romantic music (Lohmann ; Laukvik ; Sander ; Szabó ), and Riemann's scientific approach to expressivity, noted by Rehding ( ), makes his theory particularly well suited to computer simulation. Furthermore, Riemann was Max Reger's principal composition teacher, and Reger's work is the subject of this research. Riemann described his phrasing theory in detail in two central works (Riemann , ) and touched on it briefly in other books and articles (Riemann , ; Riemann and Fuchs ). For this study, the earliest work (Riemann ) was chosen as the basis for the model because it refers directly to questions of expression in performance.
According to Riemann, the main goal of expressive phrasing is to underline the hierarchical motivic structure, which he considers essential to a correct interpretation of the score (Riemann , p ). The phrasing at each level must follow an arch-like timing shape: it starts slow, accelerates, and then returns to the original tempo. This concept corresponds to the Phrase Arch rule in the KTH rule system (Friberg et al. ). Riemann shows that the arch-like shape is built on several hierarchical levels (see Figure ), which is very similar to the hierarchical tree-based model proposed by Todd ( ). The main difference from the latter lies in the underlying principles of hierarchical organization: Riemann's scheme is based on the number of notes in the motives, whereas Todd's timing model relies on the Generative Theory of Tonal Music (Lerdahl and Jackendoff ), which inherits its core musicological principles from Schenkerian harmonic analysis. Consequently, the Riemannian approach is more tractable for computational simulation, especially for the harmonically complex late Romantic music.
Riemann ( , p VIII) described the short - or -note motives as "the smallest possible musical units of stand-alone expressive importance", which build up the large-scale symmetry shown in Figure . We propose to model the tempo arch at each level as a positive semi-ellipse whose long axis is given by the Riemannian length of the unit and whose short axis is proportional to the metronomic tempo. The coefficient of proportionality e is defined as the temporal elasticity: it gives the maximum deviation of the model tempo from the metronomic tempo at each level. Our research hypothesis is that temporal elasticity can carry the expressive information in organ performance in the German late Romantic style. The rest of this paper describes the formal mathematical model of temporal elasticity in greater detail, together with its empirical evaluation through a listening test.
Figure : Riemann's motivic scheme (Riemann , p ).
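As an illustration of the semi-ellipse arch, the following minimal Python sketch computes the tempo deviation contributed by a single arch. It assumes a parametrization that the text does not spell out: the arch is the upper half of an ellipse centred at `center`, with a horizontal semi-axis of `half_length` beats and a vertical semi-axis of e·T, so the deviation peaks at e·T in the middle of the motive and vanishes at its ends. The function name and arguments are hypothetical.

```python
import math


def arch_deviation(t, center, half_length, elasticity, metronomic_tempo):
    """Tempo deviation (BPM) added by one Riemannian arch at time t (in beats).

    Hypothetical parametrization: upper half of an ellipse centred at `center`
    with horizontal semi-axis `half_length` and vertical semi-axis
    elasticity * metronomic_tempo. The contribution is zero outside the arch.
    """
    u = (t - center) / half_length
    if abs(u) >= 1.0:
        return 0.0
    return elasticity * metronomic_tempo * math.sqrt(1.0 - u * u)


# A 4-beat motive (beats 0..4) at T = 60 BPM with elasticity e = 0.1:
# the tempo starts at T, rises to T * (1 + e) mid-motive, and returns to T.
for beat in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(beat, 60.0 + arch_deviation(beat, 2.0, 2.0, 0.1, 60.0))
```

The square-root profile is what makes the arch "elliptic": the tempo changes fastest at the motive boundaries and flattens out near the peak.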

3.1. Symmetric model
In the most general symmetric case, the temporal elasticities e_ij do not vary within a level and are uniformly related to the global coefficient e of the arch over the whole piece:
where i is the level number, j is the sequence number of the ellipse on the i-th level, and N_i is the number of ellipses on the i-th level. The model is the sum of all arcs on all levels. For example, the model curve Y for one global arch and four subsequent levels is:
where T is the constant metronomic tempo, and h and a are the parameters of each ellipse (its center and its short axis, respectively). For duple meters (Figure , …
Max Reger's Choral Prelude op. a/ "Ach bleib mit deiner Gnade" was chosen for the analytical evaluation of this model. It is a textbook example of Riemannian symmetry with the time signature / : it is eight bars long, with clear cadences in bars , , and . Level in our model corresponds here to the cadences in bars and ; level to the cadences in bars , , and ; level to the endings of the four-quarter-note segments; and level to the smallest microstructure of two-quarter-note motives (see Figure ). A professional performance of this piece was recorded in MIDI format on the large -manual Casavant organ of the Church of Saint Andrew and Saint Paul in Montreal (Canada). Local temporal information on …
. "'Life, Color, Warmth and Truth': Expressive timing in German late Romantic organ music" Per Musi no. , Computing Performed Music: -. e . DOI . / -. .
…). This result could hypothetically be compared with that of Windsor and Clarke ( ), where the highest R² obtained for timing with a similar symmetric model was .
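The summation over levels can be sketched for an eight-bar piece as follows. Because the time signature, the exact symmetric relation between e_ij and e, and the model equations are not reproduced here, the sketch makes explicit, hypothetical assumptions: four quarter-note beats per bar (32 beats in total); arch lengths of 32, 16, 8, 4 and 2 beats for the global arch and the four subsequent levels; a global arch of height e·T; and a single placeholder weight k, so that every lower-level arch has height k·e·T.

```python
import math


def arch(t, start, length, height):
    """Upper half-ellipse over [start, start + length] peaking at `height`."""
    half = length / 2.0
    u = (t - start - half) / half
    return height * math.sqrt(1.0 - u * u) if abs(u) < 1.0 else 0.0


def symmetric_model(t, tempo, e, k, total_beats=32, levels=(32, 16, 8, 4, 2)):
    """Model tempo Y(t): metronomic tempo plus all arches on all levels.

    Hypothetical assumptions: level 0 is the global arch with elasticity e;
    every lower level uses the same elasticity k * e (a placeholder for the
    paper's symmetric relation); the arches on each level tile the piece.
    """
    y = tempo
    for level, length in enumerate(levels):
        elasticity = e if level == 0 else k * e
        for j in range(total_beats // length):
            y += arch(t, j * length, length, elasticity * tempo)
    return y


# Sample the curve every half beat over the 32-beat piece (T = 60, e = 0.1, k = 0.5).
curve = [symmetric_model(0.5 * i, 60.0, 0.1, 0.5) for i in range(65)]
print(max(curve), min(curve))
```

Because every level tiles the piece with equal arches, the resulting curve is mirror-symmetric about the middle of the piece, which is the property the symmetric model is meant to encode.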
. However, this comparison is speculative, because the model described by Windsor and Clarke ( ) was applied to an early Romantic piano piece and had a different methodological background.
. If there was no event at the sixteenth-note level, the local tempo value was interpolated by setting it to the local tempo of the preceding note.
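The local-tempo computation and the gap-filling rule in the footnote above can be sketched as follows. The input format (a mapping from sixteenth-note grid slots to onset times in seconds), the use of the inter-onset interval to the next detected event, and the expression of tempo in quarter-note BPM are our assumptions.

```python
def local_tempo_grid(onsets, n_slots, slots_per_quarter=4):
    """Local tempo (quarter-note BPM) on a sixteenth-note grid.

    `onsets` maps a grid slot index to an onset time in seconds (hypothetical
    input format). For a slot with an onset, the tempo is derived from the
    inter-onset interval to the next onset, normalized per grid slot.
    Slots without an onset reuse the preceding slot's tempo (sample-and-hold).
    """
    slots = sorted(onsets)
    tempi = [None] * n_slots
    for a, b in zip(slots, slots[1:]):
        seconds_per_slot = (onsets[b] - onsets[a]) / (b - a)
        tempi[a] = 60.0 / (seconds_per_slot * slots_per_quarter)
    # Fill gaps (and the last onset, which has no successor) with the preceding tempo.
    for g in range(1, n_slots):
        if tempi[g] is None:
            tempi[g] = tempi[g - 1]
    return tempi


# Slot 2 has no event, so it inherits the tempo computed at slot 1.
print(local_tempo_grid({0: 0.0, 1: 0.25, 3: 0.9}, 4))
```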

3.2. Improved symmetric model
In real performance practice, the absolute symmetry described by equation ( ) is rarely maintained, and some irregularities are to be expected. In the more general case, the temporal elasticities e_ij take the values:
where the weight coefficients k_ij may differ both within a given level and across levels.
To build an improved model that takes the varying weight coefficients into account, we performed an optimization using the Nelder-Mead simplex algorithm (Lagarias et al. ). The values of e and k_ij were set as optimization parameters of the Matlab fminsearch function in order to minimize the distance between the model Y and the human performance data. The symmetric values e = . and k_ij = . were used as the initial guess for the first simplex. The resulting model yielded a highly significant coefficient of determination R² = . (p < . ), which may be comparable to the variance accounted for by a repeated human performance (Todd ). The correlation between the optimized curve and the human performance (R = . ) is even higher than the correlation between two professional organists' interpretations of this piece (R = . , see Table in Section ).
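The fitting step can be illustrated with a small, self-contained sketch. Matlab's fminsearch is not available outside Matlab, so the block below implements a textbook Nelder-Mead simplex (reflection, expansion, contraction and shrink with the standard coefficients 1, 2, 0.5 and 0.5) in plain Python; it is not a reproduction of fminsearch. The two-level model (one global 8-beat arch of height e·T plus two 4-beat arches of heights k_1j·e·T), the synthetic "performance" and all names are hypothetical.

```python
import math


def arch(t, start, length):
    """Unit-height upper half-ellipse on [start, start + length], 0 elsewhere."""
    half = length / 2.0
    u = (t - start - half) / half
    return math.sqrt(1.0 - u * u) if abs(u) < 1.0 else 0.0


def model(params, times, tempo):
    """Hypothetical model: global 8-beat arch (e) plus two 4-beat arches (k10*e, k11*e)."""
    e, k10, k11 = params
    return [tempo * (1.0 + e * arch(t, 0, 8)
                     + k10 * e * arch(t, 0, 4) + k11 * e * arch(t, 4, 4))
            for t in times]


def nelder_mead(f, x0, step=0.05, tol=1e-14, max_iter=5000):
    """Minimize f with a textbook Nelder-Mead simplex; returns (x_best, f_best)."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x + (step if d == i else 0.0) for d, x in enumerate(x0)] for i in range(n)
    ]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[d] for p in simplex[:-1]) / n for d in range(n)]
        worst = simplex[-1]

        def along(coef):
            # Point on the line from the worst vertex through the centroid.
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)  # reflection
        fr = f(xr)
        if fr < values[0]:
            xe = along(2.0)  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            # Outside contraction if the reflection beat the worst point, else inside.
            xc = along(0.5) if fr < values[-1] else along(-0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:  # shrink all vertices towards the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (q - b) for b, q in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    return simplex[0], values[0]


# Synthetic "performance" generated from known parameters, then refitted
# from a symmetric initial guess (e = 0.1, k = 0.5).
times = [0.25 * i for i in range(33)]
target = model([0.12, 0.8, 0.4], times, 60.0)


def sse(params):
    """Sum of squared distances between model and performance tempo."""
    return sum((m - y) ** 2 for m, y in zip(model(params, times, 60.0), target))


best, err = nelder_mead(sse, [0.1, 0.5, 0.5])
print([round(v, 4) for v in best], err)
```

Keeping the global arch's weight fixed at 1 makes e identifiable; if every arch had its own free weight, e and the k values would only be determined up to a common scale factor.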
Despite the high R², however, the weights obtained through optimization cannot be used directly for model simulation, because they contain information about both relevant (the performer's expressive intent) and irrelevant (e.g., technically motivated) tempo deviations. We therefore analyzed the distribution of the weights in detail to identify the most prominent trends and map them to relevant score features or to principles of historical performance practice.
Figure shows the values of k_ij, as well as the mean values for levels -. The optimized temporal elasticity was close to the initial guess (e = . ), and the coefficients k_ij ranged from * to . with a mean over all levels of mean(k_ij) = . , which agrees well with the initial value for the symmetric model, mean(k_ij) = k_ij = . (see Figure ).
This result points to three important factors:
1) Levels and are less elastic than levels and . This agrees well with the "motivic paradigm" of the Riemannian model, whose cornerstone is the small motive, and with a brief score analysis of the piece showing that the cadences in bars , , and are all equally strong (and serve as breathing points between chorale phrases), while the cadence in the th bar carries no additional weight.
2) The mean value of the coefficients on the second level is mean(k_ij) = . . However, the first coefficient k, corresponding to the tempo arch over bar , is markedly higher than the level mean: k = . (see Figure ). This may reflect the performer's attempt to grab the audience's attention right at the beginning of the piece through a substantial tempo increase. Interestingly, this corresponds to observations made as early as by Charles Sears ( ), who investigated rhythmic patterns in organ performances of church hymns.
3) Neither the symmetric nor the generic model curve reached the maximal (fastest) tempo of the human performance. This is understandable: as the temporal elasticity increases further, the "tails" of the model curve at the beginning and at the end cross the zero level and thus produce meaningless negative tempo values. Hence, introducing appropriate boundary conditions for the start and end tempo might improve the model performance and allow it to be more elastic.
These three observations from the analysis of the generic model were used to define the following modifications for the improved symmetric model:
Modification 1: Levels and were given more weight than levels and . The respective values of k_ij are shown in Table . The mean value for the improved model was deliberately kept the same as for the initial symmetric model: mean(k_ij)symmetric = mean(k_ij)improved = . .
Modification 2: The first coefficient on the second level was increased to k = . (giving it the same elasticity as the next four bars) to emulate the performer's expression at the beginning of the piece.
Modification 3: The model tempo curve was constrained to stay at or above the minimum tempo of the performance (the final ritardando), which prevents negative values as the temporal elasticity increases.
The improved model with different elasticity values is shown in Figure . From here on, only the global arch elasticity e is allowed to vary in the improved model, so that it alone accounts for variations in expressive timing. The other elasticity coefficients e_ij are fixed by being tied to the respective level values of k_ij (Table ), which preserves the Riemannian symmetry.
The improved model was evaluated mathematically with the same algorithm as described above, but this time only the value of e was optimized. The results of the regression analysis for the different modifications are summarized in Table (all R² coefficients were significant at p < . ).
Table : Summary of regression analysis for the improved symmetric model.

Number of modifications | Optimal value of e | R²
No modifications ("pure" symmetric) | . |
Three (unequal levels, k increase and boundary conditions) | . | .
The improved model outperforms the "pure" symmetric model. In particular, introducing the elasticity boost at the second level makes a significant difference. This is a meaningful finding for performance practice, illustrating how essential the first bars of a piece are. The boundary conditions also allow the model to increase R²; however, when the elastic extension of the model curve becomes too large, R² slowly decreases.
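Once the k_ij are fixed, only e is optimized. Suppose the model has the form Y(t) = T·(1 + e·G(t)), where G(t) is the sum of all unit-height arches weighted by the fixed coefficients (with weight 1 for the global arch). This is a hypothetical form consistent with the sketches above. Then Y is linear in e, and the one-parameter least-squares problem has a closed-form solution. This replaces the Nelder-Mead search reported in the text and is shown only for illustration. The function also returns the coefficient of determination R².

```python
def fit_global_elasticity(shape, performance, tempo):
    """Least-squares e and R^2 for the hypothetical model Y(t) = T * (1 + e * G(t)).

    `shape` holds G(t) sampled at the same times as `performance` (local tempo
    in BPM); `tempo` is the metronomic tempo T.
    """
    x = [tempo * g for g in shape]          # regressor T * G(t)
    r = [y - tempo for y in performance]    # deviation from the metronomic tempo
    e = sum(a * b for a, b in zip(x, r)) / sum(a * a for a in x)
    fitted = [tempo + e * a for a in x]
    mean_y = sum(performance) / len(performance)
    ss_res = sum((y - f) ** 2 for y, f in zip(performance, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in performance)
    return e, 1.0 - ss_res / ss_tot


# Toy example: a performance that follows the shape exactly with e = 0.2.
shape = [0.0, 0.5, 1.0, 0.5, 0.0]
performance = [60.0 * (1.0 + 0.2 * g) for g in shape]
print(fit_global_elasticity(shape, performance, 60.0))
```

With real performance data the residual is non-zero, so R² falls below 1. A non-linear constraint such as the lower tempo bound of Modification 3 would break the linearity and bring back the need for an iterative search.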

4. Comparative temporal analysis
The temporal analysis of five different audio recordings (Dupont ; Lohmann ; Buttmann ; Pirkl ; Pals ) was carried out for comparison with the model. The recordings were made on different organs (authentic and modern) by organists with diverse playing experience (organ amateurs and professional organists).

4.1. Evaluation of beat tracking systems
Beat tracking analysis of organ audio recordings is an extremely challenging task. Because of the highly reverberant acoustics of each recording, automatic onset detection or tempo estimation is difficult. Furthermore, there is no standard organ: every organ has its own stop disposition; the action may be mechanical, pneumatic, electric or electro-pneumatic; and different organs do not respond in the same way to a given touch. The elaborate polyphonic texture of the piece makes it hard to determine the beat "event": an active melodic entrance may be confused with the downbeat, and beats played with a soft "over-legato" articulation may be missed. To date, only one study has addressed beat tracking in organ audio recordings (Jerkert ), but it analyzed only short excerpts from fugues by J. S. Bach, and the beat tracking was done manually through visual inspection of the spectrogram. We therefore carried out a preliminary evaluation to confirm that the selected automated and semi-automated beat tracking systems are appropriate for German late Romantic organ music.
To determine the most appropriate procedure for beat detection, an existing MIDI file (the one used in the previous section) was played back on a real organ and professionally recorded from the listener's position in the hall so as to capture the acoustics of the church. The resulting audio recording was compared with the MIDI track, whose onset times were taken as the "ground truth". According to Milligan and Bailey ( ), periodicity-based onset detection algorithms are more appropriate for instrumental music with soft, unclear onsets than the energy-based approaches commonly used for music with percussive sounds.
Consequently, the initial onset detection was done with the Tempogram Toolbox (Grosche and Müller ), which relies on the general beat-tracking assumption that beats occur periodically, at least within a certain time window. This algorithm detected the majority of onsets at the th-note level, with some errors in passages with significant local agogic changes. These short passages were refined manually using the BeatRoot system (Dixon ), which displays the musical data and beats in a graphical interface. BeatRoot was also used to add onset times at the th-note level where such note events were present.
The following three data sets were exported to Matlab and evaluated against the "ground truth" in terms of Precision, Recall and F-measure:
1) BR: beats detected initially by BeatRoot (without Tempogram clicks);
2) TG: beats detected by the Tempogram Toolbox (th-note level);
3) TGBR: beats detected by the Tempogram Toolbox and manually corrected with BeatRoot (th-note level).
Following McKinney et al. ( ), the tolerance window was set to one fifth of the average "ground truth" inter-onset interval at the th-note level. The evaluation results are summarized in Table , which shows that BeatRoot achieves a high Recall (no false negative detections) but a very low Precision, which is not acceptable for the current study. The Tempogram Toolbox has a higher Precision (fewer false positive detections) than BeatRoot but still produces some detections outside the tolerance window (see Figure ). Manual correction with BeatRoot fixed this issue, yielding a Precision of . with only one inaccurate detection for the whole piece.
Thus, the combination of the Tempogram Toolbox and BeatRoot proved to be an effective beat tracking system for extracting beats from organ audio recordings.
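The evaluation in terms of Precision, Recall and F-measure can be sketched as follows. The tolerance is one fifth of the mean inter-onset interval of the ground-truth beats, as stated above. Treating it as a ± window around each reference beat and matching detections greedily and one-to-one are our assumptions; published evaluation toolkits may differ in these details.

```python
def evaluate_beats(detected, reference, tolerance_fraction=0.2):
    """Precision, Recall and F-measure of detected beat times (seconds).

    Assumptions: the tolerance is +/- tolerance_fraction times the mean
    inter-onset interval of the reference beats, and each reference beat can
    validate at most one detection (greedy one-to-one matching).
    """
    iois = [b - a for a, b in zip(reference, reference[1:])]
    tol = tolerance_fraction * sum(iois) / len(iois)
    unmatched = sorted(reference)
    hits = 0
    for d in sorted(detected):
        nearest = min(unmatched, key=lambda ref: abs(ref - d), default=None)
        if nearest is not None and abs(nearest - d) <= tol:
            unmatched.remove(nearest)
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


# One spurious detection at 1.5 s lowers both Precision and Recall to 0.75 here.
print(evaluate_beats([0.05, 1.1, 1.5, 2.95], [0.0, 1.0, 2.0, 3.0]))
```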
All recordings were then analyzed with the Tempogram Toolbox and BeatRoot. The beat times at the th- and th-note levels (where present) were imported into Matlab for further processing. The local tempo was calculated at the th-note level and compared with both the symmetric and the improved symmetric models. Table summarizes the analysis (all R² coefficients were significant at p < . ).
According to the analysis, professional organists (Dupont, Lohmann, Buttmann) tend to play more expressively than organ amateurs (Pirkl and Pals): the average temporal elasticities across the professional organists are e = . (improved) and e = . (symmetric), against the respective values e = . and e = . for the amateurs. For both models, improved and symmetric, the average values for professionals (R² = . and R² = . ) are slightly higher than the respective values for amateurs (R² = . and R² = . ), presumably because professional organists are more likely to be aware of advanced Riemannian phrasing principles (usually taught at graduate level).
Notably, the highest values of temporal elasticity were obtained for recordings made on period instruments with pneumatic action (see Figure ). The most elastic performance, with e = . (Lohmann ), was recorded on the Link organ ( ) of the Evangelische Stadtkirche in Giengen an der Brenz, which is "one of the best-preserved instruments of the Reger period, and as such, ideally suited to the realization of the Bavarian composer's music" (Fugatto ). This suggests that the type of instrument plays an important role in performance expression and that historic organs allow performers more elastic phrasing.
To assess the overall naturalness of the model, a cross-correlation analysis was performed across all audio recordings, the MIDI recording and two computed models: the symmetric model with e = . (SM) and the improved symmetric model with e = . (IM). The results are presented in Table . As expected, the highest correlation among the non-simulated performances was between two professional interpretations of the piece on organs with the same kind of action (pneumatic): R = . (Lohmann/Buttmann).

Figure : Average cross-correlation between human performances (blue bars) and between models and human performances (red bars).
The average correlation across the distinct professional recordings (including MIDI), R = . , is higher than the correlation between the two amateurs, R = . , and than the average cross-correlation between amateurs and professionals, R = . . The average correlation across all recordings is R = . for the symmetric model and R = . for the improved model (Figure ). As shown in Figure , both correlation coefficients fall within the interval [ . , . ]; hence, in terms of naturalness, the models outperform the average amateur and come close to professional human interpretation.
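Group averages like the ones above can be computed from per-recording tempo curves with a short sketch. Here we assume that "cross-correlation" means the zero-lag Pearson correlation between tempo curves sampled on the same beat grid. The dictionary keys and group labels are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Zero-lag Pearson correlation between two equally long tempo curves."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_correlation(curves, group_a, group_b=None):
    """Mean R over distinct pairs within group_a, or over all pairs across groups."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = [(a, b) for a in group_a for b in group_b]
    return sum(pearson(curves[a], curves[b]) for a, b in pairs) / len(pairs)


# Hypothetical tempo curves (BPM per beat) for three performances.
curves = {
    "pro_1": [60, 64, 68, 64, 58],
    "pro_2": [62, 65, 70, 66, 57],
    "amateur_1": [61, 62, 63, 62, 60],
}
print(average_correlation(curves, ["pro_1", "pro_2"]))
print(average_correlation(curves, ["pro_1", "pro_2"], ["amateur_1"]))
```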

5. Empirical evaluation of the model
Several studies show that listeners can perceive the expression resulting from the performer's structural interpretation of a piece (Palmer ; Gabrielsson ). Because our Riemannian model relies on the structural hierarchy, we hypothesized that different values of temporal elasticity might produce a different expressive impact. We designed a small listening test to evaluate the model empirically.

5.1. Methodology
The listening experiment was conducted online with volunteers, who identified themselves either as "organ students, working professionals, organ amateurs or enthusiasts" or as having "little or no experience with organ or its repertoire". The stimuli were different versions of Max Reger's op. a/ . One version was recorded by a professional human performer, and the other four were simulated with artificial expressive timing generated by the improved symmetric model with elasticity values e = . , e = . , e = . , e = . (see Figure ). The model tempo curves were computed in Matlab using equation ( ), and the resulting tempo values were applied programmatically to the metronomic (equitemporal) MIDI file using the C#/.NET library DryWetMidi ( ). The model files were played back and recorded from the listener's position on the same instrument and with the same timbre (same organ stops) as the original human recording. This procedure eliminated all performance differences other than the timing variations.
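Applying model tempo values to a metronomic MIDI file amounts to converting a tempo curve into onset times. The sketch below shows that conversion in plain Python rather than with DryWetMidi, whose API is not used here. The piecewise-constant tempo between consecutive positions and the function names are our assumptions.

```python
def onset_times(beat_positions, tempo_at):
    """Performance onset times (seconds) for nominal beat positions.

    Hypothetical assumption: tempo_at(b) gives the model tempo in BPM that holds
    from position b until the next position, and the first position sounds at 0 s.
    """
    times = [0.0]
    for a, b in zip(beat_positions, beat_positions[1:]):
        times.append(times[-1] + (b - a) * 60.0 / tempo_at(a))
    return times


def two_speed(beat):
    """Toy tempo curve: 60 BPM for the first two beats, 120 BPM afterwards."""
    return 60.0 if beat < 2 else 120.0


print(onset_times([0, 1, 2, 3, 3.5], two_speed))
```

Sampling the model curve more finely (e.g., per sixteenth note) makes the piecewise-constant approximation closer to the continuous arch.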
The web interface presented to the listeners was designed with the Web Audio Evaluation Tool (Jillings et al. ) and included a volume control, the audio files, play buttons, and sliders with the range - for rating each stimulus. The question to the listeners was: "Despite the type of emotions felt, please evaluate how expressive each performance is at the scale - (by moving the slider)"; this question was chosen following Bhatara et al. ( ), who used a similar methodology. To guide the participants, labels were attached to the slider: "Not expressive, mechanical" (at position ), "Moderately expressive" (at position ) and "Extremely emotional" (at position ). Participants were asked to use the full range of the slider and were allowed to listen to each version as many times as they wished before making their final decision. The order of the stimuli was randomized for each participant.

5.2. Results
Participants' ratings were divided by and imported into SPSS for statistical analysis. The grand mean of the listeners' ratings was . (SE = . ), showing that responses were centred around the middle of the scale. Individual participants' means ranged from . to . (SD = . ). A one-way repeated-measures ANOVA with a Greenhouse-Geisser correction showed that the effect of temporal elasticity level was significant (F( . , . ) = . , p < . ). As expected, the model version with the highest temporal elasticity (e = . ) was rated the most expressive, and the model with the lowest (e = . ) received the lowest rating; the human performance, which corresponds approximately to e = . (Table ), was rated close to the most elastic model. A linear regression of the mean ratings on the model elasticity values was significant at the . level (R² = . ). Figure shows the mean listener ratings and the linear trend.
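The core of the repeated-measures analysis can be sketched as follows. The function computes the uncorrected one-way repeated-measures F statistic from a subjects × conditions rating matrix. The Greenhouse-Geisser correction used in the study, which rescales both degrees of freedom by an estimated epsilon, and the p-value are deliberately left out, so this sketch does not reproduce the SPSS output.

```python
def rm_anova(ratings):
    """One-way repeated-measures ANOVA without sphericity correction.

    ratings[s][c] is the rating of subject s for condition c (e.g., one
    condition per temporal elasticity value). Returns (F, df_conditions, df_error).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    cond_means = [sum(row[c] for row in ratings) / n for c in range(k)]
    subj_means = [sum(row) / k for row in ratings]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    # Between-subject variability is removed from the error term.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error


# Three hypothetical listeners rating three stimuli.
print(rm_anova([[1, 2, 3], [2, 3, 5], [3, 4, 4]]))
```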
Post-hoc pairwise comparisons with Tukey's HSD test showed that the difference between the ratings for e = . and e = . was significant (p < . ), as was the difference between the ratings for the human performance and the least elastic model, e = . (p < . ). The difference between the ratings for e = . and e = . was marginally significant (p < . ); no other ratings differed significantly from each other.

6. Discussion
The results of both the analytical and the empirical evaluation of our model for Riemannian phrasing show that temporal elasticity is indeed a very effective carrier of expression. The coefficient of determination, which is relatively high compared with other structural timing models, especially for the improved symmetric model, shows that the model explains up to % of the overall timing variance in human performance. According to the comparative temporal analysis, the model is closer to the professional organists' interpretations than to the amateurs', which indirectly supports its stylistic correctness. It is especially remarkable that the correlation between the model and human performance is comparable to the correlation between distinct professional human interpretations.
The empirical evaluation of the model reveals that the expressivity conveyed by temporal elasticity is perceptible to modern listeners, with or without prior organ experience. The least elastic model was rated as the least expressive, or mechanical, and ratings followed a linear upward trend up to the most elastic variant. However, further investigation is needed to explore the relationship between finer differences in temporal elasticity and listeners' responses.
In future research, we plan to generalize the proposed model to other Romantic pieces with various structures, because the fundamental aspects of Riemannian phrasing theory allow models to be created for any combination of initial motives, provided the general symmetry holds. The majority of German late Romantic organ pieces do not follow the strict + bar periodic pattern. However, specifically in this style, phrase boundaries can be determined from the phrasing slurs in the score, and the symmetric hierarchical model can be built on them using equation ( ). The main obstacle in this process is the first modification of the improved model, which requires unequal weights for different levels. Developing an AI-based algorithm for automated weighting in our model is left for future work.

7. Conclusion
Key musical concepts defined by Hugo Riemann, one of the most influential theorists of his time, are still of great interest today. In this paper, a mathematical model for Riemannian phrasing was introduced and evaluated, both analytically and empirically, against professional human interpretation. The proposed model has a twofold application: it can be used in the phrasing analysis of performances as well as in the computer simulation of expressive timing for German late Romantic organ music.
Hugo Riemann concludes: "The well-shaped phrasing contributes to the clarity of the motivic and harmonic structure so that it becomes an essential factor for the correct interpretation of the music score; furthermore, the touching, emotional performance would be impossible without the expressive phrasing, which evokes the musical expression of Life, Color, Warmth and Truth" (Riemann , p ) .

8. Acknowledgement
This research has been supported by the Fonds de recherche du Québec – Société et culture (FRQSC).

9. References
Bhatara . Expressive timing and dynamics in real and artificial musical performances: Using an algorithm as an analytical tool. Music Perception, ( ): -.