Chord spacing and quality: Lessons from timbre research

Although chords are often represented by pitch-class (chroma) content in computational research, chord spacing is often a more salient feature. This paper addresses this disparity between models and cognition by extending the discrete Fourier transform (DFT) theory of chord quality from pitch-classes to pitches. In doing so, we note a structural similarity between music theory’s chord quality and audio engineering’s timbral cepstrum: both are DFTs, performed in the pitch or frequency domains, respectively. We thus treat chord spacing as a hybrid of pitch-class and timbre. To investigate the potential benefits of the DFT on pitch space (P-DFT), we perform two computational experiments. The first explores the P-DFT model theoretically by correlating chord distances calculated with a pitch-class model against those calculated with spacing. The second compares P-DFT estimations of chord distances against listener responses (Kuusi, 2005). Our results show that spacing is a salient feature of chords, and that it can be productively described by timbre-influenced methods.


Introduction
Whether considering chords (Burgoyne et al., 2011), keys (Albrecht & Shanahan, 2013; Temperley & Marvin, 2008), or musical style (Yust, 2019), computational and corpus models of music tend to rely on pitch-classes (PCs, also known as pitch chroma), not pitches or frequencies. Although the assumption of octave equivalence may seem innocuous, given the crucial role of pitch chroma in perception (Krumhansl, 1979), listeners often judge chord similarity more by spacing than by PC content (Kuusi, 2005; Samplaski, 2001). To consider the role of spacing in chord quality, this paper takes advantage of a conceptual similarity between uses of the Discrete Fourier Transform (DFT) on pitch-classes in music theory and on spectra in computational timbre research.
The DFT is an equation that breaks a signal into periodic components. A DFT on a signal returns a set of values f_0 through f_n, where f_n (component n) represents a division of the signal's underlying space into n equal parts. If a signal approximates such an even division, then the magnitude for that component will be high. Figure 1 shows how DFTs are used in different disciplines. The DFT can be applied to a waveform to yield the pitch spectrum, which is commonly used in timbre research to yield cognitively salient timbre descriptors (Peeters et al., 2011). Shown on the same level, pitch and pitch-class content as used in music theory and pitch-based music cognition are conceptually equivalent in that they both use pitch data.
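For reference, one standard way of writing the DFT described above is sketched below. The symbols x_t (the signal's value at position t) and N (the signal's length) are notation assumed here for illustration; they are not taken from the original text.

```latex
f_n = \sum_{t=0}^{N-1} x_t \, e^{-2\pi i \, n t / N}, \qquad n = 0, 1, \dots, N-1
```

Under this convention, the magnitude |f_n| is large when the signal's content recurs roughly every N/n positions, which is the sense in which component n measures a division of the space into n equal parts.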

Figure 1: Summary of Previous DFT Research
Our paper introduces a hitherto unnoticed similarity between the cepstrum (a DFT on the spectrum's DFT data) and music theory's "chord quality," a DFT on pitch-class data (henceforth PC-DFT). The PC-DFT is used by mathematical music theorists to describe chord quality, which emerges from periodicities in pitch-class space (Amiot, 2016; Chiu, in press; Quinn, 2007; Yust, 2015). The cepstrum, the DFT of the spectrum, is also derived from periodicities in pitch data, albeit on all frequencies from the waveform rather than only on fundamentals. The cepstrum is used most often for timbre identification (Aucouturier & Pachet, 2004; Herrera-Boyer et al., 2006), and despite its reputation for cognitive opacity, it does correlate with listener judgments of timbre similarity (Aucouturier & Bigand, 2012; Casey et al., 2012; Siedenburg et al., 2016; Terasawa et al., 2005).
Our P-DFT fuses these approaches by considering notated pitches, like the PC-DFT, but avoiding octave equivalence. Such a model was previously proposed by Callender (2007) in a continuous space; we expand on his work with both a more detailed examination of the P-DFT's relation to chord spacing, and a comparison to listener models of chord similarity. We thus study chord spacing using a combination of features from research in both pitch and timbre cognition, continuing in a tradition of investigating pitch-timbre relationships (Allen & Oxenham, 2014; Hasegawa, 2019; Krumhansl & Iverson, 1992; Saariaho, 1987).
To that end, we performed two exploratory computational studies to investigate how the P-DFT's spacing-oriented assessment of chord similarity compares to other measures, in both music-analytical and experimental contexts.

General Method: Calculating and Interpreting P-DFT Components
Both experiments used the same procedure to calculate chords' P-DFTs. First, chords were encoded as normalized characteristic functions in MIDI pitch space. For example, a closed-position C-major chord beginning on middle C would consist of notes C4, E4, and G4, or MIDI values 60, 64, and 67. The corresponding characteristic function would be a 128-place vector with the value 1/3 (approximately .33) at positions 60, 64, and 67, and 0 at all other positions. Then, the vectors were zero-padded (128 more zeros were appended) to avoid wrap-around effects.
A DFT was calculated on this vector of 256 values, yielding a new vector of 256 component magnitudes. Because the input is real-valued, these magnitudes are mirrored around the Nyquist component, so only the first 128 of these components were non-redundant; the rest were discarded. Distances between chords were calculated as Euclidean distances (Albrecht & Shanahan, 2013; Callender, 2007) between P-DFT vectors normalized in magnitude.
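The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, a plain-Python DFT stands in for a library FFT (such as `numpy.fft.rfft`), and "normalized in magnitude" is assumed to mean scaling each vector to unit Euclidean length.

```python
import cmath
import math

N_PITCHES = 128  # MIDI pitch numbers 0-127
N_PADDED = 256   # length after appending 128 zeros


def characteristic_function(midi_pitches):
    """Normalized 128-place vector with equal weight on each sounding pitch."""
    pitches = sorted(set(midi_pitches))
    vec = [0.0] * N_PITCHES
    for p in pitches:
        vec[p] = 1.0 / len(pitches)
    return vec


def p_dft(midi_pitches):
    """Unit-length vector of the first 128 DFT magnitudes of the padded chord."""
    signal = characteristic_function(midi_pitches) + [0.0] * (N_PADDED - N_PITCHES)
    mags = []
    for k in range(N_PITCHES):
        # Only nonzero entries contribute to the sum, so zeros are skipped.
        total = sum(x * cmath.exp(-2j * math.pi * k * t / N_PADDED)
                    for t, x in enumerate(signal) if x)
        mags.append(abs(total))
    norm = math.sqrt(sum(m * m for m in mags))  # assumed normalization
    return [m / norm for m in mags]


def p_dft_distance(chord_a, chord_b):
    """Euclidean distance between two chords' normalized P-DFT vectors."""
    return math.dist(p_dft(chord_a), p_dft(chord_b))


closed_c_major = [60, 64, 67]  # C4, E4, G4, as in the text's example
open_c_major = [48, 67, 76]    # same pitch classes, wider spacing
print(p_dft_distance(closed_c_major, open_c_major))
```

Because only magnitudes are kept, transposing a chord within the MIDI range leaves its vector unchanged in this sketch, so the distance reflects spacing rather than register.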
The nth component corresponds to chord spacing of 256/n semitones. For example, an octave is 12 semitones, and 256/12 = 21⅓, so the corresponding components for the octave (21 and 22) would be high. Because this relationship is inversely proportional, low components correspond to wide spacing, whereas high components indicate specific smaller intervals. This parallels a similar situation in the cepstrum, in which only low components are used for timbre identification, whereas higher ones are more useful for pitch identification (Aucouturier & Pachet, 2004; Muller & Ewert, 2010). Figure 2 shows the P-DFT component magnitudes for the iconic opening chord of Claude Debussy's "Sunken Cathedral" Prelude (henceforth the "Cathedral Chord"). We have circled peaks corresponding to the four-octave space between the two hands, as well as the open spacing, captured by peaks at the fourth, fifth, and tritone average within each hand.
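The relationship between component number and spacing can be checked with a few lines of Python. The helper names below are hypothetical and only restate the 256/n rule given above.

```python
import math

N_PADDED = 256  # length of the zero-padded vector, as in the text


def spacing_of_component(n, n_padded=N_PADDED):
    """Spacing in semitones to which component n corresponds (n >= 1)."""
    if n <= 0:
        raise ValueError("component 0 has no finite spacing")
    return n_padded / n


def components_for_interval(semitones, n_padded=N_PADDED):
    """Integer components that bracket the exact value n_padded / semitones."""
    exact = n_padded / semitones
    return math.floor(exact), math.ceil(exact)


print(components_for_interval(12))  # octave -> (21, 22)
print(components_for_interval(48))  # four octaves -> (5, 6)
print(spacing_of_component(32))     # -> 8.0 semitones
```

Wide spacings such as four octaves therefore land on the lowest components, while intervals within the octave map onto components above 21.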

Method
To determine what sorts of musical features are weighed by the P-DFT vs by the PC-DFT, we compared the opening chord of Debussy's "Sunken Cathedral" prelude to 50,000 random chords.
Chords were randomly generated by first selecting the number of pitches to be included (2 to 6), then independently selecting that many notes from the range of a standard grand piano (MIDI values 21 to 108) and removing duplicates. Distances were calculated from each random chord to Debussy's original chord as described above.
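The chord-generation step can be sketched as below. This is an illustration only: the function name is hypothetical, and uniform sampling of both the number of notes and the pitches is an assumption, since the text does not state the distributions used.

```python
import random

PIANO_LOW, PIANO_HIGH = 21, 108  # MIDI range of a standard grand piano


def random_chord(rng):
    """Pick 2 to 6 notes independently, then remove duplicate pitches."""
    n_notes = rng.randint(2, 6)  # inclusive on both ends
    notes = [rng.randint(PIANO_LOW, PIANO_HIGH) for _ in range(n_notes)]
    return sorted(set(notes))


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
sample = [random_chord(rng) for _ in range(5)]
print(sample)
```

Because duplicates are removed after sampling, a generated chord can end up with fewer pitches than the number originally selected.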
We hypothesized that chords similar to (that is, low distance from) Debussy's original chord by PC-DFT would be of a similar PC set-class (that is, related by transposition or inversion accounting for octave equivalence; see Morris, 1987), and that chords similar by P-DFT would have similar spacing on the piano keyboard.
Figure 3 plots dissimilarity of P- and PC-DFTs of all random chords, as measured against the Cathedral Chord. Although P- and PC-DFTs are correlated (r = .057, p < .001), there are clearly outliers. By investigating two outliers (selected by their maximal distance from the line extending from minimum to maximum P- and PC-DFT distances), we can see which features most directly affect assessed similarity.
Figure 4 compares the P-DFTs of the Cathedral Chord (Orig) and Chord X. Chord X is identical in PC content, and thus has PC-DFT distance 0. However, as an open eleventh, it has a completely different spacing, and thus its P-DFT has completely different components. Figure 5 compares the Cathedral Chord to Chord Y, which is dissimilar in PC-DFT space, but somewhat similar in P-DFT space. It belongs to set-class (013579), which is almost a whole-tone scale and is very different from open fifths or fourths. However, its spacing is quite similar to that of the Cathedral Chord, as shown by shared peaks at components corresponding to wide spacing, octaves, and division of octaves into approximate halves (fifths/sixths).
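The Chord X case, in which identical pitch-class content gives a PC-DFT distance of 0, can be illustrated with a small sketch. The function names are hypothetical, and two details are assumptions not stated in the text: pitch-class doublings are collapsed before the transform, and only components 0 through 6 of the 12-place transform are compared.

```python
import cmath
import math


def pc_dft(midi_pitches):
    """Magnitudes of components 0-6 of the DFT on a 12-place pitch-class vector."""
    pcs = sorted({p % 12 for p in midi_pitches})  # octave equivalence
    vec = [0.0] * 12
    for pc in pcs:
        vec[pc] = 1.0 / len(pcs)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / 12)
                    for t, x in enumerate(vec)))
            for k in range(7)]


def pc_dft_distance(chord_a, chord_b):
    """Euclidean distance between two chords' PC-DFT magnitude vectors."""
    return math.dist(pc_dft(chord_a), pc_dft(chord_b))


closed_c_major = [60, 64, 67]  # C4, E4, G4
open_c_major = [48, 67, 76]    # C3, G4, E5: same pitch classes
whole_tone_trichord = [60, 62, 64]

print(pc_dft_distance(closed_c_major, open_c_major))         # effectively 0
print(pc_dft_distance(closed_c_major, whole_tone_trichord))  # clearly nonzero
```

Since only magnitudes are compared, chords related by transposition or inversion also come out at distance 0 here, which matches the set-class behavior hypothesized for the PC-DFT.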

Discussion
As we predicted, the P-DFT measures some aspect of chord spacing consistent with a subjective impression of Debussy's prelude; the prelude's opening, noted for its These findings show that the P-DFT captures a musically relevant aspect of a chord's quality. Because the Cathedral Chord and Chord Y shared several similar peaks among low components, the P-DFT specifically captures wide-interval spacing. As low components measure timbre in the cepstrum, these components show that the P-DFT's chord spacing is sonically similar to timbre.

Method
In order to verify the model's cognitive salience, we compared P-DFT components with perceptual similarity ratings for chord distances from Kuusi (2005).
Kuusi's experiment explored correlations between subject responses and set-class similarity measures in a post-tonal setting. Her stimuli consisted of 16 chords: 4 different set classes each with 4 different spacings.
Subjects were asked to rate the similarity of all possible pairs of chords on a Likert scale of 1-7. Subject ratings were correlated with set-class similarity measures; the measure yielding the highest correlation with subject responses was CSATSIM (Buchler, 1997) (r = .43). Kuusi's subjects included both trained and less-trained musicians; we considered only the data from trained musicians. In a post-hoc analysis, Kuusi found that the subject-rated similarity between two chords increased with the difference in chord span (distance between the outer two notes of the chord).
Seeing the impact of chord spacing, we hypothesized that the Euclidean distance between chords' P-DFT components would correlate more strongly with subject data than set-class measures do. Such a finding would support our argument that the P-DFT, in measuring chord spacing, captures a cognitively meaningful aspect of chords erased by octave equivalence.

Discussion
As the P-DFT components describe different elements of chord spacing, we expected distances between P-DFTs to correlate with Kuusi's subject ratings. However, distances between raw P-DFT calculations are still outperformed by some set-class measures, perhaps because subjects use asymmetrical weightings for chord distances: P-DFT distances compare all 128 components equally rather than weighting particular features, whereas listeners might prioritize certain intervals over others. To identify such intervals in the experiment, we isolated and correlated individual component similarities with listener ratings of chord similarity. Figure 6 shows that the 49 components selected by stepwise regression tend to cluster where components are particularly correlated or uncorrelated with listener ratings. Out of the 49 components, 32% corresponded to spacing wider than an octave (between f_0 and f_20), something that could not be captured by PC measures. Later components clustered around smaller intervals that listeners may have found crucial in chord identification, and which were likely accounted for in PC measures such as CSATSIM. Because this specific combination of wide spacing and specific intervals predicted listener data better than PC similarity measures did (r = .74), this result implies that our P-DFT model captures subjects' dual reliance on general spacing and on specific interval content.
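The idea of correlating individual components with listener ratings can be sketched as follows. This is a hedged illustration, not the analysis reported above: the function names are hypothetical, per-component dissimilarity is assumed to be the absolute difference between two chords' magnitudes, the stepwise regression itself is not implemented, and the numbers in the usage lines are invented toy values rather than Kuusi's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def component_correlations(vector_pairs, ratings):
    """Correlate each component's per-pair difference with similarity ratings."""
    n_components = len(vector_pairs[0][0])
    results = []
    for k in range(n_components):
        diffs = [abs(a[k] - b[k]) for a, b in vector_pairs]
        results.append(pearson_r(diffs, ratings))
    return results


# Toy input: three chord pairs with two-component vectors (made-up values).
toy_pairs = [([0.0, 1.0], [1.0, 1.5]),
             ([0.0, 0.2], [2.0, 0.1]),
             ([0.0, 0.9], [3.0, 0.0])]
toy_ratings = [6.0, 4.0, 2.0]  # higher = rated more similar
print(component_correlations(toy_pairs, toy_ratings))
```

A strongly negative value for a component would mean that larger differences at that component go with lower similarity ratings, which is the kind of component a selection procedure might retain.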
Kuusi's experiment intended to produce a non-tonal environment, but, depending on the musical context, listeners may rely on different paradigms for evaluating chord proximity. Future experiments should explore this disparity, and the post-hoc exploration from Experiment 2 suggests that the P-DFT might be an effective tool for doing so. Furthermore, as listeners might prioritize certain intervals over others, future computational work might test whether adjusted component weights better predict listener responses, or whether pitch-class measures provide additional non-redundant information.

Conclusion
Both of our experiments show that the P-DFT captures aspects of chord spacing that are salient, whether in the historical and music-theoretical reception of a piece or when comparing chords in isolation. As our theoretical framework demonstrates a formal similarity between chord quality and timbre similarity, our results fit with the growing literature on interactions between pitch, harmony, and timbre. Furthermore, by associating certain P-DFT components with specific chord spacings as well as listener similarity, we suggest that cepstrum-like methods may in fact be of potential use in cognitive studies.