| Title: | Tools for Tidy Vowel Normalization |
|---|---|
| Description: | An implementation of tidy speaker vowel normalization. This includes generic functions for defining new normalization methods for points, formant tracks, and Discrete Cosine Transform coefficients, as well as convenience functions implementing established normalization methods. References for the implemented methods are: Johnson, Keith (2020) <doi:10.5334/labphon.196>; Lobanov, Boris (1971) <doi:10.1121/1.1912396>; Nearey, Terrance M. (1978) <https://sites.ualberta.ca/~tnearey/Nearey1978_compressed.pdf>; Syrdal, Ann K., and Gopal, H. S. (1986) <doi:10.1121/1.393381>; Watt, Dominic, and Fabricius, Anne (2002) <https://www.latl.leeds.ac.uk/article/evaluation-of-a-technique-for-improving-the-mapping-of-multiple-speakers-vowel-spaces-in-the-f1-f2-plane/>. |
| Authors: | Josef Fruehwald [cre, aut, cph] |
| Maintainer: | Josef Fruehwald <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.4.1.9000 |
| Built: | 2026-05-11 13:52:48 UTC |
| Source: | https://github.com/jofrhwld/tidynorm |
Converts bark to Hz
    bark_to_hz(bark)

| Argument | Description |
|---|---|
| bark | Frequency in Bark |

A vector of Hz scaled values
Traunmüller, H. (1990). Analytical expressions for the tonotopic sensory scale. The Journal of the Acoustical Society of America, 88(1), 97–100. doi:10.1121/1.399849
    bark <- seq(1.5, 13, length = 100)
    hz <- bark_to_hz(bark)
    plot(bark, hz)
check_norm() will generate a message with information
about which normalization procedures have been applied to the
data.
    check_norm(.data)

| Argument | Description |
|---|---|
| .data | A data frame produced by a tidynorm function. |

This only prints an info message.
    speaker_norm <- speaker_data |>
      norm_nearey(F1:F3, .by = speaker, .silent = TRUE)
    check_norm(speaker_norm)
The Discrete Cosine Transform basis functions
    dct_basis(n, k)

| Argument | Description |
|---|---|
| n | The length of the basis. |
| k | The number of basis functions. |

This function will generate the DCT basis functions.
A matrix
    basis <- dct_basis(100, 5)
    matplot(basis, type = "l", lty = 1)
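As a rough illustration of what a matrix of DCT basis functions looks like, the base-R sketch below builds an unscaled cosine (DCT-II style) basis. The helper name `cos_basis` and the absence of any scaling are assumptions made for illustration; `dct_basis()` may scale or index its columns differently.

```r
# Hypothetical helper: an n x k matrix whose columns are cosine basis
# functions sampled at the midpoints of n equally spaced intervals.
cos_basis <- function(n, k) {
  j <- seq_len(n) - 0.5      # sample midpoints 0.5, 1.5, ..., n - 0.5
  m <- seq_len(k) - 1        # basis function index 0, 1, ..., k - 1
  cos(pi * outer(j, m) / n)
}

basis <- cos_basis(100, 5)

# The columns are mutually orthogonal, so the off-diagonal entries of the
# cross-product matrix are zero up to floating point error.
gram <- crossprod(basis)
```

The first column is constant, and each later column adds one more half-cycle of a cosine, which is the pattern the `matplot()` call above draws for the package's own basis.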
Converts Hz to Bark
    hz_to_bark(hz)

| Argument | Description |
|---|---|
| hz | Frequency in Hz |

A vector of bark scaled values
Traunmüller, H. (1990). Analytical expressions for the tonotopic sensory scale. The Journal of the Acoustical Society of America, 88(1), 97–100. doi:10.1121/1.399849
    hz <- seq(150, 2000, length = 100)
    bark <- hz_to_bark(hz)
    plot(hz, bark)
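For orientation, the two Bark conversions can be sketched in base R as below. This assumes the basic Traunmüller (1990) formula without the end corrections that the paper also describes, so the values may not match `hz_to_bark()` and `bark_to_hz()` exactly. The function names ending in `_sketch` are hypothetical.

```r
# Basic Traunmüller (1990) Hz -> Bark formula (assumed: no end corrections).
hz_to_bark_sketch <- function(hz) {
  (26.81 * hz) / (1960 + hz) - 0.53
}

# Algebraic inverse of the formula above.
bark_to_hz_sketch <- function(bark) {
  1960 * (bark + 0.53) / (26.28 - bark)
}

hz <- c(150, 500, 1000, 2000)
bark <- hz_to_bark_sketch(hz)
round_trip <- bark_to_hz_sketch(bark)   # recovers the original Hz values
```

Because the mapping is monotonic, converting to Bark and back returns the starting frequencies.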
Convert Hz to Mel
    hz_to_mel(hz, htk = FALSE)

| Argument | Description |
|---|---|
| hz | Numeric values in Hz |
| htk | Whether or not to use the HTK formula |

This is a direct re-implementation of the hz_to_mel function from the librosa library.
The default is the method due to Slaney (1998), which is linear below 1000 Hz and logarithmic above.
If htk = TRUE, the method from HTK, due to O'Shaughnessy (1987), is used.
A numeric vector of Mel values
McFee, B., C. Raffel, D. Liang, D. PW Ellis, M. McVicar, E. Battenberg, and O. Nieto (2015). librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, pp. 18-25.
O'Shaughnessy, D (1987). Speech communication: human and machine. Addison-Wesley. p. 150. ISBN 978-0-201-16520-3.
Slaney, M. (1998) Auditory Toolbox: A MATLAB Toolbox for Auditory Modeling Work. Technical Report, version 2, Interval Research Corporation.
    hz_to_mel(c(500, 1000, 2000, 3000))
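The two formulas described above can be sketched in base R as follows. The constants are the ones used by librosa's Slaney-style and HTK-style conversions; the function names ending in `_sketch` are hypothetical, and this is an illustration rather than the package's own code.

```r
# Hz -> Mel: linear below 1000 Hz, logarithmic above (Slaney),
# or the single HTK formula when htk = TRUE.
hz_to_mel_sketch <- function(hz, htk = FALSE) {
  if (htk) {
    return(2595 * log10(1 + hz / 700))
  }
  f_sp <- 200 / 3                    # Hz per mel in the linear region
  min_log_hz <- 1000                 # start of the logarithmic region
  min_log_mel <- min_log_hz / f_sp   # 15 mel
  logstep <- log(6.4) / 27           # log step size above 1000 Hz
  ifelse(
    hz >= min_log_hz,
    min_log_mel + log(hz / min_log_hz) / logstep,
    hz / f_sp
  )
}

# Mel -> Hz for the Slaney variant only.
mel_to_hz_sketch <- function(mel) {
  f_sp <- 200 / 3
  min_log_hz <- 1000
  min_log_mel <- min_log_hz / f_sp
  logstep <- log(6.4) / 27
  ifelse(
    mel >= min_log_mel,
    min_log_hz * exp(logstep * (mel - min_log_mel)),
    mel * f_sp
  )
}

mels <- hz_to_mel_sketch(c(500, 1000, 2000, 3000))
```

Under the Slaney variant, 500 Hz maps to 7.5 mel and 1000 Hz to 15 mel, the point where the linear and logarithmic pieces meet.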
The second derivative of the Inverse DCT
    idct_accel(y, n = length(y))

| Argument | Description |
|---|---|
| y | DCT coefficients |
| n | The desired length of the idct |

Returns the second derivative (acceleration) of the Inverse DCT (see dct for more details).
A vector with the second derivative of the inverse DCT
    x <- seq(0, 1, length = 10)
    y <- 5 + x + (2 * (x^2)) + (-2 * (x^4))
    dct_coefs <- dct(y)
    y_accel <- idct_accel(dct_coefs)
    plot(y)
    plot(y_accel)
The first derivative of the Inverse DCT
    idct_rate(y, n = length(y))

| Argument | Description |
|---|---|
| y | DCT coefficients |
| n | The desired length of the idct |

Returns the first derivative (rate of change) of the Inverse DCT (see dct for more details).
A vector with the first derivative of the inverse DCT
    x <- seq(0, 1, length = 10)
    y <- 5 + x + (2 * (x^2)) + (-2 * (x^4))
    dct_coefs <- dct(y)
    y_rate <- idct_rate(dct_coefs)
    plot(y)
    plot(y_rate)
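The idea behind taking a derivative of an inverse DCT can be sketched in base R: because the inverse transform is a weighted sum of cosines, its rate of change is the same weighted sum of the cosines' derivatives. The sketch below uses an unscaled cosine basis and fits coefficients with `solve()`, both of which are assumptions made for illustration; `cos_at` and `dcos_at` are hypothetical helpers, and the package's `dct()` and `idct_rate()` may use different scaling and units.

```r
# Cosine basis evaluated at (possibly fractional) positions t in [0, n].
cos_at <- function(t, k, n) {
  cos(pi * outer(t, seq_len(k) - 1) / n)
}

# Derivative of each basis function with respect to t.
dcos_at <- function(t, k, n) {
  m <- seq_len(k) - 1
  -sweep(sin(pi * outer(t, m) / n), 2, pi * m / n, `*`)
}

n <- 10
x <- seq(0, 1, length.out = n)
y <- 5 + x + 2 * x^2 - 2 * x^4
t_mid <- seq_len(n) - 0.5                        # sample midpoints

# Coefficients of y on the full n x n cosine basis.
coefs <- solve(cos_at(t_mid, n, n), y)

y_back <- drop(cos_at(t_mid, n, n) %*% coefs)    # inverse transform
y_rate <- drop(dcos_at(t_mid, n, n) %*% coefs)   # first derivative, per sample
```

Here `y_rate` is expressed per sample step; differentiating the basis a second time in the same way would give the acceleration.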
Convert Mel to Hz
    mel_to_hz(mel, htk = FALSE)

| Argument | Description |
|---|---|
| mel | Numeric values in Mel |
| htk | Whether or not to use the HTK formula |

This is a direct re-implementation of the mel_to_hz function from the librosa library.
The default is the method due to Slaney (1998), which is linear below 1000 Hz and logarithmic above.
If htk = TRUE, the method from HTK, due to O'Shaughnessy (1987), is used.
A numeric vector of Hz values
McFee, B., C. Raffel, D. Liang, D. PW Ellis, M. McVicar, E. Battenberg, and O. Nieto (2015). librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, pp. 18-25.
O'Shaughnessy, D (1987). Speech communication: human and machine. Addison-Wesley. p. 150. ISBN 978-0-201-16520-3.
Slaney, M. (1998) Auditory Toolbox: A MATLAB Toolbox for Auditory Modeling Work. Technical Report, version 2, Interval Research Corporation.
    mel_to_hz(c(7.5, 15, 25, 31))
Bark Difference Normalize
    norm_barkz(.data, ..., .by = NULL, .drop_orig = FALSE, .keep_params = FALSE, .names = "{.formant}_bz", .silent = opt("tidynorm.silent"))

| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .by | |
| .drop_orig | Whether or not to drop the original formant data columns. |
| .keep_params | Whether or not to keep the Location ( |
| .names | A |
| .silent | Suppress normalization information messages when running a |

This is a within-token normalization technique. First all formants are converted to Bark (see hz_to_bark), then, within each token, F3 is subtracted from F1 and F2.
A data frame of Bark Difference normalized formant values
Syrdal, A. K., & Gopal, H. S. (1986). A perceptual model of vowel recognition based on the auditory representation of American English vowels. The Journal of the Acoustical Society of America, 79(4), 1086–1100. doi:10.1121/1.393381
    library(tidynorm)
    ggplot2_inst <- require(ggplot2)
    speaker_data_barkz <- speaker_data |>
      norm_barkz(F1:F3, .by = speaker, .names = "{.formant}_bz")
    if (ggplot2_inst) {
      ggplot(speaker_data_barkz, aes(F2_bz, F1_bz, color = speaker)) +
        stat_density_2d(bins = 4) +
        scale_color_brewer(palette = "Dark2") +
        scale_x_reverse() +
        scale_y_reverse() +
        coord_fixed()
    }
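The within-token procedure described for norm_barkz can be sketched in base R without the package. The Bark conversion here assumes the basic Traunmüller (1990) formula, and the helper `to_bark` and the toy formant values are hypothetical. How the package treats the F3 column in its output is not shown here.

```r
# Assumed Bark conversion (basic Traunmüller formula, no end corrections).
to_bark <- function(hz) (26.81 * hz) / (1960 + hz) - 0.53

# Two made-up tokens, one per row.
tokens <- data.frame(
  F1 = c(300, 700),
  F2 = c(2300, 1200),
  F3 = c(3000, 2600)
)

# Step 1: convert every formant to Bark.
bark <- as.data.frame(lapply(tokens, to_bark))

# Step 2: within each token, subtract F3 from F1 and from F2.
tokens$F1_bz <- bark$F1 - bark$F3
tokens$F2_bz <- bark$F2 - bark$F3
```

Since each token is normalized against its own F3, no speaker-level statistics are needed for this method.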
Bark Difference DCT Normalization
    norm_dct_barkz(.data, ..., .token_id_col, .by = NULL, .param_col = NULL, .drop_orig = FALSE, .names = "{.formant}_bz", .silent = opt("tidynorm.silent"))

| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .param_col | A column identifying the DCT parameter number. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a |

Important: This function assumes that the DCT coefficients were estimated over bark-transformed formant values.
This is a within-token normalization technique. First all formants are converted to Bark (see hz_to_bark), then, within each token, F3 is subtracted from F1 and F2.
A data frame of Bark Difference normalized DCT coefficients.
Syrdal, A. K., & Gopal, H. S. (1986). A perceptual model of vowel recognition based on the auditory representation of American English vowels. The Journal of the Acoustical Society of America, 79(4), 1086–1100. doi:10.1121/1.393381
    library(tidynorm)
    library(dplyr)
    ggplot2_inst <- require(ggplot2)
    speaker_dct <- speaker_tracks |>
      mutate(across(F1:F3, hz_to_bark)) |>
      reframe_with_dct(F1:F3, .by = speaker, .token_id_col = id, .time_col = t)
    # Normalize DCT coefficients
    speaker_dct_norm <- speaker_dct |>
      norm_dct_barkz(F1:F3, .by = speaker, .token_id_col = id, .param_col = .param)
    # Average the coefficients, then apply the inverse DCT to plot tracks
    track_norm_means <- speaker_dct_norm |>
      summarise(.by = c(speaker, vowel, .param), across(ends_with("_bz"), mean)) |>
      reframe_with_idct(ends_with("_bz"), .by = speaker, .token_id_col = vowel, .param_col = .param)
    if (ggplot2_inst) {
      track_norm_means |>
        ggplot(aes(F2_bz, F1_bz, color = speaker)) +
        geom_path(aes(group = interaction(speaker, vowel))) +
        scale_x_reverse() +
        scale_y_reverse() +
        scale_color_brewer(palette = "Dark2") +
        coord_fixed()
    }
Delta F DCT Normalization
    norm_dct_deltaF(.data, ..., .token_id_col, .by = NULL, .param_col = NULL, .drop_orig = FALSE, .names = "{.formant}_df", .silent = opt("tidynorm.silent"))

| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .param_col | A column identifying the DCT parameter number. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a |

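$$\hat{F}_{ij} = \frac{F_{ij}}{S}$$
$$S = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\frac{F_{ij}}{i - 0.5}$$
Where
$\hat{F}$ is the normalized formant
$i$ is the formant number (of $M$ formants)
$j$ is the token number (of $N$ tokens)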
A data frame of Delta F normalized DCT coefficients.
Johnson, K. (2020). The ΔF method of vocal tract length normalization for vowels. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 11(1), Article 1. doi:10.5334/labphon.196
    library(tidynorm)
    library(dplyr)
    ggplot2_inst <- require(ggplot2)
    speaker_dct <- speaker_tracks |>
      reframe_with_dct(F1:F3, .by = speaker, .token_id_col = id, .time_col = t)
    # Normalize DCT coefficients
    speaker_dct_norm <- speaker_dct |>
      norm_dct_deltaF(F1:F3, .by = speaker, .token_id_col = id, .param_col = .param)
    # Average the coefficients, then apply the inverse DCT to plot tracks
    track_norm_means <- speaker_dct_norm |>
      summarise(.by = c(speaker, vowel, .param), across(ends_with("_df"), mean)) |>
      reframe_with_idct(ends_with("_df"), .by = speaker, .token_id_col = vowel, .param_col = .param)
    if (ggplot2_inst) {
      track_norm_means |>
        ggplot(aes(F2_df, F1_df, color = speaker)) +
        geom_path(aes(group = interaction(speaker, vowel))) +
        scale_x_reverse() +
        scale_y_reverse() +
        scale_color_brewer(palette = "Dark2") +
        coord_fixed()
    }
Generic Formant DCT Normalization Procedure
    norm_dct_generic(.data, ..., .token_id_col, .by = NULL, .param_col = NULL, .L = 0, .S = 1/sqrt(2), .by_formant = FALSE, .by_token = FALSE, .names = "{.formant}_n", .silent = opt("tidynorm.silent"), .drop_orig = FALSE, .call = caller_env())

| Argument | Description |
|---|---|
| .data | A data frame of formant DCT coefficients |
| ... | |
| .token_id_col | |
| .by | |
| .param_col | A column identifying the DCT parameter number. |
| .L | An expression defining the location parameter. See Details for more information. |
| .S | An expression defining the scale parameter. See Details for more information. |
| .by_formant | Whether or not the normalization method is formant intrinsic. |
| .by_token | Whether or not the normalization method is token intrinsic. |
| .names | A |
| .silent | Suppress normalization information messages when running a |
| .drop_orig | Should the originally targeted columns be dropped. |
| .call | Used for internal purposes. |

The following norm_dct_* procedures are built on top of norm_dct_generic(): norm_dct_lobanov, norm_dct_nearey, norm_dct_deltaF, norm_dct_wattfab, and norm_dct_barkz.
This will normalize vowel formant data that has already had the Discrete Cosine Transform applied (see dct) with the following procedure:
1. The Location (.L) and Scale (.S) expressions are used to summarize the zeroth DCT coefficients.
2. The resulting location and scale values are used to normalize the DCT coefficients.
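norm_dct_generic normalizes DCT coefficients directly.
If $F_k$ is the $k$th DCT coefficient, the normalization procedure is
$$\hat{F}_k = \frac{F_k - L'}{\sqrt{2}S}$$
$$L' = \begin{cases} L & \text{for } k = 0 \\ 0 & \text{for } k > 0 \end{cases}$$
Rather than requiring users to remember to multiply expressions for $S$ by $\sqrt{2}$, this is done by norm_dct_generic itself, to allow greater parallelism with how norm_generic works.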
Note: If you want to scale values by a constant in the normalization,
you'll need to divide the constant by sqrt(2).
The expressions for calculating $L$ and $S$ can be passed to .L and .S, respectively. Available values for these expressions are
.formant: The original formant value
.formant_num: The number of the formant (e.g. 1 for F1, 2 for F2, etc.)
Along with any data columns from your original data.
DCT normalization requires identifying individual tokens, so there must be a column that
uniquely identifies (or, in combination with a .by grouping, uniquely
identifies) each individual token. This column should be passed to
.token_id_col.
A data frame of normalized DCT coefficients.
    library(tidynorm)
    library(dplyr)
    ggplot2_inst <- require(ggplot2)
    track_subset <- speaker_tracks |>
      filter(
        .by = c(speaker, id),
        if_all(F1:F3, .fns = \(x) mean(is.finite(x)) > 0.9),
        row_number() %% 2 == 1
      )
    track_dcts <- track_subset |>
      reframe_with_dct(F1:F3, .by = speaker, .token_id_col = id, .time_col = t, .order = 3)
    track_norm <- track_dcts |>
      norm_dct_generic(
        F1:F3,
        .token_id_col = id,
        .by = speaker,
        .by_formant = TRUE,
        .L = median(.formant, na.rm = TRUE),
        .S = mad(.formant, na.rm = TRUE),
        .param_col = .param,
        .drop_orig = TRUE,
        .names = "{.formant}_mad"
      )
    head(track_norm)
    full_tracks <- track_norm |>
      summarise(.by = c(speaker, vowel, .param), across(F1_mad:F3_mad, mean)) |>
      reframe_with_idct(F1_mad:F3_mad, .by = c(speaker, vowel), .param_col = .param)
    head(full_tracks)
    if (ggplot2_inst) {
      ggplot(full_tracks, aes(F2_mad, F1_mad, color = speaker)) +
        geom_path(aes(group = interaction(speaker, vowel))) +
        scale_y_reverse() +
        scale_x_reverse() +
        scale_color_brewer(palette = "Dark2") +
        coord_fixed()
    }
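Why DCT coefficients can be normalized directly follows from the transform being linear: subtracting a constant from a track only shifts the zeroth coefficient, and dividing a track by a constant divides every coefficient. The base-R sketch below checks this with a hypothetical `dct_sketch()` that is scaled so the zeroth coefficient equals the track mean. That scaling, and the location and scale values, are assumptions for illustration; tidynorm's own DCT scaling differs by a factor involving sqrt(2), as the Details above describe.

```r
n <- 20
basis <- cos(pi * outer(seq_len(n) - 0.5, 0:4) / n)   # n x 5 cosine basis

# Hypothetical forward transform: coefficient 0 is the mean of the track.
dct_sketch <- function(y) {
  coefs <- drop(crossprod(basis, y)) / n
  coefs[-1] <- 2 * coefs[-1]
  coefs
}

track <- 500 + 100 * sin(seq(0, pi, length.out = n))  # a made-up formant track
L <- 450   # hypothetical location parameter
S <- 80    # hypothetical scale parameter

# Route 1: normalize the track, then transform it.
coefs_of_normalized <- dct_sketch((track - L) / S)

# Route 2: transform the track, then normalize the coefficients.
coefs <- dct_sketch(track)
normalized_coefs <- coefs / S
normalized_coefs[1] <- (coefs[1] - L) / S   # location only affects coefficient 0
```

Both routes give the same coefficients, which is why a location and scale estimated from the zeroth coefficients are enough to normalize whole tracks.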
Lobanov DCT Normalization
    norm_dct_lobanov(.data, ..., .token_id_col, .by = NULL, .param_col = NULL, .names = "{.formant}_z", .silent = opt("tidynorm.silent"), .drop_orig = FALSE)

| Argument | Description |
|---|---|
| .data | A data frame of formant DCT coefficients |
| ... | |
| .token_id_col | |
| .by | |
| .param_col | A column identifying the DCT parameter number. |
| .names | A |
| .silent | Suppress normalization information messages when running a |
| .drop_orig | Should the originally targeted columns be dropped. |

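$$\hat{F}_{ij} = \frac{F_{ij} - L_i}{S_i}$$
$$L_i = \frac{1}{N}\sum_{j=1}^{N} F_{ij}$$
$$S_i = \sqrt{\frac{\sum_{j=1}^{N}(F_{ij} - L_i)^2}{N - 1}}$$
Where
$\hat{F}$ is the normalized formant
$i$ is the formant number
$j$ is the token number (of $N$ tokens)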
A data frame of Lobanov normalized DCT Coefficients.
Lobanov, B. (1971). Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America, 49, 606–608. doi:10.1121/1.1912396
    library(tidynorm)
    library(dplyr)
    ggplot2_inst <- require(ggplot2)
    speaker_dct <- speaker_tracks |>
      reframe_with_dct(F1:F3, .by = speaker, .token_id_col = id, .time_col = t)
    # Normalize DCT coefficients
    speaker_dct_norm <- speaker_dct |>
      norm_dct_lobanov(F1:F3, .by = speaker, .token_id_col = id, .param_col = .param)
    # Average the coefficients, then apply the inverse DCT to plot tracks
    track_norm_means <- speaker_dct_norm |>
      summarise(.by = c(speaker, vowel, .param), across(ends_with("_z"), mean)) |>
      reframe_with_idct(ends_with("_z"), .by = speaker, .token_id_col = vowel, .param_col = .param)
    if (ggplot2_inst) {
      track_norm_means |>
        ggplot(aes(F2_z, F1_z, color = speaker)) +
        geom_path(aes(group = interaction(speaker, vowel))) +
        scale_x_reverse() +
        scale_y_reverse() +
        scale_color_brewer(palette = "Dark2") +
        coord_fixed()
    }
Nearey DCT Normalization
    norm_dct_nearey(.data, ..., .token_id_col, .by = NULL, .by_formant = FALSE, .param_col = NULL, .drop_orig = FALSE, .names = "{.formant}_lm", .silent = opt("tidynorm.silent"))

| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .by_formant | Whether or not the normalization method is formant intrinsic. |
| .param_col | A column identifying the DCT parameter number. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a |

Important: This function assumes that the DCT coefficients were estimated over log-transformed formant values.
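When formant extrinsic:
$$\hat{F}_{ij} = \log(F_{ij}) - L$$
$$L = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\log(F_{ij})$$
When formant intrinsic:
$$\hat{F}_{ij} = \log(F_{ij}) - L_i$$
$$L_i = \frac{1}{N}\sum_{j=1}^{N}\log(F_{ij})$$
Where
$\hat{F}$ is the normalized formant
$i$ is the formant number (of $M$ formants)
$j$ is the token number (of $N$ tokens)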
A data frame of Nearey normalized DCT coefficients
Nearey, T. M. (1978). Phonetic Feature Systems for Vowels [Ph.D.]. University of Alberta.
    library(tidynorm)
    library(dplyr)
    ggplot2_inst <- require(ggplot2)
    speaker_dct <- speaker_tracks |>
      mutate(across(F1:F3, log)) |>
      reframe_with_dct(F1:F3, .by = speaker, .token_id_col = id, .time_col = t)
    # Normalize DCT coefficients
    speaker_dct_norm <- speaker_dct |>
      norm_dct_nearey(F1:F3, .by = speaker, .token_id_col = id, .param_col = .param)
    # Average the coefficients, then apply the inverse DCT to plot tracks
    track_norm_means <- speaker_dct_norm |>
      summarise(.by = c(speaker, vowel, .param), across(ends_with("_lm"), mean)) |>
      reframe_with_idct(ends_with("_lm"), .by = speaker, .token_id_col = vowel, .param_col = .param)
    if (ggplot2_inst) {
      track_norm_means |>
        ggplot(aes(F2_lm, F1_lm, color = speaker)) +
        geom_path(aes(group = interaction(speaker, vowel))) +
        scale_x_reverse() +
        scale_y_reverse() +
        scale_color_brewer(palette = "Dark2") +
        coord_fixed()
    }
Watt and Fabricius DCT normalization
    norm_dct_wattfab(.data, ..., .token_id_col, .by = NULL, .param_col = NULL, .drop_orig = FALSE, .names = "{.formant}_wf", .silent = opt("tidynorm.silent"))

| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .param_col | A column identifying the DCT parameter number. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a |

This is a modified version of the Watt & Fabricius Method. The original method identified point vowels over which F1 and F2 centroids were calculated. The procedure here just identifies centroids by taking the mean of all formant values.
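$$\hat{F}_{ij} = \frac{F_{ij}}{S_i}$$
$$S_i = \frac{1}{N}\sum_{j=1}^{N} F_{ij}$$
Where
$\hat{F}$ is the normalized formant
$i$ is the formant number
$j$ is the token number (of $N$ tokens)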
A data frame of Watt & Fabricius normalized DCT coefficients.
Watt, D., & Fabricius, A. (2002). Evaluation of a technique for improving the mapping of multiple speakers’ vowel spaces in the F1 ~ F2 plane. Leeds Working Papers in Linguistics and Phonetics, 9, 159–173.
    library(tidynorm)
    library(dplyr)
    ggplot2_inst <- require(ggplot2)
    speaker_dct <- speaker_tracks |>
      reframe_with_dct(F1:F3, .by = speaker, .token_id_col = id, .time_col = t)
    # Normalize DCT coefficients
    speaker_dct_norm <- speaker_dct |>
      norm_dct_wattfab(F1:F3, .by = speaker, .token_id_col = id, .param_col = .param)
    # Average the coefficients, then apply the inverse DCT to plot tracks
    track_norm_means <- speaker_dct_norm |>
      summarise(.by = c(speaker, vowel, .param), across(ends_with("_wf"), mean)) |>
      reframe_with_idct(ends_with("_wf"), .by = speaker, .token_id_col = vowel, .param_col = .param)
    if (ggplot2_inst) {
      track_norm_means |>
        ggplot(aes(F2_wf, F1_wf, color = speaker)) +
        geom_path(aes(group = interaction(speaker, vowel))) +
        scale_x_reverse() +
        scale_y_reverse() +
        scale_color_brewer(palette = "Dark2") +
        coord_fixed()
    }
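The modified centroid described above (the mean of all values for each formant) can be sketched on point measurements in base R. The formant values below are made up, and this illustrates only the centroid scaling for one speaker, not the DCT-based implementation.

```r
# Made-up F1 and F2 measurements for one speaker, one token per row.
formants <- cbind(
  F1 = c(300, 500, 700),
  F2 = c(2200, 1500, 1100)
)

# Centroid: the mean of all values for each formant.
centroid <- colMeans(formants)

# Each formant is expressed as a proportion of its own centroid.
formants_wf <- sweep(formants, 2, centroid, `/`)
```

After scaling, each formant's values average to 1, so speakers with different overall formant ranges are placed on a common scale.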
Delta F Normalize
    norm_deltaF(.data, ..., .by = NULL, .by_formant = FALSE, .drop_orig = FALSE, .keep_params = FALSE, .names = "{.formant}_df", .silent = opt("tidynorm.silent"))

| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .by | |
| .by_formant | Ignored by this procedure |
| .drop_orig | Whether or not to drop the original formant data columns. |
| .keep_params | Whether or not to keep the Location ( |
| .names | A |
| .silent | Suppress normalization information messages when running a |

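$$\hat{F}_{ij} = \frac{F_{ij}}{S}$$
$$S = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\frac{F_{ij}}{i - 0.5}$$
Where
$\hat{F}$ is the normalized formant
$i$ is the formant number (of $M$ formants)
$j$ is the token number (of $N$ tokens)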
A data frame of Delta F normalized formant values.
Johnson, K. (2020). The ΔF method of vocal tract length normalization for vowels. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 11(1), Article 1. doi:10.5334/labphon.196
    library(tidynorm)
    ggplot2_inst <- require(ggplot2)
    speaker_data_deltaF <- speaker_data |>
      norm_deltaF(F1:F3, .by = speaker, .names = "{.formant}_df")
    if (ggplot2_inst) {
      ggplot(speaker_data_deltaF, aes(F2_df, F1_df, color = speaker)) +
        stat_density_2d(bins = 4) +
        scale_color_brewer(palette = "Dark2") +
        scale_x_reverse() +
        scale_y_reverse() +
        coord_fixed()
    }
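As a small worked example of the ΔF idea from Johnson (2020), the base-R sketch below estimates a single speaker-level ΔF as the mean of each formant divided by its formant number minus 0.5, then divides all formants by it. The helper `delta_f_norm` and the input values are hypothetical, and this is not the package's implementation.

```r
# Hypothetical helper: formants is a matrix with one token per row and
# columns ordered F1, F2, F3, ... for a single speaker.
delta_f_norm <- function(formants) {
  formant_num <- col(formants)                      # 1 for F1, 2 for F2, ...
  delta_f <- mean(formants / (formant_num - 0.5))   # one value per speaker
  formants / delta_f
}

# An idealized neutral vocal tract with formants at 500, 1500 and 2500 Hz.
neutral <- rbind(c(F1 = 500, F2 = 1500, F3 = 2500))
normalized <- delta_f_norm(neutral)
# ΔF is 1000 Hz here, so the normalized values are 0.5, 1.5 and 2.5.
```

The normalized values are unitless multiples of ΔF, which is what makes them comparable across vocal tract lengths.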
This is a generic normalization procedure with which you can create your own normalization method.
    norm_generic(.data, ..., .by = NULL, .by_formant = FALSE, .by_token = FALSE, .L = 0, .S = 1, .pre_trans = identity, .post_trans = identity, .drop_orig = FALSE, .keep_params = FALSE, .names = "{.formant}_n", .silent = opt("tidynorm.silent"), .call = caller_env())

| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .by | |
| .by_formant | Whether or not the normalization method is formant intrinsic. |
| .by_token | Whether or not the normalization method is token intrinsic. |
| .L | An expression defining the location parameter. See Details for more information. |
| .S | An expression defining the scale parameter. See Details for more information. |
| .pre_trans | A function to apply to formant values before normalization. |
| .post_trans | A function to apply to formant values after normalization. |
| .drop_orig | Whether or not to drop the original formant data columns. |
| .keep_params | Whether or not to keep the Location ( |
| .names | A |
| .silent | Suppress normalization information messages when running a |
| .call | Used for internal purposes. |

The following norm_* procedures are built on top of norm_generic(): norm_lobanov, norm_nearey, norm_deltaF, norm_wattfab, and norm_barkz.
All normalization procedures built on norm_generic produce normalized formant values ($\hat{F}$) by subtracting a location parameter ($L$) and dividing by a scale parameter ($S$).
$$\hat{F} = \frac{F - L}{S}$$
The expressions for calculating $L$ and $S$ can be passed to .L and .S, respectively. Available values for these expressions are
.formant: The original formant value
.formant_num: The number of the formant (e.g. 1 for F1, 2 for F2, etc.)
Along with any data columns from your original data.
To apply any transformations before or after normalization,
you can pass a function to .pre_trans and .post_trans.
If .by_formant is TRUE, normalization will be formant intrinsic.
If .by_formant is FALSE, normalization will be formant extrinsic.
If .by_token is TRUE, normalization will be token intrinsic.
If .by_token is FALSE, normalization will be token extrinsic.
A data frame of normalized formant values
    library(tidynorm)
    library(dplyr)
    speaker_data |>
      norm_generic(
        F1:F3,
        .by = speaker,
        .by_formant = TRUE,
        .L = median(.formant, na.rm = TRUE),
        .S = mad(.formant, na.rm = TRUE),
        .drop_orig = TRUE,
        .names = "{.formant}_mad"
      )
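The location and scale arithmetic that the example above requests (median as location, MAD as scale) can be sketched for a single formant of a single speaker in base R. The helper `norm_sketch` and the F1 values are hypothetical; the package additionally handles grouping, column naming, and the pre- and post-transformations.

```r
# Hypothetical helper: subtract a location summary, divide by a scale summary.
norm_sketch <- function(x, loc_fn, scale_fn) {
  (x - loc_fn(x)) / scale_fn(x)
}

f1 <- c(310, 420, 550, 640, 700, 480)   # made-up F1 values for one speaker

f1_mad <- norm_sketch(
  f1,
  loc_fn = function(x) median(x, na.rm = TRUE),
  scale_fn = function(x) mad(x, na.rm = TRUE)
)
```

After normalization the values have a median of 0 and a MAD of 1, the robust analogue of a z-score.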
Lobanov Normalize
    norm_lobanov(.data, ..., .by = NULL, .by_formant = TRUE, .drop_orig = FALSE, .keep_params = FALSE, .names = "{.formant}_z", .silent = opt("tidynorm.silent"))

| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .by | |
| .by_formant | Ignored by this procedure |
| .drop_orig | Whether or not to drop the original formant data columns. |
| .keep_params | Whether or not to keep the Location ( |
| .names | A |
| .silent | Suppress normalization information messages when running a |

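$$\hat{F}_{ij} = \frac{F_{ij} - L_i}{S_i}$$
$$L_i = \frac{1}{N}\sum_{j=1}^{N} F_{ij}$$
$$S_i = \sqrt{\frac{\sum_{j=1}^{N}(F_{ij} - L_i)^2}{N - 1}}$$
Where
$\hat{F}$ is the normalized formant
$i$ is the formant number
$j$ is the token number (of $N$ tokens)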
A data frame of Lobanov normalized formant values.
Lobanov, B. (1971). Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America, 49, 606–608. doi:10.1121/1.1912396
    library(tidynorm)
    ggplot2_inst <- require(ggplot2)
    speaker_data_lobanov <- speaker_data |>
      norm_lobanov(F1:F3, .by = speaker, .names = "{.formant}_z")
    if (ggplot2_inst) {
      ggplot(speaker_data_lobanov, aes(F2_z, F1_z, color = speaker)) +
        stat_density_2d(bins = 4) +
        scale_color_brewer(palette = "Dark2") +
        scale_x_reverse() +
        scale_y_reverse() +
        coord_fixed()
    }
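Lobanov normalization is a z-score computed separately for each formant within each speaker. A minimal base-R sketch for one formant follows. The data frame and the helper `z_score` are made up for illustration and do not reproduce the package's handling of multiple formants or column naming.

```r
# Made-up F1 values for two speakers, four tokens each.
vowels <- data.frame(
  speaker = rep(c("s01", "s02"), each = 4),
  F1 = c(300, 450, 600, 750, 350, 520, 690, 860)
)

# z-score: subtract the mean, divide by the standard deviation.
z_score <- function(x) (x - mean(x)) / sd(x)

# ave() applies z_score within each speaker and keeps the row order.
vowels$F1_z <- ave(vowels$F1, vowels$speaker, FUN = z_score)
```

Within each speaker the normalized values have mean 0 and standard deviation 1, which removes between-speaker differences in both location and spread.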
Nearey Normalize
norm_nearey(
  .data,
  ...,
  .by = NULL,
  .by_formant = FALSE,
  .drop_orig = FALSE,
  .keep_params = FALSE,
  .names = "{.formant}_lm",
  .silent = opt("tidynorm.silent")
)
| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .by | |
| .by_formant | Whether or not the normalization method is formant intrinsic. |
| .drop_orig | Whether or not to drop the original formant data columns. |
| .keep_params | Whether or not to keep the Location ( |
| .names | A |
| .silent | Suppress normalization information messages when running a norm_*() function. |
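When formant extrinsic:

$$\hat{F}_{ij} = \log(F_{ij}) - L$$

$$L = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \log(F_{ij})$$

When formant intrinsic:

$$\hat{F}_{ij} = \log(F_{ij}) - L_i$$

$$L_i = \frac{1}{N}\sum_{j=1}^{N} \log(F_{ij})$$

Where
$\hat{F}_{ij}$ is the normalized formant
$i$ is the formant number
$j$ is the token number
$M$ is the number of formants and $N$ is the number of tokens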
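The difference between the formant extrinsic and formant intrinsic variants can be illustrated in base R. This is a sketch for a single hypothetical speaker with invented values, assuming the usual log-mean formulation of Nearey normalization; it does not call the package.

```r
# Hypothetical single-speaker data (values invented)
toy <- data.frame(F1 = c(300, 500, 700), F2 = c(2200, 1500, 1100))
logF <- log(as.matrix(toy))

# Formant extrinsic: one log-mean shared by all formants
nearey_ext <- logF - mean(logF)

# Formant intrinsic: a separate log-mean for each formant (each column)
nearey_int <- sweep(logF, 2, colMeans(logF))

nearey_ext
nearey_int
```

In the extrinsic version the distance between log F1 and log F2 within a token is preserved, because the same constant is subtracted from both. In the intrinsic version each formant is centered on its own mean.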
A data frame of Nearey normalized formant values.
Nearey, T. M. (1978). Phonetic Feature Systems for Vowels [Ph.D.]. University of Alberta.
library(tidynorm)
ggplot2_inst <- require(ggplot2)

speaker_data_nearey <- speaker_data |>
  norm_nearey(
    F1:F3,
    .by = speaker,
    .by_formant = FALSE,
    .names = "{.formant}_nearey"
  )

if (ggplot2_inst) {
  ggplot(
    speaker_data_nearey,
    aes(F2_nearey, F1_nearey, color = speaker)
  ) +
    stat_density_2d(bins = 4) +
    scale_color_brewer(palette = "Dark2") +
    scale_x_reverse() +
    scale_y_reverse() +
    coord_fixed() +
    labs(title = "Formant extrinsic")
}

speaker_data_nearey2 <- speaker_data |>
  norm_nearey(
    F1:F3,
    .by = speaker,
    .by_formant = TRUE,
    .names = "{.formant}_nearey"
  )

if (ggplot2_inst) {
  ggplot(
    speaker_data_nearey2,
    aes(F2_nearey, F1_nearey, color = speaker)
  ) +
    stat_density_2d(bins = 4) +
    scale_color_brewer(palette = "Dark2") +
    scale_x_reverse() +
    scale_y_reverse() +
    coord_fixed() +
    labs(title = "Formant intrinsic")
}
Bark Difference Track Normalization
norm_track_barkz(
  .data,
  ...,
  .token_id_col,
  .by = NULL,
  .time_col = NULL,
  .order = 5,
  .return_dct = FALSE,
  .drop_orig = FALSE,
  .names = "{.formant}_bz",
  .silent = opt("tidynorm.silent")
)
| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .time_col | |
| .order | The number of DCT parameters to use. |
| .return_dct | Whether or not the normalized DCT coefficients themselves should be returned. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a norm_*() function. |
This is a within-token normalization technique. First all formants are converted to Bark (see hz_to_bark), then, within each token, F3 is subtracted from F1 and F2.
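The two steps can be sketched in base R for point measurements, where each row is one token with a single F1, F2 and F3 value. The conversion formula below follows Traunmüller (1990); treating it as the formula behind hz_to_bark is an assumption of this sketch, and `hz_to_bark_sketch()` and the data values are hypothetical. How the track version handles a time-varying F3 is not shown here.

```r
# Hz-to-Bark conversion after Traunmüller (1990); assumed, not taken from the package
hz_to_bark_sketch <- function(hz) 26.81 * hz / (1960 + hz) - 0.53

# Hypothetical tokens, one measurement each (values invented)
tokens <- data.frame(
  F1 = c(300, 500, 700),
  F2 = c(2200, 1500, 1100),
  F3 = c(2900, 2600, 2500)
)

# Step 1: convert every formant to Bark
bark <- as.data.frame(lapply(tokens, hz_to_bark_sketch))

# Step 2: within each token (row), subtract F3 from F1 and F2
bark$F1_bz <- bark$F1 - bark$F3
bark$F2_bz <- bark$F2 - bark$F3
bark
```

Since F3 is higher than F1 and F2 in these tokens, both normalized values are negative.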
A data frame of Bark difference normalized formant tracks or, when .return_dct = TRUE, the normalized DCT parameters.
Syrdal, A. K., & Gopal, H. S. (1986). A perceptual model of vowel recognition based on the auditory representation of American English vowels. The Journal of the Acoustical Society of America, 79(4), 1086–1100. doi:10.1121/1.393381
library(tidynorm)
library(dplyr)
ggplot2_inst <- require(ggplot2)

track_subset <- speaker_tracks |>
  filter(
    .by = c(speaker, id),
    if_all(F1:F3, .fns = \(x) mean(is.finite(x)) > 0.9),
    row_number() %% 2 == 1
  )

track_norm <- track_subset |>
  norm_track_barkz(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .drop_orig = TRUE
  )

if (ggplot2_inst) {
  track_norm |>
    ggplot(aes(F2_bz, F1_bz, color = speaker)) +
    stat_density_2d(bins = 4) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}

# returning the DCT coefficients
track_norm_dct <- track_subset |>
  norm_track_barkz(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .drop_orig = TRUE,
    .return_dct = TRUE,
    .names = "{.formant}_bz"
  )

track_norm_means <- track_norm_dct |>
  summarise(
    .by = c(speaker, vowel, .param),
    across(ends_with("_bz"), mean)
  ) |>
  reframe_with_idct(
    ends_with("_bz"),
    .by = speaker,
    .token_id_col = vowel,
    .param_col = .param
  )

if (ggplot2_inst) {
  track_norm_means |>
    ggplot(aes(F2_bz, F1_bz, color = speaker)) +
    geom_path(aes(group = interaction(speaker, vowel))) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}
Delta F Track Normalization
norm_track_deltaF(
  .data,
  ...,
  .token_id_col,
  .by = NULL,
  .time_col = NULL,
  .order = 5,
  .return_dct = FALSE,
  .drop_orig = FALSE,
  .names = "{.formant}_df",
  .silent = opt("tidynorm.silent")
)
| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .time_col | |
| .order | The number of DCT parameters to use. |
| .return_dct | Whether or not the normalized DCT coefficients themselves should be returned. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a norm_*() function. |
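$$\hat{F}_{ij} = \frac{F_{ij}}{S}$$

$$S = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\frac{F_{ij}}{i - 0.5}$$

Where
$\hat{F}_{ij}$ is the normalized formant
$i$ is the formant number
$j$ is the token number
$M$ is the number of formants and $N$ is the number of tokens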
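As a rough illustration of the Delta F idea from Johnson (2020), each formant value is divided by its formant number minus 0.5, these quotients are averaged over all formants and tokens of a speaker, and every formant is then divided by that average. The base R sketch below uses invented values for a single hypothetical speaker and does not call the package.

```r
# Hypothetical single-speaker data (values invented)
toy <- data.frame(
  F1 = c(300, 500, 700),
  F2 = c(2200, 1500, 1100),
  F3 = c(2900, 2600, 2500)
)
Fm <- as.matrix(toy)

# col() gives the column index of every cell: 1 for F1, 2 for F2, 3 for F3
formant_num <- col(Fm)

# Delta F: the mean of F_i / (i - 0.5) over all formants and tokens
delta_f <- mean(Fm / (formant_num - 0.5))

# Normalized values are the original formants divided by Delta F
toy_df <- Fm / delta_f
toy_df
```

For a neutral vocal tract with formants at 500, 1500 and 2500 Hz, every quotient equals 1000, so Delta F is 1000 Hz and the normalized formants are 0.5, 1.5 and 2.5.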
A data frame of Delta F normalized formant tracks.
Johnson, K. (2020). The ΔF method of vocal tract length normalization for vowels.
Laboratory Phonology: Journal of the Association for Laboratory Phonology, 11(1),
Article 1. doi:10.5334/labphon.196
library(tidynorm)
library(dplyr)
ggplot2_inst <- require(ggplot2)

track_subset <- speaker_tracks |>
  filter(
    .by = c(speaker, id),
    if_all(F1:F3, .fns = \(x) mean(is.finite(x)) > 0.9),
    row_number() %% 2 == 1
  )

track_norm <- track_subset |>
  norm_track_deltaF(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .drop_orig = TRUE
  )

if (ggplot2_inst) {
  track_norm |>
    ggplot(aes(F2_df, F1_df, color = speaker)) +
    stat_density_2d(bins = 4) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}

# returning the DCT coefficients
track_norm_dct <- track_subset |>
  norm_track_deltaF(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .drop_orig = TRUE,
    .return_dct = TRUE,
    .names = "{.formant}_df"
  )

track_norm_means <- track_norm_dct |>
  summarise(
    .by = c(speaker, vowel, .param),
    across(ends_with("_df"), mean)
  ) |>
  reframe_with_idct(
    ends_with("_df"),
    .by = speaker,
    .token_id_col = vowel,
    .param_col = .param
  )

if (ggplot2_inst) {
  track_norm_means |>
    ggplot(aes(F2_df, F1_df, color = speaker)) +
    geom_path(aes(group = interaction(speaker, vowel))) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}
Normalize formant tracks using Discrete Cosine Transform normalization
norm_track_generic(
  .data,
  ...,
  .token_id_col,
  .by = NULL,
  .by_formant = FALSE,
  .by_token = FALSE,
  .time_col = NULL,
  .L = 0,
  .S = 1/sqrt(2),
  .pre_trans = identity,
  .post_trans = identity,
  .order = 5,
  .return_dct = FALSE,
  .drop_orig = FALSE,
  .names = "{.formant}_n",
  .silent = opt("tidynorm.silent"),
  .call = caller_env()
)
| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .by_formant | Whether or not the normalization method is formant intrinsic. |
| .by_token | Whether or not the normalization method is token intrinsic |
| .time_col | |
| .L | An expression defining the location parameter. See Details for more information. |
| .S | An expression defining the scale parameter. See Details for more information. |
| .pre_trans | A function to apply to formant values before normalization. |
| .post_trans | A function to apply to formant values after normalization. |
| .order | The number of DCT parameters to use. |
| .return_dct | Whether or not the normalized DCT coefficients themselves should be returned. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a norm_*() function. |
| .call | Used for internal purposes. |
The other norm_track_* procedures (norm_track_barkz, norm_track_deltaF, norm_track_lobanov, norm_track_nearey and norm_track_wattfab) were built on top of
norm_track_generic.
This will normalize vowel formant tracks in the following steps:
1. Any .pre_trans transformations will be applied to the formant data.
2. The Discrete Cosine Transform will be applied to the formant data.
3. Location .L and Scale .S expressions will be used to summarize the zeroth DCT coefficients.
4. These location and scale values will be used to normalize the DCT coefficients.
5. If .return_dct = TRUE, these normalized DCT coefficients will be returned. Otherwise, the Inverse Discrete Cosine Transform will be applied to the normalized DCT coefficients.
6. Any .post_trans transformations will be applied.
All normalization procedures built on norm_track_generic work by normalizing
DCT coefficients directly. If $F_k$ is the kth DCT coefficient,
the normalization procedure is

$$\hat{F}_k = \frac{F_k - L'}{\sqrt{2}S}$$

$$L' = \begin{cases} L & \text{for } k = 0 \\ 0 & \text{for } k > 0 \end{cases}$$

Rather than requiring users to remember to multiply expressions for
$S$ by $\sqrt{2}$, this is done by norm_track_generic itself, to allow greater
parallelism with how norm_generic works.
Note: If you want to scale values by a constant in the normalization,
you'll need to divide the constant by sqrt(2). Post-normalization scaling
(e.g. re-scaling to formant-like values) is probably best handled with a
function passed to .post_trans.
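The role of sqrt(2) can be checked numerically with a small base R sketch. It assumes a forward-normalized DCT-II in which the zeroth coefficient is additionally divided by sqrt(2), and it assumes the location is subtracted from the zeroth coefficient only; `dct_sketch()`, `idct_sketch()`, the track and the values of `L` and `S` are all hypothetical and stand in for the package's own functions.

```r
# Forward-normalized DCT-II; the extra sqrt(2) on k = 0 is an assumption of this sketch
dct_sketch <- function(x) {
  N <- length(x)
  j <- 0:(N - 1)
  sapply(0:(N - 1), function(k) {
    z <- if (k == 0) sqrt(2) else 1
    sum(x * cos(pi * k * (2 * j + 1) / (2 * N))) / (z * N)
  })
}

# Matching inverse transform
idct_sketch <- function(y) {
  N <- length(y)
  k <- seq_len(N - 1)
  sapply(0:(N - 1), function(j) {
    sqrt(2) * y[1] + 2 * sum(y[-1] * cos(pi * k * (2 * j + 1) / (2 * N)))
  })
}

track <- c(510, 540, 580, 600, 590, 560) # hypothetical F1 track in Hz
L <- 350 # hypothetical location, on the scale of the zeroth coefficient
S <- 60  # hypothetical scale, on the scale of the zeroth coefficient

y <- dct_sketch(track)
y_norm <- y / (sqrt(2) * S)             # every coefficient is scaled
y_norm[1] <- (y[1] - L) / (sqrt(2) * S) # only the zeroth coefficient is shifted
track_norm <- idct_sketch(y_norm)

# The same result computed directly on the track
direct <- (track - sqrt(2) * L) / (sqrt(2) * S)
all.equal(track_norm, direct)
```

Under these assumptions the zeroth coefficient is the track mean divided by sqrt(2), so a location and scale summarized from zeroth coefficients are both a factor of sqrt(2) smaller than the corresponding values on the formant scale. Multiplying the scale by sqrt(2) during normalization compensates for that.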
The expressions for calculating $L$ and $S$ can be
passed to .L and .S, respectively. Available values for
these expressions are
.formant: The original formant value
.formant_num: The number of the formant (e.g. 1 for F1, 2 for F2, etc.)
Along with any data columns from your original data.
Track normalization requires identifying individual tokens, so there must be a column that
uniquely identifies (or, in combination with a .by grouping, uniquely
identifies) each individual token. This column should be passed to
.token_id_col.
The number of DCT coefficients used is defined by .order. The default
value is 5. Larger numbers will lead to less smoothing, and smaller numbers
will lead to more smoothing.
A data frame of normalized formant tracks.
library(tidynorm)
library(dplyr)
ggplot2_inst <- require(ggplot2)

track_subset <- speaker_tracks |>
  filter(
    .by = c(speaker, id),
    if_all(F1:F3, .fns = \(x) mean(is.finite(x)) > 0.9),
    row_number() %% 2 == 1
  )

track_norm <- track_subset |>
  norm_track_generic(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .by_formant = TRUE,
    .L = median(.formant, na.rm = TRUE),
    .S = mad(.formant, na.rm = TRUE),
    .time_col = t,
    .drop_orig = TRUE,
    .names = "{.formant}_mad"
  )

if (ggplot2_inst) {
  track_norm |>
    ggplot(aes(F2_mad, F1_mad, color = speaker)) +
    stat_density_2d(bins = 4) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}

# returning the DCT coefficients
track_norm_dct <- track_subset |>
  norm_track_generic(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .by_formant = TRUE,
    .L = median(.formant, na.rm = TRUE),
    .S = mad(.formant, na.rm = TRUE),
    .time_col = t,
    .drop_orig = TRUE,
    .return_dct = TRUE,
    .names = "{.formant}_mad"
  )

track_norm_means <- track_norm_dct |>
  summarise(
    .by = c(speaker, vowel, .param),
    across(ends_with("_mad"), mean)
  ) |>
  reframe_with_idct(
    ends_with("_mad"),
    .by = speaker,
    .token_id_col = vowel,
    .param_col = .param
  )

if (ggplot2_inst) {
  track_norm_means |>
    ggplot(aes(F2_mad, F1_mad, color = speaker)) +
    geom_path(aes(group = interaction(speaker, vowel))) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}
Lobanov Track Normalization
norm_track_lobanov(
  .data,
  ...,
  .token_id_col,
  .by = NULL,
  .time_col = NULL,
  .order = 5,
  .return_dct = FALSE,
  .drop_orig = FALSE,
  .names = "{.formant}_z",
  .silent = opt("tidynorm.silent")
)
| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .time_col | |
| .order | The number of DCT parameters to use. |
| .return_dct | Whether or not the normalized DCT coefficients themselves should be returned. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a norm_*() function. |
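$$\hat{F}_{ij} = \frac{F_{ij} - L_i}{S_i}$$

$$L_i = \frac{1}{N}\sum_{j=1}^{N} F_{ij}$$

$$S_i = \sqrt{\frac{\sum_{j=1}^{N}(F_{ij} - L_i)^2}{N - 1}}$$

Where
$\hat{F}_{ij}$ is the normalized formant
$i$ is the formant number
$j$ is the token number
$N$ is the number of tokens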
A data frame of Lobanov normalized formant tracks.
Lobanov, B. (1971). Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America, 49, 606–608.
library(tidynorm)
library(dplyr)
ggplot2_inst <- require(ggplot2)

track_subset <- speaker_tracks |>
  filter(
    .by = c(speaker, id),
    if_all(F1:F3, .fns = \(x) mean(is.finite(x)) > 0.9),
    row_number() %% 2 == 1
  )

track_norm <- track_subset |>
  norm_track_lobanov(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .drop_orig = TRUE
  )

if (ggplot2_inst) {
  track_norm |>
    ggplot(aes(F2_z, F1_z, color = speaker)) +
    stat_density_2d(bins = 4) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}

# returning the DCT coefficients
track_norm_dct <- track_subset |>
  norm_track_lobanov(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .return_dct = TRUE,
    .drop_orig = TRUE,
    .names = "{.formant}_z"
  )

track_norm_means <- track_norm_dct |>
  summarise(
    .by = c(speaker, vowel, .param),
    across(ends_with("_z"), mean)
  ) |>
  reframe_with_idct(
    ends_with("_z"),
    .by = speaker,
    .token_id_col = vowel,
    .param_col = .param
  )

if (ggplot2_inst) {
  track_norm_means |>
    ggplot(aes(F2_z, F1_z, color = speaker)) +
    geom_path(aes(group = interaction(speaker, vowel))) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}
Nearey Track Normalization
norm_track_nearey(
  .data,
  ...,
  .token_id_col,
  .by = NULL,
  .by_formant = FALSE,
  .time_col = NULL,
  .order = 5,
  .return_dct = FALSE,
  .drop_orig = FALSE,
  .names = "{.formant}_lm",
  .silent = opt("tidynorm.silent")
)
| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .by_formant | Whether or not the normalization method is formant intrinsic. |
| .time_col | |
| .order | The number of DCT parameters to use. |
| .return_dct | Whether or not the normalized DCT coefficients themselves should be returned. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a norm_*() function. |
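When formant extrinsic:

$$\hat{F}_{ij} = \log(F_{ij}) - L$$

$$L = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \log(F_{ij})$$

When formant intrinsic:

$$\hat{F}_{ij} = \log(F_{ij}) - L_i$$

$$L_i = \frac{1}{N}\sum_{j=1}^{N} \log(F_{ij})$$

Where
$\hat{F}_{ij}$ is the normalized formant
$i$ is the formant number
$j$ is the token number
$M$ is the number of formants and $N$ is the number of tokens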
A data frame of Nearey normalized formant tracks.
Nearey, T. M. (1978). Phonetic Feature Systems for Vowels [Ph.D.]. University of Alberta.
library(tidynorm)
library(dplyr)
ggplot2_inst <- require(ggplot2)

track_subset <- speaker_tracks |>
  filter(
    .by = c(speaker, id),
    if_all(F1:F3, .fns = \(x) mean(is.finite(x)) > 0.9),
    row_number() %% 2 == 1
  )

track_norm <- track_subset |>
  norm_track_nearey(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .by_formant = TRUE,
    .drop_orig = TRUE
  )

if (ggplot2_inst) {
  track_norm |>
    ggplot(aes(F2_lm, F1_lm, color = speaker)) +
    stat_density_2d(bins = 4) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}

# returning the DCT coefficients
track_norm_dct <- track_subset |>
  norm_track_nearey(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .by_formant = FALSE,
    .drop_orig = TRUE,
    .return_dct = TRUE,
    .names = "{.formant}_lm"
  )

track_norm_means <- track_norm_dct |>
  summarise(
    .by = c(speaker, vowel, .param),
    across(ends_with("_lm"), mean)
  ) |>
  reframe_with_idct(
    ends_with("_lm"),
    .by = speaker,
    .token_id_col = vowel,
    .param_col = .param
  )

if (ggplot2_inst) {
  track_norm_means |>
    ggplot(aes(F2_lm, F1_lm, color = speaker)) +
    geom_path(aes(group = interaction(speaker, vowel))) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}
Watt & Fabricius Track Normalization
norm_track_wattfab(
  .data,
  ...,
  .token_id_col,
  .by = NULL,
  .time_col = NULL,
  .order = 5,
  .return_dct = FALSE,
  .drop_orig = FALSE,
  .names = "{.formant}_wf",
  .silent = opt("tidynorm.silent")
)
| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .token_id_col | |
| .by | |
| .time_col | |
| .order | The number of DCT parameters to use. |
| .return_dct | Whether or not the normalized DCT coefficients themselves should be returned. |
| .drop_orig | Should the originally targeted columns be dropped. |
| .names | A |
| .silent | Suppress normalization information messages when running a norm_*() function. |
This is a modified version of the Watt & Fabricius Method. The original method identified point vowels over which F1 and F2 centroids were calculated. The procedure here just identifies centroids by taking the mean of all formant values.
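$$\hat{F}_{ij} = \frac{F_{ij}}{S_i}$$

$$S_i = \frac{1}{N}\sum_{j=1}^{N} F_{ij}$$

Where
$\hat{F}_{ij}$ is the normalized formant
$i$ is the formant number
$j$ is the token number
$N$ is the number of tokens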
A data frame of Watt & Fabricius normalized formant tracks.
Watt, D., & Fabricius, A. (2002). Evaluation of a technique for improving the mapping of multiple speakers’ vowel spaces in the F1 ~ F2 plane. Leeds Working Papers in Linguistics and Phonetics, 9, 159–173.
library(tidynorm)
library(dplyr)
ggplot2_inst <- require(ggplot2)

track_subset <- speaker_tracks |>
  filter(
    .by = c(speaker, id),
    if_all(F1:F3, .fns = \(x) mean(is.finite(x)) > 0.9),
    row_number() %% 2 == 1
  )

track_norm <- track_subset |>
  norm_track_wattfab(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .drop_orig = TRUE
  )

if (ggplot2_inst) {
  track_norm |>
    ggplot(aes(F2_wf, F1_wf, color = speaker)) +
    stat_density_2d(bins = 4) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}

# returning the DCT coefficients
track_norm_dct <- track_subset |>
  norm_track_wattfab(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .drop_orig = TRUE,
    .return_dct = TRUE,
    .names = "{.formant}_wf"
  )

track_norm_means <- track_norm_dct |>
  summarise(
    .by = c(speaker, vowel, .param),
    across(ends_with("_wf"), mean)
  ) |>
  reframe_with_idct(
    ends_with("_wf"),
    .by = speaker,
    .token_id_col = vowel,
    .param_col = .param
  )

if (ggplot2_inst) {
  track_norm_means |>
    ggplot(aes(F2_wf, F1_wf, color = speaker)) +
    geom_path(aes(group = interaction(speaker, vowel))) +
    scale_x_reverse() +
    scale_y_reverse() +
    scale_color_brewer(palette = "Dark2") +
    coord_fixed()
}
Watt & Fabricius Normalize
norm_wattfab(
  .data,
  ...,
  .by = NULL,
  .by_formant = TRUE,
  .drop_orig = FALSE,
  .keep_params = FALSE,
  .names = "{.formant}_wf",
  .silent = opt("tidynorm.silent")
)
| Argument | Description |
|---|---|
| .data | A data frame containing vowel formant data |
| ... | |
| .by | |
| .by_formant | Ignored by this procedure |
| .drop_orig | Whether or not to drop the original formant data columns. |
| .keep_params | Whether or not to keep the Location ( |
| .names | A |
| .silent | Suppress normalization information messages when running a norm_*() function. |
This is a modified version of the Watt & Fabricius Method. The original method identified point vowels over which F1 and F2 centroids were calculated. The procedure here just identifies centroids by taking the mean of all formant values.
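$$\hat{F}_{ij} = \frac{F_{ij}}{S_i}$$

$$S_i = \frac{1}{N}\sum_{j=1}^{N} F_{ij}$$

Where
$\hat{F}_{ij}$ is the normalized formant
$i$ is the formant number
$j$ is the token number
$N$ is the number of tokens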
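The modified procedure, in which each formant's centroid is simply the mean of all its values, can be sketched in base R by dividing every formant value by its speaker-specific mean. The `toy` data and the helper `wf_centroid()` are hypothetical, with invented values; this is an illustration rather than the package's code.

```r
# Hypothetical toy data: two speakers, three tokens each (values invented)
toy <- data.frame(
  speaker = rep(c("s01", "s02"), each = 3),
  F1 = c(300, 500, 700, 350, 600, 850),
  F2 = c(2200, 1500, 1100, 2500, 1700, 1200)
)

# Divide each value by the centroid, here the mean of all values of that formant
wf_centroid <- function(x) x / mean(x, na.rm = TRUE)

# ave() applies the helper separately within each speaker
toy$F1_wf <- ave(toy$F1, toy$speaker, FUN = wf_centroid)
toy$F2_wf <- ave(toy$F2, toy$speaker, FUN = wf_centroid)
toy
```

After this step, each speaker's normalized formants have a mean of 1, so values above 1 lie above that speaker's centroid and values below 1 lie below it.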
A data frame of Watt & Fabricius normalized formant values.
Watt, D., & Fabricius, A. (2002). Evaluation of a technique for improving the mapping of multiple speakers’ vowel spaces in the F1 ~ F2 plane. Leeds Working Papers in Linguistics and Phonetics, 9, 159–173.
library(tidynorm)
ggplot2_inst <- require(ggplot2)

speaker_data_wattfab <- speaker_data |>
  norm_wattfab(
    F1:F3,
    .by = speaker,
    .names = "{.formant}_wf"
  )

if (ggplot2_inst) {
  ggplot(
    speaker_data_wattfab,
    aes(F2_wf, F1_wf, color = speaker)
  ) +
    stat_density_2d(bins = 4) +
    scale_color_brewer(palette = "Dark2") +
    scale_x_reverse() +
    scale_y_reverse() +
    coord_fixed()
}
Options to control the verbosity of tidynorm functions. The convenience function tidynorm_options() will set these options within your current session. For a common behavior across R sessions, set the environment variables as described below. If you've silenced informational messages and want to double check what normalization steps have been taken, use check_norm().
Option values specific to tidynorm can be
accessed by passing the package name to env.
options::opts(env = "tidynorm")
options::opt(x, default, env = "tidynorm")
| Option | Description | Default | Environment variable |
|---|---|---|---|
| tidynorm.silent | Suppress normalization information messages when running a norm_*() function. | FALSE | R_TIDYNORM_TIDYNORM_SILENT (evaluated if possible, raw string otherwise) |
| tidynorm.warnings | Print warnings from tidynorm functions. | TRUE | R_TIDYNORM_TIDYNORM_WARNINGS (evaluated if possible, raw string otherwise) |
See also: options, getOption, Sys.setenv, Sys.getenv
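As a sketch of the two mechanisms, the option and the environment variable can be set with base R alone. This assumes the package consults R's global options and the environment variables under the names given in this section; the signature of tidynorm_options() is not shown in this section, so it is not used here.

```r
# Silence informational messages for the current session via a global option
old <- options(tidynorm.silent = TRUE)
silent_now <- getOption("tidynorm.silent")
silent_now

# For behavior shared across sessions, the environment variable could instead be
# set in a startup file such as .Renviron; here it is set for this session only
Sys.setenv(R_TIDYNORM_TIDYNORM_SILENT = "TRUE")
Sys.getenv("R_TIDYNORM_TIDYNORM_SILENT")

# Restore the previous option value
options(old)
```

Storing the result of `options()` in `old` makes it easy to put the previous setting back once the quiet section of a script has finished.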
Reframe data columns using the Discrete Cosine Transform
reframe_with_dct(
  .data,
  ...,
  .token_id_col = NULL,
  .by = NULL,
  .time_col = NULL,
  .order = 5
)
| Argument | Description |
|---|---|
| .data | A data frame |
| ... | |
| .token_id_col | |
| .by | |
| .time_col | A time column. |
| .order | The number of DCT parameters to return. If NA, all DCT parameters will be returned. |
This function will tidily apply the Discrete Cosine Transform with forward normalization (see dct for more info) to the targeted columns.
The DCT only works on a by-token basis, so there must be a column that
uniquely identifies (or, in combination with a .by grouping, uniquely
identifies) each individual token. This column should be passed to
.token_id_col.
The number of DCT coefficients to return is defined by .order. The default
value is 5. Larger numbers will lead to less smoothing when the Inverse
DCT is applied (see idct). Smaller numbers will lead to more smoothing.
If NA is passed to .order, all DCT parameters will be returned, which,
when the Inverse DCT is applied, will completely reconstruct the original
data.
An optional .time_col can also be defined to ensure that the data is
correctly arranged by time.
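The effect of the number of coefficients can be sketched in base R. The transform below assumes a forward-normalized DCT-II with the zeroth coefficient additionally divided by sqrt(2); `dct_sketch()`, `idct_sketch()` and the track values are hypothetical and stand in for the package's dct and idct.

```r
# Forward-normalized DCT-II; the sqrt(2) on k = 0 is an assumption of this sketch
dct_sketch <- function(x) {
  N <- length(x)
  j <- 0:(N - 1)
  sapply(0:(N - 1), function(k) {
    z <- if (k == 0) sqrt(2) else 1
    sum(x * cos(pi * k * (2 * j + 1) / (2 * N))) / (z * N)
  })
}

# Inverse transform evaluated at n time points, using only the coefficients in y
idct_sketch <- function(y, n = length(y)) {
  k <- seq_along(y) - 1
  w <- c(sqrt(2), rep(2, length(y) - 1))
  sapply(0:(n - 1), function(j) {
    sum(w * y * cos(pi * k * (2 * j + 1) / (2 * n)))
  })
}

track <- c(510, 560, 575, 620, 590, 605, 570, 540) # hypothetical F1 track in Hz
coefs <- dct_sketch(track)

# All coefficients: the original track is reconstructed
full <- idct_sketch(coefs)

# Only the first 5 coefficients: a smoothed version of the track
smooth <- idct_sketch(coefs[1:5], n = length(track))

round(rbind(track, full, smooth), 1)
```

Dropping the higher coefficients removes the fastest-varying cosine components, which is why a smaller order gives a smoother track while the overall mean is unchanged.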
A data frame with the targeted DCT coefficients, along with two additional columns:
.param: The nth DCT coefficient number
.n: The number of original data values
library(tidynorm)
library(dplyr)

speaker_small <- filter(speaker_tracks, id == 0)

speaker_dct <- reframe_with_dct(
  speaker_small,
  F1:F3,
  .by = speaker,
  .token_id_col = id,
  .time_col = t
)

head(speaker_dct)
Apply a DCT Smooth to the targeted data
reframe_with_dct_smooth(.data, ..., .token_id_col, .by = NULL, .time_col = NULL, .order = 5, .rate = FALSE, .accel = FALSE)
.data |
A data frame |
... |
|
.token_id_col |
|
.by |
|
.time_col |
A time column. |
.order |
The number of DCT parameters to return. If |
.rate |
Whether or not to include the rate of change of signal. |
.accel |
Whether or not to include acceleration of signal. |
This is roughly equivalent to applying reframe_with_dct followed by
reframe_with_idct. As long as the value passed to .order is less than
the length of each token's data, this will result in a smoothed version
of the data.
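The equivalence described above can be sketched by running both routes on the same tokens. This assumes that the output of reframe_with_dct stores each token's original length in a column named `.n` (the name is an assumption here), which is then passed to the `.n` argument of reframe_with_idct so that the two-step result has the same number of points as the input. The names `one_step` and `two_step` are illustrative.

```r
library(tidynorm)
library(dplyr)

speaker_small <- filter(speaker_tracks, id == 0)

# One step: smooth the tracks directly.
one_step <- speaker_small |>
  reframe_with_dct_smooth(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .order = 5
  )

# Two steps: DCT, then IDCT back to each token's original length.
# Assumption: the DCT output has a column named .n holding that length.
two_step <- speaker_small |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .order = 5
  ) |>
  reframe_with_idct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param,
    .n = .n
  )
```

The two results may differ in their extra columns; the one-step version is the more convenient choice when only the smoothed track is needed.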
The DCT only works on a by-token basis, so there must be a column that
uniquely identifies (or, in combination with a .by grouping, uniquely
identifies) each individual token. This column should be passed to
.token_id_col.
The number of DCT coefficients to return is defined by .order. The default
value is 5. Larger numbers will lead to less smoothing when the Inverse
DCT is applied (see idct). Smaller numbers will lead to more smoothing.
If NA is passed to .order, all DCT parameters will be returned, which,
when the Inverse DCT is applied, will completely reconstruct the original
data.
An optional .time_col can also be defined to ensure that the data is
correctly arranged by time.
Additionally, if .time_col is provided, the original time column will
be included in the output.
A data frame where the target columns have been smoothed using the DCT, as well as the signal rate of change and acceleration, if requested.
library(tidynorm)
library(dplyr)
ggplot2_inst <- require(ggplot2)

speaker_small <- filter(
  speaker_tracks,
  id == 0
)

speaker_dct_smooth <- speaker_small |>
  reframe_with_dct_smooth(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .order = 5
  )

if (ggplot2_inst) {
  speaker_small |>
    ggplot(aes(t, F1)) +
    geom_point() +
    facet_wrap(~speaker, scales = "free_x", ncol = 1) +
    labs(title = "Original Data")
}

if (ggplot2_inst) {
  speaker_dct_smooth |>
    ggplot(aes(t, F1)) +
    geom_point() +
    facet_wrap(~speaker, scales = "free_x", ncol = 1) +
    labs(title = "Smoothed Data")
}
Reframe data columns using the Inverse Discrete Cosine Transform
reframe_with_idct(.data, ..., .token_id_col = NULL, .by = NULL, .param_col = NULL, .n = 20, .rate = FALSE, .accel = FALSE)
.data |
A data frame |
... |
|
.token_id_col |
|
.by |
|
.param_col |
A column identifying the DCT parameter number |
.n |
The size of the outcome of the IDCT |
.rate |
Whether or not to include the rate of change of signal. |
.accel |
Whether or not to include acceleration of signal. |
This will apply the Inverse Discrete Cosine Transform to the targeted columns. See idct.
The IDCT only works on a by-token basis, so there must be a column that
uniquely identifies (or, in combination with a .by grouping, uniquely
identifies) each individual token. This column should be passed to
.token_id_col.
The output of the IDCT can be arbitrarily long as defined by the .n
argument. .n can be either an integer or an unquoted data column.
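The reason the output length is free to vary can be sketched in base R: the cosine basis functions are evaluated at normalized time points, so the same coefficients can be rendered on a grid of any size. This sketch assumes a forward-normalized convention in which the first coefficient is the signal mean; the helper name `idct_at` and the coefficient values are made up for illustration, and the package's own scaling (see idct) may differ.

```r
# Evaluate a set of DCT coefficients at n_out equally spaced points
# (assumed forward-normalized convention, not necessarily tidynorm's).
idct_at <- function(coefs, n_out) {
  j <- seq_len(n_out) - 1
  k <- seq_along(coefs) - 1
  basis <- cos(pi * outer(2 * j + 1, k) / (2 * n_out))
  weights <- c(1, rep(2, length(coefs) - 1))
  as.vector(basis %*% (weights * coefs))
}

# Five made-up coefficients for a single token.
coefs <- c(500, -40, 15, -5, 2)

# The same curve sampled coarsely and densely.
track_20 <- idct_at(coefs, 20)
track_100 <- idct_at(coefs, 100)
```

Both outputs trace the same underlying curve and share the same mean; only the sampling density changes.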
The order of the DCT parameters is crucially important. The optional
.param_col will ensure the data is properly arranged.
A data frame with the IDCT of the targeted columns along with an
additional .time column.
A column from 1 to .n by token
library(tidynorm)
library(dplyr)
ggplot2_inst <- require(ggplot2)

speaker_small <- filter(
  speaker_tracks,
  id == 0
)

speaker_dct <- speaker_small |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t,
    .order = 5
  )

speaker_idct <- speaker_dct |>
  reframe_with_idct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param,
    .n = 20
  )

if (ggplot2_inst) {
  speaker_small |>
    mutate(
      .by = c(speaker, id),
      time_index = row_number()
    ) |>
    ggplot(aes(time_index, F1)) +
    geom_point() +
    labs(title = "Original Data")
}

if (ggplot2_inst) {
  speaker_idct |>
    ggplot(aes(.time, F1)) +
    geom_point() +
    labs(title = "DCT Smooth Data")
}
Speaker Data
speaker_data
speaker_data: A data frame with 10,697 rows and 8 columns
Speaker ID column
CMU Dictionary vowel class
Modified Labov-Trager vowel class
IPA-like vowel class
Word that the vowel appeared in
The first, second and third formants, in Hz
Speaker Tracks
speaker_tracks
speaker_tracks: A data frame with 20,000 rows and 9 columns
Speaker ID column
Within speaker id for each token
CMU Dictionary vowel class
Modified Labov-Trager vowel class
IPA-like vowel class
Word that the vowel appeared in
Measurement time point
The first, second and third formants, in Hz
Set tidynorm verbosity
tidynorm_options(.silent = opt("tidynorm.silent"), .warnings = opt("tidynorm.warnings"))
.silent |
Suppress normalization information messages when running a |
.warnings |
Print warnings from tidynorm functions. (Defaults to |
tidynorm_options(.silent = TRUE, .warnings = FALSE)
speaker_data |> norm_generic(F1:F3) -> norm1

tidynorm_options(.silent = FALSE, .warnings = TRUE)
speaker_data |> norm_generic(F1:F3) -> norm2