Title: | Polygons of Bivariate Density Distributions |
---|---|
Description: | With bivariate data, it is possible to calculate 2-dimensional kernel density estimates that return polygons at given levels of probability. 'densityarea' returns these polygons for analysis, including for calculating their area. |
Authors: | Josef Fruehwald [aut, cre, cph] |
Maintainer: | Josef Fruehwald <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0.9001 |
Built: | 2024-11-09 04:08:51 UTC |
Source: | https://github.com/jofrhwld/densityarea |
A convenience function to get just the areas of density polygons.
density_area( x, y, probs = 0.5, as_sf = FALSE, as_list = FALSE, range_mult = 0.25, rangex = NULL, rangey = NULL, ... )
x, y | Numeric data dimensions |
probs | Probabilities to compute density polygons for |
as_sf | Should the returned values be sf::sf? Defaults to FALSE. |
as_list | Should the returned value be a list? Defaults to FALSE. |
range_mult | A multiplier to the range of x and y |
rangex, rangey | Custom ranges across x and y |
... | Additional arguments to be passed to |
If both rangex and rangey are defined, range_mult will be disregarded.
If only one of rangex and rangey is defined, range_mult will be used to produce the range of the undefined one.
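The fallback rule above can be sketched in base R. This is only an illustration of the described behaviour, not the package's code: `pad_range()` and `resolve_ranges()` are hypothetical helpers, and padding each side by `range_mult` times the width of the data range is an assumption about how the multiplier is applied.

```r
# Hypothetical helper (not part of densityarea): pad a data range on both
# sides by `mult` times its width. The padding rule itself is an assumption.
pad_range <- function(v, mult = 0.25) {
  r <- range(v, na.rm = TRUE)
  r + c(-1, 1) * diff(r) * mult
}

# Hypothetical helper: use a supplied range where there is one, and fall
# back to the padded data range for any dimension left as NULL.
resolve_ranges <- function(x, y, range_mult = 0.25,
                           rangex = NULL, rangey = NULL) {
  list(
    rangex = if (is.null(rangex)) pad_range(x, range_mult) else rangex,
    rangey = if (is.null(rangey)) pad_range(y, range_mult) else rangey
  )
}

x <- c(0, 2, 4)
y <- c(10, 20)

# Only rangex is supplied, so range_mult is applied to y alone.
res <- resolve_ranges(x, y, rangex = c(-5, 5))
print(res)
```

When both ranges are supplied, `range_mult` never enters the computation, which matches the first rule above.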
A list of data frames, if as_list=TRUE, or just a data frame, if as_list=FALSE.
If as_sf=FALSE, the data frame has the following columns:
An integer id for each probability level
The probability level (originally passed to probs)
The area of the HDR polygon
If as_sf=TRUE, the data frame has the following columns:
An integer id for each probability level
The probability level (originally passed to probs)
The sf::st_polygon() of the HDR
The area of the HDR polygon
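To make "the area of the HDR polygon" concrete, here is a minimal base R sketch of the area of a simple polygon computed from its vertices with the shoelace formula. `polygon_area()` is a hypothetical helper written for this illustration; it is not how the package is documented to compute its areas, and it assumes a single ring with no holes.

```r
# Hypothetical helper (not part of densityarea): area of a simple polygon
# from its vertex coordinates, using the shoelace formula.
polygon_area <- function(px, py) {
  n <- length(px)
  stopifnot(n >= 3, length(py) == n)
  # Index of the next vertex, wrapping the last vertex back to the first
  nxt <- c(2:n, 1)
  abs(sum(px * py[nxt] - px[nxt] * py)) / 2
}

# A 2-by-3 rectangle, vertices listed counter-clockwise
rect_area <- polygon_area(c(0, 2, 2, 0), c(0, 0, 3, 3))
print(rect_area)
```

The result does not depend on whether the vertices are listed clockwise or counter-clockwise, because the absolute value is taken at the end.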
    library(densityarea)
    library(dplyr)
    library(sf)

    ggplot2_inst <- require(ggplot2)

    # basic usage
    set.seed(10)
    x <- rnorm(100)
    y <- rnorm(100)

    density_area(x, y, probs = ppoints(50)) -> poly_areas_df

    head(poly_areas_df)

    # Plotting the relationship between probability level and area
    if(ggplot2_inst){
      ggplot(poly_areas_df, aes(prob, area)) +
        geom_line()
    }

    # Tidyverse usage
    data(s01)

    ## Data preprocessing
    s01 |>
      mutate(log_F2 = -log(F2),
             log_F1 = -log(F1)) ->
      s01

    ### Data frame output
    s01 |>
      group_by(name) |>
      reframe(density_area(log_F2, log_F1, probs = ppoints(10))) ->
      s01_areas_df

    if(ggplot2_inst){
      s01_areas_df |>
        ggplot(aes(prob, area)) +
        geom_line()
    }

    ### Including sf output
    s01 |>
      group_by(name) |>
      reframe(density_area(log_F2, log_F1, probs = ppoints(10), as_sf = TRUE)) |>
      st_sf() ->
      s01_areas_sf

    if(ggplot2_inst){
      s01_areas_sf |>
        arrange(desc(prob)) |>
        ggplot() +
        geom_sf(aes(fill = area))
    }
Given numeric vectors x and y, density_polygons() will return a data frame, or a list of data frames, of the polygons defining 2D kernel densities.
density_polygons( x, y, probs = 0.5, as_sf = FALSE, as_list = FALSE, range_mult = 0.25, rangex = NULL, rangey = NULL, ... )
x, y | Numeric data dimensions |
probs | Probabilities to compute density polygons for |
as_sf | Should the returned values be sf::sf? Defaults to FALSE. |
as_list | Should the returned value be a list? Defaults to FALSE. |
range_mult | A multiplier to the range of x and y |
rangex, rangey | Custom ranges across x and y |
... | Additional arguments to be passed to |
When using density_polygons() together with dplyr::summarise(), as_list should be TRUE.
If both rangex and rangey are defined, range_mult will be disregarded.
If only one of rangex and rangey is defined, range_mult will be used to produce the range of the undefined one.
A list of data frames, if as_list=TRUE, or just a data frame, if as_list=FALSE.
If as_sf=FALSE, the data frame has the following columns:
An integer id for each probability level
An integer id for each sub-polygon within a probability level
The probability level (originally passed to probs)
The values along the original x and y dimensions defining the density polygon. These will be renamed to the original input variable names.
The original plotting order of the polygon points, for convenience.
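Because a single probability level can contain several sub-polygons, each ring is identified by the pair of ids rather than by either one alone. The base R sketch below uses a small made-up data frame as a stand-in for this output; the column names `level_id`, `id` and `prob` are taken from the package's own plotting examples, the coordinates are invented, and the plotting-order column is left out for brevity.

```r
# Toy stand-in for density_polygons() output; all values are made up.
poly_df <- data.frame(
  level_id = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  id       = c(1, 1, 1, 2, 2, 2, 1, 1, 1),
  prob     = c(rep(0.8, 6), rep(0.5, 3)),
  x        = c(0, 1, 0, 3, 4, 3, 0.2, 0.6, 0.2),
  y        = c(0, 0, 1, 3, 3, 4, 0.2, 0.2, 0.6)
)

# One key per sub-polygon: combine both ids, as the plotting examples do
# with paste0(level_id, id). A separator keeps multi-digit ids unambiguous.
ring_key <- paste(poly_df$level_id, poly_df$id, sep = "_")
rings <- split(poly_df, ring_key)

# Number of separate rings, and number of rings per probability level
n_rings <- length(rings)
rings_per_level <- tapply(poly_df$id, poly_df$level_id,
                          function(v) length(unique(v)))
print(n_rings)
print(rings_per_level)
```

Grouping by `level_id` alone would join the two rings of the first level into one path, which is why the plotting examples group on both columns.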
If as_sf=TRUE, the data frame has the following columns:
An integer id for each probability level
The probability level (originally passed to probs)
A column of sf::st_polygon()s.
This output will need to be passed to sf::st_sf() to utilize many of the features of sf.
    library(densityarea)
    library(dplyr)
    library(purrr)
    library(sf)

    ggplot2_inst <- require(ggplot2)
    tidyr_inst <- require(tidyr)

    set.seed(10)
    x <- c(rnorm(100))
    y <- c(rnorm(100))

    # ordinary data frame output
    poly_df <- density_polygons(x, y, probs = ppoints(5))

    head(poly_df)

    # It's necessary to specify a grouping factor that combines `level_id` and `id`
    # for cases of multimodal density distributions
    if(ggplot2_inst){
      ggplot(poly_df, aes(x, y)) +
        geom_path(aes(group = paste0(level_id, id),
                      color = prob))
    }

    # sf output
    poly_sf <- density_polygons(x, y, probs = ppoints(5), as_sf = TRUE)

    head(poly_sf)

    # `geom_sf()` is from the `{sf}` package.
    if(ggplot2_inst){
      poly_sf |>
        arrange(desc(prob)) |>
        ggplot() +
        geom_sf(aes(fill = prob))
    }

    # Tidyverse usage
    data(s01)

    # Data transformation
    s01 <- s01 |>
      mutate(log_F1 = -log(F1),
             log_F2 = -log(F2))

    ## Basic usage with `dplyr::reframe()`

    ### Data frame output
    s01 |>
      group_by(name) |>
      reframe(density_polygons(log_F2, log_F1, probs = ppoints(5))) ->
      speaker_poly_df

    if(ggplot2_inst){
      speaker_poly_df |>
        ggplot(aes(log_F2, log_F1)) +
        geom_path(aes(group = paste0(level_id, id),
                      color = prob)) +
        coord_fixed()
    }

    ### sf output
    s01 |>
      group_by(name) |>
      reframe(density_polygons(log_F2, log_F1, probs = ppoints(5), as_sf = TRUE)) |>
      st_sf() ->
      speaker_poly_sf

    if(ggplot2_inst){
      speaker_poly_sf |>
        ggplot() +
        geom_sf(aes(color = prob),
                fill = NA)
    }

    ## basic usage with dplyr::summarise()

    ### data frame output
    if(tidyr_inst){
      s01 |>
        group_by(name) |>
        summarise(poly = density_polygons(log_F2,
                                          log_F1,
                                          probs = ppoints(5),
                                          as_list = TRUE)) |>
        unnest(poly) ->
        speaker_poly_df
    }

    ### sf output
    if(tidyr_inst){
      s01 |>
        group_by(name) |>
        summarise(poly = density_polygons(
          log_F2,
          log_F1,
          probs = ppoints(5),
          as_list = TRUE,
          as_sf = TRUE
        )) |>
        unnest(poly) |>
        st_sf() ->
        speaker_poly_sf
    }
This is the vowel space data from a single speaker, s01, whose audio interview and transcription are part of the Buckeye Corpus (Pitt et al. 2007). The transcript was realigned to the audio using the Montreal Forced Aligner (McAuliffe et al. 2022), and vowel formant data were extracted with FAVE (Rosenfelder et al. 2022).
s01
A dataframe with 4,245 rows and 10 columns
Speaker id
Speaker age (y=young, o=old)
Speaker sex
Word in the transcription
Arpabet transcription of the measured vowel
A modified Labov/Trager notation of the measured vowel
An IPA-like transcription of the measured vowel
The measured F1 frequency (Hz)
The measured F2 frequency (Hz)
The measured vowel duration
McAuliffe, M., Fatchurrahman, M. R., GalaxieT, NTT123, Amogh Gulati, Coles, A., Veaux, C., Eren, E., Mishra, H., Paweł Potrykus, Jung, S., Sereda, T., Mestrou, T., Michaelasocolof, & Vannawillerton. (2022). MontrealCorpusTools/Montreal-Forced-Aligner: Version 2.0.1 (v2.0.1). doi:10.5281/ZENODO.6658586
Pitt, M. A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., & Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release). Department of Psychology, Ohio State University. https://buckeyecorpus.osu.edu/
Rosenfelder, I., Fruehwald, J., Brickhouse, C., Evanini, K., Seyfarth, S., Gorman, K., Prichard, H., & Yuan, J. (2022). FAVE (Forced Alignment and Vowel Extraction) Program Suite v2.0.0. https://github.com/JoFrhwld/FAVE