The following vignette has been developed to outline the recording selection process. This workflow is specific to selecting acoustic recordings for the identification of landbirds in the Northwest Territories, Canada by the Canadian Wildlife Service. This workflow can be modified for your own purposes. V2 is the newly adapted version that makes use of the TrillR App but if you would not like to use the app and do it the old fashioned way see V1.

Clear Workspace

rm(list=ls())

Load Required Packages

# Install TrillR if needed
#devtools::install_github("deanrobertevans/TrillR")
library(TrillR)
library(readxl)
library(suncalc)
library(tidyverse)
library(hms)

Set Working Directory

Your working directory will be the location of all of the recordings you want to select from. It is important that recordings are organized in folders within this working directory. Each folder named after the location of the recordings it contains.

setwd("G:\\Deployed_ARU_Recordings")

Set the Location of sox.exe

The TrillR package uses SoX to deal with .wav file manipulation and generating spectrograms. To use many of the functions in this package your first need to direct the package to the location of the sox.exe. The latest release can be downloaded here https://sourceforge.net/projects/sox/files/sox/.

setsox.exe("C:/Users/deane/Desktop/sox-14.4.2/sox.exe")

Get .wav Data

Use the the get.wavs() function to get all recordings and metadata. You can also get the file durations using getDuration=T, however this can be time consuming. If you choose to get file durations then you can filter out recordings that are shorter than a required length (e.g. 180s or 3 min).

data <- get.wavs(start.date = "2017-06-01",end.date = "2017-06-30", getDuration=F, minduration = 180)
# Save file so you don't have to read files again. Especially if you got file durations.
#write.csv(data,"data.csv")

#head(data)

Read in Location Data and Merge

Location data is important if you want to calculate sun time such as sunset and sunrise for selecting recordings. The mergelocations() function will merge this data for you and alert you of any problems.


locationdata <- read_excel("LV.xlsx")

data <- mergelocations(data,locationdata,locationname = "Location", Latitude = "Latitude", Longitude = "Longitude")

Calculate Sunrise/Sunset

Sunrise and Sunset are both used to categorize recordings of interest for interpretation. In order to calculate these our data needs to have coordinates which we did above. For increased speed we will run this function in parallel.

data <- getSunCalcs(data, calc = c("sunrise","sunset"), doParallel=T)

Categorize Recordings

Next is to put recordings into time categories for selection. The idea is that for each location there will be one randomly selected recording for each category. For our purposes lets create 8 time categories. We will divide the the recordings up into two equal halves of the month of June depending on the number of days of recordings for each location. The first half will be early (‘E’) and the second half will be late (L). Then for each early and late June recordings we will select four time periods: an hour before sunrise to an hour after sunrise, an hour after sunrise to three hours after sunrise, three hours after sunrise to five hours after sunrise, and an hour before sunset to an hour after sunset. This results in 8 categories to define:

  • EE First half of recordings and an hour before sunrise to an hour after sunrise.
  • LE Second half of recordings and an hour before sunrise to an hour after sunrise.
  • EM First half of recordings and an hour after sunrise to three hours after sunrise.
  • LM Second half of recordings and an hour after sunrise to three hours after sunrise.
  • EL First half of recordings and three hours after sunrise to five hours after sunrise.
  • LL Second half of recordings and three hours after sunrise to five hours after sunrise.
  • EN First half of recordings and an hour before sunset to an hour after sunset.
  • LN Second half of recordings and an hour before sunset to an hour after sunset.

To achive this we can generate columns for start and end dates/times that is precalculated.

data <- data %>%  dplyr::group_by(location) %>% mutate(category=NA) %>% 
  mutate(start.date=min(JDay),end.date=ceiling(mean(c(max(JDay),min(JDay)))),start.time=as_hms(sunset-3600),end.time=as_hms(sunset+3600)) %>%
  categorize("EN",start.date,end.date,start.time,end.time)%>%
  mutate(start.date=ceiling(mean(c(min(JDay),max(JDay))))+1,end.date=max(JDay),start.time=as_hms(sunset-3600),end.time=as_hms(sunset+3600)) %>% 
  categorize("LN",start.date,end.date,start.time,end.time)%>%
  mutate(start.date=min(JDay),end.date=ceiling(mean(c(max(JDay),min(JDay)))),start.time=as_hms(sunrise-3600),end.time=as_hms(sunrise+3600))  %>% 
  categorize("EE",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=ceiling(mean(c(min(JDay),max(JDay))))+1,end.date=max(JDay),start.time=as_hms(sunrise-3600),end.time=as_hms(sunrise+3600))  %>% 
  categorize("LE",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=min(JDay),end.date=ceiling(mean(c(max(JDay),min(JDay)))),start.time=as_hms(sunrise+3601),end.time=as_hms(sunrise+(3600*3)))  %>% 
  categorize("EM",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=ceiling(mean(c(min(JDay),max(JDay))))+1,end.date=max(JDay),start.time=as_hms(sunrise+3601),end.time=as_hms(sunrise+(3600*3)))  %>% 
  categorize("LM",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=min(JDay),end.date=ceiling(mean(c(max(JDay),min(JDay)))),start.time=as_hms(sunrise+(3600*3)+1),end.time=as_hms(sunrise+(3600*5)))  %>% 
  categorize("EL",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=ceiling(mean(c(min(JDay),max(JDay))))+1,end.date=max(JDay),start.time=as_hms(sunrise+(3600*3)+1),end.time=as_hms(sunrise+(3600*5)))  %>% 
  categorize("LL",start.date,end.date,start.time,end.time) %>% select(-start.date,-end.date,-start.time,-end.time)


#### Remove NAs for your final data to select from.
dataselection <- data[!is.na(data$category),]

Spectrogram Generation

Now that we have all available recordings categorized we can create spectrograms to assess weather suitability using the TrillR app. This can be done multiple ways but if you have time to spare it is recommended to generate a spectrogram for all selectable recordings to prevent having to multiple rounds of recording selection. Using slice_sample(prop = 1) will randomly order recordings for selection in the TrillR app. The idea is to choose the first suitable recording for interpratation hence the random order even though all available recordings are being sampled.

### Random sample of all eligible recordings (essentially all recordings but reorders them)
sample <- dataselection %>% group_by(location,category) %>% slice_sample(prop = 1)

sox.spectrograms(sample, duration = list(start = 0, end = 180), doParallel = T) # 3 minute spectrogram

The sox.spectrograms() function will output a recording selection file that needs to be input into the TrillR App.

Recording Selection

To do recording selection using the TrillR app follow these steps:

  1. Drag your recordingselection.csv file into the TrillR App or Browse for it using the browse button. The app will check your file for all required columns needed for it to run properly.

  2. If you everything was uploaded correctly you will now be within the main part of the app which will allow you to view and select recordings. For a complete guide on how to use this app please visit the GitHub page here: https://github.com/deanrobertevans/TrillRApp.

  3. Use the app to select your recordings. Recordings are randomly organized so choose the first suitable recording.

  4. Once your selection is done save your selection data using the download button and read it back into R. Check that you have the correct number of recordings. If you are missing selections you may need to repeat the selection process after you have removed already selected recordings.

selectiondata <- read.csv("selectedrecordings.csv", stringsAsFactors = F)
selectiondata <- selectiondata[selectiondata$Selected =="Yes",] ### Just get selected

Clip Selection to Length

After you have determined your recording selection the final step might be to clip those files to length as they can often be much longer than what is needed for transcription.

sox.clips(selectiondata,out.path = file.path(getwd(),"Clipped_Selection"), duration = list(start = 0, end = 180))