Selection Workflow V1 • TrillR

The following vignette has been developed to outline the recording selection process. V1 is the first recording selection process if you want a faster way see V2.This workflow is specific to selecting acoustic recordings for the identification of landbirds in the Northwest Territories, Canada by the Canadian Wildlife Service. This can be modified for your own purposes.

Clear Workspace

rm(list=ls())

Load Required Packages

# Install TrillR if needed
#devtools::install_github("deanrobertevans/TrillR")
library(TrillR)
library(readxl)
library(suncalc)
library(tidyverse)
library(hms)

Set Working Directory

Your working directory will be the location of all of the recordings you want to select from. It is important that recordings are organized in folders within this working directory. Each folder named after the location of the recordings it contains.

setwd("G:\\Deployed_ARU_Recordings")

Set the Location of sox.exe

The TrillR package uses SoX to deal with .wav file manipulation and generating spectrograms. To use many of the functions in this package your first need to direct the package to the location of the sox.exe. The latest release can be downloaded here https://sourceforge.net/projects/sox/files/sox/.

setsox.exe("C:/Users/deane/Desktop/sox-14.4.2/sox.exe")

Get .wav Data

Use the the get.wavs() function to get all recordings and metadata. You can also get the file durations using getDuration=T, however this can be time consuming. If you choose to get file durations then you can filter out recordings that are shorter than a required length (e.g. 180s or 3 min).

data <- get.wavs(start.date = "2017-06-01",end.date = "2017-06-30", getDuration=F, minduration = 180)
# Save file so you don't have to read files again. Especially if you got file durations.
#write.csv(data,"data.csv")

#head(data)

Read in Location Data and Merge

Location data is important if you want to calculate sun time such as sunset and sunrise for selecting recordings. The mergelocations() function will merge this data for you and alert you of any problems.


locationdata <- read_excel("LV.xlsx")

data <- mergelocations(data,locationdata,locationname = "Location", Latitude = "Latitude", Longitude = "Longitude")

Calculate Sunrise/Sunset

Sunrise and Sunset are both used to categorize recordings of interest for interpretation. In order to calculate these our data needs to have coordinates which we did above. For increased speed we will run this function in parallel.

data <- getSunCalcs(data, calc = c("sunrise","sunset"), doParallel=T)

Categorize Recordings

Next is to put recordings into time categories for selection. The idea is that for each location there will be one randomly selected recording for each category. For our purposes lets create 8 time categories. We will divide the the recordings up into two equal halves of the month of June depending on the number of days of recordings for each location. The first half will be early (‘E’) and the second half will be late (L). Then for each early and late June recordings we will select four time periods: an hour before sunrise to an hour after sunrise, an hour after sunrise to three hours after sunrise, three hours after sunrise to five hours after sunrise, and an hour before sunset to an hour after sunset. This results in 8 categories to define:

EE First half of recordings and an hour before sunrise to an hour after sunrise.
LE Second half of recordings and an hour before sunrise to an hour after sunrise.
EM First half of recordings and an hour after sunrise to three hours after sunrise.
LM Second half of recordings and an hour after sunrise to three hours after sunrise.
EL First half of recordings and three hours after sunrise to five hours after sunrise.
LL Second half of recordings and three hours after sunrise to five hours after sunrise.
EN First half of recordings and an hour before sunset to an hour after sunset.
LN Second half of recordings and an hour before sunset to an hour after sunset.

To achive this we can generate columns for start and end dates/times that is precalculated.

data <- data %>%  dplyr::group_by(location) %>% mutate(category=NA) %>% 
  mutate(start.date=min(JDay),end.date=ceiling(mean(c(max(JDay),min(JDay)))),start.time=as_hms(sunset-3600),end.time=as_hms(sunset+3600)) %>%
  categorize("EN",start.date,end.date,start.time,end.time)%>%
  mutate(start.date=ceiling(mean(c(min(JDay),max(JDay))))+1,end.date=max(JDay),start.time=as_hms(sunset-3600),end.time=as_hms(sunset+3600)) %>% 
  categorize("LN",start.date,end.date,start.time,end.time)%>%
  mutate(start.date=min(JDay),end.date=ceiling(mean(c(max(JDay),min(JDay)))),start.time=as_hms(sunrise-3600),end.time=as_hms(sunrise+3600))  %>% 
  categorize("EE",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=ceiling(mean(c(min(JDay),max(JDay))))+1,end.date=max(JDay),start.time=as_hms(sunrise-3600),end.time=as_hms(sunrise+3600))  %>% 
  categorize("LE",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=min(JDay),end.date=ceiling(mean(c(max(JDay),min(JDay)))),start.time=as_hms(sunrise+3601),end.time=as_hms(sunrise+(3600*3)))  %>% 
  categorize("EM",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=ceiling(mean(c(min(JDay),max(JDay))))+1,end.date=max(JDay),start.time=as_hms(sunrise+3601),end.time=as_hms(sunrise+(3600*3)))  %>% 
  categorize("LM",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=min(JDay),end.date=ceiling(mean(c(max(JDay),min(JDay)))),start.time=as_hms(sunrise+(3600*3)+1),end.time=as_hms(sunrise+(3600*5)))  %>% 
  categorize("EL",start.date,end.date,start.time,end.time) %>%
  mutate(start.date=ceiling(mean(c(min(JDay),max(JDay))))+1,end.date=max(JDay),start.time=as_hms(sunrise+(3600*3)+1),end.time=as_hms(sunrise+(3600*5)))  %>% 
  categorize("LL",start.date,end.date,start.time,end.time) %>% select(-start.date,-end.date,-start.time,-end.time)


#### Remove NAs for your final data to select from.
dataselection <- data[!is.na(data$category),]

Spectrogram Generation

Now that we have all available recordings categorized we can create spectrograms to assess weather suitability. This can be done multiple ways but it is recommended to do recording selection in rounds. We can make use of the slice_sample(n = 1) to get a random selection of recordings to check weather suitability.

### Random sample of all eligible recordings (essentially all recordings but reorders them)
roundone <- dataselection %>% group_by(location,category) %>% slice_sample(n = 1)

sox.spectrograms(roundone, duration = list(start = 0, end = 180), doParallel = T) # 3 minute spectrogram

The sox.spectrograms() function will output a recording selection file where you will you can choose a selection by changing the “Selected” Column to yes.

Recording Selection

To do recording selection follow these steps:

Open your recordingselection.csv file and if needed sort by spectrogram file name
Find the first recording in the recordingselection.csv and find the corresponding spectrogram and assess it for weather suitablity. If the recording is good change “Selected” from “No” to “Yes”
Continue for all recordings and once complete read in your selection data:

roundoneresults <- read.csv("selectedrecordings.csv", stringsAsFactors = F)
finalselection <- selectiondata[selectiondata$Selected =="Yes",] ### Just get selected add to a final selection dataframe

Remove all of the recordings for round one from your available dataselection data.frame because they are not suitable or have already been selected and you don’t want to select them again.

dataselection <- remove.files(dataselection, roundoneresults)

Remove all the selected recording category/locations from the dataselection data.frame because you already have a recording for that location and category.

dataselection <- remove.selection(dataselection, finalselection)

Now take another sample and create spectrograms for that sample repeating this process until all recordings have been selected.

roundtwo <- dataselection %>% group_by(location,category) %>% slice_sample(n = 1)

sox.spectrograms(roundtwo, duration = list(start = 0, end = 180), doParallel = T) # 3 minute spectrogram

Clip Selection to Length

After you have determined your final recording selection the final step might be to clip those files to length as they can often be much longer than what is needed for transcription.

sox.clips(finalselection,out.path = file.path(getwd(),"Clipped_Selection"), duration = list(start = 0, end = 180))