Warm winter weather confuses plants

Warm December weather dominates both the northeastern US and large parts of Europe. On both sides of the ocean, this warm spell is leaving plants equally confused.

Many plants are blooming in New York’s botanical gardens and Boston parks. It’s unlikely that most perennial plants will suffer irreparable damage. However, Belgian fruit farmers fear that an untimely frost on unhardened fruit trees could have serious consequences for both tree vigour and fruit yields (see the movie below).

Plants can withstand frost; however, the continued warm weather in Belgium has left many trees unacclimated to true winter conditions. A sudden return to normal, freezing winter conditions could cause frost damage to tissues otherwise protected by a tree’s natural anti-freeze, e.g. the sugars in living tissue.

With no real frost days in Belgium so far, the effect of this mild winter might even extend into next year’s spring. Many trees need a certain number of chilling days, i.e. days which are sufficiently cold or freezing, to trigger a proper leaf-out response the following spring. High spring temperatures should move leaf development towards earlier start dates. However, a lack of chilling days has been shown to delay this expected response to warmer spring temperatures. Warm winter temperatures therefore not only pose an immediate risk, due to sudden freezing of tissue, but also have delayed consequences which extend into the next growing season.
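As a rough illustration of this chilling requirement, the snippet below counts chilling days as days with a mean temperature below a threshold. Both the 5 °C threshold and the temperature series are assumptions for the sake of the example; chilling definitions differ between species and phenology models.

# minimal sketch: counting chilling days from daily mean temperatures
# the 5 degrees C threshold is an assumption, definitions vary between models

# hypothetical daily mean temperatures (degrees C) for a winter season
daily_temp = c(rnorm(60, mean = 8, sd = 3), rnorm(60, mean = 3, sd = 3))

# a day counts as a chilling day when its mean temperature falls below the threshold
chilling_days = sum(daily_temp < 5)
chilling_days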

(Header image: early leafing Sambucus nigra)

PhenoCam grassland paper accepted

I haven’t made much noise about this, but the grassland study I submitted to Nature Climate Change was accepted. I’m working on the final edits to resubmit after Christmas. I’m rather happy with this outcome. For more details I suggest keeping an eye on the Nature Climate Change portal. I will discuss the work in detail after the press moratorium.

With the upcoming data paper, which presents an extensive dataset of curated PhenoCam data, there might be enough data to do more grassland-related work. Many grassland sites were added where I was rather data-limited before.

Scraping Zooniverse statistics

In order to keep track of the Jungle Rhythms project I wanted some basic summary statistics, as shown on the front page of the project. However, the front-end API of the project does not allow these basic statistics to be pulled from its database. Furthermore, fetching all the project data can only be done once a day (to prevent heavy traffic on the database), which keeps me from generating these statistics myself. Still, I want to keep track of how classifications and users change over time.

So I wrote a web scraper in R which I run every half hour. Since the project page is dynamic, it first renders the page using PhantomJS, then feeds the resulting HTML to the rvest R package to extract all necessary (time-stamped) elements and writes everything to file, appending to the file if it already exists. You can find the code (an R function) below.

#' Grab basic zooniverse statistics from the front page of a project
#' @param url: Location of zooniverse project
#' @param file: the name of the output file to export statistics to
#' @param path: location of the phantomjs binary (system specific)
#' @keywords zooniverse, statistics, web scraping
#' @export
#' @examples
#' # with defaults returns a file called user.stats.csv
#' # for the Jungle Rhythms project
#' zooniverse.info()
#' [requires the rvest package for post-processing]
#' [http://phantomjs.org/download.html]
#' 

zooniverse.info <- function(url="http://www.zooniverse.org/projects/khufkens/jungle-rhythms/home",
                                  file="user.stats.csv",
                                  path="~/your.phanthom.js.location/"){
  
  # read the required libraries
  require(rvest)
  
  # grab current date and time (a time stamp)
  date = format(Sys.Date(),"%Y-%m-%d") 
  time = format(Sys.time(),"%H:%M")
    
  # write out a script phantomjs can process
  # increase the timeout below if the returned page appears empty
  writeLines(sprintf("var page = require('webpage').create();
                     page.open('%s', function (status) {
                     if (status !== 'success') {
                     console.log('Unable to load the address!');
                     phantom.exit();
                     } else {
                     window.setTimeout(function () {
                     console.log(page.content);
                     phantom.exit();
                     }, 3000); // Change timeout to render page
                     }
                     });", url), con="scrape.js")

  # process the script with phantomjs / scrapes zooniverse page
  system(sprintf("%s/./phantomjs scrape.js > scrape.html",path),wait=TRUE)
  
  # load the retrieved rendered javascript page
  main = read_html("scrape.html")
  
  # set html element selector (which html fields to retrieve)
  sel = '.project-metadata-stat div'
  
  # process the html file using selection and render as text
  data = html_nodes(main,sel) %>% html_text()
  
  # if data is retrieved, append it to the output file
  # if this fails, you most likely need more time to render
  # the page (see timeout above)
  if (!identical(data, character(0))){
    
    # kick out description fields and convert to numeric
    data = as.numeric(data[-c(2,4,6,8)]) 
    
    # merge into dataframe
    data = data.frame(date, time, t(data))
    colnames(data) = c('date','time','registered_users',
                       'classifications','subjects','retired_subjects')
    
    # append stats with the current date and time
    # to an already existing data file
    if (file.exists(file)){
      write.table(data,file,quote=F,row.names=F,col.names=F,append=T)
    }else{
      write.table(data,file,quote=F,row.names=F,col.names=T)
    }
  }
  
  # remove html file and javascript
  file.remove("scrape.html")
  file.remove("scrape.js")
}
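A minimal usage sketch, assuming the function above is saved as zooniverse.info.R and that PhantomJS lives in a placeholder location; the resulting space-separated file can be read back with read.table(). Scheduling the call every half hour (e.g. via cron) reproduces the setup described above.

# source the function and scrape the current project statistics
# (the phantomjs path is a placeholder, adjust it to your install)
source("zooniverse.info.R")
zooniverse.info(path = "~/phantomjs/bin/")

# read back the accumulated statistics
stats = read.table("user.stats.csv", header = TRUE)
head(stats)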

 

Scraping Ameriflux site info

On the flight home from AGU 2015 I realized that the same code I used to scrape Zooniverse statistics could easily be adapted to grab the site summary data from the Ameriflux LBL page. As with the Zooniverse code, it relies on an external PhantomJS binary.

The function returns a data frame with all scraped data (site names, latitude/longitude, elevation, etc.). Any errors in the table (mainly in the start and end dates) stem from the original data, not from the conversion.

I’ll use this function in combination with my Ameriflux download tool to provide easier subsetting of the data. Keep an eye on my blog for upcoming updates to the download tool.

#' Grabs the ameriflux site table from the LBL site
#' @param url: Location of the Ameriflux site table
#' @param path: location of the phantomjs binary (system specific)
#' @keywords Ameriflux, sites, locations, web scraping
#' @export
#' @examples
#' # with defaults, outputting a data frame
#' df <- ameriflux.info()
#' [requires the rvest package for post-processing]
#' http://phantomjs.org/download.html

ameriflux.info <- function(url="http://ameriflux.lbl.gov/sites/site-list-and-pages/",
                           path="~/my.phantom.js.path/"){
  
  # read the required libraries
  require(rvest)
  
  # subroutines for trimming leading spaces
  # and converting factors to numeric
  trim.leading <- function (x)  sub("^\\s+", "", x)
  as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
  
  # write out a script phantomjs can process
  # increase the timeout below if the returned page appears empty
  writeLines(sprintf("var page = require('webpage').create();
                     page.open('%s', function (status) {
                     if (status !== 'success') {
                     console.log('Unable to load the address!');
                     phantom.exit();
                     } else {
                     window.setTimeout(function () {
                     console.log(page.content);
                     phantom.exit();
                     }, 3000); // Change timeout to render the page
                     }
                     });", url), con="scrape.js")
  
  # process the script with phantomjs / scrapes the Ameriflux page
  system(sprintf("%s/./phantomjs scrape.js > scrape.html",path),wait=TRUE)
  
  # load html data
  main = read_html("scrape.html")
  
  # set html element selector for the header
  sel_header = 'thead'
  
  # Extract the header data from the html file
  header = html_nodes(main,sel_header) %>% html_text()
  header = unlist(strsplit(header,"\\n"))
  header = unlist(lapply(header,trim.leading))
  header = header[-which(header == "")]
  
  # set html element selector for the table
  sel_data = 'td'
  
  # process the html file and extract stats
  data = html_nodes(main,sel_data) %>% html_text()
  data = matrix(data,length(data)/length(header),length(header),byrow=TRUE)
  df = data.frame(data)
  colnames(df) = header
  
  # reformat variables into correct formats (not strings)
  # this is ugly, needs cleaning up
  df$SITE_ID = as.character(df$SITE_ID)
  df$SITE_NAME = as.character(df$SITE_NAME)
  df$TOWER_BEGAN = as.numeric.factor(df$TOWER_BEGAN)
  df$TOWER_END = as.numeric.factor(df$TOWER_END)
  df$LOCATION_LAT = as.numeric.factor(df$LOCATION_LAT)
  df$LOCATION_LONG = as.numeric.factor(df$LOCATION_LONG)
  df$LOCATION_ELEV = as.numeric.factor(df$LOCATION_ELEV)
  df$MAT = as.numeric.factor(df$MAT)
  df$MAP = as.numeric.factor(df$MAP)
  
  # drop double entries
  df = unique(df)
  
  # drop first row (empty)
  df = df[-1,]
  
  # remove temporary html file and javascript
  file.remove("scrape.html")
  file.remove("scrape.js")
  
  # return data frame
  return(df)
}
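As a quick usage sketch: grab the table once and subset it by location. The column names are the ones set in the function above; the bounding box values are arbitrary.

# scrape the full site table (the phantomjs path is a placeholder)
sites = ameriflux.info(path = "~/phantomjs/bin/")

# subset all sites within an arbitrary latitude / longitude box
selection = sites[sites$LOCATION_LAT > 40 & sites$LOCATION_LAT < 50 &
                  sites$LOCATION_LONG > -125 & sites$LOCATION_LONG < -100, ]

# list the site IDs and names retained
selection[, c("SITE_ID", "SITE_NAME")]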

 

One week of classifications

The Jungle Rhythms project has been running for one week and classifications are coming in steadily. Currently, over 8,000 images have been classified by only a limited number of users (218). Unless an army of unregistered users is pushing the effort, a lot of credit goes to a relatively small but dedicated group of citizen scientists, which is rather remarkable. The figure below shows a consistent, steady stream of classifications (almost linear over time, as of 18/12/2015). The x-axis shows the date (and time), the y-axis the total classification count (top panel) and the number of classifications per day (bottom panel). The red vertical bars mark new registered users contributing to the project.
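A figure like this can be drawn from the user.stats.csv file written by the scraper described above; a minimal plotting sketch, assuming only the column names set by zooniverse.info():

# read the scraped statistics (space separated, as written by zooniverse.info())
stats = read.table("user.stats.csv", header = TRUE)

# combine the date and time stamps into proper timestamps
stamp = as.POSIXct(paste(stats$date, stats$time), format = "%Y-%m-%d %H:%M")

# total classification count over time (top panel)
plot(stamp, stats$classifications, type = "l",
     xlab = "date", ylab = "total classifications")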

This week is also the week of the AGU conference, the yearly meeting of geoscientists in San Francisco. I presented the Jungle Rhythms project there and hope this will draw some attention to the project and potentially gather some more contributors.

 
