scientific legitimacy in publishing

Science, and climate science in particular, has always been at the center of what, post US election, is being described as fake news. Fake news or “post-truth” (more honestly, plain lies) has been shaping the discussion around climate change for years. In recent years the scale of fake news has grown, and with it mainstream media outlets have lost authority and trust.

This flood of fake news is at its core a form of obfuscation. Obfuscation aims to hide a true message or signal by increasing the noise fed into the same channel. It clutters the news sphere under the false equivalency that all information sources, regardless of quality, merit equal weight. The tactics that dominate science discussions, fed by fake news and fought out in the public news sphere, are now slowly shifting into the formal academic world of scientific publishing as fake (science) open access journals become more common.

Over the past few years there has been a push for open access journals. Open access journals rely on academics to pay for the final publishing of the journal article, rather than asking exorbitant access fees post publication. Although promising in terms of free access to scientific work, the push for open access has led to a flourishing business of shady journals, facilitated by the publish-or-perish culture in academia. As with fake news, fake academic journals and fake science obfuscate valid research results by increasing the number of low quality research publications one has to wade through.

For example, the journal “Expert Opinion on Environmental Biology” seems like a respectable, if not high-flying, journal with an impact factor of 4.22 (above average in ecology). However, the devil is in the details, as the attached footnote reads:

*Unofficial 2015 Journal Impact Factor was established by dividing the number of articles published in 2013 and 2014 with the number of times they are cited in 2015 based on Google search and the Scholar Citation Index database. If ‘X’ is the total number of articles published in 2013 and 2014, and ‘Y’ is the number of times these articles were cited in indexed journals during 2015 than, impact factor = Y/X

Generally, journals use citation indices, or impact factors, to indicate their visibility within the academic community. Proper journals are mostly listed by the Institute for Scientific Information (currently ISI Web of Knowledge) and summarized in a yearly Science Citation Index report. Most fake journals can’t establish these credentials and therefore trick scientists by publishing fake numbers. (Worse, when searching the web for ISI one easily comes across imposters as well: the service International Scientific Indexing, or ISIndexing.com, the name well chosen, states its aim as “… to increase the visibility and ease of use of open access scientific and scholarly journals.”) Although such a journal might still contain valid and good research, the tactics used do not instill trust.

More alarming than the profiteering from desperate scientists chasing metrics, and the resulting obfuscation, is a recent trend of acquisitions of more respected journals by fake academic publishers. Here the tactic is to buy small legitimate journals and intersperse them with their lesser variety, borrowing trust. Not only will these mergers make it harder to distinguish good from bad journals, they will also increase the chances of low quality peer review, as solid science was never the motive of these predatory publishers. If this is indeed a new trend, the question remains how to safeguard the scientific legitimacy of open access journals, and of science in general, and what format to use.

I would argue that solving the issue of shady open access journals requires even more radical openness in science. If one is required to publish data and code (or at least links describing how to obtain the data from third party sources), it becomes easier to separate quality research from publications containing nothing but random noise.

The time invested in a fake research article becomes significantly larger, discouraging abuse. In addition, it forces people into good data management, as ugly code and data structures reflect badly on the scientist as well. Furthermore, since all pieces of the research are available, it also addresses issues regarding reproducibility and inter-comparison of research results. Finally, I would argue that similar practices could be used in conventional journalism: reporting all raw data used, sources (if not endangering lives) and statistics (if applicable). Transparency is the only way forward in an age of fake news and fake science; a lack of it should be regarded as suspicious.

swath2grid

Recently I needed to convert swath data to gridded data. Most MODIS products come as gridded products which are properly geo-referenced and rectified. However, some low level products are provided as “swath” data, which is the “raw” form when it comes to geo-referencing. Luckily most of these swath products do provide ground control point information to convert them from wobbly sensor output to a gridded, geo-referenced image.

This procedure of converting from swath to gridded data is normally done with the MODIS Reprojection Tool (MRT) software. Here, I provide a few lines of code which do just the same using the community driven Geospatial Data Abstraction Library (GDAL). I would argue that the four lines of true code beat installing the MRT tool any day.

The code is a mashup of a Stack Exchange post and converts MODIS L1B data (or similar) to gridded data, requiring you to specify the file name and the requested scientific data set (SDS). You can find the available SDS using the gdalinfo command, or in the product information sheet. The data is output as a GeoTIFF.

#!/bin/bash
#
# swath to grid conversion for
# MOD04 data, but will work on other
# MODIS L1 / L2 data as well (I think)

# get the reprojection information, stuff it into
# a virtual file (.vrt)
gdal_translate -of VRT "HDF4_EOS:EOS_SWATH:$1:mod04:$2" modis.vrt

# delete the bad ground control points
sed -i '/X="-9.990000000000E+02"/d' modis.vrt

# grab the filename without an extension
filename=$(basename "$1")
filename="${filename%.*}"

# reproject the data using the cleaned up VRT file
gdalwarp -overwrite -of GTiff -tps modis.vrt "${filename}_$2.tif"
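
For example, assuming the script above is saved as swath2grid.sh and made executable, a hypothetical call could look like this (the granule name and SDS name are placeholders; gdalinfo lists the available SDS names in its SUBDATASETS section):

# list the scientific data sets (SDS) contained in the granule
gdalinfo MOD04_L2.A2016001.0000.006.hdf

# convert the Optical_Depth_Land_And_Ocean SDS to a gridded GeoTIFF
./swath2grid.sh MOD04_L2.A2016001.0000.006.hdf Optical_Depth_Land_And_Ocean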


about complacency on politics

After last week’s election of Donald Trump I’ve decided to break what I consider an unspoken rule in science. Namely, that as a young scientist one does not discuss politics openly, nor take a hard line on issues, as doing so potentially jeopardizes your academic career.

However, the US elections will have dire consequences for both domestic and global environmental policy, and for science in general, due to the proposed dismantling of the EPA, overt climate change denial by the president-elect and his anti-science stance. Surprisingly, during the past week the language with regard to Mr. Trump’s victory moved to one of reconciliation, to “acceptance”, to giving Mr. Trump “a chance”.

The ramifications of an unchallenged president-elect and of reconciliation will reverberate around the globe. Hence, as an ecologist, environmentalist and climate scientist I cannot in good conscience stand by and not take a very strong political, but scientifically backed, position.

Science and scientists have an obligation to challenge old, and dangerous, ideas. I admit that I’ve failed by not doing as much as I could outside the academic sphere or various social media echo chambers. In this spirit, I will contest misinformation and lies about climate change and environmental issues. As climate change has morphed into an inherently social problem, it is also my duty to openly support minorities, the poor and the disenfranchised, who are the first to suffer from climate change.

I will not concede by giving a man who has shown a lack of scientific knowledge, integrity and transparency, and who is fueled by misogyny and racism, “a chance”. I will not lower my voice in the years to come, silently accepting the new status quo.

virtualforest.io

Over the summer I created the Virtual Forest project. The aim of the Virtual Forest project is to immersively track the seasons in a forest in the northeastern US, using 360 photography and, once enough data is collected, time lapse movies.

You can track the season in a regular browser, or on a mobile device supporting gyroscope feedback. For a complete immersive experience Virtual Forest supports Google Cardboard and Samsung GearVR. Additional support for VR headsets such as the HTC Vive and Oculus Rift is provided through the A-frame framework.

The project gives everyone the experience of tracking the seasons in a northeastern deciduous forest through telepresence. It allows for leaf peeping from a distance and visualizes otherwise slow ecological processes using time lapse movies. The images gathered will also support research efforts in understanding the timing of changes to the canopy structure. Above all, I hope Virtual Forest will inspire people to venture outdoors and explore a forest in real life, as Virtual Forest remains a proxy for what is a wonderful world.

UPDATE:

virtualforest.io got a nice feature on MotherBoard!

UPDATE 2:

virtualforest.io went viral and hit > 18 000 users in 3 days (and still rising)

UPDATE 3:

The project was featured on HackaDay and the A-frame Blog.

UPDATE 4:

The project was mentioned on the Adafruit Blog.

Parameter estimation in R: a simple stomatal conductance model example

Most modeling approaches use some sort of optimization method to estimate parameters. These parameters are the knobs one turns to tune a statistical or mechanistic model so that it fits the data. Below I give a quick example of parameter estimation (in R) using a stomatal conductance (gs) model.

A collaborator in the lab asked me for a quick introduction to parameter estimation in R in order to optimize a stomatal conductance model. This is the document and code I used to briefly explain the methodology.

In this example a model calculates stomatal conductance based upon environmental variables and measurements of gmax (the maximum diffusive stomatal conductance). The model illustrates nicely how one can use R and an optimizer to estimate the parameters of non-linear models.

The framework I put together estimates parameters using the Generalized Simulated Annealing optimization R package (GenSA). Other optimization packages and methods exist, some worth mentioning being the default ‘optim’ function from the ‘stats’ package and DiffeRential Evolution Adaptive Metropolis (DREAM). However, I found that for smaller projects and quick model development GenSA works really well.

The crude stomatal conductance model as formulated borrows heavily from the model structures described by Jarvis (1976) and White et al. (1999), both succinctly described in Damour et al. (2010). The latter paper describes a variety of stomatal conductance models which could be used with the framework outlined in this example.

In this model example stomatal conductance is modelled as:

gs = gmax * f(CO2) * f(VPD) * f(PAR)

Here the CO2 and VPD (vapour pressure deficit) response curves are modelled using an exponential equation, while the PAR (photosynthetically active radiation) response is modelled using Michaelis-Menten kinetics, and gmax is measured and calculated as defined by Franks & Beerling (2009).

  • f(CO2) = c1 * e^(-c2 * [CO2])

  • f(VPD) = v1 * e^(-v2 * [VPD])

  • f(PAR) = (2000 * PAR) / (p1 + PAR)

Or condensed and written into an R function this gives (note that c1 and v1 both act as multiplicative scaling factors and cannot be estimated separately, so only their product, folded into c1, is fitted; the minus signs are absorbed into the parameters, so c2 and v2 take negative values during the optimization):

gs.model <- function(par, data) {

 # put variables in readable format
 vpd <- data$vpd
 co2 <- data$co2
 par_val <- data$par
 gmax <- data$gmax

 # unfold parameters
 # for clarity
 c1 <- par[1]
 c2 <- par[2]
 v2 <- par[3]
 p1 <- par[4]

 # model formulation
 gs <- gmax * c1 * exp(c2 * co2) * exp(v2 * vpd) * (( 2000 * par_val) / (p1 + par_val))

 # return stomatal conductance
 return(gs)
}

Here, the ‘data’ and ‘par’ variables are a data frame and a vector containing the model drivers and the model parameters, respectively.
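
To make the expected input explicit, a hypothetical ‘data’ object could be structured as below (the column names match those used in gs.model and the cost function; the values are made-up placeholders, not real measurements):

# hypothetical driver data frame; COND holds the observed conductance
data <- data.frame(
 COND = c(0.21, 0.35, 0.28), # observed stomatal conductance
 vpd = c(1.2, 0.8, 1.5),     # vapour pressure deficit
 co2 = c(400, 410, 390),     # CO2 concentration
 par = c(1500, 800, 1200),   # photosynthetically active radiation
 gmax = c(0.4, 0.4, 0.4)     # maximum diffusive conductance
)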

The optimization minimizes a cost function which in this example is defined as the root mean squared error (RMSE) between the observed and the predicted values.

# run model and compare to true values
# returns the RMSE
cost.function <- function(data, par) {
 obs <- as.vector(data$COND)
 pred <- as.vector(gs.model(par = par, data = data))
 s <- (obs - pred)^2
 RMSE <- sqrt(mean(s, na.rm = TRUE))
 return(RMSE)
}

The final optimization will iteratively step through the parameter space, running the cost function (which in turn calls the main model), to find an optimal solution to the problem (i.e. minimize the RMSE). Often starting model parameters and limits to the parameter space are required. Defining the limits of your parameter space well can significantly reduce the time needed to converge upon a solution. The main reason is that the model structure on its own does not have any sense of the physical reality of the world. If measured values can never be lower than 0, it does not make sense to look for a multiplicative parameter in the negative range of the parameter space.

# starting model parameters
par = c(0.65, -0.002, -0.689, 0.05)

# limits to the parameter space
lower <- c(0,-10,-100,0)
upper <- c(100,0,0,100)

# load the GenSA optimization library
library(GenSA)

# optimize the model parameters
optim.par <- GenSA(
 par = par,
 fn = cost.function,
 data = data,
 lower = lower,
 upper = upper,
 control = list(temperature = 4000)
)
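
Once the optimizer returns, the fitted parameters and the minimum of the cost function can be extracted from the result, and predictions can be generated with the fitted model. A minimal sketch, continuing from the code above:

# extract the optimized parameters and the final cost (RMSE)
par_optim <- optim.par$par
rmse_optim <- optim.par$value

# predict stomatal conductance using the fitted parameters
gs_pred <- gs.model(par = par_optim, data = data)

# compare predictions with the observations
plot(data$COND, gs_pred, xlab = "observed gs", ylab = "predicted gs")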

As the data I used in the development of this model aren’t published yet, I can’t include any graphs. However, I think this description can give people a quick introduction to getting started with model development (in R).

References

Damour G., Simonneau T., Cochard H. & Urban L. (2010) An overview of models of stomatal conductance at the leaf level. Plant, Cell & Environment 33, 1419–1438.

Jarvis P.G. (1976) The interpretation of the variations in leaf water potential and stomatal conductance found in canopies in the field. Philosophical Transactions of the Royal Society of London. Series B 273, 593–610.

White D.A., Beadle C., Sands P.J., Worledge D. & Honeysett J.L. (1999) Quantifying the effect of cumulative water stress on stomatal conductance of Eucalyptus globulus and Eucalyptus nitens: a phenomenological approach. Australian Journal of Plant Physiology 26, 17–27.

Franks, P. J. & Beerling, D. J. (2009) Maximum leaf conductance driven by CO2 effects on stomatal size and density over geologic time. Proceedings of the National Academy of Sciences 106, 10343–10347.
