Automated data coverage statistics

Last week I finished the pre-processing code for aligning and screening the COBECORE digitized records. On Friday I ran the alignment and classification routine on “format 1”, one of the more common data sheet formats in the dataset, which covers the 1950s. Today I processed some of the metadata generated along the way.

Finding empty cells in tables

I previously outlined how dealing with 70K+ scans in the COBECORE project presents an issue when it comes to processing and extracting data. Thanks to template matching, a large part of these issues have been automated away. Yet, even when the data can be extracted, one hurdle remains: empty cells in tables.
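A first-pass check for blank cells can be as simple as thresholding the ink density of each cropped cell image. Below is a minimal sketch of that idea; the function name and threshold values are illustrative, not the actual COBECORE code:

```python
import numpy as np

def cell_is_empty(cell, ink_threshold=128, min_ink_fraction=0.01):
    """Heuristic blank-cell check on a grayscale cell crop (0-255).

    A cell whose fraction of dark ("ink") pixels falls below a small
    cutoff is flagged as empty. Illustrative only, not the COBECORE
    pipeline; real scans need noise filtering and tuned thresholds.
    """
    ink_fraction = (np.asarray(cell) < ink_threshold).mean()
    return ink_fraction < min_ink_fraction

# toy cells standing in for cropped table cells
blank = np.full((32, 32), 255, dtype=np.uint8)   # all-white cell
written = blank.copy()
written[10:20, 5:25] = 0                         # fake pen stroke

print(cell_is_empty(blank), cell_is_empty(written))  # True False
```

Flagging empty cells this way means crowdsourcing volunteers are never asked to transcribe a cell that contains nothing.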

Template matching data tables

With all images scanned and sorted in my COBECORE project, the next step involves transcribing the images into meaningful, machine-readable data. Due to the complexity of the data, such as varied handwriting styles in faded or runny ink, automating this process is very difficult. We will therefore aim to crowdsource the transcription of the data. Yet, large tables are difficult to transcribe, as the location of a value within the table matters, not only the value itself. Mistakes are therefore easily made when transcribing tables as a whole.

Why I code

MODIS hdf data extraction in R (part 2)

In a previous blog post I described how to subset MODIS hdf data. However, that was a rather simple example. Today, a graduate student emailed me for help with a subsetting problem she ran into with the code, or rather the lack of an option in the previous example to extract a region of interest (rather than point data).
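Region-of-interest extraction from a gridded product boils down to converting geographic bounds into array indices. The original post works in R; the sketch below shows the same idea in Python on a toy grid, assuming a regular lat/lon grid with 1D coordinate vectors (the function name and grid are made up for illustration):

```python
import numpy as np

def subset_region(data, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Extract a rectangular region of interest from a gridded layer.

    data: 2D array indexed as [lat, lon]
    lats, lons: 1D coordinate vectors for the grid axes
    """
    # boolean masks selecting rows/columns inside the bounding box
    lat_mask = (lats >= lat_min) & (lats <= lat_max)
    lon_mask = (lons >= lon_min) & (lons <= lon_max)
    # np.ix_ combines the two 1D masks into a rectangular selection
    return data[np.ix_(lat_mask, lon_mask)]

# toy grid standing in for a MODIS layer
lats = np.linspace(60, 50, 11)   # descending latitudes, common in HDF grids
lons = np.linspace(0, 10, 11)
data = np.arange(121).reshape(11, 11)

roi = subset_region(data, lats, lons, 54, 57, 2, 5)
print(roi.shape)  # (4, 4)
```

The same masking logic carries over to R with logical indexing on the latitude and longitude vectors.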


© 2018. All rights reserved.
