Planning ahead

I got a lot of flak on Twitter recently for stating that you'd better skip post-doc positions when possible. What I meant is that you should start planning ahead when facing unstable research employment, and aim for more stable employment of any kind (tenure track or otherwise).

Automated data coverage statistics

Last week I finished the pre-processing code for aligning and screening the COBECORE digitized records. On Friday I ran the alignment and classification routine on “format 1”, one of the more common data sheet formats in the dataset, which covers the 1950s. Today I processed some of the metadata produced along the way.
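
The excerpt doesn't show what the metadata looks like, so as a minimal sketch assume each processed sheet yields a record of (year, format, whether alignment succeeded). The record layout and the function name `coverage_by_year` are hypothetical. Summarising such records per year gives a quick view of how much of the 1950s the “format 1” run actually covers:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-sheet metadata record: (year, format_id, aligned_ok).
Record = Tuple[int, str, bool]


def coverage_by_year(
    records: Iterable[Record], fmt: str = "format 1"
) -> Dict[int, Tuple[int, float]]:
    """Per year: number of sheets of format `fmt`, and the fraction that aligned."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # [seen, aligned]
    for year, format_id, aligned_ok in records:
        if format_id != fmt:
            continue  # only summarise the requested sheet format
        counts[year][0] += 1
        counts[year][1] += int(aligned_ok)
    return {year: (seen, ok / seen) for year, (seen, ok) in sorted(counts.items())}


# Toy records, not real COBECORE output.
records = [
    (1951, "format 1", True),
    (1951, "format 1", False),
    (1952, "format 1", True),
    (1952, "format 2", True),  # different format, ignored
]
print(coverage_by_year(records))  # {1951: (2, 0.5), 1952: (1, 1.0)}
```

Years missing from the output are years with no sheets of that format at all, which is often the more interesting gap to report.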

Finding empty cells in tables

I previously outlined how dealing with more than 70K scans in the COBECORE project presents an issue when it comes to processing and extracting data. Thanks to template matching, a large part of these issues has been automated away. Yet, even when the data can be extracted, one hurdle remains: empty cells in tables.
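
One simple way to flag empty cells, sketched here as an assumption rather than the approach the post settles on, is to measure how much “ink” a cell crop contains once the printed table lines along its border are trimmed away. The cell representation (rows of grayscale values from 0 to 255) and the parameters `ink_threshold`, `margin` and `min_ink_fraction` are hypothetical and would need tuning on real scans:

```python
from typing import List

# A grayscale cell crop: rows of pixel intensities, 0 = black, 255 = white.
Image = List[List[int]]


def ink_fraction(cell: Image, ink_threshold: int = 128, margin: int = 2) -> float:
    """Fraction of dark pixels inside the cell, ignoring a border of `margin`
    pixels where the printed table rules usually sit."""
    inner = [row[margin:len(row) - margin] for row in cell[margin:len(cell) - margin]]
    total = sum(len(row) for row in inner)
    if total == 0:
        return 0.0
    dark = sum(1 for row in inner for px in row if px < ink_threshold)
    return dark / total


def is_empty_cell(cell: Image, min_ink_fraction: float = 0.01, **kwargs) -> bool:
    """Call a cell empty when almost none of its interior pixels are dark."""
    return ink_fraction(cell, **kwargs) < min_ink_fraction


def blank_cell(size: int = 10) -> Image:
    """Synthetic white cell framed by black table rules."""
    cell = [[255] * size for _ in range(size)]
    for i in range(size):
        cell[0][i] = cell[-1][i] = 0  # horizontal rules
        cell[i][0] = cell[i][-1] = 0  # vertical rules
    return cell


empty = blank_cell()
written = blank_cell()
for x in range(3, 7):
    written[5][x] = 40  # a short pen stroke inside the cell

print(is_empty_cell(empty))    # True
print(is_empty_cell(written))  # False
```

Trimming the margin matters: without it, the table rules alone would push every cell over the ink threshold.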

Template matching data tables

With all images scanned and sorted in my COBECORE project, the next step involves transcribing the images into meaningful, machine-readable data. Due to the complexity of the data, such as various handwriting styles in faded or runny ink, automating this process is very difficult. We will therefore aim to crowdsource the transcription of the data. Yet, large tables are difficult to transcribe, as the location of a value within the table matters, not only the value itself. As such, mistakes are more easily made when tables are transcribed as a whole.
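
The excerpt stops before the matching step itself. As a hedged illustration of the general technique, not the project's code, the sketch below slides a small template (say, a table corner) over an image and scores each position with zero-mean normalised cross-correlation, returning the best-scoring offset. The image representation and function names are hypothetical; on real scans one would more likely call OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED`, which computes the same score far faster.

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]


def ncc(window: Image, template: Image) -> float:
    """Zero-mean normalised cross-correlation between two equally sized patches."""
    w = [px for row in window for px in row]
    t = [px for row in template for px in row]
    mw, mt = sum(w) / len(w), sum(t) / len(t)
    num = sum((a - mw) * (b - mt) for a, b in zip(w, t))
    den = math.sqrt(sum((a - mw) ** 2 for a in w) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0  # flat patches carry no signal


def match_template(image: Image, template: Image) -> Tuple[int, int, float]:
    """Brute-force search; return (row, col, score) of the best-matching offset."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)  # NCC lies in [-1, 1], so any real score beats this
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best[2]:
                best = (r, c, score)
    return best


# Top-left corner of a printed table: dark rule along the top and left edge.
template = [[0, 0, 0], [0, 255, 255], [0, 255, 255]]
image: List[List[float]] = [[255.0] * 8 for _ in range(8)]
for r in range(3):
    for c in range(3):
        image[4 + r][2 + c] = template[r][c]  # paste the corner at row 4, col 2

row, col, score = match_template(image, template)
print(row, col)  # 4 2
```

Once a few anchor points like this are located on a scan, the known sheet layout can be used to cut the table into individual cells, so each cell can be transcribed on its own with its position already recorded.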


© 2018. All rights reserved.