End of semester lessons

the value of structure

teaching
code
Published

December 16, 2023

Photo by 2y.kang on Unsplash

The end of the semester is near and code assignments are due. From what I have seen, I can only be proud of the whole class: the projects are generally well structured and the code is clean.

We live in a world of search and ChatGPT, which at times makes teaching data science difficult. There is the obvious proliferation of poor code snippets, but at the heart of it all sits the ability to easily search unstructured data.

None of the students have known a world without effective digital search, with Google launching in ~1998. Search dominates everything from laptop menus to smartphones. Incentives to sort or structure data are few and far between in their digital world. Yet, in the absence of effective search, unstructured data is the curse of anyone who needs to curate it - or collaborate on it in a team setting. The assumption that effective search will always be available is also a flawed one.

Recent advances in LLMs now expand search past mere indexing into the domain of context-aware classification and prompt-based interrogation of digital documents. For example, Bing Copilot now offers Retrieval-Augmented Generation for free in its search (where search queries “read” webpages and other documents). However, searching through growing amounts of unstructured data that needs contextualization is predicated on vast amounts of computational power, and it does not provide a way to share these searched collections with others.

I’ve always taken the position that small upfront investments in decreasing cognitive load can vastly improve your ability to search and order data yourself. This is reflected in my data science approach, which provides templates for writing software, managing data and organizing research. These techniques are far from new, as any librarian will tell you. Libraries have long operated on indexed systems: everything needs its place and an index structure so it can be found easily, without search algorithms.
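
To make the idea of "a place for everything, plus an index" concrete, here is a minimal sketch in Python. It is not the template used in class: the folder names, the `TEMPLATE` mapping and the `scaffold_project` function are all hypothetical and chosen purely for illustration.

```python
from pathlib import Path

# Hypothetical layout: folder names and descriptions are illustrative
# assumptions, not the template used in the course.
TEMPLATE = {
    "data-raw": "Untouched source data, never edited by hand.",
    "data": "Cleaned data derived from data-raw.",
    "src": "Reusable functions and scripts.",
    "analysis": "Notebooks and reports that use src and data.",
}


def scaffold_project(root):
    """Create the template folders under root and return their sorted names."""
    root = Path(root)
    index_lines = ["Project index", ""]
    for name, purpose in TEMPLATE.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        # A short README per folder records what belongs there.
        (folder / "README.md").write_text(purpose + "\n", encoding="utf-8")
        index_lines.append(f"{name}/: {purpose}")
    # The top-level index plays the role of a library catalogue.
    (root / "INDEX.md").write_text("\n".join(index_lines) + "\n", encoding="utf-8")
    return sorted(TEMPLATE)
```

Calling `scaffold_project("my-project")` creates the four folders and an `INDEX.md` that says where things live. A collaborator can find the raw data by reading one short file, with no search required.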

This semester the students managed to expand their skills both in data science coding and, most importantly in my opinion, in managing their projects cleanly so they can more easily collaborate with others (and their future selves). I think the suggested approach was a gentle introduction to why structure matters, without going full Marie Kondo. I hope these skills serve them well.

Citation

BibTeX citation:
@online{hufkens2023,
  author = {Hufkens, Koen},
  title = {End of Semester Lessons},
  date = {2023-12-16},
  url = {https://khufkens.com/posts/end-of-semester-and-reflections-on-data-management/},
  langid = {en}
}
For attribution, please cite this work as:
Hufkens, Koen. 2023. “End of Semester Lessons.” December 16, 2023. https://khufkens.com/posts/end-of-semester-and-reflections-on-data-management/.