Document Type
Research paper
Publication Date
4-6-2024
Abstract
Preparing and teaching realistic data science courses requires labor-intensive preparation and course delivery. It is not enough to download data and push buttons on machine learning tools. First, there must be a human expert available in the problem domain to supply data and evaluate work. Without a human expert to provide information that is missing or incorrect in archived data, the tendency is to take the output of machine learning algorithms using potentially faulty data on faith. Second, any real-world data requires custom scripts for correcting invalid values, creating derived attributes, and formatting data for analysis. Then comes the analysis, which is usually iterative because of incremental discoveries, often requiring additional data, expertise, data preparation, and analysis. This case study outlines four domains of data analysis that have been very useful in teaching and student-oriented research: 1) analyzing Java programming student performance as a function of work habits; 2) analyzing physical and chemical relationships in Pennsylvania stream flow data; 3) analyzing audio files for waveform type and noise levels; and 4) analyzing raptor migration counts in Pennsylvania as a function of climate change.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Proceedings of the 39th Annual Pennsylvania Association of Computer and Information Science Educators Conference (PACISE 2024), Kutztown University of PA, April 5 and 6, 2024.