Santa Barbara City College Course Outline

MATH 118 - Data Science for All

Disciplines
Computer Science (Masters Required)
Mathematics (Masters Required)
Units: 4.000
Repeatability: 0 - May not be repeated
Introduction to data science using real-world data sets from a variety of disciplines, while also presenting the inherent uncertainties and issues associated with exploring data. Exposes students to foundational statistical concepts and inferential thinking through computational methods in a commonly used programming language such as Python.
48.000-54.000 Total Hours
96.000-108.000 Total Hours
Prerequisite: MATH 107 or equivalent based on SBCC's Assessment Center placement via multiple measures.
Prerequisite or Corequisite: None
Concurrent Corequisite: None
Course Advisories: None
Limitation on Enrollment: None
Course Objectives:
Use expressions, data types, and basic input/output commands to explore and analyze datasets.
Employ programming concepts of looping and if/else statements for decision making and automation of data analysis.
Write programs that process data via simple data structures such as lists and tables.
Write programs that extract data from tables.
Use and develop functions programmatically to work with lists and tables.
Visualize tabulated data using charts and histograms.
Design and implement programs to simulate experiments.
Establish causality and use randomization when building an experiment.
Analyze the probability of the occurrence of an event.
Estimate using percentiles, the bootstrap, and confidence intervals.
Apply regression to a data set.
Use problem decomposition, debugging and code design strategies.
Recognize limitations and issues surrounding data analysis in terms of bias, fairness, ethics and privacy.
Examine limitations of prediction and how new data leads to decision changes.
Student Learning Outcomes
Employ foundational programming concepts such as data types, basic data structures such as lists and tables, functions, looping, decision making, and input/output commands to explore and analyze datasets.
Apply foundational data science concepts including extracting data from tables based on specific criteria, computing summary statistics, creating data visualizations, simulating experiments and probability concepts.
Analyze real-world data sets using a modern programming language, problem decomposition, and code design strategies.
Recognize limitations and issues surrounding data analysis in terms of bias, ethics, establishing causality and privacy.
Course Content:
  1. Foundational statistical and computational techniques, and recognizing how and when to apply them to real scenarios.
  2. Establishing causality and the use of randomization to build an experiment.
  3. Understanding basic programming concepts such as the use of expressions, names, data types, and arrays.
  4. Applying programming skills with arrays, ranges, and tables (sorting and selecting data).
  5. Using data visualization techniques for categorical and numerical data.
  6. Developing functions programmatically to work with tables.
  7. Grouping/classifying data by one variable, including joining tables.
  8. Using iteration (loops) and conditional statements for simulation.
  9. Using tables and simulation to understand core probability concepts.
  10. Exposure to basic models and testing hypotheses.
  11. Comparing distributions, decisions, and uncertainty (visualization and data literacy).
  12. A/B testing and the significance of results.
  13. Estimating by using percentiles, the bootstrap, and confidence intervals.
  14. Modeling by using the normal distribution and understanding sample means.
  15. Predicting by using:
    1. Correlation and linear regression
    2. Method of least squares
    3. Regression inference
  16. Classifying by using classifiers:
    1. Nearest neighbor method
    2. Training and testing
    3. Rows of tables, implementing the classifier
  17. Decision making: how new data leads us to update our predictions and therefore our classifiers.
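As an illustration of the estimation topic above (percentiles, the bootstrap, and confidence intervals), a minimal percentile-bootstrap sketch in Python might look like the following. The sample values and resample count are illustrative assumptions, not course materials:

```python
import random
import statistics

# Hypothetical sample of quarterly percentages (illustrative values only).
sample = [1.5, 1.9, 2.1, 1.1, 1.3, 1.7, 2.0, 1.4, 1.6, 1.8]

random.seed(0)  # make the resamples reproducible

# Resample with replacement many times, recording each resample's mean.
boot_means = []
for _ in range(5000):
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))

# An approximate 95% confidence interval for the mean is the interval
# between the 2.5th and 97.5th percentiles of the bootstrap means.
boot_means.sort()
lower = boot_means[int(0.025 * len(boot_means))]
upper = boot_means[int(0.975 * len(boot_means))]
print(f"Approximate 95% CI for the mean: [{lower:.2f}, {upper:.2f}]")
```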


Methods of Instruction
Discussion
Distance Education
Experiments
Individualized Instruction
Lab
Lecture
Projects

Homework on Unemployment Data

The Federal Reserve Bank of St. Louis publishes data about jobs in the US. This assignment uses that data on unemployment in the United States. There are many ways of defining unemployment, and our dataset includes two notions of the unemployment rate:

  1. Among people who are able to work and are looking for a full-time job, the percentage who can't find a job. This is called the Non-Employment Index, or NEI.
  2. Among people who are able to work and are looking for a full-time job, the percentage who can't find any job or are only working at a part-time job. The latter group is called "Part-Time for Economic Reasons", so the acronym for this index is NEI-PTER. (Economists are great at marketing.)

Using Python in a Jupyter notebook, complete the following:

  1. The data are in a CSV file called unemployment.csv. Load that file into a table called unemployment.
  2. Sort the data in descending order by NEI, naming the sorted table by_nei. Create another table called by_nei_pter that's sorted in descending order by NEI-PTER instead.
  3. Make a table containing the data for the 10 quarters when NEI was greatest. Call that table greatest_nei and sort in descending order of NEI. Note that each row of unemployment represents a quarter.
  4. It's believed that many people became PTER (recall: "Part-Time for Economic Reasons") in the "Great Recession" of 2008-2009. NEI-PTER is the percentage of people who are unemployed (and counted in the NEI) plus the percentage of people who are PTER. Compute an array containing the percentage of people who were PTER in each quarter. (The first element of the array should correspond to the first row of unemployment, and so on.)
  5. Add pter as a column to unemployment (named "PTER") and sort the resulting table by that column in descending order. Call the table by_pter. Try to do this with a single line of code, if you can.
  6. Create a line plot of the PTER over time.

To do this, create a new table called pter_over_time that adds the year array and the pter array to the unemployment table. Label these columns Year and PTER. Then, generate a line plot using one of the table methods you've learned in class.

  7. Were PTER rates high during the Great Recession (that is to say, were PTER rates particularly high in the years 2008 through 2011)? Assign the variable highPTER to True if you think PTER rates were high in this period, and False if you think they weren't.
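The steps above can be sketched in Python with pandas (the course may instead use the Berkeley `datascience` library that accompanies the textbook). Since unemployment.csv is not included here, the sample rows below are made-up stand-ins for the real file's Date, NEI, and NEI-PTER columns:

```python
import pandas as pd

# Made-up sample rows standing in for unemployment.csv (illustrative only).
unemployment = pd.DataFrame({
    "Date": ["1994-01-01", "2008-10-01", "2009-01-01", "2009-04-01"],
    "NEI": [10.0, 8.9, 10.0, 10.7],
    "NEI-PTER": [11.1, 10.4, 11.9, 12.8],
})

# Step 2: sort in descending order by each index.
by_nei = unemployment.sort_values("NEI", ascending=False)
by_nei_pter = unemployment.sort_values("NEI-PTER", ascending=False)

# Step 3: the quarters with the greatest NEI (top 2 of this tiny sample;
# the assignment asks for the top 10 quarters of the real data).
greatest_nei = by_nei.head(2)

# Step 4: PTER percentage per quarter, since NEI-PTER = NEI + PTER.
pter = unemployment["NEI-PTER"] - unemployment["NEI"]

# Step 5: add PTER as a column and sort by it, in a single line.
by_pter = unemployment.assign(PTER=pter).sort_values("PTER", ascending=False)

# Step 6: a line plot of PTER over time (requires matplotlib to display).
# unemployment.assign(PTER=pter).plot(x="Date", y="PTER")
```

With the `datascience` library, the analogous calls are `Table.read_table`, `sort(..., descending=True)`, `take`, `with_column`, and `plot`.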
Assignments:
  1. Appropriate Readings: Students are required to read assigned chapters in the text.
  2. Written Assignments: Students are required to complete miscellaneous written and programming assignments on chapter material. 
  3. Appropriate Outside Assignments: Students are expected to spend a sufficient amount of time outside of class to practice techniques taught during class time, read assigned materials, and complete written and programming homework assignments. Students are required to spend three to four hours in the computer lab or online lab environment each week.
  4. Appropriate Assignments that Demonstrate Critical Thinking: Students must demonstrate basic computational and statistical skills. 
Methods of Evaluation:
  1. A series of lab programming assignments and projects
  2. Quizzes
  3. Written homework assignments
  4. Midterm examinations
  5. A comprehensive final examination requiring demonstration of problem-solving skills using programming and inferential thinking
    Computational and Inferential Thinking: The Foundations of Data Science, Ani Adhikari and John DeNero, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), 2020
11/20/2020
Board of Trustees: 05/27/2021
CAC Approval: 04/19/2021