Santa Barbara City College Course Outline

MATH 118 - Data Science for All

Disciplines
Computer Science (Masters Required)
Mathematics (Masters Required)
Units: 4.000
Repeatability: 0 - May not be repeated
Introduction to data science using real-world data sets from a variety of disciplines, while also presenting the inherent uncertainties and issues associated with exploring data. Exposes students to foundational statistical concepts and inferential thinking through computational methods in a commonly used programming language such as Python.
48.000-54.000 Total Hours
96.000-108.000 Total Hours
Prerequisite: MATH 107 or equivalent based on SBCC's Assessment Center placement via multiple measures.
Prerequisite or Corequisite: None
Concurrent Corequisite: None
Course Advisories: None
Limitation on Enrollment: None
Course Objectives:
Use expressions, data types, and basic input/output commands to explore and analyze datasets.
Employ programming concepts of looping and if/else statements for decision making and automation of data analysis.
Write programs that process data via simple data structures such as lists and tables.
Write programs that extract data from tables.
Use and develop functions programmatically to work with lists and tables.
Visualize tabulated data using charts and histograms.
Design and implement programs to simulate experiments.
Establish causality and use randomization when building an experiment.
Analyze the probability of the occurrence of an event.
Estimate using percentiles, the bootstrap, and confidence intervals.
Apply regression to a data set.
Use problem decomposition, debugging and code design strategies.
Recognize limitations and issues surrounding data analysis in terms of bias, fairness, ethics and privacy.
Examine limitations of prediction and how new data leads to decision changes.
Student Learning Outcomes
Employ foundational programming concepts such as data types, basic data structures such as lists and tables, functions, looping, decision making, and input/output commands to explore and analyze datasets.
Apply foundational data science concepts including extracting data from tables based on specific criteria, computing summary statistics, creating data visualizations, simulating experiments and probability concepts.
Analyze real-world data sets using a modern programming language, problem decomposition, and code design strategies.
Recognize limitations and issues surrounding data analysis in terms of bias, ethics, establishing causality and privacy.
Course Content:
  1. Foundational statistical and computational techniques, and recognizing how and when to apply them to real scenarios.
  2. Establishing causality and the use of randomization to build an experiment.
  3. Understanding basic programming concepts such as the use of expressions, names, data types, and arrays.
  4. Applying programming skills with arrays, ranges, and tables (sorting and selecting data).
  5. Using data visualization techniques for categorical and numerical data.
  6. Developing functions programmatically to work with tables.
  7. Grouping/classifying data by one variable, including joining tables.
  8. Using iteration (loops) and conditional statements for simulation.
  9. Using tables and simulation to understand core probability concepts.
  10. Exposure to basic models and testing hypotheses.
  11. Comparing distributions, decisions, and uncertainty (visualization and data literacy).
  12. A/B testing and the significance of results.
  13. Estimating by using percentiles, the bootstrap, and confidence intervals.
  14. Modeling by using the normal distribution and understanding sample means.
  15. Predicting by using:
    1. Correlation and linear regression
    2. Method of least squares
    3. Regression inference
  16. Classifying by using classifiers:
    1. Nearest neighbor method
    2. Training and testing
    3. Rows of tables, implementing the classifier
  17. Decision making: how new data leads us to update our predictions and therefore our classifiers.
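As an illustration of the estimation topic above (percentiles, the bootstrap, and confidence intervals), a minimal percentile-bootstrap sketch in Python might look like the following. The sample values and resample count are illustrative assumptions, not course materials:

```python
import random
import statistics

# Hypothetical sample of quarterly percentages (illustrative values only).
sample = [1.5, 1.9, 2.1, 1.1, 1.3, 1.7, 2.0, 1.4, 1.6, 1.8]

random.seed(0)  # make the resamples reproducible

# Resample with replacement many times, recording each resample's mean.
boot_means = []
for _ in range(5000):
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))

# An approximate 95% confidence interval for the mean is the interval
# between the 2.5th and 97.5th percentiles of the bootstrap means.
boot_means.sort()
lower = boot_means[int(0.025 * len(boot_means))]
upper = boot_means[int(0.975 * len(boot_means))]
print(f"Approximate 95% CI for the mean: [{lower:.2f}, {upper:.2f}]")
```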


Methods of Instruction
Discussion
Distance Education
Experiments
Individualized Instruction
Lab
Lecture
Projects

Homework on Unemployment Data

The Federal Reserve Bank of St. Louis publishes data about jobs in the US. This assignment uses that data on unemployment in the United States. There are many ways of defining unemployment, and our dataset includes two notions of the unemployment rate:

  1. Among people who are able to work and are looking for a full-time job, the percentage who can't find a job. This is called the Non-Employment Index, or NEI.
  2. Among people who are able to work and are looking for a full-time job, the percentage who can't find any job or are only working at a part-time job. The latter group is called "Part-Time for Economic Reasons", so the acronym for this index is NEI-PTER. (Economists are great at marketing.)

Using Python in a Jupyter notebook, complete the following:

  1. The data are in a CSV file called unemployment.csv. Load that file into a table called unemployment.
  2. Sort the data in descending order by NEI, naming the sorted table by_nei. Create another table called by_nei_pter that's sorted in descending order by NEI-PTER instead.
  3. Make a table containing the data for the 10 quarters when NEI was greatest. Call that table greatest_nei and sort in descending order of NEI. Note that each row of unemployment represents a quarter.
  4. It's believed that many people became PTER (recall: "Part-Time for Economic Reasons") in the "Great Recession" of 2008-2009. NEI-PTER is the percentage of people who are unemployed (and counted in the NEI) plus the percentage of people who are PTER. Compute an array containing the percentage of people who were PTER in each quarter. (The first element of the array should correspond to the first row of unemployment, and so on.)
  5. Add pter as a column to unemployment (named "PTER") and sort the resulting table by that column in descending order. Call the table by_pter. Try to do this with a single line of code, if you can.
  6. Create a line plot of the PTER over time.

To do this, create a new table called pter_over_time that adds the year array and the pter array to the unemployment table. Label these columns Year and PTER. Then, generate a line plot using one of the table methods you've learned in class.

  7. Were PTER rates high during the Great Recession (that is to say, were PTER rates particularly high in the years 2008 through 2011)? Assign the variable highPTER to True if you think PTER rates were high in this period, and False if you think they weren't.
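The steps above can be sketched in Python with pandas (the course may instead use the Berkeley `datascience` library that accompanies the textbook). Since unemployment.csv is not included here, the sample rows below are made-up stand-ins for the real file's Date, NEI, and NEI-PTER columns:

```python
import pandas as pd

# Made-up sample rows standing in for unemployment.csv (illustrative only).
unemployment = pd.DataFrame({
    "Date": ["1994-01-01", "2008-10-01", "2009-01-01", "2009-04-01"],
    "NEI": [10.0, 8.9, 10.0, 10.7],
    "NEI-PTER": [11.1, 10.4, 11.9, 12.8],
})

# Step 2: sort in descending order by each index.
by_nei = unemployment.sort_values("NEI", ascending=False)
by_nei_pter = unemployment.sort_values("NEI-PTER", ascending=False)

# Step 3: the quarters with the greatest NEI (top 2 of this tiny sample;
# the assignment asks for the top 10 quarters of the real data).
greatest_nei = by_nei.head(2)

# Step 4: PTER percentage per quarter, since NEI-PTER = NEI + PTER.
pter = unemployment["NEI-PTER"] - unemployment["NEI"]

# Step 5: add PTER as a column and sort by it, in a single line.
by_pter = unemployment.assign(PTER=pter).sort_values("PTER", ascending=False)

# Step 6: a line plot of PTER over time (requires matplotlib to display).
# unemployment.assign(PTER=pter).plot(x="Date", y="PTER")
```

With the `datascience` library, the analogous calls are `Table.read_table`, `sort(..., descending=True)`, `take`, `with_column`, and `plot`.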
Assignments:
  1. Appropriate Readings: Students are required to read assigned chapters in the text.
  2. Written Assignments: Students are required to complete miscellaneous written and programming assignments on chapter material. 
  3. Appropriate Outside Assignments: Students are expected to spend a sufficient amount of time outside of class to practice techniques taught during class time, read assigned materials, and complete written and programming homework assignments. Students are required to spend three to four hours in the computer lab or online lab environment each week.
  4. Appropriate Assignments that Demonstrate Critical Thinking: Students must demonstrate basic computational and statistical skills. 
Methods of Evaluation:
  1. A series of lab programming assignments and projects
  2. Quizzes
  3. Written homework assignments
  4. Midterm examinations
  5. A comprehensive final examination requiring demonstration of problem-solving skills using programming and inferential thinking
    Computational and Inferential Thinking: The Foundations of Data Science, Ani Adhikari and John DeNero, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), 2020
11/20/2020
Board of Trustees: 05/27/2021
CAC Approval: 04/19/2021