Harvard University Northwind Data Mining and Statistical Analysis Project
Description
Option #1: Northwind Data Mining and Statistical Analysis Project Planning
The objective of this Portfolio Project is to mine data from a data warehouse that contains data from the Northwind database, which was created during your installation of PostgreSQL.
Below are the summarized tasks for this Portfolio Project.
Data Warehouse:
- Create a data warehouse database, including the fact and dimension tables (star schema).
- Create the schema for each table.
- Populate the tables using either ETL (Pentaho) or SQL (PostgreSQL).
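The data warehouse tasks above can be sketched roughly as follows. This is a minimal illustration only: an in-memory SQLite database stands in for PostgreSQL so the example runs without a server, and every table name, column name, and row value (`fact_sales`, `dim_customer`, `dim_product`, and so on) is a hypothetical placeholder rather than part of the assignment or the actual Northwind schema.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here. All table and column
# names are hypothetical placeholders, not prescribed by the assignment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_id  TEXT NOT NULL,
    country      TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT
);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    sales_key    INTEGER PRIMARY KEY,
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity     INTEGER NOT NULL,
    unit_price   REAL NOT NULL
);
""")

# Populate the tables with made-up rows (illustrative values only).
conn.execute("INSERT INTO dim_customer VALUES (1, 'C001', 'Germany')")
conn.execute("INSERT INTO dim_product VALUES (1, 'Sample Tea', 'Beverages')")
conn.executemany(
    "INSERT INTO fact_sales (customer_key, product_key, quantity, unit_price) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, 10, 18.0), (1, 1, 5, 18.0)],
)

# A typical star-schema query: join the fact table to a dimension and aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.quantity * f.unit_price) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
""").fetchall()
print(revenue_by_category)
```

The same pattern (dimension tables keyed by surrogate integers, one fact table referencing them) should carry over to PostgreSQL, although data types and loading commands will differ there.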
Preprocessing for SAS:
- Extract data from the data warehouse, creating a file for input into SAS. The format of the file is your choice. Ensure SAS University Edition accepts your selected format.
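One possible way to approach the extraction step is to write a query result to a delimited text file. The sketch below assumes CSV as the chosen format and again uses in-memory SQLite in place of PostgreSQL; the helper `export_query_to_csv`, the table `fact_sales`, and its columns are hypothetical. You would still need to confirm that your SAS environment reads the resulting file as expected.

```python
import csv
import io
import sqlite3

# Hypothetical warehouse table with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (order_id INTEGER, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)", [(1, 10, 18.0), (2, 5, 9.5)]
)


def export_query_to_csv(connection, query, out_file):
    """Write the query result to out_file as CSV with a header row.

    Returns the number of data rows written.
    """
    cursor = connection.execute(query)
    writer = csv.writer(out_file)
    # cursor.description carries the column names of the result set.
    writer.writerow([column[0] for column in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


# A StringIO buffer is used so the example has no file-system side effects;
# in practice you would pass open("extract.csv", "w", newline="").
buffer = io.StringIO()
row_count = export_query_to_csv(
    conn,
    "SELECT order_id, quantity, unit_price FROM fact_sales ORDER BY order_id",
    buffer,
)
csv_text = buffer.getvalue()
print(csv_text)
```

Including a header row is a design choice that usually makes the import step easier, because the column names travel with the data.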
Statistical Analysis Using SAS:
- Import data created in the preprocessing step.
- Conduct statistical analysis using the appropriate statistics from each category:
  - Summary statistics
  - Classification
  - Clustering
  - Association
- Prepare an analysis report.
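The assignment calls for SAS, so the following Python sketch is not a substitute for that work; it only illustrates what two of the categories above compute. It shows summary statistics for one variable and a Pearson correlation as one simple measure of association between two numeric variables. The variable names and values are made up, and `summarize` and `pearson_r` are hypothetical helpers.

```python
import statistics

# Made-up sample values for two numeric variables (illustration only).
quantity = [2, 4, 6, 8, 10]
revenue = [10.0, 20.0, 30.0, 40.0, 50.0]


def summarize(values):
    """Return basic summary statistics for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / (spread_x * spread_y) ** 0.5


summary = summarize(quantity)
r = pearson_r(quantity, revenue)
print(summary)
print(round(r, 4))
```

A coefficient near 1 or -1 suggests a strong linear association; values near 0 suggest little linear relationship, though a nonlinear one may still exist.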
Using your plan prepared in Module 3, Milestone 1, and leveraging the data warehouse and preprocessing steps in Module 6, Milestone 2, complete the tasks under Statistical Analysis Using SAS.
Your analysis report must include:
- An analysis of each variable in the data set
- An analysis to determine which variables could serve as appropriate classifier variables
- An analysis to determine if any variables are candidates for clustering
- An analysis to determine if any variables have associations
- Any tables, histograms, or scatterplot graphs necessary to support your analyses
- A recommendation as to the suitability of this data set for meeting your organization's business goal
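To make the clustering item in the list above more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on a single numeric variable. It is written in Python purely for illustration, since the report itself is expected to use SAS. The function `kmeans_1d`, the variable `spend`, and its values are hypothetical, and starting from the minimum and maximum values is a simplifying assumption rather than a recommended initialization.

```python
def kmeans_1d(values, centers, iterations=10):
    """Run Lloyd's algorithm on one numeric variable.

    values: list of numbers to cluster.
    centers: initial cluster centers (one per cluster).
    Returns the final centers and the clusters from the last assignment step.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster,
        # keeping the old center if the cluster is empty.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Made-up spending values with two visually obvious groups.
spend = [12.0, 15.0, 14.0, 95.0, 102.0, 99.0]
centers, clusters = kmeans_1d(spend, centers=[min(spend), max(spend)])
print(centers)
print(clusters)
```

If a variable splits into well-separated groups like this, it may be a reasonable clustering candidate; a variable whose values spread evenly is less likely to produce meaningful clusters.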
Your project must meet the following requirements:
- Be 6-8 pages in length, not including the cover and references pages.
- Follow the CSU-Global Guide to Writing & APA. Your paper should include an introduction, a body with at least four fully developed paragraphs, and a conclusion.
- Be clearly and well written using excellent grammar and style techniques. Be concise and logical. You are being graded, in part, on the quality of your writing. If you need assistance with your writing style, you can find many writing resources in the CSU-Global Writing Center.
- Be supported with at least three peer-reviewed, scholarly references, and one citation from the course textbooks. You may also include references from credible sources in print and from the Internet. The CSU-Global Library is a great place to find these resources.
Refer to the Portfolio Project rubric in the Module 8 folder for more information on the expectations for this assignment.
Option #2: Clothing Store Data Mining and Statistical Analysis Project Planning
The objective of this Portfolio Project is to mine data from a data warehouse that contains data from the Clothing Store CSV file supplied to the class. The Clothing Store file contents are covered in greater detail in Chapters 29-31 of the Data Mining and Predictive Analytics textbook. This data set is large, with over 28,000 records and over 50 fields. You may wish to trim the data in the CSV file before moving forward.
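Trimming the file could be done in several ways; the sketch below shows one, keeping a chosen subset of fields and capping the number of records. The helper `trim_csv` and the column names are hypothetical and do not reflect the actual Clothing Store file layout. Note that taking the first rows is the simplest option, not necessarily the best: a random sample may represent the data more faithfully.

```python
import csv
import io


def trim_csv(in_file, out_file, keep_fields, max_rows):
    """Copy at most max_rows records, keeping only the fields in keep_fields.

    Returns the number of data rows written.
    """
    reader = csv.DictReader(in_file)
    # extrasaction="ignore" drops any column not listed in keep_fields.
    writer = csv.DictWriter(out_file, fieldnames=keep_fields, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in reader:
        if written >= max_rows:
            break
        writer.writerow(row)
        written += 1
    return written


# Made-up input with hypothetical column names (illustration only).
source = io.StringIO(
    "customer_id,zip,total_spend,visits\n"
    "1,10001,250.5,4\n"
    "2,10002,80.0,1\n"
    "3,10003,19.9,2\n"
)
trimmed = io.StringIO()
kept = trim_csv(source, trimmed, keep_fields=["customer_id", "total_spend"], max_rows=2)
trimmed_lines = trimmed.getvalue().splitlines()
print(trimmed_lines)
```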
Below are the summarized tasks for this Portfolio Project.
Data Warehouse:
- Create a data warehouse database, including the fact and dimension tables (star schema).
- Create the schema for each table.
- Load the Clothing Store CSV into a database (which you will need to create), including the tables and the schema, or retain the data in CSV format.
- Populate the tables using either ETL (Pentaho) or SQL (PostgreSQL).
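If you choose to load the CSV into a database, the loading step might look something like the following. This is a sketch under stated assumptions: in-memory SQLite replaces PostgreSQL so the example is self-contained, and the staging table `staging_clothing`, its columns, and the sample rows are all hypothetical. From a staging table like this, the fact and dimension tables could then be populated with SQL.

```python
import csv
import io
import sqlite3

# Made-up CSV content with hypothetical column names (illustration only).
csv_source = io.StringIO(
    "customer_id,total_spend,visits\n"
    "1,250.5,4\n"
    "2,80.0,1\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_clothing ("
    "customer_id INTEGER, total_spend REAL, visits INTEGER)"
)

# CSV values arrive as strings, so convert them explicitly before inserting.
reader = csv.DictReader(csv_source)
rows = [
    (int(record["customer_id"]), float(record["total_spend"]), int(record["visits"]))
    for record in reader
]
conn.executemany("INSERT INTO staging_clothing VALUES (?, ?, ?)", rows)
conn.commit()

# Quick sanity check: row count and a column total after the load.
loaded = conn.execute(
    "SELECT COUNT(*), SUM(total_spend) FROM staging_clothing"
).fetchone()
print(loaded)
```

Checking a row count and a column total after loading is a cheap way to catch truncated or mis-parsed files before any analysis begins.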
Preprocessing for SAS:
- Extract data from the data warehouse, creating a file for input into SAS. The format of the file is your choice. Ensure SAS University Edition accepts your selected format.
Statistical Analysis Using SAS:
- Import data created in the preprocessing step.
- Conduct statistical analysis using the appropriate statistics from each category:
  - Summary statistics
  - Classification
  - Clustering
  - Association
- Prepare an analysis report.
Using your plan prepared in Module 3, Milestone 1, and leveraging the data warehouse and preprocessing steps in Module 6, Milestone 2, complete the tasks under Statistical Analysis Using SAS.
Your analysis report must include:
- An analysis of each variable in the data set
- An analysis to determine which variables could serve as appropriate classifier variables
- An analysis to determine if any variables are candidates for clustering
- An analysis to determine if any variables have associations
- Any tables, histograms, or scatterplot graphs necessary to support your analyses
- A recommendation as to the suitability of this data set for meeting your organization's business goal
Your project must meet the following requirements:
- Be 6-8 pages in length, not including the cover and references pages.
- Follow the CSU-Global Guide to Writing & APA. Your paper should include an introduction, a body with at least four fully developed paragraphs, and a conclusion.
- Be clearly and well written using excellent grammar and style techniques. Be concise and logical. You are being graded, in part, on the quality of your writing. If you need assistance with your writing style, you can find many writing resources in the CSU-Global Writing Center.
- Be supported with at least three peer-reviewed, scholarly references, and one citation from the course textbooks. You may also include references from credible sources in print and from the Internet. The CSU-Global Library is a great place to find these resources.