APANPS5335_002_2020_3 - MACHINE LEARNING: CONCEPTS & APPLICATION
Directions: You must submit the RMD or Python (Jupyter Notebook) file and the “knitted” PDF file on the Canvas class website. Make sure that your name and university number are at the top of your homework.
Goal of the Assignment (max number of points: 100)
In this assignment we will use different tools to implement the ideas that we cover in class. The goal of this assignment is to make you proficient in Linear Regression, Regression Diagnostics, and GLM Regression.
1. Linear Regression (45 points)
The goal of this exercise is to reinforce our knowledge of linear regression analysis and to help you develop your ability to work with a non-structured database. For that purpose, we will model the Bitcoin Market Cap and find an appropriate model to predict it.
-Obtain your dataset from (https://www.quandl.com/data/BITCOINWATCH/MINING-Bitcoin-Mining-Statistics) using the appropriate quandl library in Python or R. (2.5 points)
-Perform Exploratory Data Analysis on this dataset (2.5 points)
-Choose the most relevant variables and perform feature engineering. (5 points)
-Create a regression model (state your initial baseline regression model and explain the output. Do you need any transformations or interaction terms in your variables?) (10 points)
-Remove Outliers (explain your test, check its effect on your regression model, and describe your steps) (2.5 points)
-Test for Multicollinearity (explain your test, check its effect on your regression model, fix the problems, and describe your steps) (2.5 points)
-Test Normality Assumption (explain your test and describe your steps) (2.5 points)
-Use a QQ-plot to explore your assumptions (explain your test and describe your steps) (2.5 points)
-Explain why you chose your final model and tell us the predicted cost from the model. (2.5 points)
-What is the predicted market capitalization for October 16th, 17th and 18th (2020)? (2.5 points)
-Create a dashboard showing all your tests and results. (10 points)
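The QQ-plot item in the list above can be sketched in a few lines. This is only a minimal illustration, not a model answer: it fits a one-predictor least-squares line to made-up numbers (not the Bitcoin data) and pairs the sorted residuals with standard-normal quantiles. The helper names `fit_simple_ols` and `qq_points`, and the plotting positions (i - 0.5) / n, are assumptions chosen for this example.

```python
from statistics import NormalDist, mean


def fit_simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def qq_points(residuals):
    """Pair each sorted residual with a theoretical standard-normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


# Made-up illustration data; NOT Bitcoin market figures.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

slope, intercept = fit_simple_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
points = qq_points(residuals)

for theoretical_q, sample_q in points:
    print(f"{theoretical_q:+.3f}  {sample_q:+.3f}")
```

Plotting the pairs (theoretical quantile on the horizontal axis, residual on the vertical axis) should give points close to a straight line when the residuals are roughly normal. Strong curvature in the tails suggests the normality assumption is questionable.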
2. GLM Regression (20 points)
(https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disaster)
(https://www.jstor.org/stable/2290069?seq=1#page_scan_tab_contents)
On January 28, 1986, the NASA space shuttle Challenger had an accident. The vehicle broke apart and disintegrated 73 seconds into its flight, leading to the deaths of its seven crew members. The accident had serious consequences for NASA's credibility and resulted in a 32-month interruption of the shuttle program.
The Presidential Rogers Commission (whose members included astronaut Neil A. Armstrong and Nobel laureate Richard P. Feynman, among others) was created to investigate the causes of the disaster.
The Rogers Commission produced a report (Presidential Commission on the Space Shuttle Challenger Accident 1986) with all the findings. The commission determined that the disintegration began with the failure of an O-ring seal in the solid rocket motor due to the unusually cold temperature (−0.6 °C) during the launch. This failure produced a breach of burning gas through the solid rocket motor that compromised the whole shuttle structure, resulting in its disintegration due to the extreme aerodynamic forces.
This space shuttle used two booster rockets to help lift it into orbit. Each booster rocket consists of several pieces whose joints are sealed with rubber O-rings (0.28 inches wide and 37.5 feet in circumference), which are designed to prevent the release of hot gases produced during combustion. Each booster contains 3 primary O-rings (for a total of 6 for the shuttle).
In the 23 previous flights for which there were data (the hardware for one flight was lost at sea), the O-rings were examined for damage.
One interesting question is the relationship of O-ring damage to temperature, particularly since it was forecast to be cold (31 °F) on the morning of January 28, 1986.
There was a good deal of discussion among the Morton Thiokol (https://en.wikipedia.org/wiki/Thiokol) engineers the previous day as to whether the flight should go on as planned or not. Let’s analyze the Challenger dataset, challenger.csv.
-Perform exploratory data analysis (EDA) (4 points)
-Is the temperature associated with O-ring incidents? Do you observe any difference between the whole dataset and the last 7 flights before the accident? (2 points)
-Our goal is to analyze the relationship between temperature and the probability of O-ring failure. Please identify the dependent variable (label). (2 points)
-In which way did the temperature affect the probability of O-ring incidents? Explain why a linear regression is not appropriate for this problem; perform one and test normality. (2 points)
-Perform a logistic regression analysis and interpret the output. What is the meaning of the slope coefficient? What was the predicted probability of an O-ring incident at the launch-day temperature? (10 points)
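The logistic regression item above can be illustrated with a small sketch. It is a hedged example under stated assumptions, not the expected solution: the temperatures and incident flags below are made up (they are not the contents of challenger.csv), and the helper names `fit_logistic` and `predict_probability` are hypothetical. The fit uses plain Newton-Raphson on a one-predictor model; in practice one would more likely call `glm` in R or a library such as statsmodels in Python.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    x_mean = sum(x) / len(x)
    z = [xi - x_mean for xi in x]  # centre the predictor for numerical stability
    a0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-(a0 + b1 * zi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * zi
            h00 += w
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * delta = g.
        a0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Convert the intercept back to the uncentred temperature scale.
    return a0 - b1 * x_mean, b1


def predict_probability(b0, b1, temperature):
    """Predicted probability of an incident at the given temperature."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * temperature)))


# Made-up illustration data (degrees Fahrenheit, 1 = incident); NOT challenger.csv.
temps = [50, 55, 60, 62, 65, 68, 70, 72, 75, 78, 80, 82]
incident = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

b0, b1 = fit_logistic(temps, incident)
odds_ratio = math.exp(b1)  # multiplicative change in the odds per extra degree
print(f"intercept={b0:.3f} slope={b1:.3f} odds ratio per degree={odds_ratio:.3f}")
print(f"P(incident at 55 F) = {predict_probability(b0, b1, 55):.3f}")
```

A negative slope means the odds of an incident shrink as temperature rises; `exp(b1)` is the factor by which the odds change for each additional degree. Predictions far below the coldest observed temperature are extrapolations and should be reported with that caveat.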
(https://bookdown.org/egarpor/PM-UC3M/)
3. Enron Accounting Fraud (35 points)
The Enron scandal was an accounting scandal of Enron Corporation, an American energy company based in Houston, Texas. It was publicized in October 2001 and led to the bankruptcy of the company and the de facto dissolution of Arthur Andersen, which was one of the five largest audit and accountancy partnerships in the world. In addition to being the largest bankruptcy reorganization in American history at that time, Enron was cited as the biggest audit failure.
Our Problem: Text Categorization using this dataset
The Enron email dataset contains approximately 500,000 emails generated by employees of the Enron Corporation. It was obtained by the Federal Energy Regulatory Commission during its investigation of Enron's collapse.
1. Find the May 7, 2015 version of this dataset and provide the web link you used to obtain it. (2.5 points)
2. Describe the dataset and perform exploratory data analysis (EDA) on it. (2.5 points)
3. What problems do you think can be solved using this dataset? (2.5 points)
4. Perform feature engineering and build a graph that shows the relationship between email senders. (10 points)
5. Read the instructions of the dataset and build a model to categorize the emails. (15 points)
6. What stack of tools did you use to solve this problem? (2.5 points)
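As a starting point for the sender-relationship graph in item 4, the sketch below counts directed sender-to-recipient pairs. It assumes each email is available as raw text with standard `From` and `To` headers, which is an assumption about how the dataset is stored; the sample messages and addresses are made up and the function name `sender_recipient_edges` is hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr


def sender_recipient_edges(raw_messages):
    """Count directed (sender, recipient) pairs across raw email texts."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        sender = parseaddr(msg.get("From", ""))[1].lower()
        # A message may list several recipients in one or more To headers.
        recipients = getaddresses(msg.get_all("To", []))
        for _name, address in recipients:
            if sender and address:
                edges[(sender, address.lower())] += 1
    return edges


# Made-up messages for illustration; NOT taken from the Enron dataset.
sample_messages = [
    "From: alice@example.com\nTo: bob@example.com, carol@example.com\n"
    "Subject: Q3\n\nDraft attached.",
    "From: alice@example.com\nTo: bob@example.com\n"
    "Subject: Re: Q3\n\nUpdated numbers.",
    "From: bob@example.com\nTo: alice@example.com\n"
    "Subject: Re: Q3\n\nThanks.",
]

edges = sender_recipient_edges(sample_messages)
for (sender, recipient), count in edges.most_common():
    print(f"{sender} -> {recipient}: {count}")
```

The resulting weighted edge list can be handed to a graph library for drawing, with the counts used as edge weights. Whether to also include `Cc` and `Bcc` recipients is a feature-engineering decision left open here.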