ITP-449 Applications of Machine Learning

Definition/Description

The final project for this course entails four (4) distinct questions using four (4) distinct datasets. You will use several of the ML algorithms covered in this course: KNN, Classification Trees, and Linear Regression.

Question 1: Wine Quality Classification Using KNN

The goal of this Question is to predict the quality of wine given the other attributes.

1.   Load the data from the file winequality.csv. (2)

2.   Standardize all variables other than Quality. (2)

3.   Partition the dataset. (3)

     a.   random_state = 2020, Partitions 60/20/20, stratify = y

4.   Build a KNN classification model to predict Quality based on all the remaining numeric variables. (2)

5.   Iterate on K ranging from 1 to 30. Plot the accuracy for the train A and train B datasets. (4)

6.   Which value of k produced the best accuracy in the train A and train B datasets? (2)

7.   Generate predictions for the test partition with the chosen value of k. Plot the confusion matrix of the actual vs. predicted wine quality. (4)

8.   Print the accuracy of the model on the test dataset. (2)

9.   Print the test dataframe with the added columns "Quality" and "Predicted Quality". (4)
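The partition-and-tune steps of Question 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: synthetic data from make_classification stands in for winequality.csv, the 60/20/20 split is produced by two calls to train_test_split (one possible reading of the requirement), and k is chosen by train B accuracy. Plotting is left out to keep the sketch short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for winequality.csv (assumption: the real file is not read here).
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=2020)

# Standardize every predictor; the target is left untouched.
X_scaled = StandardScaler().fit_transform(X)

# 60/20/20: hold out 40% first, then cut that holdout in half.
X_train_a, X_rest, y_train_a, y_rest = train_test_split(
    X_scaled, y, test_size=0.4, random_state=2020, stratify=y)
X_train_b, X_test, y_train_b, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=2020, stratify=y_rest)

# Fit one model per k and record accuracy on both train A and train B.
k_values = list(range(1, 31))
acc_a, acc_b = [], []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train_a, y_train_a)
    acc_a.append(knn.score(X_train_a, y_train_a))
    acc_b.append(knn.score(X_train_b, y_train_b))

# Choose k by train B accuracy; on train A, k=1 scores every point
# against itself and so tells us little.
best_k = k_values[int(np.argmax(acc_b))]

# Refit with the chosen k and evaluate once on the untouched test partition.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train_a, y_train_a)
y_pred = final_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(best_k, round(test_accuracy, 3))
print(cm)
```

The lists acc_a and acc_b can be passed straight to a line plot against k_values to produce the accuracy chart the question asks for.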

Question 2: Personal Loan Prediction Using Trees

Load the file "UniversalBank.csv". This dataset is taken from the website of the book "Data mining for business intelligence" by Shmueli, Patel and Bruce, 1st ed, Wiley 2006. The dataset provides information about many people, and our goal is to build a model that classifies the cases into those who will accept the offer of a personal loan and those who will reject it. In the data, a zero in the Personal Loan column indicates that the person rejected the offer, and a one indicates that the person accepted it. Answer the following questions:

1.   What is the target variable? (2)

2.   Ignore the variables Row and Zip code. (3)

3.   Partition the data 70/30. random_state = 2020, stratify = y (3)

4.   How many of the cases in the training partition represented people who accepted offers of a personal loan? (3)

5.   Plot the classification tree. Use the entropy criterion, max_depth = 5, random_state = 2020. (4)

6.   On the training partition, how many acceptors did the model classify as non-acceptors? (3)

7.   On the training partition, how many non-acceptors did the model classify as acceptors? (3)

8.   What was the accuracy on the training partition? (2)

9.   What was the accuracy on the test partition? (2)
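The counting questions above (acceptors in the training partition, acceptors classified as non-acceptors, and the reverse) can all be read off a confusion matrix. The sketch below shows one way to do that; it assumes an imbalanced synthetic dataset in place of UniversalBank.csv, with 1 meaning "accepted" and 0 meaning "rejected", and the variable names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for UniversalBank.csv (assumption):
# 1 = accepted the loan offer, 0 = rejected it.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           weights=[0.9, 0.1], random_state=2020)

# 70/30 partition, stratified on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2020, stratify=y)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=2020)
tree.fit(X_train, y_train)

# With labels=[0, 1], rows are actual classes and columns are predicted classes,
# so ravel() yields (true neg, false pos, false neg, true pos).
tn, fp, fn, tp = confusion_matrix(
    y_train, tree.predict(X_train), labels=[0, 1]).ravel()

acceptors_in_training = int(y_train.sum())  # actual 1s in the training partition
missed_acceptors = int(fn)                  # acceptors classified as non-acceptors
false_acceptors = int(fp)                   # non-acceptors classified as acceptors
train_accuracy = tree.score(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)
print(acceptors_in_training, missed_acceptors, false_acceptors)
print(round(train_accuracy, 3), round(test_accuracy, 3))
```

To draw the fitted tree, sklearn.tree.plot_tree(tree) can be called on a matplotlib figure.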

Question 3: Mushroom Edibility Using Trees

Your task is to build a classification model that predicts the edibility of mushrooms (the class variable in the dataset). You have been provided with the dataset as a mushrooms.csv file. Attribute description:

•   cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s

•   cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s

•   cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y

•   bruises?: bruises=t, no=f

•   odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s

•   gill-attachment: attached=a, descending=d, free=f, notched=n

•   gill-spacing: close=c, crowded=w, distant=d

•   gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y

•   stalk-shape: enlarging=e, tapering=t

•   stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?

•   stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s

•   stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s

•   stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y

•   stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y

•   veil-type: partial=p, universal=u

•   veil-color: brown=n, orange=o, white=w, yellow=y

•   ring-number: none=n, one=o, two=t

•   ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z

•   spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y

•   population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y

•   class: poisonous=p, edible=e

1.   Build a classification tree. random_state = 2020, training partition 0.7, stratify = y, max_depth = 6, use entropy.

2.   Print the confusion matrix. Also visualize the confusion matrix using plot_confusion_matrix from sklearn.metrics. (5)

3.   What was the accuracy on the training partition? (2)

4.   What was the accuracy on the test partition? (2)

5.   Show the classification tree. (4)

6.   List the top three most important features in your decision tree for determining toxicity. (6)

7.   Classify the following mushroom. (6)

Attribute                    Value
class                        ?
cap-shape                    x
cap-surface                  s
cap-color                    n
bruises                      t
odor                         y
gill-attachment              f
gill-spacing                 c
gill-size                    n
gill-color                   k
stalk-shape                  e
stalk-root                   e
stalk-surface-above-ring     s
stalk-surface-below-ring     s
stalk-color-above-ring       w
stalk-color-below-ring       w
veil-type                    p
veil-color                   w
ring-number                  o
ring-type                    p
spore-print-color            r
population                   s
habitat                      u
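Because every mushroom attribute is a letter code, the data has to be converted to numbers before a scikit-learn tree can use it, and a single new record has to be encoded with exactly the same columns. The sketch below illustrates one possible approach using one-hot encoding with pandas.get_dummies; the eight rows are made up for illustration and are not taken from mushrooms.csv, and only three attributes are used.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up sample (assumption): three attributes only, not real mushrooms.csv rows.
df = pd.DataFrame({
    "odor":      ["n", "a", "y", "f", "n", "l", "y", "f"],
    "bruises":   ["t", "t", "f", "f", "f", "t", "t", "f"],
    "gill-size": ["b", "b", "n", "n", "b", "b", "n", "n"],
    "class":     ["e", "e", "p", "p", "e", "e", "p", "p"],
})

# One-hot encode the letter codes so the tree receives numeric columns.
X = pd.get_dummies(df.drop(columns="class"), dtype=int)
y = df["class"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=6, random_state=2020)
tree.fit(X, y)

# Rank the encoded columns by importance and keep the top three.
importances = pd.Series(tree.feature_importances_, index=X.columns)
top_features = importances.sort_values(ascending=False).head(3)

# Encode a new record, then align it to the training columns;
# categories absent from the record become all-zero columns.
new_mushroom = pd.DataFrame([{"odor": "y", "bruises": "t", "gill-size": "n"}])
new_X = pd.get_dummies(new_mushroom, dtype=int).reindex(
    columns=X.columns, fill_value=0)
prediction = tree.predict(new_X)[0]
print(top_features)
print(prediction)
```

Note that the importances are reported per encoded column (for example odor_f), not per original attribute. Also, plot_confusion_matrix was removed in scikit-learn 1.2; on newer versions ConfusionMatrixDisplay.from_estimator from sklearn.metrics serves the same purpose.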

Question 4: Vehicle MPGs Using Linear Regression

Load the data from the file auto-mpg.csv. The file contains information about various cars made between 1970 and 1982 and has 398 rows of data. The table below shows an extract of the first 10 rows to give you an idea of the data.

1.   Summarize the data set. What is the mean of mpg? (2)

2.   What is the median value of mpg? (1)

3.   Which value is higher, mean or median? What does this indicate in terms of the skewness of the attribute values? Make a plot to verify your answer. (2)

4.   Plot the pairplot matrix of all the relevant numeric attributes. (Do not consider No and car_name.) (2)

5.   Based on the pairplot matrix, which two attributes seem to be most strongly linearly correlated? (2)

6.   Based on the pairplot matrix, which two attributes seem to be most weakly correlated? (2)

7.   Produce a scatterplot of the two attributes mpg and displacement, with displacement on the x axis and mpg on the y axis. (2)

8.   Build a linear regression model with mpg as the target and displacement as the predictor. Answer the following questions based on the regression model.

     a.   For your model, what is the value of the intercept β0? (1)

     b.   For your model, what is the value of the coefficient β1 of the attribute displacement? (1)

     c.   What is the regression equation as per the model? (2)

     d.   For your model, does the predicted value for mpg increase or decrease as the displacement increases? (2)

     e.   Given a car with a displacement value of 220, what would your model predict its mpg to be? (2)

     f.   Display a scatterplot of the actual mpg vs. displacement and superimpose the linear regression line. (2)

     g.   Plot the residuals. (2)
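To make the quantities in Question 4 concrete, here is a small worked sketch of the mean/median comparison and of a one-predictor least-squares fit, written with the standard library only. The six (displacement, mpg) pairs are invented for illustration and do not come from auto-mpg.csv; in the project itself a library model such as scikit-learn's LinearRegression (which exposes intercept_ and coef_) would normally play this role.

```python
from statistics import mean, median

# Made-up (displacement, mpg) pairs (assumption): not rows from auto-mpg.csv.
displacement = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0]
mpg = [32.0, 27.0, 23.0, 19.0, 16.0, 15.0]

# A mean above the median suggests a longer right tail (positive skew).
mpg_mean, mpg_median = mean(mpg), median(mpg)
skew_hint = "right-skewed" if mpg_mean > mpg_median else "left-skewed or symmetric"

# Ordinary least squares for one predictor:
#   beta1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
#   beta0 = y_bar - beta1 * x_bar
x_bar, y_bar = mean(displacement), mpg_mean
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(displacement, mpg))
sxx = sum((x - x_bar) ** 2 for x in displacement)
beta1 = sxy / sxx
beta0 = y_bar - beta1 * x_bar


def predict(x):
    """Predicted mpg for a given displacement under the fitted line."""
    return beta0 + beta1 * x


# Residual = actual minus predicted; with an intercept they sum to zero.
residuals = [y - predict(x) for x, y in zip(displacement, mpg)]

print(f"mean={mpg_mean}, median={mpg_median}, {skew_hint}")
print(f"mpg = {beta0:.3f} + ({beta1:.4f}) * displacement")
print(f"predicted mpg at displacement 220: {predict(220):.2f}")
```

A negative beta1, as in this toy data, means the predicted mpg falls as displacement rises; plotting residuals against displacement shows whether a straight line leaves a systematic pattern behind.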

Structural Requirements

1. The primary entry point for your code should be main():

def main():
    # your code goes here
    pass

To call it, use the following:

if __name__ == '__main__':
    main()

2.   Remember to provide a header at the top of your code which includes:

# [Your Full Name]

# ITP-449 [Semester]

# Final Project

3.   Don't forget to provide useful comments throughout!
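Put together, a file that meets these structural requirements might be laid out roughly as below. The function names question_1 and question_2 and their return values are hypothetical placeholders, not part of the assignment. Note that the guard compares __name__ against the string '__main__', with double underscores on both sides.

```python
# [Your Full Name]
# ITP-449 [Semester]
# Final Project


def question_1():
    """Wine quality classification using KNN (hypothetical placeholder)."""
    return "Question 1 done"


def question_2():
    """Personal loan prediction using trees (hypothetical placeholder)."""
    return "Question 2 done"


def main():
    """Run each question in order and collect one status line per question."""
    results = [question_1(), question_2()]
    for line in results:
        print(line)
    return results


# Only run main() when the file is executed directly, not when it is imported.
if __name__ == '__main__':
    main()
```

Keeping each question in its own function gives every major section a natural place for a descriptive docstring.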

Provided files/data

Four files will be required:

csvs/
    winequality.csv
    UniversalBank.csv
    mushrooms.csv
    auto-mpg.csv

Example Output

Deliverables

You should have only one Python file, which should be named:

ITP-449_Final_Project_LastName_FirstName.py

Compress it in a zip file, which should be named:

ITP-449_Final_Project_LastName_FirstName.zip

Submit this file on Blackboard. Assignments will only be accepted through Blackboard.

Section: Points (Total: 100)

Question 1: (1.0 points each)

Question 2: (1.0 points each)

Question 3: (1.0 points each)

Question 4: (1.0 points each)

Code: 1.0 (1.0 points each)
    1. Correct setup of main() function.
    2. Correct call of main().

Documentation and Formatting: 3.0 (3.0 points each)
    1. Concise and useful commenting in your codebase is a must. You will need a header with your name, the semester, the section of the course you are in, and the homework number.
    2. You need descriptions of any major sections in your code (functions, classes, methods, et al.).
    3. Your code must be generally clear and readable.

Error Handling: 1.0 (0.5 points each)
    1. Program runs without crashing.
    2. Program prompts the user to re-enter inputs which are not acceptable.