DATA 604, Spring 2022, Project 1

1) Find and download the dataset of fashion images collected by Zalando Research, known as Fashion-MNIST.
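Fashion-MNIST is distributed as four gzip-compressed files in the IDX format (train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz). Together they hold 60,000 training and 10,000 test images of 28 × 28 pixels. Merging the two parts gives 70,000 images, 7,000 per class, which is the pool that the later parts split. The following is a minimal stdlib-only reader. It assumes the files are already downloaded, and the names parse_idx, load_idx_gz and to_images are hypothetical helpers, not part of any library.

```python
import gzip
import math
import struct


def parse_idx(raw):
    """Parse an unsigned-byte IDX buffer into (shape, flat payload bytes)."""
    zero1, zero2, dtype, ndim = struct.unpack(">BBBB", raw[:4])
    if zero1 != 0 or zero2 != 0 or dtype != 0x08:  # 0x08 = unsigned byte
        raise ValueError("expected an unsigned-byte IDX file")
    # Each dimension size is stored as a big-endian 32-bit unsigned integer.
    shape = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    payload = raw[4 + 4 * ndim:]
    if len(payload) != math.prod(shape):
        raise ValueError("payload size does not match the header")
    return shape, payload


def load_idx_gz(path):
    """Read a gzip-compressed IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


def to_images(shape, payload):
    """Split a (count, rows, cols) payload into flat lists of pixel values (0-255)."""
    count, rows, cols = shape
    size = rows * cols
    return [list(payload[i * size:(i + 1) * size]) for i in range(count)]
```

A typical call would be `shape, payload = load_idx_gz("train-images-idx3-ubyte.gz")` followed by `images = to_images(shape, payload)`. The label files parse the same way, with a one-dimensional shape.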

2) (15 pts) Study the role of the training vs testing split in classification. Specifically:


  • Divide each class into two sets: a training set consisting of N examples of each of the classes 0 through 9, and a testing set consisting of the remaining 7000 − N examples of each of the classes 0 through 9, where N ranges from 1000 to 6000. Propose and describe a selection algorithm for choosing N out of the 7000 images of each class, for any integer value of N.


  • Design and implement a method to test and estimate your computer's capability to perform numerical computations. Estimate the computational cost of this project and use it to decide on a reasonable number of experiments to perform. Quantify this information and use it to set your numerical goals, such as the number of different values of N that you will test.


  • Use the k-nearest-neighbors classification scheme in the standard Euclidean metric with fixed k = 20 to evaluate the global success rate of your classification for each chosen value of N.


  • Draw conclusions about the impact of the size of the training set on the performance of the classification scheme. To do this, provide a method for choosing an optimal size of the training set and describe the notion of optimality you choose. Substantiate your conclusions with numerical evidence.
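For the selection algorithm in the first bullet of Part 2, one simple choice is uniform random sampling without replacement inside each class, using a fixed seed so that every experiment can be reproduced. It is only one option among many. It works for any integer 0 ≤ N ≤ 7000. The following is a sketch that assumes `labels` is the list of 70,000 class labels of the merged dataset. `split_per_class` is a hypothetical name.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, seed=0):
    """Hypothetical helper: pick n_train indices per class for training, rest for testing.

    Sampling is without replacement and uses a seeded generator, so the same
    (n_train, seed) pair always reproduces the same split.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        if not 0 <= n_train <= len(members):
            raise ValueError(f"class {label} has only {len(members)} examples")
        chosen = set(rng.sample(members, n_train))
        train_idx.extend(i for i in members if i in chosen)
        test_idx.extend(i for i in members if i not in chosen)
    return train_idx, test_idx
```

Changing `seed` produces a different random split of the same size. Repeating an experiment over several seeds therefore shows how much the success rate depends on the particular draw.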
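For the cost estimate in the second bullet of Part 2, most of the work in brute-force k-nearest neighbors goes into computing one distance per (test, training) pair. The training set has 10N images and the test set has 10(7000 − N), each of dimension 784. One experiment therefore costs about 10N · 10(7000 − N) · 784 coordinate operations. This peaks at N = 3500, at roughly 9.6 × 10^11 operations. The following sketch times pure-Python distance evaluations on the current machine and extrapolates from that rate. Both function names are hypothetical. A vectorized implementation (for example with NumPy) would be far faster, so the rate should be measured with whatever implementation is actually used.

```python
import time


def measure_rate(dim=784, n_pairs=2000):
    """Time pure-Python squared-Euclidean distances; return coordinate operations per second."""
    a = [0.25] * dim
    b = [0.75] * dim
    total = 0.0
    start = time.perf_counter()
    for _ in range(n_pairs):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    elapsed = time.perf_counter() - start
    return n_pairs * dim / elapsed


def knn_cost_seconds(n_per_class, rate, dim=784, pool_per_class=7000, classes=10):
    """Estimated time of one brute-force kNN experiment with N = n_per_class.

    Assumes cost = (#train) * (#test) * dim operations at the measured rate.
    """
    n_train = classes * n_per_class
    n_test = classes * (pool_per_class - n_per_class)
    return n_train * n_test * dim / rate


if __name__ == "__main__":
    rate = measure_rate()
    for n in range(1000, 6001, 1000):
        print(f"N = {n}: about {knn_cost_seconds(n, rate) / 3600:.1f} h")
```

Summing the estimates over the candidate values of N, and over any repeated splits, gives a total time budget. That total can then fix how many values of N are realistic to test.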
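The classifier in the third bullet of Part 2 can be sketched as brute-force k-nearest neighbors with a majority vote among the k = 20 closest training images. The project does not specify how to break voting ties. This sketch assumes that, among tied labels, the label owning the nearest neighbor wins. The function names are hypothetical, and the code favors clarity over speed.

```python
import heapq
from collections import Counter


def squared_euclidean(a, b):
    # Same neighbor ordering as the Euclidean distance, without the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=20):
    """Hypothetical helper: majority vote among the k nearest training images."""
    nearest = heapq.nsmallest(
        k, range(len(train_x)), key=lambda i: squared_euclidean(train_x[i], query)
    )  # indices ordered from nearest to farthest
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # Assumed tie-break: among labels with the top vote count, the one that
    # owns the nearest neighbor wins.
    for i in nearest:
        if votes[train_y[i]] == top:
            return train_y[i]


def global_success_rate(train_x, train_y, test_x, test_y, k=20):
    """Fraction of test images whose predicted label equals the true label."""
    correct = sum(
        knn_predict(train_x, train_y, q, k) == y for q, y in zip(test_x, test_y)
    )
    return correct / len(test_y)
```

Calling `global_success_rate` once per chosen N, on the split produced for that N, yields the accuracies that the last bullet asks you to compare.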
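For the last bullet of Part 2, one possible notion of optimality is this: choose the smallest training set whose success rate is within a tolerance `tol` of the best rate observed. This trades a negligible loss in accuracy for a lower computational cost. The following is a sketch that assumes `results` maps each tested N to its global success rate. The value of `tol` is a hypothetical threshold that should itself be justified from the numerical evidence, for example from the spread of rates across repeated random splits.

```python
def smallest_near_optimal_n(results, tol=0.005):
    """Hypothetical rule: smallest N whose success rate is within tol of the best.

    results maps each tested N to its global success rate (a fraction in [0, 1]).
    """
    best = max(results.values())
    return min(n for n, rate in results.items() if rate >= best - tol)
```

With `tol=0` this reduces to picking the N with the highest observed rate. Larger values favor smaller, cheaper training sets.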


3) (15 pts) Study the role of the structure of the training vs testing split in classification. Specifically:


  • Propose a new method of selecting the training set, different from the one proposed in Part 2, and describe the new selection algorithm.

Using this new split method, divide each class into two sets: a training set consisting of N examples of each of the classes 0 through 9, and a testing set consisting of the remaining 7000 − N examples of each class, where N is the optimal value chosen in Part 2.


  • Use the same k-nearest-neighbors classification scheme in the standard Euclidean metric with fixed k = 20 to evaluate the success rate of your classification for the chosen optimal value of N with the new training vs testing split.


  • Draw conclusions about the impact of the structure of the training vs testing split by comparing the results of Parts 2 and 3.
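One hypothetical alternative for the new selection method in Part 3 chooses each class's training set deterministically rather than at random. The images are ranked by Euclidean distance to the class mean, and N evenly spaced ranks are taken, so that both typical and atypical images appear in training. This is an illustrative choice, not a method prescribed by the project. The name `centroid_spread_selection` is hypothetical, and the returned positions index into that class's own list of images.

```python
import math


def centroid_spread_selection(images, n_train):
    """Hypothetical selection: n_train images spread evenly over distance-to-mean rank.

    images is a list of equal-length pixel vectors from a single class.
    Returns (train_positions, test_positions) as indices into images.
    """
    m = len(images)
    if not 0 <= n_train <= m:
        raise ValueError("n_train must lie between 0 and the class size")
    if n_train == 0:
        return [], list(range(m))
    dim = len(images[0])
    mean = [sum(img[j] for img in images) / m for j in range(dim)]
    # Rank from most typical (closest to the mean) to least typical.
    order = sorted(range(m), key=lambda i: math.dist(images[i], mean))
    # Ranks floor(r * m / n_train) are distinct because m / n_train >= 1.
    picked = {order[(r * m) // n_train] for r in range(n_train)}
    train = sorted(picked)
    test = [i for i in range(m) if i not in picked]
    return train, test
```

Unlike the random split, this selection has no seed. Rerunning it always gives the same training set, so any difference from Part 2 comes from the structure of the split rather than from the luck of a single draw.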
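To compare Parts 2 and 3 with numerical evidence rather than a single pair of numbers, each success rate can be reported with a normal-approximation confidence interval. The difference between the two rates can then be summarized with a two-proportion z statistic. This treats the two test sets as independent samples. That is only an approximation here, because both are drawn from the same pool of 70,000 images. The function names are hypothetical.

```python
import math


def rate_with_interval(correct, total, z=1.96):
    """Success rate with a normal-approximation (about 95%) confidence interval."""
    p = correct / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)


def two_proportion_z(correct_a, total_a, correct_b, total_b):
    """z statistic for the difference of two success rates (independence assumed)."""
    p_a, p_b = correct_a / total_a, correct_b / total_b
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se
```

An absolute z value well above 2 suggests that the gap between the two splits is larger than sampling noise alone would explain.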

4) (10 pts) Choose the best training-set selection method you proposed, with the optimal value of N from Part 2, and analyze the role of the metric in the classification process. For this purpose, compare the distance induced by the $\|\cdot\|_1$ norm with the Euclidean distance. Draw conclusions.
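The only part of the classifier that Part 4 changes is the distance function. The following sketch passes the metric as a parameter and sets the distance induced by the 1-norm (the sum of absolute coordinate differences) beside the Euclidean distance. The names are hypothetical. The voting tie-break, where the nearest tied label wins, is an assumption and is not specified by the project.

```python
import heapq
import math
from collections import Counter


def l1_distance(a, b):
    """Distance induced by the 1-norm: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Standard Euclidean distance."""
    return math.dist(a, b)


def knn_predict(train_x, train_y, query, k=20, metric=l2_distance):
    """Hypothetical helper: k-nearest-neighbor vote under a pluggable metric."""
    nearest = heapq.nsmallest(
        k, range(len(train_x)), key=lambda i: metric(train_x[i], query)
    )
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    for i in nearest:  # assumed tie-break: nearest tied label wins
        if votes[train_y[i]] == top:
            return train_y[i]
```

The two metrics can rank neighbors differently. Consider a query at the origin with the training points (3, 0), labeled 0, and (2, 2), labeled 1. With k = 1 the Euclidean metric predicts label 1, since √8 ≈ 2.83 < 3. The 1-norm predicts label 0, since 3 < 4.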

5) (10 pts) Revisit the best classification results from the previous parts, and this time analyze the success percentages for each class separately as well as globally. Analyze the p
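The per-class and global success percentages for Part 5 can be computed from the true and predicted labels of the best run. Counting the most frequent (true, predicted) error pairs also shows which classes get confused with each other. The following is a minimal sketch. Both function names are hypothetical.

```python
from collections import Counter


def per_class_success(true_labels, predicted):
    """Hypothetical helper: return ({class: success rate}, global success rate)."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted) if t == p)
    per_class = {c: hits[c] / totals[c] for c in sorted(totals)}
    overall = sum(hits.values()) / len(true_labels)
    return per_class, overall


def most_confused(true_labels, predicted, top=5):
    """Most frequent (true, predicted) pairs among misclassified images."""
    errors = Counter((t, p) for t, p in zip(true_labels, predicted) if t != p)
    return errors.most_common(top)
```

Because every class has the same number of test images, the global rate equals the average of the per-class rates. Classes with rates well below that average are the ones worth examining.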
