DATA 604, Spring, 2022, Project 1

1) Find and download the dataset of fashion images collected by Zalando Research and known as Fashion MNIST.

2) (15 pts) Study the role of the split of the training vs testing data for classification. Specifically:

Divide each class into two sets: the training set consisting of N examples for each of the classes 0 through 9, and the testing set consisting of 7000 − N examples of each of the classes 0 through 9, where N ranges from 1000 to 6000. Propose and describe a selection algorithm for choosing N out of 7000 images for any integer value of N.
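
As an illustration only, one possible selection algorithm is uniform random sampling without replacement inside each class. The sketch below assumes the labels are available as a flat sequence (for Fashion MNIST, presumably after pooling the official 60,000 training and 10,000 test images so that each class has 7000 examples). The function name `stratified_split` and the `seed` parameter are hypothetical and not part of the assignment.

```python
import random
from collections import defaultdict

def stratified_split(labels, n_train_per_class, seed=0):
    """Pick n_train_per_class random indices per class for training;
    every remaining index of that class goes to the testing set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        if not 0 <= n_train_per_class <= len(indices):
            raise ValueError(f"class {label} has only {len(indices)} examples")
        chosen = set(rng.sample(indices, n_train_per_class))
        train_idx.extend(i for i in indices if i in chosen)
        test_idx.extend(i for i in indices if i not in chosen)
    return train_idx, test_idx

# Toy stand-in for the real labels: 10 classes with 7 examples each.
labels = [c for c in range(10) for _ in range(7)]
train_idx, test_idx = stratified_split(labels, n_train_per_class=3)
```

With the real data the same call would use `n_train_per_class=N`, leaving 7000 − N examples per class for testing.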

Design and implement a method to test and estimate your computer's capability to perform numerical computations. Estimate the computational cost of this Project and use it to decide on a reasonable number of experiments you can perform. Quantify this information and use it to guide your personal numerical goals, such as the number of different values of N that you will test.
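
One way to approach such an estimate, sketched here under stated assumptions: time a small matrix product with NumPy, convert it to an approximate floating-point rate, and extrapolate to the full distance computation. The sketch assumes 28 × 28 images flattened to 784 values and assumes that brute-force distance computation dominates the cost, so sorting and voting are ignored. All function names are hypothetical.

```python
import time
import numpy as np

def distance_flops(n_train, n_test, dim=784):
    """Approximate floating-point operations for all test-to-train distances
    (one multiply and one add per coordinate pair)."""
    return 2 * n_train * n_test * dim

def measure_flops_per_second(n=1000, dim=784, repeats=3, seed=0):
    """Time an n-by-n block of dot products and return the best observed rate."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, dim), dtype=np.float32)
    b = rng.random((n, dim), dtype=np.float32)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        _ = a @ b.T
        best = min(best, time.perf_counter() - start)
    return distance_flops(n, n, dim) / best

def estimated_seconds(n_per_class, rate, per_class_total=7000, n_classes=10):
    """Extrapolate the time of one experiment with N = n_per_class."""
    n_train = n_classes * n_per_class
    n_test = n_classes * (per_class_total - n_per_class)
    return distance_flops(n_train, n_test) / rate

rate = measure_flops_per_second()
for n in (1000, 3500, 6000):
    print(n, round(estimated_seconds(n, rate), 2), "s (rough estimate)")
```

Under this cost model the work is proportional to N(7000 − N), so it peaks at N = 3500 and is equal for N = 1000 and N = 6000. Real runs will usually be slower than the estimate, which makes it a lower bound for planning rather than a prediction.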

Use the k nearest neighbors classification scheme in the standard Euclidean metric with fixed k = 20 to verify the global success rate of your classifications for each chosen value of N.
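
A minimal brute-force sketch of this scheme with NumPy follows. It assumes images are flattened to rows of a 2-D array and labels are integers 0 through 9. The function name `knn_predict`, the chunk size, and the tie-breaking rule (lowest label wins) are choices made for this illustration, not requirements of the assignment.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=20, n_classes=10, chunk=256):
    """Label each test row by majority vote among its k nearest training rows
    in the Euclidean metric. Vote ties go to the smallest class label."""
    train_x = np.asarray(train_x, dtype=np.float64)
    test_x = np.asarray(test_x, dtype=np.float64)
    train_y = np.asarray(train_y)
    train_sq = (train_x ** 2).sum(axis=1)
    preds = np.empty(len(test_x), dtype=train_y.dtype)
    # Process test rows in chunks so the distance matrix fits in memory.
    for start in range(0, len(test_x), chunk):
        block = test_x[start:start + chunk]
        # Squared distances via |a - b|^2 = |a|^2 - 2 a.b + |b|^2.
        # Squaring is monotone, so the neighbor ranking is unchanged.
        d2 = (block ** 2).sum(axis=1)[:, None] - 2.0 * block @ train_x.T + train_sq[None, :]
        nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
        votes = train_y[nearest]
        counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
        preds[start:start + chunk] = counts.argmax(axis=1)
    return preds

# Toy check with two well-separated classes of 25 points each.
train_x = np.vstack([np.zeros((25, 4)), np.full((25, 4), 10.0)])
train_y = np.array([0] * 25 + [1] * 25)
test_x = np.array([[1.0] * 4, [9.0] * 4])
test_y = np.array([0, 1])
preds = knn_predict(train_x, train_y, test_x, k=20, n_classes=2)
success_rate = float((preds == test_y).mean())
```

The global success rate is then the fraction of test examples whose predicted label matches the true label.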

Draw conclusions about the impact of the size of the training set on the performance of the classification scheme. To do this, provide a method for choosing an optimal size of the training set. Describe the notion of optimality that you choose. Substantiate your conclusions with numerical evidence.
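
One possible notion of optimality, given purely as an example, is "the smallest N whose success rate is within a fixed tolerance of the best observed rate". The sketch below encodes that rule. The function name and the tolerance are hypothetical, and the numbers in the dictionary are made-up placeholders, not measured results.

```python
def smallest_n_within_tolerance(success_by_n, tolerance=0.02):
    """Return the smallest N whose success rate is at most `tolerance`
    below the best success rate in the table."""
    best = max(success_by_n.values())
    return min(n for n, rate in success_by_n.items() if rate >= best - tolerance)

# Placeholder values for illustration only; replace with measured success rates.
placeholder_rates = {1000: 0.50, 2000: 0.60, 3000: 0.69, 4000: 0.70, 5000: 0.70}
chosen_n = smallest_n_within_tolerance(placeholder_rates, tolerance=0.02)
```

Other criteria are equally defensible, for example maximizing the success rate outright or weighing it against the estimated running time.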

3) (15 pts) Study the role of the structure of the split of the training vs testing data for classification. Specifically:

Propose a new method of selecting the training set, different from the one proposed in Part 2. Describe the new selection algorithm.
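
As one candidate for a structurally different method (assuming the Part 2 method was something like uniform random sampling), the sketch below ranks the images of each class by Euclidean distance to the class mean and takes evenly spaced ranks, so the training set deliberately spans typical through atypical examples. This is an illustrative choice, not the method the assignment prescribes, and the name `spread_split` is hypothetical.

```python
import numpy as np

def spread_split(images, labels, n_train_per_class):
    """Per class: rank images by distance to the class mean, take evenly
    spaced ranks for training, and send the rest to testing."""
    images = np.asarray(images, dtype=np.float64)
    labels = np.asarray(labels)
    train_parts, test_parts = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if not 0 <= n_train_per_class <= len(idx):
            raise ValueError(f"class {c} has only {len(idx)} images")
        dist = np.linalg.norm(images[idx] - images[idx].mean(axis=0), axis=1)
        ranked = idx[np.argsort(dist, kind="stable")]
        # Integer stride gives distinct ranks from most typical to most atypical.
        picks = (np.arange(n_train_per_class) * len(ranked)) // n_train_per_class
        in_train = np.zeros(len(ranked), dtype=bool)
        in_train[picks] = True
        train_parts.append(ranked[in_train])
        test_parts.append(ranked[~in_train])
    return np.concatenate(train_parts), np.concatenate(test_parts)

# Toy data: 2 classes, 6 two-pixel "images" each.
images = np.arange(24, dtype=float).reshape(12, 2)
labels = np.repeat([0, 1], 6)
train_idx, test_idx = spread_split(images, labels, 3)
```

Unlike random sampling, this split is deterministic, so any difference in success rate can be attributed to the structure of the selection rather than to sampling noise.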

Divide each class (using this new split method) into two sets: the training set consisting of N examples for each of the classes 0 through 9, and the testing set consisting of 7000 − N examples of each of the classes 0 through 9, where N is the optimal value chosen in Part 2.

Use the same k nearest neighbors classification scheme in the standard Euclidean metric with fixed k = 20 to verify the success rate of your classification for the chosen optimal value of N with the new split of training vs testing data.

Draw conclusions about the impact of the structure of the training vs testing split based on comparison of results of Part 2 and Part 3.

4) (10 pts) Choose the best training set selection method that you proposed, with the optimal value of N from Part 2, and analyze the role of the metric in the classification process. For this purpose, compare the distance induced by the $\|\cdot\|_1$ norm with the Euclidean distance. Draw conclusions.
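
To see why the metric can matter, the small sketch below (toy 2-D points, hypothetical names) shows a query whose nearest neighbor changes when the Euclidean distance is replaced by the distance induced by the 1-norm. The same effect can change which k = 20 neighbors vote in the classifier.

```python
import math

def euclidean(a, b):
    """Distance induced by the 2-norm."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Distance induced by the 1-norm: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

query = (0.0, 0.0)
candidates = {"diagonal": (3.0, 3.0), "axis": (0.0, 4.5)}

nearest_l2 = min(candidates, key=lambda name: euclidean(query, candidates[name]))
nearest_l1 = min(candidates, key=lambda name: manhattan(query, candidates[name]))
```

Here the diagonal point is closer in the Euclidean metric (about 4.24 versus 4.5) while the axis point is closer in the 1-norm (4.5 versus 6). On full-size data the 1-norm has no matrix-product shortcut, so a vectorized routine such as SciPy's `cdist` with `metric="cityblock"` may be worth considering, at a higher computational cost than the Euclidean case.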

5) (10 pts) Revisit the best classification results from previous parts, and this time analyze the success percentages for each class separately as well as globally. Analyze the p
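
The per-class and global success rates mentioned above could be computed along the following lines. This is a sketch with toy labels and a hypothetical function name; it assumes true and predicted labels are equal-length sequences.

```python
from collections import Counter

def success_rates(true_labels, predicted_labels):
    """Return ({class: fraction correct}, overall fraction correct)."""
    total = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    per_class = {c: correct[c] / total[c] for c in sorted(total)}
    overall = sum(correct.values()) / len(true_labels)
    return per_class, overall

# Toy labels for illustration only.
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1, 0, 0]
per_class, overall = success_rates(true_labels, predicted)
```

Because every class has the same number of test examples in this project, the global rate equals the unweighted mean of the per-class rates.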
