COMP338 – Computer Vision – Assignment 2
o This assignment is worth 15% of the total mark for COMP338
o 80% of the assignment marks will be awarded for correctness of results
o 20% of the assignment marks will be awarded for the quality of the accompanying report
Submission Instructions
o Send all solutions as a single PDF document containing your answers, results, and discussion
of the results. Attach the source code for the programming problems as separate files.
o The deadline for this assignment is 08/01/2021, 5:00pm.
o Penalties for late submission apply in accordance with departmental policy as set out in the student handbook, which can be found at
and the University Code of Practice on Assessment, found at
Download “Assignment2_SupplementaryMaterials.zip” from Canvas and unzip it. You will find that:
- In the “data” folder, we have the same dataset as in Assignment 1. There are five classes: airplanes, cars, dog, faces and keyboard, with 70 training images and 10 test images per class;
- imgdata.py is an example data loader (see Step 3.1);
- img_list_train.npy and img_list_test.npy contain the paths of the training and test images, for your convenience.
We will use PyTorch to implement the following steps.
Step 1. (40 marks) Feature extraction using a Convolutional Neural Network
Define a Convolutional Neural Network with the following architecture.
1. Input: 3x250x250, i.e., the number of channels is 3, the size of each channel is 250x250 (if it is a greyscale image, duplicate it into an image of 3 channels);
2. First hidden layer: a convolution layer with filter size 7x7, stride 2, padding 3, and 64 channels (i.e., 64 filters), followed by Batch Normalization and ReLU;
3. Second hidden layer: max pooling with filter size 3x3, stride 2, padding 0;
4. Third hidden layer: a convolution layer with filter size 3x3, stride 1, padding 1, and 64 channels (i.e., 64 filters), followed by Batch Normalization and ReLU;
5. Fourth hidden layer: max pooling with filter size 3x3, stride 2, padding 0;
6. Fully connected layer, with 5 output channels (i.e., the number of classes);
7. Softmax function to transform the output of the fully connected layer into probabilities.
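The architecture above could be sketched in PyTorch as follows. This is a minimal sketch, not a reference solution; the class name `SimpleCNN` is our own, and the flattened feature size (64x30x30) follows from the standard output-size formula floor((H + 2*padding - kernel)/stride) + 1 applied layer by layer (250 → 125 → 62 → 62 → 30). Note that the forward pass returns raw logits: PyTorch's `nn.CrossEntropyLoss` (Step 2) applies log-softmax internally, so the explicit softmax is only needed when you want probabilities at inference time.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        # First hidden layer: 7x7 conv, stride 2, padding 3, 64 filters + BN + ReLU
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        # Second hidden layer: 3x3 max pooling, stride 2, padding 0
        self.pool1 = nn.MaxPool2d(kernel_size=3, stride=2)
        # Third hidden layer: 3x3 conv, stride 1, padding 1, 64 filters + BN + ReLU
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        # Fourth hidden layer: 3x3 max pooling, stride 2, padding 0
        self.pool2 = nn.MaxPool2d(kernel_size=3, stride=2)
        # Fully connected layer: 250 -> 125 (conv1) -> 62 (pool1) -> 62 (conv2) -> 30 (pool2)
        self.fc = nn.Linear(64 * 30 * 30, num_classes)

    def forward(self, x):
        x = torch.relu(self.bn1(self.conv1(x)))
        x = self.pool1(x)
        x = torch.relu(self.bn2(self.conv2(x)))
        x = self.pool2(x)
        x = x.flatten(1)          # flatten all but the batch dimension
        return self.fc(x)         # raw logits; apply softmax only at inference
```

At inference, class probabilities are obtained with `torch.softmax(model(x), dim=1)`.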
Step 2. (10 marks) Define a loss function and the optimizer
. Use a classification Cross-Entropy loss;
. Use the ADAM optimizer;
. Try learning rates of 0.01, 0.001, 0.0001, and 0.00001, and discuss the results obtained with each.
Step 3. (20 marks) Train the network
- Load the data, normalize the input, and train the network (an example data loader is given in “imgdata.py”);
- Set the batch size to 16 and the number of epochs to 20;
- Train the network and save the final model;
- Plot the training losses against the number of epochs (as we do in lab sessions 7 and 8; in this case we do not use the validation data);
- (Optional) Plot the training accuracies against the number of epochs.
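A minimal training-loop sketch is below. The random tensors and the flatten-plus-linear model are stand-ins so the snippet runs on its own; in the assignment you would load real images with the loader from “imgdata.py” (normalizing them, e.g. with torchvision transforms) and substitute the Step 1 CNN.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: replace with the normalized images from imgdata.py
images = torch.randn(32, 3, 250, 250)
labels = torch.randint(0, 5, (32,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

# Placeholder model: substitute the Step 1 CNN
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 250 * 250, 5))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 20
epoch_losses = []
for epoch in range(num_epochs):
    model.train()
    running = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        running += loss.item() * x.size(0)   # accumulate sample-weighted loss
    epoch_losses.append(running / len(loader.dataset))

torch.save(model.state_dict(), "final_model.pth")  # save the final model
# Plot the loss curve with matplotlib, e.g.:
# plt.plot(range(1, num_epochs + 1), epoch_losses)
```

The saved `final_model.pth` is what Step 4 loads for evaluation; `epoch_losses` feeds the required loss-versus-epoch plot.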
Step 4. (30 marks) Test the network on the test data and report the results
- Use the final model saved in Step 3.4 to predict the class labels of the test images and check them against the ground truth. If a prediction is correct, add the sample to the list of correct predictions;
- Compute and report the overall classification error (compare against the chance level of 1/5 = 20% accuracy) and the classification error per class;
. Compute and show the confusion matrix and analyze the results.
. Discuss the results: for each class, show some images that are correctly classified and some images that are incorrectly classified. Can you explain some of the failures?
. Discuss the results and compare the CNN model against the Bag of Words model from Assignment 1. What are their advantages and disadvantages?
. (Optional) Try out different optimizers, batch size, network architectures (e.g., different number of convolution and maxpooling layers) etc., and discuss the results;
. (Optional) Use transfer learning to use pretrained models, e.g., ResNet, VGG and AlexNet, and discuss the results.
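The required evaluation (predictions against ground truth, per-class errors, confusion matrix) can be sketched as below. The tiny random test set and placeholder model make the snippet self-contained; in the assignment you would load the real test images and restore the saved network with `model.load_state_dict(torch.load("final_model.pth"))` before evaluating.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in test set: 10 images per class, 5 classes, as in the real dataset
# (small 32x32 images here just to keep the sketch light)
test_images = torch.randn(50, 3, 32, 32)
test_labels = torch.arange(50) % 5
test_loader = DataLoader(TensorDataset(test_images, test_labels), batch_size=16)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 5))  # placeholder model

num_classes = 5
confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)

model.eval()
with torch.no_grad():
    for x, y in test_loader:
        preds = model(x).argmax(dim=1)   # predicted class per image
        for t, p in zip(y, preds):
            confusion[t, p] += 1         # rows: ground truth, columns: prediction

# Overall error vs. the 80% chance-level error (20% chance-level accuracy)
overall_error = 1.0 - confusion.diag().sum().item() / confusion.sum().item()
per_class_error = 1.0 - confusion.diag().float() / confusion.sum(dim=1).float()
print(confusion)
print(f"overall error: {overall_error:.2%}")
```

Off-diagonal entries of the confusion matrix show which class pairs are confused, which is the starting point for the required failure analysis.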