QUOTE(iSean @ Jan 15 2021, 09:42 PM)
Hopefully I'm not wasting your breath. I really appreciate your time explaining all these to me, as I don't have someone to guide me through it all....
[If you don't mind guiding, I think I can ask my supervisor to add your name into my thesis if you wanted.]
Back to the topic:
Problem is they always have the online dataset stored inside TensorFlow itself.
Then there's this weird line called "(X_train, y_train), (X_test, y_test) = mnist.load_data()" which makes my life miserable when splitting the data, as I have no idea how they actually split it.
If I'm not mistaken, "y" are the labels/names and "x" are the images.
===================================================
Also, TensorFlow normally uses an ImageDataGenerator, so I also don't know which data it takes.
The terminology between "testing" and "validation" also confuses me from time to time... So let me get this straight: when people mention using 80% as training data, that 80% includes the validation data (say 20% of it) used while training the model, correct?

So the model basically fine-tunes itself with that 80% of training data, from the automatic splitting of the ImageDataGenerator?
Then the testing data is data the model has "never" seen before, and it is fed into the model afterwards to see how well the model is performing?
Meaning, I should technically export out a "Model", then manually feed images into the Model to get the testing results (predicted accuracy etc.)?

No problem lah, we learn together. Knowledge is meant to be shared.
You need to learn how to google when doing programming
Unsure about anything? Just paste that code into google
https://stackoverflow.com/questions/5806426...in-and-test-set

mnist is the dataset module, so it contains a function called load_data()
So what this code does
CODE
def load_data(path='mnist.npz'):
    # download (or reuse the cached) mnist.npz and verify its hash
    path = get_file(path,
                    origin='https://s3.amazonaws.com/img-datasets/mnist.npz',
                    file_hash='8a61469f7ea1b51cbae51d4f78837e45')
    # the archive already contains the four pre-split arrays
    with np.load(path, allow_pickle=True) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)
It separates out the dataset that has already been split for you
So your call (X_train, y_train), (X_test, y_test) = mnist.load_data()
Will automatically define the variables X_train .. y_test to the appropriate sets
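To see the same pattern in miniature, here's a self-contained sketch that mimics what load_data() does, using a throwaway .npz file with fake arrays instead of the real MNIST download (the file name and array shapes here are made up for illustration):

```python
import numpy as np
import tempfile, os

# build a tiny fake "mnist.npz" holding the same four pre-split arrays
path = os.path.join(tempfile.mkdtemp(), 'fake_mnist.npz')
np.savez(path,
         x_train=np.zeros((60, 28, 28)), y_train=np.zeros(60),
         x_test=np.zeros((10, 28, 28)),  y_test=np.zeros(10))

def load_data(path):
    # same shape of logic as keras' mnist.load_data(): the split
    # already exists inside the file, we just read the four arrays out
    with np.load(path, allow_pickle=True) as f:
        return (f['x_train'], f['y_train']), (f['x_test'], f['y_test'])

# the tuple unpacking just names the four arrays that were already split
(X_train, y_train), (X_test, y_test) = load_data(path)
print(X_train.shape, X_test.shape)  # (60, 28, 28) (10, 28, 28)
```

So there is no splitting happening in your script at all; the train/test division was baked into the file before you ever called load_data().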
In your case, you need to manually shuffle and subset your data
I am not sure of the exact code to do this in python, but one way you can do it is:
1. Number your images from 0-100, e.g. your COVID-positive images
2. Randomly draw 80 of those numbers
3. Subset your image dataset based on those 80 numbers
I believe there should be a function that helps you to do this. Do your homework lol. Come back to me with the code and I'll check for you.
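As a hedged starting point, the three steps above can be sketched with plain numpy (the array of zeros just stands in for your real images):

```python
import numpy as np

rng = np.random.default_rng(42)

# step 1: pretend these are 100 COVID-positive images (28x28 each)
images = np.zeros((100, 28, 28))
indices = np.arange(len(images))  # image numbers 0..99

# step 2: shuffle the numbers and take the first 80 as the training draw
shuffled = rng.permutation(indices)
train_idx, test_idx = shuffled[:80], shuffled[80:]

# step 3: subset the image array with those numbers
train_images = images[train_idx]
test_images = images[test_idx]
print(train_images.shape, test_images.shape)  # (80, 28, 28) (20, 28, 28)
```

(The ready-made function hinted at above is sklearn.model_selection.train_test_split, which does this shuffle-and-subset in one call.)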
To answer your second question:
That method is kinda dated
We have what we call cross-validation / out-of-bag sampling methods
You can read up on it, but it does not involve another separate hold-out set, which your limited dataset would suffer from
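For flavour, here is a minimal numpy sketch of k-fold cross-validation (the helper name kfold_indices is made up; scikit-learn's KFold does this for real):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    # hypothetical helper: shuffle the sample indices, cut them into
    # k folds, and yield (train, validation) index pairs. Each sample
    # serves as validation exactly once, so no permanent hold-out set
    # is sacrificed from a small dataset.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# 100 samples, 5 folds: each pass trains on 80 and validates on 20
for train_idx, val_idx in kfold_indices(100, k=5):
    print(len(train_idx), len(val_idx))  # 80 20, five times
```

You'd fit the model once per fold and average the validation scores, instead of trusting a single 80/20 split.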
This post has been edited by pipedream: Jan 15 2021, 10:15 PM