QUOTE(moltenx @ Dec 5 2019, 04:36 PM)
How sure are you that this is a regression problem? I see that one of the outputs contains a character. For this problem, I would say you have to do feature engineering, such as calculating:
- how many characters
- how many numbers
- sum up all numbers
- average all numbers
- converting characters to a numeric representation and summing them up.
There are a lot of things you can do to create features. Then try again.
Hi moltenx,
Yup, actually this can be a classification problem as well.
I'm not sure of the maximum number of outputs (maybe 3 to 8), but each output is always one of 0-9 or A-Z. I also tried the method you mentioned and included those features (sum, mean, std, var, first derivative, etc.), but none of them showed any distinct improvement.
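For reference, a minimal sketch of the kind of summary features meant here, assuming each input is a plain 1D list of numbers. The function name, the made-up sample values, and the choice of population std/var and mean absolute difference as the "first derivative" feature are all assumptions for illustration, not the exact code used.

```python
import statistics

def summary_features(signal):
    """Return simple summary statistics for one 1D numeric input (hypothetical helper)."""
    # First derivative approximated by differences between consecutive samples.
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return {
        "sum": sum(signal),
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),      # population standard deviation
        "var": statistics.pvariance(signal),   # population variance
        "mean_abs_diff": statistics.fmean([abs(d) for d in diffs]),
    }

# Made-up example input, only to show the call.
features = summary_features([1.0, 2.0, 4.0, 7.0])
```

These values would be appended to the raw input as extra columns before training.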
So far I have tried 3 methods:
1st
Using conventional regression (random forest, KNN, SVM, DenseNet-1D): convert the outputs and inputs to binary or uint8, then treat it as a multi-output regression problem.
2nd
Using a classification approach (random forest, SVM, KNN, DenseNet-1D): treat it as a multi-output classification problem where each output has 36 categories (0-9, A-Z).
3rd
Using an OCR-like approach: convert the 1D inputs to 2D (something like a Gram matrix), use a ResNet to extract the features, and then use a biLSTM to learn the characters from the image.
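The data preparation behind the three methods above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names, the example label and input, the ASCII-based uint8/binary encoding, and the plain outer product used as the "Gram matrix" are guesses at what such a pipeline might look like, and variable-length outputs (3 to 8 characters) are not handled.

```python
import string

# Assumed alphabet: digits then uppercase letters, 36 symbols in total.
ALPHABET = string.digits + string.ascii_uppercase
CHAR_TO_CLASS = {ch: i for i, ch in enumerate(ALPHABET)}

def to_uint8(text):
    """Method 1: each character becomes its ASCII code (a value in 0-255)."""
    return list(text.encode("ascii"))

def to_bits(text):
    """Method 1, binary variant: 8 bits per character, most significant bit first."""
    return [int(bit) for code in to_uint8(text) for bit in format(code, "08b")]

def to_classes(text):
    """Method 2: each character becomes a class index in 0..35."""
    return [CHAR_TO_CLASS[ch] for ch in text]

def gram_matrix(signal):
    """Method 3: outer product of a 1D input with itself, giving a 2D 'image'."""
    return [[a * b for b in signal] for a in signal]

label = "7A3Z"             # made-up example label
signal = [1.0, 2.0, 3.0]   # made-up example input

codes = to_uint8(label)      # regression targets as uint8 values
bits = to_bits(label)        # regression targets as bits
classes = to_classes(label)  # one class index per output position
image = gram_matrix(signal)  # 2D input for an image model
```

With the class encoding, each output position becomes its own 36-way classification target; with the uint8 or bit encoding, the same label becomes a vector of regression targets.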
None of the methods work. From my observation, KNN with 1 neighbour gets almost 100% training accuracy but 0% on the test set. This makes me think that each input is unique, since only 1 neighbour is used and using more than 1 neighbour gives poorer results. I'm trying to find a pattern in the data, but it looks like even the deep learning methods haven't been able to learn any pattern so far.
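One caveat on the KNN observation: with 1 neighbour, every training sample is its own nearest neighbour, so near-100% training accuracy is expected even when there is nothing to learn. A minimal sketch with purely random data shows this. Everything here is synthetic and hypothetical (the sizes, the seed, the helper names); it says nothing about the real dataset.

```python
import random

def nearest_label(query, train_x, train_y):
    """Return the label of the training sample closest to query (1-NN, squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((q - t) ** 2 for q, t in zip(query, train_x[i])),
    )
    return train_y[best]

def accuracy(xs, ys, train_x, train_y):
    """Fraction of samples in (xs, ys) whose 1-NN prediction matches the label."""
    hits = sum(nearest_label(x, train_x, train_y) == y for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(0)

def make_split(n, n_features=8, n_classes=36):
    """Random inputs with random labels, so there is no pattern to learn."""
    xs = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    ys = [rng.randrange(n_classes) for _ in range(n)]
    return xs, ys

train_x, train_y = make_split(200)
test_x, test_y = make_split(200)

# Each training point matches itself at distance 0, so this is pure memorisation.
train_acc = accuracy(train_x, train_y, train_x, train_y)
# On unseen points the prediction is no better than guessing.
test_acc = accuracy(test_x, test_y, train_x, train_y)
```

So the gap between training and test accuracy by itself does not show whether the inputs carry any usable signal; it only shows that the model memorised the training set.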