Experiment log. Columns per run: serial number, dataset description, number of training steps, steps_per_stats, number of layers, number of units, dropout, attention (with type), embeddings (description), status (Done / Not Done), best BLEU, best accuracy, time taken, and F1 / inference notes.
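The Best BLEU column reports corpus-level BLEU of the generated SPARQL against the reference queries. As a reference for how the metric behaves, here is a minimal self-contained sketch (standard clipped n-gram precision with a brevity penalty; the experiments themselves presumably used the NMT toolkit's own scorer):

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) times a brevity penalty."""
    clipped = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_ng = ngrams(hyp, n)
            ref_ng = ngrams(ref, n)
            totals[n - 1] += sum(hyp_ng.values())
            clipped[n - 1] += sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
    if min(clipped) == 0:
        return 0.0  # some n-gram order has no match at all
    precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(precision)

ref = "select ?x where { ?x rdf:type dbo:Species }".split()
print(round(100 * corpus_bleu([ref], [ref]), 2))  # a perfect match scores 100.0
```

The example query is illustrative; real runs score the whole test set at once.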
Attention or no Attention

| # | Dataset | Num Train Steps | steps_per_stats | Num_Layers | Num_Units | Dropout | Attention | Embeddings | Status | Best BLEU | Best Accuracy | Time Taken | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Test, train and dev sets randomly shuffled, split 10:80:10 | 40000 | 100 | 2 | 128 | 0.2 | No | No | Done | 97.69 | 89.75 | ~1 hour | F1 on the special test set: 0.7891 |
| 2 | Test, train and dev sets randomly shuffled, split 10:80:10 | 40000 | 100 | 2 | 128 | 0.2 | Yes (scaled luong) | No | Done | 97.5 | 88 | ~1 hour 33 minutes | |
| 3 | Special Set: separate test set, same vocab, frequency thresholding; test and train sets exclusive of each other | 30000 | 100 | 2 | 128 | 0.2 | No | No | Done | 66.39 | 5.71 | ~1 hour 30 minutes (empirical) | |
| 4 | Special Set (as row 3) | 40000 | 100 | 2 | 128 | 0.2 | Yes (scaled luong) | No | Done | 85.16 | 34.29 | ~1 hour 30 minutes | |
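The 10:80:10 random shuffle split used in rows 1 and 2 can be sketched as follows (the function name and the fixed seed are illustrative, not from the original setup):

```python
import random

def split_dataset(pairs, ratios=(0.10, 0.80, 0.10), seed=42):
    """Shuffle (question, query) pairs and split them into
    test : train : dev = 10 : 80 : 10."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_test = int(ratios[0] * len(pairs))
    n_train = int(ratios[1] * len(pairs))
    test = pairs[:n_test]
    train = pairs[n_test:n_test + n_train]
    dev = pairs[n_test + n_train:]
    return test, train, dev

pairs = [(f"question {i}", f"query {i}") for i in range(1000)]
test_set, train_set, dev_set = split_dataset(pairs)
print(len(test_set), len(train_set), len(dev_set))  # 100 800 100
```

Note that a purely random split lets near-duplicate pairs land in both train and test, which is exactly what the later "Special Set" (exclusive test and train) is designed to avoid.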
Model performance increased immensely with the attention mechanism, although the gains were not stable. Overall, BLEU and accuracy reached good values, which points to the advantage of using attention in our further studies. All of the following experiments use attention.
Fix Dropout

| # | Dataset | Num Train Steps | steps_per_stats | Num_Layers | Num_Units | Dropout | Attention | Embeddings | Status | Best BLEU | Best Accuracy | Time Taken | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Special Set: separate test set, same vocab, frequency thresholding; test and train sets exclusive of each other | 40000 | 100 | 2 | 128 | 0.05 | Yes (scaled luong) | No | Done | 58 | 0 | ~1 hour 40 minutes (My PC) | |
| 2 | Special Set | 40000 | 100 | 2 | 128 | 0.5 | Yes (scaled luong) | No | Done | 86 | 40.9 | ~1.5 hours (My PC) | F1 (GERBIL, Macro F1 QALD): 0.5172; see note below |
| 3 | Special Set | 50000 | 100 | 2 | 128 | 0.7 | Yes (scaled luong) | No | Done | 85 | 45.06 | ~4 hours 15 minutes (College PC) | |
| 4 | Special Set | 50000 | 100 | 2 | 128 | 0.9 | Yes (scaled luong) | No | Done | 59.6 | 2.3 | ~2 hours 18 minutes (GCP K80) | |

Raising the dropout from 0.2 to 0.5 improved the model's consistency and set a new record on the test set: 86 BLEU and 40.9% accuracy, with an F1 score of 0.5172 (Macro F1 QALD, measured with GERBIL). The only major issue is that the model keeps confusing dbo_species with dbo_family.
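The dropout value swept above is the probability of dropping a unit during training. A minimal inverted-dropout sketch (not the toolkit's implementation, which applies dropout inside the RNN cells) makes the effect of the knob concrete; at rate 0.9, nearly every activation is zeroed each step, which is consistent with the collapse to 2.3% accuracy:

```python
import random

def inverted_dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate` during training and
    rescale survivors by 1 / (1 - rate) so the expected value is
    unchanged; at inference time the layer is the identity."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

out = inverted_dropout([1.0] * 10, rate=0.5)
print(out)  # roughly half the units zeroed, survivors rescaled to 2.0
```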
Fix attention type

| # | Dataset | Num Train Steps | steps_per_stats | Num_Layers | Num_Units | Dropout | Attention | Embeddings | Status | Best BLEU | Best Accuracy | Time Taken | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Special Set: separate test set, same vocab, frequency thresholding; test and train sets exclusive of each other | 50000 | 100 | 2 | 128 | 0.5 | Yes (luong) | No | Done | 76.7 | 9.1 | ~1 hour 40 minutes (empirical) | |
| 2 | Special Set | 50000 | 100 | 2 | 128 | 0.5 | Yes (bahdanau) | No | Done | 62.5 | 0 | ~1 hour 40 minutes (empirical) | |
| 3 | Special Set | 40000 | 100 | 2 | 128 | 0.5 | Yes (scaled luong) | No | Done | 86 | 40.9 | ~1.5 hours (My PC) | F1 (GERBIL, Macro F1 QALD): 0.5172; best dropout run, repeated here for comparison |
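The three attention variants compared above differ only in their scoring function: Luong attention is multiplicative, the scaled variant multiplies that score by a learned scalar, and Bahdanau attention is additive. A simplified plain-Python sketch (the real implementations use learned weight matrices; per-element weights stand in for them here):

```python
import math

def softmax(xs):
    """Normalize scores into attention weights."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def luong_scores(h_t, enc_states):
    # multiplicative score: dot(h_t, h_s) for each encoder state h_s
    return [sum(a * b for a, b in zip(h_t, h_s)) for h_s in enc_states]

def scaled_luong_scores(h_t, enc_states, g=1.0):
    # same dot score, multiplied by a (learned) scalar g
    return [g * s for s in luong_scores(h_t, enc_states)]

def bahdanau_scores(h_t, enc_states, w_t, w_s, v):
    # additive score: dot(v, tanh(w_t * h_t + w_s * h_s))
    scores = []
    for h_s in enc_states:
        hidden = [math.tanh(wt * a + ws * b)
                  for wt, ws, a, b in zip(w_t, w_s, h_t, h_s)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return scores

h_t = [0.5, -1.0, 0.25]
enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, -1.0, 0.25]]
weights = softmax(luong_scores(h_t, enc))
print([round(w, 3) for w in weights])  # mass concentrates on the matching state
```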
Fix Number of Units

| # | Dataset | Num Train Steps | steps_per_stats | Num_Layers | Num_Units | Dropout | Attention | Embeddings | Status | Best BLEU | Best Accuracy | Time Taken | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Special Set: separate test set, same vocab, frequency thresholding; test and train sets exclusive of each other | 50000 | 100 | 2 | 256 | 0.5 | Yes (scaled luong) | No | Done | 82.9 | 25 | ~10 hours 43 minutes (College PC) | |
| 2 | Special Set | 50000 | 100 | 2 | 512 | 0.5 | Yes (scaled luong) | No | Done | 55.8 | 0 | ~2 hours (GCP) | |
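Doubling Num_Units roughly quadruples the per-layer parameter count of an LSTM, which may help explain the sharply longer training time at 256 units and the unstable 512-unit run. A quick count (assuming a standard LSTM cell with embedding size equal to the hidden size, which is an assumption, not a recorded setting):

```python
def lstm_params(input_dim, units):
    """Parameters in one LSTM layer: four gates, each with a weight
    matrix over [input; recurrent state] plus a bias vector."""
    return 4 * ((input_dim + units) * units + units)

# assuming the input (embedding) size equals the hidden size
for units in (128, 256, 512):
    print(units, lstm_params(units, units))
# 128 -> 131584, 256 -> 525312, 512 -> 2099200
```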
Fix Number of Layers

| # | Dataset | Num Train Steps | steps_per_stats | Num_Layers | Num_Units | Dropout | Attention | Embeddings | Status | Best BLEU | Best Accuracy | Time Taken | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Special Set: separate test set, same vocab, frequency thresholding; test and train sets exclusive of each other | 4000 | 100 | 1 | 128 | 0.7 | Yes (scaled luong) | No | Done | 1 | 0 | ~15 minutes | The model did not learn beyond 400 iterations and terminated automatically with very high perplexity |
| 2 | Special Set | 50000 | 100 | 3 | 128 | 0.5 | Yes (scaled luong) | No | Done | 58.7 | 0 | 5 hours 13 minutes | |
| 3 | Special Set | 40000 | 100 | 2 | 128 | 0.5 | Yes (scaled luong) | No | Done | 86 | 40.9 | ~1.5 hours (My PC) | F1 (GERBIL, Macro F1 QALD): 0.5172; best dropout run, repeated here for comparison |
| 4 | Special Set | 50000 | 100 | 4 | 128 | 0.5 | Yes (scaled luong) | No | Done | 63 | 0 | ~1 hour 30 minutes | |
Use Embeddings

| # | Dataset | Num Train Steps | steps_per_stats | Num_Layers | Num_Units | Dropout | Attention | Embeddings | Status | Best BLEU | Best Accuracy | Time Taken | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Special Set: separate test set, same vocab, frequency thresholding; test and train sets exclusive of each other | 50000 | 100 | 2 | 128 | 0.7 | Yes (scaled luong) | Yes: SPARQL embeddings from biased graph walks; English embeddings from previous models; OOV words randomly initialized | Done | 93 | 63 | ~1 hour 30 minutes | Best results at 15000 iterations; interestingly, the train set had a hard time reaching good performance, and performance dipped after 40000 iterations to 85 BLEU and 25% accuracy |
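The embedding setup in row 1 (pretrained vectors where available, random initialization for OOV words) can be sketched as follows; the function name, the init range, and the example vocabulary are illustrative, not from the original code:

```python
import random

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Build an embedding matrix row per vocab word: copy the
    pretrained vector if one exists, otherwise initialize randomly."""
    rng = random.Random(seed)
    matrix, oov = [], []
    for word in vocab:
        if word in pretrained:
            matrix.append(pretrained[word])
        else:
            oov.append(word)
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix, oov

pretrained = {"dbo:Species": [0.1, 0.2], "dbo:Family": [0.3, 0.4]}
matrix, oov = build_embedding_matrix(
    ["dbo:Species", "dbo:Family", "dbo:Genus"], pretrained, dim=2)
print(oov)  # ['dbo:Genus']
```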
Initializing with embeddings previously trained using the same model mainly increased the learning speed. Given that the embeddings are used for eukaryotes, the general fastText models do not have the relevant vocabulary; the dataset needs to be changed for further evaluations.
| # | Dataset | Num Train Steps | steps_per_stats | Num_Layers | Num_Units | Dropout | Attention | Embeddings | Status | Best BLEU | Best Accuracy | Time Taken | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Special Set: separate test set, same vocab, frequency thresholding; test and train sets exclusive of each other | 50000 | 100 | 2 | 128 | 0.7 | Yes (scaled luong) | Yes: fastText for English | Not Done | | | | |
| 2 | Special Set | 50000 | 100 | 2 | 128 | 0.7 | Yes (scaled luong) | Yes: fastText for English, fastText for SPARQL | Not Done | | | | |
| 15 | | 30000 | 100 | 2 | 256 | 0.5 | Yes | No | | | | | |