amazonreviews_e.tar.gz
all datasets, distributed bag of words, 40 epochs, 768 dimensions, Gensim 3.8.3
https://github.com/EML4U/Drift-detector-comparison/blob/e88de5f969e607823170c7daa9cde1c440c5529e/word2vec/doc2vec.py

    max_year = -1              # Max year for training
    max_docs = -1              # -1 to process all (for development)
    print_texts = False        # Prints iterated texts (for development)
    doc2vec_vector_size = 768  # Dimensionality of the feature vectors
    doc2vec_min_count = 2      # Ignores all words with total frequency lower than this
    doc2vec_epochs = 40        # Number of iterations (epochs) over the corpus. Defaults to 10 for Doc2Vec
    doc2vec_dm = 0             # Training algorithm, distributed memory (PV-DM) or distributed bag of words (PV-DBOW)
    doc2vec_seed = -1          # -1, or int for reproducible results (under development)

    2021-06-10 23:11:41.177173
    Building vocabulary
    Training model
    Saved model file /home/eml4u/EML4U/data/amazon/amazonreviews_e.model
    Doc2Vec(dbow,d768,n5,mc2,s0.001,t3)
    Runtime: 157842.79472875595 seconds = 2630 min = 43 hours 50 min
    Gensim version: 3.8.3

EML4U experiment server

amazonreviews_d.tar.gz
all datasets, distributed bag of words, 40 epochs, 50 dimensions, Gensim 3.8.3
https://github.com/EML4U/Drift-detector-comparison/blob/882e5e1fd1da79708fa587bbd1161f1f3a0c3962/word2vec/paragraph-vector.py

    max_year = 9999            # Max year for training
    max_docs = -1              # -1 to process all (for development)
    print_texts = False        # Prints iterated texts (for development)
    doc2vec_vector_size = 50   # Dimensionality of the feature vectors
    doc2vec_min_count = 2      # Ignores all words with total frequency lower than this
    doc2vec_epochs = 40        # Number of iterations (epochs) over the corpus. Defaults to 10 for Doc2Vec
    doc2vec_dm = 0             # Training algorithm, distributed memory (PV-DM) or distributed bag of words (PV-DBOW)
    doc2vec_seed = -1          # -1, or int for reproducible results (under development)

    2021-05-21 18:56:33.734892
    Doc2Vec(dbow,d50,n5,mc2,s0.001,t3)
    Runtime: 116137.58712434769 seconds = 1936 min = 32 hours 16 min
    Gensim version: 3.8.3

EML4U experiment server

amazonreviews_c.model
amazonreviews_c.model.dv.vectors.npy
up to year 2000, 10 epochs
https://github.com/EML4U/Drift-detector-comparison/tree/ad63c6c0ef10b32348d33a2d388a1491d3571b3c

    max_year = 2000            # Max year for training
    max_docs = -1              # -1 to process all (for development)
    print_texts = False        # Prints iterated texts (for development)
    doc2vec_vector_size = 50   # Dimensionality of the feature vectors
    doc2vec_min_count = 2      # Ignores all words with total frequency lower than this
    doc2vec_epochs = 10        # Number of iterations (epochs) over the corpus. Defaults to 10 for Doc2Vec
    doc2vec_dm = 1             # Training algorithm, distributed memory (PV-DM) or distributed bag of words (PV-DBOW)
    doc2vec_seed = -1          # -1, or int for reproducible results (under development)

    2021-05-19 15:35:30.675268
    Doc2Vec(dm/m,d50,n5,w5,mc2,s0.001,t3)
    Runtime: 2897.5691492557526 seconds
    python3 paragraph-vector.py 3799,65s user 58,23s system 133% cpu 48:18,98 total

Notebook A.W.

amazonreviews_b.model
up to year 1999, 40 epochs
https://github.com/EML4U/Drift-detector-comparison/blob/2e2d8f3539605eea07521410ea88b3c59c6b5471/word2vec/paragraph-vector.py

    max_year = 2000            # Max year for training
    max_docs = -1              # -1 to process all (for development)
    print_texts = False        # Prints iterated texts (for development)
    doc2vec_vector_size = 50   # Dimensionality of the feature vectors
    doc2vec_min_count = 2      # Ignores all words with total frequency lower than this
    doc2vec_epochs = 40        # Number of iterations (epochs) over the corpus. Defaults to 10 for Doc2Vec
    doc2vec_seed = -1          # -1, or int for reproducible results (dev)

    2021-05-17 23:20:54.436872
    Doc2Vec(dm/m,d50,n5,w5,mc2,s0.001,t3)
    Runtime: 7149.856955528259 seconds
    python3 paragraph-vector.py 7655,56s user 60,79s system 107% cpu 1:59:11,07 total

Notebook A.W.

amazonreviews_a.model
up to year 1999, 10 epochs
https://github.com/EML4U/Drift-detector-comparison/blob/d9e93bb03d655ab8170eb2283ffd7ecae1f1d9a4/word2vec/paragraph-vector.py

    max_year = 2000            # Max year for training
    max_docs = -1              # -1 to process all
    print_texts = False        # Prints iterated texts (for development)
    doc2vec_vector_size = 50   # Dimensionality of the feature vectors
    doc2vec_min_count = 2      # Ignores all words with total frequency lower than this
    doc2vec_epochs = 10        # Number of iterations (epochs) over the corpus. Defaults to 10 for Doc2Vec

    2021-05-17 16:07:15.833986
    Doc2Vec(dm/m,d50,n5,w5,mc2,s0.001,t3)
    Runtime: 1994.4850919246674 seconds

Notebook A.W.
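
Reading the model summaries

Each log above contains a one-line model summary such as Doc2Vec(dbow,d768,n5,mc2,s0.001,t3). The sketch below decodes such a line into named values so the models can be compared more easily. The function name parse_doc2vec_summary and the PREFIXES table are hypothetical and not part of the repository. The mapping of the short prefixes to Gensim parameter names (d = vector_size, n = negative, w = window, mc = min_count, s = sample, t = workers) is an assumption based on how Gensim 3.8 appears to print Doc2Vec models, so it should be checked against the Gensim version in use.

```python
import re

# Assumed meaning of the short prefixes in the summary line.
# This mapping is an assumption, not taken from the training scripts.
PREFIXES = {
    "d": ("vector_size", int),
    "n": ("negative", int),
    "w": ("window", int),
    "mc": ("min_count", int),
    "s": ("sample", float),
    "t": ("workers", int),
}


def parse_doc2vec_summary(summary):
    """Decode a line like 'Doc2Vec(dbow,d50,n5,mc2,s0.001,t3)' into a dict."""
    match = re.fullmatch(r"Doc2Vec\((.*)\)", summary.strip())
    if match is None:
        raise ValueError("not a Doc2Vec summary: %r" % summary)

    tokens = match.group(1).split(",")
    # The first token names the training algorithm, e.g. 'dbow' or 'dm/m'.
    result = {"algorithm": tokens[0]}

    for token in tokens[1:]:
        parts = re.fullmatch(r"([a-z]+)([0-9.eE+-]+)", token)
        if parts is None or parts.group(1) not in PREFIXES:
            # Keep anything unrecognized instead of guessing its meaning.
            result.setdefault("other", []).append(token)
            continue
        name, cast = PREFIXES[parts.group(1)]
        result[name] = cast(parts.group(2))
    return result


if __name__ == "__main__":
    for line in (
        "Doc2Vec(dbow,d768,n5,mc2,s0.001,t3)",
        "Doc2Vec(dm/m,d50,n5,w5,mc2,s0.001,t3)",
    ):
        print(parse_doc2vec_summary(line))
```

Under this reading, the summaries agree with the configuration blocks: d768 and d50 match doc2vec_vector_size, mc2 matches doc2vec_min_count, and dbow versus dm/m matches doc2vec_dm = 0 versus 1. The window token (w5) only shows up in the dm/m summaries listed here.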