2022-02-11-Benchmark-extracted-data

Filtered data to process in further steps.

AMORE-NumbersYearsStars.json.gz

    $ head -c 100 AMORE-NumbersYearsStars.json
    [[1, 2007, 3], [2, 2007, 3], [3, 2006, 5], [4, 2007, 3], [5, 2007, 3], [6, 2008, 2], [7, 2006, 1], [
    $ tail -c 100 AMORE-NumbersYearsStars.json
    [7911680, 2010, 3], [7911681, 2009, 5], [7911682, 2007, 2], [7911683, 2010, 4], [7911684, 2006, 5]]

Tools: extract_year_star.ipynb
Input: movies.txt.gz
Note: Simple overview of review number, review year, and review stars.

AMORE-TextDuplicates.json.gz

    $ head -c 100 AMORE-TextDuplicates.json
    [[1, 5615911], [2, 5615912], [3, 5615913], [4, 5615914], [5, 5615915], [6, 5615916], [7, 5615917], [
    $ tail -c 100 AMORE-TextDuplicates.json
    [7898353, 7898371], [7898354, 7898372], [7898355, 7898373], [7899447, 7899449], [7906809, 7906810]]

Tools: extract_duplicates.ipynb
Input: movies.txt.gz
Note: Each list contains at least two review numbers whose review texts are identical.

AMORE-OpinionCollection.json.gz

    $ head -c 100 AMORE-OpinionCollection.json
    [[1, [["darkness", 1]], [["miracle", 1]], [["raped", 1], ["desert", 2], ["undocumented", 1], ["relen
    $ tail -c 100 AMORE-OpinionCollection.json
    hollow", 1]], [], [["hollow", 2], ["bent", 2], ["lost", 1]], [["paradise", 1], ["fascinating", 1]]]]

Tools: extract_opinion_words.ipynb opinion_collection.py opinion_lexicon.py
Note: Each list item starts with a review number, followed by lists of opinion words with their numbers of occurrences.

AMORE-OpinionCounts.json.gz

    $ head -c 100 AMORE-OpinionCounts.json
    {"1": [-5, -4], "2": [-10, -12], "3": [-2, -2], "4": [-8, -8], "5": [-8, -8], "6": [0, 0], "7": [-6,
    $ tail -c 100 AMORE-OpinionCounts.json
    "7911680": [-2, -2], "7911681": [-1, -1], "7911682": [1, 0], "7911683": [1, 0], "7911684": [-4, -1]}

Tools: extract_opinion_words.ipynb opinion_collection.py opinion_lexicon.py
Note: For each review number, the list contains two values:
  1. number-positive-words-in-summary + number-positive-words-in-text - number-negative-words-in-summary - number-negative-words-in-text (every occurrence counted)
  2. number of words in (positive-words-in-summary UNION positive-words-in-text) - number of words in (negative-words-in-summary UNION negative-words-in-text) (every distinct word counted once)
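
The two values per review can be sketched in Python as follows. This is a minimal illustration, not the code from opinion_collection.py: the function names `load_json_gz` and `opinion_counts` are hypothetical, and the argument order (negative summary, positive summary, negative text, positive text) is an assumption read off the excerpt of AMORE-OpinionCollection.json. The example input is the visible tail of that excerpt, so its first list may be truncated.

```python
import gzip
import json


def load_json_gz(path):
    """Read a .json.gz file directly, without unpacking it first (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def opinion_counts(neg_summary, pos_summary, neg_text, pos_text):
    """Return [total, distinct] for one review.

    Each argument is a list of [word, occurrences] pairs.
    The argument order is an assumption, not confirmed by the source.
    """
    def occurrences(pairs):
        # Sum of all occurrences, repeated words counted every time.
        return sum(count for _word, count in pairs)

    def words(pairs):
        # Set of distinct words, ignoring how often each occurs.
        return {word for word, _count in pairs}

    total = (occurrences(pos_summary) + occurrences(pos_text)
             - occurrences(neg_summary) - occurrences(neg_text))
    distinct = (len(words(pos_summary) | words(pos_text))
                - len(words(neg_summary) | words(neg_text)))
    return [total, distinct]


example = opinion_counts(
    neg_summary=[["hollow", 1]],
    pos_summary=[],
    neg_text=[["hollow", 2], ["bent", 2], ["lost", 1]],
    pos_text=[["paradise", 1], ["fascinating", 1]],
)
print(example)  # [-4, -1]
```

The result coincides with the last entry shown for AMORE-OpinionCounts.json, which is consistent with this reading of the note but does not confirm it. "hollow" appears in both the summary and the text, so it contributes 3 to the occurrence total but only 1 to the distinct-word count.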