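    from gensim import corpora, models, similarities

    documents = ["This is a book about cars, dinosaurs, and fences"]
    stoplist = set('for a of the and to in -, is'.split())
    # lowercase, strip commas, and drop stopwords
    texts = [[w.strip(',') for w in d.lower().split() if w not in stoplist] for d in documents]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]
    lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)

    doc = "I like cars and birds"
    vec_bow = dictionary.doc2bow(doc.lower().split())
    vec_lsi = lsi[vec_bow]  # project the query into LSI space
    index = similarities.MatrixSimilarity(lsi[corpus])
    sims = index[vec_lsi]  # perform a similarity query against the corpus

In the above code I am comparing how similar "This is a book about cars, dinosaurs, and fences" is to "I like cars and birds" using the cosine similarity technique. The two sentences have effectively one word in common, "cars", yet when I run the code I get that they are 100% similar. This does not make sense to me.

Can someone suggest how to improve my code so that I get a reasonable number?

These topic-modelling techniques need varied, realistic data to achieve sensible results. With a corpus of a single document, LSI can learn only one latent dimension, so any query that shares a word with that document is projected onto the same axis, and the cosine between the two comes out as 1.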
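For two short sentences like these, a plain bag-of-words cosine similarity, with no topic model at all, already gives a more intuitive number. The following is only a minimal sketch of that swapped-in technique, not the LSI approach from the question: the helper names `tokenize` and `cosine_similarity` are hypothetical, and the stoplist is the question's list with the punctuation tokens left out.

```python
import math
from collections import Counter

# Stopwords from the question, minus the punctuation tokens (assumption).
STOPLIST = set('for a of the and to in is'.split())

def tokenize(text):
    """Lowercase, strip surrounding commas, and drop stopwords."""
    words = (w.strip(',') for w in text.lower().split())
    return [w for w in words if w and w not in STOPLIST]

def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of a and b."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only words present in both sentences contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)

sim = cosine_similarity(
    "This is a book about cars, dinosaurs, and fences",
    "I like cars and birds",
)
print(round(sim, 3))  # 0.204
```

After stopword removal the first sentence has six distinct words and the second has four, with only "cars" shared, so the result is 1 / (sqrt(6) * sqrt(4)), roughly 0.2. An LSI model would only be expected to improve on this once it is trained on a larger, more varied corpus.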