Word Stemming Based on Structural and Semantic Similarity
Abstract
Word stemming is an important problem in natural language processing and information retrieval. Most existing stemming methods are language-dependent and are therefore applicable only to particular languages. Given the importance of this problem, this paper proposes a language-independent stemming method. The proposed stemmer uses a bilingual dictionary: all words in the dictionary are first clustered according to their structural and semantic similarity, and the stems of new words are then found using these previously formed clusters. To evaluate the proposed scheme, stemming is performed on both Persian and English. The encouraging results indicate that the proposed method performs well compared with its counterparts.
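The clustering and lookup steps summarized above can be illustrated with a minimal sketch. The abstract does not specify the similarity measures, so everything here is an assumption made for illustration: structural similarity is taken to be a shared prefix of at least `min_prefix` characters, semantic similarity is the Jaccard overlap of two words' translation sets in the bilingual dictionary, clusters are formed with a union-find, and each cluster's stem is the longest common prefix of its members. The function names, the thresholds, and the toy English–German dictionary are hypothetical and are not the paper's actual method.

```python
from itertools import combinations


def common_prefix(a: str, b: str) -> str:
    """Return the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def jaccard(s: set[str], t: set[str]) -> float:
    """Jaccard overlap of two translation sets (assumed semantic measure)."""
    if not s and not t:
        return 0.0
    return len(s & t) / len(s | t)


def cluster_words(dictionary: dict[str, set[str]],
                  min_prefix: int = 3,
                  min_semantic: float = 0.1) -> list[list[str]]:
    """Group words that are both structurally and semantically similar.

    Hypothetical thresholds: a shared prefix of at least `min_prefix`
    characters and a translation overlap of at least `min_semantic`.
    """
    words = sorted(dictionary)
    parent = {w: w for w in words}

    def find(w: str) -> str:
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        structural = len(common_prefix(a, b)) >= min_prefix
        semantic = jaccard(dictionary[a], dictionary[b]) >= min_semantic
        if structural and semantic:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())


def cluster_stem(cluster: list[str]) -> str:
    """Assumed stem of a cluster: the longest common prefix of its members."""
    stem = cluster[0]
    for w in cluster[1:]:
        stem = common_prefix(stem, w)
    return stem


def stem_new_word(word: str, stems: list[str]) -> str:
    """Return the longest known stem that prefixes `word`, else `word` itself."""
    matches = [s for s in stems if word.startswith(s)]
    return max(matches, key=len) if matches else word


# Toy English -> German dictionary (illustrative data only).
TOY_DICTIONARY = {
    "connect": {"verbinden"},
    "connected": {"verbinden", "verbunden"},
    "connection": {"verbindung", "verbinden"},
    "conduct": {"leiten"},
    "conductor": {"leiter", "leiten"},
}

clusters = cluster_words(TOY_DICTIONARY)
stems = [cluster_stem(c) for c in clusters]

if __name__ == "__main__":
    print(clusters)
    print(stem_new_word("connecting", stems))
```

With the toy data, "connect" and "conduct" share the prefix "con" but have disjoint translations, so the semantic check keeps them in separate clusters; "connecting" then maps to the stem "connect", and a word matching no cluster stem is returned unchanged. Prefix matching is only a stand-in here: a genuinely language-independent stemmer would need a structural measure that is not limited to shared prefixes.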