Words Stemming Based on Structural and Semantic Similarity
Word stemming is one of the important issues in natural language processing and information retrieval. Most existing stemming methods are language-dependent, so each stemmer is applicable only to particular languages. Because of the importance of this issue, this paper proposes a stemming method that aims to be language-independent. The proposed stemmer uses a bilingual dictionary, and all of the words in the dictionary are first clustered based on their structural and semantic similarity. The stem of a new incoming word is then found by making use of the previously formed clusters. To evaluate the proposed scheme, word stemming is performed on both Persian and English. The encouraging results indicate the good performance of the proposed method compared with its counterparts.
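The two-stage idea summarized above (cluster the dictionary words, then stem a new word against the clusters) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not the method evaluated in the paper: structural similarity is taken as a shared-prefix ratio, semantic similarity as the Jaccard overlap of the translation sets found in the bilingual dictionary, clustering as simple single-link merging, and the stem as the common prefix of the matched cluster. The thresholds, the function names, and the toy dictionary with placeholder translation labels are all hypothetical.

```python
from itertools import combinations
from os.path import commonprefix


def structural_sim(a, b):
    """Assumed measure: shared-prefix length relative to the longer word."""
    return len(commonprefix([a, b])) / max(len(a), len(b))


def semantic_sim(a, b, dictionary):
    """Assumed measure: Jaccard overlap of the two words' translation sets."""
    ta, tb = dictionary[a], dictionary[b]
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_words(dictionary, struct_min=0.5, sem_min=0.3):
    """Merge words that pass both thresholds (single-link, via union-find)."""
    parent = {w: w for w in dictionary}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(dictionary, 2):
        if (structural_sim(a, b) >= struct_min
                and semantic_sim(a, b, dictionary) >= sem_min):
            parent[find(a)] = find(b)

    clusters = {}
    for w in dictionary:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())


def stem_new_word(word, clusters, struct_min=0.5):
    """Return the common prefix of the structurally closest cluster.

    If no cluster member is similar enough, the word is returned unchanged.
    """
    best_stem, best_score = word, 0.0
    for members in clusters:
        score = max(structural_sim(word, m) for m in members)
        if score >= struct_min and score > best_score:
            best_stem, best_score = commonprefix(members), score
    return best_stem


# Toy bilingual dictionary: the translation labels are placeholders,
# not real target-language words.
toy_dictionary = {
    "connect": {"T_LINK"},
    "connected": {"T_LINK", "T_JOINED"},
    "connection": {"T_LINK", "T_JOINING"},
    "contest": {"T_COMPETITION"},
    "book": {"T_BOOK"},
    "books": {"T_BOOK"},
}

clusters = cluster_words(toy_dictionary)
stem_connects = stem_new_word("connects", clusters)
stem_booking = stem_new_word("booking", clusters)
stem_unknown = stem_new_word("zebra", clusters)
```

In this toy setting, "contest" stays apart from the "connect" family because it fails both the prefix and the translation-overlap thresholds, which is the point of combining the two similarity signals. Nothing in the sketch depends on language-specific affix rules, but a prefix-based structural measure would only suit languages where inflection is mostly suffixal; a different string similarity could be substituted without changing the overall structure.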