Word Stemming

10.02.05 - 06:36pm
mood: meh
music playing: Lamb - Just Is
I was doing some experimenting with the Porter stemming algorithm and have run into some problems. What I wanted to experiment with was its ability to handle the difference between a word used as a verb and a related noun, such as the verb in "was running" versus the noun in "the runner". I happily discovered that it handled the difference in those words quite well. Unfortunately, that wasn't the case for all instances of "-er" ending words that turn a verb into a noun. "Photographer" is getting stemmed to "photograph", and in an odd quirk "photography" is getting stemmed to "photographi". I'm not even sure what that weirdness is about. Unfortunately this means that I can't apply the stemming as a fully automated process. It'll require some human intervention to make sure its results are accurate. At least it's a start, and it gets me about 90% of the way there.
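Here's a rough sketch of what that human intervention could look like: a wrapper that takes hand-made overrides first and flags suspicious "-er" words for review. Everything named here is made up for illustration (placeholder_stem, needs_review, stem_all, the override table). placeholder_stem is not the Porter algorithm; it only replays the two outputs mentioned above, and a real stemmer would be passed in its place.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Outputs reported in the post; this table is NOT a stemmer, just a stand-in.
REPORTED_OUTPUTS: Dict[str, str] = {
    "photographer": "photograph",
    "photography": "photographi",
}


def placeholder_stem(word: str) -> str:
    """Hypothetical stand-in: return the reported output, or the word unchanged."""
    return REPORTED_OUTPUTS.get(word, word)


def needs_review(word: str, stemmed: str) -> bool:
    """Flag '-er' words that the stemmer changed, so a person can check them."""
    return word.endswith("er") and stemmed != word


def stem_all(
    words: Iterable[str],
    stem: Callable[[str], str],
    overrides: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Stem each word, preferring manual overrides; collect words to review."""
    results: Dict[str, str] = {}
    review: List[str] = []
    for word in words:
        key = word.lower()
        if key in overrides:
            # A human already decided this one, so skip the stemmer entirely.
            results[key] = overrides[key]
            continue
        stemmed = stem(key)
        results[key] = stemmed
        if needs_review(key, stemmed):
            review.append(key)
    return results, review


if __name__ == "__main__":
    manual = {"photography": "photography"}  # hypothetical hand-made fix
    stems, to_check = stem_all(
        ["Photographer", "photography", "runner"], placeholder_stem, manual
    )
    print(stems)
    print(to_check)
```

The idea is that the review list stays short, so only the flagged words need a second look, and each decision goes into the override table so it never has to be made twice. As for the "photographi" oddity, it matches one of Porter's early rules, which rewrites a final "y" to "i" when the rest of the word contains a vowel.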