Word frequencies

Apr. 16, 2008 | 02:00 pm

Reading a foreign text and having to look up lots of words on every page is pretty exhausting. And when there are really a lot of them, I tend to forget the words rather quickly.
So far, my strategies for avoiding endless flipping through the dictionary have been:

1: choose a scientific book, as the author will try to explain his point as clearly as possible, giving multiple examples and using rather formal language
2: select a topic that you are fairly familiar with, so you can at times make an educated guess about the meaning of an unknown word.

But even under these conditions progress is pretty slow. OK, I didn't choose a really easy text, but of course it has to hold my attention as well.

Enter the Jargonizer. It is a C# program I finished today, which basically does a histogram analysis of the text. It returns a file with two columns: the word and the number of times it occurs. I manually removed the words I know, and this gave me a list of the 200 most frequent unknown (to me) words in the text. This should speed up the reading.
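
The Jargonizer's C# source isn't shown here, so what follows is only a rough Python sketch of the same idea, not the actual program. It assumes that a "word" is any run of letters, that words are compared in lowercase, and that the known words are supplied as a set instead of being removed by hand. The names count_words, top_unknown and format_rows are made up for this illustration.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters. The pattern is Unicode-aware,
# so Cyrillic text works; digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    return Counter(WORD_RE.findall(text.lower()))


def top_unknown(counts, known_words, limit=200):
    """Return the `limit` most frequent (word, count) pairs that are not in known_words."""
    unknown = Counter({w: n for w, n in counts.items() if w not in known_words})
    return unknown.most_common(limit)


def format_rows(rows):
    """Render (word, count) pairs as 'count word' lines, like the lists in this post."""
    return "\n".join(f"{n} {w}" for w, n in rows)


if __name__ == "__main__":
    # Tiny made-up sample; a real run would read the whole book from a file.
    sample = "Сказки и сказки. Изучение сказки и ее происхождение."
    counts = count_words(sample)
    print(format_rows(top_unknown(counts, known_words={"и", "ее"}, limit=3)))
```

Note that this sketch splits hyphenated words into their parts and treats every inflected form (изучения, изучение) as a separate word, which matches the lists below but is a simplification.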

Some actual data:

Book: Русская Сказка (The Russian Folktale) by В.Я. Пропп (V. Ya. Propp)

Top 10 words:

4527 и
3803 в
1711 не
1133 на
1127 с
1114 сказки
1102 о
917 что
897 к
819 а

Top 10 unknown words:

119 изучения
113 совершенно
100 ред
75 значение
74 изучение
66 является
63 указатель
57 случаях
57 рке
56 происхождение


Comments {5}


(no subject)

from: risboo6909
date: Apr. 20, 2008 12:32 am (UTC)

*typo in the previous post: Л. Н. Толстого
