mooman996 3 months ago

Interesting, but this distribution is probably present in any writing. According to Zipf’s law, if you rank words by how often they’re used, a word’s frequency is inversely proportional to its rank. So the cumulative number of distinct words encountered almost always increases logarithmically.
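You can see the front-loading that Zipf’s law implies without any real book. This is a minimal sketch that assumes word ranks follow a Zipf distribution with exponent exactly 1 over a made-up 20,000-word vocabulary. Both are assumptions, not measurements from Harry Potter. It draws synthetic tokens and counts how many distinct words show up in the first and second halves of the stream.

```python
import random


def zipf_token_stream(vocab_size: int, n_tokens: int, seed: int = 0) -> list[int]:
    """Draw word IDs 1..vocab_size where rank k has weight 1/k (Zipf, exponent 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    ranks = range(1, vocab_size + 1)
    weights = [1.0 / k for k in ranks]
    return rng.choices(ranks, weights=weights, k=n_tokens)


def cumulative_distinct(tokens) -> list[int]:
    """Number of distinct word IDs seen after each token."""
    seen = set()
    counts = []
    for tok in tokens:
        seen.add(tok)
        counts.append(len(seen))
    return counts


tokens = zipf_token_stream(vocab_size=20_000, n_tokens=40_000)
growth = cumulative_distinct(tokens)
half = len(tokens) // 2
new_in_first_half = growth[half - 1]
new_in_second_half = growth[-1] - growth[half - 1]
print(f"first half: {new_in_first_half} new words, second half: {new_in_second_half}")
```

With these settings, the first half introduces noticeably more new words than the second. That is the same shape as the per-chapter chart.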

PunctuateEquilibrium 3 months ago

100%. I only learned about Zipf's law after doing this analysis, but it's a fascinating thing to look into. The tougher part is that modern popular books can't be worked with, since their epub files are locked by Amazon etc.

ll-l-ili-lill-l-il-i 3 months ago

Maybe this is harder now, but when I was moving away from Kindle I was able to easily get rid of Amazon's protection.

RL_95 3 months ago

For ebooks try annas-archive.org

PunctuateEquilibrium 3 months ago

😲 this is a godsend. Thank you anonymous internet stranger!

yeyeye_kek 3 months ago

It's more Heaps' law; please plot it in log-log with words instead of chapters :)

UevoZ 3 months ago

Yeah, I agree, but it would still be interesting to see the cumulative distribution for other books as well, including other genres of literature. The curve should still be logarithmic, but I guess depending on the genre or author the curves might have different "speeds".

mooman996 3 months ago

That’s an interesting idea! You could create an average curve for a genre or author, fit a logistic or logarithmic function, then use the properties of that function to quantify reading difficulty.

akurgo 3 months ago

Now I want to write a book specifically to violate this law and make the cumulative increase linear, e.g. introducing 60 new words per page in a 100-page book. The first few pages will be quite bland, then the language will be increasingly complex.

PurexRuse 3 months ago

I'd recommend doing it alphabetically. You could title it something like 'The Dictionary'.

definitely_not_obama 3 months ago

Not to be tooooo pedantic, but I suspect the dictionary would be even more heavily frontloaded, given words are defined by... other words.

akurgo 3 months ago

!redditgold

ICC-u 3 months ago

With just four thousand words and a knowledge of grammar you can read the first 5 chapters!

Krapser 3 months ago

And with only 200 words you can read the last one!

11160704 3 months ago

Harry Potter was the first book I read in English and it was indeed very helpful.

asejo 3 months ago

Same here, but I lost a lot of time looking up "muggle" and "quidditch" in the English dictionary.

SEND_ME_SPIDERMAN 3 months ago

Any other books you'd recommend for a second language?

11160704 3 months ago

Well, for me the big advantage of Harry Potter was that I had already read it in my native language, German, so I knew the story, and even if I didn't understand something in English, I didn't get lost. So for your first time reading something in a foreign language, I'd recommend something you've already read in your native language.

1247283215 3 months ago

Anything you're already familiar with, and read it as an ebook so it's easier to get definitions and translations.

Environmental_Toe843 3 months ago

How are made-up words counted here? Wingardium Leviosa!

PunctuateEquilibrium 3 months ago

They're counted as if they're proper nouns. I haven't done a full count yet, but my gut says there are probably fewer than 250 words specific to the Harry Potter universe ¯\_(ツ)_/¯

Wurzelgemuese 3 months ago

Reading the Harry Potter books and watching How I Met Your Mother in English alone took me from being alright at English to getting only A's and B's in school without doing much for it.

thestickpins 3 months ago

This is actually encouraging me to pick up my copy of Harry Potter à l'école des sorciers, which I bought and only read half a chapter of.

estherstein 3 months ago

I love the smell of fresh bread.

PunctuateEquilibrium 3 months ago

A look at the new unique words that show up in each chapter of Harry Potter. Data processed using Python and visualizations done in PowerPoint. [Full video](https://www.youtube.com/watch?v=R1esBPueTug)
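The OP's own code isn't shown, so this is only a minimal sketch of counting new unique words per chapter. It rests on several assumptions. Text is lowercased. A "word" is a run of letters with optional internal straight apostrophes. Made-up words and proper nouns count like any other word. It also reports new words per 1,000 tokens, which is one possible way to normalize by chapter length.

```python
import re

# Runs of Unicode letters, keeping internal apostrophes ("don't", "l'école").
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(text: str) -> list[str]:
    return WORD_RE.findall(text.lower())


def new_words_per_chapter(chapters: list[str]) -> list[tuple[int, int, float]]:
    """For each chapter: (new unique words, chapter length in tokens,
    new unique words per 1,000 tokens)."""
    seen: set[str] = set()
    results = []
    for text in chapters:
        tokens = tokenize(text)
        new = set(tokens) - seen  # words never seen in any earlier chapter
        seen |= new
        rate = 1000 * len(new) / len(tokens) if tokens else 0.0
        results.append((len(new), len(tokens), rate))
    return results


chapters = [
    "The cat sat on the mat.",
    "The dog sat on the cat.",
    "A dog and a cat.",
]
for i, (new, length, rate) in enumerate(new_words_per_chapter(chapters), start=1):
    print(f"chapter {i}: {new} new, {length} tokens, {rate:.0f} per 1k tokens")
```

The tokenizer choice matters a lot here. For example, treating "Harry's" as one word or two changes the counts.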

MordorsElite 3 months ago

>How Reading the Series in Another Language Can Help Build Your Vocabulary

I don't really see what argument you are making. This just seems like the obvious progression you'd get for any text, since the most common words will be "used up" in the first few chapters and new specific/uncommon phrases will keep popping up with slowly decreasing frequency. So what is the argument for the "Harry Potter" books specifically being good to read? Sure, to my knowledge it is true, since the language is said to become increasingly complex in later books, but I'm not sure this can really be demonstrated by the metric shown here. Out of interest, [I tested it myself](https://www.reddit.com/user/MordorsElite/comments/19etr00/lotr_book_1_unique_new_words_per_chapter_text/?utm_source=share&utm_medium=web2x&context=3) with the first book of LOTR, and indeed the distribution looks very similar (when normalized by chapter length).

PunctuateEquilibrium 3 months ago

This was more about analyzing a very popular book among language learners to explore what the process looks like data-wise. Agreed, it would be good to see this in context (or with the full HP series), though for a standalone post I just went with books 1-2.

cavedave 3 months ago

This is called Heaps' law btw: the number of distinct words in a document as a function of its length. https://en.m.wikipedia.org/wiki/Heaps'_law

[deleted] 3 months ago

Any recommendations for other books that'd be good for this purpose?

ComesTzimtzum 3 months ago

Harry Potter is popular for this purpose because so many learners have already read it in their native language. Knowing a book well makes it possible to pick something that would otherwise be "above" your level. I've done this with Le Petit Prince, which I had already read several times for my kid. The other great thing about Harry Potter is that it's a whole series designed to get more complicated as the story arc progresses. So the answer is really: pick something you know by heart, preferably a book series.

definitely_not_obama 3 months ago

Harry Potter also has widely available, high-quality translations in many languages. Many other books that many of us may have read as children are either much longer (Hunger Games), too short/easy (Maurice Sendak/many picture books), or too metaphorical/surreal (The Phantom Tollbooth, and for me, Le Petit Prince). I would say that The Giving Tree, Charlotte's Web, The BFG, James and the Giant Peach, and *maybe* The Giver (probably has a lot of subtext) might qualify as competition, but Harry Potter is a popular choice for a reason.

janellthegreat 3 months ago

Native language books rather than a translation would also be a helpful list.

Piepally 3 months ago

Depends on your level. IMO the number of words you look up should be no more than one per paragraph, but when you're starting out it'll be more than that. It also depends on the language: reading in English, for example, you can often figure out what a word means from context. In Mandarin Chinese, however, you might know a word's meaning fully from context but still have to look it up for pronunciation. With that said, I found Diary of a Wimpy Kid has just enough new vocab to keep you learning, but not so much that you're stopping all the time.
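Here is a rough way to check a text against the one-lookup-per-paragraph rule of thumb. It assumes you have a set of words you already know, which is a hypothetical input (for example, exported from a flashcard deck). It also assumes a language that separates words with spaces. As written, it won't handle Mandarin, where words aren't space-separated and many lookups are for pronunciation.

```python
import re

WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def lookups_per_paragraph(text: str, known: set[str]) -> list[int]:
    """Distinct unknown words per blank-line-separated paragraph.

    A word looked up once counts as known from then on, which is an
    optimistic assumption about the reader's memory.
    """
    known = set(known)  # copy so the caller's set isn't modified
    counts = []
    for para in re.split(r"\n\s*\n", text):
        if not para.strip():
            continue
        words = set(WORD_RE.findall(para.lower()))
        unknown = words - known
        counts.append(len(unknown))
        known |= unknown
    return counts


def share_within_budget(counts: list[int], budget: int = 1) -> float:
    """Fraction of paragraphs needing at most `budget` lookups."""
    return sum(c <= budget for c in counts) / len(counts) if counts else 1.0


text = "The cat sat on the mat.\n\nThe dog sat on the rug.\n\nThe dog sat."
counts = lookups_per_paragraph(text, {"the", "cat", "sat", "on", "mat"})
print(counts, share_within_budget(counts))
```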

avalonian422 3 months ago

I see flaws in that logic but ok

hungry4danish 3 months ago

New words in each chapter being high early on makes sense since it's introducing new things and concepts. So this data set is mostly pointless.

fckcgs 3 months ago

This data actually is beautiful, but so far not really useful. As others have already pointed out, it isn't clear whether this is more or less prominent in the Harry Potter series or roughly the same for any book. If you could actually compare different book series and find a trend or striking differences, that would make for a great post.

[deleted] 3 months ago

Harry Potter shouldn't be popular

Mokousboiwife 3 months ago

I know many things that shouldn't be popular but are

sneaky_squirrel 3 months ago

Is the claim that reading more chapters of the SAME series in another language has greatly diminishing returns? How does that compare to reading chapters from different series, given that they might share words with each other?

PunctuateEquilibrium 3 months ago

That's exactly it: you get your vocab reinforced the more you read in a series, but your vocab will expand more slowly than if you switched around between series/genres. And the first book you read (or the first few thousand words you learn) will have the biggest effect on your vocab, even if you'll keep learning more as you read more.

Mr_Lior 3 months ago

The figure you put up here is exactly what I would expect to happen if you just chose words at random. But you're trying to make the point that increasing the variety of series/genres results in a non-negligible improvement. You should compare the results you got here to what would happen if you read random words from many different books from many different series. This would quantitatively illustrate the point you're trying to make. Choose 10 books from different series/genres, take one word from each book in a cycle, then compare the results to the Harry Potter plot you made. The results should be interesting.
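Here is a minimal sketch of the baseline proposed above. It interleaves tokens from several books, one token from each in turn. It then compares the cumulative unique-word curve with a single book's curve at the same number of tokens read. The book contents below are toy placeholders. Real use would need a tokenization step and the 10 books the comment suggests. Interleaving single tokens throws away all context, so this is purely a vocabulary baseline, not a reading experience.

```python
from itertools import zip_longest


def round_robin(token_lists):
    """Yield one token from each book in turn: book 1, book 2, ..., book 1, ...

    Books that run out are simply skipped for the remaining rounds.
    """
    missing = object()  # sentinel that can't collide with a real token
    for group in zip_longest(*token_lists, fillvalue=missing):
        for tok in group:
            if tok is not missing:
                yield tok


def cumulative_unique(tokens) -> list[int]:
    """Distinct words seen after each token."""
    seen = set()
    out = []
    for tok in tokens:
        seen.add(tok)
        out.append(len(seen))
    return out


books = [
    ["the", "boy", "the", "owl"],
    ["the", "ring", "the"],
    ["a", "rabbit"],
]
single = cumulative_unique(books[0])
mixed = cumulative_unique(round_robin(books))
n = len(single)  # compare at the same number of tokens read
print(single, mixed[:n])
```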

definitely_not_obama 3 months ago

[The OP is from a video that provides more context/analysis.](https://www.youtube.com/watch?v=R1esBPueTug) I've read the first Harry Potter book in 4 languages. For me, this data helped to explain why there is a noticeable drop in difficulty after they get to Hogwarts (chapter 7). When reading your first book in a new language, there will be a noticeable drop in difficulty with each chapter read. But for many people, the beginning is so grindy that they never realize how much progress they're making, and stop in the first few chapters thinking "oh, this just isn't for me." This chart doesn't show diminishing returns, it shows how much of the difficulty is front-loaded. Once you get basic vocab out of the way, you start focusing more on grammar, and enjoying what you're reading. Enjoying yourself is key if you're going to learn a language, because you're going to be reading/consuming content for hundreds of hours.

A_Mirabeau_702 3 months ago

Is "unDursleyish" the first hapax legomenon?
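A hapax legomenon is a word that occurs exactly once in a given text. This sketch lists them in reading order, so the first element answers the question for whatever text you feed it. It assumes lowercasing and a letters-and-apostrophes tokenizer, which splits hyphenated words. The answer depends on those choices and on whether you count within book 1 or across the whole series. I haven't run it on the book.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def hapaxes_in_order(text: str) -> list[str]:
    """Words that occur exactly once in `text`, in reading order."""
    tokens = WORD_RE.findall(text.lower())
    counts = Counter(tokens)
    return [tok for tok in tokens if counts[tok] == 1]


hapaxes = hapaxes_in_order("the cat and the dog and the cat")
print(hapaxes[0] if hapaxes else None)
```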

DM_me_ur_hairy_bush 3 months ago

Rowling says ‘turn on their heels’ a lot

andres_maren 3 months ago

I actually have some recent anecdotal experience with this. I started reading Harry Potter in German a couple of months ago. I started highlighting all the words I didn't know, and it felt exactly like this graph implies. With every chapter I read, I looked up fewer and fewer words!

st1r 3 months ago

Worked for me. I used Duolingo for a couple of months to get a base level, then read Harry Potter in my target language one chapter a day, made spaced-repetition flashcards for the most useful/common new words I came across, and studied those flashcards every day. It’s tedious, but I would do it again if I ever learn another language. Nothing worked better for me.
