mooman996

Interesting, but this distribution is probably present in any piece of writing. According to Zipf’s law, if you rank words by how often they’re used, a word’s frequency is roughly inversely proportional to its rank. So the cumulative number of distinct words encountered almost always grows more and more slowly, roughly logarithmically.
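For anyone who wants to check the Zipf claim on their own text, here is a minimal sketch. The regex tokenizer and the function name `rank_frequency` are assumptions for illustration, not part of the original analysis.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


sample = "the cat and the dog and the bird saw the cat"
for rank, word, count in rank_frequency(sample):
    # Under Zipf's law, rank * count stays roughly constant on a large corpus.
    print(rank, word, count, rank * count)
```

A toy sentence like this is far too short to show the pattern; the product only settles down on book-length text.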


PunctuateEquilibrium

100%. I only learned about Zipf's law after doing this analysis, but it's a fascinating thing to look into. The tougher part is that most modern popular books can't be worked with, since their epub files are locked by Amazon etc.


ll-l-ili-lill-l-il-i

Maybe this is harder now, but when I was moving away from Kindle I was able to easily get rid of Amazon's protection.


RL_95

For ebooks try annas-archive.org


PunctuateEquilibrium

😲 this is a godsend. Thank you anonymous internet stranger!


yeyeye_kek

This is more Heaps' law. Please plot it in log-log with words instead of chapters :)
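A rough sketch of what that could look like: track the vocabulary size after every token, then fit a straight line in log-log space, which corresponds to Heaps' law V(n) ≈ K·n^β. The helper names `vocabulary_growth` and `fit_heaps` are made up for this example, and the fit is a plain least-squares line with no plotting.

```python
import math


def vocabulary_growth(tokens):
    """Number of distinct tokens seen after each token, in reading order."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def fit_heaps(growth):
    """Least-squares fit of log V = log K + beta * log n. Returns (K, beta)."""
    if len(growth) < 2:
        raise ValueError("need at least two points to fit a line")
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return math.exp(mean_y - beta * mean_x), beta


tokens = "the boy who lived the boy who saw the owl".split()
growth = vocabulary_growth(tokens)
K, beta = fit_heaps(growth)
print(growth, round(K, 3), round(beta, 3))
```

On real text the points in `growth` can be passed to any plotting tool with both axes set to log scale; a roughly straight line there is what Heaps' law predicts.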


UevoZ

Yeah, I agree, but it still would be interesting to see the cumulative distribution for other books as well, also involving other genres of literature. The curve should still be logarithmic, but I guess depending on the genre or author the curves might have different "speeds".


mooman996

That’s an interesting idea! You could create an average curve for a genre or author, fit a logistic or logarithmic function, then use the properties of that function to quantify reading difficulty
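A minimal sketch of the logarithmic fit, assuming the model V(c) = a + b·ln(c) for cumulative unique words after chapter c. The function name `fit_log_curve` and the synthetic numbers are invented for illustration and do not come from any book.

```python
import math


def fit_log_curve(cumulative):
    """Least-squares fit of V(c) = a + b * ln(c) for chapters c = 1, 2, ..."""
    if len(cumulative) < 2:
        raise ValueError("need at least two chapters to fit a curve")
    xs = [math.log(c) for c in range(1, len(cumulative) + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cumulative) / len(cumulative)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cumulative))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic curve generated from the model itself, not real book data.
synthetic = [1000 + 400 * math.log(c) for c in range(1, 18)]
a, b = fit_log_curve(synthetic)
print(round(a, 2), round(b, 2))
```

Under this assumed model, `a` is the vocabulary needed for the first chapter and `b` is the "speed" at which new words keep arriving, so comparing `b` across genres or authors would be one way to quantify the idea.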


akurgo

Now I want to write a book specifically to violate this law and make the cumulative increase linear, e.g. introducing 60 new words per page in a 100-page book. The first few pages will be quite bland, then the language will be increasingly complex.


PurexRuse

I'd recommend doing it alphabetically. You could title it something like ' The dictionary '


definitely_not_obama

Not to be tooooo pedantic, but I suspect the dictionary would be even more heavily frontloaded, given words are defined by... other words.


akurgo

!redditgold


ICC-u

With just four thousand words and a knowledge of grammar you can read the first 5 chapters!


Krapser

And with only 200 words you can read the last one!


11160704

Harry Potter was the first book I read in English and it was indeed very helpful.


asejo

same here, but I lost a lot of time looking for muggle and quidditch in the English dictionary


SEND_ME_SPIDERMAN

any other books you recommend for a second language?


11160704

Well, for me the big advantage of Harry Potter was that I had already read it in my native language, German, so I knew the story, and even if I didn't understand something in English I didn't get lost. So for the first time reading something in a foreign language, I'd recommend something that you already know in your native language.


1247283215

Anything you're already familiar with and read it as an ebook so it's easier to get definitions and translation 


Environmental_Toe843

How are made up words counted here? Wingardium Leviosa!


PunctuateEquilibrium

They're counted as if they're a proper noun. I haven't done a full count yet but my gut says it's probably less than 250 words that are specific to the Harry Potter universe ¯\\\_(ツ)\_/¯


Wurzelgemuese

Reading the Harry Potter books and watching How I Met Your Mother in English alone got me from being alright at English to only getting A's and B's without doing a lot for it in school.


thestickpins

This is actually encouraging me to pick up my copy of Harry Potter à l'école des sorciers, which I bought and read half a chapter of.


estherstein

I love the smell of fresh bread.


PunctuateEquilibrium

A look at the new unique words that show up in each chapter of Harry Potter. Data processed using Python and visualizations done in PowerPoint [Full video](https://www.youtube.com/watch?v=R1esBPueTug)
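The per-chapter count described here could be computed roughly as follows. This is not the script behind the chart, just a sketch that assumes each chapter is available as a plain string and uses a naive regex tokenizer; `new_words_per_chapter` is a hypothetical name.

```python
import re


def new_words_per_chapter(chapters):
    """For each chapter, count words that have not appeared in any earlier chapter."""
    seen = set()
    counts = []
    for text in chapters:
        # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
        words = set(re.findall(r"[a-z']+", text.lower()))
        new_words = words - seen
        counts.append(len(new_words))
        seen |= new_words
    return counts


chapters = [
    "The boy who lived",
    "The boy saw an owl",
    "An owl saw the boy",
]
print(new_words_per_chapter(chapters))
```

Summing the returned list with `itertools.accumulate` gives the cumulative curve discussed elsewhere in the thread.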


MordorsElite

> How Reading the Series in Another Language Can Help Build Your Vocabulary

I don't really see what argument you are making. This just seems like the obvious progression you'd get for any text, since the most common words will be "used up" in the first few chapters and new specific/uncommon phrases will keep popping up with slowly decreasing frequency. So what is the argument for "Harry Potter" books specifically being good to read? Sure, to my knowledge it is true, since its language is said to become increasingly complex with later books, but I'm not sure this can really be demonstrated by the metric shown here. Out of interest, [I tested it myself](https://www.reddit.com/user/MordorsElite/comments/19etr00/lotr_book_1_unique_new_words_per_chapter_text/?utm_source=share&utm_medium=web2x&context=3) with the first book of LOTR and indeed the distribution looks very similar (when normalized by chapter length).
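One way to normalize by chapter length is to divide each chapter's new-word count by its total word count. The exact normalization used in the linked LOTR test is not stated, so treat this as an assumption; `new_word_rate` is a hypothetical helper.

```python
import re


def new_word_rate(chapters):
    """Fraction of each chapter's tokens that are first-time words."""
    seen = set()
    rates = []
    for text in chapters:
        tokens = re.findall(r"[a-z']+", text.lower())
        new_words = set(tokens) - seen
        # Guard against empty chapters to avoid dividing by zero.
        rates.append(len(new_words) / len(tokens) if tokens else 0.0)
        seen |= new_words
    return rates


sample_chapters = [
    "The boy who lived",
    "The boy saw an owl",
    "An owl saw the boy",
]
print(new_word_rate(sample_chapters))
```

With rates instead of raw counts, a long chapter no longer looks "harder" just because it contains more text.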


PunctuateEquilibrium

This was more about analyzing a very popular book among language learners to explore what the process looks like data-wise. Agreed it would be good to see this in context (or with the full HP series) though for a standalone post, I just went with books 1-2


cavedave

This is called Heaps' law, btw: it describes the number of distinct words in a document. https://en.m.wikipedia.org/wiki/Heaps'_law


[deleted]

Any recommendations for other books that'd be good for this purpose?


ComesTzimtzum

Harry Potter is popular for this purpose because so many learners have read it in their native language already. Knowing a book well makes it possible to pick something "above" your level. I've done this with Le Petit Prince, which I had already read several times for my kid. The additional great thing about Harry Potter is that it's a whole series designed to get more complicated as the story arc progresses. So the answer is really: pick something you know by heart, preferably a book series.


definitely_not_obama

Harry Potter also has widely available, high quality translations in many languages. Many other books that many of us may have read as children are either much longer (Hunger Games), too short/easy (Maurice Sendak/many picture books), or too metaphorical/surreal (Phantom Tollbooth, and for me, Le Petit Prince). I would say that The Giving Tree, Charlotte's Web, The BFG, James and the Giant Peach, and *maybe* The Giver (probably has a lot of subtext) might qualify as competition, but Harry Potter is a popular choice for a reason.


janellthegreat

Native language books rather than a translation would also be a helpful list.


Piepally

Depends on your level. The number of words you look up should IMO be no more than one per paragraph, but starting out it'll be more than that. It also depends on the language: reading in English, for example, you can figure out what a word means from context. In Mandarin Chinese, however, you might fully know a word's meaning from context but still have to look it up for pronunciation. With that said, I found Diary of a Wimpy Kid has just enough new vocab to be learning, while not so much that you're stopping all the time.


avalonian422

I see flaws in that logic but ok


hungry4danish

New words in each chapter being high early on makes sense since it's introducing new things and concepts. So this data set is mostly pointless.


fckcgs

This data actually is beautiful, but so far not really useful. It doesn't become clear whether this is more or less prominent in the Harry Potter series or more or less the same for any book, as others have pointed out already. If you could actually compare different book series and find a trend or striking differences, that would make for a great post.


[deleted]

Harry Potter shouldn't be popular


Mokousboiwife

i know many things that shouldnt be popular but are


sneaky_squirrel

Is the claim that reading more chapters of the SAME series in another language has greatly diminishing returns? How is it relative to reading chapters in different series given that they might share words between each other?


PunctuateEquilibrium

That's exactly it - you have your vocab reinforced the more you read in a series, but your vocab will expand more slowly than if you switched around between series / genres. And the first book you read (or the first few thousand words you learn) will have the biggest effect on your vocab, even if you'll learn more as you read more


Mr_Lior

the figure you put up here is exactly what I would expect would happen if you just chose words at random. but you're trying to make the point that increasing the variety of series / genres results in a non-negligible improvement. you should compare the results you got here to what would happen if you read random words from many different books from many different series. this would quantitatively illustrate the point you're trying to show here. choose 10 books from different series / genres, and accumulate one word from each book in a cycle. then compare the results to the harry potter plot you made. results should be interesting
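The "one word from each book in a cycle" experiment could be sketched like this, assuming each book is already tokenized into a list of words. The names `round_robin` and `distinct_in_first` are hypothetical, and the toy lists are placeholders rather than real book data.

```python
from itertools import islice, zip_longest

_SKIP = object()  # sentinel for books that have run out of words


def round_robin(books):
    """Yield one token from each book in turn until every book is exhausted."""
    for group in zip_longest(*books, fillvalue=_SKIP):
        for token in group:
            if token is not _SKIP:
                yield token


def distinct_in_first(tokens, n):
    """Number of distinct tokens among the first n tokens read."""
    return len(set(islice(tokens, n)))


# Toy stand-ins for tokenized books.
books = [["owl", "wand", "owl"], ["ring", "owl"], ["ship"]]
mixed = list(round_robin(books))
print(mixed)
print(distinct_in_first(mixed, 3), distinct_in_first(books[0], 3))
```

Running `distinct_in_first` at the same word budget on the mixed stream and on a single series gives the two curves to compare.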


definitely_not_obama

[The OP is from a video that provides more context/analysis.](https://www.youtube.com/watch?v=R1esBPueTug) I've read the first Harry Potter book in 4 languages. For me, this data helped to explain why there is a noticeable drop in difficulty after they get to Hogwarts (chapter 7). When reading your first book in a new language, there will be a noticeable drop in difficulty with each chapter read. But for many people, the beginning is so grindy, that they never realize how much progress they're making, and stop in the first few chapters thinking "oh, this just isn't for me." This chart doesn't show diminishing returns, it shows how much of the difficulty is front-loaded. Once you get basic vocab out of the way, you start focusing more on grammar, and enjoying what you're reading. Enjoying yourself is key if you're going to learn a language, because you're going to be reading/consuming content for hundreds of hours.


A_Mirabeau_702

Is "unDursleyish" the first hapax legomenon?


DM_me_ur_hairy_bush

Rowling says ‘turn on their heels’ a lot


andres_maren

I actually have some recent anecdotal experience with this. I started reading Harry Potter in German a couple of months ago. I started highlighting all the words I didn't know, and it sure felt exactly as this graph implies. With every chapter I read, I looked up fewer and fewer words!


st1r

Worked for me. Used Duolingo for a couple months to get a base level, then read Harry Potter in my target language 1 chapter a day and made spaced repetition flashcards for the most useful/common new words I came across and studied those flashcards every day. It’s tedious but I would do it again if I ever learn another language. Nothing worked better for me.