ORIGINAL DATA:
  19746 words, 4214 unique words, 2630 single-copy words
  Free energy ( -sum( p ln(p) ) ) = 1.490955
  -> 85937 characters with all punctuation and caps removed
  -> 264280 chars when random letters inserted between words
##########################################################
mbtqlmlfvetloomingscallmeerishmaelsomeylqyearstvhnjbagoaxhjt
jcokhvneverpmqpmindhowzrbdlzjllonggbhqipreciselysunpvskepfdj
ktcgarwtnxybgcvdjfbnohavinglittlezorunozsoyapmoneyyvugsgtsqi
##############################################################
loomings call me ishmael some years ago never mind how long
precisely having little or no money
##############################################################
CONVERGENCE:
  Dictionary:0 = 26 letters,
  "5 sigma" for (N_data - N_model)/SD required for inclusion
  xi(composite word)_data >= 2 required for inclusion
############################################################
ITERATION 1: up to "b" (94 words)
al          309.1  0.4678 0.3706
an         1039.8  0.7631 0.6053
ar          511.8  0.6050 0.5281
as          358.9  0.4652 0.3758
at          681.1  0.6349 0.5296
################################################################
ITERATION 2: up to "ai" (386 words)
abl          21.3  0.6655 0.2701
ack          21.6  0.6354 0.3786
ag          161.6  0.3573 0.3403
ain         101.9  0.7369 0.5859
#########################################################################
ITERATION 3: up to "ai" (1021 words)
able         50.0  0.9812 0.7809
ag           47.6  0.1432 0.1002
again        35.8  0.9987 0.9958
agan          5.1  0.8783 0.8568
agbl          2.5  0.8538 0.8220
age          57.6  0.8165 0.7108
agou          2.4  0.8209 0.7858
agreat        2.0  0.6885 0.6617
aid          29.2  0.6767 0.6353
#######################################################################
ITERATION 12: (2457 words)

When you peruse the final dictionary, some obviously odd entries are
the result of the random letters inserted between the English words
in the scramble. It's amusing to follow the appearance and
disappearance of word fragments between dictionaries.
Some examples follow.
abed          2.4  0.6284 0.5887  #( 2x "abed" "that bed" "the bed" )
ableg         3.0  0.9943 0.5962  #( "unendurable length" "cable of" "Abominable are" "accountable farrago")
     # all above single copy words
     # grep ' able' chap1-10.txt | wc -> 0
     # grep able dict:info.12 | wc -> 19
ableu         2.0  0.9918 0.9915  #("laughable about" "incommunicable. The")
     # only one instance of ".communic." in text hence not valid dictionary entry.
     # (laughs, laugh, laughable)->(zlaughs zlaugh tlaughableu)
     # Dictionary.6  (no ableu) laugh-> 3.0 0.9951 0.9951
     # Dictionary.12 zlaugh-> 2.0 0.9998 0.9997
abominate     2.0  1.0000 1.0000
about        47.7  0.9985 0.9343
above         4.0  0.9972 0.9969
account       4.0  1.0000 0.5000
acea          1.6  0.7790 0.2700  #( "acea" "place, and" "place Jonah" "a heavenly" "place this")
     # Dictionary -> place 20.3 0.9993 0.6769
     # grep ' place[^a-z]' chap1-10.txt | wc -> 20
     # grep '[pP]lace' chap1-10.txt | wc -> 30
     # Also in dictionary are (placed->4.0, fireplace->4.0, dplace->1.7)
aceu          1.6  0.8286 0.5381
ached         2.1  0.9816 0.5155
achieved      2.0  1.0000 1.0000
aemploy       2.0  1.0000 1.0000  #grep 'employ'-> (he employs, while employed) | wc=2
affrighted    2.0  1.0000 1.0000
afraid        2.1  0.9996 0.5209
after        19.0  0.9991 0.7301
afternoon     2.0  1.0000 0.9999
afterwards    5.0  1.0000 0.9999
ag           23.8  0.0810 0.0500
again        17.6  0.9978 0.4885
against       9.8  0.9971 0.6138
againstn      1.7  0.8322 0.8321
##############################################################
Iteration   Dictionary Size   Free Energy
    1              26         3.2230914
    2              94         3.1806764
    3             391         3.1117268
    .
    .
   11            2405         2.95867879
   12            2457         2.957230115375
        (previous step in iteration 2.957230115398)
        (2351 words in common between 11,12)
remove whale, whales, whaleman, whalemen from Dictionary.12
                 2453         2.95785

Iteration   Eigenvalue
    2       0.9144
    3       0.9845
    4       0.9924
    5       0.9836
    6       0.9836
    .
    .
   11       0.9676
   12       0.9666

Tried to randomize the Dict.12 probabilities and reconverge; ended up with 396 words.
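
The preprocessing described in the data header (punctuation and caps removed, random letters inserted between words) and the -sum( p ln(p) ) statistic can be sketched in Python as follows. This is a minimal sketch, not the code that produced the numbers above: the function names, the uniform gap length between 0 and max_gap, the seed, and the choice to drop apostrophes rather than split on them are all assumptions. The header does not say how 1.490955 was normalized, so the sketch computes only the plain word-frequency entropy in nats.

```python
import math
import random
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop everything except a-z and whitespace, split into words."""
    return re.sub(r"[^a-z\s]", "", text.lower()).split()


def scramble(words, max_gap=10, seed=0):
    """Concatenate words, putting 0..max_gap random letters before each one.

    The gap-length distribution is an assumption; the notes only say that
    random letters were inserted between words.
    """
    rng = random.Random(seed)
    parts = []
    for word in words:
        gap = rng.randint(0, max_gap)  # inclusive on both ends
        parts.append("".join(rng.choice(string.ascii_lowercase) for _ in range(gap)))
        parts.append(word)
    return "".join(parts)


def entropy(counts):
    """Return -sum(p ln p) in nats for a mapping of item -> count."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    sample = "Loomings. Call me Ishmael. Some years ago, never mind how long precisely"
    words = normalize(sample)
    print(scramble(words, max_gap=6, seed=1))
    print(entropy(Counter(words)))
```

Setting max_gap=0 returns the words simply run together, which is a convenient check that the scrambler leaves the underlying text intact.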