A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | AA | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Layer | Head | Head Comment | Feature Commentary | ||||||||||||||||||||||||
2 | 0 | 0 | Single-token features for: "goals/games", "daily/inbox/newsletter", infection words ("case/seasonal/resistance") | |||||||||||||||||||||||||
3 | 0 | 1 | Top 5 features are all single-token "of" features. Later features are single-token digit features (43, 7), "the", words ending in "-ation" | |||||||||||||||||||||||||
4 | 0 | 2 | Conceptual bi-gram features, single-token ("space"), conceptual related single token ("shorts/adaptations/studio/director") | |||||||||||||||||||||||||
5 | 0 | 3 | Single-token (such 4x), double-token (refer/referred), bigram (detract from), complex bigrams (imag-ines, imag-inations, imag-inary) | |||||||||||||||||||||||||
6 | 0 | 4 | This technique highlighted mostly dead features. Some exceptions: double bigram + conceptual (fond, fond-ly, fond-ness, affection)3 | |||||||||||||||||||||||||
7 | 0 | 5 | Identified as duplicate token head. Lots of single-token suffix (8/10) are extended embedding (-ances, e.g. clearances performances etc; -ges e.g. ridges hedges sieges; -ately e.g. innately, Lately, Collegiately) | |||||||||||||||||||||||||
8 | 0 | 6 | news headlines and snippets (5/10), some dashes parens and punctuation | |||||||||||||||||||||||||
9 | 0 | 7 | Bigrams (Big Bang, Big Ten; F{annie, iling, iring}) | |||||||||||||||||||||||||
10 | 0 | 8 | Micro-context features ("panels/panel/phot-/modules/cells", "Beetle/Passenger/Cor-olla/Corvette", "(iPad) mini-3, (iPhone )<4/6/7>") | |||||||||||||||||||||||||
11 | 0 | 9 | Long-range context features (7/10) (e.g. headlines (4x), country lists!, nutrition facts, code) | |||||||||||||||||||||||||
12 | 0 | 10 | single-token (4/10) (e.g. my/our/.;), years, state lists | |||||||||||||||||||||||||
13 | 0 | 11 | single-token (not), similar single-token (few/same/certain; first/second) | |||||||||||||||||||||||||
14 | 1 | 0 | bigrams for cap letter (6/10), single-token unit of time measure (year/week/night/day), local context ("much [these/do/fun/money]") | |||||||||||||||||||||||||
15 | 1 | 1 | Entity features (3/10) (news agencies "CNN/Reuters/AFP", domains "safarinow/utopianist/lacounty", Twitter handles), pronoun single token ("He/That/She/They/It") | |||||||||||||||||||||||||
16 | 1 | 2 | Headline long-range context (3/10), bigram suffix (2/10) | |||||||||||||||||||||||||
17 | 1 | 3 | Mostly dead features but remainder are long-range context news article | |||||||||||||||||||||||||
18 | 1 | 4 | Long-range context (6/10) (news articles, urls) | |||||||||||||||||||||||||
19 | 1 | 5 | Monosemantic | Succession or pairs related behavior. single-token entity (10/10) (men/male/children, human/people/children/girls, he/him/them/his, right, 7, roman numerals, First/Second/Third, third/fourth, 2015-2017, abc) | ||||||||||||||||||||||||
20 | 1 | 6 | conceptual single and bigram verb tokens (5/10) (use/describe/signifies, sent/send/replied, see/seeing/blind, staying/fleeing/escape, view/watch/shot), single-token predicate (or/over/between, new/now) | |||||||||||||||||||||||||
21 | 1 | 7 | Mostly garbage, several headline/news context tracking | |||||||||||||||||||||||||
22 | 1 | 8 | Very weak attribution! Example of highly polysemantic head? Mainly long-range context or dead features | |||||||||||||||||||||||||
23 | 1 | 9 | Garbage | |||||||||||||||||||||||||
24 | 1 | 10 | Feature 3 and 7 are interesting context features. Otherwise: single-token (period) and some garbage | |||||||||||||||||||||||||
25 | 1 | 11 | Duplicate token head. Lots of single-token or related single-token (7/10) (e.g. non, quick, grand/Grand, fore, wh-, serious/seriously, main/Main, sk-) | |||||||||||||||||||||||||
26 | 2 | 0 | short phrases following a predicate (6/10) (e.g. only X / just Y / seen Z / never W / more A / not B), thousand/million X, conceptually unrelated single token | |||||||||||||||||||||||||
27 | 2 | 1 | entity features (4/10) (Federal/Labor/Senator/Senate, college/conference/school, specific names and Mr, Liberal/Labor/House), lots of garbage sentence/bracket tracking | |||||||||||||||||||||||||
28 | 2 | 2 | mostly bigram and trigram features (8/10) (unapolog-etically/rave-lling, [V]er-mont/[V]ac-ancy, "and forth", [W]ast-ewater/[W]ald-au, "[N]er-dy/[N]itty- Gr[itty]", etc.) | |||||||||||||||||||||||||
29 | 2 | 3 | Monosemantic | short phrases following a quantifier or temporal/spatial predicate (10/10) (e.g. both X, all Y, every Z, while A, after B, where C, before D, whatever A, either Z) | ||||||||||||||||||||||||
30 | 2 | 4 | mostly single-token and conceptually related single-token | |||||||||||||||||||||||||
31 | 2 | 5 | physical direction and logical relationship (7/10) ("under X", "[adverb ending with -ly] Y", "by X", "after Y", "before Z", "see/saw X", "[Sentence] then X", "called Y", "between X [and Y]", "both A [and B]") | |||||||||||||||||||||||||
32 | 2 | 6 | News headline and story long-range context (5/10), 1.5-token features (it/we/you), bigram "please {reach/read}, you {would, 'll, must}" | |||||||||||||||||||||||||
33 | 2 | 7 | Very weak attribution! News headline and story + bible long-range context (5/10), some multi-range entity (types of OS devices, President/VP/House Press Office), "study {by/which/conducted}", Opening double quote | |||||||||||||||||||||||||
34 | 2 | 8 | Multi-word contextually related and idioms, phrases ending in "to", relating to dollar spend | |||||||||||||||||||||||||
35 | 2 | 9 | Monosemantic | Entity (media/court/government/world/Times/Guardian/etc) followed by description of what it did (10/10) | ||||||||||||||||||||||||
36 | 2 | 10 | "Lambda features" (6/10), garbage news and sports stuff (4/10) | |||||||||||||||||||||||||
37 | 2 | 11 | Mostly BOS token attention and single-token, some garbage | |||||||||||||||||||||||||
38 | 3 | 0 | Identified as a better duplicate token head. Single-token (sometimes double) capitalized entity (7/10) | |||||||||||||||||||||||||
39 | 3 | 1 | "Either <long phrase> X" and "not only <long phrase> X". book title tracking (2/10), certain idiomatic phrases, news "Also Related/Related" tracking, "[not] only X" | |||||||||||||||||||||||||
40 | 3 | 2 | Monosemantic | So/of/such/how/from/as/that/to/be/by <phrase> (10/10). Emotionally significant adjective noun combo (6/10) ("tremendous loss, willful ignorance, crucial role, rare feat", "ugly fuck is/sophisticated manner/excitement and sense of", "crazy to/exhausted with/shocked to"), single-token | ||||||||||||||||||||||||
41 | 3 | 3 | Almost Monosemantic | Bigrams like "igX/erX/herX/emX/opX/incX/riX/letX/reX" (7/10) | ||||||||||||||||||||||||
42 | 3 | 4 | Nationalities and important figures (6/10) (states, Obama, German/Indian/Spanish/British), gender he/his/him, alcohol/marijuana/cancer | |||||||||||||||||||||||||
43 | 3 | 5 | quantifiers each/every/first/more (4/10). Anyone/asked/etc subject tracking. Single-token (equally/different/same/few, then/second/secondly, gender she/her), the <more or less or better> ("the better/easier/faster/more/less") | |||||||||||||||||||||||||
44 | 3 | 6 | US/Govt/American/president related fragments (4/10), world's most extreme (toughest/most/wealthiest), phrases related to frequency (one at a time, every ten, out of every month), magnitude comparisons (bigger than, pronounced than, far higher, far more), just Y (just about, just weren't), very first/beginning/center | |||||||||||||||||||||||||
45 | 3 | 7 | single-token (9/10), work {to be done/remains/we are doing, etc.}. Also concept tracking (raw materials ordinary/standard, types of professions) (3/10), bigrams (adX) | |||||||||||||||||||||||||
46 | 3 | 8 | Active verbs like feel/make/survive/tell/think (5/10). Mixed, but also: single- and double-token important entities (4/10) (human beings/professionals/committee/alumni/officials/politicians) | |||||||||||||||||||||||||
47 | 3 | 9 | Particles like "while/by/as/from/though/with/given/once" (7/10), particularly, following commas (6/10) | |||||||||||||||||||||||||
48 | 3 | 10 | G-induction? following comma "is/are" "all/the/would" (2/10), is/was/am, newspaper entities, I['m] {could/come/would/had}, question mark ending, who/whom, single-token (home) | |||||||||||||||||||||||||
49 | 3 | 11 | tracking of ordinality or entirety or extremity (6/10) (the first X, the entire/whole Y, more Z, without even W, the last thing..., hard/tough to Z) | |||||||||||||||||||||||||
50 | 4 | 0 | Bisemantic | families of related active verbs (7/10) (cant/can't/and do... multi-word noun, share/sharing multi-word-phrase, open/opened..., put/putting..., issue/issues..., to try X, forced {out/into} Y), tracking fractional amount (3/10) (much of the, approximately eighteen months, as much as, plenty of things, giant heap, most of my) | ||||||||||||||||||||||||
51 | 4 | 1 | why X is a specific thing property ("why his father, why do americans, etc..."), so close to / closer to Y | |||||||||||||||||||||||||
52 | 4 | 2 | Monosemantic | News related context tracking (10/10) | ||||||||||||||||||||||||
53 | 4 | 3 | Bisemantic | Potentially very interesting because of conjunctions of feature families: prescriptive and active assertions (7/10) (would be/result/also..., does not/has never/may not, must be/could be, which opened/consisted of/had gone, can be), specific entity phrase completions (3/10) (the court X, your/their names Y, issue was/wasn't/is) | ||||||||||||||||||||||||
54 | 4 | 4 | Bisemantic | single-token and double-token (10/10) (dates and quantities, is/have/am, countries/gender/parties, Twitter/FB/Pinterest/Flickr, up/down, days of week and months, much, year) | ||||||||||||||||||||||||
55 | 4 | 5 | characterizations of typicality or extremity (5/10) (the usual X, the same Y, the average Z, so common, the right track/place/club, rec, [predicted] record/record-setting) | |||||||||||||||||||||||||
56 | 4 | 6 | gender he/she/their/they, it/we (4/10) | |||||||||||||||||||||||||
57 | 4 | 7 | Monosemantic | Weak/non-standard duplicate token head | ||||||||||||||||||||||||
58 | 4 | 8 | Garbage | |||||||||||||||||||||||||
59 | 4 | 9 | "Lambda features" (2/10), timing related several-word phrases (3/10) (for a while/fifty years/long time, months later/earlier, few years ago) | |||||||||||||||||||||||||
60 | 4 | 10 | predecessor head? (feature 3 / num 20979) single token and weak entity capture stuff | |||||||||||||||||||||||||
61 | 4 | 11 | Monosemantic | Previous token head. single token (10/10) | ||||||||||||||||||||||||
62 | 5 | 0 | Mostly single token (7/10) and some url tracking | |||||||||||||||||||||||||
63 | 5 | 1 | F/C/Sh/ra-induction, amongst others (4/10). Single token but garbage | |||||||||||||||||||||||||
64 | 5 | 2 | Outlier tracking (2/10) (the best X, the biggest Y), legal snippets relating to possession, "quasi-Lambda features" (6/10) | |||||||||||||||||||||||||
65 | 5 | 3 | Of course X, specific footer text and news context snippets (3/10), may be the case / there are still plenty, See Also: Y | |||||||||||||||||||||||||
66 | 5 | 4 | combination of specific entities (WSJ/SB), "at least / some of", rest seems a bit random | |||||||||||||||||||||||||
67 | 5 | 5 | single-token mostly | |||||||||||||||||||||||||
68 | 5 | 6 | single-token mostly | |||||||||||||||||||||||||
69 | 5 | 7 | related to specific active verbs (4/10) (included in X, passed/passing on to Y, found in Z, fill the/it/with, pay for/him/their), between X and Y | |||||||||||||||||||||||||
70 | 5 | 8 | single-token mostly | |||||||||||||||||||||||||
71 | 5 | 9 | identifying pronoun or particle at beginning of sentence (3/10) | |||||||||||||||||||||||||
72 | 5 | 10 | specific entity short context features (5/10) (storm/hurricane, capt/officer, court/justices ruled/cited, talks will be / were held, Trumpy X), some Lambda features | |||||||||||||||||||||||||
73 | 5 | 11 | single-token (4/10), law related context tracking (3/10) | |||||||||||||||||||||||||
74 | 6 | 0 | Lambda features for local context tracking following a pronoun (5/10), action tracking on specific entities (3/10) (politicians, countries, voters) | |||||||||||||||||||||||||
75 | 6 | 1 | Interesting common idiom feature. Time "and effort/is now/etc". "New X", one/to | |||||||||||||||||||||||||
76 | 6 | 2 | Unclear long range context tracking (5/10), parenthetical translation of a phrase ("A clever woman" "I admit I'm in love", etc.) | |||||||||||||||||||||||||
77 | 6 | 3 | Almost Monosemantic | Interesting cross-sentence "just" and "ask" tracker. Active verb tracking following a comma (9/10) (wanting to, going to, scheduled to, combine/mix/add, think/believe that, JUST X, formerly Y, asked X, if we/you/can) | ||||||||||||||||||||||||
78 | 6 | 4 | News short and long range context tracking (7/10) | |||||||||||||||||||||||||
79 | 6 | 5 | short phrases and idioms related to agreement building (couldn't be more X, you guessed it, did say/mention, to say the least, you know/you see) | |||||||||||||||||||||||||
80 | 6 | 6 | BOS DFA. single-token (6/10), url suffix completions (2/10), newspaper names | |||||||||||||||||||||||||
81 | 6 | 7 | Certain concepts local context tracking (7/10) (payment, vegetation/wetlands, doing less well, recruiting related, death related, attendance, average). certain phrases quasi-Lambda features (10/10) | |||||||||||||||||||||||||
82 | 6 | 8 | ||||||||||||||||||||||||||
83 | 6 | 9 | Monosemantic | Induction features (10/10) | ||||||||||||||||||||||||
84 | 6 | 10 | X said/replied/cautioned | |||||||||||||||||||||||||
85 | 6 | 11 | Almost Monosemantic | Lots of suffix completions of specific verb/phrase forms. From X, "leave Y (this/the conflict/in the/our prejudices)", something that is so Y (so far/polluted/powerful/popular), at least <short phrase>, keep Z (you/officers/the government/ etc), to let X (you/no one/those people), even W (even Republicans, even top batteries) | ||||||||||||||||||||||||
86 | 7 | 0 | Seems highly polysemantic. one of/two of/some of (2/10). bi-gram features (3/10), common "something as / was as / those" characterizations (personal as possible, diverse as.., serene as.., predictable and mundane as..., easy as...) | |||||||||||||||||||||||||
87 | 7 | 1 | specific active verbs (3/10) (request, help/support, participate, infiltrate), conjunctions (and/or), gender pronoun (2/10) (he/his, her/she) | |||||||||||||||||||||||||
88 | 7 | 2 | Monosemantic | Bizarre induction features (9/10) + some induction successor / generalization features? | ||||||||||||||||||||||||
89 | 7 | 3 | Lambda features, global X, (2/10) "come in / come to / come back" / "in the / within these / in the current", specific kinds of public praise | |||||||||||||||||||||||||
90 | 7 | 4 | ||||||||||||||||||||||||||
91 | 7 | 5 | Highly Polysemantic | Seems highly polysemantic. Steve memorization? Name-related, Secretary/Spokesman, family relationship tracking (3/10) (brothers/sons/father), as small as X / as low as Y | ||||||||||||||||||||||||
92 | 7 | 6 | Ċ garbage, long range context tracking and/or memorization (2/10) | |||||||||||||||||||||||||
93 | 7 | 7 | concept conjunction features (3/10) ("Italy and Cyprus" "England and Bradwell" vs "and possible solutions" "and rotate") | |||||||||||||||||||||||||
94 | 7 | 8 | phrasing related to how things are going or a specific action taken (6/10) (situation in country X particularly in news, the decision to Y, thing {starting to / could not}, {quote note}, issue {here is / was}, cutting related phrases, lawsuit filing) | |||||||||||||||||||||||||
95 | 7 | 9 | Almost Monosemantic | reasoning and justification related phrasing (8/10) ("of which / to which / just because / for which / but at least / we believe / in fact... / pretty clear that") | ||||||||||||||||||||||||
96 | 7 | 10 | Monosemantic | induction features (10/10) | ||||||||||||||||||||||||
97 | 7 | 11 | tokenization related (6/10) | |||||||||||||||||||||||||
98 | 8 | 0 | news snippet related (7/10) | |||||||||||||||||||||||||
99 | 8 | 1 | Monosemantic | relationship particles (10/10) (with, for, on, to, in, at, by, of, as, from) | ||||||||||||||||||||||||
100 | 8 | 2 | news snippet related (6/10) (news agency, Read more, MORE, copyright, advertisement, picture) |