| Layer | Head | Head Comment | Feature Commentary |
|---|---|---|---|
| 0 | 0 | | Single-token features for "goals/games", "daily/inbox/newsletter", and infection-related words ("case/seasonal/resistance") |
| 0 | 1 | | The top 5 features are all single-token "of" features. Later features are single-token digit features (43, 7), "the", and words ending in "-ation" |
| 0 | 2 | | Conceptual bigram features, single-token ("space"), conceptually related single-token ("shorts/adaptations/studio/director") |
| 0 | 3 | | Single-token ("such" 4x), double-token (refer/referred), bigram (detract from), complex bigrams (imag-ines, imag-inations, imag-inary) |
| 0 | 4 | | This technique highlighted mostly dead features. Some exceptions: double bigram + conceptual (fond, fond-ly, fond-ness, affection) |
| 0 | 5 | | Identified as a duplicate token head. Lots of single-token suffix features (8/10) that are extended-embedding features (-ances, e.g. clearances, performances; -ges, e.g. ridges, hedges, sieges; -ately, e.g. innately, Lately, Collegiately) |
| 0 | 6 | | News headlines and snippets (5/10); some dashes, parentheses, and punctuation |
| 0 | 7 | | Bigrams (Big Bang, Big Ten; F{annie, iling, iring}) |
| 0 | 8 | | Micro-context features ("panels/panel/phot-/modules/cells", "Beetle/Passenger/Cor-olla/Corvette", "(iPad) mini-3", "(iPhone) <4/6/7>") |
| 0 | 9 | | Long-range context features (7/10) (e.g. headlines (4x), country lists!, nutrition facts, code) |
| 0 | 10 | | Single-token (4/10) (e.g. my/our/.;), years, state lists |
| 0 | 11 | | Single-token (not), similar single-token (few/same/certain; first/second) |
| 1 | 0 | | Bigrams for capital letters (6/10), single-token units of time (year/week/night/day), local context ("much [these/do/fun/money]") |
| 1 | 1 | | Entity features (3/10) (news agencies "CNN/Returns/AFP", domains "safarinow/utopianist/lacounty", Twitter handles), single-token pronouns ("He/That/She/They/It") |
| 1 | 2 | | Headline long-range context (3/10), bigram suffix (2/10) |
| 1 | 3 | | Mostly dead features; the remainder are long-range news-article context |
| 1 | 4 | | Long-range context (6/10) (news articles, URLs) |
| 1 | 5 | Monosemantic | Succession- or pair-related behavior. Single-token entities (10/10) (men/male/children, human/people/children/girls, he/him/them/his, right, 7, Roman numerals, First/Second/Third, third/fourth, 2015-2017, abc) |
| 1 | 6 | | Conceptual single-token and bigram verb features (5/10) (use/describe/signifies, sent/send/replied, see/seeing/blind, staying/fleeing/escape, view/watch/shot), single-token predicates (or/over/between, new/now) |
| 1 | 7 | | Mostly garbage; several headline/news context-tracking features |
| 1 | 8 | | Very weak attribution! Possibly an example of a highly polysemantic head? Mainly long-range context or dead features |
| 1 | 9 | | Garbage |
| 1 | 10 | | Features 3 and 7 are interesting context features. Otherwise, single-token (period) and some garbage |
| 1 | 11 | | Duplicate token head. Lots of single-token or related single-token features (7/10) (e.g. non, quick, grand/Grand, fore, wh-, serious/seriously, main/Main, sk-) |
| 2 | 0 | | Short phrases following a predicate (6/10) (e.g. only X / just Y / seen Z / never W / more A / not B), thousand/million X, conceptually unrelated single tokens |
| 2 | 1 | | Entity features (4/10) (Federal/Labor/Senator/Senate, college/conference/school, specific names and Mr, Liberal/Labor/House), lots of garbage sentence/bracket tracking |
| 2 | 2 | | Mostly bigram and trigram features (8/10) (unapolog-eticaly/rave-lling, [V]er-mont/[V]ac-ancy, "and forth", [W]ast-ewater/[W]ald-au, "[N]er-dy/[N]itty- Gr[itty]", etc.) |
| 2 | 3 | Monosemantic | Short phrases following a quantifier or temporal/spatial predicate (10/10) (e.g. both X, all Y, every Z, while A, after B, where C, before D, whatever A, either Z) |
| 2 | 4 | | Mostly single-token and conceptually related single-token features |
| 2 | 5 | | Physical direction and logical relationships (7/10) ("under X", "[adverb ending in -ly] Y", "by X", "after Y", "before Z", "see/saw X", "[Sentence] then X", "called Y", "between X [and Y]", "both A [and B]") |
| 2 | 6 | | News headline and story long-range context (5/10), 1.5-token features (it/we/you), bigrams "please {reach/read}" and "you {would/'ll/must}" |
| 2 | 7 | | Very weak attribution! News headline and story + Bible long-range context (5/10), some multi-range entity features (types of OS devices, President/VP/House Press Office), "study {by/which/conducted}", opening double quote |
| 2 | 8 | | Multi-word contextually related phrases and idioms, phrases ending in "to", phrases relating to dollar spend |
| 2 | 9 | Monosemantic | Entity (media/court/government/world/Times/Guardian/etc.) followed by a description of what it did (10/10) |
| 2 | 10 | | "Lambda features" (6/10), garbage news and sports stuff (4/10) |
| 2 | 11 | | Mostly attention to the BOS token and single-token features, some garbage |
| 3 | 0 | | Identified as a better duplicate token head. Single-token (something double) capitalized entities (7/10) |
| 3 | 1 | | "Either <long phrase> X" and "not only <long phrase> X". Book title tracking (2/10), certain idiomatic phrases, news "Also Related/Related" tracking, "[not] only X" |
| 3 | 2 | Monosemantic | So/of/such/how/from/as/that/to/be/by <phrase> (10/10). Emotionally significant adjective-noun combos (6/10) ("tremendous loss, willfull ignorance, crucial role, rare feat", "ugly fuck is/sophisticated manner/excitement and sense of", "crazy to/exhausted with/shocked to"), single-token |
| 3 | 3 | Almost Monosemantic | Bigrams like "igX/erX/herX/ emX/opX/incXriX/letXreX" (7/10) |
| 3 | 4 | | Nationalities and important figures (6/10) (states, Obama, German/Indian/Spanish/British), gendered pronouns (he/his/him), alcohol/marijuana/cancer |
| 3 | 5 | | Quantifiers each/every/first/more (4/10). Anyone/asked/etc. subject tracking. Single-token (equally/different/same/few, then/second/secondly, gendered she/her), "the <more/less/better>" ("the better/easier/faster/more/less") |
| 3 | 6 | | US/Govt/American/president-related fragments (4/10), "world's most" extremes (toughest/most/wealthiest), phrases related to frequency (one at a time, every ten, out of every month), magnitude comparisons (bigger than, pronounced than, far higher, far more), "just Y" (just about, just weren't), very first/beginning/center |
| 3 | 7 | | Single-token (9/10), "work {to be done/remains/we are doing, etc.}". Also concept tracking (raw materials ordinary/standard, types of professions) (3/10), bigrams (adX) |
| 3 | 8 | | Active verbs like feel/make/survive/tell/think (5/10). Mixed, but also single- and double-token important entities (4/10) (human beings/professionals/committee/alumni/officials/politicians) |
| 3 | 9 | | Particles like "while/by/as/from/though/with/given/once" (7/10), particularly following commas (6/10) |
| 3 | 10 | | G-induction? Following a comma: "is/are", "all/the/would" (2/10); is/was/am; newspaper entities; I['m] {could/come/would/had}; question-mark endings; who/whom; single-token (home) |
| 3 | 11 | | Tracking of ordinality, entirety, or extremity (6/10) (the first X, the entire/whole Y, more Z, without even W, the last thing..., hard/tough to Z) |
| 4 | 0 | Bisemantic | Families of related active verbs (7/10) (cant/can't/and do... + multi-word noun, share/sharing + multi-word phrase, open/opened..., put/putting..., issue/issues..., to try X, forced {out/into} Y), tracking of fractional amounts (3/10) (much of the, approximately eighteen months, as much as, plenty of things, giant heap, most of my) |
| 4 | 1 | | "why X", where X is a specific thing or property ("why his father", "why do americans", etc.), "so close to / closer to Y" |
| 4 | 2 | Monosemantic | News-related context tracking (10/10) |
| 4 | 3 | Bisemantic | Potentially very interesting because of conjunctions of feature families: prescriptive and active assertions (7/10) (would be/result/also..., does not/has never/may not, must be/could be, which opened/consisted of/had gone, can be), specific entity phrase completions (3/10) (the court X, your/their names Y, issue was/wasn't/is) |
| 4 | 4 | Bisemantic | Single-token and double-token (10/10) (dates and quantities, is/have/am, countries/gender/parties, Twitter/FB/Pinterest/Flickr, up/down, days of the week and months, much, year) |
| 4 | 5 | | Characterizations of typicality or extremity (5/10) (the usual X, the same Y, the average Z, so common, the right track/place/club, rec, [predicted] record/record-setting) |
| 4 | 6 | | Gender pronouns (he/she/their/they), it/we (4/10) |
| 4 | 7 | Monosemantic | Weak/non-standard duplicate token head |
| 4 | 8 | | Garbage |
| 4 | 9 | | "Lambda features" (2/10), timing-related multi-word phrases (3/10) (for a while/fifty years/long time, months later/earlier, few years ago) |
| 4 | 10 | | Predecessor head? (feature 3 / num 20979). Single-token and weak entity-capture features |
| 4 | 11 | Monosemantic | Previous token head. Single-token (10/10) |
| 5 | 0 | | Mostly single-token (7/10) and some URL tracking |
| 5 | 1 | | F/C/Sh/ra-induction, among others (4/10). Single-token but garbage |
| 5 | 2 | | Outlier tracking (2/10) (the best X, the biggest Y), legal snippets relating to possession, "quasi-Lambda features" (6/10) |
| 5 | 3 | | "Of course X", specific footer text and news context snippets (3/10), "may be the case / there are still plenty", "See Also: Y" |
| 5 | 4 | | Combination of specific entities (WSJ/SB), "at least / some of"; the rest seems a bit random |
| 5 | 5 | | Mostly single-token |
| 5 | 6 | | Mostly single-token |
| 5 | 7 | | Related to specific active verbs (4/10) (included in X, passed/passing on to Y, found in Z, fill the/it/with, pay for/him/their), "between X and Y" |
| 5 | 8 | | Mostly single-token |
| 5 | 9 | | Identifying a pronoun or particle at the beginning of a sentence (3/10) |
| 5 | 10 | | Specific-entity short-context features (5/10) (storm/hurricane, capt/officer, court/justices ruled/cited, talks will be / were held, Trumpy X), some Lambda features |
| 5 | 11 | | Single-token (4/10), law-related context tracking (3/10) |
| 6 | 0 | | Lambda features for local context tracking following a pronoun (5/10), action tracking on specific entities (3/10) (politicians, countries, voters) |
| 6 | 1 | | Interesting common-idiom feature: time "and effort/is now/etc.". "New X", one/to |
| 6 | 2 | | Unclear long-range context tracking (5/10), parenthetical translations of a phrase ("A clever oman", "I admit I'm in love", etc.) |
| 6 | 3 | Almost Monosemantic | Interesting cross-sentence "just" and "ask" tracker. Active-verb tracking following a comma (9/10) (wanting to, going to, scheduled to, combine/mix/add, think/believe that, JUST X, formerly Y, asked X, if we/you/can) |
| 6 | 4 | | News short- and long-range context tracking (7/10) |
| 6 | 5 | | Short phrases and idioms related to agreement-building (couldn't be more X, you guessed it, did say/mention, to say the least, you know/you see) |
| 6 | 6 | | BOS DFA. Single-token (6/10), URL suffix completions (2/10), newspaper names |
| 6 | 7 | | Local context tracking of certain concepts (7/10) (payment, vegetation/wetlands, doing less well, recruiting-related, death-related, attendance, average). Quasi-Lambda features for certain phrases (10/10) |
| 6 | 8 | | |
| 6 | 9 | Monosemantic | Induction features (10/10) |
| 6 | 10 | | "X said/replied/cautioned" |
| 6 | 11 | Almost Monosemantic | Lots of suffix completions of specific verb/phrase forms: "From X", "leave Y" (this/the conflict/in the/our prejudices), "something that is so Y" (so far/polluted/powerful/popular), "at least <short phrase>", "keep Z" (you/officers/the government/etc.), "to let X" (you/no one/those people), "even W" (even Republicans, even top batteries) |
| 7 | 0 | | Seems highly polysemantic. "one of/two of/some of" (2/10), bigram features (3/10), common "something as / was as / those" characterizations (personal as possible, diverse as..., serene as..., predictable and mundane as..., easy as...) |
| 7 | 1 | | Specific active verbs (3/10) (request, help/support, participate, infiltrate), conjunctions (and/or), gender pronouns (2/10) (he/his, her/she) |
| 7 | 2 | Monosemantic | Bizarre induction features (9/10) + some induction successor/generalization features? |
| 7 | 3 | | Lambda features, global X (2/10), "come in / come to / come back", "in the / within these / in the current", specific kinds of public praise |
| 7 | 4 | | |
| 7 | 5 | Highly Polysemantic | Seems highly polysemantic. Steve memorization? Name-related, Secretary/Spokesman, family relationship tracking (3/10) (brothers/sons/father), as small as X / as low as Y |
| 7 | 6 | | Ċ (newline token) garbage, long-range context tracking and/or memorization (2/10) |
| 7 | 7 | | Concept conjunction features (3/10) ("Italy and Cypres", "England and Bradwell" vs. "and possible solutions", "and rotate") |
| 7 | 8 | | Phrasing related to how things are going or a specific action taken (6/10) (situation in country X, particularly in news; the decision to Y; thing {starting to / could not}; {quote note}; issue {here is / was}; cutting-related phrases; lawsuit filing) |
| 7 | 9 | Almost Monosemantic | Reasoning- and justification-related phrasing (8/10) ("of which / to which / just because / for which / but at least / we believe / in fact... / pretty clear that") |
| 7 | 10 | Monosemantic | Induction features (10/10) |
| 7 | 11 | | Tokenization-related (6/10) |
| 8 | 0 | | News-snippet-related (7/10) |
| 8 | 1 | Monosemantic | Relationship particles (10/10) (with, for, on, to, in, at, by, of, as, from) |
| 8 | 2 | | News-snippet-related (6/10) (news agency, Read more, MORE, copyright, advertisement, picture) |
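Several rows label a head by its attention behavior (duplicate token head, previous token head, induction features). A common way to sanity-check such labels, assumed here and not necessarily how this table was produced, is to feed the model a random token block repeated twice and measure where a head attends within the second copy. The pure-Python sketch below scores one head's attention pattern; the function names (`previous_token_score`, `duplicate_token_score`, `induction_score`, `one_hot_pattern`) and the toy one-hot heads are hypothetical, and it assumes no BOS token, so the second copy starts at position `period`.

```python
from typing import Callable, List

# pattern[q][k] = attention weight from query position q to key position k.
# Rows are assumed causal (k <= q) and to sum to 1.
Pattern = List[List[float]]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def previous_token_score(pattern: Pattern) -> float:
    """Mean attention from each position to the position directly before it."""
    return _mean([pattern[q][q - 1] for q in range(1, len(pattern))])


def duplicate_token_score(pattern: Pattern, period: int) -> float:
    """For a block repeated with `period`, mean attention from each token in the
    second copy to the identical token in the first copy (offset -period)."""
    return _mean([pattern[q][q - period] for q in range(period, len(pattern))])


def induction_score(pattern: Pattern, period: int) -> float:
    """Mean attention from each token in the second copy to the token that
    followed its earlier occurrence (offset -period + 1)."""
    return _mean([pattern[q][q - period + 1] for q in range(period, len(pattern))])


def one_hot_pattern(seq_len: int, target: Callable[[int], int]) -> Pattern:
    """Hypothetical toy head: query q puts all of its attention on key target(q)."""
    rows = []
    for q in range(seq_len):
        k = target(q)
        if not 0 <= k <= q:
            raise ValueError("causal attention cannot attend to future positions")
        rows.append([1.0 if j == k else 0.0 for j in range(seq_len)])
    return rows


# Toy check on a length-2n sequence made of one random block repeated twice.
n = 5
prev_head = one_hot_pattern(2 * n, lambda q: max(q - 1, 0))
dup_head = one_hot_pattern(2 * n, lambda q: q - n if q >= n else q)
ind_head = one_hot_pattern(2 * n, lambda q: q - n + 1 if q >= n else 0)

for name, head in [("previous", prev_head), ("duplicate", dup_head), ("induction", ind_head)]:
    print(
        name,
        round(previous_token_score(head), 2),
        duplicate_token_score(head, n),
        induction_score(head, n),
    )
```

Each synthetic head scores 1.0 only on its own metric (the toy induction head also picks up about 0.11 on the previous-token score, from position 1 attending to position 0). Real heads give graded scores, so where to draw the line between, say, a "weak/non-standard duplicate token head" and a clean one remains a judgment call.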