1 of 23

Data 100: Feature Engineering

Slides by:

Joseph E. Gonzalez, Deb Nolan, John DeNero, & Josh Hug

jegonzal@berkeley.edu

deborah_nolan@berkeley.edu

denero@berkeley.edu

josh@joshh.ug


3 of 23

Recap of Linear Models

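A minimal recap, assuming the setup from the earlier linear-model lectures: we model the response as a linear combination of the features,

    \hat{Y} = X \theta

and pick \theta to minimize average squared loss; when X^T X is invertible this gives the least-squares solution

    \hat{\theta} = (X^T X)^{-1} X^T Y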

4 of 23

Feature Engineering

  • The process of transforming the raw features into more informative features that can be used in modeling tasks.
  • Feature Engineering enables you to:
    • capture domain knowledge (e.g., periodicity or relationships between features)
    • express non-linear relationships using simple linear models
    • encode non-numeric features to be used as inputs to models

5 of 23

Feature Functions

  • Feature functions transform features into new features

Domain (DataFrame):

  | uid | age | state | hasBought | review                    |
  |-----|-----|-------|-----------|---------------------------|
  | 0   | 32  | NY    | True      | “Meh.”                    |
  | 42  | 50  | WA    | True      | “Worked out of the box …” |
  | 57  | 16  | CA    | NULL      | “Hella tots lit...”       |

Transformed (entirely quantitative values; note the different number of features):

  | AK | … | NY | … | WY | age | age^2 | hasBought_missing |
  |----|---|----|---|----|-----|-------|-------------------|
  | 0  | … | 1  | … | 0  | 32  | 32^2  | 0                 |
  | 0  | … | 0  | … | 0  | 50  | 50^2  | 0                 |
  | 0  | … | 0  | … | 0  | 16  | 16^2  | 1                 |

A code sketch of this transformation follows below.
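A minimal sketch of such a feature function in pandas; the DataFrame contents follow the slide, but the code itself is an illustration, not the course notebook:

    import pandas as pd

    # Raw Domain DataFrame from the slide.
    df = pd.DataFrame({
        "uid": [0, 42, 57],
        "age": [32.0, 50.0, 16.0],
        "state": ["NY", "WA", "CA"],
        "hasBought": [True, True, None],
        "review": ["Meh.", "Worked out of the box ...", "Hella tots lit..."],
    })

    def phi(df):
        """Map raw records to an entirely quantitative DataFrame."""
        out = pd.get_dummies(df["state"])              # one-hot encode state
        out["age"] = df["age"]
        out["age^2"] = df["age"] ** 2                  # non-linear transform
        out["hasBought_missing"] = df["hasBought"].isna().astype(int)
        return out

    features = phi(df)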

6 of 23

Feature Functions

  • Feature functions transform features into new features

(Same Domain → DataFrame transformation as on the previous slide, now labeling the mapping between them: the feature function φ, i.e., the “φ(eature) function.”)

Designing the feature functions is a big part of machine learning and data science.

Feature Functions

  • capture domain knowledge
  • contribute substantially to expressivity (and complexity)

8 of 23

Feature Function Examples

9 of 23

The Constant Feature Function

  • By adding an all-1s column to our original data, we were already introducing a feature function:

  • Sometimes this feature and its parameter are called:
      • constant feature, offset, intercept, bias

(Appending the all-1s column turns the n × d data matrix into an n × (d + 1) matrix, so the number of features becomes p = d + 1. A code sketch follows below.)
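A minimal sketch of adding the constant feature with numpy (variable names are illustrative):

    import numpy as np

    X = np.array([[32.0], [50.0], [16.0]])   # n x d data matrix (here d = 1)
    ones = np.ones((X.shape[0], 1))          # the all-1s constant feature
    Phi = np.hstack([ones, X])               # n x (d + 1) = n x p design matrix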

10 of 23

Modeling Non-linear Relationships

Feature Functions:

Note that feature functions don’t depend on parameters.
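For example (an illustrative choice of feature function, not recovered from the original slide): the polynomial features

    \phi(x) = [1, x, x^2]^T

turn the linear model

    f_\theta(x) = \theta^T \phi(x) = \theta_0 + \theta_1 x + \theta_2 x^2

into a quadratic function of x while staying linear in the parameters \theta; all the non-linearity lives in \phi, which has no parameters of its own.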

11 of 23

Encoding Categorical and Text Features

X (the raw records):

  | uid | age | state | hasBought | review                    |
  |-----|-----|-------|-----------|---------------------------|
  | 0   | 32  | NY    | True      | “Meh.”                    |
  | 42  | 50  | WA    | True      | “Worked out of the box …” |
  | 57  | 16  | CA    | NULL      | “Hella tots lit yo ...”   |

Y (the response):

  | rating |
  |--------|
  | 2.0    |
  | 4.5    |
  | 4.1    |

What if x is a record with numbers, text, booleans, etc.?

12 of 23

Predict rating from review information

(Same X and Y tables as on the previous slide.)

Schema:

  RatingsData(uid INTEGER, age FLOAT,
              state STRING, hasBought BOOLEAN,
              review STRING, rating FLOAT)

13 of 23

As a Linear Model?

Can I use X and Y directly in a linear model?

    • No! Why not?
    • Text, categorical data, missing values …

(Same X, Y, and schema as on the previous slides.)

14 of 23

Basic Transformations

  • Uninformative features (e.g., uid)
    • Is this informative? (Probably not.)
    • Transformation: remove uninformative features (why?)
      • They could still influence the model.
  • Quantitative features (e.g., age)
    • Transformation: may apply non-linear transformations (e.g., log)
    • Transformation: normalize/standardize (more on this later …)
      • Example: (x – mean) / stdev
  • Categorical features (e.g., state)
    • How do we convert state into meaningful numbers?
      • Alabama = 1, …, Utah = 50?
      • That implies order/magnitude mean something … we don’t want that.
    • Transformation: one-hot encode (see the sketch after this list)

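A minimal sketch of these transformations in pandas (column names follow the running example; the specific code is an assumption, not the course notebook):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "uid": [0, 42, 57],
        "age": [32.0, 50.0, 16.0],
        "state": ["NY", "WA", "CA"],
    })

    df = df.drop(columns=["uid"])               # remove the uninformative feature
    df["log_age"] = np.log(df["age"])           # non-linear transformation
    df["age_std"] = (df["age"] - df["age"].mean()) / df["age"].std()  # standardize
    df = pd.get_dummies(df, columns=["state"])  # one-hot encode the categorical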
15 of 23

One Hot Encoding (dummy encoding)

  • Transform a categorical feature into many binary features:

  | state | AK | CA | NY | WA | WY |
  |-------|----|----|----|----|----|
  | NY    | 0  | 0  | 1  | 0  | 0  |
  | WA    | 0  | 0  | 0  | 1  | 0  |
  | CA    | 0  | 1  | 0  | 0  | 0  |

Corresponding feature functions: see the notebook for example code (a sketch also follows below).

Origin of the term: multiple “wires,” one per possible value (e.g., Cat, Dog, Fish); only one is “hot” at a time.
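A minimal one-hot-encoding sketch with scikit-learn (pd.get_dummies is a lighter-weight alternative; this code is an illustration, not the referenced notebook):

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    states = pd.DataFrame({"state": ["NY", "WA", "CA"]})

    # handle_unknown="ignore" maps categories unseen during fitting to the
    # all-zeros vector at prediction time instead of raising an error.
    enc = OneHotEncoder(handle_unknown="ignore")
    one_hot = enc.fit_transform(states)       # sparse matrix, one column per state
    print(enc.get_feature_names_out())        # ['state_CA' 'state_NY' 'state_WA']
    print(one_hot.toarray())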

16 of 23

Encoding Missing Values

  • Missing values in quantitative data
    • Try to impute (estimate) missing values … (tricky)
      • Example: substitute the sample mean
    • Add a binary field called “missing_col_name”. (Why?)
      • Sometimes missing data is signal! (See the sketch below.)
  • Missing values in categorical data
    • Add an additional category called “missing_col_name”
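A minimal sketch in pandas (the age column is the running example; the code is illustrative):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"age": [32.0, np.nan, 16.0]})

    # Record which values were missing first: missingness can be signal.
    df["age_missing"] = df["age"].isna().astype(int)

    # Impute the missing quantitative values with the sample mean.
    df["age"] = df["age"].fillna(df["age"].mean())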

17 of 23

Encoding categorical data

  • Categorical data → one-hot encoding (as above)

  • Text data
    • Bag-of-words & N-gram models

“Learning about machine learning is fun.” → a vector of word counts:

  | word     | count |
  |----------|-------|
  | aardvark | 0     |
  | aardwolf | 0     |
  | fun      | 1     |
  | learning | 2     |
  | machine  | 1     |
  | …        | …     |
  | zyzzyva  | 0     |

18 of 23

Bag-of-words Encoding

  • A generalization of one-hot encoding for a string of text:

  • Encode text as a long vector of word counts. (Issues?)
    • Long = millions of columns → typically high-dimensional and very sparse
    • Word-order information is lost … (is this an issue?)
    • New, unseen words at prediction (test) time → drop them …
  • A “bag” is another term for a multiset: an unordered collection which may contain multiple instances of each element.
  • Stop words: words that do not carry significant information
    • Examples: the, in, at, or, on, a, an, and …
    • Typically removed

(Same word-count vector as on the previous slide; a code sketch follows below.)
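A minimal bag-of-words sketch with scikit-learn (an illustration; the course notebook may differ):

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["Learning about machine learning is fun."]

    # stop_words="english" drops uninformative words like "about" and "is";
    # tokens are lowercased, so "Learning" and "learning" count together.
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(docs)          # sparse n_docs x vocabulary matrix
    print(vec.get_feature_names_out())        # ['fun' 'learning' 'machine']
    print(counts.toarray())                   # [[1 2 1]]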

19 of 23

I made this art piece in graduate school

Do you see the stop word?

There used to be a dustbin and broom

… but the janitors got confused …

20 of 23

N-Gram Encoding

  • Sometimes word order matters:

  • How do we capture word order in a “vector” model?
    • N-gram: a “bag of sequences of words”

The book was not well written but I did enjoy it.

The book was well written but I did not enjoy it.

21 of 23

2-Gram Encoding

“The book was well written ...” → 2-grams: “the book”, “book was”, “was well”, “well written”

  | 2-gram            | count |
  |-------------------|-------|
  | aardvark airlines | 0     |
  | apple pen         | 0     |
  | book was          | 1     |
  | the book          | 1     |
  | was well          | 1     |
  | well written      | 1     |
  | …                 | …     |
  | zyzzyva sf        | 0     |

A code sketch follows below.
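A minimal 2-gram sketch with scikit-learn (illustrative):

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["The book was well written ..."]

    # ngram_range=(2, 2) counts sequences of exactly two consecutive words.
    vec = CountVectorizer(ngram_range=(2, 2))
    counts = vec.fit_transform(docs)
    print(vec.get_feature_names_out())  # ['book was' 'the book' 'was well' 'well written']
    print(counts.toarray())             # [[1 1 1 1]]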

22 of 23

N-Gram Encoding

  • Sometimes word order matters:

  • How do we capture word order in a “vector” model?
    • N-gram: a “bag of sequences of words”
  • Issues:
    • Can be very sparse (many combinations occur only once)
    • Many combinations will only occur at prediction time → drop them …
    • Often use a hashing approximation (see the sketch below):
      • Increment the counter at index hash(“not enjoy”); collisions are okay

The book was not well written but I did enjoy it.

The book was well written but I did not enjoy it.
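A minimal sketch of the hashing trick, hand-rolled for clarity (scikit-learn’s HashingVectorizer implements the same idea):

    import zlib

    def hashed_bigram_counts(text, d=1024):
        """Map 2-gram counts into a fixed-size vector of dimension d."""
        words = text.lower().split()
        counts = [0] * d
        for bigram in zip(words, words[1:]):
            # A stable hash; collisions between different 2-grams are okay.
            index = zlib.crc32(" ".join(bigram).encode()) % d
            counts[index] += 1
        return counts

    vec = hashed_bigram_counts("The book was not well written but I did enjoy it.")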

23 of 23

Feature Transformations to Capture Domain Knowledge

  • Feature functions capture domain knowledge by introducing additional information from other sources and/or combining features
    • Example: a database lookup to pull in information from another source
    • Encoding non-linear patterns (e.g., diurnal, time-of-day patterns; a sketch follows below)
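A minimal sketch of a periodicity-capturing feature function (the hour-of-day encoding is an illustrative assumption):

    import numpy as np

    def diurnal_features(hour):
        """Encode hour-of-day (0-23) so 23:00 and 0:00 map to nearby points."""
        angle = 2 * np.pi * hour / 24
        return np.array([np.sin(angle), np.cos(angle)])

    phi_1pm = diurnal_features(13)   # features for 1 pm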