Hash Tables
1
Lecture 19
CS61B, Spring 2025 @ UC Berkeley
Slides Credit: Josh Hug
Motivation, Set Implementations
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
Sets
We’ve now seen several implementations of the Set (or Map) ADT.
Set
ArraySet
BST
2-3 Tree
LLRB
Map
| contains(x) | add(x) | Notes |
ArraySet | Θ(N) | Θ(N) | |
BST | Θ(N) | Θ(N) | Random trees are Θ(log N). |
2-3 Tree | Θ(log N) | Θ(log N) | Beautiful idea. Very hard to implement. |
LLRB | Θ(log N) | Θ(log N) | Maintains bijection with 2-3 tree. Hard to implement. |
Worst case runtimes
Limits of Search Tree Based Sets
Our search tree based sets require items to be comparable.
Our search tree sets have excellent performance, but could maybe be better?
Today we’ll see the answer to both of the questions above is yes.
WriteItOnTheWallSet
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
Data Structures Reflect Real Life
Data Structures tend to be analogous to real-life things, so it's often useful to try playing the role of a data structure as a human (to get ideas on how they work)
Let's think about a simplified Set of Integers, which requires these two operations:
Our goal is to make these operations as fast as possible.
WriteItOnTheWallSet
Let's introduce a human implementation of Set: WriteItOnTheWall Set
We will have a wall, and a pencil.
Strongly analogous to an "ArraySet"
Two questions:
Is it fast to add: Adding "5" to a wall of 10 numbers
212 | | | | 131 | |||||
759 | | 281 | | | |||||
| | | 670 | | |||||
953 | 984 | | | 104 | |||||
| | 958 | | 526 | |||||
Is it fast to add: Adding "5" to a wall of 10 numbers
212 | | | | 131 | |||||
759 | | 281 | | | |||||
| | | 670 | | |||||
953 | 984 | 5 | | 104 | |||||
| | 958 | | 526 | |||||
Is it fast to add: Adding "5" to a wall of 100 numbers
281 | 953 | 104 | 958 | 212 | 131 | 984 | 670 | 759 | 526 |
161 | 560 | 815 | 289 | 462 | 591 | 828 | 981 | 603 | 922 |
175 | 455 | 286 | 605 | 543 | 375 | 669 | 970 | 651 | 65 |
995 | 13 | 916 | 616 | 721 | 913 | 872 | 881 | 22 | 830 |
584 | 137 | 228 | 86 | 861 | 109 | 821 | 253 | 305 | 530 |
317 | 340 | 494 | 719 | 737 | 677 | 786 | 672 | 216 | 702 |
35 | 770 | 480 | 557 | 74 | 52 | 632 | 765 | 753 | 73 |
194 | 77 | 226 | 764 | 173 | 979 | 454 | 106 | 967 | 551 |
556 | 644 | 739 | 547 | 973 | 796 | 525 | 573 | 920 | 28 |
290 | 708 | 298 | 56 | 288 | 891 | 867 | 579 | 417 | |
Is it fast to add: Adding "5" to a wall of 100 numbers
281 | 953 | 104 | 958 | 212 | 131 | 984 | 670 | 759 | 526 |
161 | 560 | 815 | 289 | 462 | 591 | 828 | 981 | 603 | 922 |
175 | 455 | 286 | 605 | 543 | 375 | 669 | 970 | 651 | 65 |
995 | 13 | 916 | 616 | 721 | 913 | 872 | 881 | 22 | 830 |
584 | 137 | 228 | 86 | 861 | 109 | 821 | 253 | 305 | 530 |
317 | 340 | 494 | 719 | 737 | 677 | 786 | 672 | 216 | 702 |
35 | 770 | 480 | 557 | 74 | 52 | 632 | 765 | 753 | 73 |
194 | 77 | 226 | 764 | 173 | 979 | 454 | 106 | 967 | 551 |
556 | 644 | 739 | 547 | 973 | 796 | 525 | 573 | 920 | 28 |
290 | 708 | 298 | 56 | 288 | 891 | 867 | 579 | 417 | 5 |
Is it fast to contains: Search for "439" on a wall of 10 numbers
212 | | | | 131 | |||||
759 | | 281 | | | |||||
| | | 670 | | |||||
953 | 984 | | | 104 | |||||
| | 958 | | 526 | |||||
Is it fast to contains: Search for "605" on a wall of 100 numbers
281 | 953 | 104 | 958 | 212 | 131 | 984 | 670 | 759 | 526 |
161 | 560 | 815 | 289 | 462 | 591 | 828 | 981 | 603 | 922 |
175 | 455 | 286 | 605 | 543 | 375 | 669 | 970 | 651 | 65 |
995 | 13 | 916 | 616 | 721 | 913 | 872 | 881 | 22 | 830 |
584 | 137 | 228 | 86 | 861 | 109 | 821 | 253 | 305 | 530 |
317 | 340 | 494 | 719 | 737 | 677 | 786 | 672 | 216 | 702 |
35 | 770 | 480 | 557 | 74 | 52 | 632 | 765 | 753 | 73 |
194 | 77 | 226 | 764 | 173 | 979 | 454 | 106 | 967 | 551 |
556 | 644 | 739 | 547 | 973 | 796 | 525 | 573 | 920 | 28 |
290 | 708 | 298 | 56 | 288 | 891 | 867 | 579 | 417 | 5 |
Is it fast to contains: Search for "605" on a wall of 100 numbers
281 | 953 | 104 | 958 | 212 | 131 | 984 | 670 | 759 | 526 |
161 | 560 | 815 | 289 | 462 | 591 | 828 | 981 | 603 | 922 |
175 | 455 | 286 | 605 | 543 | 375 | 669 | 970 | 651 | 65 |
995 | 13 | 916 | 616 | 721 | 913 | 872 | 881 | 22 | 830 |
584 | 137 | 228 | 86 | 861 | 109 | 821 | 253 | 305 | 530 |
317 | 340 | 494 | 719 | 737 | 677 | 786 | 672 | 216 | 702 |
35 | 770 | 480 | 557 | 74 | 52 | 632 | 765 | 753 | 73 |
194 | 77 | 226 | 764 | 173 | 979 | 454 | 106 | 967 | 551 |
556 | 644 | 739 | 547 | 973 | 796 | 525 | 573 | 920 | 28 |
290 | 708 | 298 | 56 | 288 | 891 | 867 | 579 | 417 | 5 |
Is it fast to contains: Search for "白" and "黒" on a wall of 1000 Kanji
須献資竹奈正賓方富秩苗移辞斤灯情腹縄典械迄淵執伐速寛藻具彰嗣謙役傾嘘於柵企娯李燃複主賂院雅設祥憾貌溝庁傲霊号酔旬停冖剤寡墳称遺原貧癶頻甲逮幕徒耂私素片署弧落兵坪感所農腐論眼式齢孝角略宛郎森好価認宅壁洲拐志反渋睡菓紋針縮銅蜂捨廾探確拷率夬干堂窃林鋭屮翔愚賀革月懸挑哀凵拙寝蒲以窟拒数骸頓障豕惨除体帽屏唄且狙伺膨製査用挨石釈雌投預爽繰品暑圏眺滴轄庭怠筒絞幌完築年憎莫高核煙机漬胎般擬岬販華斉遂晩社乃鈍鹿征傷縫苑沈獲重髄酌負党陸誠伊棟晶沼脱炎惚佐弊漠妖宿貿帥柴賦審飠緊散旺塊餅刺左後騰戯攵抗胸吊週症吟丸縁紳拳昇閲境披甫妨余錬謡厚井二痛渡匂白入盲淑廊攻萌蘇検使平鍛徹薪怒現笛腕賜康許舎押手矢霧悟連荘乏拠浩瓶遮員拶朱零濁辶成君囚鮮侮書柳貸鑑火恋悔朔樹偵求服廷謝尹泰坊贈去貴次救量剰慨逸制作哲駆裾血悠曹罒展対無迷福遊殊語俳端云外詩逆士忍避拭放得礁脈彑斜性飛婿底牲油職汽鬼弘隻導局惰将束粘毎妥観神妬加慮腺訟諸残痢必熊侯相根壱訴胡郷慢都系第向勧光著秀喩挿筆擦益肥籍履循勝溶析就巨渚蜜依塔咲溺席含拝慶案返揺礎紙望囗赴雑畏惧盛徐協旗爪競澄倣宜心泥止誓嚇育羅冫鼻匕吐皇到穴庄頼堀滅豆篤裏河棺甘紀省虜醸労恒把浜侍途距足迅辣呪憂亭免控還暁子恵日矛描疒互右排駒盗程玄拘畿虐咽人稼軽満営応閑聴道枢文混拾由国柔箇鉱隆別丁襲邦紫灬枠肺器勉乳菊税口整繕状呂玉恭液我塩窯仁磯脅復嘱侵悲袋親捉計杯孟磨辛挙裸胃能刷柏他誘驚那粉灰雨巣央表泡沿姻腫扉藩妻掛褐可栽藍湯叶危息殺与演弦掃橋夏当景怪合覇癖裂釜頃衤也射紡懇通彦疫崇彡層集賠換適吹忙仙漂読栓薬抜隈癒同花臨否線寄常紅蒼出箸町凝苛編恨公裁瑠欲羊刂派奪報瘍廴画或時穫色伯効討併改腸木麓秘喚宝視汚死顧漫髪保威議喪戸一然引幸摩監斐単雇羨冶汎哉脊間股評訓縦祖領待星倫腰宙柱東登弟英貝矯暇痩章医前況達治其削蚊卓陽司洋鯉凸膚句奔頂元衰殻津解建乞幣男鬱戻琴宀下毛巾忄斥斎装謂附竜兼創熟翁添繁発憧岡露殳凹説歓嫁存眠迎玩県克管悼幽艇漱決違触衆為厳理璃稽請列膝航倒条校帆屯俵棄冥亦刑組村蘭鼓門脂卵刈昆奥敷父吉丑欄塞禍低傘僅要這精飼較慌洪翌多輩追婦持鐘経漆衣難型敬母鋼渇肖猛峡俺振茨遥夕冗舛糧叔州屋覚隔誕抵犠穀隶弁損係活臓娘綱姉抑区利叱陰部叩傍亜伎両版韋夢衡基美曽民焦滑阜肪蔑記安概付湾円艦舌訂務容卜瞭契注亻芽懲優操賞唐野暫飲筈銘静酵牧扱雀調流飢冒覧予尻馴喝弄粧雷質踊便嫡慈悩褒冠棒殿着笠剛旦芋啓供警毒歩健塁証是浦
Is it fast to contains: Search for "白" and "黒" on a wall of 1000 Kanji
須献資竹奈正賓方富秩苗移辞斤灯情腹縄典械迄淵執伐速寛藻具彰嗣謙役傾嘘於柵企娯李燃複主賂院雅設祥憾貌溝庁傲霊号酔旬停冖剤寡墳称遺原貧癶頻甲逮幕徒耂私素片署弧落兵坪感所農腐論眼式齢孝角略宛郎森好価認宅壁洲拐志反渋睡菓紋針縮銅蜂捨廾探確拷率夬干堂窃林鋭屮翔愚賀革月懸挑哀凵拙寝蒲以窟拒数骸頓障豕惨除体帽屏唄且狙伺膨製査用挨石釈雌投預爽繰品暑圏眺滴轄庭怠筒絞幌完築年憎莫高核煙机漬胎般擬岬販華斉遂晩社乃鈍鹿征傷縫苑沈獲重髄酌負党陸誠伊棟晶沼脱炎惚佐弊漠妖宿貿帥柴賦審飠緊散旺塊餅刺左後騰戯攵抗胸吊週症吟丸縁紳拳昇閲境披甫妨余錬謡厚井二痛渡匂白入盲淑廊攻萌蘇検使平鍛徹薪怒現笛腕賜康許舎押手矢霧悟連荘乏拠浩瓶遮員拶朱零濁辶成君囚鮮侮書柳貸鑑火恋悔朔樹偵求服廷謝尹泰坊贈去貴次救量剰慨逸制作哲駆裾血悠曹罒展対無迷福遊殊語俳端云外詩逆士忍避拭放得礁脈彑斜性飛婿底牲油職汽鬼弘隻導局惰将束粘毎妥観神妬加慮腺訟諸残痢必熊侯相根壱訴胡郷慢都系第向勧光著秀喩挿筆擦益肥籍履循勝溶析就巨渚蜜依塔咲溺席含拝慶案返揺礎紙望囗赴雑畏惧盛徐協旗爪競澄倣宜心泥止誓嚇育羅冫鼻匕吐皇到穴庄頼堀滅豆篤裏河棺甘紀省虜醸労恒把浜侍途距足迅辣呪憂亭免控還暁子恵日矛描疒互右排駒盗程玄拘畿虐咽人稼軽満営応閑聴道枢文混拾由国柔箇鉱隆別丁襲邦紫灬枠肺器勉乳菊税口整繕状呂玉恭液我塩窯仁磯脅復嘱侵悲袋親捉計杯孟磨辛挙裸胃能刷柏他誘驚那粉灰雨巣央表泡沿姻腫扉藩妻掛褐可栽藍湯叶危息殺与演弦掃橋夏当景怪合覇癖裂釜頃衤也射紡懇通彦疫崇彡層集賠換適吹忙仙漂読栓薬抜隈癒同花臨否線寄常紅蒼出箸町凝苛編恨公裁瑠欲羊刂派奪報瘍廴画或時穫色伯効討併改腸木麓秘喚宝視汚死顧漫髪保威議喪戸一然引幸摩監斐単雇羨冶汎哉脊間股評訓縦祖領待星倫腰宙柱東登弟英貝矯暇痩章医前況達治其削蚊卓陽司洋鯉凸膚句奔頂元衰殻津解建乞幣男鬱戻琴宀下毛巾忄斥斎装謂附竜兼創熟翁添繁発憧岡露殳凹説歓嫁存眠迎玩県克管悼幽艇漱決違触衆為厳理璃稽請列膝航倒条校帆屯俵棄冥亦刑組村蘭鼓門脂卵刈昆奥敷父吉丑欄塞禍低傘僅要這精飼較慌洪翌多輩追婦持鐘経漆衣難型敬母鋼渇肖猛峡俺振茨遥夕冗舛糧叔州屋覚隔誕抵犠穀隶弁損係活臓娘綱姉抑区利叱陰部叩傍亜伎両版韋夢衡基美曽民焦滑阜肪蔑記安概付湾円艦舌訂務容卜瞭契注亻芽懲優操賞唐野暫飲筈銘静酵牧扱雀調流飢冒覧予尻馴喝弄粧雷質踊便嫡慈悩褒冠棒殿着笠剛旦芋啓供警毒歩健塁証是浦
WriteItOnTheWallSet
WriteItOnTheWallSet had fast adds (Θ(1)), but slow contains (Θ(N), where N is the number of elements)
How can we make this faster?
BobaCounterSet
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
When stuck on a hard problem, get boba
TPT has this interesting device to help organize their boba
BobaCounterSet
Let's try to formalize this with a BobaCounter Set
We will have a wall split into 10 "bins", and a pencil.
Two questions:
Is it fast to add: Adding "5" to a wall of 100 numbers
560 | 920 | 161 | 131 | 212 | 922 | 603 | 73 | 194 | 644 |
670 | 830 | 281 | 651 | 22 | 672 | 953 | 753 | 104 | 984 |
530 | 770 | 591 | 981 | 462 | 702 | 13 | 253 | 764 | 494 |
970 | 290 | 721 | 891 | 52 | 632 | 173 | 973 | 584 | 74 |
480 | 340 | 551 | 821 | 872 | | 573 | 913 | 454 | |
| | 861 | 881 | | | 543 | | | |
765 | 375 | 216 | 556 | 737 | 677 | 288 | 828 | 109 | 669 |
35 | 305 | 796 | 786 | 137 | 77 | 708 | 298 | 719 | 979 |
65 | 995 | 226 | 86 | 557 | 967 | 228 | 28 | 739 | 579 |
815 | 175 | 526 | 616 | 417 | 867 | 958 | | 289 | 759 |
605 | 455 | 106 | 286 | 317 | 547 | | | | |
525 | | 916 | 56 | | | | | | |
Is it fast to add: Adding "5" to a wall of 100 numbers
560 | 920 | 161 | 131 | 212 | 922 | 603 | 73 | 194 | 644 |
670 | 830 | 281 | 651 | 22 | 672 | 953 | 753 | 104 | 984 |
530 | 770 | 591 | 981 | 462 | 702 | 13 | 253 | 764 | 494 |
970 | 290 | 721 | 891 | 52 | 632 | 173 | 973 | 584 | 74 |
480 | 340 | 551 | 821 | 872 | | 573 | 913 | 454 | |
| | 861 | 881 | | | 543 | | | |
765 | 375 | 216 | 556 | 737 | 677 | 288 | 828 | 109 | 669 |
35 | 305 | 796 | 786 | 137 | 77 | 708 | 298 | 719 | 979 |
65 | 995 | 226 | 86 | 557 | 967 | 228 | 28 | 739 | 579 |
815 | 175 | 526 | 616 | 417 | 867 | 958 | | 289 | 759 |
605 | 455 | 106 | 286 | 317 | 547 | | | | |
525 | 5 | 916 | 56 | | | | | | |
Is it fast to contains: Search for "605" on a wall of 100 numbers
560 | 920 | 161 | 131 | 212 | 922 | 603 | 73 | 194 | 644 |
670 | 830 | 281 | 651 | 22 | 672 | 953 | 753 | 104 | 984 |
530 | 770 | 591 | 981 | 462 | 702 | 13 | 253 | 764 | 494 |
970 | 290 | 721 | 891 | 52 | 632 | 173 | 973 | 584 | 74 |
480 | 340 | 551 | 821 | 872 | | 573 | 913 | 454 | |
| | 861 | 881 | | | 543 | | | |
765 | 375 | 216 | 556 | 737 | 677 | 288 | 828 | 109 | 669 |
35 | 305 | 796 | 786 | 137 | 77 | 708 | 298 | 719 | 979 |
65 | 995 | 226 | 86 | 557 | 967 | 228 | 28 | 739 | 579 |
815 | 175 | 526 | 616 | 417 | 867 | 958 | | 289 | 759 |
605 | 455 | 106 | 286 | 317 | 547 | | | | |
525 | 5 | 916 | 56 | | | | | | |
Is it fast to contains: Search for "605" on a wall of 100 numbers
560 | 920 | 161 | 131 | 212 | 922 | 603 | 73 | 194 | 644 |
670 | 830 | 281 | 651 | 22 | 672 | 953 | 753 | 104 | 984 |
530 | 770 | 591 | 981 | 462 | 702 | 13 | 253 | 764 | 494 |
970 | 290 | 721 | 891 | 52 | 632 | 173 | 973 | 584 | 74 |
480 | 340 | 551 | 821 | 872 | | 573 | 913 | 454 | |
| | 861 | 881 | | | 543 | | | |
765 | 375 | 216 | 556 | 737 | 677 | 288 | 828 | 109 | 669 |
35 | 305 | 796 | 786 | 137 | 77 | 708 | 298 | 719 | 979 |
65 | 995 | 226 | 86 | 557 | 967 | 228 | 28 | 739 | 579 |
815 | 175 | 526 | 616 | 417 | 867 | 958 | | 289 | 759 |
605 | 455 | 106 | 286 | 317 | 547 | | | | |
525 | 5 | 916 | 56 | | | | | | |
BobaCounterSet
BobaCounterSet still has equally fast adds, and contains is now only as slow as the wall segment with the most elements
Any problems with this approach?
DynamicArrayOf ListsSet
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
How to minimize wasted space?
Instead of assigning the same amount of wall space per section, dynamically increase the size of each section as items get added there
Easiest solution here is to use Linked Lists
0
1
2
3
4
5
6
7
8
9
3
13
4
5
6
7
8
9
10
11
12
14
15
How to handle large numbers of items?
Let N = number of items in all bins, M = number of bins
If we assume that values are evenly distributed, each bin has about N/M items
So contains runs in Θ(N/M) time.
If M is constant, that reduces to Θ(N).
Is there a common mathematical function that generalizes�"last digit"?
0
1
2
3
4
5
6
7
8
9
3
13
4
5
6
7
8
9
10
11
12
14
15
Reduction Functions
Easiest solution is the "modulus" operator.
Other reductions are possible, e.g. number of digits. But modulus is the most natural and best reduction function.
0
1
2
3
4
5
6
7
8
9
88
0
10
3719
1034854400
% 10
0
reduce
index
4178
44
9
1034854400
Each integer gets reduced into an index.
Increasing M
To keep constant time contains, we need N/M to stay less than some constant k.
Two approaches:
0
1
2
3
4
5
6
7
8
9
88
0
10
3719
1034854400
% 10
0
reduce
index
4178
44
9
1034854400
Each integer gets reduced into an index.
Increasing M
How much to increase M?
0
1
2
3
4
5
6
7
8
9
88
0
10
3719
1034854400
% 10
0
reduce
index
4178
44
9
1034854400
Each integer gets reduced into an index.
DynamicArrayOfListsSet Resizing Example
Suppose we set a rule that when N/M is ≥ 1.5, we double M.
0
1
2
3
N = 0 M = 4 N / M = 0
DynamicArrayOfListsSet Resizing Example
Suppose we set a rule that when N/M is ≥ 1.5, we double M.
0
1
2
3
N = 1 M = 4 N / M = 0.25
7
DynamicArrayOfListsSet Resizing Example
Suppose we set a rule that when N/M is ≥ 1.5, we double M.
0
1
2
3
N = 2 M = 4 N / M = 0.5
7
16
DynamicArrayOfListsSet Resizing Example
Suppose we set a rule that when N/M is ≥ 1.5, we double M.
0
1
2
3
N = 3 M = 4 N / M = 0.75
7
16
3
DynamicArrayOfListsSet Resizing Example
Suppose we set a rule that when N/M is ≥ 1.5, we double M.
0
1
2
3
N = 4 M = 4 N / M = 1
7
16
3
11
DynamicArrayOfListsSet Resizing Example
Suppose we set a rule that when N/M is ≥ 1.5, we double M.
0
1
2
3
N = 5 M = 4 N / M = 1.25
7
16
3
11
20
DynamicArrayOfListsSet Resizing Example
Suppose we set a rule that when N/M is ≥ 1.5, we double M.
0
1
2
3
N = 6 M = 4 N / M = 1.5
N/M is too large. Time to double!
7
16
3
11
20
13
DynamicArrayOfListsSet Resizing Example
When N/M is ≥ 1.5, then double M.
0
1
2
3
N = 6 M = 4 N / M = 1.5
N/M is too large. Time to double!
?
?
?
0
1
2
3
4
5
6
7
?
?
?
?
?
7
16
11
20
13
3
DynamicArrayOfListsSet Resizing Example
When N/M is ≥ 1.5, then double M.
0
1
2
3
N = 6 M = 4 N / M = 1.5
N/M is too large. Time to double!
0
1
2
3
4
5
6
7
N = 6 M = 8 N / M = 0.75
7
16
11
20
13
3
16
3
11
20
13
7
DynamicArrayOfListsSet
The data structure we just built might be called a DynamicArrayOfListsSet.
If we have N items that are evenly distributed, length of each list is ~N/M.
We’ll think more carefully about runtime later.
1034854400
% 6
2
reduce
index
0
1
2
3
4
5
146
0
6
2133
DynamicArrayOfListsSet
103485440
46
9
Each integer gets reduced into an index.
lowercase strings
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
Goal: Storing Strings
The data structure we have so far is great for storing integers.
Storing the Word cat
Suppose we want to add(“cat”)
The key question:
What about after resize?
What are some issues with this approach?
Storing the Word cat (your answer)
Suppose we want to add(“cat”)
The key question:
What are some issues with this approach?
Storing the Word cat (my answer)
Suppose we want to add(“cat”)
The key question:
What are some issues with this approach?
Design Philosophy: Stringy stuff should be done in the String class
The big problem with the previous approach was that Set was responsible for figuring out how to categorize Strings
At the same time, String shouldn't be able to dictate when Set decides to resize
Solution: Set was most flexible when working on ints, so make it so that Set only works on ints.
Define a method f (in String) to convert Strings into an int, and store String s in the bin corresponding to f(s).
Finding a Way to Store the Word cat
Suppose we want to add(“cat”)
The key question:
What is another idea? Assume for now we’re dealing with only lower case letters in English.
Finding a Way to Store the Word cat (Your Answer)
Suppose we want to add(“cat”)
The key question:
What is another idea? Assume for now we’re dealing with only lower case letters in English.
Finding a Way to Store the Word cat (My Answer)
Suppose we want to add(“cat”)
The key question:
What is another idea? Assume for now we’re dealing with only lower case letters in English.
Treating cat as a Base 26 Number
Use all digits by multiplying each by a power of 26.
Why this specific pattern?
The Decimal Number System vs. My System for Strings
In the decimal number system, we have 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Want numbers larger than 9? Use a sequence of digits.
Example: 7091 in base 10
Our system for strings is almost the same, but with letters.
Test Your Understanding
Convert the word “bee” into a number by using our “powers of 26” strategy.
Reminder: cat26 = (3 x 262) + (1 x 261) + (20 x 260) = 207410
Hint: ‘b’ is letter 2, and ‘e’ is letter 5.
Test Your Understanding
Convert the word “bee” into a number by using our “powers of 26” strategy.
Reminder: cat26 = (3 x 262) + (1 x 261) + (20 x 260) = 207410
Hint: ‘b’ is letter 2, and ‘e’ is letter 5.
Uniqueness
As long as we pick a base ≥ 26, this algorithm is guaranteed to give each lowercase English word a unique number!
The Hash Table
We’ve now extended our DynamicArrayOfLinkedLists to handle strings.
0
1
2
3
4
5
6
cats
map
go
fish
cat
lowerCaseToInt
2074
% 7
1
data
integer
Integerization Function
reduce
index
DynamicArrayOfLinkedLists
horse
cat
ball
sad
Implementing englishToInt (optional)
Optional exercise: Try to write a function englishToInt that can convert English strings to integers by adding characters scaled by powers of 26.
Examples:
Implementing englishToInt (optional) (solution)
/** Converts ith character of String to a letter number.� * e.g. 'a' -> 1, 'b' -> 2, 'z' -> 26 */
public static int letterNum(String s, int i) {
int ithChar = s.charAt(i);
if ((ithChar < 'a') || (ithChar > 'z'))
{ throw new IllegalArgumentException(); }
return ithChar - 'a' + 1;
}
public static int englishToInt(String s) {
int intRep = 0;
for (int i = 0; i < s.length(); i += 1) {
intRep = intRep * 26;
intRep = intRep + letterNum(s, i);
}
return intRep;
}
Integer Overflow
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
DataIndexedStringSet
Using only lowercase English characters is too restrictive.
Suppose we wanted a unique integer for each possible such string.
Someone has already done this.
ASCII Characters
The most basic character set used by most computers is ASCII format.
biggest value is 126
DataIndexedStringSet
Maximum possible value for english-only text including punctuation is 126, so can use 126 as our base in order to ensure unique values for all possible strings.
Examples:
=101,809,233
= 203,178,213
Implementing asciiToInt
Below is a simple formula which converts a String to an integer.
What if we want to use characters beyond ASCII?
public static int asciiToInt(String s) {
int intRep = 0;
for (int i = 0; i < s.length(); i += 1) {
intRep = intRep * 126;
intRep = intRep + s.charAt(i);
}
return intRep;
}
Going Beyond ASCII
chars in Java also support character sets for other languages and symbols.
Example: Computing Unique Representations of Kanji
The largest possible value for Kanji is 40,959*, so we’d need to use this as our base if we want to have a unique representation for all possible strings of Kanji.
Example:
横田誠号
横田誠司
横田誠叹
1,867,571,481,361,683,549
1,867,571,481,361,683,550
1,867,571,481,361,683,551
...
...
*If you’re curious, the last character is:
...
...
Hash Codes
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
Finitely Many Integers
So far, we’ve tried to map any possible string to a unique integer.
That is, we tried to map 横田誠司 as a base 40959 number, yielding 1,867,571,481,361,683,550, but this number doesn’t exist in Java as an int.
�Note: Other programming languages do not have finitely many integers. Python, for example, allows an integer to take on any value.
What Happens in Practice: Integer Overflow
In Java, the largest possible integer is 2,147,483,647.
int x = 2147483647;
System.out.println(x);
System.out.println(x + 1);
jug ~/Dropbox/61b/lec/hashing
$ javac BiggestPlusOne.java
$ java BiggestPlusOne
2147483647
-2147483648
Consequence of Overflow
Because Java has a maximum integer, we won’t get the numbers we expect!
Hash Codes
The official term for the number we’re computing is “hash code”.
That is, our integerization function is a “hash code” because the set we’re projected onto is fixed.
Java Uses Base 31
Because the range of our hashCode is finite, it is impossible to pursue unique factorizations for each String.
Instead of base 40,959 or something larger, Java uses 31.
You can verify this by trying the code below in Java:
System.out.println("横田誠司".hashCode());
Java Uses Base 31
Because the range of our hashCode is finite, it is impossible to pursue unique factorizations for each String.
Instead of base 40,959 or something larger, Java uses 31.
Of course there are infinitely many other strings that also map to 839,611,422.
System.out.println("横田誠司".hashCode());
System.out.println("±EreWn".hashCode());
The Hash Table
What we’ve just created here is called a hash table.
0
1
2
3
4
5
6
7
8
9
bee
抱抱
están
抱抱
hashCode()
1034854400
% 10
0
data
hash code
hash function
reduce
index
hash table
dog
الطبيعة
शानदार
포옹
In Java there’s a caveat here. Will revisit later.
守门员
kao
peyrin
justin
yokota
横田誠司
The Hash Table
Note, there are other versions of hash tables out there.
0
1
2
3
4
5
6
7
8
9
bee
抱抱
están
抱抱
hashCode()
1034854400
% 10
0
data
hash code
hash function
reduce
index
hash table
dog
الطبيعة
शानदार
포옹
In Java there’s a caveat here. Will revisit later.
守门员
kao
peyrin
justin
yokota
横田誠司
Hash Tables in Java
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
The Ubiquity of Hash Tables
Hash tables are the most popular implementation for sets and maps.
In Java, implemented as java.util.HashMap and java.util.HashSet.
Object Methods
All classes are hyponyms of Object.
Default implementation of hashCode returns memory address.
Won’t discuss or use in 61B.
From earlier in class.
This is where Java implements hash codes.
Hash Codes in Java
Java’s actual hashCode function for Strings below (code cleaned up slightly):
That is, the two calls below both return 839,611,422.
public int hashCode(String s) {
int intRep = 0;
for (int i = 0; i < s.length(); i += 1) {
intRep = intRep * 31;
intRep = intRep + s.charAt(i);
}
return intRep;
}
More examples of Real Java HashCodes for Strings
System.out.println("a".hashCode());
System.out.println("bee".hashCode());
System.out.println("포옹".hashCode());
System.out.println("kamala lifefully".hashCode());
System.out.println("đậu hũ".hashCode());
jug ~/Dropbox/61b/lec/hashing
$ java JavaHashCodeExamples
97
97410
1732557
1732557
-2108180664
"a"
"bee"
"포옹"
"kamala lifefully"
"đậu hũ"
Using Negative hash codes: yellkey.com/above
Suppose that ‘s hash code is -1.
0
1
2
3
Using Negative hash codes
Suppose that ‘s hash code is -1.
0
1
2
3
-1
Using Negative hash codes in Java
Suppose that ‘s hash code is -1.
0
1
2
3
public class ModTest {
public static void main(String[] args) {
System.out.println(-1 % 4);
System.out.println(Math.floorMod(-1, 4));
}
}
$ java ModTest
-1
3
Hash Tables in Java
Java hash tables:
포옹
a
0
1
2
3
đậu hũ
đậu hũ
hashCode()
-2108180664
Math.floorMod(x, 4)
0
data
hash code
hash function
reduce
index
kamala lifefully
bee
Two Important Warnings When Using HashMaps/HashSets
Warning #1: Never store objects that can change in a HashSet or HashMap!
Warning #2: Never override equals without also overriding hashCode.
We’ll come back to these warnings later.
Hash Table Performance and Summary
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
Hash Table Runtime with No Resizing
Suppose we have:
Average list is around N/M items
Even if items are spread out evenly, lists are of length Q = N/M.
0
1
2
3
4
N = 19 M = 5 N / M = 3.8
Resizing Hash Table Runtime
Suppose we have:
As long as M = Θ(N), then O(N/M) = O(1).
Assuming items are evenly distributed (as above), lists will be approximately N/M items long, resulting in Θ(N/M) runtimes.
0
1
2
3
4
N = 19 M = 5 N / M = 3.8
Regarding Even Distribution
Even distribution of item is critical for good hash table performance.
Will need to discuss how to ensure even distribution.
x
x
Hash Tables in Java
Hash tables:
đậu hũ
hashCode()
-2108180664
Math.floorMod(x, 4)
0
data
hash code
hash function
reduce
index
*: Amortized.
†: Assuming items are evenly spread.
| contains(x) | add(x) |
Bushy BSTs | Θ(log N) | Θ(log N) |
Separate Chaining Hash Table With No Resizing | Θ(N) | Θ(N) |
… With Resizing | Θ(1)† | Θ(1)*† |
Creating a Good Hash Code (extra)
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
What Makes a good .hashCode()?
Goal: We want hash tables that look like the table on the right.
Hashbrowns and Hash Codes
How do you make hashbrowns?
Can think of multiplying data by powers of some base as ensuring that all the data gets scrambled together into a seemingly random integer.
Example hashCode Function
The Java 8 hash code for strings. Two major differences from our hash codes:
@Override
public int hashCode() {
int h = cachedHashValue;
if (h == 0 && this.length() > 0) {
for (int i = 0; i < this.length(); i++) {
h = 31 * h + this.charAt(i);
}
cachedHashValue = h;
}
return h;
}
Example: Choosing a Base
Java’s hashCode() function for Strings:
Our asciiToInt function for Strings:
Which is better?
Example: Base 126
Major collision problem:
Any string that ends in the same last 32 characters has the same hash code.
Typical Base
A typical hash code base is a small prime.
A full treatment of good hash codes is well beyond the scope of our class.
Hashbrowns and Hash Codes
How do you make hashbrowns?
Using a prime base yields better “randomness” than using something like base 126.
Example: Hashing a Collection
Lists are a lot like strings: Collection of items each with its own hashCode:
To save time hashing: Look at only first few items.
@Override
public int hashCode() {
int hashCode = 1;
for (Object o : this) {
hashCode = hashCode * 31;
hashCode = hashCode + o.hashCode();
}
return hashCode;
}
elevate/smear the current hash code
add new item’s hash code
Example: Hashing a Recursive Data Structure
Computation of the hashCode of a recursive data structure involves recursive computation.
@Override
public int hashCode() {
if (this.value == null) {
return 0;
}
return this.value.hashCode() +
31 * this.left.hashCode() +
31 * 31 * this.right.hashCode();
}
Linear Probing (extra)
Lecture 19, CS61B, Spring 2025
Motivation, Set Implementations
Deriving Hash Tables
Hash Tables in Java
Hash Table Performance and Summary
Creating a Good Hash Code (extra)
Linear Probing (extra)
Open Addressing: An Alternate Disambiguation Strategy (Extra)
Instead of using linked lists, an alternate and more exotic strategy is “open addressing”.
If target location is already occupied, use a different location, e.g.
In 61B, we’ll use the “separate chaining” approach, where we have linked lists.
Citations
FAQ
What is the distinction between hash set, hash map, and hash table?
A hash set is an implementation of the Set ADT using the “hash table” as its engine.
A hash map is an implementation of the Map ADT using the “hash table” as its engine.
A “hash table” is a way of storing information, where you have M buckets that store N items. Each item has a “hashCode” that tells you which of M buckets to put that item in.