"NoSQL": Apache Cassandra
An introduction for a geospatial audience
Quote
"You know you have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done"
- Leslie Lamport
Overview
What is Apache Cassandra?
1966: IBM invents IMS for the Saturn V moon rocket
1970
1975: The Maintenance of Duplicate Databases, Johnson and Thomas
2006
2007
2008
2012
What is Apache Cassandra?
Types of "NoSQL"
What is Apache Cassandra?
Write anywhere
(ALL, QUORUM, ONE)
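The three levels trade consistency against availability: each one says how many replicas must acknowledge an operation before it counts as done. The following is a minimal sketch of that arithmetic, not Cassandra's own code. The class and method names are hypothetical, and the weakest level is spelled ONE here, as Cassandra names it.

```java
// Hypothetical helper for illustration; not part of any Cassandra API.
final class ConsistencySketch {

    /** Number of replicas that must acknowledge an operation at the given level. */
    static int requiredReplicas(String level, int replicationFactor) {
        switch (level) {
            case "ONE":    return 1;
            case "QUORUM": return replicationFactor / 2 + 1; // strict majority
            case "ALL":    return replicationFactor;
            default: throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    /** A read is guaranteed to overlap the latest acknowledged write when R + W > N. */
    static boolean readsSeeLatestWrite(String readLevel, String writeLevel, int replicationFactor) {
        int r = requiredReplicas(readLevel, replicationFactor);
        int w = requiredReplicas(writeLevel, replicationFactor);
        return r + w > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3; // replication factor chosen for the example
        System.out.println(requiredReplicas("QUORUM", rf));              // 2
        System.out.println(readsSeeLatestWrite("QUORUM", "QUORUM", rf)); // true
        System.out.println(readsSeeLatestWrite("ONE", "ONE", rf));       // false
    }
}
```

With a replication factor of 3, QUORUM reads and writes always share at least one replica, while ONE/ONE gives the fastest response but may return stale data.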
Raison d'être
"Starbucks Does Not Use Two-Phase Commit"
- Gregor Hohpe (2004)
Coffee ready!
One coffee please
$2 please
Great!
Raison d'être
CAP
Databases versus CAP
C + P: Neo4J, Google Bigtable, MongoDB, HBase, Hypertable, Redis
C + A: "ACID", MySQL, SQL Server, Postgres
A + P: DynamoDB, Cassandra, Voldemort, CouchDB, Riak
Postgres (Oracle?) versus CAP
Consistent
Available*
Partition-Tolerant
"Cross your fingers and hope it works"
Google Bigtable versus CAP
Consistent
Available*
Partition-Tolerant
"Come back when it works"
Dynamo og Cassandra versus CAP
Consistent
Available*
Partition-Tolerant
"Thank you for shopping with us"
CAP: Example "Postgres"
Client
Server
Replica
Write
Error
CAP: Example "Cassandra"
Client
Server
Replica
Write
OK
Inconsistent
Data model
Have you ever written something like this in Java?
... = new HashMap<byte[], HashMap<byte[], HashMap<byte[], byte[]>>>();
Data model
Or maybe this in C#?
... = new Dictionary<byte[], Dictionary<byte[], Dictionary<byte[], byte[]>>>();
Then you already understand Cassandra's data model!
Data model: Building blocks
Column: ( byte[] name, byte[] value, IClock clock )
Super column: ( byte[] name, Map<byte[], IColumn> cols )
Row: ( byte[] key, Map<byte[], IColumn> cols )
Column Family: ( byte[] name, Map<byte[], Row> rows )
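The building blocks above can be sketched as nested sorted maps. This is a minimal illustration under stated assumptions, not Cassandra's implementation: all class and method names are hypothetical, a plain long timestamp stands in for IClock, and String replaces byte[] because Java arrays compare by identity and therefore do not behave as map keys. Super columns are left out to keep it short.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical types for illustration only.
final class DataModelSketch {

    /** A column: name, value and the clock (here a timestamp) of the write. */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // column family name -> row key -> column name -> column
    private final Map<String, Map<String, Map<String, Column>>> families = new TreeMap<>();

    /** Writes a column; if one already exists, the newer timestamp wins. */
    void put(String family, String rowKey, Column column) {
        families.computeIfAbsent(family, f -> new TreeMap<>())
                .computeIfAbsent(rowKey, r -> new TreeMap<>())
                .merge(column.name, column,
                       (old, neu) -> neu.timestamp >= old.timestamp ? neu : old);
    }

    /** Returns the column value, or null when the row or column does not exist. */
    String get(String family, String rowKey, String columnName) {
        Column c = families.getOrDefault(family, Map.of())
                           .getOrDefault(rowKey, Map.of())
                           .get(columnName);
        return c == null ? null : c.value;
    }

    public static void main(String[] args) {
        DataModelSketch db = new DataModelSketch();
        db.put("players", "lola", new Column("xp", "0", 1L));
        db.put("players", "lola", new Column("xp", "10", 3L));
        db.put("players", "lola", new Column("xp", "5", 2L)); // stale write, ignored
        System.out.println(db.get("players", "lola", "xp"));  // 10
    }
}
```

Rows in the same column family do not need the same columns, which is why a nested map describes the model better than a fixed table schema does.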
Data model: Cassandra vs. Postgres
*) Loosely resembles, i.e. "at the same level"
Cassandra | PostgreSQL |
Column | Column* |
(Super) Column | Column (hstore)* |
Row | Row* |
Column Family | Table* |
Keyspace | Database* |
Data model: Data types
http://www.datastax.com/docs/1.0/ddl/column_family
Data model: Indexing geodata
The do-it-yourself solution (SimpleGeo)
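One common do-it-yourself approach is to turn a coordinate into a sortable string and use it, or a prefix of it, as the row key. The sketch below uses the public geohash encoding for that purpose. Whether this matches SimpleGeo's actual design is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical helper for illustration; geohash is one possible spatial key, chosen as an assumption.
final class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a WGS84 coordinate as a geohash string of the given length. */
    static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonTurn = true; // bits alternate between longitude and latitude, longitude first
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; }
                else            { value = value << 1;       lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else            { value = value << 1;       latHi = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(57.64911, 10.40744, 5)); // u4pru
    }
}
```

Points that share a geohash prefix lie in the same cell, so a bounding-box query becomes a lookup of a handful of prefixes. Points close to a cell border can still end up with different prefixes, which a real index has to handle by also querying neighbouring cells.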
Hands-on Apache Cassandra
$ bin/cqlsh localhost 9160
Connected to Test Cluster at localhost:9160.
[cqlsh 2.0.0 | Cassandra 1.0.8 | CQL spec 2.0.0 | Thrift protocol 19.20.0] Use HELP for help.
cqlsh>
Hands-on Apache Cassandra
cqlsh> CREATE KEYSPACE wkt_craft WITH
... strategy_class = 'SimpleStrategy'
... AND strategy_options:replication_factor = 1;
cqlsh>
Hands-on Apache Cassandra
cqlsh> USE wkt_craft;
cqlsh> CREATE COLUMNFAMILY players (
... KEY varchar PRIMARY KEY,
... xp bigint);
cqlsh>
Hands-on Apache Cassandra
cqlsh> INSERT INTO players (KEY, xp) VALUES ('kostas', 0);
cqlsh> INSERT INTO players (KEY, xp) VALUES ('lola', 0);
cqlsh> SELECT * from players;
KEY | xp
--------+----
lola | 0
kostas | 0
Hands-on Apache Cassandra
cqlsh> SELECT * from players where xp=0;
Bad Request: No indexed columns present in by-columns clause with "equals" operator
cqlsh> CREATE INDEX xp_key ON players(xp);
cqlsh> SELECT * from players where xp=0;
KEY | xp
--------+----
lola | 0
kostas | 0
Hands-on Apache Cassandra
cqlsh> UPDATE players USING consistency all SET 'x' = 0, 'y' = 0 WHERE key='lola';
cqlsh> SELECT * from players;
KEY,lola | x,0 | xp,0 | y,0
KEY,kostas | xp,0
Replication
Token ring
Data placement is determined by a partitioner, the replication placement strategy, and several other factors
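A minimal sketch of how a token ring can map a key to its replicas. It assumes one token per node and a simple "next nodes clockwise" placement, and it takes the key's token as given, where a real partitioner would derive it by hashing the row key. All names are hypothetical and the real placement logic considers more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical ring for illustration; assumes exactly one token per node.
final class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node name

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** The first node whose token is >= keyToken, followed by the next nodes clockwise. */
    List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        Long token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey(); // past the highest token: wrap around the ring
        }
        int wanted = Math.min(replicationFactor, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(0L, "A");
        ring.addNode(100L, "B");
        ring.addNode(200L, "C");
        System.out.println(ring.replicasFor(150L, 2)); // [C, A]
    }
}
```

Adding a node only moves the keys between its token and the previous one, which is what makes this layout attractive for clusters that grow over time.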
Replication
P2P protocol
Exchange of Merkle trees
Hinted handoffs
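The idea behind exchanging Merkle trees is that two replicas can compare a few hashes instead of shipping all their data. The sketch below only compares root hashes. A real repair would walk down the tree to find which ranges differ. The class name, the row encoding as strings and the choice of SHA-256 are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not Cassandra's repair code.
final class MerkleSketch {

    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                md.update(part);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is available on every standard JVM
        }
    }

    /** Root hash of a binary hash tree built over the rows in the given order. */
    static byte[] root(List<String> rows) {
        List<byte[]> level = new ArrayList<>();
        for (String row : rows) {
            level.add(sha256(row.getBytes(StandardCharsets.UTF_8)));
        }
        if (level.isEmpty()) {
            return sha256(); // hash of no input
        }
        while (level.size() > 1) {
            List<byte[]> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // An unpaired node at the end of a level is combined with itself.
                byte[] right = i + 1 < level.size() ? level.get(i + 1) : left;
                parents.add(sha256(left, right));
            }
            level = parents;
        }
        return level.get(0);
    }

    /** Two replicas hold the same rows (in the same order) when their roots match. */
    static boolean inSync(List<String> replicaA, List<String> replicaB) {
        return Arrays.equals(root(replicaA), root(replicaB));
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("kostas:xp=0", "lola:xp=0");
        List<String> b = Arrays.asList("kostas:xp=0", "lola:xp=7");
        System.out.println(inSync(a, a)); // true
        System.out.println(inSync(a, b)); // false
    }
}
```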
Conclusion
Advantages:
Conclusion
Disadvantages:
Links