1 of 63

Python and MongoDB

2 of 63

Using PyMongo Modules :

The Python driver works with modules. You can treat these much as you treat the classes in the PHP driver. Each module within the PyMongo driver is responsible for a set of operations. There’s an individual module for each of the following tasks (and quite a few more): establishing connections, working with databases, leveraging collections, manipulating the cursor, working with the DBRef module, converting the Object ID, and running server-side JavaScript code.

3 of 63

Working with Documents in Python :

A document in the Python shell:

item = {
    "Type" : "Laptop",
    "ItemNumber" : "1234EXD",
    "Status" : "In use",
    "Location" : {
        "Department" : "Development",
        "Building" : "2B",
        "Floor" : 12,
        "Desk" : 120101,
        "Owner" : "Anderson, Thomas"
    },
    "Tags" : ["Laptop","Development","In Use"]
}

4 of 63

Connecting and Disconnecting :

Establishing a connection to the database requires that you first import the PyMongo driver into Python itself. This is an absolute prerequisite; otherwise, none of the modules will be loaded, and your code will fail.

To import the driver, type the following command in your shell:

>>> import pymongo

Once the driver has been loaded and is known to the Python shell, you can start loading the module you want to work with. The Connection module enables you to establish connections. Type the following statement in the shell to load the Connection module:

>>> from pymongo import Connection

5 of 63

Once your MongoDB service is up and running (this is mandatory if you wish to connect), you can establish a connection to the service by calling the Connection function. If no additional parameters are given, the function assumes that you want to connect to the service on localhost, using MongoDB's default port number of 27017. The following line establishes the connection:

>>> c = Connection()

You can see the connection coming in through the MongoDB service shell. Once you establish a connection, you can use the c variable to refer to the connection, just as you did in the shell with db and in PHP with $c. Next, select the database that you want to work with and store it in the db variable. You can do this just as you would in the MongoDB shell; in this example, you use the inventory database:

>>> db = c.inventory

6 of 63

>>> db

Database(Connection('localhost', 27017), u'inventory')

The output in the preceding example shows that you are connected to the localhost and that you are using the inventory database. Now that the database has been selected, you can select your MongoDB collection in exactly the same way. Because you have already stored the database in the db variable, you can use it to select the collection, which is called items in this case:

>>> collection = db.items

7 of 63

Inserting Data :

All that remains is to define the document by storing it in a dictionary. Let’s take the preceding example and insert that into the shell:

>>> item = {
...     "Type" : "Laptop",
...     "ItemNumber" : "1234EXD",
...     "Status" : "In use",
...     "Location" : {
...         "Department" : "Development",
...         "Building" : "2B",
...         "Floor" : 12,
...         "Desk" : 120101,
...         "Owner" : "Anderson, Thomas"
...     },
...     "Tags" : ["Laptop","Development","In Use"]
... }

Once you define the document, you can insert it using the same insert function that is available in the MongoDB shell:

>>> collection.insert(item)

ObjectId('4c57207b4abffe0e0c000000')

9 of 63

That's all there is to it: you define the document and insert it using the insert function. There's one more interesting trick you can take advantage of when inserting documents: inserting multiple documents at the same time. You can do this by specifying both documents in a single list, and then inserting that list. The call returns two Object IDs; pay careful attention to how the brackets are used in the following example:

>>> two = [{
...     "Type" : "Laptop",
...     "ItemNumber" : "2345FDX",
...     "Status" : "In use",
...     "Location" : {
...         "Department" : "Development",
...         "Building" : "2B",
...         "Floor" : 12,
...         "Desk" : 120102,
...         "Owner" : "Smith, Simon"
...     },
...     "Tags" : ["Laptop","Development","In Use"]
... },
... {
...     "Type" : "Laptop",
...     "ItemNumber" : "3456TFS",
...     "Status" : "In use",
...     "Location" : {
...         "Department" : "Development",
...         "Building" : "2B",
...         "Floor" : 12,
...         "Desk" : 120103,
...         "Owner" : "Walker, Jan"
...     },
...     "Tags" : ["Laptop","Development","In Use"]
... }]

>>> collection.insert(two)

[ObjectId('4c57234c4abffe0e0c000001'), ObjectId('4c57234c4abffe0e0c000002')]

12 of 63

Finding Your Data :

PyMongo provides two functions for finding your data: find_one(), which finds a single document in your collection that matches specified criteria; and find(), which can find multiple documents based on the supplied parameters (if you do not specify any parameters, find() returns all documents in the collection). Let’s look at some examples.

Finding a Single Document:

Use the find_one() function to find a single document.

>>> collection.find_one()

13 of 63

You can specify additional parameters to ensure that the first document returned matches your query:

>>> collection.find_one({"ItemNumber" : "3456TFS"})

Finding Multiple Documents:

You need to use the find() function to return more than a single document.

>>> for doc in collection.find():
...     doc
...

You can also pass a query document to find() so that only matching documents are returned:

>>> for doc in collection.find({"Tags" : "Laptop"}):
...     doc
...

14 of 63

Using Dot Notation

Dot notation is used to search for matching elements in an embedded object.

>>> for doc in collection.find({"Location.Department" : "Development"}):
...     doc
...

>>> for doc in collection.find({"Location.Owner" : "Walker, Jan"}):
...     doc
...
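How a dotted key is resolved can be sketched in plain Python, without a server. This is only an illustration of the idea, not the driver's or the server's implementation: the helper names get_by_path and matches are hypothetical, the sample documents are shortened versions of the ones inserted earlier, and arrays are not handled.

```python
def get_by_path(doc, dotted_key):
    """Follow a dotted key such as "Location.Owner" into nested dictionaries."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # the path does not exist in this document
        current = current[part]
    return current


def matches(doc, query):
    """Return True if every dotted key in the query equals the given value."""
    return all(get_by_path(doc, key) == value for key, value in query.items())


# Shortened sample documents (assumed for illustration)
items = [
    {"ItemNumber": "1234EXD",
     "Location": {"Department": "Development", "Owner": "Anderson, Thomas"}},
    {"ItemNumber": "3456TFS",
     "Location": {"Department": "Development", "Owner": "Walker, Jan"}},
]

found = [doc["ItemNumber"] for doc in items
         if matches(doc, {"Location.Owner": "Walker, Jan"})]
print(found)  # ['3456TFS']
```

Each part of the key between the dots names one level of nesting, which is why "Location.Owner" reaches the Owner key inside the embedded Location object.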

15 of 63

Returning Fields

If your documents are relatively large, and you do not want to return all key/value information stored in a document, you can include an additional parameter in the find() function to specify that only a certain set of fields needs to be returned. You do this by providing the field names as a second argument after the search criteria.

>>> for doc in collection.find({'Status' : 'In use'}, {'ItemNumber' : True, 'Location.Owner' : True}):
...     doc
...

This returns only the current owner's name, the item number, and the object ID.
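The effect of the field selection can be sketched in plain Python. This is a rough illustration under simplifying assumptions (inclusion only, no arrays), not how the server implements it; the helper name project and the sample _id value are hypothetical.

```python
def project(doc, fields):
    """Keep _id plus the requested (possibly dotted) fields of one document."""
    result = {}
    if "_id" in doc:
        result["_id"] = doc["_id"]  # _id is returned unless explicitly excluded
    for dotted_key in fields:
        source, target = doc, result
        parts = dotted_key.split(".")
        for part in parts[:-1]:
            if not isinstance(source.get(part), dict):
                break  # the embedded object does not exist
            source = source[part]
            target = target.setdefault(part, {})
        else:
            # Only reached when every intermediate level was found
            if parts[-1] in source:
                target[parts[-1]] = source[parts[-1]]
    return result


# Shortened sample document with a placeholder _id (assumed for illustration)
item = {
    "_id": 1,
    "ItemNumber": "1234EXD",
    "Status": "In use",
    "Location": {"Department": "Development", "Owner": "Anderson, Thomas"},
}

print(project(item, ["ItemNumber", "Location.Owner"]))
```

The embedded Location object keeps its shape in the result, but only the requested Owner key survives inside it.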

16 of 63

Simplifying Queries with Sort, Limit, and Skip

The sort(), limit(), and skip() functions make implementing your queries much easier.

Use the sort() function to sort the results by a specific key:

>>> for doc in collection.find({"Status" : "In use"},
...         {"ItemNumber" : True, "Location.Owner" : True}).sort("ItemNumber"):
...     doc
...

17 of 63

Use the limit() function to return only the ItemNumber from the first two items it finds in the collection:

>>> for doc in collection.find({}, {"ItemNumber" : True}).limit(2):
...     doc
...
{u'ItemNumber': u'1234EXD', u'_id': ObjectId('4c57207b4abffe0e0c000000')}
{u'ItemNumber': u'2345FDX', u'_id': ObjectId('4c57234c4abffe0e0c000001')}

Use the skip() function to skip a few items before returning a set of documents:

>>> for doc in collection.find({}, {"ItemNumber" : True}).skip(2):
...     doc
...
{u'ItemNumber': u'3456TFS', u'_id': ObjectId('4c57234c4abffe0e0c000002')}

18 of 63

You can also combine the three functions to select only a certain number of the items found:

>>> for doc in collection.find({'Status' : 'In use'},
...         {'ItemNumber' : True, 'Location.Owner' : True}).limit(2).skip(1).sort("ItemNumber"):
...     doc
...
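When the three functions are combined, the order in which they are chained does not change the result: the server sorts first, then skips, then limits. A minimal plain-Python sketch of that evaluation order follows; the helper name run_query is hypothetical and the documents are reduced to their ItemNumber.

```python
def run_query(docs, sort_key, skip=0, limit=0):
    """Apply sort, then skip, then limit, regardless of how they were chained."""
    ordered = sorted(docs, key=lambda doc: doc[sort_key])
    skipped = ordered[skip:]
    # A limit of 0 means "no limit"
    return skipped[:limit] if limit else skipped


# Deliberately unsorted sample data (assumed for illustration)
items = [
    {"ItemNumber": "3456TFS"},
    {"ItemNumber": "1234EXD"},
    {"ItemNumber": "2345FDX"},
]

result = run_query(items, "ItemNumber", skip=1, limit=2)
print([doc["ItemNumber"] for doc in result])  # ['2345FDX', '3456TFS']
```

So .limit(2).skip(1).sort("ItemNumber") returns the second and third items in sorted order, not a sorted version of "the first two items after skipping one".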

19 of 63

Aggregating Queries

Counting Items with Count():

You can use the count() function if all you want is to perform a count on the total number of items matching your criteria.

>>> collection.find({}).count()

3

You can also specify these count queries more precisely, as in this example:

>>> collection.find({"Status" : "In use", "Location.Owner" : "Walker, Jan"}).count()

1

20 of 63

Counting Unique Items with Distinct()

Use the distinct() function to return each value only once, so that any duplicates are ignored:

>>> collection.distinct("ItemNumber")

[u'1234EXD', u'2345FDX', u'3456TFS']
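What distinct() computes can be sketched in plain Python. The sketch assumes one detail worth knowing: when the key holds an array (such as Tags), each element counts as a separate value. The helper name distinct_values is hypothetical, and the ordering shown here is simply first-seen order, which the server does not promise.

```python
def distinct_values(docs, key):
    """Collect each value of key once; array values contribute their elements."""
    seen = []
    for doc in docs:
        if key not in doc:
            continue  # documents without the key contribute nothing
        value = doc[key]
        values = value if isinstance(value, list) else [value]
        for single in values:
            if single not in seen:
                seen.append(single)
    return seen


# Shortened sample documents (assumed for illustration)
items = [
    {"ItemNumber": "1234EXD", "Tags": ["Laptop", "Development", "In Use"]},
    {"ItemNumber": "2345FDX", "Tags": ["Laptop", "Development", "In Use"]},
    {"ItemNumber": "3456TFS", "Tags": ["Laptop", "Development", "In Use"]},
]

print(distinct_values(items, "ItemNumber"))  # ['1234EXD', '2345FDX', '3456TFS']
print(distinct_values(items, "Tags"))        # ['Laptop', 'Development', 'In Use']
```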

21 of 63

Listing 8–7. admin/posts.php

<?php

// Let's define some variables!

$db = "blog"; // This is the name of the database

$col_authors = "authors"; // This is the name of the authors collection

$col_posts = "posts"; // This is the name of the posts collection

$limit = 10; // This is the total number of posts displayed

// Set up PHP code to check if the $page is set and if not, set $page to '1'.

if(isset($_GET['page'])) {

$page = $_GET['page']; }

else {

$page = 1; }

if(isset($_POST["search"]))

{ // Use MongoRegex to specify the search criteria

$regex = new MongoRegex("/$_POST[search]/i");

$query = array("Message" => $regex); }

else {

$query = array(); }

23 of 63

// Let's define some more variables for paging

$offset = ($page -1) * $limit;

$nextpage = ($page +1);

$prevpage = ($page -1);

// Add a form for searching the posts

print "<form method='post' name='search' action='posts.php'>";

print "<input type='text' name='search'>";

print "<input type='submit' value='Search'>";

print "</form>";


24 of 63

// Add a link to add a post

print "<a href='add.php'>Add a post</a>";

// Connect to the database

$c = new Mongo();

// Execute a search and store the posts under the cursor variable

$cursor = $c->$db->$col_posts->find($query)->skip($offset)->limit($limit)->sort(array('_id'=>-1));

// For each document it finds within the collection, print its contents

while ($document = $cursor->getNext()) {

// Get the author's name via MongoDBRef

$ref = $c->$db->getDBRef($document["Author"]);

$author = $ref["Name"];

// Translate the date back using date()

$date = date('M d, Y @ h:i', $document["Date"]->sec);

// Show the title, author, date and message

print "<h1>$document[Title]</h1><br>";

print "<i>By $author on $date</i>";

print "<p>$document[Message]</p>";


26 of 63

// Show clickable link to view the comments, and count them

$postid = new MongoId($document['_id']);

if(isset($document["Comments"]))

{

$count = count($document["Comments"]);

}

else {

$count = 0;

}

print "<a href='../view.php?id=$postid'>View Comments ($count)</a> ";


27 of 63

// Show clickable links to modify or delete the post

print "<a href='?modify=$postid'>Modify</a> ";

print "<a href='?del=$postid'>Delete</a>"; }

$posts = $c->$db->$col_posts->find()->count();

if ($page > 1) {

print "<a href='?page=$prevpage'>Previous Page</a>";

if ($page * $limit < $posts) {

print "<a href='?page=$nextpage'>Next page</a>"; } }

else {

if ($page * $limit < $posts) {

print "<a href='?page=$nextpage'>Next page</a>"; } }


28 of 63

// Use the posted ID to retrieve the data, and input it in a form

$filter = array("_id" => new MongoId($_GET["modify"]));

$post = $c->$db->$col_posts->findOne($filter);

print "<form action='posts.php' name='modifypost' method='post'>";

print "Title<input type='text' name='Title' value='$post[Title]'/><br>";

print "Message <textarea rows='5' cols='40' name='Message'>$post[Message]</textarea>";

print "<input type='hidden' name='id' value='$_GET[modify]'/>";

print "<input type='submit' name='modifypost' value='Change'/>";

print "</form>";


29 of 63

// Create a new array to store the changed values in

$arr = array();

$arr['Title'] = addslashes($_POST['Title']);

$arr['Message'] = addslashes($_POST['Message']);

$id = new MongoId($_POST['id']);

$c->$db->$col_posts->update(array("_id" => $id) ,

array ('$set' => $arr));


30 of 63

// Specify what ought to happen when the Delete link has been clicked

if(isset($_GET["del"])) {

// Use the posted ID as the _id, and convert it via MongoId

$id = array("_id" => new MongoId($_GET["del"]));

// Specify the options

$options = array('justOne' => true, 'safe' => true);

// Connect to the database and remove the document

$c->$db->$col_posts->remove($id, $options);

// In case the document was deleted successfully, report a success

print "Post removed successfully"; } ?>
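The paging arithmetic used in Listing 8–7 (offset, previous page, next page, and the checks that decide which links to print) can be sketched in Python as follows. The function name paging and the sample numbers are hypothetical; the formulas mirror the PHP variables $offset, $prevpage, and $nextpage and the condition $page * $limit < $posts.

```python
def paging(page, limit, total_posts):
    """Compute the skip offset and which navigation links should be shown."""
    return {
        "offset": (page - 1) * limit,             # documents to skip
        "previous_page": page - 1,
        "next_page": page + 1,
        "has_previous": page > 1,                 # show "Previous Page" link
        "has_next": page * limit < total_posts,   # show "Next page" link
    }


# Example: 25 posts shown 10 per page (numbers assumed for illustration)
second = paging(page=2, limit=10, total_posts=25)
last = paging(page=3, limit=10, total_posts=25)
print(second["offset"], second["has_previous"], second["has_next"])  # 10 True True
print(last["offset"], last["has_previous"], last["has_next"])        # 20 True False
```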


31 of 63

  • The final page of the application, admin/add.php, adds new documents to the collection.
  • The code starts by specifying a number of variables that mostly contain database- and collection-specific information used to connect to the database.
  • Next, the find() command fetches a list of the authors previously added to the database.

32 of 63

  • The code continues with an HTML form used to input the information for a new post.
  • The add button submits this information; the button's behavior is specified directly below the form, in an if statement.
  • The if statement specifies that the information is to be added to the defined collection.

33 of 63

Listing 8–8. admin/add.php

// connect to the database and get a list of authors

$c = new Mongo();

$db = "blog";

$col_posts = "posts";

$col_authors = "authors";

$cursor = $c->$db->$col_authors->find();

print "<h1>Let's add a post!</h1>";

print "<form action='add.php' name='addpost' method='get'>";

print "<input type='text' name='Title' value='Fill in the title'><br>";


34 of 63

// Create a new array to store the changed values

$arr = array();

$arr['Title'] = addslashes($_GET['Title']);

$arr['Message'] = addslashes($_GET['Message']);

$arr['Author'] = MongoDBRef::create(

$c->$db->$col_authors->getName(),

new MongoId($_GET['names']) );

$arr['Date'] = new MongoDate();

$c->$db->$col_posts->insert($arr);

print "Post added. You can leave this page.";} ?>


35 of 63

CAP Theorem

  • The CAP theorem states that a distributed system can fully provide at most two of the following three properties at the same time:
  • Consistency
  • Availability
  • Partition tolerance

36 of 63

Consistency

  • Each client has the same view of the data.
  • Consistency means consistent reads and writes so that concurrent operations see the same valid and consistent data state, which at minimum means no stale data.


37 of 63

Availability

  • Availability means that the system is available to serve requests at the time it is needed.
  • A system that is busy, uncommunicative, or unresponsive when accessed is not available.
  • In terms of the CAP theorem, a system with minor delays or minimal hold-ups is still an available system.
  • However, if a system cannot serve a request when it is needed, it is not available.

38 of 63

Partition tolerance

  • The system works well across physically distributed networks.
  • Partition tolerance measures the ability of a system to continue serving requests even if a few of its cluster members become unavailable.

39 of 63

Modifying a Document Atomically

40 of 63

Modifying a Document Atomically

  • The findAndModify() function is used to modify a document atomically and return the results.
  • It can be used to update only a single document and nothing more.
  • By default, the document returned does not include the modifications made; getting the modified version requires specifying an additional argument.
  • The findAndModify() function can be used with seven parameters, and we must include either the update parameter or the remove parameter.

41 of 63

Modifying a Document Atomically

The seven parameters are as follows:

  1. query: Specifies a filter for the query. If this isn't specified, then all documents in the collection will be seen as possible candidates, after which the first document it encounters will be updated or removed.
  2. sort: Sorts the matching documents in a specified order.
  3. remove: If set to true, removes the first matching document.
  4. update: Specifies the information to update the document with. Any of the modifying operators specified previously can be used for this.
  5. new: If set to true, returns the updated document rather than the selected document. This is not set by default.
  6. fields: Specifies the fields you would like to see returned, rather than the entire document. This works the same way as the field selection in the find() function. The _id field will always be returned.
  7. upsert (optional): If set to true, performs an upsert.
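The effect of the new parameter, and the fact that only the first match is modified, can be sketched in plain Python. This is an in-memory illustration only, not the server's implementation: the function find_and_modify, its set_fields argument (standing in for a "$set" update), and the second item number are hypothetical.

```python
import copy


def find_and_modify(docs, query, set_fields, new=False):
    """Update the first document matching query and return one version of it."""
    for doc in docs:
        if all(doc.get(key) == value for key, value in query.items()):
            before = copy.deepcopy(doc)   # the document as it was selected
            doc.update(set_fields)        # stands in for {"$set": set_fields}
            return copy.deepcopy(doc) if new else before
    return None  # nothing matched, so nothing was changed


# Two matching documents (the second item number is assumed for illustration)
items = [
    {"ItemNumber": "4532F00", "Type": "Desktop", "Status": "In use"},
    {"ItemNumber": "4532F01", "Type": "Desktop", "Status": "In use"},
]

after = find_and_modify(items, {"Type": "Desktop"}, {"Status": "In repair"}, new=True)
print(after["Status"])     # In repair
print(items[1]["Status"])  # In use

before = find_and_modify(items, {"ItemNumber": "4532F01"}, {"Status": "In repair"})
print(before["Status"])    # In use
print(items[1]["Status"])  # In repair
```

With new left at its default, the caller sees the document as it looked before the change, even though the stored document has already been updated.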

43 of 63

Putting the Parameters to Work

  • For example, the findAndModify() function can be used to search for a document that has a key/value pair of "Type": "Desktop" and then update the first document that matches the query by setting an additional key/value pair of "Status": "In repair".
  • To ensure that the updated document is returned, rather than the original document matching the query, set the new parameter to True:

>>> db.command("findandmodify", "items", query = {"Type": "Desktop"},
... update = {"$set": {"Status": "In repair"}}, new = True)

{
    u'ok': 1.0,
    u'value': {
        u'Status': u'In repair',
        u'Tags': [u'Desktop', u'In use', u'Marketing', u'Warranty'],
        u'ItemNumber': u'4532F00',
        u'Location': {
            u'Department': u'Marketing',
            u'Building': u'2B',
            u'Owner': u'Martin, Lisa',
            u'Desknumber': 131131
        },
        u'_id': ObjectId('4c5dda114abffe0f34000000'),
        u'Type': u'Desktop'
    }
}

45 of 63

Putting the Parameters to Work

You can also use findAndModify() to remove a document; in this case, the output shows which document was removed.

>>> db.command("findandmodify", "items", query = {"Type": "Desktop"},
... sort = {"ItemNumber": -1}, remove = True)
{
    u'ok': 1.0,
    u'value': {
        u'Status': u'In use',
        u'Tags': [u'Desktop', u'In use', u'Marketing', u'Warranty'],
        u'ItemNumber': u'4532F00',
        u'Location': {
            u'Department': u'Marketing',
            u'Building': u'28',
            u'Owner': u'Martin, Lisa',
            u'Desknumber': 131131
        },
        u'_id': ObjectId('4c5ddbe24abffe0f34000001'),
        u'Type': u'Desktop'
    }
}

47 of 63

Deleting Data

  • The Python driver provides several methods for deleting data.
  • The remove() function is used to delete one or more documents from a collection.
  • drop() or drop_collection() can be used to delete an entire collection.
  • The drop_database() function is used to drop an entire database.

  • remove() function: It accepts an argument that is used to find and delete any matching documents in the current collection.

48 of 63

Deleting Data

  • In the following example, we use the remove() function to remove each document that has a key/value pair of "Status": "In use".
  • Afterward, the find_one() command is used to confirm the results:

>>> collection.remove({"Status" : "In use"})

>>> collection.find_one({"Status" : "In use"})

>>>

  • It is a good idea to execute find() first, so that we can see exactly which documents will be removed.
  • Alternatively, the Object ID can be used to remove a single item.

49 of 63

Deleting Data

  • We can use either the drop() or the drop_collection() function to remove an entire collection.
  • Both functions have the same effect (one is just an alias for the other): drop() is called on the collection itself and needs no arguments, whereas drop_collection() is called on the database and expects one parameter, the collection's name:

>>> db.items.drop()

  • Lastly, drop_database() enables you to delete an entire database.
  • This function is called through the Connection module, as in the following example:

>>> c.drop_database("inventory")

50 of 63

Creating a Link Between Two Documents

  • Database references can be used to create a link between two documents that reside in different locations.
  • For example, you might create one collection for all employees and another collection for all the items—and then use the DBRef() function to create a reference between the employees and the location of the items, rather than typing them in manually for each item.
  • Data can be referenced in one of two ways:
  • First, we can add a simple reference (manual referencing) that uses the _id field from one document to store a reference to it in another.
  • Second, we can use the DBRef module, which brings a few more options with it than manual referencing.

51 of 63

Creating a Link Between Two Documents

  • To create a manual reference, begin by saving a document.
  • For example, assume we want to save the information for a person into a specific collection.
  • The following example defines a jan dictionary and saves it into the people collection to get back an Object ID:

>>> jan = {
...     "First Name" : "Jan",
...     "Last Name" : "Walker",
...     "Display Name" : "Walker, Jan",
...     "Department" : "Development",
...     "Building" : "2B",
...     "Floor" : 12,
...     "Desk" : 120103,
...     "E-mail" : "jw@example.com"
... }

>>> people = db.people

>>> people.insert(jan)

ObjectId('4c5ddbe24abffe0f34000002')

53 of 63

Creating a Link Between Two Documents

  • After we add an item and get its ID back, we can use this information to link the item to another document in another collection.

>>> laptop = {
...     "Type" : "Laptop",
...     "Status" : "In use",
...     "ItemNumber" : "12345ABC",
...     "Tags" : ["Warranty", "In use", "Laptop"],
...     "Owner" : jan["_id"]
... }

>>> items = db.items

>>> items.insert(laptop)

ObjectId('4c5e6f6b4abffe0f34000003')

54 of 63

Creating a Link Between Two Documents

  • Suppose we now want to find out the owner's information.
  • We have to query for the Object ID given in the Owner field; this is only possible if we know which collection the data is stored in.
  • But what if we don't know where this information is stored? It was for handling precisely such scenarios that the DBRef() function was created.
  • We can use this function even when we do not know which collection holds the original data. It means we don't have to worry so much about the collection names when searching for the information.
  • The DBRef() function takes three arguments, two mandatory and one optional; it can also accept additional keyword arguments.

55 of 63

Creating a Link Between Two Documents

  • Here's a list of the three main arguments and what they do:
  • collection (mandatory): Specifies the collection the original data resides in (e.g., people).
  • id (mandatory): Specifies the _id value of the document that should be referred to.
  • database (optional): Specifies the name of the database to reference.
  • The DBRef module must be loaded before we can use the DBRef() function:

>>> from pymongo.dbref import DBRef

  • In the following example, we insert a person into the people collection and add an item to the items collection, using DBRef to reference the owner:

>>> mike = {
...     "First Name" : "Mike",
...     "Last Name" : "Wazowski",
...     "Display Name" : "Wazowski, Mike",
...     "Department" : "Entertainment",
...     "Building" : "28",
...     "Floor" : 10,
...     "Desk" : 120789,
...     "E-Mail" : "mw@monsters.inc"
... }

>>> people.save(mike)

ObjectId('4c5e73714abffe0f34000004')

  • Here we added a document, but nothing refers to it yet.
  • However, we have the Object ID of the document, so we can now add the next document to the collection and use DBRef() to point its Owner field at the previously inserted document.
  • The first parameter given to the DBRef() function is the name of the collection where the previously inserted document resides.
  • The second parameter is a reference to the _id key in the mike dictionary.

58 of 63

Creating a Link Between Two Documents

>>> laptop = {

… "Type": "Laptop",

… "Status": "In use",

… "ItemNumber": "2345DEF",

… "Tags": ["Warranty", "In use", "Laptop"],

… "Owner": DBRef('people', mike["_id"])

… }

>>> items.save(laptop)

ObjectId('4c5e740a4abffe0f34000005')

  • This approach provides the additional flexibility of not having to look up the collection's name whenever we query for the referenced information.

59 of 63

Retrieving the Information

  • We can retrieve the referenced information using the Python driver's dereference() function.
  • Pass the field that contains the reference as the argument to dereference(), and then press the Return key.
  • The complete process of finding a reference in one document and retrieving the document it points to is as follows.
  • Let's begin by finding the document that contains the reference, and then retrieve the referenced document for display.

60 of 63

Retrieving the Information

  • The first step is to create a query that finds a document with the reference information in it.

>>> items.find_one({"ItemNumber": "2345DEF"})
{
    u'Status': u'In use',
    u'Tags': [u'Warranty', u'In use', u'Laptop'],
    u'ItemNumber': u'2345DEF',
    u'Owner': DBRef(u'people', ObjectId('4c5e73714abffe0f34000004')),
    u'_id': ObjectId('4c5e740a4abffe0f34000005'),
    u'Type': u'Laptop'
}

61 of 63

Retrieving the Information

  • Next, we store the retrieved item in a variable named person.

>>> person = items.find_one({"ItemNumber": "2345DEF"})

  • At this point, we can call the dereference() function, passing the person["Owner"] field as an argument, to retrieve the document that the Owner field refers to.
  • This is possible because the Owner field is linked to the data we want to retrieve.

>>> db.dereference(person["Owner"])
{
    u'Building': u'28',
    u'Floor': 10,
    u'Last Name': u'Wazowski',
    u'Desk': 120789,
    u'E-Mail': u'mw@monsters.inc',
    u'First Name': u'Mike',
    u'Display Name': u'Wazowski, Mike',
    u'Department': u'Entertainment',
    u'_id': ObjectId('4c5e73714abffe0f34000004')
}

63 of 63

Retrieving the Information

  • In this example, DBRef provides a convenient way to store references to data.
  • Additionally, DBRef permits some flexibility in how we specify the collection and database names.
  • It is especially useful in cases where the data really shouldn't be embedded.
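The difference between a manual reference and a DBRef-style reference can be sketched in plain Python, with dictionaries standing in for collections. Everything here is an illustrative assumption rather than the driver's implementation: the Ref tuple is a hypothetical stand-in for DBRef, the dereference helper mimics what db.dereference() does conceptually, and the short string IDs replace real Object IDs.

```python
from collections import namedtuple

# Hypothetical stand-in for DBRef: it carries the collection name with the ID
Ref = namedtuple("Ref", ["collection", "id"])

# In-memory "database": collection name -> {_id: document}
database = {
    "people": {
        "p1": {"_id": "p1", "Display Name": "Walker, Jan"},
        "p2": {"_id": "p2", "Display Name": "Wazowski, Mike"},
    },
    "items": {},
}


def dereference(db, ref):
    """Look up the referenced document using only what the reference holds."""
    return db[ref.collection].get(ref.id)


# Manual reference: only the _id is stored, so the reader must already
# know that owners live in the "people" collection.
manual_laptop = {"ItemNumber": "12345ABC", "Owner": "p1"}
manual_owner = database["people"][manual_laptop["Owner"]]

# DBRef-style reference: the collection name travels with the reference.
ref_laptop = {"ItemNumber": "2345DEF", "Owner": Ref("people", "p2")}
ref_owner = dereference(database, ref_laptop["Owner"])

print(manual_owner["Display Name"])  # Walker, Jan
print(ref_owner["Display Name"])     # Wazowski, Mike
```

The manual lookup hard-codes "people" in the calling code, while the second lookup needs nothing beyond the reference itself, which is the flexibility the slides attribute to DBRef.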