In this tutorial, we will be using the Boto3 module in Python to work with Amazon’s NoSQL database, DynamoDB. The tutorial will also cover setting up a local instance of DynamoDB.
NoSQL Databases
NoSQL databases are used to solve challenges faced by an RDBMS (Relational Database Management System), or simply put, a relational database. Some cons of an RDBMS are listed below:
- A schema has to be defined beforehand
- The data to be stored has to be structured
- It is difficult to change tables and relationships
On the other hand, NoSQL databases can handle unstructured data and do not need a schema to be defined.
In this tutorial, we will be working with Amazon DynamoDB. It is a key-value and document NoSQL database.
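To make the idea of schemaless items concrete, here is a minimal sketch in plain Python with no database involved. The movie values are made up for illustration, and the dictionary only mimics how a key-value store looks items up by key.

```python
# Two items in the same "table": they share the key attributes
# (year, title) but carry different nested attributes under "info".
# All values here are illustrative, not taken from a real dataset.
items = [
    {"year": 2020, "title": "Title 1",
     "info": {"producer": "ABC", "genres": ["Drama"]}},
    {"year": 2019, "title": "Title 3",
     "info": {"rating": 5}},  # no producer or genres on this one
]

# A key-value store looks items up by key, modelled here with a dict.
table = {(item["year"], item["title"]): item for item in items}

print(table[(2019, "Title 3")]["info"])
```

Neither item had to declare its attributes in advance, which is the flexibility the list of RDBMS drawbacks above is contrasted with.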
Table of Contents
- Pre-requisites
- Setting up DynamoDB Locally
- Connecting to our DB using Python
- Create Table
- Insert Data
- Get Data
- Update Data
- Delete Data
- Query
- Conclusion
- Resources
Pre-requisites
- Basic Understanding of NoSQL Databases
- Experience with Python
Setting up DynamoDB Locally
Step 1
Download and Install Java SE. To run DynamoDB on your computer, you must have the Java Runtime Environment (JRE) version 8.x or newer. The application doesn’t run on earlier JRE versions.
Step 2
Download and install the AWS CLI. Type the following command in the command prompt to verify the installation.
aws --version
If you get an error, you might have to add the AWS CLI to your Path variable. Look at this article for more information.
Step 3
Download and extract Amazon DynamoDB.
Step 4
Navigate to the folder where you extracted DynamoDB and type the following command in a command prompt.
java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
Do not close this terminal until you are done working with the database.
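If you want to confirm from Python that the local server is listening, a small TCP check is one option. This is a sketch using only the standard library; the helper name is_port_open is made up for this example, and port 8000 is the one used in the commands in this tutorial.

```python
import socket


def is_port_open(host="localhost", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print("DynamoDB Local reachable:", is_port_open())
```

The check only tells you that some process is listening on the port, not that it is DynamoDB, so treat it as a quick sanity test.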
Step 5
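Configure credentials. Type the following command in a new command prompt. The local instance does not validate credentials against AWS, so placeholder values for the access key and secret key are enough.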
aws configure
Step 6
Type the following command
aws dynamodb list-tables --endpoint-url http://localhost:8000
This should return an empty list of tables unless you already have existing tables.
Alternatively, you can also set up Amazon DynamoDB as a web service.
Connecting to our DB using Python
Before we start, we will need to set up and activate a virtual environment
# Install virtualenv
pip install virtualenv
# Create a virtual environment
python -m virtualenv venv
# If the above doesn't work, try the following
python -m venv venv
# Activate the virtual environment (Windows)
venv/Scripts/activate
# On macOS or Linux, activate with: source venv/bin/activate
We will use the boto3 module to interact with the local instance of DynamoDB.
pip install boto3
Next, we will need to import the library and create a database object
import boto3
We will be creating a class and adding the CRUD operations as its methods.
class dtable:
    db = None
    tableName = None
    table = None
    table_created = False

    def __init__(self):
        # Point boto3 at the local DynamoDB instance instead of AWS
        self.db = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")
        print("Initialized")
Test your code by creating an instance of our class.
if __name__ == '__main__':
    movies = dtable()
We will be using the instance of the dtable class we just created later on in the article.
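If you would rather not depend on the values stored by aws configure, the connection settings can also be passed to boto3 directly. The sketch below is one way to do that: the helper names local_dynamodb_kwargs and connect, the region "us-west-2" and the "dummy" keys are all placeholders chosen for this example, on the assumption that the local instance does not check them against a real AWS account.

```python
def local_dynamodb_kwargs(port=8000):
    """Build keyword arguments for boto3.resource() aimed at DynamoDB Local."""
    return {
        "endpoint_url": f"http://localhost:{port}",
        # Placeholder values (assumption): the local server does not
        # validate them against a real AWS account.
        "region_name": "us-west-2",
        "aws_access_key_id": "dummy",
        "aws_secret_access_key": "dummy",
    }


def connect(port=8000):
    """Return a DynamoDB resource for the local instance."""
    import boto3  # imported lazily so the helper above works without boto3
    return boto3.resource("dynamodb", **local_dynamodb_kwargs(port))
```

Calling connect() returns the same kind of object the class stores in self.db, so it could be used in __init__ in place of the direct boto3.resource call.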
Create Table
In DynamoDB, a table can have two types of primary keys: A single partition key or a composite primary key (partition key + sort key).
We will create a table called Movies. The year of the movie will be the partition key and the title will be the sort key. Below is the format to declare a key schema. Store it in a variable called primaryKey.
primaryKey=[
    {
        'AttributeName': 'year',
        'KeyType': 'HASH'  # Partition key
    },
    {
        'AttributeName': 'title',
        'KeyType': 'RANGE'  # Sort key
    }
]
We will also need to declare the data types of the above attributes.
attributeDataType=[
    {
        'AttributeName': 'year',
        'AttributeType': 'N'  # Number
    },
    {
        'AttributeName': 'title',
        'AttributeType': 'S'  # String
    },
]
We will also need to set the provisioned throughput, which caps the number of reads and writes per second on our table.
provisionedThroughput={
    'ReadCapacityUnits': 10,
    'WriteCapacityUnits': 10
}
All the required parameters to create a table have now been defined. Now we can move on to using these parameters to actually create the table.
    def createTable(self, tableName, KeySchema, AttributeDefinitions, ProvisionedThroughput):
        self.tableName = tableName
        table = self.db.create_table(
            TableName=tableName,
            KeySchema=KeySchema,
            AttributeDefinitions=AttributeDefinitions,
            ProvisionedThroughput=ProvisionedThroughput
        )
        # Keep a reference to the table for the other methods
        self.table = table
        print(f'Created Table {self.table}')
The above function and our previously defined variables will be used to create the table.
movies.createTable(
    tableName="Movies",
    KeySchema=primaryKey,
    AttributeDefinitions=attributeDataType,
    ProvisionedThroughput=provisionedThroughput)
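The key schema and the attribute definitions must name the same attributes, so it can be convenient to generate both from one description. The helper below is a sketch of that idea. The name build_table_spec and its argument layout are invented for this example; the dictionary it returns uses the parameter names that create_table expects.

```python
def build_table_spec(table_name, partition_key, sort_key=None,
                     read_units=10, write_units=10):
    """Assemble keyword arguments for create_table.

    partition_key and sort_key are (name, type) tuples, where type is a
    DynamoDB scalar type code such as 'S' (string) or 'N' (number).
    """
    keys = [(partition_key, "HASH")]
    if sort_key is not None:
        keys.append((sort_key, "RANGE"))
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": name, "KeyType": key_type}
            for (name, _), key_type in keys
        ],
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for (name, attr_type), _ in keys
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


spec = build_table_spec("Movies", ("year", "N"), ("title", "S"))
print(spec["KeySchema"])
```

With a boto3 resource object db, the result could be passed along as db.create_table(**spec). Leaving out sort_key produces a table with a single partition key.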
Insert Data
The format of the data to be inserted is below
{
    'year' : 2020,
    'title' : 'Some Title',
    'info' : {
        'key1' : 'value1',
        'key2' : 'value2',
    }
}
For each item, other than the primary key (year and title), we have flexibility over the data inside ‘info’. The data inside info doesn’t need to be structured.
Before inserting data, we will create a JSON file with a few movies. You can find the JSON file in my GitHub repo.
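    # Requires "import json" at the top of the file
    def insert_data(self, path):
        with open(path) as f:
            data = json.load(f)
        for item in data:
            try:
                self.table.put_item(Item=item)
            except Exception as e:
                # Report the failure instead of silently skipping the item
                print(f'Could not insert {item}: {e}')
        print(f'Inserted Data into {self.tableName}')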
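One thing to watch for when loading JSON: the boto3 resource interface rejects Python float values and expects decimal.Decimal instead. If your JSON file contains numbers with a decimal point (the sample string below is made up for illustration), a sketch of one way to handle it is to let the json module produce Decimal values while parsing. The helper name parse_items is hypothetical.

```python
import json
from decimal import Decimal


def parse_items(text):
    """Parse JSON text, turning floating-point numbers into Decimal."""
    return json.loads(text, parse_float=Decimal)


# Illustrative sample, not taken from the tutorial's JSON file
sample = '[{"year": 2020, "title": "Title 1", "info": {"rating": 7.5}}]'
items = parse_items(sample)
print(type(items[0]["info"]["rating"]).__name__)
```

The same parse_float argument works with json.load when reading from a file, so it could be added to the insert method without other changes. Whole numbers such as the year stay plain integers.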
Get Item
We can access an item in the database if we know its primary key. In our case, it is the year + title. We will try to access the item with the year 2020 and the title ‘Title 1’.
Below is the method of our class which returns the item from the table
    def getItem(self, key):
        try:
            response = self.table.get_item(Key=key)
            # 'Item' is missing from the response when no item matches
            return response['Item']
        except Exception as e:
            print('Item not found')
            return None
Note: the Key parameter of the get_item function starts with an uppercase K.
This is how we would invoke the function
print(movies.getItem(key = {'year' : 2020 , 'title': 'Title 1'}))
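When no item matches the key, get_item does not raise an error; the response simply has no 'Item' entry. A small helper can make that case explicit without relying on an exception. This is a sketch: extract_item is a hypothetical name, and the two response dictionaries are hand-written stand-ins for what the call returns.

```python
def extract_item(response):
    """Return the item from a get_item response, or None if absent."""
    # get_item omits the 'Item' key when no item matches the given key
    return response.get("Item")


# Hand-written stand-ins for real responses, for illustration only
found = {"Item": {"year": 2020, "title": "Title 1"}}
missing = {"ResponseMetadata": {"HTTPStatusCode": 200}}

print(extract_item(found))
print(extract_item(missing))
```

Using response.get keeps genuine errors, such as a connection problem, separate from the ordinary "not found" case.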
Before we move on to Update and Delete, it’ll be beneficial to familiarize yourself with a couple of expression parameters that can be passed to the update and delete functions.
The two expressions are UpdateExpression and ConditionExpression.
Below is an example of an UpdateExpression:
UpdateExpression="set info.rating=:rating, info.Info=:info"
:rating and :info are the values we want to use while updating. They can be thought of as placeholders.
We will also need to pass an extra parameter ExpressionAttributeValues to pass values to these variables
ExpressionAttributeValues={
    ':rating': 5,  # boto3 rejects Python floats; use int or Decimal
    ':info': 'Updated Information'
}
In a way, this is similar to the format() function in Python.
You can find a list of common Update Operations (Add, Modify, Delete) over here
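Because every placeholder in the expression needs a matching entry in ExpressionAttributeValues, the two can be generated together. The sketch below builds a SET expression from a dictionary of changes. The function name build_update is made up for this example, and it assumes the attribute names are simple identifiers nested under info and are not DynamoDB reserved words.

```python
def build_update(changes, prefix="info"):
    """Build a SET expression and its values from a dict of changes."""
    clauses = []
    values = {}
    for name, value in changes.items():
        placeholder = f":{name}"  # e.g. ":rating"
        clauses.append(f"{prefix}.{name} = {placeholder}")
        values[placeholder] = value
    return "SET " + ", ".join(clauses), values


expression, values = build_update({"rating": 5, "Info": "Updated Information"})
print(expression)
print(values)
```

The two results can be passed as UpdateExpression and ExpressionAttributeValues respectively. A ConditionExpression would need its own placeholders added to the same values dictionary.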
ConditionExpression is similar to a WHERE clause in SQL. If it evaluates to true, the command is executed; otherwise, DynamoDB rejects the request with a ConditionalCheckFailedException and the item is left unchanged.
An example is below
ConditionExpression="info.Producer = :producer",
ExpressionAttributeValues={
    ':producer': 'Kevin Feige'
}
The ConditionExpression also follows the same format as the UpdateExpression
ConditionExpression can be used for conditional updates and conditional deletes. We will discuss them below.
You can find a list of Condition Expressions over here
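When a condition is not met, boto3 raises a botocore ClientError whose response dictionary carries the error code ConditionalCheckFailedException. If you want to tell that case apart from other failures, you can inspect the code. The following is a sketch: is_condition_failure is a hypothetical helper, and FakeClientError is only a stand-in so the example runs without a database.

```python
def is_condition_failure(error):
    """Return True if error looks like a failed ConditionExpression."""
    response = getattr(error, "response", {})
    code = response.get("Error", {}).get("Code")
    return code == "ConditionalCheckFailedException"


class FakeClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError, for illustration."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


print(is_condition_failure(FakeClientError("ConditionalCheckFailedException")))
print(is_condition_failure(ValueError("unrelated")))
```

Inside an except block around update_item or delete_item, a check like this lets you report "condition not met" separately from unexpected errors.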
Update
Below is the update method of our class
    def updateItem(self, key, updateExpression, conditionExpression, expressionAttributes):
        try:
            response = self.table.update_item(
                Key=key,
                UpdateExpression=updateExpression,
                ConditionExpression=conditionExpression,
                # The boto3 parameter is named ExpressionAttributeValues
                ExpressionAttributeValues=expressionAttributes
            )
            return response
        except Exception as e:
            print(e)
            return None
We will update the movie produced by Kevin Feige. We will update the Info, add a rating of 5, and append a genre of ‘legendary’ to the list of genres.
upExp = "SET info.Info = :info , info.rating = :rating, info.Genre = list_append(info.Genre, :genre)"
condExp = "info.Producer = :producer"
expAttr = {
    ":info" : "Updated Information",
    ":rating" : 5,
    ":genre" : ["Legendary"],
    ":producer" : "Kevin Feige"
}
print("After Update")
movies.updateItem({'year' : 2019 , 'title': 'Title 3'},upExp,condExp,expAttr)
print(movies.getItem(key = {'year' : 2019 , 'title': 'Title 3'}))
Delete
The Delete operation is similar to the Update operation. Below is our method to delete an item. It accepts a Key, the condition Expression and Expression Attribute Values.
    def deleteItem(self, key, conditionExpression, expressionAttributes):
        try:
            response = self.table.delete_item(
                Key=key,
                ConditionExpression=conditionExpression,
                ExpressionAttributeValues=expressionAttributes
            )
        except Exception as e:
            print(e)
We will delete the movie with producer = “ABC”
print("Before Delete")
print(movies.getItem(key = {'title':'Title 2' , 'year': 2019}))
print("After Delete")
condExp = "info.Producer = :producer"
expAttr = {':producer' : "ABC" }
movies.deleteItem({'title':'Title 2' , 'year': 2019},condExp,expAttr)
print(movies.getItem(key = {'title':'Title 2' , 'year': 2019}))
Query
We can query the table using the partition key we provided while creating our table. In our case, it was the year. The partition key is required for the query operation; the sort key is optional.
We will be using the Key class; you can read more about it over here.
Import the Key Class
from boto3.dynamodb.conditions import Key
Below is the method for the query
    def query(self, projectExpression, expressionAttributes, keyExpression):
        # expressionAttributes is accepted but not used by this method
        try:
            response = self.table.query(
                ProjectionExpression=projectExpression,
                KeyConditionExpression=keyExpression
            )
            return response['Items']
        except Exception as e:
            print(e)
            return None
The parameter ProjectionExpression is a string with the comma-separated list of attributes we want the function to return. KeyConditionExpression is the key condition built using the Key class. The partition key must be present in the KeyConditionExpression. Additionally, you can pass a FilterExpression parameter, which is similar to ConditionExpression.
We will display the titles of all movies starting with ‘M’ in 2020.
print("Movies from 2020 with title starting with M")
projection = "title"
Keycondition = Key('year').eq(2020) & Key('title').begins_with('M')
print(movies.query(projection,expAttr,Keycondition))
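A single query call returns at most 1 MB of data. When more results remain, the response includes a LastEvaluatedKey, which can be passed back as ExclusiveStartKey to fetch the next page. The loop below sketches that pattern. The name query_all is made up for this example, and FakeTable is a hypothetical stand-in that serves two pages so the code runs without a database.

```python
def query_all(table, **query_kwargs):
    """Collect the items from every page of a query."""
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Resume the next request where the previous page stopped
        query_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in that serves two pages, for illustration."""

    def query(self, **kwargs):
        if "ExclusiveStartKey" not in kwargs:
            return {"Items": [{"title": "M1"}],
                    "LastEvaluatedKey": {"year": 2020, "title": "M1"}}
        return {"Items": [{"title": "M2"}]}


print(query_all(FakeTable()))
```

With a real table object, the same function would be called with the ProjectionExpression and KeyConditionExpression arguments used in the query method above. For a small local table, one page is usually all there is.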
Conclusion
I hope this article was able to help you out. In case of any errors in the code snippets above, please refer to my GitHub repo mentioned in the Resources section. Please do let me know if you find any errors 🙂
Happy Learning!
Resources
GitHub Repo