Set up an Amazon Book Store on Google App Engine – 2 – Constructing and Signing the Request for Amazon Product Data

23 Feb

This is part 2 of the ‘Setting up an Amazon book store on Google App Engine’ tutorial series. I assume that you have followed the first part of this tutorial and completed the app along with it. If you haven’t, please read and complete the first part before continuing.

We continue from where we left off in the last part: our homepage, the ‘index.html’ file. On our homepage, we have a nice header message welcoming our users. The content area holds a form where the user enters a keyword or topic to search books on. The form’s action is set to ‘/fetch’, so we need to handle that URL in our code.

Modify the ‘amazon.py’ file and add the following code to it:

import cgi, os
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext.webapp import template

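class Init(webapp.RequestHandler):
    # Serve the homepage with the search form
    def get(self):
        path = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(template.render(path, {}))

class Fetch(webapp.RequestHandler):
    # Handle the search form posted to '/fetch'
    def post(self):
        # Keyword typed by the user in the search form
        key = self.request.get('keyword')
        instance = ConstructUrl()

        # Build and sign the request URL, then fetch and parse the results
        url = instance.buildurl(key)
        books = instance.processurl(url)

        template_values = {
            'books': books,
            }
        path = os.path.join(os.path.dirname(__file__), 'results.html')
        self.response.out.write(template.render(path, template_values))

# Map URLs to their request handlers
application = webapp.WSGIApplication(
                                     [('/', Init),
                                     ('/fetch', Fetch)],
                                     debug = True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()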

We’re adding a new request handler, ‘Fetch’. When a user enters a keyword and clicks submit, the form posts to ‘/fetch’. The ‘WSGIApplication’ mapping routes that request to the ‘Fetch’ class and calls its ‘post’ function. In ‘post’, we get the ‘keyword’ the user entered in our form with ‘key = self.request.get(‘keyword’)’. Then we instantiate the class ‘ConstructUrl’ and call its ‘buildurl’ function with ‘key’ as the argument. The ‘buildurl’ function constructs the request and signs it.

Modify ‘amazon.py’ and add the following code for the ‘ConstructUrl’ class and the ‘buildurl’ function. We also need to import a few more modules to get our job done:

# Add these imports at the top after the previous import statements
import base64, hashlib, hmac, time, urllib2
from urllib import urlencode, quote

# Add this code above the line 'application=webapp.W....'

class ConstructUrl:
    def buildurl(self, term):
        AWS_ACCESS_KEY_ID = "Your Access Key ID here"
        AWS_SECRET_ACCESS_KEY = "Your Secret Access Key here"
        base_url = "http://ecs.amazonaws.com/onca/xml"
        url_params = {'Operation': "ItemSearch",
                      'Service': "AWSECommerceService",
                      'AWSAccessKeyId': AWS_ACCESS_KEY_ID,
                      'AssociateTag': "Your Associate Tag here",
                      'Version': "2006-09-11",
                      'ItemPage': "1",
                      'ResponseGroup': "Medium",
                      'SearchIndex': "Books",
                      # Form input arrives as unicode; urlencode needs UTF-8 bytes
                      'Keywords': term.encode('utf-8')}
        # Add an ISO 8601 compliant timestamp (in GMT, marked by the 'Z')
        url_params['Timestamp'] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

        # Sort the URL parameters by key
        keys = sorted(url_params.keys())

        # Get the values in the same order as the sorted keys
        values = map(url_params.get, keys)

        # Reconstruct the URL parameters and encode them. urlencode turns
        # spaces into '+', but Amazon expects them encoded as '%20'
        url_string = urlencode(zip(keys, values))
        url_string = url_string.replace('+', "%20")
        url_string = url_string.replace(':', "%3A")
        # '~' must stay unencoded (Python 2's urlencode turns it into %7E)
        url_string = url_string.replace('%7E', "~")

        # Construct the string to sign: method, host, path and query
        string_to_sign = """GET
ecs.amazonaws.com
/onca/xml
%s""" % url_string
        # Sign the request with HMAC-SHA256
        signature = hmac.new(
            key=AWS_SECRET_ACCESS_KEY,
            msg=string_to_sign,
            digestmod=hashlib.sha256).digest()

        # Base64 encode the signature (b64encode, unlike encodestring,
        # does not append a trailing newline)
        signature = base64.b64encode(signature)

        # Make the signature URL safe
        signature = signature.replace('+', quote('+'))
        signature = signature.replace('=', quote('='))
        url_string += "&Signature=%s" % signature
        return "%s?%s" % (base_url, url_string)

(The signing part was shamelessly copied and modified from Cloud Carpenters. I had spent countless hours porting my PHP function for building and signing the request to Python, only to end up learning the ‘Error’ XML by heart!)

The ‘buildurl’ function is defined with 2 parameters, and the first one is ‘self’. Notice that we passed only one argument when we called ‘buildurl’ in the ‘post’ function of the ‘Fetch’ class, yet the definition lists 2. You have to declare ‘self’ when you write a method, but you don’t pass it when you call the method on an instance. Python supplies the instance automatically. The second parameter, ‘term’, is the keyword the user is searching books for; we passed it in the function call as ‘key’. Substitute your own Amazon Access Key ID and Secret Access Key where indicated. The ‘url_params’ dictionary holds the rest of the information required to construct and sign the request. Put your own Associate Tag in this dictionary.
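To see why ‘self’ appears in the definition but not in the call, here is a toy class. The name ‘Greeter’ is hypothetical and not part of the app. Calling a method on an instance is equivalent to calling it on the class and passing the instance explicitly as the first argument. The snippet runs as plain Python, outside App Engine.

```python
class Greeter(object):
    """Toy stand-in for ConstructUrl; not part of the app."""

    def greet(self, term):
        # 'self' is the instance the method was called on
        return "searching for %s" % term


g = Greeter()
# Python passes the instance as 'self' automatically...
bound_call = g.greet("python")
# ...which is the same as passing it explicitly through the class.
explicit_call = Greeter.greet(g, "python")
print(bound_call)
print(explicit_call)
```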

Look at ‘SearchIndex’: we have given it the value ‘Books’. That is what tells Amazon we want books for the given keyword. If you change it to ‘DVD’, you’ll get back a list of DVDs for the keyword instead. There is a list of categories available here: Product Categories. Feel free to change it! The value ‘1’ for ‘ItemPage’ corresponds to the first page of the returned product data, which contains only 10 products. Change it to 2 to get the next 10 products for the keyword, 3 for the 10 after that, and so on up to the total number of pages (‘TotalPages’ in the returned XML).
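Here is a small sketch of how ‘SearchIndex’ and ‘ItemPage’ could be made configurable instead of hard-coded. The helpers ‘search_params’ and ‘page_range’ are hypothetical. The dictionary leaves out the access key, associate tag and version, which would stay as they are in ‘buildurl’. The page arithmetic assumes 10 results per page, as described above.

```python
def search_params(term, page=1, search_index="Books"):
    """Hypothetical helper: the ItemSearch parameters from buildurl,
    with ItemPage and SearchIndex passed in instead of hard-coded."""
    if page < 1:
        raise ValueError("ItemPage starts at 1")
    return {
        'Operation': "ItemSearch",
        'Service': "AWSECommerceService",
        'SearchIndex': search_index,   # e.g. "Books" or "DVD"
        'ItemPage': str(page),         # page 1, 2, 3, ... of the results
        'ResponseGroup': "Medium",
        'Keywords': term,
    }


def page_range(page, per_page=10):
    """Which results (1-based, inclusive) a given ItemPage covers."""
    first = (page - 1) * per_page + 1
    return first, page * per_page


params = search_params("python", page=2, search_index="DVD")
first, last = page_range(3)   # page 3 covers results 21 to 30
```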

I recommend you read the Amazon Product Advertising API docs, and see what else can be done to improve the product search. Here’s the link for it: Amazon Product Advertising API.

Getting back to the code, we add a timestamp to the ‘url_params’ dictionary. Amazon requires every request to be signed with the current timestamp. Next, the keys of the ‘url_params’ dictionary are sorted alphabetically; if they are not sorted, the signature is incorrect. When building ‘url_string’, we replace the ‘+’ symbols that ‘urlencode’ produces for spaces with ‘%20’, because Amazon expects spaces encoded that way. The string to sign consists of the HTTP method, the host, the path and ‘url_string’, each on its own line. An HMAC-SHA256 message digest of this string is then computed with your Amazon Secret Access Key. This digest is the signature, and we Base64 encode it. The Base64 encoded signature is made URL safe by replacing ‘+’ and ‘=’ with their percent-encoded equivalents, using the ‘quote’ function. The signature is then appended to ‘url_string’. ‘buildurl’ returns the complete signed URL to the ‘post’ function of the ‘Fetch’ class, where we called ‘buildurl’, and we assign it to the variable ‘url’.
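The signing steps just described can be sketched as one standalone function. This is a Python 3 sketch, not the tutorial’s App Engine code. ‘sign_query’ is a hypothetical name, the host and path are the ones used in ‘buildurl’, and the secret key and timestamp in the example are dummies.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode

HOST = "ecs.amazonaws.com"
PATH = "/onca/xml"


def sign_query(params, secret_key):
    """Return the sorted, encoded query string with &Signature=... appended."""
    # 1. Sort the parameters by key.
    pairs = sorted(params.items())
    # 2. Percent-encode. urlencode writes spaces as '+', but the signing
    #    rules want '%20' (on Python 3.7+ '~' is already left unencoded).
    query = urlencode(pairs).replace('+', '%20')
    # 3. String to sign: method, host, path and query, one per line.
    string_to_sign = "\n".join(["GET", HOST, PATH, query])
    # 4. HMAC-SHA256 with the secret key, then Base64.
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # 5. Percent-encode '+', '/' and '=' so the signature is URL safe.
    return query + "&Signature=" + quote(signature, safe="")


example = sign_query({'Operation': 'ItemSearch',
                      'Keywords': 'harry potter',
                      'Timestamp': '2010-02-23T12:00:00Z'},
                     'dummy-secret')
print("http://%s%s?%s" % (HOST, PATH, example))
```

Sorting and encoding happen before signing because Amazon recomputes the signature over the same canonical string. Any difference in order or encoding produces a signature mismatch.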

You can use a trick here to check whether the URL is signed properly and gets us what we want. In the ‘post’ function of the ‘Fetch’ class, right after the call to ‘buildurl’, output the URL instead of processing it: ‘self.response.out.write(url)’ followed by ‘return’. (‘ConstructUrl’ is not a request handler, so it has no ‘self.response’ of its own.) Run the app and submit a keyword, and you should see the URL displayed on the page. Copy the URL and paste it into your browser’s address bar. If the signature was formed correctly, you will get back an XML document with the books data for the entered keyword. If not, you will get an XML document containing an error message that tells you what was wrong with the request URL. When you get the XML for the books, go through it to see how it is structured. This will give you an idea of how to parse it to get the values we want to display.
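If you save an error response, a few lines of Python 3 can list what went wrong. This sketch uses the standard library’s ‘xml.etree.ElementTree’. It assumes the details sit in ‘Error’ elements with ‘Code’ and ‘Message’ children, so check that against the XML you actually get back. ‘find_errors’ is a hypothetical name, and the sample document is made up for the demonstration. Namespaces are ignored by matching on the local tag name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # '{namespace}Error' -> 'Error'
    return tag.rsplit('}', 1)[-1]


def find_errors(xml_text):
    """Return (code, message) pairs for every <Error> element found."""
    root = ET.fromstring(xml_text)
    errors = []
    for elem in root.iter():
        if local_name(elem.tag) != 'Error':
            continue
        fields = {local_name(child.tag): (child.text or '').strip()
                  for child in elem}
        errors.append((fields.get('Code'), fields.get('Message')))
    return errors


# Made-up sample in the assumed shape, not a captured Amazon response.
sample_error = """<ItemSearchErrorResponse xmlns="http://example.com/hypothetical-ns">
  <Error>
    <Code>SignatureDoesNotMatch</Code>
    <Message>The request signature does not match.</Message>
  </Error>
</ItemSearchErrorResponse>"""

for code, message in find_errors(sample_error):
    print("%s: %s" % (code, message))
```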

Next we will write the ‘processurl’ function. It will take the request URL, fire it, parse the XML document and return the values as a list, which we’ll store in ‘books’. But before we add ‘processurl’, create a new Python file and add the following code to it:

import re
import xml.sax.handler

def xml2obj(src):
    """
    A simple function to convert XML data into a native Python object.
    """

    non_id_char = re.compile('[^_0-9a-zA-Z]')
    def _name_mangle(name):
        return non_id_char.sub('_', name)

    class DataNode(object):
        def __init__(self):
            self._attrs = {}    # XML attributes and child elements
            self.data = None    # child text data
        def __len__(self):
            # treat single element as a list of 1
            return 1
        def __getitem__(self, key):
            if isinstance(key, basestring):
                return self._attrs.get(key,None)
            else:
                return [self][key]
        def __contains__(self, name):
            return self._attrs.has_key(name)
        def __nonzero__(self):
            return bool(self._attrs or self.data)
        def __getattr__(self, name):
            if name.startswith('__'):
                # let lookups of Python special methods (e.g. __deepcopy__)
                # fail normally instead of returning None
                raise AttributeError(name)
            return self._attrs.get(name,None)
        def _add_xml_attr(self, name, value):
            if name in self._attrs:
                # multiple attribute of the same name are represented by a list
                children = self._attrs[name]
                if not isinstance(children, list):
                    children = [children]
                    self._attrs[name] = children
                children.append(value)
            else:
                self._attrs[name] = value
        def __str__(self):
            return self.data or ''
        def __repr__(self):
            items = sorted(self._attrs.items())
            if self.data:
                items.append(('data', self.data))
            return u'{%s}' % ', '.join([u'%s:%s' % (k,repr(v)) for k,v in items])

    class TreeBuilder(xml.sax.handler.ContentHandler):
        def __init__(self):
            self.stack = []
            self.root = DataNode()
            self.current = self.root
            self.text_parts = []
        def startElement(self, name, attrs):
            self.stack.append((self.current, self.text_parts))
            self.current = DataNode()
            self.text_parts = []
            # xml attributes --> python attributes
            for k, v in attrs.items():
                self.current._add_xml_attr(_name_mangle(k), v)
        def endElement(self, name):
            text = ''.join(self.text_parts).strip()
            if text:
                self.current.data = text
            if self.current._attrs:
                obj = self.current
            else:
                # a text only node is simply represented by the string
                obj = text or ''
            self.current, self.text_parts = self.stack.pop()
            self.current._add_xml_attr(_name_mangle(name), obj)
        def characters(self, content):
            self.text_parts.append(content)

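    builder = TreeBuilder()
    # xml.sax.parse accepts either a system ID (a file name or a URL, such
    # as our signed request URL, which it opens and downloads) or a file
    # object, so it covers both cases
    xml.sax.parse(src, builder)
    return builder.root._attrs.values()[0]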

Save this file as ‘xmltoobj.py’ in the same folder as our app (the name stands for XML to Object). This is an awesome library written by Wai Yip Tung and posted here: XML to Python Data Structure. It converts the XML document into a Python object whose child elements you can read with ‘dot’ notation and whose repeated elements you can iterate over. Parsing an XML document by hand is tricky and can take a lot of code even for the simplest value, but this library makes it a breeze! I literally spent 3 days struggling to get the values out of the books XML, and this library finally saved my life!
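‘xml2obj’ is built on SAX. The parser streams through the document and calls ‘startElement’, ‘characters’ and ‘endElement’ on a handler, and ‘TreeBuilder’ uses those callbacks to build its ‘DataNode’ tree. Below is a much smaller handler written in the same style. It collects the text of every ‘Title’ element, which shows the mechanism without the tree building. The class name ‘TitleCollector’ and the sample XML are hypothetical. The snippet runs as plain Python.

```python
import xml.sax
import xml.sax.handler


class TitleCollector(xml.sax.handler.ContentHandler):
    """Collects the text content of every <Title> element."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.titles = []
        self._in_title = False
        self._parts = []

    def startElement(self, name, attrs):
        if name == 'Title':
            self._in_title = True
            self._parts = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it
        if self._in_title:
            self._parts.append(content)

    def endElement(self, name):
        if name == 'Title':
            self.titles.append(''.join(self._parts).strip())
            self._in_title = False


sample_xml = b"""<Items>
  <Item><ItemAttributes><Title>First Book</Title></ItemAttributes></Item>
  <Item><ItemAttributes><Title>Second Book</Title></ItemAttributes></Item>
</Items>"""

handler = TitleCollector()
xml.sax.parseString(sample_xml, handler)
print(handler.titles)
```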

We need to import the above module in our code. So modify the ‘amazon.py’ file and add the following snippets to it:

# Add this below all the imports
from xmltoobj import xml2obj

# Add this to the ConstructUrl class

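    def processurl(self, url):
        # Fire the signed request and turn the XML response into objects
        data = xml2obj(url)
        books = []
        # xml2obj lets a single <Item> behave like a list of length 1,
        # so this loop works for one result or many (and none at all)
        items = data.Items.Item or []
        for i in range(len(items)):
            book = items[i]
            values = {}   # a fresh dictionary for every book
            values['asin'] = book.ASIN
            # Some items have no image, so guard against a missing node
            image = book.MediumImage
            values['image'] = image.URL if image else None
            values['title'] = book.ItemAttributes.Title
            values['author'] = book.ItemAttributes.Author
            values['details'] = book.DetailPageURL
            books.append(values)
        return books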

We’re importing the ‘xml2obj’ function from our ‘xmltoobj’ module. Given the request URL, ‘xml2obj’ fires the request, gets the XML back and parses it. We call it in our ‘processurl’ function. The parameters of ‘processurl’ are ‘self’ and ‘url’, the signed request. As I mentioned above, you declare ‘self’ when defining the function even though you don’t pass it in the call. We pass the URL on to ‘xml2obj’ and store the returned object in the variable ‘data’. For the app’s sake, I’m extracting only a few values for every book: the ‘ASIN’ (Amazon Standard Identification Number, similar to an ISBN), the ‘Title’, a details link, the ‘Author’s and a medium-sized image. Notice how we access the values in the returned data with dot notation. We create an empty list, ‘books’, which will hold all the books. Then we loop through the ‘Item’ nodes inside the ‘Items’ node of the parsed XML (each ‘Item’ node represents a single book #). For each one, we store the book’s properties in a ‘values’ dictionary and append that dictionary to the ‘books’ list. Because ‘values = {}’ creates a fresh dictionary on every pass through the loop, each appended dictionary keeps its own book’s data.
We return the ‘books’ list back to the ‘post’ function in the ‘Fetch’ class.
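For comparison only (this is not what the tutorial uses), the same fields can be pulled out with the standard library’s ElementTree. This is a Python 3 sketch. ‘parse_books’ and the sample document are hypothetical, and the element names follow the ones used in ‘processurl’. Unlike ‘xml2obj’, it always returns the authors as a list and does not fetch the URL itself.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    # Drop '{namespace}' prefixes so plain paths like 'Items/Item' work
    for elem in root.iter():
        elem.tag = elem.tag.rsplit('}', 1)[-1]
    return root


def parse_books(xml_text):
    """Return one dict per <Item>, with the keys processurl uses."""
    root = strip_namespaces(ET.fromstring(xml_text))
    books = []
    for item in root.findall('Items/Item'):
        books.append({
            'asin': item.findtext('ASIN'),
            'image': item.findtext('MediumImage/URL'),  # None if absent
            'title': item.findtext('ItemAttributes/Title'),
            # Several <Author> elements become a list
            'author': [a.text for a in item.findall('ItemAttributes/Author')],
            'details': item.findtext('DetailPageURL'),
        })
    return books


# Made-up sample, shaped like the fields processurl reads
sample_response = """<ItemSearchResponse xmlns="http://example.com/hypothetical-ns">
  <Items>
    <Item>
      <ASIN>0000000001</ASIN>
      <DetailPageURL>http://www.example.com/book-1</DetailPageURL>
      <ItemAttributes>
        <Author>Author One</Author>
        <Author>Author Two</Author>
        <Title>A Made-up Title</Title>
      </ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>"""

for book in parse_books(sample_response):
    print(book['title'], book['author'], book['image'])
```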

# Note: You can examine the XML document with the trick mentioned above to work out the node structure and what you want to extract. A single request returns only 10 books. You can get the next 10 books by setting ‘ItemPage’ to 2 in the ‘url_params’ dictionary, and 3, 4, 5 and so on will get you the corresponding pages. Python has no ‘do-while’ construct, but a ‘while’ loop can increment the ‘ItemPage’ value until it reaches ‘TotalPages’, the total number of pages for that particular search.
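A common Python stand-in for do-while is ‘while True’ with a ‘break’ at the end of the body. The sketch below uses that idiom to walk the pages. ‘fetch_page’ is a hypothetical callable that would build, sign and fire the request for one ‘ItemPage’ and return that page’s books together with ‘TotalPages’. Here it is replaced by a fake that returns made-up data. The optional ‘max_pages’ cap is also an addition, not part of the tutorial.

```python
def fetch_all_pages(fetch_page, max_pages=None):
    """Collect books from ItemPage 1 up to TotalPages.

    fetch_page(page) is a hypothetical callable returning
    (books_on_that_page, total_pages). max_pages optionally caps
    the number of requests made.
    """
    books = []
    page = 1
    while True:  # do-while style: the body always runs at least once
        page_books, total_pages = fetch_page(page)
        books.extend(page_books)
        if page >= total_pages or (max_pages and page >= max_pages):
            break
        page += 1
    return books


def fake_fetch(page):
    # Stand-in for a real request: 3 pages of 10 made-up entries each
    return ["asin-%d-%d" % (page, n) for n in range(10)], 3


all_books = fetch_all_pages(fake_fetch)
```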

So we have the books data properly parsed and arranged in a list object, ready to be displayed. We pass the ‘books’ list object to the ‘template_values’ dictionary in the ‘post’ function of the ‘Fetch’ class, which will render the results page. Hope you got everything right, and have the books data as I have it at the end of this part!

In the next part, we’ll see how to loop through the books list and display the books in a well formatted table. We’ll also see some very important things such as how to register a new template tag. Read on!
