{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Steam Data Collection\n",
    "\n",
    "*This forms part of a larger series of posts for my [blog](http://nik-davis.github.io) on downloading, processing and analysing data from the Steam store. [See all posts here](http://nik-davis.github.io/tag/steam).*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/json": {
       "Software versions": [
        {
         "module": "Python",
         "version": "3.7.3 64bit [MSC v.1900 64 bit (AMD64)]"
        },
        {
         "module": "IPython",
         "version": "7.5.0"
        },
        {
         "module": "OS",
         "version": "Windows 10 10.0.17763 SP0"
        },
        {
         "module": "numpy",
         "version": "1.16.3"
        },
        {
         "module": "pandas",
         "version": "0.24.2"
        }
       ]
      },
      "text/html": [
       "<table><tr><th>Software</th><th>Version</th></tr><tr><td>Python</td><td>3.7.3 64bit [MSC v.1900 64 bit (AMD64)]</td></tr><tr><td>IPython</td><td>7.5.0</td></tr><tr><td>OS</td><td>Windows 10 10.0.17763 SP0</td></tr><tr><td>numpy</td><td>1.16.3</td></tr><tr><td>pandas</td><td>0.24.2</td></tr><tr><td colspan='2'>Thu May 30 12:21:17 2019 GMT Summer Time</td></tr></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{|l|l|}\\hline\n",
       "{\\bf Software} & {\\bf Version} \\\\ \\hline\\hline\n",
       "Python & 3.7.3 64bit [MSC v.1900 64 bit (AMD64)] \\\\ \\hline\n",
       "IPython & 7.5.0 \\\\ \\hline\n",
       "OS & Windows 10 10.0.17763 SP0 \\\\ \\hline\n",
       "numpy & 1.16.3 \\\\ \\hline\n",
       "pandas & 0.24.2 \\\\ \\hline\n",
       "\\hline \\multicolumn{2}{|l|}{Thu May 30 12:21:17 2019 GMT Summer Time} \\\\ \\hline\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "Software versions\n",
       "Python 3.7.3 64bit [MSC v.1900 64 bit (AMD64)]\n",
       "IPython 7.5.0\n",
       "OS Windows 10 10.0.17763 SP0\n",
       "numpy 1.16.3\n",
       "pandas 0.24.2\n",
       "Thu May 30 12:21:17 2019 GMT Summer Time"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# view software version information\n",
    "\n",
    "# http://raw.github.com/jrjohansson/version_information/master/version_information.py\n",
    "%load_ext version_information\n",
    "# %reload_ext version_information\n",
    "\n",
    "%version_information numpy, pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![steam_logo](https://nik-davis.github.io/images/steam_logo_white.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first time I used Steam was with an account I can't remember the name of, playing a game which could well have been [Dark Messiah of Might & Magic](https://store.steampowered.com/app/2100/Dark_Messiah_of_Might__Magic/), on a computer that would easily be blown away by the processing power of my phone today.\n",
    "\n",
    "It was sometime in 2006, when every game I bought came on a disk in a box that now gathers dust somewhere in my parents' house. Surprisingly my current PC has a disk drive, though I can't remember the last time used it.\n",
    "\n",
    "In order to play the multiplayer component of the game, annoyingly I had to install a piece of third-party software I had never heard of called Steam, creating the account I have long since lost in the process. Here's what that software probably looked like at the time:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p class=\"source\" style=\"text-align: center\"><img src=\"https://nik-davis.github.io/images/steam_store_2006.jpg\" width=\"700\">Source: <a href=\"https://gfycat.com/@pcgevan\">pcgevan</a> via <a href=https://www.pcgamer.com/uk/steam-versions/>PC Gamer</a></p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fast-forward a little over 10 years and the <a href=\"https://en.wikipedia.org/wiki/Steam_(software)\">Steam Store</a> is huge, ubiquitous as the home of PC gaming and distribution. Whilst physical copies still just about feel at home on consoles, the PC market has [long since moved digital](https://www.pcgamer.com/uk/analyst-says-digital-sales-made-up-92-percent-of-pc-game-market-in-2013/). In case you are not familiar, Steam is a digital store for purchasing, downloading and playing video games. It hosts a variety of community features, allows pushing game updates to users automatically, and gathers news stories relevant to each title. It's a bit like [Google's Play Store](https://play.google.com) or [Apple's App Store](https://www.apple.com/uk/ios/app-store/) for phones.\n",
    "\n",
    "A large part of Steam's success as a platform is due to its use of frequent sales, convenience as a unified digital game library, and the aforementioned shift to digital over physical. Whilst other platforms are emerging and gaining traction, there is likely no better resource for examining gaming over the last decade. With that in mind, if we can construct a dataset from Steam's data, we will have access to a wealth of information about [nearly 30,000 games](https://store.steampowered.com/about/) released since 2003, when Steam first launched."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Project Goals\n",
    "\n",
    "<!-- PELICAN_BEGIN_SUMMARY -->\n",
    "\n",
    "The motivation for this project is to download, process and analyse a data set of Steam apps (games) from the Steam store, and gain insights into what makes a game more successful in terms of sales, play-time and ratings. We will imagine that we have been approached by a company hoping to develop and release a new title, using the findings we provide them to inform decisions about how best to manage their budget and hopefully increase the success of their next release.\n",
    "\n",
    "The first step will be tackling data collection - the actual retrieval of data from Steam's servers and databases. In the future we'll look at cleaning the data, transforming it into a more useful state, then on to data exploration and analysis. Finally we'll summarise our findings in a non-technical report which would be sent to the fictional company in question.\n",
    "\n",
    "<!-- PELICAN_END_SUMMARY -->\n",
    "\n",
    "At the end of the data collection and cleaning stages, we'd like to end up with a table or database like this:\n",
    "\n",
    "name | id | information | owners | price | rating\n",
    "--- | --- | --- | --- | --- | ---\n",
    "awesome game | 100 | genres, descriptors, variables | 100,000 | £9.99 | 9/10\n",
    "generic shooter 4 | 200 | definitely the best shooter | 50,000 | £39.99 | 6/10\n",
    "... | ... | ... | ... | ... | ...\n",
    "\n",
    "We can then interrogate the data, and investigate whether particular attributes tend to result in more successful games. Metrics like ownership and ratings should help define the success of a title.\n",
    "\n",
    "## Data Acquisition\n",
    "\n",
    "There are a number of ways to get this information. Obviously we could search the web (and especially [kaggle](https://www.kaggle.com/datasets)) for existing datasets, however to avoid letting someone else get away with all that hard work (and mainly for the purposes of learning) we'll be acquiring all the data ourselves from scratch.\n",
    "\n",
    "Often when generating data the best place to start is to check for APIs. Fortunately Valve (the company behind Steam) make one available at https://partner.steamgames.com/. An [API](https://en.wikipedia.org/wiki/Application_programming_interface) such as this allows anyone to interface with data on a website in a controlled way, usually providing a host of useful features to the end-user. Typically an API is a great way for developers to allow access to databases and information on a server. Unfortunately this documentation doesn't include all access points, but others have documented this for us. [This](https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI) documentation of the StorefrontAPI will be particularly useful.\n",
    "\n",
    "We'll be able to get good information about the details of each game from the Steam API, however we're still missing information about popularity and sales. Luckily we can easily get this data from another website, SteamSpy.\n",
    "\n",
    "[SteamSpy](https://steamspy.com/about) is a Steam stats-gathering service and crucially has data easily available through its own API (documentation [here](https://steamspy.com/api.php)). It provides a number of useful metrics including an estimation for total owners of each game.\n",
    "\n",
    "We'll be retrieving data from both APIs and combining them to form our dataset. For the purposes of this project, we'll be performing as little data cleaning as possible at this stage, providing 'dirty' data for data cleaning, the next step in this project.\n",
    "\n",
    "## Section outline:\n",
    "\n",
    "- Create an app list from SteamSpy API using 'all' request\n",
    "- Retrieve individual app data from Steam API, by iterating through app list\n",
    "- Retrieve individual app data from SteamSpy API, by iterating through app list\n",
    "- Export app list, Steam data and SteamSpy data to csv files\n",
    "\n",
    "## API references:\n",
    "\n",
    "- https://partner.steamgames.com/doc/webapi\n",
    "- https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI\n",
    "- https://steamapi.xpaw.me/#\n",
    "- https://steamspy.com/api.php\n",
    "\n",
    "\n",
    "## Import Libraries\n",
    "\n",
    "We begin by importing the libraries we will be using. We start with [standard library imports](https://docs.python.org/3/library/), or those available by default in Python, then import the third-party packages. We'll be using [requests](https://2.python-requests.org/en/master/) to handle interacting with the APIs, then the popular [pandas](http://pandas.pydata.org/pandas-docs/stable/) and [numpy](https://docs.scipy.org/doc/numpy/index.html) libraries for handling the downloaded data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# standard library imports\n",
    "import csv\n",
    "import datetime as dt\n",
    "import json\n",
    "import os\n",
    "import statistics\n",
    "import time\n",
    "\n",
    "# third-party imports\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import requests\n",
    "\n",
    "# customisations - ensure tables show all columns\n",
    "pd.set_option(\"max_columns\", 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we define a general, all-purpose function to process get requests from an API, supplied through a URL parameter. A dictionary of parameters can be supplied which is passed into the get request automatically, depending on the requirements of the API.\n",
    "\n",
    "Rather than simply returning the response, we handle a couple of scenarios to help automation. Occasionally we encounter an SSL Error, in which case we simply wait a few seconds then try again (by recursively calling the function). When this happens, and generally throughout this project, we provide quite verbose feedback to show when these errors are encountered and how they are handled.\n",
    "\n",
    "Sometimes there is no response when a request is made (returns None). This usually happens when too many requests are made in a short period of time, and the polling limit has been reached. We try to avoid this by pausing briefly between requests, as we'll see later, but in case we breach the polling limit we wait 10 seconds then try again.\n",
    "\n",
    "Handling these errors in this way ensures that our function almost always returns the desired response, which we return in json format to make processing easier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_request(url, parameters=None):\n",
    "    \"\"\"Return json-formatted response of a get request using optional parameters.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    url : string\n",
    "    parameters : {'parameter': 'value'}\n",
    "        parameters to pass as part of get request\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    json_data\n",
    "        json-formatted response (dict-like)\n",
    "    \"\"\"\n",
    "    try:\n",
    "        response = requests.get(url=url, params=parameters)\n",
    "    except SSLError as s:\n",
    "        print('SSL Error:', s)\n",
    "        \n",
    "        for i in range(5, 0, -1):\n",
    "            print('\\rWaiting... ({})'.format(i), end='')\n",
    "            time.sleep(1)\n",
    "        print('\\rRetrying.' + ' '*10)\n",
    "        \n",
    "        # recusively try again\n",
    "        return get_request(url, parameters)\n",
    "    \n",
    "    if response:\n",
    "        return response.json()\n",
    "    else:\n",
    "        # response is none usually means too many requests. Wait and try again \n",
    "        print('No response, waiting 10 seconds...')\n",
    "        time.sleep(10)\n",
    "        print('Retrying.')\n",
    "        return get_request(url, parameters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate List of App IDs\n",
    "\n",
    "Every app on the steam store has a unique app ID. Whilst different apps can have the same name, they can't have the same ID. This will be very useful to us for identifying apps and eventually merging our tables of data.\n",
    "\n",
    "Before we get to that, we need to generate a list of app ids which we can use to build our data sets. It's possible to generate one from the Steam API, however this has over 70,000 entries, many of which are demos and videos with no way to tell them apart. Instead, SteamSpy provides an 'all' request, supplying some information about the apps they track. It doesn't supply all information about each app, so we still need to request this information individually, but it provides a good starting point.\n",
    "\n",
    "Because many of the return fields are strings containing commas and other punctuation, it is easiest to read the response into a pandas dataframe, and export the required appid and name fields to a csv. We could keep only the appid column as a list or pandas series, but it may be useful to keep the app name at this stage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>appid</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10</td>\n",
       "      <td>Counter-Strike</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20</td>\n",
       "      <td>Team Fortress Classic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>30</td>\n",
       "      <td>Day of Defeat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>40</td>\n",
       "      <td>Deathmatch Classic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>50</td>\n",
       "      <td>Half-Life: Opposing Force</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   appid                       name\n",
       "0     10             Counter-Strike\n",
       "1     20      Team Fortress Classic\n",
       "2     30              Day of Defeat\n",
       "3     40         Deathmatch Classic\n",
       "4     50  Half-Life: Opposing Force"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url = \"https://steamspy.com/api.php\"\n",
    "parameters = {\"request\": \"all\"}\n",
    "\n",
    "# request 'all' from steam spy and parse into dataframe\n",
    "json_data = get_request(url, parameters=parameters)\n",
    "steam_spy_all = pd.DataFrame.from_dict(json_data, orient='index')\n",
    "\n",
    "# generate sorted app_list from steamspy data\n",
    "app_list = steam_spy_all[['appid', 'name']].sort_values('appid').reset_index(drop=True)\n",
    "\n",
    "# export disabled to keep consistency across download sessions\n",
    "# app_list.to_csv('../data/download/app_list.csv', index=False)\n",
    "\n",
    "# instead read from stored csv\n",
    "app_list = pd.read_csv('../data/download/app_list.csv')\n",
    "\n",
    "# display first few rows\n",
    "app_list.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Download Logic\n",
    "\n",
    "Now we have the `app_list` dataframe, we can iterate over the app IDs and request individual app data from the servers. Here we set out our logic to retrieve and process this information, then finally store the data as a csv file.\n",
    "\n",
    "Because it takes a long time to retrieve the data, it would be dangerous to attempt it all in one go as any errors or connection time-outs could cause the loss of all our data. For this reason we define a function to download and process the requests in batches, appending each batch to an external file and keeping track of the highest index written in a separate file.\n",
    "\n",
    "This not only provides security, allowing us to easily restart the process if an error is encountered, but also means we can complete the download across multiple sessions.\n",
    "\n",
    "Again, we provide verbose output for rows exported, batches complete, time taken and estimated time remaining."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "def get_app_data(start, stop, parser, pause):\n",
    "    \"\"\"Return list of app data generated from parser.\n",
    "    \n",
    "    parser : function to handle request\n",
    "    \"\"\"\n",
    "    app_data = []\n",
    "    \n",
    "    # iterate through each row of app_list, confined by start and stop\n",
    "    for index, row in app_list[start:stop].iterrows():\n",
    "        print('Current index: {}'.format(index), end='\\r')\n",
    "        \n",
    "        appid = row['appid']\n",
    "        name = row['name']\n",
    "\n",
    "        # retrive app data for a row, handled by supplied parser, and append to list\n",
    "        data = parser(appid, name)\n",
    "        app_data.append(data)\n",
    "\n",
    "        time.sleep(pause) # prevent overloading api with requests\n",
    "    \n",
    "    return app_data\n",
    "\n",
    "\n",
    "def process_batches(parser, app_list, download_path, data_filename, index_filename,\n",
    "                    columns, begin=0, end=-1, batchsize=100, pause=1):\n",
    "    \"\"\"Process app data in batches, writing directly to file.\n",
    "    \n",
    "    parser : custom function to format request\n",
    "    app_list : dataframe of appid and name\n",
    "    download_path : path to store data\n",
    "    data_filename : filename to save app data\n",
    "    index_filename : filename to store highest index written\n",
    "    columns : column names for file\n",
    "    \n",
    "    Keyword arguments:\n",
    "    \n",
    "    begin : starting index (get from index_filename, default 0)\n",
    "    end : index to finish (defaults to end of app_list)\n",
    "    batchsize : number of apps to write in each batch (default 100)\n",
    "    pause : time to wait after each api request (defualt 1)\n",
    "    \n",
    "    returns: none\n",
    "    \"\"\"\n",
    "    print('Starting at index {}:\\n'.format(begin))\n",
    "    \n",
    "    # by default, process all apps in app_list\n",
    "    if end == -1:\n",
    "        end = len(app_list) + 1\n",
    "    \n",
    "    # generate array of batch begin and end points\n",
    "    batches = np.arange(begin, end, batchsize)\n",
    "    batches = np.append(batches, end)\n",
    "    \n",
    "    apps_written = 0\n",
    "    batch_times = []\n",
    "    \n",
    "    for i in range(len(batches) - 1):\n",
    "        start_time = time.time()\n",
    "        \n",
    "        start = batches[i]\n",
    "        stop = batches[i+1]\n",
    "        \n",
    "        app_data = get_app_data(start, stop, parser, pause)\n",
    "        \n",
    "        rel_path = os.path.join(download_path, data_filename)\n",
    "        \n",
    "        # writing app data to file\n",
    "        with open(rel_path, 'a', newline='', encoding='utf-8') as f:\n",
    "            writer = csv.DictWriter(f, fieldnames=columns, extrasaction='ignore')\n",
    "            \n",
    "            for j in range(3,0,-1):\n",
    "                print(\"\\rAbout to write data, don't stop script! ({})\".format(j), end='')\n",
    "                time.sleep(0.5)\n",
    "            \n",
    "            writer.writerows(app_data)\n",
    "            print('\\rExported lines {}-{} to {}.'.format(start, stop-1, data_filename), end=' ')\n",
    "            \n",
    "        apps_written += len(app_data)\n",
    "        \n",
    "        idx_path = os.path.join(download_path, index_filename)\n",
    "        \n",
    "        # writing last index to file\n",
    "        with open(idx_path, 'w') as f:\n",
    "            index = stop\n",
    "            print(index, file=f)\n",
    "            \n",
    "        # logging time taken\n",
    "        end_time = time.time()\n",
    "        time_taken = end_time - start_time\n",
    "        \n",
    "        batch_times.append(time_taken)\n",
    "        mean_time = statistics.mean(batch_times)\n",
    "        \n",
    "        est_remaining = (len(batches) - i - 2) * mean_time\n",
    "        \n",
    "        remaining_td = dt.timedelta(seconds=round(est_remaining))\n",
    "        time_td = dt.timedelta(seconds=round(time_taken))\n",
    "        mean_td = dt.timedelta(seconds=round(mean_time))\n",
    "        \n",
    "        print('Batch {} time: {} (avg: {}, remaining: {})'.format(i, time_td, mean_td, remaining_td))\n",
    "            \n",
    "    print('\\nProcessing batches complete. {} apps written'.format(apps_written))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we define some functions to handle and prepare the external files.\n",
    "\n",
    "We use `reset_index` for testing and demonstration, allowing us to easily reset the index in the stored file to 0, effectively restarting the entire download process.\n",
    "\n",
    "We define `get_index` to retrieve the index from file, maintaining persistence across sessions. Every time a batch of information (app data) is written to file, we write the highest index within `app_data` that was retrieved. As stated, this is partially for security, ensuring that if there is an error during the download we can read the index from file and continue from the end of the last successful batch. Keeping track of the index also allows us to pause the download, continuing at a later time.\n",
    "\n",
    "Finally, the `prepare_data_file` function readies the csv for storing the data. If the index we retrieved is 0, it means we are either starting for the first time or starting over. In either case, we want a blank csv file with only the header row to begin writing to, se we wipe the file (by opening in write mode) and write the header. Conversely, if the index is anything other than 0, it means we already have downloaded information, and can leave the csv file alone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def reset_index(download_path, index_filename):\n",
    "    \"\"\"Reset index in file to 0.\"\"\"\n",
    "    rel_path = os.path.join(download_path, index_filename)\n",
    "    \n",
    "    with open(rel_path, 'w') as f:\n",
    "        print(0, file=f)\n",
    "        \n",
    "\n",
    "def get_index(download_path, index_filename):\n",
    "    \"\"\"Retrieve index from file, returning 0 if file not found.\"\"\"\n",
    "    try:\n",
    "        rel_path = os.path.join(download_path, index_filename)\n",
    "\n",
    "        with open(rel_path, 'r') as f:\n",
    "            index = int(f.readline())\n",
    "    \n",
    "    except FileNotFoundError:\n",
    "        index = 0\n",
    "        \n",
    "    return index\n",
    "\n",
    "\n",
    "def prepare_data_file(download_path, filename, index, columns):\n",
    "    \"\"\"Create file and write headers if index is 0.\"\"\"\n",
    "    if index == 0:\n",
    "        rel_path = os.path.join(download_path, filename)\n",
    "\n",
    "        with open(rel_path, 'w', newline='') as f:\n",
    "            writer = csv.DictWriter(f, fieldnames=columns)\n",
    "            writer.writeheader()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download Steam Data\n",
    "\n",
    "Now we are ready to start downloading data and writing to file. We define our logic particular to handling the steam API - in fact if no data is returned we return just the name and appid - then begin setting some parameters. We define the files we will write our data and index to, and the columns for the csv file. The API doesn't return every column for every app, so it is best to explicitly set these.\n",
    "\n",
    "Next we run our functions to set up the files, and make a call to `process_batches` to begin the process. Some additional parameters have been added for demonstration, to constrain the download to just a few rows and smaller batches. Removing these would allow the entire download process to be repeated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Starting at index 0:\n",
      "\n",
      "Exported lines 0-4 to steam_app_data.csv. Batch 0 time: 0:00:10 (avg: 0:00:10, remaining: 0:00:10)\n",
      "Exported lines 5-9 to steam_app_data.csv. Batch 1 time: 0:00:10 (avg: 0:00:10, remaining: 0:00:00)\n",
      "\n",
      "Processing batches complete. 10 apps written\n"
     ]
    }
   ],
   "source": [
    "def parse_steam_request(appid, name):\n",
    "    \"\"\"Unique parser to handle data from Steam Store API.\n",
    "    \n",
    "    Returns : json formatted data (dict-like)\n",
    "    \"\"\"\n",
    "    url = \"http://store.steampowered.com/api/appdetails/\"\n",
    "    parameters = {\"appids\": appid}\n",
    "    \n",
    "    json_data = get_request(url, parameters=parameters)\n",
    "    json_app_data = json_data[str(appid)]\n",
    "    \n",
    "    if json_app_data['success']:\n",
    "        data = json_app_data['data']\n",
    "    else:\n",
    "        data = {'name': name, 'steam_appid': appid}\n",
    "        \n",
    "    return data\n",
    "\n",
    "\n",
    "# Set file parameters\n",
    "download_path = '../data/download'\n",
    "steam_app_data = 'steam_app_data.csv'\n",
    "steam_index = 'steam_index.txt'\n",
    "\n",
    "steam_columns = [\n",
    "    'type', 'name', 'steam_appid', 'required_age', 'is_free', 'controller_support',\n",
    "    'dlc', 'detailed_description', 'about_the_game', 'short_description', 'fullgame',\n",
    "    'supported_languages', 'header_image', 'website', 'pc_requirements', 'mac_requirements',\n",
    "    'linux_requirements', 'legal_notice', 'drm_notice', 'ext_user_account_notice',\n",
    "    'developers', 'publishers', 'demos', 'price_overview', 'packages', 'package_groups',\n",
    "    'platforms', 'metacritic', 'reviews', 'categories', 'genres', 'screenshots',\n",
    "    'movies', 'recommendations', 'achievements', 'release_date', 'support_info',\n",
    "    'background', 'content_descriptors'\n",
    "]\n",
    "\n",
    "# Overwrites last index for demonstration (would usually store highest index so can continue across sessions)\n",
    "reset_index(download_path, steam_index)\n",
    "\n",
    "# Retrieve last index downloaded from file\n",
    "index = get_index(download_path, steam_index)\n",
    "\n",
    "# Wipe or create data file and write headers if index is 0\n",
    "prepare_data_file(download_path, steam_app_data, index, steam_columns)\n",
    "\n",
    "# Set end and chunksize for demonstration - remove to run through entire app list\n",
    "process_batches(\n",
    "    parser=parse_steam_request,\n",
    "    app_list=app_list,\n",
    "    download_path=download_path,\n",
    "    data_filename=steam_app_data,\n",
    "    index_filename=steam_index,\n",
    "    columns=steam_columns,\n",
    "    begin=index,\n",
    "    end=10,\n",
    "    batchsize=5\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>name</th>\n",
       "      <th>steam_appid</th>\n",
       "      <th>required_age</th>\n",
       "      <th>is_free</th>\n",
       "      <th>controller_support</th>\n",
       "      <th>dlc</th>\n",
       "      <th>detailed_description</th>\n",
       "      <th>about_the_game</th>\n",
       "      <th>short_description</th>\n",
       "      <th>fullgame</th>\n",
       "      <th>supported_languages</th>\n",
       "      <th>header_image</th>\n",
       "      <th>website</th>\n",
       "      <th>pc_requirements</th>\n",
       "      <th>mac_requirements</th>\n",
       "      <th>linux_requirements</th>\n",
       "      <th>legal_notice</th>\n",
       "      <th>drm_notice</th>\n",
       "      <th>ext_user_account_notice</th>\n",
       "      <th>developers</th>\n",
       "      <th>publishers</th>\n",
       "      <th>demos</th>\n",
       "      <th>price_overview</th>\n",
       "      <th>packages</th>\n",
       "      <th>package_groups</th>\n",
       "      <th>platforms</th>\n",
       "      <th>metacritic</th>\n",
       "      <th>reviews</th>\n",
       "      <th>categories</th>\n",
       "      <th>genres</th>\n",
       "      <th>screenshots</th>\n",
       "      <th>movies</th>\n",
       "      <th>recommendations</th>\n",
       "      <th>achievements</th>\n",
       "      <th>release_date</th>\n",
       "      <th>support_info</th>\n",
       "      <th>background</th>\n",
       "      <th>content_descriptors</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>game</td>\n",
       "      <td>Counter-Strike</td>\n",
       "      <td>10</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Play the world's number 1 online action game. ...</td>\n",
       "      <td>Play the world's number 1 online action game. ...</td>\n",
       "      <td>Play the world's number 1 online action game. ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>English&lt;strong&gt;*&lt;/strong&gt;, French&lt;strong&gt;*&lt;/st...</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/10/...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'minimum': '\\r\\n\\t\\t\\t&lt;p&gt;&lt;strong&gt;Minimum:&lt;/st...</td>\n",
       "      <td>{'minimum': 'Minimum: OS X  Snow Leopard 10.6....</td>\n",
       "      <td>{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'currency': 'GBP', 'initial': 719, 'final': 7...</td>\n",
       "      <td>[7]</td>\n",
       "      <td>[{'name': 'default', 'title': 'Buy Counter-Str...</td>\n",
       "      <td>{'windows': True, 'mac': True, 'linux': True}</td>\n",
       "      <td>{'score': 88, 'url': 'https://www.metacritic.c...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'id': 1, 'description': 'Multi-player'}, {'i...</td>\n",
       "      <td>[{'id': '1', 'description': 'Action'}]</td>\n",
       "      <td>[{'id': 0, 'path_thumbnail': 'https://steamcdn...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'total': 66232}</td>\n",
       "      <td>{'total': 0}</td>\n",
       "      <td>{'coming_soon': False, 'date': '1 Nov, 2000'}</td>\n",
       "      <td>{'url': 'http://steamcommunity.com/app/10', 'e...</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/10/...</td>\n",
       "      <td>{'ids': [2, 5], 'notes': 'Includes intense vio...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>game</td>\n",
       "      <td>Team Fortress Classic</td>\n",
       "      <td>20</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>One of the most popular online action games of...</td>\n",
       "      <td>One of the most popular online action games of...</td>\n",
       "      <td>One of the most popular online action games of...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>English, French, German, Italian, Spanish - Sp...</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/20/...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'minimum': '\\r\\n\\t\\t\\t&lt;p&gt;&lt;strong&gt;Minimum:&lt;/st...</td>\n",
       "      <td>{'minimum': 'Minimum: OS X  Snow Leopard 10.6....</td>\n",
       "      <td>{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'currency': 'GBP', 'initial': 399, 'final': 3...</td>\n",
       "      <td>[29]</td>\n",
       "      <td>[{'name': 'default', 'title': 'Buy Team Fortre...</td>\n",
       "      <td>{'windows': True, 'mac': True, 'linux': True}</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'id': 1, 'description': 'Multi-player'}, {'i...</td>\n",
       "      <td>[{'id': '1', 'description': 'Action'}]</td>\n",
       "      <td>[{'id': 0, 'path_thumbnail': 'https://steamcdn...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'total': 2816}</td>\n",
       "      <td>{'total': 0}</td>\n",
       "      <td>{'coming_soon': False, 'date': '1 Apr, 1999'}</td>\n",
       "      <td>{'url': '', 'email': ''}</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/20/...</td>\n",
       "      <td>{'ids': [2, 5], 'notes': 'Includes intense vio...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>game</td>\n",
       "      <td>Day of Defeat</td>\n",
       "      <td>30</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Enlist in an intense brand of Axis vs. Allied ...</td>\n",
       "      <td>Enlist in an intense brand of Axis vs. Allied ...</td>\n",
       "      <td>Enlist in an intense brand of Axis vs. Allied ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>English, French, German, Italian, Spanish - Spain</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/30/...</td>\n",
       "      <td>http://www.dayofdefeat.com/</td>\n",
       "      <td>{'minimum': '\\r\\n\\t\\t\\t&lt;p&gt;&lt;strong&gt;Minimum:&lt;/st...</td>\n",
       "      <td>{'minimum': 'Minimum: OS X  Snow Leopard 10.6....</td>\n",
       "      <td>{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'currency': 'GBP', 'initial': 399, 'final': 3...</td>\n",
       "      <td>[30]</td>\n",
       "      <td>[{'name': 'default', 'title': 'Buy Day of Defe...</td>\n",
       "      <td>{'windows': True, 'mac': True, 'linux': True}</td>\n",
       "      <td>{'score': 79, 'url': 'https://www.metacritic.c...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'id': 1, 'description': 'Multi-player'}, {'i...</td>\n",
       "      <td>[{'id': '1', 'description': 'Action'}]</td>\n",
       "      <td>[{'id': 0, 'path_thumbnail': 'https://steamcdn...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'total': 2013}</td>\n",
       "      <td>{'total': 0}</td>\n",
       "      <td>{'coming_soon': False, 'date': '1 May, 2003'}</td>\n",
       "      <td>{'url': '', 'email': ''}</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/30/...</td>\n",
       "      <td>{'ids': [], 'notes': None}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>game</td>\n",
       "      <td>Deathmatch Classic</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Enjoy fast-paced multiplayer gaming with Death...</td>\n",
       "      <td>Enjoy fast-paced multiplayer gaming with Death...</td>\n",
       "      <td>Enjoy fast-paced multiplayer gaming with Death...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>English, French, German, Italian, Spanish - Sp...</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/40/...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'minimum': '\\r\\n\\t\\t\\t&lt;p&gt;&lt;strong&gt;Minimum:&lt;/st...</td>\n",
       "      <td>{'minimum': 'Minimum: OS X  Snow Leopard 10.6....</td>\n",
       "      <td>{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'currency': 'GBP', 'initial': 399, 'final': 3...</td>\n",
       "      <td>[31]</td>\n",
       "      <td>[{'name': 'default', 'title': 'Buy Deathmatch ...</td>\n",
       "      <td>{'windows': True, 'mac': True, 'linux': True}</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'id': 1, 'description': 'Multi-player'}, {'i...</td>\n",
       "      <td>[{'id': '1', 'description': 'Action'}]</td>\n",
       "      <td>[{'id': 0, 'path_thumbnail': 'https://steamcdn...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'total': 942}</td>\n",
       "      <td>{'total': 0}</td>\n",
       "      <td>{'coming_soon': False, 'date': '1 Jun, 2001'}</td>\n",
       "      <td>{'url': '', 'email': ''}</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/40/...</td>\n",
       "      <td>{'ids': [], 'notes': None}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>game</td>\n",
       "      <td>Half-Life: Opposing Force</td>\n",
       "      <td>50</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Return to the Black Mesa Research Facility as ...</td>\n",
       "      <td>Return to the Black Mesa Research Facility as ...</td>\n",
       "      <td>Return to the Black Mesa Research Facility as ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>English, French, German, Korean</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/50/...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'minimum': '\\r\\n\\t\\t\\t&lt;p&gt;&lt;strong&gt;Minimum:&lt;/st...</td>\n",
       "      <td>{'minimum': 'Minimum: OS X  Snow Leopard 10.6....</td>\n",
       "      <td>{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['Gearbox Software']</td>\n",
       "      <td>['Valve']</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'currency': 'GBP', 'initial': 399, 'final': 3...</td>\n",
       "      <td>[32]</td>\n",
       "      <td>[{'name': 'default', 'title': 'Buy Half-Life: ...</td>\n",
       "      <td>{'windows': True, 'mac': True, 'linux': True}</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'id': 2, 'description': 'Single-player'}, {'...</td>\n",
       "      <td>[{'id': '1', 'description': 'Action'}]</td>\n",
       "      <td>[{'id': 0, 'path_thumbnail': 'https://steamcdn...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'total': 4402}</td>\n",
       "      <td>{'total': 0}</td>\n",
       "      <td>{'coming_soon': False, 'date': '1 Nov, 1999'}</td>\n",
       "      <td>{'url': 'https://help.steampowered.com', 'emai...</td>\n",
       "      <td>https://steamcdn-a.akamaihd.net/steam/apps/50/...</td>\n",
       "      <td>{'ids': [], 'notes': None}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   type                       name  steam_appid  required_age  is_free  \\\n",
       "0  game             Counter-Strike           10             0    False   \n",
       "1  game      Team Fortress Classic           20             0    False   \n",
       "2  game              Day of Defeat           30             0    False   \n",
       "3  game         Deathmatch Classic           40             0    False   \n",
       "4  game  Half-Life: Opposing Force           50             0    False   \n",
       "\n",
       "   controller_support  dlc                               detailed_description  \\\n",
       "0                 NaN  NaN  Play the world's number 1 online action game. ...   \n",
       "1                 NaN  NaN  One of the most popular online action games of...   \n",
       "2                 NaN  NaN  Enlist in an intense brand of Axis vs. Allied ...   \n",
       "3                 NaN  NaN  Enjoy fast-paced multiplayer gaming with Death...   \n",
       "4                 NaN  NaN  Return to the Black Mesa Research Facility as ...   \n",
       "\n",
       "                                      about_the_game  \\\n",
       "0  Play the world's number 1 online action game. ...   \n",
       "1  One of the most popular online action games of...   \n",
       "2  Enlist in an intense brand of Axis vs. Allied ...   \n",
       "3  Enjoy fast-paced multiplayer gaming with Death...   \n",
       "4  Return to the Black Mesa Research Facility as ...   \n",
       "\n",
       "                                   short_description  fullgame  \\\n",
       "0  Play the world's number 1 online action game. ...       NaN   \n",
       "1  One of the most popular online action games of...       NaN   \n",
       "2  Enlist in an intense brand of Axis vs. Allied ...       NaN   \n",
       "3  Enjoy fast-paced multiplayer gaming with Death...       NaN   \n",
       "4  Return to the Black Mesa Research Facility as ...       NaN   \n",
       "\n",
       "                                 supported_languages  \\\n",
       "0  English<strong>*</strong>, French<strong>*</st...   \n",
       "1  English, French, German, Italian, Spanish - Sp...   \n",
       "2  English, French, German, Italian, Spanish - Spain   \n",
       "3  English, French, German, Italian, Spanish - Sp...   \n",
       "4                    English, French, German, Korean   \n",
       "\n",
       "                                        header_image  \\\n",
       "0  https://steamcdn-a.akamaihd.net/steam/apps/10/...   \n",
       "1  https://steamcdn-a.akamaihd.net/steam/apps/20/...   \n",
       "2  https://steamcdn-a.akamaihd.net/steam/apps/30/...   \n",
       "3  https://steamcdn-a.akamaihd.net/steam/apps/40/...   \n",
       "4  https://steamcdn-a.akamaihd.net/steam/apps/50/...   \n",
       "\n",
       "                       website  \\\n",
       "0                          NaN   \n",
       "1                          NaN   \n",
       "2  http://www.dayofdefeat.com/   \n",
       "3                          NaN   \n",
       "4                          NaN   \n",
       "\n",
       "                                     pc_requirements  \\\n",
       "0  {'minimum': '\\r\\n\\t\\t\\t<p><strong>Minimum:</st...   \n",
       "1  {'minimum': '\\r\\n\\t\\t\\t<p><strong>Minimum:</st...   \n",
       "2  {'minimum': '\\r\\n\\t\\t\\t<p><strong>Minimum:</st...   \n",
       "3  {'minimum': '\\r\\n\\t\\t\\t<p><strong>Minimum:</st...   \n",
       "4  {'minimum': '\\r\\n\\t\\t\\t<p><strong>Minimum:</st...   \n",
       "\n",
       "                                    mac_requirements  \\\n",
       "0  {'minimum': 'Minimum: OS X  Snow Leopard 10.6....   \n",
       "1  {'minimum': 'Minimum: OS X  Snow Leopard 10.6....   \n",
       "2  {'minimum': 'Minimum: OS X  Snow Leopard 10.6....   \n",
       "3  {'minimum': 'Minimum: OS X  Snow Leopard 10.6....   \n",
       "4  {'minimum': 'Minimum: OS X  Snow Leopard 10.6....   \n",
       "\n",
       "                                  linux_requirements  legal_notice  \\\n",
       "0  {'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...           NaN   \n",
       "1  {'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...           NaN   \n",
       "2  {'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...           NaN   \n",
       "3  {'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...           NaN   \n",
       "4  {'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...           NaN   \n",
       "\n",
       "   drm_notice  ext_user_account_notice            developers publishers demos  \\\n",
       "0         NaN                      NaN             ['Valve']  ['Valve']   NaN   \n",
       "1         NaN                      NaN             ['Valve']  ['Valve']   NaN   \n",
       "2         NaN                      NaN             ['Valve']  ['Valve']   NaN   \n",
       "3         NaN                      NaN             ['Valve']  ['Valve']   NaN   \n",
       "4         NaN                      NaN  ['Gearbox Software']  ['Valve']   NaN   \n",
       "\n",
       "                                      price_overview packages  \\\n",
       "0  {'currency': 'GBP', 'initial': 719, 'final': 7...      [7]   \n",
       "1  {'currency': 'GBP', 'initial': 399, 'final': 3...     [29]   \n",
       "2  {'currency': 'GBP', 'initial': 399, 'final': 3...     [30]   \n",
       "3  {'currency': 'GBP', 'initial': 399, 'final': 3...     [31]   \n",
       "4  {'currency': 'GBP', 'initial': 399, 'final': 3...     [32]   \n",
       "\n",
       "                                      package_groups  \\\n",
       "0  [{'name': 'default', 'title': 'Buy Counter-Str...   \n",
       "1  [{'name': 'default', 'title': 'Buy Team Fortre...   \n",
       "2  [{'name': 'default', 'title': 'Buy Day of Defe...   \n",
       "3  [{'name': 'default', 'title': 'Buy Deathmatch ...   \n",
       "4  [{'name': 'default', 'title': 'Buy Half-Life: ...   \n",
       "\n",
       "                                       platforms  \\\n",
       "0  {'windows': True, 'mac': True, 'linux': True}   \n",
       "1  {'windows': True, 'mac': True, 'linux': True}   \n",
       "2  {'windows': True, 'mac': True, 'linux': True}   \n",
       "3  {'windows': True, 'mac': True, 'linux': True}   \n",
       "4  {'windows': True, 'mac': True, 'linux': True}   \n",
       "\n",
       "                                          metacritic  reviews  \\\n",
       "0  {'score': 88, 'url': 'https://www.metacritic.c...      NaN   \n",
       "1                                                NaN      NaN   \n",
       "2  {'score': 79, 'url': 'https://www.metacritic.c...      NaN   \n",
       "3                                                NaN      NaN   \n",
       "4                                                NaN      NaN   \n",
       "\n",
       "                                          categories  \\\n",
       "0  [{'id': 1, 'description': 'Multi-player'}, {'i...   \n",
       "1  [{'id': 1, 'description': 'Multi-player'}, {'i...   \n",
       "2  [{'id': 1, 'description': 'Multi-player'}, {'i...   \n",
       "3  [{'id': 1, 'description': 'Multi-player'}, {'i...   \n",
       "4  [{'id': 2, 'description': 'Single-player'}, {'...   \n",
       "\n",
       "                                   genres  \\\n",
       "0  [{'id': '1', 'description': 'Action'}]   \n",
       "1  [{'id': '1', 'description': 'Action'}]   \n",
       "2  [{'id': '1', 'description': 'Action'}]   \n",
       "3  [{'id': '1', 'description': 'Action'}]   \n",
       "4  [{'id': '1', 'description': 'Action'}]   \n",
       "\n",
       "                                         screenshots movies   recommendations  \\\n",
       "0  [{'id': 0, 'path_thumbnail': 'https://steamcdn...    NaN  {'total': 66232}   \n",
       "1  [{'id': 0, 'path_thumbnail': 'https://steamcdn...    NaN   {'total': 2816}   \n",
       "2  [{'id': 0, 'path_thumbnail': 'https://steamcdn...    NaN   {'total': 2013}   \n",
       "3  [{'id': 0, 'path_thumbnail': 'https://steamcdn...    NaN    {'total': 942}   \n",
       "4  [{'id': 0, 'path_thumbnail': 'https://steamcdn...    NaN   {'total': 4402}   \n",
       "\n",
       "   achievements                                   release_date  \\\n",
       "0  {'total': 0}  {'coming_soon': False, 'date': '1 Nov, 2000'}   \n",
       "1  {'total': 0}  {'coming_soon': False, 'date': '1 Apr, 1999'}   \n",
       "2  {'total': 0}  {'coming_soon': False, 'date': '1 May, 2003'}   \n",
       "3  {'total': 0}  {'coming_soon': False, 'date': '1 Jun, 2001'}   \n",
       "4  {'total': 0}  {'coming_soon': False, 'date': '1 Nov, 1999'}   \n",
       "\n",
       "                                        support_info  \\\n",
       "0  {'url': 'http://steamcommunity.com/app/10', 'e...   \n",
       "1                           {'url': '', 'email': ''}   \n",
       "2                           {'url': '', 'email': ''}   \n",
       "3                           {'url': '', 'email': ''}   \n",
       "4  {'url': 'https://help.steampowered.com', 'emai...   \n",
       "\n",
       "                                          background  \\\n",
       "0  https://steamcdn-a.akamaihd.net/steam/apps/10/...   \n",
       "1  https://steamcdn-a.akamaihd.net/steam/apps/20/...   \n",
       "2  https://steamcdn-a.akamaihd.net/steam/apps/30/...   \n",
       "3  https://steamcdn-a.akamaihd.net/steam/apps/40/...   \n",
       "4  https://steamcdn-a.akamaihd.net/steam/apps/50/...   \n",
       "\n",
       "                                 content_descriptors  \n",
       "0  {'ids': [2, 5], 'notes': 'Includes intense vio...  \n",
       "1  {'ids': [2, 5], 'notes': 'Includes intense vio...  \n",
       "2                         {'ids': [], 'notes': None}  \n",
       "3                         {'ids': [], 'notes': None}  \n",
       "4                         {'ids': [], 'notes': None}  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# inspect downloaded data\n",
    "pd.read_csv('../data/download/steam_app_data.csv').head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download SteamSpy data\n",
    "\n",
    "To retrieve data from SteamSpy we perform a very similar process. Our parse function is a little simpler because of the how data is returned, and the maximum polling rate of this API is higher so we can set a lower value for `pause` in the `process_batches` function and download more quickly. Apart from that we set the new variables and make a call to the `process_batches` function once again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Starting at index 0:\n",
      "\n",
      "Exported lines 0-4 to steamspy_data.csv. Batch 0 time: 0:00:04 (avg: 0:00:04, remaining: 0:00:11)\n",
      "Exported lines 5-9 to steamspy_data.csv. Batch 1 time: 0:00:04 (avg: 0:00:04, remaining: 0:00:07)\n",
      "Exported lines 10-14 to steamspy_data.csv. Batch 2 time: 0:00:04 (avg: 0:00:04, remaining: 0:00:04)\n",
      "Exported lines 15-19 to steamspy_data.csv. Batch 3 time: 0:00:04 (avg: 0:00:04, remaining: 0:00:00)\n",
      "\n",
      "Processing batches complete. 20 apps written\n"
     ]
    }
   ],
   "source": [
    "def parse_steamspy_request(appid, name):\n",
    "    \"\"\"Parser to handle SteamSpy API data.\"\"\"\n",
    "    url = \"https://steamspy.com/api.php\"\n",
    "    parameters = {\"request\": \"appdetails\", \"appid\": appid}\n",
    "    \n",
    "    json_data = get_request(url, parameters)\n",
    "    return json_data\n",
    "\n",
    "\n",
    "# set files and columns\n",
    "download_path = '../data/download'\n",
    "steamspy_data = 'steamspy_data.csv'\n",
    "steamspy_index = 'steamspy_index.txt'\n",
    "\n",
    "steamspy_columns = [\n",
    "    'appid', 'name', 'developer', 'publisher', 'score_rank', 'positive',\n",
    "    'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks',\n",
    "    'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount',\n",
    "    'languages', 'genre', 'ccu', 'tags'\n",
    "]\n",
    "\n",
    "reset_index(download_path, steamspy_index)\n",
    "index = get_index(download_path, steamspy_index)\n",
    "\n",
    "# Wipe data file if index is 0\n",
    "prepare_data_file(download_path, steamspy_data, index, steamspy_columns)\n",
    "\n",
    "process_batches(\n",
    "    parser=parse_steamspy_request,\n",
    "    app_list=app_list,\n",
    "    download_path=download_path, \n",
    "    data_filename=steamspy_data,\n",
    "    index_filename=steamspy_index,\n",
    "    columns=steamspy_columns,\n",
    "    begin=index,\n",
    "    end=20,\n",
    "    batchsize=5,\n",
    "    pause=0.3\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>appid</th>\n",
       "      <th>name</th>\n",
       "      <th>developer</th>\n",
       "      <th>publisher</th>\n",
       "      <th>score_rank</th>\n",
       "      <th>positive</th>\n",
       "      <th>negative</th>\n",
       "      <th>userscore</th>\n",
       "      <th>owners</th>\n",
       "      <th>average_forever</th>\n",
       "      <th>average_2weeks</th>\n",
       "      <th>median_forever</th>\n",
       "      <th>median_2weeks</th>\n",
       "      <th>price</th>\n",
       "      <th>initialprice</th>\n",
       "      <th>discount</th>\n",
       "      <th>languages</th>\n",
       "      <th>genre</th>\n",
       "      <th>ccu</th>\n",
       "      <th>tags</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10</td>\n",
       "      <td>Counter-Strike</td>\n",
       "      <td>Valve</td>\n",
       "      <td>Valve</td>\n",
       "      <td>NaN</td>\n",
       "      <td>125219</td>\n",
       "      <td>3366</td>\n",
       "      <td>0</td>\n",
       "      <td>20,000,000 .. 50,000,000</td>\n",
       "      <td>11760</td>\n",
       "      <td>1</td>\n",
       "      <td>435</td>\n",
       "      <td>1</td>\n",
       "      <td>999</td>\n",
       "      <td>999</td>\n",
       "      <td>0</td>\n",
       "      <td>English, French, German, Italian, Spanish - Sp...</td>\n",
       "      <td>Action</td>\n",
       "      <td>0</td>\n",
       "      <td>{'Action': 5247, 'FPS': 4638, 'Multiplayer': 3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20</td>\n",
       "      <td>Team Fortress Classic</td>\n",
       "      <td>Valve</td>\n",
       "      <td>Valve</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3337</td>\n",
       "      <td>634</td>\n",
       "      <td>0</td>\n",
       "      <td>2,000,000 .. 5,000,000</td>\n",
       "      <td>19</td>\n",
       "      <td>0</td>\n",
       "      <td>25</td>\n",
       "      <td>0</td>\n",
       "      <td>499</td>\n",
       "      <td>499</td>\n",
       "      <td>0</td>\n",
       "      <td>English, French, German, Italian, Spanish - Sp...</td>\n",
       "      <td>Action</td>\n",
       "      <td>0</td>\n",
       "      <td>{'Action': 721, 'FPS': 289, 'Multiplayer': 241...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>30</td>\n",
       "      <td>Day of Defeat</td>\n",
       "      <td>Valve</td>\n",
       "      <td>Valve</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3451</td>\n",
       "      <td>405</td>\n",
       "      <td>0</td>\n",
       "      <td>5,000,000 .. 10,000,000</td>\n",
       "      <td>15</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "      <td>499</td>\n",
       "      <td>499</td>\n",
       "      <td>0</td>\n",
       "      <td>English, French, German, Italian, Spanish - Spain</td>\n",
       "      <td>Action</td>\n",
       "      <td>0</td>\n",
       "      <td>{'FPS': 769, 'World War II': 237, 'Multiplayer...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>40</td>\n",
       "      <td>Deathmatch Classic</td>\n",
       "      <td>Valve</td>\n",
       "      <td>Valve</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1288</td>\n",
       "      <td>270</td>\n",
       "      <td>0</td>\n",
       "      <td>5,000,000 .. 10,000,000</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "      <td>0</td>\n",
       "      <td>499</td>\n",
       "      <td>499</td>\n",
       "      <td>0</td>\n",
       "      <td>English, French, German, Italian, Spanish - Sp...</td>\n",
       "      <td>Action</td>\n",
       "      <td>0</td>\n",
       "      <td>{'Action': 620, 'FPS': 132, 'Classic': 99, 'Mu...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>50</td>\n",
       "      <td>Half-Life: Opposing Force</td>\n",
       "      <td>Gearbox Software</td>\n",
       "      <td>Valve</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5296</td>\n",
       "      <td>295</td>\n",
       "      <td>0</td>\n",
       "      <td>5,000,000 .. 10,000,000</td>\n",
       "      <td>381</td>\n",
       "      <td>0</td>\n",
       "      <td>389</td>\n",
       "      <td>0</td>\n",
       "      <td>499</td>\n",
       "      <td>499</td>\n",
       "      <td>0</td>\n",
       "      <td>English, French, German, Korean</td>\n",
       "      <td>Action</td>\n",
       "      <td>0</td>\n",
       "      <td>{'FPS': 853, 'Action': 278, 'Classic': 227, 'S...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   appid                       name         developer publisher  score_rank  \\\n",
       "0     10             Counter-Strike             Valve     Valve         NaN   \n",
       "1     20      Team Fortress Classic             Valve     Valve         NaN   \n",
       "2     30              Day of Defeat             Valve     Valve         NaN   \n",
       "3     40         Deathmatch Classic             Valve     Valve         NaN   \n",
       "4     50  Half-Life: Opposing Force  Gearbox Software     Valve         NaN   \n",
       "\n",
       "   positive  negative  userscore                    owners  average_forever  \\\n",
       "0    125219      3366          0  20,000,000 .. 50,000,000            11760   \n",
       "1      3337       634          0    2,000,000 .. 5,000,000               19   \n",
       "2      3451       405          0   5,000,000 .. 10,000,000               15   \n",
       "3      1288       270          0   5,000,000 .. 10,000,000                9   \n",
       "4      5296       295          0   5,000,000 .. 10,000,000              381   \n",
       "\n",
       "   average_2weeks  median_forever  median_2weeks  price  initialprice  \\\n",
       "0               1             435              1    999           999   \n",
       "1               0              25              0    499           499   \n",
       "2               0               9              0    499           499   \n",
       "3               0              12              0    499           499   \n",
       "4               0             389              0    499           499   \n",
       "\n",
       "   discount                                          languages   genre  ccu  \\\n",
       "0         0  English, French, German, Italian, Spanish - Sp...  Action    0   \n",
       "1         0  English, French, German, Italian, Spanish - Sp...  Action    0   \n",
       "2         0  English, French, German, Italian, Spanish - Spain  Action    0   \n",
       "3         0  English, French, German, Italian, Spanish - Sp...  Action    0   \n",
       "4         0                    English, French, German, Korean  Action    0   \n",
       "\n",
       "                                                tags  \n",
       "0  {'Action': 5247, 'FPS': 4638, 'Multiplayer': 3...  \n",
       "1  {'Action': 721, 'FPS': 289, 'Multiplayer': 241...  \n",
       "2  {'FPS': 769, 'World War II': 237, 'Multiplayer...  \n",
       "3  {'Action': 620, 'FPS': 132, 'Classic': 99, 'Mu...  \n",
       "4  {'FPS': 853, 'Action': 278, 'Classic': 227, 'S...  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# inspect downloaded steamspy data\n",
    "pd.read_csv('../data/download/steamspy_data.csv').head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Steps\n",
    "\n",
    "Here we have defined and demonstrated the download process used to generate the data sets. This was completed separately but the full, raw data can be found [on Kaggle](https://kaggle.com/nikdavis/steam-store-raw).\n",
    "\n",
    "We now have two tables of data with a variety of information about apps on the Steam store. From the Steam data it looks like there are some useful columns like `required_age`, `developers` and `genres` which we can eventually turn into features for analysis, and a `price_overview` column which may inform the success and sales of each game. The `owners` column of the SteamSpy data could be useful, however the [margin of error](https://steamspy.com/about) means data may not be accurate enough for meaningful analysis, we'll have to see what we can manage after cleaning. Instead we may have to use the `positive` and `negative` ratings or average play-time to create our metrics. There is also a `tags` column which appears to crossover with the `categories` and `genres` columns in the Steam data. We may wish to merge these, or keep one over the other.\n",
    "\n",
    "These are all decisions we'll come to in later stages of the project. With the data downloaded, this stage is now complete. In the next step, we'll take care of preparing and cleaning the data, readying a complete data set to use for analysis."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}