Saturday, February 8, 2014

Post 18: Working with Real-Life Data using NumPy - Data Extraction and Querying

In this example, we will take a weather data set from the National Oceanic and Atmospheric Administration (NOAA) and analyze it using NumPy constructs.


To download the data set from figshare, you can use the code below:

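import numpy as np
import zipfile
import os

import urllib2  # Python 2 only; in Python 3 the equivalent is urllib.request

dirname = 'weather'
if os.path.exists(dirname):
    print 'weather directory exists, skipping download/unzip'
else:
    os.mkdir(dirname)

    # Download the archive and save it locally
    url = 'http://files.figshare.com/1378975/weather.zip'
    response = urllib2.urlopen(url)
    fname = 'weather.zip'
    with open(fname, 'wb') as f:
        f.write(response.read())
    zfile = zipfile.ZipFile(fname)

    # Skip the first entry (presumably the directory itself) and write out each file
    for name in zfile.namelist()[1:]:
        print name
        # Binary mode, since zfile.read() returns the raw bytes of the member
        with open(name, 'wb') as f:
            f.write(zfile.read(name))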
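
The script above is written for Python 2. For readers on Python 3, the same download-and-unzip steps might be sketched as below. This is only a sketch under a few assumptions: the helper names download_zip and extract_text_files are made up for illustration, directory entries are skipped by checking for a trailing slash instead of dropping the first entry, and the download call is left commented out because it needs network access and the URL may no longer be live.

```python
import io
import os
import zipfile
from urllib.request import urlopen


def download_zip(url):
    """Fetch the archive at `url` and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def extract_text_files(zip_bytes, dest='.'):
    """Extract every non-directory member of the archive under `dest`.

    Returns the list of extracted member names (hypothetical helper).
    """
    extracted = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zfile:
        for name in zfile.namelist():
            if name.endswith('/'):  # skip directory entries
                continue
            zfile.extract(name, dest)  # creates parent directories as needed
            extracted.append(name)
    return extracted


# Usage (requires network access, so it is left commented out):
# data = download_zip('http://files.figshare.com/1378975/weather.zip')
# for name in extract_text_files(data):
#     print(name)
```

Keeping the download and the extraction in separate functions makes it possible to try the extraction step on any small zip file without touching the network.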
 

Now we can open any of the text files from the zip archive that we have downloaded. Each text file contains the data for one city, as indicated by the file name. We can print the first three lines of the Los Angeles, CA file using the command:

>> with open('weather/CALOSANG.txt', 'r') as f:
       print '\n'.join(f.readlines()[:3])


To load that data into NumPy, use the following command:

>> w_data = np.loadtxt('weather/CALOSANG.txt')
>> w_data[0]
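
To see what np.loadtxt produces without downloading anything, here is a small sketch that reads a few rows from an in-memory string. The three rows are made-up values that only mimic the month / day / year / temperature layout of the file, and the name w_sample is used here purely for illustration.

```python
import io

import numpy as np

# Made-up rows in the month / day / year / temperature layout (not real NOAA data)
sample = io.StringIO(
    "1 1 1995 56.4\n"
    "1 2 1995 55.1\n"
    "1 3 1995 54.3\n"
)

# With no dtype argument, every column is parsed as a float
w_sample = np.loadtxt(sample)

print(w_sample.dtype)  # float64
print(w_sample.shape)  # (3, 4): one row per line, one column per value
print(w_sample[0])     # first row, with all four values stored as floats
```

Note that the month, day and year come back as floats too, which is what motivates supplying a custom data type.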





By default, np.loadtxt reads every column as a float. To give each column its own name and type, we can pass a user-defined data type using the command below:
>> dt=np.dtype([('Month', 'int8'), ('Day', 'int8'), ('Year', 'int16'), ('Temp', 'float64')])
>> w_data = np.loadtxt('weather/CALOSANG.txt', dtype=dt)
>> w_data[:5]
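
The effect of the custom data type can be checked on a few in-memory rows. As a sketch (the rows are made-up values, and records is just an illustrative name), the result is a one-dimensional array in which each element is a record with four named fields:

```python
import io

import numpy as np

dt = np.dtype([('Month', 'int8'), ('Day', 'int8'),
               ('Year', 'int16'), ('Temp', 'float64')])

# Made-up rows in the month / day / year / temperature layout (not real NOAA data)
sample = io.StringIO("1 1 1995 56.4\n1 2 1995 55.1\n1 3 1995 54.3\n")

records = np.loadtxt(sample, dtype=dt)

print(records.shape)        # (3,): one record per line instead of a 2-D float array
print(records.dtype.names)  # ('Month', 'Day', 'Year', 'Temp')
print(records[0])           # the first record, with integer date fields
```

The small integer types are enough here because months and days fit in int8 and four-digit years fit in int16.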


To see the first 2 rows:
>> w_data[0:2]

To slice in the middle of the array:
>> w_data[30:35]


To view every seventh day, starting from the first:
>> w_data[0::7]




To see the last 5 entries in the data set:
>> w_data[-5:]

To reverse the entire array:
>> w_data[::-1]
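
The same slicing patterns can be tried on a small stand-in array, where the results are easy to verify by eye. This is only an illustrative sketch: days is a made-up array of 14 integers standing in for 14 consecutive daily records.

```python
import numpy as np

days = np.arange(1, 15)       # stand-in for 14 consecutive daily records: 1..14

first_two = days[0:2]         # [1 2]
middle = days[3:6]            # [4 5 6]
weekly = days[0::7]           # [1 8]: every seventh entry, starting from the first
last_five = days[-5:]         # [10 11 12 13 14]
reversed_days = days[::-1]    # 14 down to 1

print(first_two, middle, weekly, last_five, reversed_days)
```

Each of these basic slices is a view on the original array rather than a copy, so modifying a slice in place also changes days.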

To look at a single column, we can index the array with a field name from the custom data type (this is where defining the data type pays off). So if we want to see the Temp data:
>> w_data['Temp']

If we want to combine column and row selection, for example to select the first 5 temperatures, we can write the command as:
>> w_data['Temp'][:5]
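
As a small sketch of how field and row selection combine, the example below builds a three-record array by hand (made-up values, with records as an illustrative name) and shows that the order of the two selections does not matter:

```python
import numpy as np

dt = np.dtype([('Month', 'int8'), ('Day', 'int8'),
               ('Year', 'int16'), ('Temp', 'float64')])

# Made-up records in the same layout as the weather file (not real NOAA data)
records = np.array([(1, 1, 1995, 56.4),
                    (1, 2, 1995, 55.1),
                    (1, 3, 1995, 54.3)], dtype=dt)

temps = records['Temp']              # the whole Temp column as a plain float64 array
field_then_rows = records['Temp'][:2]
rows_then_field = records[:2]['Temp']

print(temps.dtype)                                       # float64
print(np.array_equal(field_then_rows, rows_then_field))  # True
```

Selecting the field first and then the rows, or the rows first and then the field, gives the same values, so either form can be used.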




In the next post, we will take this data, plot it, and modify it to carry on with our analysis.
