Cloud Composer orchestration via Cloud Build

Google Cloud Composer is a managed Apache Airflow service that helps create, schedule, monitor and manage workflows. Cloud Composer automation helps you create Airflow environments quickly and use Airflow-native tools, such as the powerful Airflow web interface and command line tools, so you can focus on your workflows and not your infrastructure.

In this article I will describe how an engineering team can manage, develop and publish DAGs after running a full CI/CD build pipeline using Google Cloud Build.

Read more →

Simple Forecasting

Timeseries financial forecasting

Recently, I have been looking into various ways to forecast a time series dataset. This is an old pursuit in the field of statistics and there are many well-known ways to achieve this.

In this post I will demonstrate a very basic (naive) approach to forecasting a quarterly dataset of sales figures, using the previous 4 years (16 quarters) to predict the aggregate sales for the next year.
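As a rough illustration of what a naive quarterly forecast can look like, here is a minimal sketch of a seasonal naive forecast: each quarter of the next year is predicted to equal the same quarter of the last observed year. The function name and the sales figures are made up for this example, and the method shown is an assumption, not necessarily the one used in the full post.

```python
def seasonal_naive_forecast(history, season_length=4, horizon=4):
    """Forecast by repeating the last observed season.

    history: list of observations, oldest first (hypothetical data).
    season_length: periods per season (4 for quarterly data).
    horizon: number of future periods to forecast.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Cycle through the last season as many times as the horizon requires.
    return [last_season[i % season_length] for i in range(horizon)]


# Hypothetical quarterly sales for 4 years (16 quarters).
sales = [100, 120, 90, 150,
         110, 130, 95, 160,
         120, 140, 100, 170,
         130, 150, 105, 180]

next_year = seasonal_naive_forecast(sales)
next_year_total = sum(next_year)  # aggregate sales for the forecast year
```

Summing the four forecast quarters gives the yearly aggregate; a more careful approach would also account for the year-on-year trend visible in the data.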

Read more →

download file using webdriver firefox and python selenium

Selenium is one of my favourite tools for automation.

In this post, I will demonstrate some basic code to download a file from a website in headless mode, and also provide a Dockerfile to make things simpler.

Python Code

Here is some basic code which will attempt to download a **7zip exe**.

from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import os
import time

print("******************************** STARTING ********************************")

# Start a virtual display so Firefox can run without a physical screen
display = Display(visible=0, size=(1024, 768))
display.start()

# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/srv/download')
profile.set_preference("browser.download.manager.alertOnEXEOpen", False)
profile.set_preference("browser.download.manager.closeWhenDone", False)
profile.set_preference("browser.download.manager.focusWhenStarting", False)
# MIME types to save without asking, e.g. application/octet-stream,application/vnd.ms-excel
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/x-msdownload,application/octet-stream')

browser = None
try:
    browser = webdriver.Firefox(profile)
    browser.get('https://www.7-zip.org/')
    # Wait up to 20 seconds for the first download link to become clickable
    download_button = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'td.Item a')))
    download_button.click()
    print("clicked...")
    time.sleep(10)  # give the download some time to finish
    print(os.listdir("/srv/download"))
except Exception as ex:
    print(ex)
finally:
    # Clean up even if the browser failed to start
    if browser is not None:
        browser.close()
    display.stop()


print("******************************** FINITO ********************************")

The code is fairly simple, we need

Read more →

web crawling or scraping using scrapy in python

Scrapy is a very popular web scraping/crawling framework, and I have been using it for quite some time now.

In this post, I will demonstrate creating a very basic web crawler.

Install Scrapy

Installation is via pip: pip install scrapy

Minimalistic Code

A very simple scraper is created like this:

To run it, simply type scrapy runspider scraper.py

Running the above code will output something like this:

Read more →

how to make https requests with python httplib2 ssl

Here are a few snippets to make secure HTTP requests using various Python libraries.

httplib2

import httplib2

link = "https://example.com"
h = httplib2.Http(".cache")
r, content = h.request(link, "GET")

Another example, this time with credentials:


import httplib2

h = httplib2.Http(".cache")
h.add_credentials('user', 'pass')
r, content = h.request("https://api.github.com", "GET")

print r['status']
print r['content-type']

Urllib2

Here is a similar example using urllib2, to compare the number of lines of code.


import urllib2

gh_url = 'https://example.com'

auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(None, gh_url, 'user', 'password')

opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
handler = urllib2.urlopen(gh_url)

print handler.getcode()
print handler.headers.getheader('content-type')

Requests

The easiest has always been requests.

Read more →

minimum insertions to form a palindrome

Brute-force approach

Here I present a few approaches to deduce the “minimum insertions” required to convert a string into a palindrome.

The basic brute-force approach is quite simple: given a string of length L, compare the first character from the left with the last character, then keep scanning inwards.

Here is a basic test for a palindrome.

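   def is_palindrome(s):
       L = len(s)
       # Comparing up to the middle is enough: each pair is checked once
       for i in range(L // 2):
           if s[i] != s[L - i - 1]:
               return False, i, L - i - 1
       return True, 0, 0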

The above code returns True if the string is a palindrome; otherwise it returns False along with the first pair of mismatching indices.
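To show how a mismatch can be turned into an insertion count, here is a minimal sketch that uses memoised recursion rather than pure brute force: when the two ends match, move inwards; when they differ, one insertion is needed and we try fixing either end. The function name min_insertions is hypothetical and this is only one possible approach, not necessarily one of those in the full post.

```python
from functools import lru_cache


def min_insertions(s):
    """Return the minimum number of insertions that make s a palindrome."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # An empty or single-character span is already a palindrome.
        if i >= j:
            return 0
        # Matching ends cost nothing; continue with the inner span.
        if s[i] == s[j]:
            return solve(i + 1, j - 1)
        # Mismatch: insert one character to pair off either end,
        # then keep whichever choice is cheaper.
        return 1 + min(solve(i + 1, j), solve(i, j - 1))

    return solve(0, len(s) - 1)
```

For example, "ab" needs one insertion ("aba" or "bab"). The recursion depth grows with the string length, so this sketch is only suitable for short inputs.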

Read more →

merge sort in python

Many useful algorithms are recursive in structure: to solve a given problem, they call themselves recursively one or more times to deal with closely related subproblems.
These algorithms typically follow a divide-and-conquer approach: they break the problem into several subproblems that are similar to the original problem but smaller in size, solve the subproblems recursively, and then combine these solutions to create a solution to the original problem.

The divide-and-conquer paradigm involves three steps at each level of the recursion:
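Those three steps are usually named divide, conquer and combine. A minimal sketch of merge sort written along those lines follows; the function names are chosen for this example only.

```python
def merge(left, right):
    """Combine two already sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from the left on ties keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a new sorted list, leaving the input untouched."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2              # divide
    left = merge_sort(items[:mid])     # conquer the left half
    right = merge_sort(items[mid:])    # conquer the right half
    return merge(left, right)          # combine
```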

Read more →

Using Geotiff as datasource via gdal

Recently, I have been working on algorithms which need elevation data as well as land cover data, with world coverage. Google has an excellent elevation API; however, free usage comes with a limit.

While searching, I came across a dataset in GeoTIFF format for land cover, as well as a processed version of world elevation. The elevation data comes in various resolutions (250m, 500m, 1km); the land cover data is 500m.

So how do we read it? In Python it's quite easy to use the osgeo/gdal library and read all the bands. A GeoTIFF is a raster file, and values can be packed into every band (each of which is basically a 2D array).

Read more →

How to create fishnets or geospatial grids

There are many use cases in the GIS world where information has to be aggregated. An easy way to achieve this is via gridding or binning, where the area of interest is divided into small sections called grids or bins.

These sections are mostly rectangular (which can be easily converted into GeoTIFFs), but in some cases circles or hexagons are used.

You can read a good tutorial from Mapbox using QGIS with the MMQGIS plugin here.
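For the rectangular case, the gridding idea can be sketched in plain Python as below. The function name fishnet and its parameters are hypothetical; coordinates are treated as plain numbers in whatever units the bounding box uses, and cells on the far edges are clipped to the box.

```python
import math


def fishnet(xmin, ymin, xmax, ymax, cell_width, cell_height):
    """Return a list of (x0, y0, x1, y1) cells covering the bounding box.

    Cells are generated row by row, starting at the lower-left corner.
    """
    if cell_width <= 0 or cell_height <= 0:
        raise ValueError("cell dimensions must be positive")
    cols = math.ceil((xmax - xmin) / cell_width)
    rows = math.ceil((ymax - ymin) / cell_height)
    cells = []
    for row in range(rows):
        for col in range(cols):
            x0 = xmin + col * cell_width
            y0 = ymin + row * cell_height
            # Clip the last column/row so cells never leave the box.
            x1 = min(x0 + cell_width, xmax)
            y1 = min(y0 + cell_height, ymax)
            cells.append((x0, y0, x1, y1))
    return cells
```

Each tuple can then be turned into a polygon in the GIS library of your choice, and points can be binned by checking which cell they fall into.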

Read more →

How to transform projections between Spherical Mercator and EPSG 4326

Projections in GIS are commonly referred to by their “EPSG” codes. These are identifiers managed by the European Petroleum Survey Group.

One common identifier is “EPSG:4326”, which describes maps where latitude and longitude are treated as X/Y values.

Spherical Mercator has an official designation of EPSG:3857. However, before this was established, a large amount of software used the identifier EPSG:900913. This is an unofficial code, but it is still commonly used in many GIS systems.
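The conversion between the two can be written with a few lines of trigonometry. The sketch below uses the standard spherical Mercator formulas with the WGS84 semi-major axis as the sphere radius; the function names are made up for this example, and a projection library would normally be used in production code.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS84 semi-major axis in metres


def lonlat_to_mercator(lon, lat):
    """Convert EPSG:4326 degrees to Spherical Mercator metres."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def mercator_to_lonlat(x, y):
    """Convert Spherical Mercator metres back to EPSG:4326 degrees."""
    lon = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2)
    return lon, lat
```

Note that the projection is undefined at the poles, so latitudes are normally limited to roughly ±85 degrees.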

Read more →

How to Query a Shape file for Point inside a polygon using ogr python

Recently I was trying to build a quick geo lookup service in Python, which could be used like an “info tool” in QGIS. This task is trivial in almost all geospatial databases; however, I wasn’t able to find much online about querying a shapefile.

In this post I will demonstrate some simple Python code to query a shapefile which contains world countries. The file can be downloaded from here.
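To make the underlying test concrete, here is a pure-Python sketch of the classic ray-casting point-in-polygon check. It does not use ogr: the polygons are plain lists of (x, y) tuples, and the lookup helper and feature list are hypothetical stand-ins for features that would be read from a shapefile.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def lookup(x, y, features):
    """Return the name of the first feature whose polygon contains the point."""
    for name, polygon in features:
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical features standing in for country outlines.
features = [
    ("square", [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ("triangle", [(10, 0), (14, 0), (12, 4)]),
]
```

Points that lie exactly on an edge may be reported either way by this simple version.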

Read more →

Binary Search Tree in python

The BST data structure supports many dynamic-set operations, including:

  1. Search
  2. Minimum
  3. Maximum
  4. Predecessor
  5. Successor
  6. Insert
  7. Delete

These basic operations allow us to treat this data structure both as a dictionary and as a priority queue.

Basic operations on a binary search tree take time proportional to the height of the tree: O(lg n) for a complete binary tree with n nodes, but O(n) in the worst case, when the tree is a linear chain.

If you want to learn more about practical applications of these trees, check this post out.
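The effect of the tree height can be seen with a small sketch. The class and function names below are chosen for this example only, and just a few of the operations listed above are covered.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root of the subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Return the node holding key, or None if it is absent."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def minimum(root):
    """The smallest key is found by following left links."""
    while root.left is not None:
        root = root.left
    return root


def maximum(root):
    """The largest key is found by following right links."""
    while root.right is not None:
        root = root.right
    return root


def height(root):
    """Number of nodes on the longest path from root down to a leaf."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root
```

Inserting 15, 6, 18, 3, 7, 17, 20 in that order gives a tree three levels deep, whereas inserting the already sorted keys 1 to 7 gives a chain seven levels deep, which is the linear case described above.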

Read more →

Heap Sort in python

The (binary) heap data structure is an array object that we can view as a nearly complete binary tree.
Each node of the tree corresponds to an element of the array.
The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point.
An array A that represents a heap is an object with two attributes:

  • length, which (as usual) gives the number of elements in the array.
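As a rough sketch of how the array view of a heap leads to a sorting algorithm, here is an in-place heap sort using 0-based indices, so the children of index i sit at 2i + 1 and 2i + 2. The heap_size parameter is an assumption of this sketch: it marks how much of the array currently belongs to the heap.

```python
def max_heapify(a, heap_size, i):
    """Sift a[i] down until the subtree rooted at i is a max-heap."""
    while True:
        left = 2 * i + 1
        right = 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Turn an arbitrary list into a max-heap, working bottom-up."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, len(a), i)


def heap_sort(a):
    """Sort the list in place in ascending order and return it."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        # Move the current maximum to its final slot, then shrink the heap.
        a[0], a[end] = a[end], a[0]
        max_heapify(a, end, 0)
    return a
```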

Read more →

Insertion sort in python

Insertion sort is an efficient algorithm for sorting a small number of elements.

Insertion sort works the way many people sort a hand of playing cards. We start with an empty left hand and the cards face down on the table. We then remove one card at a time from the table and insert it into the correct position in the left hand.

To find the correct position for a card, we compare it with each of the cards already in the hand, from right to left.
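The card analogy maps to code quite directly. In this minimal sketch, the sorted part of the list plays the role of the left hand and key is the card just picked up from the table.

```python
def insertion_sort(items):
    """Sort the list in place in ascending order and return it."""
    for j in range(1, len(items)):
        key = items[j]          # the card picked up from the table
        i = j - 1
        # Compare with the cards already in hand, from right to left,
        # shifting larger cards one place to the right.
        while i >= 0 and items[i] > key:
            items[i + 1] = items[i]
            i -= 1
        items[i + 1] = key      # drop the card into the gap
    return items
```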

Read more →

Read GAE Admin Backups from LevelDB format and export GAE Entities using bulkloader

Google Datastore is pretty awesome when one needs quick NoSQL data storage. However, recently I ran into a problem exporting my GAE Datastore as CSV and, in certain cases, as a line-delimited JSON file. It's not very hard to do, and perhaps the easiest way to handle it is to write an export handler in your web app; however, there are alternatives, which I have highlighted below.

  1. One of the easiest alternatives is Datastore Admin. This tool lets you back up your GAE Datastore entities to Google Cloud Storage in the same project; the backup can later be downloaded locally using the Cloud Console or gsutil, like this: gsutil cp -R gs://your_bucket_name/your_path /local_target
  2. Then there is the Remote API for Python and Java, which was perhaps created to modify the datastore directly via code from your local machine.
  3. Finally, there is a Python utility called bulkloader.py, which is coupled with remote_api. This utility requires the Python SDK to be installed and added to your system path.

Read more →

print two-dimensional array in spiral order

So I saw this problem in a book today, about printing a 2D matrix in spiral order.

Here are two solutions to it.

Solution one:

def printSpiralTL(m,x1,y1,x2,y2):
    for i in range(x1,x2):
        print m[y1][i]
    for j in range(y1+1,y2+1):
        print m[j][x2-1]

    if x2-x1 > 0:
        printSpiralBL(m, x1, y1 + 1, x2-1, y2)
    

def printSpiralBL(m,x1,y1,x2,y2):
    for i in range(x2-1,x1-1,-1):
        print m[y2][i]
    for j in range(y2-1,y1-1,-1):
        print m[j][x1]
    if x2-x1 > 0:
        printSpiralTL(m, x1+1, y1, x2, y2-1)
    
m = [
    
    [1, 2, 3, 4], 
    [5, 6, 7, 8],
    [9, 0, 1, 2],   
    [3, 4, 5, 6], 
    [7, 8, 9, 1]
        
    ]

Output:

Read more →

Normalizing Ranges of Numbers

Range Normalization is a normalization technique that allows you to map a number to a specific range.

Let's say that we have a data set where the values are in a range of 1 to 10, but we wish to normalise it to a range between 0 and 5.

Mathematically speaking, the equation comes down to:

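n = \frac{x - d_L}{d_H - d_L} \, (n_H - n_L) + n_L

x = \frac{(d_L - d_H) \, n - n_H d_L + d_H n_L}{n_L - n_H}

Here d_L and d_H are the low and high ends of the data range, n_L and n_H are the low and high ends of the normalised range, x is the original value and n is the normalised value.

Translated to Python: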

class Normaliser:

   def __init__(self, dH, dL, nH, nL):
       # dH/dL: high/low of the data range, nH/nL: high/low of the target range.
       # Stored as floats so the division below is never integer division.
       self.dH = float(dH)
       self.dL = float(dL)
       self.nH = float(nH)
       self.nL = float(nL)

   def normalize(self, x):
       return ((x - self.dL) / (self.dH - self.dL)) * (self.nH - self.nL) + self.nL

   def denormalize(self, x):
       return ((self.dL - self.dH) * x - self.nH * self.dL + self.dH * self.nL) / (self.nL - self.nH)

if __name__ == "__main__":
   norm = Normaliser(10, 1, 5, 0)

   for a in range(1, 11):
       x = norm.normalize(a)
       y = norm.denormalize(x)
       print(str(a) + " : " + str(x) + " : " + str(y))

The results

Read more →

How to configure Apache mod_wsgi

I am a big fan and user of Python. One of the most popular ways to create a quick web app in Python is to use mod_wsgi.

The aim of mod_wsgi is to implement a simple to use Apache module which can host any Python application which supports the Python WSGI interface.

The module would be suitable for use in hosting high performance production web sites, as well as your average self managed personal sites running on web hosting services.
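For reference, a Python application that supports the WSGI interface can be as small as the sketch below. By default mod_wsgi looks for a callable named application in the script file; the response text here is made up for this example.

```python
def application(environ, start_response):
    """A minimal WSGI callable: takes the request environ, returns the body."""
    body = b"Hello from WSGI\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # The body must be an iterable of byte strings.
    return [body]
```

The Apache side (directives pointing at this file) is a separate piece of configuration and is not shown here.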

Read more →

Serve the contents of any directory with Python’s SimpleHTTPServer

Generally, when I am in the middle of prototyping a concept, or need to quickly execute Ajax requests or use browser features which require the page to be hosted on a web server, I use Python’s SimpleHTTPServer module.

Python’s SimpleHTTPServer is a great way to serve the contents of the current directory. All one needs to do is change directory and execute a command, which will expose all the contents as if they were hosted on a web server.
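The same idea can also be scripted. In Python 3 the SimpleHTTPServer module lives in http.server, and the sketch below (Python 3.7 or later) builds a server for an arbitrary directory; the helper name make_server and the default port are assumptions made for this example.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(directory=".", port=8000, host="127.0.0.1"):
    """Build an HTTP server that serves files from the given directory."""
    # The handler's `directory` argument selects which folder is exposed.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer((host, port), handler)
```

Calling make_server().serve_forever() then serves the current directory on port 8000 until it is interrupted. Binding to 127.0.0.1 keeps the files visible to the local machine only.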

Read more →

Basic authentication in web.py via attribute

Here I demonstrate the process of Basic Authentication in the web.py Python web framework.

There is a proof of concept article provided on the main site; however, I thought doing the same via an attribute might be a cleaner solution.

HTTP Basic authentication is one of the easiest ways to secure web pages because it doesn’t require cookies, session handling, or the development of login pages. Rather, HTTP Basic authentication uses static headers, which means that no handshakes have to be done in advance. However, the credentials are sent as plain text (merely Base64-encoded) and could be intercepted.
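To illustrate what that static header contains, here is a small framework-independent sketch that builds and parses an Authorization header. It does not use web.py, and the helper names are hypothetical.

```python
import base64


def make_basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth_header(header):
    """Return (username, password), or None if the header is not valid Basic auth."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password
```

Because Base64 is trivially reversible, as parse_basic_auth_header shows, Basic authentication should only be used over HTTPS.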

Read more →

How To Install and use Python Web.py framework on Windows

Web.py has been one of my favorite web frameworks, as it's pretty easy to get cracking with it.

It’s super quick to install, and one can come up with a prototype or a rapid web service in a matter of minutes.

Install on Windows

If you haven’t configured easy_install on Windows, then read this article.

Once easy_install has been configured, believe it or not, all you have to do is open a command prompt and type

Read more →

How to setup easy_install on Windows

If one has been using Python, then installing various libraries and modules is basically a breeze with the easy_install utility. However, for folks using Windows, easy_install has to be set up properly before it can be used.

First, let's make sure that Python is properly installed and the PYTHON_HOME environment variable is configured:

Install Python on Windows

If it is not already installed, download the Python installer from here.

After it’s done downloading, double click to run the installer, and select the default options (unless you have other custom needs, of course).

Read more →

Notepad++ with Python

After reading an excellent article by Kazi Manzur Rashid on setting up a development environment for Iron Ruby using Notepad++, I was immediately struck with the idea of using the same excellent tool with Python 2.6.

Now don’t get me wrong here, theoretically there is nothing wrong with IDLE, but for those who want a lightweight IDE and don’t want to use the PyDev plugin for Aptana or Eclipse, I think Notepad++ is indeed a nice little dev tool.

Read more →