I’m writing a little module in python that contains lots of sorting algorithms (which can be found on my GitHub: here), and one i found particularly fun was the Binary Tree Sort.

This implements a “Node” Class to form the tree. Each Node has data and a left and right value. Left and Right should be other nodes or None. It has a “flatten” method that recursively flattens the tree into a sorted list. It makes use of a method, “_concatList”, that I had originally written for implementing the Quick Sort, but it ended up working quite well here too.

The sort itself is very simple. It loops through the list and inserts each element to a tree, and finally flattens it.

# Binary Tree Node
class Node:
	left, right, data = None, None, 0
	
	def __init__(self, data):
		self.left = None
		self.right = None
		self.data = data
		
	# add data to the tree
	def insert(self, data):
		# return false if the data is a duplicate
		if data == self.data:
			return false
			
		# if data is less than the current node go left
		elif data < self.data:
			# if no data here, make a new node
			if self.left == None:
				self.left = Node(data)
				
			# else try the same on its left child
			else:
				self.left.insert(data)
				
		# if its greater, go right
		else:
			# if no data here, make a new node
			if self.right == None:
				self.right = Node(data)
				
			# else try the same on its right child
			else:
				self.right.insert(data) 
				
	# flatten a node into a list
	def flatten(self):
		# if no children, return this nodes data as a list
		if self.left == None and self.right == None:
			return [self.data]
		# create empty list
		li = []
		
		# append the first list
		if self.left is not None:
			li.extend(self.left.flatten())
			
		# append data
		li.append(self.data)
		
		# append the second list
		if self.right is not None:
			li.extend(self.right.flatten())
			
		return li
		
# Binary Tree Sort
def BinaryTreeSort(li):
	# create the root node with the first element
	root = Node(li[0])
	
	# add each element to the tree
	for i in xrange(1, len(li)):
		root.insert(li[i])
		
	#flatten the tree
	return root.flatten()
		
# Concatenate two sorted lists
# used for Binary Search Tree and Quick Sort
def _concatList(l1, a, l2):
	# create a new list
	li = []
	if l1 is not None:
		# append each of the elements from l1
		for l in l1:
			li.append(l)
	else:
		pass
		
	if a is not None:
		# append the middle if there is one
		li.append(a)
	else:
		pass
		
		
	if l2 is not None:
		# append each of the elements from l2
		for l in l2:
			li.append(l)
	else:
		pass
		
	return li

Here’s my test:

>>> Disorder.BinaryTreeSort([5,2,3,8,1,9,15,4,18])
[1, 2, 3, 4, 5, 8, 9, 15, 18]

 

A while ago I made a post about a Web Crawler in less that 50 lines of python. Well, I took that and made it a little (and really, I mean just a little..) more sophisticated, and stuck it on GitHub. Anyone who wants to can contribute, or use the code however they want.

PyCrawler On GitHub

 

I got kind of bored today, and wrote a pretty simple web crawler with python and it turned out to be less than 50 lines. It doesn’t store output, I’ll leave that up to anyone who wants to use the code, because, well, theres just too many ways to choose from. Right now you pass it a starting link as a parameter and it will crawl forever untill it runs out of links. But that is not a likely condition. So here ya go. Have fun. Feel free to ask questions

import sys
import re
import urllib2
import urlparse
tocrawl = set([sys.argv[1]])
crawled = set([])
keywordregex = re.compile('<meta\sname=["\']keywords["\']\scontent=["\'](.*?)["\']\s/>')
linkregex = re.compile('<a\s(?:.*?\s)*?href=[\'"](.*?)[\'"].*?>')

while 1:
	try:
		crawling = tocrawl.pop()
		print crawling
	except KeyError:
		raise StopIteration
	url = urlparse.urlparse(crawling)
	try:
		response = urllib2.urlopen(crawling)
	except:
		continue
	msg = response.read()
	startPos = msg.find('<title>')
	if startPos != -1:
		endPos = msg.find('</title>', startPos+7)
		if endPos != -1:
			title = msg[startPos+7:endPos]
			print title
	keywordlist = keywordregex.findall(msg)
	if len(keywordlist) &gt; 0:
		keywordlist = keywordlist[0]
		keywordlist = keywordlist.split(", ")
		print keywordlist
	links = linkregex.findall(msg)
	crawled.add(crawling)
	for link in (links.pop(0) for _ in xrange(len(links))):
		if link.startswith('/'):
			link = 'http://' + url[1] + link
		elif link.startswith('#'):
			link = 'http://' + url[1] + url[2] + link
		elif not link.startswith('http'):
			link = 'http://' + url[1] + '/' + link
		if link not in crawled:
			tocrawl.add(link)

** EDIT **

This was a very early draft of this program. As it turns out, I revisited this project a few months later and it evolved much more.
If you would like to check out the more evolved form, feel free to have a look here at my github!

© 2012 Code Brain Suffusion theme by Sayontan Sinha