Friday, March 28, 2014

Week 11 - Sorting and Efficiency

Time has been going so fast!!! It is already just a week left until the courses end. I cannot say whether if that is a good thing or not; I want to go home so bad yet I do not want to study for my finals. :(
However, it is sorting and efficiency that I will write about for this week’s entry. Starting week 9, we have been started to looking into sorting. We started with the binary search trees: 

                     

           "A binary search tree (BST), sometimes also called an ordered
  or sorted binary tree, is a node-based binary tree data structure 
  which has the following properties:
      -    The left subtree of a node contains only nodes with keys 
           less than the node's key.
      -    The right subtree of a node contains only nodes with keys 
           greater thanthe node's key.
      -    The left and right subtree each must also be a binary search tree.
      -    There must be no duplicate nodes. "
                             -- (http://en.wikipedia.org/wiki/Binary_search_tree)
I think this is a good, reasonable start for learning about sorting with recursion. In CSC165, I have always had this question of when would us encountering a function with a worse runtime of O(log n) (the log base is usually 2). Now, I know, the binary search tree is a really standard example. And, yes, it would save a lot of effort for sorting; instead of go through every element leaves and parents of a tree or list etc. I could just go down with only one branch. This way is much more efficient. For example, I have 100 items in total in a tree, if I just go through each of them, it would cost me O(n) runtime (which means 100 steps), but if have the tree organized as a binary search tree, I could finish my search in log2 100 steps (about 6.643856steps). The numbers definitely showed the improved efficiency. 

                          
 (This is a graph I found online that illustrated the slopes of different runtimes.)
After the binary search tree, we start looking at just sorting function, which are recursive. We have seen and had some experience with just sorting without trees (which was not recursive) in CSC108. I could still remember that we have covered bubble sort, insertion sort and selection sort. In week 10 and week 11, we have covered three new sorting methods: quick sort, merge sort and briefly talked about: count sort, Tim sort (python’s own sorting method, a hybrid sorting algorithm derived from merge sort and insertion sort). Yet, we focus more on the first two:
Quick sort: choose a pivot; decide where the pivot goes with
respect to the rest of the list, repeat on the partitions.
(average runtime: O(n log n); worst runtime: O(n2).)
Merge sort: divide the list in half, (merge) sort the halves, then
merge the sorted results
(average runtime: O(n log n); worst runtime: O(n log n).)
(Above definitions and data are from lecture notes and Wikipedia.)

Different sorting methods must have their own merits for dealing with different inputs and situations; I do not think there is a perfect sorting function that would perform with the shortest runtime for every input. In Lab #10, the handout wanted us to compare and graph the runtime results for various sorting methods and with different input (sorted, random, sorted but reversed), and the graphs look pretty colorful and interesting:

           
(since the Merge 1&2, Quicksort 1&2, list.sort results are really close, I also put them together on separate graphs to screenshot close-ups)
The x axis is the numbers of items for the inputs and the y axis is the numbers of seconds that would take the function to run and complete. One can see, the more flat (linear or constant) the slope, the more efficient the sorting method. Maybe the inputs we have right now are short and easy compare some real data that we are going to deal with in the future. To sort a huge, massive, tons of data, the more efficient the sorting method, the more time one saved and the less chance that the computer would make mistakes and being too busy.
I had fun and it was really interesting that we have came back to talk about sorting and efficiency again after CSC108. Recursion does not take more time, but save more time. :D

No comments:

Post a Comment