hello friends! new(ish)!

Computer science

From InstallGentoo Wiki v2
Revision as of 21:45, 29 January 2014 by >Chchjesus (→‎Where to go from here)
Jump to navigation Jump to search

What is Computer Science? As Harold Abelson from the SICP lectures says, the name Computer Science is in fact misleading.

"Computer science is not about computers, in the same sense that physics is not really about particle accelerators, and biology is not about microscopes and Petri dishes."

Thus, "computer" is not needed in the title. Secondly, a lot of Computer Science is mathematics and theory. There is very little actual scientific method like one would find in Physics, Chemistry, or Biology, so in this sense, it's not really a science as much as it is a form of applied mathematics or engineering. So Computer Science should be replaced with a much more informative name.

So what is Computer Science then? CS consists of algorithms for solving problems in the most mathematically convenient or efficient way. Take for example the problem of finding the fastest route between several locations in a city; this is solved using a graph searching algorithm. There are also data structures designed for storing and indexing huge amounts of data in a database, protocols for making sure that data transmission is reliable, statistical learning methods behind AI, and so on.

Getting started

The most recommended Computer Science books are Structure and Interpretations of Computer Programs (SICP), and what is considered the holy grail of CS literature: The Art of Computer Programming (TAOCP) by Donald Knuth.

You'll find some eBooks at http://books.gentoomen.org/ and you can buy physical ones for cheap from http://abebooks.com

A good place to start would be the SICP lectures on youtube, found here

Algorithms

An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values as output. It is the part that does the magic.

There are a few programming concepts discussed in the Programming concepts page.

For the rest, check Introduction to Algorithms (by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein), and if you have the balls and the time, check The Art of Computer Programming (by Donald D. Knuth).

There is also a good introduction to many concepts explained here.

Sorting Algorithms

A big topic in computer applications is the quick and efficient sorting of data. There are many different sorting techniques, as a programmer one has to balance the algorithm's ease of reading vs the efficiency of said code. Many standard libraries/APIs already have sorting functions already written, however it is still good practice to know some of the theory behind how the different algorithms work.

Bubble Sort

The bubble sort is a very simple to read algorithm that is the standard first sorting method that most programmers learn. The bubble sort gets its name because values are swapped among each other, gradually moving to the top of the list.

//bubble sort example
#include <stdio.h>
#define SIZE 10
int main( void ){
int a[SIZE] = {2,6,9,1,4,15,7,2}; //array to be sorted
int count; //to be used to count passes
int i; //comparisons counter
int hold; //temp variable to hold numbers
//start of bubble sort
for(count=0; count<SIZE-1; count++){//loop to control number of passes
//size is decreased by 1 to prevent array out of bounds error
for(i=0; i<SIZE-1; i++){//comparison loop
if(a[i]>a[i+1]){//if a[i] bigger than a[i+1], swap values
hold = a[i];
a[i] = a[i+1];
a[i+1] = hold;
}//end if
}//end inner for
}//end outer for
//print array
for(i=0; i<SIZE; i++)
printf("%4d", a[i]);
return 0;
}

Selection Sort

A bit further up on the food chain is the selection sort. The selection sort goes through each element position and finds the value which should occupy the next position in the sorted array. When it finds the appropriate element, the algorithm exchanges it with the valuewhich previously occupied the desired position. On the second pass it looks for the second smallest element and swaps it with the second position and so on.

//Selection sort example
// finds smallest element between left hand and right hand
// moves element to correct position
  1. include <stdio.h>
void selectionSort(int array[], int n){
//n is length
int lh, rh, i, temp;
for (lh=0; lh < n; lh++){//everything left of lh is considered sorted
rh = lh; //reset right hand
//runs through lh-rh bounds of array
for (i = lh + 1; i<n; i++){
//if lowest element found, copy index to right hand
if (array[i] < array[rh]) rh = i;
}
//move lowest element to correct position
temp = array[lh];
array[lh] = array[rh];
array[rh] = temp;
}
}
int main( void ){
int len = 8;
int a[8] = {2,6,9,1,4,15,7,2}; //array to be sorted
int i;
selectionSort(a, len);
//print array
for(i=0; i<len; i++)
printf("%4d", a[i]);
return 0;
}

Merge sort

   // This is an iterative version of merge sort.
   // It merges pairs of two consecutive lists one after another.
   // After scanning the whole array to do the above job,
   // it goes to the next stage. Variable k controls the size
   // of lists to be merged. k doubles each time the main loop
   // is executed.
   #include <stdio.h>
   #include <math.h>
   int i,n,t,k;
   int a[100000],b[100000];
   int merge(l,r,u)
     int l,r,u;
   { int i,j,k;
     i=l; j=r; k=l;
     while (i<r && j<u) { 
       if (a[i]<=a[j]) {b[k]=a[i]; i++;} 
       else {b[k]=a[j]; j++;}
       k++;
     }
     while (i<r) { 
       b[k]=a[i]; i++; k++;
     }
     while (j<u) { 
       b[k]=a[j]; j++; k++;
     }
     for (k=l; k<u; k++) { 
       a[k]=b[k]; 
     }
   }
   sort()
   { int k,u;
     k=1;
     while (k<n) {
       i=1;
       while (i+k<=n) {
         u=i+k*2;
         if (u>n) u=n+1;
         merge(i,i+k,u);
         i=i+k*2;
       }
       k=k*2;
     }
   }
   main()
   { printf("input size \n");
     scanf("%d",&n); 
   /*  for (i=1;i<=n;i++) scanf("%d",&a[i]); */
     for (i=1;i<=n;i++) a[i]=random()%1000;
     t=clock();
     sort();
     for (i=1;i<=10;i++) printf("%d ",a[i]);
     printf("\n");
     printf("time= %d millisec\n",(clock()-t)/1000);
   }

Performance

Search Algorithms

Think of an ordered phone book. Consider two ways to search it for a name: linear search and binary search.

Linear search consists of starting at page 1, at the first name, and checking every name, one at a time, in order, until you find the name. Considering a phone book of one million names, this is very useful if you're searching for the number of Aaron A. Aardvark, but could be tedious if looking for Matthew Maloney, or downright awful if looking for a person named Z. Zerthis.

Binary search consists of opening the book up to half way, and comparing the name you are searching for with the items on either side. If the name you are looking for is earlier on, then you'll want the left half. If it is later on, you'll want the right half. What you have done here, is effectively cut the amount of names to search in half. You then go to the middle of whichever half you chose, and repeat the process until you've found what you're looking for. This is what we humans naturally do in a general sense.

To really get a hold on how much better Binary search is compared to linear search, consider a name that is roughly 3 quarters to the end of the phone book. With 1 million names, a linear search would take you roughly 750,000 comparisons before you found that name. With binary search, it would take roughly 3 or 4 comparisons.

There are also two string searching/pattern matching algorithms known as the Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM) algorithms.

Computers use graph searching algorithms called the Floyd-Warshall algorithm, and Dijkstra's (pronounced Daik-stra) algorithm, to find the optimal path between two or more nodes connected by pathes on a graph.

Recursion

Recursion is the concept of something entering inside itself, or other a new instance of itself. Functional programming languages such as Lisp dialects are known to employ this often.

Recursion can be tough to get your head around at first, because for a lot of people it is a new concept. However, once understood, it's quite simple. The main things to remember are the base case, the terminating case, and the concept of an accumulator.

Base case: This is the starting step of recursion.

Termination case: This tells the recursion when to stop. Without this, it would simply recur forever, or at least until the machine runs out of memory.

Accumulator: This is how you can pass data through a function recursively. It's called an accumulator because it accumulates data, or collects it as it goes. Here's an example of a program in Python that recursively adds numbers from 0 to 10 into a list, and then returns it.

The base case will be starting at 0. The terminating case is at 10. We'll use a list as an accumulator.

Lets define our function:

   def one_to_ten(number, accumulator):
       if number > 10:
           return accumulator
       else:
           accumulator.append(number)
           return one_to_ten(number + 1, accumulator)
   
   empty_list = []
   base_case = 0
   
   result = one_to_ten(base_case, empty_list)
   
   print(result) # result is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Going through it step by step:

 1. Our function takes a number and the accumulator
 2. Check that the termination case hasn't been met
 3. If it has, don't call the function anymore. Just return the accumulator.
 4. Otherwise, put the number in the accumulator
 5. Call the function from within itself, but with number + 1, so the next recursion of it
    will have the next number. Note that we're also passing in the newly updated accumulator.

So, the function enters a second instance of itself, but with number plus one, and it does this 10 times, until number is 10 (which will be 10 levels deep). However, when number becomes 11, it will not recur anymore, and it will return the accumulator back to the 9th function, which will also finish and return to the 8th function, which returns back to the 7th function, and so on until the first call of the function has returned. You could say it naturally unrolls itself.

Linked Lists

Linked List Example

Binary Search Trees

A binary search tree is a way to very quickly search sorted data. It is simply constructing a tree by following a binary search.

   /* This program repeatedly inserts n random numbers into
      a binary search tree. Then it traverse the tree
      in in-order to store the keys into array a.
      After the traversal, sorted numbers are in array a. */
   #include <stdio.h>
   #include <math.h>
   #include <stdlib.h>
   #define nil 0
   int k=1;
   struct item *proot;
   struct item{
     int key;
     struct item *adr;
     struct item *left;
     struct item *right;
   };
   struct item a[1000000];
   void insert(int x){
     struct item *p, *q;
     p=proot;
     do{
       q=p;
       if (x<=(*p).key)p=(*p).left; else p=(*p).right;
     }
     while (p!=nil);
     p=malloc(sizeof(struct item)); 
     (*p).key=x; (*p).left=nil; (*p).right=nil;
     (*p).adr=p;
     if (x<=(*q).key) (*q).left=p; else (*q).right=p;
   }
   traverse(struct item *proot)
   { struct item *p;
     p=proot;  
     if (p!=nil){
       printf("("); 
       traverse((*p).left);
       printf("%d", (*p).key);
       a[k]=(*p); k++;
       traverse((*p).right);
       printf(")"); 
     }
   }
   main()
   { 
     int i,n,t;
     scanf("%d",&n);
     proot=malloc(sizeof(struct item));
     (*proot).key=99999;
     (*proot).left=nil;
     (*proot).right=nil;
     for(i=1;i<=n;i++) a[i].key=random()%100000;
   //  for (i=1;i<=n;i++)printf("%d ",a[i].key);
     printf("\n");
     t=clock();
     for (i=1;i<=n;i++)insert(a[i].key);
     traverse(proot->left);
     printf("\n");
     printf("time= %d millisec\n\n", (clock()-t)/1000);
   /*
     for (i=1;i<=n;i++)
       printf("%3d %3d %3d %3d\n\n",(int)(a[i].adr) % 1000, a[i].key, 
                                (int)(a[i].left) % 1000, (int)(a[i].right) % 1000);
   */
   }

Hash Tables