hello friends! new(ish)!
Databases
Databases are where serious applications and servers hold all their data. A program that allows databases to be created and modified is called a Database Management System (DBMS).
The language to communicate with a DBMS is SQL: Structured Query Language.Most DBMSes accept all SQL commands. Some have quirks for SQL commands. Some have non standard SQL commands. But for basic stuff, any DBMS will do.
There are a number of floss DBMSes:
- PostgreSQL
- MariaDB
You may have also heard of some proprietary ones:
- Oracle
- Microsoft SQL "sequel" Server
Inside a database you will find a number of tables (think spreadsheets, not furniture). Each table has some fields/columns that store different types of data, and a number of entries/rows which hold that data. Tables will also have one or more Primary Keys, which are used to find a unique row. Often the primary key will just be a number, like a customer ID.
To make the utterly boring topic of databases interesting, let's mess about with the Ashley Madison database. Ashley Madison is/was a dating site for married people. It was owned in 2015 and there is a torrent available.
Installing a DBMS
MariaDB is cross platform, so let's have a play. First step is to grab the binaries:
- On Linux, install the mysql package.
- On Windows, download the standalone zipfile from the MariaDB downloads page. Extract it somewhere when it finished downloading. For reference, this guide is using v 10.0.21.
With the binaries downloaded and extracted, you need to do some initial setup:
- On Linux, following the ArchWiki MariaDB page we must:
- # mysql_install_db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
- your database files will be in /var/lib/mysql
- # mysql_install_db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
- On Windows:
- Open a command prompt by hitting Windows Key + R and typing cmd.exe and hitting enter.
- Navigate to your extracted mariadb/bin directory, e.g.
- cd ..
- cd downloads/mariadb/bin
- mysql_install_db.exe -d C:/am
- your database files will be in C:/am
Now it's time to run the mysqld daemon. This will run the DBMS as a service on your machine.
- Linux: # systemctl start mysqld
- Windows: run mysqld.exe from the mariadb/bin directory.
- A black terminal window will pop up briefly and fade away. You can check that mysqld.exe is running in Task Manager.
- If your get a firewall prompt asking for network access, feel free to deny it as we don't need our database accessable on the network.
You can also set some security parameters in Linux:
- # mysql_secure_installation
- disable everything and set a root password for the database.
At this point the DBMS is installed, configured, and running.
- Now it's time to create a database.
- Windows: in your command prompt, run mysql.exe -u root
- Linux: mysql -u root -p
- At the mysql prompt, enter:
- create database am;
- this will make a new database called am. the semicolon denotes the end of a command.
- create database am;
- Quit by typing exit.
Now let's import the Ashley Madison data to our new "am" database.
- Importing the data will take many hours. It's at least an overnight job. It's not a small database. If the Ashley Madison webserver caught on fire and you were the database administrator, this is the exact same shit you'd do.
- Extract a database dump from the AM torrent. For this example we'll use the email table dump.
- Windows: Extract aminno_member_email.dump.gz with 7zip. Then move the .dump file to the mariadb/bin directory (or setup your PATH variable to know about mariadb).
- Linux: $ 7z x aminno_member_email.dump.gz
With the .dump extracted, import it into the databasse.
- Windows: in your command prompt, enter:
- mysql.exe am -v -u root < aminno_member_email.dump
- Linux:
- mysql -v -u root -p am < aminno_member_email.dump
- For both of these commands, -v denotes verbose mode, which will output updates on what the database is doing. Without this, you will have no output and start to wonder if anything is happening.
- Database imports are bottlenecked by your HDD. If you have an SSD, do everything on the SSD. You will not notice more than 5% or so of your CPU or RAM being used.
Basic SQL Syntax
With one or more tables imported into the database we can have a look at what we have.
Grab yourself a MariaDB prompt by running:
- Windows: mysql.exe am -u root
- Linux: mysql am -u root -p
Tables and Table Properties
The commands to show tables and their properties differ from DBMS to DBMS. Here we'll be using MariaDB's cxommands.
show tables;
This will list all the tables in the database. If you've imported the aminno_member_email.dump file, you should see the aminno_member_email table listed;
describe aminno_member_email;
This will list the fields (columns) in the table.
- Field is the field's name.
- Type is the datatype (varchar is text, int is a number, tinyint is used for true/false).
- Null defines whether or not a field can be empty or not.
- Key lists whether a field is used as a key. PRI means Primary Key, and this field must be unique (in our case it's the "pnum" field).
- Default lists any default value that a field may have.
Basic Searching
Now that we know what tables we have and what they look like, we can view some of the entries.
select * from aminno_member_email limit 5;
This command will list the first 5 entries in the table. Let's break it down:
- Select tells the DBMS that we're going to search for entries.
- The asterix says we want all of the fields to be displayed in our search results.
- from aminno_member_email tells the DBMS what table to search.
- limit 5 tells the DBMS to stop searching once 5 results have been found (this is just here so that we didn't output the millions and millions of entries.
select pnum, email from aminno_member_email limit 5;
This query is the same as the above one, except we're only listing the pnum and email fields in the results.
select pnum, email from aminno_member_email where email="darren@housecapades.com";
This time we're searching for a specific email address. This query will return any entries that have this exact email address. Everything in inverted commas is case sensitive. We've also dropped the limit on search results, since we're not expecting to see too many results.
select pnum, email from aminno_member_email where email like "%@housecapades.com";
This time we're searching for email addresses which match a pattern. This query will take a little longer to do it's work.
Searching Across Multiple Tables
If you've imported multiple tables into the database, you can search across multiple tables to find more useful data. This is known as "joining" the tables.
You need to tell the DBMS how to match up data from one table to another. This is where Keys come in.