Database
- a piece of software that stores and retrieves data
- similar to data.json()
Relational Databases
- Primary Key
  - unique
  - ordered (typically backed by a sorted index)
  - can't be null
- Foreign Key
  - references the primary key of another table
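The key rules above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a definitive schema: the "Orders" table and its UserId column are hypothetical names made up for illustration, and SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute('CREATE TABLE "Users" (Id INTEGER PRIMARY KEY, Email TEXT NOT NULL)')
# "Orders" is a hypothetical table; UserId references the primary key of "Users".
conn.execute(
    'CREATE TABLE "Orders" ('
    "Id INTEGER PRIMARY KEY, "
    'UserId INTEGER NOT NULL REFERENCES "Users"(Id))'
)

conn.execute(
    'INSERT INTO "Users" (Id, Email) VALUES (?, ?)', (7, "bill.gates@microsoft.com")
)
conn.execute('INSERT INTO "Orders" (Id, UserId) VALUES (?, ?)', (1, 7))  # user 7 exists


def try_insert(sql, params):
    """Run an INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Primary keys are unique: a second row with Id 7 is refused.
duplicate_key = try_insert(
    'INSERT INTO "Users" (Id, Email) VALUES (?, ?)', (7, "other@example.com")
)
# Foreign keys must point at an existing primary key: user 999 does not exist.
dangling_reference = try_insert(
    'INSERT INTO "Orders" (Id, UserId) VALUES (?, ?)', (2, 999)
)
```

Both `duplicate_key` and `dangling_reference` end up as "rejected", because the database itself refuses rows that break the key rules.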
SQL
- tables can be stored using various underlying data structures
  e.g. B-tree …
SQL - Relations
- One to many
- Many to many
  Combine…
  - Many to one
  - One to many
- One to one
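One common way to model a many-to-many relation is a junction table whose rows each hold two foreign keys, so that it is built from two one-to-many relations. Whether this is exactly what the note above means is an assumption. The sketch below uses Python's sqlite3 module, and the table names Students, Courses and Enrollments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Courses  (Id INTEGER PRIMARY KEY, Title TEXT NOT NULL);
    -- Junction table: each row links one student to one course.
    CREATE TABLE Enrollments (
        StudentId INTEGER NOT NULL REFERENCES Students(Id),
        CourseId  INTEGER NOT NULL REFERENCES Courses(Id),
        PRIMARY KEY (StudentId, CourseId)
    );
    INSERT INTO Students VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO Courses  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO Enrollments VALUES (1, 10), (1, 20), (2, 10);
    """
)

# One student, many courses.
courses_of_ada = [
    title
    for (title,) in conn.execute(
        "SELECT c.Title FROM Enrollments e "
        "JOIN Courses c ON c.Id = e.CourseId "
        "WHERE e.StudentId = 1 ORDER BY c.Title"
    )
]

# One course, many students.
students_in_databases = [
    name
    for (name,) in conn.execute(
        "SELECT s.Name FROM Enrollments e "
        "JOIN Students s ON s.Id = e.StudentId "
        "WHERE e.CourseId = 10 ORDER BY s.Name"
    )
]
```

A one-to-many relation needs no junction table: a single foreign key column on the "many" side is enough.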
SQL - Queries
SELECT _ FROM _ WHERE _ : SELECT COL FROM TABLE WHERE CONDITION
- SELECT * FROM "Users"
  - *: all columns
- SELECT "Email" FROM "Users" WHERE Id=7
  - Time Complexity
    scanning the table row by row, without using the primary key => O(n) --> looking up by primary key (because a primary key is ordered, binary search works) => O(log N)
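The gap between the two costs can be made concrete by counting comparisons. This is only an illustration in plain Python: the in-memory `rows` list and both function names are hypothetical, and real databases usually keep keys in a B-tree rather than a sorted array, though the lookup cost is logarithmic in both cases.

```python
# Hypothetical table: 1000 rows kept sorted by primary key (Id).
rows = [(user_id, f"user{user_id}@example.com") for user_id in range(1, 1001)]
ids = [row[0] for row in rows]


def find_by_scan(target_id):
    """Check rows one by one: O(n) comparisons in the worst case."""
    steps = 0
    for row in rows:
        steps += 1
        if row[0] == target_id:
            return row, steps
    return None, steps


def find_by_binary_search(target_id):
    """Halve the search range on every step: O(log n) comparisons."""
    steps = 0
    lo, hi = 0, len(ids)
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if ids[mid] < target_id:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(ids) and ids[lo] == target_id:
        return rows[lo], steps
    return None, steps


scan_row, scan_steps = find_by_scan(1000)
search_row, search_steps = find_by_binary_search(1000)
```

Both calls return the same row. The scan needs 1000 comparisons to reach the last row, while the binary search needs at most 10 for a table of this size.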
SQL - Indexes
- SELECT "Email" FROM "Users" WHERE "Email" = 'bill.gates@microsoft.com'
  - Time Complexity
    checking each row => O(n) --> add an index: a separate table sorted by email that stores each row's id; search it for the email, then look up the row in the original table by id => O(2*logN), which is still O(log N)
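SQLite can report how it plans to run a query, which makes the effect of an index visible. The sketch below uses Python's sqlite3 module; the index name `idx_users_email` and the generated e-mail addresses are made up for illustration. The exact wording of the plan text differs between SQLite versions, so only the general shape (a scan before, an index search after) should be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Users" (Id INTEGER PRIMARY KEY, Email TEXT NOT NULL)')
conn.executemany(
    'INSERT INTO "Users" (Id, Email) VALUES (?, ?)',
    [(i, f"user{i}@example.com") for i in range(1, 1001)],
)

QUERY = 'SELECT Id FROM "Users" WHERE Email = ?'


def query_plan(email):
    """Return SQLite's description of how it would execute QUERY."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (email,)).fetchall()
    return " ".join(row[-1] for row in plan_rows)  # last column is the readable detail


# No index on Email yet: SQLite has to look at every row.
plan_without_index = query_plan("user7@example.com")

# Hypothetical index name; it is a separate structure sorted by Email.
conn.execute('CREATE INDEX idx_users_email ON "Users" (Email)')
plan_with_index = query_plan("user7@example.com")

found_id = conn.execute(QUERY, ("user7@example.com",)).fetchone()[0]
```

Before the index exists the plan describes a scan of "Users"; afterwards it mentions `idx_users_email`, meaning the lookup goes through the index instead of checking each row.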
SQL - Joins
Sharding and Replicating
Adds complexity from a developer's standpoint, but is done for safety (backup)
- Sharding: split the data into parts (e.g. halves) and put each part on a different machine
- Replicating: write the same data to several machines
- With sharding, the data can be evenly distributed across machines, and hence so can the queries.
- To improve query response time when reading data, replication helps. You can write to your primary and read from the secondaries to distribute the queries. The primary is then relieved of the expensive reads and can be busy with only writing.
- When existing shards are replicated, there are more hosts to respond to a query request.
- There is some improvement with sharding if you choose a good shard key: write queries 'might' be distributed if you do that correctly. The main reason for sharding is to horizontally expand your database. When working with big data, instead of installing bigger and bigger disks, you can add new servers next to the existing ones, as many as you want.
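The two ideas can be combined in a toy in-memory model. This is a sketch under several assumptions, not how any particular database does it: the class names are hypothetical, the shard is picked by hashing the shard key, and replication is done synchronously for simplicity, whereas real systems often replicate asynchronously.

```python
import hashlib


class ReplicatedShard:
    """One shard: a primary that takes writes plus read-only secondaries."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.secondaries = [{} for _ in range(replica_count)]
        self._next_reader = 0

    def write(self, key, value):
        self.primary[key] = value
        # Simplification: copy to every secondary immediately.
        for secondary in self.secondaries:
            secondary[key] = value

    def read(self, key):
        # Round-robin over the secondaries so the primary is spared the reads.
        secondary = self.secondaries[self._next_reader]
        self._next_reader = (self._next_reader + 1) % len(self.secondaries)
        return secondary.get(key)


class ShardedStore:
    """Routes each key to one shard by hashing the shard key."""

    def __init__(self, shard_count=2):
        self.shards = [ReplicatedShard() for _ in range(shard_count)]

    def shard_index(self, key):
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def write(self, key, value):
        self.shards[self.shard_index(key)].write(key, value)

    def read(self, key):
        return self.shards[self.shard_index(key)].read(key)


store = ShardedStore(shard_count=2)
for user_id in range(1, 101):
    store.write(user_id, f"user{user_id}@example.com")

rows_per_shard = [len(shard.primary) for shard in store.shards]
email_of_user_7 = store.read(7)
```

Each row lives on exactly one shard, so `rows_per_shard` adds up to 100, and every read is answered by a secondary of the shard that owns the key. How evenly the rows spread depends on the shard key and the hash function.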