Write Path
in DBImpl::Write
- create a
Writerobject that holds the write batch - lock the mutex (RAII Scoped Lock) and wait for the current write to be at the front of the writer queue
- how do multiple writers lock the same lock at the same time? or do they all contend on the
MutexLock l(&mutex_)?
- how do multiple writers lock the same lock at the same time? or do they all contend on the
- if we need to throttle, then we wait -
DBImpl::MakeRoomForWrite - get the last sequence number
- the Writer now holds responsibility for locking the memtable and WAL, so the main writer thread can release the lock
- WAL commit: https://github.com/google/leveldb/blob/4a0c572440c7df2f56a6f5fb5aec9e366d522edb/db/db_impl.cc#L1236
- if WAL write was successful, write to memtable https://github.com/google/leveldb/blob/4a0c572440c7df2f56a6f5fb5aec9e366d522edb/db/db_impl.cc#L1245
- Lock the mutex again, and update the high sequence number
- remove the writers from the queue (why does this happen?)
- signal the waiting head of the write queue
Read Path
in DBImpl::Get
- Create a scoped RAII Lock for the mutex
- Get the snapshot (as determined by a Sequence Number) for the read (either via the
ReadOptionsor by getting the latest Sequence Number) - Create a
LookupKey- what is an
internal_key?
- what is an
- refcount the latest memtable, immutable memtable, and
Version- what is
Version?Versionappears to be how LevelDB represents an on-disk snapshot. Conversely,VersionSetis what appears to represent a "snapshot manager"- Each
Versionis a node in a linked list; it links to the next and previous version as well - It has a list of files in each level
- Each
- what is the
manifest_file_number_? Looks to be the snapshot ID, so that we have clear versioning amongst the snapshots.
- what is
- Unlock the mutex and lookup the key in the memtable
- If the key is not present in the memtable, look up the immutable memtable
- if the key is not present in the immutable memtable, look up the current snapshot (see
Version::Get)- create a tracker struct for lookup information
- call into
Version::ForEachOverlapping- takes a callback to run on every file whose key range matches the lookup key- the callback does the actual work of opening the file and reading from it - when the callback returns
false, the search stops and returns - first search level 0 in order of newest to oldest - process all level 0 files that overlap and run the callback on them
- if the callback doesn't return, proceed to search lower levels (1, 2, 3, 4)
- important to note: the guarantee we have now is that only one file will contain the information for any key - so first we search for the first overlapping file within the level, then we search within the file
- for each level, binary search the files until we find the first file whose high key is greater than the lookup key
- if the lookup key is smaller than the low key of the file, then we skip the file
- the lookup key is larger than the low key of the file, we can look within the file
- the callback does the actual work of opening the file and reading from it - when the callback returns
- Within the callback:
- update internal stats tracking for what the last file read in the read was
- look up the
TableCachewithin theVersionSet- TableCache is an LRU Cache (exposed as the interface
Cache) that maps keys to values, and maintains some other metadata for a key in aHandle. TheTableCacheexists because opening an SSTable (lots of work regarding decoding the index blocks, filters, etc) is an expensive process, so we can cache the file metadata in a cache and avoid having to read it on every read for the key. - This opens the file (if not already open) and searches the contents of the
Tablein a block-wise fashion (Table::InternalGet) - if the search is rejected by the bloom filter, return early - Once the block is found, seek to the key
- if the key is present (i.e key we seeked to is equal to search key, not greater/lesser), and not corrupt, and not a deletion, then capture the value
- and return false
- TableCache is an LRU Cache (exposed as the interface
- notes: seeing a lot of "charge against/for", wonder what this means