A file based B-Tree implementation
The B-Tree module (btree.h/btree.c) implements a file-based B-Tree used by the db module for indexing. The module is separated out for ease of testing and reuse. The file that the B-Tree is stored in is broken up into 512 byte pages, which aligns with common disk block sizes for 16-bit and 32-bit operating systems. Both keys and values are int32. If a key has more values than can be stored in its node, then an overflow page is allocated for them. Overflow pages are stored in a separate page in the same file. The very first page of the file contains a header with various information about the tree.
To create a B-Tree you need to provide the path to the file the tree will be stored in.
Usage Example
#include "btree.h"
int main(void) {
BTree tree;
int32 values[100];
int16 count;
int32 userKey;
/* Create a new B-Tree */
CreateBTree("index.btree");
/* Open the B-Tree */
OpenBTree(&tree, "index.btree");
/* Insert numeric key-value pairs */
BTreeInsert(&tree, 100, 200); /* Key: 100, Value: 200 */
BTreeInsert(&tree, 100, 201); /* Multiple values for same key */
BTreeInsert(&tree, 150, 300); /* Different key */
/* Insert using string-based keys */
userKey = StringKey("alice");
BTreeInsert(&tree, userKey, 1); /* User ID 1 for username 'alice' */
/* Find values for a key */
if (BTreeFind(&tree, 100, values, 100, &count))
printf("Found %d values for key 100\n", count);
/* Find by username (case-insensitive) */
userKey = StringKey("ALICE"); /* Same key as 'alice' */
if (BTreeFind(&tree, userKey, values, 100, &count))
printf("User alice has ID: %ld\n", (long)values[0]);
/* Delete a specific value */
BTreeDeleteValue(&tree, 100, 200);
/* Delete all values for a key */
BTreeDelete(&tree, 150);
/* Close the tree */
CloseBTree(&tree);
return 0;
}
Implementation Details
File Structure
The B-Tree file is organized into 512-byte pages with little-endian byte order:
- Page 0: Header page containing tree metadata
- Page 1+: Node pages (internal nodes, leaf nodes, or overflow pages)
Page Types
Header Page (Page 0)
- Magic number:
"BTRE"(4 bytes) - Format version:
uint16(2 bytes) - Tree order:
uint16(2 bytes) - maximum children per node - Root page number:
int32(4 bytes) - Next free page:
int32(4 bytes) - Total page count:
int32(4 bytes)
- Magic number:
Leaf Node Pages
- Page type:
byte(3 = leaf) - Key count:
uint16(2 bytes) - Next leaf pointer:
int32(4 bytes) - for range queries - Entries: Array of leaf entries
- Each entry contains:
- Key:
int32(4 bytes) - Value count:
uint16(2 bytes) - Values: Array of
int32(up to 60 values) - Overflow page:
int32(4 bytes, 0 if none)
- Key:
- Each entry contains:
- Page type:
Internal Node Pages (for future expansion)
- Page type:
byte(2 = internal) - Key count:
uint16 - Keys and child page pointers
- Page type:
Overflow Pages (for future expansion)
- Page type:
byte(4 = overflow) - Value count:
uint16 - Next overflow page:
int32 - Values: Array of
int32(up to 120 values)
- Page type:
Capacity
With 512-byte pages:
- Maximum keys per leaf node: 60
- Maximum values per key (in-node): 60
- Maximum values in overflow page: 120
- Tree order: 61 (max 60 keys, 61 children)
Current Limitations
The current implementation is simplified and suitable for small to medium datasets:
- No node splitting: The root is always a leaf node. Once
BT_MAX_KEYSis reached, inserts fail. - No internal nodes: Tree does not grow in height, limiting capacity.
- Basic overflow: Overflow pages are allocated but not yet implemented.
- Linear search: Uses linear search within nodes (good for small nodes).
These limitations make the implementation simple and suitable for learning, testing, and small-scale use. Future enhancements can add full B-Tree splitting, balancing, and multi-level trees.
Data Structures
/* Constants */
#define BT_PAGE_SIZE 512
#define BT_MAX_KEYS 60 /* Maximum keys per node */
#define BT_MAX_VALUES 60 /* Maximum values per key in a leaf */
#define BT_MAX_OVERFLOW 120 /* Maximum values in an overflow page */
/* Page type constants */
#define PT_NONE 0
#define PT_HEADER 1
#define PT_INTERNAL 2
#define PT_LEAF 3
#define PT_OVERFLOW 4
typedef struct {
char magic[4];
uint16 version;
uint16 order;
int32 root_page;
int32 next_free_page;
int32 page_count;
} BTreeHeader;
typedef struct {
int32 key;
uint16 value_count;
int32 values[BT_MAX_VALUES];
int32 overflow_page;
} LeafEntry;
typedef struct {
byte page_type;
uint16 key_count;
int32 next_leaf;
LeafEntry entries[BT_MAX_KEYS];
} LeafNode;
typedef struct {
char filename[256];
FILE *fp;
BTreeHeader header;
bool is_open;
} BTree;
Functions
File Operations
bool CreateBTree(const char *filename)
- Creates a new B-Tree file
- Initializes header with magic number and metadata
- Creates empty root leaf node
- Returns
trueon success
bool OpenBTree(BTree *tree, const char *filename)
- Opens an existing B-Tree file
- Reads and validates header (checks magic number)
- Returns
trueon success
void CloseBTree(BTree *tree)
- Writes updated header to disk
- Closes the file
- Marks tree as not open
Data Operations
bool BTreeInsert(BTree *tree, int32 key, int32 value)
- Inserts a key-value pair
- If key exists, adds value to that key's value list
- If key doesn't exist, creates new entry (keys are kept sorted)
- Returns
trueon success
bool BTreeFind(BTree *tree, int32 key, int32 *values, int16 max_values, int16 *count)
- Finds all values for a given key
- Returns values in the provided array, up to
max_values - Sets
countto number of values found - Returns
trueif key exists
bool BTreeDelete(BTree *tree, int32 key)
- Deletes a key and all its associated values
- Shifts remaining entries to fill gap
- Returns
trueon success
bool BTreeDeleteValue(BTree *tree, int32 key, int32 value)
- Deletes a specific key-value pair
- If no values remain for key, deletes the key entirely
- Returns
trueon success
Utility Functions
int32 StringKey(const char *s)
- Generates a B-Tree key from a string
- Converts string to lowercase for case-insensitive lookups
- Computes CRC-16 hash of the lowercase string
- Returns the CRC-16 value as an
int32key - Use this to create keys for indexing string values like usernames
Example:
int32 key;
key = StringKey("Username"); /* Same as StringKey("username") */
BTreeInsert(&tree, key, 12345); /* Index user ID 12345 under this name */
Testing
Run the B-Tree test suite:
make && ./bin/test_btree
The test suite covers:
- Type size verification (catches LP64 issues)
- Header round-trip serialization
- File creation and opening
- Single and multiple key-value insertions
- Multiple values per key
- Key lookups (existing and non-existent)
- Value and key deletion
- Data persistence across close/reopen
- StringKey utility function (consistency, case-insensitivity, uniqueness)
All 10 tests pass successfully.
Future Enhancements
For production use with larger datasets, consider adding:
- Node splitting and merging: Allow tree to grow beyond single leaf
- Internal nodes: Multi-level B-Tree for better scalability
- Overflow page implementation: Support unlimited values per key
- Binary search: Within nodes for better performance
- Range queries: Leverage next-leaf pointers for scans
- Bulk loading: Efficient initial tree construction
- Defragmentation: Reclaim deleted page space