Mischiefblog
I make apps for other people

Getting started with Berkeley Database Java Edition

Posted by Chris Jones
On January 17th, 2011 at 21:40

What is a Berkeley Database and how is it different from Oracle or a relational database?

A Berkeley Database (BDB) is a key-value store, essentially a disk-based HashMap. It belongs to the family of DBMs (database managers), which classically store keys as hashes in fixed-size buckets.
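To make the hash-bucket idea concrete, here is a toy in-memory sketch. The BucketStore class and its method names are made up for illustration, and a real DBM keeps its buckets in fixed-size pages on disk rather than in Python lists, so treat this as a rough model only.

```python
import zlib


class BucketStore:
    """Toy in-memory model of a hash-bucket key-value store (hypothetical)."""

    def __init__(self, bucket_count=8):
        # a fixed number of buckets, each holding (key, value) pairs
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # crc32 is a stable hash, so a key always lands in the same bucket
        return self.buckets[zlib.crc32(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # same key: overwrite the old value
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default


store = BucketStore()
store.put(b"a", b"apple")
store.put(b"b", b"banana")
store.put(b"b", b"blueberry")  # replaces b"banana"
print(store.get(b"b"))  # b'blueberry'
print(store.get(b"z"))  # None
```

A lookup only ever touches one bucket, which is why retrieval by key stays cheap as the store grows.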

Values in a DBM are typically accessed through put and get operations. This database model is reused in other databases:

  • The original Database Manager (DBM) was written in 1979 by Ken Thompson
  • GNU DBM (GDBM), JDBM, and Tokyo Cabinet (among many others) are examples of key-value DBM implementations
  • BigTable is a large scale key-value store (with column attributes) that relies on GFS to split the database (something you normally have to do for yourself with disk-based DBMs)
  • Memcached is a memory-based key-value store used to cache slow lookups
  • Voldemort, Cassandra, and Dynamo distribute key-value data across a cluster of eventually consistent stores

DBMs are often used as the back end for relational databases, usually as indexes but also as row stores.
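As a rough illustration of the index role, the sketch below uses two plain Python dicts as stand-ins for two DBM files: one maps a primary key to a row, the other maps a column value to the primary keys that carry it. The function names and sample data are hypothetical, and a real DBM would need the key list serialized into a single value.

```python
# Plain dicts stand in for two DBM files (hypothetical sample data).
row_store = {}   # primary key -> row
city_index = {}  # indexed column value -> primary keys of matching rows


def insert_row(pk, name, city):
    row_store[pk] = (name, city)
    # keep the index in step with the row store on every write
    city_index.setdefault(city, []).append(pk)


def names_in_city(city):
    # one lookup in the index, then one lookup per matching row
    return [row_store[pk][0] for pk in city_index.get(city, [])]


insert_row(1, "Ada", "London")
insert_row(2, "Grace", "New York")
insert_row(3, "Alan", "London")

print(names_in_city("London"))  # ['Ada', 'Alan']
```

The relational engine does this bookkeeping for you; with a bare DBM the application has to maintain both stores itself.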

DBMs are not:

  • ISAM: they’re organized as a hash table or tree, not as a collection of fixed-length records
  • Relational: records in a DBM may point to other records, but the database itself does not enforce referential integrity
  • Columnar: keys are associated with one value, so additional values need to be placed in tokenizable strings
  • Value searchable: strictly speaking, a DBM should only be searched or retrieved by key
  • Schema based: they don’t have schemas so it’s up to the software to determine how records are related
  • SQL based: they don’t have a structured query language for lookups
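A few of these limits are easier to see in code. The sketch below uses a plain dict as a stand-in for an open DBM handle; the field layout, the "|" delimiter, and the helper names are all assumptions made for illustration.

```python
db = {}  # stand-in for an open DBM handle: bytes keys, bytes values

FIELDS = ("name", "manager")  # hypothetical record layout


def pack(record):
    # one key holds one value, so the fields share a delimited string
    return "|".join(record[f] for f in FIELDS).encode("utf-8")


def unpack(value):
    return dict(zip(FIELDS, value.decode("utf-8").split("|")))


def find_by_field(field, wanted):
    # no value index exists: searching by value means scanning every key
    return sorted(k for k, v in db.items() if unpack(v)[field] == wanted)


db[b"user:1"] = pack({"name": "Ada", "manager": "user:2"})
db[b"user:2"] = pack({"name": "Grace", "manager": "user:9"})

print(unpack(db[b"user:1"])["name"])   # Ada
print(find_by_field("name", "Grace"))  # [b'user:2']
# nothing stops a record from pointing at a key that was never stored
print(b"user:9" in db)                 # False
```

The store happily accepts the dangling "user:9" reference, which is the referential integrity gap described above.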

DBMs are:

  • very fast
  • efficient
  • distributable (see Dynamo)
  • inexpensive (low CPU and memory overhead, depending on cache configuration)

A DBM assumes you know what key you want to retrieve and provides the minimum tools necessary to do so.

The world’s simplest DBM example

This is in Python. When working with a Python dictionary type, you normally initialize the dictionary and access members with a mapping operator ([]):

>>> d = {}
>>> d['a'] = 'apple'
>>> d['b'] = 'banana'
>>> d['c'] = 'cantaloupe'
>>> d['b']
'banana'

When working with a DBM in Python, the DBM library provides a dictionary-like object backed by an on-disk DBM file (this example uses Python 2's anydbm module, which Python 3 renamed to dbm):

>>> import anydbm
>>> db = anydbm.open('mydb.dbm','c')
>>> db['a'] = 'apple'
>>> db['b'] = 'banana'
>>> db['c'] = 'cantaloupe'
>>> db.close()
>>> db = None
>>> db
>>> db = anydbm.open('mydb.dbm','c')
>>> db['b']
'banana'

In the second example:

  1. we imported the anydbm library (which looks for a suitable DBM implementation on the host),
  2. created (if it doesn’t already exist) the database file,
  3. populated the database with key-value pairs,
  4. closed and dereferenced the database to prove the values were persisted on disk,
  5. then reopened the database and retrieved a value from the on-disk store.
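Under Python 3 the same steps might look like the sketch below, which uses the standard dbm module. The temporary directory and the store_and_reload helper are my own additions so the example cleans up after itself. Note that values come back as bytes rather than str.

```python
import dbm
import os
import tempfile


def store_and_reload(path):
    # 'c' creates the database file(s) if they do not exist yet
    with dbm.open(path, "c") as db:
        db["a"] = "apple"
        db["b"] = "banana"
        db["c"] = "cantaloupe"
    # the first handle is closed here; reopen read-only to prove persistence
    with dbm.open(path, "r") as db:
        return db["b"]


with tempfile.TemporaryDirectory() as tmp:
    value = store_and_reload(os.path.join(tmp, "mydb"))

print(value)  # b'banana'
```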

Why would you use a DBM?

You use a DBM when:

  • you need to perform batch processing that will be prohibitively expensive with a SQL database
  • you’re working with very large or time-sensitive data sets
  • you’re trying to reduce your relational database (RDB) expense
  • you don’t have the time or resources to invest in RDB development (one-off development, application caches)
  • you want a searchable cache of non-relational values (such as an extracted index from a database)
  • you need to replicate the data set across many read-only hosts
  • you want to embed a database in your application (instead of making the application rely on an external database)
  • you’re scaling up and an RDB can’t scale with you

Choosing to use a DBM or RDB depends on your application, audience, load, knowledge, service level agreements, available hardware, etc. It’s a decision that’s not simply determined by software architecture but also by your organization’s goals, guidelines, and primary care architect’s input.

Getting started with Berkeley JE

Download Berkeley DB JE from Oracle:

http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html

Install JE into a reasonable place and create a library for it with Eclipse (assuming you’re not using Maven).

Trivial Berkeley JE demonstration

package com.mischiefbox.bugged;

import java.io.File;

import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;
import com.sleepycat.je.Transaction;

public class Bugged {
    // the environment directory must already exist; JE does not create it
    final File envDir = new File("/home/chjones/tmp");
    final EnvironmentConfig envConfig = new EnvironmentConfig();
    final Environment env;
    final DatabaseConfig dbConfig = new DatabaseConfig();
    final Database db;

    public Bugged() {
        // set up the database
        envConfig.setTransactional(true);
        envConfig.setAllowCreate(true);

        env = new Environment(envDir, envConfig);

        dbConfig.setTransactional(true);
        dbConfig.setAllowCreate(true);
        dbConfig.setSortedDuplicates(true);

        db = env.openDatabase(null, "bugged", dbConfig);

        store();
        retrieve();

        // close the database before the environment that owns it
        db.close();
        env.close();
    }

    private void store() {
        // store a key/value pair
        byte [] firstKey = "firstKey".getBytes();
        byte [] firstValue = "firstValue".getBytes();
        DatabaseEntry keyEntry = new DatabaseEntry(firstKey);
        DatabaseEntry valueEntry = new DatabaseEntry(firstValue);

        Transaction txn = null;
        try {
            txn = env.beginTransaction(null, null);
            OperationStatus status = db.put(txn, keyEntry, valueEntry);

            if (status != OperationStatus.SUCCESS) {
                System.err.println("Error: " + status);
            }

            txn.commit();
        } catch (DatabaseException e) {
            // roll back so the transaction is not left open
            if (txn != null) {
                txn.abort();
            }
            e.printStackTrace();
        }
    }

    private void retrieve() {
        // retrieve the value for the key
        byte [] key = "firstKey".getBytes();

        DatabaseEntry keyEntry = new DatabaseEntry(key);
        DatabaseEntry valueEntry = new DatabaseEntry();

        Transaction txn = null;
        try {
            txn = env.beginTransaction(null, null);
            OperationStatus status = db.get(txn, keyEntry, valueEntry, LockMode.READ_UNCOMMITTED);

            if (status != OperationStatus.SUCCESS) {
                System.err.println("Error: " + status);
            } else {
                String value = new String(valueEntry.getData());
                if (value.equals("firstValue")) {
                    System.out.println("OK");
                } else {
                    System.err.println("Failed:  value returned '" + value + "' does not equal 'firstValue'");
                }
            }

            // end the read transaction so none is open when the environment closes
            txn.commit();
        } catch (DatabaseException e) {
            // roll back so the transaction is not left open
            if (txn != null) {
                txn.abort();
            }
            e.printStackTrace();
        }
    }

    public static void main(String [] args) {
        new Bugged();
    }
}

One Response to “Getting started with Berkeley Database Java Edition”

  1. Dave Segleau Says:

    Chris,

    Great blog entry. Thanks for using and blogging about Berkeley DB. I thought that I would mention that Berkeley DB (the C library, not the Java Edition) has now added a SQL API. We’ve integrated the SQLite3() API on top of Berkeley DB, thereby combining two of the most popular open source libraries in a single package. You get the ease of use of SQLite with the concurrency, scalability and reliability of Berkeley DB.

    So, we’ve added SQL and Schemas to Berkeley DB, which should make it much easier to use for newer developers. We’ve also included support for JDBC, ODBC and ADO.NET. It’s still just as fast, reliable and scalable. Now it’s just easier to use and manage the data.

    Regards,

    Dave