Medical Informatics/Case Studies of Computer Based Medical Records

Small Office Medical Record Systems edit

Determination by Regional Professional and Business Environments edit

Australian case study 1 edit

a feature based evolution of GP computerized medical records edit

General practice medical records in Australia had the RACGP standard medical record forms for patient registration and summary, and progress notes, prior to the widespread introduction of PC based computer medical records, mainly based on the Windows operating system.

Using a computer based medical record system gave advantages of increasing script legibility, so that medication errors from written GP to pharmacist miscommunication were theoretically reduced, but also one source of business inefficiency where pharmacists had to phone GPs to clarify iilegible scripts was eliminated. It became apparent that referrals to specialists could be made faster, when an integrated address book and word processor functionality was added. The medical justification was that specialists could also get summary lists of medical conditions and medications prescribed automatically included in the form letter printing facility, thus improving the patient handover process, but this was rarely relied upon , except perhaps by surgeons.

Pathology and radiology reports could also be downloaded by Australian medical record systems ,usually via a pathology provider outsourced proprietary download client , which placed result files in a file directory for GP record systems to parse , given some simple line orientated format, like PIT. This was in the context of internet access developing from dial up , to ADSL /cable , with constant access , and the requirement of developing secure transfers over the internet using encryption and authentication internet standards.

In the Australian context, the early adoption of one vendors solution meant that database technology used to store the small business records remained fairly stable for over a decade, and in fact the DBF file format of DBase / Foxpro came to be the foundations of most general practice electronic medical records for many years.

Since the scale of general practice was mostly small business, there was not much need to change the simpler but elegant database technology of the DBASE format, and the Windows network filesystem opportunistic file locking sufficed for using the system even in multi doctor practices with concurrent access by 10 or more users. The b-tree based DBASE index files provided the scalability as records accumulated years of patient progress notes , actions, vaccinations, referrals, imaged scanned reply letters, downloaded text results. But the file structure of DBASE index files was not openly documented, and the ability to rapidly find all the associated records for a given patient without detailed know-how about DBASE IDX/MDX files, was one major hurdle to customizing functionality of the GP computer based medical record.

For a period, this meant that it was well within the intellectual resources of any general practitioner to become the master of his own record system, as DBF file format was well known, as it was a logical heap file format, where records are simply appended to the end of files storing a particular table's information. But the lack of easily available / understandable information about DBASE *index* file structure, provided a major hurdle.

This case study will try to illustrate how the simple , logical , transparent organization of DBASE can intersect with basic understanding of one particular computer algorithm, i.e. Linear Hashing, and provide the means for extending functionality. With the move to vendor specific implementations of SQL database systems, this era is sadly ended, and futureproofing and protection from vendor market withdrawal is much harder.

Background computer science edit

Modern database systems are usually front ended by a SQL declarative programming interface, as SQL is comparatively easy to learn for non-programmers, but needs quite a lot of infra-structure to support it, and puts quite a distance between the language grammar of SQL , and the underlying file formats and algorithms used for retrieval. Most database systems still relying on indices to indicate where a record is stored given a key value for that record, and the algorithms that make up indices are either based on variants of B-Trees or Hashing. The main advantage of B Tree indexes is that index keys are stored sorted, so that it is possible to retrieve a range of records based on an lower key value or an upper key value . for example. All records whose patient.surname start with Smi , could be retrieved by using Smia to Smiz as upper and lower range values.

The Hashing algorithms are used in indices mainly because of their relative speed ; roughly speaking, it takes longer to traverse the branches of a tree for a monkey to get to a banana then for the monkey to read a sign pointing to the banana. Thus Hash indices are generally faster when implementing a *JOIN* operation. As an example of a Join operation , a patient's medical record might consist of entries in the a demographic table, referenced by an ID field, and this ID field is called the Primary Key. The patient's medications be stored in a medications table, which also has a DEMOGRAPHIC_ID field, and this is what is termed a Foreign Key field, so that with a Join between the demographic table and the medications table, we can , by identifying a patient with an ID value, identify what medications the patient is on.

What is Hashing ? Hashing is where some data value is translated into a integer number. Generally , a good hashing function will generate a roughly random integer number , for domain of data values to be translated. The reason that a good hash function is required, is that the hash value produced is used as an index into a storage location, which in this context, is a file location within a index hash file. Therefore, for space to be efficiently used, the possible hash values derivable from the population of input data should be roughly evenly distributed across the storage addressing range.

Another useful concept is Collision, where two different values are hashed to the same hash integer number. In order to handle a collision, extra space can be allocated at each location, so that there a fixed number of extra locations, and this can be called a Bucket (of slots). What if the number of keys hashed to one location exceed the number of slots in a Bucket ? Then another bucket can be referenced by the bucket , and this is called overflow bucket / block /space.

Because of the smaller scale of general practice database, it could be argued that standard hash algorithm can be used to implement external index files , using very large index file spaces, with a bucket to each possible hash value, and overflow buckets appended to the end of the index files if ever required.

A more elegant algorithm is the Linear Hashing algorithm, and it is prominent enough to be mentioned in almost any decent database fundamentals textbook , along with Extendible Hashing, as the hashing algorithms of choice for use with the large , file based indices requirements of database systems.

The main limitation of file based indices, is the nature of files :- they are easy to create, and easy to append to , but it is expensive to insert a block data in the middle of a file, as at the point of insertion, the rest of the file after the insertion point must be copied to a temporary file, the original file truncated at the insertion point, the new block appended to truncated file, and the temporary file contents appended to this. BTrees expand a Node/Block at a time, and a new Node can be appended to the end of the file. Extendible/linear hashes can also expand a block at a time, although Extendible hashes probably either require an estimation of the maximal size of a directory structure reserved at the start of the index file, or a separate directory file used , because directory structures periodically double in size.

BTree indices use an algorithm where keys are associated with pointers to file addresses marking the start of other blocks of file data representing other nodes in the B Tree. The B Tree is self balancing because overflow of entries means a Node is split , and an extra key demarking the two new nodes is passed up to the parent node, which slows down growth of the height of the tree; this eventually leads to split of the root node. New nodes are simply appended to the end of the index file, and file offset values ( node "pointers") allow random file access to traverse the blocks from root downwards.

Extendible and linear hashes are kin to radix searches, because the address space is increased by increasing the number of bits used in a hash value , every extra bit doubling the address space. The wikipedia encyclopedic references give adequate description of extendible and linear hashing.

However, since Linear Hashing is used in the code below, a brief recap is in order : linear hashing works by using a conditional hash function, which depends an iteration variable , split , which , on the first iteration of the split value, ranges from hash value 0 to N-1 , the initial number of buckets on the first iteration. The conditional hash function works so:

 h' = h mod M where M = N x 2 ^ level, if h' < split, then h' = h mod M' where M = N x 2 ^(level +1).

Whenever any bucket overflows, more space is needed, and this is handled temporarily by creation of an overflow bucket attached to the bucket that overflowed. But it is also handled by appending a bucket to the current list of buckets, so when the split variable is zero, the appended bucket in N. Once this is done, the split variable is incremented : thereafter the condition hash function will use mod 2N instead of N for any value originally hashed as zero, so that records will be hashed to either 0 or N. i.e if y = x mod R and y' = x mod 2R , then y' = y , or y' = y + R for any non-negative integer x. This conveniently hashes values traversed by split into the original numbered bucket n or another numbered bucket which is n + N times 2 ^ level . This is linear hashing , because the number of buckets grow linearly . Whenever the split variable hits N * 2 ^ level ( or N left bit shifted level times ) , then level is incremented, and split is reset to 0, in order for the conditional hash function to access the next level of added buckets, while ranging round robin across all available buckets splitting them before they accumulate too many records e.g. for 0 to N-1, N to 2N-1 is accessible ; for 0 to 2N-1, 2N to 4N-1 is accessible , and so on.

Case study account : allergy checker , and OS directory based file logs of consultations edit

In this case study, the initial "itch to scratch" was a user requirement that users be monitored for their entry of allergies into medical records, as , although doctors almost automatically ask whether any allergies exist before prescribing, the expectation of general practice in Australia was that allergies be recorded on all patients , regardless of whether the currency of the allergy list is being checked at each visit. This led to the implementation of an external program written in java to inspect the DBF tables related to allergy , and initially it was found that the in-memory Map classes (HashMap, TreeMap) were upto the task of indexing these tables on startup of the program, and maintaining a temporary in memory index to facilitate lookup . A method was found by polling the operating system "registry" of program associated attributes in order to detect when a patient record was opened, so that a lookup of the allergy tables could be triggered, and an reminder window flashed up if allergy recording was not yet done, and this check would occur periodically using a java Timer class.

The next user requirement was that consultation statistics could be recorded, to help with reviewing previous consultations , so that the clinician could review his practice if he so wished, and easily find patients he had recently seen , according to other detail not provided by the medical records program. This required lookup in a table which had a size that exceeded the capability of the in memory Map java classes, so it was proposed that a Linear Hashing implementation be used to scaleably index such tables.

Unfortunately, the initial implementation , which mixed in the block management as file entities, did not work and was difficult to debug, so the second implementation concentrated on proving the algorithm was working in memory first.

Once the bugs were ironed out ( which mainly due to the problems of unsigned integer representation inherent in the java language introducing a bug into the dynamic hashing function ), the in-memory Linear Hashing implementation was used as a cache structure for a file based block manager . A block was basically saved as a file with the filename being the same as the block number.

Due to the ability of java's in built LinkedList type being able to store null values at a given index location, the null value was used as a marker of a cache miss , requiring the file block manager to load in a block.

Whenever a block was loaded, a check was also made by comparing the runtime free memory with the size of block list multiplied by the average block size, and if it was below a certain multiple of this, then a number of non-empty blocks would be chosen by traversing the list and writing them to disk. Once the block was written, the data in the block was made available to the garbage collector by clearing any references to them (e.g. calling map.clear() ) , and then invoking the garbage collector ( System.gc() ). This would result in an increase in available java memory for data object allocation.

Because the loading and saving of blocks already worked for the caching , it was found that these could be used to load a previously created linear hash structure, or save a newly constructed one. This was also needed because constructing a persistable index structure from a large number of records took more than 10 minutes, and this was an unacceptable startup time for such a monitoring program.

Below is the Map interface plugin, along with a File manager class with static public methods for saving and loading the index. It illustrates that from a basic understanding of algorithms, this can be used to provide transparency and extensibility of computer based medical record systems. This doesn't provide the DBF file access logic, but this is relatively straight forward to implement using easily available descriptions of DBF file structure and basic types , and these classes can be used as the drop-ins for the java associative array Map classes, as programmatic representations of database indexes, which can be built by walking the list of records stored in a DBF table file. Once an index is built and then saved to disk, it can be reloaded on future startups of the monitoring program. Before using any index, the number of entries in the index can be compared to the current number of records recorded in the header of the DBF table file, and if is less , than records can be read from position L , the last number of records, to N the current number of records, and also added to the index object. Thus once an index is built, it is periodically updated, by checking against the previous number of records in the file , stored as part of the index header on disc whenever the index is saved to disc. A shutdown hook is used to ensure the index is updated in normal shutdown.

The specific use of the linear hash map implementation was to create a progress entry index clustered on patient identity keys. Each list of progress notes associated for a given patient, was stored as a linked list object against the patient ID as a key. The patient ID was then passed through the dynamic hash function and the key-value pair of patientID-to-linked_list_progress_ID was stored as an entry into a Block object of limited size. Given this representation, the reader could infer that the bucket sizes were not variable, or used a lot of wasted space to accommodate for maximal value list sizes. Because the blocks were of variable size, it was decided that each block would be a file, and a whole file system directory would be used to hold the linear map's data. This would add kernel overhead of opening/closing a file when dealing with a block, so the implementation may have been slower because of the simplification decision to represent a block as a single file, instead of storing fixed size blocks appended to a single index file. Nevertheless the performance was brisk enough, when a fully built index was used to retrieve a particular patient's records in order to record consultation review information.

This user API of using the linear hashing map to build indices for joining/ relating tables trades the hidden complexity of SQL join queries, for explicit review of table schemas, for the arguably equal complexity of learning the java language , and using the java language's flexibility to output custom reports. There seems no obstacle to simply translating the methods used to a different language, python for instance.

What is missing from this simplistic implementation of database indexing ? ACID is a shopping list acronym that comes to mind; atomicity , isolation, consistency and durability , in the face of concurrent access and random system crashes during operation. So far, for single user machine use as a local allergy checker and consultation statistics recorder, there seems to be no problem in 3 months of usage, but the desktop machine has never suffered a failure in the middle of a index file image update , which is rare, but because not specifically accounted for , would likely require the index files to be deleted, and a prolonged fresh initial index required at next operating system startup ( the programs run via the Startup Menu of the windows os being used).

The easiest way to deal with concurrency is to wrap the Map interface with java's Collections.syncrhonizedMap wrapper , and the standard way of dealing with recoverability is write ahead logging with checkpointing. Some DBMS use copy-on-write and keep previous versions of written data values, but what is the minimum that will suffice ? The main writing operations are overwriting a value for an existing key-value pair, inserting a new key value pair, and splitting a block. One recovery scheme is called shadow pages, where a copy of the page directory / block list is kept at checkpoint, and blocks written to storage are given version numbers, so block x would have the name 'x.v' . This could be implemented by adding a serializable small map attribute that maps a block number to a block.version file name. If a crash happens after the last checkpoint, then the shadow block list and block name map are made the current block list and name map. If required, periodic garbage collection can be done by deleting any file's with names containing earlier version numbers than the current shadow block name map .

java implementation of a basic persistent database index using the linear hash algorithm edit

package linearhashmap;

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * 
 * @author S.Tan
 * 
 * @param <K>
 *            key type , must implement equals() and hashCode()
 * @param <V>
 *            value type
 * 
 * 
 */
public class LHMap2<K, V> implements Map<K, V>, Serializable {

	/**
	 * 
	 */
	private static final long serialVersionUID = 3095071852466632996L;

	/**
	 * 
	 */

	public static interface BlockListener<K, V> {
		public void blockRequested(int block, LHMap2<K, V> map);
	}

	List<BlockListener<K, V>> listeners = new ArrayList<BlockListener<K, V>>();

	// int savedBlocks;
	int N;
	int level = 0;
	int split = 0;
	int blockSize;
	long totalWrites = 0;

	List<Block<K, V>> blockList = new ArrayList<Block<K, V>>();

	public void addBlockListener(BlockListener<K, V> listener) {
		listeners.add(listener);
	}

	void notifyBlockRequested(int block) {
		for (BlockListener<K, V> l : listeners) {
			l.blockRequested(block, this);
		}
	}

	public LHMap2(int blockSize, int nStartingBlocks) {
		this.blockSize = blockSize;
		this.N = nStartingBlocks;
		for (int i = 0; i < nStartingBlocks; ++i) {
			addBlock();
		}

		Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {

			@Override
			public void run() {
				showStructure();

			}
		}));
	}

	public static class Block<K, V> implements Externalizable {
		/**
		 * 
		 */

		int j = 0;

		Block<K, V> overflow = null;
		LinkedList<K> keyList = new LinkedList<K>();
		LinkedList<V> valueList = new LinkedList<V>();
		transient private LHMap2<K, V> owner;
		transient private Map<K, V> shadow = new TreeMap<K, V>();

		private boolean changed = false;

		private int size = 0;

		public LHMap2<K, V> getOwner() {
			return owner;
		}

		public void setOwner(LHMap2<K, V> owner) {
			this.owner = owner;
			Block<K, V> ov = overflow;
			while (ov != null) {
				overflow.setOwner(owner);
				ov = ov.overflow;
			}
		}

		public Block() {
		}

		public Block(LHMap2<K, V> map) {
			setOwner(map);
		}

		public V put(K k, V v) {
			setChanged(true);

			V v2 = replace(k, v);
			if (v2 == null) {
				++size;
				if (keyList.size() == getOwner().blockSize) {

					if (overflow == null) {
						getOwner().blockOverflowed(this, k, v);
					} else {
						overflow.put(k, v);
					}

				} else {
					keyList.addFirst(k);
					valueList.addFirst(v);
				}

			}

			return v2;
		}

		void setChanged(boolean b) {
			changed = b;
		}

		public Map<K, V> drainToMap(Map<K, V> map) {

			while (!keyList.isEmpty()) {
				K k = keyList.removeLast();
				V v = valueList.removeLast();
				map.put(k, v);

			}

			if (overflow != null)

				map = overflow.drainToMap(map);

			garbageCollectionOverflow();

			return map;
		}

		public void updateMap(Map<K, V> map) {
			Iterator<K> ik = keyList.descendingIterator();
			Iterator<V> iv = valueList.descendingIterator();
			while (ik.hasNext() && iv.hasNext()) {
				map.put(ik.next(), iv.next());
			}

			if (overflow != null)
				overflow.updateMap(map);

		}

		private void garbageCollectionOverflow() {
			if (overflow != null) {
				overflow.garbageCollectionOverflow();
				overflow = null;

			}
		}

		public void addOverflowBucket() {

			// assert overflow is needed
			if (keyList.size() < getOwner().blockSize)
				return;

			if (overflow == null) {
				overflow = new Block<K, V>(getOwner());
			} else {
				overflow.addOverflowBucket();
			}
		}

		public V replace(Object key, V v2) {

			if (overflow != null) {
				V v = overflow.replace(key, v2);
				if (v != null)
					return v;
			}

			int j = 0;

			for (K k : keyList) {

				if (key.equals(k)) {

					V v = valueList.get(j);

					if (v2 != null) {

						valueList.set(j, v2);

					}

					return v;
				}
				++j;
			}

			return null;
		}

		public boolean isChanged() {
			return changed;
		}

		@Override
		public void readExternal(ObjectInput arg0) throws IOException,
				ClassNotFoundException {
			int sz = arg0.readInt();
			for (int i = 0; i < sz; ++i) {
				K k = (K) arg0.readObject();
				V v = (V) arg0.readObject();
				shadow.put(k, v);
			}
		}

		public void loadFromShadow(LHMap2<K, V> owner) {
			setOwner(owner);
			Block<K, V> b = this;
			for (Map.Entry<K, V> entry : shadow.entrySet()) {
				if (b.keyList.size() == owner.blockSize) {
					Block<K, V> overflow = new Block<K, V>(owner);
					b.overflow = overflow;
					b = overflow;
				}
				b.keyList.add(entry.getKey());
				b.valueList.add(entry.getValue());

			}
			shadow.clear();
		}

		@Override
		public void writeExternal(ObjectOutput arg0) throws IOException {
			if (!changed)
				return;
			Map<K, V> map = new TreeMap<K, V>();

			// this is destructive of the block
			updateMap(map);
			int sz = map.size();
			arg0.writeInt(sz);
			for (Map.Entry<K, V> entry : map.entrySet()) {
				arg0.writeObject(entry.getKey());
				arg0.writeObject(entry.getValue());
			}
			setChanged(false);

		}

	}

	void init() {

		for (int i = 0; i < N; ++i) {
			addBlock();
		}
	}

	/**
	 * @param hashValue
	 * @return a bucket number.
	 */
	int getDynamicHash(int hashValue) {

		long unsignedHash = (long) hashValue & 0xffffffffL;
		int h = (int) (unsignedHash % (N << level));
		// System.err.println("h = " + h);
		if (h < split) {

			h = (int) (unsignedHash % (N << (level + 1)));
			// System.err.println("h < split, new h = " + h);
		}
		return h;

	}

	public V put(K k, V v) {
		++totalWrites;
		int h = getDynamicHash(k.hashCode());
		Block<K, V> b = getBlock(h);

		b.put(k, v);

		return v;

	}

	public long getTotalWrites() {
		return totalWrites;
	}

	private Block<K, V> getBlock(int h) {
		notifyBlockRequested(h);
		return blockList.get(h);
	}

	void blockOverflowed(Block<K, V> b, K k, V v) {

		splitNextBucket();

		b.addOverflowBucket();
		b.put(k, v);
	}

	private void splitNextBucket() {
		Block<K, V> b = getBlock(split);
		TreeMap<K, V> map = new TreeMap<K, V>();
		b.drainToMap(map);
		addBlock();
		System.err.printf("split N LEVEL  %d %d %d \n", split, N, level);
		if (++split >= (N << level)) {
			++level;

			split = 0;
		}

		for (Map.Entry<K, V> entry : map.entrySet()) {
			K k = entry.getKey();
			V v = entry.getValue();
			int h = getDynamicHash(k.hashCode());
			System.err.println(h + " ");
			Block<K, V> b2 = getBlock(h);
			b2.put(k, v);
		}
	}

	private Block<K, V> addBlock() {
		Block<K, V> b = new Block<K, V>(this);
		blockList.add(b);

		return b;
	}

	@Override
	public void clear() {
		blockList = new ArrayList<Block<K, V>>();
		split = 0;
		level = 0;
		totalWrites = 0;// savedBlocks = 0;

	}

	@Override
	public boolean containsKey(Object key) {
		return get(key) != null;
	}

	@Override
	public boolean containsValue(Object value) {
		return values().contains(value);
	}

	@Override
	public Set<java.util.Map.Entry<K, V>> entrySet() {
		Set<Map.Entry<K, V>> set = new TreeSet<Map.Entry<K, V>>();
		for (final K k : keySet()) {
			set.add(new Entry<K, V>() {

				@Override
				public K getKey() {
					return k;
				}

				@Override
				public V getValue() {
					return get(k);
				}

				@Override
				public V setValue(V value) {
					return put(k, value);
				}
			});
		}
		return Collections.unmodifiableSet(set);
	}

	@Override
	public V get(Object key) {
		int h = getDynamicHash(key.hashCode());
		Block<K, V> b = getBlock(h);
		return b.replace(key, null);
	}

	@Override
	public boolean isEmpty() {
		return size() == 0;
	}

	@Override
	public Set<K> keySet() {
		Set<K> kk = new TreeSet<K>();
		for (int i = 0; i < blockList.size(); ++i) {
			Block<K, V> b = getBlock(i);
			kk.addAll(b.keyList);
			Block<K, V> ov = b.overflow;
			while (ov != null) {
				kk.addAll(ov.keyList);
				ov = ov.overflow;
			}
		}
		return Collections.unmodifiableSet(kk);
	}

	@Override
	public void putAll(Map<? extends K, ? extends V> m) {
		for (Map.Entry<? extends K, ? extends V> entry : m.entrySet()) {
			put(entry.getKey(), entry.getValue());
		}
	}

	@Override
	public V remove(Object key) {
		return null;
	}

	@Override
	public int size() {
		return (int) Math.min(longSize(), (long) Integer.MAX_VALUE);
	}

	public long longSize() {
		long sz = 0;
		for (Block<K, V> b : blockList) {
			Block<K, V> b1 = b;
			while (b1 != null) {
				sz += b1.size;
				b1 = b.overflow;
			}
		}
		return sz;
	}

	@Override
	public Collection<V> values() {
		List<V> list = new ArrayList<V>();
		for (K k : keySet()) {
			list.add(get(k));
		}
		return Collections.unmodifiableList(list);
	}

	private void showStructure() {
		for (int i = 0; i < blockList.size(); ++i) {

			Block<K, V> b = getBlock(i);
			Block<K, V> ov = b.overflow;
			int k = 0;
			while (ov != null) {
				ov = ov.overflow;
				++k;
			}

			System.out.println("Block " + i + " size " + b.keyList.size()
					+ " overflow blocks = " + k);

		}
	}

}



package linearhashmap;

import java.io.File;
import java.io.Serializable;
import java.util.List;
import java.util.Random;

public class LHMap2BlockFileManagerData implements  Serializable{
	/**
	 * 
	 */
	private static final long serialVersionUID = 1L;
	public byte[] buf;
	public Random r;
	public File baseDir;
	public File home;
	public int maxBlocks;
	public int retired;
	public double unloadRatio;
	public List<Integer> retirees;
	public int avgBlockSize;
	public long avgCount;

	public LHMap2BlockFileManagerData(byte[] buf, Random r, int retired,
			List<Integer> retirees, long avgCount) {
		this.buf = buf;
		this.r = r;
		this.retired = retired;
		this.retirees = retirees;
		this.avgCount = avgCount;
	}

	
}

package linearhashmap;



import java.io.ByteArrayInputStream;

import java.io.ByteArrayOutputStream;

import java.io.File;

import java.io.FileInputStream;

import java.io.FileNotFoundException;

import java.io.FileOutputStream;

import java.io.IOException;

import java.io.ObjectInputStream;

import java.io.ObjectOutputStream;

import java.io.Serializable;

import java.util.ArrayList;

import java.util.Random;



/**

 * This class manages the LHMap2 class block disk swapping, and saves and load

 * an instance of the LHMap2 class. It has been used to externally index a

 * legacy file based database of 100,000 record master table, and 1,000,000

 * record sized child tables, and accounts for heap space available in the java

 * virtual machine, so that OutOfMemory errors are avoided when the heap space

 * is low by putting blocks back on files, and garbage collecting them. The main

 * performance bottleneck appeared when loading a million record table for an

 * index , on initial creation of the index.

 * 

 * @author doctor

 * 

 * @param <K>

 * @param <V>

 */

public class LHMap2BlockFileManager<K, V> implements

		LHMap2.BlockListener<K, V>, Serializable {



	/**

	 * 

	 */

	private static final long serialVersionUID = 2615265603397988894L;

	LHMap2BlockFileManagerData data = new LHMap2BlockFileManagerData(

			new byte[10000], new Random(), 0, new ArrayList<Integer>(), 0);



	public LHMap2BlockFileManager(File baseDir, String name, int maxBlocks,

			double unloadRatio) {

		data.home = new File(baseDir, name);

		if (!data.home.exists())

			data.home.mkdir();

		this.data.maxBlocks = maxBlocks;

		this.data.unloadRatio = unloadRatio;

	}



	@Override

	public void blockRequested(int block, LHMap2<K, V> map) {

		LHMap2.Block<K, V> b = map.blockList.get(block);



		if (b == null) {

			int tries = 3; // for some reason, the File constructor can be

							// asynchronous occasionally, so need to try again

							// after delay

			File f = new File(data.home, Integer.toString(block));

			do {



				if (f.exists())

					break;



				if (!f.exists()) {

					if (--tries >= 0)

						fatal(block);

					try {



						Thread.sleep(100);

					} catch (InterruptedException e) {

						e.printStackTrace();

					}



				}



			} while (true);

			try {

				ByteArrayInputStream bis = new ByteArrayInputStream(data.buf);

				FileInputStream fis = new FileInputStream(f);

				int sz = fis.read(data.buf);

				fis.close();

				addByteStats(sz);

				ObjectInputStream ois = new ObjectInputStream(bis);

				b = new LHMap2.Block<K, V>();



				b.readExternal(ois);

				ois.close();

				b.loadFromShadow(map);



				map.blockList.set(block, b);

				--data.retired;



			} catch (FileNotFoundException e) {

				e.printStackTrace();

				fatal(block);

			} catch (IOException e) {

				e.printStackTrace();

				fatal(block);

			} catch (ClassNotFoundException e) {

				e.printStackTrace();

				fatal(block);

			}



		}

		int size = map.blockList.size();



		try {

			long freeMemory = Runtime.getRuntime().freeMemory();



			long limitMemory = (long) (data.avgBlockSize * data.unloadRatio * size);



			if (block % 30 == 0)

				System.err.println("free memory =" + freeMemory + " limit "

						+ limitMemory);



			if (map.split % 20 == 19) {

				// this is just to add statistics before really needing to

				// retire

				retireActiveBlock(map, block);

				++data.retired;



			} else if (freeMemory < limitMemory) {

				for (int i = 0; i < map.blockList.size() / 4; ++i) {

					retireActiveBlock(map, block);

					++data.retired;

				}

				System.runFinalization();

				System.gc();

			}



		} catch (FileNotFoundException e) {

			e.printStackTrace();

		} catch (IOException e) {

			e.printStackTrace();

		}



	}



	private void addByteStats(int sz) {

		++data.avgCount;

		data.avgBlockSize = (int) ((data.avgBlockSize * (data.avgCount - 1) + sz) / data.avgCount);

	}



	public void retireActiveBlock(LHMap2<K, V> map, int notThisOne)

			throws FileNotFoundException, IOException {

		int pick = 0;

		int size = map.blockList.size();



		for (pick = 0; pick < size

				&& (pick == notThisOne || map.blockList.get(pick) == null); ++pick)

			;

		if (pick < size)

			retireBlock(map, pick);



	}



	private void retireBlock(LHMap2<K, V> map, int pick) throws IOException,

			FileNotFoundException {

		LHMap2.Block<K, V> b = map.blockList.get(pick);

		if (b == null)

			return;



		if (b.isChanged()) {

			File f = new File(data.home, Integer.toString(pick));

			ByteArrayOutputStream bos = new ByteArrayOutputStream();



			ObjectOutputStream oos = new ObjectOutputStream(bos);

			b.writeExternal(oos);

			oos.flush();

			oos.close();

			FileOutputStream fos = new FileOutputStream(f);

			byte[] bb = bos.toByteArray();



			fos.write(bb);

			fos.flush();

			fos.close();

			if (bb.length > data.buf.length) {

				data.buf = bb;

			}

		}

		map.blockList.set(pick, null);



	}



	private void fatal(int block) {

		Exception e = new Exception();

		try {

			throw e;

		} catch (Exception e2) {

			e2.printStackTrace();

		}

		System.err.println("block " + block

				+ " requested and it is not in blocklist and not a file");

		for (int i : data.retirees) {

			System.err.print(i + " ");

		}

		System.err.println(" were retired");

		System.exit(-2);

	}



	public static boolean metaExists(File indexDir, String name) {

		File home = new File(indexDir, name);

		return new File(home, "LinearMap2").exists();

	}



	public static <K, V> LHMap2<K, V> load(File baseDir, String name)

			throws FileNotFoundException, IOException, ClassNotFoundException {

		File home = new File(baseDir, name);



		File f2 = new File(home, "LinearMap2");

		ObjectInputStream ois = new ObjectInputStream(new FileInputStream(f2));

		LHMap2<K, V> map = (LHMap2<K, V>) ois.readObject();

		ois.close();

		loadBlocks(map);



		return map;

	}



	private static <K, V> void loadBlocks(LHMap2<K, V> map) {

		LHMap2BlockFileManager<K, V> mgr = getBlockManagerListener(map);

		int size = map.blockList.size();

		for (int i = 0; i < size; ++i) {

			mgr.blockRequested(i, map);

		}

	}



	public static <K, V> LHMap2BlockFileManager<K, V> getBlockManagerListener(

			LHMap2<K, V> map) {

		LHMap2BlockFileManager<K, V> mgr = (LHMap2BlockFileManager<K, V>) map.listeners

				.get(0);

		return mgr;

	}



	public static void save(File indexDir, String name,

			LHMap2<?, ?> offsetMap) throws FileNotFoundException, IOException {

		retireAllBlocks(offsetMap);



		File home = new File(indexDir, name);

		File f2 = new File(home, "LinearMap2");

		ObjectOutputStream oos = new ObjectOutputStream(

				new FileOutputStream(f2));

		oos.writeObject(offsetMap);

		oos.close();

	}



	private static <K, V> void retireAllBlocks(LHMap2<K, V> offsetMap)

			throws FileNotFoundException, IOException {

		LHMap2BlockFileManager<K, V> mgr = getBlockManagerListener(offsetMap);

		int sz = offsetMap.blockList.size();

		for (int i = 0; i < sz; ++i) {

			LHMap2.Block<K, V> b = offsetMap.blockList.get(i);



			if (b != null) {

				mgr.retireBlock(offsetMap, i);

			}



		}

	}

}

The "other" useful algorithm - B-trees edit

Linear hashing is a hash algorithm which is very good for key based joins , where equality between keys is required. However, a range searching index is required to answer certain queries, such as names beginning with SM ( ending in SMZ) to find all the SMITHs , finding all diabetics greater than 50 years old, finding all the patients older than 50 years old seen by Dr Z in the last 12 months who haven't had a glucose done, etc.

The algorithm is usually B-trees, and that holds for popular databases of the 1980s such as the DBase family (foxpro, paradox , etc ), to newer trendy databases such as Postgresql, Firebird, DerbyDb etc ... B Trees were introduced as an indexing algorithm in the late 1960s and no better algorithm (that is as practised in implementation) seems to be around when range searchable indexes are required, and it has been said that if a choice between hashing and B Trees was required , than the latter provides superset functionality, at the cost of loss of O(0-2) disc access time.

Linear hashing is more easier to understand, and more straight forward to map to disc storage, but B Trees are conceptually as easy to understand, just a little harder to debug .

What are the important characteristic of B-Trees ? It is basically like a binary tree, and in fact a binary tree is really a B tree where a node has only 1 entry, whereas a B Tree node may have n entries, where n is around 10-100, and often quoted as being a size where n x (size of entry) = the size of a physical disc block. The idea is to load one block into memory and search it, and if the search key isn't found, but it is in between two keys or to the left or to the right of all the keys, and there exists a pointer to another block on the left of the block, or on the right of every key in the block, then follow the pointer, until the search succeeds or no block can be followed.

The reason B Trees are used is because when a block becomes full during a key insert, it can be split into 3 parts, by dividing it into a block with all entries to the left a median key, an entry with only the median key, and a block with all entries to the right of a median key, and then passing up the entry with the median key to the parent block on return from the insert, the parent then inserts the entry into its entries, and also splits if needed. If the root splits, then the median entry becomes a single entry block which becomes the root. Over a series of inserts, this is actually a self-balancing stategy, and the tree grows in width rather than depth, so that future searches are done mainly after a block is loaded and width traversing fast in-memory search is done, instead of a less frequent, slower vertical search loading from disc a non-cached child block.

Below is a java B Tree implementation, in-memory, which can be adapted for disc based block management, much like the linear hashing above, except with a little more tracking work e.g. using explicit reference IDs for blocks instead of references to blocks, and storing blocks in an map , with null values representing non-cached blocks.

[[[1]]] Why the author gave up on binary search for now.

package btreemap;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** can't work without setting a comparator */
public class BTreeMap<K, V> implements SortedMap<K, V> {

	private static final int NODE_SIZE = 100;
	BTBlock<K, V> root;
	Comparator<? super K> comparator;

	@Override
	public Comparator<? super K> comparator() {
		// TODO Auto-generated method stub

		return comparator;
	}

	/**
	 * 
	 * @param comparator
	 *            - this is mandatory for the tree to work
	 */
	public void setComparator(Comparator<? super K> comparator) {
		this.comparator = comparator;
		root = new BTBlock<K, V>(NODE_SIZE, comparator);
	}

	/**
	 * 
	 * @author syan
	 * 
	 * @param <K>
	 * @param <V>
	 *            the entry is associated with a right child block.
	 * 
	 */
	static class BlockEntry<K, V> {
		K k;
		V v;
		BTBlock right;

		BlockEntry() {

		}

		BlockEntry(K k, V v) {
			right = null;
			this.k = k;
			this.v = v;
		}
	}

	/**
	 * 
	 * @author syan - this represents the result of splitting a full block into
	 *         a left block, and a right block, and a median key, the right
	 *         block and the median key held in a BlockEntry structure as above.
	 * @param <K>
	 * @param <V>
	 * @param <V>g
	 */
	static class SplitRootEntry<K, V> {
		BTBlock<K, V> left;
		BlockEntry<K, V> entry;

		SplitRootEntry(BTBlock<K, V> left, BlockEntry<K, V> entry) {
			this.left = left;
			this.entry = entry;
		}

		SplitRootEntry() {
			super();
		}
	}

	/**
	 * this is used to return a result of a possible split , during recursive
	 * calling.
	 * 
	 * @author syan
	 * 
	 * @param <K>
	 * @param <V>
	 */
	static class resultAndSplit<K, V> {

		/**
		 * null , if there is no split.
		 */
		SplitRootEntry<K, V> splitNode;
		V v;

		resultAndSplit(V v) {
			this.v = v;
		}

		resultAndSplit(SplitRootEntry<K, V> entry, V v) {
			this.v = v;
			this.splitNode = entry;
		}
	}

	/**
	 * used to represent the insertion point after searching a block if compare
	 * is zero, then a match was found, and pos is the insertion entry if
	 * compare < 0 and pos == 0 , then the search ended up less than the
	 * leftmost entry else compare > 0 , and the search will be to the immediate
	 * right of pos.
	 * 
	 * @author syan
	 * 
	 */
	static class PosAndCompare {
		int pos = 0;
		int compare = 0;
	}

	static class BTBlock<K, V> {
		List<BlockEntry<K, V>> entries;
		BTBlock<K, V> leftBlock = null;
		private int maxSz = 0;

		Comparator<? super K> comparator;

		Comparator<? super K> comparator() {
			return comparator;
		}

		public BTBlock(int size, Comparator c) {
			entries = new ArrayList<BlockEntry<K, V>>();
			maxSz = size;
			this.comparator = c;
		}

		/**
		 * PosAndCompare usage: if compare is zero, then a match was found, and
		 * pos is the insertion entry if compare < 0 and pos == 0 , then the
		 * search ended up less than the leftmost entry else compare > 0 , and
		 * the search will be to the immediate right of pos.
		 * 
		 * @author syan
		 * 
		 */
		private void blockSearch(K k, PosAndCompare pc) {
			for (int i = 0; i < entries.size(); ++i) {
				pc.compare = comparator.compare(k, entries.get(i).k);
				if (pc.compare == 0) {
					pc.pos = i;
					return;
				}
				if (pc.compare < 0 && i == 0) {
					pc.pos = 0;
					return;
				}

				if (pc.compare < 0) {
					pc.pos = i - 1;
					pc.compare = 1;
					return;
				}

			}
			pc.pos = entries.size() - 1;
			pc.compare = 1;

			// binary search, it's hard to get it right !
			// int left = 0;
			// int right = entries.size();
			//
			// while (left <= right && left < entries.size()) {
			// pc.pos = (right - left) / 2 + left;
			// pc.compare = comparator().compare(k, entries.get(pc.pos).k);
			// if (pc.compare < 0) {
			// right = pc.pos-1;
			// } else if (pc.compare > 0) {
			// left = pc.pos+1;
			// } else {
			// break;
			// }
			// }

		}

		resultAndSplit<K, V> put(K k, V v) {
			V v2;
			if (entries.size() == 0) {
				entries.add(new BlockEntry<K, V>(k, v));
				return new resultAndSplit<K, V>(v);
			} else {

				PosAndCompare pc = new PosAndCompare();

				blockSearch(k, pc);

				// a match
				if (pc.compare == 0) {
					v2 = entries.get(pc.pos).v;
					entries.get(pc.pos).v = v;
					return new resultAndSplit<K, V>(v2);
				}

				BlockEntry<K, V> entry = new BlockEntry<K, V>(k, v);

				// follow leftBlock if search is to left of first entry
				if (pc.compare < 0 && pc.pos == 0 && leftBlock != null) {

					resultAndSplit<K, V> result = leftBlock.put(k, v);
					if (result.splitNode != null) {
						leftBlock = result.splitNode.left;
						entries.add(0, result.splitNode.entry);
					}

				} else if (pc.compare > 0 && entries.get(pc.pos).right != null) {
					// follow right block if it exists
					resultAndSplit<K, V> result = entries.get(pc.pos).right
							.put(k, v);
					if (result.splitNode != null) {
						entries.get(pc.pos).right = result.splitNode.left;
						entries.add(pc.pos + 1, result.splitNode.entry);

					}

				} else if (pc.compare < 0) {
					// add to the left
					entries.add(pc.pos, entry);

				} else if (pc.compare > 0) {
					// add to the right
					entries.add(pc.pos + 1, entry);

				}
				// check if overflowed block , split if it has.
				if (entries.size() > maxSz) {
					int mid = entries.size() / 2;

					// copy right half to new entries list.
					List<BlockEntry<K, V>> rightEntries = new ArrayList<BlockEntry<K, V>>();
					for (int i = mid + 1; i < entries.size(); ++i) {
						rightEntries.add(entries.get(i));
					}

					BlockEntry<K, V> centreEntry = entries.get(mid);

					BTBlock<K, V> rightBlock = new BTBlock<K, V>(maxSz,
							comparator);

					rightBlock.entries = rightEntries;

					// the median entry's right block is the new right block's
					// leftmost block
					rightBlock.leftBlock = centreEntry.right;
					// the new right block becomes the right block
					centreEntry.right = rightBlock;

					// reduce the old block's entries into its left half of
					// entries.
					for (int i = entries.size() - 1; i >= mid; --i)
						entries.remove(i);

					// create a return object, with the reduced old block as the
					// left block
					// and the median entry with the new right block attached

					SplitRootEntry<K, V> split = new SplitRootEntry<K, V>(this,
							centreEntry);

					// the keyed value didn't exist before , so null
					return new resultAndSplit<K, V>(split, null);

				}
				return new resultAndSplit<K, V>(v);
			}

		}

		V get(K k) {
			if (entries.size() == 0)
				return null;
			PosAndCompare pc = new PosAndCompare();
			blockSearch(k, pc);
			if (pc.compare == 0) {
				return entries.get(pc.pos).v;
			}

			if (pc.compare < 0 && pc.pos == 0 && leftBlock != null) {
				return leftBlock.get(k);
			} else if (pc.compare < 0 && pc.pos == 0 && leftBlock == null) {
				return null;

			} else if (pc.compare > 0 && entries.get(pc.pos).right != null) {
				return (V) entries.get(pc.pos).right.get(k);
			} else
				return null;
		}

		void getRange(SortedMap map, K k1, K k2) {
			PosAndCompare pc = new PosAndCompare();
			blockSearch(k1, pc);
			PosAndCompare pc2 = new PosAndCompare();
			blockSearch(k2, pc2);
			for (int i = pc.pos; i <= pc2.pos; ++i) {

				if (i == 0 && pc.pos == 0 && pc.compare < 0
						&& leftBlock != null) {
					leftBlock.getRange(map, k1, k2);
					// if (pc2.pos == pc.pos)
					// break;
				}

				/*
				 * add the entry if it exactly matches, or it is in range, but
				 * not if it is non-inclusive limit i.e. i == pc.pos and compare
				 * > 0
				 */

				if ((pc.compare == 0 && pc.pos == i) || i > pc.pos)
					map.put(entries.get(i).k, entries.get(i).v);

				if (entries.get(i).right != null) {
					entries.get(i).right.getRange(map, k1, k2);
				}

			}

		}

		K headKey() {
			if (leftBlock != null) {
				return leftBlock.headKey();
			}
			return entries.get(0).k;
		}

		K tailKey() {
			int i = entries.size() - 1;
			if (entries.get(i).right != null) {
				return (K) entries.get(i).right.tailKey();
			}
			return entries.get(i).k;
		}

		void show(int n) {
			showTabs(n);
			for (int i = 0; i < entries.size(); ++i) {
				BlockEntry<K, V> e = entries.get(i);
				System.err.print("#" + i + ":(" + e.k + ":" + e.v + ")  ");
			}
			System.err.println();
			showTabs(n);

			if (leftBlock != null) {
				System.err.print("Left Block\n");
				leftBlock.show(n + 1);
			} else {
				System.err.println("No Left Block");
			}

			for (int i = 0; i < entries.size(); ++i) {
				BlockEntry<K, V> e = entries.get(i);
				showTabs(n);
				if (e.right != null) {

					System.err.println("block right of #" + i);
					e.right.show(n + 1);
				} else {
					System.err.println("No block right of #" + i);
				}
			}
			showTabs(n);
			System.err.println("End of Block Info\n\n");
		}

		private void showTabs(int n) {
			// TODO Auto-generated method stub
			for (int i = 0; i < n; ++i) {
				System.err.print("  ");
			}
		}

	}

	@Override
	public SortedMap<K, V> subMap(K fromKey, K toKey) {
		TreeMap<K, V> map = new TreeMap<K, V>();
		root.getRange(map, fromKey, toKey);
		return map;
	}

	@Override
	public SortedMap<K, V> headMap(K toKey) {
		// TODO Auto-generated method stub
		return subMap(root.headKey(), toKey);
	};

	@Override
	public SortedMap<K, V> tailMap(K fromKey) {
		// TODO Auto-generated method stub
		return subMap(fromKey, root.tailKey());
	}

	@Override
	public K firstKey() {
		// TODO Auto-generated method stub
		return root.headKey();
	}

	@Override
	public K lastKey() {
		// TODO Auto-generated method stub
		return root.tailKey();
	}

	@Override
	public int size() {
		// TODO Auto-generated method stub
		return 0;
	}

	@Override
	public boolean isEmpty() {
		// TODO Auto-generated method stub
		return false;
	}

	@Override
	public boolean containsKey(Object key) {
		// TODO Auto-generated method stub
		return get(key) != null;
	}

	@Override
	public boolean containsValue(Object value) {
		// TODO Auto-generated method stub
		return false;
	}

	@Override
	public V get(Object key) {
		// TODO Auto-generated method stub
		return root.get((K) key);
	}

	@Override
	public V put(K key, V value) {
		resultAndSplit<K, V> b = root.put(key, value);
		if (b.splitNode != null) {
			root = new BTBlock<K, V>(root.maxSz, root.comparator);
			root.leftBlock = b.splitNode.left;
			root.entries.add(b.splitNode.entry);
		}
		return b.v;
	}

	@Override
	public V remove(Object key) {
		// TODO Auto-generated method stub
		return null;
	}

	@Override
	public void putAll(Map<? extends K, ? extends V> m) {
		// TODO Auto-generated method stub

	}

	@Override
	public void clear() {
		// TODO Auto-generated method stub

	}

	@Override
	public Set<K> keySet() {
		// TODO Auto-generated method stub
		return null;
	}

	@Override
	public Collection<V> values() {
		// TODO Auto-generated method stub
		return null;
	}

	@Override
	public Set<java.util.Map.Entry<K, V>> entrySet() {
		// TODO Auto-generated method stub
		return null;
	}

}

Example application of health data querying : finding patients overdue for HbA1c > 6 months, = edit

Using the above (linear hashing with block persistence) data structure for permanent storage assisted indexes ^*1, this example program was applied for the above clinical use. Interestingly, the results of the query identified patients who had diabetes listed as a condition, along with co-morbid diagnoses, and atomized pathology HbA1c data when present, but often the patient listed under each practitioner was either a non-regular patient , had questionable diabetes diagnoses , and the query didn't examine whether patients were being currently managed elsewhere, other clinic or endocrinologist.

An exercise might be to reformulate the query as "group patients under practitioners, the patients who have a diabetes related diagnosis, and existing path atoms for HbA1c, but whose last HbA1c was more than 6 months". This would at least select patients with diagnoses diabetes who probably at some stage were being monitored by the consulting practitioners.

^*1 BTreeMap was used as an alternative to ordinary red-black TreeMap class , as proof of concept, and didn't actually overflow memory in the dataset used. Linear Hashing was used to implement the ur_no indexing of tables via the MapperImpl1 class. This is required for tables with large number of rows, such as PathAtom, and certainly Progress, as the indexes are too large to fit in the virtual Java environment running on an ordinary 2G Windows desktop office machine.

package dbfmap.linux.test;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collection;
import java.util.Comparator;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.Map.Entry;

import btreemap.BTreeMap;
import dbfmap.structs.DBFMemoPair;
import dbfmap.structs.DBFMemoPair.Iterator;
import dbfmap.use.Mapper;
import dbfmap.use.MapperImpl1;

public class FindDiabeticOverdue {
	private static final int MAXINTERVALMONTHS_NOHBA1C = -6;
	final static String RESULT = "PATHOL";
	final static String RESULT_NAME_FIELD = "TESTNAME";
	final static String RESULT_DATE = "REPORTDATE";
	static String[] crossRefConditions = new String[] { "DIAB", "HEART", "MYOCARD", 
		"HYPERT", "HYPERC", "HYERL", "ANGIN", "CORON", "ISCHEM", "VASCUL" , "GOU", "METABO", "OBES" };
	
	private BTreeMap<Date, List<String>> btree;

	private Map<String, DBFMemoPair> dbfmap;

	final DateFormat dateFormat = SimpleDateFormat
			.getDateInstance(SimpleDateFormat.SHORT);

	public Map<String, DBFMemoPair> getDbfmap() {
		return dbfmap;
	}

	public FindDiabeticOverdue() throws IOException, ClassNotFoundException {
		init();
	}

	void init() throws IOException, ClassNotFoundException {
		Mapper m = MapperImpl1.getInstance();

		dbfmap = m.getAccessObjectMap();
		DBFMemoPair pairPat = dbfmap.get("PATIENTS");
		DBFMemoPair accessResult= dbfmap.get(RESULT);
		Iterator iResult = accessResult.iterator();
		Map<String, Object> vv;

		btree = new BTreeMap<Date, List<String>>();
		btree.setComparator(new Comparator<Date>() {

			@Override
			public int compare(Date arg0, Date arg1) {
				// TODO Auto-generated method stub
				return arg0.compareTo(arg1);
			}

		});

		Calendar calendarMaxIntervalNoHbA1c = GregorianCalendar.getInstance();
		calendarMaxIntervalNoHbA1c.add(Calendar.MONTH, MAXINTERVALMONTHS_NOHBA1C);

		// SortedMap before = btree.tailMap(c.getTime());

		DBFMemoPair pairHist = dbfmap.get("HISTORY");

		Iterator iHistory = pairHist.iterator();
		Map<String, Object> vh;

		BTreeMap<String, List<String>> bTreeConditionUrNos = new BTreeMap<String, List<String>>();

		while ((vh = iHistory.readNextValues()) != null) {
			String condition = (String) vh.get("CONDITION");
			List<String> l = bTreeConditionUrNos.get(condition);
			if (l == null) {
				l = new LinkedList<String>();
			}
			l.add((String) vh.get("UR_NO"));
			bTreeConditionUrNos.put(condition, l);
		}
		SortedMap<String, List<String>> diabetesUrNos = bTreeConditionUrNos.subMap("DIAB", "DIABZ");
		SortedSet<String> patientUrNosWithDiabetes = new TreeSet<String>();
		for (String s : diabetesUrNos.keySet()) {
			System.out.println(s + diabetesUrNos.get(s).size() + " patients");
			patientUrNosWithDiabetes.addAll(diabetesUrNos.get(s));

		}

		Map<String, Map<String, Object>> urNoToLastHbA1c = new HashMap<String, Map<String, Object>>();
		for (String urno : patientUrNosWithDiabetes) {
			List<Map<String, Object>> resultsOnePatient = accessResult.getRecords("UR_NO", urno);
			for (Map<String, Object> result : resultsOnePatient) {
				String testName = (String) result.get(RESULT_NAME_FIELD);
				if (testName.startsWith("GLYCATED")) {
					Map<String, Object> lastResult = urNoToLastHbA1c.get(urno);
					if (lastResult == null
							|| ((Date) result.get(RESULT_DATE)).after((Date) lastResult
									.get(RESULT_DATE))) {
						urNoToLastHbA1c.put(urno, result);

					}
				}

			}
		}

		Map<String, Map<String, Object>> overdue = new TreeMap<String, Map<String, Object>>();

		for (String urno : urNoToLastHbA1c.keySet()) {
			Map<String, Object> lastResult = urNoToLastHbA1c.get(urno);
			Date d = (Date) lastResult.get(RESULT_DATE);
			if (d.before(calendarMaxIntervalNoHbA1c.getTime())) {
				overdue.put(urno, lastResult);

			}
		}

		List<String> noHbA1c = new ArrayList<String>(patientUrNosWithDiabetes);
		noHbA1c.removeAll(urNoToLastHbA1c.keySet());
		DateFormat f = SimpleDateFormat.getDateInstance(DateFormat.SHORT);
		System.out.println("Those with No HbA1c =" + noHbA1c.size());
		
		TreeMap<String, List<String> > responsibleTable = new  TreeMap<String, List<String> > ();
		for (String urno : noHbA1c) {
			Seen seen = getResponsible(urno);
			List<String> drsPatients = responsibleTable.get(seen.whoMost);
			if ( drsPatients == null ) { 
				
				drsPatients = new ArrayList<String>() ; 
			
				responsibleTable.put(seen.whoMost,drsPatients); 
			}
			drsPatients.add(urno);
		}
		
		for (String urno : overdue.keySet()) {
			
			Seen seen = getResponsible(urno);
			
			List<String> drsPatients = responsibleTable.get(seen.whoMost);
			
			if ( drsPatients == null ) { 
				drsPatients = new ArrayList<String>() ; 
				responsibleTable.put(seen.whoMost,drsPatients); 
			}
			drsPatients.add(urno);
		}
		
		for ( String dr : responsibleTable.keySet()) {
			System.out.println("***********************  List for " + dr);
			for ( String urNo : responsibleTable.get(dr)) {
				List<Map<String, Object>> shouldBeOnePatientMap = pairPat.getRecords("UR_NO", urNo);
				if (shouldBeOnePatientMap == null || shouldBeOnePatientMap.size() != 1) {
					System.err
							.println("ERROR with UR_NO for finding patient list : "
									+ urNo
									+ (shouldBeOnePatientMap == null ? " list " : "sz="
											+ (String.valueOf(shouldBeOnePatientMap.size()))));
					if (shouldBeOnePatientMap == null || shouldBeOnePatientMap.size() == 0)
						continue;
				}
				Map<String, Object> patient = shouldBeOnePatientMap.get(0);
				System.out.println(((String) patient.get("SURNAME")).trim() + ", "
						+ ((String) patient.get("FIRSTNAME")).trim() + ", DOB:"
						+ f.format(patient.get("DOB")));
				DBFMemoPair atoms = dbfmap.get("PATHATOM");
				List<Map<String, Object>> pathAtomsForOnePatient = atoms.getRecords("UR_NO", urNo);
				boolean printedTest = false;
				for (Map<String, Object> at2: pathAtomsForOnePatient ) {
					String n = (String)at2.get("TESTNAME");
					if ( n.toUpperCase().indexOf("A1C") >= 0 ) {
						printedTest = true;
						System.out.print( ", " + f.format(at2.get("RESULTDATE")) + " " + at2.get("RESULT").toString().trim() );
					}
				}
				if ( printedTest)
					System.out.println();
				
				List<Map<String, Object>> conditions = pairHist.getRecords("UR_NO", urNo);
				boolean printedCond = false;
				for ( Map<String, Object> condition : conditions ) {
					
					String conditionName = ((String)condition.get("CONDITION")).trim();
					
					for ( String keyPart : crossRefConditions)  
						if ( conditionName.indexOf(keyPart)>= 0) {
							if (!printedCond) System.out.print("Conditions: ");
							System.out.print( "  " + conditionName +":"+condition.get("YEAR"));
							printedCond = true;
							break;
						}
					 
					
				}
				if (printedCond)
				System.out.println( );
			}
			
			System.out.println("**************** end of list for " + dr);
			
		}
		
//		for (String s : noHbA1c) {
//			List<Map<String, Object>> l = pairPat.getRecords("UR_NO", s);
//			if (l == null || l.size() != 1) {
//				System.err
//						.println("ERROR with UR_NO for finding patient list : "
//								+ s
//								+ (l == null ? " list " : "sz="
//										+ (String.valueOf(l.size()))));
//				if (l == null || l.size() == 0)
//					continue;
//			}
//			Map<String, Object> patient = l.get(0);
//			System.out.println(((String) patient.get("SURNAME")).trim() + ", "
//					+ ((String) patient.get("FIRSTNAME")).trim() + ", DOB:"
//					+ f.format(patient.get("DOB")));
//			List<Map<String, Object>> patConditions = pairHist.getRecords(
//					"UR_NO", s);
//			for (Map<String, Object> c2 : patConditions) {
//				if (((String) c2.get("CONDITION")).indexOf("DIABE") >= 0) {
//					System.out.println("\t" + (String) c2.get("CONDITION")
//							+ c2.get("YEAR"));
//				}
//			}
//			Seen seen = showResponsible(s);
//
//		}
//
//		System.out.println("\n\nOVERDUE HbA1c ? = " + overdue.size());
//		
//		for (String s : overdue.keySet()) {
//			Map<String, Object> res2 = overdue.get(s);
//			List<Map<String, Object>> l = pairPat.getRecords("UR_NO", s);
//			Map<String, Object> x = l.get(0);
//			System.out.println(((String) x.get("SURNAME")).trim() + ", "
//					+ ((String) x.get("FIRSTNAME")).trim() + ", DOB:"
//					+ f.format(x.get("DOB")));
//			
//			System.out.println("\t" + f.format((Date)res2.get("REPORTDATE")) + " "+
//					(String)res2.get(RESULT_NAME_FIELD) ); 
//			DBFMemoPair atoms = dbfmap.get("PATHATOM");
//			l = atoms.getRecords("UR_NO", s);
//			for (Map<String, Object> at2: l ) {
//				String n = (String)at2.get("TESTNAME");
//				if ( n.toUpperCase().indexOf("A1C") >= 0 ) {
//					System.out.print( "," + f.format(at2.get("RESULTDATE")) + " " + at2.get("RESULT") );
//				}
//			}
//			System.out.println();
//			showResponsible(s);
//		}

	}

	private Seen showResponsible(String s) throws IOException,
			ClassNotFoundException {
		Seen seen = getResponsible(s);
		System.out.println("Dr " + seen.whoMost + " saw " + seen.most
				+ " times, out of total " + seen.total + "\n");
		return seen;
	}

	static class Seen implements Comparable<Seen>{
		int total;
		int most;
		String whoMost;

		public Seen(int t, int m, String who) {
			total = t;
			most = m;
			whoMost = who;
		}

		@Override
		public int compareTo(Seen o) {
			// TODO Auto-generated method stub
			return whoMost.compareTo(o.whoMost);
		}
	}

	private Seen getResponsible(String s) throws IOException,
			ClassNotFoundException {
		// TODO Auto-generated method stub
		DBFMemoPair pairPat = dbfmap.get("PROGRESS");
		List<Map<String, Object>> l = pairPat.getRecords("UR_NO", s);
		TreeMap<String, Integer> countDr = new TreeMap<String, Integer>();
		int total = 0;
		for (Map<String, Object> m : l) {
			String dr = (String) m.get("DR");
			if (dr.indexOf("RECEPT") >= 0 || dr.indexOf("PRACTIT") >= 0)
				continue;
			++total;
			Integer c = countDr.get(dr);
			if (c == null)
				countDr.put(dr, 1);
			else
				countDr.put(dr, c + 1);
		}
		TreeMap<Integer, String> ranking = new TreeMap<Integer, String>();
		for (String d : countDr.keySet()) {
			ranking.put(countDr.get(d), d);
		}
		Entry<Integer, String> e = ranking.lastEntry();
		return new Seen(total, e.getKey(), e.getValue());
	}

	/**
	 * @param args
	 * @throws IOException
	 * @throws ClassNotFoundException
	 */
	public static void main(String[] args) throws IOException,
			ClassNotFoundException {
		// TODO Auto-generated method stub
		FindDiabeticOverdue fdo = new FindDiabeticOverdue();
	}

}

Conclusion edit

This case study illustrates that the Computer Science aphorism of "tradeoff" also applies to increasing the sophistication of technology used represent computer based medical records. One trades the marketed benefits of high level technology features of enterprise level SQL DBMSs for vendor lock-in, lack of software extendability, and the risk of costs of expensive data extraction needed if vendors decide to close shop. Arguably, this would not occur if medical record system vendors didn't decide to remove all the features of the underlying database management system, and make opaque, the access of data tables and data table schemas. It shows that scalable database management is just a few pages of code , a good algorithm, and an open computer language system away , and in people terms, one could just employ a local computer science graduate to indefinetely provide open source services , and another graduate as quality assurance to ensure source code readability , if the current programmer decides not to offer services anymore.

It might be argued that maintaining the small scale and cottage industry status of general practice computer medical records undermines the needs of standardization of medical informatics , and portability of electronic medical records between primary care and hospital medical care. But by maintaining the open , readable and extensible architectures based on well known , easily understood but scalable algorithms, the counter argument is that justified simplification assists the adaptation to future needs of medical codification , because the software is not opaque to all but a few dominant vendors.