How to Write a Program/Collected Real-life Experiences

Why this book

Good practices and life experiences are rarely captured in books. Most computer science books are API walk-throughs. This book is DIFFERENT!

Build scripts

Try to set up "one-button testing". That makes it much more convenient to type a little, then hit that button which

saves the file you just edited,
compiles the application with all the appropriate options (such as "-fno-emit-frame", if necessary), and
runs a few quick tests, to make sure your "improvements" don't accidentally break something else.

Spending an hour to code up a few tests and set up one-button testing may *seem* like more hassle than it is worth. Manually compiling the file and manually running through the various parts of the application to make sure they work may take far less than an hour. But trust me, you are going to edit-compile-test most programs many, many times. And a year from now, when you make just one tiny little change, wouldn't you much rather push the button and be done with it, rather than

manually compile the file
manually run through the application and see that it suddenly doesn't work any more
pull out your hair until
hours later, you remember you needed to include "-fno-emit-frame"
manually re-compile the file, this time with "-fno-emit-frame"
start testing all over from the beginning.

Wikipedia has related information at Continuous integration

There are lots of ways to set up an automated build system.
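One low-tech way, shown here only as a sketch, is a tiny runner program that executes each step in order and stops at the first failure. The class name OneButton and the commands passed to it are hypothetical; a real project would more likely use an existing build tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper: runs build/test commands in order and stops at the first failure.
final class OneButton {
    /** Returns the index of the first step that fails, or -1 if every step exits with 0. */
    static int runAll(List<List<String>> steps) {
        for (int i = 0; i < steps.size(); i++) {
            try {
                Process process = new ProcessBuilder(steps.get(i))
                        .inheritIO()   // show compiler and test output in this console
                        .start();
                if (process.waitFor() != 0) {
                    return i;
                }
            } catch (IOException e) {
                throw new UncheckedIOException("Could not start step " + i, e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted during step " + i, e);
            }
        }
        return -1;
    }
}
```

A call such as OneButton.runAll(List.of(List.of("javac", "App.java"), List.of("java", "AppTest"))) would compile and then run the tests; App.java and AppTest are made-up names, and any required compiler options go into the first list so nobody has to remember them.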

One-button testing is just one part of the continuous integration recommended by some programmers.

Even lawyers can see the advantages of automated build scripts. ^[1]

Scripting in general

This shouldn't just apply to build scripts. If you ever find yourself repeating a few lines, throw them into a script or batch file.

If you are a Windows programmer, learning to write bash scripts under Cygwin^[2] is often worthwhile. They are much more flexible than Windows batch files. Also look into AutoHotKey^[3].

For that matter, don't be afraid to build yourself tools of any sort! You are a programmer: if you are working with a lot of metadata, make a metadata editor. If your build file has too many options to remember and they change a lot, write a GUI for it. Every step you can skip is one step you can't screw up--and a single screw-up can cost as much time as it took you to build your script or tool.

Another reason for scripts and tools is knowledge retention/exchange. You can tell a co-worker how to do the build 4 times until he remembers each step, or you can hand him the script that you use.

Database Access

Concurrency issues

Don't use table locking! It's flawed! Use transactions instead; in JDBC, set the isolation level on the connection.
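As a rough sketch of what that can look like with the standard java.sql.Connection API: the helper below sets an isolation level, runs some work, and commits or rolls back. The class Transactions and the interface Work are made up for this example, and TRANSACTION_SERIALIZABLE is only one possible choice; which level is appropriate depends on the application and the database.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper showing a transaction with an explicit isolation level.
final class Transactions {
    /** A unit of database work; a made-up interface for this sketch. */
    interface Work {
        void run(Connection connection) throws SQLException;
    }

    static void inTransaction(Connection connection, Work work) throws SQLException {
        // Set the level before the transaction starts (an example level, not a recommendation).
        connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        connection.setAutoCommit(false);   // begin an explicit transaction
        try {
            work.run(connection);
            connection.commit();
        } catch (SQLException | RuntimeException e) {
            connection.rollback();         // undo partial changes before reporting the failure
            throw e;
        }
    }
}
```

The caller remains responsible for closing the connection and for restoring auto-commit if the connection is reused.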

Errors

Error handling

Always remember to use the finally block for closing resources! Otherwise a thrown exception may skip the cleanup code and leave many connections open.
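A minimal sketch of the pattern follows. The Resource class is a hypothetical stand-in for a connection or stream; the point is only that the close call sits in the finally block, so it runs whether or not the work throws.

```java
// Hypothetical resource type; stands in for a connection, stream, or similar.
final class FinallyDemo {
    static final class Resource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Runs the work and closes the resource even if the work throws. */
    static void useAndClose(Resource resource, Runnable work) {
        try {
            work.run();
        } finally {
            resource.close();  // executed on success and on exception alike
        }
    }
}
```

Since Java 7, a try-with-resources statement generates the equivalent finally logic for anything that implements AutoCloseable.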

Be careful with empty error handlers... Error handlers should NEVER be empty, but this is hard to enforce in Java because of checked exceptions. For instance, Java's sleep method throws a checked exception (InterruptedException), so you MUST catch it. Very few people actually need to react to it, so they wrap the call in an empty handler. Because of this, some programmers consider checked exceptions an anti-pattern or just broken.

It is extremely difficult to track down bugs when the stack trace was hidden because someone thought it was "unimportant" or messy, or because someone else caught too broad an exception and hid the trace that would have made your bugfix a piece of cake.
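For the sleep case specifically, one common alternative to an empty handler is to restore the thread's interrupt flag, so that the information is not lost. The helper name Sleeper below is made up for this sketch.

```java
// Hypothetical helper: a sleep wrapper whose handler is short but not empty.
final class Sleeper {
    /** Returns true if the full sleep completed, false if it was interrupted. */
    static boolean sleepMillis(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // Restore the flag so callers further up can still see the interruption.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Where an exception really can be ignored, a one-line comment in the handler saying why is still better than leaving it silent.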

Error tracking

Whenever you output anything as an error message, prefix it with e.g. new Date() or similar so you are able to track back later on. Adding the module name to the error message is also useful. Try to build your error handling as part of your testability. Don't rely only on the debugger to find your code errors.
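A small sketch of such a prefix is shown below. The class name ErrorMessages and the exact layout are assumptions made for illustration, and java.time.Instant stands in for new Date() so that the timestamp format is unambiguous.

```java
import java.time.Instant;

// Hypothetical helper: prefixes every error message with a timestamp and module name.
final class ErrorMessages {
    /** Builds "timestamp [module] message"; the layout is an assumption, not a standard. */
    static String format(Instant when, String module, String message) {
        return when + " [" + module + "] " + message;
    }

    static void report(String module, String message) {
        System.err.println(format(Instant.now(), module, message));
    }
}
```

Keeping the formatting in a separate method from the printing makes the message layout itself testable without capturing console output.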

Your time is well spent setting up a logging framework such as Log4J. Do note that certain logging configurations are not suitable for production code because of the time spent on logging itself. Test various combinations of settings to get an idea of the speed loss, then decide which combination to use.

Time Yourself

Many programmers are awful at estimating how long adding some feature to a program will take. Practice, practice, practice. Even if no one asks you how long it will take, keep a personal log of when you started adding that feature, when you (estimate) you will be finished, and when you (actually) finished. Even if you have totally no idea, at least write down when you started and when you stopped. Every time, you will learn something new about how long it takes to do that sort of task.

Comments!

It is amazing how many programmers will just NOT write comments, no matter how many different sets of coding standards recommend (or even mandate) that code be properly commented.

Comments are extremely important to the maintenance phases of a software project. Anyone who has been required to analyze, modify, or maintain someone else's code will attest to the fact that without comments the job requires much more effort, time, and money than to do the same task on properly commented code. Don't forget that you may, yourself, end up being responsible for this task at some (much) later date.

It is just as important not to over-comment as it is not to under-comment. Adding comments to the end of every line of code is total overkill. Instead, assume the reader has some familiarity with the language, and restrict the comments to descriptions of the functionality of a block of code.

Whenever you are looking at code (even your own) if you have the slightest trouble understanding it, it needs commenting. If you have a lot of trouble understanding it, it needs refactoring.

You may find it more comfortable to comment code once you are done coding a method: go back to the top and walk through each step. This also forces a small review of the code. Commenting before you code is also good: create a basic flow in comments first, then insert code between the comments. Just don't forget to sanity-check your comments when you are done; often I'll only have a few of my original comments left when I'm done.

Another important concept is "self-documenting code". This means you use the names of functions, variables, classes, and methods to communicate the ideas of your code so that you don't have to write many comments. If you name a method with the action it performs, it will be evident to the reader what it is going to do; for example, use a method named determineIfStackIsEmpty() rather than just doing a comparison such as if( size == 0 ).
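To make the example concrete, here is a tiny sketch. NamedStack is a hypothetical class written only for this illustration; the method name determineIfStackIsEmpty() comes from the text above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stack used only to illustrate naming.
final class NamedStack {
    private final List<String> items = new ArrayList<>();

    void push(String item) {
        items.add(item);
    }

    String pop() {
        if (determineIfStackIsEmpty()) {   // reads as a sentence; no comment needed
            throw new IllegalStateException("pop from empty stack");
        }
        return items.remove(items.size() - 1);
    }

    boolean determineIfStackIsEmpty() {
        return items.size() == 0;          // the raw comparison lives in exactly one place
    }
}
```

The caller in pop() now states its intent, and if the representation of the stack changes, only the one named method has to change with it.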

Don't neglect file headers in the code modules. These are important to provide a synopsis of the operation of the code, as well as authoring and licensing information. Most editor programs open code modules with line one of the file at the top of the screen; making the top section of the file contain a block comment which specifies what that module does is very helpful during maintenance.

Commenting is not a difficult discipline to master; in fact, it is quite easy to create a set of templates with standard comment forms for several languages, then copy the template and add in the code and appropriate comment details when starting a new module. Several Integrated Development Environment tools, such as Eclipse or Xcode, can be configured to do this automagically when a new source file is created.

Finally, it may be helpful on a medium- to large-scale project to use an automated documentation tool, such as Doxygen, Javadoc, or Autodoc, to create hypertext-linked documentation for the modules in the project. This provides all the members of the project with the same standard documentation so that everyone can know the availability and operational details of all code modules in the project.

To get a good idea of some usable code-based documents, check out the Java source code in the libraries. Each class has a comment block showing general class usage, each external method has a comment block describing usage and gotchas, and each non-trivial method has inline (non-javadoc) comments describing what is going on for each block of code (often for each line). Overall there are probably 5-10 lines of comments per line of code, but the method comments are more verbose than needed unless you are building a library for public use.

(The other thing you'll notice when reading through Sun's source code is that most methods have 1-3 lines. Good OO design looks like "Magic"--you think there must be a huge routine lying around somewhere doing all the "Real Work", but you never see it; it's just these little 1-3 line routines all the way to the center.)

If you are slowed down by typing comments (even 3 lines of comments per line of code) you are either severely in need of a typing class or you are typing faster than you are thinking! Please take time with your code--don't just slam something out, check it in and go to the next. Review and rewrite comments as you refactor your first pass of code. You may think you are doing what your boss wants by slamming out code changes and bugfixes as fast as you can type, but what you are actually doing is (in my experience) getting your entire team fired due to extreme bug count and maintainability problems.

Configuration Management

Configuration Management, also known as "CM" (or sometimes "SCM" for "Software CM"), is a very misunderstood topic. Although there is some change in attitude happening lately, the prevalent thinking seems to be that CM is a "necessary evil", to be tolerated but not actively participated in, at least not more than absolutely required.

Change is unavoidable when computer software is constructed, and change increases the level of confusion among the developers who are working on the project. Confusion arises when the modifications are not analyzed before they are performed, written down before they are made, controlled in a way that improves quality and reduces mistakes, or properly reported to the people that should be aware of them.

There are several reasons for changes being required. First, the customer may levy one or more new requirements, or may make requests to change existing requirements. This can be driven by design reviews, re-engineering of applications, or even schedule and budgetary constraints. Second, changes in business conditions or the market for the application may direct that changes are required. Third, the business environment may grow or change, which may change the project priorities or the structure of the customer or supplier engineering team. The thing to remember is that as time goes on, all stakeholders come to know the system more intimately; this increase in knowledge is what drives the great majority of the requested/required modifications to the software. Thus it is a fact, which many software engineers and project managers have a hard time accepting, that most changes to software are justified!

Configuration management deals with the multi-version features of a software engineering project. CM is a set of activities that have been developed to assist in management of the changes that occur throughout the life cycle of the software project. In a typical project, the deliverable system comprises many different files and directories, and many complex relationships may develop between them. The problem is exacerbated by the need for frequent modifications during the development. The end results of poor CM include:

Finding/fixing the same error multiple times (inconsistency between versions)
Inconsistency between documentation and code
Loss of documentation
Loss of code

In a software system, a configuration includes all the "work products" associated with the system. This means not only the system that is actually delivered to the customer, but also the documentation and support code maintained by the developer. A configuration may also include different versions of executable files for different platforms, and may have associated defects and tasks that govern the assignment of modules to the members of the programming team.

To understand the problems connected with CM, it is important to grasp the enormous number of objects that are part of a characteristic software project. Even a small-sized system may contain more than a hundred files. For realistically-sized systems, this number may grow to thousands or even tens of thousands. Problems associated with CM can be similar in scope to the ones faced by a modern library. Without careful inventory controls, many of the books in the library would be misplaced, mis-shelved, stolen, or even lost.

Another thing to remember is that an effective CM plan is part of any software development effort that claims to adhere to the principles of Capability Maturity Model® Integration (CMMI). This system, created by the Software Engineering Institute (SEI) at Carnegie Mellon University, provides a process improvement approach to software development. It has 5 maturity levels, and Configuration Management is required from level 2 upward, so all but the lowest level include some sort of documented CM process.

CVS/Subversion

Subversion (SVN) seems to have surpassed CVS in functionality (see TortoiseSVN). However, CVS has a quite nice interface in WinCVS.

Optimizations

You may have heard that "premature optimization is the root of all evil". It is true. I suggest you follow these guidelines.

Always write unoptimized code first.
If your application is too slow, run an analyzer to figure out just where.
Re-write the indicated code to be more optimized, but keep the old code in a comment.
RETEST and if your application is not noticeably faster, rip out the optimization.
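The retest step can be made less subjective with a simple stopwatch, sketched below. This is a crude measurement and not a substitute for a profiler: single runs are noisy, especially before the JVM has warmed up, so a task should be timed several times. The class name Timing and the idea of a minimum speedup factor are assumptions made for this example.

```java
// Hypothetical helper for the "retest" step; measures wall-clock time only.
final class Timing {
    /** Returns the wall-clock nanoseconds one run of the task took. */
    static long nanosFor(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** True only if the optimized version beat the original by at least the given factor. */
    static boolean worthKeeping(long originalNanos, long optimizedNanos, double minSpeedup) {
        return optimizedNanos * minSpeedup <= originalNanos;
    }
}
```

With a factor of 2.0, for instance, an optimization that only shaves off ten percent would be ripped out again, which matches the spirit of "not noticeably faster".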

Note, this is not an excuse for ignorance! As a programmer you MUST understand basic concepts: for example, if you are doing an insertion sort, choose a linked list over an array! This is not an optimization, it's programming!

See Optimizing Code for Speed for many ways to make programs run faster.

Avoid code duplication

My number one rule of all time is never ever repeat code. Even one or two similar lines could cause problems. Never copy and paste. A set of code with absolutely no redundancy is called "Fully Factored". Fully Factored code is MUCH easier to deal with than code created with too much Copy and Paste.

This is also referred to as the "DRY" principle: Don't Repeat Yourself.

I'd go so far as to say that you could differentiate languages by how difficult it was to avoid repeated code.

Identify Repeated Code

Copy and Pasters will generally copy a block and paste it multiple times, then go through and change a few constants to suit the needs of the new block of code. These are pretty easy to see: simply scan your code for repeated patterns of line lengths.

Extract unique data from repeated blocks of code

Often the solution is to extract the constants into an array, remove all but one copy of the offending code, then iterate over the remaining copy of the code.
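The extraction described above can be sketched as a before-and-after pair. The class OptionLabels and the label text are invented for this illustration; only the shape of the change matters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of factoring out pasted lines.
final class OptionLabels {
    // Before: three pasted lines that differ only in a constant.
    static List<String> copyPasted() {
        List<String> labels = new ArrayList<>();
        labels.add("Option: " + "Opt1");
        labels.add("Option: " + "Opt2");
        labels.add("Option: " + "Opt3");
        return labels;
    }

    // After: the constants live in an array and one copy of the code loops over them.
    static final String[] OPTIONS = {"Opt1", "Opt2", "Opt3"};

    static List<String> factored() {
        List<String> labels = new ArrayList<>();
        for (String option : OPTIONS) {
            labels.add("Option: " + option);
        }
        return labels;
    }
}
```

Adding a fourth option now means adding one string to the array instead of pasting and editing another line.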

This often leads to other opportunities for code reduction. For instance, when the changing "constant" in the code is a function/method call (a different method name at the same spot in each set of duplicated code), you may find a lot of redundancy in the called methods. This will also often lead to the creation of new objects and reusable structures of data that you didn't have before. A good optimization will often kick off a whole series of refactoring and you may find yourself removing so much code that it's embarrassing.

Certain language structures seem to just draw programmers to code duplication. Setting up GUI components, for instance, can be extremely repetitive. This is a good place to look for code reduction opportunities. In Java, each MenuItem typically gets its own listener class. Often you will see this implemented as a screen of code doing nothing but creating MenuItem objects. It can be done just as easily (MUCH easier, actually) by defining a set of data and writing a small routine to create all the MenuItem objects. At this point you find that an action like adding buttons for each menu item, which may have been terrifying before the refactor, is suddenly trivial.

Fully factoring code often requires setting up some data and iterating over it. In Java, setting up an array of data in code is quite concise. Later, array data could be very easily externalized, much easier than the original C&P code. To set up an array in Java, get used to this simple syntax (other languages should have similar constructs):

String[] data = new String[] {"Opt1", "Opt2", "Opt3"};

Strings are common, but you could also use int or long arrays--the syntax is identical. I often use Object arrays to pair an integer with a string: new Object[] {"One", 1, "Two", 2}... (This will work in Java 5; before Java 5 you might be better off with two arrays.) The trick with these arrays is to keep the syntax simple and brief.

Often control signals are needed in your data as well. Don't be afraid to include required signals in your code. For instance, if you were automatically generating a set of menus, you need to identify which menus are top level and which are sub-items. A list like this should be all the data you need:

"^", "File", "Load", "Save", "^", "Edit", "Copy", "Paste"

This data can be fed into a simple loop (a few lines at most) to create your entire menu structure.
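As a sketch of such a loop, the code below turns the list into a map from each top-level menu name to its sub-items, treating "^" as the signal that the next entry starts a new top-level menu. The class name MenuData is made up, and a real GUI version would create menu objects where this one fills a map. It is a little longer than the bare loop because it also rejects malformed data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the flat menu list; builds a map instead of real GUI menus.
final class MenuData {
    static final String TOP = "^";  // control signal: the next entry starts a new top-level menu

    /** Maps each top-level menu name to its sub-items, in the order given. */
    static Map<String, List<String>> parse(String... data) {
        Map<String, List<String>> menus = new LinkedHashMap<>();
        List<String> current = null;
        for (int i = 0; i < data.length; i++) {
            if (TOP.equals(data[i])) {
                if (i + 1 >= data.length) {
                    throw new IllegalArgumentException("\"^\" must be followed by a menu name");
                }
                current = new ArrayList<>();
                menus.put(data[++i], current);   // consume the menu name as well
            } else if (current == null) {
                throw new IllegalArgumentException("Item before any top-level menu: " + data[i]);
            } else {
                current.add(data[i]);
            }
        }
        return menus;
    }
}
```

Calling MenuData.parse("^", "File", "Load", "Save", "^", "Edit", "Copy", "Paste") yields a File menu holding Load and Save and an Edit menu holding Copy and Paste.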

The Next Step, create a new class!

These data structures created from refactoring will generally end up with pairs of data, or triplets--or worse. Any time you have pairs or sets of data that want to be grouped together you should probably be defining a custom object to hold them. I know that sounds extreme, but try it. You will suddenly find that you needed that object all along. You should quickly find yourself moving code that used to be static utility methods (bad code smell) into your new class and it will have that perfect-fit good code smell.

You can create a string or object array and iterate over it to create your custom objects, but a better way is to actually create your objects in the array:

Here's a somewhat tricky one that I have used in the past:

  MyClass[] primaryData=new MyClass[] {
    new MyClass("File", top),
    new MyClass("Save", "Save the file", "saveFunc"),
    new MyClass("Load", "Load the file", "loadFunc"),
    new MyClass("Edit", top),
    new MyClass("Copy", "Copy the selection", "copyFunc", "isTextSelectedFunc"),
  };

This uses a few tricks. First of all, there are multiple constructors. The top is (string, int) so that differentiates it and the system knows to create a new top-level menu item from this. All the following (without the int "top") added after it will become members of that menu. The rest of the constructors take strings--one string constructor takes 3 strings, the other takes 4.

The first string is obviously a menu name. The second is a tooltip, and the third is the name of the function (in the originating object) to call (this is done through reflection). The fourth, if it exists, is assumed to be the name of a boolean method that returns true or false and can disable that menu item depending on the return value.

This code isn't exact, it was just an example. To really implement it, MyClass needs access to a few other things (like the calling object), but it gives you an idea.

Using this structure to implement a typical menu with 20 or 30 entries could replace many screens of C&P code. Also, you don't have to deal with the ugly "Inner class" syntax. You do have to deal with the ugly reflection syntax, but that's buried in the MyClass object and only needs to be dealt with once (ever).

You might also note that all your strings are in one place now. This means it would be extremely easy to replace those few lines of code with a data-driven routine. That might be a next step, it might not. I don't recommend trying to jump straight to data-driven.

Note on code sample

I coded it that way to make a point about storing classes in an array. If you were to really do it, I'd suggest creating one instance of MyClassHolder and passing in the calling class so MyClassHolder can make the reflective calls to it--then instead of "new" for each subsequent MyClass line, call a method in MyClassHolder that news MyClass for you and stores the instances inside itself. Leave a message in discussion tab if you'd like to see this really implemented. Also I could have distinguished "top" level items by the fact that they have a single parameter but I was trying to show a method that you can use if you can't differentiate each method by their parameters alone.

Warning: Don't key off visible strings

Notice that there is some redundancy in the data in the above example.

new MyClass("Save", "Save the file", "saveFunc"),

saveFunc could be calculated from Save. For instance, you could just say the save function has to be named "doXFunc", so creating a "Save" option would automatically call doSaveFunc.

This usually works at first and seems like a nice trick, but in my experience I've always regretted tying my internal name to externally displayed text, or using externally displayed text as any sort of a "Key". Eventually someone will want to translate the interface into a different language or something. I'm all for brevity, but in this case it's just not worth it.

Warning: Be Very Careful with your data

Whenever you switch to data-driven code, most of your life will become much easier; however, certain types of debugging become a lot harder. Remember, it is now your job to check the data.

Clearly define the data so it can be changed.
Be extremely picky about the data, allowing only the structure and values you have defined.
Check references within your data (if they exist).
Fail LOUDLY with a clear explanation of the problem.
When users ignore your explanation and say it's your code's fault without reading it, ask them what you could have done to make them read the message and fix their problem. Put everything you learn back into the code.
If the data is complex enough, consider implementing an editor.
Data can easily be stored in .properties files (annoying), XML or a Database--or any number of other ways, pick the one easiest for you--.properties files make relating associated data difficult, and Databases almost require an editor. XML is a good balance.

Be SURE to validate the demarcation between your data and the rest of the program. For instance, in the example above check each reflective string AT LOAD TIME. In fact, you should instantiate the Method object as you load the MyClass object and store the method object instead of reflecting each time anyway.

Keep in mind that a spelling error in your data can be trivial to detect if you write error checking code like this--virtually impossible otherwise.
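A minimal sketch of load-time checking follows, assuming the target methods are public and take no arguments. CheckedEntry is a hypothetical, cut-down cousin of the MyClass example: it resolves the method name once in its constructor, stores the Method object, and fails loudly if the name is misspelled.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical entry type; resolves its method name once, when the data is loaded.
final class CheckedEntry {
    private final String label;
    private final Object target;
    private final Method action;

    CheckedEntry(String label, String methodName, Object target) {
        this.label = label;
        this.target = target;
        try {
            this.action = target.getClass().getMethod(methodName);  // public, no-arg methods only
        } catch (NoSuchMethodException e) {
            // Fail loudly, naming both the bad data and where it was expected.
            throw new IllegalArgumentException("Menu entry \"" + label + "\" refers to missing method "
                    + methodName + "() on " + target.getClass().getName(), e);
        }
    }

    void invoke() {
        try {
            action.invoke(target);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Action for \"" + label + "\" failed", e);
        }
    }
}
```

With this in place, a typo in the data shows up as a clear message the moment the entry is created, not as a silent no-op when a user finally clicks the menu item.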

Anonymous Inner Classes

Anonymous inner classes are another case that leads to C&P. For instance, whenever the value of a control is changed you may want to validate the form. Worst case: the entire validation method is C&P into each anon listener. Better case: each listener calls a "Validation method".

Fix

All the anon listeners should be similar, if not identical.

Create an inner class (not anonymous) that implements your basic listener type (ActionListener, ...). Put the code from one of the anonymous inner classes in here. If all the inner classes were identical, you can make the class stateless (no internal variables).

If it's stateless, create a single instance and pass it into each place where the listener was created. You're done. You probably cut your file size by 1/3 to 2/3 depending on what else was going on. You probably also eliminated 2 or 3 typo-related bugs in the process.

If your anonymous inner classes were slightly different, you should be able to pass in a variable when you instantiate the inner class, and use that variable to control the differences. If you do this, you will have to create more than one instance of the class--one for each different type. I still recommend you make them immutable, otherwise you need to create an instance for each listener.

If they have some differences, you could create a base class listener and inherit to a few sub-classes--just as you would with any OO code. There is no reason to treat listeners differently just because you almost always see them as anons.

Finally, if they are very different, just create 2 or 3 different classes. You can leave any anonymous inner classes that are unique alone, but where the code in the listeners is identical or very similar, please try to combine them.
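As a sketch of the "slightly different" case, the class below replaces a set of near-identical anonymous listeners with one named, immutable inner class that takes the differing value as a constructor argument. FormSketch, FieldListener, and the validate method are hypothetical names, and the validation body is only a stand-in.

```java
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.ArrayList;
import java.util.List;

// Hypothetical form; only the listener structure matters here.
final class FormSketch {
    final List<String> validated = new ArrayList<>();

    /** One named, immutable listener class replaces many near-identical anonymous ones. */
    final class FieldListener implements ActionListener {
        private final String fieldName;  // the only thing that differed between the copies

        FieldListener(String fieldName) {
            this.fieldName = fieldName;
        }

        @Override
        public void actionPerformed(ActionEvent event) {
            validate(fieldName);
        }
    }

    void validate(String fieldName) {
        validated.add(fieldName);  // stand-in for real validation logic
    }
}
```

Inside the form, each control would then get something like new FieldListener("name") in place of its own anonymous class; because the listener is immutable, one instance can be shared by every control that needs the same behaviour.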

Function/method size

Try to keep functions/methods down to one screen. You'll fail a lot, but it's a good goal. You should start to feel really uncomfortable if you see a function creep up to 2 or 3 screens in length.

It's much easier to deal with a bunch of small functions than one big one.

Please don't do this by daisy-chaining though. To simply break a function in the middle just because it's getting too long is even worse.

As I said in the Comments section, good OO code looks like it's not doing anything--anywhere. I don't think I've ever seen a method in the Java classes longer than a screen (after removing comments), and a huge majority are 1-3 lines of code. I just scanned Hashtable. Not a single method exceeds a screen (say, 25 lines or so) and most are smaller--all are very small and focused on solving 1 problem.

Each method or function should do only one thing. This functional breakdown makes the code easier to manage and can provide a level of abstraction that was probably unrealized during the planning process. Generally, if the code runs longer than 40 lines, you may be doing too much in one step. If you feel that the functionality should not be exposed to other developers, set the method to private.

Public Variables/Variable Scope

I have come to the conclusion that every variable should be private. Always. Sometimes I'll implement a trivial class--rectangle or even "Pair" and make the variables public so I don't have to write setters/getters. I've always regretted it eventually.

Even protected variables are annoying. Write a simple protected getter if you must. (Getters and setters can be a bad idea too--google "Getters and Setters are evil" for a great article.)

Passing a mutable object around can be as bad as a global. Be very careful with mutable objects (immutable objects such as Java's String are much safer).
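One way to follow both pieces of advice at once is a small immutable value class, sketched below. Rect is a hypothetical example; its fields are private and final, it exposes behaviour and no setters, and a "change" produces a new object.

```java
// Hypothetical value class: every field is private and final, so instances can be shared safely.
final class Rect {
    private final int width;
    private final int height;

    Rect(int width, int height) {
        this.width = width;
        this.height = height;
    }

    int area() {
        return width * height;   // behaviour lives with the data, so no getters are needed here
    }

    /** "Changing" a value returns a new object; the original is untouched. */
    Rect withWidth(int newWidth) {
        return new Rect(newWidth, height);
    }
}
```

Because no method can alter an existing Rect, handing one to other code carries none of the risks of passing a mutable object around.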