
Program Evaluation Criteria


As noted on page 2, we want an objective method to choose among design alternatives, and to evaluate this program or its parts against existing programs. For a complex program such as this, we take multiple measurable features or parameters and convert them to a common scale. The features chosen are the ones of value to the ultimate customer, which here is civilization as a whole, and the conversion formulas reflect their relative importance and desirability. Since we cannot ask everyone in the world what they want, we have to act as a proxy for them and make our best estimates of what they would want if they were well informed on this topic. We can draw on outside information to help with this process. Each design alternative will have different actual feature values, and so will receive different scores when converted to the common scale. Evaluating the alternatives then amounts to adding up the scores for each one and seeing which has the highest total. At this step of the Conceptual Design we can only establish what our criteria should be. Devising alternatives and selecting among them will come later.

Identifying Candidate Criteria


Every possible measurable feature or specification could be used as a point of comparison, but this is impractical for two reasons: (1) the time and complexity needed to evaluate all of them for every alternative, and (2) many features are simply undefined at a high level of a program. So we restrict our candidates to those which are the most important, and those which can at least be estimated at a high level. Any feature which is absolutely required for the program is not a point of comparison, since all the valid alternatives must include it. What that leaves us with are parameters which are variable in some way, so that an alternative can do better or worse by that measure. An example of a good measure is cost. Every alternative has an associated cost, which is almost infinitely variable, and so is useful for comparisons. Most people will agree that lower cost is better, although they may differ on how much better. The general agreement that cost is important, and on which direction is better, allows setting up a formula to convert particular cost ranges to score values.

We already (page 3) developed our program requirements based on program goals. By designating them as requirements we have indicated they are important, so those are the first places we should look for measurable criteria. Beyond that, we will also look at the same headings we used for the requirements analysis process, and consider civilization needs and desires generally, as we can find them from outside sources.

Program Requirements


Referring to the Program Requirements on page 3, we identify our first set of candidate criteria as follows:

1.0 Objectives:

  • 1.1 Program Goal - This sets the fundamental goal of the program, which must be met. With no room for variation it provides no candidate criteria.
  • 1.2 Program Scale - This sets the number of people on Earth and in space to be supported. The intent is to demonstrate permanent occupation and use of a location, but this can be done with a different number of people than specified. This makes a good candidate.
  • 1.3 Choice - At this level it merely requires choice by participants and residents, but does not specify how much. Since we could define degrees of choice, this can be a candidate at this level or lower levels. Criteria applied only at lower levels are either an absolute requirement at the program level, or summed somehow to a higher level result.

2.0 Performance:

  • 2.1 Number of Locations - The requirement is to "maximize", but does not state a specific number. This is an excellent candidate because it has an inherent range. An unbounded number of locations would tend to result in an unbounded cost, so this criterion would need to be balanced against cost criteria.
  • 2.2 Growth - The requirement is to increase capacity progressively, but without a number specified. This is another good evaluation criterion candidate. We can define measures for absolute capacity, increment size, and growth rates.
  • 2.3 Improved Technology - Like growth, technology levels are required to increase progressively, but without specified values. We can define ranges for this parameter and weigh it against other criteria.
  • 2.4 Improved Quality of Life - This sets a lower threshold of top 10% of Earth civilization, but no upper bound. The specific physical and social measures will make good criteria.
  • 2.5 Data - The goal is to collect and disseminate an unknown amount of data. This is inherently a variable parameter, so we will include it as a candidate.
  • 2.6 Resources - We have a requirement of 100% surplus of resource needs. Since a design might fall short or exceed this level, it is a good candidate.

As a note, performance levels of a complex program are typically a rich area for finding evaluation criteria, since they tend to be variable with design choices.

3.0 Schedule:

  • 3.1 Completion Time - The goal here is to complete a location before technology renders it obsolete and a redesign or replacement is indicated. We can analyze this and set a specific time range for completion as a criterion.

4.0 Cost:

  • 4.1 Total Development Cost - Like performance, cost is another rich area for evaluation criteria, since cost is a barrier, in both the physical and the social sense, to a program getting approval. The ratio of performance to cost is a particularly popular measure of effectiveness. This is a good candidate.
  • 4.2 New Location Cost - This sets the net project cost at less than half the long term net output. This is an explicit effectiveness measure, and makes a particularly good candidate criterion.
  • 4.3 Earth Launch Cost - This is another progressive improvement goal with a variable value and an aggressive target, so it makes a good candidate criterion. At lower levels it can be divided into component measures for launch cost/kg and percentage mass from Earth.

5.0 Technical Risk

  • 5.1 Risk Allowances - The requirement is to include allowances for technical risk in the design. Since the margins for unknown design factors are already included in performance and cost estimates, it would be double-counting to also evaluate those margins at full weight as a criterion. A criterion with a smaller weight is still reasonable for the size of the margin, since less uncertainty in the design is considered better than more uncertainty.

6.0 Safety

  • 6.1 New Location Risk - This is the internal risk to the program contents and people. As a variable parameter, it is a good selection criterion.
  • 6.2 Population Risk - These are external risks outside the program, including natural hazards already in place. Again, this is a variable parameter, and thus a good candidate.

7.0 Sustainability

  • 7.1 Biosphere Security - Security in and of itself is not a measurable parameter, but diversity of locations and species in alternate locations is a candidate.
  • 7.2 Survivability - This is a requirement for long term survival. A measure would be the rate of compensation vs expected time for a problem to become critical.

8.0 Openness

  • 8.1 Open Design - The program will either be open or not, so this is not a good candidate. We will assume that all design alternatives will be open.
  • 8.2 Access - This is another fixed requirement for reasonable access, which will be included in any design option, therefore not an evaluation measure.

This gives us a good starting list of candidate criteria. In early conceptual design many of the values will be undetermined for a given option. When not enough information is available to make a clear choice, the proper course is to keep multiple options until you have enough information to decide.

Additional Candidate Sources

  • Program Goals and Benefits - Looking back at these on page 1, we find they are well represented by the requirements and don't immediately present new candidate criteria.
  • Systems Engineering Experience - Reviewing the Systems Engineering section (1.5 of this book) does not present new program level criteria. Many of the requirements categories will apply at lower levels.
  • External Constraints - Limits imposed by the natural environment or human rules are not variable parameters; they must be met. Therefore they are not candidates to compare options.
  • Internal Requirements Classes - We review the paragraphs on page 2, and note that "Improved Quality of Life" needs more specific measures. For "Understanding the Earth" we note that data requirements should trace to useful results which can be acted upon, though how to measure that is problematic. Under "Biosphere Security", ability to counteract changes can be a variable measure. Under "Expanding Resources" we can measure the increase in economically viable available resources, including physical space.
  • Design Approach - From the list on page 2, we can identify percentage of: closed cycle, local resources, self production, and reduced human and remote control inputs as variable components of the improved technology criterion (2.3). For number of locations (2.1) we can list temperature, rainfall, pressure, gravity, and radiation levels as environment parameters to expand upon. Time parameters include communications, travel, and stay times. Energy parameters include potential of location, and available flux from natural sources. Another possible measure is how many parameters have their range increased, and by what amount, either per location or as a maximum total increase. For all these criteria, the existing state of civilization is the baseline level.
  • Program Concept - This does not appear to have new criteria.

General Needs and Desires

NOTE: Improvement Needed - This section is preliminary, and an opportunity for improvement.

Here are some ideas about general human/civilization needs and desires. The idea is that since civilization as a whole is the "customer" for this program, it is their needs and desires for which we should be designing, and therefore include these types of items in the evaluation criteria. The following items are drawn from online search, and are not yet backed rigorously or empirically. Until better defined, we are not using them as evaluation criteria.

  • Maslow's Hierarchy of Needs - This is the psychological theory, proposed by Abraham Maslow in 1943 and developed further in his 1954 book Motivation and Personality, that people address more basic needs before higher level ones. The general levels are (1) Physiological, (2) Safety, (3) Love/Belonging, (4) Esteem, and (5) Self-Actualization. This theory has been criticized. A recent paper by Tay and Diener may shed some empirical light on this topic.
  • States of Being - The urge to be part of something larger than oneself, engagement with an item or activity, improved personal productivity or life, positive being or enjoyment, personal well-being/meaning/fulfillment.
  • Categories of needs and desires - Mental/intellectual, emotional, and physical: see Applied Empathy
  • List of emotional needs - The need for, or need to be: accepted, accepting, accomplished, acknowledged, admired, alive, amused, appreciated, appreciative, approved of, attention, capable, challenged, clear (not confused), competent, confident, developed, educated, empowered, focused, forgiven, forgiving, free, fulfilled, grown or growing, happy, heard, helped, helpful, important, in control, included, independent, interested, knowledgeable, listened to, loved, needed, noticed, open, optimistic, privacy, productive, protected, proud, reassured, recognized, relaxed, respected, safe, satisfied, secure, significant, successful, supported, treated fairly, understanding, understood, useful, valued, and worthy. Everyone feels these needs in different amounts.
  • Empirical behavior - Look at empirical behavior to find out what people really want, as opposed to what they say they want when interviewed. For example, the trend in working hours as income goes up between countries, over historical time, or within economic groups could be extrapolated to find out what people would do if work were not required.

Selecting and Weighting Candidates


Now that we have established a set of candidates, preliminary as it is, the next step is to choose the most important ones, and establish relative scoring weights and conversion formulas. The weight is how much a given criterion contributes to the total score of a given design. The more important that feature or parameter is, the more weight we give it. The conversion formula takes the feature or parameter and converts it to a relative score, usually with a nominal 0 to 100% range, or 0.00 to 1.00 value. The range is arbitrary, as long as it is consistent within a project. We will use the 0 to 100% range. The total score is obtained by multiplying the weight by the score for each component criterion, then summing all the resulting products.
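
As a minimal sketch of the weight x score summation described above, the following Python fragment totals one design option. The criterion names, weights, and scores are made-up illustrative values, not ones chosen by the program, and the function name total_score is our own.

```python
def total_score(criteria):
    """Sum weight x score over all criteria.

    criteria maps a criterion name to (weight in points, score in percent).
    The result is in points, out of the total weight available.
    """
    return sum(weight * score / 100.0 for weight, score in criteria.values())


# Hypothetical design option (illustrative numbers only).
option_a = {
    "performance": (50.0, 80.0),   # 50 points of weight, scoring 80%
    "cost": (30.0, 50.0),          # 30 points of weight, scoring 50%
    "risk": (20.0, 25.0),          # 20 points of weight, scoring 25%
}

print(total_score(option_a))  # 40 + 15 + 5 = 60 points out of 100
```

Comparing options then reduces to computing this total for each one and ranking the results.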

We want to narrow the list to the most important ones because it takes too much time and effort to evaluate many criteria for every design option. One way to do this is by making some of them fixed requirements, which all options must meet. Another is to simply drop the candidate as not important enough at the top program level. Whatever list of criteria is developed through this process, it should be reviewed by the rest of the program participants and the customer as far as possible. Setting criteria and importance is inherently subjective, so it is very important to get agreement and consensus from the right group of subjective humans that these are the right set to use in evaluating the design.

Candidate Discussion


The following discussion gives our reasoning for the conversion from program parameter or feature to a score. We use the same numbering as for the requirements to make them easier to compare. Since not every requirement has corresponding evaluation criteria, there are gaps in the numbering.

  • 1.2 Program Scale - Studies of animal populations indicate that long term viability is threatened below about 100 members. Human social research gives Dunbar's Number for cohesive groups, ranging from 30 up to about 2500 for various types of groups. We therefore set 100 people per location as a minimum goal, gaining a score of 0%, with each factor of e above that gaining 25% in score. Thus about 5460 people per location (100 x e^4) yields a 100% score. Similarly, we set the total for all locations on Earth at 5000 for a score of 0%, and 273,000 for a score of 100%. We have a preference for a diversity of locations rather than one big one, since the overall goal is to expand to a series of multiple more difficult ones. Therefore we will weight the total population goal 1.5 times as heavily as the per location goal.
  • 2.1 Number of locations - Humans already exist in a range of environments. We will define the range where 90% of the population already lives as the baseline, and increments of at least 10% beyond that in at least one parameter as "new". Although 5% of the population already lives beyond each end of that range, combining those conditions with the other requirements for improved technology still leads to new designs. For parameters which can fall to zero, like pressure or temperature, the increments are linear steps down to zero. For unbounded values the 10% increments are compounded ratios. Note that environment parameters are for the external environment, not the internal living and working conditions. The following parameters have been identified so far:
      • Environment temperature - range of daily high and low across seasons, in kelvin (K).
      • Water supply - annual rainfall + running water/ice/air moisture flow, in meters.
      • Atmosphere pressure - average value at the location, in kilopascals (kPa).
      • Ground pressure - foundation design load, or exterior water or rock pressure for below-surface construction, in megapascals (MPa).
      • Energy supply - flux from natural sources, in W/m².
      • Gravity level - only applies to space, in m/s².
      • Radiation dose - measured by human biological effect, in sieverts per year (Sv/year).
      • Time and Distance - these are measured from the 5th percentile nearest population on Earth:
          • Ping time - minimum round trip communication delay, in seconds.
          • Travel time - one way normal travel time for humans.
          • Stay time - average per person stay time per location, in years. Increments count linearly from zero.
          • Transport energy - total potential, kinetic, and frictional energy to reach the location for the most efficient cargo method, in megajoules per kilogram (MJ/kg).

The two criteria we derive are the actual number of distinct new locations the program establishes, and the total number of range steps expanded to by all the parameters combined. The first is measured directly at 1% per location, and the second at 0.5% per range step.
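
The scale and location scoring rules above can be sketched in Python as follows. This is only an illustration under the stated rules (25% per factor of e above the zero-score population, 1% per new location, 0.5% per range step); the function names and the example counts of 10 locations and 40 range steps are hypothetical.

```python
import math


def scale_score(population, zero_score_population):
    """Program Scale (1.2): 25% for each factor of e above the 0% level."""
    return 25.0 * math.log(population / zero_score_population)


def location_scores(new_locations, range_steps):
    """Number of locations (2.1): 1% per new location, 0.5% per range step."""
    return 1.0 * new_locations, 0.5 * range_steps


# Figures quoted in the discussion: both should come out at about 100%.
per_location = scale_score(5460, 100)
all_locations = scale_score(273_000, 5000)
print(round(per_location, 1), round(all_locations, 1))

# Hypothetical program with 10 new locations spanning 40 range steps.
count_score, range_score = location_scores(10, 40)
print(count_score, range_score)
```

Note that a population below the 0% level produces a negative score with this formula, which is one way a score can fall outside the nominal range.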

  • 2.2 Growth - The fastest growing national GDPs are about 10% per year, so we will give that a score of 75% here. Average GDP growth is about 4%, so we will adopt 5% as a score of 25%. Scaling linearly between these points puts a 0% score at 2.5% growth and a 100% score at 12.5% growth. We will use the average of industrial output, resident population, and transport capacity to measure "output" of a location, and growth rates relative to the final size of the location. So the minimum and maximum rates of 2.5% and 12.5% imply 40 and 8 years respectively to grow to final size, and set the minimum initial size at 1/40 of final size.
  • 2.3 Improved Technology - These are measured at the program level across all locations. The percentage of local resource use, self production, and cyclic mass flow all scale directly from 0 to 100%. Automation is measured in reduction of human labor hours per output relative to current technology. Autonomy is the percentage of internal human labor and control at the locations relative to the total required for the locations to function. These are both scored directly in percent.
  • 2.4 Quality of Life - We will set a nominal GDP per person of $60,000 in 2012 as the 25% score level. This is about the average for the top 10% of world population. The highest GDP per person for a single country is Monaco's at about $180,000, so we will set that to 100%, and scale linearly. There are many other potential quality of life criteria, but for simplicity we will use just this one for now.
  • 2.5 Data - At this point of the conceptual design we do not know enough about data to set it as a criterion, therefore we put it aside for now.
  • 2.6 Resources - Our nominal requirement is 100% surplus, and more is better. We assign no surplus a score of 0%, since it does not meet the desire to improve life outside the program itself. We give each doubling of output a 25% increase in score, thus 200% output (100% surplus) scores 25% and 15 times surplus (16 times output) scores 100%. We are choosing a local measure of resources relative to the program rather than global availability. If civilization as a whole wants to increase its resources, it can copy the program's examples.
  • 3.1 Completion Time - This seems to duplicate 2.2 Growth as far as setting an overall time to reach a final size. For the present we will not list it as a separate criterion.
  • 4.1 Total Development Cost - For terrestrial locations we can set one-time (non-recurring) development cost in the range of 10 to 100 times the unit cost, on the principle that multiple copies of locations will eventually be built, and the one-time cost will be distributed among them. For space locations we expect fewer copies, and that some key technologies will have been previously developed for Earth. Thus a range of 1 to 10 times unit cost is more reasonable. Since lower cost is better, we invert the development/unit cost ratio and multiply by 1000% and 100% respectively, so that the low end of each range scores 100%. For difficulty of the location beyond temperate, we allow 10% new development cost for any environment parameter step above those previously used. For resident capacity we scale by ln(actual size/nominal size), using a nominal size of 75 people, to account for larger or smaller elements.
  • 4.2 New Location Cost - This is the explicit unit cost per new location relative to total output. As written, the requirement overlaps with 2.6 Resources, so we instead use an absolute cost per person, with the US total capital per person set to 50% score. Each factor of 2 up or down adjusts the score by 25%. For space locations we allow twice as much capital cost per person. We allow for the difficulty of the location by adding 10% linearly for each environment parameter step beyond the temperate range.
  • 4.3 Earth Launch Cost - This is the cost of transport to LEO for space locations, in $/kg of total system mass. The total system mass includes mass obtained locally in space. Because both actual transport to orbit and use of local mass (currently 0%) can be greatly improved, we use a steep scoring function. A current baseline of $1600/kg gets a score of 0%, and each factor of 10 reduction will score 20%. Thus the requirement goal of $0.08/kg, a reduction by a factor of 20,000, will score 86%.
  • 5.1 Risk Allowance - Less variation is better, so we score this on an inverted scale. A technical risk margin due to design uncertainty of 50% would score zero, and a margin of 0% (which is only reached with completed and tested designs) would score 100%. More advanced technology may give better potential performance, but with more uncertainty. This can be reduced by development and testing, but that is in the future. Present uncertainty is used to evaluate this criterion.
  • 6.1 New Location Risk - The goal is significantly lower internal risk, although higher risks are acceptable as a temporary measure while setting up. We scale this so that risk equal to the current general population risk gets a 50% score; each doubling of risk lowers the score by 25%, and each halving of risk raises it by 25%.
  • 6.2 Population Risk - These are risks to the external general population from program or natural causes. Because the whole world is affected, it is difficult for one program to have much effect, so we give this a narrow scoring function. The program acts more as a demonstration that the risk reduction is possible, and civilization can exert itself to do more if desired. Each 5% reduction to existing population risk is worth 25% score. We set no change in population risk as the 0% score, since any increase in total risk is generally considered unacceptable for a new program.
  • 7.1 Biosphere Security - Maintaining biospheres outside their natural environmental range increases security by having backups and the ability to survive transient disruptions. Zoo breeding populations of endangered species and seed banks are examples of existing programs of this type. It is difficult to say how much of this activity is enough, so we will somewhat arbitrarily score total number of species x locations. For each factor of 10 increase starting with 10 we will add 20% to the score. Thus 100,000 species in 10 locations is 1 million total species-locations, and would score 100%.
  • 7.2 Survivability - Like population risk, a single program cannot guarantee long term survival by itself, so we set a narrow scoring function. For each 5% compensation for long term change and resource depletion the program reaches during its life it gets a 25% score. For a change like the Earth overheating due to the Sun, which might take millions of years, only the change which occurs during the program duration (perhaps 50 years) is being compensated for. For critical resources, only those without which civilization cannot function are considered. Ones with reasonable alternatives are not critical. Compensation can be by active measures, like shading the Earth from overheating, or by alternatives like moving to other planets.
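
Several of the conversions discussed above share one pattern: the score changes by a fixed number of percentage points for each fixed multiple of some ratio. The Python sketch below expresses that pattern once and reuses it, then adds the two linear conversions for growth and quality of life. The function names are our own, and the formulas are simply our reading of the rules in the discussion, so treat this as an illustration rather than a definitive implementation.

```python
import math


def step_score(ratio, factor, points_per_step, offset=0.0):
    """Score that moves by points_per_step for each `factor` change in ratio."""
    return offset + points_per_step * math.log(ratio) / math.log(factor)


def resources_score(output, internal_use):
    """2.6: each doubling of output relative to internal use adds 25%."""
    return step_score(output / internal_use, 2, 25.0)


def launch_cost_score(cost_per_kg, baseline=1600.0):
    """4.3: each factor of 10 below the $1600/kg baseline adds 20%."""
    return step_score(baseline / cost_per_kg, 10, 20.0)


def location_risk_score(location_risk, general_risk):
    """6.1: equal risk scores 50%; each halving adds 25%, each doubling removes 25%."""
    return step_score(general_risk / location_risk, 2, 25.0, offset=50.0)


def biosphere_score(species, locations):
    """7.1: 20% for each factor of 10 in species x locations, starting from 10."""
    return step_score(species * locations / 10, 10, 20.0)


def growth_score(annual_growth_percent):
    """2.2: linear through 5% -> 25% and 10% -> 75%."""
    return (annual_growth_percent - 2.5) * 10.0


def quality_of_life_score(gdp_per_person):
    """2.4: linear through $60,000 -> 25% and $180,000 -> 100%."""
    return (gdp_per_person - 20_000) / 1600.0


# Check against figures quoted in the discussion.
print(round(launch_cost_score(0.08)))           # requirement goal of $0.08/kg
print(round(resources_score(16, 1)))            # 16 times output
print(round(biosphere_score(100_000, 10)))      # 1 million species-locations
print(round(location_risk_score(1.0, 1.0)))     # risk equal to general population
```

Writing the shared pattern as one helper makes it easy for a reviewer to change a step size or baseline and see how the scores respond.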

Weighting Discussion


Next we discuss our reasoning for the relative weights of the criteria. We will use a total weight of 100 points for all the criteria together. Our weighting is subjective, based on human opinion as to the importance of design features and parameters. Design alternatives themselves are objective. So other people reviewing the program choices can simply change the scoring and weighting to fit their own opinions on what matters. The design alternatives can remain the same, but a different set of choices would result from the changed evaluation scores.

Most people, all other factors being equal, prefer to get more results relative to the cost or effort expended. Since the ratio of performance to cost can be equally affected by increasing performance or lowering cost, the relative weight of these criteria groups is often set to be equal, and a large part of the total weight. In our list above, the performance type criteria are from 1.2 through 2.6, and the cost type criteria are 4.1 through 4.3. The remaining criteria fall into the technical risk, safety, and sustainability categories. Historically, large complex programs tended to focus on cost and performance, and put relatively small weight on other factors. With feelings of a finite and connected Earth, longer lives, and greater wealth and standards of living, people feel there is more to lose and so place more importance on the possible negatives of a project. We expect this trend to continue in the future, and this will be a long term program, so we will assign 30 points to this group, and divide the other 70 points equally between performance and cost.

Performance Group (35 points)

Scale (1.2) and number of locations (2.1) are the main motivations for a program of human expansion, so we will assign each 7.5 points. There does not seem to be strong reason to give growth, improved technology, quality of life, or resources (2.2-2.4, 2.6) more or less weight among themselves, so we assign 5 points to each.

Cost Group (35 points)

Development (4.1) and new location (4.2) cost seem equally important, but Earth launch cost (4.3) only applies to space locations, so we give it half the weight. Therefore the weights are 14, 14, and 7 points respectively.

Technical Risk, Safety, and Sustainability Group (30 points)

We subjectively rate location (6.1) and population (6.2) risk more highly than the remaining factors, and thus give them 7.5 points each. Technical risk (5.1), biosphere security (7.1), and survivability (7.2) then get 5 points each.
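
The point allocations from the three groups above can be collected and checked with a short Python sketch. The dictionary layout and the rescale helper are our own illustration; the helper shows one hypothetical way a reviewer could substitute different opinions on importance and still keep a 100 point total.

```python
# Weights in points, as assigned in the three group discussions above.
weights = {
    "performance": {
        "1.2 Program Scale": 7.5,
        "2.1 Number of Locations": 7.5,
        "2.2 Growth": 5.0,
        "2.3 Improved Technology": 5.0,
        "2.4 Quality of Life": 5.0,
        "2.6 Resources": 5.0,
    },
    "cost": {
        "4.1 Total Development Cost": 14.0,
        "4.2 New Location Cost": 14.0,
        "4.3 Earth Launch Cost": 7.0,
    },
    "risk_safety_sustainability": {
        "5.1 Technical Risk": 5.0,
        "6.1 New Location Risk": 7.5,
        "6.2 Population Risk": 7.5,
        "7.1 Biosphere Security": 5.0,
        "7.2 Survivability": 5.0,
    },
}

group_totals = {group: sum(items.values()) for group, items in weights.items()}
grand_total = sum(group_totals.values())
print(group_totals, grand_total)


def rescale(flat_weights, total=100.0):
    """Rescale any set of reviewer weights so that they sum to `total`."""
    scale = total / sum(flat_weights.values())
    return {name: value * scale for name, value in flat_weights.items()}
```

The group totals come to 35, 35, and 30 points, matching the split chosen in the weighting discussion.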

Resulting Evaluation Criteria


From the above discussion, we can now make a table of the resulting evaluation criteria to apply to our design options. Note that in some cases scores can go outside the 0 to 100% range if the parameter is outside the expected range. Like all parts of the conceptual design, this may be revised by later work.

Criterion | Weight (points) | Scoring Formula (percent) | Notes
--- | --- | --- | ---
1.2 Program Scale (per location) | 3.0 | ln(average population per location/100) x 25% | Population is final design size for location after growth
1.2 Program Scale (total all locations) | 4.5 | ln(total population all locations/5000) x 25% | Population is total design size after growth
2.1 Number of locations (count) | 3.75 | actual count of locations > minimum size @ 1% each | Minimum size = final size/years to grow to final size
2.1 Number of locations (range) | 3.75 | steps in environment, time, and distance range @ 0.5% each | 10 parameters and definition of steps from discussion 2.1 above
2.2 Growth (rate/yr) | 5.0 | (equivalent % annual GDP growth of all locations - 2.5%) x 10 | internal production valued as if sold at market rates
2.3 Improved Technology (local resources) | 1.0 | % of local resources from program locations | by kg (mass) or Joules (energy)
2.3 Improved Technology (self production) | 1.0 | % of finished products from program locations | by economic value
2.3 Improved Technology (cyclic flow) | 1.0 | % of location mass flows reused | includes propellants, but not production for growth or sale
2.3 Improved Technology (automation) | 1.0 | % reduction human labor hours | relative to current technology
2.3 Improved Technology (autonomy) | 1.0 | % required labor and control from within locations | based on necessary location functions
2.4 Quality of Life (GDP) | 5.0 | (equivalent GDP - $20,000)/1600 | includes value of internal production and labor
2.6 Resources (surplus) | 5.0 | ln(material & energy output/internal use)/ln(2) x 25% | over program life cycle. Clip at -100%
4.1 Total development cost (Earth) | 14.0 - S | (avg unit cost/total development cost) x 1000% | S = 14 x (space/total) development cost
4.1 Total development cost (Space) | S | (avg unit cost/total development cost) x 100% | see above for S
4.2 New Location Cost (Earth) | 14.0 - S2 | [ln(0.25 x US capital per person/location cost)/ln(2) x 25%] + 100% | $200K includes land value for US capital. S2 see below
4.2 New Location Cost (Space) | S2 | [ln(0.5 x US capital per person/location cost)/ln(2) x 25%] + 100% | S2 = 14 x (people in space/total in program)
4.3 Earth Launch Cost ($/kg) | 7.0 | log($1600/(LEO transport per total system mass)) x 20% | total mass includes local space resources
5.1 Technical Risk Allowance (%) | 5.0 | (50% - technical uncertainty allowance) x 2 | includes performance and design uncertainty
6.1 New Location Risk (relative) | 7.5 | [ln(0.25 x general casualty risk/location risk)/ln(2) x 25%] + 100% | casualty risk includes life and property
6.2 Population Risk (relative) | 7.5 | (% reduction to general population risk) x 5 | from natural and program causes. Increased risk not allowed.
7.1 Biosphere Security (species-locations) | 5.0 | [log(species maintained outside natural range x locations) - 1] x 20% | in vivo or stored, humans are a species
7.2 Survivability (relative) | 5.0 | (% compensation for critical risks) x 5 | includes all civilization level risks
Total | 100 | Sum partial scores x weight from each line above |
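
The split of the 14 point development cost weight between Earth and space (the S term in the 4.1 rows above) can be sketched in Python as follows. The function name and the cost figures are hypothetical, and we assume that each segment's score uses that segment's own unit and development costs, which the table does not state explicitly.

```python
def development_cost_contribution(earth_unit, earth_dev, space_unit, space_dev,
                                  weight=14.0):
    """Weighted contribution of criterion 4.1 to the total score, in points.

    S is the share of the weight given to space, in proportion to the space
    part of total development cost; Earth gets the remainder (weight - S).
    """
    s = weight * space_dev / (earth_dev + space_dev)
    earth_score = earth_unit / earth_dev * 1000.0   # percent, Earth formula
    space_score = space_unit / space_dev * 100.0    # percent, space formula
    points = (weight - s) * earth_score / 100.0 + s * space_score / 100.0
    return s, earth_score, space_score, points


# Hypothetical costs in arbitrary consistent units (illustrative only):
# Earth development is 30 times its unit cost, space development 4 times.
s, earth_score, space_score, points = development_cost_contribution(
    earth_unit=10.0, earth_dev=300.0, space_unit=25.0, space_dev=100.0
)
print(s, round(earth_score, 1), space_score, round(points, 3))
```

The same pattern would apply to the S2 split for 4.2, with the share set by the fraction of people in space rather than by development cost.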
