HANDBOOK OF GRANULAR COMPUTING

Edited by
Witold Pedrycz, University of Alberta, Canada, and Polish Academy of Sciences, Warsaw, Poland
Andrzej Skowron, Warsaw University, Poland
Vladik Kreinovich, University of Texas, USA
Copyright © 2008 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): [email protected]
Visit our Home Page on www.wileyeurope.com or www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 6045 Freemont Blvd, Mississauga, ONT, L5R 4J3

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data
Pedrycz, Witold, 1953–
Handbook of granular computing / Witold Pedrycz, Andrzej Skowron, Vladik Kreinovich.
p. cm.
Includes index.
ISBN 9780470035542 (cloth)
1. Granular computing–Handbooks, manuals, etc. I. Skowron, Andrzej. II. Kreinovich, Vladik. III. Title.
QA76.9.S63P445 2008
006.3–dc22
2008002695

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 9780470035542

Typeset in 9/11pt Times by Aptara Inc., New Delhi, India
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire
Contents

Preface
Foreword
Biographies

Part One  Fundamentals and Methodology of Granular Computing Based on Interval Analysis, Fuzzy Sets and Rough Sets

1  Interval Computation as an Important Part of Granular Computing: An Introduction
   Vladik Kreinovich
2  Stochastic Arithmetic as a Model of Granular Computing
   René Alt and Jean Vignes
3  Fundamentals of Interval Analysis and Linkages to Fuzzy Set Theory
   Weldon A. Lodwick
4  Interval Methods for Non-Linear Equation Solving Applications
   Courtney Ryan Gwaltney, Youdong Lin, Luke David Simoni, and Mark Allen Stadtherr
5  Fuzzy Sets as a User-Centric Processing Framework of Granular Computing
   Witold Pedrycz
6  Measurement and Elicitation of Membership Functions
   Taner Bilgiç and İ. Burhan Türkşen
7  Fuzzy Clustering as a Data-Driven Development Environment for Information Granules
   Paulo Fazendeiro and José Valente de Oliveira
8  Encoding and Decoding of Fuzzy Granules
   Shounak Roychowdhury
9  Systems of Information Granules
   Frank Höppner and Frank Klawonn
10  Logical Connectives for Granular Computing
   Erich Peter Klement, Radko Mesiar, Andrea Mesiarová-Zemánková, and Susanne Saminger-Platz
11  Calculi of Information Granules. Fuzzy Relational Equations
   Siegfried Gottwald
12  Fuzzy Numbers and Fuzzy Arithmetic
   Luciano Stefanini, Laerte Sorini, and Maria Letizia Guerra
13  Rough-Granular Computing
   Andrzej Skowron and James F. Peters
14  Wisdom Granular Computing
   Andrzej Jankowski and Andrzej Skowron
15  Granular Computing for Reasoning about Ordered Data: The Dominance-Based Rough Set Approach
   Salvatore Greco, Benedetto Matarazzo, and Roman Słowiński
16  A Unified Approach to Granulation of Knowledge and Granular Computing Based on Rough Mereology: A Survey
   Lech Polkowski
17  A Unified Framework of Granular Computing
   Yiyu Yao
18  Quotient Spaces and Granular Computing
   Ling Zhang and Bo Zhang
19  Rough Sets and Granular Computing: Toward Rough-Granular Computing
   Andrzej Skowron and Jaroslaw Stepaniuk
20  Construction of Rough Information Granules
   Anna Gomolińska
21  Spatiotemporal Reasoning in Rough Sets and Granular Computing
   Piotr Synak

Part Two  Hybrid Methods and Models of Granular Computing

22  A Survey of Interval-Valued Fuzzy Sets
   Humberto Bustince, Javier Montero, Miguel Pagola, Edurne Barrenechea, and Daniel Gómez
23  Measurement Theory and Uncertainty in Measurements: Application of Interval Analysis and Fuzzy Sets Methods
   Leon Reznik
24  Fuzzy Rough Sets: From Theory into Practice
   Chris Cornelis, Martine De Cock, and Anna Maria Radzikowska
25  On Type 2 Fuzzy Sets as Granular Models for Words
   Jerry M. Mendel
26  Design of Intelligent Systems with Interval Type-2 Fuzzy Logic
   Oscar Castillo and Patricia Melin
27  Theoretical Aspects of Shadowed Sets
   Gianpiero Cattaneo and Davide Ciucci
28  Fuzzy Representations of Spatial Relations for Spatial Reasoning
   Isabelle Bloch
29  Rough–Neural Methodologies in Granular Computing
   Sushmita Mitra and Mohua Banerjee
30  Approximation and Perception in Ethology-Based Reinforcement Learning
   James F. Peters
31  Fuzzy Linear Programming
   Jaroslav Ramík
32  A Fuzzy Regression Approach to Acquisition of Linguistic Rules
   Junzo Watada and Witold Pedrycz
33  Fuzzy Associative Memories and Their Relationship to Mathematical Morphology
   Peter Sussner and Marcos Eduardo Valle
34  Fuzzy Cognitive Maps
   E.I. Papageorgiou and C.D. Stylios

Part Three  Applications and Case Studies

35  Rough Sets and Granular Computing in Behavioral Pattern Identification and Planning
   Jan G. Bazan
36  Rough Sets and Granular Computing in Hierarchical Learning
   Sinh Hoa Nguyen and Hung Son Nguyen
37  Outlier and Exception Analysis in Rough Sets and Granular Computing
   Tuan Trung Nguyen
38  Information Access and Retrieval
   Gloria Bordogna, Donald H. Kraft, and Gabriella Pasi
39  Granular Computing in Medical Informatics
   Giovanni Bortolan
40  Eigen Fuzzy Sets and Image Information Retrieval
   Ferdinando Di Martino, Salvatore Sessa, and Hajime Nobuhara
41  Rough Sets and Granular Computing in Dealing with Missing Attribute Values
   Jerzy W. Grzymala-Busse
42  Granular Computing in Machine Learning and Data Mining
   Eyke Hüllermeier
43  On Group Decision Making, Consensus Reaching, Voting, and Voting Paradoxes under Fuzzy Preferences and a Fuzzy Majority: A Survey and a Granulation Perspective
   Janusz Kacprzyk, Sławomir Zadrożny, Mario Fedrizzi, and Hannu Nurmi
44  FuzzJADE: A Framework for Agent-Based FLCs
   Vincenzo Loia and Mario Veniero
45  Granular Models for Time-Series Forecasting
   Marina Hirota Magalhães, Rosangela Ballini, and Fernando Antonio Campos Gomide
46  Rough Clustering
   Pawan Lingras, S. Asharaf, and Cory Butz
47  Rough Document Clustering and The Internet
   Hung Son Nguyen and Tu Bao Ho
48  Rough and Granular Case-Based Reasoning
   Simon C.K. Shiu, Sankar K. Pal, and Yan Li
49  Granulation in Analogy-Based Classification
   Arkadiusz Wojna
50  Approximation Spaces in Conflict Analysis: A Rough Set Framework
   Sheela Ramanna
51  Intervals in Finance and Economics: Bridge between Words and Numbers, Language of Strategy
   Manuel Tarrazo
52  Granular Computing Methods in Bioinformatics
   Julio J. Valdés

Index
Preface

In Dissertatio de Arte Combinatoria by Gottfried Wilhelm Leibniz (1666), one can find the following sentences: ‘If controversies were to arise, there would be no more need of disputation between two philosophers than between two accountants. For it would suffice to take their pencils in their hands, and say to each other: “Let us calculate” ’ and in New Essays on Human Understanding (1705) [1], ‘Languages are the best mirror of the human mind, and that a precise analysis of the signification of words would tell us more than anything else about the operations of the understanding.’ Much later, methods based on fuzzy sets, rough sets, and other soft computing paradigms allowed us to understand that, for the calculi of thoughts discussed by Leibniz, it is necessary to develop tools for approximate reasoning about vague, non-crisp concepts. For example, humans express higher-level perceptions using vague, non-Boolean concepts. Hence, to develop truly intelligent systems, methods for approximate reasoning about such concepts should be developed in two-valued languages accessible to intelligent systems. One can gain in searching for solutions of tasks related to perceptions by using granular computing (GC). This search becomes feasible in GC because GC-based methods exploit the fact that the solutions need to satisfy non-Boolean specifications only to a satisfactory degree. Solutions in GC can often be constructed more efficiently than in the case of methods searching for detailed, purely numeric solutions. Relevant granulation leads to efficient solutions that are represented by granules matching specifications to satisfactory degrees. In an inductive approach to knowledge discovery, information granules provide a means of encapsulating perceptions about objects of interest [2–7].
No matter what problem is taken into consideration, we usually cast it into frameworks that facilitate observations about clusters of objects with common features and lead to problem formulation and problem solving with considerable acuity. Such frameworks lend themselves to problems of feature selection and feature extraction, pattern recognition, and knowledge discovery. Identification of relevant features of objects contained in information granules makes it possible to formulate hypotheses about the significance of the objects, construct new granules containing sample objects during interactions with the environment, use GC to measure the nearness of complex granules, and identify infomorphisms between systems of information granules. Consider, for instance, image processing. In spite of the continuous progress in the area, a human being assumes a dominant and very much uncontested position when it comes to understanding and interpreting images. Surely, we do not focus our attention on individual pixels but rather transform them using techniques such as nonlinear diffusion and group them together in pixel windows (complex objects) relative to selected features. The parts of an image are then drawn together in information granules containing objects (clusters of pixels) with vectors of values of functions representing object features that constitute information granule descriptions. This signals a remarkable trait of humans, who have the ability to construct information granules, compare them, recognize patterns, transform and learn from them, arrive at explanations about perceived patterns, formulate assertions, and construct approximations of granules of objects of interest. As another example, consider a collection of time series. From our perspective we can describe them in a semi-qualitative manner by pointing at specific regions of such signals. Specialists can effortlessly interpret ECG signals.
They distinguish some segments of such signals and interpret their combinations.
Experts can seamlessly interpret temporal readings of sensors and assess the status of the monitored system. Again, in all these situations, the individual samples of the signals are not the focal point of the analysis and the ensuing signal interpretation. We always granulate all phenomena (no matter if they are originally discrete or analog in their nature). Time is another important variable that is subjected to granulation. We use milliseconds, seconds, minutes, days, months, and years. Depending on the specific problem we have in mind and who the user is, the size of the information granules (time intervals) can vary quite dramatically. To high-level management, time intervals of quarters of a year or a few years can be meaningful temporal information granules on the basis of which one develops a predictive model. For those in charge of the everyday operation of a dispatching plant, minutes and hours could form a viable scale of time granulation. For the designer of high-speed integrated circuits and digital systems, the temporal information granules concern nanoseconds, microseconds, and, perhaps, milliseconds. Even such commonly encountered and simple examples are convincing enough to lead us to ascertain that (a) information granules are the key components of knowledge representation and processing, (b) the level of granularity of information granules (their size, to be more descriptive) becomes crucial to problem description and an overall strategy of problem solving, (c) there is no universal level of granularity of information; the size of granules is problem oriented and user dependent. What has been said so far touched a qualitative aspect of the problem. The challenge is to develop a computing framework within which all these representation and processing endeavors can be formally realized. The common platform emerging within this context comes under the name of granular computing.
In essence, it is an emerging paradigm of information processing that has its roots in Leibniz’s ideas [1], Cantor’s set theory, Zadeh’s fuzzy information granulation [8], and Pawlak’s discovery of elementary sets [9] (see also [10–14]). While we have already noticed a number of important conceptual and computational constructs built in the domain of system modeling, machine learning, image processing, pattern recognition, and data compression in which various abstractions (and ensuing information granules) came into existence, GC becomes innovative and intellectually proactive in several fundamental ways:
• The information granulation paradigm leads to formal frameworks that epitomize and synthesize what has been done informally in science and engineering for centuries.
• With the emergence of unified frameworks for granular processing, we get a better grasp as to the role of interaction between various, possibly distributed, GC machines and visualize infomorphisms between them that facilitate classification and approximate reasoning.
• GC brings together the existing formalisms of set theory (interval analysis), fuzzy sets, and rough sets under the same roof by clearly visualizing some fundamental commonalities and synergies.
• Interestingly, the inception of information granules is highly motivated. We do not form information granules without reason. Information granules are an evident realization of the fundamental paradigm of scientific discovery.

This volume is one of the first, if not the first, comprehensive compendium on GC. There are several fundamental goals of this project. First, by capitalizing on several fundamental and well-established frameworks of fuzzy sets, interval analysis, and rough sets, we build unified foundations of computing with information granules. Second, we offer the reader a systematic and coherent exposure of the concepts, design methodologies, and detailed algorithms. In general, we decided to adhere to the top-down strategy of the exposure of the material by starting with the ideas along with some motivating notes and afterward proceeding with the detailed design that materializes in specific algorithms, applications, and case studies. We have made the handbook self-contained to a significant extent. While an overall knowledge of GC and its subdisciplines would be helpful, the reader is provided with all necessary prerequisites. If suitable, we have augmented some parts of the material with a step-by-step explanation of more advanced concepts supported by a significant amount of illustrative numeric material. We are strong proponents of the down-to-earth presentation of the material.
While we maintain a certain required level of formalism and mathematical rigor, the ultimate goal is to present the material so
that it also emphasizes its applied side (meaning that the reader becomes fully aware of direct implications of the presented algorithms, modeling, and the like).

This handbook is aimed at a broad audience of researchers and practitioners. Owing to the nature of the material being covered and the way it is organized, we hope that it will appeal to the well-established communities including those active in computational intelligence (CI), pattern recognition, machine learning, fuzzy sets, neural networks, system modeling, and operations research. The research topic can be treated in two different ways. First, it can be seen as one of the emerging and attractive areas of CI and GC, thus attracting researchers engaged in some more specialized domains. Second, viewed as an enabling technology whose contribution goes far beyond the communities and research areas listed above, we envision a genuine interest from a vast array of research disciplines (engineering, economy, bioinformatics, etc.). We also hope that the handbook will serve as a highly useful reference material for graduate students and senior undergraduate students in a variety of courses on CI, artificial intelligence, pattern recognition, data analysis, system modeling, signal processing, operations research, numerical methods, and knowledge-based systems.

In the organization of the material we followed a top-down approach by splitting the content into three main parts. The first one, fundamentals and methodology, covers the essential background of the leading contributing technologies of GC, such as interval analysis, fuzzy sets, and rough sets. We also offer a comprehensive coverage of the underlying concepts along with their interpretation. We also elaborate on the representative techniques of GC. Special attention is paid to the development of granular constructs, say, fuzzy sets, that serve as generic abstract constructs reflecting our perception of the world and a way of effective problem solving.
A number of highly representative algorithms (say, cognitive maps) are presented. Next, in Part II, we move on to the hybrid constructs of GC where a variety of symbiotic developments of information granules, such as interval-valued fuzzy sets, type-2 fuzzy sets, and shadowed sets, are considered. In the last part, we concentrate on a diversity of applications and case studies.

W. Pedrycz gratefully acknowledges the support from the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chair program. Andrzej Skowron has been supported by the grant from the Ministry of Scientific Research and Information Technology of the Republic of Poland. Our thanks go to the authors who enthusiastically embraced the idea and energetically agreed to share their expertise and research results in numerous domains of GC. The reviewers offered their constructive thoughts on the submissions, which were of immense help and contributed to the quality of the content of the handbook. We are grateful for the truly professional support we have received from the staff of John Wiley, especially Kate Griffiths and Debbie Cox, who always provided us with words of encouragement and advice that helped us keep the project on schedule.

Editors-in-Chief
Edmonton – Warsaw – El Paso
May 2007
References

[1] G.W. Leibniz. New Essays on Human Understanding (1705). Cambridge University Press, Cambridge, UK, 1982.
[2] L.A. Zadeh. Fuzzy sets and information granularity. In: M.M. Gupta, R.K. Ragade, and R.R. Yager (eds), Advances in Fuzzy Set Theory and Applications. North-Holland, Amsterdam, 1979, 3–18.
[3] L.A. Zadeh. Toward a generalized theory of uncertainty (GTU) – an outline. Inf. Sci. 172 (2005) 1–40.
[4] Z. Pawlak. Information systems – theoretical foundations. Inf. Syst. 6(3) (1981) 205–218.
[5] J.F. Peters and A. Skowron. Zdzisław Pawlak: Life and work. Transactions on Rough Sets V. Springer Lect. Not. Comput. Sci. 4100 (2006) 1–24.
[6] Z. Pawlak and A. Skowron. Rudiments of rough sets. Inf. Sci. 177(1) (2007) 3–27.
[7] A. Bargiela and W. Pedrycz. Granular Computing: An Introduction. Kluwer Academic Publishers, Dordrecht, 2003.
[8] L.A. Zadeh. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 90 (1997) 111–127.
[9] Z. Pawlak. Rough sets. In: Theoretical Aspects of Reasoning About Data – Theory and Decision Library, Series D: System Theory, Knowledge Engineering and Problem Solving, Vol. 9. Kluwer Academic Publishers, Dordrecht, 1991.
[10] J. Hobbs. Granulation. In: Proceedings of the 9th IJCAI 85, Los Angeles, California, August 18–23, 1985, pp. 432–435.
[11] Z. Pawlak. Rough sets. Int. J. Comput. Inf. Sci. 11 (1982) 341–356.
[12] Z. Pawlak. Rough Sets. Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Dordrecht, 1991.
[13] W. Pedrycz (ed). Granular Computing: An Emerging Paradigm. Physica-Verlag, Heidelberg, 2001.
[14] S.K. Pal, L. Polkowski, and A. Skowron (eds). Rough-Neural Computing: Techniques for Computing with Words. Cognitive Technologies, Springer-Verlag, Heidelberg, 2004.
Foreword

Granular Computing – coauthored by professors A. Bargiela and W. Pedrycz, and published in 2003 – was the first book on granular computing [1]. It was a superlative work in all respects. Handbook of Granular Computing is a worthy successor. Significantly, the coeditors of the handbook, Professors Pedrycz, Skowron, and Kreinovich are, respectively, the leading contributors to the closely interrelated fields of granular computing, rough set theory, and interval analysis – an interrelationship which is accorded considerable attention in the handbook. The articles in the handbook are divided into three groups: foundations of granular computing, interval analysis, fuzzy set theory, and rough set theory; hybrid methods and models of granular computing; and applications and case studies. One cannot but be greatly impressed by the vast panorama of applications extending from medical informatics and data mining to time-series forecasting and the internet. Throughout the handbook, the exposition is aimed at reader friendliness and deserves high marks in all respects.

What is granular computing? The preface and the chapters of this handbook provide a comprehensive answer to this question. In the following, I take the liberty of sketching my perception of granular computing – a perception in which the concept of a generalized constraint plays a pivotal role. An earlier view may be found in my 1998 paper ‘Some reflections on soft computing, granular computing and their roles in the conception, design and utilization of information/intelligent systems’ [2]. Basically, granular computing differs from conventional modes of computation in that the objects of computation are not values of variables but information about values of variables. Furthermore, information is allowed to be imperfect; i.e., it may be imprecise, uncertain, incomplete, conflicting, or partially true.
It is this facet of granular computing that endows granular computing with a capability to deal with real-world problems which are beyond the reach of bivalent-logic-based methods which are intolerant of imprecision and partial truth. In particular, through the use of generalized-constraint-based semantics, granular computing has the capability to compute with information described in natural language.

Granular computing is based on fuzzy logic. There are many misconceptions about fuzzy logic. To begin with, fuzzy logic is not fuzzy. Basically, fuzzy logic is a precise logic of imprecision. Fuzzy logic is inspired by two remarkable human capabilities. First, the capability to reason and make decisions in an environment of imprecision, uncertainty, incompleteness of information, and partiality of truth. And second, the capability to perform a wide variety of physical and mental tasks based on perceptions, without any measurements and any computations. The basic concepts of graduation and granulation form the core of fuzzy logic, and are the principal distinguishing features of fuzzy logic. More specifically, in fuzzy logic everything is or is allowed to be graduated, i.e., be a matter of degree or, equivalently, fuzzy. Furthermore, in fuzzy logic everything is or is allowed to be granulated, with a granule being a clump of attribute values drawn together by indistinguishability, similarity, proximity, or functionality. The concept of a generalized constraint serves to treat a granule as an object of computation. Graduated granulation, or equivalently fuzzy granulation, is a unique feature of fuzzy logic. Graduated granulation is inspired by the way in which humans deal with complexity and imprecision. The concepts of graduation, granulation, and graduated granulation play key roles in granular computing. Graduated granulation underlies the concept of a linguistic variable, i.e., a variable whose values are words rather than numbers.
In retrospect, this concept, in combination with the associated concept of a fuzzy if–then rule, may be viewed as a first step toward granular computing.
Today, the concept of a linguistic variable is used in almost all applications of fuzzy logic. When I introduced this concept in my 1973 paper ‘Outline of a new approach to the analysis of complex systems and decision processes’ [3], I was greeted with scorn and derision rather than with accolades. The derisive comments reflected a deep-seated tradition in science – the tradition of according much more respect to numbers than to words. Thus, in science, progress is equated to progression from words to numbers. In fuzzy logic, in moving from numerical to linguistic variables, we are moving in a countertraditional direction. What the critics did not understand is that in moving in the countertraditional direction, we are sacrificing precision to achieve important advantages down the line. This is what is called ‘the fuzzy logic gambit.’ The fuzzy logic gambit is one of the principal rationales for the use of granular computing.

In sum, to say that the Handbook of Granular Computing is an important contribution to the literature is an understatement. It is a work whose importance cannot be exaggerated. The coeditors, the authors, and the publisher, John Wiley, deserve our thanks, congratulations, and loud applause.

Lotfi A. Zadeh
Berkeley, California
References

[1] A. Bargiela and W. Pedrycz. Granular Computing: An Introduction. Kluwer Academic Publishers, Dordrecht, 2003.
[2] L.A. Zadeh. Some reflections on soft computing, granular computing and their roles in the conception, design and utilization of information/intelligent systems. Soft Comput. 2 (1998) 23–25.
[3] L.A. Zadeh. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Syst. Man Cybern. SMC-3 (1973) 28–44.
Biographies

Witold Pedrycz (M’88–SM’90–F’99) received the MSc, PhD, and DSci from the Silesian University of Technology, Gliwice, Poland. He is a professor and Canada Research Chair in computational intelligence in the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada. He is also with the Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland. His research interests encompass computational intelligence, fuzzy modeling, knowledge discovery and data mining, fuzzy control including fuzzy controllers, pattern recognition, knowledge-based neural networks, granular and relational computing, and software engineering. He has published numerous papers in these areas. He is also an author of 11 research monographs. Witold Pedrycz has been a member of numerous program committees of IEEE conferences in the area of fuzzy sets and neurocomputing. He serves as an editor-in-chief of IEEE Transactions on Systems Man and Cybernetics – Part A and associate editor of IEEE Transactions on Fuzzy Systems. He is also an editor-in-chief of Information Sciences. Dr. Pedrycz is a recipient of the prestigious Norbert Wiener Award from the IEEE Society of Systems, Man, and Cybernetics as well as the K.S. Fu Award from the North American Fuzzy Information Society.

Andrzej Skowron received the PhD and DSci from the University of Warsaw in Poland. In 1991 he received the Scientific Title of Professor. He is a Full Professor in the Faculty of Mathematics, Computer Science and Mechanics at Warsaw University. Andrzej Skowron is the author of numerous scientific publications and editor of many books and special issues of scientific journals.
His areas of expertise include reasoning with incomplete information, approximate reasoning, soft computing methods and applications, rough sets, rough mereology, granular computing, synthesis and analysis of complex objects, intelligent agents, knowledge discovery systems, advanced data mining techniques, decision support systems, and adaptive and autonomous systems. He was the supervisor of more than 20 PhD theses. He was also involved in several national and international research and commercial projects relating to data mining (fraud detection and web mining), control of unmanned vehicles, medical decision support systems, and approximate reasoning in distributed environments, among many others. Since 1995 he has been the editor-in-chief of the journal Fundamenta Informaticae and a member of the editorial boards of several other journals including Knowledge Discovery and Data. He is the co-editor-in-chief of the journal LNCS Transactions on Rough Sets published by Springer. Andrzej Skowron was the president of the International Rough Set Society from 1996 to 2000. He has served or is currently serving on the program committees of almost 100 international conferences and workshops as program committee member, program chair, or co-chair. He has delivered numerous invited talks at international conferences, including a plenary talk at the 16th IFIP World Computer Congress (Beijing, 2000). Throughout his career, Andrzej Skowron has won many awards for his achievements, including awards from the Ministry of Science, the Rector of Warsaw University, the Ministry of Education, Mazur’s Award of the Polish Mathematical Society, and Janiszewski’s Award of the Polish Mathematical Society. In 2003 he received the title of Honorary Professor from Chongqing University of Post and Telecommunication (China). In 2005 he received the ACM Recognition of Service Award for contributions to ACM and the award from the International Rough Sets Society for outstanding research results.

Dr. Vladik Kreinovich received his MSc in mathematics and computer science from St Petersburg University, Russia, in 1974 and PhD from the Institute of Mathematics, Soviet Academy of Sciences,
Novosibirsk, in 1979. In 1975–1980, he worked with the Soviet Academy of Sciences, in particular, in 1978–1980, with the Special Astrophysical Observatory (representation and processing of uncertainty in radioastronomy). In 1982–1989, he worked on error estimation and intelligent information processing for the National Institute for Electrical Measuring Instruments, Russia. In 1989, he was a visiting scholar at Stanford University. Since 1990, he has been with the Department of Computer Science, University of Texas at El Paso. He has also served as an invited professor in Paris (University of Paris VI), Hong Kong, St Petersburg, Russia, and Brazil. His main interests include representation and processing of uncertainty, especially interval computations and intelligent control. He has published 3 books, 6 edited books, and more than 700 papers. He is a member of the editorial board of the international journal Reliable Computing (formerly, Interval Computations) and several other journals. He is also the co-maintainer of the international website on interval computations, http://www.cs.utep.edu/intervalcomp. He is a foreign member of the Russian Academy of Metrological Sciences, recipient of the 2003 El Paso Energy Foundation Faculty Achievement Award for Research awarded by the University of Texas at El Paso, and a co-recipient of the 2005 Star Award from the University of Texas System.

René Alt is a professor of computer sciences at the Pierre et Marie Curie University in Paris (UPMC). He received his master diploma in mathematics from UPMC in 1968, the Doctorate in Computer Sciences (PhD) of UPMC in 1971, and was Docteur ès Sciences from UPMC in 1981. He was a professor of computer sciences at the University of Caen (France) from 1985 to 1991. He was head of the faculty of computer sciences of UPMC from 1997 to 2001 and vice president of the administrative council of UPMC from 2002 to 2006.
René Alt’s fields of interest are the numerical solution of differential equations, computer arithmetic, roundoff error propagation, validation of numerical software, parallel computing, and image processing.
S. Asharaf received the BTech from the Cochin University of Science and Technology, Kerala, and the Master of Engineering from the Indian Institute of Science, where he is working toward a PhD. His research interests include data clustering, soft computing, and support vector machines. He was one of the recipients of the IBM best PhD student award in 2006.
Rosangela Ballini received her BSc degree in applied mathematics from the Federal University of São Carlos (UFSCar), SP, Brazil, in 1996. In 1998, she received the MSc degree in mathematics and computer science from the University of São Paulo (USP), SP, Brazil, and the PhD degree in electrical engineering from the State University of Campinas (Unicamp), SP, Brazil, in 2000. Currently, she is a professor in the Department of Economic Theory, Institute of Economics (IE), Unicamp. Her research interests include time series forecasting, neural networks, fuzzy systems, and nonlinear optimization.
Mohua Banerjee received her BSc (Hons) degree in mathematics, and the MSc, MPhil, and PhD degrees in pure mathematics from the University of Calcutta in 1985, 1987, 1989, and 1995, respectively. During 1995–1997, she was a research associate at the Machine Intelligence Unit, Indian Statistical Institute, Calcutta. In 1997, she joined the Department of Mathematics and Statistics, Indian Institute of Technology, Kanpur, as a lecturer, and is currently an Assistant Professor in the same department. She was an associate of The Institute of Mathematical Sciences, Chennai, India, during 2003–2005. Her main research interests lie in modal logics and rough sets. She has made several research visits to institutes in India and abroad.
She is a member of the Working Group for the Center for Research in Discrete Mathematics and its Applications (CARDMATH), Department of Science and Technology (DST), Government of India. She serves on the reviewer panel of many international journals. Dr. Banerjee was awarded the Indian National Science Academy Medal for Young Scientists in 1995.
Edurne Barrenechea is an assistant lecturer at the Department of Automatics and Computation, Public University of Navarra, Spain. Having received an MSc in computer science at the Pais Vasco University in 1990, she worked as an analyst programmer at Bombas Itur from 1990 to 2001 and then joined the Public University of Navarra as an associate lecturer. She obtained her PhD in computer science in 2005.
Her research interests are fuzzy techniques for image processing, fuzzy set theory, interval type-2 fuzzy set theory, neural networks, and industrial applications of soft computing techniques. She is a member of the European Society for Fuzzy Logic and Technology (EUSFLAT).
Jan G. Bazan is an Assistant Professor in the Institute of Mathematics at the University of Rzeszow in Poland. He received his PhD degree in 1999 from the University of Warsaw in Poland. His recent research interests focus on rough set theory, granular computing, knowledge discovery, data mining techniques, reasoning with incomplete information, approximate reasoning, decision support systems, and adaptive systems. He is the author or coauthor of more than 40 scientific publications and has been involved in several national and international research projects relating to fraud detection, web mining, risk pattern detection, and automated planning of treatment, among other topics.
Taner Bilgiç received his BSc and MSc in industrial engineering from the Middle East Technical University, Ankara, Turkey, in 1987 and 1990, respectively. He received a PhD in industrial engineering from the University of Toronto in 1995. The title of his dissertation is ‘Measurement-Theoretic Frameworks for Fuzzy Set Theory with Applications to Preference Modelling.’ He spent 2 years at the Enterprise Integration Laboratory in Toronto as a research associate. Since 1997, he has been a faculty member at the Department of Industrial Engineering at Bogazici University in Istanbul, Turkey.
Isabelle Bloch is a professor at ENST (Signal and Image Processing Department), CNRS UMR 5141 LTCI. Her research interests include three-dimensional (3D) image and object processing, 3D and fuzzy mathematical morphology, decision theory, information fusion, fuzzy set theory, belief function theory, structural pattern recognition, spatial reasoning, and medical imaging.
Gloria Bordogna received her Laurea degree in Physics at the Università degli Studi di Milano, Italy, in 1984. In 1986 she joined the Italian National Research Council, where she presently holds the position of senior researcher at the Institute for the Dynamics of Environmental Processes. She is also a contract professor at the Faculty of Engineering of Bergamo University, where she teaches information retrieval and geographic information systems. Her research activity concerns soft computing techniques for managing imprecision and uncertainty affecting both textual and spatial information. She is co-editor of a special issue of JASIS and three volumes published by Springer-Verlag on uncertainty and imprecision management in databases. She has published over 100 papers in international journals, in the proceedings of international conferences, and in books. She has served on the program committees of international conferences such as FUZZ-IEEE, ECIR, ACM SIGIR, FQAS, EUROFUSE, IJCAI 2007, ICDE 2007, and the ACM SAC ‘Information Access and Retrieval’ track, and as a reviewer for journals such as JASIST, IEEE Transactions on Fuzzy Systems, Fuzzy Sets and Systems, and Information Processing and Management.
Giovanni Bortolan received the doctoral degree from the University of Padova, Padova, Italy, in 1978. He is a senior researcher at the Institute of Biomedical Engineering, Italian National Research Council (ISIB-CNR), Padova, Italy. He has published numerous papers in the areas of medical informatics and applied fuzzy sets. He is actively pursuing research in medical informatics in computerized electrocardiography, neural networks, fuzzy sets, data mining, and pattern recognition.
Humberto Bustince is an Associate Professor at the Department of Automatics and Computation, Public University of Navarra, Spain. He received his PhD degree in mathematics from the Public University of Navarra in 1994.
His research interests are fuzzy logic theory, extensions of fuzzy sets (type-2 fuzzy sets and Atanassov’s intuitionistic fuzzy sets), fuzzy measures, aggregation operators, and fuzzy techniques for image processing. He is the author of more than 30 peer-reviewed research papers and is a member of the IEEE and the European Society for Fuzzy Logic and Technology (EUSFLAT).
Cory J. Butz received the BSc, MSc, and PhD degrees in computer science from the University of Regina, Saskatchewan, Canada, in 1994, 1996, and 2000, respectively. His research interests include uncertainty reasoning, database systems, information retrieval, and data mining.
Oscar Castillo was awarded the Doctor of Science (DSc) degree by the Polish Academy of Sciences. He is a professor of computer science in the Graduate Division, Tijuana Institute of Technology, Tijuana, Mexico. In addition, he is serving as research director of computer science and head of the research group on fuzzy logic and genetic algorithms. Currently, he is president of the Hispanic American Fuzzy Systems Association (HAFSA) and vice president of the International Fuzzy Systems Association (IFSA) in charge of publicity. Professor Castillo is also vice chair of the Mexican Chapter of the Computational Intelligence Society (IEEE) and general chair of the IFSA 2007 World Congress to be held in Cancun, Mexico. He also belongs to the Technical Committee on Fuzzy Systems of IEEE and to the Task Force on ‘Extensions to Type-1 Fuzzy Systems.’ His research interests are in type-2 fuzzy logic, intuitionistic fuzzy logic, fuzzy control, neuro–fuzzy, and genetic–fuzzy hybrid approaches. He has published over 60 journal papers, 5 authored books, 10 edited books, and 150 papers in conference proceedings.
Gianpiero Cattaneo is a Full Professor in ‘dynamical system theory’ at the Università di Milano, Bicocca. Previously, he was an Associate Professor in ‘mathematical methods of physics’ (from 1974 to 1984) and a researcher in ‘theoretical physics’ (from 1968 to 1974). From 1994 to 1997, he was a regular visiting professor at the London School of Economics (Department of Logic and Scientific Methods), where, since 1998, he has held a position of research associate at ‘The Centre for the Philosophy of Natural and Social Science.’ From 1997 to 1999, he was Maitre de Conferences at the Nancy-Metz Academy and Maitre de Conferences at ‘la Ecole Normale Superieure’ in Lyon: Laboratoire de l’Informatique du Parallélisme.
He is a member of the editorial board of the Transactions on Rough Sets, LNCS (Springer-Verlag); the Scientific Committee of the ‘International Quantum Structures Association (IQSA)’; the International Advisory Board of the ‘European School of Advanced Studies in Methods for Management of Complex Systems’ (Pavia); and the International Federation of Information Processing (IFIP) Working Group on cellular automata. Moreover, he is scientific coordinator of a biennial 2006–2007 ‘Program of International Collaboration’ between France and Italy, involving the universities of Nice, Marseille, Ecole Normale Superieure de Lyon, Marne-la-Vallée, Milano-Bicocca, and Bologna. He has been a member of numerous program committees of international conferences. His research activities, with results published in more than 140 papers in international journals, are centered on topological chaos, cellular automata and related languages, the algebraic approach to fuzzy logic and rough sets, axiomatic foundations of quantum mechanics, and realization of reversible gates by quantum computing techniques.
Davide Ciucci received a PhD in computer science from the University of Milan in 2004. Since 2005, he has held a permanent position as a researcher at the University of Milano-Bicocca, where he has taught a course on fuzzy logic and rough sets. His research interests concern a theoretical algebraic approach to imprecision, with particular attention to many-valued logics, rough sets, and their relationship. Recently, he has become involved in the semantic web area, with a special interest in fuzzy ontology and fuzzy description logics. He has been a committee member of several conferences on rough and fuzzy sets and co-organizer of a special session at the Joint Rough Set Symposium JRS07. His webpages, with a list of publications, can be found at www.fislab.disco.unimib.it.
Chris Cornelis is a postdoctoral researcher at the Department of Applied Mathematics and Computer Science at Ghent University (Belgium), funded by the Research Foundation – Flanders. His research interests include various models of imperfection (fuzzy rough sets, bilattices, and interval-valued fuzzy sets); he is currently focusing on their application to personalized information access and web intelligence.
Martine De Cock is a professor at the Department of Applied Mathematics and Computer Science at Ghent University (Belgium). Her current research efforts are directed toward the development and use of computational intelligence methods for next-generation web applications.
E.I. Papageorgiou was born in Larisa, Greece, in 1975. She obtained her physics degree in 1997, her MSc in medical physics in 2000, and her PhD in computer science in July 2004 from the University of Patras. From 2004 to 2006, she was a postdoctoral researcher at the Department of Electrical and Computer
Engineering, University of Patras (Greece), working on new models and methodologies based on soft computing for medical decision support systems. From 2000 to 2006, she was involved in several research projects related to the development of new algorithms and methods for complex diagnostic and medical decision support systems. Her main activities were the development of innovative learning algorithms for fuzzy cognitive maps and intelligent expert systems for medical diagnosis and decision-making tasks. From 2004 to 2005, she was a lecturer at the Department of Electrical and Computer Engineering at the University of Patras. Currently, she is an Assistant Professor at the Department of Informatics and Computer Technology, Technological Educational Institute of Lamia, and an adjunct Assistant Professor at the University of Central Greece. She has coauthored more than 40 journal and conference papers, book chapters, and technical reports, and her work has received more than 50 citations. Her interests include expert systems, intelligent algorithms and computational intelligence techniques, intelligent decision support systems, and artificial intelligence techniques for medical applications. Dr. E.I. Papageorgiou held a scholarship from the Greek State Scholarship Foundation ‘I.K.Y.’ during her PhD studies (2000–2004), and from 2006 to May 2007 she held a postdoctoral research fellowship from the same foundation.
Paulo Fazendeiro received the BS degree in mathematics and informatics in 1995 (with honors) and the equivalent of an MS degree in computer science in 2001, both from the University of Beira Interior, Portugal. He is preparing his dissertation on the relationships between accuracy and interpretability of fuzzy systems as a partial fulfillment of the requirements for the informatics engineering PhD degree.
He joined the University of Beira Interior in 1995, where he is currently a lecturer in the Informatics Department. His research interests include application of fuzzy set theory and fuzzy systems, data mining, evolutionary algorithms, multiobjective optimization, and clustering techniques with applications to image processing. Dr. Fazendeiro is a member of the Portuguese Telecommunications Institute and the Informatics Laboratory of the University of Algarve.
Mario Fedrizzi received the MSc degree in mathematics in 1973 from the University of Padua, Italy. Since 1976, he has been an Assistant Professor; since 1981, an Associate Professor; and since 1986, a Full Professor with Trento University, Italy. He served as chairman of the Institute of Informatics from 1985 to 1991 and as dean of the Faculty of Economics and Business Administration from 1989 to 1995. His research has focused on utility and risk theory, stochastic dominance, group decision making, fuzzy decision analysis, fuzzy regression analysis, consensus modeling in uncertain environments, and decision support systems. He has authored or coauthored books and more than 150 papers, which appeared in international proceedings and journals, e.g., European Journal of Operational Research, Fuzzy Sets and Systems, IEEE Transactions on Systems, Man and Cybernetics, Mathematical Social Sciences, Quality and Quantity, and International Journal of Intelligent Systems. He was also involved in consulting activities in the areas of information systems and DSS design and implementation, office automation, quality control, project management, expert systems, and neural nets in financial planning. From 1995 to 2006, he served as chairman of a bank and of a real-estate company, and as a member of the board of directors of Cedacri, the largest Italian banking information systems outsourcing company, and of Unicredit Banca.
Fernando Antonio Campos Gomide received the BSc degree in electrical engineering from the Polytechnic Institute of the Pontifical Catholic University of Minas Gerais (IPUC/PUC-MG), Belo Horizonte, Brazil; the MSc degree in electrical engineering from the State University of Campinas (Unicamp), Campinas, Brazil; and the PhD degree in systems engineering from Case Western Reserve University (CWRU), Cleveland, Ohio, USA. He has been a professor in the Department of Computer Engineering and Automation (DCA), Faculty of Electrical and Computer Engineering (FEEC) of Unicamp, since 1983. His areas of interest include fuzzy systems, neural and evolutionary computation, modeling, control and optimization, logistics, decision making, and applications. Currently, he serves on the editorial boards of Fuzzy Sets and Systems, Intelligent Automation and Soft Computing, IEEE Transactions on SMC-B, Fuzzy Optimization and Decision Making, and Mathware and Soft Computing. He is a regional editor of the International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, and Journal of Advanced Computational Intelligence.
Anna Gomolińska received a PhD in mathematics from Warsaw University in 1993. Her doctoral thesis, written under the supervision of Cecilia M. Rauszer, was entitled ‘Logical Methods of Knowledge Representation Under Incomplete Information.’ She works as a teacher in the Department of Mathematics of Bialystok University. She was a visiting scholar at Uppsala University in 1994 as well as a research fellow at the Swedish Collegium for Advanced Studies (SCAS) in Uppsala in 1995 and at the CNR Institute of Computer Science (IASI-CNR) in Rome in 2002. Anna Gomolińska is the author or coauthor of around 40 research articles. Her scientific interests are rough sets, multiagent systems, game theory, and logical aspects of computer science and artificial intelligence. Since 2001 she has been a member of the research group led by Professor Andrzej Skowron from Warsaw University.
Siegfried Gottwald, born in 1943, has taught mathematics and logic at Leipzig University since 1972. He received his PhD in mathematics there in 1969 and his habilitation degree in 1977. He became University Docent in Logic there in 1979 and Associate Professor in 1987. Since 1992, he has been Full Professor for ‘nonclassical and mathematical logic’ at Leipzig University and head of the ‘Institute for Logic and Philosophy of Science’ there. He has been active in research on fuzzy logic, fuzzy sets, and fuzzy methodologies for over three decades. His main topics include the fundamentals of fuzzy set theory, many-valued logic and its relationship to fuzzy sets and vague notions, fuzzy relation equations and their relationship to fuzzy control, as well as fuzzy logic and approximate reasoning. He is also interested in the history and philosophy of logic and mathematics. He has published several books on many-valued logic, fuzzy sets, and their applications, was coauthor of a textbook on calculus and of a reader in the history of logic, and coedited and coauthored a biographical dictionary of mathematicians.
He was a visiting scholar at the Department of Computer Science at TH Darmstadt and at the Departments of Philosophy at the University of California in Irvine and at Indiana University in Bloomington, IN. Currently, he is area editor for ‘nonclassical logics and fuzzy set theory’ of the international journal Fuzzy Sets and Systems, member of the editorial boards of Multiple-Valued Logic and Soft Computing and of Information Sciences, as well as of the consulting board of former editors of Studia Logica. In 1992 he was honored with the research award ‘Technische Kommunikation’ of the (German) Alcatel-SEL-Foundation.
Salvatore Greco has been a Full Professor at the Faculty of Economics of Catania University since 2001. His main research interests are in the field of multicriteria decision aid (MCDA), in the application of the rough set approach to decision analysis, in the axiomatic foundation of multicriteria methodology, and in the fuzzy integral approach to MCDA. In these fields he cooperates with many researchers from different countries. He received the Best Theoretical Paper Award from the Decision Sciences Institute (Athens, 1999). Together with Benedetto Matarazzo, he organized the Seventh International Summer School on MCDA (Catania, 2000). He is the author of many articles published in international journals and specialized books. He has been a Visiting Professor at Poznan Technical University and at the University of Paris Dauphine. He has been an invited speaker at international conferences and is a referee for the leading journals in the field of decision analysis.
Maria Letizia Guerra is an Associate Professor at the University of Bologna (Italy), where she currently teaches mathematics for economics and finance. She received a PhD in computational methods for finance from Bergamo University in 1997; her current research activity examines stochastic and fuzzy models for derivatives pricing and risk management.
Daniel Gómez is a Full Professor in the Department of Statistics and Operational Research III at the Faculty of Statistics, Complutense University of Madrid, Spain. He received his PhD in mathematics from Complutense University in 2003. He is the author of more than 20 research papers in refereed journals and more than 10 papers as book chapters. His research interests are in multicriteria decision making, preference representation, aggregation, classification problems, fuzzy sets, and graph theory.
Jerzy W. Grzymala-Busse has been a professor of electrical engineering and computer science at the University of Kansas since August 1993. His research interests include data mining, machine learning, knowledge discovery, expert systems, reasoning under uncertainty, and rough set theory. He has
published three books and over 200 articles. He is a member of the editorial boards of the Foundations of Computing and Decision Science, International Journal of Knowledge-Based Intelligent Engineering Systems, Fundamenta Informaticae, International Journal of Hybrid Intelligent Systems, and Transactions on Rough Sets. He is a vice president of the International Rough Set Society and a member of the Association for Computing Machinery, American Association for Artificial Intelligence, and Upsilon Pi Epsilon.
Courtney Ryan Gwaltney has a BSc degree from the University of Kansas and a PhD degree from the University of Notre Dame, both in chemical engineering. He received the 2006 Eli J. and Helen Shaheen Graduate School Award for excellence in research and teaching at Notre Dame. He is currently employed by BP.
Tu Bao Ho is a professor at the School of Knowledge Science, Japan Advanced Institute of Science and Technology, Japan. He received his MSc and PhD from Marie and Pierre Curie University in 1984 and 1987, respectively, and his habilitation from Paris Dauphine University in 1998. His research interests include knowledge-based systems, machine learning, data mining, medical informatics, and bioinformatics. Tu Bao Ho is a member of the editorial boards of the following international journals: Studia Informatica, Knowledge and Systems Sciences, Knowledge and Learning, and Business Intelligence and Data Mining. He is also an associate editor of Journal of Intelligent Information and Database Systems, a review board member of International Journal of Applied Intelligence, and a member of the Steering Committee of PAKDD (Pacific-Asia Conferences on Knowledge Discovery and Data Mining).
Frank Höppner received his MSc and PhD in computer science from the University of Braunschweig in 1996 and 2003, respectively. He is now professor for information systems at the University of Applied Sciences Braunschweig/Wolfenbüttel in Wolfsburg (Germany).
His main research interest is knowledge discovery in databases, especially clustering and the analysis of sequential data.
Eyke Hüllermeier, born in 1969, holds MS degrees in mathematics and business computing, both from the University of Paderborn (Germany). From the Computer Science Department of the same university he obtained his PhD in 1997 and a habilitation degree in 2002. He spent 2 years, from 1998 to 2000, as a visiting scientist at the Institut de Recherche en Informatique de Toulouse (France) and held appointments at the Universities of Dortmund, Marburg, and Magdeburg afterwards. Recently, he joined the Department of Mathematics and Computer Science at Marburg University (Germany), where he holds an appointment as a Full Professor and heads the Knowledge Engineering and Bioinformatics Lab. Professor Hüllermeier’s research interests include methodical foundations of machine learning and data mining, fuzzy set theory, and applications in bioinformatics. He has published numerous research papers on these topics in respected journals and major international conferences. Professor Hüllermeier is a member of the IEEE and the IEEE Computational Intelligence Society, and a board member of the European Society for Fuzzy Logic and Technology (EUSFLAT). Moreover, he is on the editorial boards of the journals Fuzzy Sets and Systems, Soft Computing, and Advances in Fuzzy Systems.
Andrzej Jankowski received his PhD from Warsaw University, where he worked for more than 15 years, involved in pioneering research on the algebraic approach to knowledge representation and reasoning structures based on topos theory and evolution of hierarchies of metalogics. For 3 years, he worked as a visiting professor in the Department of Computer Science at the University of North Carolina, Charlotte, USA.
He has unique experience in managing complex IT projects in Central Europe; for example, he was the inventor and project manager of complex government IT projects such as POLTAX (one of the biggest tax modernization IT projects in Central Europe) and e-POLTAX (e-forms for the tax system in Poland). He has accumulated extensive experience in the government, corporate, industry, and finance sectors. He has also supervised several AI-based commercial projects such as intelligent fraud detection and an intelligent search engine. Andrzej Jankowski is one of the founders of the Polish–Japanese Institute of Information Technology, and for 5 years he served as its deputy rector for research and teaching.
Janusz Kacprzyk holds an MSc in computer science and automatic control, a PhD in systems analysis, and a DSc in computer science; he has been a professor since 1997 and a member of the Polish Academy of Sciences since 2002. He has been with the Systems Research Institute, Polish Academy of Sciences, since 1970, currently as professor and deputy director for research. He has been a visiting professor at the University of North Carolina, University of Tennessee, Iona College, University of Trento, and Nottingham Trent University. His research interests include soft computing, fuzzy logic, and computing with words in decisions and optimization, control, database querying, and information retrieval. He was IFSA vice president in 1991–1995, served on the IFSA Council in 1995–1999, was IFSA treasurer in 2001–2005, and became IFSA president-elect in 2005; he is an IFSA fellow and an IEEE Fellow. He is the recipient of numerous awards, notably the 2005 IEEE CIS Pioneer Award for seminal works on multistage fuzzy control, in particular fuzzy dynamic programming, and the sixth Kaufmann Prize and Gold Medal for seminal works on the application of fuzzy logic in economy and management. He is editor of three Springer book series: Studies in Fuzziness and Soft Computing, Advances in Soft Computing, and Studies in Computational Intelligence. He serves on the editorial boards of 20 journals, is the author of 5 books, (co)editor of 30 volumes, and (co)author of 300 papers, and has been a member of the IPC at 150 conferences.
Frank Klawonn received his MSc and PhD in mathematics and computer science from the University of Braunschweig in 1988 and 1992, respectively. He was a visiting professor at Johannes Kepler University in Linz (Austria) in 1996 and at Rhodes University in Grahamstown (South Africa) in 1997. He is now the head of the Lab for Data Analysis and Pattern Recognition at the University of Applied Sciences in Wolfenbuettel (Germany). His main research interests focus on techniques for intelligent data analysis, especially clustering and classification.
He is an area editor of the International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems and a member of the editorial boards of the International Journal of Information Technology and Intelligent Computing, Fuzzy Sets and Systems, and Mathware & Soft Computing.
Erich Peter Klement received his PhD in Mathematics in 1971 from the University of Innsbruck, Austria. He is a professor of mathematics and chairman of the Department of Knowledge-Based Mathematical Systems at the Johannes Kepler University, Linz, Austria. He has held long-term visiting research positions at the University of California, Berkeley (USA), the Université Aix-Marseille II (France), and the Tokyo Institute of Technology (Japan), and he has worked as a visiting professor at the Universities of Klagenfurt (Austria), Cincinnati (Ohio, USA), and Trento (Italy). His major research interest is in the foundations of fuzzy logic and fuzzy control as well as in their application in probability and statistics, game theory, and image and signal processing. He is author/coauthor of three monographs, coeditor of six edited volumes, and author/coauthor of 90 papers in international journals and edited volumes. He has served on the editorial boards of nine international scientific journals, and he is a member of IEEE, the European Association for Fuzzy Logic and Technology, and the American and the Austrian Mathematical Societies.
Donald H. Kraft is a professor in the Department of Computer Science, Louisiana State University, Baton Rouge, LA. He is an editor of Journal of the American Society for Information Science and Technology (JASIST), and an editorial board member of Information Retrieval, International Journal of Computational Intelligence Research (IJCIR), and Journal of Digital Information Management (JDIM). In other professional activities he has served as summer faculty of the U.S. Air Force Office of Scientific Research (AFOSR) and as a research associate at Wright-Patterson Air Force Base, Ohio, USA.
He worked on a project, contracted through Research and Development Laboratories (RDL), to do an exploratory study of weighted fuzzy keyword retrieval and automatic generation of hypertext links for CASHE:PVS, a hypermedia system of human engineering documents and standards for use in design.
Yan Li received the BSc and MSc degrees in mathematics in 1998 and 2001, respectively, from the College of Computer and Mathematics, Hebei University, PR China. She received her PhD degree in computer science from the Department of Computing, the Hong Kong Polytechnic University. She is currently an Assistant Professor in the School of Computer and Mathematics, Hebei University, PR China. Her interests include fuzzy mathematics, case-based reasoning, rough set theory, and information retrieval. She is a member of the IEEE.
Youdong Lin has BS and MS degrees in chemical engineering from Tsinghua University. He received the PhD degree in chemical engineering from the University of Notre Dame, where he is currently a research associate. He received the 2004 SGI Award for Computational Sciences and Visualization for his outstanding research at Notre Dame.
Pawan Lingras’ undergraduate education at the Indian Institute of Technology, Bombay, India, was followed by graduate studies at the University of Regina, Canada. His areas of interest include artificial intelligence, information retrieval, data mining, web intelligence, and intelligent transportation systems.
Weldon A. Lodwick was born and raised in São Paulo, Brazil, to U.S. parents, where he lived through high school. He came to live in the USA and attended Muskingum College in New Concord, Ohio, USA, graduating in 1967 with a major in mathematics (honors), a minor in physics, and an emphasis in philosophy. He obtained his master’s degree from the University of Cincinnati in 1969 and a PhD in mathematics from Oregon State University. He left Oregon State University in 1977 to begin work at Michigan State University as a systems analyst for an international project on food production potential, working in the Dominican Republic, Costa Rica, Nicaragua, Honduras, and Jamaica. In addition, he developed software for food production analysis for Syria. His job consisted of developing software, geographical information systems, statistical models, linear programming models, analysis, and training for transfer to the various countries in which the project was working. While in Costa Rica, Dr. Lodwick worked directly with the Organization of American States (IICA) on some of their projects in Nicaragua and Honduras that had an emphasis similar to that of the Michigan State University project.
In 1982, he was hired by the Department of Mathematics of the University of Colorado at Denver where currently he is a Full Professor of mathematics. Vincenzo Loia received the PhD in computer science from the University of Paris VI, France, in 1989 and the bachelor degree in computer science from the University of Salerno in 1984. From 1989 he is faculty member at the University of Salerno where he teaches operatingsystembased systems and multiagentbased systems. His current position is as professor and head of the Department of Mathematics and Computer Science. He was principal investigator in a number of industrial R&D projects and in academic research projects. He is author of over 100 original research papers in international journals, book chapters, and international conference proceedings. He edited three research books around agent technology, Internet, and soft computing methodologies. He is cofounder of the Soft Computing Laboratory and founder of the Multiagent Systems Laboratory, both at the Department of Mathematics and Computer Science. He is coeditorinchief of Soft Computing, an international SpringerVerlag journal. His current research interests focus on merging soft computing and agent technology to design technologically complex environments, with particular interest in web intelligence applications. Dr. Loia is chair of the IEEE Emergent Technologies Technical Committee in the IEEE Computational Intelligence. He is also member of the International Technical Committee on Media Computing, IEEE Systems, Man and Cybernetics Society. Marina Hirota Magalh˜aes received her BSc degree in applied mathematics and computation from the State University of Campinas (Unicamp), SP, Brazil, in 2001. In 2004, she received the MSc degree in electrical engineering from the State University of Campinas (Unicamp), SP, Brazil. Currently, she is a PhD candidate in the National Institute of Research Space (INPE), SP, Brazil. 
Her research interests include time series forecasting, fuzzy systems, and neural networks. Ferdinando Di Martino is professor of computer science at the Faculty of Architecture of Naples University Federico II. Since 1990 he has participated to national and international research projects in artificial intelligence and soft computing. He has published numerous papers on wellknown international journals and his main interests concern applications of fuzzy logic to image processing, approximate reasoning, geographic information systems, and fuzzy systems.
Benedetto Matarazzo is a Full Professor at the Faculty of Economics of Catania University. He has been a member of the committees of scientific societies for operational research. He has been an organizer, program committee member, and invited speaker at many scientific conferences. He is a member of the editorial boards of the European Journal of Operational Research, Journal of Multi-Criteria Decision Analysis, and Foundations of Computing and Decision Sciences. He was chairman of the Program Committee of EURO XVI (Brussels, 1998). His research is in the fields of MCDA and rough sets. He has been an invited professor at, and cooperates with, several European universities. He received the Best Theoretical Paper Award from the Decision Sciences Institute (Athens, 1999). He is a member of the Organizing Committee of the International Summer School on MCDA, of which he organized the first (Catania, 1983) and the seventh (Catania, 2000) editions.
Patricia Melin was awarded Doctor of Science (DSc) from the Polish Academy of Sciences. She is a professor of computer science in the Graduate Division, Tijuana Institute of Technology, Tijuana, Mexico. In addition, she is serving as director of Graduate Studies in Computer Science and head of the research group on fuzzy logic and neural networks. Currently, she is vice president of the Hispanic American Fuzzy Systems Association (HAFSA) and is also chair of the Mexican Chapter of the Computational Intelligence Society (IEEE). She is also program chair of the IFSA 2007 World Congress to be held in Cancun, Mexico. She also belongs to the Committee of Women in Computational Intelligence of the IEEE and to the New York Academy of Sciences. Her research interests are in type-2 fuzzy logic, modular neural networks, pattern recognition, fuzzy control, and neuro–fuzzy and genetic–fuzzy hybrid approaches. She has published over 50 journal papers, 5 authored books, 8 edited books, and 150 papers in conference proceedings.
Jerry M. Mendel received the PhD degree in electrical engineering from the Polytechnic Institute of Brooklyn, Brooklyn, NY. Currently, he is professor of electrical engineering at the University of Southern California in Los Angeles, where he has been since 1974. He has published over 450 technical papers and is author and/or editor of eight books, including Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions (Prentice-Hall, 2001). His present research interests include type-2 fuzzy logic systems and their applications to a wide range of problems, including smart oil field technology and computing with words. He is a life fellow of the IEEE and a distinguished member of the IEEE Control Systems Society. He was president of the IEEE Control Systems Society in 1986, and is presently chairman of the Fuzzy Systems Technical Committee and an elected member of the Administrative Committee of the IEEE Computational Intelligence Society. Among his awards are the 1983 Best Transactions Paper Award of the IEEE Geoscience and Remote Sensing Society, the 1992 Signal Processing Society Paper Award, the 2002 Transactions on Fuzzy Systems Outstanding Paper Award, a 1984 IEEE Centennial Medal, an IEEE Third Millennium Medal, and a Pioneer Award from the IEEE Granular Computing Conference, May 2006, for outstanding contributions in type-2 fuzzy systems.
Radko Mesiar received his PhD degree from the Comenius University Bratislava and the DSc degree from the Czech Academy of Sciences, Prague, in 1979 and 1996, respectively. He is a professor of mathematics at the Slovak University of Technology, Bratislava, Slovakia. His major research interests are in the area of uncertainty modeling, fuzzy logic, several types of aggregation techniques, nonadditive measures, and integral theory. He is coauthor of a monograph on triangular norms, coeditor of three edited volumes, and author/coauthor of more than 100 journal papers and chapters in edited volumes. He is an associate editor of four international journals. Dr. Mesiar is a member of the European Association for Fuzzy Logic and Technology and of the Slovak Mathematical Society. He is a fellow researcher at UTIA AV CR Prague (since 1995) and at IRAFM Ostrava (since 2005).
Andrea Mesiarová-Zemánková graduated from the Faculty of Mathematics, Physics and Informatics of the Comenius University, Bratislava, in 2002. She defended her PhD thesis in July 2005 at the Mathematical Institute of the Slovak Academy of Sciences, Bratislava. At the moment, she is a researcher at the Mathematical Institute of the Slovak Academy of Sciences. Her major scientific interests are triangular norms and aggregation operators.
Sushmita Mitra is a professor at the Machine Intelligence Unit, Indian Statistical Institute, Kolkata. From 1992 to 1994 she was at RWTH, Aachen, Germany, as a DAAD fellow. She was a visiting professor in the Computer Science Departments of the University of Alberta, Edmonton, Canada, in 2004 and 2007; Meiji University, Japan, in 1999, 2004, 2005, and 2007; and Aalborg University Esbjerg, Denmark, in 2002 and 2003. Dr. Mitra received the National Talent Search Scholarship (1978–1983) from NCERT, India, the IEEE TNN Outstanding Paper Award in 1994 for her pioneering work in neuro-fuzzy computing, and the CIMPA-INRIA-UNESCO Fellowship in 1996. She is the author of the books Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing and Data Mining: Multimedia, Soft Computing, and Bioinformatics published by John Wiley. Dr. Mitra has guest edited special issues of journals, and is an associate editor of Neurocomputing. She has more than 100 research publications in refereed international journals. According to the Science Citation Index (SCI), two of her papers have been ranked third and fifteenth in the list of top-cited papers in engineering science from India during 1992–2001. Dr. Mitra is a senior member of IEEE and a fellow of the Indian National Academy of Engineering. She has served as program chair, tutorial chair, and program committee member of many international conferences. Her current research interests include data mining, pattern recognition, soft computing, image processing, and bioinformatics.
Javier Montero is an Associate Professor at the Department of Statistics and Operational Research, Faculty of Mathematics, Complutense University of Madrid, Spain. He received a PhD in mathematics from Complutense University in 1982. He is the author of more than 50 research papers in refereed journals such as Behavioral Science, European Journal of Operational Research, Fuzzy Sets and Systems, Approximate Reasoning, Intelligent Systems, General Systems, Kybernetes, IEEE Transactions on Systems, Man and Cybernetics, Information Sciences, International Journal of Remote Sensing, Journal of Algorithms, Journal of the Operational Research Society, Lecture Notes in Computer Science, Mathware and Soft Computing, New Mathematics and Natural Computation, Omega-International Journal of Management Sciences, Soft Computing, and Uncertainty, Fuzziness and Knowledge-Based Systems, plus more than 40 papers as book chapters. His research interests are in preference representation, multicriteria decision making, group decision making, system reliability theory, and classification problems, mainly viewed as applications of fuzzy set theory.
Hung Son Nguyen is an Assistant Professor at Warsaw University and a member of the International Rough Set Society. He received his MS and PhD from Warsaw University in 1994 and 1997, respectively. His main research interests are fundamentals and applications of rough set theory, data mining, text mining, granular computing, bioinformatics, intelligent multiagent systems, soft computing, and pattern recognition. On these topics he has published more than 80 research papers in edited books, international journals, and conferences. He is a coauthor of a paper that received the IEEE/WIC/ACM International Conference on Web Intelligence (WI 2005) Best Paper Award. Dr. Hung Son Nguyen is a member of the editorial board of the international journals Transactions on Rough Sets, Data Mining and Knowledge Discovery, and ERCIM News, and the assistant to the editor-in-chief of Fundamenta Informaticae. He has served as a program cochair of RSCTC’06, and as a PC member and reviewer for various other conferences and journals.
Sinh Hoa Nguyen is an Assistant Professor at the Polish–Japanese Institute of Information Technology in Warsaw, Poland. She received her MSc and PhD from Warsaw University in 1994 and 2000, respectively. Her research interests include rough set theory, data mining, granular computing, intelligent multiagent systems, soft computing, and pattern recognition; on these topics she has published more than 50 research papers in edited books, international journals, and conferences. Recently, she has concentrated on developing efficient methods for learning multilayered classifiers from data, using concept ontology as domain knowledge. Dr. Sinh Hoa Nguyen has also served as a reviewer of many journals and a PC member of various conferences.
Trung T. Nguyen received an MSc in computer science from the Department of Mathematics of Warsaw University in 1993. He is currently completing a PhD thesis at the Department of Mathematics
of Warsaw University, while working at the Polish–Japanese Institute of Information Technology in Warsaw, Poland. His principal research interests include rough sets, handwriting recognition, approximate reasoning, and machine learning.
Hajime Nobuhara is Assistant Professor in the Department of Intelligent Interaction Technologies of Tsukuba University. He was also an Assistant Professor at Tokyo Institute of Technology and a postdoctoral fellow at the University of Alberta (Canada), and he is a member of the Institute of Electrical and Electronics Engineers (IEEE). His interests mainly concern fuzzy logic and its applications to image processing, on which he has published numerous papers in well-known international journals.
Hannu Nurmi worked as a research assistant of the Academy of Finland during 1971–1973. He spent the academic year 1972–1973 as a senior ASLA-Fulbright fellow at Johns Hopkins University, Baltimore, MD. In 1973–1974, he was an assistant at the Department of Political Science, University of Turku. From 1974 till 1995, Nurmi was Associate Professor of methodology of the social sciences at the University of Turku. In 1978 he was a British Academy Wolfson fellow at the University of Essex, UK. From 1991 till 1996, he was the dean of the Faculty of Social Sciences, University of Turku. Since 1995, he has been professor of political science at the University of Turku. Nurmi spent the fall quarter of 1998 as the David and Nancy Speer/Government of Finland Professor of Finnish Studies at the University of Minnesota, USA. Currently, i.e., from 2003 till 2008, he is on leave from his political science chair, having been nominated an academy professor in the Academy of Finland. Nurmi is the author or coauthor of 10 scientific monographs and well over 150 scholarly articles. He has supervised or examined some 20 PhD theses in Finland, Norway, Germany, Czech Republic, and the Netherlands. He is an editorial board member of four international journals and one domestic scientific journal.
Miguel Pagola is an associate lecturer at the Department of Automatics and Computation, Public University of Navarra, Spain. He received his MSc in industrial engineering at the Public University of Navarra in 2000. He held a scholarship within a research project developing intelligent control strategies from 2000 to 2002 and then joined the Public University of Navarra as associate lecturer. His research interests are fuzzy techniques for image processing, fuzzy set theory, interval type-2 fuzzy set theory, fuzzy control systems, genetic algorithms, and neural networks. He was a research visitor at De Montfort University. He is a member of the European Society for Fuzzy Logic and Technology (EUSFLAT).
Sankar K. Pal is the director of the Indian Statistical Institute, Calcutta. He is also a professor, distinguished scientist, and the founding head of the Machine Intelligence Unit. He received the MTech and PhD degrees in radio physics and electronics in 1974 and 1979, respectively, from the University of Calcutta. In 1982 he received another PhD in electrical engineering along with the DIC from Imperial College, University of London. Professor Pal is a fellow of the IEEE, USA, Third World Academy of Sciences, Italy, International Association for Pattern Recognition, USA, and all four National Academies for Science/Engineering in India. His research interests include pattern recognition and machine learning, image processing, data mining, soft computing, neural nets, genetic algorithms, fuzzy sets, rough sets, web intelligence, and bioinformatics. He is a coauthor of ten books and about three hundred research publications. Professor Pal has served as an editor, associate editor, and a guest editor of a number of journals including IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Neural Networks, IEEE Computer, Pattern Recognition Letters, Neurocomputing, Information Sciences, and Fuzzy Sets and Systems.
Gabriella Pasi completed her Laurea degree in computer science at the Università degli Studi di Milano, Italy, and the PhD degree in computer science at the Université de Rennes, France. She worked as a researcher at the National Council of Research in Italy from April 1985 until February 2005. She is now an Associate Professor at the Università degli Studi di Milano Bicocca, Milano, Italy. Her research activity mainly concerns the modeling and design of flexible systems (i.e., systems able to manage imprecision and uncertainty) for the management of and access to information, such as information retrieval systems, information filtering systems, and database management systems. She also works on the definition of
techniques of multicriteria decision making and group decision making. She is a member of organizing and program committees of several international conferences. She has coedited seven books and several special issues of international journals. She has published more than 150 papers in international journals, books, and proceedings of international conferences. Since 2001 she has been a member of the editorial board of the journals Mathware and Soft Computing and ACM Applied Computing Review, and since 2006 she has been a member of the editorial board of Fuzzy Sets and Systems. She has been the coordinator of the European Project PENG (Personalized News Content Programming). This is a STREP (Specific Targeted Research or Innovation Project), within the VI Framework Programme, Priority II, Information Society Technology. She has organized several international events, including the European Summer School in Information Retrieval (ESSIR 2000) and FQAS 2006, and every year she coorganizes the track ‘Information Access and Retrieval’ within the ACM Symposium on Applied Computing.
James F. Peters, PhD (1991), is a Full Professor in the Department of Electrical and Computer Engineering (ECE) at the University of Manitoba. Currently, he is coeditor-in-chief of the Transactions on Rough Sets journal published by Springer, cofounder and researcher in the Computational Intelligence Laboratory in the ECE Department (1996–), and current member of the Steering Committee, International Rough Sets Society. Since 1997, he has published numerous articles about approximation spaces, systems that learn adaptively, and classification in refereed journals, edited volumes, international conferences, symposia, and workshops. His current research interests are in approximation spaces (near sets), pattern recognition (ethology and image processing), rough set theory, reinforcement learning, biologically inspired designs of intelligent systems (vision systems that learn), and the extension of ethology (study of behavior of biological organisms) in the investigation of intelligent systems behavior.
Lech Polkowski was born in 1946 in Poland. He graduated from Warsaw University of Technology in 1969 and from Warsaw University in 1977. He obtained his PhD in theoretical mathematics from Warsaw University in 1982 and the Doctor of Science (habilitation) in 1994 in mathematical foundations of computer science, and has been titular professor since 2000. Professor Polkowski has lectured at Warsaw University of Technology, Ohio University (Athens, Ohio, USA), and the Polish–Japanese Institute of Information Technology. His papers are quoted in monographs of topology and dimension theory. Since 1992 he has been interested in rough sets, mostly foundations and relations to theoretical paradigms of reasoning. He has published extensively on the topology of rough set spaces, logics for reasoning with rough sets, mereological foundations of rough sets, rough mereology, granulation theory, granulated data systems, multiagent systems, and rough cognitive computing.
Anna Maria Radzikowska is an Assistant Professor at the Faculty of Mathematics and Information Science, Warsaw University of Technology (Poland). Her research interests include logical and algebraic methods for representing, analyzing, and reasoning about knowledge. Currently, her research focuses on hybrid fuzzy rough approaches to analyzing data in information systems.
Sheela Ramanna received a PhD in computer science from Kansas State University. She is a Full Professor and past chair of the Applied Computer Science Department at the University of Winnipeg, Canada. She serves on the editorial board of the TRS Journal and is one of the program cochairs for the RSFDGrC’07 Conference. She is currently the secretary for the International Rough Set Society. She has served on program committees of many past international conferences, including RSEISP 2007, ECML/PKDD 2007, IAT/WI 2007, and IICAI 2007. She has published numerous papers on rough set methods in software engineering and intelligent systems. Her research interests include rough set theory in requirements engineering and software quality and methodologies for intelligent systems.
Jaroslav Ramík holds MSc and PhD degrees in mathematics from the Faculty of Mathematics and Physics, Charles University in Prague (Czech Republic). He is the author of numerous monographs, books, papers, and research works in optimization, including fuzzy mathematical programming, multicriteria decision making, fuzzy control, and scheduling. Since 1990 he has been a professor and head of the
Department of Mathematical Methods in Economics at the Silesian University Opava, School of Business Administration in Karviná.
Leon Reznik is a professor of computer science at the Rochester Institute of Technology, New York. He received his BS/MS degree in computer control systems in 1978 and a PhD degree from St Petersburg Polytechnic Institute in 1983. He has worked in both industry and academia in the areas of control, system, software and information engineering, and computer science. Professor Reznik is an author of the textbook Fuzzy Controllers (Butterworth-Heinemann, 1997) and an editor of Fuzzy System Design: Social and Engineering Applications (Physica-Verlag, 1998), Soft Computing in Measurement and Information Acquisition (Springer, 2003), and Advancing Computing and Information Sciences (Cary Graphic Arts Press, 2005). Dr. Reznik’s research has concentrated on the study and development of applications of fuzzy and soft computing models. He pioneered a new research direction in which he applies fuzzy and soft computing models to describe measurement results, with applications in sensor networks.
Shounak Roychowdhury received his BEng in computer science and engineering from Indian Institute of Science, Bangalore, India, in 1990. In 1997, he received an MS in computer science from the University of Tulsa, OK. In between, he worked as a researcher in LG’s research laboratories in Seoul, Korea. Currently, he is a senior member of technical staff at Oracle Corporation. His current interests include fuzzy theory, data mining, and databases. At the same time he is also a part-time graduate student at the University of Texas at Austin.
Susanne Saminger-Platz graduated from the Technical University Vienna, Austria, in 2000. She defended her PhD in mathematics at the Johannes Kepler University, Linz, Austria, in 2003. She is an Assistant Professor at the Department of Knowledge-Based Mathematical Systems, Johannes Kepler University, Linz, Austria, and currently on a sabbatical year at the Dipartimento di Matematica ‘Ennio De Giorgi’ of the Università del Salento, Italy. Her major research interests focus on the preservation of properties during uni- and bipolar aggregation processes and therefore relate to such diverse fields as fuzzy logic, preference and uncertainty modeling, decision making, and probabilistic metric spaces. She is author/coauthor of several journal papers and chapters in edited volumes. She is further a member of the European Association for Fuzzy Logic and Technology (EUSFLAT), of the EURO Working Group on Fuzzy Sets (EUROFUSE), and of the Austrian Mathematical Society.
Salvatore Sessa is professor of computer science at the Faculty of Architecture of Naples University Federico II. His main research interests are devoted to applications of fuzzy logic to image processing, approximate reasoning, geographic information systems, and fuzzy systems. He has published and edited several monographs and numerous papers in well-known international journals. He is coeditor of the section ‘Recent Literature’ of the journal Fuzzy Sets and Systems.
Simon C.K. Shiu is an Assistant Professor at the Department of Computing, Hong Kong Polytechnic University, Hong Kong. He received an MSc degree in computing science from the University of Newcastle Upon Tyne, UK, in 1985, an MSc degree in business systems analysis and design from City University, London, in 1986, and a PhD degree in computing from Hong Kong Polytechnic University in 1997. He worked as a system analyst and project manager between 1985 and 1990 in several business organizations in Hong Kong. His current research interests include case-based reasoning, machine learning, and soft computing. He has served as guest coeditor of a special issue on soft case-based reasoning of the journal Applied Intelligence. Dr. Shiu is a member of the British Computer Society and the IEEE.
Luke David Simoni has a BS degree in chemical engineering from the Michigan Technological University. He is currently a PhD student at the University of Notre Dame, where he holds an Arthur J. Schmitt Presidential Fellowship.
Roman Słowiński is professor and founding head of the Laboratory of Intelligent Decision Support Systems within the Institute of Computing Science, Poznan University of Technology, Poland. He received
the PhD in operations research and habilitation in computing science from the Poznan University of Technology, in 1977 and 1981, respectively. He has been professor on a European Chair at the University of Paris Dauphine and invited professor at the Swiss Federal Institute of Technology in Lausanne and at the University of Catania. His research concerns operational research and artificial intelligence, including multiple-criteria decision analysis, preference modeling, project scheduling, knowledge-based decision support in medicine, technology, and economics, and rough set theory approach to knowledge and data engineering. He is laureate of the EURO Gold Medal (1991) and Doctor Honoris Causa of Polytechnic Faculty of Mons (2000) and University of Paris Dauphine (2001). Since 1999, he has been editor-in-chief of the European Journal of Operational Research. Since 2004, he has been an elected member of the Polish Academy of Sciences. In 2005, he received the most prestigious Polish scientific award from the Foundation for Polish Science.
Laerte Sorini is an Assistant Professor in Urbino University (Italy), where he teaches mathematics and informatics. He received his BA in mathematics from Bologna University. His current research deals with numerical aspects in simulation of stochastic differential equations and in implementation of fuzzy systems. When not teaching or writing, Laerte enjoys political debate and motorcycling.
Mark Allen Stadtherr is professor of chemical and biomolecular engineering at the University of Notre Dame. He has a BChE degree from the University of Minnesota and a PhD degree from the University of Wisconsin. He was awarded the 1998 Computing in Chemical Engineering Award by the American Institute of Chemical Engineers. His research interests include the application of interval methods to global optimization, nonlinear algebraic equation solving, and systems of ordinary differential equations.
Luciano Stefanini is a Full Professor at the University of Urbino (Italy) where he currently teaches mathematics and informatics. He received his BA in mathematics from the University of Bologna in 1974 and specialized in numerical analysis in 1975. From 1975 to 1982 he was with the ENI Group for industrial research in computational mathematics and operations research. In 1982 he started with the University of Urbino. He has directed various research and applied projects in industry and in public sectors. His research activity has produced papers covering fields in numerical analysis and statistical computing, operations research, combinatorial optimization and graph theory, distribution management and transportation, geographic information systems, mathematical finance and game theory, and fuzzy numbers and calculus.
Jaroslaw Stepaniuk holds a PhD degree in mathematical foundations of computer science from the University of Warsaw in Poland and a Doctor of Science (habilitation) degree in computer science from the Institute of Computer Science, Polish Academy of Sciences. Jaroslaw Stepaniuk is associate professor in the Faculty of Computer Science at Bialystok University of Technology and is the author of more than 130 scientific publications. His areas of expertise include reasoning with incomplete information, approximate reasoning, soft computing methods and applications, rough sets, granular computing, synthesis and analysis of complex objects, intelligent agents, knowledge discovery systems, and advanced data mining techniques.
C.D. Stylios is an electrical engineer (Aristotle University of Thessaloniki, 1992); he received his PhD from the Department of Electrical and Computer Engineering, University of Patras, Greece (1999). He is Assistant Professor at the Department of Informatics and Telecommunications Technology, Technological Education Institute of Epirus, Greece, and director of the Knowledge and Intelligent Computing Laboratory (March 2006–present). Since 1999, he has been a senior researcher at the Laboratory for Automation and Robotics, University of Patras, Greece, and since 2004 he has been an external consultant at Patras Science Park. He was adjunct assistant professor at the Computer Science Department, University of Ioannina, Greece (2000–2004). He has published over 60 journal and conference papers, book chapters, and technical reports. His research interests include soft computing methods, computational intelligence techniques, modeling of complex systems, intelligent systems, decision support systems, hierarchical systems, and artificial
intelligence techniques for medical applications. He is a member of IEEE and the National Technical Chamber of Greece. Peter Sussner is an Assistant Professor at the Department of Applied Mathematics of the State University of Campinas. He also acts as a researcher for the Brazilian national science foundation CNPq and holds a membership of the IEEE Computational Intelligence Society. He has previously worked as a researcher at the Center of Computer Vision and Visualization at the University of Florida where he completed his PhD in mathematics – partially supported by a Fulbright Scholarship – in 1996. Peter Sussner has regularly published articles in refereed international journals, book chapters, and conference proceedings in the areas of artificial neural networks, fuzzy systems, computer vision, mathematical imaging, and global optimization. His current research interests include neural networks, fuzzy systems, mathematical morphology, and lattice algebra. Piotr Synak is one of the founders of Infobright, Inc., which has developed marketleading compression technologies implemented through a revolutionary, rough set theory based view of databases and data storage. He obtained his PhD in computer science in 2004 from the Polish Academy of Sciences. Since 1996 he has worked at the Polish–Japanese Institute of Information Technology in Poland and currently holds the position of Assistant Professor. He is the author of several papers related to rough sets and spatiotemporal reasoning. Manuel Tarrazo teaches corporate finance and investments courses at the School of Business of the University of San Francisco, where he is an Associate Professor of finance. His research interest includes the application of conventional (calculus, probabilistic methods, combinatorial optimization) and emerging methodologies (fuzzy sets, approximate equations, neural networks) to portfolio optimization, fixedincome analysis, asset allocation, and corporate financial planning. 
He has published research in the following journals: The European Journal of Operational Research, Applied Numerical Mathematics, Fuzzy Optimization and Decision Making, Financial Services Review, Advances in Financial Planning and Forecasting, Advances in Financial Education, Financial Technology, International Journal of Business, Journal of Applied Business and Economics, Midwest Review of Finance and Insurance, Research Papers in Management and Business, Revista Alta Dirección, and The International Journal of Business Research. In addition, he has made over 35 professional presentations, earning three ‘Best Study’ awards, and published the following monographs: ‘Practical Applications of Approximate Equations in Finance and Economics,’ Quorum Publishers, Greenwood Publishing Group, January 2001; ‘Advanced Spreadsheet Modeling for Portfolio Management,’ coauthored with Gregory Alves, Kendall/Hunt, 1996. Professor Tarrazo is a native of Spain, where he obtained a Licenciatura at the Universidad Complutense de Madrid. He worked as a financial manager before completing his doctoral education at the State University of New York at Albany, NY.
İ. Burhan Türkşen joined the Faculty of Applied Science and Engineering at the University of Toronto and became professor emeritus in 2003. In December 2005, he was appointed as the head of the department of Industrial Engineering at TOBB Economics and Technology University in Ankara, Turkey. He was the president of the International Fuzzy Systems Association (IFSA) during 1997–2001 and past president of IFSA during 2001–2003. Currently, he is the president, CEO, and CSO of Information Intelligence Corporation (IIC). He received the outstanding paper award from NAFIPS in 1986, the ‘L.A. Zadeh Best Paper Award’ from Fuzzy Theory and Technology in 1995, the ‘Science Award’ from Middle East Technical University, and an ‘Honorary Doctorate’ from Sakarya University. He is a foreign member of the Academy of Modern Sciences.
Currently, he is a fellow of IFSA, IEEE, and WIF (World Innovation Foundation). He has published around 300 papers in scientific journals and conference proceedings. More than 600 authors have made references to his published works. His book entitled An Ontological and Epistemological Perspective of Fuzzy Theory was published by Elsevier in January 2006.
Julio J. Valdés is a senior research officer at the National Research Council Canada, Institute for Information Technology. He has a PhD in mathematics and his areas of interest are artificial intelligence
(mathematical foundations of uncertainty processing and machine learning), computational intelligence (fuzzy logic, neural networks, evolutionary algorithms, rough sets, probabilistic reasoning), data mining, virtual reality, hybrid systems, image and signal processing, and pattern recognition. He is a member of the IEEE Computational Intelligence Society and the International Neural Network Society. He has been coeditor of two special issues of the Neural Network Journal and has more than 150 publications in journals and international conferences.
Marcos Eduardo Valle recently completed his PhD in applied mathematics at the State University of Campinas (UNICAMP), Brazil, under the supervision of Dr. Sussner. His doctoral research was financially supported by a scholarship from the Brazilian national science foundation CNPq. Currently, Dr. Valle is working as a visiting professor, funded by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), at the Department of Applied Mathematics at the State University of Campinas. His research interests include fuzzy set theory, neural networks, and mathematical morphology.
José Valente de Oliveira received the PhD (1996), MSc (1992), and the ‘Licenciado’ degrees in electrical and computer engineering, all from the IST, Technical University of Lisbon, Portugal. Currently, he is an Assistant Professor in the Faculty of Science and Technology of the University of Algarve, Portugal, where he served as deputy dean from 2000 to 2003. Dr. Valente de Oliveira was recently appointed director of the UALGiLAB, The University of Algarve Informatics Lab, a research laboratory whose pursuits in computational intelligence include fuzzy sets, fuzzy and intelligent systems, data mining, machine learning, and optimization. During his first sabbatical year (2004/2005) he was with the University of Alberta, Canada, as a visiting professor. Dr. 
Valente de Oliveira is an associate editor of the Journal of Intelligent & Fuzzy Systems (IOS Press) and coeditor of the book Advances in Fuzzy Clustering and Its Applications (Wiley 2007).
Mario Veniero, BSc, is senior software engineer at the LASA research group at the University of Salerno. He is an IEEE member and his main research interests are in the area of software agents, soft computing, semantic web, and distributed systems. Since 1998 he has been investigating the area of software agents and has been involved in a number of industrial R&D and academic research projects based on a hybrid approach of computational intelligence and agent technologies. He is the author of several original papers in book chapters and in international conference proceedings.
Jean Vignes has been emeritus professor at the Pierre et Marie Curie University in Paris (UPMC) since 1998. He received the diploma of mathématiques supérieures from the University of Toulouse in 1956 and the diploma of research engineer from the French Petroleum Institute (IFP) school in 1959. He was Docteur ès sciences from UPMC in 1969. He was professor of computer sciences both at IFP school from 1964 to 1998 and at UPMC from 1969 to 1998. Furthermore, he was scientific adviser at IFP from 1969 to 1998. His interest areas include computer arithmetic, roundoff error propagation, and validation of numerical software. He has created a stochastic method called CESTAC (Contrôle et Estimation Stochastique des Arrondis de Calcul) for estimating the effect of roundoff error propagation and uncertainties of data in every computed result. This method is at the origin of a software package named CADNA (Control of Accuracy and Debugging for Numerical Applications), which automatically implements the CESTAC method in scientific codes. The CESTAC method is also the basis of stochastic arithmetic. 
He received the award of computer sciences from the French Academy of Sciences for his work in the field of the estimation of the accuracy of computed results. He was also vice president of the International Association for Mathematics and Computers in Simulation (IMACS). He is a member of the editorial boards of Mathematics and Computers in Simulation, Applied Numerical Mathematics, Numerical Algorithms, and the International Journal of Pure and Applied Mathematics. He is an honorary member of IMACS.
Junzo Watada received his BSc and MS degrees in electrical engineering from Osaka City University, Japan, and PhD on ‘fuzzy analysis and its applications’ from Osaka Prefecture University, Japan. He has been a professor of management engineering, knowledge engineering, and soft computing at the Graduate School of Information, Production & Systems, Waseda University, since 2003, after having contributed for 13 years
as a professor of human informatics and knowledge engineering, to the School of Industrial Engineering at Osaka Institute of Technology, Japan. He was with the Faculty of Business Administration, Ryukoku University, for 8 years. Before moving to academia, he was with Fujitsu Ltd. Co., where he worked on development of software systems as a senior system engineer for 7 years.
Arkadiusz Wojna is an Assistant Professor at the Institute of Informatics, Warsaw University. His research interests include machine learning, analogy-based reasoning, decision support systems, data mining, and knowledge discovery. He received the PhD degree in computer science from Warsaw University in 2005. He is the author and coauthor of conference and journal publications on rough sets, analogy-based reasoning, and machine learning and coauthor of the rough set exploration system. He served on the program committees of the International Conference on Rough Sets and Current Trends in Computing (RSCTC2006), the International Conference on Rough Sets and Knowledge Technology (RSKT2006), the Joint Rough Set Symposium (JRS2007), and the Indian International Conference on Artificial Intelligence (IICAI2005 and IICAI2007).
Yiyu Yao received his BEng (1983) in computer science from Xi’an Jiaotong University, and MSc (1988) and PhD (1991) in computer science from the University of Regina. Currently, he is a professor of computer science with the Department of Computer Science, University of Regina, Canada, and an adjunct professor of International WIC Institute, Beijing University of Technology, Xi’an Jiaotong University, and Chongqing University of Posts and Telecommunication. Dr. Yao’s research interests include web intelligence, information retrieval, uncertainty management (fuzzy sets, rough sets, interval computing, and granular computing), data mining, and intelligent information systems. 
He has published over 200 papers in international journals and conferences and has been invited to give talks at many international conferences and universities.
Sławomir Zadrożny is an Associate Professor (PhD 1994, DSc 2006) at the Systems Research Institute, Polish Academy of Sciences. His current scientific interests include applications of fuzzy logic in database management systems, information retrieval, decision support, and data analysis. He is the author and coauthor of about 100 journal and conference papers. He has been involved in the design and implementation of several prototype software packages. He is also a teacher at the Warsaw School of Information Technology in Warsaw, Poland, where his interests focus on information retrieval and database management systems.
Bo Zhang, computer scientist, is a fellow of the Chinese Academy of Sciences. He was born in March 1935. He is a professor in the Computer Science and Technology Department of Tsinghua University, Beijing, China. In 1958 he graduated from the Automatic Control Department of Tsinghua University. From 1980 to 1982, he visited the University of Illinois at Urbana-Champaign, USA, as a scholar. Now he serves as the chairman of the Academic Committee of the Information Science and Technology College in Tsinghua University.
Ling Zhang, computer scientist, was born in May 1937. He is a professor in the Computer Science Department of Anhui University, Hefei, China. In 1961 he graduated from the Mathematics and Astronomy Department of Nanjing University, China. Now he serves as the director of the Artificial Intelligence Institute, Anhui University.
Part One Fundamentals and Methodology of Granular Computing Based on Interval Analysis, Fuzzy Sets and Rough Sets
1 Interval Computation as an Important Part of Granular Computing: An Introduction
Vladik Kreinovich
1.1 Brief Outline
The main goal of this chapter is to introduce interval computations to people who are interested in using the corresponding techniques. In view of this goal, we will not only describe these techniques, but also do our best to outline the problems for which these techniques were originally invented. We start by explaining why computations in general are needed in practice. Then, we describe the uncertainty related to all these practical applications and, in particular, interval uncertainty. This will bring us to the main problem of interval computations. In the following sections, we will briefly describe the history of interval computations and the main interval techniques, and we will list a few typical applications of these techniques.
1.2 Why Computations Are Needed in Practical Problems: A Brief Reminder
In accordance with the above outline, before we explain the specific role of interval computations, we will recall where and why computations in general are needed.
Let us recall what practical problems we need to solve in the first place. To understand why computations are needed in practice, let us recall what practical problems we need to solve. Crudely speaking, most practical problems can be classified into three classes:
• We want to learn what is happening in the world; in particular, we want to know the numerical values of different quantities (distances, masses, charges, coordinates, etc.).
• On the basis of these values, we would like to predict how the state of the world will change over time.
• Finally, we would like to find out what changes we need to make in the world so that these changes will lead to the desired results.
It should be emphasized that this classification is very crude: a real-life problem often involves solving subproblems of all three above-described types.
The above classification is related to the distinction between science and engineering. The above classification may sound unusual, but in reality, it is related to the well-known classification of creative activity into engineering and science:
• The tasks of learning the current state of the world and predicting the future state of the world are usually classified as science.
• The tasks of finding the appropriate change are usually classified as engineering.
Example.
• Measuring the river flow at different locations and predicting how this river flow will change over time are problems of science.
• Finding the best way to change this flow (e.g., by building dams or levees) is a problem of engineering.
Computations are needed for all three classes of problems. In the following text, we will analyze the problems of these three types one by one. We will see that in all three cases, a large amount of computation is needed.
How we learn the current state of the world: sometimes, it is (relatively) straightforward. Let us start with the first class of practical problems: the problem of learning the state of the world. As we have mentioned, this means, in particular, that we want to know the numerical values of different quantities $y$ that characterize this state. Some quantities $y$ we can simply directly measure. For example, when we want to know the current state of a patient in a hospital, we can measure the patient’s body temperature, blood pressure, weight, and many other important characteristics. In some situations, we do not even need to measure: we can simply ask an expert, and the expert will provide us with an approximate value $\tilde{y}$ of the quantity $y$.
How we learn the current state of the world: sometimes, it is not easy. Some quantities we can simply directly measure. However, many other quantities of interest are difficult or even impossible to measure or estimate directly.
Examples. Examples of such quantities include the amount of oil in a given well or the distance to a star. Let us explain this situation using the example of measuring distances:
• We can estimate the distance between two nearby houses by simply placing a measuring tape between them.
• If we are interested in measuring the distance between two cities, in principle, it is possible to do it directly, by driving or walking from one to another. (It is worth mentioning that while such a direct measurement is possible in principle, it is not a reasonable practical way.)
• If we are interested in measuring the distance to a star, then, at present, it is not possible to directly measure this distance.
How we can measure difficult-to-measure quantities. Since we cannot directly measure the values of these quantities, the only way to learn some information about them is to
• measure (or ask an expert to estimate) some other easier-to-measure quantities $x_1, \ldots, x_n$, and then
• estimate $y$ based on the measured values $\tilde{x}_i$ of these auxiliary quantities $x_i$.
Examples.
• To estimate the amount of oil in a given well, we perform seismic experiments: we set up small explosions at some locations and measure the resulting seismic waves at different distances from the location of the explosion.
• To find the distance to a faraway star, we measure the direction to the star from different locations on Earth (and/or at different seasons) and the coordinates of (and the distances between) the locations of the corresponding telescopes.
To learn the current value of the desired quantity, we often need a lot of computations. To estimate the value of the desired quantity $y$, we must know the relation between $y$ and the easier-to-measure (or easier-to-estimate) quantities $x_1, \ldots, x_n$. Specifically, we want to use the estimates of $x_i$ to come up with an estimate for $y$. Thus, the relation between $y$ and $x_i$ must be given in the form of an algorithm $f(x_1, \ldots, x_n)$ which transforms the values of $x_i$ into an estimate for $y$. Once we know this algorithm $f$ and the measured values $\tilde{x}_i$ of the auxiliary quantities, we can estimate $y$ as $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n)$.
[Figure: block diagram in which the inputs $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ enter the algorithm $f$, which outputs $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n)$.]
In different practical situations, we have algorithms f of different complexity. For example, to find the distance to a star, we can usually have an explicit analytical formula coming from geometry. In this case, f is a simple formula. On the other hand, to find the amount of oil, we must numerically solve a complex partial differential equation. In this case, f is a complex iterative algorithm for solving this equation. There are many such practical cases when the algorithm f requires a lot of computations. Thus, the need to learn the current state of the world indeed often leads to the need to perform a large number of computations.
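As a minimal sketch of the "simple formula" case, the following Python fragment performs an indirect measurement by plane triangulation: the measured inputs are a baseline length and two sighting angles, and the algorithm f is an explicit geometric formula. The function name, the flat-geometry model, and the sample numbers are illustrative assumptions, not taken from the chapter.

```python
import math


def distance_by_triangulation(baseline, angle_a, angle_b):
    """Indirect measurement: distance from a baseline to a target.

    baseline : measured distance between the two observation points
    angle_a, angle_b : measured angles (in radians) between the baseline
        and the line of sight to the target at each observation point
    """
    # Plane geometry: h = b / (cot(a) + cot(b)).
    return baseline / (1.0 / math.tan(angle_a) + 1.0 / math.tan(angle_b))


# Hypothetical measured values of the auxiliary quantities (assumed data).
baseline_est = 2.0
angle_a_est = math.radians(45.0)
angle_b_est = math.radians(45.0)

# The estimate of y is obtained by applying f to the measured inputs.
y_est = distance_by_triangulation(baseline_est, angle_a_est, angle_b_est)
print(y_est)  # approximately 1
```

Here the three measured inputs play the role of the estimates of x1, x2, x3, and the returned value plays the role of the estimate of y.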
Comment: the notion of indirect measurement. We started with the situation in which we cannot estimate the value of the desired quantity $y$ by simply directly measuring (or directly estimating) this value. In such situations, we can use the above two-stage process, as a result of which we get an indirect estimate for $y$. In the case when the values $\tilde{x}_i$ are obtained by measurement, this two-stage process does involve measurement. To distinguish it from direct measurements (i.e., measurements which directly measure the values of the desired quantity), the above two-stage process is called an indirect measurement.
Computations are needed to predict the future state of the world. Once we know the values of the quantities $y_1, \ldots, y_m$ which characterize the current state of the world, we can start predicting the future state of the world, i.e., the future values of these quantities. To be able to predict the future value $z$ of each of these quantities, we must know exactly how this value $z$ depends on the current values $y_1, \ldots, y_m$. Specifically, we want to use the known estimates $\tilde{y}_i$ for $y_i$ to come up with an estimate for $z$. Thus, the relation between $z$ and $y_i$ must be given in the form of an algorithm $g(y_1, \ldots, y_m)$ which transforms the values of $y_i$ into an estimate for $z$. Once we know this algorithm $g$ and the estimates $\tilde{y}_i$ for the current values of the quantities $y_i$, we can estimate $z$ as $\tilde{z} = g(\tilde{y}_1, \ldots, \tilde{y}_m)$. Again, the corresponding algorithm $g$ can be very complicated and time consuming. So, we often need a large number of computations to make the desired predictions. This is, e.g., how weather is predicted now: weather prediction requires so many computations that it can only be performed on fast supercomputers.
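To make the prediction step concrete, here is a deliberately tiny sketch in which the prediction algorithm g is a constant-velocity extrapolation. The model, the function name, and the numbers are assumptions chosen for illustration; a realistic g, such as a weather model, would be far more complex.

```python
def predict_position(position_est, velocity_est, horizon):
    """Prediction algorithm g (assumed model): constant-velocity extrapolation."""
    return position_est + velocity_est * horizon


# Hypothetical estimates of the current state, e.g., obtained earlier by
# direct or indirect measurement (assumed values).
y1_est = 10.0  # estimated current position
y2_est = 2.5   # estimated current velocity

# Predicted future value, obtained by applying g to the current estimates.
z_est = predict_position(y1_est, y2_est, horizon=4.0)
print(z_est)  # 20.0
```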
The general notion of data processing. So far, we have analyzed two different classes of practical problems:
• the problem of learning the current state of the world (i.e., the problem of indirect measurement) and
• the problem of predicting the future state of the world.
From the practical viewpoint, these two problems are drastically different. However, as we have seen, from the computational viewpoint, these two problems are very similar. In both problems,
• we start with the estimates $\tilde{x}_1, \ldots, \tilde{x}_n$ for the quantities $x_1, \ldots, x_n$, and then
• we apply the known algorithm $f$ to these estimates, resulting in an estimate $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n)$ for the desired quantity $y$.
In both cases, this algorithm can be very time consuming. The corresponding (often time consuming) computational part of each of these two classes of problems – applying a known algorithm to the known values – is called data processing.
Comment. Since the computational parts of these two classes of problems are similar, it is important to describe the difference between these two classes of problems. As we can see from the above descriptions, the only difference between the two classes is where the original inputs come from:
• In the problem of learning the current state of the world, the inputs $\tilde{x}_i$ come from direct measurements (or direct expert estimation).
• In contrast, in the problem of predicting the future state of the world, the inputs $\tilde{y}_i$ come from the learning stage – e.g., they may come from indirect measurements.
Decision making, design, control. Once we know the current state of the world and we know how to predict the consequences of different decisions (designs, etc.), it is desirable to find a decision (design, etc.) which guarantees the given results. Depending on what we want from this design, we can subdivide all the problems from this class into two subclasses. In both subclasses, the design must satisfy some constraints. Thus, we are interested in finding a design that satisfies all these constraints.
• In some practical situations, satisfaction of all these constraints is all we want. In general, there may be several possible designs which satisfy given constraints. In the problems from the first subclass, we do not have any preferences for one of these designs – any one of them will suffice. Such problems are called the problems of constraint satisfaction.
• In other practical situations, we do have a clear preference between different designs $x$. This preference is usually described in terms of an objective function $F(x)$ – a function for which more preferable designs $x$ correspond to larger values of $F(x)$. In such situations, among all the designs which satisfy given constraints, we would like to find a design $x$ for which the value $F(x)$ of the given objective function is the largest. Such problems are called optimization problems.
Both constraint satisfaction and optimization often require a large number of computations (see, e.g., [1]).
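The difference between the two subclasses can be sketched on a toy problem. The fragment below assumes a finite list of candidate designs and uses plain enumeration, which is only feasible for very small problems; the candidate grid, the constraints, and the objective function are hypothetical and serve only to contrast "any feasible design" with "the best feasible design".

```python
def satisfies(design, constraints):
    """True if the design meets every constraint."""
    return all(check(design) for check in constraints)


def constraint_satisfaction(candidates, constraints):
    """Return any one design that satisfies all constraints, or None."""
    for design in candidates:
        if satisfies(design, constraints):
            return design
    return None


def optimize(candidates, constraints, objective):
    """Among feasible designs, return one with the largest objective value."""
    feasible = [d for d in candidates if satisfies(d, constraints)]
    return max(feasible, key=objective) if feasible else None


# Hypothetical design variable: a levee height in meters, on a 0.5 m grid.
candidates = [h / 2 for h in range(0, 41)]           # 0.0, 0.5, ..., 20.0
constraints = [lambda h: h >= 5.0, lambda h: h <= 12.0]


def objective(h):
    """Assumed preference F(x): larger is better, best at 9 m."""
    return -(h - 9.0) ** 2


print(constraint_satisfaction(candidates, constraints))  # 5.0
print(optimize(candidates, constraints, objective))      # 9.0
```

Constraint satisfaction stops at the first feasible design, while optimization must compare all feasible designs, which is one reason why it usually requires more computation.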
Comment. Our main objective is to describe interval computations. They were originally invented for the first two classes of problems, i.e., for data processing, but they turned out to be very useful for the third class (constraint satisfaction and optimization) as well.
1.3 In Real-Life Computations, We Need to Take Uncertainty into Account
Need for computations: reminder. In the previous section, we described the importance of computations. In particular, computations constituting data processing process the values which come from measurements (direct or indirect) and from expert estimations.
Let us start with the problem of learning the values of the physical quantities. Let us start with the problems from the first class – the problems of learning the values of the physical quantities. In these problems, computations are needed to transform the results $\tilde{x}_1, \ldots, \tilde{x}_n$ of direct measurements (or direct expert estimations) into the estimate $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n)$ of the desired quantity $y$. In the case of both measurements and expert estimates, the estimates $\tilde{x}_i$ are only approximately equal to the (unknown) actual values $x_i$ of the corresponding quantities. Let us elaborate on this statement.
Measurements are never exact.
• From the philosophical viewpoint, measurements cannot be exact because
  – the actual value of the quantity is a general real number; so, in general, we need infinitely many bits to describe the exact value, while
  – after every measurement, we gain only a finite number of bits of information (e.g., a finite number of binary digits in the binary expansion of the number).
• From the physical viewpoint, there is always some difficult-to-delete noise which is mixed with the measurement results.
Expert estimates are never absolutely exact either.
• First of all, as with the measurements, expert estimates cannot be absolutely exact, because an expert generates only a finite amount of information.
• Second, from the commonsense viewpoint, experts are usually even less accurate than measuring instruments (which are sometimes super-precise).
In both cases, there is usually a nonzero approximation error. The difference $\Delta x_i \stackrel{\text{def}}{=} \tilde{x}_i - x_i$ between the (approximate) estimate $\tilde{x}_i$ and the (unknown) actual value $x_i$ of the quantity $x_i$ is called the approximation error. In particular, if $\tilde{x}_i$ is obtained by measurement, this difference is called the measurement error.
Uncertainty in inputs leads to uncertainty in the result of data processing. We assume that the quantities $x_1, \ldots, x_n$ that we directly measure or directly estimate are related to the desired quantity $y$ by a known relation $y = f(x_1, \ldots, x_n)$. Because of this relation, we estimate the value $y$ as $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n)$. Since the values $\tilde{x}_i$ are, in general, different from the (unknown) actual values $x_i$, the result $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n)$ of applying the algorithm $f$ to the estimates $\tilde{x}_i$ is, in general, different from the result $y = f(x_1, \ldots, x_n)$ of applying this algorithm to the actual values $x_i$. Thus, the estimate $\tilde{y}$ is, in general, different from the actual value $y$ of the desired quantity: $\Delta y \stackrel{\text{def}}{=} \tilde{y} - y \neq 0$.
It is therefore desirable to find out the uncertainty $\Delta y$ caused by the uncertainties $\Delta x_i$ in the inputs.
[Figure: block diagram in which the input uncertainties $\Delta x_1, \Delta x_2, \ldots, \Delta x_n$ enter the algorithm $f$, which outputs the resulting uncertainty $\Delta y$.]
Comment. In the above argument, we assumed that the relation $f$ provides the exact relation between the variables $x_1, \ldots, x_n$ and the desired value $y$. In this ideal case, when we plug the actual (unknown) values of $x_i$ into the algorithm $f$, we get the exact value $y = f(x_1, \ldots, x_n)$ of $y$.
In many real-life situations, the relation $f$ between $x_i$ and $y$ is only approximately known. In this case, even if we know the exact values of $x_i$, substituting these values into the approximate function $f$ will not provide us with the exact value of $y$. In such situations, there is even more uncertainty in $y$:
• First, there is an uncertainty in $y$ caused by the uncertainty in the inputs.
• Second, there is a model uncertainty caused by the fact that the known algorithm $f$ provides only an approximate description of the dependence between the inputs and the output.
Interval computations enable us to estimate the uncertainty in $y$ caused by the uncertainty of the inputs. If there is also a model uncertainty, it has to be estimated separately and added to the uncertainty produced by the interval computations techniques.
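The input-caused part of the uncertainty can be illustrated with a small numerical sketch. It assumes a hypothetical algorithm f(x1, x2) = x1 * x2 and, purely for illustration, pretends that the actual values are known; in practice they are not, which is exactly why bounds on the errors are needed. Model uncertainty is not represented here.

```python
def f(x1, x2):
    """Hypothetical data-processing algorithm (assumed): y = x1 * x2."""
    return x1 * x2


x_est = (2.0, 3.0)  # estimates of x1, x2 (assumed measurement results)
x_act = (2.1, 2.9)  # actual values: unknown in practice, assumed here

# Approximation errors of the inputs: estimate minus actual value.
dx = [est - act for est, act in zip(x_est, x_act)]

y_est = f(*x_est)   # result of data processing on the estimates
y_act = f(*x_act)   # value we would get from the actual inputs
dy = y_est - y_act  # resulting error of the output

print(dx)  # roughly [-0.1, 0.1]
print(dy)  # roughly -0.09
```

Even though the two input errors have equal size and opposite signs, they do not cancel in the output, so the output error has to be analyzed through f rather than guessed from the input errors alone.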
In many practical problems, it is important to estimate the inaccuracy of the results of data processing. In many practical applications, it is important to know not only the desired estimate for the quantity y, but also how accurate this estimate is. For example, in geophysical applications, it is not enough to know that the amount of oil in a given oil field is about 100 million tons. It is important to know how accurate this estimate is. If the amount is 100 ± 10, this means that the estimates are good enough, and we should start exploring this oil field. On the other hand, if it is 100 ± 200, this means that it is quite possible that the actual value of the desired quantity y is 0; i.e., there is no oil at all. In this case, it may be prudent to perform additional measurements before we invest a lot of money into drilling oil wells. The situation becomes even more critical in medical emergencies: it is not enough to have an estimate of the blood pressure or the body temperature to make a decision (e.g., whether to perform a surgery); it is important that even with the measurement uncertainty, we are sure about the diagnosis – and if we are not, maybe it is desirable to perform more accurate measurements.
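The oil-field reasoning above amounts to checking whether the value 0 is consistent with the estimate and its accuracy. A minimal sketch, in which the function name is an illustrative assumption and the units are millions of tons:

```python
def may_be_zero(estimate, bound):
    """True if 0 lies within [estimate - bound, estimate + bound]."""
    return estimate - bound <= 0.0 <= estimate + bound


# 100 +/- 10: the amount is guaranteed to be positive.
print(may_be_zero(100.0, 10.0))   # False

# 100 +/- 200: it is possible that there is no oil at all.
print(may_be_zero(100.0, 200.0))  # True
```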
Problems of the second class (prediction related): uncertainty in initial values leads to uncertainty of predicted values. In the prediction problems, we start with the estimates $\tilde{y}_i$ of the current values of the known quantities; we then apply the prediction algorithm $g$ and produce the prediction $\tilde{z} = g(\tilde{y}_1, \ldots, \tilde{y}_m)$ for the desired future value $z$. We have already mentioned that, in general, the estimates $\tilde{y}_i$ of the current values of the quantities $y_i$ are different from the (unknown) actual values $y_i$ of these quantities. Therefore, even if the prediction algorithm is absolutely exact, i.e., if the future value of $z$ is equal to $g(y_1, \ldots, y_m)$, the prediction result $\tilde{z}$ will be different from the actual future value $z$.
Comment. In many practical situations, the prediction algorithm is only approximately known, so in general (just as for the problems from the first class), there is also a model uncertainty – an additional component of uncertainty.
1.4 From Probabilistic to Interval Uncertainty: Case of Indirect Measurements
Let us start with the uncertainty of learning the values of the desired quantities. In the previous section, we have shown that the uncertainties in the results of direct measurements and/or direct expert estimations lead to an uncertainty in our estimates of the current values of the physical quantities. These uncertainties, in turn, lead to an uncertainty in the predicted values. We are interested in the uncertainties occurring in problems of both classes: learning the current values and predicting the future values. Since the uncertainty in the future values comes from the uncertainty in the current values, it is reasonable to start with analyzing the uncertainty of the learned values.
Let us start with indirect measurements. In the situation of learning the current values of the physical quantities, there are two possible situations:
• when the (estimates for the) values of the auxiliary quantities $x_i$ come from direct measurements and
• when these estimates come from the expert estimation.
(Of course, it is also possible that some estimates come from measurement and some from expert estimation.) There is a lot of experience of handling measurement uncertainty, so we will start our analysis with measurement uncertainty. After that, we will explain how similar techniques can handle expert uncertainty.
Case of direct measurements: what can we know about $\Delta x_i$. To estimate the uncertainty $\Delta y$ caused by the measurement uncertainties $\Delta x_i$, we need to have some information about these original uncertainties $\Delta x_i$. The whole idea of uncertainty is that we do not know the exact value of $x_i$. (Hence, we do not know the exact value of $\Delta x_i$.) In other words, there are several possible values of $\Delta x_i$. So, the first thing we would like to know is what is the set of possible values of $\Delta x_i$. We may also know that some of these possible values are more frequent than the others. In other words, we may also have some information about the probabilities of different possible values $\Delta x_i$.
We need to go from theoretical possibility to practical situations. Up to now, we have analyzed the situation on a purely theoretical level: what kind of information can we have in principle. From the viewpoint of practical applications, it is desirable to analyze what information we actually have.
First piece of information: upper bound on the measurement error. The manufacturers of a measuring device usually provide us with an upper bound $\Delta_i$ for the (absolute value of) possible measurement errors, i.e., with the bound $\Delta_i$ for which we are guaranteed that $|\Delta x_i| \le \Delta_i$. The need for such a bound comes from the very nature of a measurement process. Indeed, if no such bound is provided, this means that the actual value $x_i$ can be as different from the ‘measurement result’ $\tilde{x}_i$ as possible. Such a value $\tilde{x}_i$ is not a measurement, but a wild guess.
Enter intervals. Since the (absolute value of the) measurement error $\Delta x_i = \tilde{x}_i - x_i$ is bounded by the given bound $\Delta_i$, we can therefore guarantee that the actual (unknown) value of the desired quantity belongs to the interval
$\mathbf{x}_i \stackrel{\text{def}}{=} [\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$.
Example. If the measured value of a quantity is $\tilde{x}_i = 1.0$ and the upper bound $\Delta_i$ on the measurement error is 0.1, this means that the (unknown) actual value of the measured quantity can be anywhere between 1 − 0.1 = 0.9 and 1 + 0.1 = 1.1; i.e., it can take any value from the interval [0.9, 1.1].

Often, we also know probabilities. In many practical situations, we not only know the interval $[-\Delta_i, \Delta_i]$ of possible values of the measurement error; we also know the probability of different values $\Delta x_i$ within this interval [2]. In most practical applications, it is assumed that the corresponding measurement errors are normally distributed with zero mean and known standard deviation. Numerous engineering techniques are known (and widely used) for processing this uncertainty (see, e.g., [2]).

How we can determine these probabilities. In practice, we can determine the desired probabilities of different values of $\Delta x_i$ by comparing
• the result $\tilde{x}_i$ of measuring a certain quantity with this instrument and
• the result $\tilde{x}_i^{\rm st}$ of measuring the same quantity by a standard (much more accurate) measuring instrument.
Since the standard measuring instrument is much more accurate than the one we use, i.e., $|\tilde{x}_i^{\rm st} - x_i| \ll |\tilde{x}_i - x_i|$, we can assume that $\tilde{x}_i^{\rm st} = x_i$, and thus that the difference $\tilde{x}_i - \tilde{x}_i^{\rm st}$ between these two measurement results is practically equal to the measurement error $\Delta x_i = \tilde{x}_i - x_i$. Thus, the empirical distribution of the difference $\tilde{x}_i - \tilde{x}_i^{\rm st}$ is close to the desired probability distribution of the measurement error.
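The comparison with a standard instrument can be illustrated by a minimal Python sketch. The function names `empirical_error_sample` and `summarize_errors` and the sample readings are hypothetical and serve only as an illustration; the sketch assumes that both instruments measured the same quantities in the same order.

```python
import statistics


def empirical_error_sample(readings, standard_readings):
    """Return the differences (reading - standard reading).

    Under the assumption that the standard instrument is much more accurate,
    these differences serve as a sample of the measurement error.
    """
    if len(readings) != len(standard_readings):
        raise ValueError("both instruments must measure the same quantities")
    return [r - s for r, s in zip(readings, standard_readings)]


def summarize_errors(errors):
    """Summarize the empirical error sample (needs at least two values)."""
    return {
        "mean": statistics.fmean(errors),
        "stdev": statistics.stdev(errors),
        # Largest observed |error|: only an empirical hint at the bound,
        # not a guaranteed upper bound on the measurement error.
        "max_abs": max(abs(e) for e in errors),
    }


if __name__ == "__main__":
    tested = [1.02, 0.97, 1.01, 0.99]      # hypothetical readings
    standard = [1.00, 1.00, 1.00, 1.00]    # hypothetical standard readings
    print(summarize_errors(empirical_error_sample(tested, standard)))
```

A histogram of the returned differences would then approximate the probability distribution of the measurement error.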
In some important practical situations, we cannot determine these probabilities. In many practical cases, by using standard measuring instruments, we can determine the probabilities of different values of $\Delta x_i$. There are two cases, however, when this determination is not done:
• The first is the case of cutting-edge measurements, e.g., measurements in fundamental science. When the Hubble telescope detects the light from a distant galaxy, there is no ‘standard’ (much more accurate) telescope floating nearby that we can use to calibrate the Hubble: the Hubble telescope is the best we have.
• The second is the case of real industrial applications (such as measurements on the shop floor). In this case, in principle, every sensor can be thoroughly calibrated, but sensor calibration is so costly (usually costing several orders of magnitude more than the sensor itself) that manufacturers rarely do it (only if it is absolutely necessary).
In both cases, we have no information about the probabilities of $\Delta x_i$; the only information we have is the upper bound on the measurement error.
Case of interval uncertainty. In this case, after performing a measurement and getting a measurement result $\tilde{x}_i$, the only information that we have about the actual value $x_i$ of the measured quantity is that it belongs to the interval $\mathbf{x}_i = [\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$. In other words, we do not know the actual value $x_i$ of the $i$th quantity. Instead, we know the granule $[\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$ that contains $x_i$.
Resulting computational problem. In this situation, for each $i$, we know the interval $\mathbf{x}_i$ of possible values of $x_i$, and we need to find the range
$\mathbf{y} \stackrel{\rm def}{=} \{ f(x_1, \ldots, x_n) : x_1 \in \mathbf{x}_1, \ldots, x_n \in \mathbf{x}_n \}$
of the given function $f(x_1, \ldots, x_n)$ over all possible tuples $x = (x_1, \ldots, x_n)$ with $x_i \in \mathbf{x}_i$.
The desired range is usually also an interval. Since the function $f(x_1, \ldots, x_n)$ is usually continuous, this range is also an interval; i.e., $\mathbf{y} = [\underline{y}, \overline{y}]$ for some $\underline{y}$ and $\overline{y}$. So, to find this range, it is sufficient to find the endpoints $\underline{y}$ and $\overline{y}$ of this interval.

From traditional (numerical) computations to interval computations. In traditional data processing, we know the estimates $\tilde{x}_i$ of the input values, and we use these estimates to compute the estimate $\tilde{y}$ for the desired quantity $y$. The corresponding algorithm is a particular case of computations (which often require a large amount of computing power). When we take uncertainty into account, we have a similar problem, in which
• as inputs, instead of the numerical estimates $\tilde{x}_i$ for $x_i$, we have intervals of possible values of $x_i$, and
• as an output, instead of a numerical estimate $\tilde{y}$ for $y$, we want to compute the interval $[\underline{y}, \overline{y}]$ of possible values of $y$.
The corresponding computations are therefore called interval computations. Let us formulate the corresponding problem of interval computations in precise terms.
The main problem of interval computations: a precise description. We are given
• an integer $n$,
• $n$ intervals $\mathbf{x}_1 = [\underline{x}_1, \overline{x}_1], \ldots, \mathbf{x}_n = [\underline{x}_n, \overline{x}_n]$, and
• an algorithm $f(x_1, \ldots, x_n)$ which transforms $n$ real numbers into a real number $y = f(x_1, \ldots, x_n)$.
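We need to compute the endpoints $\underline{y}$ and $\overline{y}$ of the interval
$\mathbf{y} = [\underline{y}, \overline{y}] = \{ f(x_1, \ldots, x_n) : x_1 \in [\underline{x}_1, \overline{x}_1], \ldots, x_n \in [\underline{x}_n, \overline{x}_n] \}$.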
[Figure: inputs $x_1, x_2, \ldots, x_n$; algorithm $f$; output $y$.]
Interval computations are also important for the second class of problems: predicting the future. In the prediction problem, we start with the known information about the current values $y_1, \ldots, y_m$ of the physical quantities. On the basis of this information, we would like to derive the information about the possible future value $z = g(y_1, \ldots, y_m)$ of each quantity of interest $z$. We have already mentioned that in many practically important situations, we can only determine the intervals $[\underline{y}_i, \overline{y}_i]$ of possible values of $y_i$. In this case, the only information that we can deduce about $z$ is that $z$ belongs to the range
$\mathbf{z} = \{ g(y_1, \ldots, y_m) : y_1 \in [\underline{y}_1, \overline{y}_1], \ldots, y_m \in [\underline{y}_m, \overline{y}_m] \}$.
The problem of computing this range is also the problem of interval computations:
• We know intervals of possible values of the input.
• We know the algorithm that transforms the input into the output.
• We want to find the interval of possible values of the output.
Thus, interval computations are also important for the prediction problem.
1.5 Case of Expert Uncertainty

How can we describe expert uncertainty. So far, we have analyzed measurement uncertainty. As we have mentioned earlier, expert estimates also come with uncertainty. How can we estimate and process this uncertainty?

Probabilistic approach: its possibility and limitations. For a measuring instrument, we know how to estimate the probability distribution of the measurement error:
• Ideally, we should compare the measurement results with the actual values of the measured quantity. The resulting differences form a sample from the actual distribution of measurement error. On the basis of this sample, we can determine the probability distribution of the measurement error.
• In practice, since we cannot determine the exact actual value of the quantity, we use an approximate value obtained by using a more accurate measuring instrument. On the basis of the sample of the corresponding differences, we can still determine the probability distribution of the measurement error.
In principle, we can do the same for expert estimates. Namely, to estimate the quality of expert estimates, we can consider the cases when the quantities estimated by an expert were subsequently measured. Usually, measurements are much more accurate than expert estimates; i.e., $|\tilde{x}_{\rm meas} - x| \ll |\tilde{x} - x|$, where $x$ is the
(unknown) value of the estimated quantity, $\tilde{x}$ is the expert estimate for this quantity, and $\tilde{x}_{\rm meas}$ is the result of the subsequent measurement of this same quantity. In comparison with expert estimates, we can therefore consider measurement results as approximately equal to the actual values of the quantity: $\tilde{x}_{\rm meas} - \tilde{x} \approx x - \tilde{x}$. Thus, by considering the differences $\tilde{x}_{\rm meas} - \tilde{x}$ as a sample from the unknown probability distribution, we can determine the probability distribution of the expert estimation error. If we have such a probability distribution, then we can use traditional well-developed statistical methods to process expert estimates, in the same way that we process measurement results for which we know the distribution of measurement errors. To determine a probability distribution from the empirical data, we need a large sample: the larger the sample, the more accurate the results.
• A measuring instrument takes a small fraction of a second to perform a measurement. Thus, with a measuring instrument, we can easily perform dozens, hundreds, and even thousands of measurements. So, we can have samples which are large enough to determine the corresponding probability distribution with reasonable accuracy.
• On the other hand, for an expert, a single estimate may require a lot of analysis. As a result, for each expert, there are usually few estimates, and it is often not possible to determine the distribution from these estimates.
Experts can produce interval bounds. A measuring instrument usually simply produces a number; it cannot be easily modified to also produce information about the measurement uncertainty, such as the upper bound on the measurement error. In contrast, an expert is usually able not only to supply us with an estimate, but also to provide us with the accuracy of this estimate. For example, an expert can estimate the age of a person as $\tilde{x} = 30$ and indicate that this is 30 plus or minus $\Delta = 5$. In such a situation, what the expert is actually saying is that the actual (unknown) value of the estimated quantity should be in the interval $[\tilde{x} - \Delta, \tilde{x} + \Delta]$.

Interval computations are needed to handle interval uncertainty in expert estimates. Let us now consider a typical situation of data processing. We are interested in some quantity $y$ which is difficult to estimate directly. To estimate $y$, we ask experts to estimate the values of the auxiliary quantities $x_1, \ldots, x_n$ which are related to $y$ by a known dependence $y = f(x_1, \ldots, x_n)$. On the basis of the expert estimates $\tilde{x}_i$ and the expert estimates $\Delta_i$ of their inaccuracy, we conclude that the actual (unknown) value of each quantity $x_i$ belongs to the interval $\mathbf{x}_i \stackrel{\rm def}{=} [\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$. Thus, we can conclude that the actual value of $y = f(x_1, \ldots, x_n)$ belongs to the range
$[\underline{y}, \overline{y}] \stackrel{\rm def}{=} \{ f(x_1, \ldots, x_n) : x_1 \in \mathbf{x}_1, \ldots, x_n \in \mathbf{x}_n \}$.
The problem of computing this range is exactly the problem of interval computations.
From interval to fuzzy uncertainty. Usually, experts can provide guaranteed bounds $\Delta_i$ on the inaccuracy of their estimates. Often, however, in addition to these (rather wide) bounds, experts can also produce narrower bounds, which are, however, true only with a certain degree of certainty. For example, after estimating the age as 30,
• in addition to saying that an estimation inaccuracy is always ≤5 (with 100% certainty),
• an expert can also say that with 90% certainty, this inaccuracy is ≤4, and
• with 70% certainty, this inaccuracy is ≤2.
Thus, instead of a single interval [30 − 5, 30 + 5] = [25, 35] that is guaranteed to contain the (unknown) age with certainty 100%, the expert also produces a narrower interval [30 − 4, 30 + 4] = [26, 34] which contains this age with 90% certainty, and an even narrower interval [30 − 2, 30 + 2] = [28, 32] which contains the age with 70% certainty. So, we have three intervals which are nested in the sense that every interval corresponding to a smaller degree of certainty is contained in the interval corresponding to the larger degree of certainty: [28, 32] ⊆ [26, 34] ⊆ [25, 35].
In general, instead of a single interval, we have a nested family of intervals corresponding to different degrees of certainty. Such a nested family of intervals can be viewed as a fuzzy number [3, 4]: for every value $x$, we can define the degree $\mu(x)$ to which $x$ is possible as 1 minus the smallest degree of certainty $\alpha$ for which $x$ belongs to the $\alpha$-interval.
Interval computations are needed to process fuzzy data. For expert estimates, for each input $i$, we may have different intervals $\mathbf{x}_i(\alpha)$ corresponding to different degrees of certainty $\alpha$. Our objective is then to produce the corresponding intervals for $y = f(x_1, \ldots, x_n)$. For $\alpha = 1$, i.e., for intervals in which the experts are 100% confident, it is natural to take $\mathbf{y}(1) = f(\mathbf{x}_1(1), \ldots, \mathbf{x}_n(1))$. Similarly, for each $\alpha$, if we want to consider beliefs at this level $\alpha$, then we can combine the corresponding intervals $\mathbf{x}_i(\alpha)$ into the desired interval $\mathbf{y}(\alpha)$ for $y$: $\mathbf{y}(\alpha) = f(\mathbf{x}_1(\alpha), \ldots, \mathbf{x}_n(\alpha))$. It turns out that the resulting fuzzy number is exactly what we would get if we simply applied Zadeh’s extension principle to the fuzzy numbers corresponding to $x_i$ [3–5]. So, in processing fuzzy expert opinions, we also need interval computations.
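The level-by-level processing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each fuzzy input is represented (hypothetically) as a dictionary mapping a degree of certainty to an interval `(lower, upper)`, all inputs use the same certainty levels, and the data processing function is the sum, for which the interval range is obtained by adding the endpoints. The names `add_intervals` and `process_levels` and the second expert estimate are not from the text.

```python
def add_intervals(intervals):
    """Exact range of x1 + ... + xn over the given intervals (lower, upper)."""
    lower = sum(lo for lo, _ in intervals)
    upper = sum(hi for _, hi in intervals)
    return (lower, upper)


def process_levels(fuzzy_inputs, interval_f):
    """Apply an interval version of f separately at each certainty level.

    fuzzy_inputs: list of dicts {alpha: (lower, upper)}, one dict per input.
    interval_f:   function mapping a list of intervals to the output interval.
    """
    levels = sorted(fuzzy_inputs[0])
    for fuzzy_input in fuzzy_inputs:
        if sorted(fuzzy_input) != levels:
            raise ValueError("all inputs must use the same certainty levels")
    return {
        alpha: interval_f([fuzzy_input[alpha] for fuzzy_input in fuzzy_inputs])
        for alpha in levels
    }


if __name__ == "__main__":
    # First input: the age example from the text; second input: hypothetical.
    x1 = {1.0: (25, 35), 0.9: (26, 34), 0.7: (28, 32)}
    x2 = {1.0: (8, 12), 0.9: (9, 11), 0.7: (9.5, 10.5)}
    for alpha, interval in process_levels([x1, x2], add_intervals).items():
        print(alpha, interval)
```

For a general function $f$, `add_intervals` would have to be replaced by a procedure that solves the corresponding interval computations problem at each level.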
1.6 Interval Computations Are Sometimes Easy but In General, They Are Computationally Difficult (NP-Hard)

Interval computations are needed in practice: a reminder. In the previous sections, we have explained why interval computations are needed in many practical problems. In other words, in many practical situations, we know $n$ intervals $\mathbf{x}_1, \ldots, \mathbf{x}_n$, know an algorithm $f(x_1, \ldots, x_n)$, and need to find the range of the function $f$ on these intervals:
$[\underline{y}, \overline{y}] = \{ f(x_1, \ldots, x_n) : x_1 \in \mathbf{x}_1, \ldots, x_n \in \mathbf{x}_n \}$.
Let us first analyze the computational complexity of this problem. Before we start explaining how to solve this problem, let us make a useful detour. Until the 1930s, researchers believed that every mathematical problem can be solved. Under this belief, once we have a mathematical problem of practical importance, we should try to solve it in its entire generality. Since Gödel’s famous result, it has been well known that some mathematical problems cannot be solved in the most general case. For such problems, attempts to solve them in their most general form would be a futile waste of time. At best, we can solve some important class of such problems or get an approximate solution. To avoid this waste of effort, before we start solving a difficult problem, it is desirable to first analyze whether this problem can be solved in its utmost generality. This strategy was further clarified in the 1970s, when it turned out, crudely speaking, that some problems cannot be efficiently solved; such difficult problems are called NP-hard (see [1, 6, 7] for a detailed description). If a problem is NP-hard, then it is hopeless to search for a general efficient solution; we must look for efficient solutions to subclasses of this problem and/or approximate solutions.

Comment. Strictly speaking, NP-hardness does not necessarily mean that the problem is computationally difficult: this is true only under the hypothesis NP ≠ P, which is widely believed but not yet proved. (It is probably the best-known open problem in theoretical computer science.)

Interval computations are sometimes easy: case of monotonicity. In some cases, it is easy to estimate the desired range. For example, the arithmetic average
$E = \frac{x_1 + \cdots + x_n}{n}$
is a monotonically increasing function of each of its $n$ variables $x_1, \ldots, x_n$. So,
• the smallest possible value $\underline{E}$ of the average $E$ is attained when each value $x_i$ is the smallest possible ($x_i = \underline{x}_i$), and
• the largest possible value $\overline{E}$ of the average $E$ is attained when $x_i = \overline{x}_i$ for all $i$.
In other words, the range $\mathbf{E}$ of $E$ is equal to $[E(\underline{x}_1, \ldots, \underline{x}_n), E(\overline{x}_1, \ldots, \overline{x}_n)]$, where
$\underline{E} = \frac{1}{n} \cdot (\underline{x}_1 + \cdots + \underline{x}_n)$
and
$\overline{E} = \frac{1}{n} \cdot (\overline{x}_1 + \cdots + \overline{x}_n)$.
In general, if $f(x_1, \ldots, x_n)$ is a monotonically increasing function of each of its $n$ variables, then
• The smallest possible value $\underline{y}$ of the function $f$ over given intervals $[\underline{x}_i, \overline{x}_i]$ is attained when all its inputs $x_i$ take the smallest possible values $x_i = \underline{x}_i$. In this case, $\underline{y} = f(\underline{x}_1, \ldots, \underline{x}_n)$.
• The largest possible value $\overline{y}$ of the function $f$ over given intervals $[\underline{x}_i, \overline{x}_i]$ is attained when all its inputs $x_i$ take the largest possible values $x_i = \overline{x}_i$. In this case, $\overline{y} = f(\overline{x}_1, \ldots, \overline{x}_n)$.
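As an illustration of the monotonic case, here is a minimal Python sketch. The function name `monotone_range` and its `directions` argument are assumptions introduced for this example: `+1` marks an input in which $f$ is increasing and `-1` an input in which $f$ is decreasing, so the same sketch also covers functions that are increasing in some inputs and decreasing in others. The sketch does not check monotonicity; it simply trusts the caller.

```python
def monotone_range(f, intervals, directions=None):
    """Range of f over a box when f is monotonic in each of its inputs.

    intervals:  list of (lower, upper) pairs, one per input.
    directions: +1 if f is increasing in that input, -1 if decreasing;
                if omitted, f is assumed increasing in every input.
    """
    if directions is None:
        directions = [+1] * len(intervals)
    # For the lower endpoint, take the smallest value of each increasing input
    # and the largest value of each decreasing input; vice versa for the upper.
    lower_args = [lo if d > 0 else hi for (lo, hi), d in zip(intervals, directions)]
    upper_args = [hi if d > 0 else lo for (lo, hi), d in zip(intervals, directions)]
    return f(*lower_args), f(*upper_args)


if __name__ == "__main__":
    average = lambda a, b: (a + b) / 2
    print(monotone_range(average, [(0.9, 1.1), (1.8, 2.2)]))
    difference = lambda a, b: a - b
    print(monotone_range(difference, [(1, 2), (0, 3)], directions=[+1, -1]))
```

Only two calls to $f$ are needed, regardless of the number of inputs.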
Thus, we have an explicit formula for the desired range:
$[\underline{y}, \overline{y}] = [f(\underline{x}_1, \ldots, \underline{x}_n), f(\overline{x}_1, \ldots, \overline{x}_n)]$.
A similar formula can be written down if the function $f(x_1, \ldots, x_n)$ is increasing with respect to some of its inputs and decreasing with respect to some others. In this case, to compute $\overline{y}$, we must take
• $x_i = \overline{x}_i$ for all the variables $x_i$ relative to which $f$ is increasing, and
• $x_j = \underline{x}_j$ for all the variables $x_j$ relative to which $f$ is decreasing.
Similarly, to compute $\underline{y}$, we must take
• $x_i = \underline{x}_i$ for all the variables $x_i$ relative to which $f$ is increasing, and
• $x_j = \overline{x}_j$ for all the variables $x_j$ relative to which $f$ is decreasing.

Case of linear functions $f(x_1, \ldots, x_n)$. Above, we showed how to compute the range of a function which is monotonic in each of its variables; it can be increasing relative to some of them and decreasing relative to some others. An example of such a function is a general linear function $f(x_1, \ldots, x_n) = c_0 + \sum_{i=1}^{n} c_i \cdot x_i$. Substituting $x_i = \tilde{x}_i - \Delta x_i$ into this expression, we conclude that
$y = f(x_1, \ldots, x_n) = c_0 + \sum_{i=1}^{n} c_i \cdot (\tilde{x}_i - \Delta x_i) = c_0 + \sum_{i=1}^{n} c_i \cdot \tilde{x}_i - \sum_{i=1}^{n} c_i \cdot \Delta x_i$.
By definition, $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n) = c_0 + \sum_{i=1}^{n} c_i \cdot \tilde{x}_i$, so we have
$\Delta y = \tilde{y} - y = \sum_{i=1}^{n} c_i \cdot \Delta x_i$.
The dependence of $\Delta y$ on $\Delta x_i$ is linear: it is increasing relative to $\Delta x_i$ if $c_i \ge 0$ and decreasing if $c_i < 0$. So, to find the largest possible value $\Delta$ of $\Delta y$, we must take
• the largest possible value $\Delta x_i = \Delta_i$ when $c_i \ge 0$, and
• the smallest possible value $\Delta x_i = -\Delta_i$ when $c_i < 0$.
In both cases, the corresponding term in the sum has the form $|c_i| \cdot \Delta_i$, so we can conclude that
$\Delta = \sum_{i=1}^{n} |c_i| \cdot \Delta_i$.
Similarly, the smallest possible value of $\Delta y$ is equal to $-\Delta$. Thus, the range of possible values of $y$ is equal to $[\underline{y}, \overline{y}] = [\tilde{y} - \Delta, \tilde{y} + \Delta]$.
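The formula for the linear case translates directly into code. The following Python sketch is an illustration only; the function name `linear_range` and the numerical values are hypothetical.

```python
def linear_range(c0, coeffs, estimates, bounds):
    """Exact range of y = c0 + sum(c_i * x_i) when |x_i - estimate_i| <= bound_i.

    Returns the pair (y_estimate - delta, y_estimate + delta), where
    delta = sum(|c_i| * bound_i).
    """
    if not (len(coeffs) == len(estimates) == len(bounds)):
        raise ValueError("coeffs, estimates and bounds must have equal length")
    y_estimate = c0 + sum(c * x for c, x in zip(coeffs, estimates))
    delta = sum(abs(c) * d for c, d in zip(coeffs, bounds))
    return y_estimate - delta, y_estimate + delta


if __name__ == "__main__":
    # y = 1 + 2*x1 - 3*x2 with x1 in [0.9, 1.1] and x2 in [1.8, 2.2]
    print(linear_range(1.0, [2.0, -3.0], [1.0, 2.0], [0.1, 0.2]))
```

The computation time grows only linearly with the number $n$ of inputs.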
Interval computations are, in general, computationally difficult. We have shown that for linear functions, we can easily compute the interval range. Linear functions often occur in practice, because an arbitrary function can usually be expanded in a Taylor series, and then we can keep only the first few terms to get a good description of the actual dependence. If we keep only linear terms, then we get a linear approximation to the original dependence. If the accuracy of this linear approximation is not sufficient, then it is natural to also consider quadratic terms. A natural question is ‘is the corresponding interval computations problem still feasible?’ Alas, it turns out that for quadratic functions, the interval computations problem is, in general, NP-hard; this was first proved in [8]. Moreover, it turns out that it is NP-hard not just for some rarely used exotic quadratic functions: it is known that the problem of computing the exact range $\mathbf{V} = [\underline{V}, \overline{V}]$ for the variance
$V = \frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - E)^2 = \frac{1}{n} \cdot \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \cdot \sum_{i=1}^{n} x_i \right)^2$
over interval data $x_i \in [\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$ is, in general, NP-hard (see, e.g., [9, 10]). To be more precise, there is a polynomial-time algorithm for computing $\underline{V}$, but computing $\overline{V}$ is, in general, NP-hard.
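To get a feeling for why computing $\overline{V}$ is difficult, consider the following naive Python sketch; it is not one of the algorithms from the cited papers. It relies on the fact that the variance is a convex function of $(x_1, \ldots, x_n)$, so its maximum over a box is attained at one of the vertices of the box. The sketch simply tries all $2^n$ vertices, which is feasible only for small $n$. The function names are hypothetical.

```python
from itertools import product


def variance(xs):
    """Population variance V = (1/n) * sum((x_i - E)^2)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def max_variance_bruteforce(intervals):
    """Upper endpoint of the variance range by enumerating all 2^n vertices.

    intervals: list of (lower, upper) pairs. The running time is exponential
    in the number of intervals.
    """
    return max(variance(vertex) for vertex in product(*intervals))


if __name__ == "__main__":
    print(max_variance_bruteforce([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]))
```

Already for $n = 50$, this enumeration would require about $10^{15}$ evaluations of the variance.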
Historical comment. NP-hardness of interval computations was first proved in [11, 12]. A general overview of the computational complexity of different problems of data processing and interval computations is given in [1].
1.7 Maximum Entropy and Linearization: Useful Techniques for Solving Many Practical Cases of Interval Computations Problem, Their Advantages and Limitations

In many practical situations, an approximate estimate is sufficient. The NP-hardness result states that computing the exact range $[\underline{y}, \overline{y}]$, i.e., computing the exact values of the endpoints $\underline{y}$ and $\overline{y}$, is NP-hard. In most practical problems, however, it is not necessary to produce the exact values of the range; good approximate values will be quite sufficient.

Computing the range with guaranteed accuracy is still NP-hard. Thus, we arrive at the following natural question. Suppose that we fix an accuracy $\varepsilon$, and we consider the problem of computing $\underline{y}$ and $\overline{y}$ with this accuracy, i.e., the problem of computing the values $\underline{Y}$ and $\overline{Y}$ for which $|\underline{Y} - \underline{y}| \le \varepsilon$ and $|\overline{Y} - \overline{y}| \le \varepsilon$. In this case, we can guarantee that $\underline{Y} - \varepsilon \le \underline{y} \le \underline{Y} + \varepsilon$ and $\overline{Y} - \varepsilon \le \overline{y} \le \overline{Y} + \varepsilon$. So, if we succeed in computing the estimates $\underline{Y}$ and $\overline{Y}$, then we do not have the exact range, but we have an $\varepsilon$-approximation for the (unknown) desired range $\mathbf{y}$: namely, we know that
$[\underline{Y} + \varepsilon, \overline{Y} - \varepsilon] \subseteq \mathbf{y} \subseteq [\underline{Y} - \varepsilon, \overline{Y} + \varepsilon]$.
Is the problem of computing such values $\underline{Y}$ and $\overline{Y}$ computationally simpler? Alas, it turns out that this new problem is still NP-hard (see, e.g., [1]).
In some practical problems, it is OK to have estimates which are not guaranteed. The difficulty of solving the general problem of interval computations comes from the fact that we are looking for guaranteed bounds for $\underline{y}$ and $\overline{y}$. In some practical problems, we are not 100% sure that our algorithm $f(x_1, \ldots, x_n)$ is absolutely correct. This happens, e.g., in prediction problems, where the dynamic equations used for prediction are only approximately known anyway. In such situations, it is OK to have estimates sometimes deviating from the desired range.

Possible approaches to this problem. In order to describe possible approaches to this problem, let us first recall what properties of our problem make it computationally complex. By relaxing these properties, we will be able to come up with computationally efficient algorithms. We have mentioned that in some practical situations, we know the probability distributions of the estimation errors $\Delta x_i$. In such situations, the problem of estimating the effect of these approximation errors $\Delta x_i$ on the result of data processing is computationally easy. Namely, we can use Monte Carlo simulations (see, e.g., [13]), when for several iterations $k = 1, \ldots, N$, we do the following:
• Simulate the inputs $\Delta x_i^{(k)}$ according to the known probability distributions.
• Substitute the resulting simulated values $x_i^{(k)} = \tilde{x}_i - \Delta x_i^{(k)}$ into the algorithm $f$, producing $y^{(k)} = f(x_1^{(k)}, \ldots, x_n^{(k)})$.
• Then use the sample of the differences $\Delta y^{(k)} = \tilde{y} - y^{(k)}$ to get the probability distribution of $\Delta y$.
Thus, the first difficulty of interval computations comes from the fact that we do not know the probability distribution. However, the mere fact that we do not know this distribution does not necessarily make the problem computationally complex. For example, even when we restrict ourselves to interval uncertainty, for linear functions $f$, we still have a feasible algorithm for computing the range. Thus, the complexity of the general interval computations problem is caused by the following two properties of this general problem:
• first, that we do not know the probability distribution for the inputs $\Delta x_i$, and
• second, that the function $f(x_1, \ldots, x_n)$ is non-linear.
To be able to perform efficient computations, we must relax one of these properties. Thus, we arrive at two possible ways to solve this problem:
• First, we can select one of the possible probability distributions.
• Second, we can approximate the original function $f$ by a linear one.
Let us describe these two ideas in more detail.
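Before describing these two ideas, here is a minimal Python sketch of the Monte Carlo simulation mentioned above for the case when the error distributions are known. As an assumption made only for this illustration, the errors are taken to be independent and normally distributed with zero mean and known standard deviations; the function name `monte_carlo_errors` and the fixed seed are hypothetical.

```python
import random
import statistics


def monte_carlo_errors(f, estimates, sigmas, n_iter=1000, seed=0):
    """Simulate a sample of the errors dy = f(estimates) - f(simulated inputs).

    Assumption: the error of input i is normal with mean 0 and standard
    deviation sigmas[i], independently of the other inputs.
    """
    rng = random.Random(seed)
    y_estimate = f(*estimates)
    sample = []
    for _ in range(n_iter):
        # Simulate the input errors and the corresponding actual values.
        dx = [rng.gauss(0.0, s) for s in sigmas]
        x = [x_est - d for x_est, d in zip(estimates, dx)]
        sample.append(y_estimate - f(*x))
    return sample


if __name__ == "__main__":
    errors = monte_carlo_errors(lambda a, b: a + b, [1.0, 2.0], [0.1, 0.2])
    print(statistics.fmean(errors), statistics.stdev(errors))
```

The resulting sample characterizes the probability distribution of $\Delta y$; it requires $N + 1$ calls to $f$, independently of the number of inputs.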
First idea: selecting a probability distribution. As we have mentioned, in many cases, we know the probability distribution for approximation errors $\Delta x_i$. Interval uncertainty corresponds to the case when we have only partial information about this probability distribution: namely, the only thing we know about this distribution is that it is located (with probability 1) somewhere on the interval $[-\Delta_i, \Delta_i]$. This distribution could be uniform on this interval, could be a truncated Gaussian distribution, or could be a 1-point degenerate distribution, in which the value $\Delta x_i$ is equal to one fixed value from this interval with probability 1. Situations in which we have partial information about the probability distributions are common in statistics. In such situations, we have several different probability distributions which are all consistent with the given knowledge. One way to handle these situations is to select one of these distributions, the one which is, in some sense, the most reasonable to select.

Simplest case: Laplace’s principle of indifference. The approach started with the early nineteenth-century work of the famous mathematician Pierre Simon Laplace, who analyzed the simplest
of such situations, when we have finitely many ($n$) alternatives and have no information about their probabilities. In this simple situation, the original situation is invariant with respect to arbitrary permutations of the original alternatives. So, it is reasonable to select the probabilities which reflect this symmetry, i.e., equal probabilities $p_1 = \cdots = p_n$. Since the total probability $\sum_{i=1}^{n} p_i$ must be equal to 1, we thus conclude that $p_1 = \cdots = p_n = 1/n$. This idea is called Laplace’s principle of indifference.
General case: maximum entropy approach. Laplace’s simple idea can be naturally applied to the more general case, when we have partial information about the probabilities, i.e., when there are several possible distributions which are consistent with our knowledge. In this case, it is reasonable to view these distributions as possible alternatives. So, we discretize the variables (to make sure that the overall number of alternatives is finite) and then consider all possible distributions as equally probable. As the discretization constant tends to 0, we should get a distribution on the class of all (non-discretized) distributions. It turns out that in the limit, only one such distribution has probability 1: namely, the distribution which has the largest possible value of the entropy $S \stackrel{\rm def}{=} -\int \rho(x) \cdot \ln(\rho(x))\, dx$. (Here $\rho(x)$ denotes the probability density.) For details on this maximum entropy approach and its relation to interval uncertainty and Laplace’s principle of indifference, see, e.g., [14–16].

Maximum entropy method for the case of interval uncertainty. One can easily check that for a single variable $x_1$, among all distributions located on a given interval, the entropy is the largest when this distribution is uniform on this interval. In the case of several variables, we can similarly conclude that the distribution with the largest value of the entropy is the one which is uniformly distributed in the corresponding box $\mathbf{x}_1 \times \cdots \times \mathbf{x}_n$, i.e., a distribution in which
• each variable $\Delta x_i$ is uniformly distributed on the corresponding interval $[-\Delta_i, \Delta_i]$, and
• variables corresponding to different inputs are statistically independent.
This is indeed one of the main ways in which interval uncertainty is treated in engineering practice: if we only know that the value of some variable $x_i$ is in the interval $[\underline{x}_i, \overline{x}_i]$ and we have no information about the probabilities, then we assume that the variable $x_i$ is uniformly distributed on this interval.
Limitations of the maximum entropy approach. To explain the limitations of this engineering approach, let us consider the simplest possible algorithm $y = f(x_1, \ldots, x_n) = x_1 + \cdots + x_n$. For simplicity, let us assume that the measured values of all $n$ quantities are 0s, $\tilde{x}_1 = \cdots = \tilde{x}_n = 0$, and that all $n$ measurements have the same error bound $\Delta_x$; i.e., $\Delta_1 = \cdots = \Delta_n = \Delta_x$. In this case, $\Delta y = \Delta x_1 + \cdots + \Delta x_n$. Each of the $n$ component measurement errors can take any value from $-\Delta_x$ to $\Delta_x$, so the largest possible value of $\Delta y$ is attained when all of the component errors attain the largest possible value $\Delta x_i = \Delta_x$. In this case, the largest possible value $\Delta$ of $\Delta y$ is equal to $\Delta = n \cdot \Delta_x$.

Let us see what the maximum entropy approach will predict in this case. According to this approach, we assume that $\Delta x_i$ are independent random variables, each of which is uniformly distributed on the interval $[-\Delta_x, \Delta_x]$. According to the central limit theorem [17, 18], when $n \to \infty$, the distribution of the sum of $n$ independent identically distributed bounded random variables tends to Gaussian. This means that for large values $n$, the distribution of $\Delta y$ is approximately normal. A normal distribution is uniquely determined by its mean and variance. When we add several independent variables, their means and variances add up. For each uniform distribution $\Delta x_i$ on the interval $[-\Delta_x, \Delta_x]$ of width $2\Delta_x$, the probability density is equal to $\rho(x) = 1/(2\Delta_x)$, so the mean is 0 and the variance is
$V = \int_{-\Delta_x}^{\Delta_x} x^2 \cdot \rho(x)\, dx = \frac{1}{2\Delta_x} \cdot \int_{-\Delta_x}^{\Delta_x} x^2\, dx = \frac{1}{2\Delta_x} \cdot \frac{1}{3} \cdot x^3 \Big|_{-\Delta_x}^{\Delta_x} = \frac{1}{3} \cdot \Delta_x^2$.
Thus, for the sum $\Delta y$ of $n$ such variables, the mean is 0 and the variance is equal to $(n/3) \cdot \Delta_x^2$. Thus, the standard deviation is equal to $\sigma = \sqrt{V} = \Delta_x \cdot \sqrt{n}/\sqrt{3}$. It is known that in a normal distribution, with probability close to 1, all the values are located within the $k \cdot \sigma$ vicinity of the mean: for $k = 3$, it is true with probability 99.7%; for $k = 6$, it is true with
probability about $1 - 10^{-6}$%; and so on. So, practically with certainty, $\Delta y$ is located within an interval of half-width $k \cdot \sigma$, which grows with $n$ as $\sqrt{n}$. For large $n$, we have $k \cdot \Delta_x \cdot \sqrt{n}/\sqrt{3} \ll \Delta_x \cdot n$, so we get a serious underestimation of the resulting measurement error. This example shows that estimates obtained by selecting a single distribution can be very misleading.
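The gap between the two estimates can be checked numerically. The following Python sketch is an illustration with hypothetical function names; it compares the guaranteed bound $n \cdot \Delta_x$ with the bound $k \cdot \sigma = k \cdot \Delta_x \cdot \sqrt{n}/\sqrt{3}$ obtained under the uniform-distribution assumption.

```python
import math


def guaranteed_bound(n, dx):
    """Largest possible |dy| for a sum of n errors, each bounded by dx."""
    return n * dx


def max_entropy_bound(n, dx, k=3.0):
    """k-sigma bound for a sum of n independent errors uniform on [-dx, dx]."""
    sigma = dx * math.sqrt(n / 3.0)
    return k * sigma


if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, guaranteed_bound(n, 0.1), max_entropy_bound(n, 0.1))
```

For small $n$ the two bounds are comparable (for $n = 3$ and $k = 3$ they coincide), but the ratio between them grows as $\sqrt{n}$.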
Linearization: main idea. As we have mentioned earlier, another way to handle the complexity of the general interval computations problem is to approximate the original expression $y = f(x_1, \ldots, x_n) = f(\tilde{x}_1 - \Delta x_1, \ldots, \tilde{x}_n - \Delta x_n)$ by linear terms in its Taylor expansion:
$y \approx f(\tilde{x}_1, \ldots, \tilde{x}_n) - \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \cdot \Delta x_i$,
where the partial derivatives are computed at the midpoint $\tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_n)$. Since $f(\tilde{x}_1, \ldots, \tilde{x}_n) = \tilde{y}$, we conclude that $\Delta y = \tilde{y} - y = \sum_{i=1}^{n} c_i \cdot \Delta x_i$, where $c_i = \partial f/\partial x_i$. We already know how to compute the interval range for a linear function, and the resulting formula is $\Delta = \sum_{i=1}^{n} |c_i| \cdot \Delta_i$. Thus, to compute $\Delta$, it is sufficient to know the partial derivatives $c_i$.
Linearization: how to compute. A natural way to compute partial derivatives comes directly from their definition. By definition, a partial derivative is defined as a limit
$\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(\tilde{x}_1, \ldots, \tilde{x}_{i-1}, \tilde{x}_i + h, \tilde{x}_{i+1}, \ldots, \tilde{x}_n) - f(\tilde{x}_1, \ldots, \tilde{x}_n)}{h}$.
In turn, a limit, by its definition, means that when the value of $h$ is small, the corresponding ratio is very close to the partial derivative. Thus, we can estimate the partial derivative as the ratio
$c_i = \frac{\partial f}{\partial x_i} \approx \frac{f(\tilde{x}_1, \ldots, \tilde{x}_{i-1}, \tilde{x}_i + h, \tilde{x}_{i+1}, \ldots, \tilde{x}_n) - f(\tilde{x}_1, \ldots, \tilde{x}_n)}{h}$
for some small value $h$. After we have computed $n$ such ratios, we can then compute the desired bound $\Delta$ on $\Delta y$ as $\Delta = \sum_{i=1}^{n} |c_i| \cdot \Delta_i$.
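This numerical-differentiation procedure can be sketched in Python as follows. The function name `linearized_bound` and the default step `h = 1e-6` are assumptions made for this illustration; in practice, the step should be chosen with the scale of the inputs and the accuracy of $f$ in mind. The result is an approximate, not guaranteed, bound.

```python
def linearized_bound(f, estimates, bounds, h=1e-6):
    """Approximate bound delta on |dy| using finite-difference derivatives.

    Calls f exactly n + 1 times: once at the estimates and once per input
    with that input shifted by h. Returns (y_estimate, delta).
    """
    y_estimate = f(*estimates)
    delta = 0.0
    for i, bound in enumerate(bounds):
        shifted = list(estimates)
        shifted[i] += h
        c_i = (f(*shifted) - y_estimate) / h  # estimate of df/dx_i
        delta += abs(c_i) * bound
    return y_estimate, delta


if __name__ == "__main__":
    # y = x1 * x2 with x1 = 2.0 +/- 0.1 and x2 = 3.0 +/- 0.2
    y_estimate, delta = linearized_bound(lambda a, b: a * b, [2.0, 3.0], [0.1, 0.2])
    print(y_estimate - delta, y_estimate + delta)
```

In this example, the linearized estimate is approximately [5.3, 6.7], while the exact range of the product is [1.9 · 2.8, 2.1 · 3.2] = [5.32, 6.72]; the difference comes from the quadratic term that linearization ignores.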
Linearization: how to compute faster. The above algorithm requires that we call the data processing algorithm $n + 1$ times: first to compute the value $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n)$ and then $n$ more times to compute the values $f(\tilde{x}_1, \ldots, \tilde{x}_{i-1}, \tilde{x}_i + h, \tilde{x}_{i+1}, \ldots, \tilde{x}_n)$, and thus the corresponding partial derivatives. In many practical situations, the data processing algorithms are time consuming, and we process large amounts of data, with the number $n$ of data points in the thousands. In this case, the above linearization algorithm would take thousands of times longer than the data processing itself, which is already time consuming. Is it possible to estimate $\Delta$ faster? The answer is ‘yes’: it is possible to have an algorithm which estimates $\Delta$ by using only a constant number of calls to the data processing algorithm $f$ (for details, see, e.g., [19, 20]).
In some situations, we need a guaranteed enclosure. In many application areas, it is sufficient to have an approximate estimate of $y$. However, in some applications, it is important to guarantee that the (unknown) actual value $y$ of a certain quantity does not exceed a certain threshold $y_0$. The only way to guarantee this is to have an interval $\mathbf{Y} = [\underline{Y}, \overline{Y}]$ which is guaranteed to contain $y$ (i.e., for which $\mathbf{y} \subseteq \mathbf{Y}$) and for which $\overline{Y} \le y_0$. For example, in nuclear engineering, we must make sure that the temperatures and the neutron flows do not exceed the critical values; when planning a spaceflight, we want to guarantee that the spaceship lands on the planet and does not fly past it.
The interval Y which is guaranteed to contain the actual range y is usually called an enclosure for this range. So, in such situations, we need to compute either the original range or at least an enclosure for this range. Computing such an enclosure is also one of the main tasks of interval computations.
1.8 Interval Computations: A Brief Historic Overview

Before we start describing the main interval computations techniques, let us briefly overview the history of interval computations.
Prehistory of interval computations: interval computations as a part of numerical mathematics. The notion of interval computations is reasonably recent: it dates from the 1950s. But the main problem has been known since Archimedes, who used guaranteed two-sided bounds to compute π (see, e.g., [21]). Since then, many useful guaranteed bounds have been developed for different numerical methods. There have also been several general descriptions of such bounds, often formulated in terms similar to what we described above. For example, in the early twentieth century, the concept of a function having values which are bounded within limits was discussed by W.H. Young in [22]. The concept of operations with a set of multivalued numbers was introduced by R.C. Young, who developed a formal algebra of multivalued numbers [23]. The special case of closed intervals was further developed by P.S. Dwyer in [24].
Limitations of the traditional numerical mathematics approach. The main limitation of the traditional numerical mathematics approach to error estimation was that often no clear distinction was made between approximate (non-guaranteed) and guaranteed (= interval) error bounds. For example, for iterative methods, many papers on numerical mathematics consider the rate of convergence as an appropriate measure of approximation error. Clearly, if we know that the error decreases as $O(1/n)$ or as $O(a^{-n})$, we gain some information about the corresponding algorithms, and we also learn that for large $n$, the second method is more accurate. However, in real life, we make a fixed number $n$ of iterations. If the only information we have about the approximation error is the above asymptotics, then we still have no idea how close the result of the $n$th iteration is to the actual (desired) value. It is therefore important to emphasize the need for guaranteed methods and to develop techniques for producing guaranteed estimates. Such guaranteed estimates are what interval computations are about.

Origins of interval computations. Interval computations were independently invented by three researchers in three different parts of the world: by M. Warmus in Poland [25, 26], by T. Sunaga in Japan [27], and by R. Moore in the USA [28–35]. The active interest in interval computations started with Moore's 1966 monograph [34]. This interest was enhanced by the fact that in addition to estimates for general numerical algorithms, Moore's monograph also described practical applications which had already been developed in his earlier papers and technical reports: in particular, interval computations were used to make sure that even when we take all the uncertainties into account, the trajectory of a spaceflight is guaranteed to reach the moon. Since then, interval computations have been actively used in many areas of science and engineering [36, 37].

Comment. An early history of interval computations is described in detail in [38] and in [39]; early papers on interval computations can be found on the interval computations Web site [36].
1.9 Interval Computations: Main Techniques

General comment about algorithms and parsing. Our goal is to find the range of a given function $f(x_1, \ldots, x_n)$ on the given intervals $\mathbf{x}_1 = [\underline{x}_1, \overline{x}_1], \ldots, \mathbf{x}_n = [\underline{x}_n, \overline{x}_n]$. This function $f(x_1, \ldots, x_n)$ is given as an algorithm. In particular, we may have an explicit analytical expression for $f$, in which case this algorithm simply consists of computing this expression. When we talk about algorithms, we usually mean an algorithm (program) written in a high-level programming language like Java or C. Such programming languages allow us to use arithmetic expressions and many other complex constructions. Most of these constructions, however, are not directly implemented inside a computer. Usually, only simple arithmetic operations are implemented: addition, subtraction, multiplication, and $1/x$ (plus branching). Even division $a/b$ is usually not directly supported; it is performed as a sequence of two elementary arithmetic operations:

• First, we compute $1/b$.
• And then, we multiply $a$ by $1/b$.

When we input a general program into a computer, the computer parses it; i.e., represents it as a sequence of elementary arithmetic operations. Since a computer performs this parsing anyway, we can safely assume that the original algorithm $f(x_1, \ldots, x_n)$ is already represented as a sequence of such elementary arithmetic operations.
Interval arithmetic. Let us start our analysis of the interval computations techniques with the simplest possible case when the algorithm $f(x_1, \ldots, x_n)$ simply consists of a single arithmetic operation: addition, subtraction, multiplication, or computing $1/x$.

Let us start by estimating the range of the addition function $f(x_1, x_2) = x_1 + x_2$ on the intervals $[\underline{x}_1, \overline{x}_1]$ and $[\underline{x}_2, \overline{x}_2]$. This function is increasing with respect to both its variables. We already know how to compute the range $[\underline{y}, \overline{y}]$ of a monotonic function. So, the range of addition is equal to $[\underline{x}_1 + \underline{x}_2, \overline{x}_1 + \overline{x}_2]$. The desired range is usually denoted as $f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$; in particular, for addition, this notation takes the form $\mathbf{x}_1 + \mathbf{x}_2$. Thus, we can define 'addition' of two intervals as follows:

$$[\underline{x}_1, \overline{x}_1] + [\underline{x}_2, \overline{x}_2] = [\underline{x}_1 + \underline{x}_2, \overline{x}_1 + \overline{x}_2].$$

This formula makes perfect intuitive sense: if one town has between 700 and 800 thousand people and it merges with a nearby town whose population is between 100 and 200 thousand, then

• the smallest possible value of the total population of the new big town is when both populations are the smallest possible, i.e., 700 + 100 = 800, and
• the largest possible value is when both populations are the largest possible, i.e., 800 + 200 = 1000.

The subtraction function $f(x_1, x_2) = x_1 - x_2$ is increasing with respect to $x_1$ and decreasing with respect to $x_2$, so we have

$$[\underline{x}_1, \overline{x}_1] - [\underline{x}_2, \overline{x}_2] = [\underline{x}_1 - \overline{x}_2, \overline{x}_1 - \underline{x}_2].$$

These operations are also in full agreement with common sense. For example, if a warehouse originally had between 6.0 and 8.0 tons and we moved between 1.0 and 2.0 tons to another location, then the smallest amount left is when we start with the smallest possible value 6.0 and move the largest possible value 2.0, resulting in 6.0 − 2.0 = 4.0. The largest amount left is when we start with the largest possible value 8.0 and move the smallest possible value 1.0, resulting in 8.0 − 1.0 = 7.0.

For multiplication $f(x_1, x_2) = x_1 \cdot x_2$, the direction of monotonicity depends on the actual values of $x_1$ and $x_2$: e.g., when $x_2 > 0$, the product increases with $x_1$; otherwise it decreases with $x_1$. So, unless we know the signs of $x_1$ and $x_2$ beforehand, we cannot tell whether the maximum is attained at $x_1 = \underline{x}_1$ or at $x_1 = \overline{x}_1$. However, we know that it is always attained at one of these endpoints. So, to find the range of the product, it is sufficient to try all 2 · 2 = 4 combinations of these endpoints:

$$[\underline{x}_1, \overline{x}_1] \cdot [\underline{x}_2, \overline{x}_2] = [\min(\underline{x}_1 \cdot \underline{x}_2, \underline{x}_1 \cdot \overline{x}_2, \overline{x}_1 \cdot \underline{x}_2, \overline{x}_1 \cdot \overline{x}_2), \max(\underline{x}_1 \cdot \underline{x}_2, \underline{x}_1 \cdot \overline{x}_2, \overline{x}_1 \cdot \underline{x}_2, \overline{x}_1 \cdot \overline{x}_2)].$$

Finally, the function $f(x_1) = 1/x_1$ is decreasing wherever it is defined (when $x_1 \ne 0$), so if $0 \notin [\underline{x}_1, \overline{x}_1]$, then

$$\frac{1}{[\underline{x}_1, \overline{x}_1]} = \left[\frac{1}{\overline{x}_1}, \frac{1}{\underline{x}_1}\right].$$

The formulas for addition, subtraction, multiplication, and reciprocal of intervals are called formulas of interval arithmetic.
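The four formulas of interval arithmetic translate almost literally into code. The sketch below is a minimal illustration, not a production implementation: the class name `Interval` and its method names are hypothetical, and ordinary floating-point arithmetic is used, so the endpoints are not rounded outward as a rigorous interval library would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float   # lower endpoint
    hi: float   # upper endpoint

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # [a, b] - [c, d] = [a - d, b - c]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # try all 2 * 2 = 4 combinations of endpoints
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def reciprocal(self):
        # 1 / [a, b] = [1 / b, 1 / a], defined only when 0 is not in [a, b]
        if self.lo <= 0.0 <= self.hi:
            raise ZeroDivisionError("interval contains 0")
        return Interval(1.0 / self.hi, 1.0 / self.lo)


print(Interval(700, 800) + Interval(100, 200))   # Interval(lo=800, hi=1000)
print(Interval(6.0, 8.0) - Interval(1.0, 2.0))   # Interval(lo=4.0, hi=7.0)
```

The two printed results reproduce the town-merger and warehouse examples from the text.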
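Computational complexity of interval arithmetic. Interval addition requires two additions of numbers; interval subtraction requires two subtractions of numbers, and dividing 1 by an interval requires two divisions of 1 by a real number. In all these operations, the interval operation takes twice as long as the corresponding operation with real numbers. The only exception is interval multiplication, which requires four multiplications of numbers. Thus, if we use the above formulas, we get, in the worst case, a fourfold increase in computation time.

Computational comment: interval multiplication can be performed faster. It is known that we can compute the interval product faster, by using only three multiplications [40, 41]. Namely,

• if $\underline{x}_1 \ge 0$ and $\underline{x}_2 \ge 0$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\underline{x}_1 \cdot \underline{x}_2, \overline{x}_1 \cdot \overline{x}_2]$;
• if $\underline{x}_1 \ge 0$ and $\underline{x}_2 \le 0 \le \overline{x}_2$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\overline{x}_1 \cdot \underline{x}_2, \overline{x}_1 \cdot \overline{x}_2]$;
• if $\underline{x}_1 \ge 0$ and $\overline{x}_2 \le 0$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\overline{x}_1 \cdot \underline{x}_2, \underline{x}_1 \cdot \overline{x}_2]$;
• if $\underline{x}_1 \le 0 \le \overline{x}_1$ and $\underline{x}_2 \ge 0$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\underline{x}_1 \cdot \overline{x}_2, \overline{x}_1 \cdot \overline{x}_2]$;
• if $\underline{x}_1 \le 0 \le \overline{x}_1$ and $\underline{x}_2 \le 0 \le \overline{x}_2$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\min(\underline{x}_1 \cdot \overline{x}_2, \overline{x}_1 \cdot \underline{x}_2), \max(\underline{x}_1 \cdot \underline{x}_2, \overline{x}_1 \cdot \overline{x}_2)]$;
• if $\underline{x}_1 \le 0 \le \overline{x}_1$ and $\overline{x}_2 \le 0$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\overline{x}_1 \cdot \underline{x}_2, \underline{x}_1 \cdot \underline{x}_2]$;
• if $\overline{x}_1 \le 0$ and $\underline{x}_2 \ge 0$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\underline{x}_1 \cdot \overline{x}_2, \overline{x}_1 \cdot \underline{x}_2]$;
• if $\overline{x}_1 \le 0$ and $\underline{x}_2 \le 0 \le \overline{x}_2$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\underline{x}_1 \cdot \overline{x}_2, \underline{x}_1 \cdot \underline{x}_2]$;
• if $\overline{x}_1 \le 0$ and $\overline{x}_2 \le 0$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\overline{x}_1 \cdot \overline{x}_2, \underline{x}_1 \cdot \underline{x}_2]$.

We see that in eight out of nine cases, we need only two multiplications, and the only case when we still need four multiplications is when $0 \in \mathbf{x}_1$ and $0 \in \mathbf{x}_2$. In this case, it can also be shown that three multiplications are sufficient:

• If $0 \le |\underline{x}_1| \le \overline{x}_1$ and $0 \le |\underline{x}_2| \le \overline{x}_2$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\min(\underline{x}_1 \cdot \overline{x}_2, \overline{x}_1 \cdot \underline{x}_2), \overline{x}_1 \cdot \overline{x}_2]$.
• If $0 \le \overline{x}_1 \le |\underline{x}_1|$ and $0 \le \overline{x}_2 \le |\underline{x}_2|$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\min(\underline{x}_1 \cdot \overline{x}_2, \overline{x}_1 \cdot \underline{x}_2), \underline{x}_1 \cdot \underline{x}_2]$.
• If $0 \le |\underline{x}_1| \le \overline{x}_1$ and $0 \le \overline{x}_2 \le |\underline{x}_2|$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\overline{x}_1 \cdot \underline{x}_2, \max(\underline{x}_1 \cdot \underline{x}_2, \overline{x}_1 \cdot \overline{x}_2)]$.
• If $0 \le \overline{x}_1 \le |\underline{x}_1|$ and $0 \le |\underline{x}_2| \le \overline{x}_2$, then $\mathbf{x}_1 \cdot \mathbf{x}_2 = [\underline{x}_1 \cdot \overline{x}_2, \max(\underline{x}_1 \cdot \underline{x}_2, \overline{x}_1 \cdot \overline{x}_2)]$.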
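The case analysis above can be sketched as a function that inspects the signs of the endpoints before multiplying. This is an illustrative sketch only: the function name `interval_mul`, the tuple representation `(lo, hi)`, and the returned multiplication count are assumptions made for the example, and no outward rounding is performed.

```python
def interval_mul(x1, x2):
    """Multiply intervals given as (lo, hi) tuples.

    Returns ((lo, hi), k), where k is the number of real multiplications used.
    """
    a_lo, a_hi = x1
    b_lo, b_hi = x2
    if a_lo >= 0:                       # x1 is nonnegative
        if b_lo >= 0:
            return (a_lo * b_lo, a_hi * b_hi), 2
        if b_hi <= 0:
            return (a_hi * b_lo, a_lo * b_hi), 2
        return (a_hi * b_lo, a_hi * b_hi), 2
    if a_hi <= 0:                       # x1 is nonpositive
        if b_lo >= 0:
            return (a_lo * b_hi, a_hi * b_lo), 2
        if b_hi <= 0:
            return (a_hi * b_hi, a_lo * b_lo), 2
        return (a_lo * b_hi, a_lo * b_lo), 2
    # from here on, x1 contains 0 in its interior
    if b_lo >= 0:
        return (a_lo * b_hi, a_hi * b_hi), 2
    if b_hi <= 0:
        return (a_hi * b_lo, a_lo * b_lo), 2
    # both intervals contain 0: three multiplications are enough
    a_right = -a_lo <= a_hi             # |lower endpoint| <= upper endpoint
    b_right = -b_lo <= b_hi
    if a_right and b_right:
        return (min(a_lo * b_hi, a_hi * b_lo), a_hi * b_hi), 3
    if not a_right and not b_right:
        return (min(a_lo * b_hi, a_hi * b_lo), a_lo * b_lo), 3
    if a_right:
        return (a_hi * b_lo, max(a_lo * b_lo, a_hi * b_hi)), 3
    return (a_lo * b_hi, max(a_lo * b_lo, a_hi * b_hi)), 3


print(interval_mul((-1, 3), (-2, 1)))   # ((-6, 3), 3)
```

The comparisons replace some multiplications, so whether this is faster in practice depends on the relative cost of branching and multiplying on the target hardware.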
Straightforward ('naive') interval computations: idea. We know how to compute the range for each arithmetic operation. Therefore, to compute the range $f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$, it is reasonable to do the following:

• first, to parse the algorithm $f$ (this is done automatically by a compiler),
• and then to repeat the computations forming the program $f$ step by step, replacing each operation with real numbers by the corresponding operation of interval arithmetic.

It is known that, as a result, we get an enclosure $\mathbf{Y}$ for the desired range $\mathbf{y}$ [34, 37].
Example where straightforward interval computations work perfectly. Let us start with an example of computing the average of two values $f(x_1, x_2) = 0.5 \cdot (x_1 + x_2)$. This function is increasing in both variables, so its range on the intervals $[\underline{x}_1, \overline{x}_1]$ and $[\underline{x}_2, \overline{x}_2]$ is equal to $[0.5 \cdot (\underline{x}_1 + \underline{x}_2), 0.5 \cdot (\overline{x}_1 + \overline{x}_2)]$. A compiler will parse the function $f$ into the following sequence of computational steps:

• we start with $x_1$ and $x_2$;
• then, we compute an intermediate value $x_3 = x_1 + x_2$;
• finally, we compute $y = 0.5 \cdot x_3$.

According to straightforward interval computations,

• we start with $\mathbf{x}_1 = [\underline{x}_1, \overline{x}_1]$ and $\mathbf{x}_2 = [\underline{x}_2, \overline{x}_2]$;
• then, we compute $\mathbf{x}_3 = \mathbf{x}_1 + \mathbf{x}_2 = [\underline{x}_1 + \underline{x}_2, \overline{x}_1 + \overline{x}_2]$;
• finally, we compute $\mathbf{y} = 0.5 \cdot \mathbf{x}_3$, and we get the desired range.

One can easily check that we also get the exact range for the general case of the arithmetic average and, even more generally, for an arbitrary linear function $f(x_1, \ldots, x_n)$.
Can straightforward interval computations be always perfect? In straightforward interval computations, we replace each elementary arithmetic operation with the corresponding operation of interval arithmetic. We have already mentioned that this replacement increases the computation time at most by a factor of 4. So, if we started with a polynomial-time algorithm, we still get polynomial time. On the other hand, we know that the main problem of interval computations is NP-hard. This means, crudely speaking, that we cannot always compute the exact range by using a polynomial-time algorithm. Since straightforward interval computation is a polynomial-time algorithm, this means that in some cases, its estimates for the range are not exact. Let us describe a simple example when this happens.

Example where straightforward interval computations do not work perfectly. Let us illustrate straightforward interval computations on the example of a simple function $f(x_1) = x_1 - x_1^2$; we want to estimate its range when $x_1 \in [0, 1]$. To be able to check how good the resulting estimate is, let us first find the actual range of $f$. According to calculus, the minimum and the maximum of a smooth (differentiable) function on an interval are attained either at one of the endpoints or at one of the extreme points, where the derivative of this function is equal to 0. So, to find the minimum and the maximum, it is sufficient to compute the value of this function at the endpoints and at all the extreme points:

• The largest of these values is the maximum.
• The smallest of these values is the minimum.

For the endpoints $x_1 = 0$ and $x_1 = 1$, we have $f(0) = f(1) = 0$. By differentiating this function and equating the derivative $1 - 2x_1$ to 0, we conclude that this function has only one extreme point $x_1 = 0.5$. At this point, $f(0.5) = 0.25$, so $\underline{y} = \min(0, 0, 0.25) = 0$ and $\overline{y} = \max(0, 0, 0.25) = 0.25$. In other words, the actual range is $\mathbf{y} = [0, 0.25]$.

Let us now apply straightforward interval computations. A compiler will parse the function into the following sequence of computational steps:

• we start with $x_1$;
• then, we compute $x_2 = x_1 \cdot x_1$;
• finally, we compute $y = x_1 - x_2$.
According to straightforward interval computations,

• we start with $\mathbf{x}_1 = [0, 1]$;
• then, we compute $\mathbf{x}_2 = \mathbf{x}_1 \cdot \mathbf{x}_1$;
• finally, we compute $\mathbf{Y} = \mathbf{x}_1 - \mathbf{x}_2$.

Here, $\mathbf{x}_2 = [0, 1] \cdot [0, 1] = [\min(0 \cdot 0, 0 \cdot 1, 1 \cdot 0, 1 \cdot 1), \max(0 \cdot 0, 0 \cdot 1, 1 \cdot 0, 1 \cdot 1)] = [0, 1]$, and so $\mathbf{Y} = [0, 1] - [0, 1] = [0 - 1, 1 - 0] = [-1, 1]$. The resulting interval is an enclosure for the actual range $[0, 0.25]$, but it is much wider than this range. In interval computations, we say that this enclosure has excess width.
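The step-by-step replacement of real operations by interval operations can be reproduced directly. The following sketch assumes intervals are stored as `(lo, hi)` tuples and uses two hypothetical helper functions, `mul` and `sub`, that implement the interval formulas given earlier (without outward rounding).

```python
def mul(a, b):
    # interval product: min and max over the four endpoint products
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def sub(a, b):
    # interval difference: [a_lo - b_hi, a_hi - b_lo]
    return (a[0] - b[1], a[1] - b[0])


x1 = (0.0, 1.0)
x2 = mul(x1, x1)      # step 1: x2 = x1 * x1
Y = sub(x1, x2)       # step 2: Y = x1 - x2
print(x2, Y)          # (0.0, 1.0) (-1.0, 1.0)

# Sampling confirms that Y encloses the actual values, but with excess width.
samples = [k / 100.0 for k in range(101)]
values = [x - x * x for x in samples]
print(min(values), max(values))   # 0.0 0.25
```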
Reason for excess width. In the above example, it is easy to see why we have excess width. The range $[0, 1]$ for $x_2$ is actually exact. However, when we compute the range for $y$ as the difference $x_1 - x_2$, we use the general interval computations formulas which assume that $x_1$ and $x_2$ can independently take any values from the corresponding intervals $\mathbf{x}_1$ and $\mathbf{x}_2$ – i.e., all pairs $(x_1, x_2) \in \mathbf{x}_1 \times \mathbf{x}_2$ are possible. In reality, $x_2 = x_1^2$, so only the pairs with $x_2 = x_1^2$ are possible.
Interval computations go beyond straightforward technique. People who are vaguely familiar with interval computations sometimes erroneously assume that the above straightforward ('naive') technique is all there is in interval computations. In conference presentations (and even in published papers), one often encounters a statement: 'I tried interval computations, and it did not work.' What this statement usually means is that they tried the above straightforward approach and – not surprisingly – it did not work well. In reality, interval computation is not a single algorithm, but a problem for which many different techniques exist. Let us now describe some of these techniques.

Centered form. One such technique is the centered form technique. This technique is based on the same Taylor series expansion ideas as linearization. We start by representing each interval $\mathbf{x}_i = [\underline{x}_i, \overline{x}_i]$ in the form $[\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$, where $\tilde{x}_i = (\underline{x}_i + \overline{x}_i)/2$ is the midpoint of the interval $\mathbf{x}_i$ and $\Delta_i = (\overline{x}_i - \underline{x}_i)/2$ is the half-width of this interval. After that, we use the Taylor expansion. In linearization, we simply ignore quadratic and higher order terms. Here, instead, we use the Taylor form with a remainder term. Specifically, the centered form is based on the formula

$$f(x_1, \ldots, x_n) = f(\tilde{x}_1, \ldots, \tilde{x}_n) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\eta_1, \ldots, \eta_n) \cdot (x_i - \tilde{x}_i),$$

where each $\eta_i$ is some value from the interval $\mathbf{x}_i$. Since $\eta_i \in \mathbf{x}_i$, the value of the $i$th derivative belongs to the interval range of this derivative on these intervals. We also know that $x_i - \tilde{x}_i \in [-\Delta_i, \Delta_i]$. Thus, we can conclude that

$$f(x_1, \ldots, x_n) \in f(\tilde{x}_1, \ldots, \tilde{x}_n) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\mathbf{x}_1, \ldots, \mathbf{x}_n) \cdot [-\Delta_i, \Delta_i].$$

To compute the ranges of the partial derivatives, we can use straightforward interval computations.
Example. Let us illustrate this method on the above example of estimating the range of the function $f(x_1) = x_1 - x_1^2$ over the interval $[0, 1]$. For this interval, the midpoint is $\tilde{x}_1 = 0.5$; at this midpoint, $f(\tilde{x}_1) = 0.25$. The half-width is $\Delta_1 = 0.5$. The only partial derivative here is $\partial f/\partial x_1 = 1 - 2x_1$; its range on $[0, 1]$ is equal to $1 - 2 \cdot [0, 1] = [-1, 1]$. Thus, we get the following enclosure for the desired range $\mathbf{y}$:

$$\mathbf{y} \subseteq \mathbf{Y} = 0.25 + [-1, 1] \cdot [-0.5, 0.5] = 0.25 + [-0.5, 0.5] = [-0.25, 0.75].$$

This enclosure is narrower than the 'naive' estimate $[-1, 1]$, but it still contains excess width.
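For a function of one variable, the centered form is short enough to write out. In the sketch below, the names `centered_form`, `f`, and `df_range` are hypothetical; the derivative range $1 - 2 \cdot [\underline{x}_1, \overline{x}_1]$ is hard-coded for the running example rather than obtained by a general interval evaluation, and rounding errors are ignored.

```python
def mul(a, b):
    # interval product of (lo, hi) tuples
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def centered_form(f, df_range, box):
    """Enclosure f(mid) + f'(box) * [-half, half] for a one-variable function."""
    lo, hi = box
    mid = 0.5 * (lo + hi)          # midpoint of the interval
    half = 0.5 * (hi - lo)         # half-width of the interval
    spread = mul(df_range(box), (-half, half))
    return (f(mid) + spread[0], f(mid) + spread[1])


def f(x):
    return x - x * x


def df_range(box):
    # range of the derivative 1 - 2*x over the interval: 1 - 2 * [lo, hi]
    return (1.0 - 2.0 * box[1], 1.0 - 2.0 * box[0])


print(centered_form(f, df_range, (0.0, 1.0)))   # (-0.25, 0.75)
```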
How can we get better estimates? In the centered form, we, in effect, ignored quadratic and higher order terms, i.e., terms of the type

$$\frac{\partial^2 f}{\partial x_i \, \partial x_j} \cdot \Delta x_i \cdot \Delta x_j.$$

When the estimate is not accurate enough, it means that this ignored term is too large. There are two ways to reduce the size of the ignored term:

• We can try to decrease this quadratic term.
• We can try to explicitly include higher order terms in the Taylor expansion formula, so that the remainder term will be proportional to, say, $\Delta x_i^3$ and thus be much smaller.

Let us describe these two ideas in detail.
First idea: bisection. Let us first describe the situation in which we try to minimize the second-order remainder term. In the above expression for this term, we cannot change the second derivative. The only thing we can decrease is the difference $\Delta x_i = x_i - \tilde{x}_i$ between the actual value and the midpoint. This value is bounded by the half-width $\Delta_i$ of the box. So, to decrease this value, we can subdivide the original box into several narrower subboxes. Usually, we divide it into two subboxes, so this subdivision is called bisection. The range over the whole box is equal to the union of the ranges over all the subboxes. The width of each subbox is smaller, so we get smaller $\Delta x_i$ and, hopefully, more accurate estimates for ranges over each of these subboxes. Then, we take the union of the ranges over subboxes.

Example. Let us illustrate this idea on the above $x_1 - x_1^2$ example. In this example, we divide the original interval $[0, 1]$ into two subintervals $[0, 0.5]$ and $[0.5, 1]$. For both subintervals, the half-width is $\Delta_1 = 0.25$. In the first subinterval, the midpoint is $\tilde{x}_1 = 0.25$, so $f(\tilde{x}_1) = 0.25 - 0.0625 = 0.1875$. The range of the derivative is equal to $1 - 2 \cdot [0, 0.5] = 1 - [0, 1] = [0, 1]$; hence, we get an enclosure $0.1875 + [0, 1] \cdot [-0.25, 0.25] = [-0.0625, 0.4375]$. For the second subinterval, $\tilde{x}_1 = 0.75$, so $f(0.75) = 0.1875$. The range of the derivative is $1 - 2 \cdot [0.5, 1] = [-1, 0]$; hence, we get an enclosure $0.1875 + [-1, 0] \cdot [-0.25, 0.25] = [-0.0625, 0.4375]$. The union of these two enclosures is the same interval $[-0.0625, 0.4375]$. This enclosure is much more accurate than the enclosure $[-0.25, 0.75]$ obtained for the undivided interval.
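The effect of subdivision can be checked numerically. The sketch below applies the centered form to each of `pieces` equal subintervals and takes the union of the results; the function names and the choice of equal-width pieces are assumptions made for illustration, the derivative range is hard-coded for $f(x_1) = x_1 - x_1^2$, and rounding errors are ignored.

```python
def f(x):
    return x - x * x


def derivative_range(lo, hi):
    # range of 1 - 2*x over [lo, hi]
    return (1.0 - 2.0 * hi, 1.0 - 2.0 * lo)


def centered_form(lo, hi):
    mid = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo)
    d_lo, d_hi = derivative_range(lo, hi)
    # [d_lo, d_hi] * [-half, half] is the symmetric interval [-radius, radius]
    radius = max(abs(d_lo), abs(d_hi)) * half
    return (f(mid) - radius, f(mid) + radius)


def subdivided_enclosure(lo, hi, pieces):
    """Union of the centered-form enclosures over equal subintervals."""
    width = (hi - lo) / pieces
    parts = [centered_form(lo + k * width, lo + (k + 1) * width)
             for k in range(pieces)]
    return (min(p[0] for p in parts), max(p[1] for p in parts))


print(subdivided_enclosure(0.0, 1.0, 1))   # (-0.25, 0.75)
print(subdivided_enclosure(0.0, 1.0, 2))   # (-0.0625, 0.4375)
print(subdivided_enclosure(0.0, 1.0, 8))   # narrower still, and still contains [0, 0.25]
```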
Bisection: general comment. The more subboxes we consider, the smaller $\Delta x_i$ and thus the more accurate the corresponding enclosures. However, once we have more boxes, we need to spend more time processing these boxes. Thus, we have a trade-off between computation time and accuracy: the more computation time we allow, the more accurate estimates we will be able to compute.

Additional idea: monotonicity checking. If the function $f(x_1, \ldots, x_n)$ is monotonic over the original box $\mathbf{x}_1 \times \cdots \times \mathbf{x}_n$, then we can easily compute its exact range. Since we used the centered form for the original box, this probably means that on that box, the function is not monotonic: for example, with respect to $x_1$, it may be increasing at some points in this box and decreasing at other points. However, as we divide the original box into smaller subboxes, it is quite possible that at least some of these subboxes will be outside the areas where the derivatives are 0, and thus the function $f(x_1, \ldots, x_n)$ will be monotonic. So, after we subdivide the box into subboxes, we should first check monotonicity on each of these subboxes – and if the function is monotonic, we can easily compute its range.
In calculus terms, a function is increasing with respect to $x_i$ if its partial derivative $\partial f/\partial x_i$ is nonnegative everywhere on this subbox. Thus, to check monotonicity, we should find the range $[\underline{y}_i, \overline{y}_i]$ of this derivative. (We need to do it anyway to compute the centered form expression.)

• If $\underline{y}_i \ge 0$, this means that the derivative is everywhere nonnegative and thus the function $f$ is increasing in $x_i$.
• If $\overline{y}_i \le 0$, this means that the derivative is everywhere nonpositive and thus the function $f$ is decreasing in $x_i$.

If $\underline{y}_i < 0 < \overline{y}_i$, then we have to use the centered form.

If the function is monotonic (e.g., increasing) only with respect to some of the variables $x_i$, then

• to compute $\overline{y}$, it is sufficient to consider only the value $x_i = \overline{x}_i$, and
• to compute $\underline{y}$, it is sufficient to consider only the value $x_i = \underline{x}_i$.

For such subboxes, we reduce the original problem to two problems with fewer variables, problems which are thus easier to solve.
Example. For the example $f(x_1) = x_1 - x_1^2$, the partial derivative is equal to $1 - 2 \cdot x_1$. On the first subbox $[0, 0.5]$, the range of this derivative is $1 - 2 \cdot [0, 0.5] = [0, 1]$. Thus, the derivative is always nonnegative, the function is increasing on this subbox, and its range on this subbox is equal to $[f(0), f(0.5)] = [0, 0.25]$. On the second subbox $[0.5, 1]$, the range of the derivative is $1 - 2 \cdot [0.5, 1] = [-1, 0]$. Thus, the derivative is always nonpositive, the function is decreasing on this subbox, and its range on this subbox is equal to $[f(1), f(0.5)] = [0, 0.25]$. The union of these two ranges is $[0, 0.25]$ – the exact range.
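The monotonicity test combines naturally with the centered form: use the endpoint values when the derivative range does not change sign, and fall back to the centered form otherwise. The following one-variable sketch uses hypothetical function names, hard-codes the derivative range for the running example, and ignores rounding errors.

```python
def f(x):
    return x - x * x


def derivative_range(lo, hi):
    # range of 1 - 2*x over [lo, hi]
    return (1.0 - 2.0 * hi, 1.0 - 2.0 * lo)


def range_with_monotonicity(lo, hi):
    d_lo, d_hi = derivative_range(lo, hi)
    if d_lo >= 0:                 # derivative nonnegative: f is increasing
        return (f(lo), f(hi))
    if d_hi <= 0:                 # derivative nonpositive: f is decreasing
        return (f(hi), f(lo))
    # sign of the derivative is not determined: use the centered form
    mid = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo)
    radius = max(abs(d_lo), abs(d_hi)) * half
    return (f(mid) - radius, f(mid) + radius)


left = range_with_monotonicity(0.0, 0.5)
right = range_with_monotonicity(0.5, 1.0)
union = (min(left[0], right[0]), max(left[1], right[1]))
print(left, right, union)                   # (0.0, 0.25) (0.0, 0.25) (0.0, 0.25)
print(range_with_monotonicity(0.0, 1.0))    # (-0.25, 0.75): falls back to the centered form
```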
Comment. We got the exact range because of the simplicity of our example, in which the extreme point 0.5 of the function $f(x_1) = x_1 - x_1^2$ is exactly in the middle of the interval $[0, 1]$. Thus, when we divide the box in two, both subboxes have the monotonicity property. In the general case, the extremal point will be inside one of the subboxes, so we will have excess width.

General Taylor techniques. As we have mentioned, another way to get more accurate estimates is to use so-called Taylor techniques, i.e., to explicitly consider second-order and higher order terms in the Taylor expansion (see, e.g., [42–44] and references therein). Let us illustrate the main ideas of Taylor analysis on the case when we allow second-order terms. In this case, the formula with a remainder takes the form

$$f(x_1, \ldots, x_n) = f(\tilde{x}_1, \ldots, \tilde{x}_n) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\tilde{x}_1, \ldots, \tilde{x}_n) \cdot (x_i - \tilde{x}_i) + \frac{1}{2} \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i \, \partial x_j}(\eta_1, \ldots, \eta_n) \cdot (x_i - \tilde{x}_i) \cdot (x_j - \tilde{x}_j).$$

Thus, we get the enclosure

$$f(\mathbf{x}_1, \ldots, \mathbf{x}_n) \subseteq f(\tilde{x}_1, \ldots, \tilde{x}_n) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\tilde{x}_1, \ldots, \tilde{x}_n) \cdot [-\Delta_i, \Delta_i] + \frac{1}{2} \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i \, \partial x_j}(\mathbf{x}_1, \ldots, \mathbf{x}_n) \cdot [-\Delta_i, \Delta_i] \cdot [-\Delta_j, \Delta_j].$$
Example. Let us illustrate this idea on the above example of $f(x_1) = x_1 - x_1^2$. Here, $\tilde{x}_1 = 0.5$, so $f(\tilde{x}_1) = 0.25$ and $\partial f/\partial x_1(\tilde{x}_1) = 1 - 2 \cdot 0.5 = 0$. The second derivative is equal to $-2$, so the Taylor estimate takes the form $\mathbf{Y} = 0.25 - [-0.5, 0.5]^2$.
Strictly speaking, if we interpret $\Delta x_1^2$ as $\Delta x_1 \cdot \Delta x_1$ and use the formulas of interval multiplication, we get the interval $[-0.5, 0.5] \cdot [-0.5, 0.5] = [-0.25, 0.25]$, and thus the range $\mathbf{Y} = 0.25 - [-0.25, 0.25] = [0, 0.5]$ with excess width. However, we can view $x^2$ as a special function, for which the range over $[-0.5, 0.5]$ is known to be $[0, 0.25]$. In this case, the above enclosure $0.25 - [0, 0.25] = [0, 0.25]$ is actually the exact range.
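The difference between multiplying an interval by itself and squaring it as a special function is easy to see in code. In this sketch, `mul` and `sqr` are hypothetical helper names operating on `(lo, hi)` tuples, and rounding errors are ignored.

```python
def mul(a, b):
    # general interval product: treats the two factors as independent
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def sqr(a):
    # exact range of t**2 for t in [lo, hi]
    lo, hi = a
    if lo >= 0:
        return (lo * lo, hi * hi)
    if hi <= 0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


dx = (-0.5, 0.5)

product = mul(dx, dx)                              # (-0.25, 0.25)
Y_product = (0.25 - product[1], 0.25 - product[0])
print(Y_product)                                   # (0.0, 0.5), with excess width

square = sqr(dx)                                   # (0.0, 0.25)
Y_square = (0.25 - square[1], 0.25 - square[0])
print(Y_square)                                    # (0.0, 0.25), the exact range
```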
Taylor methods: general comment. The more terms we consider in the Taylor expansion, the smaller the remainder term and thus the more accurate the corresponding enclosures. However, once we have more terms, we need to spend more time computing these terms. Thus, for Taylor methods, we also have a trade-off between computation time and accuracy: the more computation time we allow, the more accurate estimates we will be able to compute.

An alternative version of affine and Taylor arithmetic. The main idea of Taylor methods is to approximate the given function $f(x_1, \ldots, x_n)$ by a polynomial of a small order plus an interval remainder term. In these terms, straightforward interval computations can be viewed as 0th order Taylor methods in which all we have is the corresponding interval (or, equivalently, the constant term plus the remainder interval). To compute this interval, we repeated the computation of $f$ step by step, replacing operations with numbers by operations with intervals. We can do the same for higher order Taylor expansions as well. Let us illustrate how this can be done for the first-order Taylor terms. We start with the expressions $x_i = \tilde{x}_i - \Delta x_i$. Then, at each step, we keep a term of the type $a = \tilde{a} + \sum_{i=1}^{n} a_i \cdot \Delta x_i + \mathbf{a}$. (To be more precise, we keep the coefficients $\tilde{a}$ and $a_i$ and the interval $\mathbf{a}$.) Addition and subtraction of such terms are straightforward:

$$\left(\tilde{a} + \sum_{i=1}^{n} a_i \cdot \Delta x_i + \mathbf{a}\right) + \left(\tilde{b} + \sum_{i=1}^{n} b_i \cdot \Delta x_i + \mathbf{b}\right) = (\tilde{a} + \tilde{b}) + \sum_{i=1}^{n} (a_i + b_i) \cdot \Delta x_i + (\mathbf{a} + \mathbf{b});$$

$$\left(\tilde{a} + \sum_{i=1}^{n} a_i \cdot \Delta x_i + \mathbf{a}\right) - \left(\tilde{b} + \sum_{i=1}^{n} b_i \cdot \Delta x_i + \mathbf{b}\right) = (\tilde{a} - \tilde{b}) + \sum_{i=1}^{n} (a_i - b_i) \cdot \Delta x_i + (\mathbf{a} - \mathbf{b}).$$

For multiplication, we add terms proportional to $\Delta x_i \cdot \Delta x_j$ to the interval part:

$$\left(\tilde{a} + \sum_{i=1}^{n} a_i \cdot \Delta x_i + \mathbf{a}\right) \cdot \left(\tilde{b} + \sum_{i=1}^{n} b_i \cdot \Delta x_i + \mathbf{b}\right) = (\tilde{a} \cdot \tilde{b}) + \sum_{i=1}^{n} (\tilde{a} \cdot b_i + \tilde{b} \cdot a_i) \cdot \Delta x_i + \left(\tilde{a} \cdot \mathbf{b} + \tilde{b} \cdot \mathbf{a} + \sum_{i=1}^{n} a_i \cdot b_i \cdot [0, \Delta_i^2] + \sum_{i=1}^{n} \sum_{j \ne i} a_i \cdot b_j \cdot [-\Delta_i, \Delta_i] \cdot [-\Delta_j, \Delta_j]\right).$$

At the end, we get an expression of the above type for the desired quantity $y$: $y = \tilde{y} + \sum_{i=1}^{n} y_i \cdot \Delta x_i + \mathbf{y}$. We already know how to compute the range of a linear function, so we get the following enclosure for the final range: $\mathbf{Y} = \tilde{y} + [-\Delta, \Delta] + \mathbf{y}$, where $\Delta = \sum_{i=1}^{n} |y_i| \cdot \Delta_i$.
Example. For $f(x_1) = x_1 - x_1^2$, we first compute $x_2 = x_1^2$ and then $y = x_1 - x_2$. We start with the interval $x_1 = \tilde{x}_1 - \Delta x_1 = 0.5 + (-1) \cdot \Delta x_1 + [0, 0]$. On the next step, we compute the square of this expression. This square is equal to $0.25 - \Delta x_1 + \Delta x_1^2$. Since $\Delta x_1 \in [-0.5, 0.5]$, we conclude that $\Delta x_1^2 \in [0, 0.25]$ and thus that $x_2 = 0.25 + (-1) \cdot \Delta x_1 + [0, 0.25]$. For $y = x_1 - x_2$, we now have $y = (0.5 - 0.25) + ((-1) - (-1)) \cdot \Delta x_1 + ([0, 0] - [0, 0.25]) = 0.25 + [-0.25, 0] = [0, 0.25]$. This is actually the exact range for the desired function $f(x_1)$.
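The first-order arithmetic described above can be sketched as a small class that stores the constant term, the linear coefficients, and the interval remainder. All names here (`Form`, `DELTAS`, `variable`, `to_interval`, and the helper functions) are hypothetical. As a conservative assumption that goes beyond the multiplication formula in the text, the sketch also bounds the products that involve the remainder intervals themselves; these extra terms are zero in the example. Rounding errors are ignored.

```python
from dataclasses import dataclass

DELTAS = [0.5]          # half-widths Delta_i of the input intervals (one variable here)


def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])


def isub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def imul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def iscale(c, a):
    return (c * a[0], c * a[1]) if c >= 0 else (c * a[1], c * a[0])


@dataclass
class Form:
    c: float        # constant term
    lin: list       # coefficients of Delta x_i
    rem: tuple      # interval remainder (lo, hi)

    def __sub__(self, other):
        lin = [p - q for p, q in zip(self.lin, other.lin)]
        return Form(self.c - other.c, lin, isub(self.rem, other.rem))

    def __mul__(self, other):
        n = len(DELTAS)
        lin = [self.c * other.lin[i] + other.c * self.lin[i] for i in range(n)]
        rem = iadd(iscale(self.c, other.rem), iscale(other.c, self.rem))
        for i in range(n):
            for j in range(n):
                coef = self.lin[i] * other.lin[j]
                if i == j:      # Delta x_i ** 2 lies in [0, Delta_i ** 2]
                    term = iscale(coef, (0.0, DELTAS[i] ** 2))
                else:           # Delta x_i * Delta x_j lies in [-D_i * D_j, D_i * D_j]
                    dd = DELTAS[i] * DELTAS[j]
                    term = iscale(coef, (-dd, dd))
                rem = iadd(rem, term)
        # conservative extra terms (assumption): linear part times remainder, etc.
        ra = sum(abs(self.lin[i]) * DELTAS[i] for i in range(n))
        rb = sum(abs(other.lin[i]) * DELTAS[i] for i in range(n))
        rem = iadd(rem, imul((-ra, ra), other.rem))
        rem = iadd(rem, imul((-rb, rb), self.rem))
        rem = iadd(rem, imul(self.rem, other.rem))
        return Form(self.c * other.c, lin, rem)


def variable(i, mid):
    # x_i = mid - Delta x_i, following the sign convention of the text
    lin = [0.0] * len(DELTAS)
    lin[i] = -1.0
    return Form(mid, lin, (0.0, 0.0))


def to_interval(y):
    spread = sum(abs(y.lin[i]) * DELTAS[i] for i in range(len(DELTAS)))
    return (y.c - spread + y.rem[0], y.c + spread + y.rem[1])


x1 = variable(0, 0.5)
x2 = x1 * x1
y = x1 - x2
print(to_interval(y))    # encloses [0, 0.25]; here it equals that range
```

Because the linear coefficients of $x_1$ and $x_2$ cancel exactly in the subtraction, the dependence between the two quantities is no longer lost, which is why the result is tighter than the naive $[-1, 1]$.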
1.10 Applications of Interval Computations

General overview. Interval computations have been used in almost all areas of science and engineering in which we need guaranteed results, ranging from space exploration to chemical engineering to robotics to supercollider design. Many applications are listed in [37, 45]; others are described in numerous books and articles (many of which are cited on the interval computations Web site [36]). Many important applications are described in the interval-related chapters of this handbook. Most of these applications use special software tools and packages specifically designed for interval computations (see, e.g., [46]); a reasonably current list of such tools is available on the interval Web site [36].

Applications to control. One of the areas where guaranteed bounds are important is the area of control. Robust control methods, i.e., methods which stabilize a system (known with interval uncertainty) for all possible values of the parameters from the corresponding intervals, are presented, e.g., in [47, 48].

Applications to optimization: practical need. As we have mentioned earlier, one of the main objectives of engineering is to find the alternative which is the best (in some reasonable sense). In many real-life situations, we have a precise description of what is the best; i.e., we have an objective function which assigns to each alternative $x = (x_1, \ldots, x_n)$ a value $F(x_1, \ldots, x_n)$, characterizing the overall quality of this alternative, and our goal is to find the alternative for which this quality metric attains the largest possible value. In mathematical terms, we want to find the maximum $M$ of a function $F(x_1, \ldots, x_n)$ on a given set $S$, and we are also interested in finding out where exactly this maximum is attained.

Applications to optimization: idea. The main idea of using interval computations in optimization is as follows. If we compute the value of $F$ at several points from $S$ and then take the maximum $m$ of the computed values, then we can be sure that the maximum $M$ over all points from $S$ is not smaller than $m$: $m \le M$. Thus, if we divide the original set into subboxes and on one of these subboxes the upper endpoint $\overline{y}$ of the range $[\underline{y}, \overline{y}]$ of $F$ is smaller than $m$, then we can guarantee that the desired maximum does not occur on this subbox. Thus, this subbox can be excluded from the future search. This idea is implemented as the following branch-and-bound algorithm.
Applications to optimization: simple algorithm. For simplicity, let us describe this algorithm for the case when the original set $S$ is a box. On each step of this algorithm, we have:

• a collection of subboxes,
• interval enclosures for the range of $F$ on each subbox, and
• a current lower bound $m$ for the desired maximum $M$.

We start with the original box; as the initial estimate $m$, we take, e.g., the value of $F$ at the midpoint of the original box. On each step, we subdivide one or several of the existing subboxes into several new ones. For each new subbox, we compute the value of $F$ at its midpoint; then, as a new bound $m$, we take the maximum of the old bound and of these new results. For each new subbox, we use interval computations to compute the enclosure $[\underline{Y}, \overline{Y}]$ for the range. If $\overline{Y} < m$, then the corresponding subbox is dismissed. This procedure is repeated until all the subboxes concentrate in a small vicinity of a single point (or of a few points); this point is the desired maximum.
Example. Let us show how this algorithm will find the maximum of a function $F(x_1) = x_1 - x_1^2$ on the interval $[0, 1]$. We start with the midpoint value $m = 0.5 - 0.5^2 = 0.25$, so we know that $M \ge 0.25$. For simplicity, let us use the centered form to compute the range of $F$. On the entire interval, as we have shown earlier, we get the enclosure $[-0.25, 0.75]$.
Let us now subdivide this box. In the computer, all the numbers are binary, so the easiest division is by 2, and the easiest subdivision of a box is bisection (division of one of the intervals into two equal subintervals). Since we use the decimal system, it is easier for us to divide by 5, so let us divide the original box into five subboxes $[0, 0.2]$, $[0.2, 0.4]$, . . . , $[0.8, 1]$. All the values at midpoints are $\le m$, so the new value of $m$ is still 0.25. The enclosure over $[0, 0.2]$ is $(0.1 - 0.1^2) + (1 - 2 \cdot [0, 0.2]) \cdot [-0.1, 0.1] = 0.09 + [-0.1, 0.1] = [-0.01, 0.19]$. Since $0.19 < 0.25$, this subbox is dismissed. Similarly, the subbox $[0.8, 1]$ will be dismissed. For the box $[0.2, 0.4]$, the enclosure is $(0.3 - 0.3^2) + (1 - 2 \cdot [0.2, 0.4]) \cdot [-0.1, 0.1] = 0.21 + [-0.06, 0.06] = [0.15, 0.27]$. Since $m = 0.25 < 0.27$, this subbox is not dismissed. Similarly, we keep boxes $[0.4, 0.6]$ and $[0.6, 0.8]$ – a total of three. On the next step, we subdivide each of these three boxes, dismiss some more boxes, etc. After a while, the remaining subboxes will concentrate around the actual maximum point $x = 0.5$.
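The branch-and-bound procedure for this one-variable example can be sketched as follows. The function names, the tolerance `eps`, the cap `max_rounds`, and the stopping rule (stop once the largest upper endpoint is within `eps` of the lower bound `m`) are assumptions chosen for illustration; the derivative range is hard-coded for $F(x_1) = x_1 - x_1^2$, and rounding errors are ignored.

```python
def F(x):
    return x - x * x


def centered_enclosure(lo, hi):
    mid = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo)
    d_lo, d_hi = 1.0 - 2.0 * hi, 1.0 - 2.0 * lo     # range of F'(x) = 1 - 2*x
    radius = max(abs(d_lo), abs(d_hi)) * half
    return (F(mid) - radius, F(mid) + radius)


def maximize(lo, hi, pieces=5, eps=1e-3, max_rounds=50):
    """Return (m, M_upper, boxes): bounds m <= max F <= M_upper and the surviving boxes."""
    m = F(0.5 * (lo + hi))                 # initial lower bound for the maximum
    boxes = [(lo, hi)]
    M_upper = centered_enclosure(lo, hi)[1]
    for _ in range(max_rounds):
        new_boxes = []
        for a, b in boxes:                 # subdivide every surviving box
            w = (b - a) / pieces
            for k in range(pieces):
                s = (a + k * w, a + (k + 1) * w)
                m = max(m, F(0.5 * (s[0] + s[1])))
                new_boxes.append(s)
        # dismiss the subboxes whose upper endpoint is below the current bound m
        boxes = [s for s in new_boxes if centered_enclosure(*s)[1] >= m]
        M_upper = max(centered_enclosure(*s)[1] for s in boxes)
        if M_upper - m <= eps:
            break
    return m, M_upper, boxes


m, M_upper, boxes = maximize(0.0, 1.0)
print(m, M_upper)
print(boxes)          # the surviving subboxes cluster around x = 0.5
```

With `pieces=5`, the first round reproduces the hand computation above: the subboxes $[0, 0.2]$ and $[0.8, 1]$ are dismissed and three subboxes survive.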
Applications to optimization: more sophisticated algorithms. Interval techniques are actually used in the best optimization packages which produce guaranteed results. Of course, these interval methods go beyond the above simple branch-and-bound techniques: e.g., they check for monotonicity to weed out subboxes where local maxima are possible only at the endpoints, they look for solutions to the equation $\partial f/\partial x_i = 0$, etc. (see, e.g., [49, 50]).

Optimization: granularity helps. In the above text, we assumed that we know the exact value of the objective function $F(x)$ for each alternative $x$. In reality, we often have only approximate predictions of this value $F(x)$, with some accuracy $\varepsilon$. In such situations, it does not make sense to waste time and optimize the function beyond this accuracy. For example, in the simplest interval-based optimization algorithm, at each stage, we not only get the lower bound $m$ for the desired maximum. We can also compute the upper bound $\overline{M}$, which can be found as the largest of the upper endpoints $\overline{Y}$ of all subbox enclosures. Thus, $m \le M = \max F(x) \le \overline{M}$. Once we get $\overline{M} - m \le \varepsilon$, we can guarantee that every value from the interval $[m, \overline{M}]$ is $\varepsilon$-close to $M$. Thus, we can produce any alternative from any of the remaining subboxes as a good enough solution. This simple idea can often drastically decrease computation time.
Applications to mathematics. In addition to practical applications, there have been several examples when interval computations help in solving long-standing mathematical open problems. The first such problem was the double-bubble problem. It is well known that of all sets with a given volume, a ball has the smallest surface area. What if we consider two sets of equal volumes, and count the area of both the outside boundaries and the boundary between the two sets? It has been conjectured that the smallest overall area is attained for the ‘double bubble’: we take two spheres, use a plane to cut off the top of one of them, do a similar cut with the second sphere, and bring them together at the cut (so that the boundary between them is a disk). The proof required showing that the area of this configuration is indeed smaller than that of all other possible configurations. This was done by Hass et al. in [51], who computed an interval enclosure $[\underline{Y}, \overline{Y}]$ for the areas of all other configurations and showed that the lower endpoint $\underline{Y}$ is larger than the area $Y_0$ corresponding to the double bubble. Another well-known example is Kepler’s conjecture. Kepler conjectured that the standard way of stacking cannonballs (or oranges), when we place some balls on a planar grid, place the next layer in the holes between them, etc., has the largest possible density. This hypothesis was proved in 1998 by T.C. Hales, who, in particular, used interval computations to prove that many other placements lead to a smaller density [52].
Beyond interval computations, towards general granular computing. In the previous text, we considered situations when we have either probabilistic, or interval, or fuzzy uncertainty.
In practice, we often have all kinds of uncertainty. For example, we may have partial information about probabilities: e.g., instead of the cumulative distribution function (cdf) $F(x) \stackrel{\text{def}}{=} \mathrm{Prob}(\xi \le x)$, we only know bounds $\underline{F}(x)$ and $\overline{F}(x)$ on this cdf. In this case, all we know about the probability distribution is that the actual (unknown) cdf $F(x)$ belongs to the corresponding interval $[\underline{F}(x), \overline{F}(x)]$. This probability-related interval is called a probability box, or a p-box, for short. In data processing, once we know the p-boxes corresponding to the auxiliary quantities $x_i$, we need to find the p-box corresponding to the desired quantity $y = f(x_1, \ldots, x_n)$; such methods are described, e.g., in [53] (see also [54, 55]). Similarly, in fuzzy logic, we considered the case when for every property A and for every value x, we know the exact value of the degree $\mu_A(x)$ to which x satisfies the property. In reality, as we have mentioned, experts can only produce intervals of possible values of their degrees. As a result, interval-valued fuzzy sets more adequately describe expert opinions and thus, often, lead to better applications (see, e.g., [56] as well as the corresponding chapters from this handbook). Overall, we need a combination of all these types of tools, a combination which is able to handle all kinds of granules, a combination termed granular computing (see, e.g., [57]).
Our Hopes One of the main objectives of this handbook is that interested readers learn the techniques corresponding to different parts of granular computing – and when necessary, combine them. We hope that this handbook will further enhance the field of granular computing.
Acknowledgments
This work was supported in part by NSF grant EAR-0225670, by Texas Department of Transportation grant No. 0-5453, and by the Japan Advanced Institute of Science and Technology (JAIST) International Joint Research Grant 2006–2008. This work was partly done during the author’s visit to the Max Planck Institut für Mathematik.
References
[1] V. Kreinovich, A. Lakeyev, J. Rohn, and P. Kahl. Computational Complexity and Feasibility of Data Processing and Interval Computations. Kluwer, Dordrecht, 1997.
[2] S.G. Rabinovich. Measurement Errors and Uncertainty. Theory and Practice. Springer-Verlag, Berlin, 2005.
[3] G. Klir and B. Yuan. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Upper Saddle River, NJ, 1995.
[4] H.T. Nguyen and E.A. Walker. A First Course in Fuzzy Logic. CRC Press, Boca Raton, FL, 2005.
[5] H.T. Nguyen. A note on the extension principle for fuzzy sets. J. Math. Anal. Appl. 64 (1978) 359–380.
[6] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, San Francisco, CA, 1979.
[7] C.H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, MA, 1994.
[8] S. Vavasis. Nonlinear Optimization: Complexity Issues. Oxford University Press, New York, 1991.
[9] S. Ferson, L. Ginzburg, V. Kreinovich, L. Longpré, and M. Aviles. Computing variance for interval data is NP-hard. ACM SIGACT News 33(2) (2002) 108–118.
[10] S. Ferson, L. Ginzburg, V. Kreinovich, L. Longpré, and M. Aviles. Exact bounds on finite populations of interval data. Reliab. Comput. 11(3) (2005) 207–233.
[11] A.A. Gaganov. Computational Complexity of the Range of the Polynomial in Several Variables. M.S. Thesis. Mathematics Department, Leningrad University, Leningrad, USSR, 1981.
[12] A.A. Gaganov. Computational complexity of the range of the polynomial in several variables. Cybernetics 21 (1985) 418–421.
[13] C.P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, New York, 2004.
[14] B. Chokr and V. Kreinovich. How far are we from the complete knowledge: complexity of knowledge acquisition in Dempster-Shafer approach. In: R.R. Yager, J. Kacprzyk, and M. Fedrizzi (eds). Advances in the Dempster-Shafer Theory of Evidence. Wiley, New York, 1994, pp. 555–576.
[15] E.T. Jaynes and G.L. Bretthorst. Probability Theory: The Logic of Science. Cambridge University Press, Cambridge, UK, 2003.
[16] G.J. Klir. Uncertainty and Information: Foundations of Generalized Information Theory. Wiley, Hoboken, NJ, 2005.
[17] D.J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC, Boca Raton, FL, 2004.
[18] H.M. Wadsworth (ed.). Handbook of Statistical Methods for Engineers and Scientists. McGraw-Hill, New York, 1990.
[19] V. Kreinovich, J. Beck, C. Ferregut, A. Sanchez, G.R. Keller, M. Averill, and S.A. Starks. Monte-Carlo-type techniques for processing interval uncertainty, and their potential engineering applications. Reliab. Comput. 13(1) (2007) 25–69.
[20] V. Kreinovich and S. Ferson. A new Cauchy-based black-box technique for uncertainty in risk analysis. Reliab. Eng. Syst. Saf. 85(1–3) (2004) 267–279.
[21] Archimedes. On the measurement of the circle. In: T.L. Heath (ed.), The Works of Archimedes. Cambridge University Press, Cambridge, 1897; Dover edition, 1953, pp. 91–98.
[22] W.H. Young. Sull due funzioni a piu valori constituite dai limiti d’una funzione di variable reale a destra ed a sinistra di ciascun punto. Rend. Acad. Lincei Cl. Sci. Fes. 17(5) (1908) 582–587.
[23] R.C. Young. The algebra of multivalued quantities. Math. Ann. 104 (1931) 260–290.
[24] P.S. Dwyer. Linear Computations. Wiley, New York, 1951.
[25] M. Warmus. Calculus of approximations. Bull. Acad. Pol. Sci. 4(5) (1956) 253–257.
[26] M. Warmus. Approximations and inequalities in the calculus of approximations. Classification of approximate numbers. Bull. Acad. Polon. Sci., Ser. Sci. Math. Astron. Phys. 9 (1961) 241–245.
[27] T. Sunaga. Theory of interval algebra and its application to numerical analysis. In: RAAG Memoirs, Ggujutsu Bunken Fukuykai. Research Association of Applied Geometry (RAAG), Tokyo, Japan, 1958, Vol. 2, pp. 29–46 (547–564).
[28] R.E. Moore. Automatic Error Analysis in Digital Computation. Technical Report, Space Div. Report LMSD84821. Lockheed Missiles and Space Co., Sunnyvale, California, 1959.
[29] R.E. Moore. Interval Arithmetic and Automatic Error Analysis in Digital Computing. Ph.D. Dissertation. Department of Mathematics, Stanford University, Stanford, CA, 1962. Published as Applied Mathematics and Statistics Laboratories Technical Report No. 25.
[30] R.E. Moore. The automatic analysis and control of error in digital computing based on the use of interval numbers. In: L.B. Rall (ed.), Error in Digital Computation. Wiley, New York, 1965, Vol. I, pp. 61–130.
[31] R.E. Moore. Automatic local coordinate transformations to reduce the growth of error bounds in interval computation of solutions of ordinary differential equations. In: L.B. Rall (ed.), Error in Digital Computation. Wiley, New York, 1965, Vol. II, pp. 103–140.
[32] R.E. Moore, W. Strother, and C.T. Yang. Interval Integrals. Technical Report, Space Div. Report LMSD703073. Lockheed Missiles and Space Co., 1960.
[33] R.E. Moore and C.T. Yang. Interval Analysis I. Technical Report, Space Div. Report LMSD285875. Lockheed Missiles and Space Co., 1959.
[34] R.E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, NJ, 1966.
[35] R.E. Moore. Methods and Applications of Interval Analysis. SIAM, Philadelphia, 1979.
[36] Interval computations Web site, http://www.cs.utep.edu/interval-comp, 2008.
[37] L. Jaulin, M. Kieffer, O. Didrit, and E. Walter. Applied Interval Analysis: With Examples in Parameter and State Estimation, Robust Control and Robotics. Springer-Verlag, London, 2001.
[38] S. Markov and K. Okumura. The contribution of T. Sunaga to interval analysis and reliable computing. In: T. Csendes (ed.), Developments in Reliable Computing. Kluwer, Dordrecht, 1999, pp. 167–188.
[39] R.E. Moore. The dawning. Reliab. Comput. 5 (1999) 423–424.
[40] G. Heindl. An Improved Algorithm for Computing the Product of Two Machine Intervals. Interner Bericht IAGMPI 9304. Fachbereich Mathematik, Gesamthochschule Wuppertal, 1993.
[41] C. Hamzo and V. Kreinovich. On average bit complexity of interval arithmetic. Bull. Eur. Assoc. Theor. Comput. Sci. 68 (1999) 153–156.
[42] M. Berz and G. Hoffstätter. Computation and application of Taylor polynomials with interval remainder bounds. Reliab. Comput. 4(1) (1998) 83–97.
[43] A. Neumaier. Taylor forms – use and limits. Reliab. Comput. 9 (2002) 43–79.
[44] N. Revol, K. Makino, and M. Berz. Taylor models and floating-point arithmetic: proof that arithmetic operations are validated in COSY. J. Log. Algebr. Program. 64(1) (2005) 135–154.
[45] R.B. Kearfott and V. Kreinovich (eds). Applications of Interval Computations. Kluwer, Dordrecht, 1996.
[46] R. Hammer, M. Hocks, U. Kulisch, and D. Ratz. Numerical Toolbox for Verified Computing. I. Basic Numerical Problems. Springer-Verlag, Heidelberg, New York, 1993.
[47] B.R. Barmish. New Tools for Robustness of Linear Systems. Macmillan, New York, 1994.
[48] S.P. Bhattacharyya, H. Chapellat, and L. Keel. Robust Control: The Parametric Approach. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[49] E.R. Hansen and G.W. Walster. Global Optimization Using Interval Analysis. MIT Press, Cambridge, MA, 2004.
[50] R.B. Kearfott. Rigorous Global Search: Continuous Problems. Kluwer, Dordrecht, 1996.
[51] J. Hass, M. Hutchings, and R. Schlafly. The double bubble conjecture. Electron. Res. Announc. Am. Math. Soc. 1 (1995) 98–102.
[52] T.C. Hales. A proof of the Kepler conjecture. Ann. Math. 162 (2005) 1065–1185.
[53] S. Ferson. Risk Assessment with Uncertainty Numbers: Risk Calc. CRC Press, Boca Raton, FL, 2002.
[54] V. Kreinovich, L. Longpré, S.A. Starks, G. Xiang, J. Beck, R. Kandathi, A. Nayak, S. Ferson, and J. Hajagos. Interval versions of statistical techniques, with applications to environmental analysis, bioinformatics, and privacy in statistical databases. J. Comput. Appl. Math. 199(2) (2007) 418–423.
[55] V. Kreinovich, G. Xiang, S.A. Starks, L. Longpré, M. Ceberio, R. Araiza, J. Beck, R. Kandathi, A. Nayak, R. Torres, and J. Hajagos. Towards combining probabilistic and interval uncertainty in engineering calculations: algorithms for computing statistics under interval uncertainty, and their computational complexity. Reliab. Comput. 12(6) (2006) 471–501.
[56] J.M. Mendel. Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Prentice Hall, Englewood Cliffs, NJ, 2001.
[57] W. Pedrycz (ed.). Granular Computing: An Emerging Paradigm. Springer-Verlag, New York, 2001.
2 Stochastic Arithmetic as a Model of Granular Computing
René Alt and Jean Vignes
2.1 Introduction
Numerical simulation is used more and more frequently in the analysis of physical phenomena. A simulation requires several phases. The first phase consists of constructing a physical model based on the results of experiments on the phenomena. Next, the physical model is approximated by a mathematical model. Generally, the mathematical model contains algebraic expressions, ordinary or partial differential equations, or other mathematical features which are very complex and cannot be solved analytically. Thus, in the third phase the mathematical model must be transformed into a discrete model which can be solved with numerical methods on a computer. In the final phase the discrete model and the associated numerical methods must be translated into a scientific code by the use of a programming language.
Unfortunately, when a code is run on a computer all the computations are performed using floating-point (FP) arithmetic, which does not deal with real numbers but with ‘machine numbers’ consisting of a finite number of significant figures. Thus the arithmetic of the computer is merely an approximation of exact arithmetic. It no longer respects the fundamental properties of the latter, so that every result provided by the computer contains a round-off error, which is sometimes so large that the result is false. It is therefore essential to validate all computer-generated results. Furthermore, the data used by the scientific code may contain some uncertainties. It is thus also necessary to estimate the influence of data errors on the results provided by the computer.
This chapter consists of two parts. In the first part, after briefly recalling how round-off error propagation results from FP arithmetic, the CESTAC method (Contrôle et Estimation STochastique des Arrondis de Calcul) is summarized. This method is a probabilistic approach to the analysis of round-off error propagation and to the analysis of the influence that uncertainties in data have on computed results. It is presented from both a theoretical and a practical point of view. The CESTAC method gives rise to stochastic arithmetic, which is presented as a model of granular computing in a similar fashion to interval arithmetic and interval analysis [1]. Theoretically, in stochastic arithmetic the granules are Gaussian random variables and the tools working on these granules are the operators working on Gaussian random variables. In practice, stochastic arithmetic is discretized and is termed discrete stochastic arithmetic (DSA). In this case granules of DSA are the samples provided by the CADNA (Control of Accuracy and
Debugging for Numerical Application) library which implements the CESTAC method. The construction of these granules and the tools working on them are detailed in this first part. In the second part, the use of DSA via the CADNA library in three categories of numerical methods is presented. For finite methods, the use of DSA allows the detection of numerical instabilities and provides the number of exact significant digits of the results. For iterative methods the use of DSA allows iterations to be stopped as soon as a satisfactory result is reached and thus provides an optimal (in some sense) termination criterion. Additionally, it also provides the number of exact significant digits in the results. In the case of approximate methods, DSA allows the computation of an optimal step size for the numerical solution of differential equations and the computation of integrals. As in the previous cases, DSA also provides the number of exact significant digits in the results. For each of the three categories, simple but illustrative examples are presented.
2.2 Round-Off Error Propagation Induced by FP Computation
A numerical algorithm is an ordered sequence of $\nu$ operations. For the sake of simplicity it is supposed that the considered algorithm provides a unique result $r \in \mathbb{R}$. When this algorithm is translated into a computer program and executed, FP arithmetic is used. The result obtained always contains an error resulting from round-off error propagation and is different from the exact result $r$. However, it is possible to estimate this error from the round-off error resulting from each FP operator.
2.2.1 Errors Due to FP Operators
Let us consider any value $x \in \mathbb{R}$ in normalized FP form. In this section lowercase letters are used for real numbers and uppercase letters are used for ‘machine numbers.’ The FP operations on machine numbers are denoted, respectively, by $\oplus$, $\ominus$, $\otimes$, $\oslash$. A real number $x$ is then represented using radix $b$ as

$$x = \varepsilon \cdot m \cdot b^{e} \quad \text{with} \quad \frac{1}{b} \le m < 1, \tag{1}$$

where $\varepsilon$ is the sign of $x$, $m$ is an unlimited mantissa, $b$ is the radix, and $e$ is the integer exponent. This real number $x$ is represented on a computer working with $b = 2$ and a finite length of $p$ bits for the mantissa as $X \in \mathbb{F}$, $\mathbb{F}$ being the set of FP values which may be represented on a computer, and expressed as

$$X = \varepsilon \cdot M \cdot 2^{E}, \tag{2}$$

where $M$ is the limited mantissa encoded using $p$ bits, including the hidden bit, and $E$ is the exponent. Then, the absolute round-off error resulting from each FP operator is $X - x = \varepsilon \cdot M \cdot 2^{E} - \varepsilon \cdot m \cdot b^{e}$. In what follows it is supposed that the two exponents $e$ and $E$ are identical, which is the case most of the time, except, e.g., if $x = 1.9999999$ and $X = 2.0000000$. So the difference $X - x$, being caused by rounding, is $X - x = \varepsilon\, 2^{E} (M - m)$, with the finite mantissa $M$ and the infinite mantissa $m$ being identical up to the $p$th bit. Consequently,

• For the assignment operator, this round-off error can be expressed by equation (3):

$$X = x - \varepsilon \cdot 2^{E-p} \cdot \alpha. \tag{3}$$

For the rounding to nearest mode: $\alpha \in [-0.5, 0.5[$.
For the rounding to zero mode: $\alpha \in [0, +1[$.
For the rounding to $-\infty$ or to $+\infty$ mode: $\alpha \in\, ]-1, +1[$.

• For the addition operator $\oplus$, with $x_1 \in \mathbb{R}$, $x_2 \in \mathbb{R}$, $X_1 \in \mathbb{F}$, $X_2 \in \mathbb{F}$:

$$X_i = x_i - \varepsilon_i\, 2^{E_i - p} \alpha_i, \quad i = 1, 2, \tag{4}$$

$$X_1 \oplus X_2 = x_1 + x_2 - \varepsilon_1 2^{E_1 - p} \alpha_1 - \varepsilon_2 2^{E_2 - p} \alpha_2 - \varepsilon_3 2^{E_3 - p} \alpha_3, \tag{5}$$

where $E_3$, $\varepsilon_3$, and $\alpha_3$ are, respectively, the exponent, the sign, and the round-off error resulting from the FP addition.

• For the subtraction operator $\ominus$:

$$X_1 \ominus X_2 = x_1 - x_2 - \varepsilon_1 2^{E_1 - p} \alpha_1 + \varepsilon_2 2^{E_2 - p} \alpha_2 - \varepsilon_3 2^{E_3 - p} \alpha_3. \tag{6}$$

• For the multiplication operator $\otimes$:

$$X_1 \otimes X_2 = x_1 x_2 - \varepsilon_1 2^{E_1 - p} \alpha_1 x_2 - \varepsilon_2 2^{E_2 - p} \alpha_2 x_1 + \varepsilon_1 \varepsilon_2 2^{E_1 + E_2 - 2p} \alpha_1 \alpha_2 - \varepsilon_3 2^{E_3 - p} \alpha_3. \tag{7}$$

In equation (7) the fourth term is of second order in $2^{-p}$. When this term is neglected, the approximation to the first order of the round-off error resulting from the FP multiplication is expressed as equation (8):

$$X_1 \otimes X_2 \simeq x_1 x_2 - \varepsilon_1 2^{E_1 - p} \alpha_1 x_2 - \varepsilon_2 2^{E_2 - p} \alpha_2 x_1 - \varepsilon_3 2^{E_3 - p} \alpha_3. \tag{8}$$

• For the division operator $\oslash$: in the same way as for the multiplication, the approximation to the first order of the round-off error is expressed as equation (9):

$$X_1 \oslash X_2 \simeq \frac{x_1}{x_2} - \varepsilon_1 2^{E_1 - p} \frac{\alpha_1}{x_2} + \varepsilon_2 2^{E_2 - p} \alpha_2 \frac{x_1}{x_2^{2}}. \tag{9}$$
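Equation (3) can be checked numerically for the rounding to nearest mode. The following Python sketch assumes IEEE 754 binary64 numbers (p = 53) and uses exact rational arithmetic from the standard `fractions` module; the helper name `assignment_alpha` is illustrative only.

```python
import math
from fractions import Fraction

P = 53  # mantissa length of IEEE 754 binary64, including the hidden bit


def assignment_alpha(x: Fraction) -> Fraction:
    """Return alpha such that X = x - eps * 2**(E - P) * alpha, where X = float(x).

    x must be nonzero. The exponent E follows the normalization of equation (1):
    X = M * 2**E with 1/2 <= |M| < 1, which is what math.frexp returns.
    """
    X = float(x)                      # assignment with rounding to nearest
    _, E = math.frexp(X)
    eps = 1 if x > 0 else -1          # sign of x
    unit = Fraction(2) ** (E - P)     # exact value of 2**(E - P)
    return (x - Fraction(X)) / (eps * unit)


for value in (Fraction(1, 10), Fraction(1, 3), Fraction(-2, 3), Fraction(7, 5)):
    alpha = assignment_alpha(value)
    print(value, float(alpha))
```

For each of these values the computed α lies in [−0.5, 0.5], as stated for the rounding to nearest mode.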
2.2.2 Error in a Computed Result
Starting from the equations in Section 2.2.1, the absolute round-off error on the computed result $R$ of a code requiring $\nu$ FP operations, including assignments, can be modeled to first order in $2^{-p}$ by equation (10):

$$R = r + \sum_{j=1}^{\nu} g_j(d)\, 2^{-p} \alpha_j, \tag{10}$$

where the $g_j(d)$ are quantities depending exclusively on the data and on the code but independent of the $\alpha_j$, and $r$ and $R$ are the exact result and the computed result, respectively. This formula has been proved in [2, 3].
2.3 The CESTAC Method, a Stochastic Approach for Analyzing Round-Off Error Propagation
In the stochastic approach the basic idea is that during a run of a code round-off errors may be randomly positive or negative with various absolute values. Thus in equation (10), the coefficients $\alpha_j$ may be considered as independent random variables. The distribution law of the $\alpha_j$ has been studied by several authors. First, Hamming [4] and Knuth [5] showed that the most realistic distribution of mantissas is a logarithmic distribution. Then, on this basis, Feldstein and Goodman [6] proved that the round-off errors denoted by the $\alpha_j$ can be considered as random variables uniformly distributed on the intervals previously defined in Section 2.2.1 as soon as the number of bits $p$ of the mantissa is greater than 10. Note that in practice $p \ge 24$. In this approach a computed result $R$ can be considered as a random variable, and the accuracy of this result depends on the characteristics of this random variable, i.e., the mean value $\mu$ and the standard deviation $\sigma$. This means that the higher the value of $\sigma / |\mu|$, the lower the accuracy of $R$. But for estimating $\mu$ and $\sigma$ it is necessary to obtain several samples of the distribution of $R$. Unfortunately, during the computation information on round-off errors is lost. In consequence, how is it possible to obtain several samples of the computed result $R$? The CESTAC method gives an easy answer to this question.
2.3.1 Basic Ideas of the Method
The CESTAC method was first developed by M. La Porte and J. Vignes [7–11] and was later generalized by the latter in [12–19]. The basic idea of the method is to execute the same code $N$ times in a synchronous manner so that round-off error propagation is different each time. In doing this, $N$ samples are obtained using a random rounding mode.
2.3.2 The Random Rounding Mode
The idea of the random rounding mode is that each result $R \in \mathbb{F}$ of an FP operator (assignment, arithmetic operator) which is not an exact FP value is always bounded by two FP values $R^{-}$ and $R^{+}$ obtained, respectively, by rounding down and rounding up, each of them being representative of the exact result. The random rounding consists, for each FP operation or assignment, in choosing the result randomly and with equal probability as $R^{-}$ or $R^{+}$. When a code is executed $N$ times in a synchronous parallel way using this random rounding mode, $N$ samples $R_k$, $k = 1, \ldots, N$, are obtained of each computed result. From these samples, the accuracy of the mean value $\overline{R}$, considered to be the computed result, may be estimated as explained in the following section.
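The random rounding mode can be emulated in software. The sketch below is not the CADNA implementation: it assumes Python 3.9 or later (for `math.nextafter`), computes each operation exactly with rational numbers, and then picks one of the two neighbouring FP values with equal probability. The helper names (`random_round`, `rr_add`, `rr_div`, `harmonic_sum`) are hypothetical, and the N = 3 runs are executed one after another with different seeds rather than synchronously.

```python
import math
import random
from fractions import Fraction


def random_round(exact: Fraction, rng: random.Random) -> float:
    """Return R- or R+ (the FP neighbours of `exact`) with probability 1/2 each."""
    nearest = float(exact)
    if Fraction(nearest) == exact:
        return nearest                       # exact FP value: nothing to round
    if Fraction(nearest) < exact:
        down, up = nearest, math.nextafter(nearest, math.inf)
    else:
        down, up = math.nextafter(nearest, -math.inf), nearest
    return down if rng.random() < 0.5 else up


def rr_add(a: float, b: float, rng: random.Random) -> float:
    """FP addition with random rounding."""
    return random_round(Fraction(a) + Fraction(b), rng)


def rr_div(a: float, b: float, rng: random.Random) -> float:
    """FP division with random rounding."""
    return random_round(Fraction(a) / Fraction(b), rng)


def harmonic_sum(terms: int, rng: random.Random) -> float:
    """Sum 1/1 + 1/2 + ... + 1/terms, every operation being randomly rounded."""
    total = 0.0
    for i in range(1, terms + 1):
        total = rr_add(total, rr_div(1.0, float(i), rng), rng)
    return total


# N = 3 samples of the same computation, each with its own rounding history.
samples = [harmonic_sum(1000, random.Random(seed)) for seed in range(3)]
print(samples)
```

The samples agree in their leading digits and differ only in the last few, which is the information the method exploits to estimate accuracy.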
2.3.3 Modeling the CESTAC Method
From the formalization of the round-off error of the FP operators presented in Section 2.2, a probabilistic model of the round-off error on a computed result obtained with the random rounding mode has been proposed. This model is based on two hypotheses:
– Hyp. 1: The elementary round-off errors $\alpha_j$ of the FP operators are random, independent, uniformly distributed variables.
– Hyp. 2: The approximation of the first order in $2^{-p}$ is legitimate.
Hypothesis 2 means that the terms in $2^{-2p}$, which appear in the expression of the round-off error of FP multiplications and FP divisions, have been neglected. Only the terms in $2^{-p}$ are considered. It has been shown [2, 3] that if the two previous hypotheses hold, each sample $R_k$ obtained by the CESTAC method may be modeled by a random variable $Z$ defined by

$$Z \simeq r + \sum_{i=1}^{\nu} u_i(d)\, 2^{-p} z_i, \qquad z_i \in\, ]-1, +1[, \tag{11}$$

where the $u_i(d)$ are constants, $\nu$ is the number of arithmetic operations, and the $z_i$ are the relative round-off errors on mantissas, considered as independent, centered, and equidistributed variables. Then

• $\overline{R} = E(Z) \simeq r$.
• The distribution of $Z$ is a quasi-Gaussian distribution.

Consequently, to estimate the accuracy of $\overline{R}$ it is legitimate to use Student’s test, which provides a confidence interval for $\overline{R}$, and then to deduce from this interval the number of significant decimal digits $C_{\overline{R}}$ of $\overline{R}$, which is estimated by equation (12):

$$C_{\overline{R}} = \log_{10}\left(\frac{\sqrt{N}\, |\overline{R}|}{\sigma\, \tau_\eta}\right), \tag{12}$$

where

$$\overline{R} = \frac{1}{N} \sum_{k=1}^{N} R_k \quad \text{and} \quad \sigma^{2} = \frac{1}{N-1} \sum_{k=1}^{N} \left(R_k - \overline{R}\right)^{2},$$

and $\tau_\eta$ is the value of Student’s distribution for $N - 1$ degrees of freedom and a probability level $1 - \eta$.
Remark. The statistical property used here is the following. Let $m$ be the unknown mean value of a Gaussian distribution. Then, if $(x_k)$, $k = 1, \ldots, N$, are $N$ measured values satisfying this distribution, $\overline{x}$ is the mean value of the $x_k$, and $\sigma$ is the empirical standard deviation defined as $\sigma = \sqrt{\frac{1}{N-1} \sum_{k=1}^{N} (x_k - \overline{x})^{2}}$, the variable $T = \frac{\overline{x} - m}{\sigma} \sqrt{N}$ satisfies a Student’s distribution with $N - 1$ degrees of freedom. For example, for $N = 3$ and $\eta = 0.05$, i.e., a percentile of 0.975, Student’s table for $N - 1 = 2$ degrees of freedom provides a value $\tau_\eta = 4.303$.
From a theoretical point of view, we may define a new number called a ‘stochastic number,’ which is a Gaussian variable defined by its mean value and its standard deviation. The corresponding operators for addition, subtraction, multiplication, and division define what is called stochastic arithmetic.
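Equation (12) is straightforward to evaluate once the N samples are available. The following Python sketch uses only the standard library; the function name `significant_digits` and the conventions for the degenerate cases (zero standard deviation, zero mean) are assumptions of this illustration. The default τ = 4.303 is the value quoted above and is valid only for N = 3 and η = 0.05.

```python
import math
import statistics


def significant_digits(samples, tau=4.303):
    """Estimate the number of exact significant decimal digits of the mean, eq. (12).

    `tau` must match the number of samples; 4.303 corresponds to N = 3, eta = 0.05.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)      # uses the 1/(N - 1) normalization
    if sigma == 0.0:
        return math.inf                    # convention: all samples identical
    if mean == 0.0:
        return -math.inf                   # convention: no significant digit
    return math.log10(math.sqrt(n) * abs(mean) / (sigma * tau))


print(significant_digits([1.0, 1.001, 0.999]))
```

For the three samples shown, which scatter by about 10⁻³ around 1, the estimate is between two and three significant digits.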
2.4 Stochastic Arithmetic Stochastic arithmetic operates on stochastic numbers and is directly derived from operations on independent Gaussian random variables. Let us present here the main properties of this arithmetic, which are detailed in [20, 21]. From the granular computing point of view a stochastic number is a granule and the stochastic arithmetic is a tool for computing granules.
2.4.1 Definition of the Stochastic Operators
Definition 1. Stochastic numbers (granules). The set of stochastic numbers, denoted $S$, is the set of Gaussian random variables. An element $X \in S$ is defined by $X = (m, \sigma)$, $m$ being the mean value of $X$ and $\sigma$ being its standard deviation. If $X \in S$ and $X = (m, \sigma)$, then $\lambda_\eta$ exists (depending only on $\eta$) such that

$$P\left(X \in I_{\eta,X}\right) = 1 - \eta, \qquad I_{\eta,X} = \left[m - \lambda_\eta \sigma,\; m + \lambda_\eta \sigma\right]. \tag{13}$$

$I_{\eta,X}$ is the confidence interval of $m$ with a probability $(1 - \eta)$. For $\eta = 0.05$, $\lambda_\eta = 1.96$. Then the number of significant digits on $m$ is obtained by

$$C_{\eta,X} = \log_{10}\left(\frac{|m|}{\lambda_\eta \sigma}\right). \tag{14}$$
Definition 2. Stochastic zero. $X \in S$ is a stochastic zero, denoted $\underline{0}$, if and only if

$$C_{\eta,X} \le 0 \quad \text{or} \quad X = (0, 0).$$

Note that if mathematically $f(x) = 0$, then with stochastic arithmetic $F(X) = \underline{0}$.
Definition 3. Stochastic operators (tools working on granules). The four elementary operations of stochastic arithmetic between two stochastic numbers $X_1 = (m_1, \sigma_1)$ and $X_2 = (m_2, \sigma_2)$, denoted by $s{+}$, $s{-}$, $s{\times}$, $s{/}$, are defined by

$$X_1 \; s{+} \; X_2 \stackrel{\text{def}}{=} \left(m_1 + m_2,\ \sqrt{\sigma_1^{2} + \sigma_2^{2}}\right),$$

$$X_1 \; s{-} \; X_2 \stackrel{\text{def}}{=} \left(m_1 - m_2,\ \sqrt{\sigma_1^{2} + \sigma_2^{2}}\right),$$

$$X_1 \; s{\times} \; X_2 \stackrel{\text{def}}{=} \left(m_1 m_2,\ \sqrt{m_2^{2} \sigma_1^{2} + m_1^{2} \sigma_2^{2} + \sigma_1^{2} \sigma_2^{2}}\right),$$

$$X_1 \; s{/} \; X_2 \stackrel{\text{def}}{=} \left(\frac{m_1}{m_2},\ \sqrt{\left(\frac{\sigma_1}{m_2}\right)^{2} + \left(\frac{m_1 \sigma_2}{m_2^{2}}\right)^{2}}\right), \quad \text{with } m_2 \ne 0.$$

Remark. The first three formulas, including stochastic multiplication, are exact. However, the multiplication formula is often simplified by keeping only the first-order terms in $\sigma / m$, i.e., by dropping the term $\sigma_1^{2} \sigma_2^{2}$ [21], but this is acceptable only when this ratio is small, which is not always the case. The formula for the division is defined only to the first-order terms in $\sigma / m$.
Definition 4. Stochastic equality. Equality between two stochastic numbers $X_1$ and $X_2$ is defined as follows: $X_1$ is stochastically equal to $X_2$, denoted by $X_1 \; s{=} \; X_2$, if and only if

$$X_1 \; s{-} \; X_2 = \underline{0} \;\Rightarrow\; |m_1 - m_2| \le \lambda_\eta \sqrt{\sigma_1^{2} + \sigma_2^{2}}. \tag{15}$$

Definition 5. Order relations between stochastic numbers.

• $X_1$ is stochastically greater than $X_2$, denoted $X_1 \; s{>} \; X_2$, if and only if

$$m_1 - m_2 > \lambda_\eta \sqrt{\sigma_1^{2} + \sigma_2^{2}}. \tag{16}$$

• $X_1$ is stochastically greater than or equal to $X_2$, denoted $X_1 \; s{\ge} \; X_2$, if and only if

$$m_1 \ge m_2 \quad \text{or} \quad |m_1 - m_2| \le \lambda_\eta \sqrt{\sigma_1^{2} + \sigma_2^{2}}. \tag{17}$$

Based on these definitions the following properties of stochastic arithmetic have been proved [21, 22]:
1. $m_1 = m_2 \Rightarrow X_1 \; s{=} \; X_2$.
2. $s{=}$ is a reflexive and symmetric relation, but it is not a transitive relation.
3. $X_1 \; s{>} \; X_2 \Rightarrow m_1 > m_2$.
4. $m_1 \ge m_2 \Rightarrow X_1 \; s{\ge} \; X_2$.
5. $s{<}$ is the opposite of $s{>}$.
6. $s{>}$ is a transitive relation.
7. $s{\ge}$ is a reflexive and antisymmetric relation, but it is not a transitive relation.
8. $\underline{0}$ is absorbent; i.e., $\forall X \in S$, $\underline{0} \; s{\times} \; X \; s{=} \; \underline{0}$.
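Definitions 1 to 5 translate directly into a small data type. The Python sketch below is an illustration only: the class name `Stochastic`, the method names, and the conventions for σ = 0 are assumptions, and λ is fixed at 1.96 (η = 0.05). It is not the CADNA library interface.

```python
import math
from dataclasses import dataclass

LAMBDA = 1.96  # lambda_eta for eta = 0.05


@dataclass(frozen=True)
class Stochastic:
    """A stochastic number X = (m, sigma): Gaussian with mean m and std. dev. sigma."""
    m: float
    sigma: float

    def __add__(self, o):       # s+
        return Stochastic(self.m + o.m, math.hypot(self.sigma, o.sigma))

    def __sub__(self, o):       # s-
        return Stochastic(self.m - o.m, math.hypot(self.sigma, o.sigma))

    def __mul__(self, o):       # s*, with the second-order term kept
        var = (o.m * self.sigma) ** 2 + (self.m * o.sigma) ** 2 + (self.sigma * o.sigma) ** 2
        return Stochastic(self.m * o.m, math.sqrt(var))

    def __truediv__(self, o):   # s/, first order in sigma/m
        if o.m == 0:
            raise ZeroDivisionError("mean of the divisor must be nonzero")
        var = (self.sigma / o.m) ** 2 + (self.m * o.sigma / o.m ** 2) ** 2
        return Stochastic(self.m / o.m, math.sqrt(var))

    def digits(self):
        """Number of significant digits C_{eta,X}, equation (14)."""
        if self.m == 0:
            return -math.inf    # convention of this sketch, covers X = (0, 0)
        if self.sigma == 0:
            return math.inf     # convention of this sketch: exact nonzero value
        return math.log10(abs(self.m) / (LAMBDA * self.sigma))

    def is_zero(self):
        """Stochastic zero (Definition 2)."""
        return self.digits() <= 0

    def s_eq(self, o):          # stochastic equality, inequality in (15)
        return abs(self.m - o.m) <= LAMBDA * math.hypot(self.sigma, o.sigma)

    def s_gt(self, o):          # inequality (16)
        return self.m - o.m > LAMBDA * math.hypot(self.sigma, o.sigma)

    def s_ge(self, o):          # condition (17)
        return self.m >= o.m or self.s_eq(o)


p, q, r = Stochastic(0.0, 1.0), Stochastic(1.5, 1.0), Stochastic(3.0, 1.0)
print(p.s_eq(q), q.s_eq(r), p.s_eq(r))
```

The last line illustrates property 2: p is stochastically equal to q and q to r, yet p is not stochastically equal to r, so the relation is not transitive.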
2.4.2 Some Algebraic Structures of Stochastic Arithmetic
As seen above, stochastic numbers are Gaussian random variables with a known mean value and a known standard deviation. Algebraic structures close to those existing in the set of real numbers can be developed for stochastic numbers, but this is beyond the scope of this handbook. As an illustration, a numerical example of the algebraic solution of linear systems of equations with right-hand sides involving stochastic numbers is presented. The aim of this example is to show how a theory can be developed for a better understanding of the properties of the CESTAC method. In particular, it must be noticed that the signs of errors are unknown, but when computing with these errors, operations are done with their signs. Consequently, as errors are represented in the theory by standard deviations of Gaussian variables, a sign must be introduced for them. This is done in the same way that intervals are extended to generalized intervals in which the bounds are not ordered. A stochastic number $(m, \sigma)$ for which $\sigma$ may be negative is called a generalized stochastic number. For a detailed presentation of the theory, see [23–26].
The solution of a linear system with stochastic right-hand sides [24]. We shall use here the special symbol $\dot{+}$ for the arithmetic addition over standard deviations and the special symbol $*$ for the multiplication of standard deviations by scalars. These operations are different from the corresponding ones for numbers, but the use of the same symbol $*$ for the multiplication of standard deviations or stochastic numbers by a scalar causes no confusion. The operations $\dot{+}$ and $*$ induce a special arithmetic on the set $\mathbb{R}^{+}$. We consider a linear system $Ax = b$, such that $A$ is a real $n \times n$ matrix and the right-hand side $b$ is a vector of stochastic numbers. Then the solution $x$ also consists of stochastic numbers, and in consequence, all arithmetic operations (additions and multiplications by scalars) in the expression $Ax$ involve stochastic numbers; therefore, we shall write $A * x$ instead of $Ax$.
Problem. Assume that $A = (a_{ij})$, $i, j = 1, \ldots, n$, $a_{ij} \in \mathbb{R}$, is a real $n \times n$ matrix, and $B = (b, \tau)$ is an $n$-tuple of (generalized) stochastic numbers, such that $b, \tau \in \mathbb{R}^{n}$, $b = (b_1, \ldots, b_n)$, and $\tau = (\tau_1, \ldots, \tau_n)$. We look for a (generalized) stochastic vector $X = (x, \sigma)$, $x, \sigma \in \mathbb{R}^{n}$, i.e., an $n$-tuple of stochastic numbers, such that $A * X = B$.
Solution. The $i$th equation of the system $A * X = B$ reads $a_{i1} * x_1 \,\dot{+}\, \cdots \,\dot{+}\, a_{in} * x_n = b_i$. Obviously, $A * X = B$ reduces to a linear system $Ax = b$ for the vector $x = (x_1, \ldots, x_n)$ of mean values and a system $A * \sigma = \tau$ for the standard deviations $\sigma = (\sigma_1, \ldots, \sigma_n)$. If $A = (a_{ij})$ is non-singular, then $x = A^{-1} b$. We shall next concentrate on the solution of the system $A * \sigma = \tau$ for the standard deviations. The $i$th equation of the system $A * \sigma = \tau$ reads $a_{i1} * \sigma_1 \,\dot{+}\, \cdots \,\dot{+}\, a_{in} * \sigma_n = \tau_i$. It has been proved [21] that this is equivalent to

$$a_{i1}^{2}\, \mathrm{sign}(\sigma_1)\, \sigma_1^{2} + \cdots + a_{in}^{2}\, \mathrm{sign}(\sigma_n)\, \sigma_n^{2} = \mathrm{sign}(\tau_i)\, \tau_i^{2}, \quad i = 1, \ldots, n,$$

with $\mathrm{sign}(\sigma_j) = 1$ if $\sigma_j \ge 0$ and $\mathrm{sign}(\sigma_j) = -1$ if $\sigma_j < 0$. Setting $y_i = \mathrm{sign}(\sigma_i)\, \sigma_i^{2}$ and $c_i = \mathrm{sign}(\tau_i)\, \tau_i^{2}$, we obtain a linear $n \times n$ system $Dy = c$ for $y = (y_i)$, where $D = (a_{ij}^{2})$ and $c = (c_i)$. If $D$ is non-singular, we can solve the system $Dy = c$ for the vector $y$ and then obtain the standard deviation vector $\sigma$ by means of $\sigma_i = \mathrm{sign}(y_i) \sqrt{|y_i|}$. Thus for the solution of the original problem it is necessary and sufficient that both matrices $A = (a_{ij})$ and $D = (a_{ij}^{2})$ are non-singular. Summarizing, to solve $A * X = B$ the following steps are performed:
1. Check the matrices $A = (a_{ij})$ and $D = (a_{ij}^{2})$ for non-singularity.
2. Find the solution for mean values; i.e., solve the linear system $Ax = b$.
3. Find the solution $y = D^{-1} c$ of the linear system $Dy = c$, where $c = (c_i)$ and $c_i = \mathrm{sign}(\tau_i)\, \tau_i^{2}$. Compute $\sigma_i = \mathrm{sign}(y_i) \sqrt{|y_i|}$.
4. The solution of $A * X = B$ is $X = (x, \sigma)$.
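The four steps can be sketched in pure Python. The Gaussian elimination routine `solve` and the wrapper `solve_stochastic` are illustrative helpers, not part of CADNA; the 2 × 2 data below follow the pattern a_ii = i, a_ij = 10^(−i−j) with right-hand-side standard deviations of 10⁻⁴, chosen here only as a small test case.

```python
import math


def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")   # step 1: non-singularity check
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def solve_stochastic(a, b, tau):
    """Solve A * X = B for X = (x, sigma), where B = (b, tau)."""
    x = solve(a, b)                                    # step 2: mean values
    d = [[v * v for v in row] for row in a]            # D = (a_ij ** 2)
    c = [math.copysign(t * t, t) for t in tau]         # c_i = sign(tau_i) * tau_i ** 2
    y = solve(d, c)                                    # step 3: D y = c
    sigma = [math.copysign(math.sqrt(abs(v)), v) for v in y]
    return x, sigma                                    # step 4


a = [[1.0, 1e-3], [1e-3, 2.0]]
b = [sum(row) for row in a]        # chosen so that the mean solution is (1, 1)
tau = [1e-4, 1e-4]
x, sigma = solve_stochastic(a, b, tau)
print(x, sigma)
```

A useful sanity check is to recombine the result: for every row i, the square root of Σ_j a_ij² σ_j² should reproduce τ_i.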
Numerical Experiments

Numerical experiments, using imprecise stochastic data, have been performed to compare the theoretical results with numerical results obtained using the CESTAC method, implemented in the CADNA library (see Section 2.8). As an example, the two solutions obtained for a linear system are reported below. Let $A = (a_{ij})$ be a real matrix such that $a_{ij} = i$ if $i = j$ and $a_{ij} = 10^{-i-j}$ otherwise, $i, j = 1, \ldots, n$. Assume that $B$ is a stochastic vector such that the component $B_i$ is a stochastic number with a mean value $b_i = \sum_{j=1}^{n} a_{ij}$ and a standard deviation equal to $10^{-4}$ for each component. The centers $x_i$ of the components of the solution are thus close to 1 and present no difficulty for their computation.

The theoretical standard deviations are obtained according to the method described in the previous section. First, matrix $D$ is computed from the matrix $A$, then $Dy = c$ is solved, and then $\sigma_i = \operatorname{sign}(y_i)\sqrt{|y_i|}$ is computed. For a correct comparison of the solution provided by the CADNA library and the theoretical solution, accurate values for the standard deviations are obtained as follows. Twenty different vectors $b^{(k)}$, $k = 1, \ldots, 20$, for the right-hand side have been randomly generated, and the corresponding twenty systems $A * X = B^{(k)}$ have been solved. For each component of $B^{(k)}$, the standard deviation of the $N = 3$ samples has been computed with the CADNA software, and then the mean value of the standard deviations has been computed for each component and presented in Table 2.1. As we can see in Table 2.1, the theoretical standard deviations and the computed values are very close.

To conclude this subsection we comment that the theoretical study of the properties of stochastic numbers allows us to obtain a rigorous abstract definition of stochastic numbers with respect to the operations of addition and multiplication by scalars. This theory also allows the solution of algebraic problems with stochastic numbers. Moreover, this provides a possibility of comparing algebraically obtained results with practical applications of stochastic numbers, such as those provided by the CESTAC method [27].

Remark. The authors are grateful to Professor S. Markov from the Bulgarian Academy of Sciences and to Professor J.L. Lamotte from University Pierre et Marie Curie (Paris, France) for their contribution to the above section.
Table 2.1 Theoretical and computed standard deviations

Component   Theoretical standard deviations   Computed standard deviations
 1          9.98e−05                          10.4e−05
 2          4.97e−05                          4.06e−05
 3          3.32e−05                          3.21e−05
 4          2.49e−05                          2.02e−05
 5          1.99e−05                          1.81e−05
 6          1.66e−05                          1.50e−05
 7          1.42e−05                          1.54e−05
 8          1.24e−05                          1.02e−05
 9          1.11e−05                          0.778e−05
10          0.999e−05                         0.806e−05
2.5 Validation and Implementation of the CESTAC Method

2.5.1 Validation and Reliability of the CESTAC Method

The theoretical validation of the CESTAC method is therefore established if and only if the two previous hypotheses hold. But its efficiency in scientific codes can be guaranteed only if its underlying hypotheses hold in practice:

• Concerning hypothesis 1, because of the use of the random rounding mode, round-off errors $\alpha_i$ are random variables; however, in practice they are not rigorously centered, and in this case Student's test leads to a biased estimation of the computed result. It may be thought that the presence of a bias seriously jeopardizes the reliability of the CESTAC method. In fact it has been proved in [13] that it is the ratio $q$ of the bias to the standard deviation $\sigma$ which is the key to the reliability of equation (12). It is shown in [13, 21] that a magnitude of $q$ of several tens induces an error of less than one significant decimal digit on $C_{\bar{R}}$ computed with equation (12). This great robustness of equation (12) is due first to the use of the logarithm and second to the natural robustness of Student's test. Consequently, in practice, even if hypothesis 1 is not exactly satisfied, this is not a drawback for the reliability of equation (12).

• Concerning hypothesis 2, the approximation to first order only concerns multiplications and divisions, because in formulas (5) and (6) for round-off errors in additions or subtractions, the second-order terms, i.e., those in $2^{-2p}$, do not exist. For the first-order approximation to be legitimate, it is shown in [2, 20] that if $\varepsilon_1$ and $\varepsilon_2$ are, respectively, the absolute round-off errors on the operands $X_1 \in F$ and $X_2 \in F$, the following condition must be verified:

$$\max\left(\left|\frac{\varepsilon_1}{X_1}\right|, \left|\frac{\varepsilon_2}{X_2}\right|\right) \ll 1. \tag{18}$$

Hence, the more accurate the computed results, the more legitimate the first-order approximation. However, if a computed result becomes nonsignificant, i.e., if its round-off error is of the same order of magnitude as the result itself, then the first-order approximation may not be legitimate. In other words, with the use of the CESTAC method hypothesis 2 holds when

1. the operands of any multiplication are both significant;
2. the divisor of any division is significant.

As a consequence, validation of the CESTAC method requires that conditions (1) and (2) be checked during the run of the code. Indeed, if (1) or (2) is not satisfied, hypothesis 2 has been violated and the results obtained with equation (12) must be considered unreliable. This control is achieved with the concept of the computational zero described in Section 2.6. In practice it is performed by the CADNA library, which is sketched in Section 2.8.
2.5.2 Implementation of the CESTAC Method

The two main features of the CESTAC method are as follows:

• The random rounding for each arithmetical operation, which consists in randomly choosing either the result rounded up, $\rho^+$, or the result rounded down, $\rho^-$.
• Performing the $N$ runs of the code.

To set these features in context we must consider the period pre-1988, when FP arithmetic was machine dependent, and the period post-1988, when it was standardized by IEEE.
Asynchronous Implementation

Before 1988, FP arithmetic was highly computer dependent. Scientific computers such as IBM, CDC, and CRAY worked with different rounding modes, either with a chopping mode (rounding to zero) or with a rounding to the nearest mode. Sometimes, even on the same computer, some arithmetic operations were performed with the chopping mode and others with the rounding to the nearest mode. At this time an implementation which violates the hypotheses of the method was used in a software named Prosolver [28]. As a consequence this flawed software has been the origin of some criticisms, as in [29, 30], which have been erroneously attributed to the method. This implementation was also used later in the Monte Carlo arithmetic (see [31–33]).

In this implementation, which is called 'asynchronous implementation,' the $N$ runs of a code were performed independently. This means that the code was first run to completion, then run a second time, and so on until the $N$th run. In addition, in the software Prosolver, the random rounding mode consisted in randomly adding ±1 or 0 to the last bit of every FP operation result. This random rounding mode is unsatisfactory because, even when the result of an FP operation is an exact FP value, it is increased or decreased by one unit in the last place (ulp). The main criticisms of this implementation were that the random rounding defined in this way violates theorems about exact rounding, and that when a computation is virulently unstable, but in a way that almost always diverges to the same wrong destination, such a randomized recomputation almost always yields the same wrong result. The correct implementation is described in the following section.
Correct Synchronous Implementation

It is only since 1990 that the standard IEEE 754 FP arithmetic has been available to users. Around the same time scientific languages began to provide users with the capability of overloading operators. With IEEE 754 arithmetic and the overloading statements it is easy to implement the CESTAC method correctly.

• A correct random rounding mode

It was proposed in Section 2.3.2 to choose $\rho^-$ or $\rho^+$ as the result of an FP operator. In practice we use the IEEE 754 rounding toward $+\infty$ and toward $-\infty$. Rounding occurs only when an arithmetic operation has a result that is not exact. Therefore no artificial round-off error is introduced in the computation. The choice of the rounding is at random with an equal probability for the first $N - 1$ samples, with the last one chosen as the opposite of the $(N - 1)$th sample. With this random rounding the theorems on exact rounding are respected.

• Synchronous runs

We have seen previously that to control the reliability of the CESTAC method it is absolutely necessary to detect, during the run of the code, the emergence of computational zeroes. To achieve this it suffices to use the synchronous implementation, which consists of performing each FP operator $N$ times with the random rounding mode before performing the next operation. Thus everything proceeds as if $N$ identical codes were running simultaneously on $N$ synchronized computers, each using the random rounding mode. For each numerical result we therefore have $N$ samples, from which, with equation (12), the number of significant decimal digits of the mean value, considered as the computed result, is estimated. With this implementation a DSA may be defined, which allows the reliability of the CESTAC method to be controlled dynamically during the run of the code. Thus it is possible dynamically to

– control the round-off error propagation of each FP operation,
– detect a loss of accuracy during the computation,
– control the branching statements, and
– detect a violation of hypothesis 2, which guarantees the reliability of the method.
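The random rounding mode and the synchronous execution of one operation can be emulated in Python as a rough sketch. Python cannot switch the hardware rounding mode, so this illustration assumes finite operands and results, recovers the two directed roundings from an exact rational computation, and relies on `math.nextafter` (Python 3.9 or later). The function names `directed_pair`, `random_directions`, and `stochastic_op` are hypothetical and are not part of CADNA.

```python
import math
import operator
import random
from fractions import Fraction

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def directed_pair(a, b, op):
    """Return (result rounded toward -inf, result rounded toward +inf)."""
    f = OPS[op]
    nearest = f(a, b)                     # IEEE 754 round-to-nearest result
    exact = f(Fraction(a), Fraction(b))   # exact rational result
    if Fraction(nearest) == exact:
        return nearest, nearest           # exact result: no rounding occurs
    if Fraction(nearest) < exact:
        return nearest, math.nextafter(nearest, math.inf)
    return math.nextafter(nearest, -math.inf), nearest


def random_directions(n_samples, rng):
    """First N-1 directions at random; the last is opposite to the (N-1)th."""
    dirs = [rng.choice((0, 1)) for _ in range(n_samples - 1)]
    dirs.append(1 - dirs[-1])
    return dirs


def stochastic_op(X, Y, op, rng=random):
    """Apply one FP operator to all N samples before any other operation."""
    dirs = random_directions(len(X), rng)
    return tuple(directed_pair(x, y, op)[d] for x, y, d in zip(X, Y, dirs))


if __name__ == "__main__":
    X = (0.1, 0.1, 0.1)
    Y = (0.2, 0.2, 0.2)
    print(stochastic_op(X, Y, "+"))                 # samples differ in the last bit
    print(stochastic_op((1.0,) * 3, (2.0,) * 3, "+"))  # exact: all samples are 3.0
```

Because an exact result is returned unchanged, no artificial round-off error is injected, which is the property that the Prosolver-style rounding lacked.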
2.6 Discrete Stochastic Arithmetic (DSA)

The concept of the computational zero and the synchronous implementation of the CESTAC method lead to operations on N-tuples referred to as discrete stochastic numbers. Operating on these numbers is termed DSA. The salient properties of this arithmetic, which is detailed in [16, 17, 34], are presented here. From the granular computing point of view, a discrete stochastic number is a granule and the DSA is a tool for computing granules.
2.6.1 Discrete Stochastic Arithmetic Operators

Definition 6. Discrete stochastic numbers (granules). A discrete stochastic number is an N-tuple formed by the $N$ samples provided by the synchronous implementation of the CESTAC method.

Definition 7. Discrete stochastic arithmetic (tools working on granules). DSA operates on discrete stochastic numbers. The result of the four discrete stochastic operators is by definition the result of the corresponding arithmetic operation provided by the CESTAC method. Let $X$, $Y$, and $Z$ be discrete stochastic numbers, and let $\Omega$ be an FP arithmetic operator, $\Omega \in \{\oplus, \ominus, \otimes, \oslash\}$, as defined in Section 2.2:

$$X = (X_1, \ldots, X_N), \quad Y = (Y_1, \ldots, Y_N), \quad Z = (Z_1, \ldots, Z_N).$$

Then any of the four stochastic arithmetic operations $s+$, $s-$, $s\times$, $s/$, denoted $s\Omega$, is defined as

$$Z = X \; s\Omega \; Y \;\Rightarrow\; Z = \left((X_1 \,\Omega\, Y_1)^{\pm}, \ldots, (X_N \,\Omega\, Y_N)^{\pm}\right), \tag{19}$$

where $\pm$ means that the FP operation has been randomly performed with the rounding toward $+\infty$ or toward $-\infty$, as explained previously. Thus any discrete stochastic operator provides a result that is an N-tuple obtained from the corresponding FP operator operating on the components of the two operands, the result of which is rounded at random toward $+\infty$ or $-\infty$.

Remark. To simplify the notation, the symbols for the discrete stochastic operators are chosen to be the same as those for the (continuous) stochastic operators. With DSA it is then straightforward, using equation (12), to estimate the number of significant decimal digits of any result produced by a DSA operator.

Definition 8. Discrete stochastic zero (computational zero) [15]. Any discrete stochastic number $X = (X_1, X_2, \ldots, X_N)$ is a discrete stochastic zero, also called computational zero, denoted @.0, if one of the two following conditions holds:

1. $\forall i$, $X_i = 0$, $i = 1, \ldots, N$.
2. $C_{\bar{X}} \le 0$, where $C_{\bar{X}}$ is obtained from equation (12).
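Definition 8 can be illustrated with a short Python sketch. Equation (12) is not restated in this section, so the code assumes the usual CESTAC form $C_{\bar{X}} = \log_{10}\!\left(\sqrt{N}\,|\bar{X}| / (\sigma\,\tau_\eta)\right)$, with $\bar{X}$ the sample mean, $\sigma$ the sample standard deviation, and $\tau_\eta$ the two-sided 95% Student value for $N - 1$ degrees of freedom. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Two-sided 95% Student values, keyed by the number of samples N
# (N - 1 degrees of freedom); 4.303 is the value quoted for N = 3.
TAU_95 = {2: 12.706, 3: 4.303, 4: 3.182, 5: 2.776}


def significant_digits(X):
    """Estimated number of significant decimal digits of the mean of X
    (assumed form of equation (12))."""
    n = len(X)
    m = mean(X)
    s = stdev(X)              # sample standard deviation (divisor N - 1)
    if s == 0.0:
        # Identical samples: no round-off error is visible in the N-tuple.
        return math.inf if m != 0.0 else 0.0
    if m == 0.0:
        return -math.inf
    return math.log10(math.sqrt(n) * abs(m) / (s * TAU_95[n]))


def is_computational_zero(X):
    """Definition 8: all samples are zero, or no digit is significant."""
    if all(x == 0.0 for x in X):
        return True
    return significant_digits(X) <= 0


if __name__ == "__main__":
    good = (1.0, 1.0000001, 0.9999999)
    noise = (1e-3, -2e-3, 1.5e-3)
    print(significant_digits(good), is_computational_zero(good))
    print(significant_digits(noise), is_computational_zero(noise))
```

In this sketch the second tuple is a computational zero although none of its samples is zero: its spread is larger than its mean, so no digit of the mean can be trusted.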
2.6.2 Discrete Stochastic Relations (Tools Working on Granules)

From the concept of the discrete stochastic zero @.0, discrete stochastic relations can now be defined. Let $X$, $Y$ be discrete stochastic numbers. It is possible to define equality and order relations for these numbers. They are called discrete stochastic equality and discrete stochastic order relations and are defined as follows.
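Definition 9. Discrete stochastic equality, denoted by $s=$. The discrete stochastic equality is defined by

$$X \; s{=} \; Y \quad \text{if} \quad Z = X \; s{-} \; Y = @.0.$$

Definition 10. Discrete stochastic inequalities, denoted by $s>$ and $s\ge$. These are defined by

$$X \; s{>} \; Y \quad \text{if} \quad \bar{X} > \bar{Y} \ \text{and} \ X \; s{-} \; Y \ne @.0;$$

$$X \; s{\ge} \; Y \quad \text{if} \quad \bar{X} \ge \bar{Y} \ \text{or} \ X \; s{-} \; Y = @.0,$$

where $\bar{X}$ and $\bar{Y}$ denote the mean values of the samples of $X$ and $Y$.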
With this DSA it is possible during the execution of a code to follow the roundoff error propagation, detect numerical instabilities, check branchings, and check hypotheses that guarantee the reliability of equation (12).
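The relations of Definitions 9 and 10 can be sketched as follows. The sketch makes several simplifying assumptions: $N = 3$, the computational zero test uses the assumed form of equation (12) with $\tau_\eta = 4.303$, and the stochastic subtraction is replaced by an ordinary component-wise FP subtraction without random rounding. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

TAU = 4.303  # Student value for N = 3 samples at the 95% level


def is_zero(X):
    """Computational zero @.0 (Definition 8), for N = 3 samples."""
    if all(x == 0.0 for x in X):
        return True
    m, s = mean(X), stdev(X)
    if s == 0.0:
        return False          # identical nonzero samples are significant
    if m == 0.0:
        return True
    return math.log10(math.sqrt(len(X)) * abs(m) / (s * TAU)) <= 0


def s_sub(X, Y):
    """Component-wise subtraction (random rounding omitted in this sketch)."""
    return tuple(x - y for x, y in zip(X, Y))


def s_eq(X, Y):
    """X s= Y  if  X s- Y = @.0."""
    return is_zero(s_sub(X, Y))


def s_gt(X, Y):
    """X s> Y  if  mean(X) > mean(Y) and X s- Y is not @.0."""
    return mean(X) > mean(Y) and not s_eq(X, Y)


def s_ge(X, Y):
    """X s>= Y  if  mean(X) >= mean(Y) or X s- Y = @.0."""
    return mean(X) >= mean(Y) or s_eq(X, Y)


if __name__ == "__main__":
    X = (1.0, 1.0000001, 0.9999999)
    Y = (1.00000005, 0.99999995, 1.0)
    print(s_eq(X, Y), s_gt(X, Y), s_ge(X, Y))
```

Two granules whose difference is pure round-off noise compare as equal, so a branch such as `if x > y` no longer depends on nonsignificant digits.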
2.7 Taking into Account Data Errors

In real-life problems, data often come from measurements and thus contain errors issuing from sensors. Most of the time data errors may be considered as centered Gaussian random variables. It is then absolutely necessary to estimate the effect of these errors on the numerical results provided by DSA.

In a similar fashion to equation (11), let us consider a finite sequence of $\nu$ arithmetic operations, providing a single result $r$ and requiring $nd$ uncertain data $d_i$, $i = 1, \ldots, nd$. Let $\delta_i$ be the data error on each $d_i$. These $\delta_i$'s may be considered as Gaussian variables with a standard deviation of $\sigma_i$. It has been proved [3, 35] that when the previous finite sequence is performed with DSA, each data item $D_i$, $i = 1, \ldots, nd$, being defined by

$$D_i = d_i (1 + 2\theta\sigma_i), \tag{20}$$

$\theta$ being a random number uniformly distributed on $]-1, +1[$, then each N-tuple of the computed result $R$ may be modeled by a centered Gaussian random variable

$$R \simeq r + \sum_{i=1}^{nd} v_i(d)\, 2^{-p}\, \delta_i + \sum_{i=1}^{\nu} g_i(d)\, 2^{-p}\, z_i, \tag{21}$$

$v_i(d)$ being quantities depending exclusively on the data and on the code. This formula is an extension of equation (11). Indeed the first quantity represents the error coming from uncertainties of data and the second represents the round-off error propagation. Then to estimate the number of significant decimal digits in the computed result $R$ it suffices to use equation (21). In this estimation both errors (uncertainties of data and round-off error) have been taken into account.

In the framework of granular computing each data item $D_i$ is a granule elaborated from (20), which is an operand for the DSA operators.
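Formula (20) can be sketched directly in Python. The function name `perturb` is hypothetical and is not the CADNA routine; `random.uniform` may return the end points of the interval, which is treated here as a negligible departure from the open interval $]-1, +1[$.

```python
import random


def perturb(d, sigma, n_samples=3, rng=random):
    """Build the granule D = d (1 + 2 theta sigma) of formula (20),
    drawing one theta per sample, uniformly on (-1, 1)."""
    samples = []
    for _ in range(n_samples):
        theta = rng.uniform(-1.0, 1.0)
        samples.append(d * (1.0 + 2.0 * theta * sigma))
    return tuple(samples)


if __name__ == "__main__":
    # A datum d = 78 with sigma = 1e-6 becomes a 3-tuple of nearby values.
    print(perturb(78.0, 1e-6))
```

The resulting N-tuple is then used as an ordinary operand of the DSA operators, so that data uncertainty and round-off error are propagated together.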
2.8 The CADNA Library [36]

The CADNA library has been written in Fortran, C++, and ADA. It is presented in detail in [20]. It is the Fortran version that is described here. The CADNA library automatically implements DSA in any Fortran code. For CADNA Fortran and CADNA ADA, $N = 3$ has been chosen. But for CADNA C++, the value of $N$ must be chosen by the user. Furthermore, the probability is here chosen at the classical level of $\eta = 0.95$. As seen in the beginning, in equation (12) for $N = 3$ and $1 - \eta = 0.05$ the value of Student's table for $N - 1 = 2$ degrees of freedom is $\tau_\eta = 4.303$.

Thus the CADNA library enables a user to run a scientific code with DSA on a conventional computer without having to rewrite or even substantially modify the Fortran source code. A new 'stochastic number' type has been created, which is a triplet (because $N = 3$), each component being a sample provided by the random rounding. All the Fortran arithmetic operators, +, −, ∗, /, have been overloaded, so that when such an operator is performed, the operands and the result are stochastic numbers. In the same way the relational operators such as ==, >, ≥, [...]

$$\begin{cases} w_{i,i} = 1.0 & \text{for } i \in [1, n-1] \\ w_{i,j} = -1.0 & \text{for } i > j \\ w_{i,j} = 0.0 & \text{for } i < j < n \\ w_{i,n} = 1.0 & \text{for } i \in [1, n-1] \\ w_{n,n} = \alpha = 0.9 \end{cases} \tag{23}$$
In this system the diagonal and the $n$th column are $(1, 1, \ldots, \alpha)$, the elements of the upper triangular submatrix are null except in the last column, and those of the lower triangular submatrix are −1. The $n$ elements of the right-hand side $B$ are equal to 1.
It is easy to show that the exact solution of this system, which is not ill conditioned, is

$$x_i^* = -2^{i-1}\,\frac{1-\alpha}{\Delta^*}, \quad i \in [1, n-1], \qquad x_n^* = \frac{2^{n-1}}{\Delta^*}, \qquad \Delta^* = 2^{n-1} - 1 + \alpha, \tag{24}$$

$\Delta^*$ being the determinant of the matrix. This system has been solved using the Gaussian elimination method with partial pivoting, first with IEEE 754 standard double precision and then with the CADNA library, for $n \in \{30, 35, 40, 45, 50\}$. The determinant computed with both the IEEE 754 standard and the CADNA library yields results correct to 15 significant decimal digits. Concerning the solution $X_i$, $i = 1, \ldots, n - 1$, we find the following results, which are detailed in Table 2.2.
• With the IEEE 754 standard some of the last digits are false, but of course the user is not informed of the failure.
• With the CADNA library, only the $N$ decimal digits estimated to be exact up to 1 by the software are provided. It can be seen in Table 2.2 that these are in perfect agreement with the number of exact digits, $N^*$, obtained by comparing the CADNA solution to the exact solution $x_i^*$, $i = 1, \ldots, n - 1$.

The following example concerns a problem with uncertain data solved by the CADNA library. To perturb the data, CADNA uses a special function constructed according to formula (20).

Example 3. Study of the sensitivity of a determinant to the coefficients of the matrix. Let us consider the determinant proposed in [38]:

$$\Delta = \begin{vmatrix} -73 & 78 & 24 \\ 92 & 66 & 25 \\ -80 & 37 & 10 \end{vmatrix}. \tag{25}$$

The exact value of this determinant is $\Delta = 1$. When this determinant is computed with IEEE 754 FP arithmetic in double precision using different rounding modes, the results obtained are as follows:

• with the rounding to nearest mode, $\Delta = 0.9999999999468869$,
• with the rounding to zero mode, $\Delta = 0.9999999999468865$,
• with the rounding to $-\infty$ mode, $\Delta = 0.9999999999894979$, and
• with the rounding to $+\infty$ mode, $\Delta = 1.000000000747207$.
The last digits of each of these results are false, but obviously the user is not aware of this fact. When the determinant is computed with the CADNA library, the result is $\Delta = 1.000000000$. Note that the result is printed with only ten digits, which is the best accuracy that can be obtained.

Table 2.2 Accuracy of the solution of system (23) for different sizes n

n    Number of false last decimal digits   Number of decimal digits, N   Number of exact decimal digits, N*
     (IEEE 754 standard)                   (CADNA library)
30    9                                    6                             6
35   10                                    4                             5
40   12                                    3                             3
45   13                                    1                             2
50   15                                    0                             0
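System (23) and its exact solution (24) are easy to reproduce, which gives a feeling for the loss of accuracy reported in Table 2.2. The following Python sketch is only an illustration in ordinary double precision (it does not use CADNA): it solves the system by Gaussian elimination with partial pivoting and counts exact digits against a rational reference computed with `fractions.Fraction`. The function names are hypothetical.

```python
import math
from fractions import Fraction


def build_system(n, alpha, num=float):
    """Matrix W and right-hand side B of system (23)."""
    W = [[num(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            W[i][j] = num(-1)        # lower triangular part
        W[i][i] = num(1)             # diagonal
        W[i][n - 1] = num(1)         # last column
    W[n - 1][n - 1] = alpha          # w_{n,n} = alpha
    return W, [num(1)] * n


def gauss(W, B):
    """Gaussian elimination with partial pivoting (works for float or Fraction)."""
    n = len(W)
    A = [row[:] + [b] for row, b in zip(W, B)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            for c in range(k, n + 1):
                A[r][c] -= factor * A[k][c]
    x = [None] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x


def exact_solution(n, alpha):
    """Formula (24); index i below is 0-based, so 2**i is 2^(i-1) in the text."""
    delta = 2 ** (n - 1) - 1 + alpha
    x = [-(2 ** i) * (1 - alpha) / delta for i in range(n - 1)]
    x.append(2 ** (n - 1) / delta)
    return x


def exact_digits(computed, reference):
    """Number of decimal digits shared by a computed value and the reference."""
    rel = abs((computed - reference) / reference)
    return math.inf if rel == 0 else -math.log10(rel)


if __name__ == "__main__":
    for n in (30, 35, 40, 45, 50):
        x = gauss(*build_system(n, 0.9))
        ref = exact_solution(n, Fraction(9, 10))
        print(n, round(exact_digits(x[0], ref[0]), 1))
```

The reference uses $\alpha = 9/10$ exactly, so the digit count also includes the error made when 0.9 is converted to binary; a plain double-precision run gives no hint of this loss unless such a reference is available.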
Table 2.3 Number of exact decimal digits of Δ as a function of ε

ε    10^−15   10^−13   10^−11   10^−9   10^−7   10^−5
N    10       8        6        4       2       0
Suppose now that coefficients $a_{12} = 78$ and $a_{33} = 10$ of the matrix are uncertain data. This means that they both contain a relative error $\varepsilon$, which is taken here to be the same. In other words,

$$a_{12} \in [78 - 78\varepsilon,\ 78 + 78\varepsilon] \quad \text{and} \quad a_{33} \in [10 - 10\varepsilon,\ 10 + 10\varepsilon].$$

The CADNA library, as explained above, is an effective tool for estimating the influence of data uncertainties on the computed determinant. Table 2.3 presents the number of exact decimal digits, $N$, provided by CADNA in the computed determinant (25) as a function of $\varepsilon$, which determines the uncertainty of $a_{12}$ and $a_{33}$. From these results it clearly appears that if the magnitude of uncertainty in the coefficients is greater than or equal to $10^{-5}$, then the determinant cannot be computed, since the result obtained is not significant.
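The order of magnitude of the figures in Table 2.3 can be cross-checked without CADNA. The Python sketch below is a deterministic worst-case bound, not the probabilistic estimate computed by the library: it evaluates determinant (25) exactly with rational arithmetic at the four corners of the uncertainty box for $a_{12}$ and $a_{33}$ (the determinant is linear in each coefficient, so the extreme values are reached at the corners). The function names are hypothetical.

```python
import math
from fractions import Fraction

M = [[-73, 78, 24],
     [92, 66, 25],
     [-80, 37, 10]]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def worst_case_digits(eps):
    """Decimal digits of the determinant that survive a relative error eps
    on a12 and a33, in the worst case over the uncertainty box."""
    base = det3(M)
    worst = Fraction(0)
    for s12 in (-1, 1):
        for s33 in (-1, 1):
            m = [row[:] for row in M]
            m[0][1] = M[0][1] * (1 + s12 * eps)
            m[2][2] = M[2][2] * (1 + s33 * eps)
            worst = max(worst, abs(det3(m) - base))
    rel = worst / abs(base)
    return max(0.0, -math.log10(rel))


if __name__ == "__main__":
    print(det3(M))  # exact value of the determinant
    for k in (15, 13, 11, 9, 7, 5):
        print(f"eps = 1e-{k}: {worst_case_digits(Fraction(1, 10 ** k)):.1f} digits")
```

The first-order sensitivity is $78 \times 2920 + 10 \times 11994 = 347700$, so each relative error $\varepsilon$ on the data is amplified by more than five orders of magnitude. This is consistent with the loss of all significant digits at $\varepsilon = 10^{-5}$.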
2.9.2 Iterative Methods

From the mathematical standpoint, these methods, starting from an initial point $x_0$ considered as an approximation of the solution to the problem to be solved, consist in computing a sequence $x_1, x_2, \ldots, x_k$ that is supposed to converge to the solution. So, let us consider here an iterative sequence defined by

$$x_{k+1} = \varphi(x_k), \qquad \varphi : \mathbb{R}^m \to \mathbb{R}^m.$$

If the method is convergent, then $\exists\, x : x = \lim_{k \to \infty} x_k$. From the computational point of view, this limit cannot be reached, and consequently a termination criterion is used to stop the iterative process, such as

$$\text{if } \|X_k - X_{k-1}\| \le \varepsilon \|X_k\| \text{ then stop}, \qquad X_k \in F^m,$$

where $\varepsilon$ is an arbitrary positive value. It is clear that this termination criterion is not satisfactory for two reasons. If $\varepsilon$ is too large, then the sequence is broken off before a good approximation to the solution is reached. On the contrary, if $\varepsilon$ is too small, then many useless iterations are performed, without improving the accuracy of the solution because of round-off error propagation. Moreover each $X_k$ has only a certain number of significant decimal digits. If the $\varepsilon$ selected is less than the accuracy of $X_k$, this termination criterion is no longer meaningful. Two problems then arise.

1. How can the iterative process be stopped correctly?
2. What is the accuracy of the computed solution provided by the computer?

With the use of the CADNA library, thanks to the properties of DSA, it is possible to define new termination criteria, depending on the problem to be solved, which stop the iterative process as soon as a satisfactory computational solution is reached. Indeed two categories of problems exist:

1. those for which there exists some function which is null at the solution of the problem. The solution of a linear or nonlinear system or the search of an optimum for a constrained or nonconstrained problem belong to this category;
2. those for which such a function does not exist. Such is the computation of the sum of a series.
For the first category the termination criterion is called 'optimal termination criterion.' It acts directly on functions which must be null at the solution of the problem. For example, from the mathematical standpoint, if $x_s \in \mathbb{R}^m$ is the solution of a linear system, then $A \cdot x_s - B = 0$. The optimal termination criterion consists in stopping the iterative process at the $k$th iteration if and only if $A * X_k \; s{-} \; B = @.0$ (@.0 being the computational zero). For the second category the usual termination criterion is replaced by

$$\text{if } X_k \; s{-} \; X_{k-1} = @.0 \text{ then stop}.$$

With this termination criterion the arbitrary value $\varepsilon$ is eliminated.

Example 4. Jacobi's iterations. To illustrate the fact that the choice of $\varepsilon$ in the classical termination criterion may be difficult, and to show the efficiency of the optimal termination criterion, let us consider the following linear system $AX = B$ of dimension $n = 25$ with

$$\begin{aligned} a_{ij} &= \frac{1}{i + j - 1} && \text{for } i, j = 1, \ldots, n \text{ and } i \ne j, \\ a_{ii} &= 1 + \sum_{j=1}^{i-1} a_{ij} + \sum_{j=i+1}^{n} a_{ij} && \text{for } i = 1, \ldots, n, \\ b_i &= \sum_{j=1}^{n} a_{ij} z_j && \text{for } i = 1, \ldots, n, \text{ with } z_j = 3^{j-1} \times 2^{-10},\ j = 1, \ldots, n. \end{aligned} \tag{26}$$
As the diagonal of $A$ is dominant, Jacobi's iterations are always convergent. The exact solution is $x_j = 3^{j-1} \times 2^{-10}$. System (26) was first solved using standard IEEE 754 double-precision floating-point arithmetic (DPFP) with several values of $\varepsilon$ and then with the CADNA double-precision DSA. The results are the following:

• With DPFP and the classical termination criterion with $\varepsilon = 10^{-4}$, the last unknown $x_{25}$ has been computed with the maximum accuracy (15 decimal digits), and the accuracy decreases from component to component until it reaches 3 decimal digits on the first component $x_1$.
• With $\varepsilon < 10^{-4}$, the test is never satisfied and the iterations stop on a predefined arbitrary maximum number of iterations. Hence a great number of useless iterations are computed, without improving the accuracy of the solution. For example with $\varepsilon = 10^{-5}$, the process is stopped after 10,000 iterations and the accuracy is identical to the one obtained with $\varepsilon = 10^{-4}$ (see Table 2.4).
• On the contrary with $\varepsilon = 10^{-3}$, the process is stopped too soon (419 iterations) and $x_1$ and $x_{25}$ are obtained with, respectively, 2 and 14 decimal digits.
• Moreover in all cases with IEEE 754 FP arithmetic the number of significant decimal digits of each unknown cannot be obtained.
• On the contrary, with the use of the CADNA library and the optimal termination criterion defined above, the process is stopped as soon as a satisfactory solution is obtained, 459 iterations in this case, and the number of exact decimal digits, up to one, of each component is provided.
The number $N$ of decimal digits thus obtained for system (26) with the initialization $x_i^{(0)} = 15$ is given in Table 2.4.

Table 2.4 Number of exact decimal digits in the solution of system (26) with the CADNA library

i   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
N   3  4  4  5  6  6  6  7  8  9  9 10 10 11 11 12 12 13 13 14 14 15 15 15 15
In fact, as shown in [39], the optimal termination criterion, which consists in testing the residual, and the usual criterion, which consists in testing the difference between two iterates, are closely connected in the case of Jacobi's method because $X_{k+1} - X_k = D^{-1}(B - A X_k)$, matrix $D$ being the diagonal of $A$. This is perfectly verified with the CADNA library. In fact when the termination criterion is the stochastic equality of two successive vector iterates, the process is stopped at the 460th iteration and the accuracy of the solution is also the same as the one reported in Table 2.4.
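For readers who wish to experiment with system (26), the following Python sketch builds the system and runs Jacobi's iterations with the classical termination criterion in ordinary double precision. It does not use CADNA, so the optimal termination criterion is not reproduced. The maximum norm is an assumption, since the norm of the criterion is not specified in the text, and the iteration counts it produces need not match those quoted above. The function names are hypothetical.

```python
import math


def build_system_26(n=25):
    """Matrix A, right-hand side B, and exact solution z of system (26)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i != j:
                A[i - 1][j - 1] = 1.0 / (i + j - 1)
        # The diagonal entry is still 0 here, so the sum covers j != i only.
        A[i - 1][i - 1] = 1.0 + sum(A[i - 1])
    z = [3.0 ** (j - 1) * 2.0 ** -10 for j in range(1, n + 1)]
    B = [sum(a * zj for a, zj in zip(row, z)) for row in A]
    return A, B, z


def jacobi(A, B, x0, eps, max_iter=10_000):
    """Jacobi iterations, stopped when ||X_k - X_{k-1}|| <= eps ||X_k||
    in the maximum norm, or after max_iter iterations."""
    n = len(A)
    x = list(x0)
    for k in range(1, max_iter + 1):
        new = [(B[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
               for i in range(n)]
        diff = max(abs(u - v) for u, v in zip(new, x))
        x = new
        if diff <= eps * max(abs(v) for v in x):
            return x, k
    return x, max_iter


def digits(computed, reference):
    """Number of decimal digits shared by computed and reference values."""
    rel = abs((computed - reference) / reference)
    return math.inf if rel == 0 else -math.log10(rel)


if __name__ == "__main__":
    A, B, z = build_system_26()
    for eps in (1e-3, 1e-6, 1e-10):
        x, k = jacobi(A, B, [15.0] * 25, eps)
        print(eps, k, round(digits(x[0], z[0]), 1), round(digits(x[-1], z[-1]), 1))
```

With a norm-wise test the criterion is governed by the largest component $x_{25}$, so the printed digit counts for $x_1$ show how little a single $\varepsilon$ says about the accuracy of each unknown.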
2.9.3 Approximate Methods

From the mathematical standpoint, these methods provide only an approximation of the solution. This category contains, e.g., numerical computation of derivatives, numerical integration, and numerical solution of differential or partial differential equations. When these methods are run on a computer, they always provide a solution containing an error $e_g$, which is a combination of the method error $e_m$ inherent in the employed method and the error due to the propagation of round-off errors, called computation error $e_c$. It is well known that the method error $e_m$ is an increasing function of the discrete step size $h$. On the contrary the computation error $e_c$ is an increasing function of the inverse $1/h$ of the step size. This means that $e_m$ and $e_c$ act in opposite ways, and consequently the global error $e_g$ is a function which has a minimum for some value of $h$. Thus the best approximation of the solution that can be obtained on a computer corresponds to an optimal discrete step size $h^*$, such that $de_g/dh = 0$. Obviously, it is impossible to establish a general methodology to estimate $h^*$, because the method error $e_m$ is specific to the method. Yet most of the time, for a specific method, $e_m$ can be estimated. Furthermore, $e_c$ can be estimated using the CADNA library. Then in many cases, it is possible to estimate $h^*$ [17]. To illustrate this, let us consider the following example, which is a simple solution of a differential equation using Euler's method.

Example 5. The differential equation is

$$y' = e^{x} y + x y - (x + 1)e^{-x} - 1. \tag{27}$$

With the initial condition $y(0) = 1$, the exact solution is $y(x) = e^{-x}$. The computation of the optimal step size for each interval $[x_k, x_{k+1}]$ requires three phases:

1. the estimation of the round-off error $e_c$,
2. the evaluation of the truncation error (method error) $e_m$, and
3. the computation of the optimal step size.

The estimation of the round-off error $e_c$ is obtained using the CADNA library. Indeed a special function of this library, called nb_significant_digits(x), returns the number of significant digits in a stochastic argument $x$. This number is an integer $n$ obtained from equation (12) rounded down. The estimation of the round-off error is then computed by equation (28):

$$e_c = 10^{-(n+1)}. \tag{28}$$

The estimation of the truncation error $e_m$ at each step for Euler's method is well known and is given by

$$e_m = 2\,|y_1 - y_2|. \tag{29}$$
Table 2.5 Solution of equation (27) with different step sizes

x     Exact solution y*   h = 10^−1   h = 10^−3   h = 10^−6   Optimal h
0.0   1.0                 1.0         1.0         1.0         1.0
0.1   0.905               0.900       0.905       0.905       0.905
0.2   0.819               0.809       0.819       0.818       0.819
0.3   0.741               0.726       0.741       0.740       0.741
0.4   0.670               0.649       0.670       0.670       0.670
0.5   0.607               0.578       0.606       0.606       0.607
0.6   0.549               0.511       0.548       0.550       0.548
0.7   0.497               0.447       0.496       0.499       0.496
0.8   0.449               0.384       0.448       0.453       0.447
0.9   0.407               0.320       0.405       0.413       0.404
1.0   0.368               0.250       0.366       0.380       0.366
Here $y_1$ is the value of $y(x_k + h_k)$ integrated over the interval $[x_k, x_k + h_k]$ with step size $h_k$, while $y_2$ is the value of $y(x_k + h_k)$ integrated over the same interval $[x_k, x_k + h_k]$ with step size $\frac{1}{2}h_k$. Of course, $e_m$ may not be less than $e_c$ because it is also computed with the same computer. The optimal step size $h^*$ can then be obtained with a simple minimization method. The results obtained in single precision using CADNA are presented in Table 2.5.

From the results of Table 2.5 it clearly appears that if the step size is too large ($h = 0.1$) or too small ($h = 10^{-6}$), only one or two decimal digits are obtained in the solution, while with the optimal step size the solution is computed with two or three exact digits. Here, a very simple method to estimate the optimal step size has been presented, but more sophisticated methods have also been developed in [27].
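Euler's method for equation (27) and the truncation error estimate (29) can be sketched in Python as follows. The sketch runs in ordinary double precision rather than in single precision with CADNA, so it reproduces only the method error $e_m$; the round-off estimate $e_c$ of equation (28) is not available here. The function names are hypothetical.

```python
import math


def f(x, y):
    """Right-hand side of equation (27)."""
    return math.exp(x) * y + x * y - (x + 1.0) * math.exp(-x) - 1.0


def euler(x0, y0, h, steps):
    """Advance y from x0 by `steps` explicit Euler steps of size h."""
    x, y = x0, y0
    for _ in range(steps):
        y += h * f(x, y)
        x += h
    return y


def truncation_estimate(x0, y0, h):
    """Equation (29): e_m = 2 |y1 - y2|, where y1 uses one step of size h
    and y2 uses two steps of size h / 2 over the same interval."""
    y1 = euler(x0, y0, h, 1)
    y2 = euler(x0, y0, h / 2.0, 2)
    return 2.0 * abs(y1 - y2)


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for h, steps in ((1e-1, 10), (1e-2, 100), (1e-3, 1000)):
        y = euler(0.0, 1.0, h, steps)
        print(f"h = {h:g}: y(1) = {y:.3f}, error = {abs(y - exact):.1e}")
    print("e_m on [0, 0.1]:", truncation_estimate(0.0, 1.0, 0.1))
```

In double precision the error keeps decreasing down to small values of $h$, because $e_c$ is far smaller than in single precision. The rise of the global error for very small steps, visible in the $h = 10^{-6}$ column of Table 2.5, appears only when round-off dominates.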
2.10 Can the CADNA Library Fail?

To answer this question, imagine a computation such that only one or two rounding errors are the dominant contribution to the final error. This is the case in example 1 and example 2.

Concerning example 1, which has been specially created to jeopardize the stochastic approach of FP computation, it can be shown experimentally that as the number of samples $N$ increases, the percentage of failure decreases. This percentage, which is presented in Table 2.6, is in total agreement with the approximation of the mean value and standard deviation of an unknown Gaussian distribution by those of empirical values, which is used in equation (12).

Concerning example 2, during the Gaussian elimination there is no round-off error propagation except for the last pivot $a_{n,n}$, because all the other results are integer values which are exact FP values. It is exactly the same for the computation of the $n$ elements of the right-hand side $B$, which are also exact FP values. The values of the last pivot $a_{n,n}$ and $b_n$ are $a_{n,n} = \alpha + 2^{n-1} - 1$ and $b_n = 2^{n-1}$.

Table 2.6 Percentage of failure as a function of the number N of samples in example 1

N    3    4    5    7    10
%    10   5    3    1    0
Table 2.7 Percentage of failures as a function of n and N in example 2

n    N = 3   N = 4   N = 5   N = 6   N = 7
 5   5       3       2       0       0
25   10      5       3       1       0
45   10      6       4       3       0
The larger the $n$, the closer $x_n$ is to 1. Furthermore, $x_n = b_n / a_{n,n}$ and $x_{n-1} = a_{n-1,n+1}(1 - x_n)$ because $a_{n-1,n+1} = a_{n-1,n}$. With the CADNA library, if a particular combination of random roundings makes the $N$ elements of $x_n$ equal, then the round-off error on $x_n$ has vanished, the resulting $x_{n-1}, x_{n-2}, \ldots, x_1$ are false values, and CADNA does not detect the failure. Table 2.7 presents the percentages of failures with respect to the dimension $n$ and the number of samples $N$.

Tables 2.6 and 2.7 show that for computations in which only one rounding error is the dominant contribution to the final error, $N$ must be greater than 3 so that there is no failure. The choice of $N = 3$ therefore has to be explained: in normal computations, several rounding errors contribute to the final error. The CADNA library uses $N = 3$ and a probability of 95% for estimating the number of significant decimal digits. However, it has been shown that if the user accepts an error of one unit in the number of significant decimal digits, then the probability of estimating it up to 1 is 99.94%.
2.11 Conclusion In this chapter the CESTAC method, which is a stochastic method for estimating the error propagation from both the FP arithmetic and the uncertainties of data issuing from sensors, has been presented. With this method the number (up to one) of significant digits on a computed numerical result can be evaluated. However this type of method was incorrectly implemented by S.G. Popovitch with his Prosolver software. In this software the N runs are not synchronized and thus the control of numerical anomalies cannot be performed at the level of each elementary operators. Thus many numerical instabilities will not be not detected. It is for this reason that examples to expose the weakness of this software were proposed in [29] and [30]. Later using the ideas developed for the CESTAC method, the Monte Carlo method and the software Wonglediff were also proposed in [31–33] with the same drawbacks as those of Prosolver. Indeed to be effective, the stochastic method requires that an eventual anomaly or instability is checked at the level of each elementary operation, i.e., an arithmetic operation, an order relation, or a branching. This requires that the N samples representing the result of an operation are obtained synchronously and not by running the same code N times in sequence. In other words, these methods are reliable if and only if they are implemented in the scope of granular computing and follow the model of DSA. The theory of stochastic arithmetic, which is proposed in this chapter, provides a model for computation on approximate data. In this sense it aims at the same target as interval arithmetic except that the operands and operators are different. In the scope of granular computing the granules of stochastic arithmetic are independent Gaussian variables and the tools are the classical operators on Gaussian functions. These operators induce many algebraic structures and some of them have been presented. 
The theory of DSA provides a model in which granules are composed of an N-tuple of samples of the same mathematical result of an arithmetical operator implemented in FP arithmetic. These samples differ from each other because the data are imprecise and because of different roundings. The operator working on these granules is an FP operator corresponding to the exact arithmetical operator, performed N times in a synchronous way with random rounding. Thus the result is also a granule. This granule is called a discrete stochastic number. It has been shown that the DSA operating on discrete stochastic numbers has many (but not all) of the properties of real numbers; in particular, the notion of stochastic zero has been defined.
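To make the notion of a granule concrete, the following minimal sketch estimates the number of significant decimal digits carried by an N-tuple of samples and tests for a stochastic zero. It is only an illustration, not the CADNA implementation. The estimate log10(sqrt(N) |mean| / (sigma * tau)) is the Student-test formula usually quoted for the CESTAC method and is an assumption here, since this section does not restate it. The constant 4.303 (Student's t for 2 degrees of freedom at the 95% level, matching N = 3), the cap of 15 digits, and the function names are likewise choices made for this sketch.

```python
import math
from statistics import mean, stdev

# Assumed constant: Student's t value for N - 1 = 2 degrees of freedom
# at the 95% level. It is only appropriate for N = 3 samples.
TAU_95_N3 = 4.303


def significant_digits(samples, tau=TAU_95_N3, max_digits=15.0):
    """Estimate the number of exact significant decimal digits of a granule.

    `samples` is the N-tuple of floating-point results of one operation.
    """
    n = len(samples)
    m = mean(samples)
    s = stdev(samples)  # sample standard deviation (divisor n - 1)
    if m == 0.0:
        return 0.0      # a zero mean carries no significant digit
    if s == 0.0:
        return max_digits  # all samples agree: report the assumed cap
    return min(max_digits, math.log10(math.sqrt(n) * abs(m) / (s * tau)))


def is_stochastic_zero(samples):
    """True if every sample is zero or no digit of the mean is significant."""
    return all(x == 0.0 for x in samples) or significant_digits(samples) <= 0.0
```

For instance, the granule (1.0, 1.0000001, 0.9999999) is credited with between six and seven digits, whereas three samples of order 1e-17 with differing signs are classified as a stochastic zero. Note that the agreeing-samples branch reproduces the failure mode discussed for Table 2.7: equal samples are reported as fully accurate even if they are all wrong.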
The CADNA library implements DSA and is able, during the run of a code, to analyze the effect of uncertainties of the data and of roundoff error propagation on the result of each arithmetical operation. Thus any anomaly can be detected at this level. When such an anomaly is detected, a warning is written in a special file provided for the user. Hence, because of its correct implementation in the scope of granular computing, the CADNA library does not fail when tested with the previously cited examples. This library has been successfully used for solving many problems belonging to the three categories of numerical methods. In the field of linear algebra it has been used for the solution of linear systems using Gaussian elimination, GMRES [40], Orthomin(k) [41], and CGS [39] algorithms. It has enabled the optimization of collocation algorithms [27] and quadrature algorithms [42, 43]. It has also been used for checking the reliability of numerical methods in many fields of research in applied numerical mathematics: geology [44, 45], acoustics [46], solid mechanics [47], engine combustion [48], and atomic physics [49]. In all these cases the CADNA library has been successful. Moreover, many future developments and applications of the CESTAC method, DSA, and CADNA are now possible, particularly in the production of self-validated libraries requiring no programming effort in every domain of numerical analysis.
References
[1] J.G. Rokne. Interval arithmetic and interval analysis: an introduction. In: Granular Computing: An Emerging Paradigm. Physica-Verlag GmbH, Heidelberg, Germany, 2001, pp. 1–22.
[2] J.M. Chesneaux. Study of the computing accuracy by using a probabilistic approach. In: C. Ullrich (ed.), Contributions to Computer Arithmetic and Self-validating Methods, IMACS, NJ, 1990, pp. 19–30.
[3] J.M. Chesneaux. Etude théorique et implémentation en ADA de la méthode CESTAC. Thesis. Paris VI University, Paris, 1988.
[4] R.W. Hamming. On the distribution of numbers. Bell Syst. Tech. J. 49 (1970) 1609–1625.
[5] D. Knuth. The Art of Computer Programming 2. Addison-Wesley, Reading, MA, 1969.
[6] A. Feldstein and R. Goodman. Convergence estimates for the distribution of trailing digits. J. ACM 23 (1976) 287–297.
[7] M. La Porte and J. Vignes. Evaluation statistique des erreurs numériques sur ordinateur. Proc. Canadian Comp. Conf. (1972) 414201–414213.
[8] M. La Porte and J. Vignes. Etude statistique des erreurs dans l'arithmétique des ordinateurs, application au contrôle des résultats d'algorithmes numériques. Numer. Math. 23 (1974) 63–72.
[9] M. La Porte and J. Vignes. Méthode numérique de détection de la singularité d'une matrice. Numer. Math. 23 (1974) 73–82.
[10] M. La Porte and J. Vignes. Evaluation de l'incertitude sur la solution d'un système linéaire. Numer. Math. 24 (1975) 39–47.
[11] J. Vignes and M. La Porte. Error analysis in computing. In: Information Processing 74, North-Holland, Amsterdam, 1974, pp. 610–614.
[12] P. Bois and J. Vignes. An algorithm for automatic roundoff error analysis in discrete linear transforms. Intern. J. Comput. Math. 12 (1982) 161–171.
[13] J.M. Chesneaux and J. Vignes. Sur la robustesse de la méthode CESTAC. C.R. Acad. Sci. Paris 307 (1988) 855–860.
[14] J. Vignes. New methods for evaluating the validity of the results of mathematical computations. Math. Comput. Simul. 20 (4) (1978) 227–248.
[15] J. Vignes. Zéro mathématique et zéro informatique. C.R. Acad. Sci., Paris 303 (1) (1986) 997–1000; La Vie des Sciences 4 (1) (1987) 1–13.
[16] J. Vignes. Discrete stochastic arithmetic for validating results of numerical software. Numer. Algorithms 37 (2004) 377–390.
[17] J. Vignes. A stochastic arithmetic for reliable scientific computation. Math. Comput. Simul. 35 (1993) 233–261.
[18] J. Vignes. Review on stochastic approach to roundoff error analysis and its applications. Math. Comput. Simul. 30 (6) (1988) 481–491.
[19] J. Vignes and R. Alt. An efficient stochastic method for roundoff error analysis. In: Accurate Scientific Computations, L.N.C.S. 235, Springer-Verlag, New York, 1985, pp. 183–205.
[20] J.M. Chesneaux. L'Arithmétique stochastique et le logiciel CADNA. Habilitation à diriger les recherches. Université Pierre et Marie Curie, Paris, 1995.
[21] J.M. Chesneaux and J. Vignes. Les fondements de l'arithmétique stochastique. C.R. Acad. Sci., Paris 315 (1992) 1435–1440.
[22] J.M. Chesneaux. The equality relation in scientific computing. Numer. Algorithms 7 (1994) 129–143.
[23] R. Alt and S. Markov. On the algebraic properties of stochastic arithmetic, comparison to interval arithmetic. In: W. Kraemer and J. Wolff von Gudenberg (eds), Scientific Computing, Validated Numerics, Interval Methods. Kluwer, Dordrecht, 2001, pp. 331–341.
[24] R. Alt, J.L. Lamotte, and S. Markov. On the numerical solution to linear problems using stochastic arithmetic. In: Proceedings of the 2006 ACM Symposium on Applied Computing, Dijon, France, 2006, pp. 1655–1659.
[25] S. Markov, R. Alt, and J.L. Lamotte. Stochastic arithmetic: S-spaces and some applications. Numer. Algorithms 37 (1–4) (2004) 275–284.
[26] S. Markov and R. Alt. Stochastic arithmetic: addition and multiplication by scalars. Appl. Numer. Math. 50 (2004) 475–488.
[27] R. Alt and J. Vignes. Validation of results of collocation methods for ODEs with the CADNA library. Appl. Numer. Math. 20 (1996) 1–21.
[28] S.G. Popovitch. Prosolver, La Commande Electronique, France. Ashton-Tate, Torrance, CA, 1987.
[29] W. Kahan. The improbability of probabilistic error analyses. In: UCB Statistics Colloquium. Evans Hall, University of California, Berkeley, 1996. http://www.ca.berkeley.edu/wkahan/improber.ps.
[30] W. Kahan. How futile are mindless assessments of roundoff in floating point computation. Householder Symposium XVI, 2005. http://www.cs.berkeley.edu/wkahan/Mindless.pdf.
[31] D.S. Parker. Monte Carlo Arithmetic: Exploiting Randomness in Floating Point Arithmetic. Report of computer science department, UCLA, Los Angeles, March 30, 1997.
[32] D.S. Parker, B. Pierce, and P.R. Eggert. Monte Carlo arithmetic: how to gamble with floating point and win. Comput. Sci. Eng. (2000) 58–68.
[33] P.R. Eggert and D.S. Parker. Perturbing and evaluating numerical programs without recompilation – the wonglediff way. Softw. Pract. Exp. 35 (2005) 313–322.
[34] J. Vignes. A stochastic approach to the analysis of roundoff error propagation: a survey of the CESTAC method. In: Proceedings of the 2nd Real Numbers and Computer Conference, Marseille, France, 1996, pp. 233–251.
[35] M. Pichat and J. Vignes. Ingénierie du contrôle de la précision des calculs sur ordinateur. Technip, Paris, 1993.
[36] Cadna user's guide, http://www.lip6.fr/cadna.
[37] M. Daumas and J.M. Muller. Qualité des Calculs sur Ordinateur. Masson, Paris, 1997.
[38] J.R. Westlake. Handbook of Numerical Matrix Inversion and Solution of Linear Equations. Wiley, New York, 1968.
[39] J.M. Chesneaux and A. Matos. Breakdown and near breakdown control in the CGS algorithm using stochastic arithmetic. Numer. Algorithms 11 (1996) 99–116.
[40] F. Toutounian. The use of the CADNA library for validating the numerical results of the hybrid GMRES algorithm. Appl. Numer. Math. 23 (1997) 275–289.
[41] F. Toutounian. The stable A^T A-orthogonal s-step Orthomin(k) algorithm with the CADNA library. Numer. Algorithms 17 (1998) 105–119.
[42] F. Jézéquel and J.M. Chesneaux. Computation of an infinite integral using Romberg's method. Numer. Algorithms 36 (2004) 265–283.
[43] F. Jézéquel. Dynamical control of converging sequences computation. Appl. Numer. Math. 50 (2004) 147–164.
[44] F. Delay and J.L. Lamotte. Numerical simulations of geological reservoirs: improving their conditioning through the use of entropy. Math. Comput. Simul. 52 (2000) 311–320.
[45] J.L. Lamotte and F. Delay. On the stability of the 2D interpolation algorithms with uncertain data. Math. Comput. Simul. 43 (1997) 183–190.
[46] J.M. Chesneaux and A. Wirgin. Reflection from a corrugated surface revisited. J. Acoust. Soc. 96 (2 pt. 1) (1993) 1116–1129.
[47] N.C. Albertsen, J.M. Chesneaux, S. Christiansen, and A. Wirgin. Evaluation of roundoff error by interval and stochastic arithmetic methods in a numerical application of the Rayleigh theory to the study of scattering from an uneven boundary. In: G. Cohen (ed.), Proceedings of the Third International Conference on the Mathematical and Numerical Aspects of Wave Propagation, SIAM, Philadelphia, 1995, pp. 338–346.
[48] S. Guilain and J. Vignes. Validation of numerical software results. Application to the computation of apparent heat release in direct injection diesel engine. Math. Comput. Simul. 37 (1994) 73–92.
[49] N.S. Scott, F. Jézéquel, C. Denis, and J.M. Chesneaux. Numerical 'health check' for scientific codes: the CADNA approach. Comput. Phys. Commun. 176 (8) (2007) 507–521.
3 Fundamentals of Interval Analysis and Linkages to Fuzzy Set Theory Weldon A. Lodwick
3.1 Introduction
The granular computing of interest to this chapter processes entities (granules) whose representation is other than real numbers. Processing with real numbers leads to determinism and mathematical analysis. The granules of interest to this chapter are intervals and fuzzy sets. The point of departure for this chapter is interval analysis. It is noted that parts of what is presented can be found in [1–4]. An interval [a, b] on the real-number line, with the usual meaning of the order relation ≤, is the set of all real numbers {x : a ≤ x ≤ b}. The next section develops a natural definition of arithmetic for intervals, represented as pairs of real numbers (their endpoints), that follows from elementary properties of the relation ≤. However, intervals on the real line may be viewed as having a dual nature, both as a set (of real numbers) and as a new kind of number represented by pairs of real numbers. Interval arithmetic derived from both of these natures of the granules (a new number and a set) leads to a relatively new type of mathematical analysis called interval analysis [5]. The point of view of intervals as a set is also discussed in [6]. The logic of interval analysis that follows is one of certain containment. The sum of two intervals certainly contains the sums of all pairs of real numbers, one from each of the intervals. We can also compute intersections and unions of intervals. For a given interval [a, b] and a given real number x, the statement x ∈ [a, b] is either true or false. Moreover, for two intervals A1 and A2, if we know that x ∈ A1 and x ∈ A2, then we also know that x ∈ A1 ∩ A2. In interval arithmetic, and the interval analysis developed from it, a measure of possibility or probability is not assigned to parts of an interval. A number x either is in an interval A or is not.
The introduction of a distribution that represents the possible (probable) spread of uncertainty within an interval, and using level sets, integrals, or other measures, connects interval arithmetic to the other granule of interest – fuzzy sets. Computing with intervals, as parts of the total support, whether finite or infinite, of possibility or probability distributions, can produce intervals representing enclosures of mappings of input intervals. It is a separate problem to assign possibility or probability measures to the interval results, according to assumptions about measure on the input intervals, and the general theories of possibility or probability distributions.
The certainty offered by interval methods refers only to certainty in the sense of knowledge about solutions to mathematical equations. For example, using interval computation we may find that the range of values of a real-valued function f of a single real variable x is contained in an interval [c, d] when x ∈ [a, b]. We can denote this by f([a, b]) ⊆ [c, d]. Suppose f is continuous, and we are interested in finding an interval containing a zero of f, which is a solution to the equation f(x) = 0. If 0 ∉ [c, d], which can be tested using 0 < c or d < 0, then it is known that there is no solution in [a, b]. On the other hand, if 0 ∈ [c, d], then it is possible that f has a zero in [a, b]. It is not certain, because it is in the nature of interval computation that it cannot generally find exact ranges of values. Thus we may have 0 ∈ [c, d] but 0 ∉ f([a, b]). By using continuous contraction mappings or other techniques of interval analysis, we may be able to prove that there is a solution in [a, b]. Interval analysis and fuzzy set theory, as active fields of research and application, are relatively new mathematical disciplines, receiving the impetus that defined them as separate fields of study in 1959 and 1965, respectively, with R.E. Moore's technical reports on interval analysis and his Ph.D. thesis (see [7–10]) and L. Zadeh's seminal papers on fuzzy set theory (see [11–13]). The connection between interval analysis and possibility theory is evident in the mathematics of uncertainty. The first to recognize the potential of interval analysis in dealing with fuzzy set and possibility theory seem to have been D. Dubois and H. Prade (see [14, 15]). The theory of interval analysis models, among other things, the uncertainty arising from numerical computation, which can be considered as a source of ambiguity.
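The zero-exclusion test described above can be sketched in a few lines. The example is illustrative only: intervals are plain (lower, upper) tuples, the helper names are hypothetical, the two test functions f(x) = x·x − 2 and g(x) = x·x − x + 1 are chosen here and do not come from the text, and ordinary floating-point arithmetic is used without the outward rounding that a rigorous implementation would need.

```python
def add(x, y):
    """Endpoint rule for the sum of two intervals."""
    return (x[0] + y[0], x[1] + y[1])


def sub(x, y):
    """Endpoint rule for the difference of two intervals."""
    return (x[0] - y[1], x[1] - y[0])


def mul(x, y):
    """Endpoint rule for the product of two intervals."""
    p = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return (min(p), max(p))


def may_contain_zero(enclosure):
    """False means 0 < c or d < 0, so a zero is excluded with certainty."""
    c, d = enclosure
    return not (0 < c or d < 0)


def f_enclosure(x):
    """Enclosure [c, d] of f(x) = x*x - 2 over the interval x."""
    return sub(mul(x, x), (2.0, 2.0))


def g_enclosure(x):
    """Enclosure [c, d] of g(x) = x*x - x + 1 over the interval x."""
    return add(sub(mul(x, x), x), (1.0, 1.0))
```

Over [2, 3] the enclosure of f is [2, 7], so f has no zero there. Over [0, 1] the enclosure of g is [0, 2], which contains 0, although the true range of g on [0, 1] is [0.75, 1]: the test is inconclusive rather than wrong, which is exactly the situation 0 ∈ [c, d] but 0 ∉ f([a, b]).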
Fuzzy set theory and possibility theory model, among other things, the uncertainty of vagueness and ambiguity arising from the transitional nature of entities and a lack of information, respectively.
Interval analysis, developed as part of the then-emergent field of numerical analysis, initially had three directions of interest to this chapter:
1. Computational error analysis (automatically computed error including rounding).
2. Verified computing, which R.E. Moore called range arithmetic, dealing with guaranteed enclosures (including rounding) of the minimum and maximum of a continuous function over interval domains. Later, Aberth [16] developed an interval arithmetic he called range arithmetic, which differs from the way R.E. Moore used the phrase (which he used only once).
3. The derivation of the underlying algebraic structure of floating-point numbers, called computer algebra.
Fuzzy sets have developed two directions of interest to this chapter:
1. Fuzzy interval analysis (see [17–21]), and
2. Possibility theory (see [22, 23]).
Although the two fields can be thought of as having come from a common root, interval analysis and fuzzy set theory are independent fields whose cross-fertilization has been a relatively recent phenomenon (see [2, 3, 18, 19]). All interval, fuzzy/possibilistic, and interval-valued probabilistic analyses that follow are over sets of real numbers, i.e., real-valued intervals and real-valued distributions (in the case of fuzzy membership functions or possibilistic distributions). Moreover, when the word 'box' is used in the context of intervals, it is understood to be in $\mathbb{R}$ if the box refers to an interval [a, b] and in $\mathbb{R}^n$ if the box is an n-dimensional hyperrectangle $[a_1, b_1] \times \cdots \times [a_n, b_n]$, $a_i, b_i \in \mathbb{R}$, $i = 1, \ldots, n$.
3.2 The Central Issues
The underlying mathematical theory from which interval and distribution analysis arise is that of set functions, particularized to intervals and fuzzy sets, together with the associated upper and lower approximations of the resultants. Therefore, the central issues common to both interval and fuzzy analyses must include the interval/fuzzy extension principles, interval/fuzzy arithmetic, and enclosure/verification. The extension principles of R.E. Moore [9] and L. Zadeh [11] are directly related to an earlier and more general theory of set functions (see, e.g., [24, 25]). A treatment of set-valued functions is given in [26].
Of course, set-valued functions extend real-valued functions to functions on intervals and fuzzy sets. The extension principle used in interval analysis is called the united extension (see [9, 10]). In fuzzy set theory it is called simply the extension principle (see [11]). Since arithmetic operations are continuous real-valued functions (excluding division by zero), the extension principles may be used (and have been) to define interval and fuzzy arithmetic. Enclosure, for this exposition, means approximations that produce upper and lower values (interval or functional envelope, depending on the context) such that the theoretical solution lies between the upper and lower values or functions. Efficient methods to compute upper and lower approximations are desired and necessary in practice. When enclosure is part of mathematical problem solving, it is called verification, formally defined in what follows. The point of view that the extension principle is an important thread which can be used to relate and understand the various principles of uncertainty that are of interest to this chapter (interval, fuzzy, and possibility) leads to a direct relationship between the associated arithmetics. Moreover, upper and lower approximation pairs for fuzzy sets allow for simpler computation using min/max arithmetic and, with careful implementation, lead to enclosures. Arithmetic is a gateway to mathematical analysis. The importance of this is that in the computational methods requisite in doing mathematics (arithmetic and analysis) with various types of uncertainty, the underlying approach to computing is the extension principle. This chapter, therefore, considers three associated themes:
1. Extension principles,
2. Arithmetic (derived from rules on interval endpoints and extension principles), and
3. Enclosure and verification.
3.2.1 Extension Principles
The extension principle is key because it defines how real-valued expressions are represented in the context of intervals and fuzzy sets. One can view the extension principle as one of the main unifying concepts between interval analysis and fuzzy set theory. Moreover, the extension principle is used here to define how to do arithmetic on intervals, fuzzy sets, and, more generally, distributions, which will not be discussed (see [4] for an extended discussion). Arithmetic can also be defined using rules on interval endpoints. Both the extension approach and the rules-on-interval-endpoints approach are discussed. All extension principles associated with intervals and fuzzy sets may be thought of as coming from set-valued mappings or graphs. Generally, an extension principle defines how to obtain functions whose domains are sets. It is clear how to accomplish this for real numbers. It is more complex for sets, since how to obtain well-defined resultant entities must be specified. Set-valued maps have a very long history in mathematics. Relatively recently, Strother's 1952 Ph.D. thesis [24] and two papers [25, 27] defined the united extension for set-valued functions for domains possessing specific topological structures. R.E. Moore applied Strother's united extension to intervals. In doing so, he had to show that the topological structure on intervals was one of those that Strother developed. Having done this, Moore retained the name united extension for the extension principle particularized to intervals. In fact, Strother is a coauthor of the technical report that first uses the set-valued extension principle on intervals. That is, Moore's united extension (the interval extension principle) is a set-valued function whose domain is the set of intervals, and of course the range is an interval for those underlying functions that are continuous. Zadeh's extension principle (see [11]) is likewise the way functions of fuzzy sets are derived from real-valued functions. It expresses, in essence, how to compute with fuzzy sets. That is, Zadeh's extension principle can be thought of as a set-valued function where the domain elements are fuzzy sets and of course the range values are fuzzy sets for the appropriate maps, called membership functions, which are defined below. The extension principle was generalized and made more specific to what are now called fuzzy numbers or fuzzy intervals by various researchers, beginning with H. Nguyen [28].
3.2.2 Arithmetic
Interval arithmetic is central to fuzzy arithmetic and can be derived axiomatically or from Moore's united extension. Of special interest is the latter approach, especially in deriving a constrained interval arithmetic (see Section 3.3.2.2), which will have implications for fuzzy arithmetic. There are two direct precursors of Moore's development of interval arithmetic in 1959: M. Warmus (1956 – [29]) and T. Sunaga (1958 – [30]). Moore's initial work references and extends Sunaga's work in significant ways: he develops computational methods, incorporates computer rounding, develops for the first time automatic numerical error analysis (getting the computer to calculate roundoff, numerical truncation, and numerical method error estimations), and extends interval arithmetic to interval analysis. Analysis on intervals, since they are sets, requires set-valued functions, limits, integration, and differentiation theory. This is done via the united extension (see [5]). The rules of interval arithmetic as articulated by Warmus [29], Sunaga [30], and Moore [7] are as follows. It is noted that Warmus' notation is different but the operations are the same.

1. Addition:
$$[a, b] + [c, d] = [a + c,\ b + d] \tag{1}$$

2. Subtraction:
$$[a, b] - [c, d] = [a - d,\ b - c] \tag{2}$$

3. Multiplication:
$$[a, b] \times [c, d] = [\min\{ac, ad, bc, bd\},\ \max\{ac, ad, bc, bd\}] \tag{3}$$

4. Division:
$$[a, b] \div [c, d] = [a, b] \times [1/d,\ 1/c], \quad \text{where } 0 \notin [c, d]. \tag{4}$$
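Rules (1)–(4) translate almost directly into code. The class below is a minimal sketch of traditional interval arithmetic; the class name `Interval` and its fields are hypothetical, and the arithmetic is plain floating point without the outward (directed) rounding that rounded interval arithmetic requires, so the computed endpoints are not guaranteed enclosures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with the endpoint rules (1)-(4)."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower endpoint exceeds upper endpoint")

    def __add__(self, other):       # rule (1)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):       # rule (2)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):       # rule (3)
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

    def __truediv__(self, other):   # rule (4), only when 0 is not in [c, d]
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1.0 / other.hi, 1.0 / other.lo)
```

For example, `Interval(-1.0, 2.0) * Interval(3.0, 5.0)` gives `Interval(-5.0, 10.0)`. Note also that `x - x` for `x = Interval(1.0, 2.0)` gives `Interval(-1.0, 1.0)` rather than zero, because the endpoint rules treat the two operands as independent.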
There is an extended interval arithmetic that incorporates the case where 0 ∈ [c, d] for division (see [31, 32]). Moreover, there are a variety of ways to approach interval arithmetic, e.g., see [33–36]. In fuzzy arithmetic, the axioms of interval arithmetic apply to each α-cut of a fuzzy set membership function as long as the entity is a fuzzy number or fuzzy interval. An interval arithmetic based on axioms 1–4 above is called here interval arithmetic, or traditional interval arithmetic when there is a need to distinguish it from what is developed in the sequel using a set-valued approach to interval arithmetic, which is called constraint interval arithmetic. The implementation of interval arithmetic on the computer in which the concern is to account for all errors (numerical and truncation error) is called rounded interval arithmetic (see [7, 8]). U. Kulisch in [37] studied rounded interval arithmetic and, with W.L. Miranker, uncovered the resultant algebraic structure, called a ringoid (see [38, 39]). While specialized extended languages (Pascal-XSC and C-XSC) and chips were developed for interval and rounded interval data types incorporating the ideas set out by Moore, Kulisch, and Miranker (among other researchers), the most successful rounded interval tool is undoubtedly INTLAB, a software package that runs in conjunction with MATLAB with embedded interval arithmetic, rounded interval arithmetic, and some interval analytic methods, in particular computational linear algebraic methods (downloadable from www.ti3.tuharburg.de/˜rump/intlab).
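The remark that the interval rules apply to each α-cut can be illustrated with a small sketch. It assumes triangular fuzzy numbers written as (left, peak, right), a representation chosen here for simplicity and not taken from the text, whose α-cut is the interval [left + α(peak − left), right − α(right − peak)]. The function names are hypothetical.

```python
def alpha_cut(tri, alpha):
    """Alpha-cut of a triangular fuzzy number (left, peak, right)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    left, peak, right = tri
    return (left + alpha * (peak - left), right - alpha * (right - peak))


def add_intervals(x, y):
    """Rule (1): [a, b] + [c, d] = [a + c, b + d]."""
    return (x[0] + y[0], x[1] + y[1])


def fuzzy_add(tri_a, tri_b, levels):
    """Sum of two fuzzy numbers, computed level by level on their alpha-cuts."""
    return {alpha: add_intervals(alpha_cut(tri_a, alpha), alpha_cut(tri_b, alpha))
            for alpha in levels}
```

Calling `fuzzy_add((1.0, 2.0, 3.0), (2.0, 4.0, 6.0), [0.0, 0.5, 1.0])` returns the cuts (3.0, 9.0), (4.5, 7.5), and (6.0, 6.0), i.e., the α-cuts of the triangular number (3, 6, 9).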
3.2.3 Enclosure and Verification
The approach of Archimedes ([40, 41]) to the computation of the circumference of a circle using outer circumscribed and inner inscribed regular polygons whose perimeter is a straightforward calculation is an enclosure and verification method, perhaps the first one. The essential part of enclosure and verification
is that a solution can be proved mathematically to exist (and perhaps to be unique) and lies between the computed upper and lower bound values (real numbers for our purposes). That is, verification guarantees that the solution exists (and perhaps is unique) in a mathematical sense. Enclosure is the computed pair of upper and lower bounds containing the solution. Verification in the case of Archimedes' computation of the circumference of a circle is the geometrical fact (theorem) that the perimeter of the circumscribed regular polygon is greater than the circumference of the circle and that the inscribed regular polygon has a perimeter less than the circumference of the circle. Moreover, the outer and inner perimeters converge to the 'perimeter' of the circle (one from above and the other from below), which Archimedes took as obvious by construction. Often, to verify the existence of solutions in mathematical analysis, fixed-point theorems (contractive mapping theorems, for example) are used. These theorems are employed by verifying their hypotheses computationally: for example, interval analysis can calculate guaranteed bounds on the Lipschitz constant, accounting for all numerical and truncation errors, and if the bound is less than 1, the mapping is contractive and hence a solution exists. The methods to compute upper and lower bounds in a mathematically correct way on a computer must account for numerical and computer truncation error. This is one of the core research areas of interval mathematics. One of the primary applications of interval analysis is to enclosure methods for verification and, as will be seen, the equivalent for fuzzy sets and possibility theory is the computation of functional envelopes (interval-valued probability [42]). Interval verification methods obtain interval enclosures containing the solution(s) within their bounds.
In interval analysis, verification means that existence is mathematically verified and guaranteed bounds on solutions are given. When possible and/or relevant, uniqueness is mathematically determined. Thus, verification in the context of a computational process that uses intervals for a given problem means that a solution, say x, is verified (mathematically) to exist and the computed solution is returned with lower and upper bounds, a and b, such that the solution, shown to exist, is guaranteed to lie between the provided bounds, i.e., a ≤ x ≤ b. Uniqueness is determined when possible or desirable. Although not often thought of in these terms, possibility/necessity pairs, when carefully constructed, enclose a resultant distribution of the solution to a problem and are functional enclosures. Verification in the context of distributions is understood to be the construction of lower and upper functions, g(x) and h(x), to a given function f(x), such that g(x) ≤ f(x) ≤ h(x). We wish to do this not only when x is a real number or vector, but also when x is a distribution (or a vector of distributions) such as random variables, intervals, fuzzy sets, and/or possibilities. When f(x) is a complex expression, this is an especially difficult problem.
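The Archimedes enclosure mentioned in this section can be reproduced with a short sketch. It assumes a circle of diameter 1, so that the circumference equals π, and uses the classical side-doubling recurrences for regular polygons (a harmonic mean followed by a geometric mean); the function name is hypothetical. Because ordinary floating-point arithmetic is used, the bounds illustrate the idea of an enclosure but are not rigorously verified in the sense of rounded interval arithmetic.

```python
import math


def archimedes_bounds(doublings):
    """Perimeters of inscribed/circumscribed regular polygons, diameter 1.

    Starts from regular hexagons and doubles the number of sides
    `doublings` times. Returns (sides, lower_bound, upper_bound).
    """
    sides = 6
    inner = 3.0                    # inscribed hexagon: 6 * sin(pi / 6)
    outer = 2.0 * math.sqrt(3.0)   # circumscribed hexagon: 6 * tan(pi / 6)
    for _ in range(doublings):
        # Circumscribed 2n-gon: harmonic mean of the two n-gon perimeters.
        outer = 2.0 * inner * outer / (inner + outer)
        # Inscribed 2n-gon: geometric mean of inner n-gon and outer 2n-gon.
        inner = math.sqrt(inner * outer)
        sides *= 2
    return sides, inner, outer
```

Four doublings give the 96-sided polygons used by Archimedes, with bounds of roughly 3.1410 and 3.1427 on either side of π; each further doubling shrinks the width of the enclosure by about a factor of four.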
3.3 Interval Analysis
Intervals are sets, and they are a (new type of) number. This dual role is exploited in the arithmetic and analysis. Professor A. Neumaier [43, p. 1] states, 'Interval arithmetic is an elegant tool for practical work with inequalities, approximate numbers, error bounds, and more generally with certain convex and bounded sets.' He goes on to say that intervals arise naturally in
1. Physical measurements
2. Truncation error – the representation of an infinite process by a finite one
   (a) Representation of numbers by finite expansions
   (b) Finite representation of limits and iterations
3. Numerical approximations
4. Verification of monotonicity and convexity
5. Verification of the hypotheses of fixed-point theorems – the contraction mapping theorem or Brouwer's fixed-point theorem, for example
6. Sensitivity analysis, especially as it is applied to robotics
7. Tolerance problems
Interval arithmetic and analytic methods have been used to solve an impressive array of problems, given that these methods capture error (modeling, roundoff, and truncation) so that rigorous accounting of error together with the contraction mapping theorem or Brouwer's fixed-point theorem allow for computer
verification of existence, uniqueness, and enclosure. In particular, W. Tucker [44], using interval analysis, solved a long outstanding problem (Smale's fourteenth conjecture [45]) by showing that the Lorenz equations do possess a strange attractor. Professor B. Davies [46, p. 1352] observes, 'Controlled numerical calculations are also playing an essential role as intrinsic parts of papers in various areas of pure mathematics. In some areas of nonlinear PDE, rigorous computer-assisted proofs of the existence of solutions have been provided. . . . These use interval arithmetic to control the rounding errors in calculations that are conceptually completely conventional.' Another long-standing problem, Kepler's conjecture about the densest arrangement of spheres in space, was solved by T. Hales [47] using interval arithmetic. Ten problems were posed by Nick Trefethen in the January/February 2002 SIAM News, each of which had a real-number solution, and the objective was to obtain a ten-digit solution to each of the problems. The book [48] documents not only the correct solutions but also the analysis behind the problems. One of the authors, S. Wagon, indicated in a personal communication that '[i]ntervals were extremely useful in several spots. In Problem 2 intervals could be used to solve it by using smaller and smaller starting interval until success is reached. In Problem 4 intervals were used in designing an optimization algorithm to solve it by subdividing. Moreover, for Problems 2, 4, 7 and 9, intervals yield proofs that the digits are correct.' Chapter 4 of [48] contains an exposition of interval optimization. Robust stability analysis for robots, performed with the aid of a computer in a mathematically verifiable way, uses interval analysis methods (see [49–51]). There are excellent introductions to interval analysis, beginning with R.E. Moore's book [5] (also see other texts listed in the references).
A more recent introduction can be found in [52] and downloaded from http://www.eng.mu.edu/corlissg/PARA04/READ ME.html. Moreover, there are introductions that can be downloaded from the interval analysis Web site (http://www.cs.utep.edu/intervalcomp).
3.3.1 Interval Extension Principle
Moore's three technical reports [7–9] recognized that the extension principle is a key concept. Interval arithmetic, rounded interval arithmetic, and the computation of ranges of functions can be derived from interval extensions. At issue is how to compute ranges of set-valued functions. This requires continuity and compactness over interval functions, which in turn needs well-defined extension principles. Moore in [9] uses for the first time in an explicit way the extension principle for intervals, called the united extension, which particularizes the set-valued extensions of [24, 25] to sets that are intervals. If f : X → Y is an arbitrary mapping from an arbitrary set X into an arbitrary set Y, the united extension of f to S(X), denoted F, is defined as follows (see [9]):
$$F : S(X) \to S(Y), \quad \text{where } F(A) = \{ f(a) \mid \forall a \in A,\ A \in S(X) \},$$
in particular
$$F(\{x\}) = \{ f(x) \mid x \in \{x\},\ \{x\} \in S(X) \}.$$
Thus,
$$F(A) = \bigcup_{a \in A} \{ f(a) \}.$$
This definition, as we shall see, is quite similar to the fuzzy extension principle of Zadeh, where the union is replaced by the supremum, which is a fuzzy union. Theorem 1. Let X and Y be compact Hausdorff spaces and f : X → Y continuous. Then the united extension of f, F, is continuous. Moreover, F is closed (see [25]).
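The united extension can be illustrated on finite sets, where the union over singletons is directly computable. The following is only a minimal sketch: the helper name `united_extension` and the sample sets are illustrative assumptions, not notation from [9].

```python
def united_extension(f):
    """Return F with F(A) = {f(a) : a in A}, for finite sets A (illustrative helper)."""
    def F(A):
        # Union of the singletons {f(a)} over all a in A.
        return frozenset(f(a) for a in A)
    return F


def square(x):
    return x * x


F = united_extension(square)
A = frozenset({-1, 0, 1})
B = frozenset({-2, -1, 0, 1, 2})

# A is a subset of B, and the images are nested in the same way.
print(sorted(F(A)), sorted(F(B)), F(A) <= F(B))
```

The nesting of the images for nested arguments is the isotone property of the united extension with respect to set inclusion.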
The results of interest associated with the united extension for intervals are the following (see [5]):

1. Isotone property: A mapping $f$ from a partially ordered set $(X, r_X)$ into another $(Y, r_Y)$, where $r_X$ and $r_Y$ are relations, is called isotone if $x \, r_X \, y$ implies $f(x) \, r_Y \, f(y)$. In particular, the united extension is isotone with respect to intervals and the relation $\subseteq$. That is, for $A, B \in S([X])$, if $A \subseteq B$, then $F(A) \subseteq F(B)$.

2. The Knaster–Tarski theorem (1927): An isotone mapping of a complete lattice into itself has at least one fixed point.

Recall that $(S([\mathbb{R}]), \subseteq)$ is a complete lattice. Considering the united extension $F : S([\mathbb{R}]) \to S([\mathbb{R}])$, the Knaster–Tarski theorem implies that $F$ has at least one fixed ‘point’ (set) in $S([\mathbb{R}])$, which may be the empty set. However, this result has an important numerical consequence. Consider the sequence $\{X_n\}$ in $S(X)$ defined by $X_{n+1} = F(X_n)$. Since $X_1 \subseteq F(X_0) \subseteq X_0$, by induction, $X_{n+1} \subseteq X_n$. Considering
\[ Y = \bigcap_{n=0}^{\infty} X_n, \]
the following is true (see [9]). If $x = f(x)$ is any fixed point of $f$ in $X$, then $x \in X_n$ for all $n = 0, 1, 2, \ldots$ and so $x \in Y$ and $x \in F(Y) \subseteq Y$. Thus $X_n$, $Y$, and $F(Y)$ contain all the fixed points of $f$ in $X$. If $Y$ and/or $F(Y)$ is empty, then there are no fixed points of $f$ in $X$. Newton’s method is a fixed-point method, so that the above theorem pertains to a large class of problems. Moreover, these enclosures lead to computationally validated solutions when implemented on a computer with rounded arithmetic.
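The nested sequence $X_{n+1} = F(X_n)$ can be sketched for a concrete map. The example below assumes $f(x) = \cos x$ on $X_0 = [0, 1]$, where $\cos$ is decreasing, so the united extension of an interval is obtained from its endpoints. It uses ordinary floating-point arithmetic without outward rounding, so it illustrates the nesting only and is not a validated computation.

```python
import math


def F(X):
    """United extension of cos on an interval X = (lo, hi) inside [0, pi].

    cos is decreasing there, so the image is [cos(hi), cos(lo)].
    Assumption: plain floats, no outward rounding.
    """
    lo, hi = X
    return (math.cos(hi), math.cos(lo))


def nested_sequence(X0, n):
    """Return [X_0, X_1, ..., X_n] with X_{k+1} = F(X_k)."""
    seq = [X0]
    for _ in range(n):
        seq.append(F(seq[-1]))
    return seq


seq = nested_sequence((0.0, 1.0), 20)
for lo, hi in seq[:4]:
    print(f"[{lo:.6f}, {hi:.6f}]")
```

Each printed interval is contained in the previous one, and every interval contains the fixed point of $\cos$ near 0.739085.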
3.3.2 Interval Arithmetic

Interval arithmetic was defined by R.C. Young [53] in 1931, P.S. Dwyer [54] in 1951, M. Warmus [29] in 1956, and then independently by T. Sunaga [30] in 1958. Moore [7, 8] extends interval arithmetic to rounded interval arithmetic, thereby allowing interval arithmetic to be useful in computational mathematics. There are two approaches to interval arithmetic. The first is the traditional interval arithmetic obtained by application of rules (1)–(4) to interval endpoints. The second is the approach that considers an interval as a set and uses the united extension on the arithmetic operations as functions on sets. As will be seen, interval arithmetic derived from the direct application of the united extension has the complexity of global optimization. The traditional approach is simple in its application, but for expressions involving nontrivial computations it requires the exponential complexity of partitioning the domain to obtain realistic bounds. That is, the direct extension principle interval arithmetic models the problem of interval arithmetic as a global optimization problem, obtaining an intuitive algebra with not only additive/multiplicative identities but additive and multiplicative inverses as well, at the cost of the complexity of
global optimization. The traditional approach to interval arithmetic obtains a simple and direct approximation from the beginning and adds the exponential complexity of n-dimensional partitioning to obtain reasonable bounds as a second step. There is an interval arithmetic and associated semantics that allows for ‘intervals’ [a, b] for which a > b [55, 56]. This arithmetic is related to directed interval arithmetic (see Section 3.2.2.3) and has some interesting applications to fuzzy control (see [57, 58]). The basic rules associated with interval arithmetic are (1), (2), (3), and (4). They are more fully developed in [5]. There are various properties associated with the traditional approach to interval arithmetic, which are different from those of real numbers and those of constraint interval arithmetic. In particular, there is only the subdistributive property. Thus, from [59] we have for intervals X, Y, and Z

1. X + (Y + Z) = (X + Y) + Z – the associative law for addition.
2. X · (Y · Z) = (X · Y) · Z – the associative law for multiplication.
3. X + Y = Y + X – the commutative law for addition.
4. X · Y = Y · X – the commutative law for multiplication.
5. [0, 0] + X = X + [0, 0] = X – additive identity.
6. [1, 1] · X = X · [1, 1] = X – multiplicative identity.
7. X · (Y + Z) ⊆ X · Y + X · Z – the subdistributive property.
Example 2. [59, p. 13] points out that [1, 2](1 − 1) = [1, 2](0) = 0, whereas [1, 2](1) + [1, 2](−1) = [−1, 1]. Moore’s [7] implementation of [30] (neither Moore nor Sunaga was aware of Warmus’ earlier work [29]) has X ◦ Y = {z | z = x ◦ y, x ∈ X, y ∈ Y}, ◦ ∈ {+, −, ×, ÷}. That is, Moore applies the united extension for distinct (independent) intervals X and Y. However, Moore abandons this united extension definition and develops associated rules, assuming independence of all intervals, since it generates rules (1), (2), (3), and (4). These rules simplify the operations, since one does not have to account for multiple occurrences, but they lead to overestimation in the presence of dependencies, which is severe at times. From the beginning, Moore was aware of the problems of overestimation associated with multiple occurrences of the same variable in an expression. Thus, it is apparent that from the axiomatic approach, X − X is never 0 unless X is a real number (a zero-width interval). Moreover, X ÷ X is never 1 unless X is a real number (a zero-width interval).
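The behaviour described in Example 2 can be reproduced with a small endpoint-based interval type. This is a sketch only: the class name `Interval` is hypothetical, the operations are assumed to be the standard endpoint formulas that rules (1)–(4) denote, and no outward rounding is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Endpoint-based (traditional) interval; every operand is treated as independent."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __truediv__(self, other):
        if other.lo <= 0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1 / other.hi, 1 / other.lo)


X = Interval(1, 2)
one = Interval(1, 1)
minus_one = Interval(-1, -1)

left = X * (one + minus_one)      # X(Y + Z)
right = X * one + X * minus_one   # XY + XZ
print(left, right)
print(X - X, X / X)
```

The first line printed shows the subdistributive inclusion being strict, and the second shows that neither X − X nor X ÷ X collapses to a single number under the endpoint rules.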
3.3.2.1 Interval Arithmetic from Intervals Considered as Pairs of Numbers: Traditional Interval Arithmetic The traditional approach to interval arithmetic considers all instantiations of variables as independent. That is, Warmus, Sunaga, and Moore’s approach to interval arithmetic is one that considers the same variable that appears more than once in an expression as being independent variables. While axiomatic interval arithmetic is quite simple to implement, it leads to overestimations. Example 3. Consider f (x) = x(x − 1), x ∈ [0, 1].
Using the traditional interval analysis approach,
\[ [0, 1]([0, 1] - 1) = [0, 1][-1, 0] = [-1, 0]. \tag{5} \]
However, the smallest interval containing $f(x) = x(x - 1)$ is $[-0.25, 0]$. Traditional interval arithmetic leads to (5) because the two instantiations of the variable $x$ are taken as independent when in reality they are dependent. The united extension $F(x)$, which is
\[ F(x) = \{ y \mid y = f(x) = x(x - 1),\ x \in [0, 1] \} = [-0.25, 0], \]
was not used. If the calculation were $x(y - 1)$ for $x \in [0, 1]$, $y \in [0, 1]$, then the smallest interval containing $x(y - 1)$, its united extension, is $[-1, 0]$. Note also that the subdistributive property does not use the united extension in computing $X \cdot Y + X \cdot Z$ but instead considers $X \cdot Y + W \cdot Z$ where $W = X$. Partitioning the interval variables (which are repeated) will lead to a closer approximation to the united extension. That is, take the example given above and partition the interval in which $x$ lies.

Example 4. Consider $x(x - 1)$ again, but $x \in [0, 0.5] \cup [0.5, 1]$. This yields
\begin{align}
[0, 0.5]([0, 0.5] - 1) \cup [0.5, 1]([0.5, 1] - 1) &= [0, 0.5][-1, -0.5] \cup [0.5, 1][-0.5, 0] \tag{6} \\
&= [-0.5, 0] \cup [-0.5, 0] \tag{7} \\
&= [-0.5, 0], \tag{8}
\end{align}
which has an overestimation of 0.25 compared with an overestimation of 0.75 when the full interval [0, 1] was used (the exact range is [−0.25, 0]). In fact, for operations that are continuous functions, a reduction in width leads to estimations that are closer to the united extension and, in the limit, to the exact united extension value (see [5, 10, 59]). Other approaches that reduce the overestimation arising from the traditional approach have proved to be extremely useful, such as centered, mean value, and slope forms (see [43, 59, 60–63]).
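The effect of partitioning on the example $x(x - 1)$, $x \in [0, 1]$, can be sketched numerically. The function names below are illustrative, the partition is assumed uniform, and plain floating-point arithmetic is used without outward rounding.

```python
def natural_extension(lo, hi):
    """Endpoint evaluation of x * (x - 1) on [lo, hi], treating both occurrences of x as independent."""
    a, b = lo - 1.0, hi - 1.0                     # the interval [lo, hi] - 1
    products = [lo * a, lo * b, hi * a, hi * b]
    return min(products), max(products)


def partitioned_range(n):
    """Hull of the endpoint evaluations over n equal subintervals of [0, 1]."""
    lows, highs = [], []
    for i in range(n):
        lo, hi = natural_extension(i / n, (i + 1) / n)
        lows.append(lo)
        highs.append(hi)
    return min(lows), max(highs)


for n in (1, 2, 10, 1000):
    print(n, partitioned_range(n))
```

As the number of subintervals grows, the lower bound moves from −1 toward the exact value −0.25, at the price of one interval evaluation per subinterval.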
3.3.2.2 Interval Arithmetic from the United Extension: Constraint Interval Arithmetic

The power of the traditional approach to interval arithmetic is that it is simple to apply. Its complexity is at most four times that of real-valued arithmetic (per partition). However, the traditional approach to interval arithmetic leads to overestimations in general because it takes every instantiation of the same variable independently. The united extension when applied to sets of real numbers is global optimization (as will be seen below). On the other hand, simple notions such as
\[ X - X = 0 \tag{9} \]
and
\[ X \div X = 1, \quad 0 \notin X \tag{10} \]
are desirable properties and can be maintained if the united extension is used to define interval arithmetic [64]. In the context of fuzzy arithmetic (which uses interval arithmetic), Klir [65] looked at fuzzy arithmetic which was constrained to account for (9) and (10), though from a case-based approach. What is given next was developed in [64] independently of [65] and is more general. It develops interval arithmetic from first principles, the united extension, rather than from traditional interval arithmetic or a case-based approach. It is known that applying interval arithmetic to the union of intervals of decreasing width yields tighter bounds on the result that converge to the united extension interval result [10] in the limit. Of course,
for n-dimensional problems, ‘intervals’ are rectangular parallelepipeds (boxes), and as the diameters of these boxes approach zero, the union of the result approaches the correct bound for the expression. Partitioning each of the sides of the n-dimensional box in half has complexity of $O(2^n)$. Theorems proving convergence to the exact bound of the expression and the rates associated with subdividing intervals can be found in [43, 59, 60–63]. What is proposed here is to redefine interval numbers in such a way that dependencies are explicitly kept. The ensuing arithmetic will be called constraint interval arithmetic. This new arithmetic is the derivation of arithmetic directly from the united extension of [24]. An interval number is redefined next (also see [64]) into an equivalent form as the graph of a function of one variable and two constants (inputs, coefficients, or parameters).

Definition 5. An interval number $[\underline{x}, \overline{x}]$ (or interval for short) is the graph of the real single-valued function $X^I(\lambda_x)$, where
\[ X^I(\lambda_x) = \lambda_x \underline{x} + (1 - \lambda_x)\overline{x}, \quad 0 \le \lambda_x \le 1. \tag{11} \]
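Strictly speaking, in (11), since the numbers $\underline{x}$ and $\overline{x}$ are known (inputs), they are coefficients, whereas $\lambda_x$ is varying, although constrained between 0 and 1, hence the name ‘constraint interval arithmetic.’ Note that (11) defines a set representation explicitly and the ensuing arithmetic is developed on sets of numbers. The algebraic operations are defined as follows:
\begin{align}
Z &= X \circ Y \notag \\
&= \{ z \mid z = x \circ y, \text{ for all } x \in X^I(\lambda_x),\ y \in Y^I(\lambda_y),\ 0 \le \lambda_x, \lambda_y \le 1 \} \tag{12} \\
&= \{ z \mid z = (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}) \circ (\lambda_y \underline{y} + (1 - \lambda_y)\overline{y}),\ 0 \le \lambda_x \le 1,\ 0 \le \lambda_y \le 1 \} \tag{13} \\
&= [\underline{z}, \overline{z}], \notag
\end{align}
where $\underline{z} = \min\{z\}$, $\overline{z} = \max\{z\}$, and $\circ \in \{+, -, \times, \div\}$.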
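It is clear from (13) that constraint interval arithmetic is a global optimization problem. However, when the operations use the same interval variable, no exceptions need be made as in [65]. We only use (13) and obtain
\begin{align*}
Z &= X \circ X \\
&= \{ z \mid z = (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}) \circ (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}),\ 0 \le \lambda_x, \lambda_x \le 1 \} \\
&= \{ z \mid z = (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}) \circ (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}),\ 0 \le \lambda_x \le 1 \} \\
&= [\underline{z}, \overline{z}].
\end{align*}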
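This results in the following properties:

1. Addition of the same interval variable:
\begin{align}
X + X &= \{ z \mid z = (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}) + (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}),\ 0 \le \lambda_x \le 1 \} \notag \\
&= \{ z \mid z = 2(\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}),\ 0 \le \lambda_x \le 1 \} = [2\underline{x}, 2\overline{x}]. \tag{14}
\end{align}

2. Subtraction of the same interval variable:
\[ X - X = \{ z \mid z = (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}) - (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}),\ 0 \le \lambda_x \le 1 \} = 0. \]

3. Division of the same interval variable, $0 \notin X$:
\[ X \div X = \{ z \mid z = (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}) \div (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}),\ 0 \le \lambda_x \le 1 \} = 1. \]

4. Multiplication of the same interval variable with $\underline{x} < \overline{x}$:
\begin{align*}
X \times X &= \{ z \mid z = (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}) \times (\lambda_x \underline{x} + (1 - \lambda_x)\overline{x}),\ 0 \le \lambda_x \le 1 \} \\
&= \{ z \mid z = \lambda_x^2 \underline{x}^2 + 2(1 - \lambda_x)\lambda_x \underline{x}\,\overline{x} + (1 - \lambda_x)^2 \overline{x}^2,\ 0 \le \lambda_x \le 1 \} \\
&= [\min\{\underline{x}^2, \overline{x}^2, 0\}, \max\{\underline{x}^2, \overline{x}^2, 0\}] = [\underline{z}, \overline{z}] \quad \text{(for } 0 \in X\text{)}.
\end{align*}
To verify that this is the interval solution, note that, after the substitution $\lambda_x \mapsto 1 - \lambda_x$ (which leaves the set of values unchanged), the product $X \times X$ as a function of the single variable $\lambda_x$ is
\[ f(\lambda_x) = (\overline{x} - \underline{x})^2 \lambda_x^2 + 2\underline{x}(\overline{x} - \underline{x})\lambda_x + \underline{x}^2, \]
which has a critical point at
\[ \lambda_x = -\frac{\underline{x}}{\overline{x} - \underline{x}}, \]
where $f$ vanishes. This critical point lies in $[0, 1]$ exactly when $0 \in X$. Thus, for $0 \in X$,
\begin{align*}
\underline{z} &= \min\left\{ f(0), f(1), f\!\left(-\frac{\underline{x}}{\overline{x} - \underline{x}}\right) \right\} = \min\{\underline{x}^2, \overline{x}^2, 0\} = 0, \\
\overline{z} &= \max\left\{ f(0), f(1), f\!\left(-\frac{\underline{x}}{\overline{x} - \underline{x}}\right) \right\} = \max\{\underline{x}^2, \overline{x}^2, 0\} = \max\{\underline{x}^2, \overline{x}^2\}.
\end{align*}
If $0 \notin X$, the critical point falls outside $[0, 1]$ and the extrema are attained at the endpoints, so that $\underline{z} = \min\{\underline{x}^2, \overline{x}^2\}$ and $\overline{z} = \max\{\underline{x}^2, \overline{x}^2\}$. Of course, if $\underline{x} = \overline{x}$, then $X \times X = \underline{x}^2$.

5. $X(Y + Z) = XY + XZ$.

Constraint interval arithmetic is the complete implementation of the united extension, and it gives an algebra which possesses an additive inverse, a multiplicative inverse, and the distributive law.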
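The parametrized definition of constraint interval arithmetic can be explored numerically by sampling the parameters on a grid. This is a rough sketch under stated assumptions: the function names are hypothetical, the grid search only approximates the global minimum and maximum (it is not a rigorous optimizer), and no outward rounding is used.

```python
import operator


def point(X, lam):
    """X^I(lam) = lam * lower + (1 - lam) * upper for X = (lower, upper)."""
    lower, upper = X
    return lam * lower + (1.0 - lam) * upper


def same_variable(op, X, n=601):
    """Approximate X op X when both operands are the SAME variable (one shared lambda)."""
    values = [op(point(X, k / (n - 1)), point(X, k / (n - 1))) for k in range(n)]
    return min(values), max(values)


def independent(op, X, Y, n=101):
    """Approximate X op Y for independent operands (two separate lambdas)."""
    grid = [k / (n - 1) for k in range(n)]
    values = [op(point(X, a), point(Y, b)) for a in grid for b in grid]
    return min(values), max(values)


X = (1.0, 2.0)
print(same_variable(operator.sub, X))           # dependency kept
print(same_variable(operator.truediv, X))
print(independent(operator.sub, X, X))          # dependency ignored
print(same_variable(operator.mul, (-1.0, 2.0)))
```

Sharing one parameter between the two occurrences keeps the dependency, whereas two separate parameters reproduce the result of the endpoint rules for independent operands.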
3.3.2.3 Specialized Interval Arithmetic

Various interval arithmetic approaches have been developed in addition to the axiomatic and united extension approaches. Different representations of intervals were created and include the development of range arithmetic (see [16, pp. 13–25]) and rational arithmetic (see [66]). These purport to simplify operations and/or result in more accuracy. Another issue addressed by researchers was how to extend interval arithmetic, called extended interval arithmetic, to handle unbounded intervals that result from a division by zero (see [32, 67–71]). The general space of improper intervals, which includes extended interval arithmetic and is called directed interval arithmetic, was developed subsequently. However, previously M. Warmus [29, 72] had considered this space and its arithmetic. G. Alefeld states (of improper intervals), These intervals are interpreted as intervals with negative width. The point intervals $[a, a]$ are no longer minimal elements with respect to the ordering $\subseteq$. All the structures of $I(\mathbb{R})$ are carried over to $I(\mathbb{R}) \cup \overline{I(\mathbb{R})}$ and a completion through two improper elements $p$ and $-p$ is achieved. In this manner the division by an interval $A = [\underline{a}, \overline{a}]$ with $\underline{a} \le 0 \le \overline{a}$, $\underline{a} \ne \overline{a}$, can also be defined ([73, p. 8]). This approach was studied by [55, 56]. E.D. Popova [74] states, Directed interval arithmetic is obtained as an extension of the set of normal intervals by improper intervals and a corresponding extension of the definitions of the interval arithmetic operations. The corresponding extended interval arithmetic structure possesses group properties with respect to addition and multiplication operations and a number of other advantages. Generalized interval arithmetic (and its more recent generalization, affine arithmetic (see [36])) dealt with the problem of reducing overestimation that characterizes the axiomatic approach to interval arithmetic.
Triplex arithmetic [35] is a way to carry more information about the uncertainty beyond the bounds that are represented by the endpoints of the interval (the endpoints of the support if it is a distribution) by
keeping track of a main value within the interval in addition to its endpoints. According to [35], triplex arithmetic started out as a project initiated in 1966 at the University of Karlsruhe to develop a compiler and demonstrate its usefulness for the solution of problems in numerical analysis. Three-valued set theory has also been studied by Klaua [75] and Jahn [76]. What is presented here is a synopsis of [35]. Its generalization, quantile arithmetic (see [33, 77]), is a way to carry more information about the uncertainty bound, in a probabilistic and statistically faithful way, than triplex arithmetic. While it is more complex, as will be seen, it does have a well-defined probabilistic and statistical semantics. In fact, triplex arithmetic can be represented by quantile arithmetic. In particular, quantile arithmetic approximates distributions whose support is an interval (which can be infinite for extended interval arithmetic), whose value lies between the given lower and upper bounds, and whose error at each arithmetic operation is independent. In [33, 77], a three-point arithmetic is used to approximate a discrete distribution, although there is nothing to prevent using a finer approximation except computational time considerations. In triplex arithmetic, a main value is carried. The uncertainty within a given interval, and the manner in which this uncertainty propagates within the interval when a function or expression is applied, is not a part of interval analysis. The problem of where the uncertainty lies in the resultant of a function or expression is especially problematic when the uncertainty has a large support and the bulk of the uncertainty is amassed around a single value; that is, it has a narrow dispersion and a long tail. The ellipsoidal arithmetic of [34] is based on approximating enclosing affine transformations of ellipsoids that are again contained in an ellipsoid.
The focus of [34] is to enclose solutions to dynamical system models where the wrapping effect associated with interval (hyperbox) enclosures may severely hamper their usefulness, since boxes parallel to the axes are not the optimal geometric shape to minimize bounds (also see [78]). A second focus of [34] is to enclose confidence limits. In [79] it is shown how to compute the ‘tightest’ ellipsoid enclosure of the intersection of two ellipsoids, which is the underlying basis of the approximations developed in [34]. It is clear that computing with ellipsoids is not simple. Therefore, an approximation which is simple is necessary if the method is to be useful. While the sum and product of ellipsoids are not found explicitly worked out in [34], they are implicit. Enclosing the sum is straightforward. The difference, product, and quotient need approximations. Variable-precision interval arithmetic ([80, 81], and more recently [82, 83]) was developed to enclose solutions to computational mathematical problems requiring more precision than that afforded by usual floating-point arithmetic (single and double precision, for example). A problem in this category is wind-shear (vortex) modeling (see [84]). There is a specialized interval arithmetic that has been developed both in software (see [81]) and in hardware (see [82]).
3.3.3 Comparison between Traditional and Constraint Interval Arithmetic

The traditional approach to interval arithmetic considers an interval as a number (like a complex number, an interval has two components), whereas constraint interval arithmetic considers an interval as a set. In considering an interval as a number, interval arithmetic defines the operations using rules on interval endpoints. Interval arithmetic is simple and straightforward since it is defined via real-number operations; on a sequential machine, each interval operation costs no less than twice and at most four times as much as the corresponding real-number operation. What ensues is an arithmetic that has neither additive nor multiplicative inverses and is only subdistributive, resulting in overestimation of computations. Exponential complexity arises when trying to reduce overestimations. Considering an interval as a set leads to an arithmetic defined via global optimization of the united extension expressing the continuous function of the arithmetic operations. Thus, constraint interval arithmetic requires a procedure. The complexity is explicit at the outset and potentially NP-hard. Nevertheless, the algebraic structure of constraint interval arithmetic possesses additive and multiplicative inverses as well as the distributive law.
3.3.4 Enclosure and Verification

Enclosure and verification are approaches to computational mathematical problem solving in which solutions are returned with automatically computed bounds. This is what is meant by enclosure. If the
enclosure is nonempty, a check of existence (and uniqueness if possible) is mathematically carried out on the machine. This is what is meant by verification. There are three different approaches to enclosure and verification methods of interest:

1. Range of a function methods compute an upper bound to the maximum and a lower bound to the minimum of a continuous function by using rounded interval arithmetic (see [60, 85, 63], for example).
2. Epsilon inflation methods (see [86] for example) compute an approximate solution, inflate the approximation to form an interval, and compute the range according to (1) above.
3. Defect correction methods (see [87] for example) compute an approximate inverse to the problem. If the approximate inverse composed with the given function is contractive, then iterative methods are guaranteed to converge and mathematically correct error bounds on the solution can be computed on a digital machine.

The naive (most often nonuseful) approach to computing the range of a rational function is to replace every algebraic operation by the corresponding interval arithmetic operation. This works in theory for continuous functions when one takes unions of smaller and smaller boxes whose diameters go to zero. However, this approach is computationally complex. Authors have found excellent and efficient implementations, using a variety of theorems along with intelligent methods (see [60, 62]). The meaning of enclosure and verification in the context of interval analysis is discussed next.

Definition 6. By the enclosure of a set of real numbers (real vectors) Y is meant a set of real numbers (real vectors) X such that Y ⊆ X. In this case X encloses Y. The set X is called the enclosing set.

Enclosure makes sense when Y is unknown but bounds on its values are sought. For example, the set Y could be the set of solutions to a mathematical problem. In the case of interval analysis over Rn, the enclosing set X is a computed box.
Typically, algorithms return a real-number (vector) approximation $\tilde{x}$ as a computed value of the unknown solution $y$ with no sense of the quality of the solution, i.e., the error bounds. The idea of enclosure is to provide mathematically valid computed error bounds, $Y \subseteq X = [\underline{x}, \overline{x}]$, on the set of solutions $Y$. If the approximation to the solution is $\tilde{x} = (\underline{x} + \overline{x})/2$, the maximal error is guaranteed to be $\mathrm{error}_{\max} = (\overline{x} - \underline{x})/2$. If we are dealing with functions, there are only two pertinent cases.

• The first is the enclosure of the range of a function in a box; that is, $Y = \{ f(x) \mid x \in \text{domain} \} \subseteq X$, where $X$ is a box.
• The second case is pointwise enclosure; that is, $[g(x), h(x)]$ encloses the function $f(x)$ pointwise if $g(x) \le f(x) \le h(x)$ for all $x \in \text{domain}$. That is, at each point in the domain, $f(x)$ is enclosed by a box. This is the function envelope of $f$.

Researchers do not give a definition of ‘enclosure methods,’ since the word itself, ‘enclosure,’ seems to denote its definition. In fact, Alefeld [85] states, In this paper we do not try to give a precise definition of what we mean by an enclosure method. Instead we first recall that the four basic interval operations allow to include the range of values of rational functions. Using more appropriate tools also the range of more general functions can be included. Since all enclosures methods for solution of equations which are based on interval arithmetic tools are finally enclosures methods for the range of some function we concentrate ourselves on methods for the inclusion of the range of function. There is an intimate relation between enclosure and inclusion of the range of functions. However, enclosure for this study is more general than that to which Alefeld limits himself in [85], since we deal with epsilon inflation and defect correction methods in addition to finding the range of a function. Nevertheless, computing an inclusion of the range of a function constitutes an important class of enclosure methods (see [60, 62, 63, 85] for example).
The concept of verification for this study is restricted to the context of computed solutions to problems in continuous mathematics. Verification is defined next.

Definition 7. Verification of solutions to a problem in continuous mathematics in $\mathbb{R}^n$ is the construction of a box $X$ that encloses the solutions of the problem in a given domain where, for $X \ne \emptyset$, at least one solution exists, and for $X = \emptyset$, no solution exists in the given domain of the problem.

Thus verification includes the existence of solutions and the computability of enclosures. In particular, when the construction of the verified solution is carried out on a computer, the enclosures are mathematically valid enclosures whose endpoints are floating-point numbers. That is, the construction must take into account round-off errors and, of course, inherent and truncation errors. The literature often uses ‘validation’ to mean what we have defined as ‘verification.’ Methods that compute enclosures and verified uniqueness are called E-methods by [86], and these methods are applied to solutions to fixed-point problems, $f(x) = x$. The authors of [86] also develop methods to solve linear equations by E-methods. Many authors, in the context of verifying solutions to equations, use the word ‘proof’ (see, e.g., section 2 of [88]). While the mathematical verification of existence (and perhaps uniqueness) is a type of proof, for this chapter, the mathematical verification of the hypotheses of an existing theorem (say Brouwer’s fixed-point theorem) will simply be called verification. Along these lines, [88] states on p. 3, A powerful aspect of interval computations is tied to the Brouwer fixed point theorem. Theorem A (Brouwer fixed point theorem – see any elementary text on Real Analysis or [43, page 200]) Let $D$ be a convex and compact subset of $\mathbb{R}^n$ with $\mathrm{int}(D) \ne \emptyset$. Then every continuous mapping $G : D \to D$ has at least one fixed point $x^* \in D$, that is, a point with $x^* = G(x^*)$.
The Brouwer fixed point theory combined with interval arithmetic enables numerical computations to prove existence of solutions to linear and nonlinear systems. The simplest context in which this can be explained is the one-dimensional interval Newton method. Suppose $f : \boldsymbol{x} = [\underline{x}, \overline{x}] \to \mathbb{R}$ has continuous first derivative on $\boldsymbol{x}$, $\check{x} \in \boldsymbol{x}$, and $\boldsymbol{f}'(\boldsymbol{x})$ is a set that contains the range of $f'$ over $\boldsymbol{x}$ (such as when $f'$ is evaluated at $\boldsymbol{x}$ with interval arithmetic). Then the operator
\[ N(f; \boldsymbol{x}, \check{x}) = \check{x} - f(\check{x}) / \boldsymbol{f}'(\boldsymbol{x}) \tag{15} \]
is termed the univariate interval Newton method. . . . Applying the Brouwer fixed point theorem in the context of the univariate interval Newton method leads to:

Theorem B. If $N(f; \boldsymbol{x}, \check{x}) \subset \boldsymbol{x}$, then there exists a unique solution to $f(x) = 0$ in $\boldsymbol{x}$.

Existence in Theorem B follows from Miranda’s theorem, a corollary of the Brouwer fixed point theorem. We next turn our attention to three types of verifications that occur in practice. These are (1) enclosure of the range of a function or global optimization, (2) epsilon inflation, and (3) defect correction.
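The univariate interval Newton operator can be sketched for a simple function. The example below assumes $f(x) = x^2 - 2$ on the interval $[1, 2]$, takes the midpoint as the point inside the interval, and encloses the derivative $2x$ by its endpoint values. It uses plain floating-point arithmetic without outward rounding, so it is an illustration rather than a verified computation, and all function names are hypothetical.

```python
import math


def f(x):
    return x * x - 2.0


def derivative_range(lo, hi):
    """Range of f'(x) = 2x over [lo, hi]; f' is increasing, so endpoints suffice."""
    return 2.0 * lo, 2.0 * hi


def interval_newton(lo, hi):
    """N(f; x, m) = m - f(m) / f'(x), with m the midpoint of x = [lo, hi]."""
    m = 0.5 * (lo + hi)
    d_lo, d_hi = derivative_range(lo, hi)
    if d_lo <= 0.0 <= d_hi:
        raise ValueError("derivative enclosure contains zero")
    quotients = sorted((f(m) / d_lo, f(m) / d_hi))
    return m - quotients[1], m - quotients[0]


def refine(lo, hi, steps=3):
    """Repeat the Newton step, intersecting with the current interval each time."""
    for _ in range(steps):
        n_lo, n_hi = interval_newton(lo, hi)
        lo, hi = max(lo, n_lo), min(hi, n_hi)
    return lo, hi


first = interval_newton(1.0, 2.0)
print(first, 1.0 < first[0] and first[1] < 2.0)
print(refine(1.0, 2.0))
```

The first Newton image lies strictly inside the starting interval, which is the hypothesis of Theorem B, and the refined interval still contains the square root of 2.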
3.3.4.1 Enclosure of the Range of a Function The enclosure of the range of a function using interval arithmetic assumes a function is continuous so that as long as rounded interval arithmetic is used the enclosure (and the existence) is mathematically guaranteed to be correct. Uniqueness can also be verified mathematically on a computer using methods outlined in [43, 60]. More recent methods to compute verified ranges of functions can be found in [61, 62].
3.3.4.2 Epsilon Inflation

Epsilon inflation methods are approaches for the verification of solutions to the problem $f(x) = 0$, using two steps: (1) apply a usual numerical method to solve $f(x) = 0$ to obtain an approximate solution $\hat{x}$, and (2) inflate $\hat{x}$ to obtain an approximate interval $\hat{X} = [\hat{x} - \epsilon, \hat{x} + \epsilon]$ and apply interval methods using rounded interval arithmetic (e.g., interval Newton’s method) to obtain an enclosure. Günter Mayer [89] outlines how to solve problems via E-methods, using epsilon-inflation techniques to solve $f(x) = 0$ (see p. 98 of [89]), where the function is assumed to be continuous over its domain of definition. The idea is to solve the problem on a closed and bounded subset of its domain, using the following steps:

1. Transform the problem into an equivalent fixed-point problem, $f(x) = 0 \Leftrightarrow g(x) = x$.
2. Solve the fixed-point problem for an approximate solution $\tilde{x}$ using a known algorithm. That is, $g(\tilde{x}) \approx \tilde{x}$.
3. Identify an interval function enclosure to the fixed-point representation of the problem,
\[ g(x) \in [G]([x]) \quad \forall x \in [x], \]
where $[x]$ is in the domain of both $g$ and $[G]$. For example,
\[ [G]([x]) = \left[ \min_{y \in [x]} G(y), \max_{y \in [x]} G(y) \right]. \]
4. Verify $[G]([x]) \subseteq \mathrm{interior}([x])$ by doing the following:
   (a) $[x]^0 := [\tilde{x}, \tilde{x}]$
   (b) $k = -1$
   (c) repeat
       (i) $k := k + 1$
       (ii) choose $[x]^k_\epsilon$ such that $[x]^k \subseteq \mathrm{interior}([x]^k_\epsilon)$ – this is the epsilon inflation
       (iii) $[x]^{k+1} := [G]([x]^k_\epsilon)$
   (d) until $[x]^{k+1} \subseteq \mathrm{interior}([x]^k_\epsilon)$ or $k > k_{\max}$.

There are a variety of ways to pick the epsilon inflation. In particular, [90] uses the following:
\[ [x]_\epsilon = (1 + \epsilon)[x] - \epsilon[x] + [-\eta, \eta], \]
where $\eta$ is the smallest floating-point number (machine epsilon). Another approach is as follows:
\begin{align*}
[y] = [\underline{y}, \overline{y}] &:= (1 + \epsilon)[x] - \epsilon[x], \\
[x]_\epsilon &:= [\mathrm{pred}(\underline{y}), \mathrm{succ}(\overline{y})],
\end{align*}
where $\mathrm{pred}(\underline{y})$ denotes the next floating-point number below $\underline{y}$ (round down) and $\mathrm{succ}(\overline{y})$ denotes the next floating-point number above $\overline{y}$ (round up). The value $\epsilon = 0.1$ has been used as an initial guess.
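The verification loop of step 4 can be sketched for the fixed-point problem $g(x) = \cos x$. Several choices below are assumptions made for illustration only: the enclosure $[G]$ uses the monotonicity of $\cos$ on $[0, \pi]$, the endpoints are widened with `math.nextafter` (Python 3.9 or later) as a stand-in for directed rounding even though `math.cos` is not guaranteed to be correctly rounded, and the absolute inflation `eta = 1e-12` is chosen for visibility rather than taken from [90].

```python
import math


def G(lo, hi):
    """Enclosure of cos over [lo, hi] within [0, pi], where cos is decreasing."""
    return (math.nextafter(math.cos(hi), -math.inf),
            math.nextafter(math.cos(lo), math.inf))


def inflate(lo, hi, eps=0.1, eta=1e-12):
    """(1 + eps)[x] - eps[x] + [-eta, eta], written out for endpoints (eps > 0)."""
    width = hi - lo
    return lo - eps * width - eta, hi + eps * width + eta


def epsilon_inflation(x_tilde, k_max=20):
    """Return (enclosure, k) once [G]([x]_eps) lies in the interior of [x]_eps, else (None, k_max)."""
    x = (x_tilde, x_tilde)
    for k in range(k_max + 1):
        x_eps = inflate(*x)
        x_next = G(*x_eps)
        if x_eps[0] < x_next[0] and x_next[1] < x_eps[1]:
            return x_next, k
        x = x_next
    return None, k_max


# Step 2: approximate fixed point from an ordinary floating-point iteration.
x_tilde = 1.0
for _ in range(200):
    x_tilde = math.cos(x_tilde)

enclosure, iterations = epsilon_inflation(x_tilde)
print(enclosure, iterations)
```

When the inclusion test succeeds, the returned interval is a candidate enclosure of the fixed point; a rigorous implementation would replace the floating-point evaluations by properly rounded interval operations.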
3.3.4.3 Defect Correction

Defect correction methods [87] solve the fixed-point problem $f(x) = x$ by computing an approximate inverse in such a way that the approximate inverse acting on the original operator is contractive. This approach is then used in conjunction with verification (see [86]), for example, in conjunction with the epsilon inflation and/or range enclosure outlined above. The general defect method as stated by [87, p. 3] is: Solve
\[ F z = y, \tag{16} \]
where $F : D \subset E \to \hat{D} \subset \hat{E}$ is a bijective, continuous, generally nonlinear operator; $E$ and $\hat{E}$ are Banach spaces. The domain and range are defined appropriately so that for every $\tilde{y} \in \hat{D}$ there exists exactly one solution of $F z = \tilde{y}$. The (unique) solution to (16) is denoted $z^*$. Assume that (16) cannot be solved directly but the defect (also called the residual in other contexts)
\[ d(\tilde{z}) := F\tilde{z} - y \tag{17} \]
may be evaluated for ‘approximate solutions’ $\tilde{z} \in D$. Further assume that the approximate problem
\[ \tilde{F} z = \tilde{y} \tag{18} \]
can be readily solved for $\tilde{y} \in \hat{D}$. That is, we can evaluate the solution operator $\tilde{G}$ of (18). $\tilde{G} : \hat{D} \to D$ is an approximate inverse of $F$ such that (in some approximate sense)
\[ \tilde{G} F \tilde{z} = \tilde{z} \quad \text{for } \tilde{z} \in D \tag{19} \]
and
\[ F \tilde{G} \tilde{y} = \tilde{y} \quad \text{for } \tilde{y} \in \hat{D}. \tag{20} \]
Assume that an approximation $\tilde{z} \in D$ to $z^*$ is known and the defect $d(\tilde{z})$ (17) has been computed. There are, in general, two ways to compute another (hopefully better) approximation $\bar{z}$ to $z^*$ by solving (18).

1. Compute a change $\Delta z$ in (18) with the right-hand side being the defect, $d(\tilde{z})$, and then use $\Delta z$ as a correction for $\tilde{z}$. That is,
\begin{align}
\bar{z} &:= \tilde{z} - \Delta z = \tilde{z} - [\tilde{G}(y + d(\tilde{z})) - \tilde{G} y], \notag \\
\bar{z} &:= \tilde{z} - \tilde{G} F \tilde{z} + \tilde{G} y. \tag{21}
\end{align}
This assumes that the approximate inverse, $\tilde{G}$, is linear; that is, $\tilde{G}(y + d(\tilde{z})) = \tilde{G} y + \tilde{G}(F\tilde{z} - y) = \tilde{G} F \tilde{z}$.

2. Use the known approximate solution $\tilde{z}$ in (18) to compute $\tilde{y}$. Now change this value by the defect to obtain $\bar{y} = \tilde{y} - d(\tilde{z})$. Use the approximate inverse and solve using $\bar{y}$. That is,
\[ \bar{y} := \tilde{y} - d(\tilde{z}) = \tilde{y} - (F\tilde{z} - y) = \tilde{y} - F\tilde{G}\tilde{y} + y, \]
since $\tilde{y} = \tilde{F}\tilde{z}$, so that (cf. (19)) $\tilde{G}\tilde{y} = \tilde{G}\tilde{F}\tilde{z} = \tilde{z}$; that is, $F\tilde{z} = F\tilde{G}\tilde{y}$. Now, the new approximation to $z^*$ becomes
\[ \bar{z} = \tilde{G}\bar{y} = \tilde{G}[(\tilde{F} - F)\tilde{z} + y], \tag{22} \]
where again, we must assume that the inverse operator $\tilde{G}$ is linear.

The success of the defect correction steps (21) or (22) depends on the contractivity of the operators
\[ (I - \tilde{G}F) : D \to D \quad \text{or} \quad (I - F\tilde{G}) : \hat{D} \to \hat{D}, \]
respectively, since (21) implies
\[ \bar{z} - z^* = (I - \tilde{G}F)\tilde{z} - (I - \tilde{G}F)z^*, \]
while (22) implies
\[ \bar{y} - y^* = (I - F\tilde{G})\tilde{y} - (I - F\tilde{G})y^*. \]
The associated iterative algorithm (see [91]) is as follows.

Defect correction 1 (21):
\[ z_{k+1} = z_k - \tilde{G}F z_k + \tilde{G} y. \tag{23} \]
Defect correction 2 (22):
\[ y_{k+1} = y_k - F\tilde{G} y_k + y, \qquad z_k = \tilde{G} y_k, \tag{24} \]
\[ z_{k+1} = \tilde{G}[(\tilde{F} - F) z_k + y]. \]
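Iteration (23) can be sketched for a small linear problem. The setting is an assumption chosen for illustration: $F$ is a 2×2 matrix, and the approximate inverse $\tilde{G}$ is the inverse of its diagonal part, so the iteration coincides with the Jacobi method; the helper names are hypothetical and no interval enclosure of the result is computed.

```python
def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def defect_correction(F, G_tilde, y, z0, steps=60):
    """Iterate z_{k+1} = z_k - G~ F z_k + G~ y (defect correction 1)."""
    z = list(z0)
    G_y = matvec(G_tilde, y)
    for _ in range(steps):
        G_F_z = matvec(G_tilde, matvec(F, z))
        z = [z_i - gfz_i + gy_i for z_i, gfz_i, gy_i in zip(z, G_F_z, G_y)]
    return z


F = [[4.0, 1.0],
     [2.0, 5.0]]
G_tilde = [[0.25, 0.0],   # inverse of diag(F): a cheap approximate inverse
           [0.0, 0.2]]
y = [1.0, 2.0]

z = defect_correction(F, G_tilde, y, [0.0, 0.0])
defect = [fz_i - y_i for fz_i, y_i in zip(matvec(F, z), y)]
print(z, defect)
```

For this choice the operator $I - \tilde{G}F$ has spectral radius $\sqrt{0.1} < 1$, so the iteration is contractive and the defect $F z_k - y$ tends to zero.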
3.4 Fuzzy Set Theory
Fuzzy set and possibility theory were defined and developed by L. Zadeh beginning with [11] and subsequently [12, 13]. The idea was to mathematize and develop analytical tools to solve problems whose uncertainty is broader in scope than that addressed by probability theory. Classical mathematical sets, e.g., a set $A$, have the property that for an element $x$ either $x \in A$ or $x \notin A$, but not both. There are no other possibilities for classical sets, which are also called crisp sets. An interval is a classical set. L. Zadeh's idea was to relax this 'all or nothing' membership in a set to allow for grades of belonging to a set. When grades of belonging are used, a fuzzy set ensues.

To each fuzzy set $\tilde{A}$, L. Zadeh associated a real-valued function $\mu_{\tilde{A}}(x)$, called a membership function, defined for all $x$ in the domain of interest, the universe $\Omega$, whose range is in the interval $[0, 1]$ and which quantifies the degree to which $x$ belongs to $\tilde{A}$. For example, if $\tilde{A}$ is the fuzzy set 'middle-aged person', then a 15-year-old has a membership value of zero, while a 35-year-old might have a membership value of one and a 40-year-old might have a membership value of one-half. That is, a fuzzy set is a set for which membership in the set is defined by its membership function $\mu_{\tilde{A}}(x) : \Omega \to [0, 1]$, where a value of zero means that an element does not belong to the set $\tilde{A}$ with certainty and a value of one means that the element belongs to the set $\tilde{A}$ with certainty. Intermediate values indicate the degree to which an element belongs to the set. Using this definition, a classical (so-called crisp) set $A$ is a set whose membership function has a range that is binary; that is, $\mu_A(x) : \Omega \to \{0, 1\}$, where $\mu_A(x) = 0$ means that $x \notin A$ and $\mu_A(x) = 1$ means $x \in A$. This membership function for a crisp set $A$ is, of course, the characteristic function.
So a fuzzy set can be thought of as one that has a generalized characteristic function admitting values in $[0, 1]$, not just the two values $\{0, 1\}$, and that is uniquely defined by its membership function. A fuzzy set is a (crisp) set in $\mathbb{R}^2$. This follows from the mathematical definition of a function as a set of ordered pairs (graph):

$$\tilde{A} = \{(x, \mu_{\tilde{A}}(x))\} \subseteq (-\infty, \infty) \times [0, 1]. \tag{25}$$
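The distinction between a membership function and a characteristic function can be sketched in a few lines. The triangular shape and the breakpoints 25, 35, and 45 below are assumptions, chosen only so that the values match the 'middle-aged person' illustration; the text does not prescribe a particular shape.

```python
def triangular(a, m, b):
    """Membership function rising linearly on [a, m] and falling on [m, b]."""
    def mu(x):
        if x <= a or x >= b:
            return 0.0
        if x <= m:
            return (x - a) / (m - a)
        return (b - x) / (b - m)
    return mu


def crisp(lo, hi):
    """Characteristic function of the classical interval [lo, hi]."""
    def chi(x):
        return 1.0 if lo <= x <= hi else 0.0
    return chi


# Hypothetical breakpoints for the fuzzy set 'middle-aged person'.
middle_aged = triangular(25.0, 35.0, 45.0)
grades = [middle_aged(age) for age in (15, 35, 40)]  # [0.0, 1.0, 0.5]

# A crisp set only ever returns 0 or 1.
in_interval = [crisp(30.0, 40.0)(age) for age in (15, 35, 40)]  # [0.0, 1.0, 1.0]
```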
Some of the earliest people to recognize the relationship between interval analysis and fuzzy set theory were H. Nguyen [28], implicitly, and Dubois and Prade [14, 15] and Kaufmann and Gupta [92], explicitly. In particular, [17–19, 21] deal specifically with interval analysis and its relationship with fuzzy set theory. In [19] it is shown that set-inclusive monotonicity, as given by R.E. Moore (see [5, 59]), holds for fuzzy quantities. That is, for fuzzy sets $\tilde{A}$ and $\tilde{B}$,

$$\tilde{A} \subseteq \tilde{B} \Rightarrow f(\tilde{A}) \subseteq f(\tilde{B}).$$
This crucial result just reminds us that when the operands become more imprecise, the precision of the result cannot but diminish. Due to its close relationship to interval analysis, the calculus of fuzzy quantities is clearly pessimistic about precision, since $f(A_1, A_2)$ is the largest fuzzy set in the sense of fuzzy set inclusion, that is, $\tilde{A} \subseteq \tilde{B} \Leftrightarrow \mu_A(x) \le \mu_B(x), \ \forall x$. Much has been written about fuzzy sets that can be found in standard textbooks (see, e.g., [23]) and will not be repeated here. We present only the ideas pertinent to the interfaces between interval and fuzzy analysis that are of interest here. Given that the primary interest is in the relationships between real-valued interval and fuzzy analysis, we restrict our fuzzy sets to a real-valued universe $\Omega \subseteq \mathbb{R}$ whose membership functions are fuzzy numbers or fuzzy intervals, defined next.

Definition 8. A modal value of a membership function is a domain value at which the membership function is 1. A fuzzy set with at least one modal value is called normal. The support of a membership function is the closure of $\{x \mid \mu_{\tilde{A}}(x) > 0\}$.

Definition 9. A fuzzy interval is a fuzzy set whose domain is a subset of the reals and whose membership function is upper semicontinuous, normal, and has bounded support. A fuzzy number is a fuzzy interval with a unique modal value.

Remark 10. The α-cuts of fuzzy intervals are closed intervals of real numbers for all $\alpha \in (0, 1]$. The difference between a fuzzy number and a fuzzy interval is that the modal value for a fuzzy number is a single point, whereas for a fuzzy interval the set of modal values can be an interval of nonzero width. The fact that we have bounded intervals at each α-cut means that fuzzy arithmetic can be defined by interval arithmetic on each α-cut. In fact, when dealing with fuzzy intervals, the operations and analysis are interval operations and analyses on α-cuts.

There is a more recent development of what are called gradual numbers (see [20, 21]).
In the context of a fuzzy interval (all fuzzy numbers are fuzzy intervals) $\tilde{A}$ with membership function $\mu_A(x)$, the idea is to define a gradual number by the inverses of two functions. The first is the inverse of the membership function restricted to $(-\infty, m^-]$, i.e., the inverse of the function $\mu_A^-(x) = \mu_A(x)$, $x \in (-\infty, m^-]$, where $[m^-, m^+]$ is the set of modal values. (Since we are restricting ourselves to fuzzy intervals over the reals, this set is nonempty.) The second is the inverse of the membership function restricted to $[m^+, \infty)$, i.e., the inverse of the function $\mu_A^+(x) = \mu_A(x)$, $x \in [m^+, \infty)$. These inverses are well defined for fuzzy intervals for which $\mu_A^-(x)$ ($\mu_A^+(x)$) is continuous and strictly increasing (decreasing). These two (inverse) functions

$$(\mu_A^-)^{-1}(\alpha) : (0, 1] \to \mathbb{R} \tag{26}$$

and

$$(\mu_A^+)^{-1}(\alpha) : (0, 1] \to \mathbb{R} \tag{27}$$

define the gradual numbers in the context of real fuzzy intervals, which is our interest.

Definition 11. A gradual real number $\tilde{r}$ is defined by an assignment $A_{\tilde{r}}$ from $(0, 1]$ to $\mathbb{R}$ (see [21]).

Functions $(\mu_A^-)^{-1}(\alpha)$ (26) and $(\mu_A^+)^{-1}(\alpha)$ (27) are special cases of this definition, and the fuzzy sets that describe fuzzy intervals are the ones of interest to this chapter.
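For a concrete picture of (26) and (27), the following sketch builds the two inverse branches of a trapezoidal fuzzy interval and reads off its α-cuts from them. The trapezoidal shape, the parameter names, and the sample values are assumptions made for illustration; for this shape both inverses are available in closed form.

```python
def trapezoid_profiles(a, m_lo, m_hi, b):
    """Return the inverse branches of a trapezoidal membership function.

    The support is [a, b] and the modal values are [m_lo, m_hi]. The check
    a < m_lo <= m_hi < b makes the left branch strictly increasing and the
    right branch strictly decreasing, so both inverses are well defined.
    """
    if not (a < m_lo <= m_hi < b):
        raise ValueError("need a < m_lo <= m_hi < b")

    def left(alpha):
        # Inverse of the increasing branch: where membership first reaches alpha.
        return a + alpha * (m_lo - a)

    def right(alpha):
        # Inverse of the decreasing branch: where membership last equals alpha.
        return b - alpha * (b - m_hi)

    return left, right


def alpha_cut(left, right, alpha):
    """The alpha-cut is the closed interval bounded by the two profiles."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (left(alpha), right(alpha))


left, right = trapezoid_profiles(1.0, 2.0, 3.0, 5.0)
core = alpha_cut(left, right, 1.0)  # (2.0, 3.0), the set of modal values
half = alpha_cut(left, right, 0.5)  # (1.5, 4.0)
```

Each profile, taken on its own as a map from $(0, 1]$ to the reals, is an instance of the assignment in Definition 11.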
3.4.1 Fuzzy Extension Principle
Fuzzy extension principles show how to transform real-valued functions into functions of fuzzy sets. The meaning of arithmetic depends directly on the extension principle in force, since arithmetic operations are (continuous) functions over the reals, assuming that division by zero is not allowed, and over the extended reals [31] when division by zero is allowed. The fuzzy arithmetic coming from Zadeh's extension principle [11] and its relationship to interval analysis has an extensive development (see, e.g., [56, 92]). Moreover, there is an intimate interrelationship between the extension principle being used and the analysis that ensues. For example, in optimization, the way one extends union and intersection via t-norms and t-conorms will determine the constraint sets so that they capture the way trade-offs among decisions are made. The extension principle within the context of fuzzy set theory was first proposed, developed, and defined in [11, 13].

Definition 12. (Extension principle, L. Zadeh) Given a real-valued function $f : X \to Y$, the function over fuzzy sets $f : S(X) \to S(Y)$ is given by

$$\mu_{f(\tilde{A})}(y) = \sup\{\mu_{\tilde{A}}(x) \mid y = f(x)\}$$

for all fuzzy sets $\tilde{A} \in S(X)$, where $S(X)$ is the set of all fuzzy sets of $X$.

This definition leads to fuzzy arithmetic as we know it. Moreover, it is one of the main mechanisms requisite to perform fuzzy interval analysis. Various researchers have dealt with the issue of the extension principle and amplified its applicability. H. Nguyen [28] pointed out, in his 1978 paper, that a fuzzy set needs to be defined to be what Dubois and Prade later called a fuzzy interval in order that $[f(A, B)]_\alpha = f(A_\alpha, B_\alpha)$, where the function $f$ is assumed to be continuous. In particular, $A_\alpha$ and $B_\alpha$ need to be compact (i.e., closed and bounded intervals) for each α-cut. Thus, H. Nguyen defined a fuzzy number as one whose membership function is upper semicontinuous and whose support is compact.
In this case, the α-cuts generated are closed and bounded (compact) sets, i.e., real-valued intervals. This is a well-known result in real analysis. R. Yager [93] pointed out that by looking at functions as graphs (in the Euclidean plane), the extension principle could be extended to include all graphs, thus allowing for analysis of what he calls 'nondeterministic' mappings, i.e., graphs that are not functions. Now, 'nondeterminism' as used by Yager can be considered as point-to-set mappings. Thus, Yager implicitly restores the extension principle to a more general setting of point-to-set mappings. J. Ramik [94] points out that we can restore L. Zadeh's extension principle to its more general setting of set-to-set mappings explicitly. In fact, a fuzzy mapping is indeed a set-to-set mapping. He defines the image of a fuzzy set-to-set mapping as being the set of α's generated by the function on the α-cuts of the domain.

Lastly, T.Y. Lin's paper [95] is concerned with determining the function space in which the fuzzy set generated by the extension principle 'lives.' That is, the extension principle generates the resultant membership function in the range space. Suppose one is interested in stable controls; then one way to extend is to generate resultant (range space) membership functions that are continuous. The ($\varepsilon$/$\delta$) definition of a continuous function essentially states that small perturbations in the input, i.e., the domain, cause small perturbations in the output, i.e., the range, which is one way to view the definition of stability. T.Y. Lin points out conditions that are necessary in order that the range membership function have some desired characteristics (such as continuity or smoothness). What these extension principles express is how to define functions over fuzzy sets so that the resulting range has various properties of interest, defining what may be done in the space to which the extension sends the fuzzy set via the function, as dictated by the extension principle itself.
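Definition 12 can be tried out directly on a finite universe, where the supremum reduces to a maximum. The sketch below is an illustration under that assumption; the dictionary representation of a fuzzy set and the sample grades are hypothetical.

```python
def extend(f, fuzzy_set):
    """Zadeh's extension principle on a finite universe.

    fuzzy_set maps each point x to its membership grade. The image grade at y
    is the largest grade among all x with f(x) == y; on a finite universe this
    maximum is the supremum in the definition.
    """
    image = {}
    for x, grade in fuzzy_set.items():
        y = f(x)
        image[y] = max(image.get(y, 0.0), grade)
    return image


# Hypothetical fuzzy set 'about zero' on five integer points.
about_zero = {-2: 0.3, -1: 0.7, 0: 1.0, 1: 0.6, 2: 0.1}

# Squaring is not injective, so the preimages -1 and 1 (and -2 and 2) compete.
squared = extend(lambda x: x * x, about_zero)  # {4: 0.3, 1: 0.7, 0: 1.0}
```

The example makes the role of the supremum visible: the grade of 1 in the image is 0.7, the larger of the grades of its two preimages.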
3.4.2 Fuzzy Arithmetic
Fuzzy arithmetic was, like interval arithmetic, derived from the extension principle of Zadeh [11]. S. Nahmias [96] defined fuzzy arithmetic via a convolution:
1. Addition: $\mu_{Z=X+Y}(z) = \sup_x \min\{\mu_X(x), \mu_Y(z - x)\}$, where $z = x + y$.
2. Subtraction: $\mu_{Z=X-Y}(z) = \sup_x \min\{\mu_X(x), \mu_Y(x - z)\}$, where $z = x - y$.
3. Multiplication: $\mu_{Z=X\times Y}(z) = \sup_x \min\{\mu_X(x), \mu_Y(z/x)\}$, where $z = x \times y$.
4. Division: $\mu_{Z=X\div Y}(z) = \sup_x \min\{\mu_X(x), \mu_Y(x/z)\}$, where $z = x \div y$.
The above definition was how arithmetic of fuzzy entities was originally conceived. When the extension principle of Zadeh [11] was applied to 1–4 above, assuming the fuzzy entities involved were noninteractive (independent), what has come to be known as fuzzy arithmetic ensued. That is, what happened to interval arithmetic also happened to fuzzy arithmetic: given noninteraction (independence), its roots in the extension principle were set aside, and the axioms for arithmetic (using [28] and requiring membership functions to be upper/lower semicontinuous) ensued. Thus, fuzzy arithmetic became interval arithmetic on α-cuts (see [14, 15]).
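The sup-min convolutions 1–4 can be sketched on finite universes, where looping over all pairs $(x, y)$ and grouping by the result $z$ is the same as taking the supremum over $x$ with $y$ determined by $z$. The dictionary representation and the two sample fuzzy sets are assumptions for illustration, and the operands are treated as noninteractive.

```python
def sup_min(X, Y, op):
    """Sup-min convolution of two finite fuzzy sets under a binary operation.

    X and Y map points to membership grades. Every pair (x, y) contributes
    min(grade of x, grade of y) to z = op(x, y); the largest contribution wins.
    """
    Z = {}
    for x, grade_x in X.items():
        for y, grade_y in Y.items():
            z = op(x, y)
            Z[z] = max(Z.get(z, 0.0), min(grade_x, grade_y))
    return Z


# Hypothetical fuzzy sets 'about 2' and 'about 3' on integer points.
about_2 = {1: 0.5, 2: 1.0, 3: 0.5}
about_3 = {2: 0.5, 3: 1.0, 4: 0.5}

total = sup_min(about_2, about_3, lambda x, y: x + y)
# {3: 0.5, 4: 0.5, 5: 1.0, 6: 0.5, 7: 0.5}
```

Passing subtraction, multiplication, or division (with zero excluded from the divisor) as `op` gives the other three operations in the same way.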
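3.4.2.1 Traditional Fuzzy Arithmetic
The fuzzy arithmetic that has been developed by [92] is taken as the standard approach, while a more recent approach is found in [56]. What is needed is the fact that a fuzzy interval is uniquely determined by its α-cuts,

$$\tilde{A} = \bigcup_{\alpha \in (0,1]} [\mu_{\tilde{A}}^-(\alpha), \mu_{\tilde{A}}^+(\alpha)],$$

where $\mu_{\tilde{A}}^-(\alpha)$ and $\mu_{\tilde{A}}^+(\alpha)$ are the left and right endpoints of the α-cuts of the fuzzy set $\tilde{A}$. In particular, for fuzzy intervals we have

$$\tilde{A} + \tilde{B} = \bigcup_{\alpha \in (0,1]} \{[\mu_{\tilde{A}}^-(\alpha), \mu_{\tilde{A}}^+(\alpha)] + [\mu_{\tilde{B}}^-(\alpha), \mu_{\tilde{B}}^+(\alpha)]\}, \tag{28}$$

$$\tilde{A} - \tilde{B} = \bigcup_{\alpha \in (0,1]} \{[\mu_{\tilde{A}}^-(\alpha), \mu_{\tilde{A}}^+(\alpha)] - [\mu_{\tilde{B}}^-(\alpha), \mu_{\tilde{B}}^+(\alpha)]\}, \tag{29}$$

$$\tilde{A} \times \tilde{B} = \bigcup_{\alpha \in (0,1]} \{[\mu_{\tilde{A}}^-(\alpha), \mu_{\tilde{A}}^+(\alpha)] \times [\mu_{\tilde{B}}^-(\alpha), \mu_{\tilde{B}}^+(\alpha)]\}, \tag{30}$$

$$\tilde{A} \div \tilde{B} = \bigcup_{\alpha \in (0,1]} \{[\mu_{\tilde{A}}^-(\alpha), \mu_{\tilde{A}}^+(\alpha)] \div [\mu_{\tilde{B}}^-(\alpha), \mu_{\tilde{B}}^+(\alpha)]\}. \tag{31}$$

For fuzzy sets whose membership functions are semicontinuous,

$$(\tilde{A} * \tilde{B})_\alpha = (\tilde{A})_\alpha * (\tilde{B})_\alpha, \quad * \in \{+, -, \times, \div\}.$$

Computer implementation of (28)–(31) can be found in [97]. This program uses INTLAB (another downloadable system that has interval data types and runs in conjunction with MATLAB) to handle the fuzzy arithmetic on α-cuts.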
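Equations (28)–(31) reduce fuzzy arithmetic to interval arithmetic performed level by level. The following is a minimal sketch in plain Python rather than INTLAB, and it is not the program of [97]. It assumes triangular fuzzy numbers, samples only finitely many α levels, and uses ordinary floating-point arithmetic without the outward rounding that a validated implementation would require; all names are hypothetical.

```python
def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])


def isub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def imul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def idiv(a, b):
    if b[0] <= 0.0 <= b[1]:
        raise ZeroDivisionError("divisor interval contains zero")
    return imul(a, (1.0 / b[1], 1.0 / b[0]))


def triangular_cut(a, m, b):
    """Return a function giving the alpha-cut of the triangular number (a, m, b)."""
    return lambda alpha: (a + alpha * (m - a), b - alpha * (b - m))


def fuzzy_apply(op, cut_a, cut_b, levels):
    """Apply an interval operation on each sampled alpha-cut."""
    return {alpha: op(cut_a(alpha), cut_b(alpha)) for alpha in levels}


A = triangular_cut(1.0, 2.0, 3.0)
B = triangular_cut(2.0, 3.0, 4.0)
levels = [0.5, 1.0]

sums = fuzzy_apply(iadd, A, B, levels)      # {0.5: (4.0, 6.0), 1.0: (5.0, 5.0)}
products = fuzzy_apply(imul, A, B, levels)  # {0.5: (3.75, 8.75), 1.0: (6.0, 6.0)}
```

A finer grid of levels gives a closer picture of the resulting membership function; the exact result is the union over all $\alpha \in (0, 1]$.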
3.4.2.2 Case-Based Fuzzy Arithmetic
Klir [65] notices, as Moore did before him, that if (28)–(31) are used, overestimations will occur. Moreover, when this approach is used, $\tilde{A} - \tilde{A} \neq 0$ and $\tilde{A} \div \tilde{A} \neq 1$. Klir's idea for fuzzy arithmetic with requisite constraints is to do fuzzy arithmetic with constraints dictated by the context of the problem. That is, Klir defines exceptions to obtain $\tilde{A} - \tilde{A} = 0$ and $\tilde{A} \div \tilde{A} = 1$.
3.4.2.3 Constraint Fuzzy Arithmetic
Klir's approach to fuzzy arithmetic [65] requires a priori knowledge (via cases) of which variables are identically the same. Constraint fuzzy arithmetic [64] carries this information in the parameters; that is, it performs (28), (29), (30), and (31) using a parameter $\lambda_x$ that identifies the variable. The resulting fuzzy arithmetic derived from constraint interval arithmetic on α-cuts is essentially the fuzzy arithmetic with requisite constraints of Klir, without cases.
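The effect of tying a parameter to each variable can be sketched on a single α-cut. The code below assumes the usual parametrization of an interval $[\underline{x}, \overline{x}]$ as $\underline{x} + \lambda_x(\overline{x} - \underline{x})$ with $\lambda_x \in [0, 1]$, and it approximates the minimum and maximum by sampling a grid of parameter values. The grid search and all names are illustrative assumptions, not the algorithm of [64], and the sampled bounds are not rigorous enclosures.

```python
from itertools import product


def constrained_range(expr, intervals, grid=11):
    """Approximate the range of expr when each named variable has ONE parameter.

    intervals maps a variable name to (lo, hi). Each variable x is written as
    lo + lam * (hi - lo) with a single lam in [0, 1], so repeated occurrences
    of the same variable always take the same value.
    """
    names = list(intervals)
    lambdas = [i / (grid - 1) for i in range(grid)]
    values = []
    for combo in product(lambdas, repeat=len(names)):
        point = {
            name: intervals[name][0] + lam * (intervals[name][1] - intervals[name][0])
            for name, lam in zip(names, combo)
        }
        values.append(expr(point))
    return (min(values), max(values))


cut = (1.5, 2.5)  # hypothetical alpha-cut of a fuzzy interval A

# Same variable on both sides: the dependency is respected.
a_minus_a = constrained_range(lambda v: v["a"] - v["a"], {"a": cut})  # (0.0, 0.0)
a_over_a = constrained_range(lambda v: v["a"] / v["a"], {"a": cut})   # (1.0, 1.0)

# Two independent variables with equal bounds: the usual interval result.
a_minus_b = constrained_range(lambda v: v["a"] - v["b"], {"a": cut, "b": cut})
# (-1.0, 1.0)
```

No case analysis is needed: the distinction between $\tilde{A} - \tilde{A}$ and $\tilde{A} - \tilde{B}$ is carried entirely by which parameters appear in the expression.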
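3.4.2.4 Fuzzy Arithmetic Using Gradual Numbers
The implementation of [21] as a way to perform fuzzy arithmetic uses (26) and (27) in the following way:

$$\tilde{A} * \tilde{B} = \begin{cases} (\mu_{\tilde{A}*\tilde{B}}^-)^{-1}(\alpha) = \min\{(\mu_{\tilde{A}}^-)^{-1}(\alpha) * (\mu_{\tilde{B}}^-)^{-1}(\alpha),\ (\mu_{\tilde{A}}^-)^{-1}(\alpha) * (\mu_{\tilde{B}}^+)^{-1}(\alpha), \\ \qquad\qquad\qquad\qquad\quad (\mu_{\tilde{A}}^+)^{-1}(\alpha) * (\mu_{\tilde{B}}^-)^{-1}(\alpha),\ (\mu_{\tilde{A}}^+)^{-1}(\alpha) * (\mu_{\tilde{B}}^+)^{-1}(\alpha)\}, \\ (\mu_{\tilde{A}*\tilde{B}}^+)^{-1}(\alpha) = \max\{(\mu_{\tilde{A}}^-)^{-1}(\alpha) * (\mu_{\tilde{B}}^-)^{-1}(\alpha),\ (\mu_{\tilde{A}}^-)^{-1}(\alpha) * (\mu_{\tilde{B}}^+)^{-1}(\alpha), \\ \qquad\qquad\qquad\qquad\quad (\mu_{\tilde{A}}^+)^{-1}(\alpha) * (\mu_{\tilde{B}}^-)^{-1}(\alpha),\ (\mu_{\tilde{A}}^+)^{-1}(\alpha) * (\mu_{\tilde{B}}^+)^{-1}(\alpha)\} \end{cases}$$

for $* \in \{+, -, \times, \div\}$.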
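Read as a recipe, the formula above takes the two profiles of each operand and returns the two profiles of the result as functions of α. A minimal sketch follows, assuming each fuzzy interval is given as a pair of Python callables (left profile, right profile) and that zero is excluded from the divisor; this representation is an assumption for illustration and is not the data structure of [21].

```python
import operator


def gradual_op(op, A, B):
    """Combine two fuzzy intervals given as (left_profile, right_profile) pairs.

    Returns the left and right profiles of the result: the pointwise minimum
    and maximum of the four endpoint combinations at each alpha.
    """
    a_left, a_right = A
    b_left, b_right = B

    def candidates(alpha):
        return [
            op(a_left(alpha), b_left(alpha)),
            op(a_left(alpha), b_right(alpha)),
            op(a_right(alpha), b_left(alpha)),
            op(a_right(alpha), b_right(alpha)),
        ]

    return (lambda alpha: min(candidates(alpha)),
            lambda alpha: max(candidates(alpha)))


# Hypothetical triangular fuzzy numbers 'about 2' and 'about 3'.
A = (lambda alpha: 1.0 + alpha, lambda alpha: 3.0 - alpha)
B = (lambda alpha: 2.0 + alpha, lambda alpha: 4.0 - alpha)

prod_left, prod_right = gradual_op(operator.mul, A, B)
diff_left, diff_right = gradual_op(operator.sub, A, B)

product_at_half = (prod_left(0.5), prod_right(0.5))     # (3.75, 8.75)
difference_at_half = (diff_left(0.5), diff_right(0.5))  # (-2.0, 0.0)
```

Because the result is again a pair of profiles, operations can be chained without fixing a set of α levels in advance.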
3.5 Historical Context of Interval Analysis
The context in which fuzzy set theory arose is quite well known, whereas for interval analysis this is perhaps not as clear, since its precursors go back at least to Archimedes and, more recently, Burkill [98] in 1924. L. Zadeh is recognized as the primary impetus in the creation and development of fuzzy set theory. R.E. Moore, T. Sunaga, and M. Warmus have played a similar role in interval analysis. While there are five known direct and clear precursors to Moore's version of interval arithmetic and interval analysis beginning in 1924 (see [29, 30, 53, 54, 98]), Moore worked out rounded computer arithmetic and fully developed the mathematical analysis of intervals, called interval analysis.

Interval analysis has an early history in Archimedes' computation of the circumference of a circle [40]. However, as developed by R.E. Moore (see [7–9]), interval analysis arose from the attempt to compute error bounds of numerical solutions on a finite-state machine that accounted for all numerical and truncation error, including round-off error, automatically (by the computer itself). This leads in a natural way to the investigation of computations with intervals as the entity, or data type, that enables automatic error analysis. R.E. Moore and his colleagues are responsible for developing the early theory, extensions, vision, and wide applications of interval analysis, and the actual implementation of these ideas on computers. The major contributions that Moore made include at least the following:
1. He recognized how to use intervals in computational mathematics, now called numerical analysis.
2. He extended the arithmetic of intervals and implemented it on computers.
3. His work was influential in creating IEEE standards for accessing a computer's rounding processes, which is a necessary step in obtaining computer-generated validated computations (see [39]).
4. He developed the analysis associated with intervals where, as will be seen, functions of intervals, called the united extension, play a key role. Citing but one major achievement in this area, he showed that Taylor series methods for solving differential equations are not only more tractable but more accurate (see [59]).
5. He was the first to recognize the usefulness of interval analysis in computer verification methods, especially for solutions to nonlinear equations using the interval Newton method, in which the method includes verification of existence and uniqueness of solution(s).
3.6 Conclusion
This chapter presented the main themes pertinent to mathematical analysis associated with granules that are intervals and fuzzy interval sets. The intimate relationship between interval analysis and fuzzy interval analysis was shown. The central role that the extension principle plays in both the arithmetic and the resulting analysis was discussed. Lastly, the role of interval analysis in the areas of enclosure and verification was highlighted. The reader who is interested in the role of enclosure and verification methods with respect to fuzzy sets, possibility theory, and probability theory is directed to [4, 42, 99], where, among other things, enclosure and verification methods applied to risk analysis as well as optimization under uncertainty are developed.
References
[1] R.E. Moore and W.A. Lodwick. Interval analysis and fuzzy set theory. Fuzzy Sets Syst. 135(1) (2003) 5–9.
[2] W.A. Lodwick and K.D. Jamison (eds). Special issue on the interfaces between fuzzy set theory and interval analysis. Fuzzy Sets Syst. 135(1) (April 2003).
[3] W.A. Lodwick (ed.). Special issue on linkages between interval analysis and fuzzy set theory. Reliab. Comput. 9(2) (April 2003).
[4] W.A. Lodwick. Interval and fuzzy analysis: A unified approach. Adv. Imag. Electron. Phys. 148 (2007) 75–192.
[5] R.E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, NJ, 1966.
[6] J.D. Pryce and G.F. Corliss. Interval arithmetic with containment sets. Computing 78(3) (2006) 25–276.
[7] R.E. Moore. Automatic Error Analysis in Digital Computation. Technical Report LMSD-48421. Lockheed Missile and Space Division, Sunnyvale, CA, 1959. See http://interval.louisiana.edu/Moores early papers/bibliography.html.
[8] R.E. Moore and C.T. Yang. Interval Analysis I. Technical Report LMSD-285875. Lockheed Missiles and Space Division, Sunnyvale, CA, 1959.
[9] R.E. Moore, W. Strother, and C.T. Yang. Interval Integrals. Technical Report LMSD-703073. Lockheed Missiles and Space Division, Sunnyvale, CA, 1960.
[10] R.E. Moore. Interval Arithmetic and Automatic Error Analysis in Digital Computing. Ph.D. Thesis. Stanford University, Stanford, CA. Published as Applied Mathematics and Statistics Laboratories Technical Report No. 25, November 15, 1962. See http://interval.louisiana.edu/Moores early papers/bibliography.html.
[11] L.A. Zadeh. Fuzzy sets. Inf. Control 8 (1965) 338–353.
[12] L.A. Zadeh. Probability measures of fuzzy events. J. Math. Anal. Appl. 23 (1968) 421–427.
[13] L.A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning. Inf. Sci. Part I 8 (1975) 199–249; Part II 8 (1975) 301–357; Part III 9 (1975) 43–80.
[14] D. Dubois and H. Prade. Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York, 1980.
[15] D. Dubois and H. Prade. Additions of interactive fuzzy numbers. IEEE Trans. Autom. Control 26(4) (1981) 926–936.
[16] O. Aberth. Precise Numerical Analysis. William C. Brown, Dubuque, IA, 1988.
[17] D. Dubois and H. Prade. Evidence theory and interval analysis. In: Second IFSA Congress, Tokyo, July 20–25, 1987, pp. 502–505.
[18] D. Dubois and H. Prade. Random sets and fuzzy interval analysis. Fuzzy Sets Syst. 42 (1991) 87–101.
[19] D. Dubois, E. Kerre, R. Mesiar, and H. Prade. Chapter 10: Fuzzy interval analysis. In: D. Dubois and H. Prade (eds), Fundamentals of Fuzzy Sets. Kluwer Academic Press, Dordrecht, 2000.
[20] D. Dubois and H. Prade. Fuzzy elements in a fuzzy set. In: Proceedings of the 10th International Fuzzy System Association (IFSA) Congress, Beijing, 2005, pp. 55–60.
[21] J. Fortin, D. Dubois, and H. Fargier. Gradual numbers and their application to fuzzy interval analysis. IEEE Trans. Fuzzy Syst. (2008, in press).
[22] D. Dubois and H. Prade. Possibility Theory: An Approach to Computerized Processing of Uncertainty. Plenum Press, New York, 1988.
[23] G.J. Klir and B. Yuan. Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River, NJ, 1995.
[24] W. Strother. Continuity for Multi-Valued Functions and Some Applications to Topology. Ph.D. Thesis. Tulane University, New Orleans, LA, 1952.
[25] W. Strother. Fixed points, fixed sets, and m-retracts. Duke Math. J. 22(4) (1955) 551–556.
[26] J.P. Aubin and H. Frankowska. Set-Valued Analysis. Birkhäuser, Boston, 1990.
[27] W. Strother. Continuous multi-valued functions. Boletim da Sociedade de Matematica de São Paulo 10 (1958) 87–120.
[28] H.T. Nguyen. A note on the extension principle for fuzzy sets. J. Math. Anal. Appl. 64 (1978) 369–380.
[29] M. Warmus. Calculus of approximations. Bull. Acad. Pol. Sci. Cl. III (4) (1956) 253–259. See http://www.cs.utep.edu/intervalcomp/early.html.
[30] T. Sunaga. Theory of an interval algebra and its application to numerical analysis. RAAG Mem. 2 (1958) 547–564. See http://www.cs.utep.edu/intervalcomp/early.html.
[31] E.R. Hansen. A generalized interval arithmetic. In: K. Nickel (ed.), Interval Mathematics, Lecture Notes in Computer Science 29. Springer-Verlag, New York, 1975, pp. 7–18.
[32] W.M. Kahan. A More Complete Interval Arithmetic. Lecture Notes for a Summer Course at University of Michigan, Ann Arbor, MI, 1968.
[33] M.A.H. Dempster. An application of quantile arithmetic to the distribution problem in stochastic linear programming. Bull. Inst. Math. Appl. 10 (1974) 186–194.
[34] A. Neumaier. The wrapping effect, ellipsoid arithmetic, stability and confidence regions. Comput. Suppl. 9 (1993) 175–190.
[35] K. Nickel. Triplex-Algol and its applications. In: E.R. Hansen (ed.), Topics in Interval Analysis. Oxford Press, New York, 1969, pp. 10–24.
[36] J. Stolfi, M.V.A. Andrade, J.L.D. Comba, and R. Van Iwaarden. Affine arithmetic: a correlation-sensitive variant of interval arithmetic. See http://www.dcc.unicamp.br/~stolfi/EXPORT/projects/affinearith, accessed January 17, 2008.
[37] U. Kulisch. An axiomatic approach to rounded computations. Numer. Math. 18 (1971) 1–17.
[38] U. Kulisch and W.L. Miranker. Computer Arithmetic in Theory and Practice. Academic Press, New York, 1981.
[39] U. Kulisch and W.L. Miranker. The arithmetic of the digital computer: a new approach. SIAM Rev. 28(1) (1986) 1–40.
[40] Archimedes of Siracusa. On the measurement of the circle. In: T.L. Heath (ed.), The Works of Archimedes. Cambridge University Press, Cambridge, 1897; Dover edition, 1953, pp. 91–98.
[41] G.M. Phillips. Archimedes the numerical analyst. Am. Math. Mon. (1981) 165–169.
[42] W.A. Lodwick and K.D. Jamison. Interval-valued probability in the analysis of problems that contain a mixture of fuzzy, possibilistic and interval uncertainty. In: K. Demirli and A. Akgunduz (eds), 2006 Conference of the North American Fuzzy Information Processing Society, June 3–6, 2006, Montréal, Canada, paper 327137.
[43] A. Neumaier. Interval Methods for Systems of Equations. Cambridge Press, Cambridge, 1990.
[44] W. Tucker. A rigorous ODE solver and Smale's 14th problem. Found. Comput. Math. 2 (2002) 53–117.
[45] S. Smale. Mathematical problems for the next century. Math. Intell. 20(2) (1998) 7–15.
[46] B. Davies. Whither mathematics? Not. AMS 52(11) (December 2005) 1350–1356.
[47] T.C. Hales. Cannonballs and honeycombs. Not. Am. Math. Soc. 47 (2000) 440–449.
[48] F. Bornemann, D. Laurie, S. Wagon, and J. Waldvogel. The SIAM 100-Digit Challenge: A Study in High-Accuracy Numerical Computing. SIAM, Philadelphia, 2004.
[49] D. Daney, Y. Papegay, and A. Neumaier. Interval methods for certification of the kinematic calibration of parallel robots. In: IEEE International Conference on Robotics and Automation, New Orleans, LA, April 2004, pp. 191–198.
[50] L. Jaulin. Path planning using intervals and graphs. Reliab. Comput. 7(1) (2001) 1–15.
[51] L. Jaulin, M. Kieffer, O. Didrit, and E. Walter. Applied Interval Analysis. Springer, New York, 2001.
[52] G.F. Corliss. Tutorial on validated scientific computing using interval analysis. In: PARA'04 Workshop on State-of-the-Art Computing. Technical University of Denmark, Denmark, June 20–23, 2004. See http://www.eng.mu.edu/corlissg/PARA04/READ ME.html.
[53] R.C. Young. The algebra of many-valued quantities. Math. Ann. Band 104 (1931) 260–290.
[54] P.S. Dwyer. Linear Computations. Wiley, New York, 1951.
[55] E. Gardeñes, H. Mielgo, and A. Trepat. Modal intervals: reasons and ground semantics. In: K. Nickel (ed.), Interval Mathematics 1985: Proceedings of the International Symposium, Freiburg i. Br., Federal Republic of Germany, September 23–26, 1985, pp. 27–35.
[56] M. Hanss. Applied Fuzzy Arithmetic. Springer-Verlag, Berlin, 2005.
[57] J. Bondia, A. Sala, and M. Sáinz. Modal fuzzy quantities and applications to control. In: K. Demirli and A. Akgunduz (eds), 2006 Conference of the North American Fuzzy Information Processing Society, June 3–6, 2006, Montréal, Canada, paper 327134.
[58] M.A. Sáinz. Modal intervals. Reliab. Comput. 7(2) (2001) 77–111.
[59] R.E. Moore. Methods and Applications of Interval Analysis. SIAM, Philadelphia, 1979.
[60] E.R. Hansen. Global Optimization Using Interval Arithmetic. Marcel Dekker, New York, 1992.
[61] R.B. Kearfott. Rigorous Global Search: Continuous Problems. Kluwer Academic Publishers, Boston, 1996.
[62] A. Neumaier. Complete search in continuous global optimization and constraint satisfaction. In: A. Iserles (ed.), Acta Numerica 2004. Cambridge University Press, Cambridge, 2004, pp. 271–369.
[63] H. Ratschek and J. Rokne. New Computer Methods for Global Optimization. Horwood, Chichester, England, 1988.
[64] W.A. Lodwick. Constrained Interval Arithmetic. CCM Report 138, February 1999.
[65] G.J. Klir. Fuzzy arithmetic with requisite constraints. Fuzzy Sets Syst. 91(2) (1997) 165–175.
[66] P. Kornerup and D.W. Matula. Finite precision rational arithmetic: an arithmetic unit. IEEE Trans. Comput. 32(4) (1983) 378–388.
[67] E.R. Hansen. Interval forms of Newton's method. Computing 20 (1978) 153–163.
[68] G.W. Walster. The extended real interval system (personal copy from the author, 1998).
[69] H.J. Ortolf. Eine Verallgemeinerung der Intervallarithmetik. Gesellschaft für Mathematik und Datenverarbeitung, Bonn, Germany, 1969, Vol. 11, pp. 1–71.
[70] E. Kaucher. Über metrische und algebraische Eigenschaften einiger beim numerischen Rechnen auftretender Räume. Ph.D. Thesis. University of Karlsruhe, Karlsruhe, Germany, 1973.
[71] E. Kaucher. Interval analysis in the extended space IR. Comput. Suppl. 2 (1980) 33–49.
[72] M. Warmus. Approximations and inequalities in the calculus of approximations. Classification of approximate numbers. Bull. Acad. Pol. Sci. Cl. IX (4) (1961) 241–245. See http://www.cs.utep.edu/intervalcomp/early.html.
[73] G. Alefeld and J. Herzberger. Introduction to Interval Computations. Academic Press, New York, 1983.
[74] E.D. Popova. http://www.math.bas.bg/~epopova/directed.html, accessed January 17, 2008.
[75] D. Klaua. Partielle Mengen und Zahlen. Mtber. Dt. Akad. Wiss. 11 (1969) 585–599.
[76] K. Jahn. The importance of 3-valued notions for interval mathematics. In: K.E. Nickel (ed.), Interval Mathematics 1980. Academic Press, New York, 1980, pp. 75–98.
[77] M.A.H. Dempster. Distributions in interval and linear programming. In: E.R. Hansen (ed.), Topics in Interval Analysis. Oxford Press, New York, 1969, pp. 107–127.
[78] K.G. Guderley and C.L. Keller. A basic theorem in the computation of ellipsoidal error bounds. Numer. Math. 19(3) (1972) 218–229.
[79] W.M. Kahan. Circumscribing an ellipsoid about the intersection of two ellipsoids. Can. Math. Bull. 11(3) (1968) 437–441.
[80] R.E. Moore. Computing to arbitrary accuracy. In: C. Brezinski and U. Kulisch (eds), Computational and Applied Mathematics I: Algorithms and Theory. North-Holland, Amsterdam, 1992, pp. 327–336.
[81] J.S. Ely. The VPI software package for variable precision interval arithmetic. Interval Comput. 2(2) (1993) 135–153.
[82] M.J. Schulte and E.E. Swartzlander, Jr. A family of variable-precision interval processors. IEEE Trans. Comput. 49(5) (2000) 387–397.
[83] N. Revol and F. Rouillier. Motivations for an arbitrary precision interval arithmetic and the MPFI library. Reliab. Comput. 11(4) (2005) 275–290.
[84] J.S. Ely and G.R. Baker. High-precision calculations of vortex sheet motion. J. Comput. Phys. 111 (1993) 275–281.
[85] G. Alefeld. Enclosure methods. In: C. Ullrich (ed.), Computer Arithmetic and Self-Validating Numerical Methods. Academic Press, Boston, 1990, pp. 55–72.
[86] E. Kaucher and S.M. Rump. E-methods for fixed point equations f(x) = x. Computing 28(1) (1982) 31–42.
[87] K. Böhmer, P. Hemker, and H.J. Stetter. The defect correction approach. Comput. Suppl. 5 (1984) 1–32.
[88] R.B. Kearfott. Interval computations: introduction, uses, and resources. Euromath. Bull. 2(1) (1996) 95–112.
[89] G. Mayer. Success in epsilon-inflation. In: G. Alefeld and B. Lang (eds), Scientific Computing and Validated Numerics: Proceedings of the International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics SCAN-95, Wuppertal, Germany, September 26–29, 1995. Akademie Verlag, Berlin, May 1996, pp. 98–104.
[90] G. Mayer. Epsilon-inflation in verification algorithms. J. Comput. Appl. Math. 60 (1995) 147–169.
[91] H.J. Stetter. The defect correction principle and discretization methods. Numer. Math. 29 (1978) 425–443.
[92] A. Kaufmann and M.M. Gupta. Introduction to Fuzzy Arithmetic: Theory and Applications. Van Nostrand Reinhold, New York, 1985.
[93] R.R. Yager. A characterization of the extension principle. Fuzzy Sets Syst. 18 (1986) 205–217.
[94] J. Ramik. Extension principle in fuzzy optimization. Fuzzy Sets Syst. 19 (1986) 29–35.
[95] T.Y. Lin. A function theoretical view of fuzzy sets: new extension principle. In: D. Filev and H. Ying (eds), Proceedings of the 2005 North American Fuzzy Information Processing Society Annual Conference: Computing for Real World Applications, Ann Arbor, MI, 2005.
[96] S. Nahmias. Fuzzy variable. Fuzzy Sets Syst. 1 (1978) 97–110.
[97] M. Anile, S. Deodato, and G. Privitera. Implementing fuzzy arithmetic. Fuzzy Sets Syst. 72(2) (1995) 239–250.
[98] J.C. Burkill. Functions of intervals. Proc. Lond. Math. Soc. 22 (1924) 375–446.
[99] K.D. Jamison and W.A. Lodwick. Interval-valued probability in the analysis of problems containing a mixture of fuzzy, possibilistic, probabilistic and interval uncertainty. Fuzzy Sets Syst. 2008, in press.
[100] J.J. Buckley. A generalized extension principle. Fuzzy Sets Syst. 33 (1989) 241–242.
[101] A. Deif. Sensitivity Analysis in Linear Systems. Springer-Verlag, New York, 1986.
[102] D. Dubois, S. Moral, and H. Prade. Semantics for possibility theory based on likelihoods. J. Math. Anal. Appl. 205 (1997) 359–380.
[103] D. Dubois and H. Prade. Le Flou, Mécédonka? Technical Report, C.E.R.T.-D.E.R.A., Toulouse, France, Avril 1977.
[104] D. Dubois and H. Prade. Fuzzy Algebra, Analysis, Logics. Technical Report No. TR-EE 78-13, Purdue University, March 1978.
[105] D. Dubois and H. Prade. Operations on fuzzy numbers. Int. J. Syst. Sci. 9(6) (1978) 613–626.
[106] D. Dubois and H. Prade. Fuzzy real algebra: some results. Fuzzy Sets Syst. 2(4) (1979) 327–348.
[107] D. Dubois and H. Prade. Fuzzy Numbers: An Overview. Technical Report No. 219. L.S.I., University of Paul Sabatier, Toulouse, France. Also in: J.C. Bezdek (ed.), Chapter 1 of Analysis of Fuzzy Information, Volume 1, Mathematics and Logic. CRC Press, Boca Raton, FL, 1987.
[108] D. Dubois and H. Prade. Special issue on fuzzy numbers. Fuzzy Sets Syst. 24(3) (December 1987).
[109] D. Dubois and H. Prade (eds). Fundamentals of Fuzzy Sets. Kluwer Academic Press, Dordrecht, 2000.
[110] R. Fullér and T. Keresztfalvi. On generalization of Nguyen's theorem. Fuzzy Sets Syst. 41 (1990) 371–374.
[111] E.R. Hansen (ed.). Topics in Interval Analysis. Oxford Press, New York, 1969.
[112] E.R. Hansen. Publications Related to Early Interval Work of R.E. Moore, August 13, 2001. See http://interval.louisiana.edu/Moores early papers/bibliography.html.
[113] T. Hickey, Q. Ju, and M.H. van Emden. Interval arithmetic: from principles to implementation. J. ACM 48(5) (2001) 1038–1068.
[114] S. Kaplan. On the method of discrete probability distributions in risk and reliability calculations: applications to seismic risk assessment. J. Risk 1(3) (1981) 189–196.
[115] G.J. Klir. Chapter 1: The role of constrained fuzzy arithmetic in engineering. In: B. Ayyub and M.M. Gupta (eds), Uncertainty Analysis in Engineering and Sciences: Fuzzy Logic, Statistics, and Neural Network Approach. Kluwer Academic Publishers, Dordrecht, 1998, pp. 1–19.
[116] W.A. Lodwick. Constraint propagation, relational arithmetic in AI systems and mathematical programs. Ann. Oper. Res. 21 (1989) 143–148.
[117] W.A. Lodwick. Analysis of structure in fuzzy linear programs. Fuzzy Sets Syst. 38 (1990) 15–26.
[118] M. Mizumoto and K. Tanaka. The four operations of arithmetic on fuzzy numbers. Syst. Comput. Control 7(5) (1976) 73–81.
[119] R.E. Moore. The automatic analysis and control of error in digital computing based on the use of interval numbers. In: L.B. Rall (ed.), Error in Digital Computation. John Wiley and Sons, New York, 1965, vol. I, Chapter 2, pp. 61–130.
[120] R.E. Moore. The dawning. Reliab. Comput. 5 (1999) 423–424.
[121] B. Russell. Vagueness. Aust. J. Phil. 1 (1924) 84–92.
[122] J.R. Shewchuk. Delaunay refinement algorithms for triangular mesh generation. Comput. Geom. Theory Appl. 22(1–3) (2002) 21–74.
[123] J.A. Tupper. Graphing Equations with Generalized Interval Arithmetic. Ph.D. Thesis. University of Toronto, Ontario, 1996.
[124] R.R. Yager. A procedure for ordering fuzzy subsets of the unit interval. Inf. Sci. 24 (1981) 143–161.
4 Interval Methods for Non-Linear Equation Solving Applications
Courtney Ryan Gwaltney, Youdong Lin, Luke David Simoni, and Mark Allen Stadtherr
4.1 Overview

A problem encountered frequently in virtually any field of science, engineering, or applied mathematics is the solution of systems of nonlinear algebraic equations. There are many applications in which such systems may have multiple solutions, a single solution, or no solution, with the number of solutions often unknown a priori. Can all solutions be found? If there are no solutions, can this be verified? These are questions that are difficult or impossible to answer using conventional local methods for equation solving. However, methods based on interval analysis are available that can answer these questions, and do so with mathematical and computational rigor. Such methods are based on the processing of granules in the form of intervals and can thus be regarded as one facet of granular computing [1].

The remainder of this chapter is organized as follows: in the next section, a brief summary of interval arithmetic is provided, and some of the key concepts used in interval methods for equation solving are reviewed. In subsequent sections, we focus on specific application areas, namely the modeling of phase equilibrium (Section 4.3), transition-state analysis (Section 4.4), and ecological modeling (Section 4.5).
4.2 Background

4.2.1 Interval Arithmetic

Interval arithmetic in its modern form was introduced by Moore [2] and is based on arithmetic conducted on closed sets of real numbers. A real interval $X$ is defined as the set of real numbers between (and including) given upper and lower bounds. That is, $X = [\underline{X}, \overline{X}] = \{x \in \mathbb{R} \mid \underline{X} \le x \le \overline{X}\}$. Here an underline is used to indicate the lower bound of an interval, while an overline is used to indicate the upper bound. Unless indicated otherwise, uppercase quantities are intervals and lowercase quantities or uppercase quantities with an underline or overline are real numbers. An interval vector $X = (X_1, X_2, \ldots, X_n)^T$ has $n$ interval components and can be interpreted geometrically as an n-dimensional rectangular convex polytope or 'box.' Similarly, an $n \times m$ interval matrix $A$ has interval elements $A_{ij}$, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.

Handbook of Granular Computing. Edited by Witold Pedrycz, Andrzej Skowron and Vladik Kreinovich. © 2008 John Wiley & Sons, Ltd
Interval arithmetic is an extension of real arithmetic. For a real arithmetic operation $\mathrm{op} \in \{+, -, \times, \div\}$, the corresponding interval operation on intervals $X$ and $Y$ is defined by

$$X \;\mathrm{op}\; Y = \{x \;\mathrm{op}\; y \mid x \in X,\ y \in Y\}. \tag{1}$$
That is, the result of an interval arithmetic operation on $X$ and $Y$ is an interval enclosing the range of results obtainable by performing the operation with any number in $X$ and any number in $Y$. Interval extensions of the elementary functions (sin, cos, exp, log, etc.) can be defined similarly and computed using interval arithmetic operations on the appropriate series expansions. For dealing with exceptions, such as division by an interval containing zero, extended models for interval arithmetic are available, often based on the extended real system $\mathbb{R}^* = \mathbb{R} \cup \{-\infty, +\infty\}$. The concept of containment sets (csets) provides a valuable framework for constructing models for interval arithmetic with consistent handling of exceptions [3, 4].

When machine computations using intervals are performed, rounding errors must be handled correctly in order to ensure that the result is a rigorous enclosure. Since computers can represent only a finite set of real numbers (machine numbers), the results of floating-point arithmetic operations that compute the endpoints of an interval must be determined using a directed (outward) rounding, instead of the standard round-to-nearest, procedure. Through the use of interval arithmetic with directed outward rounding, as opposed to floating-point arithmetic, any potential rounding error problems are avoided.

Several good introductions to interval analysis, including interval arithmetic and other aspects of computing with intervals, are available [3, 5–8]. Implementations of interval arithmetic and elementary functions are readily available for a variety of programming environments, including INTLIB [9, 10] for Fortran 77, INTERVAL ARITHMETIC [11] for Fortran 90, PROFIL/BIAS [12] and FILIB++ [13] for C++, and INTLAB [14] for Matlab. Recent compilers from Sun Microsystems provide direct support for interval arithmetic and an interval data type.
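To make definition (1) and the role of outward rounding concrete, the following is a minimal Python sketch, not one of the libraries cited above. The class name `Interval` and the helpers `_down` and `_up` are hypothetical. It assumes Python 3.9 or later (for `math.nextafter`) and imitates directed rounding by widening each computed endpoint by one floating-point step, which gives a valid but slightly wider enclosure than true directed rounding would.

```python
import math
from dataclasses import dataclass


def _down(v: float) -> float:
    """Move one floating-point step toward -infinity."""
    return math.nextafter(v, -math.inf)


def _up(v: float) -> float:
    """Move one floating-point step toward +infinity."""
    return math.nextafter(v, math.inf)


@dataclass(frozen=True)
class Interval:
    """Closed real interval [lo, hi] with outward-widened endpoints."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(_down(self.lo + other.lo), _up(self.hi + other.hi))

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(_down(self.lo - other.hi), _up(self.hi - other.lo))

    def __mul__(self, other: "Interval") -> "Interval":
        # The range of x*y is attained at a pair of endpoints.
        p = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(_down(min(p)), _up(max(p)))

    def __truediv__(self, other: "Interval") -> "Interval":
        if other.lo <= 0.0 <= other.hi:
            # An extended interval arithmetic would be needed here.
            raise ZeroDivisionError("divisor interval contains zero")
        q = [a / b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(_down(min(q)), _up(max(q)))

    def contains(self, x: float) -> bool:
        return self.lo <= x <= self.hi


X = Interval(1.0, 2.0)
Y = Interval(3.0, 4.0)
print(X + Y, X - Y, X * Y, X / Y)
```

Each result encloses the exact range (for example, `X * Y` encloses [3, 8]) at the cost of endpoints that are one step wider than necessary. Production libraries switch the hardware rounding mode instead.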
For an arbitrary function $f(x)$, the interval extension, denoted by $F(X)$, encloses all possible values of $f(x)$ for $x \in X$. That is, $F(X) \supseteq \{f(x) \mid x \in X\}$ encloses the range of $f(x)$ over $X$. It is often computed by substituting the given interval $X$ into the function $f(x)$ and then evaluating the function using interval arithmetic. This 'natural' interval extension may be wider than the actual range of function values, although it always includes the actual range. The potential overestimation of the function range is due to the 'dependency' problem of interval arithmetic, which may arise when a variable occurs more than once in a function expression. While a variable may take on any value within its interval, it must take on the same value each time it occurs in an expression. However, this type of dependency is not recognized when the natural interval extension is computed. In effect, when the natural interval extension is used, the range computed for the function is the range that would occur if each instance of a particular variable were allowed to take on a different value in its interval range. For the case in which $f(x)$ is a single-use expression, i.e., an expression in which each variable occurs only once, the use of interval arithmetic will always yield the true function range, not an overestimation. For cases in which obtaining a single-use expression is not possible, there are several other approaches that can be used to tighten interval extensions [3, 5, 7, 8, 15], including the use of monotonicity [16, 17] and the use of Taylor models [18, 19].
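The dependency problem can be seen on a small example chosen here for illustration (it is not from the chapter): $f(x) = x/(1+x)$ on $X = [1, 2]$, whose true range is $[1/2, 2/3]$. The sketch below represents intervals as plain `(lo, hi)` tuples and omits outward rounding for brevity, so it is illustrative rather than rigorous. The helper names `i_add`, `i_sub`, and `i_div` are hypothetical.

```python
def i_add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def i_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def i_div(a, b):
    if b[0] <= 0.0 <= b[1]:
        raise ZeroDivisionError("divisor interval contains zero")
    q = [a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1]]
    return (min(q), max(q))


ONE = (1.0, 1.0)
X = (1.0, 2.0)

# Natural extension of x / (1 + x): x occurs twice, so the two
# occurrences are treated as if they could vary independently.
natural = i_div(X, i_add(ONE, X))

# Equivalent single-use form 1 - 1 / (1 + x): x occurs once.
single_use = i_sub(ONE, i_div(ONE, i_add(ONE, X)))

print("natural extension:", natural)
print("single-use form:  ", single_use)
```

The natural extension evaluates to roughly [1/3, 1], which overestimates the range, while the single-use form reproduces [1/2, 2/3] up to floating-point rounding.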
4.2.2 Equation-Solving Techniques

There are many ways intervals may be used in nonlinear equation solving. No attempt is made to systematically survey all such methods here. Instead, we highlight some of the key concepts used in many interval methods for nonlinear equation solving. Many of these concepts can be described in terms of contraction operators, or contractors [5]. Contractors may either reduce the size of or completely eliminate the region in which solutions to the equation system of interest are being sought. Consider the nonlinear equation solving problem $f(x) = 0$, for which real roots are sought in an initial interval $X^{(0)}$. Interval-based strategies exist for contracting, eliminating, or dividing $X^{(0)}$. Reliable methods for locating all solutions to an equation system are formed by combining these strategies. For a comprehensive treatment of these techniques, several sources are available, including monographs by Neumaier [8], Kearfott [7], Jaulin et al. [5], and Hansen and Walster [3].
4.2.2.1 Function Range Testing

Consider a search for solutions of $f(x) = 0$ in an interval $X$. If an interval extension of $f(x)$ over $X$ does not contain zero, i.e., $0 \notin F(X)$, then the range of $f(x)$ over $X$ does not contain zero, and it is not possible for $X$ to contain a solution of $f(x) = 0$. Thus, $X$ can be eliminated from the search space. The use of interval extensions for function range testing is one simple way an interval can be eliminated as not containing any roots. This is commonly used in nonlinear equation solving methods prior to use of the contraction methods discussed below. A method that makes more extensive use of function range testing was developed by Yamamura [20], based on the use of linear combinations of the component functions of $f(x)$. An approach for forming the linear combinations based on the inverse of the midpoint of the interval extension of the Jacobian of $f(x)$ was shown to be very effective.
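As an illustration of the range test, consider a small system chosen here as an assumption for the example: $f_1 = x^2 + y^2 - 1$ and $f_2 = x - y$. A box can be discarded as soon as the interval extension of any one component excludes zero. The sketch uses `(lo, hi)` tuples without outward rounding, and the names `i_sqr`, `F`, and `can_eliminate` are hypothetical.

```python
def i_sqr(x):
    """Interval square; tighter than x * x because it knows both factors agree."""
    lo, hi = x
    if lo >= 0.0:
        return (lo * lo, hi * hi)
    if hi <= 0.0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


def F(box):
    """Natural interval extension of f(x, y) = (x^2 + y^2 - 1, x - y)."""
    X, Y = box
    sx, sy = i_sqr(X), i_sqr(Y)
    F1 = (sx[0] + sy[0] - 1.0, sx[1] + sy[1] - 1.0)
    F2 = (X[0] - Y[1], X[1] - Y[0])
    return [F1, F2]


def can_eliminate(box):
    """True if some component range excludes zero, so the box holds no root."""
    return any(lo > 0.0 or hi < 0.0 for lo, hi in F(box))


print(can_eliminate(((2.0, 3.0), (0.0, 1.0))))   # far from the unit circle
print(can_eliminate(((0.8, 1.0), (0.0, 0.2))))   # near the circle, but x > y
print(can_eliminate(((0.0, 1.0), (0.0, 1.0))))   # contains a root, must be kept
```

A result of `False` does not prove that a root exists in the box. It only means the test could not exclude one.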
4.2.2.2 Constraint Propagation

Most constraint propagation strategies for nonlinear equation solving are based on the concepts of hull consistency, box consistency, or some combination or variation thereof. These are strategies for contracting (shrinking) intervals in which roots are sought, or possibly eliminating them entirely. The hull consistency form of constraint propagation is based on an interval extension of fixed-point iteration [3, 21]. Consider a single equation and variable $f(x) = 0$, and let it be reformulated into the fixed-point form $x = g(x)$. If $X$ is the search interval, then any roots of $f(x) = 0$ must be in the interval $\tilde{X} = G(X)$. It may be possible to shrink the search interval by taking the intersection of $X$ and $\tilde{X}$, i.e., $X \leftarrow X \cap \tilde{X}$. If this results in a contraction of $X$, then the process may be repeated. Furthermore, if $X \cap \tilde{X} = \emptyset$, then the current search interval can be eliminated entirely as containing no solutions. If there are different ways to obtain the function $g(x)$, then the process can be repeated using these alternative fixed-point forms. For systems of equations, hull consistency can be applied to one equation at a time and one variable at a time (holding other variables constant at their interval values in the current search space). In this way, contractions in one component of the search space can be propagated readily to other components. Clearly, there are many possible strategies for organizing this process. Another type of constraint propagation strategy is known as box consistency [3, 5]. In this case, all but one of the variables in an equation system are set to their interval values in the current search space. Now there are one or more equations involving only the remaining variable, say $x_j$. These constraints can be used to contract $X_j$, the current search range for $x_j$. There are various ways to do this, including univariate interval-Newton iteration [22] and methods [3] for direct calculation of new bounds for $x_j$.
This procedure can be repeated using a combination of any equation and any variable in the equation system. Again, this provides a way for contractions in one component of the search space to be propagated to other components. Box consistency and hull consistency tests can also be easily combined [3, 23]. A variety of software packages are available that apply constraint propagation techniques, often in combination with other interval-based methods, to solve systems of equations. These include RealPaver [24], Numerica [25], and ICOS [26].
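The fixed-point contraction behind hull consistency can be sketched on a single equation chosen here for illustration: $f(x) = x^2 - 4x + 1 = 0$, with roots $2 \pm \sqrt{3}$. Two fixed-point forms are used, $x = (x^2 + 1)/4$ and $x = 4 - 1/x$. The function names `g1`, `g2`, and `contract` are hypothetical, intervals are `(lo, hi)` tuples, and outward rounding is omitted, so the enclosures are illustrative rather than rigorous.

```python
def g1(X):
    """Interval extension of g(x) = (x^2 + 1) / 4 for X >= 0 (monotone there)."""
    lo, hi = X
    if lo < 0.0:
        raise ValueError("this sketch assumes a non-negative interval")
    return ((lo * lo + 1.0) / 4.0, (hi * hi + 1.0) / 4.0)


def g2(X):
    """Interval extension of g(x) = 4 - 1/x for X > 0 (monotone there)."""
    lo, hi = X
    if lo <= 0.0:
        raise ValueError("this sketch assumes a positive interval")
    return (4.0 - 1.0 / lo, 4.0 - 1.0 / hi)


def contract(X, G, tol=1e-12, max_iter=100):
    """Repeat X <- X ∩ G(X); return None if the intersection becomes empty."""
    for _ in range(max_iter):
        Gx = G(X)
        lo, hi = max(X[0], Gx[0]), min(X[1], Gx[1])
        if lo > hi:
            return None                      # no root of f in X
        shrink = (X[1] - X[0]) - (hi - lo)
        X = (lo, hi)
        if shrink < tol:                     # no further useful contraction
            break
    return X


print(contract((0.0, 1.0), g1))   # shrinks onto the root near 0.2679
print(contract((1.0, 2.0), g1))   # None: the interval is eliminated
print(contract((3.0, 4.0), g1))   # no contraction with this form
print(contract((3.0, 4.0), g2))   # the alternative form shrinks onto 3.7321
```

The last two calls show why alternative fixed-point forms are worth trying: the first form cannot contract [3, 4], while the second contracts it quickly.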
4.2.2.3 Krawczyk and Interval-Newton

The Krawczyk and interval-Newton methods are contraction strategies that have been widely used in the solution of nonlinear equation systems. They also provide a test for the existence of a unique solution in a given interval. Both are generally applied in connection with some bisection or other tessellation scheme [7], thus resulting in a sequence of subintervals to be tested. Let $X^{(k)}$ indicate an interval in this sequence. Using the Krawczyk or interval-Newton method, it is possible to contract $X^{(k)}$, or even eliminate it, and also to determine if a unique solution to $f(x) = 0$ exists in $X^{(k)}$. In the Krawczyk method, the interval $K^{(k)}$ is computed from

$$K^{(k)} = K(X^{(k)}, x^{(k)}) = x^{(k)} - Y^{(k)} f(x^{(k)}) + \left(I - Y^{(k)} F'(X^{(k)})\right)\left(X^{(k)} - x^{(k)}\right). \tag{2}$$

Here, $F'(X^{(k)})$ indicates an interval extension of the Jacobian matrix of $f(x)$, but could be any Lipschitz matrix. Also, $x^{(k)}$ is an arbitrary point in $X^{(k)}$, and $Y^{(k)}$ is a real preconditioning matrix. The properties
of this method have been widely studied [3, 5, 7, 8, 27–29]. Any roots of $f(x) = 0$ in $X^{(k)}$ will also be in $K^{(k)}$, thus giving the contraction scheme $X^{(k+1)} = X^{(k)} \cap K^{(k)}$. It follows that if $X^{(k)} \cap K^{(k)} = \emptyset$, then $X^{(k)}$ contains no roots and can be eliminated. An additional property is that if $K^{(k)}$ is in the interior of $X^{(k)}$, then there is a unique root in $X^{(k)}$. If $X^{(k)}$ cannot be eliminated or sufficiently contracted, or cannot be shown to contain a unique root, then it is bisected, and the procedure is repeated on each resulting interval. Several improvements to the basic Krawczyk method have been suggested, including a bicentered method [30], a boundary-based method [30, 31], and a componentwise version of the algorithm [32].

In the interval-Newton method, the interval $N^{(k)} = N(X^{(k)}, x^{(k)})$ is determined from the linear interval equation system

$$Y^{(k)} F'(X^{(k)}) \left(N^{(k)} - x^{(k)}\right) = -Y^{(k)} f(x^{(k)}). \tag{3}$$
This method has also been widely studied [3, 5, 7, 8, 33, 34] and has properties similar to the Krawczyk method. Any roots in $X^{(k)}$ are also in $N^{(k)}$, so the contraction $X^{(k+1)} = X^{(k)} \cap N^{(k)}$ can be used. If $X^{(k)} \cap N^{(k)} = \emptyset$, then $X^{(k)}$ can be eliminated. Furthermore, if $N^{(k)}$ is in the interior of $X^{(k)}$, there is a unique root in $X^{(k)}$. In this case, the interval-Newton procedure can be repeated and converges quadratically to a narrow enclosure of the root. Alternatively, an approximation of the root can be found using a standard point-Newton algorithm starting from any point in $X^{(k)}$. Again, if $X^{(k)}$ cannot be eliminated or sufficiently shrunk, or cannot be shown to contain a unique root, it is bisected.

$N^{(k)}$ can be obtained from the linear interval equation system (3) in various ways. However, an interval Gauss–Seidel procedure [35] is widely used. In this case, $N^{(k)}$ is never obtained explicitly, since after each component $N_i^{(k)}$ is computed, it is intersected with $X_i^{(k)}$, and the result is then used in computing subsequent components of $N^{(k)}$. For a fixed preconditioning matrix, the enclosure provided by the interval-Newton method using Gauss–Seidel is at least as good as that provided by the Krawczyk method [8, 35]. Nevertheless, the Krawczyk method appears attractive, because it is not necessary to bound the solution of a system of linear interval equations. However, in practice the interval Gauss–Seidel procedure is a very simple and effective way to deal with the linear equation system. Overall, interval-Newton with Gauss–Seidel is regarded as computationally more efficient than the Krawczyk method [3, 36].

There are many variations on the interval-Newton method, corresponding to different choices of the real point $x^{(k)}$ and preconditioning matrix $Y^{(k)}$, different strategies for choosing the bisection coordinate, and different ways to bound the solution of equation (3).
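For a single equation, equations (2) and (3) reduce to scalar interval formulas, which makes the comparison between the two enclosures easy to see. The sketch below uses the illustrative problem $f(x) = x^2 - 2$ on $X = [1, 2]$ (an assumption for this example), takes $x$ as the midpoint of $X$ and $Y$ as the inverse of the midpoint of $F'(X)$, and omits outward rounding. In the scalar case $Y$ cancels in equation (3), so $N = x - f(x)/F'(X)$. All function names are hypothetical.

```python
def f(x):
    return x * x - 2.0


def dF(X):
    """Interval extension of f'(x) = 2x over X = (lo, hi)."""
    return (2.0 * X[0], 2.0 * X[1])


def i_mul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def krawczyk(X):
    """Scalar form of equation (2)."""
    x = 0.5 * (X[0] + X[1])
    d = dF(X)
    Y = 1.0 / (0.5 * (d[0] + d[1]))          # inverse-midpoint preconditioner
    Yd = i_mul((Y, Y), d)
    slope = (1.0 - Yd[1], 1.0 - Yd[0])       # 1 - Y F'(X)
    r = i_mul(slope, (X[0] - x, X[1] - x))
    c = x - Y * f(x)
    return (c + r[0], c + r[1])


def newton(X):
    """Scalar form of equation (3); requires 0 not in F'(X)."""
    x = 0.5 * (X[0] + X[1])
    d = dF(X)
    if d[0] <= 0.0 <= d[1]:
        raise ZeroDivisionError("derivative enclosure contains zero")
    q = (f(x) / d[0], f(x) / d[1])
    return (x - max(q), x - min(q))


def is_interior(A, B):
    """True if interval A lies strictly inside interval B."""
    return B[0] < A[0] and A[1] < B[1]


X = (1.0, 2.0)
K, N = krawczyk(X), newton(X)
print("Krawczyk:", K, "unique root proven:", is_interior(K, X))
print("Newton:  ", N, "unique root proven:", is_interior(N, X))
```

On this example both images lie in the interior of $X$, so each proves a unique root, and the interval-Newton image is contained in the Krawczyk image, in line with the comparison stated above.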
The real point $x^{(k)}$ is typically taken to be the midpoint of $X^{(k)}$, and the preconditioning matrix $Y^{(k)}$ is often taken to be either the inverse of the midpoint of $F'(X^{(k)})$ or the inverse of the Jacobian evaluated at the midpoint of $X^{(k)}$. However, these choices are not necessarily optimal [37]. For example, several alternative preconditioning strategies are given by Kearfott et al. [38]. Gau and Stadtherr [39] combined one of these methods, a pivoting preconditioner, with a standard inverse midpoint scheme and were able to obtain significant performance gains compared with the use of the inverse midpoint preconditioner alone. For the choice of the real point $x^{(k)}$, one alternative strategy is to use an approximation to a root of the equation system, perhaps obtained using a local equation solver. Gau and Stadtherr [39] suggested a real-point selection scheme that seeks to minimize the width of the intersection between $X_i^{(k)}$ and $N_i^{(k)}$. Several bisection or other box-splitting strategies have been studied [3, 7, 29, 40]. The maximum smear heuristic [40], in which bisection is done on the coordinate whose range corresponds to the maximum width in the function range, is often, but not always, an effective choice. For bounding the solution of equation (3) there are many possible approaches, though, as noted above, the preconditioned interval Gauss–Seidel approach is typically quite effective. One alternative, described by Lin and Stadtherr [41, 42], uses a linear programming strategy along with a real-point selection scheme to provide sharp enclosures of the solution $N^{(k)}$ to equation (3). Although, in general, sharply bounding the solution set of a linear interval equation system is NP-hard, for the special case of interval-Newton, this linear programming approach can efficiently provide exact (within roundout) bounds. Finally, it should be noted that a slope matrix can be used in equations (2) and (3) instead of a Lipschitz matrix.
In this case, the test for enclosure of a unique root is no longer applicable, unless some type of compound algorithm is used [43].

In implementing the interval-Newton method, values of $f(x)$ are computed using interval arithmetic to bound rounding errors. Thus, in effect, $f(x)$ is interval valued. In general, the interval-Newton method can
be used to enclose the solution set of any interval-valued function. For example, consider the problem $f(x, p) = 0$, where $p$ is some parameter. If the value of $p$ is uncertain but is known to be in the interval $P$, then we have the interval-valued function $F(x, P)$ and the problem is to enclose the solution set of $F(x, P) = 0$. This solution set is defined by $S = \{x \mid f(x, p) = 0,\ p \in P\}$. An interval enclosure of $S$ can be found readily using the interval-Newton method, though generally, due to bounding of rounding errors, it will not be the smallest possible interval enclosure. However, since $S$ is often not an interval, even its tightest interval enclosure may still represent a significant overestimation. To more closely approximate $S$, one can divide $P$ into subintervals, obtain an interval enclosure of the solution set over each subinterval, and then take the union of the results.

Implementation of interval methods for nonlinear equation solving typically employs a combination of one or more of the concepts outlined above [21, 23, 44], perhaps also in connection with some manipulation of the equation system to be solved [20, 45, 46]. Often function range testing and constraint propagation techniques are first used to contract intervals, as these methods have low computational overhead. Then, more costly interval-Newton steps can be applied to the contracted intervals to obtain final solution enclosures. In most such equation-solving algorithms, the intervals can be treated as independent granules of data. Thus, parallel implementations of interval methods are generally apparent, though they must be done with proper attention to load balancing in order to be most effective [47–49].

In the subsequent sections, we will look at some specific applications of interval methods for nonlinear equation solving.
The core steps in the algorithm used to solve these problems can be outlined as follows: for a given $X^{(k)}$, (1) apply function range test; if $X^{(k)}$ is not eliminated, then (2) apply hull consistency (this is done on a problem-specific basis); if $X^{(k)}$ is not eliminated, then (3) apply interval-Newton, using either the hybrid preconditioning technique of Gau and Stadtherr [39] or the linear programming method of Lin and Stadtherr [42]; if $X^{(k)}$ is not eliminated, or a unique root in $X^{(k)}$ not identified, then (4) bisect $X^{(k)}$. This is only one possible way to implement an interval method for nonlinear equation solving applications. However, it has proved to be effective on a wide variety of problems, some of which are discussed below. The applications considered next are purely equation-solving problems. However, since many optimization problems can easily be converted into an equivalent system of equations, the techniques described above are also often applied to problems requiring global optimization, typically in connection with some branch-and-bound procedure.
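The outline above can be sketched for one equation in one variable. This is a simplified illustration, not the authors' implementation: it omits step (2), uses the plain scalar interval-Newton step instead of the preconditioning or linear programming techniques cited, skips the Newton step whenever the derivative enclosure contains zero, and omits outward rounding. The test problem $f(x) = x^2 - 4x + 1$ and all function names are assumptions made for the example.

```python
def f(x):
    return x * x - 4.0 * x + 1.0


def F(X):
    """Natural interval extension of f over X = (lo, hi)."""
    lo, hi = X
    sq_lo = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    sq_hi = max(lo * lo, hi * hi)
    return (sq_lo - 4.0 * hi + 1.0, sq_hi - 4.0 * lo + 1.0)


def dF(X):
    """Interval extension of f'(x) = 2x - 4."""
    return (2.0 * X[0] - 4.0, 2.0 * X[1] - 4.0)


def newton_step(X):
    """Return (X ∩ N, N), with None for an empty intersection; needs 0 not in dF(X)."""
    x = 0.5 * (X[0] + X[1])
    d = dF(X)
    q = (f(x) / d[0], f(x) / d[1])
    N = (x - max(q), x - min(q))
    lo, hi = max(X[0], N[0]), min(X[1], N[1])
    return ((lo, hi) if lo <= hi else None), N


def refine(X, tol):
    """Repeat the Newton step on a box already known to hold a unique root."""
    for _ in range(50):
        if X[1] - X[0] < tol:
            break
        Xn = newton_step(X)[0]
        if Xn is None:          # can only be a rounding artefact in this sketch
            break
        X = Xn
    return X


def find_all_roots(X0, tol=1e-12):
    roots, work = [], [X0]
    while work:
        X = work.pop()
        Fx = F(X)
        if Fx[0] > 0.0 or Fx[1] < 0.0:           # (1) function range test
            continue
        d = dF(X)
        if d[0] > 0.0 or d[1] < 0.0:             # (3) interval-Newton
            Xn, N = newton_step(X)
            if Xn is None:
                continue                          # box eliminated
            if X[0] < N[0] and N[1] < X[1]:       # N interior: unique root
                roots.append(refine(Xn, tol))
                continue
            if Xn[1] - Xn[0] < 0.5 * (X[1] - X[0]):
                work.append(Xn)                   # good contraction: reprocess
                continue
            X = Xn
        if X[1] - X[0] < tol:
            roots.append(X)                       # tiny unresolved box
            continue
        m = 0.5 * (X[0] + X[1])                   # (4) bisect
        work.extend([(X[0], m), (m, X[1])])
    return sorted(roots)


roots = find_all_roots((-10.0, 10.0))
for lo, hi in roots:
    print(f"root enclosed in [{lo:.12f}, {hi:.12f}]")
```

On the initial interval [-10, 10] the search returns two narrow enclosures, around $2 - \sqrt{3}$ and $2 + \sqrt{3}$, and discards the rest of the interval by range testing.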
4.3 Modeling of Liquid–Liquid Phase Equilibrium

The modeling of phase behavior is a rich source of problems in which interval methods can play an important role, by ensuring that correct results are reliably obtained [50, 51]. Of interest is the development and use of models for predicting the number, type (liquid, vapor, or solid), and composition of the phases present at equilibrium for mixtures of chemical components at specified conditions. In model development, parameter estimation problems arise, which typically require solution of a nonconvex optimization problem. Unfortunately, it is not uncommon to find that literature values for parameters are actually locally, but not globally, optimal [52]. Use of parameters that are not globally optimal may result in rejection of a model that would otherwise be accepted if globally optimal parameters were used. For the case of vapor–liquid equilibrium modeling, Gau and Stadtherr [52, 53] have used an interval method to guarantee that the globally optimal parameters are found. After models are developed, they are used to compute the phase equilibrium for mixtures of interest. This is another global optimization problem, the global minimization of the total Gibbs energy in the case of specified temperature and pressure. Again, it is not uncommon to find literature solutions that are only locally optimal, and thus do not represent stable equilibrium states [51]. For the phase stability and equilibrium problems, and for related phase behavior calculations, there have been a number of successful applications of interval methods to the underlying equation-solving and optimization problems [50, 51, 54–67].

In this section, we will focus on the problem of parameter estimation in the modeling of liquid–liquid equilibrium. This can be formulated as a nonlinear equation solving problem involving only two equations and variables.
However, the number of solutions to this system is unknown a priori, and it is not uncommon to see incorrect solutions reported in the literature.
4.3.1 Problem Formulation

Consider liquid–liquid equilibrium in a two-component system at fixed temperature and pressure. For this case, the necessary and sufficient condition for equilibrium is that the total Gibbs energy be at a global minimum. The first-order optimality conditions on the Gibbs energy lead to the equal activity conditions,

$$a_i^{\mathrm{I}} = a_i^{\mathrm{II}}, \quad i = 1, 2, \tag{4}$$
stating the equality of activities of each component (1 and 2) in each phase (I and II). This is a necessary, but not sufficient, condition for equilibrium. Given an activity coefficient model ($a_i = \gamma_i x_i$), expressed in terms of observable component mole fractions $x_1$ and $x_2 = 1 - x_1$, and activity coefficients $\gamma_1$ and $\gamma_2$ expressed in terms of composition and two binary parameters $\theta_{12}$ and $\theta_{21}$, then the equal activity conditions can be expressed as

$$x_i^{\mathrm{I}}\, \gamma_i^{\mathrm{I}}\!\left(x_1^{\mathrm{I}}, x_2^{\mathrm{I}}, \theta_{12}, \theta_{21}\right) = x_i^{\mathrm{II}}\, \gamma_i^{\mathrm{II}}\!\left(x_1^{\mathrm{II}}, x_2^{\mathrm{II}}, \theta_{12}, \theta_{21}\right), \quad i = 1, 2. \tag{5}$$
Experimental measurements of the compositions of both phases are available. Thus, in equation (5), the values of $x_1^{\mathrm{I}}$, $x_1^{\mathrm{II}}$, $x_2^{\mathrm{I}}$, and $x_2^{\mathrm{II}}$ are fixed. This results in a system of two equations in the two parameters $\theta_{12}$ and $\theta_{21}$. This provides a widely used approach for parameter estimation in activity coefficient models for liquid–liquid equilibrium [68, 69], as generally it is possible to use physical grounds to reject all but one solution to equation (5). Parameter solutions are generally sought using local methods with multistart. A curve-following approach can also be used [70], but its reliability is step-size dependent and is not guaranteed. In this section, we will use an interval-Newton approach, as outlined at the end of Section 4.2, to determine reliably all solutions to equation (5) for the case in which the Non-Random Two-Liquid (NRTL) activity coefficient model is used. In the NRTL model, the activity coefficients for use in equation (5) are given by

$$\ln \gamma_1 = x_2^2 \left[ \tau_{21} \left( \frac{G_{21}}{x_1 + x_2 G_{21}} \right)^2 + \frac{\tau_{12} G_{12}}{(x_2 + x_1 G_{12})^2} \right] \tag{6}$$

$$\ln \gamma_2 = x_1^2 \left[ \tau_{12} \left( \frac{G_{12}}{x_2 + x_1 G_{12}} \right)^2 + \frac{\tau_{21} G_{21}}{(x_1 + x_2 G_{21})^2} \right], \tag{7}$$

where

$$\tau_{12} = \frac{\Delta g_{12}}{RT} = \frac{g_{12} - g_{22}}{RT}, \qquad \tau_{21} = \frac{\Delta g_{21}}{RT} = \frac{g_{21} - g_{11}}{RT},$$

$$G_{12} = \exp(-\alpha_{12} \tau_{12}), \qquad G_{21} = \exp(-\alpha_{21} \tau_{21}).$$

Here, $g_{ij}$ is an energy parameter characteristic of the $i$–$j$ interaction, and the parameter $\alpha = \alpha_{12} = \alpha_{21}$ is related to the nonrandomness in the mixture. The nonrandomness parameter $\alpha$ is frequently taken to be fixed when modeling liquid–liquid equilibrium. The binary parameters that must be determined from experimental data are then $\theta_{12} = \Delta g_{12}$ and $\theta_{21} = \Delta g_{21}$.
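As a point-valued illustration of the model equations, the sketch below evaluates equations (6) and (7) and the two equal-activity residuals of equation (5) in ordinary floating-point arithmetic. It is not an interval implementation. The function names are hypothetical, and the parameters are passed as $\tau_{12}$ and $\tau_{21}$ rather than $\theta_{12}$ and $\theta_{21}$, which assumes the caller has already divided by $RT$.

```python
import math


def nrtl_ln_gamma(x1, tau12, tau21, alpha):
    """Return (ln gamma_1, ln gamma_2) from equations (6) and (7)."""
    x2 = 1.0 - x1
    G12 = math.exp(-alpha * tau12)
    G21 = math.exp(-alpha * tau21)
    ln_g1 = x2 ** 2 * (tau21 * (G21 / (x1 + x2 * G21)) ** 2
                       + tau12 * G12 / (x2 + x1 * G12) ** 2)
    ln_g2 = x1 ** 2 * (tau12 * (G12 / (x2 + x1 * G12)) ** 2
                       + tau21 * G21 / (x1 + x2 * G21) ** 2)
    return ln_g1, ln_g2


def equal_activity_residuals(x1_I, x1_II, tau12, tau21, alpha):
    """Residuals of equation (5): x_i^I gamma_i^I - x_i^II gamma_i^II, i = 1, 2."""
    lg1_I, lg2_I = nrtl_ln_gamma(x1_I, tau12, tau21, alpha)
    lg1_II, lg2_II = nrtl_ln_gamma(x1_II, tau12, tau21, alpha)
    r1 = x1_I * math.exp(lg1_I) - x1_II * math.exp(lg1_II)
    r2 = (1.0 - x1_I) * math.exp(lg2_I) - (1.0 - x1_II) * math.exp(lg2_II)
    return r1, r2


# With tau12 = tau21 = 0 the model reduces to an ideal mixture.
print(nrtl_ln_gamma(0.3, 0.0, 0.0, 0.4))
# At infinite dilution of component 1, ln gamma_1 = tau21 + tau12 * G12.
print(nrtl_ln_gamma(0.0, 1.2, 0.8, 0.3))
```

With the phase compositions fixed by experiment, `equal_activity_residuals` is a function of the two parameters only, which is the two-equation, two-unknown system that an interval method would enclose.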
Table 4.1 Comparison of NRTL parameter estimates for the mixture of n-butanol and water (α = 0.4, T = 363 K) (note a)

            Reference [71]           Interval method
Solution    τ12         τ21          τ12         τ21
1           0.0075      3.8021       0.0075      3.8021
2           10.182      3.8034       10.178      3.8034
3           −73.824     −15.822

a The parameter estimates are obtained by Heidemann and Mandhane [71] and by the use of an interval method.
4.3.2 n-Butanol and Water

Consider a mixture of n-butanol (component 1) and water (component 2) at T = 363 K and atmospheric pressure. Liquid–liquid phase equilibrium is observed experimentally with phase compositions $x_1^{\mathrm{I}} = 0.020150$ and $x_1^{\mathrm{II}} = 0.35970$. Heidemann and Mandhane [71] modeled this system using NRTL with α = 0.4. They obtained three solutions for the binary parameters, as shown in Table 4.1, in terms of $\tau_{12}$ and $\tau_{21}$. Applying the interval method to solve this system of nonlinear equations, with an initial search interval of $\theta_{12} \in [-1 \times 10^{6}, 1 \times 10^{6}]$ and $\theta_{21} \in [-1 \times 10^{6}, 1 \times 10^{6}]$, we find only two solutions, as also shown in Table 4.1. The extra solution found by Heidemann and Mandhane [71] is well within the search space used by the interval method and so is clearly a spurious solution resulting from numerical difficulties in the local method used to solve equation (5). This can be verified by direct substitution of solution 3 into the equal activity conditions. When equal activity is expressed in the form of equation (5), the residuals for solution 3 are close to zero. However, when equal activity is expressed in terms of $\ln x_i$ and $\ln \gamma_i$, by taking the logarithm of both sides of equation (5), it becomes clear that the residuals for solution 3 are not really zero.
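The substitution check described above can be reproduced with a short floating-point sketch. It uses the NRTL expressions (6) and (7), the phase compositions and α quoted in this section, and the rounded parameter values from Table 4.1. The function names `ln_activities` and `residuals` are hypothetical, and the computation is ordinary point arithmetic, so it illustrates the argument rather than replacing the interval verification.

```python
import math


def ln_activities(x1, tau12, tau21, alpha):
    """Return (ln a_1, ln a_2), with ln a_i = ln x_i + ln gamma_i (NRTL)."""
    x2 = 1.0 - x1
    G12 = math.exp(-alpha * tau12)
    G21 = math.exp(-alpha * tau21)
    ln_g1 = x2 ** 2 * (tau21 * (G21 / (x1 + x2 * G21)) ** 2
                       + tau12 * G12 / (x2 + x1 * G12) ** 2)
    ln_g2 = x1 ** 2 * (tau12 * (G12 / (x2 + x1 * G12)) ** 2
                       + tau21 * G21 / (x1 + x2 * G21) ** 2)
    return math.log(x1) + ln_g1, math.log(x2) + ln_g2


def residuals(tau12, tau21, x1_I=0.020150, x1_II=0.35970, alpha=0.4):
    """Equal-activity residuals in product form (equation (5)) and in log form."""
    la_I = ln_activities(x1_I, tau12, tau21, alpha)
    la_II = ln_activities(x1_II, tau12, tau21, alpha)
    product_form = tuple(math.exp(a) - math.exp(b) for a, b in zip(la_I, la_II))
    log_form = tuple(a - b for a, b in zip(la_I, la_II))
    return product_form, log_form


for label, (t12, t21) in {"solution 1": (0.0075, 3.8021),
                          "solution 3": (-73.824, -15.822)}.items():
    prod, logr = residuals(t12, t21)
    print(label, "product form:", prod, "log form:", logr)
```

For solution 1 both forms give small residuals, limited by the rounding of the tabulated parameters. For solution 3 the product-form residuals are tiny only because both activities are themselves nearly zero, while the log-form residuals are of order one, which is the behavior described in the text.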
4.3.3 1,4-Dioxane and 1,2,3-Propanetriol

Consider a mixture of 1,4-dioxane (component 1) and 1,2,3-propanetriol at T = 298 K and atmospheric pressure. Liquid–liquid phase equilibrium is observed experimentally with phase compositions $x_1^{\mathrm{I}} = 0.2078$ and $x_1^{\mathrm{II}} = 0.9934$. Mattelin and Verhoeye [72] modeled this system using NRTL with various values of α. We will focus on the case of α = 0.15. They obtained six solutions for the binary parameters, which are reported graphically without giving exact numerical values. Applying the interval method, with the same initial search interval as given above, we find only four solutions, as shown in Table 4.2 in terms of $\tau_{12}$ and $\tau_{21}$. The extra solutions found by Mattelin and Verhoeye [72] are well within the search space
Table 4.2 NRTL parameter estimates for the mixture 1,4-dioxane and 1,2,3-propanetriol (α = 0.15 and T = 298.15 K) (notes a, b)

            Interval method
Solution    τ12         τ21
1           5.6379      −0.59940
2           13.478      −82.941
3           38.642      13.554
4           39.840      3.0285

a Estimates are found using an interval method.
b Mattelin and Verhoeye [72] reported finding six solutions.
used by the interval method. Again, it appears that numerical difficulties in the use of local methods have led to spurious solutions.
4.3.4 Remarks

In this section, we have seen a small nonlinear equation system that in some cases is numerically difficult to solve using standard local methods, as evident from the reporting of spurious roots in the literature. Using an interval-Newton method, tight enclosures of all roots in the initial search space could be found very easily and efficiently, with computational times on the order of seconds (3.2-GHz Intel Pentium 4).
4.4 Transition-State Analysis

In molecular modeling, the search for features on a potential energy hypersurface is often required and is a very challenging computational problem. In some cases, finding a global minimum is required, but the existence of a very large number of local minima, the number of which may increase exponentially with the size of a molecule or the number of molecules, makes the problem extremely difficult. Interval methods can play a role in solving these problems [73, 74], but are limited in practice to problems of relatively low dimension. In other problems in computational chemistry, it is desired to find all stationary points. Interval methods for equation solving have been applied to one such problem, involving the use of lattice density functional theory to model adsorption in a nanoscale pore, by Maier and Stadtherr [75]. Another such problem is transition-state analysis, as summarized below, and is described in more detail by Lin and Stadtherr [76].

Transition-state theory is a well-established method which, by providing an approach for computing the kinetics of infrequent events, is useful in the study of numerous physical systems. Of particular interest here is the problem of computing the diffusivity of a sorbate molecule in a zeolite. This can be done using transition-state analysis, as described by June et al. [77]. It is assumed that diffusive motion of the sorbate molecules through the zeolite occurs by a series of uncorrelated hops between potential energy minima in the zeolite lattice. A sorption state or site is constructed around each minimum of the potential energy hypersurface. Any such pair of sites i and j is then assumed to be separated by a dividing surface on which a saddle point of the potential energy hypersurface is located. The saddle point can be viewed as the transition state between sites, and a pair of steepest descent paths from the saddle point connects the minima associated with the i and j sites.
Obviously, in this application, and in other applications of transition-state theory, finding all local minima and saddle points of the potential energy surface, $V$, is critical. We show here, using a sorbate–zeolite system, the use of an interval-Newton method, as outlined at the end of Section 4.2, to find all stationary points of a potential energy surface. Stationary points satisfy the condition $g = \nabla V = 0$; that is, at a stationary point, the gradient of the potential energy surface is zero. Using the eigenvalues of $H = \nabla^2 V$, the Hessian of the potential energy surface, stationary points can be classified into local minima, local maxima, and saddle points (of order determined by the number of negative eigenvalues).

There are a number of methods for locating stationary points. A Newton or quasi-Newton method, applied to solve the nonlinear equation system $\nabla V = 0$, yields a solution whenever the initial guess is sufficiently close to a stationary point. This method can be used in an exhaustive search, using many different initial guesses, to locate stationary points. The set of initial guesses to use might be determined by the user (intuitively or arbitrarily) or by some type of stochastic multistart approach. Another popular approach is the use of eigenmode-following methods, as done, e.g., by Tsai and Jordan [78]. These methods can be regarded as variations of Newton's method. In an eigenmode-following algorithm, the Newton step is modified by shifting some of the eigenvalues of the Hessian (from positive to negative or vice versa). By selection of the shift parameters, one can effectively find the desired type of stationary points, e.g., minima and first-order saddles. There are also a number of other approaches, many involving some stochastic component, for finding stationary points. In the context of sorbate–zeolite systems, June et al. [77] use an approach in which minima and saddle points are located separately.
A three-step process is employed in an exhaustive search for minima. First, the volume of the search space (one asymmetric unit) is discretized by a grid with a spacing of approximately 0.2 Å, and the potential and gradient vector are tabulated on the grid. Second, each cube
formed by a set of nearest-neighbor grid nodes is scanned and the three components of the gradient vector on the eight vertices of the cube are checked for changes in sign. Finally, if all three components are found to change sign on two or more vertices of the cube, a Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton minimization search algorithm is initiated to locate a local minimum, using the coordinates of the center of the cube as the initial guess. Two different algorithms are tried for determining the location of saddle points. One searches for global minimizers of the function gᵀg, i.e., the sum of the squares of the components of the gradient vector. The other algorithm, due to Baker [79], searches for saddle points directly from an initial point by maximizing the potential energy along the eigenvector direction associated with the smallest eigenvalue and by minimizing along directions associated with all other eigenvalues of the Hessian.
All the methods discussed above have a major shortcoming: they provide no guarantee that all local minima and saddle points of interest will actually be found. One approach to resolving this difficulty is given by Westerberg and Floudas [80], who transform the equation-solving problem ∇V = 0 into an equivalent optimization problem that has global minimizers corresponding to the solutions of the equation system (i.e., the stationary points of V). A deterministic global optimization algorithm, based on a branch-and-bound strategy with convex underestimators, is then used to find these global minimizers. Whether all stationary points are actually found depends on proper choice of a parameter (α) used in obtaining the convex underestimators, and Westerberg and Floudas do not use a method that guarantees a proper choice.
However, there do exist techniques [81, 82], based on an interval representation of the Hessian, that in principle could be used to guarantee a proper value of α, though likely at considerable computational expense. We demonstrate here an approach in which interval analysis is applied directly to the solution of ∇V = 0 using an interval-Newton methodology. This provides a mathematical and computational guarantee that all stationary points of the potential energy surface are found (or, more precisely, enclosed within an arbitrarily small interval).
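To make the enclosure idea concrete, the following is a minimal one-dimensional sketch of an interval-Newton/bisection search. It is not the implementation of Lin and Stadtherr [76]. It assumes a toy potential V(x) = x⁴ − 2x² (hypothetical, chosen because its stationary points −1, 0, and 1 are known), and it uses ordinary floating-point arithmetic with a small padding constant in place of the outward rounding that a rigorous interval library would provide. All function names are illustrative.

```python
# Toy potential V(x) = x**4 - 2*x**2 (an assumption for illustration only).
# Gradient: g(x) = 4*x**3 - 4*x.  Second derivative: h(x) = 12*x**2 - 4.

PAD = 1e-12  # crude stand-in for outward rounding (assumption, not rigorous)


def sqr(X):
    """Interval extension of x**2 for an interval X = (lo, hi)."""
    lo, hi = X
    if lo >= 0.0:
        return (lo * lo, hi * hi)
    if hi <= 0.0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


def mul(A, B):
    """Interval product of A and B."""
    p = (A[0] * B[0], A[0] * B[1], A[1] * B[0], A[1] * B[1])
    return (min(p), max(p))


def g_point(x):
    return 4.0 * x * (x * x - 1.0)


def g_range(X):
    """Interval extension of g written as 4x(x**2 - 1)."""
    s = sqr(X)
    return mul((4.0 * X[0], 4.0 * X[1]), (s[0] - 1.0, s[1] - 1.0))


def h_range(X):
    s = sqr(X)
    return (12.0 * s[0] - 4.0, 12.0 * s[1] - 4.0)


def contains_zero(X):
    return X[0] <= 0.0 <= X[1]


def stationary_points(X0, tol=1e-8):
    """Enclose every root of g in X0 by range test, Newton step, bisection."""
    work, found = [X0], []
    while work:
        lo, hi = work.pop()
        if not contains_zero(g_range((lo, hi))):
            continue  # range test: g cannot vanish on this interval
        if hi - lo < tol:
            found.append((lo, hi))
            continue
        H = h_range((lo, hi))
        mid = 0.5 * (lo + hi)
        if not contains_zero(H):
            # Newton image N = mid - g(mid)/H, intersected with the interval
            gm = g_point(mid)
            q = sorted((gm / H[0], gm / H[1]))
            nlo = max(lo, mid - q[1] - PAD)
            nhi = min(hi, mid - q[0] + PAD)
            if nlo > nhi:
                continue  # empty intersection: no root in this interval
            if nhi - nlo < 0.9 * (hi - lo):
                work.append((nlo, nhi))
                continue
        work.append((lo, mid))  # otherwise bisect
        work.append((mid, hi))
    return sorted(found)


def classify(X):
    """Classify an enclosed stationary point by the sign of h over X."""
    H = h_range(X)
    if H[0] > 0.0:
        return "minimum"
    if H[1] < 0.0:
        return "maximum"
    return "undetermined"


boxes = stationary_points((-2.0, 2.5))
for box in boxes:
    print(box, classify(box))
```

Only the initial interval (−2, 2.5) is supplied; no starting points are needed. In three dimensions the same pattern applies with interval vectors, an interval Hessian, and eigenvalue-based classification, at correspondingly higher cost.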
4.4.1 Problem Formulation
Zeolites are materials in which AlO₄ and SiO₄ tetrahedra are the building blocks of a variety of complex porous structures characterized by interconnected cavities and channels of molecular dimensions [83]. Silicalite contains no aluminum and thus no cations. This has made it a common and convenient choice as a model zeolite system. The crystal structure of silicalite, well known from X-ray diffraction studies [84], forms a three-dimensional interconnected pore network through which a sorbate molecule can diffuse. In this work, the phase with orthorhombic symmetry is considered, and a rigid lattice model, in which all silicon and oxygen atoms in the zeolite framework occupy fixed positions and there is perfect crystallinity, is assumed. One spherical sorbate molecule (united atom) will be placed in the lattice, corresponding to infinitely dilute diffusion. The system comprises 27 unit cells, each of which is 20.07 × 19.92 × 13.42 Å with 96 silicon atoms and 192 oxygen atoms.
All interactions between the sorbate and the oxygen atoms of the lattice are treated atomistically with a truncated Lennard–Jones 6–12 potential. That is, for the interaction between the sorbate and oxygen atom i, the potential is given by
\[ V_i = \begin{cases} \dfrac{a}{r_i^{12}} - \dfrac{b}{r_i^{6}}, & r_i < r_{\mathrm{cut}} \\ 0, & r_i \ge r_{\mathrm{cut}}, \end{cases} \tag{8} \]
where $a$ is a repulsion parameter, $b$ is an attraction parameter, $r_{\mathrm{cut}}$ is the cutoff distance, and $r_i$ is the distance between the sorbate and oxygen atom i. This distance is given by
\[ r_i^2 = (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2, \tag{9} \]
where $(x, y, z)$ are the Cartesian coordinates of the sorbate, and $(x_i, y_i, z_i)$, $i = 1, \ldots, N$, are the Cartesian coordinates of the N oxygen atoms. The silicon atoms, being recessed within the SiO₄ tetrahedra, are
neglected in the potential function. Therefore, the total potential energy, V, of a single sorbate molecule in the absence of neighboring sorbate molecules is represented by a sum over all lattice oxygens:
\[ V = \sum_{i=1}^{N} V_i. \tag{10} \]
The interval-Newton approach is applied to determine the sorbate locations (x, y, z) that are stationary points on the potential energy surface V given by equation (10), i.e., to solve the nonlinear equation system ∇V = 0. To achieve tighter interval extensions of the potential function and its derivatives, and thus improve the performance of the interval-Newton method, the mathematical properties of the Lennard–Jones potential and its first- and second-order derivatives can be exploited, as described in detail by Lin and Stadtherr [76].
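As a point of reference, equations (8)–(10) can be evaluated as in the sketch below. The parameter values and atom positions are hypothetical placeholders, not the silicalite data. The last function illustrates one property that can tighten an interval extension: the untruncated pair potential has a single minimum at r_min = (2a/b)^(1/6), so its exact range over a distance interval follows from the endpoints and that minimum. This is offered as an illustration of the kind of property involved, not as the specific technique of [76].

```python
import math


def lj_pair(r, a, b, r_cut):
    """Truncated Lennard-Jones 6-12 pair potential, equation (8)."""
    if r >= r_cut:
        return 0.0
    return a / r**12 - b / r**6


def total_potential(sorbate, oxygens, a, b, r_cut):
    """Sum of pair potentials over all lattice oxygens, equations (9)-(10)."""
    return sum(lj_pair(math.dist(sorbate, o), a, b, r_cut) for o in oxygens)


def lj_pair_range(r_lo, r_hi, a, b):
    """Exact range of a/r**12 - b/r**6 on [r_lo, r_hi] (cutoff ignored).

    The function decreases for r < r_min and increases for r > r_min,
    with r_min = (2a/b)**(1/6) and minimum value -b**2/(4a).
    """
    r_min = (2.0 * a / b) ** (1.0 / 6.0)
    ends = (a / r_lo**12 - b / r_lo**6, a / r_hi**12 - b / r_hi**6)
    lower = -b * b / (4.0 * a) if r_lo <= r_min <= r_hi else min(ends)
    return (lower, max(ends))


# Hypothetical parameters and oxygen positions (placeholders only).
A_REP, B_ATT, R_CUT = 1.0, 2.0, 3.0
oxygens = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 0.0, 0.0)]

v_origin = total_potential((0.0, 0.0, 0.0), oxygens, A_REP, B_ATT, R_CUT)
v_bounds = lj_pair_range(0.9, 1.5, A_REP, B_ATT)
```

With these placeholder values the two oxygens at unit distance each contribute the pair minimum, while the third lies beyond the cutoff and contributes nothing.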
4.4.2 Results and Discussion
Due to the orthorhombic symmetry of the silicalite lattice, the search space for stationary points is only one asymmetric unit, [0, 10.035] × [0, 4.98] × [0, 13.42] Å, which is one-eighth of a unit cell. This defines the initial interval for the interval-Newton method, namely $X^{(0)} = [0, 10.035]$ Å, $Y^{(0)} = [0, 4.98]$ Å, and $Z^{(0)} = [0, 13.42]$ Å. Following June et al. [77], stationary points with extremely high potential, such as V > 0, will not be sought. To do this, we calculate the interval extension of V over the interval currently being tested. If its lower bound is greater than zero, the current interval is discarded.
Using the interval-Newton method, with the linear programming strategy of Lin and Stadtherr [42], a total of 15 stationary points were found in a computation time of 724 s (1.7-GHz Intel Xeon). The locations of the stationary points, their energy values, and their types are provided in Table 4.3. Five local minima were found, along with eight first-order saddle points and two second-order saddle points. June et al. [77] report the same five local minima, as well as nine of the ten saddle points. They do not report finding the lower energy second-order saddle point (saddle point #14 in Table 4.3).

Table 4.3 Stationary points of the potential energy surface of xenon in silicalite
No.  Type          Energy (kcal/mol)  x (Å)    y (Å)    z (Å)
1    Minimum       −5.9560            3.9956   4.9800   12.1340
2    Minimum       −5.8763            0.3613   0.9260    6.1112
3    Minimum       −5.8422            5.8529   4.9800   10.8790
4    Minimum       −5.7455            1.4356   4.9800   11.5540
5    Minimum       −5.1109            0.4642   4.9800    6.0635
6    First order   −5.7738            5.0486   4.9800   11.3210
7    First order   −5.6955            0.0000   0.0000    6.7100
8    First order   −5.6060            2.3433   4.9800   11.4980
9    First order   −4.7494            0.1454   3.7957    6.4452
10   First order   −4.3057            9.2165   4.9800   11.0110
11   First order   −4.2380            0.0477   3.9147    8.3865
12   First order   −4.2261            8.6361   4.9800   12.8560
13   First order   −4.1405            0.5925   4.9800    8.0122
14   Second order  −4.1404            0.5883   4.8777    8.0138
15   Second order  −4.1027            9.1881   4.1629   11.8720

The second-order saddle point #14, not reported by June et al. [77], is very close to the first-order saddle point #13 and, according to Table 4.3, is slightly higher in energy (−4.1404 versus −4.1405 kcal/mol). Apparently, neither of the two methods tried by June et al. [77] was able to locate this point. The first method they tried uses the same grid-based optimization scheme used to locate local minima in V, but instead applied to minimize gᵀg. However, stationary points #13 and #14 are approximately 0.1 Å apart, while the grid spacing they used was approximately 0.2 Å. This illustrates the danger in using grid-based schemes for finding all solutions to a problem. By using the interval methods described here, one never needs to be concerned about whether a grid spacing is fine enough to find all solutions. The second method they tried was Baker's algorithm [79], as described briefly above, but it is unclear how they initialized the algorithm. A key advantage of the interval method is that no point initialization is required. Only an initial interval must be supplied, here corresponding to one asymmetric unit, and this is determined by the geometry of the zeolite lattice. Thus, in this context, the interval method is initialization independent.
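The grid-spacing argument can be checked with a short calculation. The sketch below first computes the separation of stationary points #13 and #14 from their Table 4.3 coordinates. It then uses a hypothetical one-dimensional function (an assumption, unrelated to the silicalite surface) with two roots 0.05 apart to show how a sign-change scan on a grid of spacing 0.2 can miss both roots, while a finer grid detects them.

```python
import math

# Coordinates (in angstroms) of stationary points #13 and #14 from Table 4.3.
p13 = (0.5925, 4.9800, 8.0122)
p14 = (0.5883, 4.8777, 8.0138)
separation = math.dist(p13, p14)
print(f"separation of #13 and #14: {separation:.2f} angstroms")


def g(x):
    """Hypothetical 1D 'gradient' with roots at 0.51 and 0.56."""
    return (x - 0.51) * (x - 0.56)


def sign_change_cells(f, lo, hi, n_cells):
    """Return the grid cells (x_k, x_k+1) across which f changes sign."""
    nodes = [lo + (hi - lo) * k / n_cells for k in range(n_cells + 1)]
    return [(u, v) for u, v in zip(nodes, nodes[1:]) if f(u) * f(v) < 0.0]


coarse = sign_change_cells(g, 0.0, 1.0, 5)   # spacing 0.2: both roots in one cell
fine = sign_change_cells(g, 0.0, 1.0, 40)    # spacing 0.025: roots in separate cells
print(len(coarse), len(fine))
```

The coarse scan sees the same sign at both ends of the cell containing the two roots, so nothing triggers a local search there. An interval range test on that cell would instead report that zero cannot be excluded.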
4.4.3 Remarks
Lin and Stadtherr [42] have also studied two other sorbate–zeolite systems and used the interval method to find all stationary points on the potential energy surfaces. While we have concentrated here on problems involving transition-state analysis of diffusion in zeolites, we anticipate that the method will be useful in many other types of problems in which transition-state theory is applied.
4.5 Food Web Models
Ecological models, including models of food webs, are increasingly used as aids in the management and assessment of ecological risks. As a first step in using a food web model, an understanding is needed of the predicted equilibrium states (steady states) and their stability. To determine the equilibrium states, a system of nonlinear equations must be solved, with the number of solutions often not known a priori. Finding bifurcations of equilibria (parameter values at which the number of equilibrium states or their stability changes) is another problem of interest, which can also be formulated as a nonlinear equation solving problem. For both these problems, continuation methods are typically used, but they are initialization dependent and provide no guarantees that all solutions will be found. Gwaltney et al. [85] and Gwaltney and Stadtherr [86] have demonstrated the use of an interval-Newton method to find equilibrium states and their bifurcations for some simple food chain models. Interval methods have also been successfully applied to the problem of locating equilibrium states and singularities in traditional chemical engineering problems, such as reaction and reactive distillation systems [87–90]. We will consider here a seven-species food web and use an interval-Newton approach, as outlined at the end of Section 4.2, to solve for all steady states predicted by the model.
4.5.1 Problem Formulation
The seven-species food web is shown schematically in Figure 4.1. It involves two producers (species 1 and 2) and five consumers (species 3–7). The producers are assumed to grow logistically, while the consumers obey predator response functions that will be specified below.
Figure 4.1 Diagram illustrating the predation relationships in the seven-species food web model
The model equations (balance equations) are, for $i = 1, \ldots, 7$,
\[ \frac{dm_i}{dt} = f_i(\mathbf{m}) = m_i\, g_i(\mathbf{m}) = m_i \left( r_i + \sum_{j=1}^{7} a_{ij}\, p_{ij}(\mathbf{m}) \right). \tag{11} \]
Here the variables are the species biomasses $m_i$, $i = 1, \ldots, 7$, which are the components of the biomass vector $\mathbf{m}$. The constants $a_{ij}$ represent combinations of different model parameters, and also indicate the structure of the food web. The constants $r_i$ consist of intrinsic growth and death rate parameters. The functions $p_{ij}(\mathbf{m})$ are determined by the choice of predator response function for the predator–prey interaction involving species i and j. For the 1–3 interaction, we assume a hyperbolic response function (Holling type II). This leads to $p_{13}(\mathbf{m}) = m_3/(m_1 + B_{13})$ and $p_{31}(\mathbf{m}) = m_1/(m_1 + B_{13})$, where $B_{13}$ is the half-saturation constant for consumption of species 1 by species 3. For all other interactions, we assume a linear response function (Lotka–Volterra), giving $p_{ij}(\mathbf{m}) = m_j$. Values of all constants in the model are given by Gwaltney [91]. To determine the equilibrium states predicted by this model, solution of the nonlinear equation system
\[ f_i(\mathbf{m}) = m_i\, g_i(\mathbf{m}) = 0, \qquad i = 1, \ldots, 7, \tag{12} \]
is required.
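The structure of equations (11) and (12) can be sketched for a much smaller system. The example below is a hypothetical two-species chain (one logistic producer and one consumer, both with linear Lotka–Volterra responses). The parameter values are invented for illustration and are not those of Gwaltney [91], and the Holling type II response used for the 1–3 interaction is not included. Indices are zero-based in the code.

```python
def linear_response(i, j, m):
    """Lotka-Volterra response function: p_ij(m) = m_j."""
    return m[j]


def growth_rates(m, r, a, p=linear_response):
    """g_i(m) = r_i + sum_j a_ij * p_ij(m), as in equation (11)."""
    n = len(m)
    return [r[i] + sum(a[i][j] * p(i, j, m) for j in range(n))
            for i in range(n)]


def rhs(m, r, a, p=linear_response):
    """f_i(m) = m_i * g_i(m), the right-hand side of equation (11)."""
    return [mi * gi for mi, gi in zip(m, growth_rates(m, r, a, p))]


# Hypothetical parameters: producer (index 0) with intrinsic growth rate 1.0
# and carrying capacity 10, consumer (index 1) with death rate 0.2.
r = [1.0, -0.2]
a = [[-0.1, -0.5],
     [0.1, 0.0]]

# Three steady states of this toy model, i.e., solutions of equation (12).
extinct = [0.0, 0.0]
producer_only = [10.0, 0.0]
coexistence = [2.0, 1.6]

for state in (extinct, producer_only, coexistence):
    print(state, rhs(state, r, a))
```

Each steady state satisfies equation (12) in one of two ways per species: either the biomass is zero or the corresponding growth rate vanishes.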
4.5.2 Results and Discussion
There are two basic strategies for solving the equation system. In the simultaneous strategy, we simply solve equation (12) directly as a system of seven equations in seven variables. In the sequential strategy, a sequence of smaller problems is solved, one for each feasible zero–nonzero state. A set of feasible zero–nonzero states can be constructed from the structure of the food web. For example, the state [1030060] (indicating that species 1, 3, and 6 have nonzero biomasses and that species 2, 4, 5, and 7 are absent) is feasible. However, the state [1204067] is not feasible, since in the absence of species 3 and 5, species 6 cannot exist. For a relatively small food web, it is not difficult to construct the set of feasible zero–nonzero states. However, for large food webs this is nontrivial, as the number of such states can become very large. For the seven-species web of interest here, there are 55 feasible zero–nonzero states. For each zero–nonzero state, an equation system is formulated to solve for the corresponding steady states. For example, for the [1030060] state, $m_1 \neq 0$; thus, it is required that $g_1 = 0$. Similarly, $g_3 = 0$ and $g_6 = 0$. This provides three equations in the three nonzero variables $m_1$, $m_3$, and $m_6$. The remaining components of equation (12) are satisfied because $m_2 = m_4 = m_5 = m_7 = 0$.
An interval-Newton approach was used to solve the nonlinear equation system (12) in connection with both the simultaneous and sequential approaches. This was done for several different values of the model parameter $K_2$, the carrying capacity for producer species 2. A partial set of results ($m_1$ and $m_2$ only) is shown in Figure 4.2. It is clear that for a particular value of $K_2$, there are often several steady states. When nonlinear predator response functions are used, the number of steady states is also unknown a priori. The interval method provides a means to guarantee that all steady-state solutions will be found.
When the simultaneous approach was used, and a single 7 × 7 equation system solved, the CPU time required for each value of $K_2$ averaged about 60 s (3.2-GHz Pentium 4). When the sequential approach was used, and a sequence of many smaller systems solved, the CPU time required for each value of $K_2$ averaged about 0.02 s. Clearly, it is much more effective to use a sequential strategy. For further discussion of this problem and an interpretation of the results, see Gwaltney [91].
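The first step of the sequential strategy, constructing the feasible zero–nonzero states, can be sketched as follows. The web used here is a hypothetical four-species example, not the web of Figure 4.1, and the feasibility rule (a consumer can be present only if at least one of its prey is present) is an assumption modeled on the species 6 example in the text. The all-absent state is counted as feasible in this sketch. Labels follow the bracket notation of the text, with 0 marking an absent species.

```python
from itertools import product

# Hypothetical four-species web (NOT the web of Figure 4.1):
# species 1 and 2 are producers, species 3 eats 1, species 4 eats 2 and 3.
PREY = {1: [], 2: [], 3: [1], 4: [2, 3]}


def is_feasible(present):
    """A consumer may be present only if at least one of its prey is present."""
    return all(not PREY[s] or any(q in present for q in PREY[s])
               for s in present)


def feasible_states(n_species=4):
    """Enumerate labels such as '1030' for all feasible zero-nonzero states."""
    species = range(1, n_species + 1)
    states = []
    for flags in product([False, True], repeat=n_species):
        present = {s for s, on in zip(species, flags) if on}
        if is_feasible(present):
            states.append("".join(str(s) if s in present else "0"
                                  for s in species))
    return states


states = feasible_states()
print(len(states), states)
```

For each label, only the growth-rate equations of the present species need to be solved, which is why the sequential strategy works with many small systems rather than one large one. The enumeration itself grows as 2ⁿ, consistent with the remark that constructing the set is nontrivial for large webs.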
4.5.3 Remarks
In computing the equilibrium states in nonlinear food web models, it is possible to have a very large number of solutions. For example, Gwaltney [91] also considered a food web with 12 species and explicit resource dynamics (4 nutrients). For some sets of parameter values, well over 300 steady-state solutions were found by using the sequential approach with an interval-Newton method. In cases for which a large number of solutions is possible, and the number of solutions is not known, the use of interval methods for nonlinear equation solving is an attractive approach for ensuring that no solutions will be missed.
Figure 4.2 Solution branch diagrams illustrating the change in the steady-state biomass values of species 1 ($m_1$) and species 2 ($m_2$) with change in species 2 carrying capacity ($K_2$) for the seven-species food web model. Black lines indicate stable equilibria. Gray lines indicate unstable equilibria. (Two panels, $m_1$ and $m_2$; horizontal axis: producer species 2 carrying capacity ($K_2$), 0 to 5000; vertical axis: species biomass ($m_i$), 0 to 8000.)
4.6 Concluding Remarks
In the examples presented here, we have shown that an interval method for nonlinear equation solving, in particular an approach incorporating the interval-Newton method, is a powerful approach for the solution of nonlinear equation systems. The method provides a mathematical and computational guarantee that all solutions within a specified initial interval are enclosed. Continuing improvements in solution methods, together with advances in software and hardware for the use of intervals, will make this an increasingly attractive problem-solving tool. The verification provided by the interval approach comes at the expense of additional computation time. Essentially one has a choice between fast methods that may give an incorrect or incomplete answer, or a slower method that is guaranteed to give the correct results. Thus, a modeler may need to consider the trade-off between the additional computing time and the risk of getting the wrong answer to a problem. Certainly, for 'mission critical' situations, the additional computing expense is well spent.
Acknowledgments
This work was supported in part by the Department of Education Graduate Assistance in Areas of National Needs (GAANN) Program under Grant #P200A010448, by the donors of the Petroleum Research Fund, administered by the ACS, under Grant 35979AC9, by the State of Indiana 21st Century Research and Technology Fund under Grant #909010455, and by the National Oceanic and Atmospheric Administration under Grant #NA050AR4601153.
References
[1] A. Bargiela and W. Pedrycz. Granular Computing: An Introduction. Kluwer Academic Publishers, Norwell, MA, 2003.
[2] R.E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, NJ, 1966.
[3] E.R. Hansen and G.W. Walster. Global Optimization Using Interval Analysis. Marcel Dekker, New York, 2004.
[4] J.D. Pryce and G.F. Corliss. Interval arithmetic with containment sets. Computing 78 (2006) 251–276.
[5] L. Jaulin, M. Kieffer, O. Didrit, and É. Walter. Applied Interval Analysis. Springer-Verlag, London, 2001.
[6] R.B. Kearfott. Interval computations: introduction, uses, and resources. Euromath. Bull. 2 (1996) 95–112.
[7] R.B. Kearfott. Rigorous Global Search: Continuous Problems. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996.
[8] A. Neumaier. Interval Methods for Systems of Equations. Cambridge University Press, Cambridge, UK, 1990.
[9] R.B. Kearfott, M. Dawande, K. Du, and C. Hu. INTLIB: a portable Fortran-77 elementary function library. Interval Comput. 3 (1992) 96–105.
[10] R.B. Kearfott, M. Dawande, K. Du, and C. Hu. Algorithm 737: INTLIB: a portable Fortran-77 elementary function library. ACM Trans. Math. Softw. 20 (1994) 447–459.
[11] R.B. Kearfott. Algorithm 763: INTERVAL ARITHMETIC: a Fortran-90 module for an interval data type. ACM Trans. Math. Softw. 22 (1996) 385–392.
[12] PROFIL/BIAS, http://www.ti3.tuharburg.de/Software/PROFILEnglisch.html, accessed January 15, 2008.
[13] M. Lerch, G. Tischler, J. Wolff von Gudenberg, W. Hofschuster, and W. Krämer. FILIB++, a fast interval library supporting containment computations. ACM Trans. Math. Softw. 32 (2006) 299–324.
[14] S.M. Rump. INTLAB – INTerval LABoratory. In: T. Csendes (ed), Developments in Reliable Computing. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999, pp. 77–104.
[15] H. Ratschek and J. Rokne. Computer Methods for the Range of Functions. Ellis Horwood, Chichester, UK, 1984.
[16] E. Hansen. Sharpening interval computations. Reliab. Comput. 12 (2006) 21–34.
[17] V.M. Nesterov. How to use monotonicity-type information to get better estimates of the range of real-valued functions. Interval Comput. 4 (1993) 3–12.
[18] K. Makino and M. Berz. Efficient control of the dependency problem based on Taylor model methods. Reliab. Comput. 5 (1999) 3–12.
[19] A. Neumaier. Taylor forms – use and limits. Reliab. Comput. 9 (2002) 43–79.
[20] K. Yamamura. Finding all solutions of nonlinear equations using linear combinations of functions. Reliab. Comput. 6 (2000) 105–113.
[21] R.B. Kearfott. Validated constraint solving – practicalities, pitfalls, and new developments. Reliab. Comput. 11 (2005) 383–391.
[22] S. Herbort and D. Ratz. Improving the Efficiency of a Nonlinear-System-Solver Using a Componentwise Newton Method. Bericht 2/1997. Institut für Angewandte Mathematik, Universität Karlsruhe (TH), Karlsruhe, Germany, 1997.
[23] L. Granvilliers. On the combination of interval constraint solvers. Reliab. Comput. 7 (2001) 467–483.
[24] L. Granvilliers and F. Benhamou. Algorithm 852: RealPaver: an interval solver using constraint satisfaction techniques. ACM Trans. Math. Softw. 32 (2006) 138–156.
[25] P. van Hentenryck, L. Michel, and Y. Deville. Numerica: A Modeling Language for Global Optimization. The MIT Press, Cambridge, MA, 1997.
[26] ICOS, http://ylebbah.googlepages.com/icos, accessed February 26, 2008.
[27] G. Alefeld. Interval arithmetic tools for range approximation and inclusion of zeros. In: H. Bulgak and C. Zenger (eds), Error Control and Adaptivity in Scientific Computing. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999, pp. 1–21.
[28] R.E. Moore. A test for existence of solutions to nonlinear systems. SIAM J. Numer. Anal. 14 (1977) 611–615.
[29] R.E. Moore. Methods and Applications of Interval Analysis. SIAM, Philadelphia, 1979.
[30] S.P. Shary. Krawczyk operator revised. In: Proceedings of International Conference on Computational Mathematics ICCM2004, Novosibirsk, Russia, June 21–25, 2004, Institute of Computational Mathematics and Mathematical Geophysics (ICM&MG), 2004.
[31] L. Simcik and P. Linz. Boundary-based interval Newton's method. Interval Comput. 4 (1993) 89–99.
[32] K. Min, L. Qi, and S. Zuhe. On the componentwise Krawczyk-Moore iteration. Reliab. Comput. 5 (1999) 359–370.
[33] G. Alefeld and J. Herzberger. Introduction to Interval Computations. Academic Press, New York, 1983.
[34] N.S. Dimitrova and S.M. Markov. On validated Newton type method for nonlinear equations. Interval Comput. 1994(2) (1994) 27–51.
[35] E. Hansen and S. Sengupta. Bounding solutions of systems of equations using interval-analysis. BIT 21 (1981) 203–211.
[36] E.R. Hansen and R.I. Greenberg. An interval Newton method. Appl. Math. Comput. 12 (1983) 89–98.
[37] R.B. Kearfott. Preconditioners for the interval Gauss-Seidel method. SIAM J. Numer. Anal. 27 (1990) 804–822.
[38] R.B. Kearfott, C. Hu, and M. Novoa, III. A review of preconditioners for the interval Gauss-Seidel method. Interval Comput. (1) (1991) 59–85.
[39] C.Y. Gau and M.A. Stadtherr. New interval methodologies for reliable chemical process modeling. Comput. Chem. Eng. 26 (2002) 827–840.
[40] R.B. Kearfott and M. Novoa III. Algorithm 681: INTBIS, a portable interval Newton/bisection package. ACM Trans. Math. Softw. 16 (1990) 152–157.
[41] Y. Lin and M.A. Stadtherr. Advances in interval methods for deterministic global optimization in chemical engineering. J. Glob. Optim. 29 (2004) 281–296.
[42] Y. Lin and M.A. Stadtherr. LP strategy for the interval-Newton method in deterministic global optimization. Ind. Eng. Chem. Res. 43 (2004) 3741–3749.
[43] S.M. Rump. Verification methods for dense and sparse systems of equations. In: J. Herzberger (ed), Topics in Validated Computations – Studies in Computational Mathematics. Elsevier, Amsterdam, The Netherlands, 1994, pp. 63–135.
[44] Y.G. Dolgov. Developing interval global optimization algorithms on the basis of branch-and-bound and constraint propagation methods. Reliab. Comput. 11 (2005) 343–358.
[45] L.V. Kolev. A new method for global solution of systems of nonlinear equations. Reliab. Comput. 4 (1998) 125–146.
[46] L.V. Kolev. An improved method for global solution of nonlinear systems. Reliab. Comput. 5 (1999) 103–111.
[47] C.Y. Gau and M.A. Stadtherr. Dynamic load balancing for parallel interval-Newton using message passing. Comput. Chem. Eng. 26 (2002) 811–825.
[48] V. Kreinovich and A. Bernat. Parallel algorithms for interval computations: An introduction. Interval Comput. 3 (1994) 6–62.
[49] C.A. Schnepper and M.A. Stadtherr. Application of a parallel interval Newton/generalized bisection algorithm to equation-based chemical process flowsheeting. Interval Comput. 4 (1993) 40–64.
[50] G.I. Burgos-Solórzano, J.F. Brennecke, and M.A. Stadtherr. Validated computing approach for high-pressure chemical and multiphase equilibrium. Fluid Phase Equilib. 219 (2004) 245–255.
[51] G. Xu, W.D. Haynes, and M.A. Stadtherr. Reliable phase stability analysis for asymmetric models. Fluid Phase Equilib. 235 (2005) 152–165.
[52] C.Y. Gau, J.F. Brennecke, and M.A. Stadtherr. Reliable nonlinear parameter estimation in VLE modeling. Fluid Phase Equilib. 168 (2000) 1–18.
[53] C.Y. Gau and M.A. Stadtherr. Reliable nonlinear parameter estimation using interval analysis: error-in-variable approach. Comput. Chem. Eng. 24 (2000) 631–637.
[54] J.Z. Hua, J.F. Brennecke, and M.A. Stadtherr. Reliable phase stability analysis for cubic equation of state models. Comput. Chem. Eng. 20 (1996) S395–S400.
[55] J.Z. Hua, J.F. Brennecke, and M.A. Stadtherr. Reliable prediction of phase stability using an interval-Newton method. Fluid Phase Equilib. 116 (1996) 52–59.
[56] J.Z. Hua, J.F. Brennecke, and M.A. Stadtherr. Enhanced interval analysis for phase stability: cubic equation of state models. Ind. Eng. Chem. Res. 37 (1998) 1519–1527.
[57] J.Z. Hua, J.F. Brennecke, and M.A. Stadtherr. Reliable computation of phase stability using interval analysis: Cubic equation of state models. Comput. Chem. Eng. 22 (1998) 1207–1214.
[58] J.Z. Hua, R.W. Maier, S.R. Tessier, J.F. Brennecke, and M.A. Stadtherr. Interval analysis for thermodynamic calculations in process design: a novel and completely reliable approach. Fluid Phase Equilib. 158 (1999) 607–615.
[59] R.W. Maier, J.F. Brennecke, and M.A. Stadtherr. Reliable computation of homogeneous azeotropes. AIChE J. 44 (1998) 1745–1755.
[60] R.W. Maier, J.F. Brennecke, and M.A. Stadtherr. Reliable computation of reactive azeotropes. Comput. Chem. Eng. 24 (2000) 1851–1858.
[61] K.I.M. McKinnon, C.G. Millar, and M. Mongeau. Global optimization for the chemical and phase equilibrium problem using interval analysis. In: C.A. Floudas and P.M. Pardalos (eds), State of the Art in Global Optimization Computational Methods and Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996.
[62] A.M. Scurto, G. Xu, J.F. Brennecke, and M.A. Stadtherr. Phase behavior and reliable computation of high-pressure solid-fluid equilibrium with cosolvents. Ind. Eng. Chem. Res. 42 (2003) 6464–6475.
[63] M.A. Stadtherr, C.A. Schnepper, and J.F. Brennecke. Robust phase stability analysis using interval methods. AIChE Symp. Ser. 91(304) (1995) 356–359.
[64] B.A. Stradi, J.F. Brennecke, J.P. Kohn, and M.A. Stadtherr. Reliable computation of mixture critical points. AIChE J. 47 (2001) 212–221.
[65] S.R. Tessier, J.F. Brennecke, and M.A. Stadtherr. Reliable phase stability analysis for excess Gibbs energy models. Chem. Eng. Sci. 55 (2000) 1785–1796.
[66] G. Xu, J.F. Brennecke, and M.A. Stadtherr. Reliable computation of phase stability and equilibrium from the SAFT equation of state. Ind. Eng. Chem. Res. 41 (2002) 938–952.
[67] G. Xu, A.M. Scurto, M. Castier, J.F. Brennecke, and M.A. Stadtherr. Reliable computation of high-pressure solid-fluid equilibrium. Ind. Eng. Chem. Res. 39 (2000) 1624–1636.
[68] H. Renon and J.M. Prausnitz. Local compositions in thermodynamic excess functions for liquid mixtures. AIChE J. 14 (1968) 135–144.
[69] J.M. Sørensen and W. Arlt. Liquid-Liquid Equilibrium Data Collection. Chemistry Data Series, Vol. V, Parts 1–3. DECHEMA, Frankfurt/Main, Germany, 1979–1980.
[70] J. Jacq and L. Asselineau. Binary liquid-liquid equilibria. Multiple solutions for the NRTL equation. Fluid Phase Equilib. 14 (1983) 185–192.
[71] R.A. Heidemann and J.M. Mandhane. Some properties of the NRTL equation in correlating liquid-liquid equilibrium data. Chem. Eng. Sci. 28 (1973) 1213–1221.
[72] A.C. Mattelin and L.A.J. Verhoeye. The correlation of binary miscibility data by means of the NRTL equation. Chem. Eng. Sci. 30 (1975) 193–200.
[73] C. Lavor. A deterministic approach for global minimization of molecular potential energy functions. Int. J. Quantum Chem. 95 (2003) 336–343.
[74] Y. Lin and M.A. Stadtherr. Deterministic global optimization of molecular structures using interval analysis. J. Comput. Chem. 26 (2005) 1413–1420.
[75] R.W. Maier and M.A. Stadtherr. Reliable density-functional-theory calculations of adsorption in nanoscale pores. AIChE J. 47 (2001) 1874–1884.
[76] Y. Lin and M.A. Stadtherr. Locating stationary points of sorbate-zeolite potential energy surfaces using interval analysis. J. Chem. Phys. 121 (2004) 10159–10166.
[77] R.L. June, A.T. Bell, and D.N. Theodorou. Transition-state studies of xenon and SF6 diffusion in silicalite. J. Phys. Chem. 95 (1991) 8866–8878.
[78] C.J. Tsai and K.D. Jordan. Use of an eigenmode method to locate the stationary-points on the potential-energy surfaces of selected argon and water clusters. J. Phys. Chem. 97 (1993) 11227–11237.
[79] J. Baker. An algorithm for the location of transition-states. J. Comput. Chem. 7 (1986) 385–395.
[80] K.M. Westerberg and C.A. Floudas. Locating all transition states and studying the reaction pathways of potential energy surfaces. J. Chem. Phys. 110 (1999) 9259–9295.
[81] C.S. Adjiman, I.P. Androulakis, and C.A. Floudas. A global optimization method, αBB, for general twice-differentiable constrained NLPs – II. Implementation and computational results. Comput. Chem. Eng. 22 (1998) 1159–1179.
[82] C.S. Adjiman, S. Dallwig, C.A. Floudas, and A. Neumaier. A global optimization method, αBB, for general twice-differentiable constrained NLPs – I. Theoretical advances. Comput. Chem. Eng. 22 (1998) 1137–1158.
[83] J. Karger and D.M. Ruthven. Diffusion in Zeolites and Other Microporous Solids. Wiley, New York, 1992.
[84] D.H. Olson, G.T. Kokotailo, S.L. Lawton, and W.M. Meier. Crystal structure and structure-related properties of ZSM-5. J. Phys. Chem. 85 (1981) 2238–2243.
[85] C.R. Gwaltney, M.P. Styczynski, and M.A. Stadtherr. Reliable computation of equilibrium states and bifurcations in food chain models. Comput. Chem. Eng. 28 (2004) 1981–1996.
[86] C.R. Gwaltney and M.A. Stadtherr. Reliable computation of equilibrium states and bifurcations in nonlinear dynamics. Lect. Notes Comput. Sci. 3732 (2006) 122–131.
[87] C.H. Bischof, B. Lang, W. Marquardt, and M. Mönnigmann. Verified determination of singularities in chemical processes. Presented at SCAN 2000, 9th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics, Karlsruhe, Germany, September 18–22, 2000.
[88] V. Gehrke and W. Marquardt. A singularity theory approach to the study of reactive distillation. Comput. Chem. Eng. 21 (1997) S1001–S1006.
[89] M. Mönnigmann and W. Marquardt. Normal vectors on manifolds of critical points for parametric robustness of equilibrium solutions of ODE systems. J. Nonlinear Sci. 12 (2002) 85–112.
[90] C.A. Schnepper and M.A. Stadtherr. Robust process simulation using interval methods. Comput. Chem. Eng. 20 (1996) 187–199.
[91] C.R. Gwaltney. Reliable Location of Equilibrium States and Bifurcations in Nonlinear Dynamical Systems with Applications in Food Web Modeling and Chemical Engineering. Ph.D. Thesis. University of Notre Dame, Notre Dame, IN, 2006.
5 Fuzzy Sets as a User-Centric Processing Framework of Granular Computing
Witold Pedrycz
5.1 Introduction
This chapter serves as a general introduction to fuzzy sets, regarded as one of the key technologies of granular computing. Fuzzy sets are information granules modeled by the underlying concept of partial membership. Partial membership is crucial to a variety of everyday phenomena. Linguistic concepts are inherently nonbinary. In this way fuzzy sets provide a badly needed formalism rooted in many-valued logic. The material is organized in the following way. We start with some general observations (Section 5.2) by highlighting the origin of fuzzy sets, linking them to the concept of dichotomy, and underlining the central role of fuzzy sets in system modeling. In Section 5.3, we offer some generic descriptors of fuzzy sets (membership functions). Section 5.4 is concerned with the notion of granulation of information, where we elaborate on linkages between the various formalisms being used. Characterizations of families of fuzzy sets are presented in Section 5.5. Next, in Section 5.6, we elaborate on some selected, yet highly representative, methods of membership function estimation by distinguishing between expert-driven and data-driven estimation methods. Granular modeling is presented in Section 5.7. Concluding comments are covered in Section 5.8.
5.2 General Observations
The concept of dichotomy has become profoundly imprinted into our education, philosophy, and many branches of science, management, and engineering. While the formalism and vocabulary of Boolean concepts, effective in handling various discrimination processes involving binary quantification (yes–no, true–false), have been with us from the very beginning of our education, in many cases it becomes evident that this limited, two-valued view of the world is overly simplified and in many circumstances lacks the required rapport with reality. In the real world, there is nothing like black–white, good–bad, etc. All of us recognize that the notion of dichotomy is quite simple and does not look realistic. Concepts do not possess sharp boundaries. Definitions are not binary unless they tackle very
simple concepts (say, odd–even numbers). Let us allude here to the observation made by B. Russell (1923): ‘[T]he law of excluded middle is true when precise symbols are employed, but it is not true when symbols are vague, as, in fact, all symbols are.’ In reality, we use terms whose complexity is far higher and which depart from the principle of dichotomy. Consider notions used in everyday life such as warm weather, low inflation, and long delay. How could you define them if you were to draw a single line? Is 25 °C warm? Is 24.9 °C warm? Or is 24.95 °C warm as well? Likewise, in any image, could you draw a single line to discriminate between objects such as sky, land, trees, and lake? Experimental data do not come in well-formed and distinct clusters; there are always some points in between well-delineated groups. One might argue that those are concepts used in everyday language and therefore they need not possess any substantial level of formalism. Yet one has to admit that concepts that do not adhere to the principle of dichotomy are also visible in science, mathematics, and engineering. For instance, we often carry out a linear approximation of a nonlinear function and make a qualifying statement that such linearization is valid in some small neighborhood of the linearization point. Under these circumstances the principle of dichotomy does not offer much. The principle of dichotomy, or as we say an Aristotelian perspective on the description of the world, has been subject to a continuous challenge, predominantly from the standpoint of philosophy and logic. Let us recall some of the most notable developments which have led to the revolutionary paradigm shift.
Indisputably, the concept of three-valued and multivalued logic put forward by Jan Łukasiewicz [1–3], and then pursued by others including Emil Post, is one of the earliest and most prominent logical attempts made toward abandoning the supremacy of the principle of dichotomy. As noted by Łukasiewicz, the question of the suitability or relevance of two-valued logic in evaluating the truth of propositions was posed in the context of those statements that allude to the future. ‘It will rain tomorrow’: is this statement true? If we can answer this question, this means that we have already predetermined the future. We start to sense that this two-valued model, no matter how convincing it could be, is conceptually limited if not wrong. The non-Aristotelian view of the world was vividly promoted by Alfred Korzybski [4]. While the concept of three-valued logic was revolutionary in the 1920s, we have somewhat quietly endorsed it over the passage of time. For instance, in database engineering, a certain entry may be two-valued (yes–no), but the third option of ‘unknown’ is equally possible; here we simply indicate that no value of this entry has been provided. In light of these examples, it becomes apparent that we need a suitable formalism to cope with these phenomena. Fuzzy sets offer an important and unique feature of describing information granules whose contributing elements may belong with varying degrees of membership (belongingness). This helps us describe the concepts that are commonly encountered in the real world. Notions such as low income, high inflation, small approximation error, and many others are examples of concepts to which the yes–no quantification does not apply or becomes quite artificial and restrictive. We are cognizant that there is no way of fixing Boolean boundaries for such concepts, as there are many elements whose membership in the concept is only partial and quite different from 0 and 1.
The binary view of the world supported by set theory and two-valued logic has been vigorously challenged by philosophy and logic. The revolutionary step in logic was made by Łukasiewicz with his introduction of three-valued and afterward multivalued logic [1]. It took, however, several more decades of dwelling on the ideas of the non-Aristotelian view of the world before fuzzy sets were introduced. This happened in 1965 with the publication of the seminal paper on fuzzy sets by Zadeh [5]. Refer also to other influential papers by Zadeh [6–12]. The concept of a fuzzy set is surprisingly simple and elegant. A fuzzy set A captures its elements by assigning them to it with varying degrees of membership. A so-called membership function is a vehicle that quantifies these different degrees of membership. The higher the degree of membership A(x), the stronger the level of belongingness of this element to A [13–16]. The obvious, yet striking, difference between sets (intervals) and fuzzy sets lies in the notion of partial membership supported by fuzzy sets. In fuzzy sets, we discriminate between elements that are ‘typical’ of the concept and those of borderline character. Information granules such as high speed, warm weather,
and fast car fall under this category and can be conveniently represented by fuzzy sets. As we cannot specify a single, well-defined element that forms a solid border between full belongingness and full exclusion, fuzzy sets offer an appealing alternative and a practical solution to this problem. Fuzzy sets, with their smooth transition boundaries, form an ideal vehicle to capture the notion of partial membership. In this sense, information granules formalized in the language of fuzzy sets support a vast array of human-centric pursuits. They are predisposed to play a vital role when interfacing humans with intelligent systems. In problem formulation and problem solving, fuzzy sets may arise in two fundamentally different ways:
1. Explicit. Here, they typically pertain to some generic and fairly basic concepts we use in our communication and description of reality. There is a vast number of examples among concepts commonly used every day, say short waiting time, large data set, low inflation, high speed, long delay, etc. All of them are quite simple, as we can easily capture their meaning. We can easily identify a universe of discourse over which such variables are defined. For instance, this could be time, number of records, velocity, and the like.
2. Implicit. Here, we are concerned with more complex and inherently multifaceted concepts and notions where fuzzy sets could be incorporated into the formal description and quantification of such problems, yet not in such an immediate manner. Some examples include concepts such as ‘preferred car,’ ‘stability of the control system,’ ‘high-performance computing architecture,’ ‘good convergence of the learning scheme,’ ‘strong economy,’ etc. All these notions incorporate some components that could be quantified with the use of fuzzy sets, yet this translation is not as straightforward and immediate as it is for the category of the explicit usage of fuzzy sets.
For instance, the concept of ‘preferred car’ is evidently multifaceted and may involve a number of essential descriptors that, when put together, reflect the notion we have in mind. We may involve a number of qualities such as speed, economy, reliability, depreciation, maintainability, and the like. Interestingly, each of these features could be easily rephrased in simpler terms, and at some level of this refinement we may arrive at fuzzy sets that start to manifest themselves in an explicit manner. As we stressed, the omnipresence of fuzzy sets is surprising. Going over any textbook or research monograph, not to mention newspapers and magazines, we encounter a great deal of fuzzy sets coming in their implicit or explicit format. Table 5.1 offers a handful of selected examples. From the optimization standpoint, the continuity and the commonly encountered differentiability of membership functions become a genuine asset. We may easily envision situations where the information granules incorporated as a part of a neuro-fuzzy system are subject to optimization; hence the differentiability of their membership functions becomes of critical relevance. What is equally important is the fact that fuzzy sets bridge numeric and symbolic concepts. On the one hand, a fuzzy set can be treated as a symbol. We can regard it as a single conceptual entity by assigning to it some symbol, say L (for low). Subsequently, it can be processed as a purely symbolic entity. On the other hand, a fuzzy set comes with a numeric membership function, and these membership grades can be processed in a numeric fashion. Fuzzy sets can be viewed from several fundamentally different standpoints. Here we emphasize three of them that play a fundamental role in processing and knowledge representation.
As an Enabling Processing Technology of Some Universal Character and of Profound Human-Centric Character
Fuzzy sets build on the existing information technologies by forming a user-centric interface through which one can communicate essential design knowledge, thus guiding problem solving and making it more efficient. For instance, in signal processing and image processing we might incorporate a collection of rules capturing specific design knowledge about filter development in a certain area, say, ‘if the level of noise is high, consider using a large window of averaging.’ In control engineering, we may incorporate some domain knowledge about the specific control objectives, for instance, ‘if the constraint of fuel consumption is very important, consider settings of a proportional-integral-derivative (PID) controller producing low overshoot.’ Some other examples of highly representative human-centric systems concern those involving (a) construction and usage of relevance feedback in retrieval, organization, and summarization of video and images; (b) queries formulated in natural languages; and (c) summarization of results coming as an outcome of some query.

Table 5.1 Examples of concepts whose description and processing invoke the use of the technology of fuzzy sets and granular computing

  p. 65: small random errors in the measurement vector . . .
  p. 70: The success of the method depends on whether the first initial guess is already close enough to the global minimum . . .
  p. 72: Hence, the convergence region of a numerical optimizer will be large
  Source: F. van der Heijden et al. Classification, Parameter Estimation and State Estimation. J. Wiley, Chichester, 2004.

  p. 162: Comparison between bipolar and MOS technology (a part of the table)
               integration   power   cost
      bipolar  low           high    low
      MOS      very high     low     low
  Source: R.H. Katz and G. Borriello. Contemporary Logic Design, 2nd ed. Prentice Hall, Upper Saddle River, NJ, 2005.

  p. 50: validation costs are high for critical systems
  p. 660: A high value for fan-in means that X is highly coupled to the rest of the design and changes to X will have extensive knock-on effect. A high value for fan-out suggests that the overall complexity of X may be high because of the complexity of control logic needed to coordinate the called components.
  Generally, the larger the size of the code of a component, the more complex and error-prone the component is likely to be
  The higher the value of the Fog index, the more difficult the document is to understand
  Source: I. Sommerville. Software Engineering, 8th ed. Addison-Wesley, Harlow, 2007.

Secondly, there are unique areas of applications in which fuzzy sets form a methodological backbone and deliver the required algorithmic setting. This concerns fuzzy modeling, in which we start with collections of information granules (typically realized as fuzzy sets) and construct a model as a web of links (associations) between them. This approach is radically different from the numeric, function-based models encountered in ‘standard’ system modeling. Fuzzy modeling emphasizes an augmented agenda in comparison with the one stressed in numeric models. While we are still concerned with the accuracy of the resulting model, its interpretability and transparency become of equal, and sometimes even higher, relevance. It is worth stressing that fuzzy sets provide an additional conceptual and algorithmic layer to existing and well-established areas. For instance, there are profound contributions of fuzzy sets to pattern recognition. In this case, fuzzy sets build on the well-established technology of feature selection, classification, and clustering. Fuzzy sets are an ultimate mechanism of communication between humans and the computing environment. The essence of this interaction is illustrated in Figure 5.1. Any input is translated in terms of fuzzy sets and thus made comprehensible at the level of the computing system. Likewise, we see a similar role of fuzzy sets when communicating the results of detailed processing, retrieval, and the like.
Depending on the application and the established mode of interaction, the communication layer may involve a substantial amount of processing of fuzzy sets. Quite often we combine the mechanisms of communication and represent them in the form of a single module (Figure 5.1b). This architectural representation stresses the human-centricity aspect of the developed systems.

Figure 5.1 Fuzzy sets in the realization of communication mechanisms: (a) both at the user end and at the computing system side; (b) a unified representation of input and output mechanisms of communication in the form of the interface, which could also embrace a certain machinery of processing at the level of fuzzy sets
As an Efficient Computing Framework of Global Character
Rather than processing individual elements, say a single numeric datum, an encapsulation of a significant number of individual elements in the form of a fuzzy set offers the immediate benefits of joint and orchestrated processing. Instead of looking at an individual number, we embrace a more general point of view and process an entire collection of elements represented in the form of a single fuzzy set. This effect of a collective handling of individual elements is seen very clearly in so-called fuzzy arithmetic. The basic constructs here are fuzzy numbers. In contrast to single numeric quantities (real numbers), fuzzy numbers represent collections of numbers where each of them belongs to the concept (fuzzy number) to some degree. These constructs are then subject to processing, say addition, subtraction, multiplication, division, etc. Noticeable is the fact that by processing fuzzy numbers we are in fact handling a significant number of individual elements at the same time. Fuzzy numbers and fuzzy arithmetic provide an interesting advantage over interval arithmetic (viz. arithmetic in which we are concerned with intervals, that is, sets of numeric values). Intervals come with abrupt boundaries, as elements either belong to or are excluded from the given set. This means, for example, that any gradient-based optimization technique invoked when computing solutions becomes very limited: the derivative of the characteristic function of an interval is equal to zero everywhere except at the points where the abrupt boundary is located.
Fuzzy Sets as a Vehicle of Raising and Quantifying Awareness About Granularity of Outcomes
Fuzzy sets form the results of granular computing. As such, they convey a global view of the elements of the universe of discourse over which they are constructed. When visualized, the values of the membership function describe the suitability of the individual points as compatible (preferred) with the solution. In this sense, fuzzy sets serve as a useful visualization vehicle: when they are displayed, the user can gain an overall view of the character of the solution (regarded as a fuzzy set) and make a final choice. Note that this is very much in line with the idea of human centricity: we present the user with all possible results; however, we do not put any pressure on the user to commit to a certain numeric solution.
Fuzzy Sets as a Mechanism Realizing a Principle of the Least Commitment
As the computing realized in the setting of granular computing returns a fuzzy set as its result, it can be effectively used to realize a principle of the least commitment. The crux of this principle is to use a fuzzy set as a mechanism for making us cognizant of the quality of the obtained result. Consider a fuzzy set that results from computing in some problem of multiphase decision making. The fuzzy set is defined over various alternatives and associates with them the corresponding degrees of preference (see Figure 5.2). If there are several alternatives with very similar degrees of membership, this serves as a clear indicator of uncertainty or hesitation as to the making of a decision. In other words, in light of the form of the generated fuzzy set, we do not intend to commit ourselves to making any decision (selection of one of the alternatives) at this time. Our intent would be to postpone the decision and collect more evidence. For instance, this could involve further collecting of data, soliciting expert opinion, and the like. With this evidence, we could continue with computing and evaluate the form of the resulting fuzzy set. It could well be that the collected evidence has resulted in a more specific fuzzy set of decisions, on the basis of which we could either still postpone the decision and keep collecting more evidence or proceed with decision making. Thus the principle of the least commitment offers us an interesting and useful guideline as to the mechanism of decision making versus evidence collection.

Figure 5.2 An essence of the principle of the least commitment; the decision is postponed until the phase where there is enough evidence accumulated and the granularity of the result becomes specific enough. Shown are also examples of fuzzy sets formed at successive phases of processing that become more specific along with the increased level of evidence (diagram labels: accumulation of evidence over time; decision postponed; decision released)
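The postpone-or-decide logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decide_or_postpone`, the `margin` threshold, and the sample preference degrees are hypothetical and do not come from the chapter, which does not prescribe how 'very similar' degrees of membership should be detected.

```python
from typing import Dict, Optional


def decide_or_postpone(preferences: Dict[str, float], margin: float = 0.2) -> Optional[str]:
    """Return the preferred alternative, or None when the decision is postponed.

    `preferences` maps each alternative to its degree of preference in [0, 1].
    The decision is released only if the best alternative leads the runner-up
    by at least `margin` (an assumed, purely illustrative threshold).
    """
    ranked = sorted(preferences.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (best, best_grade), (_, second_grade) = ranked[0], ranked[1]
    return best if best_grade - second_grade >= margin else None


# Early phase: the degrees of preference are close, so we keep collecting evidence.
early = {"a1": 0.70, "a2": 0.65, "a3": 0.60}
# Later phase: the fuzzy set has become more specific, so the decision is released.
later = {"a1": 0.90, "a2": 0.30, "a3": 0.10}

print(decide_or_postpone(early))
print(decide_or_postpone(later))
```

Comparing only the two highest degrees is one simple design choice; other indicators of specificity could be substituted without changing the overall scheme.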
5.3 Some Selected Descriptors of Fuzzy Sets
In principle, any function A: X → [0, 1] is potentially eligible to represent the membership function of a fuzzy set A. Let us recall that any fuzzy set defined in X is represented by its membership function, which maps the elements of the universe of discourse to the unit interval. The degree of membership, A(x), quantifies the extent to which x is assigned to A. Higher values of A(x) indicate a stronger association of x with the concept conveyed by A. In practice, however, the type and shape of membership functions should fully reflect the nature of the underlying phenomenon we are interested in modeling. We require that fuzzy sets be semantically sound, which implies that the selection of membership functions needs to be guided by the character of the application and the nature of the problem we intend to solve. Given the enormous diversity of potentially useful (viz. semantically sound) membership functions, there are certain common characteristics (descriptors) that are conceptually and operationally qualified to capture the essence of the granular constructs represented in terms of fuzzy sets. In what follows, we provide a list of the descriptors commonly encountered in practice [17–19].
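As a minimal sketch of what a membership function A: X → [0, 1] may look like in code, consider the notion of warm weather used earlier in the chapter. The linear ramp and its breakpoints (15 °C and 25 °C) are assumptions chosen only for illustration; the chapter does not specify any particular shape for this concept.

```python
def warm(temperature_c: float, low: float = 15.0, high: float = 25.0) -> float:
    """Degree to which a temperature is 'warm' (hypothetical linear ramp).

    Below `low` the membership is 0, above `high` it is 1, and in between
    it grows linearly, so borderline temperatures receive partial membership.
    """
    if temperature_c <= low:
        return 0.0
    if temperature_c >= high:
        return 1.0
    return (temperature_c - low) / (high - low)


for t in (10.0, 20.0, 24.9, 25.0):
    print(t, round(warm(t), 3))
```

With this assumed shape, 24.9 °C and 25 °C receive nearly the same degree of membership, which is the behavior a single crisp threshold cannot provide.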
Normality
We say that the fuzzy set A is normal if its membership function attains 1; that is,
$$\sup_{x \in X} A(x) = 1. \tag{1}$$
If this property does not hold, we call the fuzzy set subnormal. An illustration of the corresponding fuzzy sets is shown in Figure 5.3. The supremum (sup) in the above expression is also referred to as the height of the fuzzy set A, hgt(A) = sup_{x∈X} A(x); a normal fuzzy set thus has hgt(A) = 1.
Figure 5.3 Examples of normal and subnormal fuzzy sets
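The normality of A has a simple interpretation: by determining the height of the fuzzy set, we identify an element with the highest membership degree. A height equal to 1 states that there is at least one element in X whose typicality with respect to A is the highest one and which can be regarded as fully compatible with the semantic category presented by A. A subnormal fuzzy set, whose height is lower than 1, viz. hgt(A) < 1, has no element that is fully compatible with the underlying concept. [...]

Support
The support of a fuzzy set A, denoted by Supp(A), is the set of all elements of X with nonzero membership degrees in A:
$$\mathrm{Supp}(A) = \{x \in X \mid A(x) > 0\}. \tag{3}$$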
In other words, support identifies all elements of X that exhibit some association with the fuzzy set under consideration (by being allocated to A with nonzero membership degrees).
Core
The core of a fuzzy set A, Core(A), is the set of all elements of the universe that are typical of A, viz., they come with membership grades equal to 1:
$$\mathrm{Core}(A) = \{x \in X \mid A(x) = 1\}. \tag{4}$$
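The height, support, and core can be computed directly for a fuzzy set over a finite universe, where the supremum reduces to a maximum. The following is a minimal sketch; the dictionary representation (element mapped to membership grade), the function names, and the sample grades are assumptions made for illustration.

```python
# A fuzzy set over a small finite universe, stored as element -> membership grade.
A = {"x1": 0.0, "x2": 0.3, "x3": 1.0, "x4": 0.7, "x5": 0.0}


def height(fuzzy_set):
    """hgt(A): the largest membership grade (sup becomes max on a finite universe)."""
    return max(fuzzy_set.values())


def is_normal(fuzzy_set):
    """A is normal when its height equals 1."""
    return height(fuzzy_set) == 1.0


def support(fuzzy_set):
    """Supp(A): elements with nonzero membership."""
    return {x for x, grade in fuzzy_set.items() if grade > 0.0}


def core(fuzzy_set):
    """Core(A): elements with membership equal to 1."""
    return {x for x, grade in fuzzy_set.items() if grade == 1.0}


print(height(A), is_normal(A))
print(sorted(support(A)))
print(sorted(core(A)))
```

Both `support` and `core` return ordinary Python sets, mirroring the fact that these descriptors are sets rather than fuzzy sets.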
The support and core are related in the sense that they identify and collect elements belonging to the fuzzy set, yet at two different levels of membership. Given the character of the core and support, we note that all elements of the core of A are subsumed by the elements of the support of this fuzzy set. Note that both support and core are sets, not fuzzy sets (Figure 5.4). We refer to them as the set-based characterizations of fuzzy sets.

Figure 5.4 Support and core of A

While core and support are somewhat extreme (in the sense that they identify the elements of A that exhibit the strongest and the weakest linkages with A), we may also be interested in characterizing sets of elements that come with some intermediate membership degrees. The notion of a so-called α-cut offers here an interesting insight into the nature of fuzzy sets.
α-Cut
The α-cut of a fuzzy set A, denoted by A_α, is a set consisting of the elements of the universe whose membership values are equal to or exceed a certain threshold level α, where α ∈ [0, 1]. Formally speaking, we have
$$A_\alpha = \{x \in X \mid A(x) \ge \alpha\}.$$
A strong α-cut, denoted by A_α^+, differs from the α-cut in the sense that it identifies all elements in X for which the inequality is strict:
$$A_\alpha^{+} = \{x \in X \mid A(x) > \alpha\}.$$
An illustration of the concept of the α-cut and strong α-cut is presented in Figure 5.5. Both support and core are limiting cases of α-cuts and strong α-cuts. For α = 0 and the strong α-cut, we arrive at the concept of the support of A. The threshold α = 1 means that the corresponding α-cut is the core of A.
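The two definitions differ only in the comparison operator, which a short sketch makes explicit. The dictionary representation, the function names, and the sample grades below are illustrative assumptions.

```python
# A fuzzy set over a finite universe of integers: element -> membership grade.
A = {1: 0.2, 2: 0.5, 3: 1.0, 4: 0.5, 5: 0.0}


def alpha_cut(fuzzy_set, alpha):
    """Elements whose membership is at least alpha."""
    return {x for x, grade in fuzzy_set.items() if grade >= alpha}


def strong_alpha_cut(fuzzy_set, alpha):
    """Elements whose membership strictly exceeds alpha."""
    return {x for x, grade in fuzzy_set.items() if grade > alpha}


print(sorted(alpha_cut(A, 0.5)))         # elements with grade >= 0.5
print(sorted(strong_alpha_cut(A, 0.5)))  # elements with grade > 0.5
print(sorted(strong_alpha_cut(A, 0.0)))  # the support of A
print(sorted(alpha_cut(A, 1.0)))         # the core of A
```

The last two calls reproduce the limiting cases mentioned in the text: the strong 0-cut is the support and the 1-cut is the core.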
Convexity
We say that a fuzzy set is convex if its membership function satisfies the following condition: for all x1, x2 ∈ X and all λ ∈ [0, 1],
$$A[\lambda x_1 + (1 - \lambda) x_2] \ge \min[A(x_1), A(x_2)]. \tag{5}$$
The above relationship states that whenever we choose a point x on the line segment between x1 and x2, its membership grade A(x) is never lower than the smaller of the two grades A(x1) and A(x2) (refer to Figure 5.6). Let us recall that a set S is convex if, for all x1, x2 ∈ S, x = λx1 + (1 − λ)x2 ∈ S for all λ ∈ [0, 1]. In other words, convexity means that any line segment identified by any two points in S is also contained in S. For instance, intervals of real numbers are convex sets. Therefore, if a fuzzy set is convex, then all of its α-cuts are convex, and conversely, if a fuzzy set has all its α-cuts convex, then it is a convex fuzzy set (refer to Figure 5.7). Thus we may say that a fuzzy set is convex if all its α-cuts are convex (intervals).
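Condition (5) can be checked numerically when the membership function is sampled at increasing points of the real line. The sketch below is only an approximation under that assumption: it tests the inequality at the sample points, not over the whole continuum, and the function name `is_convex` and the sample grades are hypothetical.

```python
from itertools import combinations


def is_convex(grades):
    """Check condition (5) on membership grades sampled at increasing points.

    For every pair of sample points (i, k) and every sample j lying between
    them, the grade at j must be at least min(grade at i, grade at k).
    """
    return all(
        grades[j] >= min(grades[i], grades[k])
        for i, k in combinations(range(len(grades)), 2)
        for j in range(i + 1, k)
    )


unimodal = [0.0, 0.4, 1.0, 0.6, 0.1]   # single peak
bimodal = [0.0, 0.8, 0.3, 0.9, 0.0]    # dip between two peaks

print(is_convex(unimodal))
print(is_convex(bimodal))
```

The exhaustive triple check is cubic in the number of samples; it is used here because it mirrors the definition directly, not because it is the most efficient test.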
Figure 5.5 Examples of α-cut and strong α-cut

Figure 5.6 An example of a convex fuzzy set A
Fuzzy sets can be characterized by counting their elements and producing a single numeric quantity as a meaningful descriptor of this count. While in the case of sets such counting is straightforward, here we have to take into account different membership grades. In its simplest form this counting comes under the name of cardinality.
Cardinality
Given a fuzzy set A defined in a finite or countable universe X, its cardinality, denoted by card(A), is expressed as the following sum:
$$\mathrm{card}(A) = \sum_{x \in X} A(x), \tag{6}$$
or, alternatively, as the following integral:
$$\mathrm{card}(A) = \int_X A(x)\,dx. \tag{7}$$
(We assume that the integral shown above does make sense.) The cardinality produces a count of the number of elements in the given fuzzy set. As there are different degrees of membership, the use of the sum here makes sense, as we keep adding contributions coming from the individual elements of this fuzzy set. Note that in the case of sets, we count the number of elements belonging to the corresponding sets. We also use the alternative notation Card(A) = |A| and refer to it as a sigma count (σ-count). The cardinality of fuzzy sets is explicitly associated with the concept of granularity of information granules realized in this manner. More descriptively, the more elements of A we encounter, the higher the level of abstraction supported by A and the lower the granularity of the construct. Higher values of cardinality come with a higher level of abstraction (generalization) and lower values of granularity (specificity).

Figure 5.7 Examples of convex and nonconvex fuzzy sets
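Both forms of the cardinality, the sum (6) and the integral (7), can be sketched as follows. The function names, the sample grades, the triangular membership function, and the use of a trapezoidal rule to approximate the integral are all assumptions made for illustration.

```python
def sigma_count(grades):
    """Cardinality (6): the sum of membership grades over a finite universe."""
    return sum(grades)


def sigma_count_integral(membership, lo, hi, steps=1000):
    """Cardinality (7), approximated on [lo, hi] with the trapezoidal rule."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        left = lo + i * width
        right = left + width
        total += 0.5 * (membership(left) + membership(right)) * width
    return total


def triangle(x):
    """Hypothetical triangular membership function on [0, 2] peaking at x = 1."""
    return max(0.0, 1.0 - abs(x - 1.0))


print(sigma_count([0.2, 0.5, 1.0, 0.5, 0.0]))
print(sigma_count_integral(triangle, 0.0, 2.0))
```

For a crisp set, where every grade is 0 or 1, `sigma_count` returns the ordinary number of elements, which is the sense in which the σ-count generalizes counting.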
Equality and Inclusion Relationships in Fuzzy Sets
We investigate two essential relationships between two fuzzy sets defined in the same space that offer a useful insight into their fundamental dependencies. When defining these notions, bear in mind that they build on the well-known definitions encountered in set theory.
Equality
We say that two fuzzy sets A and B defined in the same universe X are equal if and only if their membership functions are identical, meaning that
$$A(x) = B(x) \quad \forall x \in X. \tag{8}$$
Inclusion
Fuzzy set A is a subset of B (A is included in B), denoted by A ⊆ B, if and only if every element of A is also an element of B. This property, expressed in terms of membership degrees, means that the following inequality is satisfied:
$$A(x) \le B(x) \quad \forall x \in X. \tag{9}$$
Interestingly, the definitions of equality and inclusion exhibit an obvious dichotomy, as the property of equality (or inclusion) is either satisfied or not satisfied. While this quantification could be acceptable in the case of sets, fuzzy sets require more attention in this regard, given that membership degrees are involved in expressing the corresponding definitions. The approach envisioned here takes the degrees of membership into consideration and sets up a conjecture that any comparison of membership values should rather return a degree of equality or inclusion. For a given element of X, let us introduce the following degree of inclusion of A(x) in B(x) and denote it by A(x) ⇒ B(x) (⇒ is the symbol of implication; the operation of implication itself will be discussed in detail later on; we do not need these details for the time being):
$$A(x) \Rightarrow B(x) = \begin{cases} 1 & \text{if } A(x) \le B(x), \\ 1 - A(x) + B(x) & \text{otherwise.} \end{cases} \tag{10}$$
If A(x) and B(x) are confined to 0 and 1, as in the case of sets, we come up with the standard definition of Boolean inclusion used in set theory. Computing (10) for all elements of X, we introduce a degree of inclusion of A in B, denoted by ‖A ⊂ B‖, in the form
$$\|A \subset B\| = \frac{1}{\mathrm{Card}(X)} \int_X (A(x) \Rightarrow B(x))\,dx. \tag{11}$$
We characterize the equality of A and B, ‖A = B‖, using the following expression:
$$\|A = B\| = \frac{1}{\mathrm{Card}(X)} \int_X \min[(A(x) \Rightarrow B(x)), (B(x) \Rightarrow A(x))]\,dx. \tag{12}$$
Again this definition is appealing as it results as a direct consequence of the inclusion relationships that have to be satisfied with respect to the inclusion of A in B and B in A.
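The degrees of inclusion and equality in (10)–(12) can be sketched for a finite universe, where the normalized integral is replaced by an average over the elements. That replacement, the function names, and the sample membership grades are assumptions made for this illustration.

```python
def implication(a, b):
    """Pointwise degree of inclusion (10) of grade a in grade b."""
    return 1.0 if a <= b else 1.0 - a + b


def inclusion_degree(A, B):
    """Degree of inclusion of A in B: the average of (10) over the universe."""
    return sum(implication(a, b) for a, b in zip(A, B)) / len(A)


def equality_degree(A, B):
    """Degree of equality: the average of the weaker of the two implications."""
    return sum(min(implication(a, b), implication(b, a)) for a, b in zip(A, B)) / len(A)


# Membership grades of A and B listed over the same four-element universe.
A = [0.0, 0.5, 1.0, 0.5]
B = [0.0, 1.0, 1.0, 0.0]

print(inclusion_degree(A, B))  # 0.875
print(inclusion_degree(B, A))  # 0.875
print(equality_degree(A, B))   # 0.75
```

When all grades are 0 or 1, `implication` returns only 0 or 1, so the pointwise test coincides with Boolean inclusion, as noted in the text.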
Examples. Let us consider two fuzzy sets A and B described by the Gaussian and triangular membership functions. Recall that the Gaussian membership function is described as exp(−(x − m)²/σ²), where the modal value and spread are denoted by m and σ, respectively (the spread is written as s in Figure 5.8). The triangular fuzzy set is fully characterized by the spreads (a and b) and the modal value n. Figure 5.8 provides some examples of A and B for selected values of the parameters and the resulting degrees of inclusion. They are intuitively appealing, reflecting the nature of the relationship. (A is included in B.)
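The two membership functions named above can be written down as follows. This is a sketch under an explicit assumption: the parameters a and b of the triangular function are interpreted here as its lower and upper bounds with a < n < b, which is consistent with the parameter values quoted for Figure 5.8 but is not stated in the text.

```python
import math


def gaussian(x, m, sigma):
    """Gaussian membership function exp(-(x - m)^2 / sigma^2)."""
    return math.exp(-((x - m) ** 2) / sigma ** 2)


def triangular(x, a, n, b):
    """Triangular membership function with modal value n.

    Assumes a < n < b, with a and b read as the lower and upper bounds.
    """
    if x <= a or x >= b:
        return 0.0
    if x <= n:
        return (x - a) / (n - a)
    return (b - x) / (b - n)


# Parameter values taken from panel (a) of Figure 5.8: a = 0, n = 2, b = 3, m = 4, s = 2.
for x in (1.0, 2.0, 2.5, 4.0):
    print(x, round(triangular(x, 0.0, 2.0, 3.0), 3), round(gaussian(x, 4.0, 2.0), 3))
```

Sampling both functions on a common grid gives the lists of grades needed to approximate the degrees of inclusion and equality numerically.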
Figure 5.8 Examples of fuzzy sets A and B along with their degrees of inclusion: (a) a = 0, n = 2, b = 3, m = 4, s = 2, ‖A = B‖ = 0.637; (b) b = 7, ‖A = B‖ = 0.864; (c) a = 0, n = 2, b = 9, m = 4, s = 0.5, ‖A = B‖ = 0.987
Energy and Entropy Measures of Fuzziness
We can offer a global view of the collection of membership grades conveyed by fuzzy sets by aggregating them in the form of so-called measures of fuzziness. Two main categories of such measures are known: energy and entropy measures of fuzziness [20, 21]. The energy measure of fuzziness of a fuzzy set A in X, denoted by E(A), is a functional of the membership degrees:
$$E(A) = \sum_{i=1}^{n} e[A(x_i)] \tag{13}$$
if Card(X) = n. In the case of an infinite space, the energy measure of fuzziness is the following integral:
$$E(A) = \int_X e(A(x))\,dx. \tag{14}$$
The mapping e: [0, 1] → [0, 1] is a functional monotonically increasing over [0, 1] with the boundary conditions e(0) = 0 and e(1) = 1. As the name of this measure stipulates, its role is to quantify a sort of energy associated with the given fuzzy set. The higher the membership degrees, the more essential are their contributions to the overall energy measure. In other words, by computing the energy measure of fuzziness we can compare fuzzy sets in terms of their overall count of membership degrees. A particular form of the above functional comes with the identity mapping, that is, e(u) = u for all u in [0, 1]. We can see that in this case, (13) and (14) reduce to the cardinality of A,
$$E(A) = \sum_{i=1}^{n} A(x_i) = \mathrm{Card}(A). \tag{15}$$
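A short sketch of the finite-universe energy measure (13) shows how the choice of the functional e changes the result, and that the identity mapping recovers the cardinality as in (15). The function name, the sample grades, and the squared mapping used as a second functional are assumptions for illustration.

```python
def energy(grades, e=lambda u: u):
    """Energy measure (13): the sum of e(A(x_i)) over a finite universe.

    `e` should be increasing on [0, 1] with e(0) = 0 and e(1) = 1;
    the default identity mapping yields the cardinality of A.
    """
    return sum(e(u) for u in grades)


A = [0.2, 0.5, 1.0, 0.5]

print(energy(A))                    # identity mapping: equals Card(A)
print(energy(A, lambda u: u ** 2))  # an assumed mapping that damps low grades
```

Passing the functional as a parameter keeps the measure itself unchanged while letting different membership ranges be accentuated.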
Figure 5.9 Two selected forms of the functional ‘e’: in (a) high values of membership are emphasized (accentuated), while in (b) the form of ‘e’ shown puts emphasis on lower membership grades
The energy measure of fuzziness forms a convenient way of expressing a total mass of the fuzzy set. Since card(Ø) = 0 and card(X) = n, the more a fuzzy set differs from the empty set, the larger is its mass. Indeed, rewriting (15) we obtain
$$E(A) = \sum_{i=1}^{n} A(x_i) = \sum_{i=1}^{n} |A(x_i) - \emptyset(x_i)| = d(A, \emptyset) = \mathrm{Card}(A), \tag{16}$$
where d(A, Ø) is the Hamming distance between fuzzy set A and the empty set. While the identity mapping (e) is the simplest alternative one could think of, in general we can envision an infinite number of possible options. For instance, one could consider functionals such as e(u) = u^p, p > 0, and e(u) = sin(πu/2). Note that by choosing a certain form of the functional, we accentuate a varying contribution of different membership grades. For instance, depending on the form of ‘e,’ the contribution of the membership grades close to 1 could be emphasized, while those located close to 0 could be very much reduced. Figure 5.9 illustrates this effect by showing two different forms of the functional (e). When each element x_i of X appears with some probability p_i, the energy measure of fuzziness of the fuzzy set A can include this probabilistic information, in which case it assumes the following format:
$$E(A) = \sum_{i=1}^{n} p_i\, e[A(x_i)]. \tag{17}$$
A careful inspection of the above expression reveals that E(A) is the expected value of the functional e(A). For infinite X, we use an integral format of the energy measure of fuzziness:
$$E(A) = \int_{X} p(x)\, e[A(x)]\, dx, \tag{18}$$
where p(x) is the probability density function (pdf) defined over X. Again, E(A) is the expected value of e(A).
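The discrete forms (15) and (17) can be illustrated with a short sketch. The membership grades, the helper name `energy`, and the uniform probabilities below are hypothetical and serve only to show how the choice of e changes the result.

```python
import math

def energy(memberships, e=lambda u: u, probs=None):
    """Energy measure of fuzziness: sum_i p_i * e(A(x_i)).

    With probs=None every term has weight 1, i.e., the unweighted sum.
    """
    if probs is None:
        probs = [1.0] * len(memberships)
    return sum(p * e(u) for p, u in zip(probs, memberships))

# Hypothetical membership grades A(x_i) of a fuzzy set over five elements.
A = [0.0, 0.2, 0.5, 0.8, 1.0]

card = energy(A)                                         # e(u) = u gives Card(A)
squared = energy(A, e=lambda u: u ** 2)                  # e(u) = u^p with p = 2
sine = energy(A, e=lambda u: math.sin(math.pi * u / 2))  # e(u) = sin(pi*u/2)
expected = energy(A, probs=[0.2] * 5)                    # form (17), uniform p_i
```

With p = 2 the low grades contribute less than under the identity mapping, whereas the sine functional lifts all intermediate grades, so the two choices pull the measure in opposite directions.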
Entropy Measure of Fuzziness
The entropy measure of fuzziness of A, denoted by H(A), is built on the entropy functional h and comes in the form
$$H(A) = \sum_{i=1}^{n} h[A(x_i)] \tag{19}$$
or, in the continuous case of X,
$$H(A) = \int_{X} h(A(x))\, dx, \tag{20}$$
where h: [0, 1] → [0, 1] is a functional such that (a) it is monotonically increasing in [0, 1/2] and monotonically decreasing in [1/2, 1] and (b) it comes with the boundary conditions h(0) = h(1) = 0 and h(1/2) = 1. This functional emphasizes membership degrees around 1/2; in particular, the value of 1/2 is stressed to be the most ‘unclear’ (causing the highest level of hesitation with its quantification by means of the proposed functional).
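As a minimal sketch of (19), the code below uses two functionals that satisfy conditions (a) and (b). The specific choices h(u) = 4u(1 − u) and h(u) = 1 − |2u − 1| are assumptions made for illustration; the text does not prescribe a particular h.

```python
def h_quadratic(u):
    """Assumed entropy functional: rises on [0, 1/2], falls on [1/2, 1], h(1/2) = 1."""
    return 4.0 * u * (1.0 - u)

def h_tent(u):
    """Another admissible (assumed) choice, piecewise linear."""
    return 1.0 - abs(2.0 * u - 1.0)

def entropy(memberships, h=h_quadratic):
    """Entropy measure of fuzziness, H(A) = sum_i h(A(x_i))."""
    return sum(h(u) for u in memberships)

crisp = [0.0, 1.0, 1.0, 0.0]   # a set in the ordinary sense: no hesitation
vague = [0.5, 0.5, 0.5, 0.5]   # every element is maximally 'unclear'
mixed = [0.1, 0.5, 0.9, 1.0]   # hypothetical grades

H_crisp = entropy(crisp)
H_vague = entropy(vague)
H_mixed_tent = entropy(mixed, h=h_tent)
```

A crisp set yields H = 0, while a set with all grades equal to 1/2 attains the largest value possible for its size, n.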
5.4 Granulation of Information
The notion of granulation emerges as a direct and immediate need to abstract and summarize information and data to support various processes of comprehension and decision making. For instance, we often sample an environment for values of attributes of state variables, but we rarely process all details because of our physical and cognitive limitations. Quite often, just a reduced number of variables, attributes, and values are considered because those are the only features of interest given the task under consideration. To avoid all unnecessary and highly distracting details, we require an effective abstraction procedure. As discussed earlier, detailed numeric information is aggregated into a format of information granules where the granules themselves are regarded as collections of elements that are perceived as being indistinguishable, similar, close, or functionally equivalent. Fuzzy sets are examples of information granules. When talking about a family of fuzzy sets, we are typically concerned with fuzzy partitions of X. More generally, the mechanism of granulation can be formally characterized by a four-tuple of the form [22–24]
$$\langle X, G, S, C \rangle \tag{21}$$
where X is a universe of discourse (space), G a formal framework of granulation (resulting from the use of fuzzy sets, rough sets, etc.), S a collection of information granules, and C a transformation mechanism that realizes communication among granules of different nature and granularity levels [25, 26] (see Figure 5.10). In Figure 5.10, notice the communication links that allow for communication between information granules expressed in the same formal framework but at different levels of granularity as well as communication links between information granules formed in different formal frameworks. For instance, in the case of fuzzy granulation shown in Figure 5.10, if G is the formal framework of fuzzy sets, S = {F1, F2, F3, F4}, and C is a certain communication mechanism, then, when communicating the results of processing at the level of fuzzy sets to the framework of interval calculus, one could consider the use of some α-cuts. The pertinent computational details will be discussed later on.
Figure 5.10 Granular computing and communication mechanisms in the coordinates of formal frameworks (fuzzy sets, intervals, rough sets, etc.) and levels of granularity
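To make the fuzzy-to-interval communication link concrete, the following sketch computes the α-cut of a triangular fuzzy set, which is a closed interval. The triangular parameterization (a, m, b) and the function name are assumptions for illustration only.

```python
def triangular_alpha_cut(a, m, b, alpha):
    """Interval {x : A(x) >= alpha} of a triangular fuzzy set.

    Assumed shape: support [a, b], modal value m, linear sides.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (a + alpha * (m - a), b - alpha * (b - m))

# Hypothetical granule 'around 5' passed to interval calculus at level 0.5.
interval = triangular_alpha_cut(0.0, 5.0, 10.0, alpha=0.5)
core = triangular_alpha_cut(0.0, 5.0, 10.0, alpha=1.0)
```

Lower values of α produce wider intervals, so the choice of α controls how much of the fuzzy granule is handed over to the interval framework.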
5.5 Characterization of the Families of Fuzzy Sets
As we have already mentioned, when dealing with information granulation we often develop a family of fuzzy sets and move on with the processing that inherently uses all the elements of this family. Alluding to the existing terminology, we will be referring to such collections of information granules as frames of cognition. In what follows, we introduce the underlying concept and discuss its main properties.
Frame of Cognition
A frame of cognition is a result of information granulation in which we encounter a finite collection of fuzzy sets, that is, information granules that ‘represent’ the entire universe of discourse and satisfy a system of semantic constraints. The frame of cognition is a notion of particular interest in fuzzy modeling, fuzzy control, classification, and data analysis, to name a few representative examples. In essence, the frame of cognition is crucial to all applications where locally and globally meaningful granulation is required to capture the semantics of the conceptual and algorithmic settings in which problem solving has to be placed. A frame of cognition consists of several labeled, normal fuzzy sets. Each of these fuzzy sets is treated as a reference for further processing. A frame of cognition can be viewed as a codebook of conceptual entities. Being more descriptive, we may view them as a family of linguistic landmarks, say small, medium, high, etc. More formally, a frame of cognition Φ,
$$\Phi = \{A_1, A_2, \ldots, A_m\}, \tag{22}$$
is a collection of fuzzy sets defined in the same universe X that satisfies at least the two requirements of coverage and semantic soundness.
Coverage
We say that Φ covers X if any element x ∈ X is compatible with at least one fuzzy set Ai in Φ, i ∈ I = {1, 2, . . . , m}, meaning that it is compatible (coincides) with Ai to some nonzero degree; that is,
$$\forall_{x \in X}\ \exists_{i \in I}\ A_i(x) > 0. \tag{23}$$
Being more strict, we may require the satisfaction of the so-called δ-level coverage, which means that for any element of X, fuzzy sets are activated to a degree not lower than δ:
$$\forall_{x \in X}\ \exists_{i \in I}\ A_i(x) > \delta, \tag{24}$$
where δ ∈ [0, 1]. Put in a computational perspective, coverage assures that each element of X is represented by at least one of the elements of Φ and guarantees the absence of gaps, viz., elements of X with which no fuzzy set is compatible.
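Conditions (23) and (24) can be checked numerically on a finite sample of X. The sketch below is illustrative only: the triangular membership functions, the sampling grid, and the helper names are assumptions, and a check on sampled points cannot by itself prove coverage of a continuous universe.

```python
def triangular(a, m, b):
    """Triangular membership function with support (a, b) and modal value m."""
    def A(x):
        if x <= a or x >= b:
            return 0.0
        return (x - a) / (m - a) if x <= m else (b - x) / (b - m)
    return A

def covers(frame, points, delta=0.0):
    """True if every sampled x has some A_i(x) > delta (strict, as in (24))."""
    return all(any(A(x) > delta for A in frame) for x in points)

points = [i * 0.5 for i in range(21)]           # samples of X = [0, 10]
frame = [triangular(-5, 0, 5), triangular(0, 5, 10), triangular(5, 10, 15)]
gappy = [triangular(0, 2, 4), triangular(6, 8, 10)]

full = covers(frame, points)                    # condition (23)
at_04 = covers(frame, points, delta=0.4)        # delta-level coverage
at_05 = covers(frame, points, delta=0.5)        # fails where two sets cross at 0.5
has_gap = not covers(gappy, points)             # x = 5 is left uncovered
```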
Semantic Soundness
The concept of semantic soundness is more complicated and difficult to quantify. In principle, we are interested in information granules of Φ that are meaningful. While there is far more flexibility in the way in which a suite of detailed requirements could be structured, we may agree on a collection of several fundamental properties:
1. Each Ai, i ∈ I, is a unimodal and normal fuzzy set.
2. Fuzzy sets Ai, i ∈ I, are made disjoint enough to assure that they are sufficiently distinct to become linguistically meaningful. This imposes a maximum degree λ of overlap between any two elements of Φ. In other words, given any x ∈ X, there is no more than one fuzzy set Ai such that Ai(x) ≥ λ, λ ∈ [0, 1].
3. The number of elements of Φ is kept low; following the psychological findings reported by Miller [27] and others, we consider the number of fuzzy sets forming the frame of cognition to be maintained in the range of 7 ± 2 items.
Figure 5.11 Coverage and semantic soundness of a cognitive frame
Coverage and semantic soundness [28] are the two essential conditions that should be fulfilled by the membership functions of Ai to achieve interpretability. In particular, δ-coverage and λ-overlapping induce a minimal (δ) and maximal (λ) level of overlap between fuzzy sets (Figure 5.11).
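The λ-overlap requirement, namely that at no point more than one fuzzy set reaches the level λ, lends itself to a similar sampled check. As before, the triangular frame, the grid, and the function names are hypothetical.

```python
def triangular(a, m, b):
    """Triangular membership function with support (a, b) and modal value m."""
    def A(x):
        if x <= a or x >= b:
            return 0.0
        return (x - a) / (m - a) if x <= m else (b - x) / (b - m)
    return A

def overlap_bounded(frame, points, lam):
    """True if, at each sampled x, at most one A_i satisfies A_i(x) >= lam."""
    return all(sum(1 for A in frame if A(x) >= lam) <= 1 for x in points)

points = [i * 0.5 for i in range(21)]           # samples of X = [0, 10]
frame = [triangular(-5, 0, 5), triangular(0, 5, 10), triangular(5, 10, 15)]

ok_06 = overlap_bounded(frame, points, lam=0.6)  # neighbours cross at 0.5 < 0.6
ok_05 = overlap_bounded(frame, points, lam=0.5)  # two sets reach 0.5 at x = 2.5
```

For this frame, adjacent sets intersect at membership 0.5, so any λ above 0.5 is respected while λ = 0.5 is violated at the crossing points.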
Main Characteristics of the Frames of Cognition
Considering the families of linguistic labels and associated fuzzy sets embraced in a frame of cognition, several characteristics are worth emphasizing.
Specificity
We say that the frame of cognition Φ1 is more specific than Φ2 if all the elements of Φ1 are more specific than the elements of Φ2 (for an illustration, refer to Figure 5.12). The less specific cognition frames promote granulation realized at a higher level of abstraction (generalization). Subsequently, we are provided with a description that captures fewer details. The notion of specificity could be articulated as proposed in [29].
Granularity
Granularity of a frame of cognition relates to the granularity of the fuzzy sets used there. The higher the number of fuzzy sets in the frame, the finer the resulting granulation. Therefore, the frame of cognition Φ1 is finer than Φ2 if |Φ1| > |Φ2|. If the converse holds, Φ1 is coarser than Φ2 (Figure 5.12).
Focus of Attention
A focus of attention (scope of perception) induced by a certain fuzzy set A = Ai in Φ is defined as a certain α-cut of this fuzzy set. By moving A along X while keeping its membership function unchanged, we can focus attention on a certain selected region of X (as portrayed in Figure 5.13).
Figure 5.12 Examples of two frames of cognition; Φ1 is coarser (more general) than Φ2
Figure 5.13 Focus of attention; shown are two regions of focus of attention implied by the corresponding fuzzy sets
Information Hiding
The idea of information hiding is closely related to the notion of focus of attention, and it manifests through a collection of elements that are hidden when viewed from the standpoint of membership functions. By modifying the membership function of A = Ai in Φ we can produce an equivalence of the elements positioned within some region of X. For instance, consider a trapezoidal fuzzy set A on R and its 1-cut (viz., core), the closed interval [a2, a3], as depicted in Figure 5.14. All elements within the interval [a2, a3] are made indistinguishable: when expressed in terms of A, they are equivalent. Hence, more detailed information, viz., the position of a certain point falling within this interval, is ‘hidden.’ In general, by increasing or decreasing the level of the α-cut we can accomplish a so-called α-information hiding through normalization. For instance, as shown in Figure 5.15, the triangular fuzzy set subjected to its α-cut leads to the hiding of information about elements of X falling within this α-cut.
5.6 Semantics of Fuzzy Sets: Some General Observations and Membership Estimation Techniques
There are a great many methods aimed at the determination of membership functions (cf. [30–41]). Fuzzy sets are constructs that come with a well-defined meaning. They capture the semantics of the framework they intend to operate within. Fuzzy sets are the conceptual building blocks (generic constructs) that are used in problem description, modeling, control, and pattern classification tasks. Before discussing specific techniques of membership function estimation, it is worth casting the overall presentation in a certain context by emphasizing the aspect of the use of a finite number of fuzzy sets
Figure 5.14 A concept of information hiding realized by the use of trapezoidal fuzzy set A: all elements in [a2, a3] are made indistinguishable. The effect of information hiding is not present in the case of triangular fuzzy set B
Figure 5.15 Triangular fuzzy set, its successive α-cuts, and the resulting effect of α-information hiding
leading to some essential vocabulary reflective of the underlying domain knowledge. In particular, we are concerned with the related semantics, calibration capabilities of membership functions, and the locality of fuzzy sets. The limited capacity of short-term memory, as identified by Miller, suggests that we could easily and comfortably handle and process 7 ± 2 items. This implies that the number of fuzzy sets to be considered as meaningful conceptual entities should be kept at the same level. The observation sounds reasonable; quite commonly, in practice, we witness situations in which this holds. For instance, when describing linguistically quantified variables, say error or change of error, we may use seven generic concepts (descriptors), labeling them as positive large, positive medium, positive small, around zero, negative small, negative medium, and negative large. When characterizing speed, we may talk about its quite intuitive descriptors such as low, medium, and high speed. In the description of an approximation error, we may typically use the concept of a small error around a point of linearization. (In all these examples, the terms are indicated in italics to emphasize the granular character of the constructs and the role being played there by fuzzy sets.) While embracing very different tasks, all these descriptors exhibit a striking similarity. All of them are information granules, not numbers (whose descriptive power is very much limited). In modular software development, when dealing with a collection of modules (procedures, functions, and the like), the list of their parameters is always limited to a few items, which is again a reflection of the limited capacity of short-term memory. An excessively long parameter list is strongly discouraged because of possible programming errors and the rapidly increasing difficulty of effectively comprehending the software structure and the ensuing flow of control.
In general, the use of an excessive number of terms does not offer any advantage. To the contrary, it remarkably clutters our description of the phenomenon and hampers further effective usage of the concepts we intend to establish to capture the essence of the domain knowledge. With the increase in the number of fuzzy sets, their semantics also becomes negatively impacted. Fuzzy sets may be built into a hierarchy of terms (descriptors), but at each level of this hierarchy (when moving down toward higher specificity, that is, an increasing level of detail), the number of fuzzy sets is kept at a certain limited level. While fuzzy sets capture the semantics of the concepts, they may require some calibration, depending on the specification of the problem at hand. This flexibility of fuzzy sets should not be treated as a shortcoming but rather viewed as an advantage to be fully exploited. For instance, the term low temperature comes with a clear meaning, yet it requires a certain calibration depending on the environment and the context it is put into. The concept of low temperature is used in different climate zones and is of relevance in any communication between people, yet for each community the meaning of the term is different, thereby requiring some calibration. This could be realized, e.g., by shifting the membership function along the universe of discourse of temperature, or by affecting the universe of discourse through some translation, dilation, and the like. As a communication means, linguistic terms are fully legitimate and as such they appear in different settings. They require some refinement so that their meaning is fully understood and shared by the community of the users. When discussing the methods aimed at the determination of membership functions or membership grades, it is worthwhile to underline the existence of the two main categories of approaches being reflective
of the origin of the numeric values of membership. The first one is reflective of the domain knowledge and opinions of experts. In the second one, we consider experimental data whose global characteristics become reflected in the form and parameters of the membership functions. In the first group we can refer to the pairwise comparison (Saaty’s approach) as one of the representative examples, while fuzzy clustering is usually presented as a typical example of the data-driven method of membership function estimation. In what follows, we elaborate on several representative methods that will help us appreciate the level of flexibility of fuzzy sets.
Fuzzy Set as a Descriptor of Feasible Solutions
The aim of the method is to relate the membership function to the level of feasibility of individual elements of a family of solutions associated with the problem at hand. Let us consider a certain function f(x) defined in Ω; that is, f: Ω → R, where Ω ⊂ R. Our intent is to determine its maximum, namely x_opt = arg max_x f(x). On the basis of the values of f(x), we can form a fuzzy set A describing a collection of feasible solutions that could be labeled as optimal. Being more specific, we use the fuzzy set to represent an extent (degree) to which some specific values of ‘x’ could be regarded as potential (optimal) solutions to the problem. Taking this into consideration, we relate the membership function of A with the corresponding value of f(x) cast in the context of the boundary values assumed by ‘f.’ For instance, the membership function of A could be expressed in the following form:
$$A(x) = \frac{f(x) - f_{\min}}{f_{\max} - f_{\min}}. \tag{25}$$
The boundary conditions are straightforward: f_min = min_x f(x) and f_max = max_x f(x), where the minimum and the maximum are computed over Ω. For those values of ‘x’ where f attains its maximal value, A(x) is equal to 1; around such a point the membership values are reduced, reflecting that ‘x’ is less likely to be a solution to the problem when f(x) < f_max. The form of the membership function depends on the character of the function under consideration. Linearization, its quality, and the description of this quality fall under the same banner as the optimization problem. When linearizing a function around some given point, the quality of such linearization can be represented in the form of a fuzzy set. Its membership function attains 1 for all those points where the linearization error is equal to zero. (In particular, this holds at the point around which the linearization is carried out.)
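A minimal sketch of (25) on a sampled domain follows. The objective function, the grid over Ω, and the helper name are hypothetical, and f_min and f_max are taken over the sampled points only.

```python
def feasibility_membership(f, candidates):
    """Membership grades A(x) = (f(x) - f_min) / (f_max - f_min) over sampled x."""
    values = [f(x) for x in candidates]
    f_min, f_max = min(values), max(values)
    if f_max == f_min:
        raise ValueError("f is constant on the sample; (25) is undefined")
    return [(v - f_min) / (f_max - f_min) for v in values]

# Hypothetical problem: maximize a concave parabola over Omega = [0, 4].
xs = [i / 10 for i in range(41)]

def f(x):
    return -(x - 2.0) ** 2 + 4.0

A = feasibility_membership(f, xs)
best = xs[A.index(max(A))]   # element with full membership, here the maximizer
```

Points near x = 2 receive grades close to 1, while the end points of Ω, where f is smallest, receive grade 0.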
Fuzzy Set as a Descriptor of the Notion of Typicality
Fuzzy sets address the issue of gradual typicality of elements with respect to a given concept. They stress the fact that there are elements that fully satisfy the concept (are typical for it) and there are various elements that are allowed only with partial membership degrees. The form of the membership function is reflective of the semantics of the concept. Its details could be captured by adjusting the parameters of the membership function or choosing its form depending on experimental data. For instance, consider a fuzzy set of squares. Formally, a rectangle includes a square shape as its special example when the sides are equal, a = b (Figure 5.16). What if a = b + ε, where ε is a very small positive number? Could this figure be regarded as a square? It is very likely so. Perhaps the membership value of the corresponding membership function could be equal to 0.99. Our perception, which comes with some level of tolerance to imprecision, does not allow us to tell apart this figure from the ideal square (Figure 5.16).
Figure 5.16 Perception of geometry of squares and its quantification in the form of membership function of the concept of fuzzy square
Nonlinear Transformation of Fuzzy Sets
In many problems, we encounter a family of fuzzy sets defined in the same space. The family of fuzzy sets {A1, A2, . . . , Ac} is referred to as referential fuzzy sets. To form a family of semantically meaningful descriptors of the variable at hand, we usually require that these fuzzy sets satisfy the requirements of unimodality, limited overlap, and coverage. Technically, all these features are reflective of our intention to provide this family of fuzzy sets with some semantics. These fuzzy sets could be regarded as generic descriptors (say, small, medium, high, etc.) being described by some typical membership functions. For instance, those could be uniformly distributed triangular or Gaussian fuzzy sets with some standard level of overlap between the successive terms (descriptors). As mentioned, fuzzy sets are usually subject to some calibration depending on the character of the problem at hand. We may use the same terms small, medium, and large in various contexts, yet their detailed meaning (viz., membership degrees) has to be adjusted (calibrated). For the given family of the referential fuzzy sets, their calibration could be accomplished by taking the space X = [a, b] over which they are originally defined and transforming it into itself, that is, [a, b], through some monotonically nondecreasing and continuous function Φ(x, p), where p is a vector of some adjustable parameters bringing the required flexibility of the mapping. The nonlinearity of the mapping is such that some regions of X are contracted and some of them are stretched (expanded) and in this manner capture the required local context of the problem. This affects the referential fuzzy sets {A1, A2, . . . , Ac}, whose membership functions are now expressed as Ai(Φ(x)). The construction of the mapping Φ is optimized, taking into account some experimental data concerning membership grades given at some points of X.
More specifically, the experimental data come in the form of the input–output pairs:
$$\begin{array}{l} x_1\ \text{--}\ \mu_1(1), \mu_2(1), \ldots, \mu_c(1) \\ x_2\ \text{--}\ \mu_1(2), \mu_2(2), \ldots, \mu_c(2) \\ \quad\vdots \\ x_N\ \text{--}\ \mu_1(N), \mu_2(N), \ldots, \mu_c(N), \end{array} \tag{26}$$
where the kth input–output pair consists of xk, which denotes some point in X, while μ1(k), μ2(k), . . . , μc(k) are the numeric values of the corresponding membership degrees. The objective is to construct a nonlinear mapping by optimizing it with respect to the available parameters p. More formally, we could translate the problem into the minimization of the following sum of squared errors:
$$\sum_{i=1}^{c} \bigl(A_i(\Phi(x_1; p)) - \mu_i(1)\bigr)^2 + \sum_{i=1}^{c} \bigl(A_i(\Phi(x_2; p)) - \mu_i(2)\bigr)^2 + \cdots + \sum_{i=1}^{c} \bigl(A_i(\Phi(x_N; p)) - \mu_i(N)\bigr)^2. \tag{27}$$
One feasible mapping comes in the form of the piecewise linear function shown in Figure 5.17. Here the vector of the adjustable parameters p involves a collection of the split points r1, r2, . . . , rL and the associated differences D1, D2, . . . , DL; hence, p = [r1, r2, . . . , rL, D1, D2, . . . , DL]. The regions of expansion or compression are used to affect the referential membership functions and adjust their values given the experimental data.
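The objective (27) with a piecewise linear Φ can be sketched as follows. Several ingredients are assumptions rather than part of the text: each Dj is interpreted here as the offset of Φ from the identity mapping at the split point rj, the end points of [a, b] are kept fixed, the referential sets are Gaussian with hypothetical parameters, and the 'experimental' grades are synthetic.

```python
import numpy as np

def phi(x, r, D, a=0.0, b=10.0):
    """Piecewise linear map of [a, b] onto itself.

    Assumption: phi(r[j]) = r[j] + D[j]; phi(a) = a and phi(b) = b.
    """
    knots = np.concatenate(([a], r, [b]))
    values = np.concatenate(([a], np.asarray(r) + np.asarray(D), [b]))
    if np.any(np.diff(values) < 0):
        raise ValueError("the mapping must be nondecreasing")
    return np.interp(x, knots, values)

def gaussian(m, sigma):
    """Gaussian membership function with modal value m and spread sigma."""
    return lambda x: np.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))

def sse(r, D, sets, xs, mu):
    """Sum of squared errors (27); mu[k, i] is the grade of x_k in A_i."""
    z = phi(xs, r, D)
    model = np.column_stack([A(z) for A in sets])
    return float(np.sum((model - mu) ** 2))

sets = [gaussian(m, 1.5) for m in (0.0, 5.0, 10.0)]   # referential fuzzy sets
r = np.array([2.0, 5.0, 8.0])                         # split points, L = 3
D_true = np.array([1.0, 0.5, -1.0])                   # hypothetical offsets
xs = np.linspace(0.0, 10.0, 21)
mu = np.column_stack([A(phi(xs, r, D_true)) for A in sets])  # synthetic data

err_identity = sse(r, np.zeros(3), sets, xs, mu)      # no calibration
err_true = sse(r, D_true, sets, xs, mu)               # generating parameters
```

Any general-purpose optimizer could then be used to search over p; the sketch only evaluates the objective at two parameter vectors to show that the generating parameters drive the error to zero while the identity mapping does not.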
Examples. We consider some examples of nonlinear transformations of Gaussian fuzzy sets through the piecewise linear transformations (here L = 3) shown in Figure 5.18.
Figure 5.17 A piecewise linear transformation function Φ; shown also is a linear mapping not affecting the universe of discourse and not exhibiting any impact on the referential fuzzy sets. The proposed piecewise linear mapping is fully invertible
Note the fact that some fuzzy sets become more specific, while the others are made more general and expanded over some regions of the universe of discourse. This transformation leads to the membership functions illustrated in Figure 5.19. Considering the same nonlinear mapping as before, two triangular fuzzy sets are converted into fuzzy sets described by piecewise membership functions as shown in Figure 5.20. Some other examples of the transformation of fuzzy sets through the piecewise mapping are shown in Figure 5.21.
Vertical and Horizontal Schemes of Membership Estimation
The vertical and horizontal modes of membership estimation are two standard approaches used in the determination of fuzzy sets. They reflect distinct ways of looking at fuzzy sets whose membership functions at some finite number of points are quantified by experts. In the horizontal approach we identify a collection of elements in the universe of discourse X and request that an expert answer the question Does x belong to concept A? The answers are expected to come in a binary (yes–no) format. The concept A defined in X could be any linguistic notion, say high speed, low temperature, etc. Given ‘n’ experts whose answers for a given point of X form a mix of yes–no replies, we count the number of ‘yes’ answers and compute the ratio of the positive answers (p) to the total number of replies (n), i.e., p/n. This ratio (likelihood) is treated as the membership degree of the concept at the given point of the universe of discourse. When all experts accept that the element belongs to the concept, then its membership degree is equal to 1. Higher disagreement between the experts (quite divided opinions) results in lower membership degrees. The concept A defined in X requires collecting results for some other elements of X and determining the corresponding ratios as outlined in Figure 5.22.
Figure 5.18 An example of the piecewise linear transformation
Figure 5.19 Examples of (a) original membership functions and (b) the resulting fuzzy sets after the piecewise linear transformation
Figure 5.20 Two triangular fuzzy sets along with their piecewise linear transformation
Figure 5.21 (a) The piecewise linear mapping and (b) the transformed Gaussian fuzzy sets
Figure 5.22 A horizontal method of the estimation of the membership function; observe a series of estimates determined for selected elements of X. Note also that the elements of X need not be evenly distributed
If replies follow some, e.g., binomial distribution, then we could determine a confidence interval of the individual membership grade. With p now denoting the ratio of positive answers at the point x, the standard deviation of this estimate, denoted here by σ, is given in the form
$$\sigma = \sqrt{\frac{p(1 - p)}{n}}. \tag{28}$$
The associated confidence interval, which describes a range of membership values, is then determined as
$$[p - \sigma,\ p + \sigma]. \tag{29}$$
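The horizontal scheme, together with an interval of the form (29), can be sketched as below. The sketch assumes that σ is the usual standard deviation of a sample proportion, sqrt(p(1 − p)/n), with p the fraction of ‘yes’ replies; the clipping of the interval to [0, 1] and the sample replies are additions made for illustration.

```python
import math

def horizontal_estimate(answers):
    """Membership grade and one-sigma interval from yes/no expert replies.

    answers: iterable of booleans, True meaning 'x belongs to concept A'.
    """
    answers = list(answers)
    n = len(answers)
    p = sum(1 for a in answers if a) / n           # fraction of 'yes' replies
    sigma = math.sqrt(p * (1.0 - p) / n)           # assumed binomial-type spread
    interval = (max(0.0, p - sigma), min(1.0, p + sigma))  # clipped to [0, 1]
    return p, sigma, interval

# Hypothetical poll: 8 of 10 experts accept x as an instance of 'high speed'.
p, sigma, interval = horizontal_estimate([True] * 8 + [False] * 2)

# Unanimous acceptance gives a degenerate interval at 1.
p_all, sigma_all, interval_all = horizontal_estimate([True] * 10)
```

Repeating the call for each selected element of X yields a family of intervals rather than single grades, which is the interval-valued view of the estimated fuzzy set.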
In essence, when the confidence intervals are taken into consideration, the membership estimates become intervals of possible membership values and this leads to the concept of so-called interval-valued fuzzy sets. By assessing the width of the estimates, we could control the execution of the experiment: when the ranges are too wide, one could redesign the experiment and closely monitor the consistency of the responses collected in the experiment. The advantage of the method comes with its simplicity, as the technique explicitly relies on a direct counting of responses. The concept is also intuitively appealing. The probabilistic nature of the replies helps build confidence intervals that are essential to the assessment of the specificity of the membership quantification. A certain drawback is related to the local character of the construct: as the estimates of the membership function are completed separately for each element of the universe of discourse, they could exhibit a lack of continuity when moving from a certain point to its neighbor. This concern is particularly valid in the case when X is a subset of real numbers. The vertical mode of membership estimation is concerned with the estimation of the membership function by focusing on the determination of the successive α-cuts. The experiment focuses on the unit interval of membership grades. The experts involved in the experiment are asked questions of the form What are the elements of X which belong to fuzzy set A at degree not lower than α? Here α is a certain level (threshold) of membership grades in [0, 1]. The essence of the method is illustrated in Figure 5.23. Note that the satisfaction of the inclusion constraint is obvious: we envision that for higher values of α, the expert is going to provide more limited subsets of X; the vertical approach leads to the fuzzy set by combining the estimates of the corresponding α-cuts.
Given the nature of this method, we are referring to the collection of random sets, as these estimates appear in the successive stages of the estimation process. The elements are identified by the expert as they form the corresponding α-cuts of A. By repeating the process for several selected values of α we end up with the α-cuts, and using them we reconstruct the fuzzy set. The simplicity of the method is its genuine advantage. As in the horizontal method of membership estimation, a possible lack of continuity is a certain disadvantage one has to be aware of.
Figure 5.23 A vertical approach of membership estimation through the reconstruction of a fuzzy set through its estimated α-cuts
Here the selection of suitable levels of α needs to be carefully investigated. Similarly, the order in which different levels of α are used in the experiment could impact the estimate of the membership function.
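The reconstruction step of the vertical scheme can be sketched by assigning to each x the highest level α whose reported α-cut contains it. The sketch assumes that X is a subset of the real line and that each α-cut is reported as a closed interval; the levels and intervals below are hypothetical expert answers.

```python
def is_nested(alpha_cuts):
    """Check the inclusion constraint: higher alpha gives a narrower interval."""
    levels = sorted(alpha_cuts)
    return all(
        alpha_cuts[low][0] <= alpha_cuts[high][0]
        and alpha_cuts[high][1] <= alpha_cuts[low][1]
        for low, high in zip(levels, levels[1:])
    )

def reconstruct(alpha_cuts, x):
    """A(x) = largest alpha whose alpha-cut contains x (0 if none does)."""
    return max(
        (alpha for alpha, (lo, hi) in alpha_cuts.items() if lo <= x <= hi),
        default=0.0,
    )

# Hypothetical answers: alpha level -> interval of elements accepted at that level.
cuts = {0.2: (1.0, 9.0), 0.5: (2.5, 7.5), 0.8: (4.0, 6.0), 1.0: (5.0, 5.0)}

consistent = is_nested(cuts)
grades = [reconstruct(cuts, x) for x in (0.5, 3.0, 4.5, 5.0)]
```

The resulting membership function is a staircase with jumps at the interval end points, which illustrates the lack of continuity noted for this scheme.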
Saaty’s Priority Method of Pairwise Membership Function Estimation
The priority method introduced by Saaty [42, 43] forms another interesting alternative used to estimate the membership function. To explain the essence of the method, let us consider a collection of elements x1, x2, . . . , xn (those could be, for instance, some alternatives whose allocation to a certain fuzzy set is sought) for which the membership grades A(x1), A(x2), . . . , A(xn) are given. Let us organize them into a so-called reciprocal matrix of the following form:
$$R = [r_{ij}] = \begin{bmatrix} \dfrac{A(x_1)}{A(x_1)} & \dfrac{A(x_1)}{A(x_2)} & \cdots & \dfrac{A(x_1)}{A(x_n)} \\ \dfrac{A(x_2)}{A(x_1)} & \dfrac{A(x_2)}{A(x_2)} & \cdots & \dfrac{A(x_2)}{A(x_n)} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{A(x_n)}{A(x_1)} & \dfrac{A(x_n)}{A(x_2)} & \cdots & \dfrac{A(x_n)}{A(x_n)} \end{bmatrix} = \begin{bmatrix} 1 & \dfrac{A(x_1)}{A(x_2)} & \cdots & \dfrac{A(x_1)}{A(x_n)} \\ \dfrac{A(x_2)}{A(x_1)} & 1 & \cdots & \dfrac{A(x_2)}{A(x_n)} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{A(x_n)}{A(x_1)} & \dfrac{A(x_n)}{A(x_2)} & \cdots & 1 \end{bmatrix}. \tag{30}$$
Noticeably, the diagonal values of R are equal to 1. The entries that are symmetrically positioned with respect to the diagonal satisfy the condition of reciprocality; that is, rij = 1/rji. Furthermore, an important transitivity property holds; that is, rik rkj = rij for all indexes i, j, and k. This property holds because of the way in which the matrix has been constructed. By plugging in the ratios one gets
$$r_{ik} r_{kj} = \frac{A(x_i)}{A(x_k)} \cdot \frac{A(x_k)}{A(x_j)} = \frac{A(x_i)}{A(x_j)} = r_{ij}.$$
Let us now multiply the matrix by the vector of the membership grades A = [A(x1) A(x2) . . . A(xn)]^T. For the ith row of R (i.e., the ith entry of the resulting vector of results) we obtain
$$[RA]_i = \begin{bmatrix} \dfrac{A(x_i)}{A(x_1)} & \dfrac{A(x_i)}{A(x_2)} & \cdots & \dfrac{A(x_i)}{A(x_n)} \end{bmatrix} \begin{bmatrix} A(x_1) \\ A(x_2) \\ \vdots \\ A(x_n) \end{bmatrix}, \tag{31}$$
i = 1, 2, . . . , n. Thus, the ith element of the vector is equal to nA(xi). Overall, once the calculations have been completed for all ‘i,’ this leads us to the expression RA = nA. In other words, we conclude that A is the eigenvector of R associated with the largest eigenvalue of R, which is equal to ‘n.’ In the above scenario, we assumed that the membership values A(xi) are given and then showed what form of results they could lead to. In practice, the membership grades are not given and have to be looked for. The starting point of the estimation process is the set of entries of the reciprocal matrix, which are obtained by collecting the results of pairwise evaluations offered by an expert, designer, or user (depending on the character of the task at hand). Prior to making any assessment, the expert is provided with a finite scale with values spread between 1 and 7. Some other alternatives of the scales, such as those involving five or nine levels, could be used as well. If xi is strongly preferred over xj when being considered in the context of the fuzzy set whose membership function we would like to estimate, then this judgment is expressed by assigning high values of the available scale, say 6 or 7. If we still sense that xi is preferred over xj, yet the strength of this preference is lower in comparison with the previous case, then this is quantified using some intermediate values of the scale, say 3 or 4. If no difference is sensed, the values close to 1 are the preferred choice, say 2 or 1. The value of 1 indicates that xi and xj are equally preferred. On the other hand, if xj is preferred over xi, the corresponding entry assumes values below 1. Given the reciprocal character of the assessment, once the preference of xi over xj has been quantified, the inverse of this number is plugged into the entry of the matrix that is located at the (j, i)th coordinate. As indicated earlier, the elements on the main diagonal are equal to 1.
Next the maximal eigenvalue is computed along with its corresponding eigenvector. The normalized version of the eigenvector is then the membership function of the fuzzy set we considered when doing all pairwise assessments of the elements of its universe of discourse. The pairwise evaluations are far more convenient and manageable in comparison to any effort we make when assigning membership grades to all elements of the universe in a single step.
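The eigenvector-based estimation described above can be illustrated with a minimal sketch. It assumes a strictly positive reciprocal matrix, uses plain power iteration to approximate the largest eigenvalue (our choice of numerical method, not one prescribed by the text), and normalizes the eigenvector so that its largest entry equals 1 (one possible normalization). The function name `principal_eigen` and the sample membership grades are hypothetical.

```python
def principal_eigen(R, iterations=200):
    """Approximate the largest eigenvalue of a positive matrix R and the
    associated eigenvector (scaled so that its largest entry is 1)."""
    n = len(R)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        # One power-iteration step: w = R v
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Because max(v) == 1, max(w) estimates the eigenvalue
        lam = max(w)
        v = [wi / lam for wi in w]
    return lam, v


# A fully consistent reciprocal matrix built from assumed grades A(x_i):
# r_ij = A(x_i) / A(x_j), so R A = n A should hold with n = 3.
A = [1.0, 0.5, 0.25]
R = [[a / b for b in A] for a in A]
lam_max, membership = principal_eigen(R)
```

For this consistent matrix the recovered vector coincides with the assumed grades and the eigenvalue equals the dimension of the matrix; with real expert judgments the eigenvalue would typically be larger.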
Practically, the pairwise comparison helps the expert focus on only two elements at a time, thus reducing uncertainty and hesitation while leading to a higher level of consistency. The assessments are not free of bias and could exhibit some inconsistent evaluations. In particular, we cannot expect the transitivity requirement to be fully satisfied. Fortunately, the lack of consistency can be quantified and monitored. The largest eigenvalue computed for $R$ is never smaller than the dimensionality of the reciprocal matrix (recall that in reciprocal matrices the elements positioned symmetrically with respect to the main diagonal are inverses of each other), $\lambda_{\max} \ge n$, where the equality $\lambda_{\max} = n$ occurs only if the results are fully consistent. The ratio
\[
\nu = \frac{\lambda_{\max} - n}{n - 1} \tag{32}
\]
can be regarded as an index of inconsistency of the data; the higher its value, the less consistent are the collected experimental results. This expression can be viewed as an indicator of the quality of the pairwise assessments provided by the expert. If the value of $\nu$ is too high, exceeding a certain imposed threshold, the experiment may need to be repeated. Typically, if $\nu$ is less than 0.1, the assessment is regarded as consistent, while higher values of $\nu$ call for a reexamination of the experimental data and a rerun of the experiment. To quantify how much the experimental data deviate from the transitivity requirement, we calculate the absolute differences between the corresponding experimentally obtained entries of the reciprocal matrix, namely $r_{ik}$ and $r_{ij} r_{jk}$. The sum expressed in the form
\[
V(i, k) = \sum_{j=1}^{n} \left| r_{ij} r_{jk} - r_{ik} \right| \tag{33}
\]
serves as a useful indicator of the lack of transitivity of the experimental data for the given pair of elements $(i, k)$. If required, we may repeat the experiment if the above sum takes high values. The overall sum $\sum_{i,k} V(i, k)$ then becomes a global evaluation of the lack of transitivity of the experimental assessment.
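The two consistency measures, the index (32) and the transitivity sum (33), can be sketched as follows. This is an illustration only: the function names are hypothetical, the largest eigenvalue is assumed to have been computed elsewhere and is passed in as an argument, and the 3 x 3 matrix is invented sample data.

```python
def inconsistency_index(lambda_max, n):
    """Index (32): nu = (lambda_max - n) / (n - 1)."""
    return (lambda_max - n) / (n - 1)


def transitivity_violation(R, i, k):
    """Sum (33): V(i, k) = sum_j |r_ij * r_jk - r_ik|."""
    n = len(R)
    return sum(abs(R[i][j] * R[j][k] - R[i][k]) for j in range(n))


def total_violation(R):
    """Global evaluation: sum of V(i, k) over all pairs (i, k)."""
    n = len(R)
    return sum(transitivity_violation(R, i, k)
               for i in range(n) for k in range(n))


# Invented reciprocal matrix: r_01 * r_12 = 4, but the expert gave r_02 = 6,
# so transitivity is violated for the pair (0, 2).
R = [
    [1.0,     2.0, 6.0],
    [1 / 2.0, 1.0, 2.0],
    [1 / 6.0, 1 / 2.0, 1.0],
]
v_02 = transitivity_violation(R, 0, 2)
overall = total_violation(R)
```

A fully consistent matrix yields zero for every pair, so any positive value of `overall` signals a departure from transitivity.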
Fuzzy Sets as Granular Representatives of Numeric Data

In general, a fuzzy set is reflective of numeric data that are put together in some context. Using its membership function, we attempt to embrace them in a concise manner. The development of the fuzzy set is supported by the following experiment-driven and intuitively appealing rationale: (a) First, we expect that A reflects (or matches) the available experimental data to the highest extent. (b) Second, the fuzzy set is kept specific enough so that it comes with well-defined semantics. These two requirements point at the multiobjective nature of the construct: we want to maximize the coverage of experimental data (as articulated by (a)) and minimize the spread of the fuzzy set (as captured by (b)). These two requirements give rise to a certain optimization problem. Furthermore, which is quite legitimate, we assume that the fuzzy set to be constructed has a unimodal membership function or that its maximal membership grades occupy a contiguous region in the universe of discourse in which this fuzzy set has been defined. This helps us build the membership function separately for its rising and declining sections. The core of the fuzzy set is determined first. Next, assuming the simplest scenario of linear membership functions, the essence of the optimization problem boils down to the rotation of the linear section of the membership function around the upper point of the core of A (for an illustration, refer to Figure 5.24). The point of rotation of the linear segment of this membership function is marked by an empty circle. By rotating this segment, we intend to maximize (a) and minimize (b). Before moving on with the determination of the membership function, we concentrate on the location of its numeric representative. Typically, one could view the average of the experimental data $x_1, x_2, \ldots, x_n$ as their sound representative.
While its usage is quite common in practice, a better representative of the numeric data is the median. There is a reason behind this choice. The median is a robust statistic, meaning that it tolerates a high level of potential noise in the data. Its important ability is to ignore outliers. Given that the fuzzy set is meant to be a granular and ‘stable’ representation of the numeric data, our interest is in a robust development that is not affected by noise. Undoubtedly,
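[Figure 5.24: image not reproduced. Horizontal axis: x, with the data points marked; labels in the plot: max Σ A(x_k), min Supp(A), and the lower bound a.]
Figure 5.24 Optimization of the linear increasing section of the membership function of A; highlighted are the positions of the membership function originating from the realization of the two conflicting criteria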
the use of the median is a good starting point. Let us recall that the median is an order statistic and is formed on the basis of an ordered set of numeric values. In the case of an odd number of data points in the data set, the point located in the middle of this ordered sequence is the median. When we encounter an even number of data points in the granulation window, instead of taking the average of the two points located in the middle, we consider these two points to form the core of the fuzzy set. Thus, depending on the number of data points, we end up with either a triangular or a trapezoidal membership function. Having fixed the modal value of A (which could be a single numeric value, $m$, or a certain interval $[m, n]$), the optimization of the spreads of the linear portions of the membership function is carried out separately for the increasing and decreasing portions. We consider the increasing part of the membership function. (The decreasing part is handled in an analogous manner.) Referring to Figure 5.24, the two requirements guiding the design of the fuzzy set are transformed into the corresponding multiobjective optimization problem as follows: (a) Maximize the experimental evidence of the fuzzy set; this implies that we tend to ‘cover’ as many numeric data as possible, viz., the coverage has to be made as high as possible. Graphically, in the optimization of this requirement, we rotate the linear segment up (clockwise), as illustrated in Figure 5.24. Formally, the sum of the membership grades, $\sum_k A(x_k)$, where $A$ is the linear membership function to be optimized and $x_k$ is located to the left of the modal value, has to be maximized. (b) Simultaneously, we would like to make the fuzzy set as specific as possible so that it comes with well-defined semantics. This requirement is met by making the support of A as small as possible, i.e., $\min_a (m - a)$.
To accommodate the two conflicting requirements, we combine (a) and (b) in the form of a ratio that is maximized with respect to the unknown parameter $a$ of the linear section of the membership function:
\[
\max_a \frac{\sum_k A(x_k)}{m - a}. \tag{34}
\]
The linearly decreasing portion of the membership function is optimized in the same manner. The overall optimization returns the parameters of the fuzzy number in the form of the lower and upper bounds ($a$ and $b$, respectively) and its core ($m$ or $[m, n]$). We can write down such fuzzy numbers as $A(a, m, n, b)$. We exclude the trivial solution $a = m$, in which case the fuzzy set collapses to a single numeric entity.
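The maximization of the ratio (34) for the increasing section can be sketched as follows. The sketch makes several assumptions that are not part of the text: the search is a simple grid search, the candidates for the lower bound are restricted to the range between the smallest datum left of the modal value and the modal value itself (excluded), and the function names and sample data are hypothetical.

```python
def linear_rising(x, a, m):
    """Linearly increasing membership: 0 at a, 1 at the modal value m."""
    if x <= a:
        return 0.0
    if x >= m:
        return 1.0
    return (x - a) / (m - a)


def criterion(a, m, data):
    """Ratio (34): coverage of the data left of m divided by the spread m - a."""
    left = [x for x in data if x < m]
    return sum(linear_rising(x, a, m) for x in left) / (m - a)


def best_lower_bound(data, m, steps=1000):
    """Grid search for the bound a maximizing (34); a == m is never visited."""
    left = [x for x in data if x < m]
    lo = min(left)  # assumes at least one datum lies to the left of m
    candidates = [lo + (m - lo) * s / steps for s in range(steps)]
    return max(candidates, key=lambda a: criterion(a, m, data))


data = [1.0, 2.0, 3.0, 4.0, 5.0]
m = 3.0  # median of the sample, used as the modal value
a_opt = best_lower_bound(data, m)
```

The decreasing section could be treated by mirroring the data around the modal value and reusing the same routine.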
Fuzzy Sets as Aggregates of Numeric Data

Fuzzy sets can be formed on the basis of numeric data through their clustering (grouping). The groups of data give rise to membership functions that convey a global, more abstract view of the available data.
In this regard, fuzzy c-means (FCM, for short) is one of the commonly used mechanisms of fuzzy clustering [16, 44]. Let us review its formulation, develop the algorithm, and highlight the main properties of the fuzzy clusters. Given a collection of n-dimensional data $\{x_k\}$, $k = 1, 2, \ldots, N$, the task of determining its structure, i.e., a collection of $c$ clusters, is expressed as the minimization of the following objective function (performance index) $Q$, regarded as a sum of squared distances:
\[
Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, \| x_k - v_i \|^2, \tag{35}
\]
where $v_i$ are n-dimensional prototypes of the clusters, $i = 1, 2, \ldots, c$, and $U = [u_{ik}]$ stands for a partition matrix expressing the allocation of the data to the corresponding clusters; $u_{ik}$ is the membership degree of data $x_k$ in the $i$th cluster. The distance between the data $x_k$ and prototype $v_i$ is denoted by $\| \cdot \|$. The fuzzification coefficient $m$ ($> 1.0$) expresses the impact of the membership grades on the individual clusters. A partition matrix satisfies two important properties: 0
1 is the most natural extension of the trivial lattice $L_{(1)} = \{0, 1\}$. Indeed, the origin of many-valued logics [23] was the three-valued logic on $L = \{0, \frac{1}{2}, 1\}$, which is isomorphic to $L_{(2)}$. Moreover, this case also covers all situations with a finite number of truth values equipped with a linear order. For example, truth values can be granular, expressed linguistically as false, more or less true, almost true, and true, in which case they can be represented by the chain $L_{(3)}$. For each $n \in \mathbb{N}$, and therefore for each $L_{(n)}$, there is a unique strong negation $N: L_{(n)} \to L_{(n)}$ given by $N(x) = n - x$. Conjunction operators on $L_{(n)}$ are often called discrete t-norms [24] and are defined axiomatically in accordance with Definition 2. The number of conjunction operators on $L_{(n)}$ grows extremely fast with $n$ [25] (see Table 10.1). Divisible conjunction operators on $L_{(n)}$ were introduced in [26] (called there smooth discrete t-norms) and characterized in [27].

Theorem 8. A mapping $C: L_{(n)}^2 \to L_{(n)}$ is a divisible conjunction operator on $L_{(n)}$ if and only if there is a subset $K \subset L_{(n)}$ containing 0 and $n$, i.e., $K = \{i_1, \ldots, i_m\}$ with $0 = i_1 < \cdots < i_m = n$, such that
\[
C(x, y) = \begin{cases}
\max(x + y - i_{j+1}, \, i_j) & \text{if } (x, y) \in \{i_j, \ldots, i_{j+1} - 1\}^2, \\
\min(x, y) & \text{otherwise.}
\end{cases} \tag{2}
\]

Each divisible conjunction operator on $L_{(n)}$ is therefore characterized by the subset $K \cap \{1, \ldots, n - 1\}$. Hence there are exactly $2^{n-1}$ divisible conjunction operators on $L_{(n)}$. Divisible conjunction operators are further characterized by the 1-Lipschitz property [24].

Proposition 9. A conjunction operator $C: L_{(n)}^2 \to L_{(n)}$ is divisible if and only if it is 1-Lipschitz, i.e., for all $x, y, x', y' \in L_{(n)}$,
\[
|C(x, y) - C(x', y')| \le |x - x'| + |y - y'|. \tag{3}
\]
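Theorem 8 and Proposition 9 can be checked by brute force on a small chain. The following sketch builds the operator (2) from a given set $K$, enumerates all admissible sets $K$, and tests the 1-Lipschitz inequality (3) exhaustively. All function names are hypothetical, and the set $K$ is assumed to be passed as a sorted list starting with 0 and ending with $n$.

```python
from itertools import combinations


def divisible_conjunction(K):
    """Operator (2) for a sorted list K = [0, ..., n] of idempotent elements."""
    def C(x, y):
        for lo, hi in zip(K, K[1:]):
            # Both arguments in the same block {lo, ..., hi - 1}
            if lo <= x < hi and lo <= y < hi:
                return max(x + y - hi, lo)
        return min(x, y)
    return C


def all_idempotent_sets(n):
    """Every K with 0 and n in K, i.e., every subset of {1, ..., n - 1}."""
    for r in range(n):
        for subset in combinations(range(1, n), r):
            yield [0, *subset, n]


def is_one_lipschitz(C, n):
    """Exhaustive test of inequality (3) on {0, ..., n}."""
    pts = range(n + 1)
    return all(abs(C(x, y) - C(u, v)) <= abs(x - u) + abs(y - v)
               for x in pts for y in pts for u in pts for v in pts)


n = 4
sets = list(all_idempotent_sets(n))
lipschitz_flags = [is_one_lipschitz(divisible_conjunction(K), n) for K in sets]
```

For $n = 4$ the enumeration yields $2^{3} = 8$ sets; the choice $K = \{0, n\}$ reproduces $\max(x + y - n, 0)$, and the full set $K = L_{(n)}$ reproduces the minimum.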
The unique strong negation $N$ on $L_{(n)}$ brings a one-to-one correspondence between disjunction and conjunction operators on $L_{(n)}$. Note that for a given divisible conjunction operator $C: L_{(n)}^2 \to L_{(n)}$, the corresponding disjunction operator $D: L_{(n)}^2 \to L_{(n)}$ given by (1) is also characterized by the 1-Lipschitz property (3). Another way of relating 1-Lipschitz discrete conjunction and disjunction operators on $L_{(n)}$ is characterized by the so-called Frank functional equation
\[
C(x, y) + D(x, y) = x + y \tag{4}
\]
for all $(x, y) \in L_{(n)}^2$, where $C$ is an arbitrary divisible conjunction operator on $L_{(n)}$. If this $C$ is given by (2), then $D: L_{(n)}^2 \to L_{(n)}$ is given by
\[
D(x, y) = \begin{cases}
\min(x + y - i_j, \, i_{j+1}) & \text{if } (x, y) \in \{i_j, \ldots, i_{j+1} - 1\}^2, \\
\max(x, y) & \text{otherwise.}
\end{cases} \tag{5}
\]

Table 10.1 Number of conjunction operators on $L_{(n)}$

 n   Conjunction operators
 1   1
 2   2
 3   6
 4   22
 5   94
 6   451
 7   2,386
 8   13,775
 9   86,417
10   590,489
11   4,446,029
12   37,869,449
Note that the structure (2) of divisible conjunction operators on $L_{(n)}$ (in fact, a particular case of an ordinal sum of discrete t-norms [28, 29]) has an impact not only on the structure of the corresponding disjunctions, but also on the similar structure of the corresponding residual implications. Indeed, if $C: L_{(n)}^2 \to L_{(n)}$ is given by (2), then the related residual implication $R_C: L_{(n)}^2 \to L_{(n)}$ is given by
\[
R_C(x, y) = \begin{cases}
n & \text{if } x \le y, \\
i_{j+1} - x + y & \text{if } x > y \text{ and } (x, y) \in \{i_j, \ldots, i_{j+1} - 1\}^2, \\
y & \text{otherwise.}
\end{cases} \tag{6}
\]
The only divisible conjunction operator $C_L: L_{(n)}^2 \to L_{(n)}$ on $L_{(n)}$ such that the negation $N_{C_L}$ is a strong negation (i.e., it is the only strong negation $N$ on $L_{(n)}$) corresponds to the minimal set $K = \{0, n\}$ (see Theorem 8). It is given by $C_L(x, y) = \max(x + y - n, 0)$, and it is called the Łukasiewicz conjunction operator (Łukasiewicz discrete t-norm) on $L_{(n)}$.
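Formula (6) can be compared with a direct computation of the residuum. The sketch below assumes the usual definition of the residual implication as the largest $z$ with $C(x, z) \le y$ (this definition is taken as an assumption here, since it is introduced outside this passage), and all function names are hypothetical.

```python
def divisible_conjunction(K):
    """Operator (2) for a sorted list K = [0, ..., n]."""
    def C(x, y):
        for lo, hi in zip(K, K[1:]):
            if lo <= x < hi and lo <= y < hi:
                return max(x + y - hi, lo)
        return min(x, y)
    return C


def residuum_by_formula(K):
    """Closed form (6) of the residual implication."""
    n = K[-1]

    def R(x, y):
        if x <= y:
            return n
        for lo, hi in zip(K, K[1:]):
            if lo <= x < hi and lo <= y < hi:
                return hi - x + y
        return y
    return R


def residuum_by_search(C, n):
    """Assumed definition: R(x, y) = max{z in {0, ..., n} : C(x, z) <= y}."""
    def R(x, y):
        return max(z for z in range(n + 1) if C(x, z) <= y)
    return R


K = [0, 2, 5]
n = K[-1]
R_formula = residuum_by_formula(K)
R_search = residuum_by_search(divisible_conjunction(K), n)
agree = all(R_formula(x, y) == R_search(x, y)
            for x in range(n + 1) for y in range(n + 1))
```

For $K = \{0, n\}$ the induced negation $x \mapsto R_C(x, 0)$ reduces to $n - x$, in line with the remark on the Łukasiewicz operator.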
"
10.4 Logical Connectives on Closed Intervals

In the last decades, the greatest attention in the framework of many-valued logics was paid to the case when the range of truth values is $L = [0, 1]$, i.e., when $L$ is the real unit interval. This range is also the background of fuzzy sets [30], rough sets [31], etc. Note that each lattice $L = [a, b] \subset [-\infty, \infty]$ is isomorphic to the lattice $[0, 1]$, and thus the logical connectives on $[a, b]$ can be derived in a straightforward way from the corresponding logical connectives on $[0, 1]$.
Negation

Let us first turn to negations, which can be defined in accordance with Definition 1, i.e., any nonincreasing function $N: [0, 1] \to [0, 1]$ with $N(0) = 1$ and $N(1) = 0$ constitutes a negation on $[0, 1]$. Strong negations on $[0, 1]$ were characterized in [32] and are related to increasing bijections.

Theorem 10. A mapping $N: [0, 1] \to [0, 1]$ is a strong negation on $[0, 1]$ if and only if there is an increasing bijection $f: [0, 1] \to [0, 1]$ such that
\[
N(x) = f^{-1}(1 - f(x)). \tag{7}
\]
Note that there are uncountably many bijections f leading to the same strong negation N . The standard negation Ns : [0, 1] → [0, 1] is generated by the identity id[0,1] and it is given by Ns (x) = 1 − x.
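The construction (7) can be sketched as follows. The bijection $f(x) = x^2$ is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import math


def strong_negation(f, f_inv):
    """Negation (7) generated by an increasing bijection f of [0, 1]."""
    return lambda x: f_inv(1.0 - f(x))


# The identity generates the standard negation N_s(x) = 1 - x.
standard = strong_negation(lambda x: x, lambda x: x)

# f(x) = x^2 (an arbitrary example) generates N(x) = sqrt(1 - x^2).
circular = strong_negation(lambda x: x * x, math.sqrt)

# A strong negation is involutive: applying it twice returns the argument
# (up to floating-point rounding).
x = 0.25
round_trip = circular(circular(x))
```

Different bijections may generate the same negation, which is consistent with the remark that $f$ is far from unique.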
Conjunction

Conjunction operators on $[0, 1]$ are called triangular norms, and they were originally introduced by Schweizer and Sklar [33, 34] in the framework of probabilistic metric spaces, generalizing earlier ideas of Menger [35]. More details about t-norms can be found in the recent monographs [29, 36]. Triangular norms can be rather peculiar, in general. For example, there are Borel nonmeasurable t-norms [37], or t-norms which are noncontinuous in a single point [29, 36]. An example of the latter type is the t-norm $T: [0, 1]^2 \to [0, 1]$ given by
\[
T(x, y) = \begin{cases}
a & \text{if } x = y = \frac{1}{2}, \\
\dfrac{\max(x + y - 1, 0)}{1 - 4(1 - x)(1 - y)} & \text{otherwise,}
\end{cases} \tag{8}
\]
where either $a = 0$ (in which case $T$ is $\vee$-preserving) or $a = \frac{1}{2}$ (then $T$ is $\wedge$-preserving).
The strongest t-norm $T^*: [0, 1]^2 \to [0, 1]$ is given simply by $T^*(x, y) = \min(x, y)$, and it is usually denoted as $T_M$. Similarly, the weakest t-norm $T_*: [0, 1]^2 \to [0, 1]$ is usually denoted as $T_D$ (and it is called the drastic product). The t-norm $T_M$ is an example of a continuous t-norm, while $T_D$ is a noncontinuous (but right-continuous) t-norm. Other typical examples of continuous t-norms are the product t-norm $T_P$ and the Łukasiewicz t-norm $T_L$ given by $T_P(x, y) = xy$ and $T_L(x, y) = \max(x + y - 1, 0)$, respectively. Divisible triangular norms are just the continuous t-norms, and their representation is given in [38]. Observe that the following representation theorem can also be derived from results of Mostert and Shields [39] in the context of I-semigroups and that some preliminary results can also be found in [40, 41].
Theorem 11. A function $T: [0, 1]^2 \to [0, 1]$ is a continuous t-norm if and only if $T$ is an ordinal sum of continuous Archimedean t-norms.

Note that a t-norm $T: [0, 1]^2 \to [0, 1]$ is Archimedean if for each $x, y \in \, ]0, 1[$ there is an $n \in \mathbb{N}$ such that $x_T^{(n)} < y$, where $x_T^{(1)} = x$ and, for $n > 1$, $x_T^{(n)} = T(x_T^{(n-1)}, x)$. For example, the drastic product $T_D$ is Archimedean. If $T$ is a continuous t-norm, then its Archimedeanity is equivalent to the diagonal inequality $T(x, x) < x$ for all $x \in \, ]0, 1[$. Observe that both $T_P$ and $T_L$ are continuous Archimedean t-norms. Further, the ordinal sum of t-norms is a construction method coming from semigroup theory (introduced by Clifford [28] for abstract semigroups). For ordinal-sum-like conjunction operators on other truth-value lattices compare also equations (2), (11), and (13).

Definition 12. Let $(T_\alpha)_{\alpha \in A}$ be a family of t-norms and $(]a_\alpha, e_\alpha[)_{\alpha \in A}$ be a family of nonempty, pairwise disjoint open subintervals of $[0, 1]$. The t-norm $T$ defined by
\[
T(x, y) = \begin{cases}
a_\alpha + (e_\alpha - a_\alpha) \cdot T_\alpha\!\left( \dfrac{x - a_\alpha}{e_\alpha - a_\alpha}, \dfrac{y - a_\alpha}{e_\alpha - a_\alpha} \right) & \text{if } (x, y) \in [a_\alpha, e_\alpha[^2, \\
\min(x, y) & \text{otherwise,}
\end{cases}
\]
is called the ordinal sum of the summands $\langle a_\alpha, e_\alpha, T_\alpha \rangle$, $\alpha \in A$, and we shall write $T = (\langle a_\alpha, e_\alpha, T_\alpha \rangle)_{\alpha \in A}$.

Observe that the index set $A$ is necessarily finite or countably infinite. It may also be empty, in which case the ordinal sum equals the strongest t-norm $T_M$. Continuous Archimedean t-norms are strongly related to the standard addition on $[0, \infty]$.

Theorem 13. For a function $T: [0, 1]^2 \to [0, 1]$, the following are equivalent:
1. $T$ is a continuous Archimedean t-norm.
2. $T$ has a continuous additive generator, i.e., there exists a continuous, strictly decreasing function $t: [0, 1] \to [0, \infty]$ with $t(1) = 0$, which is uniquely determined up to a positive multiplicative constant, such that for all $(x, y) \in [0, 1]^2$,
\[
T(x, y) = t^{-1}(\min(t(0), t(x) + t(y))). \tag{9}
\]
Note that a continuous Archimedean t-norm $T: [0, 1]^2 \to [0, 1]$ is called nilpotent whenever there is an $x \in \, ]0, 1[$ and $n \in \mathbb{N}$ such that $x_T^{(n)} = 0$, and it is called strict otherwise. Strict t-norms are also characterized by the cancellation property, i.e., $T(x, y) = T(x, z)$ only if either $x = 0$ or $y = z$. Each strict t-norm $T$ has an unbounded additive generator $t$, i.e., $t(0) = \infty$. Vice versa, each additive generator $t$ of a nilpotent t-norm $T$ is bounded, i.e., $t(0) < \infty$. Moreover, each strict t-norm $T$ is isomorphic to $T_P$ (i.e., there is an increasing bijection $\varphi: [0, 1] \to [0, 1]$ such that $T(x, y) = \varphi^{-1}(\varphi(x) \cdot \varphi(y))$) and each nilpotent t-norm $T$ is isomorphic to $T_L$.
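The representation (9) can be illustrated with the two prototypical generators: $t(x) = -\ln x$, which is unbounded and yields the strict t-norm $T_P$, and $t(x) = 1 - x$, which is bounded and yields the nilpotent t-norm $T_L$. The helper name `from_generator` and its argument layout are hypothetical; the value $t(0)$ is passed explicitly.

```python
import math


def from_generator(t, t_inv, t_at_zero):
    """T(x, y) = t^{-1}(min(t(0), t(x) + t(y))), cf. equation (9)."""
    def T(x, y):
        return t_inv(min(t_at_zero, t(x) + t(y)))
    return T


def t_product(x):
    """Unbounded generator of the product t-norm, with t(0) = infinity."""
    return math.inf if x == 0 else -math.log(x)


# Strict case: unbounded generator, t(0) = infinity.
T_P = from_generator(t_product, lambda u: math.exp(-u), math.inf)

# Nilpotent case: bounded generator, t(0) = 1.
T_L = from_generator(lambda x: 1.0 - x, lambda u: 1.0 - u, 1.0)

p = T_P(0.5, 0.4)   # agrees with 0.5 * 0.4 up to rounding
q = T_L(0.2, 0.3)   # the sum of generator values is clipped at t(0)
```

The clipping by $t(0)$ is what allows a nilpotent t-norm to return 0 for positive arguments, whereas for a strict t-norm the minimum never takes effect on $]0, 1]^2$.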
The combination of these results yields the following representation of continuous (i.e., divisible) t-norms:

Corollary 14. For a function $T: [0, 1]^2 \to [0, 1]$, the following are equivalent:
1. $T$ is a continuous t-norm.
2. $T$ is isomorphic to an ordinal sum whose summands contain only the t-norms $T_P$ and $T_L$.
3. There is a family $(]a_\alpha, e_\alpha[)_{\alpha \in A}$ of nonempty, pairwise disjoint open subintervals of $[0, 1]$ and a family $h_\alpha: [a_\alpha, e_\alpha] \to [0, \infty]$ of continuous, strictly decreasing functions with $h_\alpha(e_\alpha) = 0$ for each $\alpha \in A$ such that for all $(x, y) \in [0, 1]^2$,
\[
T(x, y) = \begin{cases}
h_\alpha^{-1}(\min(h_\alpha(a_\alpha), h_\alpha(x) + h_\alpha(y))) & \text{if } (x, y) \in [a_\alpha, e_\alpha[^2, \\
\min(x, y) & \text{otherwise.}
\end{cases} \tag{10}
\]
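The ordinal sum construction of Definition 12 can be sketched directly. The example uses the product t-norm on $[0, 0.5[$ and the Łukasiewicz t-norm on $[0.5, 1[$; this particular choice of intervals and summands is an illustrative assumption, as are the function names.

```python
def ordinal_sum(summands):
    """Ordinal sum of (a, e, T_alpha) triples with pairwise disjoint ]a, e[."""
    def T(x, y):
        for a, e, T_alpha in summands:
            if a <= x < e and a <= y < e:
                width = e - a
                # Rescale to [0, 1], apply the summand, and map back
                return a + width * T_alpha((x - a) / width, (y - a) / width)
        return min(x, y)
    return T


def t_product(x, y):
    return x * y


def t_lukasiewicz(x, y):
    return max(x + y - 1.0, 0.0)


T = ordinal_sum([(0.0, 0.5, t_product), (0.5, 1.0, t_lukasiewicz)])

inside_first = T(0.25, 0.25)    # handled by the rescaled product
inside_second = T(0.75, 0.75)   # handled by the rescaled Lukasiewicz t-norm
across = T(0.25, 0.75)          # arguments in different blocks: minimum
```

With an empty list of summands the function reduces to the minimum, matching the remark that the empty ordinal sum equals $T_M$.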
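Several examples of parameterized families of (continuous Archimedean) t-norms can be found in [29, 36]. We recall here only three such families.

Example 15.
1. The family $(T_\lambda^{SS})_{\lambda \in [-\infty, \infty]}$ of Schweizer–Sklar t-norms is given by
\[
T_\lambda^{SS}(x, y) = \begin{cases}
T_M(x, y) & \text{if } \lambda = -\infty, \\
T_P(x, y) & \text{if } \lambda = 0, \\
T_D(x, y) & \text{if } \lambda = \infty, \\
\left( \max(x^\lambda + y^\lambda - 1, 0) \right)^{1/\lambda} & \text{if } \lambda \in \, ]-\infty, 0[ \, \cup \, ]0, \infty[.
\end{cases}
\]
2. Additive generators $t_\lambda^{SS}: [0, 1] \to [0, \infty]$ of the continuous Archimedean members $(T_\lambda^{SS})_{\lambda \in ]-\infty, \infty[}$ of the family of Schweizer–Sklar t-norms are given by
\[
t_\lambda^{SS}(x) = \begin{cases}
-\ln x & \text{if } \lambda = 0, \\
\dfrac{1 - x^\lambda}{\lambda} & \text{if } \lambda \in \, ]-\infty, 0[ \, \cup \, ]0, \infty[.
\end{cases}
\]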
This family of t-norms is remarkable in the sense that it contains all four basic t-norms. The investigations of the associativity of duals of copulas in the framework of distribution functions led to the following problem: characterize all continuous (or, equivalently, nondecreasing) associative functions $F: [0, 1]^2 \to [0, 1]$ which satisfy, for each $x \in [0, 1]$, the boundary conditions $F(0, x) = F(x, 0) = 0$ and $F(x, 1) = F(1, x) = x$, such that the function $G: [0, 1]^2 \to [0, 1]$ given by $G(x, y) = x + y - F(x, y)$ is also associative. In [42] it was shown that then $F$ has to be an ordinal sum of members of the following family of t-norms.

Example 16.
1. The family $(T_\lambda^{F})_{\lambda \in [0, \infty]}$ of Frank t-norms (which were called fundamental t-norms in [43]) is given by
\[
T_\lambda^{F}(x, y) = \begin{cases}
T_M(x, y) & \text{if } \lambda = 0, \\
T_P(x, y) & \text{if } \lambda = 1, \\
T_L(x, y) & \text{if } \lambda = \infty, \\
\log_\lambda \left( 1 + \dfrac{(\lambda^x - 1)(\lambda^y - 1)}{\lambda - 1} \right) & \text{otherwise.}
\end{cases}
\]
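2. Additive generators $t_\lambda^{F}: [0, 1] \to [0, \infty]$ of the continuous Archimedean members $(T_\lambda^{F})_{\lambda \in ]0, \infty]}$ of the family of Frank t-norms are given by
\[
t_\lambda^{F}(x) = \begin{cases}
-\ln x & \text{if } \lambda = 1, \\
1 - x & \text{if } \lambda = \infty, \\
\ln \left( \dfrac{\lambda - 1}{\lambda^x - 1} \right) & \text{if } \lambda \in \, ]0, 1[ \, \cup \, ]1, \infty[.
\end{cases}
\]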
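The Frank family and the functional equation that motivates it can be explored numerically. The sketch below implements the four cases of Example 16 and pairs each member with its dual with respect to the standard negation, so that the sum $T(x, y) + S(x, y)$ can be compared with $x + y$. The function names are hypothetical, and the sample parameter values are arbitrary.

```python
import math


def frank(lam):
    """Frank t-norm with parameter lam in [0, inf]."""
    if lam == 0:
        return min
    if lam == 1:
        return lambda x, y: x * y
    if lam == math.inf:
        return lambda x, y: max(x + y - 1.0, 0.0)

    def T(x, y):
        inner = 1.0 + (lam ** x - 1.0) * (lam ** y - 1.0) / (lam - 1.0)
        return math.log(inner, lam)
    return T


def dual(T):
    """Dual operation with respect to the standard negation 1 - x."""
    return lambda x, y: 1.0 - T(1.0 - x, 1.0 - y)


# Largest deviation of T + S from x + y over a few sample points and parameters
points = [(0.2, 0.3), (0.5, 0.7), (0.9, 0.4), (1.0, 0.6)]
deviation = max(
    abs(frank(lam)(x, y) + dual(frank(lam))(x, y) - (x + y))
    for lam in (0, 0.5, 1, 2, 10, math.inf)
    for x, y in points
)
```

In floating-point arithmetic the deviation is only expected to be close to zero, not exactly zero.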
Another family used for modeling the intersection of fuzzy sets is the following family of t-norms (which was first introduced in [44] for the special case $\lambda \ge 1$ only). The idea was to use the parameter $\lambda$ as a reciprocal measure for the strength of the logical and. In this context, $\lambda = 1$ expresses the most demanding (i.e., smallest) and, and $\lambda = \infty$ the least demanding (i.e., largest) and.

Example 17.
1. The family $(T_\lambda^{Y})_{\lambda \in [0, \infty]}$ of Yager t-norms is given by
\[
T_\lambda^{Y}(x, y) = \begin{cases}
T_D(x, y) & \text{if } \lambda = 0, \\
T_M(x, y) & \text{if } \lambda = \infty, \\
\max\left( 1 - \left( (1 - x)^\lambda + (1 - y)^\lambda \right)^{1/\lambda}, 0 \right) & \text{otherwise.}
\end{cases}
\]
2. Additive generators $t_\lambda^{Y}: [0, 1] \to [0, \infty]$ of the nilpotent members $(T_\lambda^{Y})_{\lambda \in ]0, \infty[}$ of the family of Yager t-norms are given by $t_\lambda^{Y}(x) = (1 - x)^\lambda$.

Another interesting class of t-norms are internal triangular norms, i.e., t-norms $T: [0, 1]^2 \to [0, 1]$ such that $T(x, y) \in \{0, x, y\}$ for all $(x, y) \in [0, 1]^2$ (see also [29]).

Theorem 18. A function $T: [0, 1]^2 \to [0, 1]$ is an internal t-norm if and only if there is a subset $A \subset \, ]0, 1[^2$ such that $(x, y) \in A$ implies $(y, x) \in A$ (symmetry of $A$) and $(u, v) \in A$ for all $u \in \, ]0, x]$, $v \in \, ]0, y]$ (root property of $A$), and
\[
T(x, y) = \begin{cases}
0 & \text{if } (x, y) \in A, \\
\min(x, y) & \text{otherwise.}
\end{cases}
\]

Note that $T_M$ and $T_D$ are internal t-norms (related to $A = \emptyset$ and $A = \, ]0, 1[^2$, respectively). An important example of a $\vee$-preserving internal t-norm is the nilpotent minimum $T^{nM}: [0, 1]^2 \to [0, 1]$ given by
\[
T^{nM}(x, y) = \begin{cases}
0 & \text{if } x + y \le 1, \\
\min(x, y) & \text{otherwise.}
\end{cases}
\]
On the basis of these results, let us now turn to disjunction and implication operators on $[0, 1]$.
Disjunction

Disjunction operators on $[0, 1]$ are called triangular conorms, and they are usually denoted by the letter $S$. All results concerning triangular conorms can be derived from the corresponding results for triangular norms by means of duality. For a given t-norm $T: [0, 1]^2 \to [0, 1]$, its dual t-conorm $S: [0, 1]^2 \to [0, 1]$ is given by $S(x, y) = 1 - T(1 - x, 1 - y)$, i.e., the standard negation $N_s$ connects $T$ and its dual $S$.
The four basic t-conorms (dual to the basic t-norms) are $S_M$, $S_D$, $S_P$, and $S_L$ given by
\[
\begin{aligned}
S_M(x, y) &= \max(x, y), \\
S_D(x, y) &= \begin{cases} x & \text{if } y = 0, \\ y & \text{if } x = 0, \\ 1 & \text{otherwise,} \end{cases} \\
S_P(x, y) &= 1 - (1 - x)(1 - y) = x + y - xy, \\
S_L(x, y) &= \min(x + y, 1).
\end{aligned}
\]
We only briefly note that in ordinal sums of t-conorms the operator max plays the same role as the operator min in the case of ordinal sums of t-norms. Concerning an additive generator $s: [0, 1] \to [0, \infty]$ of a continuous Archimedean t-conorm $S: [0, 1]^2 \to [0, 1]$, $s$ is continuous and strictly increasing with $s(0) = 0$, and $S(x, y) = s^{-1}(\min(s(1), s(x) + s(y)))$. If $t: [0, 1] \to [0, \infty]$ is an additive generator of the corresponding dual t-norm $T$, then $s = t \circ N_s$.
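The duality between the basic t-norms and the basic t-conorms listed above can be checked numerically. In this sketch the drastic product is written in its usual form (the minimum when one argument equals 1, and 0 otherwise), which is assumed here because its definition lies outside this passage; all function names are hypothetical.

```python
def dual_conorm(T):
    """S(x, y) = 1 - T(1 - x, 1 - y)."""
    return lambda x, y: 1.0 - T(1.0 - x, 1.0 - y)


def t_drastic(x, y):
    """Drastic product (assumed standard definition)."""
    if x == 1.0:
        return y
    if y == 1.0:
        return x
    return 0.0


def s_drastic(x, y):
    """Closed form of S_D as given in the text."""
    if y == 0.0:
        return x
    if x == 0.0:
        return y
    return 1.0


pairs = [
    (dual_conorm(min), max),
    (dual_conorm(t_drastic), s_drastic),
    (dual_conorm(lambda x, y: x * y), lambda x, y: x + y - x * y),
    (dual_conorm(lambda x, y: max(x + y - 1.0, 0.0)), lambda x, y: min(x + y, 1.0)),
]

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
max_gap = max(abs(S_dual(x, y) - S_closed(x, y))
              for S_dual, S_closed in pairs for x in grid for y in grid)
```

Each dual obtained from the formula should agree with the corresponding closed form up to rounding.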
Implication

Turning our attention to the implication operators on $[0, 1]$, observe that the residual implications forming an adjoint pair $(T, R_T)$ are related to $\vee$-preserving t-norms (recall that, in the lattice $[0, 1]$, the fact that $T$ is $\vee$-preserving is equivalent to the left continuity of $T$ as a function from $[0, 1]^2$ to $[0, 1]$, so both notations are used in the literature). A deep survey on $\vee$-preserving t-norms is due to Jenei [45]. In the case of BL-logics, residual implications are related to divisible (i.e., continuous) t-norms. Observe that $R_T: [0, 1]^2 \to [0, 1]$ is continuous if and only if $T$ is a nilpotent t-norm. Similarly, $N_T: [0, 1] \to [0, 1]$, $N_T(x) = R_T(x, 0)$, is a strong negation if and only if $T$ is a nilpotent t-norm. Note, however, that there are noncontinuous $\vee$-preserving t-norms $T$ such that $N_T$ is a strong negation. As an example recall the nilpotent minimum $T^{nM}$, in which case $N_{T^{nM}} = N_s$. Similarly, $N_T = N_s$ for the t-norm $T$ given in (8) for $a = 0$. Another property of nilpotent t-norms is also remarkable: for each nilpotent t-norm $T$, its adjoint residual implication $R_T$ coincides with the S-implication $I_{N_T, S}$, where $S: [0, 1]^2 \to [0, 1]$ is the t-conorm $N_T$-dual to $T$, $S(x, y) = N_T(T(N_T(x), N_T(y)))$. We recall the three basic residual implications:
\[
\begin{aligned}
R_{T_M}(x, y) &= \begin{cases} 1 & \text{if } x \le y, \\ y & \text{otherwise,} \end{cases} && \text{(Gödel implication)} \\
R_{T_P}(x, y) &= \begin{cases} 1 & \text{if } x \le y, \\ \dfrac{y}{x} & \text{otherwise,} \end{cases} && \text{(Goguen implication)} \\
R_{T_L}(x, y) &= \min(1, 1 - x + y). && \text{(Łukasiewicz implication)}
\end{aligned}
\]
Distinguished examples of S-implications based on the standard negation $N_s$ are as follows (note that $I_{N_s, S_L} = R_{T_L}$ is the Łukasiewicz implication):
\[
\begin{aligned}
I_{N_s, S_P}(x, y) &= 1 - x + xy, && \text{(Reichenbach implication)} \\
I_{N_s, S_M}(x, y) &= \max(1 - x, y). && \text{(Kleene–Dienes implication)}
\end{aligned}
\]
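The basic implications listed above can be written down directly. The sketch assumes the usual form of an S-implication, $I_{N,S}(x, y) = S(N(x), y)$ (the general definition is introduced outside this passage), and uses hypothetical function names.

```python
def godel(x, y):
    """Residual implication of the minimum t-norm."""
    return 1.0 if x <= y else y


def goguen(x, y):
    """Residual implication of the product t-norm."""
    return 1.0 if x <= y else y / x


def lukasiewicz(x, y):
    """Residual implication of the Lukasiewicz t-norm."""
    return min(1.0, 1.0 - x + y)


def standard_negation(x):
    return 1.0 - x


def s_implication(S, N=standard_negation):
    """Assumed form of an S-implication: I(x, y) = S(N(x), y)."""
    return lambda x, y: S(N(x), y)


reichenbach = s_implication(lambda a, b: a + b - a * b)    # based on S_P
kleene_dienes = s_implication(max)                         # based on S_M
lukasiewicz_s = s_implication(lambda a, b: min(a + b, 1.0))  # based on S_L

# The S-implication built from S_L coincides with the residual one
grid = [i / 10 for i in range(11)]
same = all(abs(lukasiewicz_s(x, y) - lukasiewicz(x, y)) < 1e-12
           for x in grid for y in grid)
```

The other two S-implications differ from the residual implications of the corresponding t-norms, e.g., the Reichenbach implication does not return 1 whenever $x \le y$.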
Formally, all BL-logics based on a strict t-norm $T$ are isomorphic to the product logic, i.e., to the BL-logic based on $T_P$. Adding a new connective to these logics, namely a strong negation $N$ (observe that $N_T = N_{T_P} = N_*$ is the weakest negation for each strict t-norm $T$), we obtain at least two different types of logics. One of them is based (up to isomorphism) on $T_P$ and $N_s$, while the other one is based on the Hamacher product $T_H$ and $N_s$, where $T_H: [0, 1]^2 \to [0, 1]$ is defined by
\[
T_H(x, y) = \frac{xy}{x + y - xy},
\]
using the convention $\frac{0}{0} = 0$. For more details about strict BL-logics with a strong negation we refer to [46].
10.5 Logical Connectives on Infinite Discrete Chains

In Section 10.3, we discussed logical connectives on finite discrete lattices forming a chain $L_{(n)}$ with $n \in \mathbb{N}$, whereas Section 10.4 deals with logical connectives on the unit interval, i.e., on a chain with infinitely many elements. Some other discrete chain lattices (necessarily infinite) have been discussed in [24] and will be the focus of this section, namely the truth-value lattices $L_{(\infty)} = \{0, 1, \ldots, \infty\}$ and $L_{(-\infty)} = \{-\infty, \ldots, -1, 0\}$ as well as $L_{(-\infty, \infty)} = \{-\infty, \ldots, -1, 0, 1, \ldots, \infty\}$. As already mentioned in Section 10.2, there is no strong negation on the lattice $L_{(\infty)}$ (equipped with the standard order), and thus there is no duality between conjunction operators on $L_{(\infty)}$ and disjunction operators on $L_{(\infty)}$. Further, there is no divisible Archimedean conjunction operator on $L_{(\infty)}$. (The Archimedean property is defined similarly as on $[0, 1]$, see also p. 211, and on $L_{(\infty)}$ it is equivalent to the nonexistence of nontrivial idempotent elements.) The divisibility of conjunction and disjunction operators on $L_{(\infty)}$ is characterized by the 1-Lipschitz property, similarly as in the case of finite chains $L_{(n)}$. However, there is a unique divisible Archimedean disjunction operator $D_+: L_{(\infty)}^2 \to L_{(\infty)}$ given by $D_+(x, y) = x + y$. The following result from [24] characterizes all divisible conjunction operators on $L_{(\infty)}$.

Theorem 19. A mapping $C: L_{(\infty)}^2 \to L_{(\infty)}$ is a divisible conjunction operator on $L_{(\infty)}$ if and only if there is a strictly increasing sequence $(n_i)_{i=0}^{\infty}$ of elements of $L_{(\infty)}$ with $n_0 = 0$ such that
\[
C(x, y) = \begin{cases}
\max(n_i, \, x + y - n_{i+1}) & \text{if } (x, y) \in [n_i, n_{i+1}[^2, \\
\min(x, y) & \text{otherwise.}
\end{cases} \tag{11}
\]
For divisible disjunction operators on $L_{(\infty)}$ we have the following characterization.

Theorem 20. A mapping $D: L_{(\infty)}^2 \to L_{(\infty)}$ is a divisible disjunction operator on $L_{(\infty)}$ if and only if there is a strictly increasing sequence $(n_i)_{i=0}^{m}$ or $(n_i)_{i=0}^{\infty}$ of elements of $L_{(\infty)}$ with $n_0 = 0$, and whenever the sequence is finite then $n_m = \infty$, such that
\[
D(x, y) = \begin{cases}
\min(n_{i+1}, \, x + y - n_i) & \text{if } (x, y) \in [n_i, n_{i+1}[^2, \\
\max(x, y) & \text{otherwise.}
\end{cases}
\]

Observe that divisible disjunction operators on $L_{(\infty)}$ are in a one-to-one correspondence with the subsets of $\mathbb{N}$ (nontrivial idempotent elements of $D$), while divisible conjunction operators on $L_{(\infty)}$ are related to infinite subsets of $\mathbb{N}$. Note that for any divisible disjunction operator $D: L_{(\infty)}^2 \to L_{(\infty)}$, the mapping $C: L_{(\infty)}^2 \to L_{(\infty)}$ given by $C(x, y) = x + y - D(x, y)$, using the convention $x + y - D(x, y) = \min(x, y)$ if $\max(x, y) = \infty$, is a conjunction operator on $L_{(\infty)}$. It is divisible if and only if the set of idempotent elements of $D$ is infinite. For example, for the only Archimedean divisible disjunction operator $D_+$, the corresponding conjunction operator on $L_{(\infty)}$ is just the weakest one, i.e., $C_*$ (which evidently is not divisible). The above relation means that the Frank functional equation $C(x, y) + D(x, y) = x + y$ on $L_{(\infty)}$ also has nondivisible solutions w.r.t. the conjunction operators $C$. Concerning the implication operators on $L_{(\infty)}$, it is remarkable that there is no implication operator which is simultaneously a D-implication and a residual implication operator related to some divisible conjunction operator on $L_{(\infty)}$. This is no longer true if we consider nondivisible conjunction operators on $L_{(\infty)}$. We give here some examples:

1. For $n \in \mathbb{N}$, let the negation $N_n: L_{(\infty)} \to L_{(\infty)}$ be given by
\[
N_n(x) = \begin{cases}
\infty & \text{if } x = 0, \\
n - x & \text{if } x \in [1, n[, \\
0 & \text{otherwise.}
\end{cases}
\]
Then
\[
I_{N_n, D_+}(x, y) = \begin{cases}
\infty & \text{if } x = 0, \\
n - x + y & \text{if } x \in [1, n[, \\
y & \text{otherwise.}
\end{cases}
\]
2. For the weakest conjunction operator $C_*$ on $L_{(\infty)}$, we get
\[
R_{C_*}(x, y) = \begin{cases}
\infty & \text{if } x < \infty, \\
y & \text{if } x = \infty.
\end{cases}
\]
Observe that $R_{C_*} = I_{N^*, D^*}$, i.e., the residual implication related to the weakest conjunction operator $C_*$ on $L_{(\infty)}$ coincides with the D-implication with respect to the strongest negation $N^*: L_{(\infty)} \to L_{(\infty)}$ and the strongest disjunction operator $D^*$ on $L_{(\infty)}$.

3. Let $C: L_{(\infty)}^2 \to L_{(\infty)}$ be a divisible conjunction operator determined by the sequence $(n_i)_{i=0}^{\infty}$. Then the corresponding residual implication operator $R_C: L_{(\infty)}^2 \to L_{(\infty)}$ is given by
\[
R_C(x, y) = \begin{cases}
\infty & \text{if } x \le y, \\
n_{i+1} - x + y & \text{if } x > y \text{ and } (x, y) \in [n_i, n_{i+1}[^2, \\
y & \text{otherwise.}
\end{cases}
\]

Logical connectives on the lattice $L_{(-\infty)} = \{-\infty, \ldots, -1, 0\}$ can be derived from the logical connectives on $L_{(\infty)}$; only the roles of conjunction operators and disjunction operators are reversed. So, e.g., the only divisible Archimedean conjunction operator $C_+: L_{(-\infty)}^2 \to L_{(-\infty)}$ is given by $C_+(x, y) = x + y$ (and there is no divisible Archimedean disjunction operator on $L_{(-\infty)}$).

Another interesting discrete chain is the lattice $L_{(-\infty, \infty)} = \{-\infty, \ldots, -1, 0, 1, \ldots, \infty\}$. Following [24], each strong negation on $L_{(-\infty, \infty)}$ is determined by its value in 0, and thus each strong negation on $L_{(-\infty, \infty)}$ belongs to the family $(N_n)_{n \in \mathbb{Z}}$, where $\mathbb{Z}$ is the set of all integers, and $N_n: L_{(-\infty, \infty)} \to L_{(-\infty, \infty)}$ is given by $N_n(x) = n - x$. The existence of strong negations ensures the duality between the classes of conjunction operators and disjunction operators on $L_{(-\infty, \infty)}$. Divisible conjunction operators on $L_{(-\infty, \infty)}$ are characterized by infinite sets of idempotent elements. Similarly as in the case of conjunction operators on $L_{(\infty)}$ (even the same formula can be applied), the only restriction is that there are always infinitely many idempotent elements from the set $\{0, 1, \ldots\}$. Take, e.g., the set $J = \{-\infty, 0, 1, \ldots, \infty\}$. Then the corresponding conjunction operator $C_J: L_{(-\infty, \infty)}^2 \to L_{(-\infty, \infty)}$ is given by
\[
C_J(x, y) = \begin{cases}
x + y & \text{if } (x, y) \in \{-\infty, \ldots, -1, 0\}^2, \\
\min(x, y) & \text{otherwise.}
\end{cases}
\]
Taking the strong negation $N_0: L_{(-\infty, \infty)} \to L_{(-\infty, \infty)}$ given by $N_0(x) = -x$, the dual disjunction operator $D_J: L_{(-\infty, \infty)}^2 \to L_{(-\infty, \infty)}$ is given by
\[
D_J(x, y) = \begin{cases}
x + y & \text{if } (x, y) \in \{0, 1, \ldots, \infty\}^2, \\
\max(x, y) & \text{otherwise.}
\end{cases}
\]
Concerning the implication operators on $L_{(-\infty, \infty)}$, we introduce only two examples based on the logical connectives mentioned above. The residual implication $R_{C_J}: L_{(-\infty, \infty)}^2 \to L_{(-\infty, \infty)}$ is given by
\[
R_{C_J}(x, y) = \begin{cases}
\infty & \text{if } x \le y, \\
y - x & \text{if } x > y \text{ and } (x, y) \in \{-\infty, \ldots, -1, 0\}^2, \\
y & \text{otherwise.}
\end{cases}
\]
The D-implication $I_{N_0, D_J}: L_{(-\infty, \infty)}^2 \to L_{(-\infty, \infty)}$ is given by
\[
I_{N_0, D_J}(x, y) = \begin{cases}
y - x & \text{if } x \le 0 \le y, \\
y & \text{if } 0 < x \text{ and } -x \le y, \\
-x & \text{otherwise.}
\end{cases}
\]
10.6 Logical Connectives on Interval-Valued Truth-Value Lattices

A genuine model of uncertainty of truth-values in many-valued logics is formed by intervals of some underlying lattice $L$. Denote by $L^I$ the set of all intervals $[x, y] = \{z \in L \mid x \le z \le y\}$ with $x, y \in L$ and $x \le y$. Evidently, $L$ can be embedded into $L^I$ by means of the trivial intervals $[x, x] = \{x\}$, $x \in L$. Moreover, $L^I$ is a lattice with bottom element $[0, 0]$ and top element $[1, 1]$, and with join and meet inherited from the original lattice $L$, i.e.,

$$[x, y] \vee [u, v] = [x \vee u, y \vee v] \quad\text{and}\quad [x, y] \wedge [u, v] = [x \wedge u, y \wedge v].$$

Note that we cannot repeat the approach from interval arithmetic [47] when looking for the logical connectives on $L^I$. For example, for any non-trivial lattice $L$, take any element $a \in L \setminus \{0, 1\}$. For the weakest conjunctor $C_*: L^2 \to L$, putting

$$C_*^I([a, 1], [a, 1]) = \{z \in L \mid z = C_*(x, y),\ (x, y) \in [a, 1]^2\},$$

we see that the result of this operation is $[a, 1] \cup \{0\}$, which is an element of $L^I$ only if $[a, 1] \cup \{0\} = L$. Therefore we should elaborate the logical connectives on $L^I$ applying the approaches described in Section 10.2.

In most cases only the interval lattice $[0, 1]^I$ is considered (see, e.g., [45, 48–50]), and thus in this section we will deal with this special case only. To simplify the notation, we denote $[0, 1]^I = \mathcal{L}$. Observe that the lattice $\mathcal{L}$ is isomorphic to the lattice $L^* = \{(a, b) \mid a, b \in [0, 1],\ a + b \le 1\}$, which is the background of the intuitionistic fuzzy logic introduced by Atanassov [51] (for a critical comment about the mathematical misuse of the word 'intuitionistic' in this context see [52]), and that the logical connectives on $L^*$ are extensively discussed in [53, 54].

Each negation $N: [0, 1] \to [0, 1]$ induces a negation $\mathcal{N}: \mathcal{L} \to \mathcal{L}$ given by $\mathcal{N}([x, y]) = [N(y), N(x)]$, but not vice versa. However, for the strong negations on $\mathcal{L}$ we have the following result (compare also [53, 54]).

Theorem 21. A mapping $\mathcal{N}: \mathcal{L} \to \mathcal{L}$ is a strong negation on $\mathcal{L}$ if and only if there is a strong negation $N: [0, 1] \to [0, 1]$ such that $\mathcal{N}([x, y]) = [N(y), N(x)]$.

On the basis of the standard negation $N_s$ on $[0, 1]$, we introduce the standard negation $\mathcal{N}_s$ on $\mathcal{L}$ given by $\mathcal{N}_s([x, y]) = [1 - y, 1 - x]$.

Conjunction operators on $\mathcal{L}$ are discussed, e.g., in [55, 56] (compare also [53, 54]). We can distinguish four classes of conjunction operators on $\mathcal{L}$:

(L1) t-representable conjunction operators $\mathcal{C}_{T_1,T_2}: \mathcal{L}^2 \to \mathcal{L}$, given by $\mathcal{C}_{T_1,T_2}([x, y], [u, v]) = [T_1(x, u), T_2(y, v)]$, where $T_1, T_2: [0, 1]^2 \to [0, 1]$ are t-norms such that $T_1 \le T_2$;

(L2) lower pseudo-t-representable conjunction operators $\mathcal{C}_T: \mathcal{L}^2 \to \mathcal{L}$, given by $\mathcal{C}_T([x, y], [u, v]) = [T(x, u), \max(T(x, v), T(u, y))]$, where $T: [0, 1]^2 \to [0, 1]$ is a t-norm;

(L3) upper pseudo-t-representable conjunction operators $\mathcal{C}^T: \mathcal{L}^2 \to \mathcal{L}$, given by $\mathcal{C}^T([x, y], [u, v]) = [\min(T(x, v), T(u, y)), T(y, v)]$, where $T: [0, 1]^2 \to [0, 1]$ is a t-norm;
(L4) non-representable conjunction operators, i.e., conjunction operators on $\mathcal{L}$ belonging neither to (L1), nor to (L2), nor to (L3).

Observe that for any t-norms $T_1, T_2: [0, 1]^2 \to [0, 1]$ with $T_1 \le T_2$, we have $\mathcal{C}_{T_1} \le \mathcal{C}_{T_1,T_2} \le \mathcal{C}^{T_2}$. The strongest conjunction operator $\mathcal{C}^*: \mathcal{L}^2 \to \mathcal{L}$ is t-representable because $\mathcal{C}^* = \mathcal{C}_{T_M,T_M}$, while the weakest conjunction operator $\mathcal{C}_*: \mathcal{L}^2 \to \mathcal{L}$ is non-representable. Note that taking the weakest t-norm $T_D$, the corresponding t-representable conjunction operator $\mathcal{C}_{T_D,T_D}: \mathcal{L}^2 \to \mathcal{L}$ fulfils $\mathcal{C}_{T_D,T_D}([a, 1], [a, 1]) = [0, 1]$ for all $a \in [0, 1[$, while $\mathcal{C}_*([a, 1], [a, 1]) = [0, 0]$ whenever $a \in [0, 1[$.

An interesting parametric class of conjunction operators $\mathcal{C}_{a,T}: \mathcal{L}^2 \to \mathcal{L}$ with $a \in [0, 1]$, where $T: [0, 1]^2 \to [0, 1]$ is a t-norm, is given by (see [56])

$$\mathcal{C}_{a,T}([x, y], [u, v]) = [T(x, u), \max(T(a, T(y, v)), T(x, v), T(u, y))].$$

Then $\mathcal{C}_{1,T} = \mathcal{C}_{T,T}$ is t-representable, $\mathcal{C}_{0,T} = \mathcal{C}_T$ is lower pseudo-t-representable, and for $a \in\, ]0, 1[$, $\mathcal{C}_{a,T}$ is a non-representable conjunction operator on $\mathcal{L}$.

Observe that there are no divisible conjunction operators on $\mathcal{L}$, and thus $\mathcal{L}$ cannot serve as a carrier for a BL-logic. Moreover, continuous conjunction operators on $\mathcal{L}$ are not necessarily preserving. For example, the conjunction operator $\mathcal{C}: \mathcal{L}^2 \to \mathcal{L}$ given by

$$\mathcal{C}([x, y], [u, v]) = [\max(0, x + u - (1 - y)(1 - v) - 1), \max(0, y + v - 1)]$$

is continuous and non-representable. However, it is not preserving (see [54]).

Disjunction operators $\mathcal{D}: \mathcal{L}^2 \to \mathcal{L}$ can be derived from conjunction operators $\mathcal{C}: \mathcal{L}^2 \to \mathcal{L}$ by duality, e.g., applying the standard negation $\mathcal{N}_s: \mathcal{L} \to \mathcal{L}$, putting

$$\mathcal{D}([x, y], [u, v]) = \mathcal{N}_s(\mathcal{C}(\mathcal{N}_s([x, y]), \mathcal{N}_s([u, v]))).$$

Therefore we distinguish again four classes of disjunction operators on $\mathcal{L}$, namely,

(L1′) t-representable disjunction operators $\mathcal{D}_{S_1,S_2}: \mathcal{L}^2 \to \mathcal{L}$, given by $\mathcal{D}_{S_1,S_2}([x, y], [u, v]) = [S_1(x, u), S_2(y, v)]$, where $S_1, S_2: [0, 1]^2 \to [0, 1]$ are t-conorms such that $S_1 \le S_2$;

(L2′) lower pseudo-t-representable disjunction operators $\mathcal{D}_S: \mathcal{L}^2 \to \mathcal{L}$, given by $\mathcal{D}_S([x, y], [u, v]) = [S(x, u), \max(S(x, v), S(u, y))]$, where $S: [0, 1]^2 \to [0, 1]$ is a t-conorm;

(L3′) upper pseudo-t-representable disjunction operators $\mathcal{D}^S: \mathcal{L}^2 \to \mathcal{L}$, given by $\mathcal{D}^S([x, y], [u, v]) = [\min(S(x, v), S(u, y)), S(y, v)]$, where $S: [0, 1]^2 \to [0, 1]$ is a t-conorm;

(L4′) non-representable disjunction operators, i.e., disjunction operators on $\mathcal{L}$ belonging neither to (L1′), nor to (L2′), nor to (L3′).

Note that the class (L1′) is dual to (L1), (L2′) is dual to (L3), (L3′) is dual to (L2), and (L4′) is dual to (L4). Recall that the weakest disjunction operator $\mathcal{D}_*: \mathcal{L}^2 \to \mathcal{L}$ is t-representable, $\mathcal{D}_* = \mathcal{D}_{S_M,S_M}$, while the strongest disjunction operator $\mathcal{D}^*: \mathcal{L}^2 \to \mathcal{L}$ is non-representable. We introduce here the parametric class $\mathcal{D}_{a,S}$ with $a \in [0, 1]$ of disjunction operators on $\mathcal{L}$ generated by a t-conorm $S: [0, 1]^2 \to [0, 1]$ and given by

$$\mathcal{D}_{a,S}([x, y], [u, v]) = [\min(S(a, S(x, u)), S(u, y), S(x, v)), S(y, v)].$$
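The conjunction classes (L1) to (L3) and the parametric class $\mathcal{C}_{a,T}$ can be explored numerically. The sketch below is an illustration under stated assumptions, not the authors' code: intervals are represented as pairs of exact fractions, the grid of test intervals is arbitrary, and all function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def t_luk(a, b):
    """Lukasiewicz t-norm T_L."""
    return max(F(0), a + b - 1)

def t_min(a, b):
    """Minimum t-norm T_M."""
    return min(a, b)

def t_drastic(a, b):
    """Drastic product T_D, the weakest t-norm."""
    return min(a, b) if max(a, b) == 1 else F(0)

def c_rep(t1, t2):
    """(L1) t-representable operator: [T1(x,u), T2(y,v)]."""
    return lambda p, q: (t1(p[0], q[0]), t2(p[1], q[1]))

def c_lower(t):
    """(L2) lower pseudo-t-representable: [T(x,u), max(T(x,v), T(u,y))]."""
    return lambda p, q: (t(p[0], q[0]), max(t(p[0], q[1]), t(q[0], p[1])))

def c_upper(t):
    """(L3) upper pseudo-t-representable: [min(T(x,v), T(u,y)), T(y,v)]."""
    return lambda p, q: (min(t(p[0], q[1]), t(q[0], p[1])), t(p[1], q[1]))

def c_param(a, t):
    """Parametric class C_{a,T} from the text."""
    return lambda p, q: (
        t(p[0], q[0]),
        max(t(a, t(p[1], q[1])), t(p[0], q[1]), t(q[0], p[1])),
    )

def leq(p, q):
    """Lattice order on intervals: componentwise comparison of endpoints."""
    return p[0] <= q[0] and p[1] <= q[1]

# Illustrative grid of intervals [x, y] with endpoints in {0, 1/4, ..., 1}.
GRID = [F(k, 4) for k in range(5)]
INTERVALS = [(x, y) for x in GRID for y in GRID if x <= y]

def chain_holds(t1, t2):
    """Check C_{T1} <= C_{T1,T2} <= C^{T2} on the grid (requires T1 <= T2)."""
    lo, mid, up = c_lower(t1), c_rep(t1, t2), c_upper(t2)
    return all(
        leq(lo(p, q), mid(p, q)) and leq(mid(p, q), up(p, q))
        for p, q in product(INTERVALS, repeat=2)
    )

def same_on_grid(c1, c2):
    """True if two operators agree on every pair of grid intervals."""
    return all(c1(p, q) == c2(p, q) for p, q in product(INTERVALS, repeat=2))
```

For instance, `chain_holds(t_luk, t_min)` checks the stated ordering for $T_1 = T_L \le T_2 = T_M$, and `same_on_grid(c_param(F(1), t_luk), c_rep(t_luk, t_luk))` checks the boundary case $\mathcal{C}_{1,T} = \mathcal{C}_{T,T}$ on the grid.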
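Observe that $\mathcal{D}_{a,S}$ is $\mathcal{N}_s$-dual to $\mathcal{C}_{1-a,T}$ whenever $T$ is a t-norm which is $N_s$-dual to $S$. Moreover, $\mathcal{D}_{0,S} = \mathcal{D}_{S,S}$ is a t-representable, $\mathcal{D}_{1,S} = \mathcal{D}^S$ is an upper pseudo-t-representable, and for $a \in\, ]0, 1[$, $\mathcal{D}_{a,S}$ is a non-representable disjunction operator.

Among several types of implication operators on $\mathcal{L}$, we recall the two of them discussed in Section 10.2. For a conjunction operator $\mathcal{C}: \mathcal{L}^2 \to \mathcal{L}$, the corresponding residual implication $\mathcal{R}_{\mathcal{C}}: \mathcal{L}^2 \to \mathcal{L}$ is given by

$$\mathcal{R}_{\mathcal{C}}([x, y], [u, v]) = \sup\{[\alpha, \beta] \in \mathcal{L} \mid \mathcal{C}([x, y], [\alpha, \beta]) \le [u, v]\}.$$

Some examples of residual implications $\mathcal{R}_{\mathcal{C}}$ are given as follows:

$$\mathcal{R}_{\mathcal{C}_{T_1,T_2}}([x, y], [u, v]) = [\min(R_{T_1}(x, u), R_{T_2}(y, v)), R_{T_2}(y, v)],$$
$$\mathcal{R}_{\mathcal{C}_T}([x, y], [u, v]) = [\min(R_T(x, u), R_T(y, v)), R_T(x, v)],$$
$$\mathcal{R}_{\mathcal{C}^T}([x, y], [u, v]) = [\min(R_T(x, u), R_T(y, v)), R_T(y, v)].$$

Recall that the mapping $\mathcal{N}_{\mathcal{C}}: \mathcal{L} \to \mathcal{L}$ given by $\mathcal{N}_{\mathcal{C}}([x, y]) = \mathcal{R}_{\mathcal{C}}([x, y], [0, 0])$ is a negation on $\mathcal{L}$ for any conjunction operator $\mathcal{C}: \mathcal{L}^2 \to \mathcal{L}$.

The D-implication $\mathcal{I}_{\mathcal{N},\mathcal{D}}: \mathcal{L}^2 \to \mathcal{L}$ is given by

$$\mathcal{I}_{\mathcal{N},\mathcal{D}}([x, y], [u, v]) = \mathcal{D}(\mathcal{N}([x, y]), [u, v]),$$

where $\mathcal{D}$ is a disjunction operator on $\mathcal{L}$ and $\mathcal{N}$ is a negation on $\mathcal{L}$. Some examples of D-implications on $\mathcal{L}$ are as follows:

$$\mathcal{I}_{\mathcal{N}_s,\mathcal{D}_*}([x, y], [u, v]) = [\max(1 - y, u), \max(1 - x, v)],$$
$$\mathcal{I}_{\mathcal{N}_s,\mathcal{D}_{S_1,S_2}}([x, y], [u, v]) = [I_{N_s,S_1}(y, u), I_{N_s,S_2}(x, v)],$$
$$\mathcal{I}_{\mathcal{N},\mathcal{D}_S}([x, y], [u, v]) = [I_{N,S}(y, u), \max(I_{N,S}(y, v), I_{N,S}(x, u))],$$
$$\mathcal{I}_{\mathcal{N}_s,\mathcal{D}^{S_L}}([x, y], [u, v]) = [\min(I_{N_s,S_L}(x, u), I_{N_s,S_L}(y, v)), I_{N_s,S_L}(x, v)],$$

where $\mathcal{N}([x, y]) = [N(y), N(x)]$ (compare Theorem 21). Observe that $\mathcal{N}_{\mathcal{C}_{T_L}} = \mathcal{N}_s$ and that $\mathcal{I}_{\mathcal{N}_s,\mathcal{D}^{S_L}} = \mathcal{R}_{\mathcal{C}_{T_L}}$, where the upper pseudo-t-representable disjunction operator $\mathcal{D}^{S_L}$ is $\mathcal{N}_s$-dual to the lower pseudo-t-representable conjunction operator $\mathcal{C}_{T_L}$. Moreover, all these operators are continuous, thus copying the properties of Łukasiewicz operators in $[0, 1]$-valued logics.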
10.7 Logical Connectives on Other Lattices of Truth-Values

We have already seen in Section 10.6 that logical connectives on more complex lattices might, but need not, be related to logical connectives on basic or underlying lattices. Therefore, we now turn to lattice structures $L$ which are built from a family of given lattices $L_k$ with $k \in K$ and discuss the corresponding logical connectives. We will see that although we can always look at the new lattice $L$ independently of the originally given lattices $L_k$ and of the applied construction method (compare, e.g., the approach of Goguen [57] to L-fuzzy sets), several logical connectives on $L$ can be derived from the corresponding logical connectives on the lattices $L_k$. In particular, we will focus on logical connectives on Cartesian products of lattices, and on horizontal as well as ordinal sums of lattices.

Cartesian Products

The most common construction method is the Cartesian product. Therefore, consider a system $(L_k)_{k\in K}$ of lattices and put

$$L = \prod_{k\in K} L_k = \{(x_k)_{k\in K} \mid x_k \in L_k\}.$$

The lattice operations on $L$ are defined coordinatewise, and $1 = (1_k)_{k\in K}$ and $0 = (0_k)_{k\in K}$ are its top and bottom elements, respectively, such that $L$ is again a bounded lattice. Evidently, for any kind of logical connectives on $L$, we can derive them from the corresponding logical connectives on the $L_k$'s. However, not each logical connective on $L$ can be constructed coordinatewise. For example, let $L = L_1 \times L_2$ and let $N_i: L_i \to L_i$ be a (strong) negation on $L_i$ with $i \in \{1, 2\}$. Then also $N: L \to L$ given by $N(x_1, x_2) = (N_1(x_1), N_2(x_2))$ is a (strong) negation. However, if $L_1 = L_2$ and $N_1 = N_2$, then also $N: L \to L$ given by $N(x_1, x_2) = (N_1(x_2), N_1(x_1))$ is a (strong) negation on $L$. Also the strongest (weakest) negation $N^*: L \to L$ ($N_*: L \to L$) is not built coordinatewise.

Conjunction operators on product lattices were discussed in [25] (see also [58, 59]). The strongest conjunction operator $C^*: L^2 \to L$ is derived coordinatewise from the strongest conjunction operators $C_k^*: L_k^2 \to L_k$, $k \in K$, contrary to the weakest conjunction operator $C_*: L^2 \to L$. Sufficient conditions ensuring that a conjunction operator $C: L^2 \to L$ is defined coordinatewise, i.e.,

$$C((x_k)_{k\in K}, (y_k)_{k\in K}) = (C_k(x_k, y_k))_{k\in K},$$

are the $\vee$-preserving property or the $\wedge$-preserving property of $C$ (see [25]). The situation with the disjunction operators and the implication operators is the same.
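The claim that a product lattice admits strong negations which are not built coordinatewise can be illustrated on a small finite example. The sketch below assumes, purely for illustration, that $L_1 = L_2 = \{0, 1, 2, 3\}$ with $N_1(x) = 3 - x$; the names are hypothetical.

```python
from itertools import product

N_TOP = 3                       # assumed top element of the finite chain L_1
CHAIN = range(N_TOP + 1)        # L_1 = L_2 = {0, 1, 2, 3}
PAIRS = list(product(CHAIN, repeat=2))  # elements of L = L_1 x L_1

def n1(x):
    """Strong negation on the chain: N_1(x) = N_TOP - x."""
    return N_TOP - x

def neg_coord(p):
    """Coordinatewise negation N(x1, x2) = (N_1(x1), N_1(x2))."""
    return (n1(p[0]), n1(p[1]))

def neg_swap(p):
    """Swapped negation N(x1, x2) = (N_1(x2), N_1(x1)) from the text."""
    return (n1(p[1]), n1(p[0]))

def leq(p, q):
    """Coordinatewise order on the product lattice."""
    return p[0] <= q[0] and p[1] <= q[1]

def is_strong_negation(neg):
    """Check boundary conditions, order reversal, and involutivity on L."""
    boundary = neg((0, 0)) == (N_TOP, N_TOP) and neg((N_TOP, N_TOP)) == (0, 0)
    antitone = all(
        leq(neg(q), neg(p)) for p in PAIRS for q in PAIRS if leq(p, q)
    )
    involutive = all(neg(neg(p)) == p for p in PAIRS)
    return boundary and antitone and involutive

def first_coord_depends_only_on_first(neg):
    """A coordinatewise map has a first output fixed by the first input."""
    return all(
        neg(p)[0] == neg(q)[0] for p in PAIRS for q in PAIRS if p[0] == q[0]
    )
```

Both `neg_coord` and `neg_swap` pass `is_strong_negation`, but only `neg_coord` passes `first_coord_depends_only_on_first`, which is one way to see that the swapped negation is not coordinatewise.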
Horizontal and Ordinal Sums

The situation with horizontal and ordinal sums of lattices is similar.

Definition 22. Let $(L_k)_{k\in K}$ be a system of lattices such that $(L_k \setminus \{0_k, 1_k\})_{k\in K}$ is a pairwise disjoint system. Then

1. If $1_k = 1$ and $0_k = 0$ for all $k \in K$, then the horizontal sum of lattices $(L_k)_{k\in K}$ is the lattice $L = \bigcup_{k\in K} L_k$ with top element 1 and bottom element 0. Moreover, $x \le y$ if and only if $x, y \in L_k$ for some $k \in K$ and $x \le_k y$ (i.e., non-extremal elements from different $L_k$'s are incomparable).

2. If $K$ is a linearly ordered set with top element $k^*$ and bottom element $k_*$ and if, for $k_1$ […] 1). Hence there are no divisible conjunction operators on a non-trivial horizontal sum lattice $L$.

Then, for a negation $N: L \to L$ given by $N(x) = N_k(x)$ whenever $x \in L_k$, where $N_k: L_k \to L_k$ is a negation on $L_k$, $k \in K$, the corresponding D-implication $I_{N,D}: L^2 \to L$ is given by

$$I_{N,D}(x, y) = \begin{cases} I_{N_k,D_k}(x, y) & \text{if } (x, y) \in L_k^2, \\ y & \text{if } N(x) = 0, \\ 1 & \text{otherwise,} \end{cases}$$

and if $N: L \to L$ is moreover a strong negation on $L$, then

$$I_{N,D}(x, y) = \begin{cases} I_{N_k,D_k}(x, y) & \text{if } (x, y) \in L_k^2, \\ 1 & \text{otherwise.} \end{cases}$$

In the case of ordinal sums, negation operators on $L$ are not related (up to some special cases) to negation operators on single summands $L_k$. Moreover, there are cases where single summands do not admit any strong negation, yet the ordinal sum lattice $L$ possesses strong negations. This is, e.g., the case for $L_{(-\infty,\infty)}$, which is an ordinal sum of $L_1 = L_{(-\infty)}$ and $L_2 = L_{(\infty)}$.

However, turning to conjunction and disjunction operators on an ordinal sum lattice $L$ of the lattices $(L_k)_{k\in K}$, the following holds: for any system $(C_k)_{k\in K}$ of conjunction operators and any system $(D_k)_{k\in K}$ of disjunction operators on $L_k$, $k \in K$, the mapping $C: L^2 \to L$ given by

$$C(x, y) = \begin{cases} C_k(x, y) & \text{if } (x, y) \in L_k^2, \\ \min(x, y) & \text{otherwise,} \end{cases} \tag{13}$$

and the mapping $D: L^2 \to L$ given by

$$D(x, y) = \begin{cases} D_k(x, y) & \text{if } (x, y) \in L_k^2, \\ \max(x, y) & \text{otherwise,} \end{cases}$$

are a conjunction and a disjunction operator on $L$, respectively (compare also (2), (5)). Observe that they are usually called ordinal sums of the corresponding operators. Note also that while $C^*$ is an ordinal sum of $((C^*)_k)_{k\in K}$, this is no longer true in the case of the weakest conjunction operator $C_*$ on $L$. Similarly, $D_*$ is an ordinal sum of $((D_*)_k)_{k\in K}$, but $D^*$ is not an ordinal sum.

Finally recall that the residual implication $R_C: L^2 \to L$ related to an ordinal sum conjunction operator $C: L^2 \to L$ given by (13) is given by (compare with (6))

$$R_C(x, y) = \begin{cases} 1 & \text{if } x \le y, \\ R_{C_k}(x, y) & \text{if } x > y \text{ and } (x, y) \in L_k^2, \\ y & \text{otherwise.} \end{cases}$$
10.8 Conclusion

We have discussed logical connectives for different types of truth-value lattices relevant for dealing with information granules. Particular emphasis has been placed on discrete chains, intervals, and interval-valued lattices. We have illustrated that, depending on the underlying structure, particular types of connectives, such as strong negations, or relationships between different connectives cannot be provided. Further, it has been demonstrated how connectives on constructed lattices can be related to connectives on the underlying given lattices. The diversity of models for granular computing presented here opens new possibilities for fitting a mathematical model to real data. A suitable tool for such a fitting is, e.g., the software package developed by Gleb Beliakov. For a free download, see http://www.it.deakin.edu.au/~gleb.
Acknowledgments

Radko Mesiar was supported by the grants VEGA 1/3006/06 and MSM VZ 6198898701, and Andrea Mesiarová-Zemánková by grant VEGA 2/7142/27. Susanne Saminger-Platz has been on a sabbatical year at the Dipartimento di Matematica 'Ennio De Giorgi,' Università del Salento (Italy), when writing large parts of this chapter. She therefore gratefully acknowledges the support by the Austrian Science Fund (FWF) in the framework of the Erwin Schrödinger Fellowship J 2636 'The Property of Dominance – From Products of Probabilistic Metric Space to (Quasi-)Copulas and Applications.'
References

[1] L.A. Zadeh. Fuzzy sets and information granularity. In: M.M. Gupta, R.K. Ragade, and R.R. Yager (eds), Advances in Fuzzy Set Theory and Applications. North-Holland, Amsterdam, 1979, pp. 3–18.
[2] L.A. Zadeh. Fuzzy logic = computing with words. IEEE Trans. Fuzzy Syst. 4 (1996) 103–111.
[3] L.A. Zadeh. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 90 (1997) 111–127.
[4] L.A. Zadeh. From computing with numbers to computing with words – from manipulation of measurements to manipulation of perceptions. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 46 (1999) 105–119.
[5] L.A. Zadeh. Toward a logic of perceptions based on fuzzy logic. In: V. Novák and I. Perfilieva (eds), Discovering the World With Fuzzy Logic. Physica-Verlag, Heidelberg, 2000, pp. 4–28.
[6] A. Bargiela and W. Pedrycz. Granular Computing. Kluwer Academic Publishers, Boston, 2003.
[7] V. Novák. Granularity via properties: the logical approach. In: Proceedings of the European Society for Fuzzy Logic and Technology 2001, Leicester, 2001, pp. 372–376.
[8] W. Pedrycz (ed.). Granular Computing. Physica-Verlag, Heidelberg, 2001.
[9] W. Pedrycz. From granular computing to computational intelligence and human-centric systems. Personal communication, 2007.
[10] S.K. Pal, L. Polkowski, and A. Skowron (eds). Rough-Neural Computing. Techniques for Computing with Words. Springer, Berlin, 2004.
[11] L.A. Zadeh. From computing with numbers to computing with words – from manipulation of measurements to manipulation of perceptions. Int. J. Appl. Math. Comput. Sci. 12 (2002) 307–324.
[12] P. Hájek. Metamathematics of Fuzzy Logic. Kluwer Academic Publishers, Dordrecht, 1998.
[13] V. Novák, I. Perfilieva, and J. Močkoř. Mathematical Principles of Fuzzy Logic. Kluwer Academic Publishers, Norwell, 1999.
[14] N.N. Karnik and J.M. Mendel. Operations on type-2 fuzzy sets. Fuzzy Sets Syst. 122 (2001) 327–348.
[15] Q. Liang and J.M. Mendel. Interval type-2 fuzzy logic systems: theory and design. IEEE Trans. Fuzzy Syst. 8 (2000) 535–550.
[16] E.P. Klement and R. Mesiar (eds). Logical, Algebraic, Analytic, and Probabilistic Aspects of Triangular Norms. Elsevier, Amsterdam, 2005.
[17] W. Pedrycz. Fuzzy Control and Fuzzy Systems. Technical Report 82 14. Delft University of Technology, Department of Mathematics, Delft, 1982.
[18] M. Baczyński and J. Balasubramaniam. Fuzzy implications. Studies in Fuzziness and Soft Computing. Springer, Heidelberg, to appear.
[19] U. Höhle. Commutative, residuated l-monoids. In: U. Höhle and E.P. Klement (eds), Non-Classical Logics and Their Applications to Fuzzy Subsets. A Handbook of the Mathematical Foundations of Fuzzy Set Theory. Kluwer Academic Publishers, Dordrecht, 1995, pp. 53–106.
[20] N.N. Morsi and E.M. Roshdy. Issues on adjointness in multiple-valued logics. Inf. Sci. 176 (2006) 2886–2909.
[21] M. Miyakoshi and M. Shimbo. Solutions of composite fuzzy relational equations with triangular norms. Fuzzy Sets Syst. 16 (1985) 53–63.
[22] S. Gottwald. A Treatise on Many-Valued Logic. Studies in Logic and Computation. Research Studies Press, Baldock, 2001.
[23] J. Łukasiewicz. O logice trójwartościowej. Ruch Filozoficzny 5 (1920) 170–171. (English translation contained in L. Borkowski (ed.). J. Łukasiewicz: Selected Works. Studies in Logic and Foundations of Mathematics. North-Holland, Amsterdam, 1970.)
[24] G. Mayor and J. Torrens. Triangular norms on discrete settings. In: E.P. Klement and R. Mesiar (eds), Logical, Algebraic, Analytic, and Probabilistic Aspects of Triangular Norms. Elsevier, Amsterdam, 2005, chapter 7, pp. 189–230.
[25] B. De Baets and R. Mesiar. Triangular norms on product lattices. Fuzzy Sets Syst. 104 (1999) 61–75.
[26] L. Godo and C. Sierra. A new approach to connective generation in the framework of expert systems using fuzzy logic. In: Proceedings of 18th International Symposium on Multiple-Valued Logic, Palma de Mallorca, IEEE Computer Society Press, Washington, DC, 1988, pp. 157–162.
[27] G. Mayor and J. Torrens. On a class of operators for expert systems. Int. J. Intell. Syst. 8 (1993) 771–778.
[28] A.H. Clifford. Naturally totally ordered commutative semigroups. Am. J. Math. 76 (1954) 631–646.
[29] E.P. Klement, R. Mesiar, and E. Pap. Triangular Norms. Kluwer Academic Publishers, Dordrecht, 2000.
[30] L.A. Zadeh. Fuzzy sets. Inf. Control 8 (1965) 338–353.
[31] Z. Pawlak. Rough sets. Int. J. Comput. Inf. Sci. 11 (1982) 341–356.
[32] E. Trillas. Sobre funciones de negación en la teoría de conjuntos difusos. Stochastica 3 (1979) 47–60.
[33] B. Schweizer and A. Sklar. Statistical metric spaces. Pac. J. Math. 10 (1960) 313–334.
[34] B. Schweizer and A. Sklar. Probabilistic Metric Spaces. North-Holland, New York, 1983.
[35] K. Menger. Statistical metrics. Proc. Natl. Acad. Sci. USA 8 (1942) 535–537.
[36] C. Alsina, M.J. Frank, and B. Schweizer. Associative Functions: Triangular Norms and Copulas. World Scientific, Singapore, 2006.
[37] E.P. Klement. Construction of fuzzy σ-algebras using triangular norms. J. Math. Anal. Appl. 85 (1982) 543–565.
[38] C.M. Ling. Representation of associative functions. Publ. Math. Debrecen 12 (1965) 189–212.
[39] P.S. Mostert and A.L. Shields. On the structure of semigroups on a compact manifold with boundary. Ann. Math. II Ser. 65 (1957) 117–143.
[40] J. Aczél. Sur les opérations définies pour des nombres réels. Bull. Soc. Math. Fr. 76 (1949) 59–64.
[41] B. Schweizer and A. Sklar. Associative functions and abstract semigroups. Publ. Math. Debrecen 10 (1963) 69–81.
[42] M.J. Frank. On the simultaneous associativity of F(x, y) and x + y − F(x, y). Aeq. Math. 19 (1979) 194–226.
[43] D. Butnariu and E.P. Klement. Triangular Norm-Based Measures and Games with Fuzzy Coalitions. Kluwer Academic Publishers, Dordrecht, 1993.
[44] R.R. Yager. On a general class of fuzzy connectives. Fuzzy Sets Syst. 4 (1980) 235–242.
[45] S. Jenei. A survey on left-continuous t-norms and pseudo t-norms. In: E.P. Klement and R. Mesiar (eds), Logical, Algebraic, Analytic, and Probabilistic Aspects of Triangular Norms. Elsevier, Amsterdam, 2005, chapter 5, pp. 113–142.
[46] P. Cintula, E.P. Klement, R. Mesiar, and M. Navara. Residuated logics based on strict triangular norms with an involutive negation. Math. Log. Quart. 52 (2006) 269–282.
[47] R.E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, NJ, 1966.
[48] M.B. Gorzałczany. A method of inference in approximate reasoning based on interval-valued fuzzy sets. Fuzzy Sets Syst. 21 (1987) 1–17.
[49] H.T. Nguyen and E. Walker. A First Course in Fuzzy Logic. CRC Press, Boca Raton, FL, 1997.
[50] R. Sambuc. Fonctions Φ-floues. Application à l'aide au diagnostic en pathologie thyroïdienne. Ph.D. Thesis. Université de Marseille II, France, 1975.
[51] K.T. Atanassov. Intuitionistic Fuzzy Sets. Physica-Verlag, Heidelberg, 1999.
[52] D. Dubois, S. Gottwald, P. Hájek, J. Kacprzyk, and H. Prade. Terminological difficulties in fuzzy set theory – the case of 'intuitionistic fuzzy sets.' Fuzzy Sets Syst. 156 (2005) 485–491.
[53] G. Deschrijver, C. Cornelis, and E.E. Kerre. On the representation of intuitionistic fuzzy t-norms and t-conorms. IEEE Trans. Fuzzy Syst. 12 (2004) 45–61.
[54] G. Deschrijver and E.E. Kerre. Triangular norms and related operators in L*-fuzzy set theory. In: E.P. Klement and R. Mesiar (eds), Logical, Algebraic, Analytic, and Probabilistic Aspects of Triangular Norms. Elsevier, Amsterdam, 2005, chapter 8, pp. 231–259.
[55] G. Deschrijver. The Archimedean property for t-norms in interval-valued fuzzy set theory. Fuzzy Sets Syst. 157 (2006) 2311–2327.
[56] G. Deschrijver. Archimedean t-norms in interval-valued fuzzy set theory. In: Proceedings of Eleventh International Conference IPMU 2006, Information Processing and Management of Uncertainty in Knowledge-Based Systems, Paris, Vol. 1, Éditions EDK, Paris, 2006, pp. 580–586.
[57] J.A. Goguen. The logic of inexact concepts. Synthese 19 (1968/69) 325–373.
[58] S. Jenei and B. De Baets. On the direct decomposability of t-norms over direct product lattices. Fuzzy Sets Syst. 139 (2003) 699–707.
[59] S. Saminger. On ordinal sums of triangular norms on bounded lattices. Fuzzy Sets Syst. 157 (2006) 1403–1416.
[60] S. Saminger, E.P. Klement, and R. Mesiar. On extensions of triangular norms on bounded lattices. Submitted for publication.
11 Calculi of Information Granules: Fuzzy Relational Equations Siegfried Gottwald
11.1 Introduction

The earliest and most paradigmatic examples of granular computing are the fuzzy control approaches which are based on finite lists of linguistic control rules, each of which has a finite number of fuzzy input values and a fuzzy output value, each of them a typical information granule.

In engineering science, fuzzy control methods have become a standard tool which allows computerized control approaches to be applied to a wider class of problems than those which can be reasonably and effectively treated with more traditional mathematical methods like the proportional-derivative (PD) or proportional-integral-derivative (PID) control strategies.

For an industrial engineer, success in control applications is usually the main criterion. He or she then even accepts methods which have, to a large extent, only a heuristic basis. This has been the situation with fuzzy control approaches for a considerable amount of time, particularly with respect to the linguistic control rules which are constitutive for many fuzzy control approaches.

Of course, success in applications then calls for mathematical reflections about, and mathematical foundations for, the methods under consideration. For linguistic control rules, their transformation into fuzzy relation equations has been the core idea in many such theoretical reflections. Here we discuss this type of mathematical treatment of rule-based fuzzy control, which has the problem of the solvability of systems of fuzzy relation equations as its core, with particular emphasis on some more recent viewpoints which tend toward a more general view of this mathematization.

One of these rather new points studied here, first in Section 11.7, is the idea of a certain iteration of different methods to determine pseudo-solutions of such systems, methods which aim at finding approximate solutions. But the same method may also be iterated, and one may ask for a kind of 'stability' in this case, as is done in the more general context of Section 11.16.

Another new point of view is to look at this control problem as an interpolation problem. And finally, a third new idea is the treatment of the whole standard approach toward fuzzy control from a more general point of view, which understands the usual compositional rule of inference as a particular inference mechanism which is combined with further aggregation operations.

Some of the results which are collected here have been described in previous papers of the author and some of his coauthors, particularly in [1–4].
Handbook of Granular Computing © 2008 John Wiley & Sons, Ltd
Edited by Witold Pedrycz, Andrzej Skowron and Vladik Kreinovich
11.2 Preliminaries

We use in this chapter a set-theoretic notation for fuzzy sets, which refers to a logic with truth degree set $[0, 1]$ based on a left-continuous t-norm $t$, or, more generally, based on a class of (complete) prelinear residuated lattices with semigroup operation $*$. This means that we consider the logic MTL of left-continuous t-norms as the formal background (cf. [5]).

This logic has as standard connectives two conjunctions and a disjunction:

$$\& = *, \qquad \wedge = \min, \qquad \vee = \max.$$

In the lattice case we mean, by a slight abuse of language, by $\min$ and $\max$ the lattice meet and the lattice join, respectively. It also has an implication $\to$ characterized by the adjointness condition

$$u * v \le w \quad\text{iff}\quad u \le (v \to w),$$

as well as a negation $-$ given by $-H = H \to 0$. The quantifiers $\forall$ and $\exists$ mean the infimum and supremum, respectively, of the truth degrees of all instances. The truth degree 1 is the only designated one. Therefore logical validity $\models H$ means that $H$ always has truth degree 1.

The shorthand notation $[[H]]$ denotes the truth degree of formula $H$, assuming that the corresponding evaluation of the (free) variables, as well as the model under consideration (in the first-order case), is clear from the context. The class term notation $\{x \,\|\, H(x)\}$ denotes the fuzzy set $A$ with

$$\mu_A(a) = A(a) = [[H(a)]] \quad\text{for each } a \in X.$$

Occasionally we also use graded identity relations $\equiv$ and $\equiv^*$ for fuzzy sets, based on the graded inclusion relation

$$A \subseteq B = \forall x (A(x) \to B(x)),$$

and defined as

$$A \equiv B = A \subseteq B \,\&\, B \subseteq A, \qquad A \equiv^* B = A \subseteq B \wedge B \subseteq A.$$

Obviously one has the relationships

$$\models B_i \equiv^* B_j \leftrightarrow \forall y (B_i(y) \leftrightarrow B_j(y)), \qquad \models A \equiv B \to A \equiv^* B.$$
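As a concrete illustration of these preliminaries, the sketch below instantiates the semigroup operation $*$ with the Łukasiewicz t-norm and $\to$ with its residuum, and then evaluates the graded inclusion and the two graded identities on finite fuzzy sets. The choice of t-norm, the finite grid, and all function names are assumptions made for the example only; exact fractions are used to avoid rounding effects.

```python
from fractions import Fraction as F

def t_luk(u, v):
    """Lukasiewicz t-norm, used here as the strong conjunction &."""
    return max(F(0), u + v - 1)

def impl_luk(v, w):
    """Residuum of the Lukasiewicz t-norm."""
    return min(F(1), 1 - v + w)

# Illustrative grid of truth degrees 0, 1/10, ..., 1.
GRID = [F(k, 10) for k in range(11)]

def adjointness_holds(t, impl):
    """Check u * v <= w  iff  u <= (v -> w) on the grid."""
    return all(
        (t(u, v) <= w) == (u <= impl(v, w))
        for u in GRID for v in GRID for w in GRID
    )

def incl(A, B, impl):
    """Graded inclusion: infimum over x of A(x) -> B(x)."""
    return min(impl(A[x], B[x]) for x in A)

def ident(A, B, t, impl):
    """Graded identity based on the strong conjunction &."""
    return t(incl(A, B, impl), incl(B, A, impl))

def ident_star(A, B, impl):
    """Graded identity based on the min-conjunction."""
    return min(incl(A, B, impl), incl(B, A, impl))

# Two hypothetical fuzzy sets over the universe {"a", "b"}.
A = {"a": F(1), "b": F(1, 2)}
B = {"a": F(1, 2), "b": F(1)}
```

With these data, each inclusion degree is 1/2, so `ident_star(A, B, impl_luk)` is 1/2 while `ident(A, B, t_luk, impl_luk)` is 0, in line with the valid implication from $A \equiv B$ to $A \equiv^* B$.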
11.3 Fuzzy Control and Relation Equations

The standard paradigm of rule-based fuzzy control is that one supposes to have given, in a granular way, an incomplete and fuzzy description of a control function $\Phi$ from an input space $X$ to an output space $Y$, realized by a finite family

$$\mathcal{D} = (A_i, B_i)_{1 \le i \le n} \tag{1}$$

of (fuzzy) input–output data pairs. These granular data are supposed to characterize this function $\Phi$ sufficiently well.
In the usual approaches such a family of input–output data pairs is provided by a finite list

$$\text{if } \alpha \text{ is } A_i, \text{ then } \beta \text{ is } B_i, \qquad i = 1, \ldots, n, \tag{2}$$

of linguistic control rules, also called fuzzy if–then rules, describing some control procedure with input variable $\alpha$ and output variable $\beta$.

Mainly in engineering papers one often also considers the case of different input variables $\alpha_1, \ldots, \alpha_n$; in this case the linguistic control rules take the form

$$\text{if } \alpha_1 \text{ is } A_i^1, \text{ and } \ldots \text{ and } \alpha_n \text{ is } A_i^n, \text{ then } \beta \text{ is } B_i, \qquad i = 1, \ldots, n.$$

But from a mathematical point of view such rules are equivalent to the former ones: one simply has to allow as the input universe for $\alpha$ the Cartesian product of the input universes of $\alpha_1, \ldots, \alpha_n$.

Let us assume for simplicity that all the input data $A_i$ are normal; i.e., there is a point $x_0$ in the universe of discourse with $A_i(x_0) = 1$. Sometimes even weak normality would suffice; i.e., the supremum over all the membership degrees of the $A_i$ equals 1; but we do not intend to discuss this in detail.

The main mathematical problem of fuzzy control, besides the engineering problem of obtaining a suitable list of linguistic control rules for the actual control problem, is therefore the interpolation problem of finding a function $\Phi^*: \mathbb{F}(X) \to \mathbb{F}(Y)$ which interpolates these data, i.e., which satisfies

$$\Phi^*(A_i) = B_i \quad\text{for each } i = 1, \ldots, n, \tag{3}$$

and which, in this way, gives a fuzzy representation for the control function $\Phi$.

Actually the standard approach is to look for one single function, more precisely for some uniformly defined function, which should interpolate all these data and which should be globally defined over $\mathbb{F}(X)$, or at least over a suitably chosen, sufficiently large subclass of $\mathbb{F}(X)$.
11.3.1 The Compositional Rule of Inference

Following the basic ideas of Zadeh [6], such a fuzzy controller is formally realized by a fuzzy relation $R$ which connects fuzzy input information $A$ with fuzzy output information $B$ via the compositional rule of inference (CRI):

$$B = A \circ R = R''A = \{y \,\|\, \exists x (A(x) \,\&\, R(x, y))\}. \tag{4}$$

Therefore, applying this idea to the linguistic control rules themselves, these rules in a natural way become transformed into fuzzy relation equations

$$A_i \circ R = B_i, \quad\text{for } i = 1, \ldots, n, \tag{5}$$

i.e., form a system of such relation equations.

The problem of determining a fuzzy relation $R$ which realizes via (4) such a list (2) of linguistic control rules therefore becomes the problem of determining a solution of the corresponding system (5) of relation equations.

This problem proves to be a rather difficult one: it often happens that a given system (5) of relation equations is unsolvable. This is already the case in the more specific situation that the membership degrees belong to a Boolean algebra, as discussed (as a problem for Boolean matrices), e.g., in [7].

Nice solvability criteria are still largely unknown. Thus the investigation of the structure of the solution space for (5) was one of the problems discussed rather early. One essentially has that this space is an upper semilattice under the simple set union determined by the maximum of the membership degrees (cf., e.g., [8]).
And this semilattice has, if it is non-empty, a universal upper bound. To state the main result, one has to consider the particular fuzzy relation

$$\widehat{R} = \bigcap_{i=1}^{n} \{(x, y) \,\|\, A_i(x) \to B_i(y)\}. \tag{6}$$

Theorem 1. The system (5) of relation equations is solvable iff the fuzzy relation $\widehat{R}$ is a solution of it. And in the case of solvability, $\widehat{R}$ is always the largest solution of the system (5) of relation equations.

This result was first stated by Sanchez [9] for the particular case of the min-based Gödel implication $\to$ in (6), and generalized to the case of the residuated implications based on arbitrary left-continuous t-norms, and hence to the present situation, by the author in [10] (cf. also his [11]).
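Theorem 1 yields a direct solvability test on finite universes: build $\widehat{R}$ from (6) and check whether it satisfies every equation in (5). The following sketch assumes the minimum t-norm with its Gödel residuum (the case attributed to Sanchez in the text), represents fuzzy sets as lists of membership degrees, and uses hypothetical function names and example data.

```python
def t_min(a, b):
    """Minimum t-norm, used as the conjunction & in the CRI."""
    return min(a, b)

def impl_goedel(a, b):
    """Goedel implication, the residuum of the minimum t-norm."""
    return 1.0 if a <= b else b

def compose(A, R, t=t_min):
    """CRI (4): (A o R)(y) = sup over x of t(A(x), R(x, y))."""
    n_y = len(R[0])
    return [max(t(A[x], R[x][y]) for x in range(len(A))) for y in range(n_y)]

def largest_candidate(pairs, impl=impl_goedel):
    """Relation (6): R_hat(x, y) = min over i of (A_i(x) -> B_i(y))."""
    n_x, n_y = len(pairs[0][0]), len(pairs[0][1])
    return [
        [min(impl(A[x], B[y]) for A, B in pairs) for y in range(n_y)]
        for x in range(n_x)
    ]

def is_solvable(pairs, t=t_min, impl=impl_goedel):
    """Theorem 1: the system is solvable iff R_hat solves every equation."""
    R_hat = largest_candidate(pairs, impl)
    return all(compose(A, R_hat, t) == list(B) for A, B in pairs)

# Hypothetical data over X = {x0, x1, x2} and Y = {y0, y1}.
SOLVABLE = [
    ([1.0, 0.0, 0.0], [1.0, 0.3]),
    ([0.0, 1.0, 0.0], [0.4, 1.0]),
]
# The same input is mapped to two different outputs: no solution exists.
UNSOLVABLE = [
    ([1.0, 0.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0, 0.0], [0.0, 1.0]),
]
```

The comparison with `==` is exact here because the minimum t-norm and the Gödel implication only select among the given membership degrees; with other t-norms a tolerance or exact rational arithmetic would be a safer choice.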
11.3.2 Modeling Strategies

Besides the reference to the CRI in this type of approach toward fuzzy control, the crucial point is to determine a fuzzy relation out of a list of linguistic control rules. The fuzzy relation $\hat{R}$ can be seen as a formalization of the idea that the list (2) of control rules has to be read as

if input is $A_1$ then output is $B_1$ and ... and if input is $A_n$ then output is $B_n$.

Having in mind such a formalization of the list (2) of control rules, there is immediately also another way to read this list:

input is $A_1$ and output is $B_1$ or ... or input is $A_n$ and output is $B_n$.

It is this understanding of the list of linguistic control rules as a (rough) description of a fuzzy function which characterizes the approach of Mamdani and Assilian [12]. Therefore they consider instead of $\hat{R}$ the fuzzy relation

$$R_{MA} = \bigcup_{i=1}^{n} (A_i \times B_i),$$

again combined with the compositional rule of inference.
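The Mamdani–Assilian relation and its use in the CRI can be sketched as follows, assuming finite universes and the min t-norm for both the Cartesian product and the composition. The function names and the rule data are hypothetical.

```python
# Illustrative sketch of R_MA and the compositional rule of inference (CRI).

def t_norm(u, v):
    return min(u, v)

def compose(A, R):
    # sup-t composition: (A o R)(y) = max_x t(A(x), R(x, y))
    return [max(t_norm(A[x], R[x][y]) for x in range(len(A)))
            for y in range(len(R[0]))]

def mamdani_assilian(rules):
    # R_MA(x, y) = max_i t(A_i(x), B_i(y)): union of the products A_i x B_i
    nx, ny = len(rules[0][0]), len(rules[0][1])
    return [[max(t_norm(A[x], B[y]) for A, B in rules)
             for y in range(ny)] for x in range(nx)]

# Hypothetical rule base with normal input sets.
rules = [([1.0, 0.5, 0.0], [1.0, 0.5]),
         ([0.0, 0.5, 1.0], [0.5, 1.0])]

R_MA = mamdani_assilian(rules)
# CRI output for an input that is not one of the rule inputs.
output = compose([0.5, 1.0, 0.0], R_MA)
```

For this particular data set `R_MA` happens to reproduce both rule outputs exactly; in general it only has to satisfy the inclusions discussed in the following sections.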
11.4 Toward a Solvability Criterion for $R_{MA}$

Having in mind Theorem 1, one is immediately confronted with the following:

Problem. Under which conditions is the fuzzy relation $R_{MA}$ a solution of the corresponding system of relation equations?

This problem is discussed in [13]. And one of the main results is the next theorem.
Theorem 2. Let all the input sets $A_i$ be normal. Then the fuzzy relation $R_{MA}$ is a solution of the corresponding system of fuzzy relation equations iff for all $i, j = 1, \ldots, n$, one has

$$\models \exists x (A_i(x) \mathbin{\&} A_j(x)) \to B_i \equiv^* B_j. \tag{7}$$

This MA-solvability criterion (7) is a kind of functionality of the list of linguistic control rules, at least in the case of the presence of an involutive negation: because in such a case one has

$$\models \exists x (A_i(x) \mathbin{\&} A_j(x)) \leftrightarrow A_i \cap_t A_j \not\equiv^* \emptyset,$$

and thus condition (7) becomes

$$\models A_i \cap_t A_j \not\equiv^* \emptyset \to B_i \equiv^* B_j. \tag{8}$$
And this can be understood as a fuzzification of the following idea: ‘if $A_i$ and $A_j$ coincide to some degree, then also $B_i$ and $B_j$ should coincide to a certain degree.’ Of course, this fuzzification is neither obvious nor completely natural, because it translates ‘degree of coincidence’ in two different ways.

Corollary 3. If condition (8) is satisfied, then the system of relation equations has $\hat{R}$ as a solution.

This leads back to the well-known result, explained, e.g., in [11], that the system of relation equations is solvable in the case that all the input fuzzy sets $A_i$ are pairwise t-disjoint:

$$A_i \cap_t A_j = \emptyset \quad \text{for all } i \neq j.$$
It is furthermore known, cf. again [11], that functionality holds true for the relational composition at least in the form

$$\models A \equiv B \to A \circ R \equiv B \circ R,$$

because one has the (generalized) monotonicity

$$\models A \subseteq B \to A \circ R \subseteq B \circ R.$$

This, by the way, gives the following corollary.

Corollary 4. A necessary condition for the solvability of a system of relation equations is that one always has

$$\models A_i \equiv A_j \to B_i \equiv B_j.$$

This condition is symmetric in $i, j$. Therefore one gets as a slight generalization also the following corollary.

Corollary 5. Let all the input sets $A_i$ be normal. Then the fuzzy relation $R_{MA}$ is a solution of our system of fuzzy relation equations iff for all $i, j = 1, \ldots, n$, one has

$$\models A_i \cap_t A_j \not\equiv^* \emptyset \to B_i \subseteq B_j.$$
11.5 Relating $R_{MA}$ with the Largest Solution

However, it may happen that the system of relation equations is solvable, i.e., has $\hat{R}$ as a solution, without having the fuzzy relation $R_{MA}$ as a solution. An example is given in [3].
Therefore Klawonn’s condition (7) is a sufficient one only for the solvability of the system (5) of relation equations. Hence one has as a new problem to give additional assumptions, besides the solvability of the system (5) of relation equations, which are sufficient to guarantee that $R_{MA}$ is a solution of (5). As in [11] and already in [14], we subdivide the problem whether a fuzzy relation $R$ is a solution of the system of relation equations into two cases.

Definition 1. A fuzzy relation $R$ has the subset property w.r.t. a system (5) of relation equations iff one has

$$A_i \circ R \subseteq B_i, \quad \text{for } i = 1, \ldots, n, \tag{9}$$

and it has the superset property w.r.t. (5) iff one has

$$A_i \circ R \supseteq B_i, \quad \text{for } i = 1, \ldots, n. \tag{10}$$
Particularly for $R_{MA}$ quite natural sufficient conditions for the superset property have been given, but only rather strong ones for the subset property.

Proposition 6. If all input sets $A_i$ are normal then $R_{MA}$ has the superset property.

So we know with the fuzzy relation $R_{MA}$, assuming that all input sets $A_i$ are normal, at least one upper approximation of the (possible) solution for the system of relation equations.

Proposition 7. If all input sets are pairwise disjoint (under $\cap_t$), then $R_{MA}$ has the subset property.

It is also of interest to ask for conditions under which the relation $\hat{R}$ satisfies these properties. Fortunately, for the subset property there is a nice answer.

Proposition 8. $\hat{R}$ has the subset property.

Together with Proposition 6 this immediately gives the following.

Corollary 9. If all input sets $A_i$ are normal then one has for all indices $i$ the inclusions

$$A_i \circ \hat{R} \subseteq B_i \subseteq A_i \circ R_{MA}. \tag{11}$$

Thus we know with the fuzzy relation $\hat{R}$ at least one lower approximation of the (possible) solution for the system of relation equations. However, the single inclusion relations (11) can already be proved from slightly weaker assumptions.

Proposition 10. If the input set $A_k$ is normal then $A_k \circ \hat{R} \subseteq B_k \subseteq A_k \circ R_{MA}$.

So we know that with normal input sets the fuzzy outputs $A_i \circ \hat{R}$ are always subsets of $A_i \circ R_{MA}$. Furthermore, we immediately have the following global result.

Proposition 11. If all the input sets $A_i$ of the system of relation equations are normal and if one also has $R_{MA} \subseteq \hat{R}$, then the system of relation equations is solvable, and $R_{MA}$ is a solution.

Now we ask for conditions under which the relation $R_{MA}$ maps the input fuzzy sets $A_i$ to subsets of $A_i \circ \hat{R}$. And that means to again ask for some conditions which give the subset property of $R_{MA}$ and thus the solvability of the system of relation equations.
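The two-sided estimate (11) can be checked numerically on a small unsolvable system. The following sketch assumes finite universes and the Gödel t-norm; the helper names and the rule data are illustrative assumptions.

```python
# Illustrative check of A_i o R_hat <= B_i <= A_i o R_MA for normal inputs.

def t_norm(u, v):
    return min(u, v)

def residuum(u, v):
    return 1.0 if u <= v else v

def compose(A, R):
    return [max(t_norm(A[x], R[x][y]) for x in range(len(A)))
            for y in range(len(R[0]))]

def s_pseudosolution(rules):
    # R_hat(x, y) = min_i (A_i(x) -> B_i(y))
    nx, ny = len(rules[0][0]), len(rules[0][1])
    return [[min(residuum(A[x], B[y]) for A, B in rules)
             for y in range(ny)] for x in range(nx)]

def ma_pseudosolution(rules):
    # R_MA(x, y) = max_i t(A_i(x), B_i(y))
    nx, ny = len(rules[0][0]), len(rules[0][1])
    return [[max(t_norm(A[x], B[y]) for A, B in rules)
             for y in range(ny)] for x in range(nx)]

def is_subset(C, D):
    # pointwise inclusion of fuzzy sets
    return all(c <= d for c, d in zip(C, D))

# Hypothetical unsolvable system with normal, overlapping input sets.
rules = [([1.0, 0.5], [1.0, 0.0]),
         ([0.5, 1.0], [0.0, 1.0])]

lower = [compose(A, s_pseudosolution(rules)) for A, _ in rules]
upper = [compose(A, ma_pseudosolution(rules)) for A, _ in rules]
sandwich_holds = all(is_subset(lo, B) and is_subset(B, up)
                     for (_, B), lo, up in zip(rules, lower, upper))
```

Here neither relation is a solution, yet the rule outputs are enclosed between the two pseudosolution outputs, as Corollary 9 predicts.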
Proposition 12. Assume the normality of all the input sets $A_i$. Then to have for some index $1 \le k \le n$,

$$A_k \circ R_{MA} \subseteq A_k \circ \hat{R}$$

is equivalent to the equality $A_k \circ R_{MA} = A_k \circ \hat{R} = B_k$.

Corollary 13. Assume the normality of all the input sets $A_i$. Then the condition to have for all indices $1 \le i \le n$

$$A_i \circ R_{MA} \subseteq A_i \circ \hat{R}$$

is equivalent to the fact that $R_{MA}$ is a solution of the system of relation equations, and hence equivalent to the second criterion of Klawonn.
11.6 Toward the Superset Property of $\hat{R}$

The solvability of the system of relation equations is equivalent to the fact that the relation $\hat{R}$ is a solution. Therefore the solvability of our system of relation equations is also equivalent to the fact that $\hat{R}$ has the subset as well as the superset properties. Now, as seen in Proposition 8, the subset property is generally satisfied for the fuzzy relation $\hat{R}$. This means we immediately have the following.

Corollary 14. A system of relation equations is solvable iff its relation $\hat{R}$ has the superset property.

Hence, to get sufficient solvability conditions for the system of relation equations means to look for sufficient conditions for this superset property of $\hat{R}$. And this seems to be an astonishingly hard problem. What one immediately has in general are the equivalences:

$$\models B_k \subseteq A_k \circ \hat{R}$$

iff for all $y$

$$\models B_k(y) \to \exists x \Bigl(A_k(x) \mathbin{\&} \bigwedge_i (A_i(x) \to B_i(y))\Bigr)$$

iff for all $y$ and all $i$

$$\models B_k(y) \to \exists x (A_k(x) \mathbin{\&} (A_i(x) \to B_i(y))). \tag{12}$$

And just this last condition offers the main open problem: to find suitable conditions which are equivalent to (12). Particularly for $i = k$ and continuous t-norms this is equivalent to

$$\models B_k(y) \to \exists x (A_k(x) \wedge B_k(y)).$$

Corollary 15. For continuous t-norms $t$, a necessary condition for the superset property of $\hat{R}$ is that $\mathrm{hgt}(B_k) \le \mathrm{hgt}(A_k)$ holds for all input–output pairs $(A_k, B_k)$.

Part of the present problem is to look for sufficient conditions which imply (12).
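The height condition of Corollary 15 is cheap to test before any relation is computed. The following sketch, assuming finite universes and the (continuous) min t-norm, shows a single rule that violates it and confirms that $\hat{R}$ then lacks the superset property. All names and data are hypothetical.

```python
# Illustrative check of the necessary condition hgt(B_k) <= hgt(A_k).

def residuum(u, v):
    return 1.0 if u <= v else v

def compose(A, R):
    return [max(min(A[x], R[x][y]) for x in range(len(A)))
            for y in range(len(R[0]))]

def s_pseudosolution(rules):
    nx, ny = len(rules[0][0]), len(rules[0][1])
    return [[min(residuum(A[x], B[y]) for A, B in rules)
             for y in range(ny)] for x in range(nx)]

def height(A):
    # the height of a finite fuzzy set is its largest membership degree
    return max(A)

def height_condition_holds(rules):
    return all(height(B) <= height(A) for A, B in rules)

def has_superset_property(rules, R):
    return all(all(b <= c for b, c in zip(B, compose(A, R)))
               for A, B in rules)

# Hypothetical rule whose output is "higher" than its non-normal input.
rules = [([0.5, 0.25], [1.0])]
R_hat = s_pseudosolution(rules)
```

The condition is only necessary: a rule base may pass `height_condition_holds` and still be unsolvable.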
Here a nice candidate seems to be to have, for given $i$, $k$, and $y$, the existence of some $x$ with

$$\models B_k(y) \to A_k(x) \mathbin{\&} (A_i(x) \to B_i(y)).$$

Routine calculations show that this means that it is sufficient for (12) to have, for a given $y$, either the existence of some $x$ with

$$B_k(y) \le A_k(x) \quad \text{and} \quad A_i(x) \le B_i(y)$$

or the existence of some $x$ with

$$A_k(x) = 1 \quad \text{and} \quad A_i(x) \le [\![B_k(y) \to B_i(y)]\!].$$

However, both these sufficient conditions do not look very promising.
11.7 Getting New Pseudosolutions

Suppose again that all the input sets $A_i$ are normal. The standard strategy to ‘solve’ such a system of relation equations is to refer to its Mamdani–Assilian relation $R_{MA}$ and to apply, for a given fuzzy input $A$, the CRI, i.e., to treat the fuzzy set $A \circ R_{MA}$ as the corresponding, ‘right’ output. Similarly one can ‘solve’ the system of relation equations with reference to its possible largest solution $\hat{R}$ and to the CRI, which means to treat for any fuzzy input $A$ the fuzzy set $A \circ \hat{R}$ as its ‘right’ output. But both these ‘solution’ strategies have the (at least theoretical) disadvantage that they may give insufficient results, at least for the predetermined input sets.

Thus $R_{MA}$ and $\hat{R}$ may be considered as pseudosolutions. Call $\hat{R}$ the maximal and $R_{MA}$ the MA-pseudosolution. As was mentioned previously, these pseudosolutions $R_{MA}$ and $\hat{R}$ are upper and lower approximations for the realizations of the linguistic control rules.

Now one may equally well look for new pseudosolutions, e.g., by some iteration of these pseudosolutions in the way that for the next iteration step in such an iteration process the system of relation equations is changed such that its (new) output sets become the real output of the former iteration step. This has been done in [3].

To formulate the dependence of the pseudosolutions $R_{MA}$ and $\hat{R}$ on the input and output data, we denote the ‘original’ pseudosolutions with the input–output data $(A_i, B_i)$ in another way and write

$$R_{MA}[B_k] \text{ for } R_{MA}, \qquad \hat{R}[B_k] \text{ for } \hat{R}.$$

Using the fact that for a given solvable system of relation equations its maximal pseudosolution $\hat{R}$ is really a solution, one immediately gets the following.

Proposition 16. For any fuzzy relation $S$ one has for all $i$

$$A_i \circ \hat{R}[A_k \circ S] = A_i \circ S.$$

Hence it does not give a new pseudosolution if one iterates the solution strategy of the maximal, i.e., Sanchez pseudosolution after some (other) pseudosolution. The situation changes if one uses the Mamdani–Assilian solution strategy after another pseudosolution strategy. Because $R_{MA}$ has the superset property, one should use it for an iteration step which follows a pseudosolution step w.r.t. a fuzzy relation which has the subset property, e.g., after the strategy using $\hat{R}$. This gives, cf. again [3], the following result.
Theorem 17. One always has

$$A_i \circ \hat{R}[B_k] \subseteq A_i \circ R_{MA}[A_k \circ \hat{R}[B_k]] \subseteq A_i \circ R_{MA}[B_k].$$

Thus the iterated relation $R_{MA}[A_k \circ \hat{R}]$ is a better pseudosolution than each one of $R_{MA}$ and $\hat{R}$.
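The iteration step and the inclusion chain of Theorem 17, together with the identity of Proposition 16, can be traced on a small example. This is a sketch under the assumption of finite universes and the Gödel t-norm; the function names, the data, and the auxiliary relation `S` are hypothetical.

```python
# Illustrative sketch: iterated pseudosolution R_MA[A_k o R_hat[B_k]].

def residuum(u, v):
    return 1.0 if u <= v else v

def compose(A, R):
    return [max(min(A[x], R[x][y]) for x in range(len(A)))
            for y in range(len(R[0]))]

def s_pseudo(inputs, outputs):
    # R_hat[B_k](x, y) = min_i (A_i(x) -> B_i(y))
    return [[min(residuum(A[x], B[y]) for A, B in zip(inputs, outputs))
             for y in range(len(outputs[0]))] for x in range(len(inputs[0]))]

def ma_pseudo(inputs, outputs):
    # R_MA[B_k](x, y) = max_i min(A_i(x), B_i(y))
    return [[max(min(A[x], B[y]) for A, B in zip(inputs, outputs))
             for y in range(len(outputs[0]))] for x in range(len(inputs[0]))]

def is_subset(C, D):
    return all(c <= d for c, d in zip(C, D))

inputs = [[1.0, 0.5], [0.5, 1.0]]      # normal input sets A_1, A_2
outputs = [[1.0, 0.5], [0.25, 1.0]]    # outputs B_1, B_2 (unsolvable system)

# Step 1: outputs actually produced by the maximal pseudosolution.
lower = [compose(A, s_pseudo(inputs, outputs)) for A in inputs]
# Step 2: Mamdani-Assilian relation for the modified outputs.
middle = [compose(A, ma_pseudo(inputs, lower)) for A in inputs]
upper = [compose(A, ma_pseudo(inputs, outputs)) for A in inputs]
chain_holds = all(is_subset(lo, mi) and is_subset(mi, up)
                  for lo, mi, up in zip(lower, middle, upper))

# Proposition 16: iterating the maximal strategy after S changes nothing.
S = [[0.5, 0.0], [1.0, 0.5]]
outputs_S = [compose(A, S) for A in inputs]
outputs_iter = [compose(A, s_pseudo(inputs, outputs_S)) for A in inputs]
```

In this data set the middle term coincides with the lower one; the theorem only guarantees the two inclusions.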
11.8 Approximation and Interpolation

The standard mathematical understanding of approximation is that by an approximation process some mathematical object A, e.g., some function, is approximated, i.e., determined within some (usually previously unspecified) error bounds. Additionally one assumes that the approximating object B for A is of some predetermined, usually ‘simpler,’ kind, e.g., a polynomial function. So one may approximate some transcendental function, e.g., the trajectory of some nonlinear process, by a piecewise linear function or by a polynomial function of some bounded degree. Similarly one approximates, e.g., in the Runge–Kutta methods the solution of a differential equation by a piecewise linear function, or one uses splines to approximate a difficult surface in 3-space by planar pieces.

The standard mathematical understanding of interpolation is that a function f is only partially given by its values at some points of the domain of the function, the interpolation nodes. The problem then is to determine ‘the’ values of f for all the other points of the domain (usually) between the interpolation nodes – sometimes also outside these interpolation nodes (extrapolation). And this is usually done in such a way that one considers groups of neighboring interpolation nodes which uniquely determine an interpolating function of some predetermined type within their convex hull (or something similar): a function which has the interpolation nodes of the actual group as argument–value pairs – and which in this sense locally approximates the function f.

In the standard fuzzy control approach the input–output data pairs of the linguistic control rules just provide interpolation nodes. However, what is lacking – at least up to now – is the idea of a local approximation of the intended crisp control function by some fuzzy function.
Instead, in the standard contexts one always asks for something like a global interpolation; i.e., one is interested in interpolating all nodes by only one interpolation function. To get a local approximation of the intended crisp control function Φ, one needs some notion of ‘nearness’ or of ‘neighboring’ for fuzzy data granules. Such a notion is lacking in general. For the particular case of a linearly ordered input universe X, and the additional assumption that the fuzzy input data are unimodal, one gets in a natural way from this crisp background a notion of neighboring interpolation nodes: fuzzy nodes are neighboring if their kernel points are. In general, however, it seems most appropriate to suppose that one may be able to infer from the control problem a – perhaps itself fuzzy – partitioning of the whole input space (or similarly of the output space). Then one will be in a position to split in a natural way the data set (1) or, correspondingly, the list (2) of control rules into different groups – and to consider the localized interpolation problems separately for these groups. This obviously offers better chances for finding interpolating functions, particularly for getting solvable systems of fuzzy relation equations. However, one has to be aware that one should additionally take care that the different local interpolation functions fit together somehow smoothly – again an open problem that needs a separate discussion, and a problem that is more complicated for fuzzy interpolation than for the crisp counterpart because the fuzzy interpolating functions may realize the fuzzy interpolation nodes only approximately. However, one may start from ideas like these to speculate about fuzzy versions of the standard spline interpolation methodology.
11.9 CRI as Approximation and Interpolation

In the context of fuzzy control, the object which has to be determined, some control function $\Phi$, is described only roughly, i.e., given only by its behavior in some (fuzzy) points of the state space. The standard way to roughly describe the control function is to give a list (2) of linguistic control rules connecting fuzzy subsets $A_i$ of the input space $X$ with fuzzy subsets $B_i$ of the output space $Y$, indicating that one likes to have

$$\Phi^*(A_i) = B_i, \quad i = 1, \ldots, n, \tag{13}$$

for a suitable ‘fuzzified’ version $\Phi^* : \mathbb{F}(X) \to \mathbb{F}(Y)$ of the control function $\Phi : X \to Y$. The additional approximation idea of the CRI is to approximate $\Phi^*$ by a fuzzy function $\Psi^* : \mathbb{F}(X) \to \mathbb{F}(Y)$ determined for all $A \in \mathbb{F}(X)$ by

$$\Psi^*(A) = A \circ R, \tag{14}$$

which refers to some suitable fuzzy relation $R \in \mathbb{F}(X \times Y)$ and understands $\circ$ as sup–t composition. Formally, thus, the equation (13) becomes transformed into some well-known system (5) of relation equations

$$A_i \circ R = B_i, \quad i = 1, \ldots, n,$$

to be solved for the unknown fuzzy relation $R$. This approximation idea fits well with the fact that one often is satisfied with pseudosolutions of (5), and particularly with the MA-pseudosolution $R_{MA}$ of Mamdani and Assilian, or the S-pseudosolution $\hat{R}$ of Sanchez. Both of them determine approximations $\Psi^*$ to the (fuzzified) control function $\Phi^*$.
11.10 Approximate Solutions of Fuzzy Relation Equations

The author used in previous papers the notion of approximate solution only naively in the sense of a fuzzy relation which roughly describes the intended control behavior given via some list of linguistic control rules.

11.10.1 A Formal Definition

A precise definition of a notion of approximate solution was given by Wu [15]. In that approach an approximate solution $R$ of a system (5) of fuzzy relation equations (FREs) is defined as a fuzzy relation satisfying

1. There are fuzzy sets $A_i'$ and $B_i'$ such that for all $i = 1, \ldots, n$, one has $A_i \subseteq A_i'$ and $B_i' \subseteq B_i$ as well as $A_i' \circ R = B_i'$.
2. If there exist fuzzy sets $A_i^*$ and $B_i^*$ for $i = 1, \ldots, n$ and a fuzzy relation $R^*$ such that for all $i = 1, \ldots, n$, $A_i^* \circ R^* = B_i^*$ and $A_i \subseteq A_i^* \subseteq A_i'$, $B_i' \subseteq B_i^* \subseteq B_i$, then one has $A_i^* = A_i'$ and $B_i^* = B_i'$ for all $i = 1, \ldots, n$.
11.10.2 Generalizations of Wu’s Approach

It is obvious that the two conditions (1) and (2) of Wu are independent. What is, however, not obvious at all – and even rather arbitrary – is that condition (1) also says that the approximating input–output data $(A_i', B_i')$ should approximate the original input data from above and the original output data from below. Before we give a generalized definition we coin the name of an approximating system for (5) and understand by it any system

$$C_i \circ R = D_i, \quad i = 1, \ldots, n \tag{15}$$

of relation equations with the same number of equations.

Definition 2. A ul-approximate solution of a system (5) of relation equations is a solution of a ul-approximating system for (5) which satisfies

$$A_i \subseteq C_i \quad \text{and} \quad B_i \supseteq D_i, \quad \text{for } i = 1, \ldots, n. \tag{16}$$

An lu-approximate solution of a system (5) of relation equations is a solution of an lu-approximating system for (5) which satisfies

$$A_i \supseteq C_i \quad \text{and} \quad B_i \subseteq D_i, \quad \text{for } i = 1, \ldots, n. \tag{17}$$

An l*-approximate solution of a system (5) of relation equations is a solution of an l*-approximating system for (5) which satisfies

$$A_i \supseteq C_i \quad \text{and} \quad B_i = D_i, \quad \text{for } i = 1, \ldots, n. \tag{18}$$

In a similar way one defines the notions of ll-approximate solution, uu-approximate solution, u*-approximate solution, *l-approximate solution, and *u-approximate solution.

Corollary 18.
(i) Each *l-approximate solution of (5) is a ul-approximate solution and an ll-approximate solution of (5).
(ii) Each u*-approximate solution of (5) is also a ul-approximate solution and a uu-approximate solution of (5).

Proposition 19. For each system (5) of relation equations its S-pseudosolution $\hat{R}$ is an *l-approximate solution.

This generalizes a result of Klir and Yuan [16].

Proposition 20. For each system (5) of relation equations with normal input data, its MA-pseudosolution $R_{MA}$ is an *u-approximate solution.

Together with Corollary 18 these two propositions say that each system of relation equations has approximate solutions of any one of the types introduced in this section. However, it should be mentioned that these types of approximate solutions belong to a rather restricted class, caused by the fact that we considered, following Wu, only lower and upper approximations w.r.t. the inclusion relation; i.e., they are inclusion based. Other and more general approximations of the given input–output data systems are obviously possible. But we will not discuss further versions here.
11.10.3 Optimality of Approximate Solutions

All the previous results do not give any information about some kind of ‘quality’ of the approximate solutions or the approximating systems. This is to some extent related to the fact that up to now we disregarded in our modified terminology Wu’s condition (2), which is a kind of optimality condition.

Definition 3. An inclusion-based approximate solution $R$ of a system (5) is called optimal iff there does not exist a solvable system $C_i' \circ R = D_i'$ of relation equations whose input–output data $(C_i', D_i')$ approximate the original input–output data of (5) strongly better than the input–output data $(C_i, D_i)$ of the system which determines the fuzzy relation $R$.

Proposition 21. If an inclusion-based *l-approximate solution $R$ is optimal, then it is also optimal as a ul-approximate solution and as an ll-approximate solution.

Similar results hold true also for l*-, u*-, and *u-approximate solutions.

In those considerations we look for solutions of ‘approximating systems’ of FREs: of course, these solutions form some space of functions – and within this space one is interested to find ‘optimal members’ for the solution problem under consideration. An obvious modification is to fix in some other way such a space $\mathcal{R}$ of functions, i.e., independent of the idea of approximating systems of FREs. At that moment one also has to specify some ranking for the members of that space $\mathcal{R}$ of functions. In the following we go on to discuss optimality results from both these points of view.
11.11 Some Optimality Results for Approximate Solutions

The problem now is whether the pseudosolutions $\hat{R}$ and $R_{MA}$ are optimal approximate solutions.
11.11.1 Optimality of the S-Pseudosolution

For the S-pseudosolution $\hat{R}$ as a ul-approximate solution this optimality was shown by Klir and Yuan [16].

Proposition 22. The fuzzy relation $\hat{R}$ is always an $\subseteq$-optimal *l-approximate solution of (5).

From the second point of view we have, slightly reformulating and extending results presented in [4], the following result, given also in [17].

Theorem 23. Consider an unsolvable system of FREs. Then the S-pseudosolution $\hat{R}$ is the best approximate solution in the space

$$\mathcal{R}_l = \{R \in \mathcal{R} \mid A_i \circ R \subseteq B_i \text{ for all } 1 \le i \le n\}$$

under the ranking $\le_l$:

$$R' \le_l R'' \quad \text{iff} \quad A_i \circ R' \subseteq A_i \circ R'' \text{ for all } 1 \le i \le n.$$

Remark. Similarly one can prove that $\hat{R}$ is the best approximate solution in the space $\mathcal{R}_l$ under the ranking $\le_\delta$:

$$R' \le_\delta R'' \quad \text{iff} \quad \delta^*(R') \le \delta^*(R'')$$
for

$$\delta^*(R) = \bigwedge_{i=1}^{n} B_i \equiv^* A_i \circ R = \bigwedge_{i=1}^{n} \bigwedge_{y \in Y} (B_i(y) \leftrightarrow (A_i \circ R)(y)). \tag{19}$$

This index $\delta^*(R)$ is quite similar to the solution degree $\delta(R)$ to be introduced later on in (24).
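The index (19) is straightforward to evaluate on finite universes. The following sketch assumes the Gödel structure, where the biimplication of two degrees is 1 if they agree and their minimum otherwise; the helper names and the two rule lists are illustrative assumptions.

```python
# Illustrative evaluation of delta*(R) from (19) in the Goedel structure.

def residuum(u, v):
    return 1.0 if u <= v else v

def biresiduum(u, v):
    # u <-> v as the minimum of both implications
    return min(residuum(u, v), residuum(v, u))

def compose(A, R):
    return [max(min(A[x], R[x][y]) for x in range(len(A)))
            for y in range(len(R[0]))]

def s_pseudosolution(rules):
    nx, ny = len(rules[0][0]), len(rules[0][1])
    return [[min(residuum(A[x], B[y]) for A, B in rules)
             for y in range(ny)] for x in range(nx)]

def delta_star(rules, R):
    # infimum over all rules i and all outputs y of B_i(y) <-> (A_i o R)(y)
    degrees = []
    for A, B in rules:
        C = compose(A, R)
        degrees.extend(biresiduum(b, c) for b, c in zip(B, C))
    return min(degrees)

solvable_rules = [([1.0, 0.5, 0.0], [1.0, 0.5]),
                  ([0.0, 0.5, 1.0], [0.5, 1.0])]
unsolvable_rules = [([1.0, 0.5], [1.0, 0.5]),
                    ([0.5, 1.0], [0.25, 1.0])]

degree_solvable = delta_star(solvable_rules, s_pseudosolution(solvable_rules))
degree_unsolvable = delta_star(unsolvable_rules,
                               s_pseudosolution(unsolvable_rules))
```

A value of 1 signals that the relation solves every equation exactly; smaller values measure the worst mismatch over all rules and output points.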
11.11.2 Optimality of the MA-Pseudosolution

For the MA-pseudosolution the situation is different, as was indicated in [3].

Proposition 24. There exist systems (5) of relation equations for which their MA-pseudosolution $R_{MA}$ is an *u-approximate solution which is not optimal, i.e., an approximate solution in the approximation space

$$\mathcal{R}_u = \{R \in \mathcal{R} \mid A_i \circ R \supseteq B_i \text{ for all } 1 \le i \le n\},$$

but is not optimal in this set under the preorder $\le_u$:

$$R' \le_u R'' \quad \text{iff} \quad A_i \circ R' \le A_i \circ R'' \text{ for all } 1 \le i \le n.$$

The crucial difference of the optimality result for $\hat{R}$ to the situation for $R_{MA}$ is that in the former case the solvable approximating system has its own (largest) solution $S$. But a solvable approximating system may fail to have its MA-pseudosolution $R_{MA}$ as a solution.

The last remark leads us to a partial optimality result w.r.t. the MA-pseudosolution. The proofs of the results which shall be mentioned now can be found in [4], or easily derived from the results given there.

Definition 4. Let us call a system (5) of relation equations MA-solvable iff its MA-pseudosolution $R_{MA}$ is a solution of this system.

Proposition 25. If a system of FREs has an MA-solvable *u-approximating system

$$A_i \circ R = B_i^*, \quad i = 1, \ldots, n, \tag{20}$$

such that for the MA-pseudosolution $R_{MA}$ of the original system of FREs one has

$$B_i \subseteq B_i^* \subseteq A_i \circ R_{MA}, \quad i = 1, \ldots, n,$$

then one has

$$B_i^* = A_i \circ R_{MA} \quad \text{for all } i = 1, \ldots, n.$$

Corollary 26. If all input sets of (5) are normal then the system

$$A_i \circ R = A_i \circ R_{MA}, \quad i = 1, \ldots, n, \tag{21}$$

is the smallest MA-solvable *u-supersystem for (5).

This leads back to the iterated pseudosolution strategies.

Corollary 27. Let $\hat{R}$ be the S-pseudosolution of (5), let $\hat{B}_i = A_i \circ \hat{R}$ for $i = 1, \ldots, n$, and suppose that the modified system

$$A_i \circ R = \hat{B}_i, \quad i = 1, \ldots, n, \tag{22}$$

is MA-solvable. Then this iterated pseudosolution $R_{MA}[A_k \circ \hat{R}]$ is an optimal *l-approximate solution of (5).
Furthermore it is a best approximate solution of the original system in the space $\mathcal{R}_l$ under the ranking $\le_l$.

Let us also mention the following result (cf. [17]).

Theorem 28. Consider an unsolvable system of FREs such that all input fuzzy sets $A_i$, $1 \le i \le n$, are normal and form a semipartition of $X$. Then

$$R_{MA}(x, y) = \bigvee_{i=1}^{n} (A_i(x) * B_i(y))$$

is a best possible approximate solution in the space

$$\mathcal{R}_u = \{R \in \mathcal{R} \mid A_i \circ R \supseteq B_i \text{ for all } 1 \le i \le n\}$$

under the preorder $\le_u$:

$$R' \le_u R'' \quad \text{iff} \quad A_i \circ R' \le A_i \circ R'' \text{ for all } 1 \le i \le n.$$
These considerations can be further generalized. Consider some pseudosolution strategy $\mathbb{S}$, i.e., some mapping from the class of families $(A_i, B_i)_{1 \le i \le n}$ of input–output data pairs into the class of fuzzy relations, which yields for any given system (5) of relation equations an $\mathbb{S}$-pseudosolution $R_{\mathbb{S}}$. Then the system (5) will be called $\mathbb{S}$-solvable iff $R_{\mathbb{S}}$ is a solution of this system.

Definition 5. We shall say that the $\mathbb{S}$-pseudosolution $R_{\mathbb{S}}$ depends isotonically (w.r.t. inclusion) on the output data of the system (5) of relation equations iff the condition

$$\text{if } B_i \subseteq B_i' \text{ for all } i = 1, \ldots, n \text{ then } R_{\mathbb{S}} \subseteq R_{\mathbb{S}}'$$

holds true for the $\mathbb{S}$-pseudosolutions $R_{\mathbb{S}}$ of the system (5) and $R_{\mathbb{S}}'$ of an ‘output-modified’ system $A_i \circ R = B_i'$, $i = 1, \ldots, n$.

Definition 6. We understand by an $\mathbb{S}$-optimal *u-approximate solution of the system (5) the $\mathbb{S}$-pseudosolution of an $\mathbb{S}$-solvable *u-approximating system of (5) which has the additional property that no strongly better *u-approximating system of (5) is $\mathbb{S}$-solvable.

Proposition 29. Suppose that the $\mathbb{S}$-pseudosolution depends isotonically on the output data of the systems of relation equations. Assume furthermore that for the $\mathbb{S}$-pseudosolution $R_{\mathbb{S}}$ of (5) one always has $B_i \subseteq A_i \circ R_{\mathbb{S}}$ (or always has $A_i \circ R_{\mathbb{S}} \subseteq B_i$) for $i = 1, \ldots, n$. Then the $\mathbb{S}$-pseudosolution $R_{\mathbb{S}}$ of (5) is an $\mathbb{S}$-optimal *u-approximate (or: *l-approximate) solution of system (5).

It is clear that Corollary 26 is the particular case of the MA-pseudosolution strategy. But also Proposition 22 is a particular case of this Proposition 29: the case of the S-pseudosolution strategy of Sanchez (having in mind that for this strategy $\mathbb{S}$-solvability and solvability are equivalent notions).
11.12 Introducing the Solvability Degree

Following [1, 11] one may consider for a system of relation equations the (global) solvability degree

$$\xi = \exists X \mathop{\&}_{i=1}^{n} (A_i \circ X \equiv B_i), \tag{23}$$
and for any fuzzy relation $R$ their solution degree

$$\delta(R) = \mathop{\&}_{i=1}^{n} (A_i \circ R \equiv B_i). \tag{24}$$

Here $\mathop{\&}_{i=1}^{n}$ means the finite iteration of the strong conjunction connective &, and is defined in the standard way. The following result was first proved in [10], and has been further discussed in [1, 11].

Theorem 30.

$$\xi^n \le \delta(\hat{R}) \le \xi.$$

Of course, the nth power here is again the iteration of the strong conjunction operation $*$, i.e., the semantical counterpart of the syntactic operation &. Obviously this result can be rewritten in a slightly modified form which makes it more transparent that Theorem 30 really gives an estimation for the solvability degree $\xi$ in terms of a particular solution degree.

Corollary 31.

$$\delta(\hat{R})^n \le \xi^n \le \delta(\hat{R}).$$
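How much room the bounds of Theorem 30 leave depends on how fast the nth power decays under the chosen t-norm. The following sketch iterates three standard t-norms; the function names are illustrative assumptions.

```python
# Illustrative sketch: n-fold iteration u * u * ... * u for three t-norms.

def t_lukasiewicz(u, v):
    return max(u + v - 1.0, 0.0)

def t_product(u, v):
    return u * v

def t_goedel(u, v):
    return min(u, v)

def t_power(t, u, n):
    # u^n as the n-fold iteration of the t-norm t, for n >= 1
    result = u
    for _ in range(n - 1):
        result = t(result, u)
    return result

powers_lukasiewicz = [t_power(t_lukasiewicz, 0.75, n) for n in (1, 2, 3, 4)]
power_product = t_power(t_product, 0.5, 3)
power_goedel = t_power(t_goedel, 0.75, 5)
```

For the idempotent t-norm min every power equals its base, so under that choice the two bounds of Theorem 30 coincide and $\delta(\hat{R}) = \xi$. For the Łukasiewicz t-norm the lower bound $\xi^n$ can drop to 0 after a few iterations.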
One has for continuous t-norms that they are ordinal sums of isomorphic copies of two basic t-norms, of the Łukasiewicz t-norm $t_L$ given by $t_L(u, v) = \max\{u + v - 1, 0\}$ and of the arithmetic product $t_P$. (Sometimes Gödel’s t-norm min is also allowed for these summands. However, this is unimportant because of the definition of an ordinal sum of t-norms.)

Corollary 32. In the case that $*$ is a continuous t-norm $t$, the values $\delta(\hat{R})$ and $\xi$ always belong to the same ordinal t-summand.

A further property is of interest for the case of t-norm-based structures $L$.

Proposition 33. For each continuous t-norm $t$ and each $1 \le n \in \mathbb{N}$ there exist nth roots.

Having this in mind, one can immediately rewrite Theorem 30 for this particular case in an even nicer form than we did in Corollary 31.

Proposition 34. For t-norms which have nth roots one has the inequalities

$$\delta(\hat{R}) \le \xi \le \sqrt[n]{\delta(\hat{R})}.$$

Using as in [18] the notation $z(u)$ for the largest t-idempotent below $u$, this last result allows for the following slight modification.

Corollary 35. For t-norms which have nth roots one has the inequalities

$$z(\delta(\hat{R})) \le \xi \le \sqrt[n]{\delta(\hat{R})}.$$

Besides these core results which involve the solution degree $\delta(\hat{R})$ of the S-pseudosolution of the system (5), the problem appears to determine the solution degree of the relation $R_{MA}$.

Proposition 36. If all input sets $A_i$ are normal then

$$\delta^*(R_{MA}) = \bigwedge_{i} \bigwedge_{j} (A_i \cap_t A_j \not\equiv \emptyset \to B_i \subseteq B_j).$$
This is a generalization of the former Klawonn criterion.

We also find, as explained in [3], a second result which indicates that $R_{MA}[A_k \circ \hat{R}[B_k]]$ is at least sometimes as good a pseudosolution as $R_{MA}$.

Proposition 37. If all input sets $A_i$ are normal and if one has

$$\models B_i \subseteq B_j \to A_i \circ \hat{R} \subseteq A_j \circ \hat{R},$$

then

$$\delta^*(R_{MA}) \le \delta^*(R_{MA}[A_k \circ \hat{R}[B_k]]).$$
11.13 Interpolation Strategies and Aggregation Operators

There is the well-known distinction between FATI and FITA strategies to evaluate systems of linguistic control rules w.r.t. arbitrary fuzzy inputs from $\mathbb{F}(X)$. The core idea of a FITA strategy is that it is a strategy which first infers (by reference to the single rules) and then aggregates starting from the actual input information $A$. Contrary to that, a FATI strategy is a strategy which first aggregates (the information in all the rules into one fuzzy relation) and then infers starting from the actual input information $A$. Both these strategies use the set-theoretic union as their aggregation operator. Furthermore, both of them refer to the CRI as their core tool of inference.

In general, however, the interpolation operators we intend to consider depend more generally on some inference operator(s) as well as on some aggregation operator. By an inference operator we mean here simply a mapping from the fuzzy subsets of the input space to the fuzzy subsets of the output space.¹ And an aggregation operator $\mathbb{A}$, as explained e.g. in [19, 20], is a family $(f_n)_{n \in \mathbb{N}}$ of (‘aggregation’) operations, each $f_n$ an n-ary one, over some partially ordered set $M$, with ordering $\le$, with a bottom element 0 and a top element 1, such that each operation $f_n$ is nondecreasing, maps the bottom to the bottom: $f_n(0, \ldots, 0) = 0$, and the top to the top: $f_n(1, \ldots, 1) = 1$.

Such an aggregation operator $\mathbb{A} = (f_n)_{n \in \mathbb{N}}$ is a commutative one iff each operation $f_n$ is commutative. And $\mathbb{A}$ is an associative aggregation operator iff, e.g., for $n = k + l$ one always has

$$f_n(a_1, \ldots, a_n) = f_2(f_k(a_1, \ldots, a_k), f_l(a_{k+1}, \ldots, a_n))$$

and in general

$$f_n(a_1, \ldots, a_n) = f_r(f_{k_1}(a_1, \ldots, a_{k_1}), \ldots, f_{k_r}(a_{m+1}, \ldots, a_n))$$

for $n = \sum_{i=1}^{r} k_i$ and $m = \sum_{i=1}^{r-1} k_i$.

Our aggregation operators further on are supposed to be commutative as well as associative ones.² Observe that an associative aggregation operator $\mathbb{A} = (f_n)_{n \in \mathbb{N}}$ is essentially determined by its binary aggregation function $f_2$, more precisely, by its subfamily $(f_n)_{n \le 2}$. Additionally we call an aggregation operator $\mathbb{A} = (f_n)_{n \in \mathbb{N}}$

additive iff always $b \le f_2(b, c)$,
multiplicative iff always $f_2(b, c) \le b$,
idempotent iff always $b = f_2(b, b)$.

¹ This terminology has its historical roots in the fuzzy control community. There is no relationship at all with the logical notion of inference intended and supposed here; but, of course, it is also not ruled out.
² It seems that this is a rather restrictive choice from a theoretical point of view. However, in all the usual cases these restrictions are satisfied.
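The three properties just named are easy to spot-check for a candidate binary aggregation function. The sketch below tests them on a finite grid of degrees only, so a passing check is evidence rather than proof; the grid and the function names are assumptions made for illustration.

```python
# Illustrative spot check of additivity, multiplicativity, and idempotency
# for binary aggregation functions f_2 on a finite grid in [0, 1].
from itertools import product

GRID = [i / 4 for i in range(5)]  # 0.0, 0.25, 0.5, 0.75, 1.0

def is_additive(f2):
    # additive: b <= f_2(b, c) for all b, c
    return all(b <= f2(b, c) for b, c in product(GRID, repeat=2))

def is_multiplicative(f2):
    # multiplicative: f_2(b, c) <= b for all b, c
    return all(f2(b, c) <= b for b, c in product(GRID, repeat=2))

def is_idempotent(f2):
    # idempotent: f_2(b, b) == b for all b
    return all(f2(b, b) == b for b in GRID)

def arithmetic_product(b, c):
    return b * c

profile = {
    "max": (is_additive(max), is_multiplicative(max), is_idempotent(max)),
    "min": (is_additive(min), is_multiplicative(min), is_idempotent(min)),
    "product": (is_additive(arithmetic_product),
                is_multiplicative(arithmetic_product),
                is_idempotent(arithmetic_product)),
}
```

On this grid, max (the union) comes out additive and idempotent, min (the intersection) multiplicative and idempotent, and the arithmetic product multiplicative but not idempotent.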
Corollary 38. Let $\mathbb{A} = (f_n)_{n \in \mathbb{N}}$ be an aggregation operator.
(i) If $\mathbb{A}$ is idempotent, then one has always $f_2(0, b) \le f_2(b, b) = b$;
(ii) If $\mathbb{A}$ is additive, then one has always $b \le f_2(0, b)$;
(iii) If $\mathbb{A}$ is multiplicative, then one has always $f_2(0, b) = 0$.

As in [21], we now consider interpolation operators $\Psi$ of FITA type and interpolation operators $\Xi$ of FATI type, which have the abstract forms

$$\Psi_D(A) = \mathbb{A}(\theta_1(A), \ldots, \theta_n(A)), \tag{25}$$

$$\Xi_D(A) = \widehat{\mathbb{A}}(\theta_1, \ldots, \theta_n)(A). \tag{26}$$

Here we assume that each one of the ‘local’ inference operators $\theta_i$ is determined by the single input–output pair $(A_i, B_i)$. Therefore we shall prefer to write $\theta_{A_i, B_i}$ instead of $\theta_i$, because this extended notation makes the reference to (or even dependence on) the input–output data more transparent. And we have to assume that the aggregation operator $\mathbb{A}$ operates on fuzzy sets and that the aggregation operator $\widehat{\mathbb{A}}$ operates on inference operators. With this extended notation the formulas (25) and (26) become

$$\Psi_D(A) = \mathbb{A}(\theta_{A_1, B_1}(A), \ldots, \theta_{A_n, B_n}(A)), \tag{27}$$

$$\Xi_D(A) = \widehat{\mathbb{A}}(\theta_{A_1, B_1}, \ldots, \theta_{A_n, B_n})(A). \tag{28}$$
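The difference between the two abstract forms can be made concrete when every local inference operator is a CRI step $\theta_i(A) = A \circ R_i$ for some fuzzy relation $R_i$. That choice, the two relations, and the function names below are assumptions made for illustration; aggregation is pointwise max (union) or pointwise min (intersection).

```python
# Illustrative sketch: FITA aggregates the inferred outputs, FATI aggregates
# the relations first and infers once.

def compose(A, R):
    # sup-min composition (A o R)(y) = max_x min(A(x), R(x, y))
    return [max(min(A[x], R[x][y]) for x in range(len(A)))
            for y in range(len(R[0]))]

def pointwise(agg, sets):
    # apply agg (max or min) degree by degree to equally long fuzzy sets
    return [agg(values) for values in zip(*sets)]

def fita(A, relations, agg):
    # first infer with every local relation, then aggregate the outputs
    return pointwise(agg, [compose(A, R) for R in relations])

def fati(A, relations, agg):
    # first aggregate the local relations row by row, then infer once
    aggregated = [pointwise(agg, rows) for rows in zip(*relations)]
    return compose(A, aggregated)

# Two hypothetical local relations over a 2-point input, 1-point output space.
relations = [[[1.0], [0.0]],
             [[0.0], [1.0]]]
A = [1.0, 1.0]

union_fita, union_fati = fita(A, relations, max), fati(A, relations, max)
inter_fita, inter_fati = fita(A, relations, min), fati(A, relations, min)
```

With union as aggregation both strategies agree, because the sup–t composition distributes over pointwise maxima. With intersection the FATI output is in general only a subset of the FITA output, and in this example the inclusion is strict.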
11.14 Some Particular Examples
Some particular cases of these interpolation procedures have been discussed in [22]. These authors consider four different cases. First they look at the FITA-type interpolation
$$\Psi_D^1(A) = \bigcap_i \bigl(A \circ (A_i \ltimes B_i)\bigr), \tag{29}$$
using as in [11] the notation $A_i \ltimes B_i$ to denote the fuzzy relation with membership function $(A_i \ltimes B_i)(x, y) = A_i(x) \rightsquigarrow B_i(y)$. Obviously this is just (a slight modification of) the fuzzy control strategy of Holmblad/Ostergaard [23]. Their second example discusses a FATI-type approach given by
$$\Xi_D^2(A) = A \circ \Bigl(\bigcap_i (A_i \ltimes B_i)\Bigr), \tag{30}$$
and is thus just the common CRI-based strategy of the S-pseudosolution, used in this general form already in [10] (cf. also [11]). Their third example is again of FITA type and determined by
$$\Psi_D^3(A) = \bigcap_i \{y \parallel \delta(A, A_i) \to B_i(y)\}, \tag{31}$$
using, besides the previously mentioned class term notation for fuzzy sets, the activation degree
$$\delta(A, A_i) = \bigwedge_{x\in X} (A(x) \to A_i(x)), \tag{32}$$
which is a degree of subsethood of the actual input fuzzy set $A$ w.r.t. the $i$th rule input $A_i$.
The fourth one is a modification of the third one, determined by
$$\Psi_D^4(A) = \bigcap_{\emptyset \ne J \subseteq N} \Bigl\{y \parallel \delta\bigl(A, \bigcup_{j\in J} A_j\bigr) \to \bigcup_{j\in J} B_j(y)\Bigr\}, \tag{33}$$
using $N = \{1, 2, \dots, n\}$. In these examples the main aggregation operators are the set-theoretic union and the set-theoretic intersection. Both are obviously associative, commutative, and idempotent. Additionally the union is an additive and the intersection a multiplicative aggregation operator.
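The activation degree (32) and the third interpolation scheme (31) can be sketched in a few lines of Python. The sketch assumes finite universes, takes the Gödel implication as the residuated implication, and assumes that the rule outputs in (31) are combined by intersection (pointwise minimum); all three choices, as well as the names and data, are illustrative assumptions.

```python
def godel_impl(a, b):
    """Goedel implication, the residuum of the minimum t-norm."""
    return 1.0 if a <= b else b

def activation(a, a_i, impl=godel_impl):
    """Degree of subsethood of A in A_i: infimum over x of A(x) -> A_i(x)."""
    return min(impl(ax, aix) for ax, aix in zip(a, a_i))

def psi3(a, data, impl=godel_impl):
    """Pointwise minimum over all rules of delta(A, A_i) -> B_i(y)."""
    degrees = [activation(a, a_i, impl) for a_i, _ in data]
    outputs = range(len(data[0][1]))
    return [min(impl(d, b_i[y]) for d, (_, b_i) in zip(degrees, data))
            for y in outputs]

# Hypothetical rule base; the first rule input is fed back as actual input.
data = [([1.0, 0.5, 0.0], [1.0, 0.2]),
        ([0.0, 0.5, 1.0], [0.3, 1.0])]
a1 = data[0][0]
print(activation(a1, data[0][0]), activation(a1, data[1][0]))
print(psi3(a1, data))
```

Here the first rule is fully activated and the second not at all, so the output coincides with the first rule output.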
11.15 Stability Conditions for the Given Data
If $\Theta_D$ is a fuzzy inference operator of one of the types (27) and (28), then the interpolation property one would like to have realized is that
$$\Theta_D(A_i) = B_i \tag{34}$$
for all the data pairs $\langle A_i, B_i\rangle$. In the particular case that the operator $\Theta_D$ is given by (4), this is just the problem of solving the system (34) of fuzzy relation equations.
Definition 7. In the present generalized context let us call the property (34) the D-stability of the fuzzy inference operator $\Theta_D$.
Finding D-stability conditions on this abstract level seems to be rather difficult in general. However, the restriction to fuzzy inference operators of FITA type makes things easier. It is necessary to have a closer look at the aggregation operator $\mathbf{A} = (f^n)_{n\in\mathbb{N}}$ involved in (25), which operates on $F(Y)$, of course with inclusion as partial ordering.
Definition 8. Having $B, C \in F(Y)$ we say that $C$ is $\mathbf{A}$-negligible w.r.t. $B$ iff $f^2(B, C) = f^1(B)$ holds true.
The core idea here is that in any aggregation by $\mathbf{A}$ the presence of the fuzzy set $B$ among the aggregated fuzzy sets makes any presence of $C$ superfluous.
Example.
1. $C$ is $\bigcup$-negligible w.r.t. $B$ iff $C \subseteq B$; and this holds similarly true for all idempotent and additive aggregation operators.
2. $C$ is $\bigcap$-negligible w.r.t. $B$ iff $C \supseteq B$; and this holds similarly true for all idempotent and multiplicative aggregation operators.
3. The bottom element $C = 0$ in the domain of an additive and idempotent aggregation operator $\mathbf{A}$ is $\mathbf{A}$-negligible w.r.t. any other element of that domain.
Proposition 39. Consider a fuzzy inference operator of FITA type
$$\Psi_D = \mathbf{A}(\theta_{\langle A_1,B_1\rangle}, \dots, \theta_{\langle A_n,B_n\rangle}).$$
It is sufficient for the D-stability of $\Psi_D$, i.e., to have
$$\Psi_D(A_k) = B_k \quad \text{for all } k = 1, \dots, n,$$
that one always has
$$\theta_{\langle A_k,B_k\rangle}(A_k) = B_k$$
and additionally that for each $i \ne k$, the fuzzy set $\theta_{\langle A_k,B_k\rangle}(A_i)$ is $\mathbf{A}$-negligible w.r.t. $\theta_{\langle A_k,B_k\rangle}(A_k)$.
The proof follows immediately from the corresponding definitions. This result has two quite interesting specializations which themselves generalize well-known results about fuzzy relation equations.
Corollary 40. It is sufficient for the D-stability of a fuzzy inference operator $\Psi_D$ of FITA type that one has
$$\Psi_D(A_i) = B_i \quad \text{for all } 1 \le i \le n$$
and that always $\theta_{\langle A_i,B_i\rangle}(A_j)$ is $\mathbf{A}$-negligible w.r.t. $\theta_{\langle A_i,B_i\rangle}(A_i)$.
Corollary 41. It is sufficient for the D-stability of a fuzzy inference operator $\Psi_D$ of FITA type, which is based on an additive and idempotent aggregation operator, that one has
$$\Psi_D(A_i) = B_i \quad \text{for all } 1 \le i \le n$$
and that always $\theta_{\langle A_i,B_i\rangle}(A_j)$ is the bottom element in the domain of the aggregation operator $\mathbf{A}$.
Obviously this is a direct generalization of the fact that systems of fuzzy relation equations are solvable if their input data form a pairwise disjoint family (w.r.t. the corresponding t-norm-based intersection), because in this case one usually has
$$\theta_{\langle A_i,B_i\rangle}(A_j) = A_j \circ (A_i \times B_i) = \{y \parallel \exists x\,(x \mathrel{\varepsilon} A_j \,\&\, (x, y) \mathrel{\varepsilon} A_i \times B_i)\} = \{y \parallel \exists x\,(x \mathrel{\varepsilon} A_j \cap_{+} A_i \,\&\, y \mathrel{\varepsilon} B_i)\}.$$
To extend these considerations from inference operators (25) of the FITA type to those of the FATI type (26), let us consider the following notion.
Definition 9. Suppose that $\widehat{\mathbf{A}}$ is an aggregation operator for inference operators and that $\mathbf{A}$ is an aggregation operator for fuzzy sets. Then $(\widehat{\mathbf{A}}, \mathbf{A})$ is an application distributive pair of aggregation operators iff
$$\widehat{\mathbf{A}}(\theta_1, \dots, \theta_n)(X) = \mathbf{A}(\theta_1(X), \dots, \theta_n(X)) \tag{35}$$
holds true for arbitrary inference operators $\theta_1, \dots, \theta_n$ and fuzzy sets $X$.
Using this notion it is easy to see that one has on the left-hand side of (35) a FATI-type inference operator and on the right-hand side an associated FITA-type inference operator. So one is able to give a reduction of the FATI case to the FITA case.
Proposition 42. Suppose that $(\widehat{\mathbf{A}}, \mathbf{A})$ is an application distributive pair of aggregation operators. Then a fuzzy inference operator $\Xi_D$ of FATI type is D-stable iff its associated fuzzy inference operator $\Psi_D$ of FITA type is D-stable.
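The role of pairwise disjoint inputs for D-stability can be checked numerically. The following Python sketch assumes the Mamdani-style setting (sup-min composition with the relation $A_i \times B_i$ as local operator, union as aggregation) on finite universes; the helper names and both rule bases are hypothetical and serve only as an illustration.

```python
def cri(a, rel):
    """Sup-min composition of a fuzzy set with a fuzzy relation."""
    return [max(min(a[x], rel[x][y]) for x in range(len(a)))
            for y in range(len(rel[0]))]

def cartesian(a_i, b_i):
    """Fuzzy relation A_i x B_i based on the minimum t-norm."""
    return [[min(ax, by) for by in b_i] for ax in a_i]

def fita_union(a, data):
    """FITA operator: infer with every rule, then take the union."""
    outputs = [cri(a, cartesian(a_i, b_i)) for a_i, b_i in data]
    return [max(vals) for vals in zip(*outputs)]

def is_d_stable(data):
    """D-stability: every rule input is mapped exactly to its rule output."""
    return all(fita_union(a_k, data) == b_k for a_k, b_k in data)

# Normal inputs with disjoint supports versus inputs overlapping in one point.
disjoint = [([1.0, 0.6, 0.0, 0.0], [1.0, 0.4, 0.0]),
            ([0.0, 0.0, 0.7, 1.0], [0.0, 0.5, 1.0])]
overlapping = [([1.0, 0.5, 0.0], [1.0, 0.2]),
               ([0.0, 0.5, 1.0], [0.3, 1.0])]
print(is_d_stable(disjoint))
print(is_d_stable(overlapping))
```

In the disjoint case every 'foreign' rule contributes the empty fuzzy set, which is negligible for the union; in the overlapping case the second rule raises the output above $B_1$.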
11.16 Stability Conditions for Modified Data
The combined approximation and interpolation problem, as previously explained, sheds new light on the standard approaches toward fuzzy control via CRI-representable functions originating from the works of Mamdani and Assilian [12] and Sanchez [9], particularly for the case that neither the Mamdani–Assilian relation $R_{MA}$, determined by the membership degrees
$$R_{MA}(x, y) = \bigvee_{i=1}^{n} A_i(x) * B_i(y), \tag{36}$$
nor the Sanchez relation $\widehat{R}$, determined by the membership degrees
$$\widehat{R}(x, y) = \bigwedge_{i=1}^{n} (A_i(x) \rightsquigarrow B_i(y)), \tag{37}$$
offers a solution for the system of fuzzy relation equations. In any case both these fuzzy relations determine CRI-representable fuzzy functions which provide approximate solutions for the interpolation problem. In other words, the consideration of CRI-representable functions determined by (36) as well as by (37) provides two methods for an approximate solution of the main interpolation problem. As is well known and explained, e.g., in [11], the approximating interpolation function CRI-represented by $\widehat{R}$ always gives a lower approximation, and the one CRI-represented by $R_{MA}$ gives an upper approximation for normal input data.
Extending these results, in [3] the iterative combination of these methods has been discussed to get better approximation results. In the iterations there, the next iteration step always consisted in an application of a predetermined one of the two approximation methods to the data family with the original input data and the real, approximating output data which resulted from the application of the former approximation method. A similar iteration idea was also discussed in [22], however always restricted to the iteration of only one of the approximation methods explained in (29), (30), (31), and (33).
Therefore let us now, in the general context of this chapter, discuss the problem of D-stability for a modified operator $\Theta_D^{*}$, which is determined by the kind of iteration of $\Theta_D$ just explained. Let us consider the $\Theta_D$-modified data set $D^{*}$ given as
$$D^{*} = (\langle A_i, \Theta_D(A_i)\rangle)_{1\le i\le n}, \tag{38}$$
and define from it the modified fuzzy inference operator $\Theta_D^{*}$ as
$$\Theta_D^{*} = \Theta_{D^{*}}. \tag{39}$$
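The two relations (36) and (37), the lower and upper approximations they induce, and the modified data set (38) can be illustrated by a short Python sketch. It assumes finite universes, the minimum as t-norm with the Gödel implication as its residuum, and normal input data; function names and data are hypothetical.

```python
def t_norm(a, b):
    """Minimum t-norm (an assumption; any left-continuous t-norm could be used)."""
    return min(a, b)

def residuum(a, b):
    """Residuum of the minimum t-norm (Goedel implication)."""
    return 1.0 if a <= b else b

def r_ma(data):
    """Mamdani-Assilian relation: max over rules of A_i(x) * B_i(y)."""
    xs, ys = range(len(data[0][0])), range(len(data[0][1]))
    return [[max(t_norm(a_i[x], b_i[y]) for a_i, b_i in data) for y in ys]
            for x in xs]

def r_hat(data):
    """Sanchez relation: min over rules of the residuum of A_i(x) and B_i(y)."""
    xs, ys = range(len(data[0][0])), range(len(data[0][1]))
    return [[min(residuum(a_i[x], b_i[y]) for a_i, b_i in data) for y in ys]
            for x in xs]

def cri(a, rel):
    """Compositional rule of inference based on the chosen t-norm."""
    return [max(t_norm(a[x], rel[x][y]) for x in range(len(a)))
            for y in range(len(rel[0]))]

def modified_data(data, rel):
    """Data set D*: original inputs paired with the actually computed outputs."""
    return [(a_i, cri(a_i, rel)) for a_i, _ in data]

data = [([1.0, 0.5, 0.0], [1.0, 0.2]),
        ([0.0, 0.5, 1.0], [0.3, 1.0])]
for a_i, b_i in data:
    print(cri(a_i, r_hat(data)), b_i, cri(a_i, r_ma(data)))
print(modified_data(data, r_ma(data)))
```

Each printed triple shows the lower approximation, the intended output and the upper approximation for one rule.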
For these modifications, the problem of stability reappears. Of course, the new situation here is only a particular case of the former. And it becomes a simpler one in the sense that the stability criteria now refer only to the input data $A_i$ of the data set $D = (\langle A_i, B_i\rangle)_{1\le i\le n}$.
Proposition 43. It is sufficient for the D∗-stability of a fuzzy inference operator $\Psi_D^{*}$ of FITA type that one has
$$\Psi_D^{*}(A_i) = \Psi_{D^{*}}(A_i) = \Psi_D(A_i) \quad \text{for all } 1 \le i \le n \tag{40}$$
and that always $\theta_{\langle A_i,\Psi_D(A_i)\rangle}(A_j)$ is $\mathbf{A}$-negligible w.r.t. $\theta_{\langle A_i,\Psi_D(A_i)\rangle}(A_i)$.
Let us look separately at the condition (40) and at the negligibility conditions.
Corollary 44. The condition (40) is always satisfied if the inference operator $\Psi_D^{*}$ is determined by the standard output-modified system of relation equations $A_i \circ R[A_k \circ R] = B_i$ in the notation of [3].
Corollary 45. In the case that the aggregation operator is the set-theoretic union, i.e., $\mathbf{A} = \bigcup$, the condition (40) together with the inclusion relationships $\theta_{\langle A_i,\Psi_D(A_i)\rangle}(A_j) \subseteq \theta_{\langle A_i,\Psi_D(A_i)\rangle}(A_i)$ is sufficient for the D∗-stability of a fuzzy inference operator $\Psi_D^{*}$.
As in Section 11.15 one is able to transfer this result to FATI-type fuzzy inference operators.
Corollary 46. Suppose that $(\widehat{\mathbf{A}}, \mathbf{A})$ is an application distributive pair of aggregation operators. Then a fuzzy inference operator $\Xi_D^{*}$ of FATI type is D∗-stable iff its associated fuzzy inference operator $\Psi_D^{*}$ of FITA type is D∗-stable.
11.17 Application Distributivity
Based on the notion of application distributive pair of aggregation operators, the property of D-stability can be transferred back and forth between two inference operators of FATI type and FITA type if they are based on a pair of application distributive aggregation operators. What has not been discussed previously is the existence and the uniqueness of such pairs. Here are some results concerning these problems. The uniqueness problem has a simple solution.
Proposition 47. If $(\widehat{\mathbf{A}}, \mathbf{A})$ is an application distributive pair of aggregation operators then $\widehat{\mathbf{A}}$ is uniquely determined by $\mathbf{A}$, and conversely also $\mathbf{A}$ is uniquely determined by $\widehat{\mathbf{A}}$.
Proof. Let $\mathbf{A}$ be given. Then condition (35), being valid for all fuzzy sets $X$, determines for all fuzzy inference operators $\theta_1, \dots, \theta_n$ uniquely the functions $\widehat{\mathbf{A}}(\theta_1, \dots, \theta_n)$. And therefore (35) also determines the aggregation operator $\widehat{\mathbf{A}}$ uniquely. The converse statement follows in a similar way.
For the existence problem we have a nice reduction to the two-argument case.
Proposition 48. Suppose that $\mathbf{A}$ is a commutative and associative aggregation operator and $G$ some operation for fuzzy inference operators satisfying
$$\mathbf{A}(\theta_1(X), \theta_2(X)) = G(\theta_1, \theta_2)(X) \tag{41}$$
for all fuzzy sets $X$. Then $G$ can be extended to an aggregation operator $\widehat{G}$ which is commutative and associative and forms with $\mathbf{A}$ an application distributive pair $(\widehat{G}, \mathbf{A})$ of aggregation operators.
Proof. The commutativity of $\mathbf{A}$ yields $G(\theta_1, \theta_2)(X) = G(\theta_2, \theta_1)(X)$ for all fuzzy sets $X$ and hence $G(\theta_1, \theta_2) = G(\theta_2, \theta_1)$, i.e., the commutativity of $G$ as an operation for fuzzy inference operators. In a similar way the associativity of $\mathbf{A}$ implies the associativity of $G$. Hence it is a routine matter to expand the binary operator $G$ to an $n$-ary one $G^n$ for each $n \ge 2$. Thus one has a commutative and associative aggregation operator $\widehat{G} = (G^n)_{n\in\mathbb{N}}$ for fuzzy inference operators if one additionally puts $G^1 = \mathrm{id}$. Finally the application distributivity of the pair $(\widehat{G}, \mathbf{A})$ is again easily derived from (41) and the definition of $\widehat{G}$.
It is easy to recognize that this result can be reversed.
Corollary 49. Suppose that $\mathbf{A}$ is a commutative and associative aggregation operator and $(\widehat{\mathbf{A}}, \mathbf{A})$ is an application distributive pair of aggregation operators, and let $\widehat{\mathbf{A}} = (\widehat{f}^{\,n})_{n\in\mathbb{N}}$. Then $G = \widehat{f}^{\,2}$ satisfies condition (41) for all fuzzy sets $X$.
Both results together give us the following reduction of the full application distributivity condition.
Theorem 50. Suppose that $\mathbf{A}$ is a commutative and associative aggregation operator. For the case that there exists an aggregation operator $\widehat{\mathbf{A}}$ such that $(\widehat{\mathbf{A}}, \mathbf{A})$ forms an application distributive pair of aggregation operators it is necessary and sufficient that there exists some operation $G$ for fuzzy inference operators satisfying
$$\mathbf{A}(\theta_1(X), \theta_2(X)) = G(\theta_1, \theta_2)(X) \tag{42}$$
for all fuzzy inference operators $\theta_1$ and $\theta_2$ and all fuzzy sets $X$.
The proof is obvious from these last two results. For the particular, and very popular, cases that one has $\mathbf{A} = \bigcup$ or $\mathbf{A} = \bigcap$, and that the application of a fuzzy inference operator $\theta$ to a fuzzy set $X$ means the CRI application of a fuzzy relation to a fuzzy set, one immediately sees that one may choose $\widehat{G} = \bigcup$ or $\widehat{G} = \bigcap$, respectively.
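For the union case, condition (42) can be checked directly when inference operators are represented by fuzzy relations and applied by sup-min composition: taking the union of two relations first and then composing gives the same result as composing twice and taking the union of the outputs. The Python sketch below tests this on randomly generated data; the sup-min setting, the names and the random test data are assumptions made only for illustration.

```python
import random

def cri(a, rel):
    """Sup-min composition of a fuzzy set with a fuzzy relation."""
    return [max(min(a[x], rel[x][y]) for x in range(len(a)))
            for y in range(len(rel[0]))]

def rel_union(r1, r2):
    """G(theta_1, theta_2): union of the two representing relations."""
    return [[max(p, q) for p, q in zip(row1, row2)]
            for row1, row2 in zip(r1, r2)]

def set_union(b1, b2):
    """A(B_1, B_2): union of two fuzzy sets."""
    return [max(p, q) for p, q in zip(b1, b2)]

def distributes(a, r1, r2):
    """Check A(theta_1(X), theta_2(X)) == G(theta_1, theta_2)(X) for one X."""
    return cri(a, rel_union(r1, r2)) == set_union(cri(a, r1), cri(a, r2))

rng = random.Random(0)  # fixed seed so that the run is reproducible

def rand_set(n):
    return [rng.random() for _ in range(n)]

def rand_rel(n, m):
    return [rand_set(m) for _ in range(n)]

checks = [distributes(rand_set(4), rand_rel(4, 3), rand_rel(4, 3))
          for _ in range(200)]
print(all(checks))
```

Because only minima and maxima of the given degrees are formed, the comparison is exact and does not suffer from rounding.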
11.18 Invoking a Defuzzification Strategy
In many practical applications of the fuzzy control strategies which form the starting point for the previous general considerations, the fuzzy model – e.g., determined by a list of linguistic IF–THEN rules – is realized in the context of a further defuzzification strategy, which is nothing but a mapping $F : F(Y) \to Y$ for fuzzy subsets of the output space $Y$. Having this in mind, it seems reasonable to consider the following modification of the D-stability condition, which is a formalization of the idea to have 'stability modulo defuzzification.'
Definition 10. A fuzzy inference operator $\Theta_D$ is (F, D)-stable w.r.t. a defuzzification method $F : F(Y) \to Y$ iff one has
$$F(\Theta_D(A_i)) = F(B_i) \tag{43}$$
for all the data pairs $\langle A_i, B_i\rangle$ from $D$.
For the fuzzy modeling process which is manifested in the data set $D$, this condition (43) is supposed to fit well with the control behavior one is interested in implementing. If for some application this condition (43) seems to be unreasonable, this indicates that either the data set $D$ or the chosen defuzzification method $F$ is unsuitable. As a first, and rather restricted, stability result for this modified situation, the following proposition shall be mentioned.
Proposition 51. Suppose that $\Theta_D$ is a fuzzy inference operator of FITA type, i.e., of the form (25), that the aggregation is union $\mathbf{A} = \bigcup$ as, e.g., in the fuzzy inference operator for the Mamdani–Assilian case, and that the defuzzification strategy $F$ is the 'mean of max' method. Then it is sufficient for the (F, D)-stability of $\Theta_D$ to have satisfied
$$\mathrm{hgt}\Bigl(\bigcup_{j=1,\, j\ne k}^{n} \theta_k(A_j)\Bigr) < \mathrm{hgt}(\theta_k(A_k)) \tag{44}$$
for all $k = 1, \dots, n$.
The proof follows from the corresponding definitions by straightforward routine calculations, and hgt means the 'height' of a fuzzy set, i.e., the supremum of its membership degrees.
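Stability modulo defuzzification can be weaker than D-stability, as the following Python sketch suggests. It assumes the Mamdani-style FITA operator (sup-min composition with $A_k \times B_k$, union as aggregation), a finite output universe given by the hypothetical points `ys`, and the 'mean of max' defuzzification; the function names and the rule base are illustrative, and the height condition is coded in the index form stated in (44).

```python
def cri(a, rel):
    return [max(min(a[x], rel[x][y]) for x in range(len(a)))
            for y in range(len(rel[0]))]

def cartesian(a_k, b_k):
    return [[min(ax, by) for by in b_k] for ax in a_k]

def theta(k, a, data):
    """Local inference operator of rule k applied to the fuzzy set a."""
    a_k, b_k = data[k]
    return cri(a, cartesian(a_k, b_k))

def fita_union(a, data):
    outs = [theta(k, a, data) for k in range(len(data))]
    return [max(vals) for vals in zip(*outs)]

def height(a):
    """Height of a fuzzy set on a finite universe: its largest degree."""
    return max(a)

def mean_of_max(a, ys):
    """Mean of the output points with maximal membership degree."""
    peaks = [y for y, m in zip(ys, a) if m == height(a)]
    return sum(peaks) / len(peaks)

def condition_44(data):
    """Height condition (44); assumes a rule base with at least two rules."""
    n = len(data)
    for k in range(n):
        others = [theta(k, data[j][0], data) for j in range(n) if j != k]
        merged = [max(vals) for vals in zip(*others)]
        if not height(merged) < height(theta(k, data[k][0], data)):
            return False
    return True

def is_fd_stable(data, ys):
    return all(mean_of_max(fita_union(a, data), ys) == mean_of_max(b, ys)
               for a, b in data)

data = [([1.0, 0.5, 0.0], [1.0, 0.2]),
        ([0.0, 0.5, 1.0], [0.3, 1.0])]
ys = [0.0, 10.0]
print(condition_44(data), is_fd_stable(data, ys))
print(fita_union(data[0][0], data), data[0][1])
```

For this rule base the inferred fuzzy outputs differ from the $B_i$, yet the defuzzified values agree.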
11.19 Conclusion
Essentially the first appearance of the idea of information granulation has been the idea of linguistic values of some variables, their use for the rough description of functional dependencies using 'linguistic' rules, and the application of this idea to fuzzy control. The most suitable mathematical context for fuzzy control problems determined by systems of linguistic control rules is to understand them as interpolation problems: a function from fuzzy subsets of an input space X to fuzzy subsets of an output space Y has to be determined from a (usually finite) list of information granules, i.e., in this functional setting of argument–value pairs. With suitably restricted classes of interpolating functions, however, this global interpolation problem may become unsolvable. Then one is interested in approximate solutions of acceptable quality. We discuss a series of optimal approximation results for classes of approximating functions which naturally arise out of the natural transformation of the interpolation problem into the problem of solving systems of fuzzy relational equations. But one may also consider some modifications of the original input–output data. For one such approach we also discuss sufficient conditions for the solvability of the modified interpolation problem. Additionally, the whole approach may be put into a more general context. What has been considered here is a context which focuses on different combinations of aggregation and inference operators. Interestingly, mainly the properties of the aggregation operators proved to be of importance for these considerations. So it actually remains an open problem whether the inference operations are really of minor importance, or whether our discussion simply missed some aspects for which the properties of the inference operations become crucial. For completeness it shall be mentioned that also other generalizations are possible and may become important too. One such more algebraically oriented generalization was quite recently offered in [24].
References
[1] S. Gottwald. Generalised solvability behaviour for systems of fuzzy equations. In: V. Novák and I. Perfilieva (eds), Discovering the World with Fuzzy Logic, Advances in Soft Computing. Physica-Verlag, Heidelberg, 2000, pp. 401–430.
[2] S. Gottwald. Mathematical fuzzy control. A survey of some recent results. Log. J. IGPL 13 (5) (2005) 525–541.
[3] S. Gottwald, V. Novák, and I. Perfilieva. Fuzzy control and t-norm-based fuzzy logic. Some recent results. In: Proceedings of the 9th International Conference of IPMU'2002, ESIA – Université de Savoie, Annecy, 2002, pp. 1087–1094.
[4] I. Perfilieva and S. Gottwald. Fuzzy function as a solution to a system of fuzzy relation equations. Int. J. Gen. Syst. 32 (2003) 361–372.
[5] S. Gottwald. A Treatise on Many-Valued Logics. Studies in Logic and Computation, Vol. 9. Research Studies Press, Baldock, 2001.
[6] L.A. Zadeh. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Syst. Man Cybern. SMC-3 (1973) 28–44.
[7] R.D. Luce. A note on Boolean matrix theory. Proc. Am. Math. Soc. 3 (1952) 382–388.
[8] A. DiNola, S. Sessa, W. Pedrycz, and E. Sanchez. Fuzzy Relation Equations and Their Applications to Knowledge Engineering. Theory and Decision Library, Series D. Kluwer, Dordrecht, 1989.
[9] E. Sanchez. Resolution of composite fuzzy relation equations. Inf. Control 30 (1976) 38–48.
[10] S. Gottwald. Characterizations of the solvability of fuzzy equations. Elektron. Inf. Kybern. 22 (1986) 67–91.
[11] S. Gottwald. Fuzzy Sets and Fuzzy Logic. The Foundations of Application – From a Mathematical Point of View. Vieweg: Braunschweig/Wiesbaden and Teknea, Toulouse, 1993.
[12] A. Mamdani and S. Assilian. An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Man-Mach. Stud. 7 (1975) 1–13.
[13] F. Klawonn. Fuzzy points, fuzzy relations and fuzzy functions. In: V. Novák and I. Perfilieva (eds), Discovering the World with Fuzzy Logic, Advances in Soft Computing. Physica-Verlag, Heidelberg, 2000, pp. 431–453.
[14] S. Gottwald. Criteria for noninteractivity of fuzzy logic controller rules. In: A. Straszak (ed), Large Scale Systems: Theory and Applications, Proceedings of the 3rd IFAC/IFORS Symposium Warsaw 1983. Pergamon Press, Oxford, 1984, pp. 229–233.
[15] W. Wu. Fuzzy reasoning and fuzzy relation equations. Fuzzy Sets Syst. 20 (1986) 67–78.
[16] G. Klir and B. Yuan. Approximate solutions of systems of fuzzy relation equations. In: FUZZ-IEEE '94. Proceedings of the 3rd International Conference on Fuzzy Systems, Orlando, FL, 1994, pp. 1452–1457.
[17] I. Perfilieva. Fuzzy function as an approximate solution to a system of fuzzy relation equations. Fuzzy Sets Syst. 147 (2004) 363–383.
[18] I. Perfilieva and A. Tonis. Compatibility of systems of fuzzy relation equations. Int. J. Gen. Syst. 29 (2000) 511–528.
[19] T. Calvo, G. Mayor, and R. Mesiar (eds). Aggregation Operators: New Trends and Applications. Physica-Verlag, Heidelberg, 2002.
[20] D. Dubois and H. Prade. On the use of aggregation operations in information fusion processes. Fuzzy Sets Syst. 142 (2004) 143–161.
[21] S. Gottwald. On a generalization of fuzzy relation equations. In: Proceedings of the 11th International Conference of IPMU 2006, Edition EDK, Paris, 2006, Vol. 2, pp. 2572–2577.
[22] N.N. Morsi and A.A. Fahmy. On generalized modus ponens with multiple rules and a residuated implication. Fuzzy Sets Syst. 129 (2002) 267–274.
[23] L.P. Holmblad and J.J. Ostergaard. Control of a cement kiln by fuzzy logic. In: M.M. Gupta and E. Sanchez (eds), Fuzzy Information and Decision Processes. North-Holland, Amsterdam, 1982, pp. 389–399.
[24] A. DiNola, A. Lettieri, I. Perfilieva, and V. Novák. Algebraic analysis of fuzzy systems. Fuzzy Sets Syst. 158 (2007) 1–22.
12 Fuzzy Numbers and Fuzzy Arithmetic Luciano Stefanini, Laerte Sorini, and Maria Letizia Guerra
12.1 Introduction
The scientific literature on fuzzy numbers and arithmetic calculations is rich in approaches that define fuzzy operations with desirable properties which are not always present in implementations of the classical extension principle or its approximations (shape preservation, reduction of the overestimation effect, requisite constraints, distributivity of multiplication and division, etc.). It is well known to all practitioners that appropriate use of fuzzy numbers in applications requires at least two features to be satisfied:
1. An easy way to represent and model fuzzy information with sufficient, or possibly high, flexibility of shapes, without being constrained to strong simplifications, e.g., allowing asymmetries or nonlinearities;
2. Relative simplicity and computational efficiency in performing exact fuzzy calculations or in obtaining good or error-controlled approximations of the results.
If they are not met, these two requirements often become a bottleneck in the use of fuzzy information, and much work and scientific literature has been devoted to them. On the other hand, as we will see, fuzzy calculations are not immediate to perform, and in many cases they require solving mathematically or computationally hard subproblems (e.g., global optimization, set-valued analysis, interval-based arithmetic, functional inverse calculation, and integration) for which a closed form is not available.
Fuzzy sets (and numbers) are complementary to probability and statistics in modeling uncertainty, imprecision, and vagueness of data and information (see the presentation in [1]); together with the methodologies of interval analysis and rough sets, they are the basic elements of granular computing (GrC) and serve as a basis for the methodology of computing with words ([2]). In particular, fuzziness is essentially related to imprecision (or partial truth) and uncertainty in the boundaries of sets and numbers, while granularity (and the granulation techniques) defines the scale or the level of detail at which the domain of the variable or object values of interest is described and coded. Fuzzy granulation of information and data (see [3–5]) produces fuzzy sets and numbers for the represented granules; fuzzy logic and calculus are the basic mathematical concepts and tools to formalize fuzzy variable functions and relations.
The arithmetical and topological structures of fuzzy numbers were developed in the 1980s, and this made it possible to design the elements of fuzzy calculus (see [6, 7]); Dubois and Prade stated the exact analytical fuzzy mathematics and introduced the well-known LR model and the corresponding formulas for the fuzzy operations. For the basic concepts see, e.g., [8–12]. More recently, the literature on fuzzy numbers has grown in terms of contributions to fuzzy arithmetic operations and to the use of simple formulas to approximate them; an extensive recent survey and bibliography on fuzzy intervals is in [13].
Zadeh's extension principle (with some generalizations) plays a very important role in fuzzy set theory, as it is a quite natural and reasonable principle for extending the operators and mappings of classical set theory, as well as its structures and properties, to the operators and mappings of fuzzy set theory ([14, 15]). In general, the arithmetic operations on fuzzy numbers can be approached either by the direct use of the membership function (by Zadeh's extension principle) or by the equivalent use of the α-cuts representation.
The arithmetic operations and more general fuzzy calculations are natural when dealing with fuzzy reasoning and systems, where variables and information are described by fuzzy numbers and sets; in particular, procedures and algorithms have to take into account the existing dependencies (and constraints) relating all the operands involved and their meaning. The essential uncertainties are generally modeled in the preliminary definitions of the variables, but it is very important to pay great attention to how they propagate during the calculations. A solid result in fuzzy theory and practice is that calculations cannot be performed by using the same rules as in arithmetic with real numbers; in fact, fuzzy calculus will not always satisfy the same properties (e.g., distributivity, invertibility, and others). If not performed by taking into account existing dependencies between the data, fuzzy calculations will produce excessive propagation of initial uncertainties (see [16–19]).
As we will see, the application of Zadeh's extension principle to the calculation of fuzzy expressions requires solving simultaneously global (constrained) minimization and maximization problems, which typically have a combinatorial structure; the task is not easy, except for particular cases. For this reason, general algorithms have been proposed (the vertex method and its variants), but also specific methods which exploit the problem at hand to produce exact solutions or to generate approximated subproblems that can be solved more efficiently than the original ones.
By the α-cuts approach, it is possible to define a parametric representation of fuzzy numbers that allows a large variety of possible shapes and is very simple to implement, with the advantage of obtaining a much wider family of fuzzy numbers than for the standard LR model (see [20–22]). This representation has the relevant advantage of being applied to the same [0, 1] interval for all the fuzzy numbers involved in the computations.
In many fields of different sciences (physics, engineering, economics, social, and political sciences) and disciplines where fuzzy sets and fuzzy logic are applied (e.g., approximate reasoning, image processing, fuzzy systems modeling and control, fuzzy decision making, statistics, operations research and optimization, computational engineering, artificial intelligence, and fuzzy finance and business), fuzzy numbers and arithmetic play a central role and are frequently and increasingly the main instruments (see [1, 9, 11, 12, 17, 19, 23, 24]).
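The excessive propagation of uncertainty caused by ignoring dependencies can be seen in the simplest possible case, the expression u − u. The Python sketch below works on α-cuts of a triangular fuzzy number and applies standard interval subtraction as if the two operands were unrelated; the triangular shape, the function names and the numbers are assumptions chosen only for illustration.

```python
def tri_cut(a, b, c, alpha):
    """Alpha-cut [lower, upper] of a triangular fuzzy number with support
    [a, c] and peak b."""
    return (a + (b - a) * alpha, c - (c - b) * alpha)

def interval_sub(u, v):
    """Standard interval subtraction, treating u and v as independent."""
    return (u[0] - v[1], u[1] - v[0])

for alpha in (0.0, 0.5, 1.0):
    cut = tri_cut(1.0, 2.0, 3.0, alpha)
    # Naive evaluation of u - u: the result should be exactly 0, but the
    # interval rule does not know that both operands are the same quantity.
    print(alpha, cut, interval_sub(cut, cut))
```

Only at α = 1 does the computed cut shrink to the single value 0; at lower levels the result is an interval around 0 whose width is twice that of the operand.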
Significant research activity has been devoted to the approximation of fuzzy numbers and fuzzy arithmetic operations, essentially following two approaches: the first is based on approximating the nonlinearities introduced by the operations, e.g., multiplication and division (see [20, 21] and references therein); the other consists in producing trapezoidal (linear) approximations based on the minimization of appropriate distance measures, so as to preserve desired elements like expected intervals, values, ambiguities, and correlation, and properties such as ordering and invariance under translation and scale transformation (see [25–29]). An advantage of the second approach is that, in general, the shape representations are simplified, but possibly uncontrolled errors are introduced by forcing linearization; on the other hand, the first approach has the advantage of better approximating the shape of the fuzzy numbers, and this makes it possible in most cases to control and reduce the errors, but at a computational cost associated with the handling of nonlinearities.
A difficulty in the adoption of fuzzy modeling is related to the fact that, from a mathematical and a practical point of view, fuzzy numbers do not have the same algebraic properties as the real numbers (e.g., a group algebraic structure); an example is the lack of inverses in fuzzy arithmetic (see [30]). It follows that modeling fuzzy numbers and performing fuzzy calculations has many facets, and possible solutions have to balance simple representations and approximated calculations against sufficient control of error propagation.
The chapter is organized as follows: Section 12.2 contains an introduction to fuzzy numbers in the unidimensional and multidimensional cases; Section 12.3 introduces some simple and flexible representations of fuzzy numbers, based on shape-function modeling; in Section 12.4 the fundamental elements of fuzzy operations and calculus are given; in Sections 12.5 and 12.6 we describe the procedures and detail some algorithms for the fuzzy arithmetic operations; and in Section 12.7 we illustrate some extensions to fuzzy mathematics (integration and differentiation of fuzzy-valued functions, fuzzy differential equations). The final Section 12.8 contains a brief account of recent applications and some concluding remarks.
12.2 Fuzzy Quantities and Numbers
We will consider fuzzy quantities, i.e., fuzzy sets defined over the field $\mathbb{R}$ of real numbers and the $n$-dimensional space $\mathbb{R}^n$. In particular we will focus on particular fuzzy quantities, called fuzzy numbers, having a particular form of the membership function.
Definition 1. A general fuzzy set over a given set (or space) $X$ of elements (the universe) is usually defined by its membership function $\mu : X \to T \subseteq [0, 1]$, and a fuzzy (sub)set $u$ of $X$ is uniquely characterized by the pairs $(x, \mu_u(x))$ for each $x \in X$; the value $\mu_u(x) \in [0, 1]$ is the membership grade of $x$ to the fuzzy set $u$. If $T = \{0, 1\}$ (i.e., $\mu_u$ assumes only the two values 0 or 1), we obtain a subset of $X$ in the classical set-theoretic sense (what is called a crisp set in the fuzzy context) and $\mu_u$ is simply the characteristic function of $u$.
Denote by $F(X)$ the collection of all the fuzzy sets over $X$. Elements of $F(X)$ will be denoted by letters $u, v, w$ and the corresponding membership functions by $\mu_u, \mu_v, \mu_w$. Of interest to us are fuzzy sets when the space $X$ is $\mathbb{R}$ (unidimensional real fuzzy sets) or $\mathbb{R}^n$ (multidimensional real fuzzy sets). Fundamental concepts in fuzzy theory are the support, the level sets (or level cuts), and the core of a fuzzy set (or of its membership function).
Definition 2. Let $\mu_u$ be the membership function of a fuzzy set $u$ over $X$. The support of $u$ is the (crisp) subset of points of $X$ at which the membership grade $\mu_u(x)$ is positive:
$$\mathrm{supp}(u) = \{x \mid x \in X,\ \mu_u(x) > 0\}; \tag{1}$$
we always assume that $\mathrm{supp}(u) \ne \emptyset$. For $\alpha \in\, ]0, 1]$, the α-level cut of $u$ (or simply the α-cut) is defined by
$$[u]_\alpha = \{x \mid x \in X,\ \mu_u(x) \ge \alpha\} \tag{2}$$
and for $\alpha = 0$ (or $\alpha \to +0$) by the closure of the support
$$[u]_0 = \mathrm{cl}\{x \mid x \in X,\ \mu_u(x) > 0\}.$$
The core of $u$ is the set of elements of $X$ having membership grade 1,
$$\mathrm{core}(u) = \{x \mid x \in X,\ \mu_u(x) = 1\}, \tag{3}$$
and we say that $u$ is normal if $\mathrm{core}(u) \ne \emptyset$.
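On a finite universe the notions of Definition 2 reduce to simple set comprehensions, as the following Python sketch shows. The dictionary representation, the function names and the sample membership grades are assumptions made for illustration; on a finite universe the closure in the definition of the 0-cut has no effect, so the support itself is used.

```python
def support(mu):
    """Points with positive membership grade, as in (1)."""
    return {x for x, m in mu.items() if m > 0}

def alpha_cut(mu, alpha):
    """Alpha-level cut as in (2), for alpha in ]0, 1]."""
    return {x for x, m in mu.items() if m >= alpha}

def core(mu):
    """Points with membership grade 1, as in (3)."""
    return {x for x, m in mu.items() if m == 1}

def is_normal(mu):
    """A fuzzy set is normal if its core is nonempty."""
    return len(core(mu)) > 0

# Hypothetical fuzzy set over the universe {0, 1, 2, 3, 4}.
mu_u = {0: 0.0, 1: 0.3, 2: 1.0, 3: 0.6, 4: 0.0}
print(support(mu_u), alpha_cut(mu_u, 0.5), core(mu_u), is_normal(mu_u))
```

The cuts are nested: raising the level α can only remove points.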
Well-known properties of the level cuts are
$$[u]_\alpha \subseteq [u]_\beta \quad \text{for } \alpha > \beta, \tag{4}$$
$$[u]_\alpha = \bigcap_{\beta < \alpha} [u]_\beta \quad \text{for } \alpha \in\, ]0, 1] \tag{5}$$
[...] $u$ is positive if $u_\alpha^- > 0$, $\forall \alpha \in [0, 1]$, and that $u$ is negative if $u_\alpha^+ < 0$, $\forall \alpha \in [0, 1]$; the sets of the positive and negative fuzzy numbers are denoted by $F_+$ and $F_-$, respectively, and their symmetric subsets by $S_+$ and $S_-$.
A well-known theorem in (generalized) convexity states that a function of a single variable over an interval $I$, $\mu : I \to [0, 1]$, is quasi-concave if and only if $I$ can be partitioned into two subintervals $I_1$ and $I_2$ such that $\mu$ is nondecreasing over $I_1$ and nonincreasing over $I_2$; it follows that a quasi-concave membership function is formed of two monotonic branches, one on the left subinterval $I_1$ and one on the right subinterval $I_2$; further, if it reaches the maximum value in more than one point, there exists a third central subinterval where it is constant (and maximal). This is the basis for the so-called LR fuzzy numbers, as in Figure 12.1.
Definition 7. An LR fuzzy quantity (number or interval) $u$ has membership function of the form
$$\mu_u(x) = \begin{cases} L\!\left(\dfrac{b-x}{b-a}\right) & \text{if } a \le x \le b \\[1ex] 1 & \text{if } b \le x \le c \\[1ex] R\!\left(\dfrac{x-c}{d-c}\right) & \text{if } c \le x \le d \\[1ex] 0 & \text{otherwise,} \end{cases} \tag{20}$$
where $L, R : [0, 1] \to [0, 1]$ are two nonincreasing shape functions such that $R(0) = L(0) = 1$ and $R(1) = L(1) = 0$. If $b = c$, we obtain a fuzzy number. If $L$ and $R$ are invertible functions, then the α-cuts are obtained by
$$[u]_\alpha = [\,b - (b - a)L^{-1}(\alpha),\ c + (d - c)R^{-1}(\alpha)\,]. \tag{21}$$
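As an illustration of (20) and (21), the following Python sketch evaluates the membership function and the α-cuts of an LR fuzzy interval. The linear shape function $L(t) = R(t) = 1 - t$ (with inverse $1 - \alpha$), the function names and the parameter values are assumptions chosen for the example.

```python
def lr_membership(x, a, b, c, d, L, R):
    """Membership grade of x for the LR fuzzy quantity with support [a, d]
    and core [b, c]; the strict comparisons avoid division by zero when
    a == b or c == d."""
    if a <= x < b:
        return L((b - x) / (b - a))
    if b <= x <= c:
        return 1.0
    if c < x <= d:
        return R((x - c) / (d - c))
    return 0.0

def lr_cut(alpha, a, b, c, d, L_inv, R_inv):
    """Alpha-cut [lower, upper] obtained from the inverse shape functions."""
    return (b - (b - a) * L_inv(alpha), c + (d - c) * R_inv(alpha))

def linear(t):
    """Linear shape function 1 - t; it is its own inverse."""
    return 1.0 - t

# Trapezoidal example: support [0, 5], core [2, 3].
a, b, c, d = 0.0, 2.0, 3.0, 5.0
print([lr_membership(x, a, b, c, d, linear, linear) for x in (1.0, 2.5, 4.0, 6.0)])
print(lr_cut(0.5, a, b, c, d, linear, linear))
```

The endpoints of the 0.5-cut are exactly the points where the two branches of the membership function take the value 0.5.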
The usual notation for an LR fuzzy quantity is $u = \langle a, b, c, d \rangle_{L,R}$ for an interval and $u = \langle a, b, c \rangle_{L,R}$ for a number. We refer to the functions L(.) and R(.) as the left and right branches (shape functions) of u, respectively.

On the other hand, the level cuts of a fuzzy number are 'nested' closed intervals, and this property is the basis for the LU representation.

Definition 8. An LU fuzzy quantity (number or interval) u is completely determined by any pair $u = (u^-, u^+)$ of functions $u^-, u^+ : [0, 1] \to \mathbb{R}$, defining the endpoints of the α-cuts, satisfying the three conditions: (i) $u^- : \alpha \mapsto u^-_\alpha \in \mathbb{R}$ is a bounded monotonic nondecreasing left-continuous function ∀α ∈ ]0, 1] and right-continuous for α = 0; (ii) $u^+ : \alpha \mapsto u^+_\alpha \in \mathbb{R}$ is a bounded monotonic nonincreasing left-continuous function ∀α ∈ ]0, 1] and right-continuous for α = 0; (iii) $u^-_\alpha \le u^+_\alpha$, ∀α ∈ [0, 1].

The support of u is the interval $[u^-_0, u^+_0]$ and the core is $[u^-_1, u^+_1]$. If $u^-_1 < u^+_1$, we have a fuzzy interval, and if $u^-_1 = u^+_1$ we have a fuzzy number. We refer to the functions $u^-_{(.)}$ and $u^+_{(.)}$ as the lower and upper branches of u, respectively. The obvious relation between $u^-$, $u^+$, and the membership function $\mu_u$ is
\[
\mu_u(x) = \sup\{\alpha \mid x \in [u^-_\alpha, u^+_\alpha]\}.
\tag{22}
\]
In particular, if the two branches $u^-_{(.)}$ and $u^+_{(.)}$ are continuous invertible functions, then $\mu_u(.)$ is formed by two continuous branches, the left being the increasing inverse of $u^-_{(.)}$ on $[u^-_0, u^-_1]$ and the right the decreasing inverse of $u^+_{(.)}$ on $[u^+_1, u^+_0]$.

There are many choices for the functions L(.), R(.) (and correspondingly for $u^-_{(.)}$ and $u^+_{(.)}$); note that the same model function is valid both for L and for R (or for $u^-_{(.)}$ and $u^+_{(.)}$). Simple examples are (p = 1 for linear shapes)
\[
L(t) = (1 - t)^p \quad \text{with } p > 0,\; t \in [0, 1]
\tag{23}
\]
and
\[
L(t) = 1 - t^p \quad \text{with } p > 0,\; t \in [0, 1].
\]
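As an illustration of Definition 7, the following minimal Python sketch evaluates the membership function (20) and the α-cuts (21) for the power shape $L(t) = (1 - t)^p$ of (23). The function names (`power_shape`, `lr_membership`, `lr_alpha_cut`) are hypothetical and chosen only for this example; the sketch assumes invertible shape functions.

```python
def power_shape(p):
    """Return L(t) = (1 - t)**p from (23) together with its inverse."""
    def shape(t):
        return (1.0 - t) ** p

    def inverse(alpha):
        return 1.0 - alpha ** (1.0 / p)

    return shape, inverse


def lr_membership(a, b, c, d, L, R, x):
    """Membership value (20) of the LR fuzzy quantity <a, b, c, d>_{L,R} at x."""
    if b <= x <= c:          # core: checked first so that a == b or c == d is safe
        return 1.0
    if a <= x < b:           # left branch
        return L((b - x) / (b - a))
    if c < x <= d:           # right branch
        return R((x - c) / (d - c))
    return 0.0


def lr_alpha_cut(a, b, c, d, L_inv, R_inv, alpha):
    """alpha-cut (21); requires invertible L and R."""
    return (b - (b - a) * L_inv(alpha), c + (d - c) * R_inv(alpha))


# Example: quadratic shapes on both sides, support [0, 5], core [2, 3].
L, L_inv = power_shape(2.0)
print(lr_membership(0.0, 2.0, 3.0, 5.0, L, L, 1.0))
print(lr_alpha_cut(0.0, 2.0, 3.0, 5.0, L_inv, L_inv, 0.25))
```

The endpoints of the α-cut returned for a level α are exactly the points where the membership function takes the value α, which is a quick consistency check between (20) and (21).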
Some more general forms can be obtained by orthogonal polynomials: for i = 1, 2, . . . , p, p ≥ 1,
\[
L(t) = \varphi_{i,p}(t) = \sum_{j=0}^{i} B_{j,p}(t), \quad t \in [0, 1],
\tag{24}
\]
where $B_{j,p}(t)$ is the jth Bernstein polynomial of degree p, given by $B_{j,p}(t) = \frac{p!}{j!\,(p-j)!}\, t^j (1 - t)^{p-j}$, j = 0, 1, . . . , p.

Analogous forms can be used for the LU fuzzy quantities (as in Figure 12.2). If we start with an increasing shape function p(.) such that p(0) = 0, p(1) = 1, and a decreasing shape function q(.) such that q(0) = 1, q(1) = 0, and with four numbers $u^-_0 \le u^-_1 \le u^+_1 \le u^+_0$ defining the support and the core of $u = (u^-, u^+)$, then we can model $u^-_{(.)}$ and $u^+_{(.)}$ by
\[
u^-_\alpha = u^-_0 + (u^-_1 - u^-_0)\,p(\alpha)
\quad \text{and} \quad
u^+_\alpha = u^+_1 + (u^+_0 - u^+_1)\,q(\alpha)
\quad \text{for all } \alpha \in [0, 1].
\tag{25}
\]
Figure 12.2 Upper and lower branches of an LU fuzzy number. For each α ∈ [0, 1] the functions $u^-$ and $u^+$ form the α-cuts $[u^-_\alpha, u^+_\alpha]$
The simplest fuzzy quantities have linear branches (in LR and LU representations):

Definition 9. A trapezoidal fuzzy interval, denoted by $u = \langle a, b, c, d \rangle$, where a ≤ b ≤ c ≤ d, has α-cuts $[u]_\alpha = [a + \alpha(b - a),\, d - \alpha(d - c)]$, α ∈ [0, 1], and membership function
\[
\mu_{\mathrm{Tra}}(x) =
\begin{cases}
\dfrac{x - a}{b - a} & \text{if } a \le x \le b \\
1 & \text{if } b \le x \le c \\
\dfrac{d - x}{d - c} & \text{if } c \le x \le d \\
0 & \text{otherwise.}
\end{cases}
\]
Some authors use the equivalent notation $u = \langle b, p, c, q \rangle$, with p = b − a ≥ 0 and q = d − c ≥ 0, so that the support of u is [b − p, c + q] and the core is [b, c].

Definition 10. A triangular fuzzy number, denoted by $u = \langle a, b, c \rangle$, where a ≤ b ≤ c, has α-cuts $[u]_\alpha = [a + \alpha(b - a),\, c - \alpha(c - b)]$, α ∈ [0, 1], and membership function
\[
\mu_{\mathrm{Tri}}(x) =
\begin{cases}
\dfrac{x - a}{b - a} & \text{if } a \le x \le b \\
\dfrac{c - x}{c - b} & \text{if } b \le x \le c \\
0 & \text{otherwise.}
\end{cases}
\]
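Definitions 9 and 10 translate directly into code. The sketch below is a minimal Python illustration (the names `trapezoid_membership` and `trapezoid_cut` are hypothetical); a triangular number ⟨a, b, c⟩ is treated as the trapezoid ⟨a, b, b, c⟩, which is an implementation convenience rather than something prescribed by the text.

```python
def trapezoid_membership(a, b, c, d, x):
    """Membership of the trapezoidal fuzzy interval <a, b, c, d> at x."""
    if b <= x <= c:            # core first, so a == b or c == d causes no division by zero
        return 1.0
    if a <= x < b:
        return (x - a) / (b - a)
    if c < x <= d:
        return (d - x) / (d - c)
    return 0.0


def trapezoid_cut(a, b, c, d, alpha):
    """alpha-cut [a + alpha (b - a), d - alpha (d - c)] of <a, b, c, d>."""
    return (a + alpha * (b - a), d - alpha * (d - c))


def triangle_cut(a, b, c, alpha):
    """alpha-cut of the triangular number <a, b, c>, seen as <a, b, b, c>."""
    return trapezoid_cut(a, b, b, c, alpha)


print(trapezoid_cut(1.0, 2.0, 3.0, 5.0, 0.5))
print(triangle_cut(1.0, 2.0, 4.0, 0.5))
```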
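Some authors use the equivalent notation $u = \langle b, p, q \rangle$, with p = b − a ≥ 0 and q = c − b ≥ 0, so that the support of u is [b − p, b + q] and the core is {b}.

Other forms of fuzzy numbers have been proposed in the literature, e.g., the quasi-Gaussian membership function ($m \in \mathbb{R}$, $k, \sigma \in \mathbb{R}^+$; if k → +∞, the support is unbounded)
\[
\mu_{\mathrm{qGauss}}(x) =
\begin{cases}
\exp\left(-\dfrac{(x - m)^2}{2\sigma^2}\right) & \text{if } m - k\sigma \le x \le m + k\sigma \\
0 & \text{otherwise,}
\end{cases}
\]
and the hyperbolic tangent membership function
\[
\mu_{\mathrm{hTangent}}(x) =
\begin{cases}
1 + \tanh\left(-\dfrac{(x - m)^2}{\sigma^2}\right) & \text{if } m - k\sigma \le x \le m + k\sigma \\
0 & \text{otherwise.}
\end{cases}
\]
To have continuity and μ = 0 at the extreme values of the support [m − kσ, m + kσ], we modify the fuzzy membership functions above to the following:
\[
\mu(x) =
\begin{cases}
\dfrac{\exp\left(-\dfrac{(x - m)^2}{2\sigma^2}\right) - \exp\left(-\dfrac{k^2}{2}\right)}{1 - \exp\left(-\dfrac{k^2}{2}\right)} & \text{if } m - k\sigma \le x \le m + k\sigma \\
0 & \text{otherwise,}
\end{cases}
\tag{26}
\]
and
\[
\mu(x) =
\begin{cases}
\dfrac{\tanh(-k^2) - \tanh\left(-\dfrac{(x - m)^2}{\sigma^2}\right)}{\tanh(-k^2)} & \text{if } m - k\sigma \le x \le m + k\sigma \\
0 & \text{otherwise.}
\end{cases}
\tag{27}
\]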
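The modified quasi-Gaussian form (26) can be inverted in closed form, which gives its α-cuts directly. The following Python sketch (function names are hypothetical) evaluates (26) and its α-cuts; with m = 0, σ = 2, and k = 2 the endpoints it produces at α = 0.25, 0.5, and 0.75 agree to four decimals with the node values listed in the LR parametrization (43).

```python
import math


def quasi_gauss(x, m, sigma, k):
    """Modified quasi-Gaussian membership (26): continuous, zero at m +/- k*sigma."""
    if abs(x - m) > k * sigma:
        return 0.0
    tail = math.exp(-k * k / 2.0)
    return (math.exp(-(x - m) ** 2 / (2.0 * sigma ** 2)) - tail) / (1.0 - tail)


def quasi_gauss_cut(alpha, m, sigma, k):
    """alpha-cut of (26), obtained by solving mu(x) = alpha for x."""
    tail = math.exp(-k * k / 2.0)
    # max(...) guards against a tiny negative argument caused by rounding at alpha = 1
    radius = sigma * math.sqrt(max(0.0, -2.0 * math.log(alpha * (1.0 - tail) + tail)))
    return (m - radius, m + radius)


for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, quasi_gauss_cut(alpha, 0.0, 2.0, 2.0))
```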
12.2.2 Multidimensional Fuzzy Quantities
Any quasi-concave upper-semicontinuous membership function $\mu_u : \mathbb{R}^n \to [0, 1]$, with compact support and nonempty core, defines a fuzzy quantity $u \in F^n$, and it can be considered as a general possibility distribution (see [32–34]). A membership function $\mu_j : \mathbb{R} \to [0, 1]$ is called the jth marginal of $\mu_u : \mathbb{R}^n \to [0, 1]$ if, for all $x \in \mathbb{R}$,
\[
\mu_j(x) = \max\{\mu_u(x_1, \ldots, x_{j-1}, x, x_{j+1}, \ldots, x_n) \mid x_i \in \mathbb{R},\; i \ne j\}
\tag{28}
\]
and the corresponding fuzzy set (i.e., having $\mu_j$ as membership function) is called the jth projection of $u \in F^n$. It is obvious that the availability of all the projections is not sufficient, in general, to reconstruct the original membership function $\mu_u$, and we say that the projections are interacting with each other. (For a discussion of interacting fuzzy numbers see [11, 35, 36].)

Particular n-dimensional membership functions can be obtained by the Cartesian product of n unidimensional fuzzy numbers or intervals. Let $u_j \in F_I$ have membership functions $\mu_{u_j}(x_j)$ for j = 1, 2, . . . , n; the membership function of the vector $u = (u_1, \ldots, u_n)$ of noninteracting fuzzy quantities $u_j \in F_I$ is defined by (or satisfies)
\[
\mu_u(x_1, \ldots, x_n) = \min\{\mu_{u_j}(x_j),\; j = 1, 2, \ldots, n\}.
\]
In this case, if the α-cuts of $u_j$ are $[u_j]_\alpha = [u^-_{j,\alpha}, u^+_{j,\alpha}]$, α ∈ [0, 1], j = 1, 2, . . . , n, then the α-cuts of u are the Cartesian products
\[
[u]_\alpha = [u_1]_\alpha \times \cdots \times [u_n]_\alpha = [u^-_{1,\alpha}, u^+_{1,\alpha}] \times \cdots \times [u^-_{n,\alpha}, u^+_{n,\alpha}].
\tag{29}
\]
For noninteracting fuzzy quantities, the availability of the projections is sufficient to define the vector; we denote by $F^n_I$ (or by $F^n$ if all $u_j \in F$) the corresponding set. Fuzzy calculations with interacting numbers are in general quite difficult, with few exceptions; in the following we will consider fuzzy arithmetic based on unidimensional and multidimensional noninteracting fuzzy quantities.
12.3 Representation of Fuzzy Numbers
As we have seen in the previous section, the LR and the LU representations of fuzzy numbers require the use of appropriate (monotonic) shape functions to model either the left and right branches of the membership function or the lower and upper branches of the α-cuts. In this section we present the basic elements of a parametric representation of the shape functions, proposed in [20] and [21], based on monotonic Hermite-type interpolation. The parametric representations can be used both to define the shape functions and to calculate the arithmetic operations by error-controlled approximations.

We first introduce some models for 'standardized' differentiable monotonic shape functions $p : [0, 1] \to [0, 1]$ such that
\[
p(0) = 0 \quad \text{and} \quad p(1) = 1, \quad \text{with } p(t) \text{ increasing on } [0, 1];
\]
if we are interested in decreasing functions, we can start with an increasing function p(.) and simply define the corresponding decreasing functions $q : [0, 1] \to [0, 1]$ by
\[
q(t) = 1 - p(t) \quad \text{or} \quad q(t) = p(\varphi(t)),
\]
where $\varphi : [0, 1] \to [0, 1]$ is any decreasing bijection (e.g., $\varphi(t) = 1 - t$).
Figure 12.3 Standard form of the monotonic Hermite-type interpolation function: p(0) = 0, p(1) = 1 and p′(0) = β0, p′(1) = β1

As illustrated in [21], increasing functions $p : [0, 1] \to [0, 1]$ satisfying the four Hermite-type interpolation conditions
\[
p(0) = 0,\; p(1) = 1 \quad \text{and} \quad p'(0) = \beta_0,\; p'(1) = \beta_1
\]
for any value of the two nonnegative parameters $\beta_i \ge 0$, i = 0, 1, can be used as valid shape functions: we obtain infinitely many functions simply by fixing the two parameters $\beta_i$ that give the slopes (first derivatives) of the function at t = 0 and t = 1 (see Figure 12.3). To make the slope parameters explicit, we denote the interpolating function by
\[
t \mapsto p(t; \beta_0, \beta_1) \quad \text{for } t \in [0, 1].
\]
We recall here two of the basic forms illustrated in [21]:
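• (2,2)-Rational spline:
\[
p(t; \beta_0, \beta_1) = \frac{t^2 + \beta_0 t(1 - t)}{1 + (\beta_0 + \beta_1 - 2)\,t(1 - t)}.
\tag{30}
\]
• Mixed exponential spline:
\[
p(t; \beta_0, \beta_1) = \frac{1}{a}\left[t^2(3 - 2t) + \beta_0 - \beta_0(1 - t)^a + \beta_1 t^a\right], \quad \text{where } a = 1 + \beta_0 + \beta_1.
\tag{31}
\]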
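Both model functions are short enough to check numerically. The following Python sketch implements (30) and (31) as written (the function names are hypothetical) and verifies the Hermite-type conditions p(0) = 0, p(1) = 1, p′(0) = β0, p′(1) = β1 with one-sided finite differences; the step size `h` is an arbitrary choice made for this illustration.

```python
def rational_spline(t, b0, b1):
    """(2,2)-rational spline (30)."""
    s = t * (1.0 - t)
    return (t * t + b0 * s) / (1.0 + (b0 + b1 - 2.0) * s)


def mixed_exp_spline(t, b0, b1):
    """Mixed exponential spline (31), with a = 1 + b0 + b1."""
    a = 1.0 + b0 + b1
    return (t * t * (3.0 - 2.0 * t) + b0 - b0 * (1.0 - t) ** a + b1 * t ** a) / a


def end_slopes(p, b0, b1, h=1e-6):
    """One-sided finite-difference estimates of p'(0) and p'(1)."""
    left = (p(h, b0, b1) - p(0.0, b0, b1)) / h
    right = (p(1.0, b0, b1) - p(1.0 - h, b0, b1)) / h
    return left, right


for spline in (rational_spline, mixed_exp_spline):
    print(spline.__name__, spline(0.0, 2.0, 0.5), spline(1.0, 2.0, 0.5),
          end_slopes(spline, 2.0, 0.5))
```

Since β0 + β1 − 2 ≥ −2 and t(1 − t) ≤ 1/4, the denominator of (30) stays at or above 1/2 on [0, 1], so the rational form needs no special handling for nonnegative slopes.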
Note that in (30) and (31) we obtain a linear p(t) = t, ∀t ∈ [0, 1], if β0 = β1 = 1 and a quadratic $p(t) = t^2 + \beta_0 t(1 - t)$ if β0 + β1 = 2. In order to produce different shapes we can either fix the slopes β0 and β1 (if we have information on the first derivatives at t = 0, t = 1) or we can estimate them by knowing the values of p(t) at additional points. For example, if $0 < p_1 < \cdots < p_k < 1$ are k ≥ 2 given increasing values of $p(t_j)$, j = 1, . . . , k, at internal points $0 < t_1 < \cdots < t_k < 1$, we can estimate the slopes by solving the following two-variable constrained minimization problem:
\[
\min F(\beta_0, \beta_1) = \sum_{j=1}^{k} \left[p(t_j; \beta_0, \beta_1) - p_j\right]^2
\quad \text{s.t. } \beta_0, \beta_1 \ge 0.
\tag{32}
\]
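To make problem (32) concrete, here is a minimal Python sketch that estimates the slopes by brute-force search over a grid of nonnegative (β0, β1) pairs, using the rational spline (30) as the model. The grid search, the upper bound `b_max`, and the resolution `steps` are assumptions made only for this illustration; any constrained least-squares solver could be substituted.

```python
def rational_spline(t, b0, b1):
    """(2,2)-rational spline (30)."""
    s = t * (1.0 - t)
    return (t * t + b0 * s) / (1.0 + (b0 + b1 - 2.0) * s)


def fit_slopes(ts, ps, b_max=5.0, steps=50):
    """Approximate minimizer of F in (32) over a uniform grid on [0, b_max]^2."""
    best = None
    for i in range(steps + 1):
        b0 = b_max * i / steps
        for j in range(steps + 1):
            b1 = b_max * j / steps
            # Objective F(b0, b1): sum of squared residuals at the data points.
            F = sum((rational_spline(t, b0, b1) - p) ** 2 for t, p in zip(ts, ps))
            if best is None or F < best[0]:
                best = (F, b0, b1)
    return best[1], best[2]


# Synthetic data generated from known slopes, then recovered by the search.
ts = [0.2, 0.4, 0.6, 0.8]
ps = [rational_spline(t, 2.0, 0.5) for t in ts]
print(fit_slopes(ts, ps))
```

The constraint β0, β1 ≥ 0 is enforced simply by restricting the grid to nonnegative values.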
If the data $1 > q_1 > \cdots > q_k > 0$ are decreasing (as for the right or upper branches), the minimization (32) will have the objective function
\[
G(\beta_0, \beta_1) = \sum_{j=1}^{k} \left[1 - p(t_j; \beta_0, \beta_1) - q_j\right]^2.
\]
The model functions above can be adopted not only to define globally the shapes, but also to represent the functions 'piecewise', on a decomposition of the interval [0, 1] into N subintervals $0 = \alpha_0 < \alpha_1 < \cdots < \alpha_{i-1} < \alpha_i < \cdots < \alpha_N = 1$. It is convenient to use the same subdivision for both the lower $u^-_\alpha$ and upper $u^+_\alpha$ branches (we can always reduce to this situation by the union of two different subdivisions). We have a preference for using a uniform subdivision of the interval [0, 1] and for refining the decomposition by successively bisecting each subinterval, producing $N = 2^K$, K ≥ 0. In each subinterval $I_i = [\alpha_{i-1}, \alpha_i]$, the values and the slopes of the two functions are
\[
\begin{aligned}
u^-(\alpha_{i-1}) &= u^-_{0,i}, & u^+(\alpha_{i-1}) &= u^+_{0,i}, & u^-(\alpha_i) &= u^-_{1,i}, & u^+(\alpha_i) &= u^+_{1,i}, \\
(u^-)'(\alpha_{i-1}) &= d^-_{0,i}, & (u^+)'(\alpha_{i-1}) &= d^+_{0,i}, & (u^-)'(\alpha_i) &= d^-_{1,i}, & (u^+)'(\alpha_i) &= d^+_{1,i};
\end{aligned}
\tag{33}
\]
by the transformation $t_\alpha = \frac{\alpha - \alpha_{i-1}}{\alpha_i - \alpha_{i-1}}$, $\alpha \in I_i$, each subinterval $I_i$ is mapped into the standard interval [0, 1] to determine each piece independently and obtain general left-continuous LU fuzzy numbers. Globally continuous or more regular $C^{(1)}$ fuzzy numbers can be obtained directly from the data (e.g., $u^-_{1,i} = u^-_{0,i+1}$, $u^+_{1,i} = u^+_{0,i+1}$ for continuity and $d^-_{1,i} = d^-_{0,i+1}$, $d^+_{1,i} = d^+_{0,i+1}$ for differentiability at $\alpha = \alpha_i$). Let $p^\pm_i(t)$ denote the model function on $I_i$; we easily obtain
\[
p^-_i(t) = p(t; \beta^-_{0,i}, \beta^-_{1,i}), \qquad p^+_i(t) = 1 - p(t; \beta^+_{0,i}, \beta^+_{1,i}),
\tag{34}
\]
with
\[
\beta^-_{j,i} = \frac{\alpha_i - \alpha_{i-1}}{u^-_{1,i} - u^-_{0,i}}\, d^-_{j,i}
\quad \text{and} \quad
\beta^+_{j,i} = -\frac{\alpha_i - \alpha_{i-1}}{u^+_{1,i} - u^+_{0,i}}\, d^+_{j,i}
\quad \text{for } j = 0, 1,
\]
so that, for $\alpha \in [\alpha_{i-1}, \alpha_i]$ and i = 1, 2, . . . , N,
\[
u^-_\alpha = u^-_{0,i} + (u^-_{1,i} - u^-_{0,i})\, p^-_i(t_\alpha), \qquad t_\alpha = \frac{\alpha - \alpha_{i-1}}{\alpha_i - \alpha_{i-1}};
\tag{35}
\]
\[
u^+_\alpha = u^+_{0,i} + (u^+_{1,i} - u^+_{0,i})\, p^+_i(t_\alpha), \qquad t_\alpha = \frac{\alpha - \alpha_{i-1}}{\alpha_i - \alpha_{i-1}}.
\tag{36}
\]
12.3.1 Parametric LU Fuzzy Numbers
The monotonic models illustrated in the previous section suggest a first parametrization of fuzzy numbers, obtained by representing the lower and upper branches $u^-_\alpha$ and $u^+_\alpha$ of u on the trivial decomposition of the interval [0, 1], with N = 1 (without internal points) and α0 = 0, α1 = 1. In this simple case, u can be represented by a vector of eight components (the slopes corresponding to $u^-_i$ are denoted by $\delta u^-_i$, etc.):
\[
u = (u^-_0, \delta u^-_0, u^+_0, \delta u^+_0;\; u^-_1, \delta u^-_1, u^+_1, \delta u^+_1),
\tag{37}
\]
where $u^-_0, \delta u^-_0, u^-_1, \delta u^-_1$ are used for the lower branch $u^-_\alpha$, and $u^+_0, \delta u^+_0, u^+_1, \delta u^+_1$ for the upper branch $u^+_\alpha$.

On a decomposition $0 = \alpha_0 < \alpha_1 < \cdots < \alpha_N = 1$ we can proceed piecewise. For example, a differentiable shape function requires 4(N + 1) parameters
\[
u = (\alpha_i;\; u^-_i, \delta u^-_i, u^+_i, \delta u^+_i)_{i=0,1,\ldots,N}
\quad \text{with} \quad
\begin{aligned}
& u^-_0 \le u^-_1 \le \cdots \le u^-_N \le u^+_N \le u^+_{N-1} \le \cdots \le u^+_0 \quad \text{(data)} \\
& \delta u^-_i \ge 0,\; \delta u^+_i \le 0 \quad \text{(slopes),}
\end{aligned}
\tag{38}
\]
and the branches are computed according to (35) and (36). An example with N = 2 is in (39) and is plotted in Figure 12.4a.

LU parametrization of a fuzzy number                    (39)
α_i     u_i^-     δu_i^-     u_i^+     δu_i^+
0.0     −2.0      5.0        2.0       −0.5
0.5     −1.0      1.5        1.2       −2.0
1.0      0.0      2.5        0.0       −0.1

Figure 12.4 LU and LR parametric fuzzy numbers. (a) Fuzzy number in LU representation; the parameters are reported in (39) and the construction is obtained by the mixed spline with N = 2. (b) Quasi-Gaussian fuzzy number; the parameters are reported in (43) and the membership function is obtained by the mixed spline with N = 4.
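As a sketch of how the piecewise construction can be evaluated, the Python code below computes the lower branch $u^-_\alpha$ of example (39) from (35), using the mixed exponential spline (31) on each subinterval and the slope scaling $\beta^-_{j,i} = (\alpha_i - \alpha_{i-1})\, d^-_{j,i} / (u^-_{1,i} - u^-_{0,i})$. Only the lower branch is shown; the names `ALPHAS`, `U_LOW`, `D_LOW`, and `lower_branch` are hypothetical, and α is assumed to lie in [0, 1].

```python
def mixed_exp_spline(t, b0, b1):
    """Mixed exponential spline (31), with a = 1 + b0 + b1."""
    a = 1.0 + b0 + b1
    return (t * t * (3.0 - 2.0 * t) + b0 - b0 * (1.0 - t) ** a + b1 * t ** a) / a


# Lower-branch data of example (39): nodes, values u_i^-, and slopes delta u_i^-.
ALPHAS = [0.0, 0.5, 1.0]
U_LOW = [-2.0, -1.0, 0.0]
D_LOW = [5.0, 1.5, 2.5]


def lower_branch(alpha):
    """Evaluate u^-_alpha piecewise as in (35), for alpha in [0, 1]."""
    # Locate the subinterval [alpha_{i-1}, alpha_i] containing alpha.
    for i in range(1, len(ALPHAS)):
        if alpha <= ALPHAS[i]:
            break
    a0, a1 = ALPHAS[i - 1], ALPHAS[i]
    u0, u1 = U_LOW[i - 1], U_LOW[i]
    # Rescale the slopes with respect to alpha into slopes of the standard spline.
    scale = (a1 - a0) / (u1 - u0)
    b0, b1 = scale * D_LOW[i - 1], scale * D_LOW[i]
    t = (alpha - a0) / (a1 - a0)
    return u0 + (u1 - u0) * mixed_exp_spline(t, b0, b1)


for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, lower_branch(alpha))
```

The construction reproduces the tabulated values at the nodes, and a finite-difference check at α = 0 recovers the tabulated slope 5.0.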
12.3.2 Parametric LR Fuzzy Numbers
The (parametric) monotonic splines can be used as models for the shape functions L and R; in fact, if β0, β1 ≥ 0 are given and we consider
\[
q(t; \beta_0, \beta_1) = 1 - p(t; \beta_0, \beta_1),
\tag{40}
\]
then q(0) = 1, q(1) = 0, q′(0) = −β0, q′(1) = −β1, and we can write the L and R shapes as
\[
L(t) = q(t; \beta_{0,L}, \beta_{1,L}), \qquad R(t) = q(t; \beta_{0,R}, \beta_{1,R}).
\tag{41}
\]
An LR fuzzy number can be obtained by using (40) and (41) with the parameters
\[
u_{LR} = (u_{0,L}, \delta u_{0,L}, u_{0,R}, \delta u_{0,R};\; u_{1,L}, \delta u_{1,L}, u_{1,R}, \delta u_{1,R}),
\tag{42}
\]
provided that $u_{0,L} \le u_{1,L} \le u_{1,R} \le u_{0,R}$ and the slopes $\delta u_{0,L}, \delta u_{1,L} \ge 0$ and $\delta u_{0,R}, \delta u_{1,R} \le 0$ are the first derivatives of the membership function μ in (20) at the points $x = u_{0,L}$, $x = u_{1,L}$, $x = u_{0,R}$, and $x = u_{1,R}$, respectively. The β parameters in (41) are related to the slopes by the equations
\[
\delta u_{0,L} = \frac{\beta_{1,L}}{u_{1,L} - u_{0,L}} \ge 0, \qquad
\delta u_{1,L} = \frac{\beta_{0,L}}{u_{1,L} - u_{0,L}} \ge 0;
\]
\[
\delta u_{0,R} = \frac{\beta_{1,R}}{u_{1,R} - u_{0,R}} \le 0, \qquad
\delta u_{1,R} = \frac{\beta_{0,R}}{u_{1,R} - u_{0,R}} \le 0.
\]
On a decomposition $0 = \alpha_0 < \alpha_1 < \cdots < \alpha_N = 1$ we proceed similarly to (38).
As two examples, the LR parametrization of a fuzzy quasi-Gaussian number (m = 0, σ = 2, and k = 2) approximated with N = 4 (five points) is (see Figure 12.4b)

LR parametrization of fuzzy number (26)                       (43)
α_i      u_{i,L}      δu_{i,L}      u_{i,R}      δu_{i,R}
0.0      −4.0         0.156518      4.0          −0.156518
0.25     −2.8921      0.293924      2.8921       −0.293924
0.5      −2.1283      0.349320      2.1283       −0.349320
0.75     −1.3959      0.316346      1.3959       −0.316346
1.0       0.0         0.0           0.0           0.0

and of a hyperbolic tangent fuzzy number (m = 0, σ = 3, and k = 1) is

LR parametrization of fuzzy number (27)                       (44)
α_i      u_{i,L}      δu_{i,L}      u_{i,R}      δu_{i,R}
0.0      −3.0         0.367627      3.0          −0.367627
0.25     −2.4174      0.475221      2.4174       −0.475221
0.5      −1.8997      0.473932      1.8997       −0.473932
0.75     −1.3171      0.370379      1.3171       −0.370379
1.0       0.0         0.0           0.0           0.0
The representations are exact at the nodes $\alpha_i$, and the average absolute errors in the membership functions (calculated at 1000 uniformly spaced x values of the corresponding supports [−4, 4] and [−3, 3]) are 0.076% and 0.024%, respectively.
12.3.3 Switching LR and LU
The LU and LR parametric representations of fuzzy numbers produce subspaces of the space of fuzzy numbers. Denote by $F^{LU}$ and by $F^{LR}$ the sets of (differentiable shaped) fuzzy numbers defined by (37) and (42), respectively, and by $F^{LU}_N$ and $F^{LR}_N$ the corresponding extensions to a uniform decomposition $\alpha_i = i/N \in [0, 1]$, i = 0, 1, . . . , N, into N subintervals. By using equations (46) there is a one-to-one correspondence between $F^{LU}_N$ and $F^{LR}_N$, so that we can go equivalently from one representation to the other.

For example, for the case N = 1, let $u^-_\alpha = u^-_0 + (u^-_1 - u^-_0)\, p(\alpha; \beta^-_0, \beta^-_1)$ and $u^+_\alpha = u^+_0 + (u^+_1 - u^+_0)\, p(\alpha; \beta^+_0, \beta^+_1)$ be the lower and upper functions of the LU representation of a fuzzy number $u \in F^{LU}$; the LR representation of u has the membership function
\[
\mu(x) =
\begin{cases}
p^{-1}\left(\dfrac{x - u^-_0}{u^-_1 - u^-_0};\; \beta^-_0, \beta^-_1\right) & \text{if } x \in [u^-_0, u^-_1] \\
1 & \text{if } x \in [u^-_1, u^+_1] \\
p^{-1}\left(\dfrac{u^+_0 - x}{u^+_0 - u^+_1};\; \beta^+_0, \beta^+_1\right) & \text{if } x \in [u^+_1, u^+_0] \\
0 & \text{otherwise,}
\end{cases}
\tag{45}
\]
where $\alpha = p^{-1}(t; \beta_0, \beta_1)$ is the inverse function of $t = p(\alpha; \beta_0, \beta_1)$. If we model the LU fuzzy numbers by a (2,2)-rational spline $p(\alpha; \beta_0, \beta_1)$ like (30), the inverse $p^{-1}(t; \beta_0, \beta_1)$ can be computed analytically, as we have to solve the quadratic equation (with respect to α)
\[
\alpha^2 + \beta_0 \alpha(1 - \alpha) = t\,[1 + (\beta_0 + \beta_1 - 2)\,\alpha(1 - \alpha)],
\]
i.e., $(1 + A(t))\alpha^2 - A(t)\alpha - t = 0$, where $A(t) = -\beta_0 + \beta_0 t + \beta_1 t - 2t$. If A(t) = −1, then the equation is linear and the solution is α = t. If A(t) ≠ −1, then there exist two real solutions and we choose the one belonging to [0, 1].
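The analytic inversion just described can be sketched in a few lines of Python. The function names and the numerical tolerances below are assumptions made for this illustration; the quadratic and the coefficient A(t) are those given in the text.

```python
import math


def rational_spline(alpha, b0, b1):
    """(2,2)-rational spline (30), t = p(alpha; b0, b1)."""
    s = alpha * (1.0 - alpha)
    return (alpha * alpha + b0 * s) / (1.0 + (b0 + b1 - 2.0) * s)


def rational_spline_inverse(t, b0, b1, tol=1e-12):
    """Solve (1 + A) alpha^2 - A alpha - t = 0 and return the root in [0, 1]."""
    A = -b0 + (b0 + b1 - 2.0) * t
    if abs(1.0 + A) < tol:
        return t                      # degenerate (linear) case: alpha = t
    disc = A * A + 4.0 * (1.0 + A) * t
    root = math.sqrt(max(disc, 0.0))  # guard against tiny negative rounding errors
    for alpha in ((A + root) / (2.0 * (1.0 + A)), (A - root) / (2.0 * (1.0 + A))):
        if -tol <= alpha <= 1.0 + tol:
            return min(1.0, max(0.0, alpha))
    raise ValueError("no root in [0, 1]; check that b0, b1 >= 0 and 0 <= t <= 1")


# Round trip: alpha -> t = p(alpha) -> alpha.
t = rational_spline(0.3, 2.0, 0.5)
print(t, rational_spline_inverse(t, 2.0, 0.5))
```

Values of A(t) very close to, but not equal to, −1 make the quadratic formula ill-conditioned; a production implementation would probably switch to a numerically stable form of the root in that regime.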
We can also switch the two representations: for example, for a given LR fuzzy number $u \in F^{LR}$ given by (42), its approximated LU representation $u \in F^{LU}$ corresponding to (37) is
\[
u_{LU} = (u^-_0, \delta u^-_0, u^+_0, \delta u^+_0;\; u^-_1, \delta u^-_1, u^+_1, \delta u^+_1)
\tag{46}
\]
with
\[
\begin{aligned}
u^-_0 &= u_{0,L}, & \delta u^-_0 &= \frac{1}{\delta u_{0,L}}, \\
u^-_1 &= u_{1,L}, & \delta u^-_1 &= \frac{1}{\delta u_{1,L}}, \\
u^+_0 &= u_{0,R}, & \delta u^+_0 &= \frac{1}{\delta u_{0,R}}, \\
u^+_1 &= u_{1,R}, & \delta u^+_1 &= \frac{1}{\delta u_{1,R}}.
\end{aligned}
\]
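(If some $\delta u_{i,L}$, $\delta u_{i,R}$ is zero, the corresponding infinite slope $\delta u^-_i$, $\delta u^+_i$ can be assigned a BIG number.)

Following [37] we can define a geometric distance $D_p(u, v)$ between fuzzy numbers $u, v \in F^{LU}$, given by
\[
\begin{aligned}
D^{LU}_p(u, v) = \big[\, & |u^-_0 - v^-_0|^p + |u^-_1 - v^-_1|^p + |u^+_0 - v^+_0|^p + |u^+_1 - v^+_1|^p \\
& + |\delta u^-_0 - \delta v^-_0|^p + |\delta u^-_1 - \delta v^-_1|^p + |\delta u^+_0 - \delta v^+_0|^p + |\delta u^+_1 - \delta v^+_1|^p \,\big]^{1/p}
\end{aligned}
\]
and
\[
\begin{aligned}
D^{LU}_\infty(u, v) = \max\{\, & |u^-_0 - v^-_0|,\; |u^-_1 - v^-_1|,\; |u^+_0 - v^+_0|,\; |u^+_1 - v^+_1|, \\
& |\delta u^-_0 - \delta v^-_0|,\; |\delta u^-_1 - \delta v^-_1|,\; |\delta u^+_0 - \delta v^+_0|,\; |\delta u^+_1 - \delta v^+_1| \,\}.
\end{aligned}
\]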
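Because an element of $F^{LU}$ with N = 1 is just a vector of eight numbers, the two distances reduce to ordinary vector norms of the parameter differences. The Python sketch below assumes the component ordering of (37); the function name `lu_distance` and the two sample triangular numbers are illustrative choices.

```python
def lu_distance(u, v, p=2.0):
    """D_p^{LU}(u, v) for eight-component LU vectors ordered as in (37).

    Pass p=float("inf") for the max-type distance D_infinity^{LU}.
    """
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Triangular numbers <0, 2, 4> and <1, 3, 5> written as
# (u0-, du0-, u0+, du0+; u1-, du1-, u1+, du1+), with linear branches of slope +/-2.
u = (0.0, 2.0, 4.0, -2.0, 2.0, 2.0, 2.0, -2.0)
v = (1.0, 2.0, 5.0, -2.0, 3.0, 2.0, 3.0, -2.0)
print(lu_distance(u, v, 1.0), lu_distance(u, v, 2.0), lu_distance(u, v, float("inf")))
```

In this example the two numbers differ only by a shift of 1, so the slope components contribute nothing to the distance.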
Analogous formulas can be introduced for $F^{LR}$ and for $F^{LU}_N$ and $F^{LR}_N$.

In recent years, particular attention has been devoted to the class of trapezoidal and triangular fuzzy numbers, as they are among the simplest representations of fuzzy uncertainty and only four parameters are sufficient to characterize them. For this reason, several methods have been proposed to approximate fuzzy numbers by the trapezoidal family (see [25–29, 38]). The families of fuzzy numbers $F^{LR}$ and $F^{LU}$, which include triangular and trapezoidal fuzzy numbers, are (in their simpler form) characterized by eight parameters, and it appears that including the slopes of the lower and upper functions, even without generating piecewise monotonic approximations over subintervals (i.e., working with N = 1), captures much more information than the linear approximation (see [20, 21]).
12.4 Fuzzy Arithmetic: Basic Elements
The fuzzy extension principle introduced by Zadeh in [32, 39] is the basic tool for fuzzy calculus; it extends functions of real numbers to functions of noninteractive fuzzy quantities and it allows the extension of arithmetic operations and calculus to fuzzy arguments. We have already defined the addition (9) and the scalar multiplication (10).

Let ◦ ∈ {+, −, ×, /} be one of the four arithmetic operations and let $u, v \in F_I$ be given fuzzy intervals (or numbers), having $\mu_u(.)$ and $\mu_v(.)$ as membership functions and level-cuts representations $u = (u^-, u^+)$ and $v = (v^-, v^+)$; the extension principle for the extension of ◦ defines the membership function of w = u ◦ v by
\[
\mu_{u \circ v}(z) = \sup\{\min\{\mu_u(x), \mu_v(y)\} \mid z = x \circ y\}.
\tag{47}
\]
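In terms of the α-cuts, the four arithmetic operations and the scalar multiplication for $k \in \mathbb{R}$ are obtained by the well-known interval arithmetic:

Addition:
\[
u + v = (u^- + v^-,\, u^+ + v^+); \qquad \alpha \in [0, 1],\; [u + v]_\alpha = [u^-_\alpha + v^-_\alpha,\; u^+_\alpha + v^+_\alpha].
\]
Scalar multiplication:
\[
ku = (ku^-, ku^+) \text{ if } k > 0, \qquad ku = (ku^+, ku^-) \text{ if } k < 0; \qquad
\alpha \in [0, 1],\; [ku]_\alpha = [\min\{ku^-_\alpha, ku^+_\alpha\},\; \max\{ku^-_\alpha, ku^+_\alpha\}].
\]
Subtraction:
\[
u - v = u + (-v) = (u^- - v^+,\, u^+ - v^-); \qquad \alpha \in [0, 1],\; [u - v]_\alpha = [u^-_\alpha - v^+_\alpha,\; u^+_\alpha - v^-_\alpha].
\]
Multiplication:
\[
u \times v = \big((uv)^-, (uv)^+\big); \qquad \alpha \in [0, 1],\;
\begin{cases}
(uv)^-_\alpha = \min\{u^-_\alpha v^-_\alpha,\; u^-_\alpha v^+_\alpha,\; u^+_\alpha v^-_\alpha,\; u^+_\alpha v^+_\alpha\} \\
(uv)^+_\alpha = \max\{u^-_\alpha v^-_\alpha,\; u^-_\alpha v^+_\alpha,\; u^+_\alpha v^-_\alpha,\; u^+_\alpha v^+_\alpha\}.
\end{cases}
\]
Division:
\[
\frac{u}{v} = \left(\left(\frac{u}{v}\right)^-, \left(\frac{u}{v}\right)^+\right) \text{ if } 0 \notin [v^-_0, v^+_0]; \qquad \alpha \in [0, 1],\;
\begin{cases}
\left(\dfrac{u}{v}\right)^-_\alpha = \min\left\{\dfrac{u^-_\alpha}{v^-_\alpha},\; \dfrac{u^-_\alpha}{v^+_\alpha},\; \dfrac{u^+_\alpha}{v^-_\alpha},\; \dfrac{u^+_\alpha}{v^+_\alpha}\right\} \\[2ex]
\left(\dfrac{u}{v}\right)^+_\alpha = \max\left\{\dfrac{u^-_\alpha}{v^-_\alpha},\; \dfrac{u^-_\alpha}{v^+_\alpha},\; \dfrac{u^+_\alpha}{v^-_\alpha},\; \dfrac{u^+_\alpha}{v^+_\alpha}\right\}.
\end{cases}
\]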
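At a fixed level α, each of these operations acts on a pair of closed intervals. The Python sketch below represents an α-cut as a `(lower, upper)` tuple and implements the five operations; the function names are hypothetical. The division guard here tests the given α-cut of v, whereas the text requires the stronger condition that 0 lies outside the support of v.

```python
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def sub(u, v):
    return (u[0] - v[1], u[1] - v[0])


def scale(k, u):
    ends = (k * u[0], k * u[1])
    return (min(ends), max(ends))


def mul(u, v):
    products = [u[0] * v[0], u[0] * v[1], u[1] * v[0], u[1] * v[1]]
    return (min(products), max(products))


def div(u, v):
    if v[0] <= 0.0 <= v[1]:
        raise ZeroDivisionError("0 belongs to the divisor interval")
    quotients = [u[0] / v[0], u[0] / v[1], u[1] / v[0], u[1] / v[1]]
    return (min(quotients), max(quotients))


u, v = (1.0, 3.0), (2.0, 5.0)
print(add(u, v), sub(u, v), mul(u, v), div(u, v), scale(-2.0, u))
print(sub(u, u))   # not the crisp zero: the result is an interval around 0
```

The last line shows that subtracting an interval from itself does not return {0}, which anticipates the algebraic remarks that follow.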
From an algebraic point of view, addition and multiplication are commutative and associative and have a neutral element. If we include the crisp numbers 0 and 1 into the set of fuzzy numbers with $[0]_\alpha = [0, 0] = \{0\}$ and $[1]_\alpha = [1, 1] = \{1\}$, it is easy to see that, for every $u \in F_I$, u + 0 = 0 + u = u (additive neutral element) and u × 1 = 1 × u = u (multiplicative neutral element). But addition and multiplication of fuzzy numbers do not admit an inverse element:
\[
u + v = w \;\nLeftrightarrow\; u = w - v, \qquad u/v = w \;\nLeftrightarrow\; u = vw.
\]
For given $u \in F_I$ and real numbers p and q, it is $pu + qu \ne (p + q)u$ unless p and q have the same sign (pq ≥ 0), so that, in particular, $u - u \ne (1 - 1)u = 0$. Analogously, $u/u \ne 1$. This implies that the 'inversion' of fuzzy operations, as in the cases u + v − u or uv/u, is not possible in terms of what is expected of crisp numbers; e.g., $(3u^2 - 2u^2)v/u \ne uv$. It is clear that the direct application of the fuzzy extension principle to the computation of the expressions u + v − u and uv/u always produces the correct result v; but it cannot generally be obtained by the iterative application of the extension principle to 'parts' of the expressions. For example, the two-step procedure

Step 1. $w_1 = u + v$ or $w_1 = uv$
Step 2. $w = w_1 - u$ or $w = w_1/u$

will always produce $w \ne v$ (except when some numbers are crisp). Also the distributivity property (u + v)w = uw + vw is valid only in special cases, e.g. (for recent results see [40, 41]),

$w \in F$ and $u, v \in F_+$, or
$w \in F$ and $u, v \in F_-$, or
$w \in F$ and $u, v \in S_0$, or
$w \in F_+ \cup F_-$ and $u, v \in F_0$.
The simple examples given above suggest that fuzzy arithmetic has to be performed very carefully and, in particular, that we cannot mimic the rules of standard crisp situations. This does not mean that fuzzy arithmetic (based on the extension principle) is incompatible with crisp arithmetic; rather, we cannot use in the fuzzy context the same algorithms and rules as for crisp calculations. Further investigation (in particular [16]) has pointed out that critical cases are related to the multiple occurrence of some fuzzy quantities in the expression to be calculated. From a mathematical point of view this is quite obvious, as $\min\{f(x_1, x_2) \mid x_1 \in A, x_2 \in A\}$ is not the same as $\min\{f(x, x) \mid x \in A\}$ and, e.g., $[u^2]_\alpha = [\min\{x^2 \mid x \in [u]_\alpha\}, \max\{x^2 \mid x \in [u]_\alpha\}]$ is not the same as $[u \cdot u]_\alpha = [\min\{xy \mid x, y \in [u]_\alpha\}, \max\{xy \mid x, y \in [u]_\alpha\}]$.

In the fuzzy or in the interval arithmetic contexts, the equation u = v + w is not equivalent to w = u − v = u + (−1)v or to v = u − w = u + (−1)w, and this has motivated the introduction of the following Hukuhara difference (H-difference) (see [8]):

Definition 11. Given $u, v \in F$, the H-difference of u and v is defined by $u \ominus v = w \Leftrightarrow u = v + w$; if $u \ominus v$ exists, it is unique and its α-cuts are $[u \ominus v]_\alpha = [u^-_\alpha - v^-_\alpha,\; u^+_\alpha - v^+_\alpha]$. Clearly, $u \ominus u = \{0\}$.

The H-difference is also motivated by the problem of inverting the addition: if x, y are crisp numbers then (x + y) − y = x, but this is not true if x, y are fuzzy. It is possible to see (see [42]) that if u and v are fuzzy numbers (and not in general fuzzy sets), then $(u + v) \ominus v = u$; i.e., the H-difference inverts the addition of fuzzy numbers. Note that in defining the H-difference, the following case can also be taken into account: $u \ominus v = w \Leftrightarrow v = u + (-1)w$, and the H-difference can be generalized to the following definition.

Definition 12. Given $u, v \in F$, the generalized H-difference can be defined as the fuzzy number w, if it exists, such that
\[
u \ominus_g v = w \;\Leftrightarrow\; \text{either (i) } u = v + w \text{ or (ii) } v = u + (-1)w.
\]
If $u \ominus_g v$ exists, it is unique and its α-cuts are given by
\[
[u \ominus_g v]_\alpha = [\min\{u^-_\alpha - v^-_\alpha,\; u^+_\alpha - v^+_\alpha\},\; \max\{u^-_\alpha - v^-_\alpha,\; u^+_\alpha - v^+_\alpha\}].
\]
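For crisp compact intervals (a single α-cut), the two differences can be sketched as follows in Python. The function names are hypothetical; `h_difference` returns `None` when the Hukuhara difference does not exist as an interval. For genuine fuzzy numbers one would additionally need the resulting endpoint functions to satisfy the conditions of Definition 8 across all levels α, which this sketch does not check.

```python
def add(u, v):
    """Interval addition."""
    return (u[0] + v[0], u[1] + v[1])


def h_difference(u, v):
    """H-difference of two intervals: w with u = v + w, or None if it does not exist."""
    lo, hi = u[0] - v[0], u[1] - v[1]
    return (lo, hi) if lo <= hi else None


def gh_difference(u, v):
    """Generalized H-difference of two intervals; always defined for real intervals."""
    a, b = u[0] - v[0], u[1] - v[1]
    return (min(a, b), max(a, b))


print(h_difference((-1.0, 1.0), (-1.0, 0.0)))   # case (i): u = v + w
print(h_difference((0.0, 0.0), (0.0, 1.0)))     # H-difference does not exist
print(gh_difference((0.0, 0.0), (0.0, 1.0)))    # case (ii): v = u + (-1) w
print(h_difference(add((1.0, 3.0), (2.0, 5.0)), (2.0, 5.0)))  # inverts the addition
```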
If $u \ominus v$ exists, then $u \ominus v = u \ominus_g v$, and if (i) and (ii) are satisfied simultaneously, then w is a crisp number. Also, $u \ominus_g u = u \ominus u = \{0\}$. Two simple examples on real (crisp) compact intervals illustrate the generalization (from [8, p. 8]): $[-1, 1] \ominus [-1, 0] = [0, 1]$, as in fact (i) is [−1, 0] + [0, 1] = [−1, 1], but $[0, 0] \ominus_g [0, 1] = [-1, 0]$ and $[0, 1] \ominus_g [-\frac{1}{2}, 1] = [0, \frac{1}{2}]$ satisfy (ii). Note that $[a, b] \ominus_g [c, d] = [\min\{a - c, b - d\}, \max\{a - c, b - d\}]$ is always defined for real intervals. The generalized H-difference is (implicitly) used by Bede and Gal (see [43]) in their definition of generalized differentiability of a fuzzy-valued function.

Consider now the extension of a function $f : \mathbb{R}^n \to \mathbb{R}$ to a vector of n (noninteractive) fuzzy numbers $u = (u_1, \ldots, u_n) \in (F_I)^n$, with kth component $u_k \in F_I$,
\[
\begin{aligned}
& [u_k]_\alpha = [u^-_{k,\alpha}, u^+_{k,\alpha}] \text{ for } k = 1, 2, \ldots, n \quad \text{(α-cuts)} \\
& \mu_{u_k} : \operatorname{supp}(u_k) \to [0, 1] \text{ for } k = 1, 2, \ldots, n \quad \text{(membership function)}
\end{aligned}
\]
and denote $v = f(u_1, \ldots, u_n)$ with membership function $\mu_v$ and LU representation $v = (v^-, v^+)$; the extension principle states that $\mu_v$ is given by
\[
\mu_v(y) =
\begin{cases}
\sup\{\min\{\mu_{u_1}(x_1), \ldots, \mu_{u_n}(x_n)\} \mid y = f(x_1, \ldots, x_n)\} & \text{if } y \in \operatorname{Range}(f) \\
0 & \text{otherwise,}
\end{cases}
\tag{48}
\]
where $\operatorname{Range}(f) = \{y \in \mathbb{R} \mid \exists (x_1, \ldots, x_n) \in \mathbb{R}^n \text{ s.t. } y = f(x_1, \ldots, x_n)\}$.

For a continuous function $f : \mathbb{R}^n \to \mathbb{R}$, the α-cuts of the fuzzy extension v are obtained by solving the following box-constrained global optimization problems (α ∈ [0, 1]):
\[
v^-_\alpha = \min\{f(x_1, \ldots, x_n) \mid x_k \in [u_k]_\alpha,\; k = 1, 2, \ldots, n\};
\tag{49}
\]
\[
v^+_\alpha = \max\{f(x_1, \ldots, x_n) \mid x_k \in [u_k]_\alpha,\; k = 1, 2, \ldots, n\}.
\tag{50}
\]
The lower and upper values $v^-_\alpha$ and $v^+_\alpha$ of v define equivalently (as f is assumed to be continuous) the image of the Cartesian product $\times_{k=1}^{n} [u_k]_\alpha$ via f, i.e. (see Figure 12.5),
\[
[v^-_\alpha, v^+_\alpha] = f([u_1]_\alpha, \ldots, [u_n]_\alpha).
\]
If the function $f(x_1, \ldots, x_n)$ is sufficiently simple, the analytical expressions for $v^-_\alpha$ and $v^+_\alpha$ can be obtained, as is the case for many unidimensional elementary functions (see, e.g., [44]). For general functions, such as polynomials or trigonometric functions, for which many min/max global points exist, we need to solve numerically the global optimization problems (49) and (50) above; general methods for global optimization have been proposed and a very extensive scientific literature is available. It is clear that in these cases we can only fix a finite set of values $\alpha \in \{\alpha_0, \ldots, \alpha_M\}$ and obtain the corresponding $v^-_\alpha$ and $v^+_\alpha$ pointwise; a sufficiently precise calculation requires M in the range from 10 to 100 or more (depending on the application and the required precision) and the computational time may become very high.

To reduce these difficulties, various specific heuristic methods have been proposed, and all of them try to take computational advantage of the specific structure of the 'nested' optimizations (49)–(50) intrinsic in the property (4) of the α-cuts; among others, the vertex method and its variants (see [45–47]), the fuzzy weighted average method (see [48]), the general transformation method (see [49–51]), and the interval arithmetic optimization with sparse grids (see [52]). The computational complexity of the algorithms is generally exponential in the number n of (distinct) operands and goes from $O(M 2^n)$ for the vertex method to $O(M^n)$ for the complete version of the transformation method.

Since its origins, fuzzy calculus has been related to, and has received improvements from, interval analysis (see [53, 54] and the references therein); the overestimation effect that arises in interval arithmetic when a variable has more than one occurrence in the expression to be calculated is also common to fuzzy calculus, and ideas to overcome it are quite similar ([23, 55]). At least in the differentiable case, the advantages of the LU representation appear to be quite interesting, based on the fact that a small number of α points is in general sufficient to obtain good approximations (this is the essential gain in using the slopes to model fuzzy numbers), thus reducing the number of constrained min (49) and max (50) problems to be solved directly. On the other hand, finding computationally efficient extension solvers is still an open research field in fuzzy calculations.

Figure 12.5 Interval view of fuzzy arithmetic. Each α-cut $[v]_\alpha$ of the fuzzy number $v = f(u_1, u_2)$ is the image via the function f of the α-cuts of $u_1$ and $u_2$ corresponding to the same membership level α ∈ [0, 1]
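To illustrate where the $M 2^n$ count for the vertex method comes from, the following Python sketch evaluates f at the $2^n$ vertices of the box $[u_1]_\alpha \times \cdots \times [u_n]_\alpha$ for each chosen level α. This is only a basic version written for this illustration (names are hypothetical): it returns the exact α-cut of the extension only when f attains its extrema over the box at vertices, for instance when f is monotone in each variable on the box; otherwise it may underestimate the true range, and problems (49) and (50) must be solved by a global optimizer.

```python
from itertools import product


def triangle_cut(a, b, c, alpha):
    """alpha-cut of the triangular fuzzy number <a, b, c>."""
    return (a + alpha * (b - a), c - alpha * (c - b))


def vertex_cut(f, cuts):
    """Min and max of f over the 2^n vertices of the box given by the alpha-cuts."""
    values = [f(*vertex) for vertex in product(*cuts)]
    return (min(values), max(values))


def extend(f, numbers, levels):
    """Pointwise alpha-cuts of f(u_1, ..., u_n) for triangular inputs."""
    result = []
    for alpha in levels:
        cuts = [triangle_cut(a, b, c, alpha) for (a, b, c) in numbers]
        result.append((alpha, vertex_cut(f, cuts)))
    return result


# f is increasing in both arguments on the boxes used here, so vertices suffice.
f = lambda x, y: x * y + x
numbers = [(1.0, 2.0, 3.0), (0.0, 1.0, 2.0)]
for alpha, cut in extend(f, numbers, [0.0, 0.5, 1.0]):
    print(alpha, cut)
```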
12.4.1 Constrained Fuzzy Arithmetic
A research area in fuzzy calculations concerns the so-called overestimation effect associated with the adoption of interval arithmetic for the calculation of fuzzy arithmetic expressions. The fuzzy literature is rich in examples, and a general setting has been formulated and illustrated by Klir in [16, 17]; after his paper both 'radical' and 'compromise' solutions have been proposed. The basic question is associated with the fact that, in standard interval arithmetic, addition and multiplication do not possess inverse elements and, in particular, $u - u \ne 0$ or $u/u \ne 1$, or the fuzzy extension (48) of $f(x) = x^n$ to a fuzzy argument u is not equivalent to the product $u \cdots u \cdots u$ (n times). In this context, a fuzzy expression like $z = 3x - (y + 2x) - (u^2 + v)(v + w^2)$, for given fuzzy numbers x, y, u, v, and w, if calculated by the application of the standard interval arithmetic (INT), produces a fuzzy number $z_{\mathrm{INT}}$ with α-cuts $[z^-_\alpha, z^+_\alpha]_{\mathrm{INT}}$ that are much larger than those given by the fuzzy extension principle (49) and (50) applied to z = f(x, y, u, v, w). In particular, the constrained arithmetic requires that, in the expression above, $3x - (y + 2x)$ and $(u^2 + v)(v + w^2)$ be computed with constraints induced by the double occurrence of x (as 3x and 2x), of u, w (as $u^2$ and $w^2$), and of v. The 'radical' solution (constrained fuzzy arithmetic, CFA) produces the extension principle (48) result: in particular, it requires that $3x - (y + 2x) = x - y$ and that $(u^2 + v)(v + w^2)$ be obtained by
\[
\begin{aligned}
{[(u^2 + v)(v + w^2)]}^-_\alpha &= \min\{(a^2 + b)(b + c^2) \mid a \in [u]_\alpha;\; b \in [v]_\alpha;\; c \in [w]_\alpha\} \\
{[(u^2 + v)(v + w^2)]}^+_\alpha &= \max\{(a^2 + b)(b + c^2) \mid a \in [u]_\alpha;\; b \in [v]_\alpha;\; c \in [w]_\alpha\}.
\end{aligned}
\]
(Denote by $z_{\mathrm{CFA}}$ the corresponding results.) Observe, in particular, that $(u^2)_{\mathrm{CFA}} \ne (uu)_{\mathrm{INT}}$. The full adoption of the CFA produces a great increase in computational complexity, as the calculations cannot be decomposed sequentially into binary operations, all the variables have to be taken globally, and the dimension may grow very quickly with the number of distinct operands. A mixed (compromise) approach (see [56]) is also frequently used, e.g.,
\[
z_{\mathrm{MIX}} = (3x - (y + 2x))_{\mathrm{CFA}} - ((u^2)_{\mathrm{CFA}} + v)(v + (w^2)_{\mathrm{CFA}}),
\]
where only isolated parts of the expression are computed via CFA (e.g., $3x - (y + 2x)$ is simplified to $x - y$, and $u^2$ and $w^2$ are obtained via the unary square operator) and the other operations are executed by interval arithmetic. It is well known that, in general,
\[
[z_{\mathrm{CFA}}]_\alpha \subseteq [z_{\mathrm{MIX}}]_\alpha \subseteq [z_{\mathrm{INT}}]_\alpha.
\]
In a recent paper, Chang and Hung (see [57]) have proposed a series of rules to simplify the calculation of algebraic fuzzy expressions, by identifying components to be solved by the direct use of the vertex method, such as products and sums of powers, and by isolating subfunctions that operate on partitions of the total variables, so as to reduce the complexity or to calculate directly according to a series of catalogued cases that simplify the application of the vertex-like methods.
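The overestimation can be seen on a single α-cut. The Python sketch below (hypothetical helper names, intervals as `(lower, upper)` tuples) compares plain interval arithmetic with the constrained result for the two sub-expressions discussed above: $3x - (y + 2x)$ against its simplification $x - y$, and the product $u \cdot u$ against the unary square $u^2$.

```python
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def sub(u, v):
    return (u[0] - v[1], u[1] - v[0])


def scale(k, u):
    ends = (k * u[0], k * u[1])
    return (min(ends), max(ends))


def mul(u, v):
    products = [u[0] * v[0], u[0] * v[1], u[1] * v[0], u[1] * v[1]]
    return (min(products), max(products))


def square(u):
    """Unary square: range of x**2 for x in u (a single occurrence of u)."""
    lo, hi = u
    if lo <= 0.0 <= hi:
        return (0.0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))


x, y = (1.0, 2.0), (0.0, 1.0)
z_int = sub(scale(3.0, x), add(y, scale(2.0, x)))   # x treated as two independent operands
z_cfa = sub(x, y)                                   # constraint "same x" enforced
print(z_int, z_cfa)

u = (-1.0, 2.0)
print(mul(u, u), square(u))
```

In both cases the constrained result is strictly contained in the interval-arithmetic result, in line with the inclusion chain stated above.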
Fuzzy Numbers and Fuzzy Arithmetic
267
12.5 Algorithmic Fuzzy Arithmetic

In [20] and [21] we have analyzed the advantages of the LU representation in the computation of fuzzy expressions, by the direct interval arithmetic operations (INT) or by the equality-constrained fuzzy arithmetic (CFA) method of Klir. In this section we adopt an algorithmic approach to describe the application of the fuzzy extension principle to arithmetic operators and to fuzzy function calculation associated with the LU representation of the fuzzy quantities involved. For simplicity, we will illustrate the case of differentiable representations (38); if the functions are not differentiable or if the slopes are not used (i.e., only the values $u_i^-$ and $u_i^+$ are used), then in each algorithm we can omit all the blocks referring to $\delta u_i^-$, $\delta u_i^+$. For the basic fuzzy operations we have easy-to-implement algorithms, based on the application of exact fuzzy operations at the nodes of the $\alpha$-subdivision.^1

Algorithm 1 (LU addition, subtraction, and H-difference). Let $u = (u_i^-, \delta u_i^-, u_i^+, \delta u_i^+)_{i=0,1,\ldots,N}$ and $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$ be given; calculate $w = u + v$, $z = u - v$, and $y = u \ominus v$ with $w = (w_i^-, \delta w_i^-, w_i^+, \delta w_i^+)_{i=0,1,\ldots,N}$, $y = (y_i^-, \delta y_i^-, y_i^+, \delta y_i^+)_{i=0,1,\ldots,N}$, and $z = (z_i^-, \delta z_i^-, z_i^+, \delta z_i^+)_{i=0,1,\ldots,N}$.

  For $i = 0, 1, \ldots, N$
    $w_i^- = u_i^- + v_i^-$,  $z_i^- = u_i^- - v_i^+$,  $y_i^- = u_i^- - v_i^-$
    $\delta w_i^- = \delta u_i^- + \delta v_i^-$,  $\delta z_i^- = \delta u_i^- - \delta v_i^+$,  $\delta y_i^- = \delta u_i^- - \delta v_i^-$
    $w_i^+ = u_i^+ + v_i^+$,  $z_i^+ = u_i^+ - v_i^-$,  $y_i^+ = u_i^+ - v_i^+$
    $\delta w_i^+ = \delta u_i^+ + \delta v_i^+$,  $\delta z_i^+ = \delta u_i^+ - \delta v_i^-$,  $\delta y_i^+ = \delta u_i^+ - \delta v_i^+$
  end
  Test whether conditions (38) are satisfied for $(y_i^-, \delta y_i^-, y_i^+, \delta y_i^+)_{i=0,1,\ldots,N}$.

Algorithm 2 (LU scalar multiplication). Let $k \in \mathbb{R}$ and $u = (u_i^-, \delta u_i^-, u_i^+, \delta u_i^+)_{i=0,1,\ldots,N}$ be given; calculate $w = ku$ with $w = (w_i^-, \delta w_i^-, w_i^+, \delta w_i^+)_{i=0,1,\ldots,N}$.

  For $i = 0, 1, \ldots, N$
    if $k \geq 0$
      then $w_i^- = k u_i^-$, $\delta w_i^- = k \delta u_i^-$, $w_i^+ = k u_i^+$, $\delta w_i^+ = k \delta u_i^+$
      else $w_i^- = k u_i^+$, $\delta w_i^- = k \delta u_i^+$, $w_i^+ = k u_i^-$, $\delta w_i^+ = k \delta u_i^-$
  end

Algorithm 3 (LU multiplication). Let $u = (u_i^-, \delta u_i^-, u_i^+, \delta u_i^+)_{i=0,1,\ldots,N}$ and $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$ be given; calculate $w = uv$ with $w = (w_i^-, \delta w_i^-, w_i^+, \delta w_i^+)_{i=0,1,\ldots,N}$.

  For $i = 0, 1, \ldots, N$
    $m_i = \min\{u_i^- v_i^-, u_i^- v_i^+, u_i^+ v_i^-, u_i^+ v_i^+\}$
    $M_i = \max\{u_i^- v_i^-, u_i^- v_i^+, u_i^+ v_i^-, u_i^+ v_i^+\}$
    $w_i^- = m_i$, $w_i^+ = M_i$
    if $u_i^- v_i^- = m_i$ then $\delta w_i^- = \delta u_i^- v_i^- + u_i^- \delta v_i^-$
    elseif $u_i^- v_i^+ = m_i$ then $\delta w_i^- = \delta u_i^- v_i^+ + u_i^- \delta v_i^+$
    elseif $u_i^+ v_i^- = m_i$ then $\delta w_i^- = \delta u_i^+ v_i^- + u_i^+ \delta v_i^-$
    elseif $u_i^+ v_i^+ = m_i$ then $\delta w_i^- = \delta u_i^+ v_i^+ + u_i^+ \delta v_i^+$
    endif
    if $u_i^- v_i^- = M_i$ then $\delta w_i^+ = \delta u_i^- v_i^- + u_i^- \delta v_i^-$
    elseif $u_i^- v_i^+ = M_i$ then $\delta w_i^+ = \delta u_i^- v_i^+ + u_i^- \delta v_i^+$
    elseif $u_i^+ v_i^- = M_i$ then $\delta w_i^+ = \delta u_i^+ v_i^- + u_i^+ \delta v_i^-$
    elseif $u_i^+ v_i^+ = M_i$ then $\delta w_i^+ = \delta u_i^+ v_i^+ + u_i^+ \delta v_i^+$
    endif
  end

Similar algorithms can be deduced for the division and the scalar multiplication.

Algorithm 4 (LU division). Let $u = (u_i^-, \delta u_i^-, u_i^+, \delta u_i^+)_{i=0,1,\ldots,N}$ and $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$ be given with $v > 0$ or $v < 0$; calculate $w = u/v$ with $w = (w_i^-, \delta w_i^-, w_i^+, \delta w_i^+)_{i=0,1,\ldots,N}$.

  For $i = 0, 1, \ldots, N$
    $m_i = \min\{u_i^-/v_i^-, u_i^-/v_i^+, u_i^+/v_i^-, u_i^+/v_i^+\}$
    $M_i = \max\{u_i^-/v_i^-, u_i^-/v_i^+, u_i^+/v_i^-, u_i^+/v_i^+\}$
    $w_i^- = m_i$, $w_i^+ = M_i$
    if $u_i^-/v_i^- = m_i$ then $\delta w_i^- = (\delta u_i^- v_i^- - u_i^- \delta v_i^-)/[v_i^-]^2$
    elseif $u_i^-/v_i^+ = m_i$ then $\delta w_i^- = (\delta u_i^- v_i^+ - u_i^- \delta v_i^+)/[v_i^+]^2$
    elseif $u_i^+/v_i^- = m_i$ then $\delta w_i^- = (\delta u_i^+ v_i^- - u_i^+ \delta v_i^-)/[v_i^-]^2$
    elseif $u_i^+/v_i^+ = m_i$ then $\delta w_i^- = (\delta u_i^+ v_i^+ - u_i^+ \delta v_i^+)/[v_i^+]^2$
    endif
    if $u_i^-/v_i^- = M_i$ then $\delta w_i^+ = (\delta u_i^- v_i^- - u_i^- \delta v_i^-)/[v_i^-]^2$
    elseif $u_i^-/v_i^+ = M_i$ then $\delta w_i^+ = (\delta u_i^- v_i^+ - u_i^- \delta v_i^+)/[v_i^+]^2$
    elseif $u_i^+/v_i^- = M_i$ then $\delta w_i^+ = (\delta u_i^+ v_i^- - u_i^+ \delta v_i^-)/[v_i^-]^2$
    elseif $u_i^+/v_i^+ = M_i$ then $\delta w_i^+ = (\delta u_i^+ v_i^+ - u_i^+ \delta v_i^+)/[v_i^+]^2$
    endif
  end

^1 In multiplication and division with symmetric fuzzy numbers, the min and the max values of products and ratios can be attained more than once, as for $[-3, 3] * [-2, 2]$, where min $= (-3)(2) = (3)(-2)$ and max $= (3)(2) = (-3)(-2)$; in these cases, the slopes are to be calculated carefully by avoiding improper use of branches. We suggest keeping the correct branches by working with $[-3 - \varepsilon, 3 + \varepsilon] * [-2 - \varepsilon, 2 + \varepsilon]$, where $\varepsilon$ is a very small positive number (e.g., $\varepsilon \approx 10^{-6}$). Similarly for cases like $[a, a] * [b, b]$.

If the fuzzy numbers are given in the LR form, then the (LU)–(LR) fuzzy relationship (46) can be used as an intermediate step for LR fuzzy operations. Consider two LR fuzzy numbers $u$ and $v$ ($N = 1$ for simplicity)

$u_{LR} = (u_{0,L}, \delta u_{0,L}, u_{0,R}, \delta u_{0,R};\ u_{1,L}, \delta u_{1,L}, u_{1,R}, \delta u_{1,R})$,
$v_{LR} = (v_{0,L}, \delta v_{0,L}, v_{0,R}, \delta v_{0,R};\ v_{1,L}, \delta v_{1,L}, v_{1,R}, \delta v_{1,R})$,   (51)

having the LU representations

$u_{LU} = (u_0^-, \delta u_0^-, u_0^+, \delta u_0^+;\ u_1^-, \delta u_1^-, u_1^+, \delta u_1^+)$,
$v_{LU} = (v_0^-, \delta v_0^-, v_0^+, \delta v_0^+;\ v_1^-, \delta v_1^-, v_1^+, \delta v_1^+)$,   (52)

with $u_i^\pm$, $v_i^\pm$, $\delta u_i^\pm$, and $\delta v_i^\pm$ ($i = 0, 1$) calculated according to (46). Note that in the formulas below $u$ and $v$ are not constrained to have the same $L(\cdot)$ and $R(\cdot)$ shape functions, and changing the slopes will change the form of the membership functions.
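Addition is immediate:

$(u + v)_{LR} = \left( u_{0,L} + v_{0,L},\ \frac{\delta u_{0,L}\,\delta v_{0,L}}{\delta u_{0,L} + \delta v_{0,L}},\ u_{0,R} + v_{0,R},\ \frac{\delta u_{0,R}\,\delta v_{0,R}}{\delta u_{0,R} + \delta v_{0,R}};\ u_{1,L} + v_{1,L},\ \frac{\delta u_{1,L}\,\delta v_{1,L}}{\delta u_{1,L} + \delta v_{1,L}},\ u_{1,R} + v_{1,R},\ \frac{\delta u_{1,R}\,\delta v_{1,R}}{\delta u_{1,R} + \delta v_{1,R}} \right)$.   (53)

The sum of $u$ given by (43) and $v$ given by (44) has the following LR representation (at $\alpha_i = 1$ use $\frac{0}{0} = 0$):

LR form of the addition of two LR fuzzy numbers

  $\alpha_i$   $(u+v)_{i,L}$   $\delta(u+v)_{i,L}$   $(u+v)_{i,R}$   $\delta(u+v)_{i,R}$
  0.0          −7.0            0.1098                7.0             −0.1098
  0.25         −5.3095         0.1816                5.3095          −0.1816
  0.5          −4.0280         0.2011                4.0280          −0.2011
  0.75         −2.7130         0.1706                2.7130          −0.1706
  1.0          0.0             0.0                   0.0             0.0

and, with respect to the exact addition, the absolute average error is 0.3%.

The multiplication $w = uv$ of two positive LR fuzzy numbers is given, in LU form, by

$w_{LU} = (u_0^- v_0^-,\ \delta u_0^- v_0^- + u_0^- \delta v_0^-,\ u_0^+ v_0^+,\ \delta u_0^+ v_0^+ + u_0^+ \delta v_0^+;\ u_1^- v_1^-,\ \delta u_1^- v_1^- + u_1^- \delta v_1^-,\ u_1^+ v_1^+,\ \delta u_1^+ v_1^+ + u_1^+ \delta v_1^+)$   (54)

and back to the LR form of $w$, we obtain

$w_{LR} = \left( u_{0,L} v_{0,L},\ \frac{\delta u_{0,L}\,\delta v_{0,L}}{v_{0,L}\,\delta v_{0,L} + u_{0,L}\,\delta u_{0,L}},\ u_{0,R} v_{0,R},\ \frac{\delta u_{0,R}\,\delta v_{0,R}}{v_{0,R}\,\delta v_{0,R} + u_{0,R}\,\delta u_{0,R}};\ u_{1,L} v_{1,L},\ \frac{\delta u_{1,L}\,\delta v_{1,L}}{v_{1,L}\,\delta v_{1,L} + u_{1,L}\,\delta u_{1,L}},\ u_{1,R} v_{1,R},\ \frac{\delta u_{1,R}\,\delta v_{1,R}}{v_{1,R}\,\delta v_{1,R} + u_{1,R}\,\delta u_{1,R}} \right)$.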
The corresponding algorithm is immediate.

Algorithm 5 (LR multiplication). Let $u = (u_{i,L}, \delta u_{i,L}, u_{i,R}, \delta u_{i,R})_{i=0,1,\ldots,N}$ and $v = (v_{i,L}, \delta v_{i,L}, v_{i,R}, \delta v_{i,R})_{i=0,1,\ldots,N}$ be given LR fuzzy numbers in parametric form; calculate $w = uv$ with $w = (w_{i,L}, \delta w_{i,L}, w_{i,R}, \delta w_{i,R})_{i=0,1,\ldots,N}$. (If necessary, set $\frac{0}{0} = 0$.)

  For $i = 0, 1, \ldots, N$
    $m_i = \min\{u_{i,L} v_{i,L}, u_{i,L} v_{i,R}, u_{i,R} v_{i,L}, u_{i,R} v_{i,R}\}$
    $M_i = \max\{u_{i,L} v_{i,L}, u_{i,L} v_{i,R}, u_{i,R} v_{i,L}, u_{i,R} v_{i,R}\}$
    $w_{i,L} = m_i$, $w_{i,R} = M_i$
    if $u_{i,L} v_{i,L} = m_i$ then $\delta w_{i,L} = \delta u_{i,L} \delta v_{i,L} / [v_{i,L} \delta v_{i,L} + u_{i,L} \delta u_{i,L}]$
    elseif $u_{i,L} v_{i,R} = m_i$ then $\delta w_{i,L} = \delta u_{i,L} \delta v_{i,R} / [v_{i,R} \delta v_{i,R} + u_{i,L} \delta u_{i,L}]$
    elseif $u_{i,R} v_{i,L} = m_i$ then $\delta w_{i,L} = \delta u_{i,R} \delta v_{i,L} / [v_{i,L} \delta v_{i,L} + u_{i,R} \delta u_{i,R}]$
    elseif $u_{i,R} v_{i,R} = m_i$ then $\delta w_{i,L} = \delta u_{i,R} \delta v_{i,R} / [v_{i,R} \delta v_{i,R} + u_{i,R} \delta u_{i,R}]$
    endif
    if $u_{i,L} v_{i,L} = M_i$ then $\delta w_{i,R} = \delta u_{i,L} \delta v_{i,L} / [v_{i,L} \delta v_{i,L} + u_{i,L} \delta u_{i,L}]$
    elseif $u_{i,L} v_{i,R} = M_i$ then $\delta w_{i,R} = \delta u_{i,L} \delta v_{i,R} / [v_{i,R} \delta v_{i,R} + u_{i,L} \delta u_{i,L}]$
    elseif $u_{i,R} v_{i,L} = M_i$ then $\delta w_{i,R} = \delta u_{i,R} \delta v_{i,L} / [v_{i,L} \delta v_{i,L} + u_{i,R} \delta u_{i,R}]$
    elseif $u_{i,R} v_{i,R} = M_i$ then $\delta w_{i,R} = \delta u_{i,R} \delta v_{i,R} / [v_{i,R} \delta v_{i,R} + u_{i,R} \delta u_{i,R}]$
    endif
  end

As pointed out by the experimentation reported in [20] and [21], the operations above are exact at the nodes $\alpha_i$ and have very small global errors on $[0, 1]$. Further, it is easy to control the error by using a sufficiently fine $\alpha$-decomposition, and the results have shown that both the rational (30) and the mixed (31) models perform well.

Some parametric membership functions in the LR framework are present in many applications, and the use of nonlinear shapes is increasing. Usually, one defines a given family, e.g., linear, quadratic (see the extended study on piecewise parabolic functions in [58]), sigmoid, or quasi-Gaussian, and the operations are performed within the same family. Our proposed parametrization (linking directly the LR and LU representations) allows an extended set of flexible fuzzy numbers and is able to approximate all other forms with acceptably small errors, with the additional advantage of producing good approximations to the results of the arithmetic operations even between LU or LR fuzzy numbers having very different original shapes.
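The node-wise LU operations of Algorithms 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [20] or [21]: a fuzzy number is stored as a list of tuples $(u_i^-, \delta u_i^-, u_i^+, \delta u_i^+)$, the function names are hypothetical, and the branch of the product is selected on $\varepsilon$-widened intervals as suggested in footnote 1 (this settles the degenerate case $[a, a] * [b, b]$; any remaining ties fall to the first candidate).

```python
def lu_add(u, v):
    """Algorithm 1, addition: values and slopes add node by node."""
    return [(um + vm, dum + dvm, up + vp, dup + dvp)
            for (um, dum, up, dup), (vm, dvm, vp, dvp) in zip(u, v)]


def lu_sub(u, v):
    """Algorithm 1, subtraction: lower branch of u pairs with upper branch of v."""
    return [(um - vp, dum - dvp, up - vm, dup - dvm)
            for (um, dum, up, dup), (vm, dvm, vp, dvp) in zip(u, v)]


def lu_mul(u, v, eps=1e-6):
    """Algorithm 3: min/max of the four endpoint products with product-rule slopes."""
    w = []
    for (um, dum, up, dup), (vm, dvm, vp, dvp) in zip(u, v):
        candidates = []
        for x, dx, shift_x in ((um, dum, -eps), (up, dup, eps)):
            for y, dy, shift_y in ((vm, dvm, -eps), (vp, dvp, eps)):
                key = (x + shift_x) * (y + shift_y)   # widened product, used only to pick the branch
                candidates.append((key, x * y, dx * y + x * dy))
        low = min(candidates, key=lambda c: c[0])
        high = max(candidates, key=lambda c: c[0])
        w.append((low[1], low[2], high[1], high[2]))
    return w


# Triangular numbers (1, 2, 3) and (2, 3, 4) at the nodes alpha = 0 and alpha = 1 (N = 1).
u = [(1.0, 1.0, 3.0, -1.0), (2.0, 1.0, 2.0, -1.0)]
v = [(2.0, 1.0, 4.0, -1.0), (3.0, 1.0, 3.0, -1.0)]
print(lu_add(u, v))
print(lu_sub(u, v))
print(lu_mul(u, v))
```

For these inputs the exact product has lower branch $(1 + \alpha)(2 + \alpha)$ and upper branch $(3 - \alpha)(4 - \alpha)$, so the computed slopes can be checked against the derivatives $3 + 2\alpha$ and $-7 + 2\alpha$ at $\alpha = 0$ and $\alpha = 1$.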
12.6 Computation of Fuzzy-Valued Functions

Let $v = f(u_1, u_2, \ldots, u_n)$ denote the fuzzy extension of a continuous function $f$ in $n$ variables; it is well known that the fuzzy extension of $f$ to normal upper-semicontinuous fuzzy intervals (with compact support) has the level-cutting commutative property (see [13]); i.e., the $\alpha$-cuts $[v_\alpha^-, v_\alpha^+]$ of $v$ are the images of the $\alpha$-cuts of $(u_1, u_2, \ldots, u_n)$ and are obtained by solving the box-constrained optimization problems

$(EP)_\alpha$:  $v_\alpha^- = \min\{ f(x_1, x_2, \ldots, x_n) \mid x_k \in [u_{k,\alpha}^-, u_{k,\alpha}^+],\ k = 1, 2, \ldots, n \}$
            $v_\alpha^+ = \max\{ f(x_1, x_2, \ldots, x_n) \mid x_k \in [u_{k,\alpha}^-, u_{k,\alpha}^+],\ k = 1, 2, \ldots, n \}$.   (55)

Except for simple elementary cases for which the optimization problems above can be solved analytically, the direct application of (EP) is difficult and computationally expensive. Basically, the vertex method evaluates the objective function at the $2^n$ vertices of the hyper-rectangular box

$U_\alpha = [u_{1,\alpha}^-, u_{1,\alpha}^+] \times [u_{2,\alpha}^-, u_{2,\alpha}^+] \times \cdots \times [u_{n,\alpha}^-, u_{n,\alpha}^+]$

and modifications have been proposed to take into account possible internal or boundary optima (see [47, 59]) or to extend both a function and its inverse (see [60]). The transformation method, in its general or reduced or extended versions (see [51] for a recent efficient implementation), evaluates the objective function at a sufficiently large number of points in a hierarchical selection of $\alpha$-cuts $U_{i/m}$, with $\alpha_i = i/m$ for $i = m, m - 1, \ldots, 1, 0$ (including the vertices, midpoints of vertices, midpoints of midpoints, ...) and estimates the $m + 1$ $\alpha$-cuts of the solution $[v]_{i/m} = [v_i^-, v_i^+]$ by choosing recursively the best (min and max) values for each $i = m, m - 1, \ldots, 0$. Recently, a sparse grids method for interval arithmetic optimization has been proposed (see [52]) to further improve the computational efficiency for general functions; the method starts with a hierarchical selection of $\alpha$-cuts $U_{i/m}$ and constructs a linear (or polynomial) interpolation of the objective function $f(\cdot)$ over a grid of points (internal and on the boundary) which is sufficiently sparse (a strong selection of the possible points) and has optimal 'covering' properties. The optimizations are then performed by finding the min and the max of the interpolant $Af(\cdot)$ (in general simpler than the original function) by using adapted (global) search procedures.
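The basic vertex method described above can be sketched in a few lines. This is an illustrative sketch only (the name `vertex_method` and the test functions are assumptions); it returns the exact $\alpha$-cut only when the extrema of $f$ over the box are attained at vertices, which is why the modifications cited above are needed for functions with internal optima.

```python
from itertools import product


def vertex_method(f, box):
    """Evaluate f at the 2**n vertices of box = [(lo_1, hi_1), ..., (lo_n, hi_n)]."""
    values = [f(*vertex) for vertex in product(*box)]
    return min(values), max(values)


# f is monotonic in each variable separately, so the vertices are sufficient here.
cut_exact = vertex_method(lambda x1, x2: x1 * x2 - x2, [(1.0, 2.0), (-1.0, 3.0)])

# f(x) = x**2 has an internal minimum at 0, which the plain vertex method misses.
cut_missed = vertex_method(lambda x: x * x, [(-1.0, 2.0)])

print(cut_exact, cut_missed)
```

In the second call the true range of $x^2$ over $[-1, 2]$ is $[0, 4]$, while the vertices give $[1, 4]$; this is the underestimation of fuzziness that the text warns about.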
In the following subsections, we give the details of the fuzzy extension of general (piecewise) differentiable functions by the LU representation. In all the computations we adopt the EP method, but the representation remains valid if other approaches are adopted.
12.6.1 Univariate Functions

We consider first a single-variable differentiable function $f: \mathbb{R} \to \mathbb{R}$; its (EP)-extension $v = f(u)$ to a fuzzy argument $u = (u^-, u^+)$ has $\alpha$-cuts

$[v]_\alpha = [\min\{ f(x) \mid x \in [u]_\alpha \},\ \max\{ f(x) \mid x \in [u]_\alpha \}]$.   (56)

If $f$ is monotonic increasing, we obtain $[v]_\alpha = [f(u_\alpha^-), f(u_\alpha^+)]$, while if $f$ is monotonic decreasing, $[v]_\alpha = [f(u_\alpha^+), f(u_\alpha^-)]$; the LU representation $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$ is obtained by the following.

Algorithm 6 (1-dim monotonic extension). Let $u = (u_i^-, \delta u_i^-, u_i^+, \delta u_i^+)_{i=0,1,\ldots,N}$ be given and $f: \mathrm{supp}(u) \to \mathbb{R}$ be differentiable monotonic; calculate $v = f(u)$ with $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$.

  For $i = 0, 1, \ldots, N$
    if ($f$ is increasing) then
      $v_i^- = f(u_i^-)$, $\delta v_i^- = f'(u_i^-)\,\delta u_i^-$
      $v_i^+ = f(u_i^+)$, $\delta v_i^+ = f'(u_i^+)\,\delta u_i^+$
    else
      $v_i^- = f(u_i^+)$, $\delta v_i^- = f'(u_i^+)\,\delta u_i^+$
      $v_i^+ = f(u_i^-)$, $\delta v_i^+ = f'(u_i^-)\,\delta u_i^-$
    endif
  end

As an example, the monotonic exponential function $f(x) = \exp(x)$ has LU fuzzy extension

$\exp(u) = (\exp(u_i^-),\ \exp(u_i^-)\,\delta u_i^-,\ \exp(u_i^+),\ \exp(u_i^+)\,\delta u_i^+)_{i=0,1,\ldots,N}$.

In the nonmonotonic (differentiable) case, we have to solve the optimization problems in (56) for each $\alpha = \alpha_i$, $i = 0, 1, \ldots, N$; i.e.,

$(EP_i)$:  $v_i^- = \min\{ f(x) \mid x \in [u_i^-, u_i^+] \}$
         $v_i^+ = \max\{ f(x) \mid x \in [u_i^-, u_i^+] \}$.

The min (or the max) can occur either at a point which coincides with one of the extremal values of $[u_i^-, u_i^+]$ or at a point which is internal; in the latter case, the derivative of $f$ is null and $\delta v_i^- = 0$ (or $\delta v_i^+ = 0$).

Algorithm 7 (1-dim nonmonotonic extension). Let $u = (u_i^-, \delta u_i^-, u_i^+, \delta u_i^+)_{i=0,1,\ldots,N}$ be given and $f: \mathrm{supp}(u) \to \mathbb{R}$ be differentiable; calculate $v = f(u)$ with $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$.

  For $i = 0, 1, \ldots, N$
    solve $\min\{ f(x) \mid x \in [u_i^-, u_i^+] \}$, let $\hat{x}_i^- = \arg\min\{ f(x) \mid x \in [u_i^-, u_i^+] \}$
    if $\hat{x}_i^- = u_i^-$ then $v_i^- = f(u_i^-)$, $\delta v_i^- = f'(u_i^-)\,\delta u_i^-$
    elseif $\hat{x}_i^- = u_i^+$ then $v_i^- = f(u_i^+)$, $\delta v_i^- = f'(u_i^+)\,\delta u_i^+$
    else $v_i^- = f(\hat{x}_i^-)$, $\delta v_i^- = 0$
    endif
    solve $\max\{ f(x) \mid x \in [u_i^-, u_i^+] \}$, let $\hat{x}_i^+ = \arg\max\{ f(x) \mid x \in [u_i^-, u_i^+] \}$
    if $\hat{x}_i^+ = u_i^-$ then $v_i^+ = f(u_i^-)$, $\delta v_i^+ = f'(u_i^-)\,\delta u_i^-$
    elseif $\hat{x}_i^+ = u_i^+$ then $v_i^+ = f(u_i^+)$, $\delta v_i^+ = f'(u_i^+)\,\delta u_i^+$
    else $v_i^+ = f(\hat{x}_i^+)$, $\delta v_i^+ = 0$
    endif
  end

As an example of a unidimensional nonmonotonic function, consider the hyperbolic cosine function $y = \cosh(x) = \frac{e^x + e^{-x}}{2}$. Its fuzzy extension to $u$ can be obtained as follows.

Example 1. Calculation of fuzzy $v = \cosh(u)$.

  For $i = 0, 1, \ldots, N$
    if $u_i^+ \leq 0$ then
      $v_i^- = \cosh(u_i^+)$, $v_i^+ = \cosh(u_i^-)$
      $\delta v_i^- = \delta u_i^+ \sinh(u_i^+)$, $\delta v_i^+ = \delta u_i^- \sinh(u_i^-)$
    elseif $u_i^- \geq 0$ then
      $v_i^- = \cosh(u_i^-)$, $v_i^+ = \cosh(u_i^+)$
      $\delta v_i^- = \delta u_i^- \sinh(u_i^-)$, $\delta v_i^+ = \delta u_i^+ \sinh(u_i^+)$
    else
      $v_i^- = 1$, $\delta v_i^- = 0$
      if $|u_i^-| \geq |u_i^+|$
        then $v_i^+ = \cosh(u_i^-)$, $\delta v_i^+ = \delta u_i^- \sinh(u_i^-)$
        else $v_i^+ = \cosh(u_i^+)$, $\delta v_i^+ = \delta u_i^+ \sinh(u_i^+)$
      endif
    endif
  end

The fuzzy extension of elementary functions by the LU fuzzy representation is documented in [44], and an application to fuzzy dynamical systems is in [61].
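Example 1 translates almost line by line into code. The sketch below is illustrative (the function name `lu_cosh`, the tuple layout $(u_i^-, \delta u_i^-, u_i^+, \delta u_i^+)$, and the triangular test number are assumptions); it applies the three cases of the example at each node.

```python
import math


def lu_cosh(u):
    """LU extension of cosh at the nodes of u, following the three cases of Example 1."""
    v = []
    for um, dum, up, dup in u:
        if up <= 0.0:
            # cosh is decreasing on the whole cut: the branches swap
            v.append((math.cosh(up), dup * math.sinh(up),
                      math.cosh(um), dum * math.sinh(um)))
        elif um >= 0.0:
            # cosh is increasing on the whole cut
            v.append((math.cosh(um), dum * math.sinh(um),
                      math.cosh(up), dup * math.sinh(up)))
        else:
            # zero is internal: the min is cosh(0) = 1 with null slope,
            # the max is at the endpoint farther from zero
            far, dfar = (um, dum) if abs(um) >= abs(up) else (up, dup)
            v.append((1.0, 0.0, math.cosh(far), dfar * math.sinh(far)))
    return v


# Triangular number (-1, 0.5, 2) at alpha = 0 and alpha = 1 (N = 1).
u = [(-1.0, 1.5, 2.0, -1.5), (0.5, 1.5, 0.5, -1.5)]
v = lu_cosh(u)
print(v)
```

At both nodes the lower slope is nonnegative and the upper slope is nonpositive, as required of a valid LU representation.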
12.6.2 Multivariate Functions

Consider now the extension of a multivariate differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ to a vector of $n$ fuzzy numbers $u = (u_1, u_2, \ldots, u_n)$ with $k$th component

$u_k = (u_{k,i}^-, \delta u_{k,i}^-, u_{k,i}^+, \delta u_{k,i}^+)_{i=0,1,\ldots,N}$ for $k = 1, 2, \ldots, n$.

Let $v = f(u_1, u_2, \ldots, u_n)$ and $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$ be its LU representation; the $\alpha$-cuts of $v$ are obtained by solving the box-constrained optimization problems (EP). For each $\alpha = \alpha_i$, $i = 0, 1, \ldots, N$, the min and the max in (EP) can occur at a point whose components $\hat{x}_{k,i}$ either are internal to the corresponding intervals $[u_{k,i}^-, u_{k,i}^+]$ or coincide with one of the extremal values; denote by $\hat{x}_i^- = (\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-)$ and $\hat{x}_i^+ = (\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+)$ the points where the min and the max take place; then

$v_i^- = f(\hat{x}_{1,i}^-, \hat{x}_{2,i}^-, \ldots, \hat{x}_{n,i}^-)$ and $v_i^+ = f(\hat{x}_{1,i}^+, \hat{x}_{2,i}^+, \ldots, \hat{x}_{n,i}^+)$,

and the slopes $\delta v_i^-$ and $\delta v_i^+$ are computed (as $f$ is differentiable) by

$\delta v_i^- = \sum_{k=1,\ \hat{x}_{k,i}^- = u_{k,i}^-}^{n} \frac{\partial f(\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-)}{\partial x_k}\, \delta u_{k,i}^- + \sum_{k=1,\ \hat{x}_{k,i}^- = u_{k,i}^+}^{n} \frac{\partial f(\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-)}{\partial x_k}\, \delta u_{k,i}^+$

$\delta v_i^+ = \sum_{k=1,\ \hat{x}_{k,i}^+ = u_{k,i}^-}^{n} \frac{\partial f(\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+)}{\partial x_k}\, \delta u_{k,i}^- + \sum_{k=1,\ \hat{x}_{k,i}^+ = u_{k,i}^+}^{n} \frac{\partial f(\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+)}{\partial x_k}\, \delta u_{k,i}^+$.   (57)

If, for some reason, the partial derivatives of $f$ at the solution points are not available (and the points are not internal), we can always produce an estimate of the slopes $\delta v_i^-$ and $\delta v_i^+$; it is sufficient to calculate $v_{\Delta i}^-$ and $v_{\Delta i}^+$ corresponding to $\alpha = \alpha_i \pm \Delta\alpha$ ($\Delta\alpha$ small) and estimate the slopes by applying a least squares criterion like (32).

Algorithm 8 (n-dim extension). Let $u_k = (u_{k,i}^-, \delta u_{k,i}^-, u_{k,i}^+, \delta u_{k,i}^+)_{i=0,1,\ldots,N}$, $k = 1, 2, \ldots, n$, be given and $f: \mathbb{R}^n \to \mathbb{R}$ be differentiable; with $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$, calculate $v = f(u_1, \ldots, u_n)$.

  For $i = 0, 1, \ldots, N$
    solve $\min\{ f(x_1, \ldots, x_n) \mid x_k \in [u_{k,i}^-, u_{k,i}^+]\ \forall k \}$
    let $(\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-) = \arg\min\{ f(x_1, \ldots, x_n) \mid x_k \in [u_{k,i}^-, u_{k,i}^+]\ \forall k \}$
    let $v_i^- = f(\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-)$, $\delta v_i^- = 0$
    for $k = 1, 2, \ldots, n$ (loop to calculate $\delta v_i^-$)
      if $\hat{x}_{k,i}^- = u_{k,i}^-$ then $\delta v_i^- = \delta v_i^- + \frac{\partial f(\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-)}{\partial x_k}\, \delta u_{k,i}^-$
      elseif $\hat{x}_{k,i}^- = u_{k,i}^+$ then $\delta v_i^- = \delta v_i^- + \frac{\partial f(\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-)}{\partial x_k}\, \delta u_{k,i}^+$
    end
    solve $\max\{ f(x_1, \ldots, x_n) \mid x_k \in [u_{k,i}^-, u_{k,i}^+]\ \forall k \}$
    let $(\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+) = \arg\max\{ f(x_1, \ldots, x_n) \mid x_k \in [u_{k,i}^-, u_{k,i}^+]\ \forall k \}$
    let $v_i^+ = f(\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+)$, $\delta v_i^+ = 0$
    for $k = 1, 2, \ldots, n$ (loop to calculate $\delta v_i^+$)
      if $\hat{x}_{k,i}^+ = u_{k,i}^-$ then $\delta v_i^+ = \delta v_i^+ + \frac{\partial f(\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+)}{\partial x_k}\, \delta u_{k,i}^-$
      elseif $\hat{x}_{k,i}^+ = u_{k,i}^+$ then $\delta v_i^+ = \delta v_i^+ + \frac{\partial f(\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+)}{\partial x_k}\, \delta u_{k,i}^+$
    end
  end

The main and possibly critical steps in the above algorithm are the solution of the optimization problems (EP), depending on the dimension $n$ of the solution space and on the possibility of many local optimal points. (If the min and the max points are not located with sufficient precision, an underestimation of the fuzziness may be produced and the propagation of the errors may grow without control.) In many applications, a careful exploitation of the min and max problems can produce efficient solution methods. An example is offered in [62] and [63] for the fuzzy finite element analysis; the authors analyze the essential sources of uncertainty and define the correct form of the objective function (so avoiding unnecessary overestimation); then they propose a safe approximation of the objective function by a quadratic interpolation scheme and use a version of the corner method to determine the optimal solutions. As we have mentioned, all existing general methods (in cases where the structure of the min and max subproblems does not suggest specific efficient procedures) try to take advantage of the nested structure of the box constraints for different values of $\alpha$.
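The slope loops of Algorithm 8, i.e., formula (57), can be sketched as a single helper that receives an already computed optimal point and the gradient of $f$ there. The helper name and argument layout are assumptions made for illustration, and exact floating-point equality is used to detect active bounds only for brevity; a tolerance would normally be needed.

```python
def slope_from_active_bounds(grad, x_opt, lower, d_lower, upper, d_upper):
    """Formula (57): sum of (partial derivative) * (input slope) over components on a bound."""
    slope = 0.0
    for g, x, lo, dlo, hi, dhi in zip(grad, x_opt, lower, d_lower, upper, d_upper):
        if x == lo:
            slope += g * dlo      # component sits on its lower endpoint
        elif x == hi:
            slope += g * dhi      # component sits on its upper endpoint
        # internal components contribute nothing
    return slope


# f(x1, x2) = x1 * x2 on the alpha = 0 cuts [1, 3] and [2, 4],
# with input slopes +1 on the lower branches and -1 on the upper branches.
lower, d_lower = (1.0, 2.0), (1.0, 1.0)
upper, d_upper = (3.0, 4.0), (-1.0, -1.0)

x_min, x_max = (1.0, 2.0), (3.0, 4.0)            # argmin and argmax of f on the box
grad = lambda x: (x[1], x[0])                    # gradient of x1 * x2

dv_minus = slope_from_active_bounds(grad(x_min), x_min, lower, d_lower, upper, d_upper)
dv_plus = slope_from_active_bounds(grad(x_max), x_max, lower, d_lower, upper, d_upper)
print(dv_minus, dv_plus)
```

For the product the result coincides with the product-rule slopes of Algorithm 3, which is a convenient consistency check between the two algorithms.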
We suggest here a relatively simple procedure, based on the differential evolution (DE) method of Storn and Price (see [64–66]) and adapted to take into account both the nested property of $\alpha$-cuts and the min and max problems over the same domains. The general idea of DE to find the min or the max of $\{ f(x_1, \ldots, x_n) \mid (x_1, \ldots, x_n) \in \mathbb{A} \subset \mathbb{R}^n \}$ is simple. Start with an initial 'population' $(x_1, \ldots, x_n)^{(1)}, \ldots, (x_1, \ldots, x_n)^{(p)} \in \mathbb{A}$ of $p$ feasible points; at each iteration obtain a new set of points by randomly recombining the individuals of the current population and by selecting the best generated elements to continue in the next generation. A typical recombination operates on a single component $j \in \{1, \ldots, n\}$ and has the form (see [65, 67, 68])

$x'_j = x_j^{(r)} + \gamma \left[ x_j^{(s)} - x_j^{(t)} \right]$, $\gamma \in\, ]0, 1]$,

where $r, s, t \in \{1, 2, \ldots, p\}$ are chosen randomly. The components of each individual of the current population are modified to $x'_j$ with a given probability $q$. Typical values are $\gamma \in [0.2, 0.95]$ and $q \in [0.7, 1.0]$.

To take into account the particular nature of the problem, we modify the basic procedure: start with the ($\alpha = 1$)-cut and proceed back to the ($\alpha = 0$)-cut, so that the optimal solutions at a given level can be inserted into the 'starting' populations of lower levels; use two distinct populations and perform the recombinations such that, during generations, one of the populations specializes to find the minimum and the other to find the maximum.

Algorithm 9 (DE procedure). Let $[u_{k,i}^-, u_{k,i}^+]$, $k = 1, 2, \ldots, n$, and $f: \mathbb{R}^n \to \mathbb{R}$ be given; find, for $i = 0, 1, \ldots, N$,

$v_i^- = \min\{ f(x_1, \ldots, x_n) \mid x_k \in [u_{k,i}^-, u_{k,i}^+]\ \forall k \}$ and
$v_i^+ = \max\{ f(x_1, \ldots, x_n) \mid x_k \in [u_{k,i}^-, u_{k,i}^+]\ \forall k \}$.

Choose $p \approx 10n$, $g_{max} \approx 200$, $q$, and $\gamma$. Function rand(0,1) generates a random uniform number between 0 and 1.

  Select $(x_1^{(l)}, \ldots, x_n^{(l)})$, $x_k^{(l)} \in [u_{k,N}^-, u_{k,N}^+]\ \forall k$, $l = 1, \ldots, 2p$
  let $y^{(l)} = f(x_1^{(l)}, \ldots, x_n^{(l)})$
  for $i = N, N - 1, \ldots, 0$
    for $g = 1, 2, \ldots, g_{max}$ (up to $g_{max}$ generations or other stopping rule)
      for $l = 1, 2, \ldots, 2p$
        select (randomly) $r, s, t \in \{1, 2, \ldots, 2p\}$ and $j^* \in \{1, 2, \ldots, n\}$
        for $j = 1, 2, \ldots, n$
          if ($j = j^*$ or rand(0,1) $< q$)
            then $x'_j = x_j^{(r)} + \gamma [x_j^{(s)} - x_j^{(t)}]$
            else $x'_j = x_j^{(l)}$
          endif
          if ($x'_j < u_{j,i}^-$) then $x'_j = u_{j,i}^-$ (lower feasibility)
          if ($x'_j > u_{j,i}^+$) then $x'_j = u_{j,i}^+$ (upper feasibility)
        end
        let $y' = f(x'_1, \ldots, x'_n)$
        if $l \leq p$ and $y' < y^{(l)}$ then substitute $(x_1, \ldots, x_n)^{(l)}$ with $(x'_1, \ldots, x'_n)$ and $y^{(l)}$ with $y'$ (best min) endif
        if $l > p$ and $y' > y^{(l)}$ then substitute $(x_1, \ldots, x_n)^{(l)}$ with $(x'_1, \ldots, x'_n)$ and $y^{(l)}$ with $y'$ (best max) endif
      end
    end
    $v_i^- = y^{(l^*)} = \min\{ y^{(l)} \mid l = 1, 2, \ldots, p \}$, $(\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-) = (x_1, \ldots, x_n)^{(l^*)}$
    $v_i^+ = y^{(l^{**})} = \max\{ y^{(p+l)} \mid l = 1, 2, \ldots, p \}$, $(\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+) = (x_1, \ldots, x_n)^{(l^{**})}$
    if $i > 0$
      select $(x_1^{(l)}, \ldots, x_n^{(l)})$, $x_k^{(l)} \in [u_{k,i-1}^-, u_{k,i-1}^+]\ \forall k$, $l = 1, \ldots, 2p$, including $(\hat{x}_{1,i}^-, \ldots, \hat{x}_{n,i}^-)$ and $(\hat{x}_{1,i}^+, \ldots, \hat{x}_{n,i}^+)$; possibly reduce $g_{max}$.
    endif
  end

Extended experiments with the DE procedure (and some variants) are documented in [67], where two algorithms, SPDE (single population) and MPDE (multiple populations), have been implemented and executed on a set of 35 test functions with different dimensions $n = 2, 4, 8, 16, 32$. If the extension algorithm is used in combination with the LU fuzzy representation for differentiable membership functions (and differentiable extended functions), then the number $N + 1$ of $\alpha$-cuts (and correspondingly of min/max optimizations) can be sufficiently small. Experiments in [20] and [21] indicate that $N = 10$ is in general quite sufficient to obtain good approximations. The numbers of function evaluations $FE_{SPDE}$ and $FE_{MPDE}$ needed by the two algorithms SPDE and MPDE to reach the solution of the nested min/max optimization problems corresponding to the 11 $\alpha$-cuts of the uniform $\alpha$-decomposition $\alpha_i = \frac{i}{10}$, $i = 0, 1, \ldots, 10$ ($N = 10$ subintervals), are reported in Figure 12.6. The graph (Figure 12.6) represents the logarithm of the number of function evaluations FE vs. the logarithm of the number $n$ of arguments: $\ln(FE_{SPDE}) = a + b \ln(n)$ and $\ln(FE_{MPDE}) = c + d \ln(n)$. The estimated coefficients are $a = 8.615$, $b = 1.20$ and $c = 7.869$, $d = 1.34$. The computational complexity of the DE algorithm (on average for the 35 test problems) grows less than quadratically ($FE_{SPDE} \approx 5513.8\, n^{1.2}$ and $FE_{MPDE} \approx 2614.9\, n^{1.34}$) with the dimension $n$. (SPDE is less efficient but grows more slowly than MPDE.) This is an interesting result, as all the existing methods for the fuzzy extension of functions are essentially exponential in $n$.
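The structure of Algorithm 9 can be sketched compactly. The following is an illustrative reading of the procedure, not the SPDE or MPDE code of [67]: the function name, the default parameter values, the fixed random seed, and the test function are all assumptions. The first $p$ individuals specialize toward the minimum and the last $p$ toward the maximum, and the optima found at level $i$ are inserted into the starting population of level $i - 1$.

```python
import random


def de_fuzzy_extension(f, cuts, p=20, gmax=100, q=0.8, gamma=0.6, seed=0):
    """cuts[i] lists (lo, hi) per variable at level alpha_i; the cuts must be nested.

    Returns a list of (v_minus, v_plus) indexed by i.
    """
    rng = random.Random(seed)
    n, N = len(cuts[0]), len(cuts) - 1

    def sample(box):
        return [rng.uniform(lo, hi) for lo, hi in box]

    pop = [sample(cuts[N]) for _ in range(2 * p)]
    result = [None] * (N + 1)
    for i in range(N, -1, -1):                      # from the (alpha = 1)-cut back to alpha = 0
        box = cuts[i]
        y = [f(x) for x in pop]
        for _ in range(gmax):
            for l in range(2 * p):
                r, s, t = (rng.randrange(2 * p) for _ in range(3))
                j_star = rng.randrange(n)
                trial = []
                for j in range(n):
                    if j == j_star or rng.random() < q:
                        xj = pop[r][j] + gamma * (pop[s][j] - pop[t][j])
                    else:
                        xj = pop[l][j]
                    lo, hi = box[j]
                    trial.append(min(max(xj, lo), hi))   # lower and upper feasibility
                y_trial = f(trial)
                # first half keeps improvements of the min, second half of the max
                improved = y_trial < y[l] if l < p else y_trial > y[l]
                if improved:
                    pop[l], y[l] = trial, y_trial
        l_min = min(range(p), key=lambda l: y[l])
        l_max = max(range(p, 2 * p), key=lambda l: y[l])
        result[i] = (y[l_min], y[l_max])
        if i > 0:
            best_min, best_max = pop[l_min], pop[l_max]
            pop = [sample(cuts[i - 1]) for _ in range(2 * p)]
            pop[0], pop[p] = best_min, best_max     # seed the next (wider) level
    return result


# f(x1, x2) = x1**2 + x2 on nested cuts at alpha = 0, 0.5, 1.
cuts = [[(-2.0, 1.0), (0.0, 2.0)],
        [(-0.75, 0.75), (0.5, 1.5)],
        [(0.5, 0.5), (1.0, 1.0)]]
res = de_fuzzy_extension(lambda x: x[0] ** 2 + x[1], cuts)
print(res)
```

Because the best points of each level are carried into the next one and are replaced only by improvements, the computed cuts are nested by construction, even when the optimizer has not fully converged; the accuracy of the individual endpoints still depends on the chosen $p$, $g_{max}$, $q$, and $\gamma$.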
[Plot for Figure 12.6: ln(FE) on the vertical axis against ln(n) on the horizontal axis, with one series each for SPDE and MPDE.]
Figure 12.6 Number of function evaluations FE vs. number of variables n for two versions of DE algorithm for fuzzy extension of functions (n = 2, 4, 8, 16, 32)
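The power-law form of the two fitted lines follows by exponentiating the log-log regressions, $FE = e^{a} n^{b}$. The short check below uses only the coefficients reported in the text; the crossover dimension it computes is an extrapolation of the two fitted lines beyond the tested range $n \leq 32$ and is shown purely as arithmetic, not as an experimental result.

```python
import math

a, b = 8.615, 1.20    # SPDE: ln(FE) = a + b ln(n)
c, d = 7.869, 1.34    # MPDE: ln(FE) = c + d ln(n)

spde_constant = math.exp(a)    # multiplicative constant of FE_SPDE
mpde_constant = math.exp(c)    # multiplicative constant of FE_MPDE


def fe_spde(n):
    return spde_constant * n ** b


def fe_mpde(n):
    return mpde_constant * n ** d


# Dimension at which the two fitted lines intersect: a + b ln(n) = c + d ln(n).
n_cross = math.exp((a - c) / (d - b))

print(round(spde_constant, 1), round(mpde_constant, 1), round(n_cross))
```

Within the tested dimensions the fitted MPDE line lies below the SPDE line, consistent with the remark that SPDE is less efficient but grows more slowly.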
12.7 Integration and Differentiation of Fuzzy-Valued Functions

Integrals and derivatives of fuzzy-valued functions have been established, among others, by Dubois and Prade [6], Kaleva [69, 70], and Puri and Ralescu [71]; see also [43] and [72] for recent results. We consider here a fuzzy-valued function $u: [a, b] \to \mathcal{F}_I$, where $u(t) = (u^-(t), u^+(t))$ for $t \in [a, b]$ is an LU fuzzy number of the form

$u(t) = (u_i^-(t), \delta u_i^-(t), u_i^+(t), \delta u_i^+(t))_{i=0,1,\ldots,N}$.

The integral of $u(t)$ with respect to $t \in [a, b]$ is given by

$[v]_\alpha := \left[ \int_a^b u(t)\,dt \right]_\alpha = \left[ \int_a^b u_\alpha^-(t)\,dt,\ \int_a^b u_\alpha^+(t)\,dt \right]$, $\alpha \in [0, 1]$   (58)

and its LU representation $v = (v_i^-, \delta v_i^-, v_i^+, \delta v_i^+)_{i=0,1,\ldots,N}$ is simply

$v_i^\pm = \int_a^b u_i^\pm(t)\,dt$ and $\delta v_i^\pm = \int_a^b \delta u_i^\pm(t)\,dt$, $i = 0, 1, \ldots, N$.   (59)

The H-derivative [71] (and the generalized derivative [43]) of $u(t)$ at a point $t_0$ is obtained by considering the intervals defined by the derivatives of the lower and upper branches of the $\alpha$-cuts

$[u'(t_0)]_\alpha = \left[ \frac{d}{dt} u_\alpha^-(t) \Big|_{t=t_0},\ \frac{d}{dt} u_\alpha^+(t) \Big|_{t=t_0} \right]$   (60)

or

$[u'(t_0)]_\alpha = \left[ \frac{d}{dt} u_\alpha^+(t) \Big|_{t=t_0},\ \frac{d}{dt} u_\alpha^-(t) \Big|_{t=t_0} \right]$,   (61)

provided that the intervals define a correct fuzzy number for each $t_0$. Using the LU fuzzy representation, we obtain, in the first case (the prime $'$ means derivative w.r.t. $t$)

$u'(t) = ((u_i^-)'(t), (\delta u_i^-)'(t), (u_i^+)'(t), (\delta u_i^+)'(t))_{i=0,1,\ldots,N}$   (62)

and the conditions for a valid fuzzy derivative are, for $i = 0, 1, \ldots, N$,

$(u_0^-)'(t) \leq (u_1^-)'(t) \leq \ldots \leq (u_N^-)'(t) \leq (u_N^+)'(t) \leq (u_{N-1}^+)'(t) \leq \ldots \leq (u_0^+)'(t)$
$(\delta u_i^-)'(t) \geq 0$, $(\delta u_i^+)'(t) \leq 0$.

As an example, consider the fuzzy-valued function $[u(t)]_\alpha = [u_\alpha^-(t), u_\alpha^+(t)]$, $t \in [0, 2\pi]$, with

$u_\alpha^-(t) = \frac{t^2}{40} + \frac{(3\alpha^2 - 2\alpha^3) \sin^2(t)}{20}$
$u_\alpha^+(t) = \frac{t^2}{40} + \frac{(2 - 3\alpha^2 + 2\alpha^3) \sin^2(t)}{20}$.   (63)

At $t \in \{0, \pi, 2\pi\}$ the function $u(t)$ has a crisp value.
The generalized fuzzy derivative exists at all the points of $]0, 2\pi[$ and is obtained by (60) for $t \in\, ]0, \frac{1}{2}\pi] \cup [\pi, \frac{3}{2}\pi]$, by (61) for $t \in [\frac{1}{2}\pi, \pi] \cup [\frac{3}{2}\pi, 2\pi[$, and by both for $t \in \{\frac{1}{2}\pi, \pi, \frac{3}{2}\pi\}$. Note that at the points $t \in \{\frac{1}{2}\pi, \pi, \frac{3}{2}\pi\}$ the derivatives $\frac{d}{dt} u_\alpha^-(t)$ and $\frac{d}{dt} u_\alpha^+(t)$ change their relative position in defining the lower and the upper branches of the generalized fuzzy derivative, given for $t \in\, ]0, \frac{1}{2}\pi] \cup [\pi, \frac{3}{2}\pi]$ by

$[u'(t)]_\alpha^- = \left( \frac{t}{2} + (3\alpha^2 - 2\alpha^3) \sin(t) \cos(t) \right) / 10$
$[u'(t)]_\alpha^+ = \left( \frac{t}{2} + (2 - 3\alpha^2 + 2\alpha^3) \sin(t) \cos(t) \right) / 10$,

and for $t \in [\frac{1}{2}\pi, \pi] \cup [\frac{3}{2}\pi, 2\pi[$ by

$[u'(t)]_\alpha^- = \left( \frac{t}{2} + (2 - 3\alpha^2 + 2\alpha^3) \sin(t) \cos(t) \right) / 10$
$[u'(t)]_\alpha^+ = \left( \frac{t}{2} + (3\alpha^2 - 2\alpha^3) \sin(t) \cos(t) \right) / 10$.

Observe that the two cases (60) and (61) can be formulated in a compact form by applying the generalized H-difference to the incremental ratio, so defining $m_\alpha(t) = \min\{ (u_\alpha^-)'(t), (u_\alpha^+)'(t) \}$ and $M_\alpha(t) = \max\{ (u_\alpha^-)'(t), (u_\alpha^+)'(t) \}$ and $[u'(t)]_\alpha = [m_\alpha(t), M_\alpha(t)]$, $\alpha \in [0, 1]$, provided that we have a fuzzy (or a crisp) number; i.e.,

$[u'(t)]_\alpha = \lim_{\Delta t \to 0} \frac{[u(t + \Delta t)]_\alpha \ominus_g [u(t)]_\alpha}{\Delta t}$,   (64)

provided that $[u(t + \Delta t)]_\alpha \ominus_g [u(t)]_\alpha$ and the limit exist.

To obtain the LU fuzzy parametrization of the generalized fuzzy derivative, we need to choose a decomposition of the membership interval $[0, 1]$ into $N$ subintervals and define the values and the slopes as in (62). For simplicity of notation, we consider the trivial decomposition with only two points $0 = \alpha_0 < \alpha_1 = 1$ (i.e., $N = 1$) so that the parametrization of $u(t)$ is, for a given $t$,

$u(t) = (u_0^-(t), \delta u_0^-(t), u_0^+(t), \delta u_0^+(t);\ u_1^-(t), \delta u_1^-(t), u_1^+(t), \delta u_1^+(t))$
$\quad = \left( \frac{t^2}{40},\ 0,\ \frac{t^2}{40} + \frac{1}{10} \sin^2(t),\ 0;\ \frac{t^2}{40} + \frac{1}{20} \sin^2(t),\ 0,\ \frac{t^2}{40} + \frac{1}{20} \sin^2(t),\ 0 \right)$.

The slopes $\delta u_i^-(t)$ and $\delta u_i^+(t)$ are the derivatives of $u_\alpha^-(t)$ and $u_\alpha^+(t)$ with respect to $\alpha$ at $\alpha = 0$ and $\alpha = 1$, and they are null for any $t$. By applying the cases (60) and (61), the values of the generalized fuzzy derivative of $u$ at point $t$ (the derivatives $(\cdot)'$ are intended with respect to $t$) are given by the correct combinations of $(u_0^-)'(t) = \frac{t}{20}$, $(u_1^-)'(t) = \frac{t}{20} + \frac{1}{10} \sin(t) \cos(t)$, $(u_1^+)'(t) = \frac{t}{20} + \frac{1}{10} \sin(t) \cos(t)$, and $(u_0^+)'(t) = \frac{t}{20} + \frac{1}{5} \sin(t) \cos(t)$, i.e., depending on the sign of $\sin(t) \cos(t)$; it has the following LU fuzzy parametrization:

Generalized derivative in case (60): $t \in\, ]0, \frac{1}{2}\pi] \cup [\pi, \frac{3}{2}\pi]$

$u'(t) = \left( \frac{t}{20},\ 0,\ \frac{t}{20} + \frac{1}{5} \sin(t) \cos(t),\ 0;\ \frac{t}{20} + \frac{1}{10} \sin(t) \cos(t),\ 0,\ \frac{t}{20} + \frac{1}{10} \sin(t) \cos(t),\ 0 \right)$.

Generalized derivative in case (61): $t \in [\frac{1}{2}\pi, \pi] \cup [\frac{3}{2}\pi, 2\pi[$

$u'(t) = \left( \frac{t}{20} + \frac{1}{5} \sin(t) \cos(t),\ 0,\ \frac{t}{20},\ 0;\ \frac{t}{20} + \frac{1}{10} \sin(t) \cos(t),\ 0,\ \frac{t}{20} + \frac{1}{10} \sin(t) \cos(t),\ 0 \right)$.
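The switch between cases (60) and (61) in this example can be checked numerically with the compact min/max formulation. The sketch below is illustrative (the function names are assumptions); it differentiates the two branches of (63) with respect to $t$ in closed form and orders them at each $t$.

```python
import math


def d_lower(t, alpha):
    """d/dt of the lower branch of (63)."""
    return (t / 2 + (3 * alpha**2 - 2 * alpha**3) * math.sin(t) * math.cos(t)) / 10


def d_upper(t, alpha):
    """d/dt of the upper branch of (63)."""
    return (t / 2 + (2 - 3 * alpha**2 + 2 * alpha**3) * math.sin(t) * math.cos(t)) / 10


def gen_derivative_cut(t, alpha):
    """Alpha-cut [m_alpha(t), M_alpha(t)] of the generalized derivative."""
    lo, hi = d_lower(t, alpha), d_upper(t, alpha)
    return (min(lo, hi), max(lo, hi))


t60 = math.pi / 4        # sin(t) cos(t) > 0: case (60)
t61 = 3 * math.pi / 4    # sin(t) cos(t) < 0: case (61)

cut60 = gen_derivative_cut(t60, 0.0)
cut61 = gen_derivative_cut(t61, 0.0)
print(cut60, cut61)
```

At $\alpha = 1$ the two branch derivatives coincide for every $t$, so the core of the derivative is a single point, in agreement with the LU parametrizations above.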
12.8 Applications and Concluding Remark

The characterization of uncertainty by fuzzy sets is a way to manage information in modeling reasoning, processes, systems design and analysis, and decision making in many fields of application. In particular, as we have seen, fuzzy numbers (having the real numbers as the basic support) are special fuzzy sets whose definition can be based on intervals and can be parametrized to obtain a flexible representation for modeling and for calculations. A brief indication of major trends in the applications of fuzzy numbers (intervals) is described in the concluding section of [13], which contains the essential points and references. We follow an analogous frame to sketch the very recent essential literature where fuzzy numbers and fuzzy arithmetic are used for some applications (which are near to the research interests of the authors).^2

In the field of fuzzy analysis (see [73] for the basic concepts and [8, 31] for a formal treatment), the very recent literature has dedicated great attention to various areas involving fuzzy numbers and the corresponding arithmetic aspects. Two problems have been particularly addressed: (i) the solution of fuzzy algebraic equations, in particular those having the matrix form $Au = v$ with some or all elements being fuzzy ($A$, $v$, and/or $u$) (see [74–78]); (ii) the analysis and numerical treatment of fuzzy integral (see [79]) and differential equations, which have attracted great research activity both from a theoretical point of view (see [43, 80–82] and extensions to viability theory in [83]), with the discovery of important relations between the fuzzy and the differential inclusions settings [84, 85], and from the computational point of view (in some cases with commonly inspired algorithms, see [21, 86]).

Also of interest are recent developments in the study of fuzzy chaotic dynamical systems, in the continuous or discrete (maps) time settings [22, 87], and the inclusion of randomness into fuzzy mappings (or fuzziness into random mappings), see [88].

The increased knowledge of the properties of spaces of fuzzy numbers and their metric topology has produced an important spin-off on the methodology of fuzzy statistics and probability and on linear/nonlinear fuzzy regression and forecasting. The theory of statistical estimation of fuzzy relationships, in the setting of fuzzy random variables [89] and hypothesis testing [90, 91], has been improved; the approaches used are based on the linear structure of the space of fuzzy random variables, on the extension principle applied to classical estimators, or on a fuzzy least squares (linear regression) estimation theory [92–95]. Possibilistic approaches to fuzzy regression using linear programming formulations similar to the Tanaka method [96] and to statistical testing [97] have also been analyzed. Most of the proposed methods and algorithms are efficiently implemented [98–103] and allow many shapes of fuzzy numbers, not restricted to symmetric or linear simplifications.

In the areas of systems engineering, applications of traditional interval analysis are increasingly approached by fuzzy arithmetic, with two substantial benefits: a better setting for sensitivity to uncertain data and relations (due to the higher flexibility of fuzzy vs. interval uncertainty), and an increased attention (by the 'fuzzy' community) to the overestimation effect, associated with improper use of fuzzy interval arithmetic, or to the underestimation effect, depending on the use of heuristic calculations with scarce control of approximation errors [1, 24, 104].

In applications of approximate reasoning, some new methodologies have been proposed, based on fuzzy arithmetic to define or implement fuzzy IF–THEN rules, with nonlinear shapes of the fuzzy rule bases [14, 15, 105–107]. In [107] a procedure for the estimation of a fuzzy number from data is also suggested.

In recent years, following the ideas of GrC and fuzzy logic, fuzzy numbers and arithmetic have acquired relevant importance in the universal approximation of soft computing systems [108], with applications to knowledge discovery [109] and data mining, to fuzzy systems and machine learning techniques [110], to fuzzy systems simulation, to expert systems [111], and, very recently, to fuzzy geographic information systems (GIS) and spatial analysis [112–114].
^2 The references in this section are limited to the essentials and are not exhaustive of the current work. We indicate some of the more recent results and publications; the interested reader is referred to the references therein.
In the various fields of operations research, recent work is addressing the inclusion of fuzzy methodologies for well-analyzed problems, e.g., scheduling [115, 116], assignment, location, graph-based optimization [117–119], network analysis, distribution and supply chain management [120, 121], vehicle routing [122], and the fuzzy linear or nonlinear programming problem [36, 123–126]. The number and quality of connections between fuzzy concepts and evolutionary methods for solving hard computational problems in global optimization [53, 127–129], integer programming, combinatorial optimization, or multiple criteria optimization are also increasing.

Finally (and we omit many others), an emerging field of application of fuzzy tools is in Economics, covering fuzzy game theory [130–136], fuzzy preferences and decision making [137, 139], fuzzy Pareto optimality, and fuzzy DEA; in Business, with emphasis on knowledge intelligent support systems, project evaluation, and investment decisions [139, 140]; and in Finance, for financial pricing [141], fuzzy stochastic differential equations and option valuation [142–145], portfolio selection [146], and trading strategies.

It appears that the use of fuzzy numbers, possibly of general and flexible shape, and the search for sufficiently precise arithmetic algorithms is one of the current research fields attracting general attention from the scientific community, and the number of applications is high and increasing.
References
[1] H. Bandemer. Mathematics of Uncertainty: Ideas, Methods, Application Problems. Springer, New York, 2006.
[2] L.A. Zadeh. Some reflections on soft computing, granular computing, and their roles in the computation, design and utilization of information/intelligent systems. Soft Comput. 2 (1998) 23–25.
[3] A. Bargiela and W. Pedrycz. Granular Computing: An Introduction. Kluwer, Dordrecht, 2003.
[4] L.A. Zadeh. Towards a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 90 (1997) 111–127.
[5] L.A. Zadeh. Towards a generalized theory of uncertainty (GTU): an outline. Inf. Sci. 172 (2005) 1–40.
[6] D. Dubois and H. Prade. Towards fuzzy differential calculus. Fuzzy Sets Syst. 8 (1982) 1–17(I), 105–116(II), 225–234(III).
[7] R. Goetschel and W. Voxman. Elementary fuzzy calculus. Fuzzy Sets Syst. 18 (1986) 31–43.
[8] P. Diamond and P. Klöden. Metric Spaces of Fuzzy Sets. World Scientific, Singapore, 1994.
[9] D. Dubois and H. Prade. Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York, 1980.
[10] D. Dubois and H. Prade. Ranking fuzzy numbers in a setting of possibility theory. Inf. Sci. 30 (1983) 183–224.
[11] D. Dubois and H. Prade. Possibility Theory. An Approach to Computerized Processing of Uncertainty. Plenum Press, New York, 1988.
[12] D. Dubois and H. Prade (eds). Fundamentals of Fuzzy Sets, The Handbooks of Fuzzy Sets Series. Kluwer, Boston, 2000.
[13] D. Dubois, E. Kerre, R. Mesiar, and H. Prade. Fuzzy interval analysis. In: D. Dubois and H. Prade (eds), Fundamentals of Fuzzy Sets, The Handbooks of Fuzzy Sets Series. Kluwer, Boston, 2000, pp. 483–581.
[14] Y. Xu, E.E. Kerre, D. Ruan, and Z. Song. Fuzzy reasoning based on the extension principle. Int. J. Intell. Syst. 16 (2001) 469–495.
[15] Y. Xu, J. Liu, D. Ruan, and W. Li. Fuzzy reasoning based on generalized fuzzy if-then rules. Int. J. Intell. Syst. 17 (2002) 977–1006.
[16] G.J. Klir. Fuzzy arithmetic with requisite constraints. Fuzzy Sets Syst. 91 (1997) 165–175.
[17] G.J. Klir. Uncertainty Analysis in Engineering and Science. Kluwer, Dordrecht, 1997.
[18] G.J. Klir and Y. Pan. Constrained fuzzy arithmetic, basic questions and some answers. Soft Comput. 2 (1998) 100–108.
[19] G.J. Klir and B. Yuan. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Englewood Cliffs, NJ, 1995.
[20] M.L. Guerra and L. Stefanini. Approximate fuzzy arithmetic operations using monotonic interpolations. Fuzzy Sets Syst. 150 (2005) 5–33.
[21] L. Stefanini, L. Sorini, and M.L. Guerra. Parametric representation of fuzzy numbers and application to fuzzy calculus. Fuzzy Sets Syst. 157 (2006) 2423–2455.
[22] L. Stefanini, L. Sorini, and M.L. Guerra. Simulation of fuzzy dynamical systems using the LU-representation of fuzzy numbers. Chaos Solitons Fractals 29(3) (2006) 638–652.
[23] A. Kaufmann and M.M. Gupta. Introduction to Fuzzy Arithmetic – Theory and Applications. Van Nostrand Reinhold, New York, 1985.
[24] H.J. Zimmermann. Fuzzy Set Theory and Its Applications, 4th edn. Kluwer, Dordrecht, 2001.
[25] S. Abbasbandy and M. Amirfakhrian. The nearest trapezoidal form of a generalized left right fuzzy number. Int. J. Approx. Reason. 43 (2006) 166–178.
[26] P. Grzegorzewski. Nearest interval approximation of a fuzzy number. Fuzzy Sets Syst. 130 (2002) 321–330.
[27] P. Grzegorzewski and E. Mrowka. Trapezoidal approximations of fuzzy numbers. Fuzzy Sets Syst. 153 (2005) 115–135. Revisited in Fuzzy Sets Syst. 158 (2007) 757–768.
[28] M. Oussalah. On the compatibility between defuzzification and fuzzy arithmetic operations. Fuzzy Sets Syst. 128 (2002) 247–260.
[29] M. Oussalah and J. De Schutter. Approximated fuzzy LR computation. Inf. Sci. 153 (2003) 155–175.
[30] R.R. Yager. On the lack of inverses in fuzzy arithmetic. Fuzzy Sets Syst. 4 (1980) 73–82.
[31] P. Diamond and P. Klöden. Metric topology of fuzzy numbers and fuzzy analysis. In: D. Dubois and H. Prade (eds), Fundamentals of Fuzzy Sets, The Handbooks of Fuzzy Sets Series. Kluwer, Boston, 2000, pp. 583–641.
[32] L.A. Zadeh. Concept of a linguistic variable and its application to approximate reasoning. I. Inf. Sci. 8 (1975) 199–249.
[33] L.A. Zadeh. Concept of a linguistic variable and its application to approximate reasoning. II. Inf. Sci. 8 (1975) 301–357.
[34] L.A. Zadeh. Concept of a linguistic variable and its application to approximate reasoning. III. Inf. Sci. 9 (1975) 43–80.
[35] R. Fuller and P. Majlender. On interactive fuzzy numbers. Fuzzy Sets Syst. 143 (2004) 355–369.
[36] M. Inuiguchi, J. Ramik, and T. Tanino. Oblique fuzzy vectors and their use in possibilistic linear programming. Fuzzy Sets Syst. 135 (2003) 123–150.
[37] S. Heilpern. Representation and application of fuzzy numbers. Fuzzy Sets Syst. 91 (1997) 259–268.
[38] C.T. Yeh. A note on trapezoidal approximations of fuzzy numbers. Fuzzy Sets Syst. 158 (2007) 747–754.
[39] L.A. Zadeh. Fuzzy sets. Inf. Control 8 (1965) 338–353.
[40] M.L. Guerra and L. Stefanini. On fuzzy arithmetic operations: some properties and distributive approximations. Int. J. Appl. Math. 19 (2006) 171–199. Extended version in Working Paper Series EMS, University of Urbino, 2005, available online by the RePEc Project (www.repec.org).
[41] D. Ruiz and J. Torrens. Distributivity and conditional distributivity of a uninorm and a continuous t-conorm. IEEE Trans. Fuzzy Syst. 14 (2006) 180–190.
[42] B. Bouchon-Meunier, O. Kosheleva, V. Kreinovich, and H.T. Nguyen. Fuzzy numbers are the only fuzzy sets that keep invertible operations invertible. Fuzzy Sets Syst. 91 (1997) 155–163.
[43] B. Bede and S.G. Gal. Generalizations of the differentiability of fuzzy number valued functions with applications to fuzzy differential equations. Fuzzy Sets Syst. 151 (2005) 581–599.
[44] L. Sorini and L. Stefanini. An LU-Fuzzy Calculator for the Basic Fuzzy Calculus. Working Paper Series EMS No. 101. University of Urbino, Urbino, Italy, 2005. Revised and extended version available online by the RePEc Project (www.repec.org).
[45] H.K. Chen, W.K. Hsu, and W.L. Chiang. A comparison of vertex method with JHE method. Fuzzy Sets Syst. 95 (1998) 201–214.
[46] W.M. Dong and H.C. Shah. Vertex method for computing functions of fuzzy variables. Fuzzy Sets Syst. 24 (1987) 65–78.
[47] K.N. Otto, A.D. Lewis, and E.K. Antonsson. Approximating α-cuts with the vertex method. Fuzzy Sets Syst. 55 (1993) 43–50.
[48] W.M. Dong and F.S. Wong. Fuzzy weighted averages and implementation of the extension principle. Fuzzy Sets Syst. 21 (1987) 183–199.
[49] M. Hanss. The transformation method for the simulation and analysis of systems with uncertain parameters. Fuzzy Sets Syst. 130 (2002) 277–289.
[50] M. Hanss and A. Klimke. On the reliability of the influence measure in the transformation method of fuzzy arithmetic. Fuzzy Sets Syst. 143 (2004) 371–390.
[51] A. Klimke. An Efficient Implementation of the Transformation Method of Fuzzy Arithmetic. Extended Preprint Report, 2003/009. Institute of Applied Analysis and Numerical Simulation, University of Stuttgart, Germany, 2003.
[52] A. Klimke and B. Wohlmuth. Computing expensive multivariate functions of fuzzy numbers using sparse grids. Fuzzy Sets Syst. 153 (2005) 432–453.
[53] W.A. Lodwick and K.D. Jamison. Interval methods and fuzzy optimization. Int. J. of Uncertain., Fuzziness and Knowl.-Based Reason. 5 (1997) 239–250.
[54] R. Moore and W.A. Lodwick. Interval analysis and fuzzy set theory. Fuzzy Sets Syst. 135 (2003) 5–9.
[55] R.B. Kearfott and V. Kreinovich (eds). Applications of Interval Analysis. Kluwer, Dordrecht, 1996.
[56] M. Navara and Z. Zabokrtsky. How to make constrained fuzzy arithmetic efficient. Soft Comput. 6 (2001) 412–417.
[57] P.T. Chang and K.C. Hung. α-cut fuzzy arithmetic: simplifying rules and a fuzzy function optimization with a decision variable. IEEE Trans. Fuzzy Syst. 14 (2006) 496–510.
[58] R. Hassine, F. Karray, A.M. Alimi, and M. Selmi. Approximation properties of piecewise parabolic functions fuzzy logic systems. Fuzzy Sets Syst. 157 (2006) 501–515.
[59] K.L. Wood, K.N. Otto, and E.K. Antonsson. Engineering design calculations with fuzzy parameters. Fuzzy Sets Syst. 52 (1992) 1–20.
[60] O.G. Duarte, M. Delgado, and I. Requena. Algorithms to extend crisp functions and their inverse functions to fuzzy numbers. Int. J. Intell. Syst. 18 (2003) 855–876.
[61] L. Stefanini, L. Sorini, and M.L. Guerra. A parametrization of fuzzy numbers for fuzzy calculus and application to the fuzzy Black-Scholes option pricing. In: Proceedings of the 2006 IEEE International Conference on Fuzzy Systems, Vancouver, Canada, 2006, pp. 587–594. Extended version Working Paper Series EMS, No. 106. University of Urbino, Urbino, Italy, 2006.
[62] M. De Munck, D. Moens, W. Desmet, and D. Vandepitte. An automated procedure for interval and fuzzy finite element analysis. Proceedings of ISMA, Leuven, Belgium, September, 20–22, 2004, pp. 3023–3033.
[63] D. Moens and D. Vandepitte. Fuzzy finite element method for frequency response function analysis of uncertain structures. AIAA J. 40 (2002) 126–136.
[64] K. Price. An introduction to differential evolution. In: D. Corne, M. Dorigo and F. Glover (eds). New Ideas in Optimization. McGraw-Hill, New York, 1999, pp. 79–108.
[65] R. Storn and K. Price. Differential Evolution: A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. ICSI Technical Report TR-95-012. Berkeley University, Berkeley, CA, 1995. Also J. Glob. Optim. 11 (1997) 341–359.
[66] R. Storn. System design by constraint adaptation and differential evolution. IEEE Trans. Evol. Comput. 3 (1999) 22–34.
[67] L. Stefanini. Differential Evolution Methods for Arithmetic with Fuzzy Numbers. Working Paper Series EMS No. 104. University of Urbino, Urbino, Italy, 2006. Available online by the RePEc Project (www.repec.org).
[68] M.M. Ali and A. Törn. Population set-based global optimization algorithms and some modifications and numerical studies. Comput. Oper. Res. 31 (2004) 1703–1725.
[69] O. Kaleva. Fuzzy differential equations. Fuzzy Sets Syst. 24 (1987) 301–317.
[70] O. Kaleva. The calculus of fuzzy valued functions. Appl. Math. Lett. 3 (1990) 55–59.
[71] M. Puri and D. Ralescu. Differentials of fuzzy functions. J. Math. Anal. Appl. 91 (1983) 552–558.
[72] Y.L. Kim and B.M. Ghil. Integrals of fuzzy-number-valued functions. Fuzzy Sets Syst. 86 (1997) 213–222.
[73] J.J. Buckley and A. Yan. Fuzzy functional analysis (I): basic concepts. Fuzzy Sets Syst. 115 (2000) 393–402.
[74] T. Allahviranloo. The Adomian decomposition method for fuzzy systems of linear equations. Appl. Math. Comput. 163 (2005) 553–563.
[75] T. Allahviranloo. Successive over relaxation iterative method for fuzzy system of linear equations. Appl. Math. Comput. 162 (2005) 189–196.
[76] B. Asady, S. Abbasbandy, and M. Alavi. Fuzzy general linear systems. Appl. Math. Comput. 169 (2005) 34–40.
[77] A. Vroman, G. Deschrijver, and E.E. Kerre. Solving systems of linear fuzzy equations by parametric functions; an improved algorithm. Fuzzy Sets Syst. 158 (2007) 1515–1534.
[78] S.M. Wang, S.C. Fang, and H.L.W. Nuttle. Solution sets of interval valued fuzzy relational equations. Fuzzy Optim. Decis. Mak. 2 (2003) 41–60.
[79] P. Diamond. Theory and applications of fuzzy Volterra integral equations. IEEE Trans. Fuzzy Syst. 10 (2002) 97–102.
[80] D.N. Georgiou, J.J. Nieto, and R. Rodriguez-Lopez. Initial value problems for higher order fuzzy differential equations. Nonlinear Anal. 63 (2005) 587–600.
[81] J.J. Nieto. The Cauchy problem for continuous fuzzy differential equations. Fuzzy Sets Syst. 102 (1999) 259–262.
[82] V. Lakshmikantham. Set differential equations versus fuzzy differential equations. Appl. Math. Comput. 164 (2005) 277–294.
[83] R.P. Agarwal, D. O’Regan, and V. Lakshmikantham. Viability theory and fuzzy differential equations. Fuzzy Sets Syst. 151 (2005) 563–580.
[84] T.G. Bhaskar, V. Lakshmikantham, and V. Devi. Revisiting fuzzy differential equations. Nonlinear Anal. 58 (2004) 351–358.
[85] V. Lakshmikantham and A.A. Tolstonogov. Existence and interrelation between set and fuzzy differential equations. Nonlinear Anal. 55 (2003) 255–268.
[86] K.R. Jackson and N.S. Nedialkov. Some recent advances in validated methods for IVPs and ODEs. Appl. Numer. Math. 42 (2002) 269–284.
[87] S.M. Pederson. Fuzzy homoclinic orbits and commuting fuzzifications. Fuzzy Sets Syst. 155 (2005) 361–371.
[88] R. Ahmad and F.F. Bazan. An interactive algorithm for random generalized nonlinear mixed variational inclusions for random fuzzy mappings. Appl. Math. Comput. 167 (2005) 1400–1411.
[89] V. Krätschmer. A unified approach to fuzzy random variables. Fuzzy Sets Syst. 123 (2001) 1–9.
[90] W. Näther. Random fuzzy variables of second order and applications to statistical inference. Inf. Sci. 133 (2001) 69–88.
[91] H.C. Wu. Statistical hypothesis testing for fuzzy data. Inf. Sci. 175 (2005) 30–56.
[92] R. Alex. A new kind of fuzzy regression modeling and its combination with fuzzy inference. Soft Comput. 10 (2006) 618–622.
[93] V. Krätschmer. Strong consistency of least squares estimation in linear regression model with vague concepts. J. Multivariate Anal. 97 (2006) 633–654.
[94] V. Krätschmer. Least squares estimation in linear regression models with vague concepts. Fuzzy Sets Syst. 157 (2006) 2579–2592.
[95] A. Wünsche and W. Näther. Least squares fuzzy regression with fuzzy random variable. Fuzzy Sets Syst. 130 (2002) 43–50.
[96] M. Modarres, E. Nasrabadi, and M.M. Nasrabadi. Fuzzy linear regression models with least squares errors. Appl. Math. Comput. 163 (2005) 977–989.
[97] O. Hryniewicz. Possibilistic decisions and fuzzy statistical tests. Fuzzy Sets Syst. 157 (2006) 2665–2673.
[98] P. D’Urso. Linear regression analysis for fuzzy/crisp input and fuzzy/crisp output data. Comput. Stat. Data Anal. 42 (2003) 47–72.
[99] P. D’Urso and A. Santoro. Goodness of fit and variable selection in the fuzzy multiple linear regression. Fuzzy Sets Syst. 157 (2006) 2627–2647.
[100] M. Hojati, C.R. Bector, and K. Smimou. A simple method for computation of fuzzy linear regression. Eur. J. Oper. Res. 166 (2005) 172–184.
[101] D.H. Hong, J.K. Song, and H.Y. Do. Fuzzy least squares linear regression analysis using shape preserving operations. Inf. Sci. 138 (2001) 185–193.
[102] C. Kao and C.L. Chyu. Least squares estimates in fuzzy regression analysis. Eur. J. Oper. Res. 148 (2003) 426–435.
[103] H.K. Yu. A refined fuzzy time series model for forecasting. Physica A 346 (2005) 347–351.
[104] O. Wolkenhauer. Data Engineering: Fuzzy Mathematics in Systems Theory and Data Analysis. Wiley, New York, 2001.
[105] M. Delgado, O. Duarte, and I. Requena. An arithmetic approach for the computing with words paradigm. Int. J. Intell. Syst. 21 (2006) 121–142.
[106] V.G. Kaburlasos. FINs: lattice theoretic tools for improving prediction of sugar production from populations of measurement. IEEE Trans. Syst. Man Cybern. B 34 (2004) 1017–1030.
[107] V.G. Kaburlasos and A. Kehagias. Novel fuzzy inference system (FIS) analysis and design based on lattice theory. IEEE Trans. Fuzzy Syst. 15 (2007) 243–260.
[108] S. Wang and H. Lu. Fuzzy system and CMAC network with B-spline membership/basis functions are smooth approximators. Soft Comput. 7 (2003) 566–573.
[109] Y.Q. Zhang. Constructive granular systems with universal approximation and fast knowledge discovery. IEEE Trans. Fuzzy Syst. 13 (2005) 48–57.
[110] H.K. Lam, S.H. Ling, F.H.F. Leung, and P.K.S. Tam. Function estimation using a neural fuzzy network and an improved genetic algorithm. Int. J. Approx. Reason. 36 (2004) 243–260.
[111] S.H. Liao. Expert system methodologies and applications: a decade review from 1995 to 2004. Expert Syst. Appl. 28 (2005) 93–103.
[112] G. Bordogna, S. Chiesa, and D. Geneletti. Linguistic modeling of imperfect spatial information as a basis for simplifying spatial analysis. Inf. Sci. 176 (2006) 366–389.
[113] Y. Li and S. Li. A fuzzy set theoretic approach to approximate spatial reasoning. IEEE Trans. Fuzzy Syst. 13 (2005) 745–754.
[114] S.Q. Ma, J. Feng, and H.H. Cao. Fuzzy model of regional economic competitiveness in GIS spatial analysis: case study of Gansu, Western China. Fuzzy Optim. Decis. Mak. 5 (2006) 99–112.
[115] W. Herroelen and R. Leus. Project scheduling under uncertainty: survey and research potentials. Eur. J. Oper. Res. 165 (2005) 289–306.
[116] S. Petrovic and X.Y. Song. A new approach to two machine flow shop problem with uncertain processing times. Optim. Eng. 7 (2006) 329–342.
[117] S. Muñoz, M.T. Ortuño, J. Ramirez, and J. Yañez. Coloring fuzzy graphs. Omega 33 (2005) 211–221.
[118] T. Savsek, M. Vezjah, and N. Pavesic. Fuzzy trees in decision support systems. Eur. J. Oper. Res. 174 (2006) 293–310.
[119] A. Sengupta and T.K. Pal. Solving the shortest path problem with interval arcs. Fuzzy Optim. Decis. Mak. 5 (2006) 71–89.
[120] R. Alex. Fuzzy point estimation and its application on fuzzy supply chain analysis. Fuzzy Sets Syst. 158 (2007) 1571–1587.
[121] J. Wang and Y.F. Shu. Fuzzy decision modeling for supply chain management. Fuzzy Sets Syst. 150 (2005) 107–127.
[122] E.E. Ammar and E.A. Youness. Study of multiobjective transportation problem with fuzzy numbers. Appl. Math. Comput. 166 (2005) 241–253.
[123] G. Facchinetti, S. Giove, and N. Pacchiarotti. Optimisation of a nonlinear fuzzy function. Soft Comput. 6 (2002) 476–480.
[124] K. Ganesan and P. Veeramani. Fuzzy linear programs with trapezoidal fuzzy numbers. Ann. Oper. Res. 143 (2006) 305–315.
[125] F.F. Guo and Z.Q. Xia. An algorithm for solving optimization problems with one linear objective function and finitely many constraints of fuzzy relation inequalities. Fuzzy Optim. Decis. Mak. 5 (2006) 33–48.
[126] J. Ramik. Duality in fuzzy linear programming: some new concepts and results. Fuzzy Optim. Decis. Mak. 4 (2005) 25–40.
[127] J. Alami, A. El Imrani, and A. Bouroumi. A multipopulation cultural algorithm using fuzzy clustering. Appl. Soft Comput. 7 (2007) 506–519.
[128] J. Liu and J. Lampinen. A fuzzy adaptive differential evolution algorithm. Soft Comput. 9 (2005) 448–462.
[129] W.A. Lodwick and K.A. Bachman. Solving large scale fuzzy and possibilistic optimization problems. Fuzzy Optim. Decis. Mak. 4 (2005) 257–278.
[130] B. Arfi. Linguistic fuzzy logic game theory. J. Confl. Resolut. 50 (2006) 28–57.
[131] D. Butnariu. Fuzzy games: a description of the concept. Fuzzy Sets Syst. 1 (1978) 181–192.
[132] D. Garagic and J.B. Cruz. An approach to fuzzy noncooperative Nash games. J. Optim. Theory Appl. 118 (2003) 475–491.
[133] M. Mares. Fuzzy Cooperative Games: Cooperation with Vague Expectations (Studies in Fuzziness and Soft Computing). Physica-Verlag, Heidelberg, 2001.
[134] M. Mares. On the possibilities of fuzzification of the solution in fuzzy cooperative games. Mathw. Soft Comput. IX (2002) 123–127.
[135] Q. Song and A. Kandel. A fuzzy approach to strategic games. IEEE Trans. Fuzzy Syst. 7 (1999) 634–642.
[136] L. Xie and M. Grabisch. The core of bicapacities and bipolar games. Fuzzy Sets Syst. 158 (2007) 1000–1012.
[137] B. Matarazzo and G. Munda. New approaches for the comparison of LR numbers: a theoretical and operational analysis. Fuzzy Sets Syst. 118 (2001) 407–418.
[138] R.R. Yager. Perception based granular probabilities in risk modeling and decision making. IEEE Trans. Fuzzy Syst. 14 (2006) 329–339.
[139] E.E. Ammar and H.A. Khalifa. Characterization of optimal solutions of uncertainty investment problems. Appl. Math. Comput. 160 (2005) 111–124.
[140] C. Kahraman, D. Ruan, and E. Tolga. Capital budgeting techniques using discounted fuzzy versus probabilistic cash flows. Inf. Sci. 142 (2002) 57–76.
[141] J. de A. Sanchez and A.T. Gomez. Estimating a term structure of interest rates for fuzzy financial pricing by using fuzzy regression methods. Fuzzy Sets Syst. 139 (2003) 313–331.
[142] S. Li and A. Ren. Representation theorems, set valued and fuzzy set valued Ito integral. Fuzzy Sets Syst. 158 (2007) 949–962.
[143] I. Skrjanc, S. Blazie, and O. Agamennoni. Interval fuzzy modeling applied to Wiener models with uncertainties. IEEE Trans. Syst. Man Cybern. B 35 (2005) 1092–1095.
[144] Y. Yoshida, M. Yasuda, J.I. Nakagami, and M. Kurano. A discrete time American put option model with fuzziness of stock process. Fuzzy Optim. Decis. Mak. 4 (2005) 191