Applied Programming/Printable version

Applied Programming

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/Applied_Programming

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Variables

What are variables?

A variable is a named piece of computer memory, containing some information inside. Think of a variable as a box with a name, where we can "store" something. We create, edit, and delete variables, as much as we need in our tasks.[1]

In the following example, we create a variable with the identifier "my_variable" and store the number 13 within it. We then print out "my_variable" and receive the number 13 in return.

```my_variable = 13
print(my_variable)
''>13''
```

How are they used?

Variables are useful when you need to store, modify, or call information during the execution of programs. In essence, variables are the lifeblood of computer programming because they can store inputs and computational results. They allow for more flexibility in design and operation.

Types of variables

Local - A variable whose value is defined inside of a single function, and therefore is not accessible by other functions in the program.

Global - A variable whose value is defined outside of the functions, and therefore is accessible by all functions in the program.

Dynamic - A variable whose memory location is determined when the program runs.

Static - A variable whose memory is reserved for it during compilation.

Constant - A variable? While there are similarities between the two, a constant is markedly different in that its value never changes.

Data types

String - multiple characters that can include letters, numbers or special characters.

Integer - a whole number (no decimal or fraction)

Float - a number with or without a decimal; 32 bit precision (or accurate to approximately 8 decimal places)

Double - a number with or without a decimal; 64 bit precision (or accurate to approximately 16 decimal places)

Boolean - data type who’s value can only be true or false

Character - data type consisting of a single Unicode character

Mixed - any combination of the above data types

Declaring and initializing variables

The standard format for declaring and initializing a variable has the name identifier on the left and the value on the right. When you initialize the variable, you normally use the = symbol. It goes in between the name identifier and the value. Single whitespaces are used to separate the elements within the variable's declaration/initialization. This is done to increase code readability and understandability.[2] Here are some examples:

C++, C#, and Java

```int veggies = 23;
```

Python

``` veggies = int(23)
```

PHP

``` \$veggies = 23;
```

It is not always a requirement to initialize or give a value to a variable during the declaration. There may be circumstances when it is preferable to declare a variable without initializing it; perhaps the variable will hold a lot of data and it is wiser to conserve memory resources until necessary. An example in C++:

```int veggies;
```

Different programming languages have their own naming conventions, some of these overlap. Here are some universal tips to keep in mind when creating variables:[3]

• It is important to use descriptive names for variables.
• Variable names must begin with a letter, underscore, or non-number character.
• Do not use a comma for decimal numbers, only the period or point.
• Every language has reserved words that cannot be used in variable names, for example, Date.

Functions

Functions are structuring elements that simultaneously group and isolate portions of code. Within functions, variables are used for inputting and outputting data. If a function does not specify it, any calculations stored within variables will be lost if the function does not return the value upon conclusion. This is called the return statement. To do this in Python, the last line of the function definition simply states:

```return something
```

Subsequent functions can use these return values if stated so within their definition by using a set of parenthesis, known as parameters. The variable holding the return values gets stated within the parameters. Like so:

```def subsequent_function(something):
```

The return values or arguments stored within the variable 'something' can now be used by 'subsequent_function'. Functions are not required to have return values.

The timing or location of a variable's declaration/initialization within a program affect its operation. This applies to the concept of functional scope. For instance, a variable that is declared/initialized outside of the program's function sets becomes a global variable. All of the functions within a program will have access to use the global variables. Here's some food for thought:

```sofritas = 1
guacamole = 1

def make_burrito():
tortilla = 1
veggies = 1
rice = 2
beans = 2
burrito = guacamole + veggies + sofritas + rice + beans + tortilla
return burrito

def make_taco():
shell = 1
lettuce = 1
tomatoes = 1
taco = guacamole + tomatoes + lettuce + sofritas + shell
return taco

def make_dinner():
burrito = make_burrito()
taco = make_taco()

make_dinner()
```

In this Python example, the chef does not have access to lettuce and tomatoes when making the burrito. Likewise, the chef can't use veggies, rice, and beans for making the taco. These are examples of variables with local scope. The sofritas and guacamole represent global variables; the chef has access to both when making the burrito and the taco.

Activities

1. Create a variable named carname and assign the value Volvo to it.

2. Create a variable named x and assign the value 50 to it.

3. Display the sum of 5 + 10, using two variables: x and y.

4. Create a variable called z, assign x + y to it, and display the result.

5. Remove the illegal characters in the variable name:

```    2my-first_name = "John"
```

6. Insert the correct syntax in the boxes to assign the same value to all three variables in one code line.

```    x □ y □ z □ "Orange"
```

7. Insert the correct keyword in the box to make the variable x belong to the global scope.

```    def myfunc():
□ x
x = "fantastic"
```

[4]

Key Terms

Arguments - The actual input expression passed/supplied to a function, procedure, or routine in the invocation/call statement.[5]

Array - A list or group of similar types of data values that are grouped.[6]

Assignment - Sets the value saved in the storage location denoted by a given variable name.[7]

Boolean - A data type having two values, typically denoted true and false.[8]

Constant - A value that cannot be altered by the program during normal execution.[9]

Data type - A classification of data which tells the compiler or interpreter how the programmer intends to use the data.[10]

Declaration - A language construct that specifies the properties of a given identifier.[11]

Double - A computer number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.[12]

Dynamic variable - A variable whose address is determined when the program is run. [13]

Expression - A combination of one or more explicit values, constants, variables, operators, and functions that a programming language interprets and computes to produce another value.[14]

Floating point - The formulaic representation that approximates a real number to a fixed amount of significant digits.[15]

Function / Subroutine - A sequence of program instructions that performs a specific task, packaged as a unit.[16]

Global variable - A variable with global scope, meaning that it is visible (hence accessible) throughout the program, unless shadowed.[17]

Identifier - Tokens (also called symbols) which name the language entities (variables, types, labels, subroutines, and packages).[18]

Initialize - The assignment of an initial value for a data object or variable.[19]

Integer - A number that can be written without a fractional component.[20]

Local variable - A variable that is given local scope.[21]

Modulus - The remainder after division of one number by another.[22]

Naming conventions - A set of rules for choosing the character sequence to be used for identifiers which denote variables, types, functions, and other entities in source code and documentation.[23]

Objects - A combination of related variables, constants and other data structures which can be selected and manipulated together.[24]

Operator - A programming language construct that performs a calculation from zero or more input values to an output value.[25]

Order of operations - A collection of rules that reflect conventions about which procedures to perform first in order to evaluate a given mathematical expression.[26]

Parameters - A special kind of variable used in a subroutine to refer to one of the pieces of data (values) provided as input to the subroutine.[27]

Pointer - A variable that contains the address of a location in the memory. The location is the commencing point of an object, such as an element of the array or an integer.[28]

Program - As an organized collection of instructions, which when executed perform a specific task or function. It is processed by the central processing unit (CPU) of the computer before it is executed.[29]

Real number - A value that represents a quantity along a line, including integers, fractions, and irrational numbers.[30]

Reserved word - A word that cannot be used as an identifier, such as the name of a variable, function, or label – it is "reserved from use".[31]

Return statement - Code that causes execution to leave the current subroutine and resume at the point in the code immediately after the instruction which called the subroutine.[32]

Scope - The part of a program where the name binding is valid, that is where the name can be used to refer to the entity.[33]

Statement - The smallest standalone element of an imperative programming language that expresses some action to be carried out.[34]

Static variable - A variable that has been allocated "statically", meaning that its lifetime (or "extent") is the entire run of the program.[35]

String - A sequence of characters, either as a literal constant or as some kind of variable.[36]

Subroutine - A sequence of program instructions that performs a specific task, packaged as a unit.[16]

Variable - A storage location paired with an associated symbolic name (an identifier), which contains some known or unknown quantity of information referred to as a value.[37]

Whitespace character - A character that does not correspond to a visible mark, but typically does occupy an area on a page.[38]

References

1. https://en.wikiversity.org/wiki/Types_and_variables
3. https://www.kidscodecs.com/variables/
4. https://www.w3schools.com/python/python_variables.asp
5. Wikipedia: Parameter_(computer_programming)
6. "Top Programming Terms and Definitions for Beginners [Updated]". Hackr.io. Retrieved 2021-01-22.
7. Wikipedia: Type safety
8. Wikipedia: Boolean data type
9. Wikipedia: Constant (computer programming)
10. Wikipedia: Data type
11. Wikipedia: Declaration (computer programming)
12. Wikipedia: Double-precision_floating-point_format
13. "Dynamic Variable". webopedia.com. Retrieved 2021-02-04.
14. Wikipedia: Expression (computer science)
15. Wikipedia: Floating point
16. a b Wikipedia: Subroutine
17. Wikipedia: Global_variable
18. Wikipedia: Identifier_(computer_languages)
19. Wikipedia: Initialization_(programming)
20. Wikipedia: Integer
21. Wikipedia: Local_variable
22. Wikipedia: Modulo operation
23. Wikipedia: Naming_convention_(programming)
24. "Top Programming Terms and Definitions for Beginners [Updated]". Hackr.io. Retrieved 2021-01-22.
25. Wikipedia: Operation (mathematics)
26. Wikipedia: Order of operations
27. Wikipedia: Parameter_(computer_programming)
28. "Top Programming Terms and Definitions for Beginners [Updated]". Hackr.io. Retrieved 2021-01-22.
29. "Top Programming Terms and Definitions for Beginners [Updated]". Hackr.io. Retrieved 2021-01-22.
30. Wikipedia: Real number
31. Wikipedia: Reserved_word
32. Wikipedia: Return_statement
33. Wikipedia: Scope_(computer_science)
34. Wikipedia: Statement (computer science)
35. Wikipedia: Static_variable
36. Wikipedia: String (computer science)
37. Wikipedia: Variable (computer science)
38. Wikipedia: Whitespace_character

Functions

Overview

A function is a structuring element grouping statements together, often utilized in a computer program more than once to repeat a task. Without these functions, the alternative to reuse code would be copying it and adapting it to different contexts, which would be a bad idea. Redundant code - repeating code in this case - should be avoided. Using functions usually enhances the comprehensibility and quality of a program. It also lowers the cost for the development and maintenance of the software. Functions have various names in programming languages (e.g., subroutines, routines, procedures, methods, or subprograms).

Let us look at the following code:

```print("Program starts")

print("Hi Peter")
print("Nice to see you again!")
print("Enjoy our video!")

print("Hi Sarah")
print("Nice to see you again!")
print("Enjoy our video!")

print("Hi Dominque")
print("Nice to see you again!"
print("Enjoy our video!")
```

Output:

```''Program starts''
''Hi Peter''
''Nice to see you again!''
''Enjoy our video!''
''Hi Sarah''
''Nice to see you again!''
''Enjoy our video!''
''Hi Dominque''
''Nice to see you again!''
''Enjoy our video!''
```

Let us have a closer look at the code above. You can see in the code that we are greeting three persons. We use three print statements that are nearly the same; just the name is different. This is what we call redundant code. We are repeating the code three times. This is an example where functions can and should be used to eliminate redundant code.

The following Python code uses a function containing a parameter with the name "name." Redundancy is eliminated and the code is more concise. Take a look:

```def greet(name):
print("Hi " + name)
print("Nice to see you again!")
print("Enjoy our video!")
print("Program starts")
greet("Peter")
greet("Sarah")
greet("Dominque")</nowiki>
```

Output:

```''Program starts''
''Hi Peter''
''Nice to see you again!''
''Enjoy our video!''
''Hi Sarah''
''Nice to see you again!''
''Enjoy our video!''
''Hi Dominque''
''Nice to see you again!''
''Enjoy our video!''
```

In the samples above, the use of functions results in less lines of code, 20% less to be specific. Any further reuse of the function would provide even greater efficiency.

A function in Python is defined by a def statement. The general syntax looks like this:

```     '''def''' function_name(parameter list):
statements and the function body
```

The parameter list consists of none or more parameters. Parameters call the actual data to be used (arguments), if the function employs any. The rest of the function body consists of indented statements. The function body gets executed every time the function is called.

Local and Global Variable Scope in Functions

When defined within the function, variable are by default local in scope to the function.

Here's an example:

```     def f():
print(s)
s = "chimichurri"
f()
```

Output:

```chimichurri
```

Example:

```     def f():
s = "chimichanga"
print(s)
f()
```

Output

```chimichanga
```

Example:

```     s = "chimichurri"
f()
print(s)
```

Output:

```chimichanga
chimichurri
```

Example:

```     def f():
print(s)
s = "chimichanga"
print(s)
s = "chimichurri"
f()
print(s)
```

Output:

```    UnboundLocalError                         Traceback (most recent call last)
<ipython-input-25-81b2fbbc4d42> in <module>
6
7 s = "chimichurri"
---->8 f()
9 print(s)

<ipython-input-25-81b2fbbc4d42> in f()
1 def f():
---->2     print(s)
3     s = "chimichanga"
4     print(s)
5
UnboundLocalError: local variable 's' referenced before assignment
```

If we execute the previous script, we get the error message: UnboundLocalError: local variable 's' referenced before assignment.

The variable 's' is ambiguous in f(), i.e. in the first print in f() the global 's' could be used with the value "chimichurri". After this, we define a local variable s with the assignment s = "chimichanga".

```     def f():
global s
print(s)
s = "dog"
print(s)
s = "cat"
f()
print(s)
```

Output:

```cat
dog
dog
```

We made the variable 's' global inside of the script. Therefore anything we do inside of the function body is done to the global variables outside of it. [1]

Modular Programming

What is it?

Modular programming is a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality.[2] It can be used in a variety of applications and functions with other components of the system. Similar functions are grouped in the same unit of programming code and separate functions are developed as separate units of code so that the code can be reused by other applications.[3]

Why use it?

The purpose of modular programming is to make it easier to develop and maintain large software programs, by breaking them up into smaller parts. Modular programming usually makes your code easier to read because it means separating it into functions that only deal with one aspect of the overall functionality. It can make your files a lot smaller and easier to understand compared to monolithic code. Software that’s split into distinct modules is also ideal for testing. That’s because tests can be much stricter and more detailed when you test small functions that do less, compared to big functions that do many things. This is especially true if you can only test the outputs of a function, rather than seeing the steps it follows. Separating functions in this way can make it much quicker and easier to find what you’re looking for later.[4]

Modular vs Structured vs Object-Oriented

Modular programming is closely related to structured programming and object-oriented programming, all having the same goal of facilitating construction of large software programs and systems by decomposition into smaller pieces. In more specific terms, modular programming refers to the high-level decomposition of the code of an entire program into pieces: structured programming to the low-level code use of structured control flow, and object-oriented programming to the data use of objects, a kind of data structure.[5]

Modules

The code is composed of many different code modules that are developed separately. This allows different developers to take on discrete pieces of the system, designing and implementing them without having to understand all of the rest. But to build large programs out of modules effectively, we need to be able to write modules that are correctly isolated from the rest of the program. Rather than have to think about every other part of the program when developing a code module, we need to be able to use local reasoning. That is reasoning about just the module, and the contract it needs to satisfy with respect to the rest of the program. If everyone has done their job, separately developed code modules can be plugged together to form a working program without every developer needing to understand everything done by every other developer in the team. This is the key idea of modular programming.[6]

Parameters

A function or procedure usually needs some information about the environment, in which it has been called. The relationship between the environment and the function involves special variables, which are called parameters. By using these parameters, it's possible to use all kinds of "outside" objects inside of a function. The syntax for how parameters are declared and the semantics for how the arguments are passed to the parameters of the function depends on the programming language.

Very often the terms parameter and argument are used synonymously, but there is a clear difference. Parameters are inside functions or procedures, while arguments are used in procedure calls (i.e., the values passed to the function at run-time).

The evaluation strategy for arguments (i.e., how the arguments from a function call are passed to the parameters of the function) differs between programming languages. The most common evaluation strategies are "call by value" and "call by reference."

Subroutine

What is it?

A subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that task should be performed. Subroutines may be defined within programs, or separately in libraries that can be invoked. In different programming languages, a subroutine may be called a routine, subprogram, function, method, or procedure. Technically, these terms all have different definitions. The generic umbrella term 'callable unit' is sometimes used as well.[7]

Types

There are two types of subroutines: procedures and functions. A procedure is a subroutine that performs a specific task. When the task is complete, the subroutine ends and the main program continues from where it left off. For example, a procedure may be written to reset all the values of an array to zero, or to clear a screen. Meanwhile, a function works in the same way as a procedure, except that it manipulates data and returns a result back to the main program. [8]

Call by value

The most common strategy is the call-by-value evaluation, sometimes also called pass-by-value. This strategy is used in C and C++, for example. In call-by-value, the argument expression is evaluated, and the result of this evaluation is bound to the corresponding variable in the function. So, if the expression is a variable, its value will be assigned (copied) to the corresponding parameter. This ensures that the variable in the caller's scope will stay unchanged when the function returns. In other words, a copy of the argument passed to the function is made. When any operation is done on the copy, the actual value does not change. It only changes the value of the copy which is made inside the function. Here's an unbiased Python example:

``` def test(string):
string = "Python is cool"
print("Inside function: " + string)

string = "Python"
print("Before function call: " + string)
test(string)
print("After function call: " + string)
```

Output

``` Before function call: Python
Inside function: Python is cool
After function call: Python
```

As you can see, a copy of the argument passed to the function is made. When any operation is done on the copy, the actual value does not change. It only changes the value of the copy which is made inside the function.

Call by reference

In call-by-reference evaluation, which is also known as pass-by-reference, a function gets an implicit reference to the argument, rather than a copy of its value. As a consequence, the function can modify the argument, i.e. the value of the variable in the caller's scope can be changed. By using Call by Reference we save both computation time and memory space because arguments do not need to be copied. On the other hand, this harbors the disadvantage that variables can be "accidentally" changed in a function call. So, special care has to be taken to "protect" the values, which shouldn't be changed. Many programming languages support call-by-reference, like C or C++, but Perl uses it as default. In ALGOL 60 and COBOL there has been a different concept called call-by-name, which isn't used anymore in modern languages.

Python uses a mechanism, which is known as "Call-by-Object", sometimes also called "Call by Object Reference" or "Call by Sharing". If you pass immutable arguments like integers, strings or tuples to a function, the passing acts like call-by-value. The object reference is passed to the function parameters. They can't be changed within the function, because they can't be changed at all (i.e., they are immutable). It's different, if we pass mutable arguments. They are also passed by object reference, but they can be changed in place within the function. If we pass a list to a function, we have to consider two cases: elements of a list can be changed in place (i.e., the list will be changed even in the caller's scope); if a new list is assigned to the name, the old list will not be affected (i.e., the list in the caller's scope will remain untouched). [9] A Python example:

``` def add_list(a):
a.append('world')
print("Inside function call: " + str(a))

a = ['Hello']
print("Before function call: " + str(a))
print("After function call: " + str(a))
```

Output

``` Before function call: ['Hello']
Inside function call: ['Hello', 'world']
After function call: ['Hello', 'world']
```

On this reference to the original, the arguments is passed and when any operation is performed on the argument the actual value also changes. It changes the value in both function scope and globally.[10]

Naming Conventions

What is it?

A naming convention is a pre-determined set of rules or guidelines used for your program’s variables, functions, etc. They are general rules applied when creating text scripts for software programming. They have many different purposes, such as adding clarity and uniformity to scripts, readability for third-party applications, and functionality in certain languages and applications.[11]

Why they are needed

Common reasons for following a naming convention include: reducing the effort needed to read and understand source code, to enable code reviews to focus on more important issues than arguing over syntax and naming standards, and to enable code quality review tools to focus their reporting mainly on significant issues other than syntax and style preferences.[12]

Common conventions

Camel case - every word is capitalized except the first. There are no spaces in between words. Example:

``` exampleVariable = 1
def myFunction():
```

Pascal case - every word is capitalized. There are no spaces in between words. Example:

``` Example Variable = 1
def MyFunction():
```

Snake case - every word is lower case. Words are separated by underscores. Example:

``` example_variable = 1
def my_function():
```

Which do I use

C++, Java, and JavaScript typically use camelCase, with PascalCase reserved for libraries and classes. C# primarily uses PascalCase, with camelCase used for parameters. Python uses snake_case for most identifiers. In addition, the following rules apply:

• CONSTANTS IN ALL UPPER CASE (often UPPER_SNAKE_CASE)[13]

Conditions

Structured Programming

What is it?

Structured programming is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by making extensive use of the structured control flow constructs of selection (if/then/else) and repetition (while and for), block structures, and subroutines. Structured programming is most frequently used with deviations that allow for clearer programs in some particular cases, such as when exception handling has to be performed. [1]

Control Structures

Sequence

Ordered statements or subroutines executed in sequence. It allows us to carry tasks that have multiple steps. In programming, sequence is a basic algorithm: A set of logical steps carried out in order.[2]

Sequence execution of code statements (one line after another) -- like following a recipe [3]

```x = 33
y = 55
z = x + y
print(z)
```

Output:

```88
```

In the above example, lines of code are executed until the program outputs the sum of x and y.

Selection

It's used to make choices depending on information. An algorithm can be made smarter by using IF, THEN, and ELSE functions to reiterate instructions, or to move the process in question to different parts of the program. Selection is also called decision. It is one of the three basic logic assemblies in computer programming. The other two logic assemblies are sequence and loop.[4]

The most common selection statement is the if/else statement. Basic syntax:

```   if (expression)
statement
else
statement
```

The else clause is optional, so this format is also legal:

```   if (expression)
statement <ref>https://www.cs.fsu.edu/~myers/c++/notes/control1.html</ref>

course_score = int(input("Enter a course score: "))

if course_score >= 90:

elif 80 <= course_score < 90:

elif 70 <= course_score < 80:

elif 60 <= course_score < 70:

else:
```

Output:

```Enter a course score: 50
```

In the above example[5], the user inputs a score and the program outputs a grade based on that score. The if/elif/else structure is used here.

Repetition/Iteration

Repetition in a program means that lines of code will be run multiple times.

Iteration is a term similar to repetition: it means to continue repeating an action until you achieve the correct outcome.[6]

Such as:

• while
• do/while
• for[7]
``` for num in range(1, 5):
print(num)
```

Output:

``` 1
2
3
4
```

In the above example,[8] the program prints from 1 to 4. The for structure is used here.

``` keep_going = 'y'

while keep_going == 'y':
course_score = int(input("Enter a course score: "))

if course_score >= 90:

elif 80 <= course_score < 90:

elif 70 <= course_score < 80:

elif 60 <= course_score < 70:

else:

keep_going = input("Do you want to keep going? "
+ "Enter y for yes. ")
```

In the above example, the user inputs a score and the program outputs a grade based on that score. The program continues to run as long as the the user enters y to keep going. The program is terminated when the user enters anything else. The while structure is used here.

Recursion

Recursion is the process of defining a problem (or the solution to a problem) in terms of (a simpler version of) itself.[9]

``` def factorial(number):
if number == 1:
return 1
else:
return (number * factorial(number - 1))

```

Output:

```24
```

In the above example, the program calculates a factorial by using recursion. When this function is run, an ”if” statement is executed. This statement checks whether the number passed to the function is equal to 1. If it is, our function returns 1. Otherwise, the factorial of our number is calculated. This calculation works by multiplying the number passed to the function by the factorial of the previous number. This function is called again and again until “number” is equal to 1. Each time the function is called, the value of “number” is reduced by 1.[10]

Subroutines

In structured programming, various functions are declared, defined, and executed. Functions that structure programs are called "subroutines" or "procedures." Subroutines become very handy if they are implemented repeatedly.[11]

Such functions are declared with "def."

``` def course_letter_Grade (course_score):
course_score = int(input("Enter a course score: "))

if course_score >= 90:

elif 80 <= course_score < 90:

elif 70 <= course_score < 80:

elif 60 <= course_score < 70:

else:
```

Output:

```Enter a course score: 50
```

For example, if one wants to execute grading programming, "course_letter_grade" function execute the corresponding letter grade based on the entered score.

Blocks

Blocks in Python is an indentation. Indentation determines the scope of conditional statements. The indentation level constrains each block.[12]

```    course_score = int(input("Enter a course score: "))    #Block 1; intake a course score

if course_score >= 90:                               #Block 2; selects course score
print("Your course grade is A.")                  #Block 3; displays result of the selection

elif 80 <= course_score < 90:                        #Block 2; selects course score
print("Your course grade is B.")                   #Block 3; displays results of the selection

elif 70 <= course_score < 80:                        #Block 2; selects course score
print("Your course grade is C.")                   #Block 3; displays results of the selection

elif 60 <= course_score < 70:                        #Block 2; selection course score
print("Your course grade is D.")                   #Block 3; displays results of the selection

else:                                                #Block 2; selection course score
print("Your course grade is F.")                   #Block 3; displays results of the selection
```

Output:

``` Enter a course score: 50
```

Block 1 serves as an initiation of operation or an opening code for the "course_score" operation. Block 2s are served to determine and to select appropriate Block3 based on the input in Block1. Block 3s are served to show the result of Block 2.

Why use this?

• Easier to read and understand
• User Friendly
• Easier to Maintain
• Mainly problem based instead of being machine based
• Development is easier as it requires less effort and time
• Easier to Debug
• Machine-Independent, mostly.

• Being Machine-Independent, it takes time to convert into machine code.
• The converted machine code is not the same as for assembly language.
• The program depends upon changeable factors like data-types. Therefore it needs to be updated with the need on the go.
• Usually the development in this approach takes longer time as it is language-dependent. Whereas in the case of assembly language, the development takes lesser time as it is fixed for the machine.

Data validation

In computer science, data validation is the process of ensuring data has undergone data cleansing to ensure they have, that is, that they are both correct and useful. It uses routines, often called "validation rules", "validation constraints", or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit application program validation logic of the computer and its application.

Validation in Python

Whenever an input is accepted by the user, it needs to be checked for validation which checks if the input data is what we are expecting and the validation can be done in two different ways, that is by using a flag variable or by using try or except which the flag variable will be set to false initially and if we can find out that the input data is what we are expecting the flag status can be set to true and find out what can be done next based on the status of the flag whereas while using try or except, a section of code is tried to run and if there is a negative response, then the except block of code is run.

Types of Validation in Python

There are three types of validation in python, they are:

Type Check: This validation technique in python is used to check the data type of the given input. For example, int, float, etc. Length Check: This validation technique in python is used to check the length of the given input string. Range Check: This validation technique in python is used to check if a given number falls in between the two numbers. The syntax for validation in Python is given below:

Syntax using flag:

```     flagName = False
while not flagName:
if [Do check here]:
flagName = True
else:
print('error message')
```

The status of the flag is set to false initially and the same condition is considered for a while loop to make the statement while not true, and the validation is performed setting the flag to true if the validation condition is satisfied otherwise the error message is printed.

Syntax using an exception:

```     while True:
try:
[run code that might fail here] break
except:
print('This is the error message if the code fails')

print(‘run the code from here if code is successfully run in the try block of code above’)
```

We set the condition to be true initially and perform the necessary validation by running a block of code and if the code fails to perform the validation, an exception is raised displaying the error message and a success message is printed if the code is successfully executed by the try block.

Examples of Python validation are:

Example 1 Python program using a flag to validate if the input given by the user is an integer.#Datatype check.

```     #Declare a variable validInt which is also considered as flag and set it to false
validInt = False
#Consider the while condition to be true and prompt the user to enter the input
while not validInt:
#The user is prompted to enter the input
#The input entered by the user is checked to see if it’s a digit or a number
if age1.isdigit():
validInt = True
else:
print('The input is not a valid number')
#This statement is printed if the input entered by the user is a number
print('The entered input is a number and that is ' + str(age1))
```

Output:

```     Please enter your age: 22
The entered input is a number that is 22
```

Example 2 Python program using flag and exception to validate the type of input given by the user and to determine if it lies within a given range.

```
#Range Check.

#Declare a variable areTeenager which is also considered as flag and set it to false
areTeenager = False
#Consider the while condition to be true and prompt the user to enter the input
while not areTeenager:
try:
#The user is prompted to enter the input
#The input entered by the user is checked if it lies between the range specified
if age1 >= 13 and age1 <= 19:
areTeenager = True
except:
print('The age entered by you is not a valid number between 13 and 19')
#This statement is printed if the input entered by the user lies between the range of the number specified
print('You are a teenager whose age is between 13 and 19, and the entered age is  ' + str(age1))
```

Output:

```Please enter your age: 14
You are a teenager whose age is between 13 and 19, and the entered age is 14
```

Example 3 Python program using flag to check the length of the input string. #Length Check.

```     #Declare a variable lenstring which is also considered as flag and set it to false
lenstring = False
#Consider the while condition to be true and prompt the user to enter the input
while not lenstring:
#The input entered by the user is checked for its length and if it is below five
lenstring = True
else:
print('The number of characters in the entered password is less than five characters')
#This statement is printed if the input entered by the user consists of less than five characters
```

Output:

```Please enter a password consisting of at least five characters sdsdsds
```

Benefits

• It helps to improve the security of code.
• Validation in python prevents third-party users from mishandling the code accidentally or intentionally.
• It can be used to check if the input data type is correct or not.
• It can be used to check if there are no invalid values in the given input.
• It can be used to check if the given input lies in the range or is it out of range.
• It can be used to check if the given input meets the constraints specified on them.
• It can be used to check if the given input is consistent or not.
• Validation in python can be used to check if the given input is valid.
• Validation in python can be used to check if the given input is complete or incomplete.

[14]

Exception Handling

In computing and computer programming, exception handling is the process of responding to the occurrence of exceptions – anomalous or exceptional conditions requiring special processing - during the execution of a program. In general, an exception breaks the normal flow of execution and executes a pre-registered exception handler; the details of how this is done depend on whether it is a hardware or software exception and how the software exception is implemented. It is provided by specialized programming language constructs, hardware mechanisms like interrupts, or operating system (OS) inter-process communication (IPC) facilities like signals. Some exceptions, especially hardware ones, may be handled so gracefully that execution can resume where it was interrupted.

An alternative approach to exception handling in software is error checking, which maintains normal program flow with later explicit checks for contingencies reported using special return values, an auxiliary global variable such as C's errno, or floating point status flags. Input validation, which preemptively filters exceptional cases, is also an approach.

Python syntax and semantics/exceptions - EAFP vs. LBYL

The syntax of the Python programming language is the set of rules that defines how a Python program will be written and interpreted (by both the runtime system and by human readers). The Python language has many similarities to Perl, C, and Java. However, there are some definite differences between the languages.

Defensive programming

Defensive programming is a form of defensive design intended to ensure the continuing function of a piece of software under unforeseen circumstances. Defensive programming practices are often used where high availability, safety, or security is needed.[15] Essentially, defensive programming is a programming style to ensure your program functions as intended no matter the circumstances. Below are a few of the most popular and powerful techniques that can be applied when taking a defensive approach to programming.

Assertions

An assertion is a boolean expression at a specific point in a program which will be true unless there is a bug in the program. An assertion could simply be a comment used by the programmer to think about how the code works. Or an assertion could document a constraint on the system.[16]

```def my_function(age):
assert int(age) > 18
print("You are over 18.")
```

The code above shows a function with an argument declared "age". Within the function, we assert that the age variable must be greater than 18. If we pass any integer 19 or greater through the function, we are met with a response: "You are over 18". However, if we pass an integer of 18 or lower, our program throws an assertion error and crashes.

Intelligent source code reuse

Intelligent source code reuse is the practice of using previously tested blocks of code as opposed to typing everything up from scratch. If existing code is tested and known to work, reusing it may reduce the chance of bugs being introduced. However, reusing code is not always a good practice, because it also amplifies the damages of a potential attack on the initial code.[17]

Design by contract

Design by contract (DbC), also known as contract programming, programming by contract and design-by-contract programming, is an approach for designing software.

It prescribes that software designers should define formal, precise and verifiable interface specifications for software components, which extend the ordinary definition of abstract data types with preconditions, postconditions and invariants. These specifications are referred to as "contracts", in accordance with a conceptual metaphor with the conditions and obligations of business contracts.

The DbC approach assumes all client components that invoke an operation on a server component will meet the preconditions specified as required for that operation.

Where this assumption is considered too risky (as in multi-channel or distributed computing), the inverse approach is taken, meaning that the server component tests that all relevant preconditions hold true (before, or while, processing the client component's request) and replies with a suitable error message if not.[18]

Exception handling

In computing and computer programming, exception handling is the process of responding to the occurrence of exceptions – anomalous or exceptional conditions requiring special processing - during the execution of a program. In general, an exception breaks the normal flow of execution and executes a pre-registered exception handler; the details of how this is done depend on whether it is a hardware or software exception and how the software exception is implemented.[19] Take a look at the code below.

```age = int(input("How old are you?"))
new_age = age + 10
print(f"In 10 years, you will be {new_age}!")
```

When we run this code, we assume the user will be inputting their age in numbers. However, if the user enters an invalid input, an exception is thrown. Below is the exception thrown when a user enters a letter instead of a number.

```How old are you?
>g

''Traceback (most recent call last):''
''File "C:/Users/Sam/Pycharm/test.py", line 1, in <module>''
''age = int(input("How old are you?"))''
''ValueError: invalid literal for int() with base 10: 'g''
```

Exceptions are typically multiple lines which can be difficult to decipher. One way to solve this issue would be to use a "try-except". In this case, we will be excepting a "ValueError" in the event that the user enters an invalid character.

```    try:
age = int(input("How old are you?"))
new_age = age + 10
print(f"In 10 years, you will be {new_age}!")
except ValueError:
print("Invalid input!")
```

Now when we run the program and input invalid information, we are met with a message that is much more decipherable.

```   How old are you?
>g

Invalid input!
```

Key Terms

Assertion - An assertion is a predicate connected to a point in the program, that always should evaluate to true at that point in code execution. Assertions can help a programmer read the code, help a compiler compile it, or help the program detect its own defects.

Boolean Expression - An expression in a programming language that produces a Boolean value when evaluated, i.e. one of true or false.

Conditional Statements - Allow for selection between alternatives at runtime.

Data Validation - The process of ensuring that data have undergone data cleansing to ensure they have data quality, that is, that they are both correct and useful.

Defensive Programming - A form of defensive design intended to ensure the continuing function of a piece of software under unforeseen circumstances.

EAFP (Easier to Ask Forgiveness than Permission) - Approach that first attempts the desired action then handles any resulting exceptions.

Exception Handling - The process of responding to the occurrence, during computation, of exceptions. Anomalous or exceptional conditions requiring special processing, often changing the normal flow of program execution.

GoTo Statement - Performs a one-way transfer of control to another line of code; in contrast, a function call normally returns control.

If Statement - An if statement is a programming conditional statement that, if proved true, performs a function or displays information.

LBYL (Look Before You Leap) - Approach which a precondition is tested before accessing the sought-after resource.

Logic Error - Error that makes the program deliver unexpected results without crashing it.

Relational Operator - A programming language construct or operator that tests or defines some kind of relation between two entities, including numerical equality (e.g., 5 = 5) and inequalities (e.g., 4 ≥ 3).

Runtime Error - Error produced by the runtime system if something goes wrong while a syntactically correct program is running.

Software Bug - An error, flaw, failure or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.

Structured Programming - A programming paradigm aimed at improving the clarity, quality, and development time of a computer program.

Syntax - Syntax of a computer language is the set of rules that defines the combinations of symbols that are considered to be a correctly structured document or fragment in that language.

Syntax error - Error that indicates something is wrong with program syntax. It's produced by Python during the translation of the source code into byte code.

Truth Table - A mathematical table used in logic, truth tables can be used to show whether a propositional expression is true for all legitimate input values, that is, logically valid.

Objects - A combination of related variables, constants and other data structures which can be selected and manipulated together.

Conditionals - Help the code make a choice and result in either TRUE or FALSE. These perform different actions depending on the need of the programmer, and multiple conditions can be combined into a single condition, as long as the final value of the condition is either TRUE or FALSE. Examples of conditional statements are ‘IF’, ‘IF-Else’, ‘While’ and ‘Else-If’.

Validation - An automatic computer check to ensure that the data entered is sensible and reasonable. It does not check the accuracy of data

Loops

Overview

This lesson introduces loops, including while, for, do and other loops. Loops allow you to repeat similar operations in your code. A loop executes a block of code until the loop has iterated over an object. In a for loop, as will be further explained, the loop runs until it has iterated over every item in an iterable. For loops help you reduce repetition in your code because they let you execute the same operation multiple times. All loops are sequences of instructions designed to be repeated until a certain condition is met or achieved. Loops only need to be written once but may repeat multiple times over. Because of this, they are used to do certain tasks multiple times based on the program's task, avoiding having to create extra, unnecessary steps in a program. [1]

What are loops:

Loops are a fundamental construct for many programs. In fact, all but the most basic of programs are likely to include at least one loop in them. Loops can be very useful and can save you, the developer, a lot of time. [2]

We can elaborate on the time-saving part. Let's consider a situation when you want to print Hello, World! five times.

``` print("Hello, World!")
print("Hello, World!")
print("Hello, World!")
print("Hello, World!")
print("Hello, World!")
```

Output:

```Hello, World!
Hello, World!
Hello, World!
Hello, World!
Hello, World!
```

It was simple, but again, let's consider another situation when you want to write Hello, World! a thousand times. We can certainly not write print() statements a thousand times. Almost all the programming languages provide a concept called loop, which helps in executing one or more statements up to a desired number of times. All high-level programming languages provide various forms of loops, which can be used to execute one or more statements repeatedly. [3]

Loop Structure:

They allow you to run one or more lines of code repetitively. You can repeat the statements in a loop structure until a condition is True, until a condition is False, a specified number of times, or once for each element in a collection.[4]

The following fall under the Loop Structure:

• While Loop
• Do-While Loop
• For Loop
• Break Statement
• Continue Statement[5]

Types of loops:

For loop:

It enables a particular set of conditions to be executed repeatedly until a condition is satisfied.[6] Below is a following example of a For Loop:

```  for x in range(0, 3):
print("We're on time %d" % (x))
```

The Python for statement iterates over the members of a sequence in order, executing the block each time. They are traditionally used when you have a block of code which you want to repeat a fixed number of times.[7]

While loop:

While loops are implemented when there is specific condition to satisfy or to follow. The while loop can be terminated whenever the conditional statement is not satisfied. In other words, when the while loop's test expression becomes false, the code inside of the while loop is neglected, and the following code is executed immediately. [8]

``` #While loop Format
While ''Expression''
''Statement''
```

Another way to implement the while loop is control the number of repetitions. The repetition of the while can be determined with control variable.

``` #While loop repetition Format
n = 0  #repetition control variable
While (n < 5): n = n + 1; # the repetition of the loop is limited to 5 times, incrementing by 1 each time.
Statement

#While loop repetition example
n = 0
while (n < 5): n = n + 1; print(“Welcome to CIS206”)
```

The output gives the following results:

``` Welcome to CIS206
Welcome to CIS206
Welcome to CIS206
Welcome to CIS206
Welcome to CIS206
```

In the while loop operation, there are two distinguished control expressions: "break" and "continue." [9] The use of either break or continue allow more than one conditional statements inside the while loop. break terminates while loop immediately. continue shifts to the alternative conditional statement as long as original while loop's conditional statement is True.

``` #While loop - Break example
n = 3
while n > 0:
n = n + 1
if n == 1:
break  #break is engaged if the while loop is repeated once
print (n)
print('Loop is terminated')

#While loop - Continue example
n = 3
while n > 0:
n = n + 1
if n == 2:
continue #continue is engaged if the while is repeated twice
print (n)
print('Loop is terminated')
```

Do-while loop:

Do-while has distinctive difference from the regular while loop. The difference is that the Do-while is executed at least once. [10] Do-while loop is initiated with conditional statement(s).

``` #Do-while loop template
n = 0 #number of repetition controller
while True:
print(n)
n = n + 1
if (n > 5);
break #when the execution is repeated five times, the while operation is terminated
```

Infinite loop:

An infinite loop (sometimes called an endless loop ) is a piece of coding that lacks a functional exit so that it repeats indefinitely. Typically, a certain process is done, such as getting an item of data and changing it, and then some condition is checked, such as whether a counter has reached a prescribed number. If the presence of the specified condition cannot be ascertained, the next instruction in the sequence tells the program to return to the first instruction and repeat the sequence, which typically goes on until the program terminates automatically after a certain duration of time, or the operating system terminates the program with an error.[11]

In most cases, infinite loops are unintentional and the result of poor programming. However, there are situations in which infinite loops can be used intentionally. The most common occurrence of this is with the modern computer. Modern interactive computers require that the computer constantly be monitoring for user input or device activity, so at some fundamental level there is an infinite processing idle loop that must continue until the device is turned off or reset.[12]

Examples:

```    apples = 4
while apples > 3:
print("You have " + apples + " apples.")
apples = apples + 1
```
```    ''You have 4 apples.''
''You have 5 apples.''
''You have 6 apples.''
''You have 7 apples.''
''You have 8 apples.''
''...''
```

In the example above, the loop is intended to break when the number of apples is at or below three. However, the variable "apples" starts at four. With each iteration of the loop, "apples" increases by one. In this case, "apples" will never fall to zero, and the loop repeats infinitely.

```    while True:
print("This statement is true")
else:
print("This statement is not true")
```
```    ''This statement is true''
''This statement is true''
''This statement is true''
''This statement is true''
''...''
```

In this example, the programmer has failed to create a break statement to exit the loop. Because there is no break statement or any code that would cause the loop to be "false", the loop repeats infinitely.

Nested loop:

A nested loop occurs when one loop is placed inside the body of another loop. The inside loop is executed each time the outer loop is executed.

Examples

```    i = 1
j = 5
while i < 4:
while j < 8:
print(i, ",", j)
j = j + 1
i = i + 1
```
```    ''1 , 5''
''2 , 6''
''3 , 7''
```

[13]

The function above contains two "while" loops; one nested within the other. The outer loop is executed first with "i" being less than four. This causes the internal loop to execute with "j" being less than 8. The function then breaks when "i" is no longer less than 4; on the fourth iteration.

Loop keywords:

Break Statement - This statement terminates the loop containing it. Control of the program flows to the statement immediately after the body of the loop. If the statement is inside a nested loop (loop inside another loop), the statement will terminate the innermost loop.

Condition - This statement a loop checks in order to decide if it should loop again or stop looping.. It's a set of rules performed if a certain condition is met. It is sometimes referred to as an If-Then statement, because IF a condition is met, THEN an action is performed.

Continue Statement - This statement is used inside loops. When the statement is encountered inside a loop, control jumps to the beginning of the loop for next iteration, skipping the execution of statements inside the body of loop for the current iteration.

Counting controlled - A loop where a variable (such as a counter) determines how many times a loop will run.

Do-While Loop - A do-while loop is a control flow statement that executes a block of code at least once, and then repeatedly executes the block, or not, depending on a given boolean condition at the end of the block.

Exit - A predefined function used to prematurely stop a program and return to the operating system . It is used to finish or exit the execution of an enclosing loop statement. If the statement includes a condition, then the exit from the loop is conditional. .

Flag - Commonly used to control or to indicate the intermediate state or outcome of a particular operation. It is the value that acts as a signal for a function or process. The value is used to determine the next step of a program

For Loop - A for loop is a control flow statement for specifying iteration, which allows code to be executed repeatedly. A for loop has two parts: a header specifying the iteration, and a body which is executed once per iteration.

For each Loop - For each (or foreach, sometimes called an iterative for-loop) is a control flow statement for traversing items in a collection.

Goto - An unstructured branching command that makes the logic in a program jump to a different location

Infinite Loop (or Endless Loop) - A sequence of instructions in a computer program which loops endlessly, either due to the loop having no terminating condition or having one that can never be met.

Iteration Control Structure - A statement or block that is executed until the program reaches a certain state, or operations have been applied to every element of a collection or array.

Loop Counters - Variable that controls the iterations of a loop (a computer programming language construct). It is named because most uses of this construct result in the variable taking on a range of integer values in some orderly sequences.

Nested Loop - A loop within a loop. The inner loop will repeat depending on the number of iterations in the outer loop.

Pass Statement - This statement is used as a placeholder for future code. When the statement is executed, nothing happens; but you avoid getting an error when empty code is not allowed.

Raise - This keyword is used to raise an exception. You can define what kind of error to raise, and the text to print to the user.

Return - A branching statement that causes a function to jump back to the function that called it. Allowing it to be also used as a way to break out of a loop. A return statement returns occasional results from a function or simply finishes a function. There is an implicit return at the end of any function, so you do not need to use one if your function ends naturally, without returning any value.

Sentinel value - It's a special value in the context of an algorithm which uses its presence as a condition of termination, typically in a loop or recursive algorithm. The value is a form of in-band data that makes it possible to detect the end of the data when no out-of-band data (such as an explicit size indication) is provided.

While Loop - A while loop is a control flow statement that allows code to be executed repeatedly based on a given boolean condition.

Testing

Testing Approach

Static, dynamic, and passive testing

There are many approaches available in software testing. Reviews, walkthrough, or inspections are referred to as static testing, whereas executing programmed code with a given set of test cases is referred to as dynamic testing.[1][2]

Static testing is often implicit, like proofreading, plus when programming tools/text editors check source code structure or compilers (pre-compilers) check syntax and data flow as static program analysis. Dynamic testing takes place when the program itself is running. Dynamic testing may begin before the program is 100% complete in order to test particular sections of code and are applied to discrete functions or modules.[1][2] Typical techniques for these are either using stubs/drivers or execution from a debugger environment.[2]

Static testing involves verification, whereas dynamic testing also involves validation.[2]

Passive testing means verifying the system behavior without any interaction with the software product. Contrary to active testing, testers do not provide any test data but look at system logs and traces. They mine for patterns and specific behavior in order to make some kind of decisions.[3] This is related to offline runtime verification and log analysis.

Exploratory approach

Exploratory testing is an approach to software testing that is concisely described as simultaneous learning, test design, and test execution. Cem Kaner, who coined the term in 1984,[4]:2 defines exploratory testing as "a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the quality of his/her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project."[4]:36

The "box" approach

Software testing methods are traditionally divided into white- and black-box testing. These two approaches are used to describe the point of view that the tester takes when designing test cases. A hybrid approach called grey-box testing may also be applied to software testing methodology.[5][6] With the concept of grey-box testing—which develops tests from specific design elements—gaining prominence, this "arbitrary distinction" between black- and white-box testing has faded somewhat.[7]

White-box testing

White-box testing (also known as clear box testing, glass box testing, transparent box testing, and structural testing) verifies the internal structures or workings of a program, as opposed to the functionality exposed to the end-user. In white-box testing, an internal perspective of the system (the source code), as well as programming skills, are used to design test cases. The tester chooses inputs to exercise paths through the code and determine the appropriate outputs.[5][6] This is analogous to testing nodes in a circuit, e.g., in-circuit testing (ICT).

While white-box testing can be applied at the unit, integration, and system levels of the software testing process, it is usually done at the unit level.[7] It can test paths within a unit, paths between units during integration, and between subsystems during a system–level test. Though this method of test design can uncover many errors or problems, it might not detect unimplemented parts of the specification or missing requirements.

Techniques used in white-box testing include:[6][8]

• API testing – testing of the application using public and private APIs (application programming interfaces)
• Code coverage – creating tests to satisfy some criteria of code coverage (e.g., the test designer can create tests to cause all statements in the program to be executed at least once)
• Fault injection methods – intentionally introducing faults to gauge the efficacy of testing strategies
• Mutation testing methods
• Static testing methods

Code coverage tools can evaluate the completeness of a test suite that was created with any method, including black-box testing. This allows the software team to examine parts of a system that are rarely tested and ensures that the most important function points have been tested.[9] Code coverage as a software metric can be reported as a percentage for:[5][9][10]

• Function coverage, which reports on functions executed
• Statement coverage, which reports on the number of lines executed to complete the test
• Decision coverage, which reports on whether both the True and the False branch of a given test has been executed

100% statement coverage ensures that all code paths or branches (in terms of control flow) are executed at least once. This is helpful in ensuring correct functionality, but not sufficient since the same code may process different inputs correctly or incorrectly.[11] Pseudo-tested functions and methods are those that are covered but not specified (it is possible to remove their body without breaking any test case).[12]

Black-box testing

Black-box testing (also known as functional testing) treats the software as a "black box," examining functionality without any knowledge of internal implementation, without seeing the source code. The testers are only aware of what the software is supposed to do, not how it does it.[13] Black-box testing methods include: equivalence partitioning, boundary value analysis, all-pairs testing, state transition tables, decision table testing, fuzz testing, model-based testing, use case testing, exploratory testing, and specification-based testing.[5][6][10]

Specification-based testing aims to test the functionality of software according to the applicable requirements.[14] This level of testing usually requires thorough test cases to be provided to the tester, who then can simply verify that for a given input, the output value (or behavior), either "is" or "is not" the same as the expected value specified in the test case. Test cases are built around specifications and requirements, i.e., what the application is supposed to do. It uses external descriptions of the software, including specifications, requirements, and designs to derive test cases. These tests can be functional or non-functional, though usually functional.

Specification-based testing may be necessary to assure correct functionality, but it is insufficient to guard against complex or high-risk situations.[15]

One advantage of the black box technique is that no programming knowledge is required. Whatever biases the programmers may have had, the tester likely has a different set and may emphasize different areas of functionality. On the other hand, black-box testing has been said to be "like a walk in a dark labyrinth without a flashlight."[16] Because they do not examine the source code, there are situations when a tester writes many test cases to check something that could have been tested by only one test case or leaves some parts of the program untested.

This method of test can be applied to all levels of software testing: unit, integration, system and acceptance.[7] It typically comprises most if not all testing at higher levels, but can also dominate unit testing as well.

Component interface testing

Component interface testing is a variation of black-box testing, with the focus on the data values beyond just the related actions of a subsystem component.[17] The practice of component interface testing can be used to check the handling of data passed between various units, or subsystem components, beyond full integration testing between those units.[18][19] The data being passed can be considered as "message packets" and the range or data types can be checked, for data generated from one unit, and tested for validity before being passed into another unit. One option for interface testing is to keep a separate log file of data items being passed, often with a timestamp logged to allow analysis of thousands of cases of data passed between units for days or weeks. Tests can include checking the handling of some extreme data values while other interface variables are passed as normal values.[18] Unusual data values in an interface can help explain unexpected performance in the next unit.

Visual testing

The aim of visual testing is to provide developers with the ability to examine what was happening at the point of software failure by presenting the data in such a way that the developer can easily find the information she or he requires, and the information is expressed clearly.[20][21]

At the core of visual testing is the idea that showing someone a problem (or a test failure), rather than just describing it, greatly increases clarity and understanding. Visual testing, therefore, requires the recording of the entire test process – capturing everything that occurs on the test system in video format. Output videos are supplemented by real-time tester input via picture-in-a-picture webcam and audio commentary from microphones.

Visual testing provides a number of advantages. The quality of communication is increased drastically because testers can show the problem (and the events leading up to it) to the developer as opposed to just describing it and the need to replicate test failures will cease to exist in many cases. The developer will have all the evidence she or he requires of a test failure and can instead focus on the cause of the fault and how it should be fixed.

Ad hoc testing and exploratory testing are important methodologies for checking software integrity, because they require less preparation time to implement, while the important bugs can be found quickly.[22] In ad hoc testing, where testing takes place in an improvised, impromptu way, the ability of the tester(s) to base testing off documented methods and then improvise variations of those tests can result in more rigorous examination of defect fixes.[22] However, unless strict documentation of the procedures are maintained, one of the limits of ad hoc testing is lack of repeatability.[22]

Grey-box testing

Grey-box testing (American spelling: gray-box testing) involves having knowledge of internal data structures and algorithms for purposes of designing tests while executing those tests at the user, or black-box level. The tester will often have access to both "the source code and the executable binary."[23] Grey-box testing may also include reverse engineering (using dynamic code analysis) to determine, for instance, boundary values or error messages.[23] Manipulating input data and formatting output do not qualify as grey-box, as the input and output are clearly outside of the "black box" that we are calling the system under test. This distinction is particularly important when conducting integration testing between two modules of code written by two different developers, where only the interfaces are exposed for the test.

By knowing the underlying concepts of how the software works, the tester makes better-informed testing choices while testing the software from outside. Typically, a grey-box tester will be permitted to set up an isolated testing environment with activities such as seeding a database. The tester can observe the state of the product being tested after performing certain actions such as executing SQL statements against the database and then executing queries to ensure that the expected changes have been reflected. Grey-box testing implements intelligent test scenarios, based on limited information. This will particularly apply to data type handling, exception handling, and so on.[24]

Testing Levels

There are mainly 4 levels of testing in software testing:[25]

1. Unit Testing - This checks if software components are fulfilling functionalities or not. It's the smallest testable portion of system or application which can be compiled, liked, loaded and executed. This kind of testing helps to test each module separately. The aim is to test each part of the software by separating it. It checks that component are fulfilling functionalities or not. This kind of testing is performed by developers.

2. Integration Testing - This checks the data flow from one module to other modules. Integration means combining. For Example, In this testing phase, different software modules are combined and tested as a group to make sure that integrated system is ready for system testing. Integrating testing checks the data flow from one module to other modules. This kind of testing is performed by testers.

3. System Testing - This evaluates both functional and non-functional needs for the testing. System testing is performed on a complete, integrated system. It allows checking system's compliance as per the requirements. It tests the overall interaction of components. It involves load, performance, reliability and security testing. System testing most often the final test to verify that the system meets the specification. It evaluates both functional and non-functional need for the testing.

4. Acceptance Testing - This checks the requirements of a specification or contract are met as per its delivery. Acceptance testing is a test conducted to find if the requirements of a specification or contract are met as per its delivery. Acceptance testing is basically done by the user or customer. However, other stockholders can be involved in this process.

Testing Techniques and Tactics

There are 5 important software testing techniques and tactics:[26]

1. Boundary Value Analysis (BVA) - It's based on testing at the boundaries between partitions. It includes maximum, minimum, inside or outside boundaries, typical values and error values. It is generally seen that a large number of errors occur at the boundaries of the defined input values rather than the center. It is also known as BVA and gives a selection of test cases which exercise bounding values. This black box testing technique complements equivalence partitioning. This software testing technique base on the principle that, if a system works well for these particular values then it will work perfectly well for all values which comes between the two boundary values

Guidelines for Boundary Value Analysis:

• If an input condition is restricted between values x and y, then the test cases should be designed with values x and y as well as values which are above and below x and y.
• If an input condition is a large number of values, the test case should be developed which need to exercise the minimum and maximum numbers. Here, values above and below the minimum and maximum values are also tested.
• Apply guidelines 1 and 2 to output conditions. It gives an output which reflects the minimum and the maximum values expected. It also tests the below or above values.

2. Equivalence Class Partitioning - This allows you to divide set of test condition into a partition which should be considered the same. This software testing method divides the input domain of a program into classes of data from which test cases should be designed. The concept behind this technique is that test case of a representative value of each class is equal to a test of any other value of the same class. It allows you to Identify valid as well as invalid equivalence classes.

3. Decision Table Based Testing - It's also known as to Cause-Effect table. This software testing technique is used for functions which respond to a combination of inputs or events. For example, a submit button should be enabled if the user has entered all required fields. The first task is to identify functionalities where the output depends on a combination of inputs. If there are large input set of combinations, then divide it into smaller subsets which are helpful for managing a decision table. For every function, you need to create a table and list down all types of combinations of inputs and its respective outputs. This helps to identify a condition that is overlooked by the tester.

Following are steps to create a decision table:

• Enlist the input in rows.
• Enter all the rules in the column.
• Fill the table with the different combination of inputs.
• In the last row, note down the output against the input combination.

4. State Transition - This technique changes in input conditions change the state of the Application Under Test (AUT). This testing technique allows the tester to test the behavior of an AUT. The tester can perform this action by entering various input conditions in a sequence. In State transition technique, the testing team provides positive as well as negative input test values for evaluating the system behavior.

Guideline for State Transition:

• State transition should be used when a testing team is testing the application for a limited set of input values.
• The technique should be used when the testing team wants to test sequence of events which happen in the application under test.

5. Error Guessing - It's a software testing technique based on guessing the error which can prevail in the code. The technique is heavily based on the experience where the test analysts use their experience to guess the problematic part of the testing application. Hence, the test analysts must be skilled and experienced for better error guessing. The technique counts a list of possible errors or error-prone situations. Then tester writes a test case to expose those errors. To design test cases based on this software testing technique, the analyst can use the past experiences to identify the conditions.

Guideline for Error Guessing:

• The test should use the previous experience of testing similar applications.
• Understanding of the system under test.
• Knowledge of typical implementation errors.
• Remember previously troubled areas.
• Evaluate Historical data & Test results.

Unit Testing

Benefits and Limitations

Here are the benefits or advantages of using unit testing:[27][28]

• Validate your work - By writing a test, you double-check what you did. Unit testing finds problems early in the development cycle. This includes both bugs in the programmer's implementation and flaws or missing parts of the specification for the unit. The process of writing a thorough set of tests forces the author to think through inputs, outputs, and error conditions, and thus more crisply define the unit's desired behavior.
• Separate concerns in your code - Unit testing your code requires you make it testable. Making code testable often means declaring dependencies upfront. This kind of structuring of the code usually leads to a cleaner design and better separation of concerns. With code that is not tested, it's easier to miss implicit dependencies and classes silently taking on multiple responsibilities.
• An always up-to-date documentation - While documentation will get out of date, unless you update it, tests that run and pass with the code will not get out of date. If you write clean tests, these tests can serve as evergreen documentation for the code. Unit testing provides a sort of living documentation of the system. Developers looking to learn what functionality is provided by a unit, and how to use it, can look at the unit tests to gain a basic understanding of the unit's interface (API).
• Fewer regressions - Unit tests do a remarkable job of catching regressions as long as they are in the codebase.
• A safety net for refactoring - While unit tests can catch regressions for small changes, they shine with large refactoring. On codebases with high test coverage and good tests, refactoring is much higher confidence work. Unit testing allows the programmer to refactor code or upgrade system libraries at a later date, and make sure the module still works correctly (e.g., in regression testing). The procedure is to write test cases for all functions and methods so that whenever a change causes a fault, it can be quickly identified. Unit tests detect changes which may break a design contract.

Here are the limitations or disadvantages of using unit testing:[29]

• With unit testing, you have to increase the amount of code that needs to be written. You usually have to write one or more unit tests depending on how complex things are. It is suggested to have at least three so you don’t just get a yes and a no that contradicts each other. For every line of code written, programmers often need 3 to 5 lines of test code. This obviously takes time and its investment may not be worth the effort. While the test code should be fairly simple, this testing method is still more work and more code which means more hours and more cost.
• Unit testing cannot and will not catch all errors in a program. There is no way it can test every execution path or find integration errors and full system issues. Unit testing should be done in conjunction with other software testing activities, as they can only show the presence or absence of particular errors; they cannot prove a complete absence of errors. To guarantee correct behavior for every execution path and every possible input, and ensure the absence of errors, other techniques are required, namely the application of formal methods to proving that a software component has no unexpected behavior.
• Unit tests have to be realistic. You want the unit you’re testing to act as it will as part of the full system. If this doesn’t happen, the test value and accuracy are compromised.

Test-driven Development

Test-Driven Development (TDD) uses tests as a way to design code by creating the test first before any actual production code is written. You then try to make the test pass by creating production code that fulfills the test. This is usually a five-step process:

• Write a test (some also call this a specification).
• Run the test and show that it fails. (red)
• Write the smallest amount of production code possible that meets the needs of the test.
• Run the test until it passes. (green)
• Refactor.

This process is sometimes called red-green-refactor. Red symbolizes fail and green represent pass. [[30]]

Key Terms

Acceptance Testing - Checks the requirements of a specification or contract are met as per its delivery.[31]

Alpha Testing - Tests performed near the end of development focused on simulating real user experience.

Assertion - A predicate…connected to a point in the program, that always should evaluate to true at that point in code execution.[32]

Beta Testing - Performed by "real users" of the software application in "real environment"[33]

Buddy Testing - Two buddies mutually work on identifying defects in the same module. Mostly one buddy will be from development team and another person will be from testing team. Buddy testing helps the testers develop better test cases and development team can also make design changes early. This testing usually happens after Unit Testing completion [34]

Coverage - A measure used to describe the degree to which the source code of a program is executed when a particular test suite runs.[35]

Integration Testing - Checks the data flow from one module to other modules.[31]

Manual Testing - Type of software testing in which test cases are executed manually by a tester without using any automated tools.[36]

Pytest - A testing framework that allows users to write test codes using Python programming language.[37]

Pytest Coverage - A coverage plugin for PyTest used for measuring code coverage of Python programs.

Refactoring - To restructure existing code.

Regression Testing - Re-running functional and non-functional tests to ensure that previously developed and tested software still performs after a change [38]

System Testing - Evaluates both functional and non-functional needs for the testing.[31]

Unit Testing - Checks if software components are fulfilling functionalities or not.[31]

References

1. a b Graham, D.; Van Veenendaal, E.; Evans, I. (2008). Foundations of Software Testing. Cengage Learning. pp. 57–58. ISBN 9781844809899.
2. a b c d Oberkampf, W.L.; Roy, C.J. (2010). Verification and Validation in Scientific Computing. Cambridge University Press. pp. 154–5. ISBN 9781139491761.
3. Lee, D.; Netravali, A.N.; Sabnani, K.K.; Sugla, B.; John, A. (1997). "Passive testing and applications to network management". Proceedings 1997 International Conference on Network Protocols. IEEE Comput. Soc: 113–122. doi:10.1109/icnp.1997.643699. ISBN 081868061X. S2CID 42596126.
4. a b Cem Kaner (2008), A Tutorial in Exploratory Testing (PDF)
5. a b c d Limaye, M.G. (2009). Software Testing. Tata McGraw-Hill Education. pp. 108–11. ISBN 9780070139909.
6. a b c d Saleh, K.A. (2009). Software Engineering. J. Ross Publishing. pp. 224–41. ISBN 9781932159943.
7. a b c Ammann, P.; Offutt, J. (2016). Introduction to Software Testing. Cambridge University Press. p. 26. ISBN 9781316773123.
8. Everatt, G.D.; McLeod Jr., R. (2007). "Chapter 7: Functional Testing". Software Testing: Testing Across the Entire Software Development Life Cycle. John Wiley & Sons. pp. 99–121. ISBN 9780470146347.
9. a b Cornett, Steve (c. 1996). "Code Coverage Analysis". Bullseye Testing Technology. Introduction. Retrieved November 21, 2017.
10. a b Black, R. (2011). Pragmatic Software Testing: Becoming an Effective and Efficient Test Professional. John Wiley & Sons. pp. 44–6. ISBN 9781118079386.
11. As a simple example, the C function `int f(int x){return x*x-6*x+8;}` consists of only one statement. All tests against a specification `f(x)>=0` will succeed, except if `x=3` happens to be chosen.
12. Vera-Pérez, Oscar Luis; Danglot, Benjamin; Monperrus, Martin; Baudry, Benoit (2018). "A comprehensive study of pseudo-tested methods". Empirical Software Engineering. 24 (3): 1195–1225. arXiv:1807.05030. Bibcode:2018arXiv180705030V. doi:10.1007/s10664-018-9653-2. S2CID 49744829.
13. Patton, Ron (2005). Software Testing (2nd ed.). Indianapolis: Sams Publishing. ISBN 978-0672327988.
14. Laycock, Gilbert T. (1993). The Theory and Practice of Specification Based Software Testing (PDF) (dissertation). Department of Computer Science, University of Sheffield. Retrieved January 2, 2018.
15. Bach, James (June 1999). "Risk and Requirements-Based Testing" (PDF). Computer. 32 (6): 113–114. Retrieved August 19, 2008.
16. Savenkov, Roman (2008). How to Become a Software Tester. Roman Savenkov Consulting. p. 159. ISBN 978-0-615-23372-7.
17. Mathur, A.P. (2011). Foundations of Software Testing. Pearson Education India. p. 63. ISBN 9788131759080.
18. a b Clapp, Judith A. (1995). Software Quality Control, Error Analysis, and Testing. p. 313. ISBN 978-0815513636. Retrieved January 5, 2018.
19. Mathur, Aditya P. (2007). Foundations of Software Testing. Pearson Education India. p. 18. ISBN 978-8131716601.
20. Lönnberg, Jan (October 7, 2003). Visual testing of software (PDF) (MSc). Helsinki University of Technology. Retrieved January 13, 2012.
21. Chima, Raspal. "Visual testing". TEST Magazine. Archived from the original on July 24, 2012. Retrieved January 13, 2012.
22. a b c Lewis, W.E. (2016). Software Testing and Continuous Quality Improvement (3rd ed.). CRC Press. pp. 68–73. ISBN 9781439834367.
23. a b Ransome, J.; Misra, A. (2013). Core Software Security: Security at the Source. CRC Press. pp. 140–3. ISBN 9781466560956.
24. "SOA Testing Tools for Black, White and Gray Box" (white paper). Crosscheck Networks. Archived from the original on 1 October 2018. Retrieved December 10, 2012.
25. https://www.guru99.com/levels-of-testing.html
26. https://www.guru99.com/software-testing-techniques.html
27. https://blog.pragmaticengineer.com/unit-testing-benefits-pyramid/
28. Unit Testing
30. https://testguild.com/unit-tdd-and-bdd-testing-whats-the-difference/
31. a b c d Guru99 - Levels of Testing
32. w:Assertion_(software_development)
33. Guru99 - Alpha/Beta Testing
35. w:Code_coverage
36. Guru99 - Manual testing
37. Guru99 - PyTest
38. w:Regression_testing

Strings

String

String Functions

String functions are used in computer programming languages to manipulate a string or query information about a string (some do both).

Most programming languages that have a string datatype will have some string functions although there may be other low-level ways within each language to handle strings directly. In object-oriented languages, string functions are often implemented as properties and methods of string objects. In functional and list-based languages a string is represented as a list (of character codes), therefore all list-manipulation procedures could be considered string functions. However such languages may implement a subset of explicit string-specific functions as well.

For function that manipulate strings, modern object-oriented languages, like C# and Java have immutable strings and return a copy (in newly allocated dynamic memory), while others, like C manipulate the original string unless the programmer copies data to a new string. See for example Concatenation below.

The most basic example of a string function is the `length(string)` function. This function returns the length of a string literal.

e.g. `length("hello world")` would return 11.

Other languages may have string functions with similar or exactly the same syntax or parameters or outcomes. For example, in many languages the length function is usually represented as len(string).

Run-length encoding

Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs. Consider, for example, simple graphic images such as icons, line drawings, Conway's Game of Life, and animations. It is not useful with files that don't have many runs as it could greatly increase the file size.

RLE may also be used to refer to an early graphics file format supported by CompuServe for compressing black and white images, but was widely supplanted by their later Graphics Interchange Format (GIF). RLE also refers to a little-used image format in Windows 3.x, with the extension rle, which is a Run Length Encoded Bitmap, used to compress the Windows 3.x startup screen.

Example

For example, consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:

```   WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
```

With a run-length encoding (RLE) data compression algorithm applied to the above hypothetical scan line, it can be rendered as follows:

```   12W1B12W3B24W1B14W
```

This can be interpreted as a sequence of twelve Ws, one B, twelve Ws, three Bs, etc.,

The run-length code represents the original 67 characters in only 18. While the actual format used for the storage of images is generally binary rather than ASCII characters like this, the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as DEFLATE often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as BWWBWWBWWBWW).

Run-length encoding can be expressed in multiple ways to accommodate data properties as well as additional compression algorithms. For instance, one popular method encodes run lengths for runs of two or more characters only, using an "escape" symbol to identify runs, or using the character itself as the escape, so that any time a character appears twice it denotes a run. On the previous example, this would give the following:

```   WW12BWW12BB3WW24BWW14
```

This would be interpreted as a run of twelve Ws, a B, a run of twelve Ws, a run of three Bs, etc. In data where runs are less frequent, this can significantly improve the compression rate.

One other matter is the application of additional compression algorithms. Even with the runs extracted, the frequencies of different characters may be large, allowing for further compression; however, if the run lengths are written in the file in the locations where the runs occurred, the presence of these numbers interrupts the normal flow and makes it harder to compress. To overcome this, some run-length encoders separate the data and escape symbols from the run lengths, so that the two can be handled independently. For the example data, this would result in two outputs, the string "WWBWWBBWWBWW" and the numbers (12,12,3,24,14).

Escape sequence

What is it?

The programmers refer to the “backslash (\)” character as an escape character. In other words, it has a special meaning when we use it inside the strings. As the name suggests, the escape character escapes the characters in a string for a brief moment to introduce unique inclusion. That is to say; backlash signifies that the next character after it has a different meaning.[1]

Examples in Python

Single quotes

```string = 'That\'s my bag.'
print(string)
```

Output:

```That's my bag.
```

This example used (\') to print the single-quote in the string.

Double quotes

``` string = "\"Python\""
print(string)
```

Output:

```"Python"
```

This example used (\") to remove the backslash and put the quote in the string.

Newline character

``` string = 'applied \nprogramming'
print(string)
```

Output:

```applied
programming
```

A newline character is used to write the words in a new separate line.

Backslash

``` string = 'applied\\ programming'
print(string)
```

Output:

```applied\ programming
```

This example prints a single backslash.

Space

``` string = 'applied\tprogramming'
print(string)
```

Output:

```applied programming
```

This example adds a space between the words.

Backspace

``` string = 'applied \bprogramming'
print(string)
```

Output:

```appliedprogramming
```

This example used "\b" to remove the space between the words.

Hexa value

``` string = "\x50\x59\x54\x48\x4f\x4E"
print(string)
```

Output:

```PYTHON
```

This example used \xhh to convert hexa values to a string.

``` string = "Nancy said \x22Hello World!\x22 to the crowd."
print(string)
```

Output:

```Nancy said "Hello World!" to the crowd.
```

This example uses "\x" to indicate the following two characters are hexadecimal digits, "22" being the ASCII value for a double-quote in hexadecimal.

Octal value

``` string = "\120\131\124\110\117\116"
print(string)
```

Output:

```PYTHON
```

This example used \ooo to convert the octal value into a normal string.

Key terms

Concatenation - The joining of two character strings. Also referred to as as ‘concat’.

Control Characters - Used to perform actions rather than to display a printable character on screen. Easily understood examples include 'Escape', 'Backspace' and 'Delete'. [4]

Escape Sequence - A combination of characters that has a meaning other than the literal characters contained therein.[5]

Fixed Length String - A string with a pre-determined, static length.

Iteration - The repetition of a process in order to generate an outcome.[6]

Prefix - A string A = a1, a2, …an has a prefix Â = a1, a2, … am when m ≤ n. A proper prefix of the string A would not be equal to itself (0 ≤ m < n). [7]

Run-Length Encoding (RLE) - a Form of data compression in which a stream of data is given as the input (i.e. "AAABBCCCC") and the output is a sequence of counts of consecutive data values in a row (i.e. "3A2B4C").[8]

String - An array of characters typically surrounded by quotation marks.

String Literal - A type of literal in programming for the representation of a string value within the source code of a computer program.[9]

Substring - Occurs when one string is a prefix of a suffix of an original string, and equivalently a suffix of a prefix.[7]

Suffix - Any substring of an original string that includes the original string’s last letter, including itself. A proper suffix of a string is not equal to/the same as the string original string itself.[7]

Variable Length String - A string where the length can vary and is often determined by user input.

Files

Computer Files

What is it?

When you use your computer to create things, those things are stored in units of information called files. A computer file can be a document you write with your word processor. A computer file can also be a graphical image from a digital camera or an image you create with a digital paintbrush, a piece of music, a video, or just about anything. Whatever it is, the computer stores that information as a file.[1]

File sizes

Since data storage on computers is still limited, file sizes still matter. File sizes are always measured in bytes. A byte is a sequence of 8 bits (and remember, a bit is the smallest piece of digital information, 000 or 111). A single byte is enough bits to represent 256 numbers, because 28 = 2562. That also means a byte is big enough to represent a single letter in the ASCII encoding standard. Larger files are referred to in larger byte sizes. Terms include kilobyte (1000 bytes), megabyte (10002 bytes), gigabyte (10003 bytes), terabyte (10004 bytes), petabyte (10005 bytes).[2]

File types

Computers store all files as binary data, long strings of 111s and 000s. Files represent all different types of data, however—like images, videos, documents, text files, and spreadsheets. Even applications are files. Each file has a type/kind/format, which is often reflected in its file extension.[3] Each individual file in Windows (and most other operating systems) will also have a file attribute which sets a condition to the specific file. For example, you can't write new information to a file that has the read-only attribute turned on. A filename is just the name that a user or program titles the file to help identify what it is. An image file may be named something like kids-lake-2017.jpg. The name itself doesn't affect the contents of the file, so even if a video file is named something like image.mp4, it doesn't mean it's suddenly a picture file. Files in any operating system are stored on hard drives, optical drives, and other storage devices. The specific way a file is stored and organized is referred to as a file system, which starts with the root directory and then continues to countless subdirectories or folders.[4]

File operations

Include:[5]

• Create a new file
• Change the access permissions and attributes of a file
• Open a file, which makes the file contents available to the program
• Read data from a file
• Write data to a file
• Delete a file
• Close a file, terminating the association between it and the program
• Truncate a file, shortening it to a specified size within the file system without rewriting any content
• Updating contents to an existing file
• Searching data on a file

File Systems

In computing, a file system or filesystem (often abbreviated to fs) controls how data is stored and retrieved. Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stops and the next begins. By separating the data into pieces and giving each piece a name, the data is easily isolated and identified. Taking its name from the way paper-based data management system is named, each group of data is called a "file." The structure and logic rules used to manage the groups of data and their names is called a "file system."[6]

A file system functions in much the same way a book's table of contents does. It is a way for the computer to refer to the location of data on the storage device. Where a book may refer to a chapter and range of pages, a file system does the same with directories and folders. A filename consists of a sequence of characters followed by an extension type, such as myfile.txt. If we were to continue the book analogy, a filename would be the title of the chapter, and the directory would be the chapter number or page range. Directories can structured so that they are flat/linear or hierarchical; meaning they can be either one long list of files, or stored within many different folders and sub-folders.

Common file systems include FAT32 and NTFS on Windows platforms and APFS on macOS.

Text Files

What are they?[7]

A text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. There are for most text files a need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records.

Binary Files

What are they?[8]

A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files containing formatted text, such as older Microsoft Word document files, contain the text of the document but also contain formatting information in binary form.

Binary files contain formatting information that only certain applications or processors can understand. While humans can read text files, binary files must be run on the appropriate software or processor before humans can read them. For example, only Microsoft Word and certain other word processing programs can interpret the formatting information in a Word document. Executable files, compiled programs, SAS and SPSS system files, spreadsheets, compressed files, and graphic (image) files are all examples of binary files.[9]

Reading and Writing to files in C#

The StreamReader and StreamWriter tools are used in C# for reading and writing to files. The declaration “using System.IO;” must be included in the heading of the program for the classes to be accessible, and the StreamReader or StreamWriter object is created with the keyword “new”. For example, the below code will use a StreamReader object that we’ll name sr to read and display a file called “filename.txt”:

```    using System;
using System.IO;

public class MyProgram
{
public static void Main(string[] Args)
{
/*the StreamReader object takes different commands to read or write to the file.*/
Console.WriteLine(line);
}
}
```

Reading and Writing to files in Python

The same program can be performed in Python with the following code.

```    file = open("filename.txt")
for line in text:
print(line)
file.close()
```

Key Terms

Absolute Path - Contains the root element and the complete directory list required to locate the file.[10]

Binary File - A computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file".[11]

Directory structure - The way an operating system's file system and its files are displayed to the user.[12]

Directory - A location for storing files on your computer. Directories are found in a hierarchical file system, such as Linux, MS-DOS, OS/2, and Unix.[13]

File System - Alternatively referred to as file management or FS, a file system is a method of organizing and retrieving files from a storage medium.[14]

File Utilities - Allow users to create, list, copy, move and delete files, and alter metadata.[15]

Fully-Qualified File Name - A string that uniquely identifies a file stored on the computer by including the path, name, and extension of the file.[12]

Metadata - Information that is typically associated with each file within a file system. File systems might store the file creation time, the time it was last accessed, etc.[15]

Parent and Children - A parent directory houses "children" files or subdirectories. A child is a file or subdirectory housed in a parent directory.[16]

Path - Complete location or name of where a computer, file, device, or web page is located.[17]

Relative Path - Needs to be combined with another path in order to access a file.[10]

Root - The highest-level directory in the file system hierarchy, found in UNIX-like operating systems.[18]

Text File - A file that is structured as a sequence of lines of electronic text, it exists stored as data within a computer file system.[19]

Lists and Tuples

Data Structure

What is it?

A way in which data are stored for efficient search and retrieval. Different data structures are suited for different problems. Some data structures are useful for simple general problems, such as retrieving data that has been stored with a specific identifier. For example, an online dictionary can be structured so that it can retrieve the definition of a word. On the other hand, specialized data structures have been devised to solve complex specific search problems.[1]

Looking at basic examples is an effective way to understand data structures. For example, a very basic example of a data structure is an array, in which multiple data bits are coordinated into a group sharing a common label. This helps programs call these data bits or perform other work on the data set as a whole. Another example of a data structure is a stack, which places data units in relative hierarchies, allowing code functions to work on the data in coordinated ways, such as pushing a new data unit into a stack, or popping a data unit from the top of a stack.[2]

In a general sense, the data structure concept dovetails with that of virtual objects and virtual reality. As data is more elaborately arranged by developers and others, the data becomes more functional, allowing the emergence of a virtual reality. This is a core concept of many technological advances from the last few decades.[2]

Array Data Structure

What is it?

In computer science, an array data structure, or simply an array, is a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key. An array is stored such that the position of each element can be computed from its index tuple by a mathematical formula. The simplest type of data structure is a linear array, also called one-dimensional array. Arrays are among the oldest and most important data structures, and are used by almost every program. They are also used to implement many other data structures, such as lists and strings. Arrays are useful mostly because the element indices can be computed at run time. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array. For that reason, the elements of an array data structure are required to have the same size and should use the same data representation.[3]

Indexing

Zero-based

Index starts with 0. Array length is equal to the number of elements the array can store. Zero-based arrays are zero-based because of math-related reasons. Not for hardware-related reasons.[4]

One-based

Index starts with 1.

N-based

The base index of an array can be freely chosen. Usually programming languages allowing n-based indexing also allow negative index values and other scalar data types like enumerations, or characters may be used as an array index.

Dimensions

One-dimensional

A one-dimensional array (or single dimension array) is a type of linear array. Accessing its elements involves a single subscript which can either represent a row or column index.

```days_week = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
print(days_week[1])
```

Output:

```Tues
```

Two-dimensional

Two dimensional array is an array within an array. It is an array of arrays. In this type of array the position of an data element is referred by two indices instead of one. So it represents a table with rows and columns of data.[5]

``` random_array = [[11, 12, 5, 2], [15, 6,10], [10, 8, 12, 5], [12, 15, 8, 6]]
print(random_array[0])
print(random_array[1][2])
```

Output:

```[11, 12, 5, 2]
10
```

Tuples

A tuple is a data structure that is an immutable, or unchangeable, ordered sequence of elements. Because tuples are immutable, their values cannot be modified.

Cannot modify

[6]

``` t = ('a', 'b', 'c', 'd', 'e')
t[0] = 'A'
print(t)
```

Output:

```TypeError: 'tuple' object does not support item assignment
```

As a general rule, tuples use less memory than a list or array.

Tuples allows for slicing and indexing like lists, but do not have methods for deleting or changing any elements.

Indexing

``` t = ('a', 'b', 'c', 'd', 'e')
print(t[1])
print(t[-1])
```

Output:

```b
e
```

Slicing

``` t = ('a', 'b', 'c', 'd', 'e')
print(t[1:])
print(t[:2])
print(t[1:3])
print(t[::2]
```

Output:

```('b', 'c', 'd', 'e')
('a', 'b')
('b', 'c')
('a', 'c', 'e')
```

Activities

1) Create a program that compares two lists and returns "True" if those lists have at least one common element.

2) Create a program that compares two lists and identifies common elements. Display the index of these common elements for each list.

3) Create a program that reads a list and removes any consecutive and repeating elements. For example, ["A", "B", "C", "C", "D", "E", "E", "G", "E", "F"] would result in ["A", "B", "C", "D", "E", "G", "E", "F"].

4) Create a program that combines two lists into a single list by alternating elements from each list. For example ["A", "B", "C"] + ["1", "2", "3"] would result in ["A", "1", "B", "2", "C", "3"].

Key Terms

Array (list) - A data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key.[7]

Data structure - A collection of data values, the relationships among them, and the functions or operations that can be applied to the data.[8]

Field - A particular piece of data encapsulated within a class or object.[9]

Hash table -A type of data structure that stores key-value pairs. The key is sent to a hash function that performs arithmetic operations on it. The result (commonly called the hash value or hash) is the index of the key-value pair in the hash table.[10]

Heap - A specialized tree-based data structure that satisfies the heap property: if P is a parent node of C, then the key (the value) of P is either greater than or equal to (in a max heap) or less than or equal to (in a min heap) the key of C.[11]

Index (key) - A numerical representation of an item's position in a sequence.[12]

Linked list - A collection of elements called nodes, where each node has a value and points to the next node in the list (and sometimes the previous).[8]

Member - A single datum of a record; for example, the 'Name' field of a 'Person' record.[13]

Record (struct) - A structure used to collect multiple variables, often of different types stored as fields.[13]

Tagged union - A union that contains one additional field indicating the current type for enhanced type safety.[8]

Tuple - Similar to an array or list with the exception that they are immutable and enclosed in parentheses.

Union - A data structure where a number of primitive types may be stored in concert, similar to a struct or a record.[8]

Dictionaries and Sets

Data Structure

In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification.[1][2][3] More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.[4]

Data structures provide a means to manage large amounts of data efficiently for uses such as large databases and internet indexing services. Usually, efficient data structures are key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design. Data structures can be used to organize the storage and retrieval of information stored in both main memory and secondary memory.[5]

Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by a pointer—a bit string, representing a memory address, that can be itself stored in memory and manipulated by the program. Thus, the array and record data structures are based on computing the addresses of data items with arithmetic operations, while the linked data structures are based on storing addresses of data items within the structure itself.

The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure. The efficiency of a data structure cannot be analyzed separately from those operations. This observation motivates the theoretical concept of an abstract data type, a data structure that is defined indirectly by the operations that may be performed on it, and the mathematical properties of those operations (including their space and time cost).[6]

There are numerous types of data structures, generally built upon simpler primitive data types:[7]

• An array is a number of elements in a specific order, typically all of the same type (depending on the language, individual elements may either all be forced to be the same type, or may be of almost any type). Elements are accessed using an integer index to specify which element is required. Typical implementations allocate contiguous memory words for the elements of arrays (but this is not always a necessity). Arrays may be fixed-length or resizable.
• A linked list (also just called list) is a linear collection of data elements of any type, called nodes, where each node has itself a value, and points to the next node in the linked list. The principal advantage of a linked list over an array is that values can always be efficiently inserted and removed without relocating the rest of the list. Certain other operations, such as random access to a certain element, are however slower on lists than on arrays.
• A record (also called tuple or struct) is an aggregate data structure. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
• A union is a data structure that specifies which of a number of permitted primitive types may be stored in its instances, e.g. float or long integer. Contrast with a record, which could be defined to contain a float and an integer; whereas in a union, there is only one value at a time. Enough space is allocated to contain the widest member datatype.
• A tagged union (also called variant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety.
• An object is a data structure that contains data fields, like a record does, as well as various methods which operate on the data contents. An object is an in-memory instance of a class from a taxonomy. In the context of object-oriented programming, records are known as plain old data structures to distinguish them from objects.[8][9]

Dictionary (Data Structure)

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.

Operations associated with this data type allow:[10][11]

• the addition of a pair to the collection;
• the removal of a pair from the collection;
• the modification of an existing pair;
• the lookup of a value associated with a particular key.

Implementing associative arrays poses the dictionary problem, a classic computer science problem: the task of designing a data structure that maintains a set of data during 'search', 'delete', and 'insert' operations.[12] The two major solutions to the dictionary problem are a hash table and a search tree.[10][11][13][14] In some cases it is also possible to solve the problem using directly addressed arrays, binary search trees, or other more specialized structures.

Many programming languages include associative arrays as primitive data types, and they are available in software libraries for many others. Content-addressable memory is a form of direct hardware-level support for associative arrays.

Associative arrays have many applications including such fundamental programming patterns as memoization and the decorator pattern.[15]

The name does not come from the associative property known in mathematics. Rather, it arises from the fact that we associate values with keys.

In an associative array, the association between a key and a value is often known as a "mapping", and the same word mapping may also be used to refer to the process of creating a new association.

The operations that are usually defined for an associative array are:[10][11]

• Add or insert: add a new ${\displaystyle (key,value)}$  pair to the collection, mapping the new key to its new value. The arguments to this operation are the key and the value.
• Reassign: replace the value in one of the ${\displaystyle (key,value)}$  pairs that are already in the collection, mapping an existing key to a new value. As with an insertion, the arguments to this operation are the key and the value.
• Remove or delete: remove a ${\displaystyle (key,value)}$  pair from the collection, unmapping a given key from its value. The argument to this operation is the key.
• Lookup: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some associative array implementations raise an exception, while others create a pair with the given key and the default value of the value type (zero, empty container...).

Often then instead of add or reassign there is a single set operation that adds a new ${\displaystyle (key,value)}$  pair if one does not already exist, and otherwise reassigns it.

In addition, associative arrays may also include other operations such as determining the number of mappings or constructing an iterator to loop over all the mappings. Usually, for such an operation, the order in which the mappings are returned may be implementation-defined.

A multimap generalizes an associative array by allowing multiple values to be associated with a single key.[16] A bidirectional map is a related abstract data type in which the mappings operate in both directions: each value must be associated with a unique key, and a second lookup operation takes a value as an argument and looks up the key associated with that value.

Suppose that the set of loans made by a library is represented in a data structure. Each book in a library may be checked out only by a single library patron at a time. However, a single patron may be able to check out multiple books. Therefore, the information about which books are checked out to which patrons may be represented by an associative array, in which the books are the keys and the patrons are the values. Using notation from Python or JSON, the data structure would be:

```{
"Pride and Prejudice": "Alice",
"Wuthering Heights": "Alice",
"Great Expectations": "John"
}
```

A lookup operation on the key "Great Expectations" would return "John". If John returns his book, that would cause a deletion operation, and if Pat checks out a book, that would cause an insertion operation, leading to a different state:

```{
"Pride and Prejudice": "Alice",
"The Brothers Karamazov": "Pat",
"Wuthering Heights": "Alice"
}
```

The basic definition of the dictionary does not mandate an order. To guarantee a fixed order of enumeration, ordered versions of the associative array are often used. There are two senses of an ordered dictionary:

• The order of enumeration is always deterministic for a given set of keys by sorting. This is the case for tree-based implementations, one representative being the `<map>` container of C++.
• The order of enumeration is key-independent and is instead based on the order of insertion. This is the case for the "ordered dictionary" in .NET Framework and Python.

The latter sense of ordered dictionaries are more commonly encountered. They can be implemented using an association list, or by overlaying a doubly linked list on top of a normal dictionary. The latter approach, as used by CPython before version 3.6, has the advantage of keeping the potentially better complexity of another implementation. CPython 3.6+ makes dictionaries ordered by splitting the hash table into an insertion-ordered array of k-v pairs and a sparse array ("hash table") of indices.[17]

Key-Value Database

What is it?

A key-value database is a type of nonrelational database that uses a simple key-value method to store data. A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier. Both keys and values can be anything, ranging from simple objects to complex compound objects. Key-value databases are highly partitionable and allow horizontal scaling at scales that other types of databases cannot achieve.[18]

Compared to relational databases?

Relational databases organize data using tables. Tables are structures that impose a schema on the records that they hold. Each column within a table has a name and a data type. Each row represents an individual record or data item within the table, which contains values for each of the columns. Relational databases get their name from mathematical relationships that use tuples (like the rows in a table) to represent ordered sets of data.[19]

What does this mean?

Key–value databases work in a very different fashion from the better known relational databases (RDB). RDBs predefine the data structure in the database as a series of tables containing fields with well defined data types. Exposing the data types to the database program allows it to apply a number of optimizations. In contrast, key–value systems treat the data as a single opaque collection, which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming. Because optional values are not represented by placeholders or input parameters, as in most RDBs, key–value databases often use far less memory to store the same database, which can lead to large performance gains in certain workloads.[20]

Activities

1) Write a program that creates a dictionary where the keys are even integers between 2 and 20 and the values are the corresponding key multiplied by the odd integer before it.

Example output: {2: 2, 4: 12, 6: 30, 8: 56...}

2) Write a program that takes two lists and merges them into a single dictionary.

Example:

```list_1 = [5, 10, 15, 20]

list_2 = ['a', 'b', 'c', 'd']
```

Result: {5: 'a', 10: 'b', 15: 'c', 20: 'd'}

3) Write a program that searches for the minimum value and returns the corresponding key.

Example:

```prices = {'Popcorn': 1.50, 'Soda': 1.00, 'Hotdog': 1.25, 'Nachos': 0.75}
```

Result: 'Nachos'

Key Terms

Dictionary / Associative array / Hash / Map - An abstract data type that can hold data in (key, value) pairs.[21]

Frozen Set - A native data type in Python that have the qualities of sets — including class methods — but are immutable like tuples.[22]

Hash collision - Occur when two entries are generated using the same index key. This has a high chance of occurring when hashing a random subset of a large set of possible keys.[23]

Hash table - A type of data structure that stores key-value pairs. The key is sent to a hash function that performs arithmetic operations on it. The result (commonly called the hash value or hash) is the index of the key-value pair in the hash table.[24]

Key - A field, or combination of fields, in a database table used to retrieve and sort rows in the table based on certain requirements. Keys are defined to speed up access to data and, in many cases, to create links between different tables.[25]

Load factor - A critical statistic for hash tables that references how quickly the hash table can be accessed. A higher load factor indicates the table may become slower or fail to work at all.

Merge algorithm - A family of algorithms that take multiple sorted lists as input and produce a single list as output, containing all the elements of the inputs lists in sorted order. These algorithms are used as subroutines in various sorting algorithms, most famously merge sort.[26]

Multimap - A generalization of a map or associative array abstract data type in which more than one value may be associated with and returned for a given key.[27]

Multiset - A type of associative containers similar to set, with an exception that multiple elements can have same values.[28]

Serialization - Produces a text or binary representation of the original objects that can be written directly to a file and offers a solution to use associative arrays in permanent form.[29]

Set - An unordered, mutable data type where each value must be unique.[22]

References

1. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press. ISBN 978-0262033848.
2. Black, Paul E. (15 December 2004). "data structure". In Pieterse, Vreda; Black, Paul E. (eds.). Dictionary of Algorithms and Data Structures [online]. National Institute of Standards and Technology. Retrieved 2018-11-06.
3. "Data structure". Encyclopaedia Britannica. 17 April 2017. Retrieved 2018-11-06.
4. Wegner, Peter; Reilly, Edwin D. (2003-08-29). Encyclopedia of Computer Science. Chichester, UK: John Wiley and Sons. pp. 507–512. ISBN 978-0470864128.
5. "When data is too big to fit into the main memory". homes.sice.indiana.edu.
6. Dubey, R. C. (2014). Advanced biotechnology : For B Sc and M Sc students of biotechnology and other biological sciences. New Delhi: S Chand. ISBN 978-81-219-4290-4. OCLC 883695533.
7. Seymour, Lipschutz (2014). Data structures (Revised first ed.). New Delhi, India: McGraw Hill Education. ISBN 9781259029967. OCLC 927793728.
8. Walter E. Brown (September 29, 1999). "C++ Language Note: POD Types". Fermi National Accelerator Laboratory. Archived from the original on 2016-12-03. Retrieved 6 December 2016.
9. https://en.wikipedia.org/wiki/Data_structure
10. a b c Goodrich, Michael T.; Tamassia, Roberto (2006), "9.1 The Map Abstract Data Type", Data Structures & Algorithms in Java (4th ed.), Wiley, pp. 368–371
11. a b c Mehlhorn, Kurt; Sanders, Peter (2008), "4 Hash Tables and Associative Arrays", Algorithms and Data Structures: The Basic Toolbox (PDF), Springer, pp. 81–98
12. Andersson, Arne (1989). "Optimal Bounds on the Dictionary Problem". Proc. Symposium on Optimal Algorithms. Lecture Notes in Computer Science. Springer Verlag. 401: 106–114. doi:10.1007/3-540-51859-2_10. ISBN 978-3-540-51859-4.
13. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), "11 Hash Tables", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 221–252, ISBN 0-262-03293-7.
14. Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. "Dynamic Perfect Hashing: Upper and Lower Bounds" . SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370 doi:10.1137/S0097539791194094
15. Goodrich & Tamassia (2006), pp. 597–599.
16. Goodrich & Tamassia (2006), pp. 389–397.
17. https://en.wikipedia.org/wiki/Associative_array
18. https://www.xspdf.com/resolution/21031178.html
20. Key-value database
21. https://brilliant.org/wiki/associative-arrays/
22. a b https://betterprogramming.pub/what-are-frozen-sets-in-python-88f8a15a28dc
23. https://en.wikipedia.org/wiki/Hash_table
24. https://www.educative.io/edpresso/what-is-a-hash-table
25. https://www.techopedia.com/definition/1780/key
26. https://en.wikipedia.org/wiki/Merge_algorithm
27. https://en.wikipedia.org/wiki/Multimap
28. https://www.geeksforgeeks.org/multiset-in-cpp-stl/
29. https://en.wikipedia.org/wiki/Associative_array

RegEx

Regular Expression

In programming, regular expressions refers to a series of characters that can be used to specify a search function. They allow for robust and powerful ways of searching through a plethora of data without creating redundant or repetitive lines of code. Regular expressions, or regex(es) for short, behave like a ctrl+f for coding, allowing us to easily retrieve data patterns such as email addresses, domain names, phone numbers; virtually any data that we can attribute a pattern to.

Regex characters consist of a combination of special metacharacters, and certain standard characters used to specify what elements of the string to match.

Metacharacter

A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine.

Characters[1]

• []

A set of characters

``` import re

txt = "The rain in Spain"

#Find all lower case characters alphabetically between "a" and "m":

x = re.findall("[a-m]", txt)
print(x)
```

Output:

```['h', 'e', 'a', 'i', 'i', 'a', 'i']
```
• \

Signals a special sequence (can also be used to escape special characters)

``` import re

txt = "That will be 59 dollars"

#Find all digit characters:

x = re.findall("\d", txt)
print(x)
```

Output:

```['5', '9']
```
• .

Any character (except newline character)

``` import re

txt = "hello world"

#Search for a sequence that starts with "he", followed by two (any) characters, and an "o":

x = re.findall("he..o", txt)
print(x)
```

Output:

```['hello']
```
• ^

Starts with

``` import re

txt = "hello world"

#Check if the string starts with 'hello':

x = re.findall("^hello", txt)
if x:
print("Yes, the string starts with 'hello'")
else:
print("No match")
```

Output:

```Yes, the string starts with 'hello'
```
• \$

Ends with

``` import re

txt = "hello world"

#Check if the string ends with 'world':

x = re.findall("world\$", txt)
if x:
print("Yes, the string ends with 'world'")
else:
print("No match")
```

Output:

```Yes, the string ends with 'world'
```
• *

Zero or more occurrences

``` import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains "ai" followed by 0 or more "x" characters:

x = re.findall("aix*", txt)

print(x)

if x:
print("Yes, there is at least one match!")
else:
print("No match")
```

Output:

```['ai', 'ai', 'ai', 'ai']
Yes, there is at least one match!
```
• +

One or more occurrences

``` import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains "ai" followed by 1 or more "x" characters:

x = re.findall("aix+", txt)

print(x)

if x:
print("Yes, there is at least one match!")
else:
print("No match")
```

Output:

```[]
No match
```
• {}

Exactly the specified number of occurrences

``` import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains "a" followed by exactly two "l" characters:

x = re.findall("al{2}", txt)

print(x)

if x:
print("Yes, there is at least one match!")
else:
print("No match")
```

Output:

```['all']
Yes, there is at least one match!
```
• |

Either or

``` import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":

x = re.findall("falls|stays", txt)

print(x)

if x:
print("Yes, there is at least one match!")
else:
print("No match")
```

Output:

```['falls']
Yes, there is at least one match!
```
Other Examples[2]

Some other characters may have special meaning in some environments.

• In some Unix shells the semicolon (";") is a statement separator.
• In XML and HTML, the ampersand ("&") introduces an HTML entity. It also has special meaning in MS-DOS/Windows Command Prompt.
• In some Unix shells and MS-DOS/Windows Command Prompt, the less-than sign and greater-than sign ("<" and ">") are used for redirection and the grave accent/backquote ("`") is used for command substitution.
• In many programming languages, strings are delimited using quotes (" or '). In some cases, escape characters (and other methods) are used to avoid delimiter collision, e.g. "He said, \"Hello\"".
• In printf format strings, the percent sign ("%") is used to introduce format specifiers and must be escaped as "%%" to be interpreted literally. In SQL, the percent is used as a wildcard character.
• In SQL, the underscore ("_") is used to match any single character.

Greedy Algorithm

Greedy is an algorithmic paradigm that builds up a solution piece by piece, always choosing the next piece that offers the most obvious and immediate benefit. So the problems where choosing locally optimal also leads to global solution are best fit for Greedy.[3]

Here are the reasons for using the greedy approach:

• The greedy approach has a few tradeoffs, which may make it suitable for optimization.
• One prominent reason is to achieve the most feasible solution immediately. In the activity selection problem (Explained below), if more activities can be done before finishing the current activity, these activities can be performed within the same time.
• Another reason is to divide a problem recursively based on a condition, with no need to combine all the solutions.
• In the activity selection problem, the "recursive division" step is achieved by scanning a list of items only once and considering certain activities.[4]

1. It is quite easy to come up with a greedy algorithm (or even multiple greedy algorithms) for a problem.
2. Analyzing the run time for greedy algorithms will generally be much easier than for other techniques (like Divide and conquer). For the Divide and conquer technique, it is not clear whether the technique is fast or slow. This is because at each level of recursion the size of gets smaller and the number of sub-problems increases.
3. The difficult part is that for greedy algorithms you have to work much harder to understand correctness issues. Even with the correct algorithm, it is hard to prove why it is correct. Proving that a greedy algorithm is correct is more of an art than a science. It involves a lot of creativity.[5]

Activities

1) Write a program that uses Regex to parse through a text file containing individual e-mail addresses and sort them into a dictionary where the key is the e-mail address' handle and the value is the domain.

Example: A file containing 'johndoe@aol.com' and 'janedoh@yahoo.com' would result in ["johndoe": "aol", "janedoe": "yahoo"].

2) Write a program that uses Regex to find and replace all occurrences of vowels in a string and replaces them with a period. Only replace 'y' if it is the only vowel in the word.

Example: "The quick brown fox jumped over the lazy dog" would result in "Th. q..ck br.wn f.x j.mp.d .v.r th. l.zy d.g"

3) Write a program that uses Regex to convert a date in dd-mm-yyyy format to yy-mm-dd format.

Example: 06-04-1992 would result in 92-06-04.

4) Write a program that uses Regex to break down a URL and sort the information into a dictionary. The key should be the primary domain while the value should be a list of subpages.

Example: "https://www.wikipedia.org/wiki/Computer_programming" would result in ["wikipedia": ['wiki', 'Computer_programming']]

Key Terms

Approximate string-match algorithm - (also known as Fuzzy String Searching) searches for substrings of the input string. [6]

Escape character - A character used to bypass a symbol that may be a metacharacter and express the symbol literally.

Exact string-match algorithm - To find one, several, or all occurrences of a defined string (pattern) in a large string (text or sequences) such that each matching is perfect.[6]

Greedy algorithm - An algorithmic paradigm that builds up a solution piece by piece, always choosing the next piece that offers the most obvious and immediate benefit.[3]

Kleene star - A programming resource that offers outcomes related to the concatenation of a string set. Using the Kleene star, developers and others assess how to filter given results based on input.[7]

Metacharacter - A character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine.[8]

Monoid - A set equipped with an associative binary operation and an identity element.[9]

Pattern - A regular expression against which text is either matched or captured.[10]

Quantifier - A modifier that follows another regex token, enumerating the number of times you expect it to appear.[10]

RegEx processor - A processor that translates a regular expression in the above syntax into an internal representation which can be executed and matched against a string representing the text being searched.[10]

Regular expression - A regular expression (regex or regexp for short) is a special text string for describing a search pattern.[11]

Wildcard - A generic character that stands in for anything.[10]

Internet Data

HTML

What is it?[1]

HTML (HyperText Markup Language) is the most basic building block of the Web. It defines the meaning and structure of web content. Other technologies besides HTML are generally used to describe a web page's appearance/presentation (CSS) or functionality/behavior (JavaScript). "Hypertext" refers to links that connect web pages to one another, either within a single website or between websites. Links are a fundamental aspect of the Web. By uploading content to the Internet and linking it to pages created by other people, you become an active participant in the World Wide Web. HTML uses "markup" to annotate text, images, and other content for display in a Web browser. HTML markup includes special "elements" such as <head>, <title>, <body>, <header>, <footer>, <article>, <section>, <p>, <div>, <span>, <img>, <aside>, <audio>, <canvas>, <datalist>, <details>, <embed>, <nav>, <output>, <progress>, <video>, <ul>, <ol>, <li> and many others. An HTML element is set off from other text in a document by "tags", which consist of the element name surrounded by "<" and ">". The name of an element inside a tag is case insensitive. That is, it can be written in uppercase, lowercase, or a mixture. For example, the <title> tag can be written as <Title>, <TITLE>, or in any other way.

HTML markup[2]

Example:

``` <!DOCTYPE html>
<html>
<title>This is a title</title>
<body>
<div>
<p>Hello world!</p>
</div>
</body>
</html>
```

Explanation:[3]

• The <!DOCTYPE html> declaration defines that this document is an HTML5 document
• The <html> element is the root element of an HTML page
• The <title> element specifies a title for the HTML page (which is shown in the browser's title bar or in the page's tab)
• The <body> element defines the document's body, and is a container for all the visible contents, such as headings, paragraphs, images, hyperlinks, tables, lists, etc.
• The <div> element is used to group content and can be later styled by CSS
• The <p> element defines a paragraph

JSON

JSON (JavaScript Object Notation, pronounced /ˈdʒeɪsən/; also /ˈdʒeɪˌsɒn/) is an open standard file format, and data interchange format, that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and array data types (or any other serializable value). It is a very common data format, with a diverse range of applications, such as serving as a replacement for XML in AJAX systems.[4]

JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data. The official Internet media type for JSON is `application/json`. JSON filenames use the extension `.json`.[5]

JSON grew out of a need for stateless, real-time server-to-browser communication protocol without using browser plugins such as Flash or Java applets, the dominant methods used in the early 2000s.[5] JSON is very similar to the JavaScript and is even derived from the programming language. This doesn't mean JSON is exclusive to the JavaScript language and many languages have importable libraries in order to process and parse JSON files.

The following example shows a possible JSON representation describing a person.[5]

``` {
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 27,
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
}
],
"children": [],
"spouse": null
}
```

Activities

1. See http://open-notify.org/Open-Notify-API/ISS-Location-Now/ and https://developers.google.com/maps/documentation/timezone/overview - Create a program that displays the current coordinates for the International Space Station. Using those coordinates, display the current time in that location.
2. See https://freegeoip.app/ and http://www.7timer.info/doc.php?lang=en#getting_forecast. Create a program the retrieve's the user's current location and displays the forecast in that location.
3. See https://agify.io/, https://nationalize.io/. Create a program that allows a user to enter their name and display their predicted age an nationality. Display the probability of each as well.

Key Terms

Application programming interface (API) - Allows two applications to communicate with one another to access data. Every action you take on your phone, like sending a direct message or checking the score of the baseball game, uses an API to access and deliver that information.[6]

Attribute - A specification that defines a property of an object, element, or file. It may also refer to or set the specific value for a given instance of such.[7]

Browser - Receive HTML documents from a web server or from local storage and render the documents into multimedia web pages.[8]

Document type declaration (DTD) - Informs the web browser about the type and version of HTML used in building the web document.[9]

Element - An element is a logical document component that either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start-tag and end-tag, if any, are the element's content, and may contain markup, including other elements, which are called child elements.[10]

HTML - HyperText Markup Language, or HTML(HyperText Markup Language) is the standard markup language for documents designed to be displayed in a web browser.[8]

JSON - An open standard file format, and data interchange format, that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and array data types (or any other serializable value).[11]

Markup language - A computer language that consists of easily understood keywords, names, or tags that help format the overall view of a page and the data it contains. Some examples of a markup language are BBC, HTML, SGML, and XML.[12]

MIME - Permits users to send non ASCII-based text attachments, and non-text files (e.g., compressed file, sound file, graphic file, document file, or video file) in an e-mail message.[13]

Node - A structure which may contain a value or a condition or represent an entirely separate data structure.[14]

Query string - Part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML form.[15]

REST - (Representational State Transfer) is designed to take advantage of existing protocols. While REST can be used over nearly any protocol, it usually takes advantage of HTTP when used for Web APIs.[16]

Serialization - The process of translating a data structure or object state into a format that can be stored (for example, in a file or memory data buffer) or transmitted (for example, across a computer network) and reconstructed later (possibly in a different computer environment).[17]

Tag - A markup construct that begins with < and ends with >.[10]

Tree - A hierarchical data structure comprised of nodes with a single element, known as the root, on the highest or topmost layer. Both HTML and XML documents are best represented as trees.[14]

XML - Extensible Markup Language, a language that generalizes the marking up of documents, so users can define their own semantics.[8]

Databases

Databases

What is a database?

A database is a data structure that stores organized information. Most databases contain multiple tables, which may each include several different fields. For example, a company database may include tables for products, employees, and financial records. Each of these tables would have different fields that are relevant to the information stored in the table.[1]

A database is stored as a file or a set of files. The information in these files may be broken down into records, each of which consists of one or more fields. Fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Records are also organized into tables that include information about relationships between its various fields. Although database is applied loosely to any collection of information in computer files, a database in the strict sense provides cross-referencing capabilities. Using keywords and various sorting commands, users can rapidly search, rearrange, group, and select the fields in many records to retrieve or create reports on particular aggregates of data.[2]

Why do we use them?[3]

The three main advantages that databases have over other, simpler data storage systems (such as text files and spreadsheets) are access, integrity, and security.

Access

Access is about making data available to users.

Databases support good data access because:

• Large volumes of data can be stored in one place
• Multiple users can read and modify the data at the same time
• Databases are searchable and sortable, so the data you need can be found quick and easily
• The data structure is extendable and can be modified as requirements change

Integrity

Databases can ensure that the data contained within them is correct, or has integrity.

To ensure the integrity of a database, each change or transaction must conform to a set of rules known as ACID:

• Atomicity: when changing data within a database, if any part of the change fails, the whole change will fail and the data will remain as it was before the change was made; this prevents partial records being created
• Consistency: before data can be changed in a database, it must be validated against a set of rules
• Isolation: databases allow multiple changes at the same time, but each change is isolated from others
• Durability: once a change has been made, the data is safe, even in the event of system failure

In addition, databases will have mechanisms for backup, distribution, and redundancy, to ensure data is not lost.

Security

While access to text files or spreadsheets can be secured, once someone has access to a file, they have access to all data within that file. Databases can be made very secure, and that includes the ability to have access rights to specific parts of the database and not others.

Databases allow access to be controlled, allowing users to have different privileges: for example, some users may be able to read data, but not to write it.

Data can also be segmented so that users can access only certain parts: for example, a user may be able to read an employee’s name and address but not their salary information.

What other uses for a database can you think of? Share your thoughts in the comments.

In the next step, you will explore an example relational database and look at the data contained within it.

Relational Model

What is it?

[4]

The relational model is an abstract model used to organize data within a database. In order to control access to a database, write data, run queries, or perform any other tasks related to database management, a database management system must have some kind of underlying model that defines how the data within it are organized. Databases that implement the relational model are often referred to as relational databases. The relational model was for a long time the most sophisticated model for organizing data, and its widespread use has only recently been curbed by the rise of nonrelational — or, NoSQL — data models.

The most fundamental elements in the relational model are relations, which users and modern RDBMSs recognize as tables. A relation is a set of tuples, or rows in a table, with each tuple sharing a set of attributes, or columns. A column is the smallest organizational structure of a relational database, and represents the various facets that define the records in the table. Hence their more formal name, attributes. It can be helpful to think of each tuple as a unique instance of whatever type of people, objects, events, or associations the table holds.

Constraints

[5]

Domain constraints

Domain constraints can be violated if an attribute value is not appearing in the corresponding domain or it is not of the appropriate data type. Domain constraints specify that within each tuple, and the value of each attribute must be unique. This is specified as data types which include standard data types integers, real numbers, characters, Booleans, variable length strings, etc.

Key constraints

An attribute that can uniquely identify a tuple in a relation is called the key of the table. The value of the attribute for different tuples in the relation has to be unique.

Referential integrity constraints

Referential Integrity constraints in DBMS are based on the concept of Foreign Keys. A foreign key is an important attribute of a relation which should be referred to in other relationships. Referential integrity constraint state happens where relation refers to a key attribute of a different or same relation. However, that key element must exist in the table.

Example in Database

[6]

An idealized, very simple example of a description of some relvars (relation variables) and their attributes:

• Customer (Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email, Sex)
• Order (Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
• Order Line (Order No, Order Line No, Product Code, Qty)
• Invoice (Invoice No, Customer ID, Order No, Date, Status)
• Invoice Line (Invoice No, Invoice Line No, Product Code, Qty Shipped)
• Product (Product Code, Product Description)

In this design we have six relvars: Customer, Order, Order Line, Invoice, Invoice Line and Product. The bold, underlined attributes are candidate keys. The non-bold, underlined attributes are foreign keys.

Usually one candidate key is chosen to be called the primary key and used in preference over the other candidate keys, which are then called alternate keys.

A candidate key is a unique identifier enforcing that no tuple will be duplicated; this would make the relation into something else, namely a bag, by violating the basic definition of a set. Both foreign keys and superkeys (that includes candidate keys) can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.

Key Terms

Attribute - A particular feature which describes and is characteristic of a record.

Commit - Ends a transaction within a relational database management system (RDBMS) and makes all changes visible to other users.[7]

Connection - read-only attribute provides the SQLite database Connection used by the Cursor object. A Cursor object created by calling con.cursor() will have a connection attribute that refers to con.

Cursor - A database cursor is an object that enables traversal over the rows of a result set. It allows you to process individual row returned by a query.

Database - And organized collection of data.[7]

Database Management System - There are multiple types of DBMS, (Relational, Object Oriented, etc), but each variant allows for reading, inserting, updating, deleting of a database along with general administrative/security tools. [7]

Delete - Removes one or more records from a table.[7]

Heading - A set of attributes.[7]

Insert - Adds one or more records to a single table.[7]

Primary key - A specific choice of one or more columns (attributes) whose data uniquely identifies each row (tuple) in a table (relation).[7]

Query - Retrieves/creates data based on specific criteria. Queries must follow SQL rules and syntax.[7]

Retrieval - Providing information in a form directly usable or for further processing by other applications.[7]

Record - An instance of an entity in a database table.[7]

Schema - A set of integrity constraints that structures and establishes the rules for data in a database.[7]

Select - A common DQL command used to return a result set of records, either from a single or from multiple tables.[7]

SQL - Structured Query Language, a programmatic standard for querying data from a relational database.[7]

SQLite - A relational database management system (abbreviated RDBMS) embedded directly into a program.[7]

Update - Changes specified data of one or more records in a table.[7]

View - Set of rows from a database of a stored query on the data, which the database users can query just as they would in a persistent database collection object.[7]

Modules and Classes

Modular Programming

Modular programming is an approach to programming that breaks down and compartmentalizes portions of code into independent, specialized functions. Ideally, these functions should be reusable and each function should only perform one specific, focused task. You can think of these essentially as "mini programs" within a given sample of code. Modularity also makes it easier to perform maintenance, as everything is self-contained. You can often update, modify, or outright replace functions without the need to adjust much of the remaining body of code. Modularity helps add structure to the project and often tends to make the code more readable.

Object-Oriented Programming (OOP)

In a similar way that Modular Programming aims to separate and organize segments of a program, the object-oriented approach to programming groups and quantifies similar variables together into classes and objects. A class can be thought of as the type of variable you are working with. Strings, Integers, and Booleans are examples of classes intrinsic to many programming languages. You can also add customized classes to programs, so long as you follow the languages syntax; such as Employees or Foods. A single Employee or Food would be an object in this analogy. Objects are merely individual instances of a class.

When you create an object you also provide attributes for that object. For example, if you were to create an object for employee you would probably include the individual's name, payrate, some contact information, and title or rank within the company. These traits can be specific to the object (instance variables) or the entire class (class variables) and a combination of both (member variables). John, Mary, and Mike are all employees. John is the is Manager, while Mike and Mary would be Shift Leads.

Objects are accessed somewhat like variables with complex internal structure, and in many languages are effectively pointers, serving as actual references to a single instance of said object in memory within a heap or stack. They provide a layer of abstraction which can be used to separate internal from external code. External code can use an object by calling a specific instance method with a certain set of input parameters, read an instance variable, or write to an instance variable. Objects are created by calling a special type of method in the class known as a constructor. A program may create many instances of the same class as it runs, which operate independently. This is an easy way for the same procedures to be used on different sets of data.[1]

It's important to understand that objects aren't merely a complex list or dictionary. They are codified and provide specific functionality much like modules. When an object contains a function, or can execute an action, it is referred to as a method.

Keywords

Class- A blueprint for constructing an object—a set of variables and methods that defines an entity.[2]
Class Diagram- A graphical representation of the structure of an object-oriented system that displays their attributes and relationships[3]
Class Methods- belong to the class as a whole and have access only to class variables and inputs from the procedure call.[2]
Class Variable- A variable that belongs to an entire class; there is only one such variable shared between all objects.[2]
Encapsulation- The act of hiding implementation details, either to protect internal data or for the purpose of abstraction.[2]
instance Methods- belong to individual objects, and have access to instance variables for the specific object they are called on, inputs, and class variables.[2]
Instance Variable- A variable that is unique and belongs to each instance of a class.[2]
Library- When a program invokes a library, it gains the behavior implemented inside that library without having to implement that behavior itself. Libraries encourage the sharing of code in a modular fashion, and ease the distribution of the code.[2]
Me, Self, This- A keyword that refers to the current object of focus.[2]
Member Variable- A variable that is either a class or instance variable.[2]
Method- A function that is defined inside a class.[2]
Object- A particular instance of a class.[2]
Object composition- describes objects that can contain other objects in their instance variables.[2]
Property- An intermediary between a variable and a method, providing the functionality of both.[2]
Variables- Store information formatted in a small number of built-in data types like integers and alphanumeric characters