The Pollen Programming Language
Congratulations on the new book! [click for more] Please read Using Wikibooks to learn about editing and writing books. You can ask any questions in the help room. |
Table of Contents
edit
The C Programming Language
editThe major source of security vulnerabilities is pointers.
- Without pointers, there is no null dereference.
- Without pointers, there is no pointer arithmetic.
- Without pointer arithmetic, there is no array out of bounds.
Traditionally, C source is divided into .c and .h. The C source contain no pointers. The header files contain macros that use pointers.
Headers can be used to define typedefs for the .c files to use. Similarly to variable declarations, the new identifier is on the right hand side of the typedef. The following declares a pointer to a my_struct record as my_struct_t. Traditionally, _t means "type" and is used to defferentiate the type identifiers from variable and parameter identifiers.
/* typedef <existing_name> <alias_name>; */
struct my_struct {
int my_field;
};
typedef struct my_struct *my_struct_t;
Or simply...
typedef struct my_struct {
int my_field;
} *my_struct_t;
The amp symbol as well as the asterisk symbols are forbidden on source files.
Allocate new objects on the heap using memory pools. A good implementation of memory pools is provided by the Apache APR library. The API may be taken as a reference of how to implement memory pools.
Object orientation
editThe concept of object orientation that was developed for the C++ programming language is also available for the C programming language. That can be done by using macros for getters and setters.
Getters and setters check for null dereferences before they are used on an expression. The GLib runtime libraries from the GNOME project is written in C and uses this strategy to avoid memory errors. Keeping that library secure is essential because the GLib runtime libraries are used on all GNOME projects. An example of object orientation in C using macro follows:
#define NEW_CODEGEN_PLUGIN(ctx) \
(tmpl_codegen_t) pollen_malloc(ctx, sizeof(struct pollen_codegen_plugin));
#define CODEGEN_SET_CONTEXT(codegen, x) \
((codegen != NULL)? (codegen->ctx = x) : (abort(), NULL))
#define CODEGEN_GET_CONTEXT(codegen) \
((codegen != NULL)? (codegen->ctx) : (abort(), NULL))
#define CODEGEN_SET_LIBRARY(codegen, lib) \
((codegen != NULL)? (codegen->library = lib) : (abort(), NULL))
/* Macro to get the library handle from the plugin context */
#define CODEGEN_GET_LIBRARY(codegen) \
((codegen != NULL)? (codegen->library) : (abort(), NULL))
In the above code, the comma after the call to the abort() function allows for returning NULL as the result of the sub-expression (abort(), NULL) to the parent expression which is ...? ... : .... The abort function returns void.
Functions that make use of unknown types should also use macros for performing casts.
/* Macro to call a function with no arguments */
#define POLLEN_CALL_SYMBOL_0(symbol, rc) \
do { \
int (*fn_ptr)() = (int (*)()) symbol; \
rc = fn_ptr(); \
} while(0);
#define NEW_CODEGEN_PLUGIN(ctx) \
(tmpl_codegen_t) templatizer_malloc(ctx, sizeof(struct pollen_codegen_plugin));
Virtual address spaces
editVirtual addressing can be used in addition to secure programming languages. That allows the developer to include legacy code written in older programming languages.
Only the root user can run programs which will run as a different user (except for setuid). Daemons can fork its process in order to have a process running under an unprivileged user in order to insulate each API request from each other.
Therefore the only major concern is the part of the programs that parse the request input (such as JSON or XML) which contain the authentication data of the request and the socket connection itself. The part of the server that deals with such type of data should be handled by a carefully verified framework written in a secure language, and the parts which contain code written in an older programming language should be in worker processes which are forked or executed.
The Ada Programming Language
editAda is a programming language focused on memory safety and also in overall program correctness. There is the SPARK subset (not the obsolete SPARC architecture) which proves mathematically that a given program has no flaws.
The Ariane rocket made by the American government was written in Ada/SPARK exploded due to a software flaw. This issue aways comes up when discussing the validity of Ada as a very safe language, but in fact the software flaw was present because a pragma was used. The actual source code snippet containing the section of the rocket that caused the explosion is available to the public by Nasa and can be found online still in these days, way in the future after the explosion was in the news. Maybe it is still a tabu because it is the only aledged "proof" that Ada actually has flaws.
As I mentioned, it was caused by the incorrect use of a pragma that makes false asumptions of the hardware system of the rocked because it was ported from a previous rocket with a completely different hardware.
Besides, pragmas in Ada are compiler specific and are not part of the language. The language does not give the same security guarantees about pragmas because, as I stated before, most pragmas are not defined on the standard. The Ada standard specifications define the keyword "pragma" but they say they are compiler-specific custom code.
According to the Ada standard, the 0xFFFE and 0xFFFF byte sequences are not allowed on strings. Invalid strings are fixed by the Ada runtime. That restriction brings error with UTF-8.
A character whose relative code position in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program.
Those byte sequences are used in UTF-8 to represent emojis.
Programs written in Ada usually need an aditional runtime with its own types and functions. That runtime may have a part written in Ada (a binding) and a lower level part written in another language such as C.
That is exactly where Pollen comes into play. Pollen can be used as a runtime for Ada for accessing modern operating system features, external libraries or other functionality that is not covered by the Ada runtime.
Here are some good examples of functionality not covered by the Ada runtime:
- UTF-8;
- Memory streams;
- Database support;
- Modern compression algorithms;
- Modern cryptography algorithms;
- Networking (Berkeley sockets).
Pollen brings all those features to Ada if used as a runtime.
Freestanding Ada code
editCreating freestanding Ada code for being used as an operating system or as the main code on a microcontroller is considerably more complicated than creating freestanding code in C.
Creating freestanding Ada code is more difficult because Ada requires a runtime to compile and to run. But the important part to understand is that the Pollen should follow the example of having all the runtime code required for a language to run on bare metal as freestanding code is that everything about the runtime must be covered on the standard and publicly available documentation so that kernel developers can effectively use the language.
The Ada runtime is written in Ada itself and calls C code using the pragma Import directive to run code that depend on unsafe features not provided by the language, such as:
- pointers and pointer arithmetic;
- exception handling;
- tasking;
- delays;
- array bounds checking;
- and others.
Those are all portable across compilers, operating systems and hardware. The hardware on which the Ada code runs do not necessarely have to provide all of those features therefore not every feature must be present on a specific hardware or operating system. All of those features are defined on the Ada language specification standard.
Ada code typically relies on code generated by the compiler such as those for null checks on pointer dereferences. However those operators - such as the dereference operator in this instance - can be overriden manually.
Those automatically generated code do not generally pose danger to a low level developer as they call the exception handler on the Ada.Exceptions package. If a programmer wants to support null checks on pointer dereferences and comparason operators, all that is needed is to create the appropriate handlers on that package on the Ada runtime of the kernel of the operating system that is being created.
Integer overflow checking
editOverflow checking is done by the Ada runtime by using operator overloading as in object oriented code.
function "+" (Left, Right : Integer) return Integer;
pragma Import (C, "+", "my_overflow_checking_function");
The Ada language provide no types. They are all defined by the runtime and the overflow checking is usually specified as a compiler intrinsic with:
pragma Import (Intrinsic, "+");
Global allocator
editThe global allocator made by the the definition of the Root_Storage_Pool type and its methods, also as in object oriented code.
5
with Ada.Finalization;
with System.Storage_Elements;
package System.Storage_Pools is
pragma Preelaborate(System.Storage_Pools);
6/2
{AI95-00161-01} type Root_Storage_Pool is
abstract new Ada.Finalization.Limited_Controlled with private;
pragma Preelaborable_Initialization(Root_Storage_Pool);
7
procedure Allocate(
Pool : in out Root_Storage_Pool;
Storage_Address : out Address;
Size_In_Storage_Elements : in Storage_Elements.Storage_Count;
Alignment : in Storage_Elements.Storage_Count) is abstract;
8
procedure Deallocate(
Pool : in out Root_Storage_Pool;
Storage_Address : in Address;
Size_In_Storage_Elements : in Storage_Elements.Storage_Count;
Alignment : in Storage_Elements.Storage_Count) is abstract;
9
function Storage_Size(Pool : Root_Storage_Pool)
return Storage_Elements.Storage_Count is abstract;
10
private
... -- not specified by the language
end System.Storage_Pools;
Calling Ada code from C
editCreate your Ada source files, and compile them to object files using the GNAT compiler, gcc. For example, let's assume you have two Ada source files named mypackage.ads and mypackage.adb, and you want to compile them to object files:
gcc -c mypackage.adb
Create a C source file that will call the Ada library function(s). In this example, let's assume you want to call a function named my_ada_function that is defined in mypackage.ads. The C source file might look like this:
#include <stdio.h>
#include <stdlib.h>;
#include "mypackage.h"
int main(int argc, char** argv) {
int result = my_ada_function();
printf("Result: %d\n", result);
return 0;
}
Create a C header file (mypackage.h) that declares the Ada function(s) you want to call. In this example, the header file might look like this:
#ifndef MYPACKAGE_H
#define MYPACKAGE_H
#ifdef __cplusplus
extern "C" {
#endif
int my_ada_function();
#ifdef __cplusplus
}
#endif
#endif /* MYPACKAGE_H */
Use the GNAT compiler to generate C bindings for your Ada package:
gnatbind -Llibada.a -static -I. -C mypackage
Use the C compiler to compile the C source file and link it with the Ada library file and the C bindings file:
#!/bin/bash -e
gcc -c mypackage.adb.c
gcc -c main.c
gcc -o myprogram main.o mypackage.o -L. -lada
The Ada standard library handles Unicode using wide characters. The package Interfaces.C.Strings does not support UTF-8 or wide character encoding. It is wise to include character transcoding functions on the C side instead of the Ada side. Those functions should convert Unicode strings to Ada Wide_String and similars.
with Interfaces.C;
use Interfaces.C;
with Interfaces.C.Strings;
use Interfaces.C.Strings;
package body Pollen.Strings is
-- ...
function Value (Input : chars_ptr)
return Wide_String
is
function To_Ada_Encoding (Input : chars_ptr; Output : Wide_String) return int;
pragma Import (C, To_Ada_Encoding, "tmpl_to_ada_encoding");
Output : Wide_String (1 .. Strlen (Input));
Return_Code : int;
begin
Return_code := To_Ada_Encoding (Input, Output);
if Return_Code /= 0 then
throw Program_Error with "Unable to convert C string to Pollen string.";
endif;
return Output;
end Value;
-- ...
end Pollen.Strings;
JIT
edit- GNU libjit (libgccjit-10-dev)
- LLVM bytecode
Kaleidoscope is an example of a JIT language that uses the LLVM code generation.
LLVM code generation
editllvm-config is similar to pkg-config but it is specific to LLVM code generation. llvm-config must be found in $PATH for it to work. It is expected to be only one llvm toolchain associated with that llvm-config. That toolchain is responsible for code generation and may have different architectures as targets.
Steps for producing bitcode with llvm-dev:
LLVM_CFLAGS=$(shell llvm-config --cflags)
LLVM_LDFLAGS=$(shell llvm-config --ldflags --libs)
- Create a new Pollen plug-in
- Add CFLAGS and LDFLAGS llvm-config
- Create module;
- Add code;
- Add basic blocks;
- Append basic block to program in memory;
- Write bitcode to object file or BC file on HDD or SSD.
Contributing
editContributions to the Pollen project are welcome.
You may crate a bash script to automatically insert your Git username and password. Push code to a Git repository using simple HTTP authentication.
#!/bin/bash -e
git push https://mygithubuser:ghp_mytoken@github.com/ativarsoft/pollen-lang.git
Hosting
editPollen requires running native code, so a typical PHP hosting server will not work. Free CGI, Docker and VPS hosting are difficult to find as of March 2023.
Google offers free hosting suitable for personal use on their Cloud Run.
Pollen
editThe libpollen runtime has a function called hello world. This function is supposed to be called from the tmpl script file. The libpollen runtime is optional and is not required for a program written in Pollen to run.
The compiler and the interpreter get the symbol for the "hello_world" function in different ways.
The interpreter gets the symbol using dlopen on the runtime library selected by the script. The select script is the "lib" atrribute on the "pollen" element.
The compiler gets the symbol by using the linker as the object file created created from the script is linked against libpollen.
Authors
edit- Mateus de Lima Oliveira