crap

Concise, Regex-Aware Preprocessor (CRAP)

C (computer language) code mangler and language maker. Website: https://github.com/themanyone/crap

Giving everyone crap for writing crappy code.

Crap. It’s not a compiler. It’s a movement.

Crap is a new language maker. It has sort of become its own language to demonstrate what it might be capable of. The easiest way to explain it is in code. The following crap-worthy example prints “hello world” five times.

#include <stdio.h>
main
    repeat  5
        puts  "hello world"

After preprocessing, it becomes standard C (compile with -std=c99 or -std=c11).

#include <stdio.h>
int main(int argc, char **argv, char** env){
    for(size_t _index=5;_index--;){
        puts("hello world");}
    return 0;
}

This is crap.

Yes, yes it is.

*Crap is in alpha stages of development. Use at your own risk! Have fun and experiment under the terms of the included LICENSE. For more lucrative licensing please contact the author.

Interactive crap.

The future of Rapid Application Development is icrap. Instant gratification at the speed of C. Try out individual lines of code and get instant results from the icrap interactive shell. Test ideas BEFORE putting crap into production. It’s a read-eval-print loop (REPL) for C, C++, crap, zig, hare, go, and other languages. Get it here for free: https://github.com/themanyone/itcc

 $ ./icrap -lm
 crap> #include "math.h"
 crap> #include "print.h"
 crap> for  int x=0;x<5;x++
 crap>      println  x, "squared is", pow(x, 2.0)
 0 squared is 0
 1 squared is 1
 2 squared is 4
 3 squared is 9
 4 squared is 16
 crap> |

Crap has crapped itself!

Crap bootstrapped itself some time ago, so its source is pure crap. As future development progresses in crap code, rest assured that crap will always remain 100% crap.

Generated C source files are included, so thankfully there is no prerequisite for crap to compile crap.

Build steps:

git clone https://github.com/themanyone/crap
# edit the Makefile to set LIBDIR location, other prefs.
# There is currently a `master` and a `testing` branch
git switch testing
make
# if there are corrupted or 0-byte .c files:
git checkout [name of c file]
# libtre-dev might be necessary in Debian Android UserLand

Even crap has limits.

Single lines of code are limited to 8000 characters (about 80 line wraps on a 100-character terminal). There is enough buffer space for perhaps 99 levels of indent. You can bump up the limits in crap.h at the expense of more memory usage, but why? It is prudent to break up long lines and move overly-indented blocks into separate functions or libraries.

Craptastic rules.

An extendable and growing set of rules turns crap code into C99.

String mangling. Beware: the crap preprocessor also processes string literals (except for our special triple-quoted string extension, see below). It’s actually a feature. We like having our strings modified. Any strings that you do not want nuked on their way to becoming C code should go into another file or header, which you then #include. It is good programming practice to maintain separation by not “hard coding” important data and resources into the source.

Includes. Like C, Crap programs #include <stdio.h> if they want to print to, or read from, consoles or files in a standard way. See test2.crap for a demo that uses asprintf.h for string and array manipulation. TinyCC supports C11 _Generic() selections, so include print.h for a type-aware print function. If the include is useful, copy it to /usr/local/include.

Expand main and return. The optional main macro, when it occurs all by itself, expands to int main(int argc, char **argv, char** env). Using main causes return 0 to be appended, so make sure main is the last function in the file.

Curly brackets (braces). Braces magically appear around indented code blocks of four spaces or a tab (the default, as defined in crap.h). As a consequence, comments must be indented to the same level as code. To indent for style reasons, but skip automatic bracketing, do not indent the full four spaces; indent two or three spaces instead. If, for some reason, a semicolon must appear after the automatic closing brace, then start the next line with one.

Parentheses. Crap puts parentheses around arguments in a linear fashion, if they are set off by two spaces. while _ _ a == b _ _ puts _ _ c (each _ marks a space) is shorthand for while(a == b) puts(c). The convention works for most statements. It is often preferable to put parentheses around things manually, as with logical operators and “truthy” value assignments. Those who dislike the feature do not have to use it, but it persists for the author’s convenience.

Semicolons. Not usually necessary! If some line needs a semicolon for some reason, just add it manually. Lines that may need manual addition of semicolons are preprocessor statements like #define, //comments, and lines that end with any of the characters <>;,."=*/&|^!. Purposely ending lines with one of those characters or a //comment prevents unwanted semicolons. When all else fails, look at the examples and tests.

Return. Again, crap adds a final return 0 only when main is used. In other words, please supply functions with return values.

Crap language extensions.

Triple-quoted strings. String literals may be triple-quoted. The output will have backslashes and quotes properly escaped.

    puts ("""
        This is a test!
    
        // triple quotes
        if  (tmp = resub(*s,"\"{3}(([^\"]+\"?)*)", "\"\2"))
            strcpy  skip.to, "(.*)\"{3}"
            strcpy  skip.end, "\1\""
            strcpy  *s, tmp  ;free  tmp  ;return
    """);

New print statements. Print all kinds of crap, without worrying about types, using c11 generic type selections. Include “print.h” to make it happen.

    #include <stdio.h>
    #include "print.h"
    main
        // Trailing commas are ignored
        println  "Test, four thirds", '(4/3) is', (4.0 / 3), '!', 
        // Use parenthesis to evaluate 'a' as a number
        println  "The character code for", 'a', "is", ('a')
        int F = 53
        print  F, "Fahrenheit is", ((F - 32) * 5.0 / 9)
        println  "Celsius."

Features. This non-standard print.h library makes sprint, sprintln, fprint, fprintln, eprint, and eprintln available for entertainment purposes. Error macros, eprint and eprintln, are a rough approximation of fprint(stderr) and fprintln(stderr) respectively. Output is unformatted, so no ugly “ %s\n” to disrupt eye movements. Instead, they add their own trailing space or return character to make it easy to print stuff without worrying how it will look. If precise formatted output is desired, use the standard library printf.

Return values. These print.h macros return a running total of all bytes written so far, not merely the current line. The total includes hidden trailing spaces and return characters (and does not count standard library and debug statements). Call total_printed() to retrieve and clear that total.

Errata. The sprint and sprintln macros tack data onto the end of a string, as would be expected when writing to files or terminals, or preparing output to be written. The standard library sprintf family of functions overwrites the beginning of a string, which is not usually what anyone wants. But they are always there if you need them. The caller is responsible for making sure strings are null-terminated and have sufficient space to hold the result. Failing to do so will result in errors or buffer overflows.

Hence, sprint(s, "hello", "world", 4); sprintln(s, "hello", "again", 5); might produce output similar to these sprintf standard library functions: sprintf(strchr(s, '\0'), "%s %d ", "hello world", 4); sprintf(strchr(s, '\0'), "%s %d \n", "hello again", 5);

More examples can be found in tests and by glancing at the print.hh header.

These print.hh macros are in development. And they are not without limits. They can handle up to 1000 arguments per statement invocation. Why would anyone want to print that many arguments? And the length of each argument is limited to 5000 characters. Limits can be raised by editing that header file.

Decisions.

unless Another way to write if(!()).

until Shorthand for while(!()).

Loop templates.

repeat The repeat (n[, mylabel]) constructor pastes a for loop into the code to repeat n times. A local _index variable is defined that may not be accessed outside the loop. An optional mylabel attribute causes _index to take on a unique name, mylabel_index so nested repeat loops make sense.

These loop templates insert long lines of crappy-looking code, but it gets optimized out by the compiler.

for mylabel in array[[start]:[end]] Loops over array, sort of like Python would, assigning each element to the supplied, predefined variable (or pointer), stepping through each element, including zero and NULL elements. An optional start and end may be preceded with a - sign, which means subtracted from the end (1-past the last element as calculated with sizeof, so negative indexes cannot work with dynamic arrays). The mylabel and array labels help declare local unsigned variables, mylabel_index and mylabel_end which are not available outside the loop. Note that mylabel_end is the optional [:end] argument which, if supplied, may exceed the real length of the array. If no optional [:end] is provided, cpp will calculate mylabel_end using the sizeof operator. And finally, a non-negative [:end] ought to be supplied for dynamic (malloc’d) objects where the length is unknown at compile time.

Compilers can be configured to generate warnings when these loops are unable to compute array sizes. From make debug:

gcc -g -Wall -pedantic ...

while mylabel in array[[start][:end]] Exactly like for mylabel in array but will bail out at the first sign of zero or NULL elements. (Use the for loop to loop through those.) A plain while(*data) statement is sufficient to step through NULL-terminated arrays. But this extension inherits the safer end limits, indexing, and slight speed penalty, of the above for loop. Again, a non-negative [:end] is necessary for dynamic arrays to prevent out of bounds conditions.

array[[start]:[end]] Using somevar = array[start:end] drops in a non-standard GNU extension code block at that location. Although putting block statements inside parenthesis is not part of the ISO C standard, it works with many compilers without warnings lately. The code creates a duplicate array, but with the same or fewer elements assigned to it. Trying to use [start:end] notation on the resulting array with negative and unspecified indexes will get results based on the old lengths, which might get confusing as the array gets passed around. Crap merely rearranges source code. The programmer is responsible for keeping track of run-time lengths and values!

Arrays may be initialized in a manner exactly like C: int w[]={1,2,3,4,5} or char *s[]={"this","and","that"}. The first array length does not need to be specified in the declaration.

More crap examples. Our loop templates can walk through multidimensional arrays, but be sure to use the appropriate type declaration. In the following example, the first loop uses a pointer because it’s returning a whole row. The inner loop, j, uses an int type because the innermost type of the 2D array, the one we want to print, is int.

#if 0
crap $0 | tcc -run -; exit $?
#endif
#include <stdio.h>

#define M 3
#define N 4
main
    // defined length [M][N] is computable
    int test_image[M][N]=
     { {1,2,3,4},
      { 5,6,7,8},
      { 9,10,11,12} },
    // undefined length *i is unknown
    *i, j

    for i in test_image // computable length, optional
        for j in i[:N] // undefined length, add [:N]
            printf  "%i%s", j, j_index==N-1?"\n":", "

There is a handy program and website called cdecl that explains C’s type declarations.

Array indexes start at [zero].
This array has 5 indexes, numbered 0 - 4.
  0 < 5: zero
  1 < 5: one
  2 < 5: two
  3 < 5: three
  4 < 5: four

Like Python, `:end` is 1-past the `last` element.
words[1:4] = { "one", "two", "three" }
words[ 2:] = { "two", "three", "four" }
words[ :3] = { "zero", "one", "two" }
words[ :-3] = { "zero", "one" }
words[ -3 : -1 ] = { "two", "three" }

Embedded crap macros.

Custom crap. Crap’s #replace /pattern/replacement/ macros support up to \31 decimal back-ref substitutions almost like sed scripts. They are no replacement for sed, nor do they supersede other preprocessor directives. But they could change things up. Up to 100 replacements per line, defined in crap.h. Rules may be added to crap’s source code for all users, or embedded into individual crap files where desired. Embedded #replace rules do not cross file boundaries, so do not put them in headers and expect them to work elsewhere.

Debugging. Some care is taken to make sure the resulting .c sources have the same line numbers (no extra line breaks). Compile with the -g option and use debuggers on the executable as with any C program. The make debug target makes a debug build of crap for stepping through that as well.

make debug
gdb -tui -args ./myProgram myArgs

What isn’t crap? The C programming language. Compilers like the GNU C Compiler (GCC), TinyCC, most other free software. Mention of tools and technologies is for information purposes and does not constitute endorsement or affiliation. Sed, awk, perl, and grep have more robust regex engines and are thoroughly tested, so use those instead of crap for handling important data streams.

Spreading crap around.

Usage couldn’t be simpler. There are no command options. Use standard shell pipes ‘|’ to fling crap at compilers, or ‘>’ to crap discreetly into a file.

# Let's make holy.c from holy.crap.
crap holy.crap > holy.c

A legacy of crap.

Crap’s predecessor, Anchor, is remarkably stable. But don’t look at the code! It abused flex in horrible ways and was otherwise unmaintainable. To make matters worse, flex grinds through confusing modes of operation during parsing, tripping flags, and interpreting things differently as it goes.

Crap drops the flex dependency and implements its own simplified regex calls.

Crapping for executives, using the three C shells.

Shell commands may be embedded into the first line to make executable scripts for rapid testing and development. The following comment at the top of the file tells the shell to use crap to pipe ‘|’ generated C code to the TinyCC compiler. The -run option tells tcc to execute the compiled code. A well-placed exit $? preserves the return value and prevents the interpreter from attempting to execute the remaining crap as shell code.

//usr/local/bin/crap "$0" |tcc -run - "$@";exit $?

Or to create debuggable .c files along the way.

//usr/local/bin/crap "$0">"$0.c"&&tcc -run "$0.c";exit $?

For convenience, we can also launch with the crapper crap wrapper. Place in the top line of sources to make executable crap scripts.

#!/path/to/crapper [compiler args] -- [optional program args]

You may use other compilers or shells. Get creative!

Now we’re just making crap up.

Crap works like any lexer, Vala, or C preprocessor. This Makefile target tells GNU make to turn .crap files into .c files as needed.

%.c : %.crap
    crap "$<" > "$@"

Installing crap.

There is no ./configure file. Edit the Makefile to change paths relevant to your system before running make. If the build complains about a missing library or header file, use the system package manager to find it or search the web. Developers, testers, and those who want to improve upon crap’s regex engine, or include it into other projects, may want to run make shared to build shared libraries. Be sure /usr/local/lib64 is in your LD_LIBRARY_PATH if installing there. For a single-user install, use ~/.local/bin/, ~/.local/lib64, etc., and edit the Makefile install locations to match.

make
sudo make install

Wipe, and start over.

make uninstall
make clean

Using crap for “boot tracking.”

Our privacy policy for crap strictly forbids tracking. Please scrub off and leave boots outside.

Crap does include a tool for bootstrapping, however. Developers tweaking our engine should use make shared and bootstrap_test to make sure everything works before installing untested crap.

Picking up fresh crap.

Didn’t Mom warn you about playing with dirty old crap? Don’t take crap from just anyone. Get a fresh pile from GitHub.

git clone https://github.com/themanyone/crap.git

Depositing crap.

There are some extra considerations for contributing crap to the pile. Edit the *.crap and *.hh files. The *.c and *.h files are generated, so any edits to them will be overwritten the next time you run make, to great disappointment.

Run bootstrap_test (or bootstrap_shared for a shared build) to make sure the build works before installing the new version. Also use make -B to always build everything, even the generated *.c and *.h files that have been outdated by changes in *.crap and *.hh. That should help.

The scripts remake.sh and shared_remake.sh do all of that for convenience.

Who to blame.

Browse Themanyone

Copyright (C) 2018-2024 Henry Kroll III, https://thenerdshow.com

Permission to use, copy, modify, distribute, and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appears in all copies and that both that copyright notice and this permission notice appear in supporting documentation, including About boxes in derived user interfaces or web front-ends. No representations are made about the suitability of this software for any purpose. It is provided “as is” without express or implied warranty.