
C Help and Discussion

From InstallGentoo Wiki v2
Revision as of 00:12, 29 July 2023 by >Emil (Moar everything.)

/chad/

C Help and Discussion, or /chad/, is an ongoing general where people discuss all things C.

Show and talk about what you're currently working on, or things you've worked on in the past.

Join our IRC channel: #/g/chad at irc.rizon.net

The past threads are enumerated here.

Template

Let's have a C thread. Post what you're working on! Show what you're interested in!

Last thread: >>...

WIKI: https://wiki.installgentoo.com/wiki//chad/

IRC: #/g/chad at irc.rizon.net
TG: https://t.me/+itOpQDA2Nbk3ZDZh 

Don't know how to write C? Start here:
K&R: https://files.catbox.moe/80f07b.pdf
KING: https://files.catbox.moe/a875c2.pdf
Modern C: https://files.catbox.moe/xeb93p.pdf

Matrix: both very underused and not recommended to join or use currently.

Useful Links

Getting started

Challenge

Books

Standards

Articles

Tools

Building and Build systems

Small Scale

Makefile: Best used only for small and simple projects.

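Ninja: A much faster make-like tool. Its build files are not designed to be written by hand; they are meant to be generated by a higher-level tool such as CMake or Meson.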

redo: a better, recursive, general-purpose build system.

Scalable

CMake: A multi-platform build system generator and a modern alternative to Autotools. See this /gedg/ CMake Guide.

Meson: CMake with better syntax.

Autotools: GNU Autotools is a build system that generates Makefiles which comply with the GNU Coding Standards, which makes it easier for users of your software to adjust the build process for their needs. The ability to do out-of-tree builds, cross-compilation and staged installs comes out of the box, so you don't have to implement it yourself.

Video guide by David A. Wheeler · Basic Template: >>92441749

Debugging

Compilers

  • GCC: The GNU Compiler Collection (Originally known as the GNU C Compiler...)
  • Clang/LLVM: An "LLVM native" C/C++/Objective-C compiler.
  • MSVC: The Microsoft C/C++ Compiler.
  • ICC: The Intel C Compiler. The classic ICC has been deprecated in favor of the LLVM-based ICX. Caution should be taken if used due to Intel's previous behavior (generating slower code paths for non-Intel processors).
  • TCC: The Tiny C Compiler, notable for its extremely fast compilation speeds.


Libraries

  • cmocka: for testing
  • pthreads: POSIX threads, for multithreading on UNIX-like systems
  • GMP (libgmp): for arbitrary-precision arithmetic

Graphics

  • Cairo: for 2D vector graphics
  • OpenGL: for 3D rendering
  • Vulkan: for 3D rendering
  • GLFW: for creating windows for OpenGL/Vulkan and handling input
  • SDL: for windowing, input, audio and general-purpose 2D graphics

Communication

  • ZeroMQ: for communication via messages
  • protobuf: for serializing structured data into a compact binary format that programs written in different languages can exchange

Databasing and Files

  • Jansson: for JSON
  • SQLite: for storing data in an embedded, single-file SQL database with atomic transactions
  • libuv: for cross-platform async I/O

CLI

  • argp: for extensive option parsing
  • getopt: for simple option parsing
  • readline: for taking user input (GPLv3)
  • libedit: for taking user input (BSD)


Obscure Compilers

  • Turbo C: A classic DOS C compiler from Borland, later released for free. It comes with an IDE and debugger.
  • CompCert: A formally verified C99 compiler intended for compiling life-critical and mission-critical software that must meet high levels of assurance. Non-free.
  • Movfuscator: A single instruction (MOV) C89 compiler created for lulz by the reverse-engineering god Christopher Domas.

Recommended Build Options

Standards

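-std= (valid: c89, c99, c11, c17, c2x) (additional: c90=c89, c18=c17). Any 'c' can be replaced with 'gnu' to enable GNU extensions. Most extensions keep working under the plain 'c' modes anyway, and the compiler won't even warn you unless you specify -pedantic or -Wpedantic, so don't worry about it too much unless you're looking to maximize compiler portability. One catch: the strict modes define __STRICT_ANSI__, so the asm and (before C23) typeof keywords are unavailable (use __asm__ and __typeof__), and glibc's headers hide POSIX-only declarations such as strdup() unless you define a feature test macro like _POSIX_C_SOURCE before your first #include.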

Determine the language standard. See Language Standards Supported by GCC, for details of these standard versions. This option is currently only supported when compiling C or C++. -std=c99 is usually a good choice.

On MSVC use /D_CRT_SECURE_NO_WARNINGS to disable warnings regarding the so-called "secure" functions. These aren't widely supported outside of MSVC, and their benefits are questionable. See N1967 for more information.

-ansi: Common alias for -std=c89.

Warnings

GCC Warnings are listed here. For both GCC and Clang, it is generally recommended to use -Werror -Wall -Wextra -Wpedantic.

-Werror: Make all warnings into errors.


-Wall: Enables a large set of warnings, some of which may be undesirable. Despite the name, it does not enable all warnings. Strongly recommended.


-Wextra: This enables some extra warning flags that are not enabled by -Wall. Recommended to use.


-Wpedantic: Issue all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++. For ISO C, follows the version of the ISO C standard specified by any -std option used (Example: -std=c99).


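-Wstrict-aliasing=3: Pointer aliasing is when two different pointers can point to the same memory location. Strict aliasing is the C rule that an object may only be accessed through an lvalue of a compatible type (with exceptions such as character types), which lets compilers assume that pointers to incompatible types never alias. GCC optimizes based on this under -fstrict-aliasing (on by default at -O2 and above), and the warning is only active when that option is. Per the GCC manual, level 3 is the default (it is what plain -Wstrict-aliasing, already part of -Wall, means) and is the most accurate, with very few false positives; levels 1 and 2 are more aggressive and report more false positives, so 3 is typically the better choice.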


-Wwrite-strings: Gives string literals the type `const char []`, so assigning one to a plain `char *` (or passing it as a `char *` argument) produces a warning. In standard C, string literals have the type `char []`; however, writing to a string literal is Undefined Behavior (UB), so it makes more sense to treat them as `const char []` (even DMR wanted to make string literals const: https://www.lysator.liu.se/c/dmr-on-noalias.html).


-Wvla: Warns if a variable length array is used in the code. VLAs are either unnecessary, because you know the upper bound and can write buf[UPPER_BOUND], or a stack overflow waiting to happen. VLAs have also been an optional feature since C11, and some smaller compilers like cproc do not implement them, so using this option may aid portability.


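-Wcast-align=strict: Warns whenever a pointer is cast so that the required alignment of the target increases (e.g. char * to int *), regardless of whether the target machine tolerates unaligned access. Plain -Wcast-align only warns on targets where such access is a problem, which excludes x86. Catches a common newb mistake.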


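-Wstrict-prototypes: Warns on function declarations and definitions that don't specify their parameter types, like int f(). Before C23, empty parentheses in C mean "unspecified arguments" rather than "no arguments", so calls to such a function aren't checked against any parameter list; write int f(void) instead. (In C++ and C23, f() means the same as f(void).)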


-Wstringop-overflow=4: Warns for calls to string manipulation functions such as memcpy or strcpy that are determined to overflow the destination buffer. At =4 it additionally warns about overflowing any data members, and when the destination is one of several objects it uses the size of the largest of them to decide whether to issue a warning.


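-Wno-logical-op-parentheses: In C, && binds more tightly than ||, so a && b || c means (a && b) || c. Mixing the two without parentheses is nevertheless warned about (by Clang's -Wlogical-op-parentheses and GCC's -Wparentheses, both enabled by -Wall), and this option turns the Clang warning off. With the precedence in mind, (a && b || c) and (a && (b || c)) are much easier to tell apart at a glance than the fully parenthesized ((a && b) || c) and (a && (b || c)) that the warning pushes you toward. GCC has no equivalent fine-grained switch; -Wno-parentheses disables the whole group.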


-Wshadow: Warns when a declaration shadows a variable, parameter or global declared in an enclosing scope. This is often done intentionally, but beginners may wish to be warned about it because the bugs it causes are particularly subtle and difficult to debug, since the debugger will not tell you what the actual problem is.


-Weverything: Exclusive to clang and only intended for developing clang itself. May require many -Wno-... options to not emit too many false positives, but the perfectionist or anon who simply can't be bothered to run a linter may find it useful.


-fanalyzer: Perform advanced static analysis. Massively increases compile times but can catch nontrivial memory errors and does not impart a runtime performance penalty unlike -fsanitize= (see below).

Optimizing & Release options

GCC optimization options can be seen here.

-O0: Reduce compilation time and make debugging produce the expected results. This is the default.


-O1: Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.


-O2: Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.


-O3: Enables all optimizations specified by -O2 and some additional flags. Can drastically increase binary size due to loop unrolling, particularly with clang. It is somewhat overfitted to x86-64 processors, and other architectures may actually see worse performance with this option even with an appropriate -march.


-Ofast: Disregard strict standards compliance. Enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition. Do not use this if your code relies on specified IEEE float behavior or if you have multiple threads accessing the same data, even with locks. Even if your code doesn't, extensive testing is required to guarantee you can get away with this flag, and the benefit is small even when you can. Avoid if you are a beginner.


-Os: Optimize for size. Enables all -O2 optimizations except those that often increase code size. It also enables -finline-functions, causes the compiler to tune for code size rather than execution speed, and performs further optimizations designed to reduce code size. Generally not recommended due to its "hyper-focus" on minimizing the size of a program, even at the expense of obvious, highly beneficial optimizations.


-Oz: Optimize strongly for size. The difference is small for gcc, but clang's -Os is not nearly as aggressive as gcc's, so the difference there is larger.


-Og: Optimize debugging experience. Should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience. It is a better choice than -O0 for producing debuggable code because some compiler passes that collect debug information are disabled at -O0. Like -O0, -Og completely disables a number of optimization passes so that individual options controlling them have no effect. Otherwise -Og enables all -O1 optimization flags except for those that may interfere with debugging.


-flto=auto: Perform link-time optimization. There is some confusion about LTO because the supported way to do it has changed a lot over the years, but in 2023, with gcc 13 or clang 16, just using -flto=auto should make everything just work. However, you may need SET(CMAKE_AR "gcc-ar") and SET(CMAKE_RANLIB "gcc-ranlib") when using cmake, because cmake hates using gcc correctly for some reason, and with clang you may need -fuse-ld=lld because Ubuntu is shit.

LTO works quite differently between gcc and clang. With clang, -flto=auto is an alias for -flto=full, which treats the entire program as one codegen unit at the cost of parallelism. Clang can only parallelize LTO with -flto=thin, which skips many optimizations. This is distinct from the concept of thin LTO objects in gcc, which are just object files containing only GIMPLE bytecode and no machine code, so they require LTO at link time to be used. The reverse restriction does not apply: executables compiled with LTO will happily link against objects and static libraries that lack it (except for the foregone performance).

In gcc you do not need to worry about this full/thin tradeoff because whole-program analysis is performed unconditionally. In particular, you DO NOT need -flto=1, -fuse-linker-plugin, -ffat-lto-objects, -fwhole-program, or -flto-partition=one to squeeze the maximum performance out: gcc will prove its LTO partitions do not forego any optimizations. The -flto=N option only controls how many jobs the LTO wrapper can accept at once; it does not parallelize the jobs themselves. The jobs are parallelized by -flto-partition=balanced (the default) in a way that still catches all optimizations. The LTO plugin is loaded automatically when passing -flto to gcc. -fwhole-program manually guarantees what the plugin has already either proven true or false, so it will either not help or will introduce UB. Don't use it.

-ffat-lto-objects allows objects and static library archives to be linked into executables that do not themselves use -flto (the embedded machine code is used and LTO is silently skipped), so leave it off unless you want LTO to silently break without telling you. It has nothing to do with fat LTO in Rust.

Make sure you use the =auto part, or you will bottleneck your Makefiles for no reason, because you are only allowing one lto-wrapper to run at once despite the fact that the parallelism has no drawbacks. Or, in the case of clang, you are getting parallelism but with big drawbacks, so =auto turns it off. At any rate, when using a Makefile, the number of threads used for LTO with =auto is capped at -j, so you don't need two $(nproc) calls. For more information, see here. TL;DR: Just use -flto=auto! Reading all the wrong StackOverflow answers about this is giving me brain damage!


-march=native: Optimize the code for your processor. The resulting binary may crash with an illegal instruction on other CPUs, so don't use it for builds you distribute. Its benefit is arguably limited with gcc, which auto-vectorizes less aggressively than clang, but it will still enable any code paths conditional on architecture support and will take caching and microarchitecture behaviors into consideration.


-s and strip: strip is primarily useful for release builds. It strips unneeded symbols and can be invoked at compile- or link-time with -s, or separately after the fact with strip PROGRAM.


-pipe: Use pipes instead of temporary files when compiling. Saves SSD writes at a (probably) negligible compile-time memory cost.


-f1337-epic-option: Some of these are useful if you are minifying a nolibc executable but in general if you are a beginner or are just looking for performance these will just cause you headaches. Stick to the basics: -O3, -flto=auto, -march=native, -pipe, -s.


NOTE: Optimizers aren't magic. Your code will still be slow if it's shit. See this talk for just the tip of the iceberg. Don't use linked lists just because they're easy!

Debug options

Generally use -Og -g, or -ggdb instead of -g if you intend to use the GNU Debugger. -fsanitize=... has many useful features described in the GCC PDF; however, some sanitizers cannot be combined with each other (for example address and thread), and ASan-instrumented programs should not be run under Valgrind. The most common ones are address and undefined.

Tools like Valgrind, Splint may also help you debug and improve your code.

Diagnostic options

Consider -fno-diagnostics-show-caret for GCC or -fno-caret-diagnostics for Clang to reduce the number of lines per actual error in the compiler output.

Other Recommendations

Thread 23: >>94944789

Anonymous >>95011850
should i pass structs as parameters, or only a pointer to the struct.
Anonymous >>95012180
The usual rule of thumb is that if the struct fits into 2 registers you pass it by value, otherwise by pointer. In reality it's more complicated. There are semantic reasons for wanting to accept by value, e.g. to make the code easier to work with. If the function is static then it's basically always fine to pass it by value, because the optimizer can change it to accept a pointer (assuming you don't modify the copy of the struct). The same goes for returning structs by value: the compiler will allocate stack memory and pass in a pointer that will be filled instead of having it copied out. If you compile with LTO then this can happen across translation unit boundaries for non-static functions, but without LTO the compiler has to abide by the target ABI. In this case it's usually better to pass by pointer using the previous rule of thumb, but if you read the ABI spec for your target you will find that it's more complicated than whether it fits in 2 registers: there are a bunch of rules that determine whether a parameter is passed in registers or copied to the stack. Really, the cost of non-optimized function calls is determined by the parameter passing rules defined in the target ABI, the cost of memory accesses on the target, and whether the struct is in cache or not.

Thread 0: >>92317404

Anonymous >>92317996
>>92317404 (OP)
How do I into C? I know a few other programming languages, so I'm less interested in syntax and such. But more best practices.
Anonymous >>92340750 (pruned)
>>92317996
My tips:
>Compile with -Wall -Wextra. In debug builds, use sanitizers too.
>For complex cross-platform programs CMake or meson can be helpful, but for simple programs make is often sufficient.
>Don't overuse the preprocessor. Use something else if possible without too much hackery. For example, sometimes you can use an inline function. For int constants, you can use enum instead of #define.
>You can organize a larger program by using prefixes. E.g. all color functions start with color_ and are in color.c/color.h, random generators start with rnd_ and are in rnd.c/rnd.h.
>Prefer reentrant to non-reentrant functions, e.g. strtok_r instead of strtok. Also avoid any string functions which can cause buffer overflow because they don't take a length, e.g. strcat or gets.
Anonymous >>92318458
>>92317996
Avoid malloc'ing & returning the malloc'd memory from your functions. Instead, have them accept a buffer pointer + size to write to. Keep free's as close to their malloc's as possible.

Don't use any string functions other than strnlen. Each string handling function has its own peculiar way to handle the \0 that you have to remember, and they barely save you any time compared to just doing the pointer arithmetic yourself & using memcpy. strcat is particularly notorious for tempting you into writing accidentally quadratic code.

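Keep your statements short and simple. Modifying a variable and also reading or modifying it elsewhere in the same expression, with no sequence point in between, is undefined behavior (e.g. i = i++ or a[i] = i++; i = i + 1 is fine, since the read is needed to compute the stored value). The order of evaluation of the arguments to a function is also unspecified, and because they are unsequenced, something like f(a++, b, a+c) is outright UB rather than just unpredictable, so don't write shit like that.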

Turn on all possible compiler warnings & use -Werror.

Define as few globals as possible. If you have to, try to keep them static to their translation unit.

Macros are shit, but unavoidable. When defining them, parenthesize all your macro args and avoid repeating them, lest you accidentally mutate state twice when invoking them.

Macros that expand to expressions should have their whole definitions wrapped in parentheses.
Macros that expand to statements should have their whole definitions wrapped in a do{...}while(0) statement with no trailing semicolon.

C programmers love linked lists. Expect to run into lots of them while reading C code. If you wanna know why, try implementing a growable array in C.
Xolatile >>92335411
> Avoid malloc'ing & returning the malloc'd memory...
Actually, using 'calloc' is safer according to Splint (a very strict C linter), because it defines the allocated memory segment by filling it with 0s, and validates a type if used correctly...
> Turn on all possible compiler warnings...
I completely agree, and I use different compilers with maximum warnings enabled with -Werror only in "finishing stage", along with Valgrind (always) and Splint (sometimes).
> Define as few globals as possible.
It's best to have no global variables, pass variables to functions like a proper programmer, state is the root of a lot of hard to track bugs. (:
> Macros are shit, but unavoidable.
I usually have 0 macros in my programs, I define constants inside enumerations. I don't use ifdef, ifndef, define, endif, etc. because I don't write cross-platform programs...
Due to autism, sometimes I don't even use #include, but instead write "extern ..." at the start of my source code files, or pass -include flag to compiler...
--
I missed this when it was posted Anon, but I agree with most of your points, except macros mostly, they should be avoided at all cost, especially function-like macros. (:
I usually implement my own "printf" without variadics, passing in a format as string, and array of union of basic types.

C Misconceptions

C is too small of a language to be useful!

While C is a relatively small language, it provides enough facilities to create anything you can imagine. It's no secret that the main implementations of many interpreted languages, like Perl, Python, Lua and countless Lisps/Schemes/Forths, are written in C. Anything you can implement in the aforementioned languages can also be implemented in C. This could be said about many small languages which aren't usable at all, but C provides enough tools of abstraction to be useful in projects of any scale, from /usr/bin/true to /boot/vmlinuz.

On the other hand, C's simplicity makes it much easier to learn the whole language. Anyone with previous programming experience can learn the entirety of C in just a few weeks. After learning the language itself, one spends the rest of their C programming career figuring out the best way to apply it. This is more productive, as you're gaining actual CS knowledge and not focusing on superficial things like a particular language's syntax/implementation details.

C has no package manager!

C has many, many package managers: one for every GNU/Linux distribution. Language-specific package managers tend to be a bad idea anyway.

C's lack of memory safety leads to buggy programs!

Good coding habits will prevent many such bugs. There are also tools like ASan and UBSan, which catch memory errors and undefined behavior during testing. Large codebases may consider using talloc, a hierarchical allocator built on top of the system malloc() that ties each allocation to a parent context so whole trees can be freed at once, and that detects some misuse, such as double frees, at run time.