This page describes the coding style guidelines in use for the components of Sphinx that are written in the C programming language. It does not apply to Sphinx4, or to Python and Perl components.
These guidelines are generally based on the historical best practice within the Sphinx source. Unfortunately, much of the code has been written by different people in different environments and does not always conform. If you want to go and correct it, that is great, as it will probably help you understand the code better (as long as you don’t just run “indent” over it or whatever).
Language Features
Sphinx is written in ANSI C89. In practice, this actually just means that we try to be compatible back to GCC 2.95 and MSC 6.0. Using GNU, MS, or C99 extensions (such as “inline”) is allowed as long as they are properly conditionalized with some portable fallback mechanism. So, by this logic, C99/C++-style “//” comments are not allowed, nor is Win32 structured exception handling, nor are dynamically-sized (variable-length) arrays, since none of these can reasonably be conditionalized with a fallback.
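As an illustration of what “properly conditionalized” might look like, here is a minimal sketch of a portable inline macro. The macro name SPHINX_INLINE and the square() helper are hypothetical and are not taken from the Sphinx headers. GCC and MSVC get their own extension keywords, a C99 compiler gets the standard keyword, and a plain C89 compiler falls back to an ordinary static function.

```c
/* SPHINX_INLINE is a hypothetical macro name, for illustration only. */
#if defined(__GNUC__)
#define SPHINX_INLINE __inline__      /* GNU extension keyword */
#elif defined(_MSC_VER)
#define SPHINX_INLINE __inline        /* Microsoft extension keyword */
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#define SPHINX_INLINE inline          /* standard C99 keyword */
#else
#define SPHINX_INLINE                 /* C89 fallback: plain static function */
#endif

/* Hypothetical helper: inlined where the compiler supports it,
 * an ordinary function call everywhere else. */
static SPHINX_INLINE int
square(int x)
{
    return x * x;
}
```

Callers never see the difference; only the macro definition knows which compiler it is on.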
Formatting
Although in some sense this is the least important aspect of coding style, it’s the most annoying one when it goes wrong. In general, we’d like everybody to use code formatting that is portable among different editors and IDEs.
Indentation
Indentation is 4 spaces. Not 4-space tabs, not sometimes 8-space tabs and
sometimes 4 spaces. 4 spaces. Nobody seems to be able to agree on what the
proper tab width is, disk space is cheap, and compression works. So please set
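your editor to use spaces and not tabs for indentation. In Emacs, you can
accomplish this for the current buffer with M-x set-variable indent-tabs-mode nil,
or for every buffer by adding this to your .emacs file (use setq-default rather
than setq, since indent-tabs-mode becomes buffer-local when set):
(setq-default indent-tabs-mode nil)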
In practice, you may not need to: most of the source files begin with a line like this:
/* -*- c-basic-offset: 4; indent-tabs-mode: nil -*- */
which makes Emacs automatically use the proper settings. If this line isn’t in a file, please feel free to add it.
Bracketing
There is a strong preference towards always using braces to enclose loop and
conditional blocks. This makes it easier to add things to them and prevents
ambiguous readings of code. The opening brace should always go on the
same line as the preceding conditional or loop statement. In addition, we
typically put the “else” or “else if” keyword on a separate line from the
preceding closing brace. This makes it easier to rearrange conditions with
cut and paste. So, in summary, blocks should look like this:
if (foo) {
    /* something */
}
else if (bar) {
    /* something else */
}
else {
    /* something else */
}
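The same rules applied inside a loop might look like the following sketch; count_signs() is a hypothetical function. Note that even single-statement bodies get braces, and each “else” starts its own line.

```c
/* Hypothetical helper: count the negative, zero, and positive
 * entries of an array, written in the bracketing style above. */
static void
count_signs(const int *vals, int n_vals,
            int *out_neg, int *out_zero, int *out_pos)
{
    int i;

    *out_neg = *out_zero = *out_pos = 0;
    for (i = 0; i < n_vals; ++i) {
        if (vals[i] < 0) {
            ++*out_neg;
        }
        else if (vals[i] == 0) {
            ++*out_zero;
        }
        else {
            ++*out_pos;
        }
    }
}
```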
Functions
Function declarations should have the return type on the same line as the
function name, like this:
int foo(char *bar, int baz);
Function definitions should have the return type on a separate line from the
function name, so that the name starts in the first column. This makes it easy
to find a function's definition with grep '^function_name'. In addition, the
opening brace for the function goes on a line by itself. Therefore, functions
should look like this:
int
foo(char *bar, int baz)
{
    return 42;
}
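A declaration and a definition of the same function side by side show why this layout helps: only the definition line begins with the function name, so grep '^clamp_index' finds it but not the declaration (which begins with its return type) or any indented call site. The function clamp_index() is hypothetical.

```c
/* Declaration (e.g. in a header): return type on the same line. */
int clamp_index(int idx, int n_items);

/* Definition: return type on its own line, name in column 0,
 * opening brace on a line by itself. */
int
clamp_index(int idx, int n_items)
{
    if (idx < 0) {
        return 0;
    }
    else if (idx >= n_items) {
        return n_items - 1;
    }
    return idx;
}
```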
Declarations, Commenting, and Naming
This section describes conventions for declaring things, commenting said declarations, and naming the things in question.
Variables
Variables should be declared in the innermost scope in which they are used.
For example:
for (mgau = 0; mgau < n_mgau; ++mgau) {
    int feat;
    for (feat = 0; feat < n_feat; ++feat) {
        int comp;
        for (comp = 0; comp < n_comp; ++comp) {
            /* ... use mgau, feat, and comp here ... */
        }
    }
}
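Because C89 requires declarations to come before statements within a block, “innermost scope” in practice means the top of the innermost brace-enclosed block; the C99 form for (int i = 0; ...) is not allowed. A self-contained sketch, using a hypothetical matrix_sum() function:

```c
/* Hypothetical example: sum a row-major n_rows x n_cols matrix.
 * Each variable is declared at the top of the innermost block
 * that uses it, which is legal C89. */
long
matrix_sum(const int *mat, int n_rows, int n_cols)
{
    long total = 0;
    int row;

    for (row = 0; row < n_rows; ++row) {
        const int *row_ptr = mat + row * n_cols;
        int col;
        for (col = 0; col < n_cols; ++col) {
            total += row_ptr[col];
        }
    }
    return total;
}
```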
Variable names should be concise and lowercase, with underscores separating words. Do not use Hungarian notation. Although short names are encouraged, try to give variables meaningful names.
Boolean variables should be declared as int. For everything else, use the
typedefs in <prim_type.h> (from SphinxBase).
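For instance, a predicate simply returns an int that callers test for zero or nonzero. The function is_sorted() is hypothetical, and the <prim_type.h> typedefs are left out so that the sketch compiles on its own.

```c
/* Hypothetical predicate: returns 1 (true) if vals is in
 * non-decreasing order, 0 (false) otherwise. The boolean
 * result is a plain int, and the names are short, lowercase,
 * and free of Hungarian prefixes. */
int
is_sorted(const int *vals, int n_vals)
{
    int i;

    for (i = 1; i < n_vals; ++i) {
        if (vals[i - 1] > vals[i]) {
            return 0;
        }
    }
    return 1;
}
```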
Functions
Function declarations in header files should always have Doxygen-style
comments before them (these are a lot like Javadoc comments). Although there is
a good rationale for putting “inline” comments next to each function argument,
and much of the SphinxThree code does this, it is very ugly and is thus
discouraged. Please document whether a pointer argument is an input, output, or
input-output argument. For output and input-output arguments, it is a good idea
to encode this in the name of the argument by prepending out_ or inout_ to it.
Input pointer arguments should also always be const unless there is some good
reason for them not to be. So, for example:
/**
 * Do something with some stuff.
 *
 * @param foo The foo index.
 * @param bar (input) Pointer to a bar index.
 * @param out_baz (output) The baz variable is returned here.
 * @param inout_quux (input-output) Contains the quux variable on input
 *                   and the updated quux variable on return.
 * @return 0 for success or -1 for failure.
 */
int foobie(int foo, const int *bar, int *out_baz, int *inout_quux);
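A definition honoring that contract could look like the following. The arithmetic in this foobie() body is invented purely for illustration; the point is that *bar is only read (and const lets the compiler enforce that), out_baz is only written, and inout_quux is read and then updated.

```c
#include <stddef.h>

/* Hypothetical body for foobie(): the computation is made up. */
int
foobie(int foo, const int *bar, int *out_baz, int *inout_quux)
{
    if (bar == NULL || out_baz == NULL || inout_quux == NULL) {
        return -1;                  /* failure, as documented */
    }
    /* bar is input-only: reading *bar is fine, writing it won't compile. */
    *out_baz = foo + *bar;
    /* inout_quux: use the value passed in, then store the updated one. */
    *inout_quux += *out_baz;
    return 0;                       /* success */
}
```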
Function names should also be lowercase, with underscores separating words. If
a function can be thought of as a “method” on some type, then the first word in
the function name should indicate what sort of object it is associated with.
So, for example, if you have a type foo_t, then its associated functions
should look like:
/**
 * Construct and initialize a foo_t.
 */
foo_t *foo_init(void);
/**
 * Free a foo_t.
 */
void foo_free(foo_t *foo);
/**
 * Calculate bar using a foo_t.
 *
 * @return The bar value.
 */
int foo_bar(foo_t *foo);
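A minimal sketch of definitions matching those declarations follows. The structure's single field n_calls is hypothetical, chosen only so that foo_bar() has some state to work with; the constructor/destructor pairing and the foo_ prefix are the parts that illustrate the convention.

```c
#include <stdlib.h>

/* Hypothetical layout for foo_t; the field is made up. */
typedef struct foo_s {
    int n_calls;    /* how many times foo_bar() has run */
} foo_t;

foo_t *
foo_init(void)
{
    foo_t *foo = malloc(sizeof(*foo));
    if (foo == NULL) {
        return NULL;
    }
    foo->n_calls = 0;
    return foo;
}

void
foo_free(foo_t *foo)
{
    /* free(NULL) is a no-op, so foo_free(NULL) is safe as well. */
    free(foo);
}

int
foo_bar(foo_t *foo)
{
    /* Hypothetical "bar value": the updated call count. */
    return ++foo->n_calls;
}
```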
General Style Points
Please try to avoid declaring functions with lots of arguments. There’s no hard and fast rule for this but ten is almost certainly too many. You should consider abstracting their values into an object of some sort, providing default values in its constructor, and then adding the ability to set values in it (either by directly accessing its fields or with accessor functions). Most of SphinxTrain violates this guideline in particularly heinous ways.
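A minimal sketch of that refactoring: instead of a function taking many loose arguments, the values live in a parameter object whose constructor fills in defaults and whose accessor lets callers override only what they need. Every name and default value here (search_params_t, run_search(), 100, 10, and so on) is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical parameter object; names and defaults are made up. */
typedef struct search_params_s {
    int beam_width;
    int max_iterations;
    double weight;
    int verbose;            /* boolean, declared as int */
} search_params_t;

/* Constructor: the defaults live in one place. */
search_params_t *
search_params_init(void)
{
    search_params_t *params = malloc(sizeof(*params));
    if (params == NULL) {
        return NULL;
    }
    params->beam_width = 100;
    params->max_iterations = 10;
    params->weight = 1.0;
    params->verbose = 0;
    return params;
}

/* Accessor: override a single value without touching the rest. */
void
search_params_set_beam_width(search_params_t *params, int beam_width)
{
    params->beam_width = beam_width;
}

void
search_params_free(search_params_t *params)
{
    free(params);
}

/* One const pointer replaces a long argument list. The body is a
 * placeholder that just combines two of the settings. */
int
run_search(const search_params_t *params)
{
    return params->beam_width * params->max_iterations;
}
```

Adding a new setting later means adding a field and a default in the constructor; existing callers of run_search() do not change.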
Please try to make functions concise. If your function doesn’t fit in about 80 lines of text, then consider refactoring out parts of it. The same advice about abstracting out parameters and local variables into an object of some sort applies here as well.
Abstract types (i.e. typedefs of structures without publicly visible definitions) are a good idea for public APIs. For private APIs, don’t bother with them.
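Here is a sketch of the pattern, shown as one file with the header and source parts marked. Because the public header only declares struct counter_s, client code can hold and pass counter_t pointers but cannot read its fields or allocate one on the stack, so the layout can change without breaking callers. The counter_t type and its functions are hypothetical.

```c
/* ---- counter.h: public header, exposes only an incomplete type ---- */
typedef struct counter_s counter_t;

counter_t *counter_init(void);
void counter_free(counter_t *counter);
void counter_increment(counter_t *counter);
int counter_value(const counter_t *counter);

/* ---- counter.c: private, the only file that sees the fields ---- */
#include <stdlib.h>

struct counter_s {
    int value;
};

counter_t *
counter_init(void)
{
    counter_t *counter = malloc(sizeof(*counter));
    if (counter != NULL) {
        counter->value = 0;
    }
    return counter;
}

void
counter_free(counter_t *counter)
{
    free(counter);
}

void
counter_increment(counter_t *counter)
{
    ++counter->value;
}

int
counter_value(const counter_t *counter)
{
    return counter->value;
}
```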
All rules can be broken if it makes things consistently faster, more memory-efficient, more portable, or easier to understand. These goals are often in conflict with each other!