Strings
Strings are sequences of characters represented by c8
, c16
and c32
, terminated by a null terminator (\0
). Luna SDK provides various string types and libraries, they will be discussed in this section.
String types
#include <Luna/Runtime/String.hpp> // For String, String16 and String32.
#include <Luna/Runtime/Name.hpp> // For Name.
Luna SDK provides two kinds of string types: String
and Name
.
The String
type is a sequence of c8
characters ended with \0
. We designed String
as a replacement of std::string
in Luna SDK, so most methods used for std::string
should work find with our String
type. Besides the String
type, we also have String16
and String32
as replacements for std::u16string
and std::u32string
that holds character sequences of c16
and c32
types.
The Name
type represents one immutable c8
string that is usually used as an identifier. We implemented a global name registry in Luna SDK so that every unique name will have only one data copy in the registry, and all Name
objects with the same string data refers to that copy, thus can be compared for equality quickly. The name string data is reference counted, and will be freed when the last Name
object that refers to the data is destructed. Strings stored in Name
cannot be changed, if the user assigns Name
with another string, the Name
object will refer to another string data entry, remaining the original string entry unchanged.
Name
and String
can be converted to each other implicitly. There is no enforced encoding format for string types, but most text processing APIs in Luna SDK expects UTF-8 encoded strings for String
and Name
types.
String utility library
#include <Luna/Runtime/StringUtils.hpp>
The string utility library provides functions for processing characters and strings. Luna SDK imports the following string and character processing functions from C standard library that can be used directly in Luna SDK:
strncpy
strcat
strncat
strxfrm
strncmp
strcoll
strchr
strrchr
strspn
strpbrk
strstr
strtok
isalnum
isalpha
islower
isupper
tolower
toupper
isdigit
isxdigit
iscntrl
isgraph
isspace
isblank
isprint
ispunct
strlen
, strcpy
and strcmp
are compatible to C standard library, but are extended by Luna SDK so they handles all character types. strcmp_prefix
checks whether one string is the prefix string of another string, and returns 0
if is. strtoi64
, strtou64
and strtof64
interprets one number value presented the by string, and returns the value.
Unicode encoding library
#include <Luna/Runtime/UTF8.hpp>
Unicode is a text encoding standard that is widely used in modern computers, programs and websites. Luna SDK comes with a built-in Unicode library for processing strings encoded in commonly-used Unicode formats, including UTF-8, UTF-16 (LE and GE) and UTF-32.
Luna SDK uses 32-bit character type (c32
) to represent one Unicode character, the value of the character object represents the codepoint of the character in Unicode character table. One Unicode character can be encoded to 1 c32
character in UTF-32, 1 to 2 c16
characters in UTF-16, and 1 to 6 c8
characters in UTF-8.
By definition, one Unicode character represented by
c32
differs from one Unicode character encoded using UTF-32 (the first bit of one UTF-32 character must be 0, so only 2^32 Unicode codepoints can be represented in UTF-32). But in practice, all existing Unicode characters can be converted to their UTF-32 representation without any modification, so we do not differ one Unicode character from one UTF-32 character in this manual.
utf8_charspan
and utf16_charspan
take one Unicode character, and return the number of c8
or c16
characters required to represent that character in UTF-8 or UTF-16 encoding. utf8_charlen
and utf16_charlen
take the first c8
or c16
character of one UTF-8 or UTF-16 encoded Unicode character, and return the number of bytes used for that character. These functions can be used to measure the size of one UTF-32 character in UTF-8 and UTF-16 encoding.
utf8_strlen
and utf16_strlen
calculate the number of Unicode characters contained by a UTF-8 or UTF-16 encoded string, utf8_index
and utf16_index
return the index of the first c8
or c16
character of the n
th Unicode character in a UTF-8 or UTF-16 ebcided string. These functions can be used to calculate the length of Unicode-encoded strings.
utf8_encode_char
and utf16_encode_char
encode one Unicode character into multiple c8
or c16
characters using UTF-8 or UTF-16 encoding, write the encoded characters to the user-provided buffer, and return the number of characters written. utf8_decode_char
and utf16_decode_char
, on the other side, read multiple c8
or c16
characters from the user-provided buffer, and returns the Unicode character represented by these characters.
utf16_to_utf8
converts a UTF-8 encoded string to a UTF-16 encoded string, and utf8_to_utf16
converts a UTF-16 encoded string to a UTF-8 encoded string. Both functions write result strings in a user-provided buffer, utf16_to_utf8_len
and utf8_to_utf16_len
can be used to calculate the minimum size (measured in number of c8
or c16
characters, not including the null terminator) required for the buffer to hold the result string.
Base64 encoding library
#include <Luna/Runtime/Base64.hpp>
Base64 is an encoding format that represents arbitrary binary data using 64 printable characters, plus one character (=
) for paddings. It is useful to store binary data in a text-based file. Luna SDK comes with a built-in Base64 library for encoding and decoding binary data using Base64.
base64_encode
encodes the binary data in the user-provided source buffer to a Base64 encoded string, and writes the string to the user-provided destination buffer. To determine the size of the destination buffer required, call base64_get_encoded_size
with the size of the row binary data.
base64_decode
decodes the Base64 string in the user-provided source buffer to original binary data, and writes the binary data to the user-provided destination buffer. To determine the size of the destination buffer required, call base64_get_decoded_size
with the size of the Base64 string, excluding the null terminator.