What is better mod data format?


Maris

Mod Data is transferred over the network (and saved to disk) in binary format, not JSON.

Strings are UTF-8, which is variable in size: each character takes 1-4 bytes (a single byte for characters in the ASCII range). Each string also carries an additional 2 bytes (a short) at the start, holding the total number of bytes in the string.

 

PZ's Kahlua engine treats all numbers as doubles (binary64) and converts them accordingly, so any number takes 64 bits (8 bytes).

Bool values (true/false) are saved and transmitted as a single byte value.
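
As a rough sketch of the sizes listed above, the payload of a single value could be estimated like this. The helper name `valueSize` is made up for illustration, and this is plain Lua arithmetic based on the description in this post, not PZ's actual serializer:

```lua
-- Sizes in bytes, as described in the post (not taken from PZ's source).
local STRING_LENGTH_PREFIX = 2  -- short holding the string's byte count
local NUMBER_SIZE = 8           -- every number is a double (binary64)
local BOOLEAN_SIZE = 1          -- true/false is a single byte

-- Hypothetical helper: estimated payload size of one value, in bytes.
local function valueSize(value)
    local kind = type(value)
    if kind == "string" then
        -- In standard Lua, # on a string counts bytes, not characters.
        return STRING_LENGTH_PREFIX + #value
    elseif kind == "number" then
        return NUMBER_SIZE
    elseif kind == "boolean" then
        return BOOLEAN_SIZE
    end
    error("unsupported value type: " .. kind)
end

print(valueSize("abc"))    -- 5
print(valueSize(1000000))  -- 8
print(valueSize(true))     -- 1
```

In standard Lua `#` counts bytes. I have not checked how Kahlua counts non-ASCII text, so treat the string case as exact only for ASCII strings.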

 

Edited by Fenris_Wolf

Expanding on my previous explanation with more concrete data...
Every key/value pair in the data table consumes extra space:

 

1 byte (key type) + 2 bytes (key string length) + ??? key string data (length varies) + 1 byte (value type) + ??? value data

Consider the following examples, keeping these facts in mind:

 

* each key/value pair has 2 bytes of padding (the 1-byte key type plus the 1-byte value type).

* each key in the examples below is 7 bytes long, plus an additional 2 bytes (the size of the string).

* not counting the value, each entry in these examples therefore takes 11 bytes (2 + 7 + 2).
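
The fixed part of an entry can be sketched in a few lines of Lua. The constant and function names here are only illustrative, and the byte counts are the ones given above, assuming a string key:

```lua
-- Byte counts taken from the description above; names are illustrative only.
local KEY_TYPE = 1       -- "padding" byte in front of the key
local VALUE_TYPE = 1     -- "padding" byte in front of the value
local STRING_LENGTH = 2  -- short holding the key's byte count

-- Hypothetical helper: bytes used by an entry before its value data,
-- assuming the key is a string.
local function entryOverhead(key)
    return KEY_TYPE + STRING_LENGTH + #key + VALUE_TYPE
end

print(entryOverhead("number1"))  -- 11
print(entryOverhead("n1"))       -- 6
```

The key text is part of every entry, so a shorter key gives a smaller entry no matter what the value is.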

-- +8 bytes (double) for the number value.... 19 bytes per entry
{
    ["number1"] = 1,
    ["number2"] = 10,
    ["number3"] = 100,
    ["number4"] = 1000,
    ["number5"] = 10000,
    ["number6"] = 100000,
    ["number7"] = 1000000
}
-- total: 133 bytes

Every entry in the above example is the same size, and the total size is 133 bytes.

Now using string values instead of numbers:

 

-- same key and padding size (11 bytes per entry)
-- but value size varies using strings:
{
    ["number1"] = "1",       -- +2 bytes (length) +1 byte (string characters) = 14 bytes total
    ["number2"] = "10",      -- +2 bytes (length) +2 bytes (string characters) = 15 bytes total
    ["number3"] = "100",     -- +2 bytes (length) +3 bytes (string characters) = 16 bytes total
    ["number4"] = "1000",    -- +2 bytes (length) +4 bytes (string characters) = 17 bytes total
    ["number5"] = "10000",   -- +2 bytes (length) +5 bytes (string characters) = 18 bytes total
    ["number6"] = "100000",  -- +2 bytes (length) +6 bytes (string characters) = 19 bytes total
    ["number7"] = "1000000"  -- +2 bytes (length) +7 bytes (string characters) = 20 bytes total
}
-- total: 119 bytes

Notice that only the last entry in the table exceeds the "19 bytes per entry" used in the first example. The total size is a bit smaller (119 bytes instead of 133).

Finally, let's compress all these numbers into a single table entry:

 

-- single entry. same key and padding size (11 bytes).
{
    ["numbers"] = "1,10,100,1000,10000,100000,1000000" -- +2 bytes (length) + 34 bytes (data) = 47 bytes total.
}
-- total: 47 bytes

That is quite a drastic drop in size (47 bytes instead of 133), achieved by compressing the numeric values into a single string.
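
To double-check the three totals, here is a small estimator in plain Lua. It only applies the per-field sizes described in this thread (1 type byte per key and per value, 8 bytes per number, 2 length bytes plus the data per string). `fieldSize` and `tableSize` are names I made up, and nothing here calls into PZ itself:

```lua
local TYPE_BYTE = 1      -- written before every key and every value
local STRING_LENGTH = 2  -- short holding the string's byte count
local NUMBER_SIZE = 8    -- double (binary64)

-- Hypothetical helper: size of one key or one value, including its type byte.
local function fieldSize(field)
    local kind = type(field)
    if kind == "string" then
        return TYPE_BYTE + STRING_LENGTH + #field
    elseif kind == "number" then
        return TYPE_BYTE + NUMBER_SIZE
    elseif kind == "boolean" then
        return TYPE_BYTE + 1
    end
    error("unsupported type: " .. kind)
end

-- Hypothetical helper: estimated size of a flat table (no nested tables).
local function tableSize(t)
    local total = 0
    for key, value in pairs(t) do
        total = total + fieldSize(key) + fieldSize(value)
    end
    return total
end

local asNumbers = {
    number1 = 1, number2 = 10, number3 = 100, number4 = 1000,
    number5 = 10000, number6 = 100000, number7 = 1000000,
}
local asStrings = {
    number1 = "1", number2 = "10", number3 = "100", number4 = "1000",
    number5 = "10000", number6 = "100000", number7 = "1000000",
}
local packed = { numbers = "1,10,100,1000,10000,100000,1000000" }

print(tableSize(asNumbers))  -- 133
print(tableSize(asStrings))  -- 119
print(tableSize(packed))     -- 47
```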

 

However, the value of such knowledge is debatable. Yes, it will save on network transmission size (as well as on disk size and in memory), but at the cost of CPU time spent doing conversions.
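
The conversions in question would look something like the sketch below. `packNumbers` and `unpackNumbers` are hypothetical helpers written with standard Lua string functions, and they assume every value is a whole number:

```lua
-- Hypothetical helper: join a list of whole numbers into one
-- comma-separated string.
local function packNumbers(numbers)
    local parts = {}
    for i, n in ipairs(numbers) do
        parts[i] = string.format("%d", n)
    end
    return table.concat(parts, ",")
end

-- Hypothetical helper: split the string back into a list of numbers.
local function unpackNumbers(packed)
    local numbers = {}
    for field in string.gmatch(packed, "[^,]+") do
        numbers[#numbers + 1] = tonumber(field)
    end
    return numbers
end

local packed = packNumbers({1, 10, 100, 1000, 10000, 100000, 1000000})
print(packed)   -- 1,10,100,1000,10000,100000,1000000
print(#packed)  -- 34

local restored = unpackNumbers(packed)
print(#restored)  -- 7
```

Every read or write of a single number now means splitting or rebuilding the whole string, which is where the CPU time goes.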
 

Edited by Fenris_Wolf

26 minutes ago, Maris said:

What about indexed arrays?

 

There are keys: the index number itself, which is saved as a double (8 bytes). So in your first example, each entry is 1 + 8 + 1 + 8 for a total of 18 bytes.

{1,2,3,4,5}
-- is the same as:
{[1] = 1, [2] = 2, [3] = 3, [4] = 4, [5] = 5}

18 bytes seems extremely wasteful compared to what the same entry would take if those were strings:

 ["1"] = "1", -- 1 (padding) + 2 (str len) + 1 (str data) + 1 (padding) + 2 (str len) + 1 (str data) = 8 bytes total 
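
For a whole array, the difference adds up as in this sketch. `toStringTable` and the two size functions are names invented for the example, and the arithmetic is just the per-entry breakdown above, assuming whole-number values:

```lua
-- Hypothetical helper: copy an array into a table with string keys and values.
local function toStringTable(list)
    local out = {}
    for i, v in ipairs(list) do
        out[string.format("%d", i)] = string.format("%d", v)
    end
    return out
end

-- Per-entry sizes from the post: 1 type byte per field, 8 bytes per double,
-- 2 length bytes plus the string bytes per string.
local function numberEntrySize()
    return 1 + 8 + 1 + 8
end

local function stringEntrySize(key, value)
    return 1 + 2 + #key + 1 + 2 + #value
end

local list = {1, 2, 3, 4, 5}
local asStrings = toStringTable(list)

local numberTotal, stringTotal = 0, 0
for _ in ipairs(list) do
    numberTotal = numberTotal + numberEntrySize()
end
for key, value in pairs(asStrings) do
    stringTotal = stringTotal + stringEntrySize(key, value)
end

print(numberTotal)  -- 90
print(stringTotal)  -- 40
```

Keep in mind that the string-keyed table is no longer an array: `ipairs` and `#` will not see its entries, and the saving shrinks as the numbers get more digits.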

 

Nil values are never saved, or kept in tables at all.

{ key = nil } -- completely removes key from the table

The same applies to indexed versions.
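
A quick way to see this in standard Lua (I would expect Kahlua to behave the same way, but have not verified it) is to count the entries a table really holds. `countEntries` is a throwaway helper for the demonstration:

```lua
-- Throwaway helper: count the key/value pairs actually present in a table.
local function countEntries(t)
    local n = 0
    for _ in pairs(t) do
        n = n + 1
    end
    return n
end

local data = { key = nil, other = 1 }
print(countEntries(data))  -- 1, "key" was never stored

data.other = nil
print(countEntries(data))  -- 0, assigning nil removed "other"

local list = { 1, 2, 3 }
list[3] = nil
print(countEntries(list))  -- 2, same for indexed entries
```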

 

Edited by Fenris_Wolf
