Well, there is a way around it ...
Instead of storing raw UTF-8, store the UTF-8 encoded as base-64. That increases the amount stored by about a third (four characters for every three bytes), but means you are never storing bytes with the high-order bit set. That should work for everyone. Example code:
-- helper function to convert Unicode sequences
function unicode_convert (s)
  return utils.utf8encode (tonumber (string.match (s, "^&#(%d+);$")))
end -- unicode_convert
DatabaseOpen ("db", GetInfo (66) .. "utf8_test.db", 6)
DatabaseExec ("db", "CREATE TABLE IF NOT EXISTS test (name TEXT NOT NULL);")
-- the three entities below are the code points for 大家好
hello = string.gsub ("&#22823;&#23478;&#22909;", "&#%d+;", unicode_convert)
print ("original string=", hello)
-- insert a record
DatabaseExec ("db", "INSERT INTO test (name) VALUES ('" .. utils.base64encode (hello) .. "')")
-- prepare a query
DatabasePrepare ("db", "SELECT * from test")
-- execute to get the first row
rc = DatabaseStep ("db") -- read first row
-- now loop, displaying each row, and getting the next one
while rc == 100 do  -- 100 is SQLITE_ROW (a row is available)
  values = DatabaseColumnValues ("db")
  print ("string from database=", values [1])
  print ("string converted back=", utils.base64decode (values [1]))
  rc = DatabaseStep ("db") -- read next row
end -- while loop
-- finished with the statement
DatabaseFinalize ("db")
DatabaseClose ("db") -- close it
Example of that code in operation:

What we actually store in the database is "5aSn5a625aW9", which is the UTF-8 byte sequence E5A4A7E5AEB6E5A5BD (in hex) encoded in base-64.
See here for proof:
print (utils.base64encode (utils.fromhex ("E5A4A7E5AEB6E5A5BD"))) --> 5aSn5a625aW9
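The hex bytes themselves can be checked the same way. This is a minimal sketch in plain Lua (no MUSHclient functions), assuming the three characters are code points 22823, 23478 and 22909 (U+5927, U+5BB6, U+597D). The names utf8_encode and to_hex are made up for this illustration; utf8_encode is only a stand-in for what utils.utf8encode does, not its actual implementation.

```lua
-- Encode one Unicode code point (0 .. 0x10FFFF) as a UTF-8 byte string.
-- Uses only plain arithmetic, so it also runs on Lua 5.1.
local function utf8_encode (cp)
  local floor = math.floor
  if cp < 0x80 then
    return string.char (cp)
  elseif cp < 0x800 then
    return string.char (0xC0 + floor (cp / 0x40),
                        0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char (0xE0 + floor (cp / 0x1000),
                        0x80 + floor (cp / 0x40) % 0x40,
                        0x80 + cp % 0x40)
  else
    return string.char (0xF0 + floor (cp / 0x40000),
                        0x80 + floor (cp / 0x1000) % 0x40,
                        0x80 + floor (cp / 0x40) % 0x40,
                        0x80 + cp % 0x40)
  end
end -- utf8_encode

-- Show a byte string as upper-case hex digits.
local function to_hex (s)
  return (s:gsub (".", function (c)
    return string.format ("%02X", c:byte ())
  end))
end -- to_hex

local text = utf8_encode (22823) .. utf8_encode (23478) .. utf8_encode (22909)
print (to_hex (text)) --> E5A4A7E5AEB6E5A5BD
```

Every one of those nine bytes is 0x80 or above, which is exactly what the base-64 step avoids storing.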
Because the text we store is plain ASCII (letters and digits, plus possibly "+", "/" and "="), we don't have the issue of it being wrongly decoded. At least I hope not. That code works for me with the encoding set to English, and also Chinese.
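To see why the stored text is safe, here is a rough sketch in plain Lua that can be run outside MUSHclient. The functions base64_encode and from_hex are hypothetical stand-ins written for this example (assuming standard base-64 with "=" padding); they are not the code behind utils.base64encode and utils.fromhex.

```lua
local ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- Standard base-64 encoder using only Lua 5.1-compatible arithmetic.
local function base64_encode (s)
  local out = {}
  for i = 1, #s, 3 do
    local b1, b2, b3 = s:byte (i, i + 2)
    -- pack up to three bytes into one 24-bit number (missing bytes count as 0)
    local n = b1 * 65536 + (b2 or 0) * 256 + (b3 or 0)
    local idx = {
      math.floor (n / 262144) % 64,
      math.floor (n / 4096) % 64,
      math.floor (n / 64) % 64,
      n % 64,
    }
    for j = 1, 4 do
      -- positions with no input bits behind them become "=" padding
      if (j == 3 and not b2) or (j == 4 and not b3) then
        out [#out + 1] = "="
      else
        out [#out + 1] = ALPHABET:sub (idx [j] + 1, idx [j] + 1)
      end
    end
  end
  return table.concat (out)
end -- base64_encode

-- Convert pairs of hex digits into the bytes they represent.
local function from_hex (hex)
  return (hex:gsub ("%x%x", function (h)
    return string.char (tonumber (h, 16))
  end))
end -- from_hex

local raw = from_hex ("E5A4A7E5AEB6E5A5BD")
local encoded = base64_encode (raw)

print (encoded)                        --> 5aSn5a625aW9
print (raw:find ("[\128-\255]"))       --> 1 1  (raw UTF-8 has high-bit bytes)
print (encoded:find ("[\128-\255]"))   --> nil  (the base-64 text has none)
print (#raw, #encoded)                 --> 9 12
```

Nine bytes of UTF-8 become twelve ASCII characters, which is the four-for-three growth you pay for avoiding high-bit bytes.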
- Nick Gammon
www.gammon.com.au, www.mushclient.com