DatebaseStep bug?
It is now over 60 days since the last post. This thread is closed.
| Posted by
| Ddid
China (19 posts) Bio
|
| Date
| Reply #30 on Sun 31 Oct 2010 12:50 AM (UTC) |
| Message
| Anyway, many thanks for your great work on MUSHclient!
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #31 on Sun 31 Oct 2010 04:06 AM (UTC) Amended on Sun 31 Oct 2010 06:03 AM (UTC) by Nick Gammon
|
| Message
| After fairly extensive investigations, which were slowly driving me insane[1], I have made some modifications to the Lua interface such that it bypasses the conversion to/from CString, and thus works around the problem. This applies to DatabaseColumnText, DatabaseColumnValue, and DatabaseColumnValues.
Now the Lua interface directly sets the values (without using the BSTR values) and avoids this problem.
My testing for the test program on pages 1 and 2 shows it now returns the correct data.
However there may be other areas which have similar problems, due to the use of CString in many places.
Improvements in version 4.66.
---
1. ... Because I kept getting different results. I was supposed to get 9 bytes out, but was getting 8. That shouldn't be too hard to fix, huh? But it jumped to 11, then 22, then down to 3, then up to 6. And then the data was just completely wrong. This will teach me to write a non-Unicode application in the future. But when I started (15 years ago, when I was young kek), it was just a little program to help me play MUSH games. In English. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com
|
| Posted by
| Ddid
China (19 posts) Bio
|
| Date
| Reply #32 on Sun 31 Oct 2010 01:39 PM (UTC) |
| Message
| Many thanks for your hard work. I'm looking forward to the release of the new MUSHclient version.
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #33 on Sun 31 Oct 2010 08:48 PM (UTC) |
| Message
| Shortly.
Just to explain what I think is happening ...
MUSHclient is not a Unicode application (when I wrote it, I wasn't that familiar with Unicode). So internally it uses 8-bit strings. More recently it uses UTF-8 to encode Unicode, in some places.
However the WSH (Windows Script Host) uses BSTR to communicate between scripts and the program. The BSTR type is 16-bit Unicode data.
http://msdn.microsoft.com/en-us/library/ms221069.aspx
Now, to convert the output of a script call to BSTR, the internal libraries assume the data is in the current code page (normal ANSI for me) and do a lookup to convert characters like 0xBD from the code page to the Unicode equivalent. Since it seems to work for me, I presume that there is a one-to-one mapping for them.
But in the case of Chinese code pages, some characters, like 0xBD, must translate into something else.
Then when it is time to convert them back into 8-bit strings (e.g. for Lua) the process is reversed. All seems to work fine provided each 8-bit character can be translated into Unicode, and back again, without being changed or discarded. With some code pages enabled, obviously this isn't happening.
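To illustrate how a 9-byte string could come back as 8 bytes, here is a simplified model in plain Lua. It is only a sketch under stated assumptions: that the double-byte code page treats every byte from 0x81 to 0xFE as a lead byte which pairs with the following byte, and that a lead byte left with no partner is discarded. The names dbcs_round_trip and tohex are made up for this example, and the real Windows conversion tables are more involved than this.

```lua
-- Simplified model (an assumption, not the real Windows or MUSHclient code):
-- a byte in 0x81-0xFE is a lead byte that consumes the following byte;
-- a lead byte with nothing after it is dropped.
function dbcs_round_trip (s)
  local kept = {}
  local i = 1
  while i <= #s do
    local b = string.byte (s, i)
    if b >= 0x81 and b <= 0xFE then
      if i < #s then
        kept [#kept + 1] = string.sub (s, i, i + 1)  -- complete pair survives
      end -- if (otherwise: lone lead byte at the end, discarded)
      i = i + 2
    else
      kept [#kept + 1] = string.sub (s, i, i)  -- single-byte character survives
      i = i + 1
    end -- if
  end -- while
  return table.concat (kept)
end -- dbcs_round_trip

-- hypothetical helper: show a string as upper-case hex
function tohex (s)
  return (string.gsub (s, ".", function (c)
    return string.format ("%02X", string.byte (c))
  end))
end -- tohex

-- UTF-8 bytes of the three-character test string: E5A4A7 E5AEB6 E5A5BD
utf8 = "\229\164\167\229\174\182\229\165\189"

print ("length before =", #utf8)
print ("length after  =", #dbcs_round_trip (utf8))
print ("bytes after   =", tohex (dbcs_round_trip (utf8)))
```

Under those assumptions the nine bytes pair up as four double-byte characters and the final byte (0xBD) has no partner, so it is lost. What remains is E5A4A7E5AEB6E5A5, which is 8 bytes long, the same bytes reported for code page 936 elsewhere in this thread.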
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #34 on Sun 31 Oct 2010 09:16 PM (UTC) |
| Message
| One additional point: why did the data get into the database OK, but not out of it? The answer is that in the Lua -> Database direction, the data is not encoded into BSTR. It is simply copied across as "const char *" and thus is not fiddled with in any way.
The problem applies only in the reverse direction.
However, enough talk. In about 15 minutes the new version should be available. |
|
| Posted by
| Ddid
China (19 posts) Bio
|
| Date
| Reply #35 on Mon 01 Nov 2010 09:13 AM (UTC) |
| Message
| I'm sorry to report that this bug still occurs in the new MUSHclient version, 4.66:
length before add= 9
length after add= 8
name= E5A4A7E5AEB6E5A5
length from SQL= 3
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #36 on Mon 01 Nov 2010 09:50 AM (UTC) |
| Message
| Well, that's odd. What encoding do you have your system set to?
|
| Posted by
| Ddid
China (19 posts) Bio
|
| Date
| Reply #37 on Mon 01 Nov 2010 10:10 AM (UTC) |
| Message
| I'm not sure; it should be Unicode.
|
| Posted by
| Ddid
China (19 posts) Bio
|
| Date
| Reply #38 on Mon 01 Nov 2010 10:43 AM (UTC) |
| Message
| Please disregard my last answer; my system encoding is set to GBK (code page 936).
I just changed my system encoding setting to English (US), and the test code returned the correct result:
length before add= 9
length after add= 9
name= E5A4A7E5AEB6E5A5BD
length from SQL= 3
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #39 on Mon 01 Nov 2010 09:28 PM (UTC) |
| Message
| I set my code page to 936, and with version 4.66 of MUSHclient, it worked OK.
|
| Posted by
| Ddid
China (19 posts) Bio
|
| Date
| Reply #40 on Tue 02 Nov 2010 05:02 AM (UTC) |
| Message
| I'm really confused. Here is my situation:
I'm using the English version of Windows XP with the Simplified Chinese language pack, so I can change the language encoding setting for non-Unicode programs. When I use English (US) as the encoding setting, SQLite3 database access works, but MUSHclient's layout, fonts, paths and so on fail. When I use Chinese (PRC) as the encoding setting, everything works except SQLite3 database access (for Chinese content).
So I have decided to give up using the SQLite3 scripting functions in MUSHclient. In fact, I spent some time on LuaSQL yesterday evening (Beijing time), and it works fine with MUSHclient.
Anyway, many thanks for your help. Your great work has taught me a lot.
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #41 on Tue 02 Nov 2010 06:26 AM (UTC) Amended on Tue 02 Nov 2010 07:56 PM (UTC) by Nick Gammon
|
| Message
| Well, there is a way around it ...
Instead of storing UTF-8, store base-64 encoded UTF-8. That adds about a third to the amount stored (four characters for every three bytes), but means you are not storing bytes with the high-order bit set. That should work for everyone. Example code:
-- helper function to convert Unicode sequences
function unicode_convert (s)
  return utils.utf8encode (tonumber (string.match (s, "^&#(%d+);$")))
end -- unicode_convert

DatabaseOpen ("db", GetInfo (66) .. "utf8_test.db", 6)
DatabaseExec ("db", "CREATE TABLE IF NOT EXISTS test (name TEXT NOT NULL);")

-- &#22823;&#23478;&#22909; are the decimal code points of 大家好
hello = string.gsub ("&#22823;&#23478;&#22909;", "&#%d+;", unicode_convert)
print ("original string=", hello)

-- insert a record
DatabaseExec ("db", "INSERT INTO test (name) VALUES ('" .. utils.base64encode (hello) .. "')")

-- prepare a query
DatabasePrepare ("db", "SELECT * from test")

-- execute to get the first row
rc = DatabaseStep ("db") -- read first row

-- now loop, displaying each row, and getting the next one
while rc == 100 do
  values = DatabaseColumnValues ("db")
  print ("string from database=", values [1])
  print ("string converted back=", utils.base64decode (values [1]))
  rc = DatabaseStep ("db") -- read next row
end -- while loop

-- finished with the statement
DatabaseFinalize ("db")
DatabaseClose ("db") -- close it
Example of that code in operation:

What we actually store in the database is "5aSn5a625aW9", which is hex E5A4A7E5AEB6E5A5BD encoded in base-64.
See here for proof:
print (utils.base64encode (utils.fromhex ("E5A4A7E5AEB6E5A5BD"))) --> 5aSn5a625aW9
But because the text we are dealing with is just letters and numbers (base-64 output is plain 7-bit ASCII), we don't have the issue of it being wrongly decoded. At least I hope not. That code works for me with the encoding set to English, and also Chinese.
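As a quick check of that claim outside MUSHclient, the following plain-Lua sketch encodes the same nine bytes and confirms the result has no byte with the high-order bit set. The functions base64_sketch, from_hex and has_high_bit are hypothetical stand-ins written for this example; inside MUSHclient you would simply use utils.base64encode and utils.fromhex as shown above.

```lua
-- Pure-Lua stand-ins (hypothetical names) so the sketch runs outside MUSHclient.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

function base64_sketch (s)
  local function pick (k)  -- k is a 6-bit value, 0 to 63
    return string.sub (B64_ALPHABET, k + 1, k + 1)
  end -- pick
  local out = {}
  for i = 1, #s, 3 do
    local a, b, c = string.byte (s, i, i + 2)  -- b and c are nil past the end
    local n = a * 65536 + (b or 0) * 256 + (c or 0)  -- 24-bit group
    out [#out + 1] = pick (math.floor (n / 262144) % 64)
    out [#out + 1] = pick (math.floor (n / 4096) % 64)
    out [#out + 1] = b and pick (math.floor (n / 64) % 64) or "="
    out [#out + 1] = c and pick (n % 64) or "="
  end -- for each group of up to 3 bytes
  return table.concat (out)
end -- base64_sketch

-- convert a string of hex digit pairs into the bytes they describe
function from_hex (hex)
  return (string.gsub (hex, "%x%x", function (h)
    return string.char (tonumber (h, 16))
  end))
end -- from_hex

-- true if any byte in s is 0x80 or above
function has_high_bit (s)
  return string.find (s, "[\128-\255]") ~= nil
end -- has_high_bit

raw = from_hex ("E5A4A7E5AEB6E5A5BD")
encoded = base64_sketch (raw)

print (encoded)                --> 5aSn5a625aW9
print (has_high_bit (raw))     --> true
print (has_high_bit (encoded)) --> false
```

Since every byte of the encoded text is below 0x80, a code page conversion that only disturbs high-bit bytes should leave it alone.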
|
| Posted by
| Nick Gammon
Australia (23,169 posts) Bio
Forum Administrator |
| Date
| Reply #42 on Tue 02 Nov 2010 06:38 AM (UTC) Amended on Tue 02 Nov 2010 07:59 PM (UTC) by Nick Gammon
|
| Message
| The Lua SQLite3 interface is built into MUSHclient anyway, and that might help, if you aren't using it already. For example, using the base64-encoding:
function unicode_convert (s)
  return utils.utf8encode (tonumber (string.match (s, "^&#(%d+);$")))
end -- unicode_convert

db = sqlite3.open (GetInfo (66) .. "utf8_test.db") -- open
db:exec "CREATE TABLE IF NOT EXISTS test (name TEXT NOT NULL);"

-- &#22823;&#23478;&#22909; are the decimal code points of 大家好
hello = string.gsub ("&#22823;&#23478;&#22909;", "&#%d+;", unicode_convert)
print ("original string=", hello)

-- insert a record
db:exec ("INSERT INTO test (name) VALUES ('" .. utils.base64encode (hello) .. "')")

for row in db:nrows ("SELECT * from test") do
  print ("string from database=", row.name)
  print ("string converted back=", utils.base64decode (row.name))
end -- for loop

db:close () -- close
And without base-64 encoding:
function unicode_convert (s)
  return utils.utf8encode (tonumber (string.match (s, "^&#(%d+);$")))
end -- unicode_convert

db = sqlite3.open (GetInfo (66) .. "utf8_test.db") -- open
db:exec "CREATE TABLE IF NOT EXISTS test (name TEXT NOT NULL);"

-- &#22823;&#23478;&#22909; are the decimal code points of 大家好
hello = string.gsub ("&#22823;&#23478;&#22909;", "&#%d+;", unicode_convert)
print ("original string=", hello)

-- insert a record
db:exec ("INSERT INTO test (name) VALUES ('" .. hello .. "')")

for row in db:nrows ("SELECT * from test") do
  print ("string from database=", row.name)
end -- for loop

db:close () -- close
That is shorter anyway. And since it sticks to Lua, you avoid the CString problems.
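One caveat with the INSERT statements above: they build the SQL by concatenation, so text containing a single quote would end the string literal early. A minimal sketch of one way to guard against that, assuming the usual SQL rule (which SQLite follows) that a quote inside a string literal is written as two quotes. The helper name sql_quote is made up for this example and is not part of MUSHclient.

```lua
-- Hypothetical helper: wrap a string in single quotes for use in SQL,
-- doubling any single quotes it contains.
function sql_quote (s)
  local escaped = string.gsub (s, "'", "''")  -- keep only the first return value
  return "'" .. escaped .. "'"
end -- sql_quote

name = "O'Brien"
sql = "INSERT INTO test (name) VALUES (" .. sql_quote (name) .. ")"
print (sql)  --> INSERT INTO test (name) VALUES ('O''Brien')
```

The resulting string could then be passed to db:exec in place of the hand-built one. Base-64 text never contains a quote, so this only matters when storing the raw string.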
|
| Posted by
| Ddid
China (19 posts) Bio
|
| Date
| Reply #43 on Tue 02 Nov 2010 11:50 AM (UTC) |
| Message
| Great!
The second piece of code is just what I want!
Many, many thanks!
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).