Gammon Forum

DatabaseStep bug?

It is now over 60 days since the last post. This thread is closed.



Posted by Ddid   China  (19 posts)  Bio
Date Reply #30 on Sun 31 Oct 2010 12:50 AM (UTC)
Message
Anyway, many thanks for your great work on MUSHclient!

Posted by Nick Gammon   Australia  (23,169 posts)  Bio   Forum Administrator
Date Reply #31 on Sun 31 Oct 2010 04:06 AM (UTC)

Amended on Sun 31 Oct 2010 06:03 AM (UTC) by Nick Gammon

Message
After fairly extensive investigations, which were slowly driving me insane[1], I have made some modifications to the Lua interface such that it bypasses the conversion to/from CString, and thus works around the problem. This applies to DatabaseColumnText, DatabaseColumnValue, and DatabaseColumnValues.

Now the Lua interface directly sets the values (without using the BSTR values) and avoids this problem.

My testing with the test program on pages 1 and 2 shows that it now returns the correct data.

However there may be other areas which have similar problems, due to the use of CString in many places.

These improvements are in version 4.66.

---

1. ... Because I kept getting different results. I was supposed to get 9 bytes out, but was getting 8. That shouldn't be too hard to fix, huh? But it jumped to 11, then 22, then down to 3, then up to 6. And then the data was just completely wrong. This will teach me to write a non-Unicode application in the future. But when I started (15 years ago, when I was young), it was just a little program to help me play MUSH games. In English.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Ddid   China  (19 posts)  Bio
Date Reply #32 on Sun 31 Oct 2010 01:39 PM (UTC)
Message
Many thanks for your hard work. I'm really looking forward to the release of the new version of MUSHclient.

Posted by Nick Gammon   Australia  (23,169 posts)  Bio   Forum Administrator
Date Reply #33 on Sun 31 Oct 2010 08:48 PM (UTC)
Message
Shortly.

Just to explain what I think is happening ...

MUSHclient is not a Unicode application (when I wrote it, I wasn't that familiar with Unicode). So internally it uses 8-bit strings. More recently it uses UTF-8 to encode Unicode, in some places.
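For readers who want to see what "UTF-8 to encode Unicode" means at the byte level, here is a minimal pure-Lua sketch. It stands in for MUSHclient's `utils.utf8encode` (which plain Lua does not have), and the helper names `utf8_encode_bmp`, `tohex` and `entities_to_utf8` are hypothetical. To keep it short it only handles code points up to U+FFFF and does no validation. It turns the three characters used later in this thread (code points 22823, 23478 and 22909) into nine bytes, every one of which has the high-order bit set.

```lua
-- Minimal UTF-8 encoder sketch for code points 0 .. 0xFFFF only
-- (4-byte sequences and surrogate checks are deliberately left out).
local function utf8_encode_bmp (cp)
  if cp < 0x80 then
    return string.char (cp)                                -- 0xxxxxxx
  elseif cp < 0x800 then
    return string.char (0xC0 + math.floor (cp / 64),       -- 110xxxxx
                        0x80 + cp % 64)                    -- 10xxxxxx
  else
    return string.char (0xE0 + math.floor (cp / 4096),     -- 1110xxxx
                        0x80 + math.floor (cp / 64) % 64,  -- 10xxxxxx
                        0x80 + cp % 64)                    -- 10xxxxxx
  end
end

-- show a string as hex, two digits per byte
local function tohex (s)
  return (s:gsub (".", function (c) return string.format ("%02X", c:byte ()) end))
end

-- replace entities such as "&#22823;" with their UTF-8 bytes
local function entities_to_utf8 (s)
  return (s:gsub ("&#(%d+);", function (n) return utf8_encode_bmp (tonumber (n)) end))
end

local hello = entities_to_utf8 ("&#22823;&#23478;&#22909;")
print (#hello)         --> 9
print (tohex (hello))  --> E5A4A7E5AEB6E5A5BD
```

Each character becomes a three-byte sequence starting with 0xE5, which is why an 8-bit program that passes these bytes through a code-page conversion can mangle them.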

However the WSH (Windows Script Host) uses BSTR to communicate between scripts and the program. The BSTR type is 16-bit Unicode data.

http://msdn.microsoft.com/en-us/library/ms221069.aspx


Now, to convert the output of a script call to BSTR, the internal libraries assume the data is in the current code page (normal ANSI for me) and do a lookup to convert characters like 0xBD from the code page to their Unicode equivalents. Since it seems to work for me, I presume that there is a one-to-one mapping for them.

But in the case of Chinese code pages, some bytes, like 0xBD, must translate into something else.

Then, when it is time to convert them back into 8-bit strings (e.g., for Lua), the process is reversed. All seems to work fine provided each 8-bit character can be translated into Unicode and back again without being changed or discarded. With some code pages enabled, obviously this isn't happening.
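A simplified way to see how a Chinese code page can lose data: in GBK (code page 936), every byte from 0x81 to 0xFE starts a two-byte character. The sketch below is a hypothetical model, not the actual Windows conversion code: it pairs each such lead byte with whatever byte follows and drops a lead byte left without a partner. Fed the nine UTF-8 bytes of the test string, it keeps only eight, which happens to match the `length after add= 8` and `E5A4A7E5AEB6E5A5` output reported later in this thread. The helper names are made up for illustration.

```lua
-- Simplified model (assumption): any byte 0x81-0xFE is treated as a GBK
-- lead byte that consumes the following byte as its trail byte. A lead
-- byte with no trail byte (end of string) is dropped as an incomplete
-- character. Real Windows conversions are more involved than this.

local function fromhex (hex)
  return (hex:gsub ("%x%x", function (h) return string.char (tonumber (h, 16)) end))
end

local function tohex (s)
  return (s:gsub (".", function (c) return string.format ("%02X", c:byte ()) end))
end

local function gbk_roundtrip_model (s)
  local out = {}
  local i = 1
  while i <= #s do
    local b = s:byte (i)
    if b >= 0x81 and b <= 0xFE then
      if i < #s then
        -- lead byte plus trail byte form one double-byte character
        out [#out + 1] = s:sub (i, i + 1)
        i = i + 2
      else
        -- lone lead byte at the end: no trail byte, so it is discarded
        i = i + 1
      end
    else
      -- single-byte (ASCII) character passes through unchanged
      out [#out + 1] = s:sub (i, i)
      i = i + 1
    end
  end
  return table.concat (out)
end

local utf8_bytes = fromhex ("E5A4A7E5AEB6E5A5BD")  -- UTF-8 for the three test characters
local result = gbk_roundtrip_model (utf8_bytes)

print (#utf8_bytes)     --> 9
print (#result)         --> 8
print (tohex (result))  --> E5A4A7E5AEB6E5A5
```

The UTF-8 bytes get paired up as E5A4, A7E5, AEB6 and E5A5, leaving the final 0xBD stranded as a lead byte with nothing after it.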

- Nick Gammon


Posted by Nick Gammon   Australia  (23,169 posts)  Bio   Forum Administrator
Date Reply #34 on Sun 31 Oct 2010 09:16 PM (UTC)
Message
One additional point: why did the data get into the database OK, but not out of it? The answer is that in the Lua -> Database direction, the data is not encoded into BSTR. It is simply copied across as "const char *" and thus is not altered in any way.

It is in the reverse direction that the problem applies.

However, enough talk. In about 15 minutes the new version should be available.

- Nick Gammon


Posted by Ddid   China  (19 posts)  Bio
Date Reply #35 on Mon 01 Nov 2010 09:13 AM (UTC)
Message
I'm sorry to report this bug again in the new version of MUSHclient (4.66):

length before add= 9
length after add= 8
name= E5A4A7E5AEB6E5A5
length from SQL= 3

Posted by Nick Gammon   Australia  (23,169 posts)  Bio   Forum Administrator
Date Reply #36 on Mon 01 Nov 2010 09:50 AM (UTC)
Message
Well, that's odd. What encoding do you have your system set to?

- Nick Gammon


Posted by Ddid   China  (19 posts)  Bio
Date Reply #37 on Mon 01 Nov 2010 10:10 AM (UTC)
Message
I'm not sure; it should be Unicode.

Posted by Ddid   China  (19 posts)  Bio
Date Reply #38 on Mon 01 Nov 2010 10:43 AM (UTC)
Message
Please ignore my last answer: my system encoding is GBK (code page 936).

I just changed my system encoding setting to English (US), and the test code returned the correct result:

length before add= 9
length after add= 9
name= E5A4A7E5AEB6E5A5BD
length from SQL= 3

Posted by Nick Gammon   Australia  (23,169 posts)  Bio   Forum Administrator
Date Reply #39 on Mon 01 Nov 2010 09:28 PM (UTC)
Message
I set my code page to 936, and with version 4.66 of MUSHclient, it worked OK.


- Nick Gammon


Posted by Ddid   China  (19 posts)  Bio
Date Reply #40 on Tue 02 Nov 2010 05:02 AM (UTC)
Message
I'm really confused. Here is my situation:

I'm using the English version of Windows XP with the Simplified Chinese language pack, so I can change the language setting for non-Unicode programs. When I use English (US) as that setting, SQLite3 database access works, but MUSHclient's layout, fonts, paths, and so on are broken. When I use Chinese (PRC), everything works except SQLite3 database access (for Chinese content).

So I have decided to give up using MUSHclient's SQLite3 script functions. In fact, I spent some time on LuaSQL yesterday evening (Beijing time), and it works fine with MUSHclient.

Anyway, many thanks for your help. I have learned a lot from your great work.

Posted by Nick Gammon   Australia  (23,169 posts)  Bio   Forum Administrator
Date Reply #41 on Tue 02 Nov 2010 06:26 AM (UTC)

Amended on Tue 02 Nov 2010 07:56 PM (UTC) by Nick Gammon

Message
Well, there is a way around it ...

Instead of storing UTF-8, store base-64 encoded UTF-8. That increases the amount stored by about a third (every 3 bytes become 4 characters), but it means you are not storing bytes with the high-order bit set. That should work for everyone. Example code:


-- helper function to convert Unicode sequences
function unicode_convert (s)
  return  utils.utf8encode (tonumber (string.match (s, "^&#(%d+);$")))
end -- unicode_convert  


DatabaseOpen ("db", GetInfo (66) .. "utf8_test.db", 6)  -- 6 = SQLITE_OPEN_READWRITE (2) + SQLITE_OPEN_CREATE (4)

DatabaseExec ("db", "CREATE TABLE IF NOT EXISTS test (name TEXT NOT NULL);")

hello = string.gsub ("&#22823;&#23478;&#22909;", "&#%d+;", unicode_convert)  -- entities for 大家好

print ("original string=", hello)

-- insert a record
DatabaseExec ("db", "INSERT INTO test (name) VALUES ('" .. utils.base64encode (hello) .. "')") 
      
-- prepare a query
DatabasePrepare ("db", "SELECT * from test")

-- execute to get the first row
rc = DatabaseStep ("db")  -- read first row

-- now loop, displaying each row, and getting the next one
while rc == 100 do  -- 100 is SQLITE_ROW: another row is available
  
  values = DatabaseColumnValues ("db")

  print ("string from database=", values [1])

  print ("string converted back=", utils.base64decode ( values [1]) )

  rc = DatabaseStep ("db")  -- read next row

end -- while loop

-- finished with the statement
DatabaseFinalize ("db")


DatabaseClose ("db")  -- close it



What we actually store in the database is "5aSn5a625aW9", which is the UTF-8 data (hex E5A4A7E5AEB6E5A5BD) encoded in base-64.

See here for proof:


print (utils.base64encode (utils.fromhex ("E5A4A7E5AEB6E5A5BD"))) --> 5aSn5a625aW9



But because the stored text consists only of ASCII letters, digits, "+" and "/" (plus "=" padding), it should not be wrongly decoded by the code page conversion. At least I hope not. That code works for me with the encoding set to English, and also Chinese.
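Outside MUSHclient there is no `utils.base64encode` or `utils.fromhex`, so here is a minimal pure-Lua sketch (standard RFC 4648 alphabet with "=" padding) that reproduces the stored value above and checks that the encoded text contains no byte with the high-order bit set. It is not MUSHclient's own implementation; `fromhex`, `base64_encode` and `base64_decode` are hypothetical helper names.

```lua
-- Minimal base-64 sketch (RFC 4648 alphabet, "=" padding).
-- Plain-Lua stand-in for MUSHclient's utils.base64encode / utils.base64decode.
local alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

local function fromhex (hex)
  return (hex:gsub ("%x%x", function (h) return string.char (tonumber (h, 16)) end))
end

local function base64_encode (s)
  local out = {}
  for i = 1, #s, 3 do
    local b1, b2, b3 = s:byte (i, i + 2)
    -- pack up to three bytes into one 24-bit number (missing bytes count as 0)
    local n = b1 * 65536 + (b2 or 0) * 256 + (b3 or 0)
    -- split it into four 6-bit indexes into the alphabet
    local c1 = math.floor (n / 262144) % 64
    local c2 = math.floor (n / 4096) % 64
    local c3 = math.floor (n / 64) % 64
    local c4 = n % 64
    out [#out + 1] = alphabet:sub (c1 + 1, c1 + 1)
    out [#out + 1] = alphabet:sub (c2 + 1, c2 + 1)
    out [#out + 1] = b2 and alphabet:sub (c3 + 1, c3 + 1) or "="
    out [#out + 1] = b3 and alphabet:sub (c4 + 1, c4 + 1) or "="
  end
  return table.concat (out)
end

local function base64_decode (s)
  local out = {}
  s = s:gsub ("[^A-Za-z0-9%+/]", "")  -- drop "=" padding and anything else
  for i = 1, #s, 4 do
    local chunk = s:sub (i, i + 3)
    local n = 0
    for j = 1, 4 do
      local c = chunk:sub (j, j)
      local v = (c ~= "") and (alphabet:find (c, 1, true) - 1) or 0
      n = n * 64 + v
    end
    -- a chunk of k characters (k = 2, 3 or 4) carries k - 1 bytes
    local bytes = { math.floor (n / 65536) % 256, math.floor (n / 256) % 256, n % 256 }
    for k = 1, #chunk - 1 do
      out [#out + 1] = string.char (bytes [k])
    end
  end
  return table.concat (out)
end

local stored = base64_encode (fromhex ("E5A4A7E5AEB6E5A5BD"))
print (stored)                       --> 5aSn5a625aW9
print (stored:find ("[\128-\255]"))  --> nil (no high-bit bytes)
print (base64_decode (stored) == fromhex ("E5A4A7E5AEB6E5A5BD"))  --> true
```

Since every output character comes from a 64-character ASCII alphabet, a code-page lookup has nothing to reinterpret, and decoding gives back the original nine bytes.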

- Nick Gammon


Posted by Nick Gammon   Australia  (23,169 posts)  Bio   Forum Administrator
Date Reply #42 on Tue 02 Nov 2010 06:38 AM (UTC)

Amended on Tue 02 Nov 2010 07:59 PM (UTC) by Nick Gammon

Message
The Lua SQLite3 interface is built into MUSHclient anyway, and that might help, if you aren't using it already. For example, using the base64-encoding:


function unicode_convert (s)
  return utils.utf8encode (tonumber (string.match (s, "^&#(%d+);$")))
end -- unicode_convert 

db = sqlite3.open(GetInfo (66) .. "utf8_test.db")  -- open

db:exec "CREATE TABLE IF NOT EXISTS test (name TEXT NOT NULL);"

hello = string.gsub ("&#22823;&#23478;&#22909;", "&#%d+;", unicode_convert)  -- entities for 大家好

print ("original string=", hello)

-- insert a record
db:exec ("INSERT INTO test (name) VALUES ('" .. utils.base64encode (hello) .. "')") 
      
for row in db:nrows ("SELECT * from test") do
  print ("string from database=", row.name)
  print ("string converted back=", utils.base64decode ( row.name ) )
end -- for loop

db:close()  -- close


And without base-64 encoding:


function unicode_convert (s)
  return utils.utf8encode (tonumber (string.match (s, "^&#(%d+);$")))
end -- unicode_convert  

db = sqlite3.open(GetInfo (66) .. "utf8_test.db")  -- open

db:exec "CREATE TABLE IF NOT EXISTS test (name TEXT NOT NULL);"

hello = string.gsub ("&#22823;&#23478;&#22909;", "&#%d+;", unicode_convert)  -- entities for 大家好

print ("original string=", hello)

-- insert a record
db:exec ("INSERT INTO test (name) VALUES ('" .. hello .. "')") 
      
for row in db:nrows ("SELECT * from test") do
  print ("string from database=", row.name)
end -- for loop

db:close()  -- close


That is shorter anyway. And since it sticks to Lua, you avoid the CString problems.


- Nick Gammon


Posted by Ddid   China  (19 posts)  Bio
Date Reply #43 on Tue 02 Nov 2010 11:50 AM (UTC)
Message
Great!

The second example is what I want!

Many many thanks!

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


This is page 3; the subject is 3 pages long.


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.