Column functions not utf8 compatible
Posted by
| Fiendish
USA (2,533 posts) Bio
Global Moderator |
Date
| Wed 31 Jul 2024 12:39 PM (UTC) Amended on Fri 02 Aug 2024 12:28 PM (UTC) by Fiendish
|
Message
| Column selection metrics appear to use bytes instead of columns, so GetSelectionStartColumn, GetSelectionEndColumn, and SetSelection all have incorrect column values for UTF8.
This means, for example, that selecting a single UTF8 character on a row of 3-byte characters gives a start column of (index*3)+1 and an end column of startcol+3, instead of the displayed positions. |
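To make that arithmetic concrete, here is a minimal standalone C++ sketch. It is not MUSHclient code: the names `utf8Length` and `byteColumnRange` are made up for illustration, and it assumes 1-based columns with an end column one past the selected character, which is what the (index*3)+1 and startcol+3 figures imply.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Length in bytes of the UTF-8 sequence that starts with `lead`.
static std::size_t utf8Length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // invalid lead byte: step over it
}

// Hypothetical helper: the 1-based byte columns (start, one-past-end) of the
// character at 0-based position `index` in a UTF-8 encoded line.
std::pair<int, int> byteColumnRange(const std::string& line, int index) {
    assert(index >= 0);
    std::size_t pos = 0;
    // Skip over the `index` characters that come before the selected one.
    for (int seen = 0; seen < index && pos < line.size(); ++seen) {
        pos += utf8Length(static_cast<unsigned char>(line[pos]));
    }
    if (pos >= line.size()) {
        // Index is past the last character: return an empty range at the end.
        const int past = static_cast<int>(line.size()) + 1;
        return {past, past};
    }
    const std::size_t len = utf8Length(static_cast<unsigned char>(line[pos]));
    const int start = static_cast<int>(pos) + 1;
    return {start, start + static_cast<int>(len)};
}
```

On a row of four U+2588 characters (bytes E2 96 88 each), `byteColumnRange(row, 2)` gives start 7 and end 10, whereas the character is displayed in column 3.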
https://github.com/fiendish/aardwolfclientpackage | Top |
|
Posted by
| Nick Gammon
Australia (23,046 posts) Bio
Forum Administrator |
Date
| Reply #1 on Wed 31 Jul 2024 09:40 PM (UTC) |
Message
| |
Posted by
| Fiendish
USA (2,533 posts) Bio
Global Moderator |
Date
| Reply #2 on Thu 01 Aug 2024 07:04 PM (UTC) Amended on Thu 01 Aug 2024 07:21 PM (UTC) by Fiendish
|
Message
|
Quote: What problems is this causing you?
You mean in a more practical sense than just the fact that the results are off?
I have a plugin that fakes split-screen output history scrolling with a miniwindow overlay. That plugin transfers selection back and forth so that the right text stays selected. But it fails when the selected text includes utf8 symbols, because the columns end up being wrong.
Here's a screen recording comparing text (working) and non-text (broken) selection:
https://imgur.com/a/iD75n8A
ChatGPT suggests retrieving the text from the edit control and computing the selection columns manually :-|
Straight out of chatgpt, completely untested:
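#include <afxwin.h>
#include <string>
// Function to convert byte offset to character offset in a UTF-8 string
int ByteOffsetToCharOffset(const std::string& utf8Str, int byteOffset) {
    const int size = static_cast<int>(utf8Str.size());
    int charOffset = 0;
    int i = 0;
    // Also stop at the end of the string so an oversized offset cannot read out of bounds
    while (i < byteOffset && i < size) {
        const unsigned char lead = static_cast<unsigned char>(utf8Str[i]);
        if ((lead & 0x80) == 0) {
            i += 1;  // 1-byte (ASCII) character
        } else if ((lead & 0xE0) == 0xC0) {
            i += 2;  // 2-byte character
        } else if ((lead & 0xF0) == 0xE0) {
            i += 3;  // 3-byte character
        } else if ((lead & 0xF8) == 0xF0) {
            i += 4;  // 4-byte character
        } else {
            i += 1;  // stray continuation or invalid byte: step over it
        }
        charOffset++;
    }
    return charOffset;
}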
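// Function to convert character offset to byte offset in a UTF-8 string
int CharOffsetToByteOffset(const std::string& utf8Str, int charOffset) {
    const int size = static_cast<int>(utf8Str.size());  // avoids a signed/unsigned comparison
    int byteOffset = 0;
    int charCount = 0;
    while (charCount < charOffset && byteOffset < size) {
        const unsigned char lead = static_cast<unsigned char>(utf8Str[byteOffset]);
        if ((lead & 0x80) == 0) {
            byteOffset += 1;  // 1-byte (ASCII) character
        } else if ((lead & 0xE0) == 0xC0) {
            byteOffset += 2;  // 2-byte character
        } else if ((lead & 0xF0) == 0xE0) {
            byteOffset += 3;  // 3-byte character
        } else if ((lead & 0xF8) == 0xF0) {
            byteOffset += 4;  // 4-byte character
        } else {
            byteOffset += 1;  // stray continuation or invalid byte: step over it
        }
        charCount++;
    }
    // A truncated multi-byte sequence at the end must not push the result past the string
    return byteOffset < size ? byteOffset : size;
}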
// Function to get the corrected selection offsets (character offsets) from a CEdit control
// Note: assumes the offsets reported by GetSel are byte offsets into the UTF-8 text,
// which depends on how the control stores its text
void GetCorrectedSelection(CEdit& editCtrl, int& startCharOffset, int& endCharOffset) {
    CString text;
    editCtrl.GetWindowText(text);
    std::string utf8Str(CT2A(text, CP_UTF8));
    // CEdit::GetSel takes int references, not DWORD
    int startByteOffset = 0, endByteOffset = 0;
    editCtrl.GetSel(startByteOffset, endByteOffset);
    startCharOffset = ByteOffsetToCharOffset(utf8Str, startByteOffset);
    endCharOffset = ByteOffsetToCharOffset(utf8Str, endByteOffset);
}
// Function to set the selection in a CEdit control using character offsets
void SetCorrectedSelection(CEdit& editCtrl, int startCharOffset, int endCharOffset) {
    CString text;
    editCtrl.GetWindowText(text);
    std::string utf8Str(CT2A(text, CP_UTF8));
    int startByteOffset = CharOffsetToByteOffset(utf8Str, startCharOffset);
    int endByteOffset = CharOffsetToByteOffset(utf8Str, endCharOffset);
    editCtrl.SetSel(startByteOffset, endByteOffset);
}
// Example usage in your message handler or wherever appropriate
void SomeFunction() {
    CEdit editCtrl; // Assuming this is properly initialized and points to your edit control
    int startCharOffset, endCharOffset;
    // Get the corrected selection
    GetCorrectedSelection(editCtrl, startCharOffset, endCharOffset);
    // Now startCharOffset and endCharOffset are the correct character offsets
    // For demonstration, let's set the selection back to the same offsets
    SetCorrectedSelection(editCtrl, startCharOffset, endCharOffset);
}
I might try it on the Lua side first. |
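For reference, the conversion itself does not need MFC or an edit control. Below is a rough sketch in plain C++ of the column mapping in both directions, the sort of logic that could be ported to a script. It uses a different technique from the code above: it counts the bytes that are not UTF-8 continuation bytes (10xxxxxx) rather than decoding lead-byte lengths. The function names are hypothetical, columns are assumed to be 1-based, and it counts code points only, so double-width or combining characters would still need extra handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool isContinuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// 1-based byte column -> 1-based character column.
// A byte column in the middle of a character maps to that character.
int byteColToCharCol(const std::string& line, int byteCol) {
    assert(byteCol >= 1);
    std::size_t limit =
        std::min(static_cast<std::size_t>(byteCol - 1), line.size());
    // Snap back to the first byte of the character that contains this byte.
    while (limit > 0 && limit < line.size() && isContinuation(line[limit])) {
        --limit;
    }
    int chars = 0;
    for (std::size_t i = 0; i < limit; ++i) {
        if (!isContinuation(line[i])) ++chars;  // count character starts
    }
    return chars + 1;
}

// 1-based character column -> 1-based byte column of its first byte.
// Columns past the last character map to one past the last byte.
int charColToByteCol(const std::string& line, int charCol) {
    assert(charCol >= 1);
    int chars = 0;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (!isContinuation(line[i]) && ++chars == charCol) {
            return static_cast<int>(i) + 1;
        }
    }
    return static_cast<int>(line.size()) + 1;
}
```

Under these assumptions, byte columns 7 and 10 on a row of 3-byte characters map to character columns 3 and 4, and `charColToByteCol` maps them back.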
https://github.com/fiendish/aardwolfclientpackage | Top |
|
Posted by
| Nick Gammon
Australia (23,046 posts) Bio
Forum Administrator |
Date
| Reply #3 on Fri 02 Aug 2024 03:16 AM (UTC) |
Message
|
Quote:
You mean in a more practical sense than just the fact that the results are off?
Yeah.
Looking at your video, it appears to me you are selecting text in the output area, not the command area. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Fiendish
USA (2,533 posts) Bio
Global Moderator |
Date
| Reply #4 on Fri 02 Aug 2024 12:29 PM (UTC) Amended on Fri 02 Aug 2024 12:30 PM (UTC) by Fiendish
|
Message
|
Nick Gammon said:
Looking at your video, it appears to me you are selecting text in the output area, not the command area.
Right. Sorry. I've edited the original post to remove mention of GetInfo 236/237. I'm pretty sure the command area doesn't support utf8 anyway. I only care about the output area. |
https://github.com/fiendish/aardwolfclientpackage | Top |
|
Posted by
| Nick Gammon
Australia (23,046 posts) Bio
Forum Administrator |
Date
| Reply #5 on Fri 02 Aug 2024 09:18 PM (UTC) |
Message
| |