Gammon Forum

Reading and concatenating strings on the Arduino ... the best method?

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Tue 13 Aug 2013 05:46 AM (UTC)

Amended on Sun 18 Aug 2013 11:03 PM (UTC) by Nick Gammon

This page can be quickly reached from the link: http://www.gammon.com.au/concat


Introduction


Often an Arduino is used to process incoming string data. For example:


  • Location data from a GPS
  • Card data from an RFID reader
  • Commands from a connected computer
  • Commands from another Arduino
  • Information from a sensor


There is usually quite a bit of debate about the "best" way of handling this incoming data. By its nature the data arrives one byte at a time (usually) from a serial port, although it could also come from SPI or I2C connections.

Bearing in mind that there is usually a shortage of both RAM (Random Access Memory), which is used to store variables, and program memory (flash, which is also where data marked PROGMEM lives), which is used to store your program (or "sketch", as the Arduino calls it), we tend to want methods that minimize the use of both. Oh, and they should also be as fast as reasonably possible, so we can do something useful with the data.

A note about the "delay" function


None of the examples below use the Arduino "delay" function call. It is not necessary to use this to process incoming serial data, and in many cases its use actually causes problems. You are best off avoiding delays, and instead using the techniques shown below to build up a string a byte at a time in the main loop, which leaves the processor free to do other things.
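
One common way to avoid "delay" is to compare timestamps rather than wait. The helper below is only a minimal sketch of that idea: the name intervalElapsed is made up for illustration, and on an Arduino the "now" argument would typically come from millis ().

```cpp
// Hypothetical helper (not part of the Arduino core): returns true once
// at least `interval` milliseconds have passed since `last`.
// Unsigned subtraction gives the right answer even after the counter wraps.
bool intervalElapsed (unsigned long now, unsigned long last, unsigned long interval)
  {
  return (now - last) >= interval;
  }  // end of intervalElapsed
```

In loop you might call it as intervalElapsed (millis (), lastBlink, 500), where lastBlink is a variable of your own that you update whenever the function returns true. When it returns false you fall straight through to reading serial data.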

Possible methods of storing strings


The main methods you could choose are:


  • Use C-style strings
  • Use the Arduino "String" class
  • Use the STL (Standard Template Library) "string" class
  • Use a state machine


Concatenation


The word "concatenate" means "join together". In general, programmers read incoming characters from the serial port, adding (concatenating) them to the existing string, until some sort of delimiter is reached (e.g. the newline character). When the delimiter is reached, the entire string is then processed.

An example of a string from a GPS is:


$GPRMC,161229.487,A,3723.2475,N,12158.3416,W,0.13,309.62,120598,,*10


As incoming data you would receive '$', then 'G', then 'P', then 'R' and so on.

Thus, to build up the whole string you concatenate each character until the newline character is received.

The fourth method mentioned above (a state machine) uses an alternative approach that I will describe later. This does not require the entire string to be stored at once.

The general technique for concatenation is thus:


  • Declare a variable to hold the whole string, making it initially empty
  • As each byte arrives, add it to the end of the string
  • When the delimiter arrives (the end-of-line character) the string is considered "complete" and we now parse it to extract out useful information (such as the latitude and longitude).
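
As an illustration of the last step (parsing the completed string), here is a minimal sketch that copies one comma-separated field out of a line such as the GPS example above. The function name getField and its parameters are assumptions made for this example, not part of any library.

```cpp
#include <string.h>

// Hypothetical helper: copy comma-separated field number `index` (0-based)
// from `line` into `out`. Field 0 is the "$GPRMC" header itself.
// Returns false if the field does not exist or does not fit in `out`.
bool getField (const char * line, unsigned int index, char * out, size_t outSize)
  {
  const char * start = line;

  // skip past `index` commas
  for (unsigned int i = 0; i < index; i++)
    {
    start = strchr (start, ',');
    if (start == NULL)
      return false;   // not that many fields
    start++;          // step over the comma
    }

  // the field ends at the next comma, at the '*' before the checksum,
  // or at the end of the string
  size_t len = strcspn (start, ",*");
  if (len >= outSize)
    return false;     // no room for the field plus the terminating null

  memcpy (out, start, len);
  out [len] = 0;      // terminating null byte
  return true;
  }  // end of getField
```

With the example sentence, field 3 is the latitude (3723.2475) and field 5 is the longitude (12158.3416).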


Using C-style strings


This technique requires an array of characters to be allocated, where you have to know the size in advance. In the example below "inputLine" is this array, and we have chosen to allow for 100 characters, which is a bit more than the size of the expected string from the GPS.


const unsigned int MAX_INPUT = 100;  // how much serial data we expect before a newline
char inputLine [MAX_INPUT];          // where to store the string
unsigned int inputPosition = 0;      // how much we have stored

void setup ()
  {
  Serial.begin(115200);
  } // end of setup

// here to process incoming serial data after a terminator received
void processData (const char * data)
  {
  
  // decode the data here
  
  }  // end of processData
  

void processIncomingByte (const byte c)
  {
  switch (c)
    {
    case '\n':   // end of text
      inputLine [inputPosition] = 0;  // terminating null byte
      
      // terminator reached! process inputLine here ...
      processData (inputLine);
      
      // reset buffer for next time
      inputPosition = 0;  
      break;
  
    default:
      // keep adding if not full ... allow for terminating null byte
      if (inputPosition < (MAX_INPUT - 1))
        inputLine [inputPosition++] = c;
      break;
  
    }  // end of switch
  } // end of processIncomingByte

void loop()
  {

  if (Serial.available () > 0)
    processIncomingByte (Serial.read ());
    
  // do other stuff here like testing digital input (button presses) ...

  }  // end of loop


Analysis


  • Time taken to concatenate 100 bytes: 44 µs.
  • Memory used: 517 bytes.
  • Sketch size: 1,782 bytes.
  • Fragmentation of dynamic memory: none
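
Two things the sketch above does quietly are worth knowing about: characters that arrive once the buffer is full are dropped, and a carriage return sent before the newline would be stored as part of the string. The variation below is a sketch only, and all the names in it (LineReader, lineReaderFeed and so on) are invented for this example. It records that an overflow happened and discards carriage returns, and it is written without any Arduino calls so it can be tried on a desktop compiler as well.

```cpp
#include <stddef.h>

const size_t LINE_BUFFER_SIZE = 100;   // hypothetical name; size includes the null byte

struct LineReader
  {
  char buffer [LINE_BUFFER_SIZE];  // where the line is built up
  size_t position;                 // how much we have stored
  bool overflowed;                 // true if characters had to be dropped
  };

// call once at startup, and again after each line has been processed
void lineReaderReset (LineReader & r)
  {
  r.position = 0;
  r.overflowed = false;
  r.buffer [0] = 0;
  }  // end of lineReaderReset

// feed one incoming byte; returns true when r.buffer holds a complete line
bool lineReaderFeed (LineReader & r, const char c)
  {
  switch (c)
    {
    case '\n':   // end of text
      r.buffer [r.position] = 0;  // terminating null byte
      return true;

    case '\r':   // discard carriage return
      return false;

    default:
      // keep adding if not full ... allow for terminating null byte
      if (r.position < LINE_BUFFER_SIZE - 1)
        r.buffer [r.position++] = c;
      else
        r.overflowed = true;      // remember that the line was truncated
      return false;
    }  // end of switch
  }  // end of lineReaderFeed
```

When lineReaderFeed returns true you would process r.buffer (perhaps ignoring the line if r.overflowed is set) and then call lineReaderReset.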


Using the Arduino "String" class


The String class is part of the Arduino IDE (Integrated Development Environment). You don't need to install it, and thus there are no "#include" directives needed. It is simple to use, but that simplicity comes at a cost: speed and program size.


String inputLine;           // where to store the string

void setup ()
  {
  Serial.begin(115200);
  } // end of setup

// here to process incoming serial data after a terminator received
// (passed by reference so that the String is not copied)
void processData (const String & data)
  {
  
  // decode the data here
  
  }  // end of processData
  

void processIncomingByte (const byte c)
  {
  switch (c)
    {
    case '\n':   // end of text
      // terminator reached! process inputLine here ...
      processData (inputLine);
      
      // reset for next time
      inputLine = "";  
      break;
  
    default:
      // keep adding (cast to char, otherwise the byte is appended as a decimal number)
      inputLine += (char) c;
      break;
  
    }  // end of switch
  } // end of processIncomingByte

void loop()
  {

  if (Serial.available () > 0)
    processIncomingByte (Serial.read ());
    
  // do other stuff here like testing digital input (button presses) ...

  }  // end of loop


Analysis


  • Time taken to concatenate 100 bytes: 2,480 µs.
  • Memory used: 526 bytes.
  • Sketch size: 3,746 bytes.
  • Fragmentation of dynamic memory: none


This particular example did not fragment memory. However, when concatenating a byte at a time there is always a danger that the memory allocations required cause fragments of unused memory to start collecting. Over time this can result in free memory disappearing, with the possible result that your program crashes, maybe hours later.
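
To get a feel for why growing a string a byte at a time is hard on the memory allocator, here is a deliberately simplified model. It is an assumption made for illustration, not the real String or allocator code. It just counts how many times a buffer would need to be re-allocated while appending a given number of characters, if the capacity grows by a fixed chunk each time.

```cpp
#include <stddef.h>

// Simplified model only: count the re-allocations needed to append
// `length` characters when each re-allocation adds `chunk` bytes of capacity.
unsigned int countReallocations (size_t length, size_t chunk)
  {
  size_t capacity = 0;
  unsigned int reallocations = 0;

  for (size_t used = 1; used <= length; used++)
    {
    if (used > capacity)
      {
      capacity += chunk;   // grow the (imaginary) buffer
      reallocations++;
      }
    }  // end of for each appended character

  return reallocations;
  }  // end of countReallocations
```

In this model, appending 100 characters with a chunk of 1 needs 100 re-allocations, a chunk of 16 (an arbitrary figure) needs 7, and a chunk of 100 needs just one. Every re-allocation is a chance for the old block to be left behind as a fragment.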


Using the Standard Template Library "string" class


The Standard Template Library (STL) comes with its own "string" class (note the lower-case "s") which behaves in a similar way to the Arduino one. However it has its own advantages.

To use it you need to download it from:

http://andybrown.me.uk/ws/2011/01/15/the-standard-template-library-stl-for-avr-with-c-streams/

Then follow the instructions on that page for installing it. Basically you have to copy a whole lot of files into the hardware/tools/avr/avr/include subdirectory of the Arduino installation.



#include <iterator>
#include <string>
#include <pnew.cpp>  // placement new implementation

std::string inputLine;           // where to store the string

void setup ()
  {
  Serial.begin(115200);
  } // end of setup

// here to process incoming serial data after a terminator received
void processData (const char * data)
  {
  
  // decode the data here
  
  }  // end of processData
  

void processIncomingByte (const byte c)
  {
  switch (c)
    {
    case '\n':   // end of text
      // terminator reached! process inputLine here ...
      processData (inputLine.c_str ());
      
      // reset for next time
      inputLine.clear ();  
      break;
  
    default:
      // keep adding
      inputLine += c;
      break;
  
    }  // end of switch
  } // end of processIncomingByte

void loop()
  {

  if (Serial.available () > 0)
    processIncomingByte (Serial.read ());
    
  // do other stuff here like testing digital input (button presses) ...

  }  // end of loop



Analysis


  • Time taken to concatenate 100 bytes: 468 µs.
  • Memory used: 554 bytes.
  • Sketch size: 2,994 bytes.
  • Fragmentation of dynamic memory: one block of 115 bytes.


This example fragmented memory (one block). However, since the string class (as opposed to the String class) allocates memory in larger chunks, the fragmentation should be more controlled. That is, there should not be lots of tiny fragments of memory. You can reduce this fragmentation by using the "reserve" function. For example:


  inputLine.reserve (100);   // reserve 100 bytes


The difference between this and C-style strings is that although we have reserved 100 bytes for the string (to reduce fragmentation of memory), it is still possible to keep appending past the 100-character mark without causing problems (except, possibly, running out of memory).
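
The point about appending past the reserved size can be checked with a few lines of standard C++. This is a sketch with a made-up function name (appendPastReserve). It uses only std::string, so it can be tried on a desktop compiler, or on the Arduino once the STL port described above is installed.

```cpp
#include <stddef.h>
#include <string>

// Hypothetical helper: reserve `reserved` bytes, then append `count` characters.
// Appending more than was reserved is allowed; the string simply grows.
std::string appendPastReserve (size_t reserved, size_t count)
  {
  std::string s;
  s.reserve (reserved);      // one allocation up front

  for (size_t i = 0; i < count; i++)
    s += 'a';                // grows past `reserved` if it has to

  return s;
  }  // end of appendPastReserve
```

For example, appendPastReserve (100, 150) returns a string whose length is 150, even though only 100 bytes were reserved.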

Adding that line to the test sketch reduced the time to concatenate 100 bytes from 468 µs to 300 µs, and the size of the free block from 115 bytes to 8 bytes.

Timing sketch


The sketch I used to compare timings and RAM usage was:


#include <ProfileTimer.h>
#include <iterator>
#include <string>
#include <pnew.cpp>  // placement new implementation
#include <memdebug.h>

const int STRING_SIZE = 100;

void showMemoryUsed ()
  {
  Serial.print (F("Memory free currently = "));
  Serial.println (getFreeMemory ());
  Serial.print (F("Memory used currently = "));
  Serial.println (2048 - getFreeMemory ());  // 2048 = total RAM in bytes on an ATmega328 (eg. Uno)
  Serial.print (F("Largest available memory block = "));
  Serial.println (getLargestAvailableMemoryBlock ());
  Serial.print (F("Largest block in free list = "));
  Serial.println (getLargestBlockInFreeList ());
  Serial.print (F("Number of blocks in free list = "));
  Serial.println (getNumberOfBlocksInFreeList ());
  }   // end of showMemoryUsed
  
void setup ()
  {
  Serial.begin (115200);
  Serial.println ();

  {  
  String s1;
   {
    ProfileTimer t ("concatenating String");
    
    for (int i = 0; i < STRING_SIZE; i++)
      s1 += 'a';
    }  // end timed bit of code
   Serial.println (s1);
   showMemoryUsed ();
   Serial.println ();
   
  }
  
  {
  std::string s2;
   {
    ProfileTimer t ("concatenating string");
    
    for (int i = 0; i < STRING_SIZE; i++)
      s2 += 'a';
    }  // end timed bit of code
   Serial.println (s2.c_str ());
   showMemoryUsed ();
   Serial.println ();
  }

  {
  char a [STRING_SIZE + 1];
    
   {
    ProfileTimer t ("concatenating char array");
    for (int i = 0; i < STRING_SIZE; i++)
      a [i] = 'a';
    a [STRING_SIZE] = 0;    // terminating null
    }  // end timed bit of code
   Serial.println (a);
   showMemoryUsed ();
   Serial.println ();
  } 
     
  }  // end of setup
void loop () { }


State machine


An alternative to all this concatenating is to use a "state machine".

See the Wikipedia article on Finite-state machine for a theoretical description.

In this case you process each character (without storing them) and use each one to change an internal state. For example, at the start of the line when you are expecting "$GPRMC" to arrive, you might have the following states:


  • At start of line (expecting '$')
  • Got $, expecting 'G'
  • Got G, expecting 'P'
  • Got P, expecting 'R'
  • Got R, expecting 'M'


... and so on.

Knowing the format of the incoming data you can then split off things like the date, time, latitude and longitude into variables "on the fly" without waiting for the whole string to arrive.
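
To make that more concrete, below is a minimal sketch of such a state machine. It is an illustration only, with invented names (GpsParser, gpsParserFeed and so on), and it is not the code from the page linked below. Rather than one state per header character it keeps an index into the expected header, which amounts to the same thing. It pulls out just the latitude (the third field after the header) and never stores the rest of the line.

```cpp
#include <stddef.h>

enum ParserState { WAIT_START, MATCH_HEADER, READ_FIELDS, SKIP_LINE };

struct GpsParser
  {
  ParserState state;
  size_t headerIndex;    // how much of "GPRMC," has matched so far
  unsigned int field;    // which field we are in (1 = time, 3 = latitude)
  char latitude [12];    // the only part of the line we keep
  size_t latLength;
  };

void gpsParserInit (GpsParser & p)
  {
  p.state = WAIT_START;
  p.headerIndex = 0;
  p.field = 0;
  p.latLength = 0;
  p.latitude [0] = 0;
  }  // end of gpsParserInit

// feed one byte; returns true at the end of a $GPRMC line
bool gpsParserFeed (GpsParser & p, const char c)
  {
  static const char header [] = "GPRMC,";

  if (c == '\n')   // end of line: report success if we were inside a $GPRMC line
    {
    const bool done = p.state == READ_FIELDS;
    p.state = WAIT_START;
    return done;
    }

  switch (p.state)
    {
    case WAIT_START:     // at start of line (expecting '$')
      p.state = (c == '$') ? MATCH_HEADER : SKIP_LINE;
      p.headerIndex = 0;
      break;

    case MATCH_HEADER:   // got part of the header, expecting the next character
      if (c != header [p.headerIndex])
        p.state = SKIP_LINE;              // some other sentence, ignore it
      else if (header [++p.headerIndex] == 0)
        {
        p.state = READ_FIELDS;            // whole header matched
        p.field = 1;
        p.latLength = 0;
        p.latitude [0] = 0;
        }
      break;

    case READ_FIELDS:    // count commas, keep only field 3
      if (c == ',')
        p.field++;
      else if (p.field == 3 && p.latLength < sizeof (p.latitude) - 1)
        {
        p.latitude [p.latLength++] = c;
        p.latitude [p.latLength] = 0;
        }
      break;

    case SKIP_LINE:      // wait for the newline
      break;
    }  // end of switch

  return false;
  }  // end of gpsParserFeed

// convenience for trying it out: feed a whole string a byte at a time
bool gpsParserFeedString (GpsParser & p, const char * s)
  {
  bool complete = false;
  for ( ; *s; s++)
    if (gpsParserFeed (p, *s))
      complete = true;
  return complete;
  }  // end of gpsParserFeedString
```

On an Arduino you would call gpsParserFeed (parser, Serial.read ()) from loop whenever Serial.available () is non-zero, and read parser.latitude when it returns true. Note that the only buffer is the 12-byte latitude array, regardless of how long the incoming line is.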

Some more discussion and examples here:

http://www.gammon.com.au/serial

The advantage of the state machine is that you don't need to allocate memory for the whole string, and thus it could be thousands of bytes long (more than the memory of the Arduino) as long as there is room for the "interesting" part (like the latitude and longitude).


Summary


The STL "string" class is roughly five times faster than the Arduino "String" class (468 µs compared to 2480 µs) and compiles into less program memory (2994 bytes compared to 3746 bytes). One drawback is the memory fragmentation (the block of 115 bytes), which arises because it does not allocate a new block of memory for each concatenated byte like the String class does. This saves time, but can result in more fragmentation.

Using "C-style" strings (as shown above) is the fastest of the three concatenation methods, uses the least memory (RAM), and uses the least program memory. The drawback is that it is a bit fiddlier to use (but not much) and you need to decide in advance how much memory to allocate for the final string.

Using a state machine results in the least amount of memory usage, handy if the incoming string is potentially very large (like an HTTP request). However, it is probably the most complex one to code and debug.


Method        Time (µs)   Memory used (bytes)   Sketch size (bytes)

C-string             44                   517                  1782
STL string          468                   554                  2994
String             2480                   526                  3746


Notes on C-style strings


So-called "C style" strings are really arrays of type "char" (usually). For example:


char myString [10] = "HELLO";


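There is no separate "length" field, so many C functions expect the string to be "null-terminated" (ended by a 0x00 byte) like this:

Index:    0    1    2    3    4    5     6     7     8     9
Content:  'H'  'E'  'L'  'L'  'O'  0x00  0x00  0x00  0x00  0x00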

The overall string size is 10 bytes, however you can really only store 9 bytes because you need to allow for the string terminator (the 0x00 byte). The "active" length can be established by a call to the strlen function. For example:


Serial.println ( strlen (myString) );   // prints: 5


The total length can be established by using the sizeof operator. For example:


Serial.println ( sizeof (myString) );   // prints: 10


You can concatenate entire strings by using strcat (string concatenate). For example:


strcat (myString, "WORLD");


Note that in this particular example, the 10-byte array cannot hold HELLOWORLD plus the trailing 0x00 byte (11 bytes in all), so strcat would write past the end of the array. That is undefined behaviour, and may corrupt other variables or crash the program. For this reason you must keep careful track of how many bytes are in C-style strings, particularly if you are adding to their length.
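
One way to keep track is to check the sizes before concatenating. The function below is a sketch of that idea; safeAppend is a made-up name, not a standard library function.

```cpp
#include <string.h>

// Hypothetical helper: append `src` to `dest` only if the result
// (including the terminating null byte) fits in `destSize` bytes.
// Returns false, leaving `dest` unchanged, if it would not fit.
bool safeAppend (char * dest, size_t destSize, const char * src)
  {
  const size_t used = strlen (dest);
  const size_t extra = strlen (src);

  if (used + extra + 1 > destSize)
    return false;                        // would overflow the array

  memcpy (dest + used, src, extra + 1);  // the + 1 copies the null byte too
  return true;
  }  // end of safeAppend
```

With the 10-byte myString from above, safeAppend (myString, sizeof (myString), "WORLD") returns false and leaves the string as HELLO, whereas appending "WOR" would succeed.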

Note that if you use the STL string class, you can use the length function to find the current string length, and the capacity function to find the currently allocated size. For example:


  std::string myString = "HELLO";
  myString.reserve (50);                 // reserve 50 characters
  Serial.println (myString.length ());   // prints: 5
  Serial.println (myString.capacity ()); // prints: 50 (or possibly more)

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.