1 of 14

Strings

2 of 14

An important point about array initialization

  • Is this a correct initialization for an integer array?

int a[] = {2,-1,'A',2.15};

  • Yes ☺ An array stores values of a single data type, but it is okay to list values of different data types in the initializer. C converts all of them (if convertible) into the data type of the array
  • That is, the values being assigned to an array’s elements are promoted/demoted based on the data type of the array elements
  • The above is therefore equivalent to

int a[] = {2,-1,65,2};

ESC101: Fundamentals of Computing

3 of 14

Character Arrays and Strings

  • Character array: Each element is a character

char str[50] = {'H','e','l','l','o',' ','W','o','r','l','d'};

Note that only the first 11 of the 50 elements were given values here; C sets the remaining elements to 0 (i.e. to '\0')

  • String: A sequence of characters enclosed in double quotes " "
  • A string can be declared and initialized as

char str[50] = "Hello World";

  • Internally, a string is stored as a char array whose last element is '\0' (the null character)

char str[50] = {'H','e','l','l','o',' ','W','o','r','l','d','\0'};

The above is equivalent to "Hello World"


4 of 14

The null character \0

  • Used to signal the end of a string (has ASCII code 0)
  • Character arrays with a null character are treated as strings
  • Mr C will stop reading a character array as soon as he sees \0

char str[50] = {'H','e','l','\0','l','o',' ','W','o','r','l','d'};
printf("%s",str);

OUTPUT

Hel

Mr C: Hmm … the string is only till the \0. I will consider anything after that as garbage

Note: We use %s to print a string


5 of 14

Different ways to declare/initialize a string

  • Some valid ways to declare and initialize a string

char str[] = {'H','e','l','l','o',' ','W','o','r','l','d','\0'};
char str[] = "Hello World";
char str[50] = "Hello World";
char str[50] = {'H','e','l','l','o',' ','W','o','r','l','d','\0'};
char str[12] = {'H','e','l','l','o',' ','W','o','r','l','d','\0'};
char str[12] = "Hello World";

  • You need not specify the size of the string. But if you specify the size, it should be at least one more than the length of the string
  • Note that Hello World has length 11, so size 12 is fine. Anything less leaves no room for the \0 and may cause issues


6 of 14

C and the null character

  • When we say

char str[6] = "Nice";

  • C will store a \0 after the last character 'e'

str: N i c e \0

  • Warning: uninitialized character arrays contain junk
  • Strings are character arrays. "A" is a string. 'A' is a character. So the following is a mistake

char str = "A";
putchar(str);

OUTPUT (junk)

$

  • It is somewhat like saying

int num = {3,2,1};


7 of 14

C and the null character (continued)

  • In fact when we read a string using gets or scanf, Mr C yet again automatically puts a \0 at the end

char str[6] = "Nice";
scanf("%s",str);
printf("%s",str);

str before scanf: N i c e \0

INPUT

So

str after scanf: S o \0 e \0

OUTPUT

So

  • We did not write &str in scanf? No, since str is the whole array. Will learn about this in a few weeks
  • The rest of the char array is still there. Mr C: Yes, I did not erase 'e' and '\0' that were already there. I just overwrote the first two characters and then put a \0
  • gets: will see it shortly


8 of 14

Strings/char arrays are very useful

  • Can use them to perform usual operations on text such as manipulation of words and sentences
  • Very useful: Can also use strings to work with very big numbers

char bigNum[] = "1323399991122231395842506385218414025205258259436843253926503698250925809808250286028529520";

  • In the big number above, what is the i-th digit (as an int) from the left?

bigNum[i-1] - '0'

  • What is the i-th digit (as an int) from the right?

bigNum[len - i] - '0'

  • Here bigNum[i-1] and bigNum[len - i] are chars; subtracting '0' turns a digit character into the corresponding int
  • len is the length of the string bigNum (can get it using the strlen function). Will see some functions today
  • Can use strings to write programs to do addition, multiplication, etc. for very big numbers


9 of 14

Example: Adding two VERY BIG numbers

  • Suppose we have two very big numbers
  • Can represent them as strings

char bigNum1[] = "9343253466545736093899875874787574868";
char bigNum2[] = "43353672368646348598659693634909807";

9343253466545736093899875874787574868
+ 43353672368646348598659693634909807

  • Suppose the lengths of the strings are len1 and len2, respectively (can get them using the strlen function)
  • Add the rightmost digits first (here the chars '8' and '7')

sum_rightmost_digit = bigNum1[len1-1] - '0' + bigNum2[len2-1] - '0';

  • Now set aside the carry digit (if any) and add '0' to get the char version of the result. Example: 8+7 gives 15; setting aside the carry 1, we have 5. To store 5 as a char, we can do '0' + 5, which gives the character '5'
  • Then add the second digits from the right, plus the carry digit (if any) from the rightmost

sum_second_digit_from_right = bigNum1[len1-2] - '0' + bigNum2[len2-2] - '0' + carry;

  • Keep going right to left by repeating this procedure, and store the result in another string/char array
  • Try writing the full program as practice


10 of 14

scanf with Strings

  • Use %s to read a string from input

scanf("%s",str);

  • No & needed since the whole char array is being read. Will discuss the reason in detail when we study Pointers
  • Mr C will automatically append a \0 at the end
  • Drawback: stops reading the moment any whitespace character is seen (\n, \t or space)
  • Very Risky: if the user enters more characters than the size of the char array, scanf writes past the end of the array, which can crash the program with a segmentation fault!
  • Programs compiled with gcc and other industrial compilers can also segfault this way


11 of 14

scanf with Strings: An Example

#include <stdio.h>

int main() {
    char str1[20], str2[20];
    scanf("%s",str1);
    scanf("%s",str2);
    printf("%s + %s\n", str1, str2);
    return 0;
}

INPUT

IIT BHU

OUTPUT

IIT + BHU

INPUT

I am DON

OUTPUT

I + am

Read "I" as the first string, stopped on seeing white space, then read "am" as the second string and stopped again on seeing the next space ("DON" ignored)

Not scared of you DON. I won’t read you ☺


12 of 14

gets with Strings

  • Shortcut to read a single line of input: reads all characters till \n – but doesn’t store \n, throws it away

gets(str);

  • No need for %s
  • No & needed since the whole char array is being read
  • C will automatically append a \0 at the end
  • Advantage: does not stop reading on seeing space or \t
  • Very Risky: if the user enters more characters than there is space in the char array, gets writes past the end of the array, which can cause a segmentation fault!
  • Programs compiled with gcc and other industrial compilers can also segfault this way
  • gets is deprecated in Clang, and it was removed from the C language standard in C11. Do not use it regularly!
  • When some code becomes buggy or old or obsolete, it is declared as deprecated by the experts who developed that code


13 of 14

getline with Strings

  • A much safer version of gets
  • Reads a single line of input, i.e. reads all characters till \n. Unlike gets, it stores the \n as well
  • Mr C will automatically append a \0 at the end
  • Advantage: If the user enters more characters than the current buffer can hold, getline automatically enlarges the buffer to fit whatever the user is entering. For this to work, the buffer must be dynamically allocated rather than a fixed-size char array
  • getline is a POSIX library function (not part of standard C); it behaves as above with Clang, gcc etc. on systems that provide it
  • gets and scanf with a plain %s are unsafe, but getline is safe wherever it is available
  • Syntax? We will see it when discussing Pointers


14 of 14

String and Substring

  • String: Already saw that it is a character array terminated by the null character \0
  • Example: the array str below holds the string "Nice"; the 'o' after the \0 is not part of the string

str: N i c e \0 o

  • Substring: a contiguous subsequence of a string
    • E.g. "Nice", "Nic", "ice", "ce", "c", "Ni" are substrings of the above string
    • "Nce", "Nie", "ie", "Ne" are NOT substrings (not contiguous) of the above string
    • "No", "\0o", "\0", "abs" are NOT substrings (contain chars not present in the string)
    • WARNING: Substrings need not contain the null character!
    • Be careful when printing substrings – can lead to a segmentation fault or weird behavior