FILE HANDLING

CHAPTER 5

DATA FILES

Data files are files that store data pertaining to a specific application. They can be stored in two forms:

  • Text files

  • Binary files

TEXT FILES

  • A text file stores information as ASCII or Unicode characters, in a form that people can read directly.

  • In a text file, each line is terminated with a special character known as EOL (End of Line).

  • In Python, by default, this EOL character is the newline character ('\n') or the carriage-return, newline combination ('\r\n').

BINARY FILES

  • A binary file contains information in the same format in which it is held in memory, i.e. the file content returned to you is raw (with no translation and no specific encoding).
  • In a binary file, there is no delimiter for a line.
  • As a result, binary files are faster and easier for a program to read and write than text files, as long as the file doesn’t need to be read by people.

OPERATIONS IN A FILE

  • The most basic file manipulation tasks include adding, modifying, or deleting data in a file, which in turn involve any one or a combination of the following operations:
  • Reading data from files
  • Writing data to files
  • Appending data to files

OPENING FILES

  • In data file handling through Python, the first thing that you do is open the file. This is done using open() as per the following syntax:
  • <file_obj> = open(<file_name>, <mode>)
  • For example:
  • myFile = open("Tax.txt")
  • The above statement opens the file Tax.txt in read mode (the default mode) and attaches it to a file object named myFile.

  • A file object is also known as file-handle.
  • A file object is a reference to a file on the disk. It opens the file and makes it available for a number of different tasks.
  • Python’s open() function creates a file object which serves as a link to a file residing on your computer.

  • or
  • myFile1 = open("Tax.txt", "r")
  • The above statement opens the file Tax.txt in read mode (stated explicitly this time) and attaches it to a file object named myFile1.
  • myFile2 = open("e:\\Trial\\Tax.txt", "w")
  • The above statement opens the file Tax.txt in the Trial folder in write mode and attaches it to a file object named myFile2.

  • myFile2 = open("e:\trial\net.txt", "w")
  • Here \t and \n are read as the tab and newline characters, so the path will be wrong. To avoid that, give the string the prefix r:
  • myFile2 = open(r"e:\trial\net.txt", "w")

  • The r in front of the string makes it a raw string, which means no special meaning is attached to the backslash sequences in it.
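
The effect of the r prefix can be checked without opening any file. The short sketch below compares three spellings of the same path; the variable names are illustrative only.

```python
# The same Windows-style path written three ways.
plain = "e:\trial\net.txt"      # \t becomes a tab, \n becomes a newline
escaped = "e:\\trial\\net.txt"  # every backslash is doubled
raw = r"e:\trial\net.txt"       # raw string: backslashes are kept as typed

print(len(plain))       # 14, because \t and \n are single characters
print(len(raw))         # 16
print(escaped == raw)   # True: both spell the same path
```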

Important Note:

  • When you open a file in read mode, the given file must exist in the folder; otherwise Python raises FileNotFoundError.
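
A minimal sketch of handling this error is given here. The file name is made up and is assumed not to exist in the current folder.

```python
# "no_such_file_12345.txt" is a hypothetical name assumed not to exist.
try:
    f = open("no_such_file_12345.txt", "r")
except FileNotFoundError as err:
    message = "Could not open file: " + str(err)
    print(message)
else:
    # This part runs only when the file was opened successfully.
    print(f.read())
    f.close()
```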

  • The first parameter of the open() function is the path to the file you’d like to open. If just the file name is given, Python searches for the file in the current folder.
  • The second parameter is the mode, which can be:
  • r - read, w - write, a - append

  • If no second parameter is given, the file is opened in read ('r') mode by default.

  • f = open("c:\\trial\\try.txt", 'r')

  • However, if you want to write the path with single backslashes, you may write it as a raw string:

  • f = open(r"c:\trial\try.txt", 'r')

  • The prefix r in front of a string makes it a raw string, which means no special meaning is attached to the backslash sequences in it.

File Object / File Handle

  • File objects are used to read and write data to a file on disk.

  • The file object is used to obtain a reference to the file on disk and open it for a number of different tasks.

  • A file object is a very important and useful tool, because a Python program can work with files stored on disk only through a file object.

  • All the operations that you perform on a data file are performed through file objects.

Absolute & Relative Path

Absolute paths

  • Absolute paths start from the topmost level of the directory structure, for example:
  • E:\Project\Proj2\ONE.VBP

Relative paths

  • Relative paths are relative to the current working directory, denoted by a dot (.), while its parent directory is denoted by two dots (..).

  • If PROJ2 is the current working folder, the path name of TWO.CPP will be

.\TWO.CPP

  • From the current PROJ2 folder, the path name for CL.DAT will be

..\CL.DAT

  • From the current PROJ2 folder, the path name for REPORT.PRG will be

..\PROJ1\REPORT.PRG

Standard input, output and error streams

  • The standard input (stdin) is normally connected to the keyboard, while the standard error and standard output go to the terminal (or window) in which you are working.
  • These data streams can be accessed from Python via the objects of the sys module with the same names, i.e. sys.stdin, sys.stdout and sys.stderr.

  • Standard input device (stdin): reads from the keyboard
  • Standard output device (stdout): prints to the display (monitor)
  • Standard error device (stderr): same as stdout, but normally used only for errors

  • These standard devices are implemented as files called standard streams. In Python, you can use these standard stream files through the sys module.
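
A small sketch of using the standard streams through the sys module is given here. The messages are made up, and the keyboard read is left as a comment so that the example runs without waiting for input.

```python
import sys

# Writing to the standard output stream is what print() does by default.
sys.stdout.write("This goes to standard output\n")

# Error messages are usually sent to the standard error stream instead.
sys.stderr.write("This goes to standard error\n")

# print() can be pointed at either stream with its file argument.
print("Also an error message", file=sys.stderr)

# sys.stdin.readline() would read one line typed at the keyboard:
# line = sys.stdin.readline()
```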

Flush function

  • The flush() function forces any data still pending in the output buffer to be written to disk.

f = open('CD.txt', 'a')

f.write('Hello all')

f.flush()

Python holds everything to be written to the file in a buffer and pushes it onto the actual file at a later time. If you want to force Python to write the contents of the buffer onto the file immediately, use the flush() function.

File Access Modes

| Text File Mode | Binary File Mode | Description | Notes |
|---|---|---|---|
| 'r' | 'rb' | Read only | File must already exist; otherwise Python raises FileNotFoundError (an I/O error). |
| 'w' | 'wb' | Write only | If the file does not exist, a new file is created. If the file exists, Python truncates the existing data and writes over it. |
| 'a' | 'ab' | Append | File is in write-only mode. If the file does not exist, a new file is created. If the file exists, its data is retained and new data is appended to the end. |
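
The difference between write mode and append mode can be seen in a short sketch. The file name modes_demo.txt is hypothetical and is created in the current folder.

```python
# "modes_demo.txt" is a hypothetical file name used only for this sketch.
f = open("modes_demo.txt", "w")   # creates the file, or empties it if it exists
f.write("first line\n")
f.close()

f = open("modes_demo.txt", "a")   # keeps the old data and adds at the end
f.write("second line\n")
f.close()

f = open("modes_demo.txt", "r")
content = f.read()                # both lines are present
f.close()
print(content)

f = open("modes_demo.txt", "w")   # opening in 'w' again discards both lines
f.close()

f = open("modes_demo.txt", "r")
after_truncate = f.read()         # empty string: the file is now empty
f.close()
```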

File Access Modes

| Text File Mode | Binary File Mode | Description | Notes |
|---|---|---|---|
| 'r+' | 'r+b' or 'rb+' | Read and Write | File must already exist; otherwise Python raises FileNotFoundError (an I/O error). Both reading and writing operations can take place. |
| 'w+' | 'w+b' or 'wb+' | Write and Read | If the file does not exist, a new file is created. If the file exists, Python truncates the existing data and writes over it. Both reading and writing operations can take place. |
| 'a+' | 'a+b' or 'ab+' | Append and Read | If the file does not exist, a new file is created. If the file exists, its data is retained and new data is appended to the end. Both reading and writing operations can take place. |

CLOSING A FILE

  • An open file is closed by calling the close() method of its file object.

  • Closing the file is important because it writes any data still held in the buffer to the disk and releases the file.

  • The close() function breaks the link between the file object and the file on the disk.

  • After close(), no tasks can be performed on that file through the file object.

  • The syntax for closing a file is:
  • <file_handle>.close()
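
A brief sketch of what close() does to a file object is given here. The file name close_demo.txt is hypothetical.

```python
f = open("close_demo.txt", "w")
f.write("some text")
print(f.closed)     # False: the file is still open
f.close()
print(f.closed)     # True: the link to the file on disk is broken

try:
    f.write("more text")      # not allowed after close()
except ValueError as err:
    error_text = str(err)
    print("Error:", error_text)
```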

Writing into the file

Reading from the file

Reading functions from the file

var = <file_handle>.read(n)

Reads at most n characters (n bytes in binary mode); if n is not specified, reads the entire file.
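
A minimal sketch of read() with and without n is given here. The sample file read_demo.txt and its contents are made up for illustration.

```python
# Create a small sample file first (hypothetical name and contents).
f = open("read_demo.txt", "w")
f.write("Hello World\nWelcome to Python\n")
f.close()

f = open("read_demo.txt", "r")
first_five = f.read(5)    # reads the first 5 characters
rest = f.read()           # no n given: reads everything that is left
f.close()

print(first_five)         # Hello
print(rest)
```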

If the file contains the following data :

Reading functions from the file

var = <file_handle>.readline(n)

Reads a line of input; if n is specified, reads at most n characters of that line.

Returns what was read as a string, which ends with \n when a complete line was read, or returns an empty string if nothing is left to read.
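
The behaviour of readline() can be sketched as follows. The sample file readline_demo.txt is hypothetical.

```python
f = open("readline_demo.txt", "w")
f.write("line one\nline two\n")
f.close()

f = open("readline_demo.txt", "r")
a = f.readline()      # 'line one\n'
b = f.readline(4)     # 'line' (at most 4 characters of the next line)
c = f.readline()      # ' two\n' (the rest of that line)
d = f.readline()      # '' (empty string: end of file reached)
f.close()
print(repr(a), repr(b), repr(c), repr(d))
```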

Reading functions from the file

var = <file_handle>.readlines()

Reads all the lines and returns them in a list.
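
A short sketch of readlines() is given here, again using a made-up sample file.

```python
f = open("readlines_demo.txt", "w")
f.write("red\ngreen\nblue\n")
f.close()

f = open("readlines_demo.txt", "r")
lines = f.readlines()     # each list item keeps its trailing newline
f.close()
print(lines)              # ['red\n', 'green\n', 'blue\n']
print(len(lines))         # 3
```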

Using rstrip

  • The rstrip() method strips whitespace (newlines included) from the right side of a string, such as a line read from a file.
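
A minimal sketch of rstrip() applied to each line of a file is given here. The file rstrip_demo.txt and the list name cleaned are illustrative only.

```python
f = open("rstrip_demo.txt", "w")
f.write("apple  \nbanana\n")
f.close()

cleaned = []
f = open("rstrip_demo.txt", "r")
for line in f:
    cleaned.append(line.rstrip())   # drops trailing spaces and the newline
f.close()
print(cleaned)                      # ['apple', 'banana']
```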

Reading the complete file line by line

  • As the for loop iterates through the file, the loop variable holds the current line of the file as a string of characters. The general pattern for processing each line of a text file is to open the file, loop over it with for, work on the line inside the loop body, and close the file at the end.
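
One way to write this pattern is sketched here. The function count_lines() and the file pattern_demo.txt are hypothetical; the loop body simply counts lines, and any other processing could take its place.

```python
def count_lines(filename):
    """Visit each line of the file once and return how many there were."""
    count = 0
    f = open(filename, "r")
    for line in f:          # line holds the current line, including '\n'
        count = count + 1   # statements that use line go here
    f.close()
    return count

# Build a small sample file and process it.
f = open("pattern_demo.txt", "w")
f.write("one\ntwo\nthree\n")
f.close()
print(count_lines("pattern_demo.txt"))   # 3
```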

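Using with for Files

A file opened in a with statement is closed automatically when the block ends, even if an error occurs inside it. Opening a file with with is therefore the same as opening it with open(), working with it, and then calling close() yourself.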
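
The two forms can be compared in a short sketch. The file name with_demo.txt is hypothetical, and the try/finally form is only a rough hand-written equivalent.

```python
# Form 1: the with statement closes the file for us.
with open("with_demo.txt", "w") as f:
    f.write("written inside a with block\n")
print(f.closed)   # True: the file was closed when the block ended

# Form 2: roughly the same thing written out by hand.
g = open("with_demo.txt", "r")
try:
    text = g.read()
finally:
    g.close()     # runs even if read() fails
print(text)
```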

Reading the complete file Character by character
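
A minimal sketch of reading a file one character at a time with read(1) is given here. The file chars_demo.txt is made up for illustration.

```python
f = open("chars_demo.txt", "w")
f.write("Hi!\nOk")
f.close()

chars = []
f = open("chars_demo.txt", "r")
ch = f.read(1)            # read a single character
while ch != "":           # an empty string means end of file
    chars.append(ch)
    ch = f.read(1)
f.close()
print(chars)              # ['H', 'i', '!', '\n', 'O', 'k']
```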

Writing into a file

write() writes a string to the file

writelines() writes all the strings in a list to the file, without adding newline characters between them
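
Both functions can be tried in a short sketch. The file name write_demo.txt and the text are illustrative only.

```python
f = open("write_demo.txt", "w")
f.write("Roses are red\n")                  # writes one string
f.writelines(["Violets are blue\n",         # writes every string in the list;
              "Sugar is sweet\n"])          # newlines must be supplied by us
f.close()

f = open("write_demo.txt", "r")
written = f.read()
f.close()
print(written)
```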

Relative and Absolute paths

Absolute path name

The absolute path name of the file Bank.act in the Accounts folder will be

E:\Accounts\Bank.act

Relative path name

Relative path names are relative to the current working directory, denoted by a dot (.), and its parent directory, denoted by two dots (..).

If you are in the folder PROJ2:

To access TWO.CPP we can write

.\TWO.CPP (PROJ2 being the current folder)

To access PROJ1 we can write

..\PROJ1 (PROJ2 being the current folder)

Removing White spaces after reading from a file.

PROGRAMS – Using functions

  1. Program to write into the file and read the entire file.
  2. Program to write into the file and read the first 10 characters in the file.
  3. Reading the complete file Character by character
  4. Reading the complete file line by line
  5. Write a program to count the number of lines in a text file.
  6. To check the number of vowels in the file
  7. To check the number of digits in the file
  8. To check the number of words in the file
  9. To print the number of lines starting with 'A'
  10. To display the size of the file in bytes

PROGRAMS – Using functions

Write a menu driven program to perform the following operations ( File name: Poem.txt)

  1. Write
  2. Display the entire content
  3. Read char by char
  4. Display line by line
  5. Count the number of lines
  6. Check for vowels in the file
  7. Check for digits in a file
  8. Check for words in the file
  9. Print all the four letter words.
  10. To print the number of lines starting with 'A'


  1. Display the size of the file after removing EOL characters and white spaces.
  2. Write a Python program to count the frequency of words in a file.
  3. Write a program to display all the strings stored in a plain text file in upper case and also find the size of the file.
  4. Write a function called stats( ) that accepts a filename and reports the file’s longest line.
  5. Write a function called new_upper( ) that copies all the lines starting with ‘A’ or ‘T’ (from the existing file info.txt) to another file called Upp.txt.
  6. Write a function called line( ) that will display all the records in a file along with the line number.
  7. Write a function called count_word() that will count the occurrences of the words “the” and “to” present in the file “POEM.TXT”.
  8. Write a function to count the number of upper case alphabets present in the file “Article.txt”.
  9. Write a function called change( ) that will make a new file called temp.txt such that after every letter ‘e’ the symbol ‘@’ is copied into the new file.


Methods of OS module

  1. The rename() method is used to rename a file or a folder.

Syntax: os.rename(current_file_name, new_file_name)

  2. The remove() method is used to delete a file.

Syntax: os.remove(file_name)

  3. The mkdir() method of the os module is used to create a directory in the current directory.

Syntax: os.mkdir("newdir")

Methods of OS module

  4. The chdir() method changes the current directory.

Syntax: os.chdir("newdir")

  5. The getcwd() method returns the current working directory.

Syntax: os.getcwd()

  6. The rmdir() method deletes a directory, which must be empty.

Syntax: os.rmdir('dirname')
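
The six methods can be tried together in one short sketch. The folder and file names are hypothetical and are assumed not to exist already; os.path.exists() is an extra helper used here only to check the result.

```python
import os

start = os.getcwd()                        # remember where we started
os.mkdir("demo_dir_os_module")             # create a folder in the current directory
os.chdir("demo_dir_os_module")             # move into it

f = open("old_name.txt", "w")
f.write("temporary data")
f.close()

os.rename("old_name.txt", "new_name.txt")  # rename the file
renamed = os.path.exists("new_name.txt")   # True if the rename worked

os.remove("new_name.txt")                  # delete the file
os.chdir(start)                            # go back to the starting directory
os.rmdir("demo_dir_os_module")             # works only on an empty directory
print(renamed)
```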

File object attributes

  • closed: Returns True if the file is closed and False if it is open.

  • encoding: Returns the encoding used to convert between text and bytes.

  • mode: Returns the mode in which the file was opened.

  • name: Returns the name of the file that the file object holds.

  • newlines: Returns "\r", "\n", "\r\n", None or a tuple containing all the newline types seen.
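
A quick sketch that prints some of these attributes is given here. The file name attr_demo.txt is hypothetical, and the encoding shown depends on the system.

```python
f = open("attr_demo.txt", "w")
file_name = f.name        # 'attr_demo.txt'
file_mode = f.mode        # 'w'
open_state = f.closed     # False while the file is open
print(file_name, file_mode, open_state)
print(f.encoding)         # depends on the system
f.close()
print(f.closed)           # True after close()
```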

Getting & Resetting the Files Position

The tell() method returns the current position within the file.

The seek(offset) method moves the current position.

The offset argument indicates the number of bytes to be moved.

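seek() method

To read or write at a specific position, use the seek() function to set the current read/write position.

f.seek(offset, from)

Here, offset is the number of bytes to move, and the from parameter (called whence in the Python documentation) takes the following values:

  • 0 : offset calculated from the beginning (the default)
  • 1 : offset calculated from the current position
  • 2 : offset calculated from the end

In text mode, the values 1 and 2 can be used only with an offset of 0, so they are normally used with files opened in binary mode.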
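
A minimal sketch of seek() and tell() together is given here. The file seek_demo.txt is made up, and it is opened in binary mode so that seeking from the current position and from the end is allowed.

```python
f = open("seek_demo.txt", "wb")
f.write(b"0123456789")
f.close()

f = open("seek_demo.txt", "rb")
f.seek(3)              # 3 bytes from the beginning
pos_a = f.tell()       # 3
two = f.read(2)        # b'34', position is now 5
f.seek(2, 1)           # 2 bytes forward from the current position
pos_b = f.tell()       # 7
f.seek(-1, 2)          # 1 byte before the end of the file
last = f.read()        # b'9'
f.close()
print(pos_a, two, pos_b, last)
```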

  • Create a string called first_forty that consists of the first 40 characters of emotion_words2.txt.

BINARY FILE

Reading and Writing to a Binary File

  • The open() function opens a file in text format by default. To open a file in binary format, add 'b' to the mode parameter.

  • Hence the "rb" mode opens the file in binary format for reading,

  • while the "wb" mode opens the file in binary format for writing

  • Unlike text mode files, binary files are not human readable.

  • When opened using any text editor, the data is unrecognizable.
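
A short sketch of writing and reading raw bytes in "wb" and "rb" modes is given here. The file name bin_demo.dat and the byte values are illustrative only.

```python
data = bytes([72, 105, 0, 255])     # four raw byte values
f = open("bin_demo.dat", "wb")
f.write(data)
f.close()

f = open("bin_demo.dat", "rb")
loaded = f.read()                   # a bytes object, not a str
f.close()
print(loaded)                       # b'Hi\x00\xff'
print(loaded == data)               # True
```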

BINARY FILES OPERATIONS

Most of the files that we see in our computer system are binary files.

Example:

• Document files: .pdf, .doc, .xls etc.

• Image files: .png, .jpg, .gif, .bmp etc.

• Video files: .mp4, .3gp, .mkv, .avi etc.

• Audio files: .mp3, .wav, .mka, .aac etc.

• Database files: .mdb, .accde, .frm, .sqlite etc.

• Archive files: .zip, .rar, .iso, .7z etc.

• Executable files: .exe, .dll, .class etc

Python Pickle module

  • Python has a module that converts objects to and from bytes for us and is extremely easy to use. This module is called pickle; it provides us with the ability to serialize and deserialize objects, i.e., to convert objects into byte streams which can be stored in files and later be used to reconstruct the original objects.
  • Pickling is the process whereby a Python object is converted into a byte stream.

  • Unpickling is the process whereby a byte stream is converted back into the object.

To write and read from a Binary File

Write into the file

Read from the file
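
Both steps can be sketched with pickle.dump() and pickle.load(). The file name student.dat and the sample record are hypothetical.

```python
import pickle

student = {"roll": 1, "name": "Asha", "marks": 88.5}   # sample record

# Write into the file
with open("student.dat", "wb") as f:
    pickle.dump(student, f)          # pickling: object -> byte stream

# Read from the file
with open("student.dat", "rb") as f:
    restored = pickle.load(f)        # unpickling: byte stream -> object

print(restored == student)           # True
```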

Major operations performed using a binary file

  1. Inserting/Appending record in a binary file.
  2. Read records from a binary file.
  3. Search a record in a binary file.
  4. Deleting a record in a binary file
  5. Update a record in a binary file
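
One possible way to carry out these operations with pickle is sketched here. The file name records.dat, the record layout and all function names are assumptions made for illustration; deleting and updating are done by rewriting the whole file, which is a simple approach rather than the only one.

```python
import pickle

FILE = "records.dat"   # hypothetical file name

def write_records(records):
    """Replace the file contents with the given list of records."""
    with open(FILE, "wb") as f:
        for rec in records:
            pickle.dump(rec, f)

def append_record(rec):
    """Insert one record at the end of the file."""
    with open(FILE, "ab") as f:
        pickle.dump(rec, f)

def read_records():
    """Read every record until the end of the file is reached."""
    records = []
    with open(FILE, "rb") as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:
                break
    return records

def search(roll):
    """Return the record with the given roll number, or None."""
    for rec in read_records():
        if rec["roll"] == roll:
            return rec
    return None

def update(roll, marks):
    """Change the marks of one record and rewrite the file."""
    records = read_records()
    for rec in records:
        if rec["roll"] == roll:
            rec["marks"] = marks
    write_records(records)

def delete(roll):
    """Rewrite the file without the record that has this roll number."""
    write_records([r for r in read_records() if r["roll"] != roll])

write_records([])      # start with an empty file
append_record({"roll": 1, "name": "Asha", "marks": 70})
append_record({"roll": 2, "name": "Ravi", "marks": 65})
update(2, 80)
delete(1)
print(read_records())
```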
