Use vector<T>(length, initial_value) for local blocks of storage.
Project: GDAL
Author: Kurt Schwehr schwehr@google.com / schwehr@gmail.com (goatbar)
Started: 2016-May-04
Status: Discussion underway OBSOLETE - but still a usable concept.
Short link: goo.gl/vuA3D6
Document license: Apache 2.0
Copyright 2016 Google Inc. All Right Reserved.
Note 2021-May-11:
RFC68 made C++11 the minimum supported C++ language version. While the vector solution proposed here is still viable, that makes it less important. std:unique_ptr and other C++11 language features are now often the better choice. I changed text that is no longer valid be strikethrough.
Note: This document does not propose to use any new features of C++ and does not propose any build changes to GDAL.
Consider:
int anVals[256]; memset(anVals, 0, 256*sizeof(int)); |
This is less than optimal because:
- It puts 1K or 2K on the stack, some of us (who can have huge numbers of threads) try to keep stack usage to 16K per thread
- You have two places that need to know the size
- The variable exists uninitialized between the definition and initialization
- memset is a C API, which is able to do fewer sanity checks
- No bounds checking
In GDAL, there are >700 stack allocations of more than 100 items and > 150 of 1000 or more items. There are 877 calls to memset in the C++ code.
Proposed solution:
https://trac.osgeo.org/gdal/changeset/34177
std::vector<int> oVals(256, 0); |
Benefits:
- It can pretty much be used like a C array, e.g., ++anVals[index]; works
- Using vector gets most of the memory off of the stack and into the heap helping threaded programs
- The values are all initialized when anVals is created
- If functions need to get a pointer to a C style array, you can call Foo(&anVals[0])
- Compilers can optimize the standard containers.
- Bounds checking is possible without having to resort to ASAN (which not all compilers have) or Valgrind
- Works with C++03. Does not require >= C++11.
Drawbacks:
- GDAL now is at least C++11, so the vector solution is less important now. [Added 2021-May-11]
- It is possible to change the size of the vector later on in the code. A comment should be placed with the code.
- Vector has some storage overhead and bookkeeping that has to be done (but often the compiler can probably optimize away most of that). TODO: References that explain this?
- Resizing the array could break anything that was using a C style array pointer to the vector’s data
- Heap allocation in heavily used loops might have a performance impact
Alternatives / improvements
Might be better or worse in different ways (not worrying about if we have C++11/14 support or not):
- Use std::fill instead of memset.
- Use bzero instead of memset. bzero removed in IEEE Std 1003.1-2008.
- Declare and pass around the array size as a constant, e.g., const int knValsSize = 256.
- CPL_ARRAYSIZE to only have one 256 around.
- Use CPLCalloc to put it on the heap and initialize
- Use new[] to put it on the heap
- Use std::array. Introduced in C++ 11, so currently disallowed. Still puts the payload on the stack.
- Use std::unique_ptr to put it on the heap. Introduced in C++ 11, so currently disallowed
- Add autoptr scoped_ptr (e.g. scoped_ptr.h in chromium) to GDAL to put it on the heap.
- Use auto_ptr. It is very similar to unique_ptr, but works with C++03, but really, it is broken
- Use some other library like Boost. That would require a major change to GDALs build requirements.
- int anVals[256] = { 0 }; Only initializes the first element to your value. you must specify all 256 elements or the rest are initialized to an empty initializer list.
- Make our own simple class that handles allocation on the heap and cleans up in the destructor and provides operator[] e.g. fixed_vector.h / fixed_vector.h
- Define a derived class of vector that blocks resize. While there is CPLString, most C++ experts recommend strongly against deriving from C++ containers (e.g. Scott Meyers).
Key GDAL facts to keep in mind for this proposal:
- Only C++03 is currently allowed
- Want to focus on maintainability and readability of code
- A very diverse set of people work on the GDAL code base
- Currently support a wide range of compilers / platforms:
- 32 and 64 bit
- Linux, Android, MacOSX, Windows >= Win7, Cigwin
- Big and little endian
- GCC 4.8, GCC 5.2, MingW, VC9, VC12, VC13
- Build with C++03, C++11, C++14
- Build with C89, C99, C11
- Python >= 3.6 [as of 2021-May-11] 2.7 and >= 3.5 (we might actually support 2.6 and 3.4, Idonno)
- https://travis-ci.org/rouault/gdal_coverage/builds/
- GitHub Actionions
- https://ci.appveyor.com/project/rouault/gdal-coverage/history
- Want to help Coverity, Address Sanitizer, AFL, Clang Static Analyzer
- GDAL uses a form of [TODO(schwehr): What was I going to say here? - 2021-May-11]
What if ...:
Note: C++11/14 is not really a big part of this particular proposal. Just trying to be complete.
GDAL goes with C++11 or 14? It would probably be best to stick with the vector solution as it is simpler than unqiue_ptr and std::array still puts a large amount of data off the stack. vector is still the simplest with initialization.
GDAL does not care about stack size and C++14? You could do a unqiue_ptr with a CPLCalloc and a deleter of CPLFree.
- std::unique_ptr<std::array<int, 256>> Vals(new std::array<int, 256>());
- std::unique_ptr<int *, std::function<void(char *)>> Vals(CPLCalloc(256, 0), CPLFree);
- std::unique_ptr<int *, CplFreeDeleter> Vals( CPLCalloc(256, 0), CPLFree);
- std::unique_ptr<int[]> Vals(new int[256]);
Open questions:
- At what size do we want to make as the initial threshold to recommend heap vs stack? 32 bytes seems like no big deal on the stack and 512 bytes seems like it really needs to be on the heap without a specific reason. Starting with 256.
See also:
https://gdal.org/development/rfc/rfc68_cplusplus11.html RFC 68: C++11 Compilation Mode
https://trac.osgeo.org/gdal/ticket/5748 Reduce GDAL stack usage
https://google.github.io/styleguide/cppguide.html#Local_Variables
http://stackoverflow.com/questions/22571052/replace-fixed-size-arrays-with-stdarray
http://stackoverflow.com/questions/15294129/overhead-to-using-stdvector
http://www.cplusplus.com/reference/memory/unique_ptr/operator[]/
http://stackoverflow.com/questions/16711697/is-there-any-use-for-unique-ptr-with-array
https://isocpp.org/files/papers/n4028.pdf Defining a Portable C++ ABI, Herb Sutter
http://eli.thegreenplace.net/2012/06/20/c11-using-unique_ptr-with-standard-library-containers
http://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for vector on the stack
http://stackoverflow.com/questions/10057443/explain-the-concept-of-a-stack-frame-in-a-nutshell
https://en.wikipedia.org/wiki/Call_stack
http://www.cs.uwm.edu/classes/cs315/Bacon/Lecture/HTML/ch10s07.html
Acknowledgements:
Several Google engineers commented on an initial draft of this doc.
Many people from the GDAL community have commented on this via the gdal-dev mailing list.
Impact of the proposed change:
Scale of change:
Small. The impacts are localized to each scope where this is changes. The replacement vectors behave pretty much the same in these use cases.
Performance:
Impact the C API/ABI:
None & None. This change is internal to the C++ code. Any data that is exposed to external callers will remain as C style arrays.
Impact on C++ API/ABI:
None & none. These are internal local variables.
Impact on SWIG binding (Python/Perl/Ruby/Java/PHP):
None.
Which versions/branches of GDAL is the change required for:
GDAL >= 2.2.
Which versions/branches of GDAL may not use this change:
This change is safe for any branch or version of GDAL. There is no real driving force to back port this change, but it is fine to do so, especially if it helps backporting other fixes.
Uses cases:
Some users of GDAL have large jobs with thousands of cores and >20 threads per process that run for a long time (e.g. days). The overhead of having to allow for bumping the stack size adds substantial overhead to all those jobs. GDAL is sometimes the only thing driving the requirement for the larger stack. For people using GDAL on a desktop machine, the larger stack is unlikely to be a real issue. For anyone running large multithreaded jobs in the cloud, the large stack adds real cost. The overhead of std::vector and heap allocation in all these cases I've looked at in GDAL so far is far smaller than the cost from the stack size overhead.