Big data testing is the process of testing a big data application to make sure that all of its functionalities work as expected. The main aim of big data testing is to ensure that the big data system runs smoothly and error-free while maintaining performance, since big data involves collections of very large datasets that cannot be processed using traditional computing techniques. Testing these datasets requires numerous tools, techniques, and frameworks. Big data relates to data creation, storage, retrieval, and analysis, and is remarkable in terms of volume, variety, and velocity.
What is the strategy of big data testing?
Testing a big data application is more about verifying its data processing than testing the individual features of the software package. In big data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supportive components. This demands a high level of testing skill, because the processing is very fast. The processing may be of three types: batch, real-time, and interactive.
Along with this, data quality is a very important factor in Hadoop testing. Before testing the application, it is necessary to check the quality of the data, and this check should be considered part of database testing. It involves verifying characteristics such as conformity, accuracy, duplication, consistency, validity, and data completeness.
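A minimal sketch of such data quality checks is shown below. The field names (`user_id`, `email`) and the conformity pattern are illustrative assumptions, not part of any particular tool; a real pipeline would run equivalent checks at much larger scale.

```python
import re

# Hypothetical pre-ingestion quality checks: completeness, duplication,
# and conformity. Field names and the email pattern are assumptions.
def check_quality(records, required_fields, email_pattern=r"[^@\s]+@[^@\s]+\.[^@\s]+"):
    issues = {"incomplete": [], "duplicates": [], "nonconforming": []}
    seen = set()
    for i, rec in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        if any(not rec.get(f) for f in required_fields):
            issues["incomplete"].append(i)
        # Duplication: flag records whose key fields repeat.
        key = tuple(rec.get(f) for f in required_fields)
        if key in seen:
            issues["duplicates"].append(i)
        seen.add(key)
        # Conformity: field values must match the expected format.
        if rec.get("email") and not re.fullmatch(email_pattern, rec["email"]):
            issues["nonconforming"].append(i)
    return issues

records = [
    {"user_id": "1", "email": "a@example.com"},
    {"user_id": "1", "email": "a@example.com"},   # duplicate
    {"user_id": "2", "email": "not-an-email"},    # nonconforming
    {"user_id": "", "email": "b@example.com"},    # incomplete
]
report = check_quality(records, ["user_id", "email"])
print(report)
```

Each check maps directly to one of the characteristics listed above; consistency and accuracy would additionally need a trusted reference dataset to compare against.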
Performance testing approach
Performance testing for a big data application involves testing huge volumes of structured and unstructured data, and it requires a specific testing approach to handle such massive data.
Here performance testing is executed in the below order:
- The process begins with setting up the big data cluster whose performance is to be tested.
- Identify and design the corresponding workloads.
- Prepare individual clients.
- Execute the test and analyse the results.
- Determine the optimum configuration.
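The steps above can be sketched as a simple test driver. This is an illustrative outline only: `submit_job` is a placeholder for a real cluster client (for example, a Hadoop job submission), and the workload sizes are assumptions.

```python
import time
import statistics

def submit_job(record_count):
    # Stand-in for real cluster work: simulate processing record_count records.
    return sum(range(record_count))

def run_performance_test(workloads, clients=3):
    timings = []
    for load in workloads:                 # step 2: designed workloads
        for _ in range(clients):           # step 3: individual clients
            start = time.perf_counter()    # step 4: execute the test
            submit_job(load)
            timings.append(time.perf_counter() - start)
    # Analyse the results; these numbers feed back into step 5,
    # adjusting the configuration until it is optimal.
    return {"mean_s": statistics.mean(timings), "max_s": max(timings)}

result = run_performance_test([10_000, 100_000])
print(result)
```

In practice the driver would be pointed at the cluster set up in step 1, and the loop would be repeated across candidate configurations.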
Parameters of performance testing:
There are numerous parameters to verify during performance testing:
- Data storage: How data is stored across the various nodes.
- Commit logs: How large the commit log is allowed to grow.
- Concurrency: How many threads can perform write and read operations.
- Caching: Tuning of cache settings such as "row cache" and "key cache".
- Timeouts: Values for connection timeout and query timeout.
- JVM parameters: Heap size, GC collection algorithms.
- Message queue: Message rate and size.
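As an illustration of the concurrency parameter, the following sketch measures how many read/write operations per second a number of threads can sustain. The in-memory dict is a stand-in assumption for a real data store; only the measurement pattern is the point.

```python
import threading
import time

# Hypothetical concurrency measurement: N threads each perform write
# and read operations against a shared store protected by a lock.
store = {}
lock = threading.Lock()
ops_done = 0

def worker(thread_id, ops):
    global ops_done
    for i in range(ops):
        with lock:
            store[f"{thread_id}-{i}"] = i      # write
            _ = store.get(f"{thread_id}-{i}")  # read back
            ops_done += 1

def measure(threads=8, ops_per_thread=1000):
    start = time.perf_counter()
    pool = [threading.Thread(target=worker, args=(t, ops_per_thread))
            for t in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - start
    return ops_done, ops_done / elapsed  # total ops, ops per second

total, throughput = measure()
print(total, round(throughput))
```

Against a real store, the same pattern would reveal how throughput changes as the thread count, timeouts, or cache settings are varied.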
Big data testing vs. traditional database testing:

| Properties | Traditional database testing | Big data testing |
|---|---|---|
| Data | Tester works with structured data. | Tester works with both structured and unstructured data. |
| Testing approach | The testing approach is well defined and time-tested. | The testing approach requires focused R&D efforts. |
| Testing strategy | Tester has the option of a "sampling" strategy done manually or an "exhaustive verification" strategy using an automation tool. | The sampling strategy is a challenge in big data. |
| Infrastructure | Does not need a special test environment, as the file size is limited. | Needs a special test environment due to the large data size and files. |
| Validation tools | Testing tools can be used with basic operating knowledge and little training. | Requires a particular set of skills and training to operate the testing tools. The tools are in their nascent stage and may gain new features over time. |