The processing of 2022 Population Census data is already secured by a robust, world-class data center that IBGE has been preparing since 2019. Cloud computing, high-performance databases, encryption, redundant fiber-optic links, a duplicated environment for disaster recovery and even artificial intelligence are some of the resources that make up the technological infrastructure dedicated to processing the census. There will be 200 virtual machines, acquired over the last two years, running on a national private cloud.
For Carlos Renato Cotovio, IBGE’s director of information technology, the great challenge is to scale from an organization of 10,000 people to one of 200,000. The first initiative was to move the data center on Rua General Canabarro, in Rio’s North Zone, to the second floor of the building to protect it from the flooding that is common in the area. The main data center is rated Tier III, a classification that denotes a high-availability, high-performance, low-latency (fast response time) facility. The secondary data center, in São Paulo, is rated Tier II, which is also considered a good level of security and performance for contingency purposes. Both data centers use hot and cold aisles, whose optimized airflow keeps equipment cool while reducing energy consumption. In addition, the entire fire detection and suppression system has been optimized.
“This is the census of my life. I am a career employee at BNDES, where I led the digital transformation project, and I transferred to IBGE to live the experience and challenge of the 2022 Census, a giant project that takes the organization from 10,000 employees to 200,000. This growth is dizzying and required acquiring equipment, hiring people and distributing technological resources. It’s a professional and personal experience. This is a short operation, just four months, covering the survey of the surroundings, which began in June, and the collection from August to October, and it is being adjusted along the way. It takes a huge sense of purpose, and it’s a unique experience for anyone,” says Cotovio.
Two environments, one for regular surveys and one for the Census
IBGE’s IT Services Coordinator, José Luiz Thomaselli, points out that the 2022 Census is different because IBGE now runs large continuous surveys. The Continuous PNAD is IBGE’s largest survey after the census and will continue to be collected, processed and disseminated in parallel. In 2010, these surveys were annual and were always carried out in periods when there was no census.
“As a result, we decided to create two data centers within the IBGE Data Processing Center: one with equipment for IBGE’s regular work and one with dedicated infrastructure for the census. The model is the same; the big difference is that all the equipment for processing the census is new,” says Thomaselli.
He explains that all 200 servers are virtualized and redundant, grouped in clusters of four machines so that, if one or two fail, the rest keep running. The storage systems are also high-performance: they use SSDs, which, like memory cards, store data on flash memory.
“With a physical server, the systems are installed on the equipment, and we depend on it. The advantage of a virtual server is that it works like a file that could be copied to a pen drive and taken wherever you want. This gives flexibility. If a physical machine’s disk burned out, all its content would have to be reinstalled on another physical machine. Now I have one physical device running several virtual machines. If the equipment fails, the virtual machines migrate to other equipment, and everything keeps working,” explains Thomaselli.
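The failover Thomaselli describes, together with the four-machine clusters that tolerate one or two failures, can be illustrated with a small simulation. This is a minimal sketch, not IBGE’s virtualization platform: the `Host` class, the `fail_over` function, the capacity of 10 VMs per host and the least-loaded placement policy are all hypothetical. When a host fails, its virtual machines are restarted on the surviving hosts that still have spare capacity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    """A physical server in a hypothetical virtualization cluster."""
    name: str
    capacity: int  # hypothetical: maximum number of VMs the host can run
    vms: List[str] = field(default_factory=list)
    alive: bool = True


def fail_over(hosts: List[Host], failed: str) -> None:
    """Mark `failed` as down and restart its VMs on the least-loaded survivors."""
    victim = next(h for h in hosts if h.name == failed)
    victim.alive = False
    orphans, victim.vms = victim.vms, []
    for vm in orphans:
        # Only surviving hosts with spare capacity can receive a VM.
        candidates = [h for h in hosts if h.alive and len(h.vms) < h.capacity]
        if not candidates:
            raise RuntimeError(f"no capacity left to restart {vm}")
        target = min(candidates, key=lambda h: len(h.vms))
        target.vms.append(vm)


# Four hosts, each running 5 VMs out of a capacity of 10: any two hosts can
# absorb the whole workload, so the cluster survives two failures.
cluster = [Host(f"host{i}", capacity=10, vms=[f"vm{i}-{j}" for j in range(5)])
           for i in range(4)]
fail_over(cluster, "host0")
fail_over(cluster, "host1")
running = sum(len(h.vms) for h in cluster if h.alive)
print(running)  # 20
```

Running each host at half load is what lets two of the four machines fail in this model; a third failure would raise `RuntimeError` because the last survivor has no room left.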
For this to happen, IBGE implemented a private cloud of around 800 to 900 machines. The cloud is not restricted to the General Canabarro data center in Rio de Janeiro: it also reaches the State Units, forming an internal national cloud. “I can move a machine from São Paulo to Rio or vice versa. We also use the Microsoft Azure cloud to distribute inputs, maps and applications to the Mobile Collection Devices (DMCs), the smartphones that census takers will use in the collection,” adds the IT coordinator.
Census will have 10 Gbps connectivity, 100 times faster
Today, IBGE’s internet connection consists of two 100 Mbps links. The links contracted for the census are 10 Gbps, 100 times faster. The connection between the data center in Rio de Janeiro and the contingency center in São Paulo also uses two 10 Gbps fiber-optic circuits in a LAN-to-LAN configuration, so it behaves like a local connection.
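Because 1 Gbps equals 1,000 Mbps, the nominal speed of each contracted link is exactly 100 times that of each current link:

```latex
\frac{10\ \text{Gbps}}{100\ \text{Mbps}}
  = \frac{10\,000\ \text{Mbps}}{100\ \text{Mbps}}
  = 100
```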
“We set up a triangular structure that forms a redundant ring. The Canabarro data center, in Rio, is connected to the Urussuí data center, in São Paulo, by a 10 Gbps link; the São Paulo data center is connected to the one on Avenida Chile, another gateway in Rio, which in turn is connected to the Canabarro data center. It is a ring. If there is a problem in São Paulo or at the Canabarro data center, traffic can enter through Avenida Chile,” highlights Thomaselli.
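Why the triangle survives a single failure can be checked with a small graph search. In this sketch the three sites are nodes and the links between them are edges; the shorthand site names and the `reachable` and `without_site` helpers are illustrative, not part of IBGE’s network tooling. Removing any one link, or either of the two sites mentioned in the quote, still leaves the remaining sites connected.

```python
from collections import deque
from typing import Dict, FrozenSet, Set

# The three links of the ring described in the text (site names are shorthand).
RING: Set[FrozenSet[str]] = {
    frozenset({"Canabarro", "Urussui"}),
    frozenset({"Urussui", "AvChile"}),
    frozenset({"AvChile", "Canabarro"}),
}


def reachable(links: Set[FrozenSet[str]], start: str) -> Set[str]:
    """Breadth-first search: every site reachable from `start` over `links`."""
    graph: Dict[str, Set[str]] = {}
    for link in links:
        a, b = tuple(link)
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def without_site(links: Set[FrozenSet[str]], site: str) -> Set[FrozenSet[str]]:
    """Drop every link that touches a failed site."""
    return {link for link in links if site not in link}


# Any single broken link leaves all three sites connected.
for broken in RING:
    print(sorted(broken), "down ->", sorted(reachable(RING - {broken}, "Canabarro")))

# If São Paulo or Canabarro fails, traffic can still enter through Avenida Chile.
print(sorted(reachable(without_site(RING, "Urussui"), "AvChile")))    # ['AvChile', 'Canabarro']
print(sorted(reachable(without_site(RING, "Canabarro"), "AvChile")))  # ['AvChile', 'Urussui']
```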
“The current world demands speed, and IBGE works with information that portrays the country. The Census needs to be fast, and the data collected needs to be quickly available to the population, researchers and policymakers. Investment is necessary,” emphasizes Cotovio.
Security against attacks during the Census
Every day, IBGE faces dozens of attempted attacks, but it has a layered security infrastructure. The first layer is the firewall, which isolates the internal network from the external network. The second layer is the application firewall, which protects systems from unauthorized access. In addition, a set of equipment is grouped on an internal bus, separate from the other IBGE machines.
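The layered approach means a request must pass every layer in turn before it reaches a system. This is a minimal sketch under stated assumptions: the blocked network, the single open port, the token set and the `Request`, `network_firewall`, `application_firewall` and `admit` names are hypothetical and say nothing about IBGE’s actual rules.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical rules for illustration only; not IBGE's actual configuration.
BLOCKED_NETWORKS = [ip_network("203.0.113.0/24")]  # a documentation address range
OPEN_PORTS = {443}
VALID_TOKENS = {"example-session-token"}


@dataclass
class Request:
    source_ip: str
    port: int
    token: Optional[str]


def network_firewall(req: Request) -> bool:
    """Layer 1: filter traffic between the external and internal networks."""
    addr = ip_address(req.source_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return False
    return req.port in OPEN_PORTS


def application_firewall(req: Request) -> bool:
    """Layer 2: reject application requests that are not authenticated."""
    return req.token in VALID_TOKENS


def admit(req: Request) -> bool:
    """A request reaches the systems only if it passes both layers, in order."""
    return network_firewall(req) and application_firewall(req)


print(admit(Request("198.51.100.7", 443, "example-session-token")))  # True
print(admit(Request("198.51.100.7", 22, "example-session-token")))   # False: port closed
print(admit(Request("198.51.100.7", 443, None)))                     # False: no token
```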
“IBGE acquired specialized software to monitor the environment and hired a company to check its sites for vulnerabilities, in addition to acquiring administration tools. IBGE handles sensitive information, has strong security compared to other organizations and achieves good isolation. Besides separating the census data center, we hired specialized companies to support us,” assures Thomaselli.
The databases that hold the microdata run on a high-performance Exadata server, a computing platform optimized for running Oracle databases. The server has a dedicated firewall for data protection. The database is installed in the Canabarro data center and replicated to the secondary data center in São Paulo.
“We have another database, Microsoft SQL Server, which holds the data for hiring the professionals who will work on the census; it is also hosted in Rio and replicated in São Paulo. All IBGE machines use open architecture. We shut down the IBM mainframe in 2017,” says the IT coordinator.
Redundant data center for disaster recovery
Thomaselli explains that systems are increasingly dependent on the internet. In the census, the continuity of the collection must be guaranteed: the census taker visits people’s homes and collects the data. But there is a great risk while the data remains on the DMCs, which can be stolen, dropped or stop working. Therefore, the faster the information is transmitted, the lower the risk of losing data. “The São Paulo data center is intended to guarantee field operations in the event of a failure in the main data center. The collection takes place in real time,” says Thomaselli.
The coordinator explains that all transmitted data enters both the Canabarro and São Paulo data centers and feeds a database that technicians use to verify data consistency and run validation programs. At the same time, there is a data lake, a large data repository where Business Intelligence programs compare the census data with data from other surveys and generate alerts when something looks abnormal.
“This will generate dashboards, panels with graphs for displaying the data. In addition, we have SAS, a statistical tool widely used by the research department, which lets us run queries to check whether the information is of good quality or needs some adjustment,” adds the coordinator.
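The alerting idea can be illustrated with a simple comparison between a census aggregate and an independent estimate for the same area, for example one derived from another survey. This is a hypothetical sketch: the `flag_anomalies` function, the area codes, the figures and the 10% tolerance are invented for illustration and do not describe IBGE’s Business Intelligence rules.

```python
from typing import Dict, List


def flag_anomalies(census: Dict[str, float], reference: Dict[str, float],
                   tolerance: float = 0.10) -> List[str]:
    """Return an alert for each area whose census figure deviates from the
    reference estimate by more than `tolerance` (a fraction of the reference)."""
    alerts = []
    for area, expected in reference.items():
        observed = census.get(area)
        if observed is None or expected == 0:
            continue  # skip areas that cannot be compared
        deviation = abs(observed - expected) / expected
        if deviation > tolerance:
            alerts.append(f"{area}: census={observed:,.0f}, "
                          f"reference={expected:,.0f} ({deviation:.0%} off)")
    return alerts


# Hypothetical figures: area-A is within tolerance, area-B is not.
census_counts = {"area-A": 10_250, "area-B": 6_100}
reference_estimates = {"area-A": 10_000, "area-B": 8_000}
alerts = flag_anomalies(census_counts, reference_estimates)
for alert in alerts:
    print(alert)
```

A relative tolerance keeps the rule comparable between small and large areas; a production system would likely tune it per indicator.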
IBGE will also use artificial intelligence to evaluate the information based on the coding already used in previous censuses. The solution was developed in-house by IBGE’s technical team and can identify inconsistencies such as a 10-year-old who claims to be retired.
“With so much validation processing, we obtain a census of much higher quality in terms of the data collected, which is a huge differential not only for the census operation but also for the census results that will support so many public policies in the country,” says technology director Cotovio.
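The kind of inconsistency mentioned above can also be written as an explicit rule. The sketch below is a plain rule-based check, not the internally developed AI solution described in the article. The `Person` fields, the rule list, the minimum age of 18 for a retirement flag and the 0 to 130 plausible-age range are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Person:
    person_id: int
    age: int
    retired: bool


# Each rule pairs a description with a predicate that returns True when the
# record is inconsistent. Thresholds are hypothetical.
Rule = Tuple[str, Callable[[Person], bool]]
RULES: List[Rule] = [
    ("retired but younger than 18", lambda p: p.retired and p.age < 18),
    ("age outside the plausible range 0-130", lambda p: not 0 <= p.age <= 130),
]


def check(person: Person) -> List[str]:
    """Return the description of every rule the record violates."""
    return [desc for desc, violated in RULES if violated(person)]


print(check(Person(1, age=10, retired=True)))  # ['retired but younger than 18']
print(check(Person(2, age=70, retired=True)))  # []
```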
Chips in the DMCs allow transmission of the collection in real time
All the investment in the census’s technological infrastructure was made in 2019. For 2022, only operating expenses, such as telecommunications links and DMC chips, remained. Thomaselli notes that the technology area was already put to the test during the pandemic, when it had to support remote work quickly. Now the challenge is to manage the work of 200,000 people.
“This census brings improvements, such as internal chips in the DMCs that allow data to be transmitted as soon as the enumerator finds a signal. Before, it was a blind collection. Today, supervisors can see each census taker’s productivity and make decisions. And if a census taker has questions, they can open a VoIP (Voice over IP, or voice over the internet) call and clarify the issue, or even conduct interviews by phone. The census is much more interactive. All of the census taker’s movements are recorded, as well as the time taken to fill out the questionnaires. It’s real-time monitoring, administration and decision making,” concludes Thomaselli.
Technical challenge and unique opportunity to participate in the Census
For the information technology administration manager, Flávia Marinho de Lima, a telecommunications engineer, the 2022 Census poses several technical challenges. Technology is currently segmented into several layers: infrastructure, storage, database, monitoring, security and application. The administration area managed by Flávia handles the application layer, which integrates with all the others and is responsible for technology services such as the login directory (Active Directory, AD), web services, domain name servers (DNS), applications, and the evaluation and configuration of equipment.
In the 2022 Census, the area will be responsible for the entire infrastructure of the Census Personnel Administration System (SAPC) and the Collection Management Indicators System (SIGC), which serves as a support and management-control tool and will be available to coordinators.
The DMCs rely on several inputs, such as maps, and the novelty in this census is that these resources will be hosted both in the public cloud and in IBGE’s private cloud, in its data center. The application used by the enumerator first tries to access the public cloud; if that fails, it tries the IBGE data centers. Flávia explains that, in the Agro Census, delivering these inputs to the DMCs caused congestion on the IBGE network, which is why they are now also hosted in the public cloud.
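The public-cloud-first, data-center-second order can be expressed as an ordered fallback. This is a minimal sketch, not the DMC application’s code: `fetch_with_fallback`, the stand-in fetchers and the resource name `sector-123` are hypothetical. A real client might wrap `urllib.request.urlopen`, whose `URLError` is a subclass of `OSError` and would therefore be caught here.

```python
from typing import Callable, List, Sequence, Tuple

Fetcher = Callable[[str], bytes]


def fetch_with_fallback(resource: str,
                        sources: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, bytes]:
    """Try each source in order; return (source name, content) from the first that works."""
    errors: List[str] = []
    for name, fetch in sources:
        try:
            return name, fetch(resource)
        except OSError as exc:  # network failures, including ConnectionError
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all sources failed: " + "; ".join(errors))


# Hypothetical stand-in fetchers: the public cloud is down, the data center answers.
def public_cloud(resource: str) -> bytes:
    raise ConnectionError("public cloud unreachable")


def ibge_data_center(resource: str) -> bytes:
    return b"map tiles for " + resource.encode()


source, data = fetch_with_fallback("sector-123", [
    ("public cloud", public_cloud),
    ("IBGE data center", ibge_data_center),
])
print(source, data)  # IBGE data center b'map tiles for sector-123'
```

Putting the public cloud first keeps most input downloads off the internal network, which is the congestion problem the Agro Census exposed.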
“Participating in the 2022 Census is an immense challenge because of the responsibility involved. And, as a professional, it’s a unique opportunity. The Agro Census was already an enriching experience. The 2022 Demographic Census will be carried out in a very short time, with no room for mistakes, because every day of downtime has a huge cost,” concludes Flávia.